Text embeddings are commonly evaluated on a small set of datasets from a single task, which does not cover their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be equally well applied to other tasks like clustering or reranking. This makes progress in the field difficult to track, as various models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages. Through the benchmarking of 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. We find that no particular text embedding method dominates across all tasks. This suggests that the field has yet to converge on a universal text embedding method and scale it up sufficiently to provide state-of-the-art results on all embedding tasks. MTEB comes with open-source code and a public leaderboard at https://github.com/embeddings-benchmark/mteb.