CMOMgen: Complex Multi-Ontology Alignment via Pattern-Guided In-Context Learning

Constructing comprehensive knowledge graphs requires multiple ontologies to fully contextualize data within a domain. Ontology matching finds equivalences between concepts, interconnecting ontologies and creating a cohesive semantic layer. While simple pairwise matching is well established, simple equivalence mappings cannot provide full semantic integration of related but disjoint ontologies. Complex multi-ontology matching (CMOM) aligns one source entity to composite logical expressions of multiple target entities, establishing more nuanced equivalences and provenance along the ontological hierarchy. We present CMOMgen, the first end-to-end CMOM strategy that generates complete and semantically sound mappings without restricting the number of target ontologies or entities. Retrieval-Augmented Generation selects relevant classes to compose the mapping and filters matching reference mappings to serve as examples, enhancing In-Context Learning. The strategy was evaluated on three biomedical tasks with partial reference alignments. CMOMgen outperforms baselines in class selection, demonstrating the impact of a dedicated strategy. It also achieves an F1-score of at least 63%, outperforming all baselines and ablated versions in two of the three tasks and placing second in the third. Furthermore, a manual evaluation of non-reference mappings showed that 46% of them achieve the maximum score, further substantiating the strategy's ability to construct semantically sound mappings.
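
To make the retrieval step concrete, the following is a minimal sketch of how candidate classes and example mappings might be retrieved and assembled into a few-shot prompt. It is an illustration under stated assumptions, not the CMOMgen implementation: the token-overlap similarity stands in for whatever retriever the strategy actually uses, and the ontology names, class labels, mapping expressions, and helper names (`similarity`, `top_k`, `build_prompt`) are all hypothetical.

```python
import re

# Hypothetical toy data: labels and expressions are illustrative only,
# not taken from the paper's tasks or reference alignments.
TARGET_CLASSES = {
    "anatomy": ["heart", "kidney", "liver"],
    "quality": ["increased size", "decreased size", "abnormal shape"],
}
REFERENCE_MAPPINGS = [
    ("increased kidney size", "'increased size' and ('inheres in' some 'kidney')"),
    ("abnormal liver shape", "'abnormal shape' and ('inheres in' some 'liver')"),
]


def tokens(label):
    """Return the set of lowercase word tokens in a label."""
    return set(re.findall(r"[a-z0-9]+", label.lower()))


def similarity(a, b):
    """Jaccard overlap of token sets (assumed stand-in for a real retriever)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def top_k(query, items, k, key=lambda item: item):
    """Return the k items whose key is most similar to the query label."""
    ranked = sorted(items, key=lambda item: similarity(query, key(item)), reverse=True)
    return ranked[:k]


def build_prompt(source, k_classes=1, k_examples=1):
    """Assemble a few-shot prompt from retrieved examples and candidate classes."""
    lines = ["Compose a class expression equivalent to the source class.", ""]
    # Retrieve the reference mappings whose source label best matches the query.
    for src, expr in top_k(source, REFERENCE_MAPPINGS, k_examples, key=lambda m: m[0]):
        lines += [f"Source: {src}", f"Mapping: {expr}", ""]
    lines.append(f"Source: {source}")
    # Retrieve candidate classes separately from each target ontology.
    for ontology, labels in TARGET_CLASSES.items():
        picked = top_k(source, labels, k_classes)
        lines.append(f"Candidates from {ontology}: {', '.join(picked)}")
    lines.append("Mapping:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("increased heart size"))
```

Retrieving per ontology, rather than from one pooled list, keeps every target ontology represented among the candidates; this is a design choice of the sketch and places no bound on how many ontologies are queried.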
