Knowledge graphs (KGs) provide structured, verifiable grounding for large language models (LLMs), but current LLM-based systems commonly use KGs as auxiliary structures for text retrieval, leaving their intrinsic quality underexplored. In this work, we propose Wikontic, a multi-stage pipeline that constructs KGs from open-domain text by extracting candidate triplets with qualifiers, enforcing Wikidata-based type and relation constraints, and normalizing entities to reduce duplication. The resulting KGs are compact, ontology-consistent, and well-connected; on MuSiQue, the correct answer entity appears in 96% of generated triplets. On HotpotQA, our triplets-only setup achieves 76.0 F1, and on MuSiQue 59.8 F1, matching or surpassing several retrieval-augmented generation baselines that still require textual context. In addition, Wikontic attains state-of-the-art information-retention performance on the MINE-1 benchmark (86%), outperforming prior KG construction methods. Wikontic is also efficient at build time: KG construction uses fewer than 1,000 output tokens, about 3$\times$ fewer than AriGraph and less than 1/20 of GraphRAG. The proposed pipeline enhances the quality of the generated KG and offers a scalable solution for leveraging structured knowledge in LLMs.