Cross-referencing, which links passages of a text to other related passages, can be a valuable study aid that helps readers comprehend the text. However, producing cross-references requires, first, comprehensive thematic knowledge of the entire corpus and, second, a focused search through the corpus to find useful connections. As a result, cross-reference resources are prohibitively expensive to create and exist only for the most well-studied texts (e.g., religious texts). We develop a topic-based system that automatically produces candidate cross-references, which human annotators can easily verify. Our system uses fine-grained topic modeling, with thousands of highly nuanced and specific topics, to identify topically related verse pairs. We demonstrate that our system can be cost-effective compared to having annotators acquire the expertise necessary to produce cross-reference resources unaided.
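As a rough illustration of the candidate-generation idea, the sketch below ranks verse pairs by the similarity of their topic distributions. It is not the system's actual implementation: the abstract does not name a similarity measure, so cosine similarity is an assumption here, and the function names, the threshold value, and the three-topic toy vectors are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_cross_references(verse_topics, threshold=0.8):
    """Return (verse_a, verse_b, score) for pairs at or above the threshold.

    verse_topics maps a verse id to its topic-proportion vector.
    The threshold is an assumed cutoff, chosen only for illustration.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(verse_topics.items()), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    # Highest-scoring pairs first, so annotators see the strongest candidates early.
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


# Hypothetical topic proportions over three topics (a real model would use thousands).
toy_topics = {
    "verse_1": [0.9, 0.1, 0.0],
    "verse_2": [0.8, 0.2, 0.0],
    "verse_3": [0.0, 0.1, 0.9],
}

candidates = candidate_cross_references(toy_topics)
for verse_a, verse_b, score in candidates:
    print(f"{verse_a} <-> {verse_b}: {score:.3f}")
```

Comparing every pair is quadratic in the number of verses, so a full corpus would likely need an index or a pre-filter. Either way the output is only a list of candidates, which human annotators would still have to verify.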