Artificial intelligence seems to be taking over the world with systems that model pixels, words, and phonemes. The world is arguably made up not of pixels, words, and phonemes but of entities (objects, things, including events) with properties and relations among them. Surely we should model these, not the perception or description of them. You might suspect that the concentration on modeling words and pixels arises because all of the (valuable) data in the world is in the form of text and images. Yet if you look into almost any company, you will find that its most valuable data is in spreadsheets, databases, and other relational formats. These are not the forms of data studied in introductory machine learning; they are full of product numbers, student numbers, transaction numbers, and other identifiers that cannot be interpreted naively as numbers. The field that studies this sort of data goes by various names, including relational learning and statistical relational AI, among many others. This paper explains why relational learning is not taking over the world -- except in a few cases with restricted relations -- and what needs to be done to bring it to its rightful prominence.