Dependencies on the relative frequency of a state in the domain are common when modelling probabilistic dependencies on relational data. For instance, the likelihood of a school closure during an epidemic might depend on the proportion of infected pupils exceeding a threshold. Often, rather than depending on discrete thresholds, dependencies are continuous: for instance, the likelihood of any one mosquito bite transmitting an illness depends on the proportion of carrier mosquitoes. Current approaches usually only consider probabilities over possible worlds rather than over domain elements themselves. We introduce two formalisms that explicitly incorporate relative frequencies into statistical relational artificial intelligence. The first formalism, Lifted Bayesian Networks for Conditional Probability Logic, expresses discrete dependencies on probabilistic data. The second formalism, Functional Lifted Bayesian Networks, expresses continuous dependencies. Incorporating relative frequencies is not only beneficial to modelling; it also provides a more rigorous approach to learning problems where training and test or application domains have different sizes. To this end, we provide a representation of the asymptotic probability distributions induced by the two formalisms on domains of increasing sizes. Since that representation has well-understood scaling behaviour across domain sizes, it can be used to estimate parameters for a large domain consistently from randomly sampled subpopulations.
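To make the distinction between discrete and continuous dependencies on a relative frequency concrete, here is a minimal Python sketch. It is not either of the formalisms introduced above. The function names, the threshold of 0.3, and all probability values are hypothetical choices made for illustration only. The first function mirrors the school-closure example (a probability that switches once the proportion of infected pupils exceeds a threshold), and the second mirrors the mosquito example (a probability that varies smoothly with the proportion of carriers, here assumed to be linear).

```python
def relative_frequency(states):
    """Fraction of domain elements whose state is True."""
    if not states:
        raise ValueError("domain must be non-empty")
    return sum(states) / len(states)


def closure_probability(infected, threshold=0.3, p_above=0.9, p_below=0.05):
    """Discrete dependency (illustrative values only): the probability of a
    school closure switches when the proportion of infected pupils exceeds
    the threshold."""
    return p_above if relative_frequency(infected) > threshold else p_below


def transmission_probability(carriers, max_rate=0.5):
    """Continuous dependency (assumed linear for illustration): the chance
    that a single bite transmits the illness scales with the proportion of
    carrier mosquitoes."""
    return max_rate * relative_frequency(carriers)


# Hypothetical toy domains: True marks an infected pupil / a carrier mosquito.
pupils = [True, True, False, False]
mosquitoes = [True, False, False, False]

p_close = closure_probability(pupils)
p_transmit = transmission_probability(mosquitoes)
```

Both functions depend on the domain only through `relative_frequency`, which does not change when a domain is replaced by a larger one with the same proportions. That is the intuition behind the scaling behaviour across domain sizes mentioned above, although the sketch does not reproduce the asymptotic representation itself.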