We propose a probability distribution for multivariate binary random variables. The distribution is expressed in terms of principal minors of a parameter matrix, which plays a role analogous to the inverse covariance matrix in the multivariate Gaussian distribution. In our model, the partition function, the central moments, and the marginal and conditional distributions are all expressed analytically. Consequently, obtaining the partition function and various expected values does not require summing over all possible states, a summation that is a known difficulty of the conventional multivariate Bernoulli distribution. The proposed model has many similarities to the multivariate Gaussian distribution. For example, the marginal and conditional distributions are expressed in terms of the parameter matrix and its inverse matrix, respectively. In this sense, the inverse matrix represents a kind of partial correlation. The proposed distribution can be derived using Grassmann numbers, that is, anticommuting numbers. The analytical expressions for the marginal and conditional distributions are also useful for generating random numbers for multivariate binary variables. Using them, we investigated the sampling distributions of parameter estimates on synthetic datasets. The computational complexity of maximum likelihood estimation from observed data is proportional to the number of unique observed states, not to the number of all possible states as in the case of the conventional multivariate Bernoulli distribution. We empirically observed that the maximum likelihood estimates appear to be consistent and asymptotically normal.
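The claim that the partition function needs no summation over states can be illustrated with a standard determinant identity: for any square matrix A, the sum of all principal minors of A (counting the empty minor as 1) equals det(I + A). The sketch below is only an illustration under stated assumptions, not the paper's parameterization: it assumes the unnormalized weight of a binary state is the principal minor of a hypothetical matrix `A` indexed by the variables equal to 1, and the function names and the example matrix are made up for this purpose.

```python
import itertools

import numpy as np


def principal_minor(A, subset):
    """Determinant of A restricted to the rows and columns in `subset`.

    The empty subset is given the conventional value 1.
    """
    idx = list(subset)
    if not idx:
        return 1.0
    return float(np.linalg.det(A[np.ix_(idx, idx)]))


def all_subsets(n):
    """Yield every subset of {0, ..., n-1} as a tuple of indices."""
    for r in range(n + 1):
        yield from itertools.combinations(range(n), r)


def brute_force_partition(A):
    """Sum the assumed unnormalized weights over all 2**n binary states."""
    return sum(principal_minor(A, s) for s in all_subsets(A.shape[0]))


def analytic_partition(A):
    """Same quantity from a single determinant: det(I + A)."""
    return float(np.linalg.det(np.eye(A.shape[0]) + A))


# Hypothetical 3x3 matrix, chosen so that every principal minor is positive
# and the weights can be read as unnormalized probabilities.
A = np.array([
    [1.0, 0.3, 0.1],
    [0.2, 0.8, 0.4],
    [0.1, 0.3, 1.2],
])

Z_sum = brute_force_partition(A)  # 2**3 determinant evaluations
Z_det = analytic_partition(A)     # one determinant evaluation

# Normalized probability of each state under the assumed weights.
probs = {s: principal_minor(A, s) / Z_det for s in all_subsets(A.shape[0])}
```

The brute-force sum grows as 2^n with the number of variables, while the determinant costs roughly O(n^3), which is the kind of saving the abstract describes.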