The exponential distribution of the order of demonstrative, numeral, adjective and noun

The frequency of the preferred order of a noun phrase formed by a demonstrative, numeral, adjective and noun has received significant attention over the last two decades. We investigate the actual distribution of the 24 possible orders. There is no consensus on whether it is better fitted by an exponential or a power-law distribution. We find that an exponential distribution is a much better model. This finding, together with other circumstances where an exponential-like distribution is found, challenges the view that power-law distributions, e.g., Zipf's law for word frequencies, are inevitable. We also investigate which of two exponential distributions gives a better fit: an exponential model where all 24 orders have non-zero probability (a geometric distribution truncated at rank 24) or an exponential model where the number of orders that can have non-zero probability is variable (a right-truncated geometric distribution). When consistency and generalizability are prioritized, we find higher support for the exponential model where all 24 orders have non-zero probability. These findings strongly suggest that there is no hard constraint on word order variation and that unattested orders merely result from undersampling, consistent with Cysouw's view.
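To make the competing models concrete, the following is a minimal Python sketch of a rank-frequency comparison between a truncated geometric (exponential) model and a truncated power law. It is not the authors' code. The parameter names `q` and `alpha`, the grid-search fit, and all function names are assumptions introduced for illustration, and the counts are synthetic rather than taken from any typological database.

```python
import math

N_ORDERS = 24  # 4! orders of demonstrative, numeral, adjective and noun


def truncated_geometric_pmf(q, r_max):
    """Return P(rank = r) for r = 1..r_max, proportional to (1 - q)**(r - 1)."""
    weights = [(1.0 - q) ** (r - 1) for r in range(1, r_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def truncated_power_law_pmf(alpha, r_max):
    """Return P(rank = r) for r = 1..r_max, proportional to r**(-alpha)."""
    weights = [r ** (-alpha) for r in range(1, r_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(counts, pmf):
    """Log-likelihood of rank counts; counts[i] is the frequency of rank i + 1."""
    ll = 0.0
    for i, c in enumerate(counts):
        if c == 0:
            continue
        if i >= len(pmf):
            return -math.inf  # an observed rank lies outside the model's support
        ll += c * math.log(pmf[i])
    return ll


def fit_by_grid(counts, pmf_family, grid, r_max):
    """Crude maximum-likelihood fit: evaluate every parameter value on the grid."""
    best = max(grid, key=lambda p: log_likelihood(counts, pmf_family(p, r_max)))
    return best, log_likelihood(counts, pmf_family(best, r_max))


# Synthetic counts (NOT real data): roughly 1000 languages with a geometric shape.
true_pmf = truncated_geometric_pmf(0.3, N_ORDERS)
counts = [round(1000 * p) for p in true_pmf]

q_grid = [k / 1000 for k in range(1, 1000)]       # hypothetical search grid for q
alpha_grid = [k / 100 for k in range(1, 501)]     # hypothetical search grid for alpha

# Model 1: all 24 orders have non-zero probability.
q_all, ll_all = fit_by_grid(counts, truncated_geometric_pmf, q_grid, N_ORDERS)

# Model 2: variable truncation point. For a fixed q the likelihood is largest
# when the support ends at the highest rank that was actually observed.
r_obs = max(i + 1 for i, c in enumerate(counts) if c > 0)
q_var, ll_var = fit_by_grid(counts, truncated_geometric_pmf, q_grid, r_obs)

# Power-law alternative over the same 24 ranks.
alpha_hat, ll_pow = fit_by_grid(counts, truncated_power_law_pmf, alpha_grid, N_ORDERS)

print(f"geometric, 24 ranks:    q={q_all:.3f}  logL={ll_all:.2f}")
print(f"geometric, {r_obs} ranks:    q={q_var:.3f}  logL={ll_var:.2f}")
print(f"power law, 24 ranks: alpha={alpha_hat:.2f}  logL={ll_pow:.2f}")
```

In this sketch the variable-truncation model can never have a lower log-likelihood than the 24-rank model, because it has an extra free parameter (the truncation point). Choosing between the two therefore requires a criterion that goes beyond raw likelihood, which fits the abstract's remark that the preference for the 24-rank model emerges when consistency and generalizability are prioritized.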
