Contemporary AGI evaluations report multidomain capability profiles, yet they typically assign symmetric weights and rely on snapshot scores. This creates two problems: (i) equal weighting treats all domains as equally important when human intelligence research suggests otherwise, and (ii) snapshot testing cannot distinguish durable capabilities from brittle performance that collapses under delay or stress. I argue that general intelligence, in humans and potentially in machines, is better understood as a homeostatic property cluster: a set of abilities plus the mechanisms that keep those abilities co-present under perturbation. On this view, AGI evaluation should weight domains by their causal centrality (their contribution to cluster stability) and require evidence of persistence across sessions. I propose two battery-compatible extensions: a centrality-prior score that imports CHC-derived weights with transparent sensitivity analysis, and a Cluster Stability Index family that separates profile persistence, durable learning, and error correction. These additions preserve multidomain breadth while reducing brittleness and susceptibility to gaming. I close with testable predictions and black-box protocols that labs can adopt without architectural access.
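To make the centrality-prior score concrete, the following is a minimal sketch, not the paper's specification: the domain names, the CHC-style prior weights, and the functions `centrality_prior_score` and `weight_sensitivity` are illustrative assumptions, and the Gaussian perturbation of weights stands in for whatever sensitivity analysis the full battery would use.

```python
import numpy as np

# Hypothetical CHC-derived centrality priors over evaluation domains.
# Names and values are illustrative placeholders, not the paper's weights.
PRIORS = {
    "fluid_reasoning": 0.30,
    "working_memory": 0.25,
    "long_term_retrieval": 0.20,
    "visual_processing": 0.15,
    "processing_speed": 0.10,
}

def centrality_prior_score(domain_scores: dict[str, float],
                           weights: dict[str, float]) -> float:
    """Centrality-weighted aggregate of per-domain scores in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[d] * domain_scores[d] for d in weights) / total

def weight_sensitivity(domain_scores: dict[str, float],
                       weights: dict[str, float],
                       sigma: float = 0.05, n: int = 1000, seed: int = 0):
    """Re-score under random perturbations of the weight vector, reporting
    mean and spread: a transparent check on how much the aggregate
    depends on the exact choice of priors."""
    rng = np.random.default_rng(seed)
    names = list(weights)
    base = np.array([weights[d] for d in names])
    samples = []
    for _ in range(n):
        w = np.clip(base + rng.normal(0.0, sigma, size=base.shape), 1e-6, None)
        w /= w.sum()  # renormalize onto the simplex
        samples.append(sum(wi * domain_scores[d] for wi, d in zip(w, names)))
    return float(np.mean(samples)), float(np.std(samples))

# Example: a capability profile with uneven per-domain scores.
scores = dict(zip(PRIORS, [0.82, 0.74, 0.61, 0.88, 0.90]))
print(centrality_prior_score(scores, PRIORS))  # weighted aggregate
print(weight_sensitivity(scores, PRIORS))      # (mean, std) under perturbed weights
```

A small reported standard deviation under perturbation would indicate the aggregate is robust to the exact priors; a large one flags that the ranking is an artifact of the weighting choice.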