Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns, in stark contrast to the smooth, predictable gains seen in large language models. We identify the root cause as a structural misalignment: Transformers assume sequential compositionality, while CTR data demand combinatorial reasoning over high-cardinality semantic fields. Unstructured attention spreads capacity indiscriminately, amplifying noise under extreme sparsity and undermining scalable learning. To restore alignment, we introduce the Field-Aware Transformer (FAT), which embeds field-based interaction priors into attention through decomposed content alignment and cross-field modulation. This design ensures that model complexity scales with the number of fields F rather than the total vocabulary size n (with n >> F), leading to tighter generalization bounds and, critically, observed power-law scaling of AUC as model width increases. We present the first formal scaling law for CTR models, grounded in Rademacher complexity, that explains and predicts this behavior. On large-scale benchmarks, FAT improves AUC by up to +0.51% over state-of-the-art methods. Deployed online, it delivers +2.33% CTR and +0.66% RPM. Our work establishes that effective scaling in recommendation arises not from size but from structured expressivity: architectural coherence with data semantics.
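The abstract does not specify how the FAT attention is parameterized. As a rough, non-authoritative illustration, the sketch below assumes that "decomposed content alignment and cross-field modulation" means splitting the attention logits into a standard content dot-product term plus a learned F x F bias over field pairs, so the interaction prior grows with the number of fields F rather than the vocabulary size n. The class name `FieldAwareAttention`, the additive `field_bias`, and all dimensions are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn


class FieldAwareAttention(nn.Module):
    """Illustrative sketch (not the paper's method): attention logits are
    decomposed into a content-alignment term (scaled dot product between
    per-field embeddings) and a cross-field modulation term (a learned
    F x F bias over field pairs)."""

    def __init__(self, num_fields: int, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Cross-field modulation: one learned scalar per (field_i, field_j) pair,
        # so this prior scales with F^2, independent of vocabulary size n.
        self.field_bias = nn.Parameter(torch.zeros(num_fields, num_fields))
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, F, dim), one embedding per semantic field.
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        content = torch.matmul(q, k.transpose(-2, -1)) * self.scale  # (batch, F, F)
        logits = content + self.field_bias  # content alignment + field-pair prior
        attn = logits.softmax(dim=-1)
        return torch.matmul(attn, v)  # (batch, F, dim)


# Usage: 32 examples, 24 fields, embedding width 64 (all values illustrative).
x = torch.randn(32, 24, 64)
out = FieldAwareAttention(num_fields=24, dim=64)(x)
print(out.shape)  # torch.Size([32, 24, 64])
```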