Feature crossing captures interactions among categorical features and is useful for enhancing learning from tabular data in real-world business applications. In this paper, we present AutoCross, an automatic feature crossing tool that 4Paradigm provides to customers ranging from banks and hospitals to Internet companies. By performing beam search in a tree-structured space, AutoCross efficiently generates high-order cross features, which existing works have not yet explored. Additionally, we propose successive mini-batch gradient descent and multi-granularity discretization to further improve efficiency and effectiveness while keeping the tool simple, so that neither machine learning expertise nor tedious hyper-parameter tuning is required. Furthermore, the algorithms are designed to reduce the computation, transmission, and storage costs of distributed computing. Experimental results on both benchmark and real-world business datasets demonstrate the effectiveness and efficiency of AutoCross, and show that it can significantly enhance the performance of both linear and deep models.
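The abstract only names the search strategy, so what follows is a minimal Python sketch of the general idea under stated assumptions, not AutoCross's implementation. Each tree node is a set of features; a child adds one pairwise cross of two features already in the set, so later steps cross earlier crosses and produce high-order features, and the beam keeps the `width` best children per step. Numeric columns are discretized at several bin counts so that the search, rather than a hand-tuned setting, picks the granularity. Assumptions: the function names `cross`, `multi_granularity`, `holdout_log_loss`, and `beam_search_crosses` are hypothetical; the binning is plain equal-width; and candidates are scored by a stand-in (smoothed per-category positive rates, evaluated by hold-out log loss) instead of the model the paper trains with successive mini-batch gradient descent.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def cross(col_a, col_b):
    """Cross two categorical columns: each row's pair of values becomes one category."""
    return [f"{a}&{b}" for a, b in zip(col_a, col_b)]


def multi_granularity(name, values, bin_counts=(2, 4, 8)):
    """Hypothetical helper: equal-width discretization of one numeric column at several
    granularities; each granularity becomes its own categorical feature."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant column
    return {
        f"{name}_b{k}": [str(min(int((v - lo) / span * k), k - 1)) for v in values]
        for k in bin_counts
    }


def holdout_log_loss(col, labels, split, alpha=1.0):
    """Stand-in scorer (assumption, not the paper's method): fit smoothed per-category
    positive rates on rows [:split], return mean log loss on rows [split:]."""
    pos, cnt = defaultdict(float), defaultdict(float)
    for v, y in zip(col[:split], labels[:split]):
        pos[v] += y
        cnt[v] += 1
    prior = sum(labels[:split]) / split
    valid = list(zip(col[split:], labels[split:]))
    loss = 0.0
    for v, y in valid:
        p = (pos[v] + alpha * prior) / (cnt[v] + alpha)  # unseen category falls back to prior
        p = min(max(p, 1e-6), 1 - 1e-6)
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss / len(valid)


def beam_search_crosses(features, labels, split, width=2, steps=2):
    """Beam search over feature sets. A child adds one pairwise cross of two members;
    crossing crosses in later steps yields higher-order features."""
    cols = dict(features)  # cache of every column generated so far, keyed by name
    beam = [(frozenset(features), float("inf"))]
    for _ in range(steps):
        children = {}
        for names, _ in beam:
            for a, b in combinations(sorted(names), 2):
                new = f"({a} x {b})"
                if new in names:
                    continue  # this cross is already in the node
                if new not in cols:
                    cols[new] = cross(cols[a], cols[b])
                child = names | {new}
                if child not in children:
                    # Simplification: a child is scored by its newest cross alone.
                    children[child] = holdout_log_loss(cols[new], labels, split)
        # Keep the `width` children with the lowest hold-out loss.
        beam = sorted(children.items(), key=lambda kv: kv[1])[:width]
    return beam, cols


if __name__ == "__main__":
    rng = random.Random(0)
    n = 400
    a = [rng.choice("01") for _ in range(n)]
    b = [rng.choice("01") for _ in range(n)]
    x = [rng.random() for _ in range(n)]  # numeric noise feature
    y = [int(ai != bi) for ai, bi in zip(a, b)]  # XOR: a and b are uninformative alone
    feats = {"a": a, "b": b, **multi_granularity("x", x, (2, 4))}
    beam, _ = beam_search_crosses(feats, y, split=300, width=2, steps=1)
    print(beam[0])  # best node and the hold-out loss of its newest cross
```

The XOR demo is chosen because neither `a` nor `b` predicts the label on its own, so only a search that evaluates crosses can find the signal. Scoring each child by its newest cross alone keeps the sketch short; a faithful version would retrain a model on the whole feature set.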