For e-commerce companies with large product selections, organizing and grouping products in meaningful ways is important for creating a great customer shopping experience and cultivating an authoritative brand image. One important grouping is the family of product variants: products that are mostly the same but differ in slight yet distinct ways (e.g., color or pack size). In this paper, we introduce a novel approach to identifying product variants. It combines constrained clustering with tailored NLP techniques (e.g., extracting the product family name from an unstructured product title and identifying products with similar model numbers) and achieves superior performance compared with an existing baseline that uses a vanilla classification approach. In addition, we design the algorithm to meet specific business criteria: it must satisfy high accuracy requirements across a wide range of categories (e.g., appliances, decor, tools, and building materials), and it prioritizes model interpretability so that it is accessible and understandable to all business partners.
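To make the idea of combining model-number similarity with constrained clustering concrete, the following is a minimal sketch and not the method of this paper. It assumes a character-level similarity from Python's `difflib`, a greedy merge of the most similar pairs, and a set of cannot-link pairs that encodes business rules. The function names, the `"model"` field, the `cannot_link` argument, and the threshold of 0.7 are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def model_similarity(a: str, b: str) -> float:
    """Character-level similarity of two model numbers, in [0, 1]."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def group_variants(products, cannot_link=frozenset(), threshold=0.7):
    """Greedily merge products with similar model numbers into families.

    products    : list of dicts with a "model" key (hypothetical schema)
    cannot_link : set of frozenset({i, j}) index pairs that must never
                  share a family (illustrative business constraint)
    threshold   : minimum similarity for a merge (assumed value)
    """
    # Start with every product in its own cluster.
    clusters = [{i} for i in range(len(products))]

    # Score all pairs and visit them from most to least similar.
    pairs = sorted(
        (
            (model_similarity(products[i]["model"], products[j]["model"]), i, j)
            for i, j in combinations(range(len(products)), 2)
        ),
        reverse=True,
    )

    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs are even less similar
        ci = next(c for c in clusters if i in c)
        cj = next(c for c in clusters if j in c)
        if ci is cj:
            continue  # already in the same family
        # Reject the merge if any cross-cluster pair is forbidden.
        if any(frozenset((a, b)) in cannot_link for a in ci for b in cj):
            continue
        clusters.remove(cj)
        ci |= cj

    return sorted(sorted(c) for c in clusters)
```

Calling `group_variants(products)` on a list of product dicts returns lists of product indices, one list per candidate variant family. Passing `cannot_link={frozenset((0, 2))}` guarantees that products 0 and 2 never end up in the same family, however similar their model numbers are. Because each merge is driven by one similarity score and an explicit list of constraints, the reason two products were or were not grouped can be read off directly, which is in the spirit of the interpretability goal stated above.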