Most existing set encoding algorithms operate under the assumption that all elements of the set are accessible during training and inference. Additionally, they assume that enough computational resources are available to process sets of large cardinality concurrently. However, both assumptions fail when the cardinality of the set is so large that we cannot even load the set into memory. In more extreme cases, the set size could be potentially unlimited, and the elements of the set could arrive in a streaming manner, where the model receives subsets of the full set data at irregular intervals. To tackle such practical challenges in large-scale set encoding, we go beyond the usual constraints of invariance and equivariance and introduce a new property, termed Mini-Batch Consistency, that is required for large-scale mini-batch set encoding. We present a scalable and efficient set encoding mechanism that is amenable to mini-batch processing with respect to set elements and capable of updating set representations as more data arrives. The proposed method respects the required symmetries of invariance and equivariance while being Mini-Batch Consistent for random partitions of the input set. We perform extensive experiments and show that our method is computationally efficient and results in rich set encoding representations for set-structured data.
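The Mini-Batch Consistency property described above can be illustrated with a minimal sketch. The encoder and class names below are illustrative, not the paper's actual architecture; mean pooling is used here only because it is a simple aggregation that happens to satisfy the property: accumulating per-batch (sum, count) statistics over any random partition of the set yields the same representation as encoding the full set in one pass.

```python
def encode_full(elements):
    """Encode an entire set at once: here, simply the mean of its elements."""
    return sum(elements) / len(elements)

class StreamingEncoder:
    """Updates a running set representation as mini-batches arrive.

    Mean pooling is Mini-Batch Consistent: the aggregated state depends
    only on the union of the elements seen, not on how the stream was
    partitioned into mini-batches.
    """
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, batch):
        # Fold a mini-batch into the running sufficient statistics.
        self.total += sum(batch)
        self.count += len(batch)

    def encoding(self):
        # Current set representation, valid after any prefix of the stream.
        return self.total / self.count

full_set = [1.0, 4.0, 2.0, 8.0, 5.0, 3.0]
enc = StreamingEncoder()
for batch in ([1.0, 4.0], [2.0], [8.0, 5.0, 3.0]):  # one arbitrary partition
    enc.update(batch)

# Mini-batch result matches the full-set encoding, as MBC requires.
assert abs(enc.encoding() - encode_full(full_set)) < 1e-12
```

Note that more expressive set encoders (e.g. those using attention over all elements jointly) generally break this property, which is why achieving MBC alongside rich representations is nontrivial.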