The study of neural representations, in both biological and artificial systems, is increasingly revealing the importance of geometric and topological structure. Inspired by this, we introduce Event2Vec, a novel framework for learning representations of discrete event sequences. Our model leverages a simple, additive recurrent structure to learn composable, interpretable embeddings. We provide a theoretical analysis demonstrating that, under specific training objectives, the representations our model learns in Euclidean space converge to an ideal additive structure. This ensures that the representation of a sequence is the vector sum of its constituent events, a property we term the linear additive hypothesis. To address the limitations of Euclidean geometry for hierarchical data, we also introduce a variant of our model in hyperbolic space, which is naturally suited to embedding tree-like structures with low distortion. We present experiments validating this hypothesis. Quantitative evaluation on the Brown Corpus yields a silhouette score of 0.0564, outperforming a Word2Vec baseline (0.0215) and demonstrating the model's ability to capture structural dependencies without supervision.
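For concreteness, the following is a minimal sketch of the additive recurrence behind the linear additive hypothesis; the symbols $h_t$ (recurrent state) and $v_{e_t}$ (embedding of event $e_t$) are assumed notation for illustration, not taken from the model definition itself.

% Assumed notation (illustrative only): h_t is the recurrent state after t events,
% v_{e_t} is the learned embedding of the t-th event in the sequence.
\[
h_t = h_{t-1} + v_{e_t}, \qquad h_0 = \mathbf{0}
\quad\Longrightarrow\quad
h_T = \sum_{t=1}^{T} v_{e_t}
\]

Under this recurrence the final state $h_T$ is exactly the vector sum of the embeddings of the constituent events, which is the additive property the analysis above refers to.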