DHoTT：同伦类型理论的时间扩展用于语义漂移 (DHoTT: A Temporal Extension of Homotopy Type Theory for Semantic Drift)

Dynamic HoTT (DHoTT) is a conservative extension of Homotopy Type Theory designed for evolving texts in conversational AI. In a chat system, a large language model (LLM) is queried with a growing prefix: at turn tau the input is C(tau), the concatenation of all previous prompts and replies, and the new answer extends C(tau+1). We study the logical semantics of these time indexed texts and how their meanings drift or break over time, linking Homotopy Type Theory with distributional semantics and topological data analysis on embedding spaces. For each turn we embed all tokens seen so far using a frozen encoder and map them to the unit sphere, build a good cover by spherical caps, and form the Cech nerve. A Kan fibrant replacement yields a Kan complex ET(tau), the Evolving Text at time tau, where identity types are path spaces and dependent types support ordinary HoTT transport. Time is treated functorially so that all HoTT rules interpret fibrewise and standard substitution and conservativity properties hold. On top of this fibrewise HoTT layer we add a small cross time calculus for semantic change. A carry records when a later use has a certified path back into an earlier fibre under an admissibility policy. A rupture is a positive sigma type storing finite, policy checked failed attempts together with an open horn tag. An append only ledger accumulates carries and ruptures so that earlier failures are not erased when later alignments appear. We illustrate the construction on small, replicable examples from real human LLM dialogues using a concrete DeBERTa based embedding pipeline. This provides a principled, geometrically grounded core for analysing and auditing semantic drift in large language models and other embedding based systems.

翻译：动态同伦类型理论（DHoTT）是同伦类型理论的一个保守扩展，专为对话式人工智能中不断演化的文本而设计。在聊天系统中，大型语言模型（LLM）的查询基于一个增长的前缀：在轮次τ时，输入为C(τ)，即所有先前提示和回复的拼接，而新回答则扩展为C(τ+1)。我们研究了这些时间索引文本的逻辑语义及其意义随时间如何漂移或断裂，将同伦类型理论与分布语义学以及嵌入空间上的拓扑数据分析联系起来。对于每个轮次，我们使用冻结编码器嵌入迄今为止所见的所有标记，并将其映射到单位球面上，通过球帽构建良好覆盖，并形成Čech神经。一个Kan纤维化替换产生一个Kan复形ET(τ)，即时间τ时的演化文本，其中恒等类型为路径空间，依赖类型支持普通HoTT的传输。时间被函子化处理，使得所有HoTT规则在纤维方向上可解释，且标准替换和保守性性质成立。在此纤维方向HoTT层之上，我们添加了一个用于语义变化的小型跨时间演算。一个携带记录在可接纳策略下，后续使用何时具有经过认证的路径返回早期纤维。一个断裂是一个正西格玛类型，存储有限且经过策略检查的失败尝试以及一个开放角标记。一个仅追加账本累积携带和断裂，确保早期失败在后续对齐出现时不被擦除。我们使用基于DeBERTa的具体嵌入流水线，在真实人类LLM对话的小型可复现示例上说明了该构造。这为分析和审计大型语言模型及其他基于嵌入的系统的语义漂移提供了一个原则性、几何基础的核心框架。