In the field of compressed string indexes, recent work has introduced suffixient sets and their corresponding repetitiveness measure $χ$. In particular, researchers have explored its relationship to other repetitiveness measures, notably $r$, the number of runs in the Burrows--Wheeler Transform (BWT) of a string. Navarro et al. (2025) proved that $χ\leq 2r$, although empirical results by Cenzato et al. (2024) suggest that this bound is loose, with real data bounding $χ$ by around $1.13r$ to $1.33r$ when the size of the alphabet is $σ= 4$. To better understand this gap, we present two cases for the asymptotic tightness of the $χ\leq 2r$ bound: a general construction for arbitrary $σ$ values, and a binary alphabet case, consisting of de Bruijn sequences constructed by linear-feedback shift registers (LFSRs) from primitive polynomials over $\mathbb{F}_2$. The second is a novel characterization of which de Bruijn sequences achieve the literature run-minimal pattern for the cyclic BWT. Moreover, we show that de Bruijn sequences fail to close the gap for $σ\geq 3$.
翻译:在压缩字符串索引领域,近期研究引入了后缀集合及其对应的重复性度量$χ$。特别地,研究者探讨了其与其他重复性度量(尤其是字符串Burrows–Wheeler变换中游程数$r$)的关系。Navarro等人(2025)证明了$χ\leq 2r$,尽管Cenzato等人(2024)的实验结果表明该界是宽松的——当字母表规模为$σ=4$时,实际数据将$χ$约束在约$1.13r$至$1.33r$范围内。为深入理解此间隙,我们针对$χ\leq 2r$界的渐近紧性提出两种情形:适用于任意$σ$值的通用构造,以及基于线性反馈移位寄存器(LFSR)通过$\mathbb{F}_2$上本原多项式构造的de Bruijn序列所定义的二元字母表情形。后者首次刻画了哪些de Bruijn序列能达到循环BWT的文献游程最小模式。此外,我们证明de Bruijn序列在$σ\geq 3$时无法缩小该间隙。