Introduced about thirty years ago in the field of Data Compression, the Burrows-Wheeler Transform (BWT) is a string transformation that, besides being a booster of the performance of memoryless compressors, plays a fundamental role in the design of efficient self-indexing compressed data structures. Finding other string transformations with the same remarkable properties of BWT has been a challenge for many researchers for a long time. Among the known BWT variants, the only one that has been recently shown to be a valid alternative to BWT is the Alternating BWT (ABWT), another invertible string transformation introduced about ten years ago in connection with a generalization of Lyndon words. In this paper, we introduce a whole class of new string transformations, called local orderings-based transformations, which have all the myriad virtues of BWT. We show that this new family is a special case of a much larger class of transformations, based on context adaptive alphabet orderings, that includes BWT and ABWT. Although all transformations support pattern search, we show that, in the general case, the transformations within our larger class may take quadratic time for inversion and pattern search. As a further result, we show that the local orderings-based transformations can be used for the construction of the recently introduced r-index, which makes them suitable also for highly repetitive collections. In this context, we consider the problem of finding, for a given string, the BWT variant that minimizes the number of runs in the transformed string, and we provide an algorithm solving this problem in linear time.
翻译:大约30年前在数据压缩领域引入的Burrows-Wheeler变换(BWT)是一个字符串转换,它除了推动无记忆压缩机的性能之外,还在设计高效的自我索引压缩数据结构方面发挥着根本作用。寻找与BWT性质相同的其它字符串变换对许多研究人员来说是一个长期的挑战。在已知的BWT变换中,最近唯一被证明是BWT有效替代的变换是变换BWT(ABWT),这是十年前在Lyndon字的直线转换中引入的另一种不可逆的变换。在本文中,我们引入了一整套新变换的字符串变换,称为以本地定序为基础的变换,这些变换具有BWT的所有优点。我们表明,根据上下文的最小缩放字母顺序顺序排序变换的变换是一个特别的变换类别,包括BWT和ABWT。尽管所有变换模式的变换都支持了模式搜索,但我们在一般情况下显示,我们这个变换行的变换过程中, 新的变换过程可以让我们的变换过程的变换过程的变换过程, 的变换过程的变换过程可以用来在更变换过程的变换过程的变换过程的变换。