Fast locating with the run-length compressed Burrows-Wheeler Transform

Travis Gagie - Universidad Diego Portales, Santiago de Chile
Date and time
Monday, June 5, 2017 at 2:30 PM - Aula C - Rinfresco 14.15, inizio seminario 14.30.
Programme Director
Ferdinando Cicalese
External reference
Publication date
May 29, 2017
Computer Science  


Indexing highly repetitive texts --- such as genomic databases, software repositories and versioned text collections --- has become a hot topic since the turn of the millennium.  A simple solution is to use an FM-index based on the run-length compressed Burrows-Wheeler Transform (RLBWT) of the text, which achieves excellent compression and allows us, given a pattern, to count quickly the number of times it appear in the text.  In order to be able to locate quickly the positions where it occurs, however, conventional wisdom has been that we must augment the RLBWT with a sample of the suffix array, and that does not compress so well: the product of the size and the query per occurrence is roughly linear, meaning it is either much larger than the RLBWT or very slow.  In this talk we demonstrate another way to augment the RLBWT, that is simple, does not increase the total size much, and supports locating in predecessor time.  In our preliminary experiments, it is a thousand times faster than a suffix array sample with the same memory footprint.
This is joint work with Gonzalo Navarro and Nicola Prezza.

© 2002 - 2021  Verona University
Via dell'Artigliere 8, 37129 Verona  |  P. I.V.A. 01541040232  |  C. FISCALE 93009870234