Fast locating with the run-length compressed Burrows-Wheeler Transform

Relatore
Travis Gagie - Universidad Diego Portales, Santiago de Chile
Data e ora
lunedì 5 giugno 2017 alle ore 14.30 - Aula C - Rinfresco 14.15, inizio seminario 14.30.
Referente
Ferdinando Cicalese
Referente esterno
Data pubblicazione
29 maggio 2017
Dipartimento
Informatica  

Riassunto

Indexing highly repetitive texts --- such as genomic databases, software repositories and versioned text collections --- has become a hot topic since the turn of the millennium.  A simple solution is to use an FM-index based on the run-length compressed Burrows-Wheeler Transform (RLBWT) of the text, which achieves excellent compression and allows us, given a pattern, to count quickly the number of times it appear in the text.  In order to be able to locate quickly the positions where it occurs, however, conventional wisdom has been that we must augment the RLBWT with a sample of the suffix array, and that does not compress so well: the product of the size and the query per occurrence is roughly linear, meaning it is either much larger than the RLBWT or very slow.  In this talk we demonstrate another way to augment the RLBWT, that is simple, does not increase the total size much, and supports locating in predecessor time.  In our preliminary experiments, it is a thousand times faster than a suffix array sample with the same memory footprint.
 
This is joint work with Gonzalo Navarro and Nicola Prezza.





© 2002 - 2021  Universit√† degli studi di Verona
Via dell'Artigliere 8, 37129 Verona  |  P. I.V.A. 01541040232  |  C. FISCALE 93009870234