Faster Run-Length Compressed Suffix Arrays
- Authors: Brown, N.K.; Gagie, T.; Manzini, G.; Navarro, G.; Sciortino, M.
- Publication year: 2025
- Type: Contributo in atti di convegno pubblicato in volume
- OA Link: http://hdl.handle.net/10447/688823
Abstract
We first review how we can store a run-length compressed suffix array (RLCSA) for a text T of length n over an alphabet of size σ whose Burrows-Wheeler Transform (BWT) consists of r runs in O (r log(n/r) + r log σ + σ) bits such that later, given character a and the suffix-array (SA) interval for P, we can find the SA interval for aP in O(log ra + log log n) time, where ra is the number of runs of copies of a in the BWT. We then show how to modify the RLCSA such that we find the SA interval for aP in only O(log ra) time, without increasing its asymptotic space bound. Our key idea is applying a result by Nishimoto and Tabei (ICALP 2021) and then replacing rank queries on sparse bitvectors by a constant number of select queries. We also review two-level indexing and discuss how our faster RLCSA may be useful in improving it. Finally, we briefly discuss how two-level indexing may speed up a recent heuristic for finding maximal exact matches of a pattern with respect to an indexed text.