ERA: Efficient serial and parallel suffix tree construction for very long strings

Handle URI:
http://hdl.handle.net/10754/561870
Title:
ERA: Efficient serial and parallel suffix tree construction for very long strings
Authors:
Mansour, Essam; Allam, Amin (0000-0001-5137-0990); Skiadopoulos, Spiros G.; Kalnis, Panos (0000-0002-5060-1360)
Abstract:
The suffix tree is a data structure for indexing strings. It is used in a variety of applications such as bioinformatics, time series analysis, clustering, text editing and data compression. However, when the string and the resulting suffix tree are too large to fit into the main memory, most existing construction algorithms become very inefficient. This paper presents a disk-based suffix tree construction method, called Elastic Range (ERa), which works efficiently with very long strings that are much larger than the available memory. ERa partitions the tree construction process horizontally and vertically and minimizes I/Os by dynamically adjusting the horizontal partitions independently for each vertical partition, based on the evolving shape of the tree and the available memory. Where appropriate, ERa also groups vertical partitions together to amortize the I/O cost. We developed a serial version; a parallel version for shared-memory and shared-disk multi-core systems; and a parallel version for shared-nothing architectures. ERa indexes the entire human genome in 19 minutes on an ordinary desktop computer. For comparison, the fastest existing method needs 15 minutes using 1024 CPUs on an IBM BlueGene supercomputer.
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computer Science Program
Publisher:
VLDB Endowment
Journal:
Proceedings of the VLDB Endowment
Issue Date:
1-Sep-2011
DOI:
10.14778/2047485.2047490
Type:
Article
ISSN:
21508097
Appears in Collections:
Articles; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
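
Illustrative sketch (not part of the repository record): the abstract says ERa splits construction into vertical partitions sized to the available memory and groups them where appropriate to amortize I/O. The Python below is one way to picture that idea, not the authors' implementation. It assumes that a vertical partition is the set of suffixes sharing a common prefix, that the memory budget is expressed as a maximum number of suffixes per partition (`max_freq`), and that the string ends with a unique terminator `$`. The function names `vertical_partition` and `group_partitions` are hypothetical, and the horizontal (elastic range) step is not shown.

```python
from collections import Counter


def vertical_partition(text, max_freq):
    """Return {prefix: occurrence count}, with every count <= max_freq.

    Assumption: `text` ends with a unique terminator, so every suffix
    falls under exactly one of the returned prefixes.
    """
    if max_freq < 1 or text.count(text[-1]) != 1:
        raise ValueError("need max_freq >= 1 and a unique final terminator")
    alphabet = sorted(set(text))
    final = {}
    pending = list(alphabet)  # candidate prefixes, all of the same length
    while pending:
        length = len(pending[0])
        # One pass over the string counts every substring of this length.
        counts = Counter(
            text[i:i + length] for i in range(len(text) - length + 1)
        )
        next_pending = []
        for prefix in pending:
            freq = counts.get(prefix, 0)
            if freq == 0:
                continue  # prefix does not occur in the string
            if freq <= max_freq:
                final[prefix] = freq  # small enough to process in memory
            else:
                # Too frequent: refine by extending with each symbol.
                next_pending.extend(prefix + c for c in alphabet)
        pending = next_pending
    return final


def group_partitions(freqs, max_freq):
    """Pack prefixes into groups whose total count stays <= max_freq.

    First-fit over prefixes sorted by decreasing count; each group could
    then be processed together so that string scans are shared.
    """
    groups = []
    for prefix, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
        for group in groups:
            if group["total"] + freq <= max_freq:
                group["prefixes"].append(prefix)
                group["total"] += freq
                break
        else:
            groups.append({"prefixes": [prefix], "total": freq})
    return groups


if __name__ == "__main__":
    freqs = vertical_partition("banana$", max_freq=2)
    print(freqs)
    print(group_partitions(freqs, max_freq=2))
```

In this sketch the prefix counts always sum to the length of the string, because the terminator guarantees that each suffix is covered by exactly one prefix.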

Full metadata record

DC Field | Value | Language
dc.contributor.author | Mansour, Essam | en
dc.contributor.author | Allam, Amin | en
dc.contributor.author | Skiadopoulos, Spiros G. | en
dc.contributor.author | Kalnis, Panos | en
dc.date.accessioned | 2015-08-03T09:32:55Z | en
dc.date.available | 2015-08-03T09:32:55Z | en
dc.date.issued | 2011-09-01 | en
dc.identifier.issn | 21508097 | en
dc.identifier.doi | 10.14778/2047485.2047490 | en
dc.identifier.uri | http://hdl.handle.net/10754/561870 | en
dc.description.abstract | The suffix tree is a data structure for indexing strings. It is used in a variety of applications such as bioinformatics, time series analysis, clustering, text editing and data compression. However, when the string and the resulting suffix tree are too large to fit into the main memory, most existing construction algorithms become very inefficient. This paper presents a disk-based suffix tree construction method, called Elastic Range (ERa), which works efficiently with very long strings that are much larger than the available memory. ERa partitions the tree construction process horizontally and vertically and minimizes I/Os by dynamically adjusting the horizontal partitions independently for each vertical partition, based on the evolving shape of the tree and the available memory. Where appropriate, ERa also groups vertical partitions together to amortize the I/O cost. We developed a serial version; a parallel version for shared-memory and shared-disk multi-core systems; and a parallel version for shared-nothing architectures. ERa indexes the entire human genome in 19 minutes on an ordinary desktop computer. For comparison, the fastest existing method needs 15 minutes using 1024 CPUs on an IBM BlueGene supercomputer. | en
dc.publisher | VLDB Endowment | en
dc.title | ERA: Efficient serial and parallel suffix tree construction for very long strings | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computer Science Program | en
dc.identifier.journal | Proceedings of the VLDB Endowment | en
dc.contributor.institution | Dept. of Computer Science and Technology, University of Peloponnese, Greece | en
kaust.author | Mansour, Essam | en
kaust.author | Kalnis, Panos | en
kaust.author | Allam, Amin | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.