Highly scalable Ab initio genomic motif identification

Handle URI:
http://hdl.handle.net/10754/564332
Title:
Highly scalable Ab initio genomic motif identification
Authors:
Marchand, Benoit; Bajic, Vladimir B. (0000-0001-5435-4750); Kaushik, Dinesh K.
Abstract:
We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations are included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours while the original serial code would have needed decades of execution time. Copyright 2011 ACM.
KAUST Department:
Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Applied Mathematics and Computational Science Program
Publisher:
Association for Computing Machinery (ACM)
Journal:
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11
Conference/Event name:
2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11
Issue Date:
2011
DOI:
10.1145/2063384.2063459
Type:
Conference Paper
ISBN:
9781450307710
Appears in Collections:
Conference Papers; Applied Mathematics and Computational Science Program; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Marchand, Benoit | en
dc.contributor.author | Bajic, Vladimir B. | en
dc.contributor.author | Kaushik, Dinesh K. | en
dc.date.accessioned | 2015-08-04T06:23:58Z | en
dc.date.available | 2015-08-04T06:23:58Z | en
dc.date.issued | 2011 | en
dc.identifier.isbn | 9781450307710 | en
dc.identifier.doi | 10.1145/2063384.2063459 | en
dc.identifier.uri | http://hdl.handle.net/10754/564332 | en
dc.description.abstract | We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations are included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours while the original serial code would have needed decades of execution time. Copyright 2011 ACM. | en
dc.publisher | Association for Computing Machinery (ACM) | en
dc.subject | Data-flow parallel processing | en
dc.subject | Master-slave MPI parallel processing | en
dc.subject | Mixed-mode MPI-openMP parallel processing | en
dc.subject | Multi-level MPI collective operations | en
dc.subject | Multi-level workload distribution | en
dc.title | Highly scalable Ab initio genomic motif identification | en
dc.type | Conference Paper | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Applied Mathematics and Computational Science Program | en
dc.identifier.journal | Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11 | en
dc.conference.date | 12 November 2011 through 18 November 2011 | en
dc.conference.name | 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11 | en
dc.conference.location | Seattle, WA | en
kaust.author | Bajic, Vladimir B. | en
kaust.author | Marchand, Benoit | en
kaust.author | Kaushik, Dinesh K. | en