Highly scalable ab initio genomic motif identification

Abstract
We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. The scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), and the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations were included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours, while the original serial code would have needed decades of execution time. Copyright 2011 ACM.
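
For readers unfamiliar with the efficiency figure quoted in the abstract, the following is a minimal sketch of how a relative parallel efficiency is commonly computed. It assumes a strong-scaling measurement (fixed problem size), which the abstract does not state explicitly. The function names and the timing values in the usage lines are hypothetical illustrations, not data from the paper.

```python
def relative_efficiency(p_ref, t_ref, p, t):
    """Parallel efficiency on p cores relative to a p_ref-core baseline.

    Assumes strong scaling: the same problem is solved in t_ref seconds
    on p_ref cores and in t seconds on p cores.
    """
    return (p_ref * t_ref) / (p * t)


def relative_speedup(p_ref, p, efficiency):
    """Speedup over the p_ref-core run implied by a given relative efficiency."""
    return efficiency * p / p_ref


# Going from 256 to 65,536 cores multiplies the core count by 256.
# At 94% relative efficiency, the implied speedup over the 256-core run:
speedup = relative_speedup(256, 65536, 0.94)

# Hypothetical timings chosen to be consistent with that figure:
# if the 65,536-core run took 1.0 time unit, the 256-core run took `speedup` units.
efficiency = relative_efficiency(256, speedup, 65536, 1.0)
```

Under these assumptions, 94% efficiency corresponds to the 65,536-core run being roughly 240 times faster than the 256-core run, against an ideal factor of 256.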

Citation
Marchand, B., Bajic, V. B., & Kaushik, D. K. (2011). Highly scalable ab initio genomic motif identification. Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC ’11. doi:10.1145/2063384.2063459

Publisher
Association for Computing Machinery (ACM)

Journal
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11

Conference/Event Name
2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11

DOI
10.1145/2063384.2063459

Permanent link to this record