A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data

Handle URI:
http://hdl.handle.net/10754/325242
Title:
A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data
Authors:
Sepúlveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab (0000-0002-1755-2819); Clark, Taane G
Abstract:
Background: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model.

Results: Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates.

Conclusions: In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data.

© 2013 Sepúlveda et al.; licensee BioMed Central Ltd.
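The idea in the abstract (coverage counts that are overdispersed relative to a Poisson model, and regions flagged when their coverage is too low or too high) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the models are fitted, so the sketch assumes a simple method-of-moments fit of the Poisson-Gamma model, whose marginal distribution is the negative binomial. The function names, the toy `window_counts` values, and the tail level `alpha` are all hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, variance


def fit_poisson_gamma(counts):
    """Method-of-moments fit (an assumption, not the paper's estimator).

    Model: count | lam ~ Poisson(lam), lam ~ Gamma(shape, rate).
    Marginal mean is shape/rate, marginal variance is shape/rate + shape/rate**2.
    """
    m, v = mean(counts), variance(counts)
    if v <= m:
        raise ValueError("no overdispersion relative to the Poisson model")
    return m * m / (v - m), m / (v - m)  # (shape, rate)


def pg_pmf(k, shape, rate):
    """Marginal (negative binomial) probability of observing count k."""
    p = rate / (1.0 + rate)
    log_pmf = (math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
               + shape * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def tail_thresholds(shape, rate, alpha=0.01):
    """Return (lo, hi): P(X <= lo) <= alpha/2 and P(X >= hi) <= alpha/2.

    lo is -1 when no count is small enough to fall in the lower tail.
    """
    lo, cdf, k = -1, 0.0, 0
    while True:
        if 1.0 - cdf <= alpha / 2:      # 1 - cdf is P(X >= k) at this point
            return lo, k
        cdf += pg_pmf(k, shape, rate)   # cdf is now P(X <= k)
        if cdf <= alpha / 2:
            lo = k
        k += 1


def classify(count, lo, hi):
    """Label one window's coverage count against the fitted tails."""
    if count <= lo:
        return "deletion candidate"
    if count >= hi:
        return "amplification candidate"
    return "normal"


# Toy per-window read counts (hypothetical values, not data from the paper).
window_counts = [28, 35, 31, 40, 22, 30, 45, 27, 33, 38,
                 25, 31, 36, 29, 41, 24, 34, 30, 39, 26]
shape, rate = fit_poisson_gamma(window_counts)
lo, hi = tail_thresholds(shape, rate)
print(f"shape={shape:.1f} rate={rate:.2f} lo={lo} hi={hi}")
print(classify(0, lo, hi), "|", classify(100, lo, hi))
```

Because the sample variance of the toy counts exceeds their mean, a plain Poisson model would understate the spread and flag too many windows; the Gamma mixing layer widens the tails to absorb that extra variability. The Poisson-Lognormal model named in the abstract has no closed-form marginal and is not covered by this sketch.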
KAUST Department:
Biological and Environmental Sciences and Engineering (BESE) Division; Bioscience Program; Computational Bioscience Research Center (CBRC)
Citation:
Sepúlveda N, Campino SG, Assefa SA, Sutherland CJ, Pain A, et al. (2013) A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data. BMC Genomics 14: 128. doi:10.1186/1471-2164-14-128.
Publisher:
BioMed Central
Journal:
BMC Genomics
Issue Date:
26-Feb-2013
DOI:
10.1186/1471-2164-14-128
PubMed ID:
23442253
PubMed Central ID:
PMC3679970
Type:
Article
ISSN:
1471-2164
Appears in Collections:
Articles; Bioscience Program; Computational Bioscience Research Center (CBRC); Biological and Environmental Sciences and Engineering (BESE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Sepúlveda, Nuno | en
dc.contributor.author | Campino, Susana G | en
dc.contributor.author | Assefa, Samuel A | en
dc.contributor.author | Sutherland, Colin J | en
dc.contributor.author | Pain, Arnab | en
dc.contributor.author | Clark, Taane G | en
dc.date.accessioned | 2014-08-27T09:41:56Z | -
dc.date.available | 2014-08-27T09:41:56Z | -
dc.date.issued | 2013-02-26 | en
dc.identifier.citation | Sepúlveda N, Campino SG, Assefa SA, Sutherland CJ, Pain A, et al. (2013) A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data. BMC Genomics 14: 128. doi:10.1186/1471-2164-14-128. | en
dc.identifier.issn | 1471-2164 | en
dc.identifier.pmid | 23442253 | en
dc.identifier.doi | 10.1186/1471-2164-14-128 | en
dc.identifier.uri | http://hdl.handle.net/10754/325242 | en
dc.description.abstract | Background: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Results: Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. Conclusions: In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data. © 2013 Sepúlveda et al.; licensee BioMed Central Ltd. | en
dc.language.iso | en | en
dc.publisher | BioMed Central | en
dc.rights | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. | en
dc.rights.uri | http://creativecommons.org/licenses/by/2.0 | en
dc.subject | multidrug resistance protein 1 | en
dc.subject | accuracy | en
dc.subject | CLAG3.2 gene | en
dc.subject | comparative genomic hybridization | en
dc.subject | computer program | en
dc.subject | controlled study | en
dc.subject | copy number variation | en
dc.subject | false positive result | en
dc.subject | gene | en
dc.subject | gene amplification | en
dc.subject | gene deletion | en
dc.subject | gene locus | en
dc.subject | intermethod comparison | en
dc.subject | MDR1 gene | en
dc.subject | methodology | en
dc.subject | nucleotide sequence | en
dc.subject | Plasmodium falciparum | en
dc.subject | Poisson distribution | en
dc.subject | quality control | en
dc.subject | sequence analysis | en
dc.subject | simulation | en
dc.subject | strain difference | en
dc.subject | DNA Copy Number Variations | en
dc.subject | False Positive Reactions | en
dc.subject | Genomics | en
dc.subject | Models, Statistical | en
dc.subject | Plasmodium falciparum | en
dc.subject | Poisson Distribution | en
dc.subject | Sequence Analysis | en
dc.subject | Software | en
dc.subject | Plasmodium falciparum | en
dc.title | A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data | en
dc.type | Article | en
dc.contributor.department | Biological and Environmental Sciences and Engineering (BESE) Division | en
dc.contributor.department | Bioscience Program | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.identifier.journal | BMC Genomics | en
dc.identifier.pmcid | PMC3679970 | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | London School of Hygiene and Tropical Medicine, London, United Kingdom | en
dc.contributor.institution | Center of Statistics and Applications, University of Lisbon, Lisbon, Portugal | en
dc.contributor.institution | Wellcome Trust Sanger Institute, Hinxton, United Kingdom | en
dc.contributor.institution | Department of Clinical Parasitology, Hospital for Tropical Diseases, London, United Kingdom | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Pain, Arnab | en

This item is licensed under a Creative Commons License