
dc.contributor.authorAlthubaiti, Sara
dc.contributor.authorGkoutos, Georgios
dc.contributor.authorHoehndorf, Robert
dc.date.accessioned2020-01-27T08:09:22Z
dc.date.available2020-01-27T08:09:22Z
dc.date.issued2020-1-20
dc.identifier.urihttp://hdl.handle.net/10754/661209
dc.description.abstract
Introduction
Identifying and prioritizing the driver mutations that play a main role in cancer development is still a major challenge. Several computational approaches involving machine learning and statistical methods exist to help find these driver mutations, relying on pre-computed pathogenicity scores derived from different tools. We have developed the CANcer Variant Prioritization (CAN-VP) system to identify and prioritize driver mutations. Our tool exploits background knowledge from different ontologies that describe cellular phenotypes, functions, and whole-body physiological phenotypes, and combines it with region-based information as features. We demonstrate the performance of CAN-VP in prioritizing causative driver mutations on a number of synthetic whole exomes from The Cancer Genome Atlas (TCGA), targeting 4 different primary sites. We find that CAN-VP identifies most of the causative driver mutations when compared to existing tools, which shows its capability as a tool for discovering driver mutations.
Methods and Materials
Data sources
We relied on two main types of datasets. The first comes from well-known cancer-related databases such as COSMIC [1], CanProVar [2], and IntOGen [3]. The second consists of the real samples in The Cancer Genome Atlas (TCGA) [4], which comprises more than 60 projects covering 67 primary sites; so far we focus on 4 projects (Sarcoma, Kidney, Lung, and Bladder). Moreover, we used the 579 validated driver mutations from Bailey et al. [5].
Results and Discussion
1. Prediction model
1.1 Model details
We implemented CAN-VP as a fully connected neural network model in Python 3.6, as shown in Figure 4. We used Keras with a TensorFlow backend. We ignored the missing values for all the features being used and added flags for missing values as additional features. We retrieved gene embeddings from and used them as features in the prediction model.
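The missing-value handling described in 1.1 could be sketched as follows. This is a minimal illustration in NumPy, not the CAN-VP code: the function name add_missing_flags, the zero fill value, and the example scores are all assumptions made for illustration.

```python
import numpy as np

def add_missing_flags(features):
    """Zero-fill missing scores and append one 0/1 flag column per feature.

    Hypothetical helper: the poster only states that missing values were
    ignored and that missing-value flags were added as features.
    """
    features = np.asarray(features, dtype=float)
    missing = np.isnan(features)                # True where a score is absent
    filled = np.where(missing, 0.0, features)   # assumption: fill value is 0.0
    # Resulting layout: [original feature columns | missing-flag columns]
    return np.hstack([filled, missing.astype(float)])

# Made-up pathogenicity scores for three variants and two tools.
scores = np.array([[0.9, np.nan],
                   [np.nan, 0.2],
                   [0.5, 0.7]])
augmented = add_missing_flags(scores)
```

With this layout, a network can tell a genuinely low score apart from a score that was never computed, because the flag column carries that information explicitly.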
1.2 Training and testing data
We downloaded the COSMIC mutations VCF file on 26 July 2019; it includes 4,788,121 cancer mutations. We also downloaded the DoCM dataset as a VCF file on 18 November 2019; it includes 1364 curated driver mutations. Moreover, we downloaded CanProVar as a fastq file on 18 November 2019; it includes 156,671 driver mutations. Based on that, we determined how many mutations from DoCM + CanProVar exist within COSMIC and considered them positives; all other COSMIC mutations were treated as negatives. As Table 1 shows, the negatives (somatic mutations not known to be drivers) far outnumber the positives (validated driver mutations).
1.3 Prediction performance
We trained the model in Figure 2 on the dataset in Table 1 and tested it on the synthetic datasets. The updated results of CAN-VP compared to the other tools are shown in Table 2. To evaluate the importance of different features in our prediction model, we first tested different combinations of features from CanDrA, which includes 86 features from CHASMplus and 3 from Mutation Assessor, plus 3 from UCSC. Moreover, adding the gene embeddings improved the results by 3%. Table 3 summarizes the performance of each experiment.
Future Work
- Test CAN-VP on more comprehensive cancer-related datasets.
- Integrate graph-based features into the CAN-VP model.
References
[1] Sally Bamford et al. “The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website.” In: British journal of cancer 91.2 (2004), p. 355.
[2] Jing Li, Dexter T Duncan, and Bing Zhang. “CanProVar: a human cancer proteome variation database.” In: Human mutation 31.3 (2010), pp. 219–228.
[3] Gunes Gundem et al. “IntOGen: integration and data mining of multidimensional oncogenomic data.” In: Nature methods 7.2 (2010), p. 92.
[4] Katarzyna Tomczak, Patrycja Czerwińska, and Maciej Wiznerowicz. “The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge.” In: Contemporary oncology 19.1A (2015), A68.
[5] Matthew H Bailey et al. “Comprehensive characterization of cancer driver genes and mutations.” In: Cell 173.2 (2018), pp. 371–385.
dc.relation.urlhttps://epostersonline.com//dh2020/node/58
dc.titleCAN-VP: CANcer Variant Prioritization
dc.typePoster
dc.contributor.departmentBio-Ontology Research Group (BORG)
dc.contributor.departmentComputational Bioscience Research Center (CBRC)
dc.contributor.departmentComputer Science Program
dc.contributor.departmentComputer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.conference.dateJAN 20 - 22, 2020
dc.conference.nameDigital Health 2020
dc.conference.locationKAUST
dc.contributor.institution
kaust.personHoehndorf, Robert
refterms.dateFOA2020-01-27T08:09:22Z

