Digital Health 2020


Recent Submissions

  • Poster

    CAN-VP: CANcer Variant Prioritization

    (2020-1-20) Althubaiti, Sara; Gkoutos, Georgios; Hoehndorf, Robert; Bio-Ontology Research Group (BORG); Computational Bioscience Research Center (CBRC); Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division;

    Introduction

    Identifying and prioritizing the driver mutations that play a key role in cancer development remains a major challenge. Several computational approaches involving machine learning and statistical methods exist for finding these driver mutations; they rely on pre-computed pathogenicity scores derived from different tools. We have developed the CANcer Variant Prioritization (CAN-VP) system to identify and prioritize driver mutations. Our tool exploits background knowledge from different ontologies covering cellular phenotypes, functions, and whole-body physiological phenotypes, and combines it with region-based information as features. We demonstrate the performance of CAN-VP in prioritizing causative driver mutations on a number of synthetic whole exomes from The Cancer Genome Atlas (TCGA), targeting 4 different primary sites. We find that CAN-VP identifies most of the causative driver mutations when compared with existing tools, which shows its potential as a tool for discovering driver mutations.

    Methods and Materials

    Data sources

    We relied on two main types of datasets. The first comes from well-known cancer-related databases such as COSMIC [1], CanProVar [2], and IntOGen [3]. The second consists of the real samples in The Cancer Genome Atlas (TCGA) [4], which involves more than 60 different projects covering 67 primary sites; so far we have focused on 4 projects (Sarcoma, Kidney, Lung, and Bladder). Moreover, we used the 579 validated driver mutations reported by Bailey et al. [5].

    Results and Discussion

    1. Prediction model

    1.1 Model details

    We implemented CAN-VP as a fully connected neural network in Python 3.6, as shown in Figure 4, using Keras with a TensorFlow backend. We ignored missing values for all the features used and added flags for missing values as additional features. We retrieved gene embeddings from and used them as features in the prediction model.
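    The missing-value handling described above can be illustrated with a minimal, dependency-free sketch. It assumes missing entries arrive as `None`, are replaced by a hypothetical fill value of 0.0, and are paired with one 0/1 flag per feature; the poster does not state the fill value or the column layout, and `add_missing_flags` is an illustrative name, not part of CAN-VP.

```python
from typing import List, Optional


def add_missing_flags(rows: List[List[Optional[float]]],
                      fill_value: float = 0.0) -> List[List[float]]:
    """Replace missing values and append one 0/1 missingness flag per feature.

    `fill_value` is an assumption for illustration; the original work only
    states that missing values were ignored and flagged.
    """
    out = []
    for row in rows:
        values = [fill_value if v is None else float(v) for v in row]
        flags = [1.0 if v is None else 0.0 for v in row]
        out.append(values + flags)  # original features followed by their flags
    return out


rows = [[0.8, None, 3.0],
        [None, 1.5, None]]
print(add_missing_flags(rows))
# [[0.8, 0.0, 3.0, 0.0, 1.0, 0.0], [0.0, 1.5, 0.0, 1.0, 0.0, 1.0]]
```

    The widened matrix could then be fed to a fully connected network (for example a Keras `Sequential` model of `Dense` layers); the layer sizes used in CAN-VP are not given in the poster.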

    1.2 Training and testing data

    We downloaded the COSMIC mutations VCF file on 26 July 2019; it includes 4,788,121 cancer mutations. We also downloaded the DoCM dataset as a VCF file on 18 November 2019; it includes 1,364 curated driver mutations. Moreover, we downloaded CanProVar as a FASTQ file on 18 November 2019; it includes 156,671 driver mutations.

    Based on these data, we identified which DoCM and CanProVar mutations exist within COSMIC and labeled them as positives; the remaining COSMIC mutations were labeled as negatives. As Table 1 shows, the negatives (somatic mutations not known to be drivers) far outnumber the positives (validated driver mutations).
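    The labeling rule above amounts to a set-membership test. In this minimal sketch, mutations are compared by a hypothetical (chromosome, position, reference, alternate) key and all records are made up; the poster does not say how the DoCM (VCF) and CanProVar (FASTQ) entries were matched to COSMIC records.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def label_cosmic(cosmic: Iterable[Variant],
                 docm: Iterable[Variant],
                 canprovar: Iterable[Variant]) -> List[Tuple[Variant, int]]:
    """Label each COSMIC mutation 1 if it also appears in DoCM or CanProVar, else 0."""
    known_drivers: Set[Variant] = set(docm) | set(canprovar)
    return [(v, 1 if v in known_drivers else 0) for v in cosmic]


# Made-up records for illustration only.
cosmic = [("1", 1000, "G", "A"), ("2", 2000, "T", "C"),
          ("3", 3000, "C", "G"), ("4", 4000, "A", "T")]
docm = [("1", 1000, "G", "A")]
canprovar = [("2", 2000, "T", "C"), ("X", 5, "A", "G")]  # the X variant is not in COSMIC

labels = label_cosmic(cosmic, docm, canprovar)
positives = sum(label for _, label in labels)
print(positives, len(labels) - positives)  # 2 2
```

    Only mutations present in COSMIC receive a label, so a DoCM or CanProVar entry absent from COSMIC (the `X` record here) simply does not appear in the training set.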

    1.3 Prediction performance

    We trained the model shown in Figure 2 on the dataset in Table 1 and tested it on the synthetic datasets. The updated results of CAN-VP compared with the other tools are shown in Table 2.

    To evaluate the importance of different features in our prediction model, we first tested different combinations of the CanDrA features (86 from CHASMplus and 3 from Mutation Assessor) plus 3 features from UCSC.

    We then added the gene embeddings, which improved the results by 3%. Table 3 summarizes the performance of each experiment.
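    One way to organize feature experiments like these is to enumerate combinations of feature groups and train one model per combination. The sketch below only enumerates the groups and their widths (86, 3, and 3 features, as given in the text); the dictionary keys, the gene-embedding width of 100, and the assumption that every non-empty combination is evaluated are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Group sizes from the text; the keys are illustrative names.
FEATURE_GROUPS: Dict[str, int] = {
    "CHASMplus": 86,
    "MutationAssessor": 3,
    "UCSC": 3,
}


def group_combinations(groups: Dict[str, int]) -> List[Tuple[Tuple[str, ...], int]]:
    """Enumerate every non-empty combination of feature groups with its total width."""
    names = sorted(groups)
    result = []
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            result.append((combo, sum(groups[n] for n in combo)))
    return result


for combo, width in group_combinations(FEATURE_GROUPS):
    print(" + ".join(combo), width)  # one model would be trained per line

# Adding gene embeddings as a fourth group (width 100 is a hypothetical placeholder).
with_embeddings = dict(FEATURE_GROUPS, GeneEmbedding=100)
print(len(group_combinations(with_embeddings)))  # 15
```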

    Future Work

    • Test CAN-VP on more comprehensive cancer-related datasets.

    • Integrate graph-based features into the CAN-VP model.

    References

    [1] Sally Bamford et al. “The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website.” In: British Journal of Cancer 91.2 (2004), p. 355.

    [2] Jing Li, Dexter T. Duncan, and Bing Zhang. “CanProVar: a human cancer proteome variation database.” In: Human Mutation 31.3 (2010), pp. 219–228.

    [3] Gunes Gundem et al. “IntOGen: integration and data mining of multidimensional oncogenomic data.” In: Nature Methods 7.2 (2010), p. 92.

    [4] Katarzyna Tomczak, Patrycja Czerwińska, and Maciej Wiznerowicz. “The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge.” In: Contemporary Oncology 19.1A (2015), A68.

    [5] Matthew H. Bailey et al. “Comprehensive characterization of cancer driver genes and mutations.” In: Cell 173.2 (2018), pp. 371–385.

  • Poster

    Semi-automated binary classification workflow

    (2020-1-20) Savenko, Oksana; Chahid, Abderrazak; Laleg-Kirati, Taous-Meriem;

    SEMI-AUTOMATED BINARY CLASSIFICATION WORKFLOW

    MOTIVATION

    Recently, there has been a call to build human-independent intelligent systems that can assist clinicians during medical diagnosis. In our research, we focus on developing a standardized and tunable workflow for binary classification of various biological signals (EEG, fMRI, NIRS). The workflow is open for others to use in their own studies, research, and clinical practice.

    OBJECTIVES

    1. Evaluate the efficiency of different classification methods on datasets well known in the community

    2. Merge the resulting code into separate workflows according to dataset dimensionality (single-channel, multi-channel, volumetric)

    CHALLENGES

    1. Absence of a standard data format among researchers, which requires human intervention.

    2. Optimization of the code architecture and workflow parallelization (choosing among a range of preprocessing steps, feature generation, and classification methods).

    METHODS

    The central part of the poster shows the two best-performing workflows on the fMRI Star-Plus dataset, taken from our generalized workflow for multi-dimensional data. Below we describe the other options the project offers the user.

    PREPROCESSING

    Frequency filtering

    Independent Component Analysis

    Smoothing (exponential, rolling mean)

    Global Signal Regression
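    Several of the preprocessing options above are simple enough to sketch directly. The functions below are an illustrative, dependency-free version of rolling-mean and exponential smoothing and of global signal regression (GSR). The function names, the trailing window, the smoothing factor `alpha`, and the choice to regress each channel on the cross-channel mean with an intercept are assumptions, not the project's actual code.

```python
from typing import List


def rolling_mean(x: List[float], window: int) -> List[float]:
    """Trailing rolling mean; the first window-1 outputs average fewer samples."""
    out: List[float] = []
    acc = 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= window:
            acc -= x[i - window]  # drop the sample that left the window
        out.append(acc / min(i + 1, window))
    return out


def exponential_smoothing(x: List[float], alpha: float) -> List[float]:
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    out = [float(x[0])]
    for v in x[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def global_signal_regression(channels: List[List[float]]) -> List[List[float]]:
    """Regress the global (cross-channel mean) signal out of every channel.

    Each channel c is fitted as c = a + b * g by least squares, where g is the
    mean over channels at each time point; the residuals c - a - b * g are returned.
    """
    n_ch, n_t = len(channels), len(channels[0])
    g = [sum(ch[t] for ch in channels) / n_ch for t in range(n_t)]
    g_mean = sum(g) / n_t
    gc = [v - g_mean for v in g]              # centered global signal
    var_g = sum(v * v for v in gc)
    cleaned = []
    for ch in channels:
        c_mean = sum(ch) / n_t
        cc = [v - c_mean for v in ch]         # centered channel (removes intercept a)
        b = sum(a * v for a, v in zip(cc, gc)) / var_g if var_g > 0 else 0.0
        cleaned.append([a - b * v for a, v in zip(cc, gc)])
    return cleaned


print(rolling_mean([1, 2, 3, 4], window=2))           # [1.0, 1.5, 2.5, 3.5]
print(exponential_smoothing([0, 10, 10], alpha=0.5))  # [0.0, 5.0, 7.5]
```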

    FEATURE GENERATION

    Fast Fourier Transform

    Semi-Classical Signal Analysis
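    Fast Fourier Transform features are often summarized as spectral power within frequency bands. The sketch below computes a magnitude spectrum with a direct DFT (numerically the same values an FFT such as `numpy.fft.rfft` returns, only slower) and sums the power inside a band; the band edges and the use of band power as the feature are assumptions for illustration. SCSA is not sketched here.

```python
import cmath
import math
from typing import List


def dft_magnitudes(x: List[float]) -> List[float]:
    """Magnitude spectrum |X[k]| for k = 0..n//2 via a direct O(n^2) DFT."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s))
    return mags


def band_power(x: List[float], fs: float, lo: float, hi: float) -> float:
    """Sum of squared magnitudes of bins whose frequency k*fs/n lies in [lo, hi)."""
    n = len(x)
    return sum(m * m for k, m in enumerate(dft_magnitudes(x)) if lo <= k * fs / n < hi)


fs = 8.0
x = [math.cos(2 * math.pi * 2.0 * t / fs) for t in range(8)]  # 2 Hz tone, 1 s at 8 Hz
features = [band_power(x, fs, 0.0, 1.5),
            band_power(x, fs, 1.5, 2.5),
            band_power(x, fs, 2.5, 4.5)]
print([round(f, 6) for f in features])  # [0.0, 16.0, 0.0]
```

    Each signal (or channel) thus becomes a short vector of band powers that any of the classifiers listed below can consume.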

    UTILIZED CLASSIFIERS

    Support Vector Classifier

    Logistic Regression

    Decision Tree Classifier

    K-Nearest Neighbours

    Neural Network

    Convolutional Neural Network
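    The tunable workflow idea (choose preprocessing steps, then features, then a classifier) can be sketched as plain function composition. Below, `make_pipeline`, `demean`, `energy_and_range`, and the toy signals and labels are hypothetical; only k-nearest neighbours comes from the list above, implemented here from scratch rather than through the project's code.

```python
from collections import Counter
from typing import Callable, List, Sequence

Signal = List[float]


def make_pipeline(preprocess: Sequence[Callable[[Signal], Signal]],
                  featurize: Callable[[Signal], List[float]]) -> Callable[[Signal], List[float]]:
    """Chain preprocessing steps, then a feature extractor, into a single function."""
    def run(x: Signal) -> List[float]:
        for step in preprocess:
            x = step(x)
        return featurize(x)
    return run


def knn_predict(train_X: List[List[float]], train_y: List[int],
                x: List[float], k: int = 3) -> int:
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical steps and toy data, for illustration only.
def demean(x: Signal) -> Signal:
    m = sum(x) / len(x)
    return [v - m for v in x]


def energy_and_range(x: Signal) -> List[float]:
    return [sum(v * v for v in x) / len(x), max(x) - min(x)]


pipeline = make_pipeline([demean], energy_and_range)
signals = [[0, 1, 0, -1], [0, 0.1, 0, -0.1], [1, 3, 1, -1], [5, 5.2, 5, 4.8]]
labels = [1, 0, 1, 0]  # made-up binary classes
X = [pipeline(s) for s in signals]
print(knn_predict(X, labels, pipeline([2, 4, 2, 0]), k=1))  # 1
```

    Swapping a step (for example a smoothing function for `demean`) or the feature extractor changes the workflow without touching the classifier, which is the kind of tunability the poster describes.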

    CONCLUSIONS

    We tried to include state-of-the-art practices in brain data analysis as well as some novel methods such as SCSA. The project is still under development, but if you find it interesting and potentially useful, we suggest working through our tutorial on single-channel data analysis, which is part of the project and explains the methods behind the workflow.

  • Poster

    Understanding genetic disease: Structural analysis of proteins with patient-derived mutations

    (2020-1-20) J. Guzmán-Vega, Francisco; T. Arold, Stefan; King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Biological and Environmental Science and Engineering (BESE), Thuwal, 23955-6900, Saudi Arabia

    UNDERSTANDING GENETIC DISEASE: STRUCTURAL ANALYSIS OF PROTEINS WITH PATIENT-DERIVED MUTATIONS

    Francisco J. Guzmán-Vega, Stefan T. Arold

    Abstract

    Currently, over 8,000 genes have been identified with mutations that are closely associated with human inherited disease. An important application of protein modelling techniques is the analysis of mutated proteins with potential functional or structural alterations that might result in a disease phenotype in humans. We studied and modeled 26 mutations from 15 different disease-related proteins and classified them by their structural features, which in some cases allows us to predict a mechanism causing the aberrant phenotype.

    Introduction

    The most common cause of monogenic disease is a single-base DNA variant resulting in an amino acid substitution, which can affect protein function by different mechanisms:

    • Folding of the polypeptide chain and stability of the folded conformation

    • Ligand binding or interaction with binding partners

    • Posttranslational modifications

    • Catalytic activity

    The most common methods to identify and classify non-synonymous mutations with deleterious effects on protein function include:

    • Structural information about the residue's three-dimensional environment

    • Level of conservation and type of residues present at a particular sequence position

    • Calculation of residue solvent accessibility
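    Two of the measures above, conservation and solvent accessibility, reduce to short formulas. This sketch scores conservation as the Shannon entropy of an alignment column and computes relative solvent accessibility (RSA) as observed ASA divided by a per-residue maximum. The maximum values are the theoretical ones reported by Tien et al. (2013), listed for only a few residues, and the 0.25 buried cutoff is a common but arbitrary convention; neither is stated in the poster. Observed ASA would come from a structure tool such as DSSP or FreeSASA.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-').

    0.0 means fully conserved; higher values mean a more variable position.
    """
    residues = [r for r in column if r != "-"]
    n = len(residues)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(residues).values())


# Theoretical maximum ASA values in square angstroms (Tien et al., 2013), subset only.
MAX_ASA: Dict[str, float] = {"A": 129.0, "G": 104.0, "L": 201.0, "R": 274.0, "W": 285.0}


def relative_accessibility(residue: str, asa: float) -> float:
    """RSA = observed ASA / maximum ASA for the residue type."""
    return asa / MAX_ASA[residue]


def is_buried(residue: str, asa: float, cutoff: float = 0.25) -> bool:
    """Call a residue buried when its RSA falls below `cutoff` (assumed 0.25)."""
    return relative_accessibility(residue, asa) < cutoff


print(column_entropy("AAAA"), column_entropy("AACC"), column_entropy("ACGT"))  # 0.0 1.0 2.0
print(relative_accessibility("A", 64.5), is_buried("L", 20.1))                # 0.5 True
```

    A substitution at a position that is both conserved (low entropy) and buried (low RSA) is a typical candidate for a folding or stability defect, the first mechanism in the list above.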

  • Poster

    Respiration and Heart Movement Monitoring Using AM Continuous-Wave Radar

    (2020-1-20) Ibrahim, Safwan; Al-Naffouri, Tareq; Zayat, Abdullah; Muqaibel, Ali; Ballal, Tarig; King Abdulaziz University, King Fahd University of Petroleum and Minerals, King Abdullah University of Science and Technology

    Safwan H. Ibrahim, Abdullah A. Zayat, Ali H. Muqaibel, Tarig Ballal and Tareq Y. Al-Naffouri

    Abstract

    A wireless and contactless device senses chest movements and measures the respiration and heart rates. A signal is sent toward the body, which reflects it back. The reflected signal is recorded by the receiver antenna, and the information is extracted from its amplitude and phase.

    Motivation

    Wireless and contactless monitoring:

    • Multiple people at the same time

    • Low cost

    • Less noisy

    • Flexibility

    Method

    Set-up

    Fig. 1: The system diagram and the mathematical equations

    Sample Results

    Fig. 2: A comparison between the system results and the ground truth (oximeter)

    Table 1: Maximum and minimum accuracy of the system compared to the oximeter

                        Maximum accuracy    Minimum accuracy
    Heart rate          98.6%               95.3%
    Breathing rate      100%                97.2%

    Fig. 3: User interface of the system (LabVIEW)

    Conclusion

    The proposed system was able to measure respiration and heart rates with high accuracy compared to contact devices. The system appears to have a harmonics problem caused by chest movement and multipath reflections. Future work will focus on canceling the harmonics and monitoring multiple targets.
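    The abstract says the rates are extracted from the amplitude and phase of the reflected signal but does not give the algorithm. A common approach, assumed here, is to take the strongest spectral peak of the demodulated chest-displacement signal within a breathing band and within a heart band. The band limits (0.1 to 0.5 Hz and 0.8 to 2.0 Hz), the sampling rate, the synthetic signal, and the accuracy definition (100% minus the relative error against the reference device) are all illustrative assumptions.

```python
import cmath
import math
from typing import List


def dominant_rate_per_min(signal: List[float], fs: float, lo_hz: float, hi_hz: float) -> float:
    """Frequency of the largest DFT magnitude inside [lo_hz, hi_hz], in cycles per minute."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]  # remove the DC offset so it cannot dominate
    best_k, best_mag = None, -1.0
    for k in range(1, n // 2 + 1):
        if lo_hz <= k * fs / n <= hi_hz:
            mag = abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            if mag > best_mag:
                best_k, best_mag = k, mag
    if best_k is None:
        raise ValueError("no DFT bin falls inside the requested band")
    return best_k * fs / n * 60.0


def accuracy(estimate: float, reference: float) -> float:
    """Accuracy as 100% minus the relative error with respect to the reference."""
    return 100.0 * (1.0 - abs(estimate - reference) / reference)


fs = 10.0   # hypothetical sampling rate (Hz)
n = 400     # 40 s of data
chest = [math.sin(2 * math.pi * 0.25 * t / fs)          # breathing at 0.25 Hz
         + 0.1 * math.sin(2 * math.pi * 1.2 * t / fs)   # heartbeat at 1.2 Hz
         for t in range(n)]
breaths = dominant_rate_per_min(chest, fs, 0.1, 0.5)
beats = dominant_rate_per_min(chest, fs, 0.8, 2.0)
print(round(breaths, 3), round(beats, 3))  # 15.0 72.0
print(round(accuracy(beats, 75.0), 1))     # 96.0
```

    Real chest signals contain breathing harmonics that can fall inside the heart band, which is consistent with the harmonics problem noted in the conclusion; this sketch does nothing to suppress them.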

  • Poster

    Improving variant calling workflow by analyzing allele frequencies in a Saudi population to detect rare variants associated with disease

    (2020-1-20) Al-Saedi, Sakhaa; Hoehndorf, Robert; Bio-Ontology Research Group (BORG); Computational Bioscience Research Center (CBRC); Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division;