Digital Health 2020
Recent Submissions
-
CAN-VP: CANcer Variant Prioritization (2020-1-20) [Poster]
Introduction
Identifying and prioritizing the driver mutations that play a main role in the development of cancer is still a major challenge. Several computational approaches based on machine learning and statistical methods exist for finding these driver mutations, and they depend on pre-computed pathogenicity scores derived from different tools. We have developed the CANcer Variant Prioritization (CAN-VP) system to identify and prioritize driver mutations. Our tool exploits background knowledge from different ontologies that describe cellular phenotypes, functions, and whole-body physiological phenotypes, and combines it with region-based information as features. We demonstrate the performance of CAN-VP in prioritizing causative driver mutations on a number of synthetic whole exomes from The Cancer Genome Atlas (TCGA), targeting 4 different primary sites. We find that CAN-VP identifies most of the causative driver mutations when compared with existing tools, which shows its capability as a tool for discovering driver mutations.
Methods and Materials
Data sources
We relied on two main types of datasets. The first comes from well-known cancer-related databases such as COSMIC [1], CanProVar [2], and IntOGen [3]. The second consists of the real samples included in The Cancer Genome Atlas (TCGA) [4], which involves more than 60 different projects covering 67 primary sites; so far we have focused on 4 projects (Sarcoma, Kidney, Lung, and Bladder). Moreover, we used the 579 validated driver mutations reported by Bailey et al. [5].
Results and Discussion
1. Prediction model
1.1 Model details
We implemented CAN-VP as a fully connected neural network model in Python 3.6, as shown in Figure 4. We used Keras with a TensorFlow backend. We ignored the missing values for all the features being used, and we added flags for missing values as additional features. We also retrieved gene embeddings and used them as features in the prediction model.
1.2 Training and testing data
We downloaded the COSMIC mutations VCF file on 26th Jul, 2019. It includes 4,788,121 cancer mutations. We also downloaded the DoCM dataset as a VCF file on 18th Nov, 2019. It includes 1364 curated driver mutations. Moreover, we downloaded CanProVar as a fastq file on 18th Nov, 2019. It includes 156,671 driver mutations. We then determined how many of the DoCM and CanProVar mutations exist within COSMIC and considered them positives; the remaining COSMIC mutations were considered negatives. As Table 1 shows, the negatives (somatic mutations with unknown driver status) far outnumber the positives (validated driver mutations).
1.3 Prediction performance
We trained the model in Figure 2 using the dataset in Table 1 and tested it on the synthetic datasets. The updated results of CAN-VP compared with the other tools are shown in Table 2. To evaluate the importance of the different features in our prediction model, we first tested different combinations of features from CanDrA, which includes 86 features from CHASMplus and 3 from Mutation Assessor, plus 3 from UCSC. Adding the gene embeddings improved the results by 3%. Table 3 summarizes the performance for each experiment.
Future Work
- Test CAN-VP on more comprehensive cancer-related datasets.
- Integrate graph-based features into the CAN-VP model.
References
[1] Sally Bamford et al. “The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website”. In: British journal of cancer 91.2 (2004), p. 355.
[2] Jing Li, Dexter T Duncan, and Bing Zhang. “CanProVar: a human cancer proteome variation database”. In: Human mutation 31.3 (2010), pp. 219–228.
[3] Gunes Gundem et al. “IntOGen: integration and data mining of multidimensional oncogenomic data”. In: Nature methods 7.2 (2010), p. 92.
[4] Katarzyna Tomczak, Patrycja Czerwińska, and Maciej Wiznerowicz. “The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge”. In: Contemporary oncology 19.1A (2015), A68.
[5] Matthew H Bailey et al. “Comprehensive characterization of cancer driver genes and mutations”. In: Cell 173.2 (2018), pp. 371–385.
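The labeling step in section 1.2 of the CAN-VP poster can be pictured with a small sketch. This is only an illustration under stated assumptions: the tuple key `(chromosome, position, ref, alt)`, the function name `label_mutations`, and the toy records are hypothetical and are not taken from the poster, which does not say how mutations were matched across databases.

```python
def label_mutations(cosmic, curated_drivers):
    """Split COSMIC mutation keys into positives and negatives.

    A COSMIC mutation counts as positive when it also appears in the
    curated driver set (DoCM plus CanProVar); everything else is negative.
    """
    positives = [m for m in cosmic if m in curated_drivers]
    negatives = [m for m in cosmic if m not in curated_drivers]
    return positives, negatives


# Hypothetical toy records keyed as (chromosome, position, ref, alt).
cosmic = [
    ("7", 140453136, "A", "T"),
    ("12", 25398284, "C", "T"),
    ("1", 12345, "G", "A"),
]
docm = {("7", 140453136, "A", "T")}
canprovar = {("12", 25398284, "C", "T"), ("3", 999, "T", "C")}

positives, negatives = label_mutations(cosmic, docm | canprovar)
print(len(positives), len(negatives))
```

Curated mutations that never appear in COSMIC (the last CanProVar record here) are simply ignored, which matches the idea of labeling within COSMIC rather than merging the databases.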
-
Ontology-based Representations of Biological Entities (2020-1-20) [Poster]
Biomedical ontologies are widely used as a way to formally structure and represent knowledge in the biomedical field. Ontologies describe biological concepts and their relations through logical axioms and annotation properties (meta-data). The structure and information contained in biomedical ontologies and their annotations make them valuable for data analysis and knowledge extraction tasks. Despite being a rich source of biomedical information, ontologies are poorly exploited by ontology-based analysis methods such as semantic similarity measures, which use only limited information from the ontologies. We propose two methods, Onto2Vec and OPA2Vec, that generate vector representations of biological entities by encoding most of the information in ontologies and their annotations.
1. Onto2Vec: We propose a method that learns dense vector representations of biological entities based on logical axioms and ontology-based annotations of biological entities.
Fig 1. Onto2Vec workflow
Onto2Vec learns the vector representations in three steps:
• Inferring new axioms using a semantic reasoner.
• Representing entity-concept associations as axioms and merging them with the ontology axioms in the corpus.
• Training Word2Vec on the ontology corpus.
2. OPA2Vec: In addition to formal axioms, ontologies encode rich meta-data in natural language describing different aspects of the biological concepts (e.g. labels, descriptions). This meta-data is completely unexploited by data analysis methods that use ontologies. OPA2Vec generates vector representations of biological entities by:
• Combining formal ontology axioms with the ontology meta-data.
• Pre-training Word2Vec on PubMed to provide background knowledge about the words and concepts used in the ontology annotation properties.
Protein interactions using Onto2Vec:
- We apply Onto2Vec to the Gene Ontology (GO) and produce protein vector representations.
- The obtained vectors are then used (with cosine similarity and with a neural network) to predict protein interactions in human and yeast, and the results are compared to Resnik semantic similarity.
Fig 3. ROC curves for PPI prediction using Onto2Vec (panels: human, yeast)
Enzyme visualization using Onto2Vec:
- The vectors obtained through Onto2Vec can also be used for clustering and identifying entities within the same functional group.
- As an example, we illustrate the vector representations of 10,000 enzymes labelled by their first-level EC category.
Fig 4. t-SNE visualization of 10,000 enzymes using Onto2Vec
Protein interaction prediction using OPA2Vec:
- To evaluate OPA2Vec, we also apply it to the Gene Ontology and protein-GO annotations to produce vector representations of proteins. To make better use of the rich meta-data available in GO in the form of labels, descriptions, synonyms, etc., we pre-train Word2Vec on Medline and PMC, and use the trained models to produce the protein vectors. The obtained vectors are then used to predict protein interactions and compared to Onto2Vec and Resnik.
Fig 5. AUC values for PPI prediction using OPA2Vec
Gene-disease association prediction using OPA2Vec:
- As an additional evaluation, we applied OPA2Vec to PhenomeNet, jointly with the known gene-phenotype and disease-phenotype associations, to obtain vector representations of genes and diseases.
- The obtained vectors were then used to predict gene-disease associations on human and mouse datasets.
Fig 6. ROC curves for gene-disease association prediction using OPA2Vec (panels: human, mouse)
- We have developed two methods, Onto2Vec and OPA2Vec, that produce vector representations of biological entities based on ontologies and their annotations, so that most of the information encoded in ontology axioms and meta-data is used.
- Our workflow is quite generic and can be applied to a wide range of ontology-based analysis tasks.
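The poster mentions cosine similarity as one way to compare the learned vectors when predicting interactions. A minimal sketch of that comparison is given here, assuming the entity vectors are already available as plain lists of floats; the three-dimensional vectors and the protein names `P1` to `P3` are made up for illustration and do not come from Onto2Vec or OPA2Vec.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical embeddings; real ones would come from a trained Word2Vec model.
protein_vectors = {
    "P1": [1.0, 2.0, 0.0],
    "P2": [2.0, 4.0, 0.0],  # same direction as P1
    "P3": [0.0, 0.0, 3.0],  # orthogonal to P1
}

sim_12 = cosine_similarity(protein_vectors["P1"], protein_vectors["P2"])
sim_13 = cosine_similarity(protein_vectors["P1"], protein_vectors["P3"])
print(sim_12, sim_13)
```

Ranking candidate pairs by this score gives a simple unsupervised predictor; the neural network mentioned in the poster would instead learn a scoring function from the pairs of vectors.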
-
Smart Oqal: Posture Analysis and Correction Using an Inertia Measurement Unit (2020-1-20) [Poster]
Safwan H. Ibrahim, Mohanad Ahmed, Ali H. Muqaibel and Tareq Y. Al-Naffouri
Abstract
This project aims to design a head-wearable device that measures head movement to track user posture, using a wireless channel to transmit the data and display it through a PC/mobile application.
Motivation
- 2 hours spent at a PC daily
- 2-4 hours spent using mobile phones
- 80% of adults suffer from back pain at some point in their lives
- 2nd leading cause of early retirement in Germany
Components Selection
IMU, uC, Bluetooth, charging module, battery
Hardware Implementation
The figure shows the discrete hardware implementation and the printed circuit board (PCB) design of the posture correction device that measures head orientation. The PCB is designed to reduce the device size.
Prototype Specifications
Communication technology: Bluetooth 4.0 (2.402 – 2.48 GHz)
Battery: Li-ion, 3.7 V, 500 mAh
Charging port: Type-C
Max current drawn per day: 460 mA (with vibrator and Bluetooth ON)
Avg. current drawn per day: 300 mA
PCB dimensions: 34.3 × 24.6 × 5 mm
Mobile platform: Android
PC platform: Windows
Data Collection
Figure: comparison between data of normal daily behavior and data of using the mobile phone, measuring the front view of the head (axes: angle of head, number of occurrences).
Advantage over other solutions
There are several products on the market for posture correction. However, our product has two advantages:
1. It is part of the national headwear of Saudis and GCC citizens (the Oqal).
2. It directly measures the movement of the head, which is a strong indicator of posture problems (as opposed to other products, which place their devices on the back).
Conclusion
- Part of the national headwear (on/in the Oqal)
- Very light, with low power consumption
- Associated with a mobile/PC application (via Bluetooth)
- The app is intended to alert the user to incorrect and prolonged posture and to help form a healthy habit.
- The device collects data that physicians and physiotherapists can use to diagnose posture problems.
-
Understanding genetic disease: Structural analysis of proteins with patient-derived mutations (2020-1-20) [Poster]
Francisco J. Guzmán-Vega, Stefan T. Arold
Abstract
Currently, over 8,000 genes have been identified with mutations that are closely associated with human inherited disease. An important application of protein modelling techniques is the analysis of mutated proteins with potential functional or structural alterations that might result in a disease phenotype in humans. We studied and modeled 26 mutations from 15 different disease-related proteins and classified them by their structural features, which in some cases allow us to predict a mechanism causing the aberrant phenotype.
Introduction
The most common cause of monogenic disease is a single-base DNA variant resulting in an amino acid substitution, which can affect protein function by different mechanisms:
- Folding of the polypeptide chain and stability of the folded conformation
- Ligand binding or interaction with binding partners
- Posttranslational modifications
- Catalytic activity
The most common methods to identify and classify non-synonymous mutations with deleterious effects on protein function include:
- Using structural information about the three-dimensional environment of the residue
- Level of conservation and type of residues present at a particular sequence position
- Calculation of residue solvent accessibility
-
Respiration and Heart Movement Monitoring Using AM Continuous-Wave Radar (2020-1-20) [Poster]
Safwan H. Ibrahim, Abdullah A. Zayat, Ali H. Muqaibel, Tarig Ballal and Tareq Y. Al-Naffouri
Abstract
A wireless and contactless device senses chest movements and measures the respiration and heart rates. A signal is sent toward the body, which causes a reflected signal. This reflected signal is recorded via the receiver antenna, and the information is extracted from its amplitude and phase.
Motivation
Wireless and contactless monitoring:
- Multiple people at the same time
- Low-cost
- Less noisy
- Flexibility
Method
Fig. 1: The system diagram and the mathematical equations
Set-up
Fig. 3: User interface of the system (LabVIEW)
Sample Results
Fig. 2: A comparison between the system results and the ground truth (oximeter)
Table 1: Maximum and minimum accuracy of the system compared to the oximeter
Heart rate: maximum accuracy 98.6%, minimum accuracy 95.3%
Breathing rate: maximum accuracy 100%, minimum accuracy 97.2%
Conclusion
The proposed system was able to measure respiration and heart rates with high accuracy compared to contact devices. The system seems to have a harmonics problem coming from chest movement and multipath reflections. Future work will focus on canceling the harmonics and monitoring multiple targets.
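To make the rate-extraction idea of the radar poster concrete, the sketch here picks the strongest spectral component inside a respiration band and inside a heart-rate band of a chest-displacement signal. It is an assumption-laden illustration, not the poster's processing chain: the synthetic signal, the sampling rate, the band limits, and the function name `dominant_rate_per_minute` are all hypothetical, and the step that turns received amplitude and phase into displacement is taken as already done.

```python
import cmath
import math


def dominant_rate_per_minute(signal, fs, f_lo, f_hi):
    """Strongest frequency in [f_lo, f_hi] Hz, returned in cycles per minute."""
    n = len(signal)
    best_k, best_mag = None, -1.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if not f_lo <= freq <= f_hi:
            continue
        # Naive DFT coefficient for bin k (fine for short illustrative signals).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    if best_k is None:
        raise ValueError("no DFT bin falls inside the requested band")
    return 60.0 * best_k * fs / n


# Synthetic chest motion: 0.25 Hz breathing plus a weaker 1.2 Hz heartbeat.
fs = 10.0   # samples per second (assumed)
n = 600     # 60 seconds of data
chest = [math.sin(2 * math.pi * 0.25 * i / fs)
         + 0.1 * math.sin(2 * math.pi * 1.2 * i / fs)
         for i in range(n)]

respiration_rate = dominant_rate_per_minute(chest, fs, 0.1, 0.5)
heart_rate = dominant_rate_per_minute(chest, fs, 0.8, 2.0)
print(respiration_rate, heart_rate)
```

With real data, harmonics of the much stronger breathing motion can fall inside the heart-rate band and win the peak search, which is one way to read the harmonics problem noted in the conclusion.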
-
Novel Feature Generation for Multiple Hand Gestures Classification (2020-1-20) [Poster]
Abderrazak Chahid 1, Rami Khushaba 2, Adel Al-Jumaily 2 and Taous-Meriem Laleg-Kirati 1
1 King Abdullah University of Science and Technology (KAUST). 2 University of Technology, Sydney (UTS), Australia
Abstract
Surface electromyography (sEMG) signals represent an opportunity to control a multifunctional prosthetic hand in a non-invasive way. In this work, we investigate a novel feature extraction method that improves the interpretation of the sEMG signals of multiple hand gestures. This could help restore the function of missing body parts.
Introduction
Since the invention of prostheses, several devices have been proposed to replace a missing body part, which may be lost through trauma, disease, etc. Some of these solutions use non-invasive sEMG signals to control the device [1,2].
Objective:
- Build a smart prosthetic hand using artificial intelligence (AI) techniques.
- Develop a generalizable and robust AI model for predicting multiple hand gestures using a novel feature extraction method.
Challenges:
- Some hand gestures have similar sEMG signals.
- The prosthesis must respond in real time and at low cost.
Framework
The proposed framework is described as follows:
- Quantization: sEMG signals are converted into sequences using a uniform quantizer.
- QuPWM features: different features are extracted based on the Position Weight Matrix (PWM) method using multiple patterns (k-mers) [4].
- Classification: the extracted features are fed to standard classifiers for hand gesture classification.
Conclusion
- We developed a new feature extraction method, the Quantization-based PWM (QuPWM) method.
- The obtained results are very encouraging, with high accuracy for different subjects.
- We believe that signal processing is key to extracting the inherent features from biomedical signals such as sEMG.
- The proposed features will enhance human–computer interaction (HCI).
Future work
- Extensive validation using more datasets.
- Combine these features with deep learning classifiers to deal with big data.
- Integrate QuPWM in clinical practice: prosthesis.
References
1 Ciancio AL, Cordella F, Hoffmann KP, Schneider A, Guglielmelli E, Zollo L. Current achievements and future directions of hand prostheses controlled via peripheral nervous system. In The Hand 2017 (pp. 75-95). Springer, Cham.
2 Ahsan MR, Ibrahimy MI, Khalifa OO. Electromygraphy (EMG) signal based hand gesture recognition using artificial neural network (ANN). In 2011 4th International Conference on Mechatronics (ICOM) 2011 May 17 (pp. 1-6). IEEE.
3 Du Y, Wenguang J, Wentao W, Geng W. CapgMyo: a high density surface electromyography database for gesture recognition.
4 Chahid A, Albalawi F, Alotaiby TN, Al-Hameed MH, Alshebeili S, Laleg-Kirati TM. QuPWM: Feature Extraction Method for MEG Epileptic Spike Detection. Under revision in IEEE Journal of Biomedical and Health Informatics, arXiv preprint arXiv:1907.02596. 2019 Jul 3.
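The quantization and PWM steps of the framework can be sketched roughly as follows. This is a simplified illustration and not the QuPWM method itself: it uses single symbols only (k-mers with k = 1), and the function names, the signal range, and the toy sequences are assumptions introduced here.

```python
def quantize(signal, levels, lo, hi):
    """Map each sample to an integer symbol 0..levels-1 using uniform bins."""
    width = (hi - lo) / levels
    symbols = []
    for x in signal:
        idx = int((x - lo) / width)
        symbols.append(min(max(idx, 0), levels - 1))  # clamp out-of-range samples
    return symbols


def position_weight_matrix(sequences, levels):
    """pwm[s][p] = fraction of training sequences with symbol s at position p."""
    length = len(sequences[0])
    pwm = [[0.0] * length for _ in range(levels)]
    for seq in sequences:
        for pos, sym in enumerate(seq):
            pwm[sym][pos] += 1.0 / len(sequences)
    return pwm


def pwm_score(seq, pwm):
    """One scalar feature: how well a sequence matches a class PWM."""
    return sum(pwm[sym][pos] for pos, sym in enumerate(seq))


symbols = quantize([-1.0, -0.4, 0.1, 0.9, 1.0], levels=4, lo=-1.0, hi=1.0)

# Hypothetical quantized training sequences for one gesture class.
class_pwm = position_weight_matrix([[0, 1], [0, 2]], levels=3)
feature = pwm_score([0, 1], class_pwm)
print(symbols, feature)
```

In a setup like this, one PWM per gesture class yields one score per class, and the resulting score vector is what would be handed to a standard classifier.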
-
NNfold: RNA secondary structure prediction by deep learning (2020-1-20) [Poster]
RNA molecules have a plethora of functions within the cell. These functions can be divided into information-carrying, catalytic, or structural (scaffolding of other molecules), or a combination of these. For catalytic or regulatory functions, the structure of the RNA molecule is pivotal, and predicting the structure into which it is most likely to fold is therefore essential to fully understand its biological role. In general, RNA extensively affects protein regulation through its control of gene expression, post-transcriptional modifications, or translational regulation. RNA secondary structure can be obtained by techniques such as X-ray diffraction and NMR. However, biological experimental methods are still inefficient and expensive. Thus, computational prediction algorithms are still widely used for predicting RNA secondary structures.
Taking the raw sequence represented as a string, we first use a one-hot encoding. The encoded matrix has a dimension of L by 4. The encoding then goes through two models, the local model and the global model, to extract local contact information and global contact information, respectively. Regarding the local model, the input is two chunks of the raw encoding, whose dimensions are 20 by 20. We then concatenate those 20 by 20 chunk matrices into the L by L local contact information matrix. We used six 1D convolutional layers and one fully-connected layer to model the local information. For the global model, we use three 1D convolutional layers to predict whether a base can pair with any other base or not; the output is a vector of length L. In this vector, 1 means that the corresponding base may pair with another base and 0 means that it does not pair with any base.
To combine the local information and the global information, we convert the global vector into a symmetric matrix of L by L and perform a pairwise multiplication between the global information and the local information, enforcing the global constraint on the preliminary contact map. After combining the global information and the local information, the obtained global contact map may still violate the two constraints mentioned above. We used a greedy sorting algorithm to resolve the conflicts.
We introduce NNfold, a sequence-based deep learning method to predict RNA secondary structure. The predictions are made in two steps: first, we construct a matrix with the likelihood of each nucleotide pairing by predicting all potential interactions using a convolutional deep learning model. Next, we modify the list of base pairs obtained from the matrix using a second model, whose output is used to ensure the validity of the final secondary structure. NNfold performed much better than thermodynamics-based methods on a diverse set of RNA sequences, improving the average F1 score by 0.20. It is also capable of predicting pseudoknots, which is a challenging task for other approaches.
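The encoding and combination steps described for NNfold can be illustrated with a small sketch. Several details are assumptions made for this illustration and are not stated in the poster: the global vector is turned into a symmetric matrix through an outer product, the greedy step keeps the highest-scoring pairs while allowing each base in at most one pair, the threshold is 0.5, and the scores in `local` are made up rather than produced by a trained model.

```python
def one_hot(seq):
    """Encode an RNA string as an L-by-4 matrix (column order A, C, G, U)."""
    table = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
             "G": [0, 0, 1, 0], "U": [0, 0, 0, 1]}
    return [table[base] for base in seq]


def combine(local, global_vec):
    """Mask the L-by-L local contact scores with the global pairing vector."""
    size = len(global_vec)
    return [[local[i][j] * global_vec[i] * global_vec[j] for j in range(size)]
            for i in range(size)]


def greedy_pairs(contact, threshold=0.5):
    """Take pairs in decreasing score order; each base joins at most one pair."""
    size = len(contact)
    candidates = sorted(((contact[i][j], i, j)
                         for i in range(size) for j in range(i + 1, size)
                         if contact[i][j] >= threshold), reverse=True)
    used, pairs = set(), []
    for _, i, j in candidates:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return sorted(pairs)


encoded = one_hot("GA")
local = [[0.0, 0.8, 0.0, 0.9],
         [0.8, 0.0, 0.95, 0.6],
         [0.0, 0.95, 0.0, 0.0],
         [0.9, 0.6, 0.0, 0.0]]
global_vec = [1, 1, 0, 1]  # base 2 is predicted to stay unpaired
pairs = greedy_pairs(combine(local, global_vec))
print(encoded, pairs)
```

In this toy case the strongest local score (bases 1 and 2) is removed by the global mask, and the greedy pass then keeps only the pair (0, 3) because the remaining candidates reuse base 0 or base 3.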
-
Semi-automated binary classification workflow (2020-1-20) [Poster]
MOTIVATION
There have been recent calls to build human-independent intelligence that can assist clinicians during medical diagnosis. In our research, we focus on the development of a standardized and tunable workflow for the binary classification problem within various biological signals (EEG, fMRI, NIRS). This workflow can be openly used by others in their own studies, research and clinical practice.
OBJECTIVES
1. Evaluate the efficiency of different classification methods on datasets that are well known in the community.
2. Merge the produced code into separate workflows for datasets depending on their dimensionality (single-channel, multi-channel, volumetric).
CHALLENGES
1. Absence of a default data format among researchers, which requires human intervention.
2. Optimization of the code architecture and workflow parallelization (choosing among a range of preprocessing steps, feature generation and classification methods).
METHODS
The central part of the poster shows the two workflows that perform best on the fMRI Star-Plus dataset, taken from our generalized workflow for multi-dimensional data. Below we describe the other options we provide to the user within the project.
PREPROCESSING
- Frequency filtering
- Independent Component Analysis
- Smoothing (exponential, rolling mean)
- Global Signal Regression
FEATURE GENERATION
- Fast Fourier Transform
- Semi-Classical Signal Analysis
UTILIZED CLASSIFIERS
- Support Vector Classifier
- Logistic Regression
- Decision Tree Classifier
- K-Nearest Neighbours
- Neural Network
- Convolutional Neural Network
CONCLUSIONS
We tried to include state-of-the-art practices of brain data analysis as well as some novel methods such as SCSA. The project is still under development, but if you find it interesting and potentially useful, we suggest working through our tutorial on single-channel data analysis, which is part of the project and explains the methods behind the workflow.
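As an illustration of the two smoothing options named under preprocessing, here is a minimal sketch of a rolling mean and of exponential smoothing for a single-channel signal. The function names, the window length, and the smoothing factor `alpha` are assumptions chosen for the example; the project's own code may differ.

```python
def rolling_mean(signal, window):
    """Average over a sliding window; output has len(signal) - window + 1 values."""
    if window < 1 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    return [sum(signal[i:i + window]) / window
            for i in range(len(signal) - window + 1)]


def exponential_smoothing(signal, alpha):
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not signal:
        raise ValueError("signal must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [signal[0]]
    for x in signal[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


rolled = rolling_mean([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
exp_smoothed = exponential_smoothing([1.0, 2.0, 3.0], alpha=0.5)
print(rolled, exp_smoothed)
```

The rolling mean shortens the signal by `window - 1` samples, while exponential smoothing keeps the original length, which matters when later steps expect fixed-size inputs.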
-
BEOL NEM Relay-Based Inductorless DC-DC Converters (2020-1-20) [Poster]
Ren Li, Dias Azhigulov, Ahmed Allehyani, and Hossein Fariborzi
Introduction
• The voltage level requirement varies across the integrated circuit chip. Therefore, on-chip voltage conversions are crucial for any chip to be operational.
• This work proposes a novel solution for voltage regulation using Back-End-Of-Line (BEOL) NEM relays.
• The proposed Buck and Boost converters offer low voltage ripple, high conversion efficiency and low area overhead.
Relay and its Operation
• The relay is actuated electrostatically by the voltage difference between G and B.
• Two inherent threshold voltages: VPI (pull-in voltage) and VPO (pull-out voltage).
Relay Buck Converter
• The buck converter uses the Switched-Capacitor Buck Converter (SCBC) configuration.
Relay Boost Converter
• The boost converter utilizes the Switched-Capacitor Voltage Doubler (SCVD) concept.
Results and comparisons
• A custom Verilog-A relay model is used for the relays.
• Validated against a commercial FEM tool, MEMS
Conclusion
The BEOL NEM relay-based converters have the following benefits:
• Capable of handling higher voltages;
• Utilize a switched-capacitor configuration to avoid bulky on-chip inductors;
• Zero sub-threshold leakage and sharp turn-on/off curves;
• Low output voltage ripple and high conversion efficiency;
• Well suited to ultra-low-power applications, such as IoT and biological, implantable medical devices.
References
1 R. Li, D. Azhigulov, A. Allehyani and H. Fariborzi, BEOL NEM Relay-Based Inductorless DC-DC Converters, in 2020 IEEE International Symposium on Circuits and Systems (ISCAS), 2020.
2 D. Ma and R. Bondade, Reconfigurable switched-capacitor power converters: principles and designs for self-powered microsystems, 1 ed. New York: Springer-Verlag New York, 2013, p. 178.
3 R. Li, R. Alhadrami, and H. Fariborzi, BEOL NEM Relay Based Sequential Logic Circuits, in 2019 IEEE International Symposium on Circuits and Systems (ISCAS), 2019.
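The two threshold voltages of the relay suggest a simple hysteretic switch model, sketched here as an idealized behavioral illustration. It is not the poster's Verilog-A model: the class name `Relay`, the threshold values, and the assumption that the contact closes at or above VPI and opens at or below VPO (with VPO below VPI) are introduced for this example only.

```python
class Relay:
    """Idealized electrostatic relay with pull-in / pull-out hysteresis."""

    def __init__(self, v_pi, v_po):
        if not 0.0 <= v_po < v_pi:
            raise ValueError("expected 0 <= v_po < v_pi")
        self.v_pi = v_pi
        self.v_po = v_po
        self.closed = False  # contact starts open

    def update(self, v_gate, v_body):
        """Apply a gate-to-body voltage and return the new contact state."""
        v_gb = abs(v_gate - v_body)
        if v_gb >= self.v_pi:
            self.closed = True       # pull-in: beam snaps down
        elif v_gb <= self.v_po:
            self.closed = False      # pull-out: beam releases
        # Between v_po and v_pi the previous state is kept (hysteresis).
        return self.closed


relay = Relay(v_pi=1.0, v_po=0.4)  # hypothetical threshold values in volts
sweep = [0.0, 0.6, 1.1, 0.6, 0.3, 0.6]
states = [relay.update(v, 0.0) for v in sweep]
print(states)
```

The same 0.6 V input gives an open contact on the way up and a closed contact on the way down, which is the memory effect that a switched-capacitor controller built from such relays has to account for.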
-
Targeted individual molecule sequencing enables the detection of ultra-rare variants (2020-1-20) [Poster]
Chongwei Bi, Lin Wang, Baolei Yuan, and Mo Li
Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, KSA
Abstract
The genome contains the hereditary information of a species, and its integrity is vital to life. With the development of next-generation sequencing (NGS) technologies, we have made unprecedented advances in decoding genomic information in recent years. However, current sequencing strategies are mainly population-based, in which samples come from DNA pools of bulk cells. They are not suitable for detecting rare mutations in a subpopulation of cells. Here we introduce targeted individual DNA molecule sequencing (IDMseq), a strategy to label and sequence individual DNA molecules from a pool. We applied IDMseq to detect rare mutations. The results showed that IDMseq is broadly applicable to any high-throughput sequencing platform and is capable of detecting ultra-rare mutations at 1:10,000 allele frequency.
Conclusion
In our preliminary work we have demonstrated that IDMseq is efficient and sensitive in detecting ultra-rare mutations, without limitation in the sequencing platform. IDMseq also showed the ability to detect somatic mutations within the genome. The somatic SNP load calculated from IDMseq is comparable with the published data [2]. IDMseq with long amplicons showed the ability to detect large SVs at allele resolution, which will potentially benefit other studies in the field.
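One way to see why labeling individual molecules helps with rare variants is to group reads by their molecule label and build a consensus per molecule, so that a random sequencing error in one read is outvoted while a true variant appears in every read of its molecule. The sketch here illustrates that general idea only; the poster does not describe its analysis pipeline, and the labels, reads, and majority-vote rule are assumptions made for this example (reads are taken to be aligned and of equal length).

```python
from collections import Counter, defaultdict


def consensus_by_label(reads):
    """Majority-vote consensus sequence for each molecule label."""
    groups = defaultdict(list)
    for label, seq in reads:
        groups[label].append(seq)
    consensus = {}
    for label, seqs in groups.items():
        # zip(*seqs) walks the reads column by column.
        consensus[label] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs))
    return consensus


def variant_molecule_fraction(consensus, reference):
    """Fraction of molecules whose consensus differs from the reference."""
    variants = sum(1 for seq in consensus.values() if seq != reference)
    return variants / len(consensus)


# Hypothetical labeled reads: (molecule label, aligned read).
reads = [
    ("m1", "ACGT"), ("m1", "ACGT"), ("m1", "ACTT"),  # one read with an error
    ("m2", "ACCT"), ("m2", "ACCT"),                  # consistent true variant
]
consensus = consensus_by_label(reads)
fraction = variant_molecule_fraction(consensus, reference="ACGT")
print(consensus, fraction)
```

Counting variant molecules rather than variant reads is what makes a frequency such as 1:10,000 meaningful, since each molecule contributes exactly one vote regardless of how many reads it produced.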
-
Deep Learning Enables Rapid Identification of Antibiotic Resistance Genes (2020-1-20) [Poster]
Antibiotic Resistance Genes (ARGs) are one of the key components of antibiotic resistance, which has become one of the most urgent threats to global health. Here we propose an end-to-end Hierarchical Multi-task Deep learning framework for Antibiotic Resistance Gene annotation (HMD-ARG), taking raw sequence encoding as input and then annotating ARG sequences from three aspects: resistant drug type, the underlying mechanism of resistance, and gene mobility. Experimental results suggest that HMD-ARG can serve as a useful tool for ARG investigation.
-
Towards a Mechanistic Understanding of the Tumor Suppressor Function of Wiskott-Aldrich Syndrome Protein (2020-1-20) [Poster]
Wiskott-Aldrich syndrome (WAS) is a rare pediatric disorder caused by mutations in the WAS gene. The biological features of this disease include thrombocytopenia, eczema, complex immunodeficiency, and malignancy. WAS protein (WASP), encoded by the WAS gene, is a classical actin nucleation-promoting factor. Yet, the well-known functions of WASP fail to fully explain the high rate (13%–22%) of cancer in children with WAS. Recently, WASP was identified as a tumor suppressor by Chiarle’s group; however, the mechanism of its tumor suppressor function is not clear. Mounting evidence has demonstrated that the ribosomal DNA (rDNA) inside the nucleolus is critical for genome stability, chromatin structure, and cancer pathogenesis. In addition, the perinucleolar heterochromatin shows structural alterations in cancer cells. Here, we use induced pluripotent stem cells (iPSCs) from patients with WAS (WAS-iPSC), isogenic gene-corrected cells (cWAS-iPSC), WASP knockout iPSCs, and B lymphoblastoid cell lines to study the mechanisms of WAS pathogenesis. Our results show that WASP interacts physically with partners inside the nucleolus and binds to the rDNA. Mutant cells undergo 5S rDNA copy number amplification and 45S rDNA array loss. WASP deficiency results in loss of perinucleolar heterochromatin, irregular nucleolar shape, and chromosomal aberrations. Taken together, our results show that WASP is important for genome stability, revealing its tumor suppressor mechanisms in blood cells.
-
A novel iPSC model of isogenic knockout of entire WAS gene can recapitulate WAS phenotypes in iPSC derived macrophages (2020-1-20) [Poster]
Baolei Yuan 1, Xuan Zhou 1, Gerardo Ramos-Mandujano 1, Lorena V. Cortes-Medina 2, Chongwei Bi 1, Mo Li 1
Wiskott-Aldrich syndrome (WAS) is an X-linked recessive disease caused by mutations in the gene encoding the WAS protein (WASP). WAS is associated with devastating symptoms including microthrombocytopenia, eczema, autoimmunity and cancer. The molecular mechanism underlying WAS remains elusive thus far. The genotype-phenotype relationship in WAS is complex. There are over 200 mutations that lead to hypomorphic levels or complete loss of WASP, yet it is impossible to predict clinical severity based on the mutation alone. To help evaluate the phenotype variability due to the mutational background of different patients, we developed an isogenic WASP-knockout (WASP-KO) induced pluripotent stem cell (iPSC) model using the CRISPR/Cas9 technique, which completely removed the WAS gene. The isogenic iPSC model was differentiated into macrophages, which are reported to be affected by WASP mutations. This model can be used for studying WASP functions, the WAS disease mechanism, and drug screening.
-
Deetal-Perio: DEEp denTAL Advisor for Periodontitis Diagnosis based on Two-step Segmentation of Teeth and Gingiva with Lower-dimensional Features (2020-1-20) [Poster]
Haoyang Li 1,2, Juexiao Zhou 1,3, Xin Gao 1,*
1 Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
2 MOE Key Laboratory of Symbolic Computation and Knowledge Engineering, College of Computer Science and Technology, Jilin University, Changchun 130012, China
3 Department of Biology, Southern University of Science and Technology, Shenzhen 518055, China
Background
Periodontitis, often known as gum disease, is a very common condition in which the gums and deeper periodontal structures become inflamed. This inflammation is the result of the response to bacterial invasion, influenced by genetic and lifestyle-associated factors [1]. Periodontitis usually takes the form of redness, swelling and a tendency to bleed during tooth brushing. Severe periodontitis ranks sixth in the Global Burden of Disease study and affects 11% of the world population [2]. Periodontitis may also be a risk factor for cardiovascular disease [3] and has an additive effect on the development of diabetic complications [4]. X-ray imaging is a widely used, economical and convenient method to scan the teeth and study periodontal diseases. Therefore, the prediction of periodontitis based on X-ray images has high practical value.
Highlights
• Lower-dimensional and interpretable features.
• Outperforms other state-of-the-art methods.
• Reveals the significance of the crown-root ratio (CR) as the key feature for periodontitis prediction.
Introduction
The majority of previous works on the prediction of periodontitis fall into two main categories of methods, traditional machine learning methods and CNN-based methods, while the input data are generally raw images or multi-modal patient data.
Methods
In this project, we predict the class of periodontitis based on X-ray images of patients, following a two-step segmentation of teeth and gingiva.
• Dataset: X-ray images of 300 patients from dental clinics in China. The contours of the teeth and gingiva and the level of periodontitis are annotated by professional dentists.
• Segmentation of Teeth and Gingiva: The segmentation of teeth and gingiva is based on our well-trained Mask-RCNN model.
• Prediction and Calibration of Tooth Numbering: The tooth numbering is predicted by both a multi-class Mask-RCNN (exact tooth numbering in the FDI numbering system) and a binary Mask-RCNN (tooth or not). Our calibration method then outputs the final tooth numbering by integrating the results of both types of Mask-RCNN.
• Calculation of ABL (Feature of Periodontitis): After the segmentation of teeth and gingiva, the alveolar bone loss (ABL) of each tooth is calculated from the largest perpendicular distances of the tooth crown and the tooth root to the intersecting gingiva. The 32 teeth of each sample are reorganized into a 1x32 vector for the prediction of periodontitis.
• Prediction of Periodontitis: The 1x32 vector of tooth ratios is post-processed with interpolation, then the Synthetic Minority Oversampling Technique (SMOTE) is adopted to address the class-imbalance issue. Next, XGBoost is applied to classify periodontitis.
• Evaluation of Methods: Mean average precision (mAP), Dice coefficient, accuracy and F1-score are used to evaluate our results.
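Of the evaluation metrics listed for Deetal-Perio, the Dice coefficient is the one tied most directly to segmentation quality. A minimal sketch is shown here for flattened binary masks, assuming the standard definition Dice = 2|A ∩ B| / (|A| + |B|); the function name, the toy masks, and the convention of returning 1.0 for two empty masks are choices made for this illustration.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two equal-length binary masks (lists of 0/1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement here
    return 2.0 * intersection / total


# Hypothetical flattened 2x2 masks for one tooth.
predicted_mask = [1, 1, 0, 0]
annotated_mask = [1, 0, 1, 0]
dice = dice_coefficient(predicted_mask, annotated_mask)
print(dice)
```

For binary masks this value equals the pixel-level F1-score of the predicted mask against the annotation, so the two metrics differ only in whether they are applied to pixels or to class labels.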
Results
• Our method is powerful for teeth segmentation and numbering.
• Our method can handle both 3-class and 4-class classification and outperforms the other compared methods.
• Our method is robust with respect to the class size.
References
1. Page RC, Kornman KS. The pathogenesis of human periodontitis: An introduction. Periodontol 2000 1997; 14: 9–11.
2. Marcenes W, Kassebaum NJ, Bernabé E, et al. Global burden of oral conditions in 1990-2010: A systematic analysis. J Dent Res 2013; 92: 592–597.
3. Tonetti MS, Van Dyke TE; Working Group 1 of the Joint EFP/AAP Workshop. Periodontitis and atherosclerotic cardiovascular disease: Consensus report of the Joint European Federation of Periodontology and the American Academy of Periodontology Workshop on periodontitis and systemic diseases. J Clin Periodontol 2013; 40(Suppl. 14): S24–S29.
4. Lalla E, Papapanou PN. Diabetes mellitus and periodontitis: A tale of two common interrelated diseases. Nat Rev Endocrinol 2011; 7: 738–748.
-
Prioritizing Copy Number Variants using Phenotype and Gene Functional Similarity (2020-1-20) [Poster]
Abstract
Background: There are many types of genetic variation in the human genome, ranging from large chromosome anomalies to Single Nucleotide Variants (SNVs). It is becoming necessary to develop methods for distinguishing disease-causing variants from the large number of neutral genetic variants in an individual. This problem is also relevant to Copy Number Variants (CNVs), a class of genetic variation where large segments of the genome differ in copy number among individuals.
Results: We have built a method that incorporates biological background knowledge about the relation between phenotypes resulting from a loss of function in mouse genes, gene functions as described using the Gene Ontology (GO), and the anatomical site of gene expression, along with a score that predicts the pathogenicity of CNVs (SVScore). We use this information to build a machine learning model that ranks CNVs based on their predicted pathogenicity and the relation between the genes affected by the CNV and the phenotype we observe in affected individuals. Our method achieves an F-score of 99.23%, with 99.18% precision, on our evaluation set.
Introduction
Over the past several years, much progress has been made in detecting CNVs and understanding their role in human diseases [1,2,3]. We now understand that CNVs account for much of human variability. Correspondingly, several methods have been introduced to find disease-associated genes and SNVs [4,5,6]. Constructing similar methods for CNVs is challenging due to the heterogeneity in variant size and type and the possibility of multiple genes being affected by large CNVs. CNV impact prediction methods should consider these factors in order to robustly prioritize pathogenic variants.
Results
The performance of our method is evaluated on a dataset of CNVs detected in structural variants with known phenotypes. These CNVs were evaluated as harmful or benign. Our results show that incorporating this information leads to an improvement over a baseline model (Fig 2) that uses only similarity scores between gene-phenotype associations and disease-associated phenotypes, as well as an improvement over using only pathogenicity prediction methods for CNVs. Our method achieves an F-score of 99.23%, with 99.18% precision.
Future work
Future work is required to evaluate and improve our model using patient-derived WGS data. Moreover, we plan to establish a workflow that incorporates existing tools for CNV calling, from BAM/FASTQ files to structural variants. We can then test the method using real samples with known CNV-associated disease.
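The reported F-score and precision together determine the recall, which the poster does not state. The short calculation here assumes that "F-score" means the usual F1 (the harmonic mean of precision and recall); under that assumption the implied recall is about 99.28%. The function names are introduced for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def recall_from_f1(f1, precision):
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    return f1 * precision / (2.0 * precision - f1)


reported_f1 = 0.9923         # from the poster
reported_precision = 0.9918  # from the poster
implied_recall = recall_from_f1(reported_f1, reported_precision)
print(round(implied_recall, 4))
```

A recall slightly above the precision means that, on the evaluation set, false negatives would be marginally rarer than false positives, again only under the F1 assumption.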