


    Now showing items 1-9 of 9



    SupportNet: a novel incremental learning framework through deep learning and support data

Li, Yu; Li, Zhongxiao; Ding, Lizhong; Hu, Yuhui; Chen, Wei; Gao, Xin (Cold Spring Harbor Laboratory, 2018-05-08) [Preprint]
Motivation: In most biological data sets, the amount of data is regularly growing and the number of classes is continuously increasing. To deal with the new data from the new classes, one approach is to train a classification model, e.g., a deep learning model, from scratch based on both old and new data. This approach is highly computationally costly and the extracted features are likely very different from the ones extracted by the model trained on the old data alone, which leads to poor model robustness. Another approach is to fine-tune the trained model from the old data on the new data. However, this approach often does not have the ability to learn new knowledge without forgetting the previously learned knowledge, which is known as the catastrophic forgetting problem. To our knowledge, this problem has not been studied in the field of bioinformatics despite its existence in many bioinformatic problems. Results: Here we propose a novel method, SupportNet, to solve the catastrophic forgetting problem efficiently and effectively. SupportNet combines the strength of deep learning and the support vector machine (SVM), where the SVM is used to identify the support data from the old data, which are fed to the deep learning model together with the new data for further training so that the model can review the essential information of the old data when learning the new information. Two powerful consolidation regularizers are applied to ensure the robustness of the learned model. Comprehensive experiments on various tasks, including enzyme function prediction, subcellular structure classification and breast tumor classification, show that SupportNet drastically outperforms the state-of-the-art incremental learning methods and reaches similar performance as the deep learning model trained from scratch on both old and new data. Availability: Our program is accessible at https://github.com/lykaust15/SupportNet.
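The support-data idea in the abstract can be sketched with scikit-learn. The toy 2-D features, class sizes, and linear kernel below are illustrative assumptions, not the paper's setup: in SupportNet the SVM operates on features learned by the deep network, and the consolidation regularizers are omitted here.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy "old" data: two classes in a 2-D feature space.
X_old = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y_old = np.array([0] * 50 + [1] * 50)

# Step 1: fit an SVM on the old data; its support vectors are the
# "support data" that carry the most class-boundary information.
svm = SVC(kernel="linear").fit(X_old, y_old)
support_idx = svm.support_            # indices of support vectors in X_old
X_support, y_support = X_old[support_idx], y_old[support_idx]

# Step 2: when new-class data arrives, continue training on the union of
# the small support set and the new data instead of replaying all old data.
X_new = rng.normal(6, 1, (50, 2))
y_new = np.array([2] * 50)
X_train = np.vstack([X_support, X_new])
y_train = np.concatenate([y_support, y_new])

print(len(X_support), len(X_train))
```

Keeping only the support vectors bounds the rehearsal memory while retaining the examples that define the old class boundaries.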

    A fast fiducial marker tracking model for fully automatic alignment in electron tomography

    Han, Renmin; Zhang, Fa; Gao, Xin (Bioinformatics, Oxford University Press (OUP), 2017-10-20) [Article]
Automatic alignment, especially fiducial marker-based alignment, has become increasingly important due to the high demand of subtomogram averaging and the rapid development of large-field electron microscopy. Among the alignment steps, fiducial marker tracking is a crucial one that determines the quality of the final alignment. Yet, it is still a challenging problem to track the fiducial markers accurately and effectively in a fully automatic manner. In this paper, we propose a robust and efficient scheme for fiducial marker tracking. Firstly, we theoretically prove the upper bound of the transformation deviation of aligning the positions of fiducial markers on two micrographs by an affine transformation. Secondly, we design an automatic algorithm based on the Gaussian mixture model to accelerate the procedure of fiducial marker tracking. Thirdly, we propose a divide-and-conquer strategy against lens distortions to ensure the reliability of our scheme. To our knowledge, this is the first attempt to theoretically relate the projection model with the tracking model. The real-world experimental results further support our theoretical bound and demonstrate the effectiveness of our algorithm. This work facilitates fully automatic tracking for datasets with a massive number of fiducial markers. The C/C++ source code that implements the fast fiducial marker tracking is available at https://github.com/icthrm/gmm-marker-tracking. Markerauto version 1.6 or later (also integrated in the AuTom platform at http://ear.ict.ac.cn/) offers a complete implementation for fast alignment in which the fast fiducial marker tracking is available.
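The affine-alignment step that the paper's bound refers to can be illustrated with a plain least-squares fit between matched marker positions. The function and toy data below are a hypothetical sketch that omits the GMM acceleration and the divide-and-conquer distortion handling.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares affine transform (A, t) with dst ≈ src @ A.T + t."""
    n = src.shape[0]
    # Design matrix [x, y, 1] for each source point.
    M = np.hstack([src, np.ones((n, 1))])
    # Solve both coordinate regressions jointly.
    params, *_ = np.linalg.lstsq(M, dst, rcond=None)
    A, t = params[:2].T, params[2]
    return A, t

rng = np.random.default_rng(1)
src = rng.uniform(0, 100, (20, 2))              # marker positions, micrograph 1
A_true = np.array([[1.02, 0.05], [-0.03, 0.98]])
t_true = np.array([4.0, -2.5])
dst = src @ A_true.T + t_true                   # positions on micrograph 2

A_est, t_est = estimate_affine(src, dst)
print(np.allclose(A_est, A_true), np.allclose(t_est, t_true))
```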

    Fine-grained alignment of cryo-electron subtomograms based on MPI parallel optimization.

    Lü, Yongchun; Zeng, Xiangrui; Zhao, Xiaofang; Li, Shirui; Li, Hua; Gao, Xin; Xu, Min (BMC bioinformatics, Springer Science and Business Media LLC, 2019-08-28) [Article]
Background: Cryo-electron tomography (cryo-ET) is an imaging technique used to generate three-dimensional structures of cellular macromolecule complexes in their native environment. Thanks to advances in cryo-electron microscopy technology, the image quality of three-dimensional reconstruction in cryo-electron tomography has greatly improved. However, cryo-ET images are characterized by low resolution, partial data loss and a low signal-to-noise ratio (SNR). In order to tackle these challenges and improve resolution, a large number of subtomograms containing the same structure need to be aligned and averaged. Existing methods for refining and aligning subtomograms are still highly time-consuming, requiring many computationally intensive processing steps (i.e., the rotations and translations of subtomograms in three-dimensional space). Results: In this article, we propose a Stochastic Average Gradient (SAG) fine-grained alignment method for optimizing the sum of dissimilarity measures in real space. We introduce a Message Passing Interface (MPI) parallel programming model in order to explore further speedup. Conclusions: We compare our stochastic average gradient fine-grained alignment algorithm with two baseline methods, high-precision alignment and fast alignment. Our SAG fine-grained alignment algorithm is much faster than the two baseline methods. Results on simulated data of GroEL from the Protein Data Bank (PDB ID: 1KP8) showed that our parallel SAG-based fine-grained alignment method could achieve close-to-optimal rigid transformations with higher precision than both high-precision alignment and fast alignment at a low SNR (SNR = 0.003) with a tilt angle range of ±60° or ±40°. For the experimental subtomogram data of GroEL and GroEL/GroES complexes, our parallel SAG-based fine-grained alignment achieves higher precision and needs fewer iterations to converge than the two baseline methods.
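A minimal sketch of the Stochastic Average Gradient idea on a scalar toy objective: the subtomogram dissimilarity terms are replaced by simple quadratics, and the MPI parallelization is omitted. The key trick is that SAG keeps a memory of the last gradient seen for each term, so each iteration costs one gradient evaluation yet steps along an average over all terms.

```python
import numpy as np

def sag_minimize(a, lr=0.1, iters=500, seed=0):
    """SAG on f(x) = (1/n) * sum_i 0.5*(x - a_i)^2; the minimum is mean(a)."""
    rng = np.random.default_rng(seed)
    n = len(a)
    x = 0.0
    grads = np.zeros(n)       # memory: last gradient seen for each term
    g_sum = 0.0               # running sum of the stored gradients
    for _ in range(iters):
        i = rng.integers(n)            # sample one term per iteration
        g_new = x - a[i]               # gradient of the sampled term
        g_sum += g_new - grads[i]      # refresh the running sum cheaply
        grads[i] = g_new
        x -= lr * g_sum / n            # step along the averaged gradient
    return x

a = np.array([1.0, 2.0, 3.0, 10.0])
x_star = sag_minimize(a)
print(round(x_star, 2))   # converges toward mean(a) = 4.0
```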

    DeeReCT-PolyA: a robust and generic deep learning method for PAS identification

    Xia, Zhihao; Li, Yu; Zhang, Bin; Li, Zhongxiao; Hu, Yuhui; Chen, Wei; Gao, Xin (Bioinformatics, Oxford University Press (OUP), 2018-11-30) [Article]
Motivation: Polyadenylation is a critical step in gene expression regulation during the maturation of mRNA. An accurate and robust method for poly(A) signal (PAS) identification is not only desired for the purpose of better transcript end annotation, but can also help us gain a deeper insight into the underlying regulatory mechanism. Although many methods have been proposed for PAS recognition, most of them are PAS motif-specific and human-specific, which leads to high risks of overfitting, low generalization power, and inability to reveal the connections between the underlying mechanisms of different mammals. Results: In this work, we propose a robust, PAS motif-agnostic, and highly interpretable and transferrable deep learning model for accurate PAS recognition, which requires no prior knowledge or human-designed features. We show that our single model trained over all human PAS motifs not only outperforms the state-of-the-art methods trained on specific motifs, but can also be generalized well to two mouse data sets. Moreover, we further increase the prediction accuracy by transferring the deep learning model trained on the data of one species to the data of a different species. Several novel underlying poly(A) patterns are revealed through the visualization of important oligomers and positions in our trained models. Finally, we interpret the deep learning models by converting the convolutional filters into sequence logos and quantitatively compare the sequence logos between human and mouse datasets.
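The filter-to-logo interpretation step can be sketched as computing per-position information content from a position frequency matrix. The toy hexamers below stand in for sequences that strongly activate a convolutional filter; this is an illustrative assumption, not the paper's exact procedure.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA sequence as a (len, 4) array."""
    idx = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), 4))
    for i, b in enumerate(seq):
        out[i, idx[b]] = 1.0
    return out

def logo_heights(pfm, eps=1e-9):
    """Per-position information content (bits) for a sequence logo.
    pfm: (positions, 4) matrix of base frequencies summing to 1 per row."""
    entropy = -(pfm * np.log2(pfm + eps)).sum(axis=1)
    return 2.0 - entropy        # max 2 bits for a 4-letter alphabet

# Toy PAS-like hexamers sharing the canonical AATAAA motif,
# with one variable position.
seqs = ["AATAAA", "AATAAA", "ATTAAA", "AATAAA"]
pfm = np.mean([one_hot(s) for s in seqs], axis=0)
print(logo_heights(pfm).round(2))
```

Fully conserved positions score the maximum 2 bits; the one variable position scores lower, which is exactly what a tall versus short column in a sequence logo conveys.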

    Deep Learning Deepens the Analysis of Alternative Splicing

    Zou, Xudong; Gao, Xin; Chen, Wei (Genomics, Proteomics & Bioinformatics, Elsevier BV, 2019-05-14) [Article]

    Systematic selection of chemical fingerprint features improves the Gibbs energy prediction of biochemical reactions

    Alazmi, Meshari; Kuwahara, Hiroyuki; Soufan, Othman; Ding, Lizhong; Gao, Xin (Bioinformatics, Oxford University Press (OUP), 2018-12-24) [Article]
Motivation: Accurate and wide-ranging prediction of thermodynamic parameters for biochemical reactions can facilitate deeper insights into the workings and the design of metabolic systems. Results: Here, we introduce a machine learning method with chemical fingerprint-based features for the prediction of the Gibbs free energy of biochemical reactions. From a large pool of 2D fingerprint-based features, this method systematically selects a small number of relevant ones and uses them to construct a regularized linear model. Since a manual selection of 2D structure-based features can be a tedious and time-consuming task, requiring expert knowledge about the structure-activity relationship of chemical compounds, the systematic feature selection step in our method offers a convenient means to identify relevant 2D fingerprint-based features. By comparing our method with state-of-the-art linear regression-based methods for standard Gibbs free energy prediction, we demonstrate that its prediction accuracy and prediction coverage are the most favorable. Our results show direct evidence that a number of 2D fingerprints collectively provide useful information about the Gibbs free energy of biochemical reactions and that our systematic feature selection procedure provides a convenient way to identify them.
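The selection-plus-regression idea can be sketched with an L1-regularized linear model, which zeroes out irrelevant features and fits the rest in one step. The binary features and coefficients below are synthetic placeholders for real 2D fingerprint bits, and scikit-learn's Lasso is one concrete choice; the paper's exact selection procedure may differ.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)

# Toy stand-in for fingerprint features: 200 reactions x 50 binary bits,
# where only the first 5 bits actually carry signal about the target energy.
X = rng.integers(0, 2, (200, 50)).astype(float)
true_w = np.zeros(50)
true_w[:5] = [3.0, -2.0, 1.5, -1.0, 2.5]
y = X @ true_w + rng.normal(0, 0.1, 200)

# The L1 penalty drives irrelevant coefficients to exactly zero,
# performing feature selection and regression simultaneously.
model = Lasso(alpha=0.05).fit(X, y)
selected = np.flatnonzero(np.abs(model.coef_) > 1e-3)
print(selected)
```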

    DLBI: deep learning guided Bayesian inference for structure reconstruction of super-resolution fluorescence microscopy

    Li, Yu; Xu, Fan; Zhang, Fa; Xu, Pingyong; Zhang, Mingshu; Fan, Ming; Li, Lihua; Gao, Xin; Han, Renmin (Bioinformatics, Oxford University Press (OUP), 2018-06-27) [Article]
Super-resolution fluorescence microscopy, with a resolution beyond the diffraction limit of light, has become an indispensable tool to directly visualize biological structures in living cells at a nanometer-scale resolution. Despite advances in high-density super-resolution fluorescent techniques, existing methods still have bottlenecks, including extremely long execution time, artificial thinning and thickening of structures, and lack of ability to capture latent structures. Here, we propose a novel deep learning guided Bayesian inference (DLBI) approach for the time-series analysis of high-density fluorescent images. Our method combines the strength of deep learning and statistical inference, where deep learning captures the underlying distribution of the fluorophores that is consistent with the observed time-series fluorescent images by exploring local features and correlation along the time axis, and statistical inference further refines the ultrastructure extracted by deep learning and endows physical meaning to the final image. In particular, our method contains three main components. The first one is a simulator that takes a high-resolution image as the input and simulates time-series low-resolution fluorescent images based on experimentally calibrated parameters, which provides supervised training data to the deep learning model. The second one is a multi-scale deep learning module that captures both the spatial information in each input low-resolution image and the temporal information among the time-series images. The third one is a Bayesian inference module that takes the image from the deep learning module as the initial localization of fluorophores and removes artifacts by statistical inference. Comprehensive experimental results on both real and simulated datasets demonstrate that our method provides more accurate and realistic local-patch and large-field reconstruction than the state-of-the-art method, the 3B analysis, while being more than two orders of magnitude faster. The main program is available at https://github.com/lykaust15/DLBI. Supplementary data are available at Bioinformatics online.
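The first component, the simulator, can be sketched in a few lines: blink a subset of emitters on a ground-truth structure, blur with a point-spread function, and add photon shot noise. All parameters below (blink rate, PSF width, photon scale) are illustrative guesses, not the paper's experimentally calibrated values.

```python
import numpy as np

rng = np.random.default_rng(3)

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma=2.0, radius=5):
    """Separable Gaussian blur mimicking the microscope point-spread function."""
    k = gaussian_kernel(sigma, radius)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

# High-resolution "ground truth": a thin diagonal filament.
hi = np.zeros((64, 64))
np.fill_diagonal(hi, 1.0)

# Each frame: stochastically blink a subset of fluorophores,
# blur with the PSF, then add photon shot noise.
frames = []
for _ in range(10):
    blink = hi * (rng.random(hi.shape) < 0.3)    # ~30% of emitters on
    frame = rng.poisson(blur(blink) * 100.0)     # Poisson shot noise
    frames.append(frame)

stack = np.stack(frames)
print(stack.shape)
```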

    DeepSimulator: a deep simulator for Nanopore sequencing

    Li, Yu; Han, Renmin; Bi, Chongwei; Li, Mo; Wang, Sheng; Gao, Xin (Bioinformatics, Cold Spring Harbor Laboratory, 2018-04-06) [Article]
Oxford Nanopore sequencing is a sequencing technology that has developed rapidly in recent years. To keep pace with the explosion of downstream data analytical tools, a versatile Nanopore sequencing simulator is needed to complement the experimental data as well as to benchmark those newly developed tools. However, all the currently available simulators are based on simple statistics of the produced reads, which have difficulty in capturing the complex nature of the Nanopore sequencing procedure, the main task of which is the generation of raw electrical current signals. Here we propose a deep learning based simulator, DeepSimulator, to mimic the entire pipeline of Nanopore sequencing. Starting from a given reference genome or assembled contigs, we simulate the electrical current signals with a context-dependent deep learning model, followed by a base-calling procedure to yield simulated reads. This workflow mimics the sequencing procedure more naturally. Thorough experiments performed across four species show that the signals generated by our context-dependent model are more similar to the experimentally obtained signals than those generated by the official context-independent pore model. In terms of the simulated reads, we provide a parameter interface so that users can obtain reads with different accuracies ranging from 83% to 97%. The reads generated with the default parameters have almost the same properties as the real data. Two case studies demonstrate the application of DeepSimulator to benefit the development of tools in de novo assembly and in low-coverage SNP detection. The software can be accessed freely at https://github.com/lykaust15/DeepSimulator.
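The signal-generation step can be sketched with a toy context-dependent pore model: the emitted current depends on a whole k-mer window rather than a single base. The random level table, k = 5, and noise settings below are placeholder assumptions; DeepSimulator uses a learned deep model, not a lookup table.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)

# Toy pore model: each 5-mer context maps to a mean current level.
# (Real pore models are tables calibrated from sequencing data;
# these random levels are placeholders.)
K = 5
pore_model = {"".join(p): rng.uniform(60, 120) for p in product("ACGT", repeat=K)}

def simulate_signal(seq, samples_per_kmer=8, noise_sd=1.5):
    """Slide a K-mer window over the sequence; emit noisy current samples
    whose mean depends on the whole K-mer context, not a single base."""
    signal = []
    for i in range(len(seq) - K + 1):
        mean = pore_model[seq[i : i + K]]
        signal.extend(rng.normal(mean, noise_sd, samples_per_kmer))
    return np.array(signal)

sig = simulate_signal("ACGTACGTACGT")
print(sig.shape)
```

A base-caller run on such signals would then close the loop from reference sequence to simulated reads.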

    OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction

    Smaili, Fatima Z.; Gao, Xin; Hoehndorf, Robert (Bioinformatics, Oxford University Press (OUP), 2018-11-08) [Article]
Motivation: Ontologies are widely used in biology for data annotation, integration, and analysis. In addition to formally structured axioms, ontologies contain metadata in the form of annotation axioms which provide valuable pieces of information that characterize ontology classes. Annotation axioms commonly used in ontologies include class labels, descriptions, or synonyms. Despite being a rich source of semantic information, the ontology metadata are generally unexploited by ontology-based analysis methods. Results: We propose a novel method, OPA2Vec, to generate vector representations of biological entities in ontologies by combining formal ontology axioms and annotation axioms from the ontology metadata. We apply a Word2Vec model that has been pre-trained on a corpus of either abstracts or full-text articles to produce feature vectors from our collected data. We validate our method in two different ways: first, we use the obtained vector representations of proteins in a similarity measure to predict protein-protein interactions on two different datasets. Second, we evaluate our method on predicting gene-disease associations based on phenotype similarity by generating vector representations of genes and diseases using a phenotype ontology, and applying the obtained vectors to predict gene-disease associations using mouse model phenotypes. We demonstrate that OPA2Vec significantly outperforms existing methods for predicting gene-disease associations. Using evidence from mouse models, we apply OPA2Vec to identify candidate genes for several thousand rare and orphan diseases. OPA2Vec can be used to produce vector representations of any biomedical entity given any type of biomedical ontology. Availability: https://github.com/bio-ontology-research-group/opa2vec.
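The combination of formal and informal content can be sketched as corpus construction: subclass axioms and annotation texts are flattened into sentences on which a Word2Vec model is then trained to embed each ontology class. The GO classes and label strings below are illustrative, and the Word2Vec training step itself is omitted.

```python
# Formal axioms (subclass relations) and annotation axioms (labels,
# descriptions) are flattened into one text corpus; a Word2Vec model
# pre-trained on biomedical literature is then applied to this corpus
# to produce a vector for each ontology class.
formal_axioms = [
    ("GO:0008150", "subClassOf", "owl:Thing"),
    ("GO:0006915", "subClassOf", "GO:0008150"),
]
annotations = {
    "GO:0006915": ["apoptotic process",
                   "a programmed cell death process"],
}

corpus = []
for s, p, o in formal_axioms:                  # formal content
    corpus.append(f"{s} {p} {o}")
for cls, texts in annotations.items():         # informal content (metadata)
    for t in texts:
        corpus.append(f"{cls} {t}")

print(len(corpus))
```

Because class identifiers co-occur with both axiom tokens and natural-language tokens, the resulting embeddings mix structural and textual signals, which is the core of the method.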