    Now showing items 1-9 of 9


    SupportNet: a novel incremental learning framework through deep learning and support data

    Li, Yu; Li, Zhongxiao; Ding, Lizhong; Hu, Yuhui; Chen, Wei; Gao, Xin (Cold Spring Harbor Laboratory, 2018-05-08) [Preprint]
    Motivation: In most biological data sets, the amount of data is regularly growing and the number of classes is continuously increasing. To deal with the new data from the new classes, one approach is to train a classification model, e.g., a deep learning model, from scratch based on both old and new data. This approach is highly computationally costly, and the extracted features are likely to be very different from the ones extracted by the model trained on the old data alone, which leads to poor model robustness. Another approach is to fine-tune the model trained on the old data using the new data. However, this approach often cannot learn new knowledge without forgetting the previously learned knowledge, which is known as the catastrophic forgetting problem. To our knowledge, this problem has not been studied in the field of bioinformatics despite its existence in many bioinformatics problems. Results: Here we propose a novel method, SupportNet, to solve the catastrophic forgetting problem efficiently and effectively. SupportNet combines the strengths of deep learning and the support vector machine (SVM), where the SVM is used to identify the support data from the old data, which are fed to the deep learning model together with the new data for further training so that the model can review the essential information of the old data when learning the new information. Two powerful consolidation regularizers are applied to ensure the robustness of the learned model. Comprehensive experiments on various tasks, including enzyme function prediction, subcellular structure classification and breast tumor classification, show that SupportNet drastically outperforms the state-of-the-art incremental learning methods and reaches performance similar to that of the deep learning model trained from scratch on both old and new data. Availability: Our program is accessible at https://github.com/lykaust15/SupportNet.

    DeeReCT-PolyA: a robust and generic deep learning method for PAS identification

    Xia, Zhihao; Li, Yu; Zhang, Bin; Li, Zhongxiao; Hu, Yuhui; Chen, Wei; Gao, Xin (Bioinformatics, Oxford University Press (OUP), 2018-11-30) [Article]
    Motivation: Polyadenylation is a critical step for gene expression regulation during the maturation of mRNA. An accurate and robust method for poly(A) signal (PAS) identification is not only desired for the purpose of better transcripts’ end annotation, but can also help us gain a deeper insight into the underlying regulatory mechanism. Although many methods have been proposed for PAS recognition, most of them are PAS motif-specific and human-specific, which leads to high risks of overfitting, low generalization power, and inability to reveal the connections between the underlying mechanisms of different mammals. Results: In this work, we propose a robust, PAS motif agnostic, and highly interpretable and transferable deep learning model for accurate PAS recognition, which requires no prior knowledge or human-designed features. We show that our single model trained over all human PAS motifs not only outperforms the state-of-the-art methods trained on specific motifs, but can also be generalized well to two mouse data sets. Moreover, we further increase the prediction accuracy by transferring the deep learning model trained on the data of one species to the data of a different species. Several novel underlying poly(A) patterns are revealed through the visualization of important oligomers and positions in our trained models. Finally, we interpret the deep learning models by converting the convolutional filters into sequence logos and quantitatively compare the sequence logos between human and mouse datasets.

    Deep Learning Deepens the Analysis of Alternative Splicing

    Zou, Xudong; Gao, Xin; Chen, Wei (Genomics, Proteomics & Bioinformatics, Elsevier BV, 2019-05-14) [Article]

    An accurate and rapid continuous wavelet dynamic time warping algorithm for end-to-end mapping in ultra-long nanopore sequencing

    Han, Renmin; Li, Yu; Gao, Xin; Wang, Sheng (Bioinformatics, Oxford University Press (OUP), 2018-09-08) [Article]
    Motivation: Long reads, point-of-care use and freedom from polymerase chain reaction are the promises brought by nanopore sequencing. Among various steps in nanopore data analysis, the end-to-end mapping between the raw electrical current signal sequence and the reference expected signal sequence serves as the key building block for signal labeling and the subsequent signal visualization, variant identification and methylation detection. One of the classic algorithms for solving the signal mapping problem is dynamic time warping (DTW). However, the ultra-long nanopore sequencing and an order of magnitude difference in the sampling speed complicate the scenario and make the classical DTW infeasible for solving the problem. Results: Here, we propose a novel multi-level DTW algorithm, continuous wavelet DTW (cwDTW), based on continuous wavelet transforms with different scales of the two signal sequences. Our algorithm starts from low-resolution wavelet transforms of the two sequences, such that the transformed sequences are short and have similar sampling rates. Then the peaks and nadirs of the transformed sequences are extracted to form feature sequences with similar lengths, which can be easily mapped by the original DTW. Our algorithm then recursively projects the warping path from a lower-resolution level to a higher-resolution one by building a context-dependent boundary and enabling a constrained search for the warping path in the latter. Comprehensive experiments on two real nanopore datasets on human and on Pandoraea pnomenusa demonstrate the efficiency and effectiveness of the proposed algorithm. In particular, cwDTW can gain remarkable acceleration with a tiny loss of alignment accuracy. On the real nanopore datasets, cwDTW can finish an alignment task in a few seconds, which is about 3000 times faster than the original DTW. By successfully applying cwDTW to the tasks of signal labeling and ultra-long sequence comparison, we further demonstrate the power and applicability of cwDTW. Availability and implementation: Our program is available at https://github.com/realbigws/cwDTW. Supplementary information: Supplementary data are available at Bioinformatics online.
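
For orientation, the classical DTW that the abstract above treats as its baseline can be sketched as follows. This is a minimal textbook version, not the cwDTW code from the linked repository; the function name `dtw` and the absolute-difference cost are assumptions made for illustration. Its quadratic time and memory in the two sequence lengths is what becomes impractical for ultra-long signals.

```python
from math import inf


def dtw(a, b):
    """Classical DTW between two numeric sequences.

    Returns (total_cost, warping_path), where warping_path is a list of
    0-based index pairs (i, j) aligning a[i] with b[j].
    The local cost |a[i] - b[j]| is an illustrative assumption.
    """
    n, m = len(a), len(b)
    # D[i][j] = minimal cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            D[i][j] = d + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])

    # Backtrack from the end cell to recover one optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (D[i - 1][j], i - 1, j),          # step in a only
            (D[i][j - 1], i, j - 1),          # step in b only
        )
    path.reverse()
    return D[n][m], path


cost, path = dtw([1, 2, 3], [1, 2, 2, 3])
print(cost, path)
```

Here the second sequence repeats one value, so the warping path maps a single element of the first sequence onto two elements of the second, which is the kind of sampling-rate mismatch the abstract refers to.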

    DLBI: deep learning guided Bayesian inference for structure reconstruction of super-resolution fluorescence microscopy

    Li, Yu; Xu, Fan; Zhang, Fa; Xu, Pingyong; Zhang, Mingshu; Fan, Ming; Li, Lihua; Gao, Xin; Han, Renmin (Bioinformatics, Oxford University Press (OUP), 2018-06-27) [Article]
    Super-resolution fluorescence microscopy, with a resolution beyond the diffraction limit of light, has become an indispensable tool to directly visualize biological structures in living cells at a nanometer-scale resolution. Despite advances in high-density super-resolution fluorescent techniques, existing methods still have bottlenecks, including extremely long execution time, artificial thinning and thickening of structures, and lack of ability to capture latent structures. Here, we propose a novel deep learning guided Bayesian inference (DLBI) approach for the time-series analysis of high-density fluorescent images. Our method combines the strengths of deep learning and statistical inference, where deep learning captures the underlying distribution of the fluorophores that are consistent with the observed time-series fluorescent images by exploring local features and correlation along the time axis, and statistical inference further refines the ultrastructure extracted by deep learning and endues physical meaning to the final image. In particular, our method contains three main components. The first one is a simulator that takes a high-resolution image as the input and simulates time-series low-resolution fluorescent images based on experimentally calibrated parameters, which provides supervised training data to the deep learning model. The second one is a multi-scale deep learning module to capture both spatial information in each input low-resolution image and temporal information among the time-series images. The third one is a Bayesian inference module that takes the image from the deep learning module as the initial localization of fluorophores and removes artifacts by statistical inference. Comprehensive experimental results on both real and simulated datasets demonstrate that our method provides more accurate and realistic local patch and large-field reconstruction than the state-of-the-art method, the 3B analysis, while our method is more than two orders of magnitude faster. The main program is available at https://github.com/lykaust15/DLBI. Supplementary data are available at Bioinformatics online.

    WaveNano: a signal-level nanopore base-caller via simultaneous prediction of nucleotide labels and move labels through bi-directional WaveNets

    Wang, Sheng; Li, Zhen; Yu, Yizhou; Gao, Xin (Quantitative Biology, Springer Nature, 2018-11-24) [Article]
    Background: The Oxford MinION nanopore sequencer is the recently appealing third-generation genome sequencing device that is portable and no larger than a cellphone. Despite the benefits of MinION to sequence ultra-long reads in real-time, the high error rate of the existing base-calling methods, especially indels (insertions and deletions), prevents its use in a variety of applications. Methods: In this paper, we show that such indel errors are largely due to the segmentation process on the input electrical current signal from MinION. All existing methods conduct segmentation and nucleotide label prediction in a sequential manner, in which the errors accumulated in the first step will irreversibly influence the final base-calling. We further show that the indel issue can be significantly reduced via accurate labeling of nucleotide and move labels directly from the raw signal, which can then be efficiently learned by a bi-directional WaveNet model simultaneously through feature sharing. Our bi-directional WaveNet model with residual blocks and skip connections is able to capture the extremely long dependency in the raw signal. Taking the predicted move as the segmentation guidance, we employ the Viterbi decoding to obtain the final base-calling results from the smoothed nucleotide probability matrix. Results: Our proposed base-caller, WaveNano, achieves good performance on real MinION sequencing data from Lambda phage. Conclusions: The signal-level nanopore base-caller WaveNano can obtain higher base-calling accuracy, and generate fewer insertions/deletions in the base-called sequences.

    DeepSimulator: a deep simulator for Nanopore sequencing

    Li, Yu; Han, Renmin; Bi, Chongwei; Li, Mo; Wang, Sheng; Gao, Xin (Bioinformatics, Cold Spring Harbor Laboratory, 2018-04-06) [Article]
    Oxford Nanopore sequencing is a rapidly developed sequencing technology in recent years. To keep pace with the explosion of the downstream data analytical tools, a versatile Nanopore sequencing simulator is needed to complement the experimental data as well as to benchmark those newly developed tools. However, all the currently available simulators are based on simple statistics of the produced reads, which have difficulty in capturing the complex nature of the Nanopore sequencing procedure, the main task of which is the generation of raw electrical current signals. Here we propose a deep learning based simulator, DeepSimulator, to mimic the entire pipeline of Nanopore sequencing. Starting from a given reference genome or assembled contigs, we simulate the electrical current signals by a context-dependent deep learning model, followed by a base-calling procedure to yield simulated reads. This workflow mimics the sequencing procedure more naturally. The thorough experiments performed across four species show that the signals generated by our context-dependent model are more similar to the experimentally obtained signals than the ones generated by the official context-independent pore model. In terms of the simulated reads, we provide a parameter interface to users so that they can obtain the reads with different accuracies ranging from 83% to 97%. The reads generated by the default parameter have almost the same properties as the real data. Two case studies demonstrate the application of DeepSimulator to benefit the development of tools in de novo assembly and in low coverage SNP detection. The software can be accessed freely at: https://github.com/lykaust15/DeepSimulator.

    Novel algorithms for efficient subsequence searching and mapping in nanopore raw signals towards targeted sequencing.

    Han, Renmin; Wang, Sheng; Gao, Xin (Bioinformatics (Oxford, England), Oxford University Press (OUP), 2019-10-08) [Article]
    Motivation: Genome diagnostics have gradually become a prevailing routine for human healthcare. With the advances in understanding the causal genes for many human diseases, targeted sequencing provides a rapid, cost-efficient and focused option for clinical applications, such as SNP detection and haplotype classification, in a specific genomic region. Although nanopore sequencing offers a perfect tool for targeted sequencing because of its mobility, PCR-freeness, and long read properties, it poses a challenging computational problem of how to efficiently and accurately search and map genomic subsequences of interest in a pool of nanopore reads (or raw signals). Due to its relatively low sequencing accuracy, there is no reliable solution to this problem, especially at low sequencing coverage. Results: Here, we propose a brand new signal-based subsequence inquiry pipeline as well as two novel algorithms to tackle this problem. The proposed algorithms follow the principle of subsequence dynamic time warping and directly operate on the electrical current signals, without loss of information in base-calling. Therefore, the proposed algorithms can serve as a tool for sequence inquiry in targeted sequencing. Two novel criteria are offered for the consequent signal quality analysis and data classification. Comprehensive experiments on real-world nanopore datasets show the efficiency and effectiveness of the proposed algorithms. We further demonstrate the potential applications of the proposed algorithms in two typical tasks in nanopore-based targeted sequencing: SNP detection under low sequencing coverage, and haplotype classification under low sequencing accuracy. Availability: The project is accessible at https://github.com/icthrm/cwSDTWnano.git, and the presented bench data is available upon request.
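
The principle of subsequence dynamic time warping mentioned in the abstract above can be illustrated with a minimal sketch. This is the generic textbook formulation, not either of the two algorithms proposed in the paper; the function name `subsequence_dtw` and the absolute-difference cost are assumptions. The difference from ordinary DTW is that the query may start and end anywhere in the longer signal at no extra cost.

```python
from math import inf


def subsequence_dtw(query, signal):
    """Locate the best-matching region of `signal` for `query`.

    Returns (cost, start, end) with 0-based inclusive indices into `signal`.
    The local cost |q - s| is an illustrative assumption.
    """
    n, m = len(query), len(signal)
    # Row 0 is all zeros: the match may begin at any position in `signal`.
    D = [[0.0] * (m + 1)] + [[inf] * (m + 1) for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - signal[j - 1])
            D[i][j] = d + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])

    # The match may end anywhere: take the cheapest cell in the last row.
    end = min(range(1, m + 1), key=lambda j: D[n][j])

    # Backtrack until the path leaves the first query element.
    i, j = n, end
    start = end
    while i > 0:
        start = j
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        )
    return D[n][end], start - 1, end - 1


print(subsequence_dtw([5, 6, 7], [0, 0, 5, 6, 6, 7, 0]))
```

In this toy input the query is found inside the signal at positions 2 through 5 with zero cost, even though one query value is stretched over two signal samples.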

    OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction

    Smaili, Fatima Z.; Gao, Xin; Hoehndorf, Robert (Bioinformatics, Oxford University Press (OUP), 2018-11-08) [Article]
    Motivation: Ontologies are widely used in biology for data annotation, integration, and analysis. In addition to formally structured axioms, ontologies contain meta-data in the form of annotation axioms which provide valuable pieces of information that characterize ontology classes. Annotation axioms commonly used in ontologies include class labels, descriptions, or synonyms. Despite being a rich source of semantic information, the ontology meta-data are generally unexploited by ontology-based analysis methods. Results: We propose a novel method, OPA2Vec, to generate vector representations of biological entities in ontologies by combining formal ontology axioms and annotation axioms from the ontology meta-data. We apply a Word2Vec model that has been pre-trained on either a corpus of abstracts or full-text articles to produce feature vectors from our collected data. We validate our method in two different ways: first, we use the obtained vector representations of proteins in a similarity measure to predict protein-protein interaction on two different datasets. Second, we evaluate our method on predicting gene-disease associations based on phenotype similarity by generating vector representations of genes and diseases using a phenotype ontology, and applying the obtained vectors to predict gene-disease associations using mouse model phenotypes. We demonstrate that OPA2Vec significantly outperforms existing methods for predicting gene-disease associations. Using evidence from mouse models, we apply OPA2Vec to identify candidate genes for several thousand rare and orphan diseases. OPA2Vec can be used to produce vector representations of any biomedical entity given any type of biomedical ontology. Availability: https://github.com/bio-ontology-research-group/opa2vec.
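
The abstract above describes using vector representations in a similarity measure to rank candidate associations but does not name the measure here. The sketch below assumes cosine similarity, a common choice for embedding vectors; the entity names and vector values are made up for illustration and do not come from OPA2Vec.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, candidates):
    """Return candidate names sorted from most to least similar to query_vec."""
    return sorted(
        candidates,
        key=lambda name: cosine_similarity(query_vec, candidates[name]),
        reverse=True,
    )


# Hypothetical embeddings: one disease vector and three gene vectors.
disease = [1.0, 0.0, 1.0]
genes = {
    "geneA": [0.9, 0.1, 1.1],
    "geneB": [0.0, 1.0, 0.0],
    "geneC": [1.0, 1.0, 0.0],
}
print(rank_by_similarity(disease, genes))
```

Ranking candidates by similarity to a query vector is the general shape of similarity-based prediction; how the vectors are produced and evaluated is specific to the paper and not reproduced here.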