Recent Submissions

  • A Method for 3D Reconstruction and Virtual Reality Analysis of Glial and Neuronal Cells.

    Cali, Corrado; Kare, Kalpana; Agus, Marco; Veloz Castillo, Maria Fernanda; Boges, Daniya; Hadwiger, Markus; Magistretti, Pierre J. (Journal of visualized experiments : JoVE, MyJove Corporation, 2019-10-15) [Article]
    Serial sectioning and subsequent high-resolution imaging of biological tissue using electron microscopy (EM) allow for the segmentation and reconstruction of high-resolution imaged stacks to reveal ultrastructural patterns that could not be resolved using 2D images. Indeed, the latter might lead to a misinterpretation of morphologies, like in the case of mitochondria; the use of 3D models is, therefore, more and more common and applied to the formulation of morphology-based functional hypotheses. To date, the use of 3D models generated from light or electron image stacks makes qualitative, visual assessments, as well as quantification, more convenient to be performed directly in 3D. As these models are often extremely complex, a virtual reality environment is also important to be set up to overcome occlusion and to take full advantage of the 3D structure. Here, a step-by-step guide from image segmentation to reconstruction and analysis is described in detail.
  • An explicit marching-on-in-time scheme for solving the time domain Kirchhoff integral equation.

    Chen, Rui; Sayed, Sadeed B; Al-Harthi, Noha A.; Keyes, David E.; Bagci, Hakan (The Journal of the Acoustical Society of America, Acoustical Society of America (ASA), 2019-10-09) [Article]
    A fully explicit marching-on-in-time (MOT) scheme for solving the time domain Kirchhoff (surface) integral equation to analyze transient acoustic scattering from rigid objects is presented. A higher-order Nyström method and a PE(CE)m-type ordinary differential equation integrator are used for spatial discretization and time marching, respectively. The resulting MOT scheme uses the same time step size as its implicit counterpart (which also uses the Nyström method in space) without sacrificing the accuracy and stability of the solution. Numerical results demonstrate the accuracy, efficiency, and applicability of the proposed explicit MOT solver.
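    As a rough illustration of what a PE(CE)m-type integrator does (predict, evaluate, then correct and evaluate m times), the sketch below applies one to a scalar test equation. The explicit Euler predictor, the trapezoidal corrector, the name `pece_m` and the test problem y' = -y are all assumptions made for illustration; the abstract does not state the coefficients or the system integrated in the paper.

```python
import math


def pece_m(f, t0, y0, h, steps, m=2):
    """Integrate y' = f(t, y) with a predictor-corrector scheme.

    Illustrative choices (not taken from the paper):
    predictor = explicit Euler, corrector = trapezoidal rule,
    applied m times per step without solving any implicit system.
    """
    t, y = t0, y0
    for _ in range(steps):
        f_now = f(t, y)                    # E: evaluate at the current point
        y_next = y + h * f_now             # P: explicit Euler prediction
        for _ in range(m):
            f_next = f(t + h, y_next)      # E: evaluate at the predicted point
            y_next = y + 0.5 * h * (f_now + f_next)  # C: trapezoidal correction
        t, y = t + h, y_next
    return y


if __name__ == "__main__":
    approx = pece_m(lambda t, y: -y, 0.0, 1.0, h=0.01, steps=100, m=2)
    print(approx, math.exp(-1.0))
```

    Because the corrector is applied a fixed number of times instead of being iterated to convergence, every step stays fully explicit, which is the property the abstract highlights.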
  • Novel algorithms for efficient subsequence searching and mapping in nanopore raw signals towards targeted sequencing.

    Han, Renmin; Wang, Sheng; Gao, Xin (Bioinformatics (Oxford, England), Oxford University Press (OUP), 2019-10-09) [Article]
    MOTIVATION: Genome diagnostics have gradually become a prevailing routine for human healthcare. With the advances in understanding the causal genes for many human diseases, targeted sequencing provides a rapid, cost-efficient and focused option for clinical applications, such as SNP detection and haplotype classification, in a specific genomic region. Although nanopore sequencing offers a perfect tool for targeted sequencing because of its mobility, PCR-freeness, and long read properties, it poses a challenging computational problem of how to efficiently and accurately search and map genomic subsequences of interest in a pool of nanopore reads (or raw signals). Due to its relatively low sequencing accuracy, there is no reliable solution to this problem, especially at low sequencing coverage. RESULTS: Here, we propose a brand new signal-based subsequence inquiry pipeline as well as two novel algorithms to tackle this problem. The proposed algorithms follow the principle of subsequence dynamic time warping and directly operate on the electrical current signals, without loss of information in base-calling. Therefore, the proposed algorithms can serve as a tool for sequence inquiry in targeted sequencing. Two novel criteria are offered for the consequent signal quality analysis and data classification. Comprehensive experiments on real-world nanopore datasets show the efficiency and effectiveness of the proposed algorithms. We further demonstrate the potential applications of the proposed algorithms in two typical tasks in nanopore-based targeted sequencing: SNP detection under low sequencing coverage, and haplotype classification under low sequencing accuracy. AVAILABILITY: The project is accessible at https://github.com/icthrm/cwSDTWnano.git, and the presented bench data is available upon request.
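    The principle of subsequence dynamic time warping mentioned above can be sketched as follows. This is the plain textbook formulation on toy numeric sequences, not the two algorithms proposed in the paper; the function name `subsequence_dtw` and the absolute-difference cost are assumptions made for illustration.

```python
def subsequence_dtw(query, signal):
    """Find where `query` best matches inside the longer `signal`.

    Returns (cost, start, end): the accumulated warping cost and the
    inclusive index range of the matched region in `signal`.
    Unlike global DTW, the match may start and end anywhere in `signal`.
    """
    n, m = len(query), len(signal)
    inf = float("inf")
    # cost[i][j]: best cost of aligning query[0..i] so that it ends at signal[j]
    cost = [[inf] * m for _ in range(n)]
    # start[i][j]: index in `signal` where that best alignment begins
    start = [[0] * m for _ in range(n)]

    for j in range(m):
        # Free start: the first query sample may align to any signal sample.
        cost[0][j] = abs(query[0] - signal[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            best_cost, best_start = cost[i - 1][j], start[i - 1][j]
            if j > 0:
                for c, s in ((cost[i][j - 1], start[i][j - 1]),
                             (cost[i - 1][j - 1], start[i - 1][j - 1])):
                    if c < best_cost:
                        best_cost, best_start = c, s
            cost[i][j] = best_cost + abs(query[i] - signal[j])
            start[i][j] = best_start

    # Free end: pick the cheapest end position in the last row.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end


if __name__ == "__main__":
    print(subsequence_dtw([1, 2, 3], [0, 0, 1, 2, 3, 0, 0]))
```

    The quadratic table makes this sketch unsuitable for long raw signals; it is only meant to show the free-start, free-end idea that distinguishes subsequence DTW from global DTW.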
  • Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs.

    Alam, Tanvir; Alazmi, Meshari; Naser, Rayan Mohammad Mahmoud; Huser, Franceline; Momin, Afaque Ahmad Imtiyaz; Astro, Veronica; Hong, Seungbeom; Walkiewicz, Katarzyna Wiktoria; Canlas, Christian G; Huser, Raphaël; Ali, Amal J.; Merzaban, Jasmeen; Adamo, Antonio; Jaremko, Mariusz; Jaremko, Lukasz; Bajic, Vladimir B.; Gao, Xin; Arold, Stefan T. (Bioinformatics (Oxford, England), Oxford University Press (OUP), 2019-10-05) [Article]
    MOTIVATION: Leucine-aspartic acid (LD) motifs are short linear interaction motifs (SLiMs) that link paxillin family proteins to factors controlling cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family is poorly understood. RESULTS: To enable a proteome-wide assessment of LD motifs, we developed an active-learning based framework (LDmotif finder; LDMF) that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome revealed a dozen new proteins containing LD motifs. We found that LD motif signalling evolved in unicellular eukaryotes more than 800 Myr ago, with paxillin and vinculin as core constituents, and nuclear export signal (NES) as a likely source of de novo LD motifs. We show that LD motif proteins form a functionally homogenous group, all being involved in cell morphogenesis and adhesion. This functional focus is recapitulated in cells by GFP-fused LD motifs, suggesting that it is intrinsic to the LD motif sequence, possibly through their effect on binding partners. Our approach elucidated the origin and dynamic adaptations of an ancestral SLiM, and can serve as a guide for the identification of other SLiMs for which only few representatives are known. AVAILABILITY: LDMF is freely available online at www.cbrc.kaust.edu.sa/ldmf; source code is available at https://github.com/tanviralambd/LD/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  • CDPath: Cooperative driver pathways discovery using integer linear programming and Markov clustering

    Yang, Ziying; Yu, Guoxian; Guo, Maozu; Yu, Jiantao; Zhang, Xiangliang; Wang, Jun (IEEE/ACM Transactions on Computational Biology and Bioinformatics, Institute of Electrical and Electronics Engineers (IEEE), 2019-10-01) [Article]
    Discovering driver pathways is an essential task to understand the pathogenesis of cancer and to design precise treatments for cancer patients. Increasing evidence indicates that multiple pathways often function cooperatively in carcinogenesis. In this study, we propose an approach called CDPath to discover cooperative driver pathways. CDPath first uses Integer Linear Programming to explore driver core modules from mutation profiles by enforcing co-occurrence and functional interaction relations between modules, and by maximizing the mutual exclusivity and coverage within modules. Next, to enforce cooperation of pathways and help the follow-up exact cooperative driver pathways discovery, it performs Markov clustering on a pathway-pathway interaction network to cluster pathways. After that, it identifies pathways in different modules but in the same clusters as cooperative driver pathways. We apply CDPath on two TCGA datasets: breast cancer (BRCA) and endometrial cancer (UCEC). The results show that CDPath can identify known (e.g., TP53) and potential driver genes (e.g., SPTBN2). In addition, the identified cooperative driver pathways are related to the target cancer, and they are involved in carcinogenesis and several key biological processes. CDPath can uncover more potential biological associations between pathways (over 100%) and more cooperative driver pathways (over 200%) than competitive approaches.
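    The Markov clustering step named above can be sketched generically as alternating expansion and inflation on a column-stochastic matrix. This is a minimal version of the standard algorithm under assumed parameters (`inflation=2.0`, `expansion=2`); the function name, the toy graph and the thresholds are hypothetical and do not come from CDPath.

```python
import numpy as np


def markov_clustering(adjacency, inflation=2.0, expansion=2,
                      iterations=50, tol=1e-8):
    """Cluster an undirected graph given as a square adjacency matrix.

    Returns a sorted list of clusters, each a tuple of node indices.
    """
    # Add self-loops, then normalise every column to sum to 1.
    m = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))
    m = m / m.sum(axis=0, keepdims=True)

    for _ in range(iterations):
        previous = m
        m = np.linalg.matrix_power(m, expansion)   # expansion: spread flow
        m = m ** inflation                          # inflation: favour strong flow
        m = m / m.sum(axis=0, keepdims=True)        # renormalise columns
        if np.allclose(m, previous, atol=tol):
            break

    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = tuple(int(i) for i in np.flatnonzero(row > 1e-6))
        if members:
            clusters.add(members)
    return sorted(clusters)


if __name__ == "__main__":
    # Toy graph (hypothetical): two disconnected triangles.
    graph = np.zeros((6, 6))
    for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
        graph[a, b] = graph[b, a] = 1.0
    print(markov_clustering(graph))
```

    On real interaction networks the inflation parameter controls cluster granularity and usually needs tuning; larger values tend to give smaller clusters.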
  • Computer-aided drug repurposing for cancer therapy: Approaches and opportunities to challenge anticancer targets.

    Mottini, Carla; Napolitano, Francesco; Li, Zhongxiao; Gao, Xin; Cardone, Luca (Seminars in cancer biology, Elsevier BV, 2019-09-29) [Article]
    Despite huge efforts made in academic and pharmaceutical worldwide research, current anticancer therapies achieve effective treatment in a limited number of neoplasia cases only. Oncology terms such as big killers - to identify tumours with yet a high mortality rate - or undruggable cancer targets, and chemoresistance, represent the current therapeutic debacle of cancer treatments. In addition, metastases, tumour microenvironments, tumour heterogeneity, metabolic adaptations, and immunotherapy resistance are essential features controlling tumour response to therapies, but still lack effective therapeutics or modulators. In this scenario, where the pharmaceutical productivity and drug efficacy in oncology seem to have reached a plateau, the so-called drug repurposing - i.e. the use of old drugs, already in clinical use, for a different therapeutic indication - is an appealing strategy to improve cancer therapy. Opportunities for drug repurposing are often based on occasional observations or on time-consuming pre-clinical drug screenings that are often not hypothesis-driven. In contrast, in-silico drug repurposing is an emerging, hypothesis-driven approach that takes advantage of the use of big-data. Indeed, the extensive use of -omics technologies, improved data storage, data meaning, machine learning algorithms, and computational modeling all offer unprecedented knowledge of the biological mechanisms of cancers and drugs' modes of action, providing extensive availability for both disease-related data and drugs-related data. This offers the opportunity to generate, with time- and cost-effective approaches, computational drug networks to predict, in-silico, the efficacy of approved drugs against relevant cancer targets, as well as to select better responder patients or disease biomarkers.
Here, we will review selected disease-related data together with computational tools to be exploited for the in-silico repurposing of drugs against validated targets in cancer therapies, focusing on the oncogenic signaling pathways activation in cancer. We will discuss how in-silico drug repurposing has the promise to shortly improve our arsenal of anticancer drugs and, likely, overcome certain limitations of modern cancer therapies against old and new therapeutic targets in oncology.
  • 3D cellular reconstruction of cortical glia and parenchymal morphometric analysis from Serial Block-Face Electron Microscopy of juvenile rat.

    Cali, Corrado; Agus, Marco; Kare, Kalpana; Boges, Daniya J; Lehväslaiho, Heikki; Hadwiger, Markus; Magistretti, Pierre J. (Progress in neurobiology, Elsevier BV, 2019-09-25) [Article]
    With the rapid evolution in the automation of serial electron microscopy in life sciences, the acquisition of terabyte-sized datasets is becoming increasingly common. High resolution serial block-face imaging (SBEM) of biological tissues offers the opportunity to segment and reconstruct nanoscale structures to reveal spatial features previously inaccessible with simple, single section, two-dimensional images, with a particular focus on glial cells, whose reconstruction efforts in literature are still limited, compared to neurons. Here, we imaged a 750000 cubic micron volume of the somatosensory cortex from a juvenile P14 rat, with 20 nm accuracy. We recognized a total of 186 cells using their nuclei, and classified them as neuronal or glial based on features of the soma and the processes. We reconstructed for the first time 4 almost complete astrocytes and neurons, 4 complete microglia and 4 complete pericytes, including their intracellular mitochondria, 186 nuclei and 213 myelinated axons. We then performed quantitative analysis on the three-dimensional models. Out of the data that we generated, we observed that neurons have larger nuclei, which correlated with their lesser density, and that astrocytes and pericytes have a higher surface to volume ratio, compared to other cell types. All reconstructed morphologies represent an important resource for computational neuroscientists, as morphological quantitative information can be inferred, to tune simulations that take into account the spatial compartmentalization of the different cell types.
  • Cross-Species Protein Function Prediction with Asynchronous-Random Walk

    Zhao, Yingwen; Wang, Jun; Guo, Maozu; Zhang, Xiangliang; Yu, Guoxian (IEEE/ACM Transactions on Computational Biology and Bioinformatics, Institute of Electrical and Electronics Engineers (IEEE), 2019-09-24) [Article]
    Protein function prediction is a fundamental task in the post-genomic era. Available functional annotations of proteins are incomplete and the annotations of two homologous species are complementary to each other. However, how to effectively leverage mutually complementary annotations of different species to further boost the prediction performance is still not well studied. In this paper, we propose a cross-species protein function prediction approach by performing Asynchronous Random Walk on a heterogeneous network (AsyRW). AsyRW first constructs a heterogeneous network to integrate multiple functional association networks derived from different biological data, established homology-relationships between proteins from different species, known annotations of proteins and Gene Ontology (GO). To account for the intrinsic structures of intra- and inter-species of proteins and that of GO, AsyRW quantifies the individual walk lengths of each network node using the gravity-like theory and performs asynchronous-random walk with the individual length to predict associations between proteins and GO terms. Experiments on annotations archived in different years show that individual walk length and asynchronous-random walk can effectively leverage the complementary annotations of different species, and that AsyRW performs significantly better than other related and competitive methods. The codes of AsyRW are available at: http://mlda.swu.edu.cn/codes.php?name=AsyRW.
  • Quantitative Phase and Intensity Microscopy Using Snapshot White Light Wavefront Sensing

    Wang, Congli; Fu, Qiang; Dun, Xiong; Heidrich, Wolfgang (Scientific Reports, Springer Science and Business Media LLC, 2019-09-24) [Article]
    Phase imaging techniques are an invaluable tool in microscopy for quickly examining thin transparent specimens. Existing methods are limited to either simple and inexpensive methods that produce only qualitative phase information (e.g. phase contrast microscopy, DIC), or significantly more elaborate and expensive quantitative methods. Here we demonstrate a low-cost, easy to implement microscopy setup for quantitative imaging of phase and bright field amplitude using collimated white light illumination.
  • Ontology based mining of pathogen–disease associations from literature

    Kafkas, Senay; Hoehndorf, Robert (Journal of Biomedical Semantics, Springer Science and Business Media LLC, 2019-09-18) [Article]
    Background: Infectious diseases claim millions of lives each year, especially in developing countries. Accurate and rapid identification of causative pathogens plays a key role in the success of treatment. To support research on infectious diseases and mechanisms of infection, there is a need for an open resource on pathogen–disease associations that can be utilized in computational studies. A large number of pathogen–disease associations is available from the literature in unstructured form and we need automated methods to extract the data. Results: We developed a text mining system designed for extracting pathogen–disease relations from literature. Our approach utilizes background knowledge from an ontology and statistical methods for extracting associations between pathogens and diseases. In total, we extracted 3420 pathogen–disease associations from literature. We integrated our literature-derived associations into a database which links pathogens to their phenotypes for supporting infectious disease research. Conclusions: To the best of our knowledge, we present the first study focusing on extracting pathogen–disease associations from publications. We believe the text mined data can be utilized as a valuable resource for infectious disease research. All the data is publicly available from https://github.com/bio-ontology-research-group/padimi and through a public SPARQL endpoint from http://patho.phenomebrowser.net/.
  • A Lagrangian Method for Extracting Eddy Boundaries in the Red Sea and the Gulf of Aden

    Friederici, Anke; Mahamadou Kele, Habib Toye; Hoteit, Ibrahim; Weinkauf, Tino; Theisel, Holger; Hadwiger, Markus (IEEE, 2019-09-05) [Conference Paper]
    Mesoscale ocean eddies play a major role for both the intermixing of water and the transport of biological mass. This makes the identification and tracking of their shape, location and deformation over time highly important for a number of applications. While eddies maintain a roughly circular shape in the free ocean, the narrow basins of the Red Sea and Gulf of Aden lead to the formation of irregular eddy shapes that existing methods struggle to identify. We propose the following model: Inside an eddy, particles rotate around a common core and thereby remain at a constant distance under a certain parametrization. The transition to the more unpredictable flow on the outside can thus be identified as the eddy boundary. We apply this algorithm on a realistic simulation of the Red Sea circulation, where we are able to identify the shape of irregular eddies robustly and more coherently than previous methods. We visualize the eddies as tubes in space-time to enable the analysis of their movement and deformation over several weeks.
  • Integration of dynamic contrast-enhanced magnetic resonance imaging and T2-weighted imaging radiomic features by a canonical correlation analysis-based feature fusion method to predict histological grade in ductal breast carcinoma.

    Fan, Ming; Liu, Zuhui; Xie, Sudan; Xu, Maosheng; Wang, Shiwei; Gao, Xin; Li, Lihua (Physics in medicine and biology, IOP Publishing, 2019-08-31) [Article]
    Tumour histological grade has prognostic implications in breast cancer. Tumour features in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) and T2-weighted (T2W) imaging can provide related and complementary information in the analysis of breast lesions to improve MRI-based histological status prediction in breast cancer. A dataset of 167 patients with invasive ductal carcinoma (IDC) was assembled, consisting of 72 low/intermediate-grade and 95 high-grade cases with preoperative DCE-MRI and T2W images. The data cohort was separated into development (n=111) and validation (n=56) cohorts. Each tumour was segmented in the precontrast and the intermediate and last postcontrast DCE-MR images and was mapped to the tumour in the T2W images. Radiomic features, including texture, morphology, and histogram distribution features in the tumour image, were extracted for those image series. Features from the DCE-MR and T2W images were fused by a canonical correlation analysis (CCA)-based method. The support vector machine (SVM) classifiers were trained and tested on the development and validation cohorts, respectively. SVM-based recursive feature elimination (SVM-RFE) was adopted to identify the optimal features for prediction. The areas under the ROC curves (AUCs) for the T2W images and the DCE-MRI series of precontrast, intermediate and last postcontrast images were 0.750±0.047, 0.749±0.047, and 0.788±0.045, respectively, for the development cohort and 0.715±0.068, 0.704±0.073, and 0.744±0.067, respectively, for the validation cohort. After the CCA-based fusion of features from the DCE-MRI series and T2W images, the AUCs increased to 0.751±0.065, 0.803±0.060 and 0.794±0.060 in the validation cohort. Moreover, the method of fusing features between DCE-MRI and T2W images using CCA achieved better performance than the concatenation-based feature fusion or classifier fusion methods.
Our results demonstrated that anatomical and functional MR images yield complementary information, and feature fusion of radiomic features by matrix transformation to optimize their correlations produced a classifier with improved performance for predicting the histological grade of IDC.
  • Fine-grained alignment of cryo-electron subtomograms based on MPI parallel optimization.

    Lü, Yongchun; Zeng, Xiangrui; Zhao, Xiaofang; Li, Shirui; Li, Hua; Gao, Xin; Xu, Min (BMC bioinformatics, Springer Science and Business Media LLC, 2019-08-29) [Article]
    Background: Cryo-electron tomography (Cryo-ET) is an imaging technique used to generate three-dimensional structures of cellular macromolecule complexes in their native environment. Due to developing cryo-electron microscopy technology, the image quality of three-dimensional reconstruction of cryo-electron tomography has greatly improved. However, cryo-ET images are characterized by low resolution, partial data loss and low signal-to-noise ratio (SNR). In order to tackle these challenges and improve resolution, a large number of subtomograms containing the same structure needs to be aligned and averaged. Existing methods for refining and aligning subtomograms are still highly time-consuming, requiring many computationally intensive processing steps (i.e. the rotations and translations of subtomograms in three-dimensional space). Results: In this article, we propose a Stochastic Average Gradient (SAG) fine-grained alignment method for optimizing the sum of dissimilarity measure in real space. We introduce a Message Passing Interface (MPI) parallel programming model in order to explore further speedup. Conclusions: We compare our stochastic average gradient fine-grained alignment algorithm with two baseline methods, high-precision alignment and fast alignment. Our SAG fine-grained alignment algorithm is much faster than the two baseline methods. Results on simulated data of GroEL from the Protein Data Bank (PDB ID: 1KP8) showed that our parallel SAG-based fine-grained alignment method could achieve close-to-optimal rigid transformations with higher precision than both high-precision alignment and fast alignment at a low SNR (SNR=0.003) with tilt angle range ±60° or ±40°. For the experimental subtomograms data structures of GroEL and GroEL/GroES complexes, our parallel SAG-based fine-grained alignment can achieve higher precision and fewer iterations to converge than the two baseline methods.
  • ScaleTrotter: Illustrative Visual Travels Across Negative Scales

    Halladjian, Sarkis; Miao, Haichao; Kouril, David; Groller, M. Eduard; Viola, Ivan; Isenberg, Tobias (IEEE Transactions on Visualization and Computer Graphics, Institute of Electrical and Electronics Engineers (IEEE), 2019-08-22) [Article]
    We present ScaleTrotter, a conceptual framework for an interactive, multi-scale visualization of biological mesoscale data and, specifically, genome data. ScaleTrotter allows viewers to smoothly transition from the nucleus of a cell to the atomistic composition of the DNA, while bridging several orders of magnitude in scale. The challenges in creating an interactive visualization of genome data are fundamentally different in several ways from those in other domains like astronomy that require a multi-scale representation as well. First, genome data has intertwined scale levels: the DNA is an extremely long, connected molecule that manifests itself at all scale levels. Second, elements of the DNA do not disappear as one zooms out; instead, the scale levels at which they are observed group these elements differently. Third, we have detailed information and thus geometry for the entire dataset and for all scale levels, posing a challenge for interactive visual exploration. Finally, the conceptual scale levels for genome data are close in scale space, requiring us to find ways to visually embed a smaller scale into a coarser one. We address these challenges by creating a new multi-scale visualization concept. We use a scale-dependent camera model that controls the visual embedding of the scales into their respective parents, the rendering of a subset of the scale hierarchy, and the location, size, and scope of the view. In traversing the scales, ScaleTrotter is roaming between 2D and 3D visual representations that are depicted in integrated visuals. We discuss, specifically, how this form of multi-scale visualization follows from the specific characteristics of the genome data and describe its implementation. Finally, we discuss the implications of our work to the general illustrative depiction of multi-scale data.
  • Multi-Scale Procedural Animations of Microtubule Dynamics Based on Measured Data

    Klein, Tobias; Viola, Ivan; Groller, Eduard; Mindek, Peter (IEEE Transactions on Visualization and Computer Graphics, Institute of Electrical and Electronics Engineers (IEEE), 2019-08-22) [Article]
    Biologists often use computer graphics to visualize structures that, due to physical limitations, cannot be imaged with a microscope. One example of such structures is microtubules, which are present in every eukaryotic cell. They are part of the cytoskeleton, maintaining the shape of the cell and playing a key role in cell division. In this paper, we propose a scientifically accurate multi-scale procedural model of microtubule dynamics as a novel application scenario for procedural animation, which can generate visualizations of their overall shape, molecular structure, as well as animations of the dynamic behaviour of their growth and disassembly. The model spans from tens of micrometers down to atomic resolution. All the aspects of the model are driven by scientific data. The advantage over a traditional, manual animation approach is that when the underlying data change, for instance due to new evidence, the model can be recreated immediately. The procedural animation concept is presented in its generic form, with several novel extensions, facilitating an easy translation to other domains with emergent multi-scale behavior.
  • Cracking open the black box: What observations can tell us about reinforcement learning agents

    Dethise, Arnaud; Canini, Marco; Kandula, Srikanth (ACM Press, 2019-08-14) [Conference Paper]
    Machine learning (ML) solutions to challenging networking problems, while promising, are hard to interpret; the uncertainty about how they would behave in untested scenarios has hindered adoption. Using a case study of an ML-based video rate adaptation model, we show that carefully applying interpretability tools and systematically exploring the model inputs can identify unwanted or anomalous behaviors of the model, hinting at a potential path towards increasing trust in ML-based solutions.
  • Structured Regularization of Functional Map Computations

    Ren, Jing; Panine, Mikhail; Wonka, Peter; Ovsjanikov, Maks (Computer Graphics Forum, Wiley, 2019-08-12) [Article]
    We consider the problem of non-rigid shape matching using the functional map framework. Specifically, we analyze a commonly used approach for regularizing functional maps, which consists in penalizing the failure of the unknown map to commute with the Laplace-Beltrami operators on the source and target shapes. We show that this approach has certain undesirable fundamental theoretical limitations, and can be undefined even for trivial maps in the smooth setting. Instead we propose a novel, theoretically well-justified approach for regularizing functional maps, by using the notion of the resolvent of the Laplacian operator. In addition, we provide a natural one-parameter family of regularizers, that can be easily tuned depending on the expected approximate isometry of the input shape pair. We show on a wide range of shape correspondence scenarios that our novel regularization leads to an improvement in the quality of the estimated functional, and ultimately pointwise correspondences before and after commonly-used refinement techniques.
  • DeepGOPlus: Improved protein function prediction from sequence.

    Kulmanov, Maxat; Hoehndorf, Robert (Bioinformatics (Oxford, England), Oxford University Press (OUP), 2019-07-28) [Article]
    MOTIVATION: Protein function prediction is one of the major tasks of bioinformatics that can help in a wide range of biological problems such as understanding disease mechanisms or finding drug targets. Many methods are available for predicting protein functions from sequence-based features, protein-protein interaction networks, protein structure or literature. However, other than sequence, most of the features are difficult to obtain or not available for many proteins, thereby limiting their scope. Furthermore, the performance of sequence-based function prediction methods is often lower than methods that incorporate multiple features, and predicting protein functions may require a lot of time. RESULTS: We developed a novel method for predicting protein functions from sequence alone which combines a deep convolutional neural network (CNN) model with sequence similarity based predictions. Our CNN model scans the sequence for motifs which are predictive for protein functions and combines this with functions of similar proteins (if available). We evaluate the performance of DeepGOPlus using the CAFA3 evaluation measures and achieve an Fmax of 0.390, 0.557 and 0.614 for BPO, MFO and CCO evaluations, respectively. These results would have made DeepGOPlus one of the three best predictors in CCO and the second best performing method in the BPO and MFO evaluations. We also compare DeepGOPlus with state-of-the-art methods such as DeepText2GO and GOLabeler on another dataset. DeepGOPlus can annotate around 40 protein sequences per second on common hardware, thereby making fast and accurate function predictions available for a wide range of proteins. AVAILABILITY: http://deepgoplus.bio2vec.net/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
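    For readers unfamiliar with the Fmax measure quoted above, the sketch below computes a simplified protein-centric version: the maximum, over score thresholds, of the harmonic mean of averaged precision and recall. It assumes every protein has at least one true term and skips the ontology-based propagation of annotations used in the actual CAFA evaluation; the function name `fmax`, the threshold grid and the toy identifiers are hypothetical.

```python
def fmax(true_terms, predicted_scores, thresholds=None):
    """Simplified protein-centric Fmax.

    true_terms:       {protein: set of true term ids} (each set non-empty)
    predicted_scores: {protein: {term id: score in [0, 1]}}
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]

    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, truth in true_terms.items():
            scores = predicted_scores.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            # Precision is averaged only over proteins with a prediction at t.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall is averaged over all proteins.
            recalls.append(hits / len(truth))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


if __name__ == "__main__":
    # Toy data with made-up term ids.
    truth = {"P1": {"T:a", "T:b"}, "P2": {"T:c"}}
    scores = {"P1": {"T:a": 0.9, "T:b": 0.4, "T:x": 0.2}, "P2": {"T:c": 0.8}}
    print(fmax(truth, scores))
```

    In the toy data a threshold between 0.2 and 0.4 keeps every true term and drops the single false one, so precision and recall both reach 1.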
