Recent Submissions

  • Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs.

    Alam, Tanvir; Alazmi, Meshari; Naser, Rayan Mohammad Mahmoud; Huser, Franceline; Momin, Afaque Ahmad Imtiyaz; Astro, Veronica; Hong, Seungbeom; Walkiewicz, Katarzyna Wiktoria; Canlas, Christian G; Huser, Raphaël; Ali, Amal J.; Merzaban, Jasmeen; Adamo, Antonio; Jaremko, Mariusz; Jaremko, Lukasz; Bajic, Vladimir B.; Gao, Xin; Arold, Stefan T. (Bioinformatics (Oxford, England), Oxford University Press (OUP), 2019-10-05) [Article]
    MOTIVATION:Leucine-aspartic acid (LD) motifs are short linear interaction motifs (SLiMs) that link paxillin family proteins to factors controlling cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family is poorly understood. RESULTS:To enable a proteome-wide assessment of LD motifs, we developed an active-learning based framework (LDmotif finder; LDMF) that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome revealed a dozen new proteins containing LD motifs. We found that LD motif signalling evolved in unicellular eukaryotes more than 800 Myr ago, with paxillin and vinculin as core constituents, and nuclear export signal (NES) as a likely source of de novo LD motifs. We show that LD motif proteins form a functionally homogenous group, all being involved in cell morphogenesis and adhesion. This functional focus is recapitulated in cells by GFP-fused LD motifs, suggesting that it is intrinsic to the LD motif sequence, possibly through their effect on binding partners. Our approach elucidated the origin and dynamic adaptations of an ancestral SLiM, and can serve as a guide for the identification of other SLiMs for which only few representatives are known. AVAILABILITY:LDMF is freely available online at; Source code is available at SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
  • Modeling and Experimental Study of the Vibration Effects in Urban Free-Space Optical Communication Systems

    Cai, Wenqi; Ndoye, Ibrahima; Ooi, Boon S.; Alouini, Mohamed-Slim; Laleg-Kirati, Taous-Meriem (IEEE Photonics Journal, IEEE, 2019-10-04) [Article]
    Free-space optical (FSO) communication, considered as a last-mile technology, is widely used in many urban scenarios. However, the performance of urban free-space optical (UFSO) communication systems fades in the presence of system vibration caused by many factors in the chaotic urban environment. In this paper, we develop a dedicated indoor vibration platform and atmospheric turbulence to estimate the Bifurcated-Gaussian (B-G) distribution model of the receiver optical power under different vibration levels and link distances using a nonlinear iteration method. Mean square error (MSE) and coefficient of determination ($R^2$) metrics are used to show good agreement between the PDFs of the experimental data and the resulting B-G distribution model. In addition, the UFSO channel under the combined effects of vibration and atmospheric turbulence is explored under three atmospheric turbulence conditions. Our proposed B-G distribution model describes the vibrating UFSO channels properly and can help to evaluate the link performance of UFSO systems, e.g., bit-error-rate (BER) and outage probability. Furthermore, this work paves the way for constructing complete auxiliary control subsystems for robust UFSO links and contributes to more extensive optical communication scenarios, such as underwater optical communication.
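
The two goodness-of-fit metrics named in this abstract can be illustrated with a minimal sketch. The PDF values below are hypothetical placeholders rather than data from the paper, and the function names are chosen only for this example.

```python
def mse(observed, predicted):
    """Mean square error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical binned PDF values (not taken from the paper).
empirical_pdf = [0.05, 0.20, 0.50, 0.20, 0.05]
model_pdf = [0.06, 0.19, 0.48, 0.21, 0.06]

print("MSE:", mse(empirical_pdf, model_pdf))
print("R^2:", r_squared(empirical_pdf, model_pdf))
```

A small MSE together with an $R^2$ close to 1 is the kind of agreement the abstract reports between the measured PDFs and the fitted model.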
  • Disruption of the coordination between host circadian rhythms and malaria parasite development alters the duration of the intraerythrocytic cycle

    Subudhi, Amit; O'Donnell, Aidan John; Ramaprasad, Abhinay; Abkallo, Hussein M.; Kaushik, Abhinav; Ansari, Hifzur Rahman; Abdel-Haleem, Alyaa M.; Rached, Fathia Ben; Kaneko, Osamu; Culleton, Richard; Reece, Sarah E.; Pain, Arnab (Cold Spring Harbor Laboratory, 2019-10-03) [Preprint]
    Malaria parasites complete their intra-erythrocytic developmental cycle (IDC) in multiples of 24 hours (depending on the species), suggesting a circadian basis to the asexual cell cycle, but the mechanism controlling this periodicity is unknown. Combining in vivo and in vitro approaches using rodent and human malaria parasites, we reveal that: (i) 57% of Plasmodium chabaudi genes exhibit 24 h circadian periodicity in transcription; (ii) 58% of these genes lose transcriptional rhythmicity when the IDC is out-of-synchrony with host rhythms; (iii) 9% of Plasmodium falciparum genes show circadian transcription under free-running conditions; (iv) Serpentine receptor 10 (SR10) has a circadian transcription profile and disrupting it in rodent malaria parasites shortens the IDC by 2-3 hours; (v) Multiple processes including DNA replication and the ubiquitin and proteasome pathways are affected by loss of coordination with host rhythms and by disruption of SR10. Our results show that malaria parasites are at least partly responsible for scheduling their IDCs explaining the fitness benefits of coordination with host rhythms.
  • Computer-aided drug repurposing for cancer therapy: Approaches and opportunities to challenge anticancer targets.

    Mottini, Carla; Napolitano, Francesco; Li, Zhongxiao; Gao, Xin; Cardone, Luca (Seminars in cancer biology, Elsevier BV, 2019-09-29) [Article]
    Despite huge efforts made in academic and pharmaceutical worldwide research, current anticancer therapies achieve effective treatment in a limited number of neoplasia cases only. Oncology terms such as big killers - to identify tumours with yet a high mortality rate - or undruggable cancer targets, and chemoresistance, represent the current therapeutic debacle of cancer treatments. In addition, metastases, tumour microenvironments, tumour heterogeneity, metabolic adaptations, and immunotherapy resistance are essential features controlling tumour response to therapies, but still lack effective therapeutics or modulators. In this scenario, where the pharmaceutical productivity and drug efficacy in oncology seem to have reached a plateau, the so-called drug repurposing - i.e. the use of old drugs, already in clinical use, for a different therapeutic indication - is an appealing strategy to improve cancer therapy. Opportunities for drug repurposing are often based on occasional observations or on time-consuming pre-clinical drug screenings that are often not hypothesis-driven. In contrast, in-silico drug repurposing is an emerging, hypothesis-driven approach that takes advantage of the use of big-data. Indeed, the extensive use of -omics technologies, improved data storage, data meaning, machine learning algorithms, and computational modeling all offer unprecedented knowledge of the biological mechanisms of cancers and drugs' modes of action, providing extensive availability for both disease-related data and drug-related data. This offers the opportunity to generate, with time- and cost-effective approaches, computational drug networks to predict, in-silico, the efficacy of approved drugs against relevant cancer targets, as well as to select better responder patients or disease biomarkers.
    Here, we will review selected disease-related data together with computational tools to be exploited for the in-silico repurposing of drugs against validated targets in cancer therapies, focusing on the activation of oncogenic signaling pathways in cancer. We will discuss how in-silico drug repurposing has the promise to shortly improve our arsenal of anticancer drugs and, likely, overcome certain limitations of modern cancer therapies against old and new therapeutic targets in oncology.
  • Functional metagenomic analysis of dust-associated microbiomes above the Red Sea.

    Aalismail, Nojood; Ngugi, David K; Diaz Rua, Ruben; Alam, Intikhab; Cusack, Michael; Duarte, Carlos M. (Scientific reports, Springer Science and Business Media LLC, 2019-09-26) [Article]
    Atmospheric transport is a major vector for the long-range transport of microbial communities, maintaining connectivity among them and delivering functionally important microbes, such as pathogens. Though the taxonomic diversity of aeolian microorganisms is well characterized, the genomic functional traits underpinning their survival during atmospheric transport are poorly characterized. Here we use functional metagenomics of dust samples collected on the Global Dust Belt to initiate a Gene Catalogue of Aeolian Microbiome (GCAM) and explore microbial genetic traits enabling a successful aeolian lifestyle in aeolian microbial communities. The GCAM reported here, derived from ten aeolian microbial metagenomes, includes a total of 2,370,956 non-redundant coding DNA sequences, corresponding to a yield of ~31 × 10^6 predicted genes per Tera base-pair of DNA sequenced for the aeolian samples sequenced. Two-thirds of the cataloged genes were assigned to bacteria, followed by eukaryotes (5.4%), archaea (1.1%), and viruses (0.69%). Genes encoding proteins involved in repairing UV-induced DNA damage and aerosolization of cells were ubiquitous across samples, and appear as fundamental requirements for the aeolian lifestyle, while genes coding for other important functions supporting the aeolian lifestyle (chemotaxis, aerotaxis, germination, thermal resistance, sporulation, and biofilm formation) varied among the communities sampled.
  • 3D cellular reconstruction of cortical glia and parenchymal morphometric analysis from Serial Block-Face Electron Microscopy of juvenile rat.

    Cali, Corrado; Agus, Marco; Kare, Kalpana; Boges, Daniya J; Lehväslaiho, Heikki; Hadwiger, Markus; Magistretti, Pierre J. (Progress in neurobiology, Elsevier BV, 2019-09-25) [Article]
    With the rapid evolution in the automation of serial electron microscopy in life sciences, the acquisition of terabyte-sized datasets is becoming increasingly common. High resolution serial block-face imaging (SBEM) of biological tissues offers the opportunity to segment and reconstruct nanoscale structures to reveal spatial features previously inaccessible with simple, single section, two-dimensional images, with a particular focus on glial cells, whose reconstruction efforts in literature are still limited, compared to neurons. Here, we imaged a 750000 cubic micron volume of the somatosensory cortex from a juvenile P14 rat, with 20 nm accuracy. We recognized a total of 186 cells using their nuclei, and classified them as neuronal or glial based on features of the soma and the processes. We reconstructed for the first time 4 almost complete astrocytes and neurons, 4 complete microglia and 4 complete pericytes, including their intracellular mitochondria, 186 nuclei and 213 myelinated axons. We then performed quantitative analysis on the three-dimensional models. Out of the data that we generated, we observed that neurons have larger nuclei, which correlated with their lesser density, and that astrocytes and pericytes have a higher surface to volume ratio, compared to other cell types. All reconstructed morphologies represent an important resource for computational neuroscientists, as morphological quantitative information can be inferred, to tune simulations that take into account the spatial compartmentalization of the different cell types.
  • Ontology based mining of pathogen–disease associations from literature

    Kafkas, Senay; Hoehndorf, Robert (Journal of Biomedical Semantics, Springer Science and Business Media LLC, 2019-09-18) [Article]
    Background Infectious diseases claim millions of lives each year, especially in developing countries. Identification of causative pathogens accurately and rapidly plays a key role in the success of treatment. To support infectious disease research and mechanisms of infection, there is a need for an open resource on pathogen–disease associations that can be utilized in computational studies. A large number of pathogen–disease associations is available from the literature in unstructured form and we need automated methods to extract the data. Results We developed a text mining system designed for extracting pathogen–disease relations from literature. Our approach utilizes background knowledge from an ontology and statistical methods for extracting associations between pathogens and diseases. In total, we extracted 3420 pathogen–disease associations from literature. We integrated our literature-derived associations into a database which links pathogens to their phenotypes for supporting infectious disease research. Conclusions To the best of our knowledge, we present the first study focusing on extracting pathogen–disease associations from publications. We believe the text mined data can be utilized as a valuable resource for infectious disease research. All the data is publicly available from and through a public SPARQL endpoint from
  • Monitoring of the toxic dinoflagellate Alexandrium catenella in Osaka Bay, Japan using a massively parallel sequencing (MPS)-based technique

    Nagai, Satoshi; Chen, Hungyen; Kawakami, Yoko; Yamamoto, Keigo; Sildever, Sirje; Kanno, Nanako; Oikawa, Hiroshi; Yasuike, Motoshige; Nakamura, Yoji; Hongo, Yuki; Fujiwara, Atushi; Kobayashi, Takanori; Gojobori, Takashi (Harmful Algae, Elsevier BV, 2019-09-12) [Article]
    Since 2002, blooms of Alexandrium catenella sensu Fraga et al. (2015) and paralytic shellfish toxicity events have occurred almost yearly in Osaka Bay, Japan. To better understand the triggers for reoccurring A. catenella blooms in Osaka Bay, the phytoplankton community was monitored during the spring seasons of 2012–2015. Monitoring was performed using a massively parallel sequencing (MPS)-based technique on amplicon sequences of the 18S rRNA gene. Dense blooms of A. catenella occurred every year except in 2012; however, there was no significant correlation with the environmental parameters investigated. Plankton community diversity decreased before and in the middle of the A. catenella blooms, suggesting that the decline in diversity could be an indicator for the bloom occurrence. The yearly abundance pattern of A. catenella cells obtained by morphology-based counting coincided with the relative sequence abundances, which supports the effectiveness of MPS-based phytoplankton monitoring.
  • Mining biosynthetic gene clusters in Virgibacillus genomes.

    Othoum, Ghofran K.; Bougouffa, Salim; Bokhari, Ameerah; Lafi, Feras Fawzi; Gojobori, Takashi; Hirt, Heribert; Mijakovic, Ivan; Bajic, Vladimir B.; Essack, Magbubah (BMC genomics, Springer Science and Business Media LLC, 2019-09-05) [Article]
    BACKGROUND:Biosynthetic gene clusters produce a wide range of metabolites with activities that are of interest to the pharmaceutical industry. Specific interest is shown towards those metabolites that exhibit antimicrobial activities against multidrug-resistant bacteria that have become a global health threat. Genera of the phylum Firmicutes are frequently identified as sources of such metabolites, but the biosynthetic potential of its Virgibacillus genus is not known. Here, we used comparative genomic analysis to determine whether Virgibacillus strains isolated from the Red Sea mangrove mud in Rabigh Harbor Lagoon, Saudi Arabia, may be an attractive source of such novel antimicrobial agents. RESULTS:A comparative genomics analysis based on Virgibacillus dokdonensis Bac330, Virgibacillus sp. Bac332 and Virgibacillus halodenitrificans Bac324 (isolated from the Red Sea) and six other previously reported Virgibacillus strains was performed. Orthology analysis was used to determine the core genomes as well as the accessory genome of the nine Virgibacillus strains. The analysis shows that the Red Sea strain Virgibacillus sp. Bac332 has the highest number of unique genes and genomic islands compared to other genomes included in this study. Focusing on biosynthetic gene clusters, we show how marine isolates, including those from the Red Sea, are more enriched with nonribosomal peptides compared to the other Virgibacillus species. We also found that most nonribosomal peptide synthases identified in the Virgibacillus strains are part of genomic regions that are potentially horizontally transferred. CONCLUSIONS:The Red Sea Virgibacillus strains have a large number of biosynthetic genes in clusters that are not assigned to known products, indicating significant potential for the discovery of novel bioactive compounds. 
    Also, having more modular synthetase units suggests that these strains are good candidates for experimental characterization of previously identified bioactive compounds as well. Future efforts will be directed towards establishing the properties of the potentially novel compounds encoded by the Red Sea specific trans-AT PKS/NRPS cluster and the type III PKS/NRPS cluster.
  • Redox control of vascular biology.

    Obradovic, Milan; Essack, Magbubah; Zafirovic, Sonja; Sudar-Milovanovic, Emina; Bajic, Vladan P; Van Neste, Christophe Marc; Trpkovic, Andreja; Stanimirovic, Julijana; Bajic, Vladimir B.; Isenovic, Esma R (BioFactors (Oxford, England), Wiley, 2019-09-05) [Article]
    Redox control is lost when the antioxidant defense system cannot remove abnormally high concentrations of signaling molecules, such as reactive oxygen species (ROS). Chronically elevated levels of ROS cause oxidative stress that may eventually lead to cancer and cardiovascular and neurodegenerative diseases. In this review, we focus on redox effects in the vascular system. We pay close attention to the subcompartments of the vascular system (endothelium, smooth muscle cell layer) and give an overview of how redox changes influence those different compartments. We also review the core aspects of redox biology, cardiovascular physiology, and pathophysiology. Moreover, the topic-specific knowledgebase DES-RedoxVasc was used to develop two case studies, one focused on endothelial cells and the other on the vascular smooth muscle cells, as a starting point to possibly extend our knowledge of redox control in vascular biology.
  • Bounded bilinear control of coupled first-order hyperbolic PDE and infinite dimensional ODE in the framework of PDEs with memory

    Mechhoud, Sarra; Laleg-Kirati, Taous-Meriem (Journal of Process Control, Elsevier Ltd, 2019-09-01) [Article]
    In this work, we consider the problem of bounded bilinear tracking control of a system of coupled first-order hyperbolic partial differential equation (PDE) with an infinite dimensional ordinary differential equation (ODE). This coupled PDE-infinite ODE system can be viewed as a degenerate system of two coupled first-order hyperbolic PDEs, the velocity of the ODE part vanishing. First, we convert this PDE-infinite ODE system into a first-order hyperbolic PDE with memory and investigate the bounded bilinear control problem in this framework. We consider as manipulated variable the constrained wave propagation velocity, which makes the control problem bounded and bilinear, and we take the measurements at the boundaries. To account for the actuator's constraints, we develop conditions under which the bounded control law ensures stability and tracking performances. This leads to a specification of the state-space region that enforces the desired system's closed-loop behaviour. To overcome the lack of full-state measurements, we design an observer-based bounded output-feedback control law which guarantees the reference tracking and uniform asymptotic stability of the system in closed-loop. A strong motivation of our work is the control problem of the solar collector parabolic trough where the manipulated control variable (the pump volumetric flow rate) is bilinear with respect to the PDE-infinite ODE model, and the measurements are taken at the boundary (tube's outlet). Simulation results illustrate the efficiency of the proposed control strategy.
  • Integration of dynamic contrast-enhanced magnetic resonance imaging and T2-weighted imaging radiomic features by a canonical correlation analysis-based feature fusion method to predict histological grade in ductal breast carcinoma.

    Fan, Ming; Liu, Zuhui; Xie, Sudan; Xu, Maosheng; Wang, Shiwei; Gao, Xin; Li, Lihua (Physics in medicine and biology, IOP Publishing, 2019-08-31) [Article]
    Tumour histological grade has prognostic implications in breast cancer. Tumour features in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) and T2-weighted (T2W) imaging can provide related and complementary information in the analysis of breast lesions to improve MRI-based histological status prediction in breast cancer. A dataset of 167 patients with invasive ductal carcinoma (IDC) was assembled, consisting of 72 low/intermediate-grade and 95 high-grade cases with preoperative DCE-MRI and T2W images. The data cohort was separated into development (n=111) and validation (n=56) cohorts. Each tumour was segmented in the precontrast and the intermediate and last postcontrast DCE-MR images and was mapped to the tumour in the T2W images. Radiomic features, including texture, morphology, and histogram distribution features in the tumour image, were extracted for those image series. Features from the DCE-MR and T2W images were fused by a canonical correlation analysis (CCA)-based method. The support vector machine (SVM) classifiers were trained and tested on the development and validation cohorts, respectively. SVM-based recursive feature elimination (SVM-RFE) was adopted to identify the optimal features for prediction. The areas under the ROC curves (AUCs) for the T2W images and the DCE-MRI series of precontrast, intermediate and last postcontrast images were 0.750±0.047, 0.749±0.047, and 0.788±0.045, respectively, for the development cohort and 0.715±0.068, 0.704±0.073, and 0.744±0.067, respectively, for the validation cohort. After the CCA-based fusion of features from the DCE-MRI series and T2W images, the AUCs increased to 0.751±0.065, 0.803±0.060 and 0.794±0.060 in the validation cohort. Moreover, the method of fusing features between DCE-MRI and T2W images using CCA achieved better performance than the concatenation-based feature fusion or classifier fusion methods.
    Our results demonstrated that anatomical and functional MR images yield complementary information, and feature fusion of radiomic features by matrix transformation to optimize their correlations produced a classifier with improved performance for predicting the histological grade of IDC.
  • Fine-grained alignment of cryo-electron subtomograms based on MPI parallel optimization.

    Lü, Yongchun; Zeng, Xiangrui; Zhao, Xiaofang; Li, Shirui; Li, Hua; Gao, Xin; Xu, Min (BMC bioinformatics, Springer Science and Business Media LLC, 2019-08-29) [Article]
    Background Cryo-electron tomography (Cryo-ET) is an imaging technique used to generate three-dimensional structures of cellular macromolecule complexes in their native environment. Due to developing cryo-electron microscopy technology, the image quality of three-dimensional reconstruction of cryo-electron tomography has greatly improved. However, cryo-ET images are characterized by low resolution, partial data loss and low signal-to-noise ratio (SNR). In order to tackle these challenges and improve resolution, a large number of subtomograms containing the same structure need to be aligned and averaged. Existing methods for refining and aligning subtomograms are still highly time-consuming, requiring many computationally intensive processing steps (i.e. the rotations and translations of subtomograms in three-dimensional space). Results In this article, we propose a Stochastic Average Gradient (SAG) fine-grained alignment method for optimizing the sum of dissimilarity measure in real space. We introduce a Message Passing Interface (MPI) parallel programming model in order to explore further speedup. Conclusions We compare our stochastic average gradient fine-grained alignment algorithm with two baseline methods, high-precision alignment and fast alignment. Our SAG fine-grained alignment algorithm is much faster than the two baseline methods. Results on simulated data of GroEL from the Protein Data Bank (PDB ID:1KP8) showed that our parallel SAG-based fine-grained alignment method could achieve close-to-optimal rigid transformations with higher precision than both high-precision alignment and fast alignment at a low SNR (SNR=0.003) with tilt angle range ±60° or ±40°. For the experimental subtomograms data structures of GroEL and GroEL/GroES complexes, our parallel SAG-based fine-grained alignment can achieve higher precision and fewer iterations to converge than the two baseline methods.
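
As a rough illustration of the stochastic average gradient idea mentioned in this abstract, the sketch below minimizes a sum of simple one-dimensional quadratic terms. It is a toy stand-in under stated assumptions: the objective, step size, and function names are chosen for this example and are not the paper's dissimilarity measure, rigid-transformation parameters, or MPI implementation.

```python
import random


def sag_minimize(grad_fns, x0, step, iterations, seed=0):
    """Minimize (1/n) * sum_i f_i(x) given the gradient of each f_i.

    SAG keeps the most recent gradient of every term and moves along
    the average of the stored gradients, refreshing one term per step.
    """
    rng = random.Random(seed)
    n = len(grad_fns)
    stored = [0.0] * n   # last gradient evaluated for each term
    grad_sum = 0.0       # running sum of the stored gradients
    x = x0
    for _ in range(iterations):
        i = rng.randrange(n)          # pick one term at random
        g = grad_fns[i](x)            # refresh only that term's gradient
        grad_sum += g - stored[i]
        stored[i] = g
        x -= step * grad_sum / n      # step along the averaged gradient
    return x


# Toy objective (an assumption for illustration): f_i(x) = 0.5 * (x - a_i)^2,
# whose average is minimized at the mean of the a_i.
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
grad_fns = [lambda x, a=a: x - a for a in targets]

x_star = sag_minimize(grad_fns, x0=0.0, step=1.0 / 16.0, iterations=2000)
print(x_star)  # close to the mean of the targets
```

Each iteration evaluates only one term's gradient, which is what makes the approach attractive when every term (here a quadratic, in the paper a subtomogram comparison) is expensive to evaluate.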
  • Construction of complete Tupaia belangeri transcriptome database by whole-genome and comprehensive RNA sequencing

    Sanada, Takahiro; Tsukiyama-Kohara, Kyoko; Shin-I, Tadasu; Yamamoto, Naoki; Kayesh, Mohammad Enamul Hoque; Yamane, Daisuke; Takano, Jun ichiro; Shiogama, Yumiko; Yasutomi, Yasuhiro; Ikeo, Kazuho; Gojobori, Takashi; Mizokami, Masashi; Kohara, Michinori (Scientific Reports, Nature Publishing Group, 2019-08-26) [Article]
    The northern tree shrew (Tupaia belangeri) possesses high potential as an animal model of human diseases and biology, given its genetic similarity to primates. Although genetic information on the tree shrew has already been published, some of the entire coding sequences (CDSs) of tree shrew genes remained incomplete, and the reliability of these CDSs remained difficult to determine. To improve the determination of tree shrew CDSs, we performed sequencing of the whole-genome, mRNA, and total RNA and integrated the resulting data. Additionally, we established criteria for the selection of reliable CDSs and annotated these sequences by comparison to the human transcriptome, resulting in the identification of complete CDSs for 12,612 tree shrew genes and yielding a more accurate tree shrew genome database (TupaiaBase). Transcriptome profiles in hepatitis B virus-infected tree shrew livers were analyzed for validation. Gene ontology analysis showed enriched transcriptional regulation at 1 day post-infection, namely in the “type I interferon signaling pathway”. Moreover, a negative regulator of type I interferon, SOCS3, was induced. This work, which provides a tree shrew CDS database based on genomic DNA and RNA sequencing, is expected to serve as a powerful tool for further development of the tree shrew model.
  • Sequenceserver: a modern graphical user interface for custom BLAST databases.

    Priyam, Anurag; Woodcroft, Ben J; Rai, Vivek; Moghul, Ismail; Mungala, Alekhya; Ter, Filip; Chowdhary, Hiten; Pieniak, Iwo Lukasz; Gibbins, Mark Anthony; Moon, HongKee; Davis-Richardson, Austin; Uludag, Mahmut; Watson-Haigh, Nathan S; Challis, Richard; Nakamura, Hiroyuki; Favreau, Emeline; Cifuentes, Esteban Gómez; Pluskal, Tomás; Leonard, Guy; Rumpf, Wolfgang; Wurm, Yannick (Molecular biology and evolution, Oxford University Press (OUP), 2019-08-15) [Article]
    Comparing newly obtained and previously known nucleotide and amino-acid sequences underpins modern biological research. BLAST is a well-established tool for such comparisons but is challenging to use on new datasets. We combined a user-centric design philosophy with sustainable software development approaches to create Sequenceserver, a tool for running BLAST and visually inspecting BLAST results for biological interpretation. Sequenceserver uses simple algorithms to prevent potential analysis errors, and provides flexible text-based and visual outputs to support researcher productivity. Our software can be rapidly installed for use by individuals or on shared servers. Sequenceserver is AGPLv3-licensed at
  • Kalman filter based estimation algorithm for the characterization of the spatiotemporal hemodynamic response in the brain

    Belkhatir, Zehor; Mechhoud, Sarah; Laleg-Kirati, Taous-Meriem (Control Engineering Practice, Elsevier Ltd, 2019-08-01) [Article]
    The characterization of the spatiotemporal hemodynamic response (stHR) in the brain is important for understanding the interaction between neighboring brain voxels and regions. In this paper, we design an identification algorithm for the characterization of the cerebral stHR which is modeled by a system of coupled hyperbolic partial differential equation (PDE) and infinite-dimensional ordinary differential equation (ODE). The proposed algorithm provides estimates of the hemodynamic variables (cerebral blood flow and mass density contributed by blood) and physiological parameters using non-invasive Blood Oxygenation Level Dependent (BOLD) data measured with functional Magnetic Resonance Imaging (fMRI) modality. The proposed solution concept follows three main steps: (i) discretization of the stHR model using Galerkin-based finite element method; (ii) estimation of the output derivative using high-order sliding mode differentiator; and (iii) estimation of the state, input, and parameters from sampled-in-space measurements using the reduced-order approximation model and a constrained extended Kalman filter with unknown input algorithm. In addition, sufficient conditions that depend on the chosen discretization scheme, and which guarantee the structural identifiability of the input and parameters, and also the observability of the system are provided. The performance of the proposed algorithm is assessed using both synthetic and real data. The set of the used real data represents the 1-D BOLD signal collected from the visual cortex and acquired in 3 Tesla fMRI scanner.
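
The predict/update cycle that underlies Kalman-filter-based estimation can be sketched in a few lines. This is a deliberately simplified scalar, linear filter on a random-walk state, shown only to illustrate the recursion; it is not the constrained extended Kalman filter with unknown input described in the abstract, and all names and noise values are assumptions made for this example.

```python
def kalman_filter_1d(measurements, x0, p0, process_var, meas_var):
    """Scalar Kalman filter for x_k = x_{k-1} + w_k, z_k = x_k + v_k.

    Returns a list of (state estimate, estimate variance) pairs,
    one per measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state model is a random walk, so only the variance grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append((x, p))
    return estimates


# Hypothetical noiseless measurements of a constant signal.
estimates = kalman_filter_1d([1.0] * 50, x0=0.0, p0=1.0,
                             process_var=1e-4, meas_var=0.1)
final_state, final_var = estimates[-1]
print(final_state, final_var)
```

In the extended, constrained setting of the paper the same two steps are applied to a linearized reduced-order model, with additional handling for the unknown input and the parameter bounds.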
  • DeepGOPlus: Improved protein function prediction from sequence.

    Kulmanov, Maxat; Hoehndorf, Robert (Bioinformatics (Oxford, England), Oxford University Press (OUP), 2019-07-28) [Article]
    MOTIVATION:Protein function prediction is one of the major tasks of bioinformatics that can help in a wide range of biological problems such as understanding disease mechanisms or finding drug targets. Many methods are available for predicting protein functions from sequence based features, protein-protein interaction networks, protein structure or literature. However, other than sequence, most of the features are difficult to obtain or not available for many proteins thereby limiting their scope. Furthermore, the performance of sequence-based function prediction methods is often lower than methods that incorporate multiple features and predicting protein functions may require a lot of time. RESULTS:We developed a novel method for predicting protein functions from sequence alone which combines a deep convolutional neural network (CNN) model with sequence similarity based predictions. Our CNN model scans the sequence for motifs which are predictive for protein functions and combines this with functions of similar proteins (if available). We evaluate the performance of DeepGOPlus using the CAFA3 evaluation measures and achieve an Fmax of 0.390, 0.557 and 0.614 for BPO, MFO and CCO evaluations, respectively. These results would have made DeepGOPlus one of the three best predictors in CCO and the second best performing method in the BPO and MFO evaluations. We also compare DeepGOPlus with state-of-the-art methods such as DeepText2GO and GOLabeler on another dataset. DeepGOPlus can annotate around 40 protein sequences per second on common hardware, thereby making fast and accurate function predictions available for a wide range of proteins. AVAILABILITY: SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
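
The Fmax measure quoted in this abstract is the maximum, over score thresholds, of the harmonic mean of protein-averaged precision and recall. The sketch below shows that computation in simplified form: it omits the propagation of annotations to ancestor terms in the ontology, and the protein identifiers, term labels, and scores are hypothetical.

```python
def fmax(predictions, truth, steps=100):
    """Protein-centric Fmax over thresholds 0, 1/steps, ..., 1.

    predictions: {protein: {term: score in [0, 1]}}
    truth:       {protein: set of true terms}
    Precision is averaged over proteins with at least one prediction
    at the threshold; recall is averaged over all proteins in truth.
    """
    best = 0.0
    for i in range(steps + 1):
        t = i / steps
        precisions, recall_sum = [], 0.0
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # no protein has a prediction at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical scores and annotations, for illustration only.
predictions = {
    "P1": {"GO:A": 0.9, "GO:B": 0.4},
    "P2": {"GO:A": 0.2, "GO:C": 0.8},
}
truth_easy = {"P1": {"GO:A"}, "P2": {"GO:C"}}
truth_hard = {"P1": {"GO:A"}, "P2": {"GO:B"}}

print(fmax(predictions, truth_easy))
print(fmax(predictions, truth_hard))
```

Because the threshold is chosen after the fact, Fmax rewards well-ranked scores rather than any particular cutoff, which is why it is reported separately for each ontology.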
  • Membrane fouling modeling and detection in direct contact membrane distillation

    Karam, Ayman M.; Laleg-Kirati, Taous-Meriem (Journal of Process Control, Elsevier BV, 2019-07-28) [Article]
    This paper develops a lumped parameter model-based method for membrane fouling detection in Direct Contact Membrane Distillation (DCMD) process. First, a previously published mathematical model of DCMD is extended to account for the thermal resistance of the fouling layer. Then, an adaptive nonlinear descriptor observer is developed. The adaptive observer estimates the thermal resistance of the fouling layer in addition to the temporal and spatial temperature distribution of the bulk feed and permeate solutions and at membrane interface layers. Simulation results are presented to illustrate the performance of the proposed method.
  • GCN-MF: Disease-gene association identification by graph convolutional networks and matrix factorization

    Han, Peng; Shang, Shuo; Yang, Peng; Liu, Yong; Zhao, Peilin; Zhou, Jiayu; Gao, Xin; Kalnis, Panos (Association for Computing, 2019-07-25) [Conference Paper]
    Discovering disease-gene associations is a fundamental and critical biomedical task, which helps biologists and physicians discover the pathogenic mechanisms of syndromes. With various clinical biomarkers measuring the similarities among genes and disease phenotypes, network-based semi-supervised learning (NSSL) has been commonly used in these studies to address this class-imbalanced, large-scale data issue. However, most existing NSSL approaches are based on linear models and suffer from two major limitations: 1) they implicitly consider a local-structure representation for each candidate; 2) they are unable to capture nonlinear associations between diseases and genes. In this paper, we propose a new framework for the disease-gene association task, named GCN-MF, which combines a Graph Convolutional Network (GCN) with matrix factorization. With the help of the GCN, we can capture nonlinear interactions and exploit measured similarities. Moreover, we define a margin control loss function to reduce the effect of sparsity. Empirical results demonstrate that the proposed deep learning algorithm outperforms all other state-of-the-art methods on most metrics.
  • Metagenomic Methods: From Seawater to the Database

    Reza, Md. Shaheed; Kobiyama, Atsushi; Rashid, Jonaira; Yamada, Yuichiro; Ikeda, Yuri; Ikeda, Daisuke; Mizusawa, Nanami; Yanagisawa, Saki; Ikeo, Kazuho; Sato, Shigeru; Ogata, Takehiko; Kudo, Toshiaki; Kaga, Shinnosuke; Watanabe, Shiho; Naiki, Kimiaki; Kaga, Yoshimasa; Segawa, Satoshi; Mineta, Katsuhiko; Bajic, Vladimir B.; Gojobori, Takashi; Watabe, Shugo (Springer Singapore, 2019-07-24) [Book Chapter]
    In this article, methods and techniques of metagenomics, including targeted 16S/18S rRNA analyses and shotgun sequencing, will be discussed. It is sometimes difficult, especially for beginners, to follow the manufacturer’s recommendations as given in the protocol and to go through the different steps, from the preparation of starting material (e.g., DNA) to library preparation, and so on. We will try to explain all the steps in detail and share our experience here. It all starts with the collection of samples and of ecological/environmental metadata, followed by sample fractionation (optional), extraction of DNA, sequencing, and finally data analyses to interpret the results. Sample collection has always been the most important part of a study, as it requires proper planning, a good workforce to execute it, sampling permission(s) from the appropriate authority, and precaution(s) regarding endangered species during sampling. Here, we first describe the methodology for a shallow river and, in the later section, the methodology for a deep marine bay. In either case, slight modifications can be made to succeed in sampling. Simultaneous determination of physicochemical parameters as metadata is also an important task. These samples are then processed to extract DNA, which needs to be representative of all cells present in the sample. Finally, sequencing is done by a next-generation sequencer, and data analyses are completed. Through these methods, scientists are now able to overcome the unculturability problem of more than 99% of environmental microorganisms and to uncover the functional gene diversity of environmental microorganisms.
