• Welcome Address

      Frechet, Jean (2016-01-25) [Presentation]
    • Trusted Allies with New Benefits: Repositioning Existing Drugs

      Gao, Xin (2016-01-25) [Presentation]
      The classical assumption that one drug cures a single disease by binding to a single drug target has been shown to be inaccurate. Recent studies estimate that each drug binds, on average, to at least six known and several unknown targets. Identifying these “off-targets” can help explain the side effects and toxicity of a drug. Moreover, the off-targets of a given drug may inspire “drug repositioning”, where a drug already approved for one condition is redirected to treat another condition, thereby overcoming delays and costs associated with clinical trials and drug approval. In this talk, I will introduce our work along this direction. We have developed a structural alignment method that can precisely identify structural similarities between arbitrary types of interaction interfaces, such as the drug-target interaction. We have further developed a novel computational framework, iDTP, that constructs the structural signatures of approved and experimental drugs, based on which we predict new targets for these drugs. Our method combines information from several sources, including sequence-independent structural alignment, sequence similarity, drug-target tissue expression data, and text mining. In a cross-validation study, we used iDTP to predict the known targets of 11 drugs, with 63% sensitivity and 81% specificity. We then predicted novel targets for these drugs; two of high pharmacological interest, the peroxisome proliferator-activated receptor gamma and the oncogene B-cell lymphoma 2, were successfully validated through in vitro binding experiments.
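      For readers unfamiliar with the two metrics quoted in the cross-validation result, the following is a minimal sketch of how sensitivity and specificity are computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the iDTP study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true targets that were predicted
    specificity = tn / (tn + fp)  # fraction of non-targets correctly rejected
    return sensitivity, specificity


# Hypothetical counts for illustration only (not data from the study).
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=90, fp=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

      Sensitivity measures how many known targets are recovered, while specificity measures how well non-targets are excluded, so the two are usually reported together.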
    • Three-Dimensional Structures of Autophosphorylation Complexes in Crystals of Protein Kinases

      Dumbrack, Roland (2016-01-26) [Presentation]
      Protein kinase autophosphorylation is a common regulatory mechanism in cell signaling pathways. Several autophosphorylation complexes have been identified in crystals of protein kinases, with a known serine, threonine, or tyrosine autophosphorylation site of one kinase monomer sitting in the active site of another monomer of the same protein in the crystal. We utilized a structural bioinformatics method to identify all such autophosphorylation complexes in X-ray crystallographic structures in the Protein Data Bank (PDB) by generating all unique kinase/kinase interfaces within and between asymmetric units of each crystal and measuring the distance between the hydroxyl oxygen of potential autophosphorylation sites and the oxygen atoms of the active site aspartic acid residue side chain. We have identified 15 unique autophosphorylation complexes in the PDB, of which 5 complexes have not previously been described in the relevant publications on the crystal structures (N-terminal juxtamembrane regions of CSF1R and EPHA2, activation loop tyrosines of LCK and IGF1R, and a serine in a nuclear localization signal region of CLK2). Mutation of residues in the autophosphorylation complex interface of LCK either severely impaired autophosphorylation or increased it. Taking the autophosphorylation complexes as a whole and comparing them with peptide-substrate/kinase complexes, we observe a number of important features among them. The novel and previously observed autophosphorylation sites are conserved in many kinases, indicating that by homology we can extend the relevance of these complexes to many other clinically relevant drug targets.
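      The distance criterion described above can be sketched as follows. This is an illustrative example only: the coordinates, the function names, and the 6.0 Å cutoff are assumptions made for the sketch, since the abstract does not state the threshold or the implementation that was used.

```python
import math


def min_distance(site_oxygen, asp_oxygens):
    """Smallest distance (in Å) from a hydroxyl oxygen to any Asp side-chain oxygen."""
    return min(math.dist(site_oxygen, oxygen) for oxygen in asp_oxygens)


def is_candidate_complex(site_oxygen, asp_oxygens, cutoff=6.0):
    """Flag an interface as a candidate when the minimum distance is within the cutoff.

    The default cutoff is a hypothetical value, not one taken from the abstract.
    """
    return min_distance(site_oxygen, asp_oxygens) <= cutoff


# Hypothetical coordinates in Å: a tyrosine OH atom and the two Asp oxygens (OD1, OD2).
tyr_oh = (1.0, 2.0, 3.0)
asp_od = [(1.0, 2.0, 6.0), (4.0, 6.0, 3.0)]

print(min_distance(tyr_oh, asp_od))
print(is_candidate_complex(tyr_oh, asp_od))
```

      In a full analysis this check would be repeated for every serine, threonine, or tyrosine hydroxyl oxygen across every unique kinase/kinase interface in the crystal.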
    • The power of data: structural bioinformatics yesterday and today

      Tramontano, Anna (2016-01-25) [Presentation]
      The protein structure database was established in 1971. At the time it contained seven structures; today there are more than 100,000. The improvement is not only a matter of quantity, but also of quality. Did we effectively exploit this information to gain knowledge? The answer is certainly affirmative. I will illustrate how this wealth of experimental data has allowed us to explore the landscape of macromolecular structures on one side, and to uncover the properties of specific protein families on the other. The latter plays an essential role in pursuing exciting new avenues in biomedical and biotechnological sciences. Experimental data are also part of a virtuous cycle whereby they reinforce and guide our ability to infer unknown macromolecular structures, which, while providing relevant information to scientists, allows us to gauge the level of our understanding of the complex problem of protein folding. A paradigmatic example of the latter is the “Critical Assessment of Techniques for Protein Structure Prediction” (CASP) initiative, which I will briefly discuss.
    • The Glory and Misery of Electronic Health Records

      Smith, Barry (2016-01-27) [Presentation]
      While bioinformatics has witnessed enormous technological advances since the turn of the millennium, progress in the EHR field has been stymied by outdated approaches entrenched through ill-conceived government mandates. In the US, especially, the dominant EHR systems are expensive, difficult to use, fail to ensure even a minimal level of interoperability, and detract from patient care. I will outline the reasons for some of these failures, and sketch an evolutionary path towards the sort of EHR landscape that will be needed in the future, in which consistency with biomedical ontologies will play a central role.
    • The Genomic Code: Genome Evolution and Potential Applications

      Bernardi, Giorgio (2016-01-25) [Presentation]
      The genome of metazoans is organized according to a genomic code which comprises three laws: 1) Compositional correlations hold between contiguous coding and non-coding sequences, as well as among the three codon positions of protein-coding genes; these correlations are the consequence of the fact that the genomes under consideration consist of fairly homogeneous, long (≥200Kb) sequences, the isochores; 2) Although isochores are defined on the basis of purely compositional properties, GC levels of isochores are correlated with all tested structural and functional properties of the genome; 3) GC levels of isochores are correlated with chromosome architecture from interphase to metaphase; in the case of interphase the correlation concerns isochores and the three-dimensional “topological associated domains” (TADs); in the case of mitotic chromosomes, the correlation concerns isochores and chromosomal bands. Finally, the genomic code is the fourth and last pillar of molecular biology, the first three pillars being 1) the double helix structure of DNA; 2) the regulation of gene expression in prokaryotes; and 3) the genetic code.
    • Systems Biology for Mapping Genotype-Phenotype Relations in Yeast

      Nielsen, Jens (2016-01-25) [Presentation]
      The yeast Saccharomyces cerevisiae is widely used for production of fuels, chemicals, pharmaceuticals and materials. Through metabolic engineering of this yeast, a number of novel industrial processes have been developed over the last 10 years. Besides its wide industrial use, S. cerevisiae serves as a eukaryal model organism, and many systems biology tools have therefore been developed for this organism. Among these, genome-scale metabolic models have proven most successful, as they integrate easily with omics data and at the same time have been shown to have excellent predictive power. Despite our extensive knowledge of yeast metabolism and its regulation, we still face challenges when we want to engineer complex traits, such as improved tolerance to toxic metabolites like butanol and to elevated temperatures, or when we want to engineer the highly complex protein secretory pathway. In this presentation it will be demonstrated how we can combine directed evolution with systems biology analysis to identify novel targets for rational design-build-test of yeast strains that have improved phenotypic properties. An overview of systems biology of yeast will be presented, together with examples of how genome-scale metabolic modeling can be used for prediction of cellular growth under different conditions. Examples will also be given of how adaptive laboratory evolution can be used for identifying targets for improving tolerance towards butanol, increased temperature and low pH, and for improving secretion of heterologous proteins.
    • Protein phosphorylation in bacterial signaling and regulation

      Mijakovic, Ivan (2016-01-26) [Presentation]
      In 2003, it was demonstrated for the first time that bacteria possess protein-tyrosine kinases (BY-kinases), capable of phosphorylating other cellular proteins and regulating their activity. It soon became apparent that these kinases phosphorylate a number of protein substrates involved in different cellular processes. More recently, we found that BY-kinases can be activated by several distinct protein interactants, and are capable of engaging in cross-phosphorylation with other kinases. Evolutionary studies based on genome comparison indicate that BY-kinases exist only in bacteria. They are non-essential (present in about 40% of bacterial genomes), and their knockouts lead to pleiotropic phenotypes, since they phosphorylate many substrates. Surprisingly, BY-kinase genes accumulate mutations at an increased rate (their non-synonymous substitution rate is significantly higher than that of other bacterial genes). One direct consequence of this phenomenon is that there is no detectable co-evolution between kinases and their substrates. Their promiscuity towards substrates thus seems to be “hard-wired”, but why would bacteria maintain such promiscuous regulatory devices? One explanation is the maintenance of BY-kinases as rapidly evolving regulators, which can readily adopt new substrates when environmental changes impose selective pressure for quick evolution of new regulatory modules. Their role is clearly not to act as master regulators, dedicated to triggering a single response; rather, they might be employed to contribute to fine-tuning and improving robustness of various cellular responses. This unique feature makes BY-kinases a potentially useful tool in synthetic biology. While other bacterial kinases are very specific and their signaling pathways insulated, BY-kinases can relatively easily be engineered to adopt new substrates and control new biosynthetic processes. Since they are absent in humans, and regulate some key functions in pathogenic bacteria, they are also very promising targets for new antibacterial drugs.
    • Opening Remarks

      Bajic, Vladimir B. (2016-01-25) [Presentation]
    • Network-based analysis of proteomic profiles

      Wong, Limsoon (2016-01-26) [Presentation]
      Mass spectrometry (MS)-based proteomics is a widely used and powerful tool for profiling systems-wide protein expression changes. It can be applied for various purposes, e.g. biomarker discovery in diseases and the study of drug responses. Although RNA-based high-throughput methods have been useful in providing glimpses into the underlying molecular processes, the evidence they provide is indirect. Furthermore, RNA levels and the corresponding protein levels are known to correlate poorly. On the other hand, MS-based proteomics tends to have consistency issues (poor reproducibility and inter-sample agreement) and coverage issues (inability to detect the entire proteome) that need to be urgently addressed. In this talk, I will discuss how these issues can be addressed by proteomic profile analysis techniques that use biological networks (especially protein complexes) as the biological context. In particular, I will describe several techniques that we have been developing for network-based analysis of proteomic profiles. I will also present evidence that these techniques yield proteomic-profile analysis results that are more consistent, more reproducible, and more biologically coherent, and that they allow expansion of the detected proteome to uncover novel proteins.
    • Modeling structure of G protein-coupled receptors in human genome

      Zhang, Yang (2016-01-26) [Presentation]
      G protein-coupled receptors (GPCRs) are integral transmembrane proteins responsible for various cellular signal transductions. Human GPCR proteins are encoded by 5% of human genes but account for the targets of 40% of the FDA-approved drugs. Due to difficulties in crystallization, experimental structure determination remains extremely difficult for human GPCRs, which has been a major barrier in modern structure-based drug discovery. We proposed a new hybrid protocol, GPCR-I-TASSER, to construct GPCR structure models by integrating experimental mutagenesis data with ab initio transmembrane-helix assembly simulations, assisted by the predicted transmembrane-helix interaction networks. The method was tested in recent community-wide GPCRDock experiments and constructed models with a root mean square deviation of 1.26 Å for Dopamine-3 and 2.08 Å for Chemokine-4 receptors in the transmembrane domain regions, which were significantly closer to the native than the best templates available in the PDB. GPCR-I-TASSER has been applied to model all 1,026 putative GPCRs in the human genome, of which 923 are found to have correct folds based on the confidence score analysis and mutagenesis data comparison. The successfully modeled GPCRs include many pharmaceutically important families that do not have previously solved structures, including Trace amine, Prostanoids, Releasing hormones, Melanocortins, Vasopressin and Neuropeptide Y receptors. All the human GPCR models have been made publicly available through the GPCR-HGmod database at http://zhanglab.ccmb.med.umich.edu/GPCR-HGmod/. The results demonstrate new progress on genome-wide structure modeling of transmembrane proteins, which should have a useful impact on GPCR-targeted drug discovery.
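      The accuracy figures above are root mean square deviations (RMSD) between model and native atom positions, RMSD = sqrt((1/N) Σ ||a_i − b_i||²). The sketch below computes this quantity under a simplifying assumption: the two coordinate sets are already optimally superposed and listed in matching atom order. Structure comparisons normally perform that superposition first; the step is omitted here. The coordinates and function name are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (in Å) between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical coordinates in Å: every model atom is displaced by 1 Å along y.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
native = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]

print(rmsd(model, native))
```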
    • Molecular Genetic Diversity of Date (Phoenix dactylifera) Germplasm in Qatar based on Microsatellite Markers

      Ahmed, Talaat (2016-01-25) [Presentation]
      Studying the genetic diversity of date palm on the basis of morphological traits alone is very difficult, since morphological characteristics are highly affected by the environment. DNA markers are an excellent option that can enhance the discriminatory power of morphological characteristics. To study the genetic diversity among date palm cultivars grown in Qatar, fifteen date palm samples were collected from the Qatar University Experimental Farm. DNA was extracted from fresh leaves using the commercial DNeasy Plant System Kit (Qiagen, Inc., Valencia, CA). A total of 18 Inter Simple Sequence Repeat (ISSR) single primers were used to amplify DNA fragments from the genomic DNA of the 15 samples. A first screening was done to test the ability of these primers to amplify clear bands from date palm genomic DNA. All 18 ISSR primers successfully produced clear bands in the first screening. Then, each primer was used separately to genotype the whole set of 15 date palm samples. A total of 4794 bands were generated using the 18 ISSR primers for the 15 date palm samples. On average, each primer generated 400 bands. The number of amplified bands varied from cultivar to cultivar. The highest number of bands was obtained using Primers 2, 5 and 12 for the 15 samples (470 bands), while the lowest number of bands was obtained using Primers 1, 7 and 8, which produced only 329 bands. Markers were scored for the presence or absence of the corresponding band among the different cultivars. A similarity matrix was constructed from these scores, and the similarity values were used for cluster analysis.
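      The presence/absence scoring and similarity-matrix step can be sketched as below. The abstract does not say which similarity coefficient was used, so the Jaccard coefficient here is an assumption, and the cultivar names and band scores are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two binary band profiles (1 = band present)."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return shared / either if either else 1.0


def similarity_matrix(profiles):
    """Pairwise similarity for every pair of cultivars, as a nested dict."""
    names = list(profiles)
    return {m: {n: jaccard(profiles[m], profiles[n]) for n in names} for m in names}


# Hypothetical band scores for three cultivars across five marker positions.
profiles = {
    "cultivar_A": [1, 1, 0, 1, 0],
    "cultivar_B": [1, 0, 0, 1, 1],
    "cultivar_C": [0, 1, 1, 0, 0],
}

matrix = similarity_matrix(profiles)
for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```

      A matrix of this kind is the usual input to a hierarchical clustering method, which then groups cultivars by their shared band patterns.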
    • Machine learning and complex-network for personalized and systems biomedicine

      Cannistraci, Carlo Vittorio (2016-01-27) [Presentation]
      The talk will begin with an introduction on using machine learning to discover hidden information and unexpected patterns in large biomedical datasets. Then, recent results on the use of complex network theory in biomedicine and neuroscience will be discussed. In particular, metagenomics and metabolomics data, approaches for drug-target repositioning, functional/structural MR connectomes and gut-brain axis data will be presented. The conclusion will outline the novel and exciting perspectives offered by the translation of these methods from systems biology to systems medicine.
    • Knowledge-based analysis of phenotypes

      Hoendorf, Robert (2016-01-27) [Presentation]
      Phenotypes are the observable characteristics of an organism, and they are widely recorded in biology and medicine. To facilitate data integration, ontologies that formally describe phenotypes are being developed in several domains. I will describe a formal framework to describe phenotypes. A formalized theory of phenotypes is not only useful for domain analysis, but can also be applied to assist in the diagnosis of rare genetic diseases, and I will show how our results on the ontology of phenotypes are now applied in biomedical research.
    • Knowledge Exploration from Big Data in Biomedicine

      Bajic, Vladimir B. (2016-01-27) [Presentation]
      The last few decades have witnessed an enormous accumulation of data and information in various forms in the domain of biomedicine. Searching for accurate and rich information on any particular topic in this domain is challenging. The main reasons are that a) useful pieces of information are scattered across numerous sources, b) data are contained in a variety of formats, c) data and information are not indexed with standard identifiers, d) a lot of information is in free-text format, and e) frequently the information needed is not explicitly presented in any single data or information source. This situation requires new approaches to search for, extract and explore the desired information. We will present a system developed at KAUST that addresses some of these challenges. This system is representative of a technological solution to what can be named Next Generation Knowledge Mining Systems for the biomedical domain.
    • High-throughput comparisons and profiling of metagenomes for industrially relevant enzymes

      Alam, Intikhab (2016-01-26) [Presentation]
      More and more genomes and metagenomes are being sequenced since the advent of Next Generation Sequencing (NGS) technologies. Many metagenomic samples are collected from a variety of environments, each exhibiting a different environmental profile, e.g. temperature, environmental chemistry, etc. These metagenomes can be profiled to unearth enzymes relevant to several industries based on specific enzyme properties, such as the ability to work under extreme conditions (extreme temperatures, high salinity, anaerobic environments, etc.). In this work, we present the DMAP platform, comprising a high-throughput metagenomic annotation pipeline and a data warehouse for comparisons and profiling across a large number of metagenomes. We developed two reference databases for profiling of important genes, one containing enzymes related to different industries and the other containing genes with potential bioactivity roles. In this presentation we describe an example analysis of a large number of publicly available metagenomic samples from the TARA Oceans study (Science 2015), which covers a significant part of the world's oceans.
    • Function and Phenotype prediction through Data and Knowledge Fusion

      Vespoor, Karen (2016-01-27) [Presentation]
      The biomedical literature captures the most current biomedical knowledge and is a tremendously rich resource for research. With over 24 million publications currently indexed in the US National Library of Medicine’s PubMed index, however, it is becoming increasingly challenging for biomedical researchers to keep up with this literature. Automated strategies for extracting information from it are required. Large-scale processing of the literature enables direct biomedical knowledge discovery. In this presentation, I will introduce the use of text mining techniques to support analysis of biological data sets, and will specifically discuss applications in protein function and phenotype prediction, as well as analysis of genetic variants that are supported by analysis of the literature and integration with complementary structured resources.
    • Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

      Arold, Stefan T. (2016-01-25) [Presentation]
      Leucine-aspartic acid (LD) motifs are short helical protein-protein interaction motifs involved in cell motility, survival and communication. LD motif interactions are also implicated in cancer metastasis and are targeted by several viruses. LD motifs are notoriously difficult to detect because sequence pattern searches lead to an excessively high number of false positives. Hence, despite 20 years of research, only six LD motif–containing proteins are known in humans, three of which are close homologues of the paxillin family. To enable the proteome-wide discovery of LD motifs, we developed LD Motif Finder (LDMF), a web tool based on machine learning that combines sequence information with structural predictions to detect LD motifs with high accuracy. LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.
    • Emerging experimental and computational technologies for purpose designed engineering of photosynthetic prokaryotes

      Lindblad, Peter (2016-01-25) [Presentation]
      With recent advances in synthetic molecular tools for use in photosynthetic prokaryotes, like cyanobacteria, it is possible to custom design and construct microbial cells for specific metabolic functions. This cross-disciplinary area of research has emerged at the interfaces of advanced genetic engineering, computational science, and molecular biotechnology. We have initiated the development of a genetic toolbox, using a synthetic biology approach, to custom design, engineer and construct cyanobacteria for selected function and metabolism. One major bottleneck is the controlled transcription and translation of introduced genetic constructs. An additional major issue is genetic stability. I will present and discuss recent progress in our development of genetic tools for advanced cyanobacterial biotechnology. Progress on understanding the electron pathways in native and engineered cyanobacterial enzymes and heterologous expression of non-native enzymes in cyanobacterial cells will be highlighted. Finally, I will discuss our attempts to merge synthetic biology with synthetic chemistry to explore fundamental questions of protein design and function.
    • Diversity Indices as Measures of Functional Annotation Methods in Metagenomics Studies

      Jankovic, Boris R. (2016-01-26) [Presentation]
      Applications of high-throughput techniques in metagenomics studies produce massive amounts of data. Fragments of genomic, transcriptomic and proteomic molecules are all found in metagenomics samples. Laborious and meticulous effort in sequencing and functional annotation is then required to, amongst other objectives, reconstruct a taxonomic map of the environment that the metagenomics samples were taken from. In addition to the computational challenges faced by metagenomics studies, the analysis is further complicated by the presence of contaminants in the samples, potentially resulting in skewed taxonomic analysis. Functional annotation in metagenomics can utilize all available omics data, and therefore different methods that are each associated with a particular type of data. For example, protein-coding DNA, non-coding RNA or ribosomal RNA data can be used in such an analysis. These methods have their advantages and disadvantages, and the question of how to compare them naturally arises. There are several criteria that can be used when performing such a comparison. Loosely speaking, methods can be evaluated in terms of computational complexity or in terms of the expected biological accuracy. We propose that the concept of diversity that is used in ecosystem and species diversity studies can be successfully used in evaluating certain aspects of the methods employed in metagenomics studies. We show that, when applying the concept of Hill’s diversity, the analysis of variations in the diversity order provides valuable clues to the robustness of methods used in taxonomic analysis.
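      As background for the diversity-order analysis mentioned above, the following is a minimal sketch of the standard Hill number of order q, D_q = (Σ p_i^q)^(1/(1−q)), with the q = 1 case taken as the limit exp(−Σ p_i ln p_i). The taxon counts and function name are hypothetical, and the sketch does not reproduce the authors' evaluation procedure.

```python
import math


def hill_number(abundances, q):
    """Hill diversity of order q for a list of non-negative taxon abundances."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    if math.isclose(q, 1.0):
        # Limit as q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in proportions))
    return sum(p ** q for p in proportions) ** (1.0 / (1.0 - q))


# Hypothetical read counts assigned to four taxa by one annotation method.
counts = [50, 30, 15, 5]

for order in (0, 1, 2):
    print(order, round(hill_number(counts, order), 3))
```

      Order 0 counts the taxa present, while higher orders weight abundant taxa more heavily, so comparing the profile across orders shows how sensitive a method's taxonomic output is to rare assignments.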