We are all witnessing an explosion in the volume of biological data generated by the latest high-throughput technologies, and the rate of increase is likely to grow even further in the future. The sheer volume, complexity, and interdependence of these data pose analytical challenges. What should we make of the data? How should we analyze them? What knowledge lies buried beneath this complexity? These are just some of the questions that the contemporary life sciences in general, and bioinformatics in particular, must grapple with in the quest to improve lives through knowledge discovery.
Challenges arising from the size and complexity of data sets are not unique to the life sciences; similar challenges arise in high-energy physics, climate science, astrophysics, and national security. What these fields have in common is that analyzing such data requires new approaches within the paradigm of Big Data and exascale computing. In this seminar, some of the state-of-the-art approaches to Big Data challenges will be discussed. The seminar presents an opportunity for those who generate data and those who analyze it to discuss possible ways forward towards more efficient analysis, knowledge discovery, and modeling.
Conference web site: http://www.cbrc.kaust.edu.sa/cbrcweb/sp/bd2016.php
(2016-01-25) Gao, Xin; Computational Bioscience Research Center (CBRC); Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
The classical assumption that one drug cures a single disease by binding to a single drug-target has been shown to be inaccurate. Recent studies estimate that each drug on average binds to at least six known and several unknown targets. Identifying the “off-targets” can help understand the side effects and toxicity of the drug. Moreover, off-targets for a given drug may inspire “drug repositioning”, where a drug already approved for one condition is redirected to treat another condition, thereby overcoming delays and costs associated with clinical trials and drug approval.
In this talk, I will introduce our work along this direction. We have developed a structural alignment method that can precisely identify structural similarities between arbitrary types of interaction interfaces, such as the drug-target interface. We have further developed a novel computational framework, iDTP, that constructs the structural signatures of approved and experimental drugs and, based on these signatures, predicts new targets for those drugs. Our method combines information from several sources, including sequence-independent structural alignment, sequence similarity, drug-target tissue expression data, and text mining.
In a cross-validation study, we used iDTP to predict the known targets of 11 drugs with 63% sensitivity and 81% specificity. We then predicted novel targets for these drugs. Two of high pharmacological interest, the peroxisome proliferator-activated receptor gamma and the oncogene B-cell lymphoma 2, were successfully validated through in vitro binding experiments.
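As a reminder of how the two reported metrics are defined, the sketch below computes sensitivity and specificity from a confusion matrix. The counts used here are purely illustrative assumptions chosen to reproduce the quoted percentages; the study's actual numbers of target and non-target pairs are not given in the abstract.

```python
def sensitivity(tp, fn):
    """True-positive rate: fraction of known targets that were recovered."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate: fraction of non-targets correctly rejected."""
    return tn / (tn + fp)

# Hypothetical counts for illustration only (not the study's actual data):
# 12 of 19 known drug-target pairs recovered, 81 of 100 decoy pairs rejected.
print(round(sensitivity(12, 7), 2))   # 0.63
print(round(specificity(81, 19), 2))  # 0.81
```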
(2016-01-26) Dunbrack, Roland; Fox Chase Cancer Center, Philadelphia, PA, USA
Protein kinase autophosphorylation is a common regulatory mechanism in cell signaling pathways. Several autophosphorylation complexes have been identified in crystals of protein kinases, with a known serine, threonine, or tyrosine autophosphorylation site of one kinase monomer sitting in the active site of another monomer of the same protein in the crystal.
We utilized a structural bioinformatics method to identify all such autophosphorylation complexes in X-ray crystallographic structures in the Protein Data Bank (PDB). We generated all unique kinase/kinase interfaces within and between the asymmetric units of each crystal, and measured the distance between the hydroxyl oxygen of each potential autophosphorylation site and the oxygen atoms of the active-site aspartic acid side chain.
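The core geometric filter described above can be sketched in a few lines: given the hydroxyl oxygen of a candidate Ser/Thr/Tyr site and the two carboxylate oxygens of the catalytic aspartate, flag the pair if any oxygen-oxygen distance falls under a cutoff. This is a minimal illustration; the function name and the 6.5 Å cutoff are assumptions for demonstration, not values taken from the study.

```python
import math

def dist(a, b):
    """Euclidean distance between two 3-D coordinates (in angstroms)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_candidate_site(hydroxyl_o, asp_oxygens, cutoff=6.5):
    """Flag a hydroxyl oxygen that sits close enough to either carboxylate
    oxygen (OD1/OD2) of the active-site aspartate in the partner monomer.
    The 6.5 A cutoff is an illustrative assumption, not the study's value."""
    return any(dist(hydroxyl_o, o) <= cutoff for o in asp_oxygens)

# Toy coordinates: the first aspartate oxygen is 1.0 A away, so this passes.
print(is_candidate_site((1.0, 2.0, 3.0), [(2.0, 2.0, 3.0), (9.0, 9.0, 9.0)]))
```

In practice the coordinates would be read from PDB files and the interface enumeration would cover crystallographic symmetry mates as well as chains within one asymmetric unit.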
We have identified 15 unique autophosphorylation complexes in the PDB, of which 5 complexes have not previously been described in the relevant publications on the crystal structures (N-terminal juxtamembrane regions of CSF1R and EPHA2, activation loop tyrosines of LCK and IGF1R, and a serine in a nuclear localization signal region of CLK2). Mutation of residues in the autophosphorylation complex interface of LCK either severely impaired autophosphorylation or increased it.
Taking the autophosphorylation complexes as a whole and comparing them with peptide-substrate/kinase complexes, we observe a number of important features among them. The novel and previously observed autophosphorylation sites are conserved in many kinases, indicating that by homology we can extend the relevance of these complexes to many other clinically relevant drug targets.
The Protein Data Bank, the repository of protein structures, was established in 1971. At the time it contained seven structures; today there are more than 100,000. The improvement is not only a matter of quantity but also of quality. Have we effectively exploited this information to gain knowledge? The answer is certainly affirmative.
I will illustrate how this wealth of experimental data has allowed us to explore the landscape of macromolecular structures on one side, and to uncover the properties of specific protein families on the other. The latter plays an essential role in pursuing exciting new avenues in biomedical and biotechnological sciences.
Experimental data are also part of a virtuous cycle whereby they reinforce and guide our ability to infer unknown macromolecular structures, which, while providing relevant information to scientists, allows us to gauge our understanding of the complex problem of protein folding. A paradigmatic example of the latter is the "Critical Assessment of Techniques for Protein Structure Prediction" (CASP) initiative, which I will briefly discuss.
(2016-01-27) Smith, Barry; The State University of New York at Buffalo
While bioinformatics has witnessed enormous technological advances since the turn of the millennium, progress in the electronic health record (EHR) field has been stymied by outdated approaches entrenched through ill-conceived government mandates. In the US especially, the dominant EHR systems are expensive, difficult to use, fail to ensure even a minimal level of interoperability, and detract from patient care.
I will outline the reasons for some of these failures, and sketch an evolutionary path towards the sort of EHR landscape that will be needed in the future, in which consistency with biomedical ontologies will play a central role.