The classical assumption that a drug cures a single disease by binding to a single target has been shown to be inaccurate. Recent studies estimate that each drug binds, on average, to at least six known and several unknown targets. Identifying these “off-targets” can help explain a drug's side effects and toxicity. Moreover, off-targets for a given drug may inspire “drug repositioning”, in which a drug already approved for one condition is redirected to treat another, thereby avoiding the delays and costs associated with clinical trials and drug approval.
In this talk, I will introduce our work in this direction. We have developed a structural alignment method that precisely identifies structural similarities between arbitrary types of interaction interfaces, such as drug-target interfaces. We have further developed a novel computational framework, iDTP, which constructs the structural signatures of approved and experimental drugs, based on which we predict new targets for these drugs. Our method combines information from several sources, including sequence-independent structural alignment, sequence similarity, drug-target tissue expression data, and text mining.
In a cross-validation study, we used iDTP to predict the known targets of 11 drugs, with 63% sensitivity and 81% specificity. We then predicted novel targets for these drugs; two of high pharmacological interest, the peroxisome proliferator-activated receptor gamma and the oncogene B-cell lymphoma 2, were successfully validated through in vitro binding experiments.
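As a reminder of how the two evaluation figures above are defined, the following minimal sketch computes sensitivity and specificity from cross-validation counts. The counts themselves are hypothetical illustrations, not iDTP's actual confusion matrix:

```python
# Illustrative only: how sensitivity and specificity are computed from
# cross-validation counts. All counts below are hypothetical examples.

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true drug-target pairs correctly predicted: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of non-targets correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical example: 19 of 30 known pairs recovered, 81 of 100 decoys rejected
print(f"sensitivity = {sensitivity(19, 11):.2f}")  # 0.63
print(f"specificity = {specificity(81, 19):.2f}")  # 0.81
```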
The last few decades have witnessed an enormous accumulation of data and information, in various forms, in the domain of biomedicine. Searching for accurate and rich information on any particular topic in this domain is challenging. The main reasons are that a) useful pieces of information are scattered across numerous sources, b) data are stored in a variety of formats, c) data and information are not indexed with standard identifiers, d) much of the information exists only as free text, and e) frequently the information needed is not explicitly present in any single source.
This situation calls for new approaches to searching for, extracting, and exploring the desired information. We will present a system developed at KAUST that addresses some of these challenges. This system represents a technological solution to what can be called Next Generation Knowledge Mining Systems for the biomedical domain.
Ever more genomes and metagenomes are being sequenced since the advent of next-generation sequencing (NGS) technologies. Many metagenomic samples are collected from a variety of environments, each with a different environmental profile, e.g., temperature and environmental chemistry. These metagenomes can be profiled to unearth enzymes relevant to several industries, based on specific enzyme properties such as the ability to function under extreme conditions, e.g., extreme temperatures, high salinity, or anaerobic environments.
In this work, we present the DMAP platform, comprising a high-throughput metagenomic annotation pipeline and a data warehouse for comparisons and profiling across large numbers of metagenomes. We developed two reference databases for the profiling of important genes: one containing enzymes relevant to different industries, and the other containing genes with potential bioactivity roles.
In this presentation, we describe an example analysis of a large number of publicly available metagenomic samples from the Tara Oceans study (Science, 2015), which covers a significant part of the world's oceans.
Leucine-aspartic acid (LD) motifs are short helical protein-protein interaction motifs involved in cell motility, survival and communication. LD motif interactions are also implicated in cancer metastasis and are targeted by several viruses.
LD motifs are notoriously difficult to detect because sequence pattern searches lead to an excessively high number of false positives. Hence, despite 20 years of research, only six LD motif–containing proteins are known in humans, three of which are close homologues of the paxillin family. To enable the proteome-wide discovery of LD motifs, we developed LD Motif Finder (LDMF), a web tool based on machine learning that combines sequence information with structural predictions to detect LD motifs with high accuracy.
LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.
Applications of high-throughput techniques in metagenomics studies produce massive amounts of data. Fragments of genomic, transcriptomic, and proteomic molecules are all found in metagenomic samples. Laborious and meticulous effort in sequencing and functional annotation is then required to, among other objectives, reconstruct a taxonomic map of the environment from which the metagenomic samples were taken. In addition to the computational challenges faced by metagenomics studies, the analysis is further complicated by the presence of contaminants in the samples, which can skew the taxonomic analysis.
Functional annotation in metagenomics can utilize all available omics data, and different methods are therefore associated with particular types of data; for example, protein-coding DNA, non-coding RNA, or ribosomal RNA data can each be used in such an analysis. Each method has its advantages and disadvantages, and the question of how to compare them naturally arises. Several criteria can be used in such a comparison: loosely speaking, methods can be evaluated in terms of computational complexity or in terms of expected biological accuracy.
We propose that the concept of diversity used in ecosystem and species diversity studies can be successfully applied to evaluating certain aspects of the methods employed in metagenomics studies. We show that, when applying the concept of Hill's diversity, the analysis of variation in the diversity order provides valuable clues about the robustness of the methods used in taxonomic analysis.
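The abstract does not spell out the formula; for reference, the Hill number of order q for a relative-abundance vector p is (Σ p_i^q)^(1/(1-q)), with the q→1 limit equal to the exponential of the Shannon entropy. A minimal sketch, using a hypothetical taxonomic profile:

```python
import math

def hill_diversity(counts, q):
    """Hill number of order q for a vector of taxon counts.
    q=0: species richness; q->1: exp(Shannon entropy); q=2: inverse Simpson.
    Varying q shifts the weight given to rare versus abundant taxa."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if abs(q - 1.0) < 1e-9:
        # q=1 is defined by the limit: exp of the Shannon entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

# Hypothetical taxonomic profile (read counts per taxon) from one method
profile = [50, 30, 15, 5]
for q in (0, 1, 2):
    print(f"q={q}: D = {hill_diversity(profile, q):.2f}")
```

Comparing how D changes with q across annotation methods applied to the same sample is one way to probe the robustness the abstract refers to.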
Metagenomics produces tremendous amounts of data from the organisms living in an environment. These big data enable us to examine not only microbial genes but also community structure, interactions, and adaptation mechanisms at a specific location and under specific conditions. The Red Sea has several unique characteristics, such as high salinity, high temperature, and low nutrient levels. These features have likely contributed to the formation of its unique microbial community over the course of evolution.
In 2014, we started monthly sampling of metagenomes in the Red Sea under the KAUST-CCF project. In collaboration with Kitasato University, we also collected metagenome data from the ocean around Japan, which shows contrasting features to the Red Sea. Comparative metagenomics of these data therefore provides a comprehensive view of Red Sea microbes, helping to identify key microbes, genes, and networks related to these environmental differences.
Since the draft human genome sequence was first made public in 2000, genomic analyses have been extended intensively to the population level. The following three international projects are good examples of large-scale studies of human genome variation:
1) HapMap data (1,417 individuals)
2) HGDP (Human Genome Diversity Project) data (940 individuals)
3) 1000 Genomes Project data (2,504 individuals)
If we can integrate all three data sets into a single volume of data, we should be able to conduct a more detailed analysis of human genome variation across a total of 4,861 individuals (= 1,417 + 940 + 2,504). In fact, we successfully integrated these three data sets by using information on the reference human genome sequence, and we conducted the big data analysis. In particular, we constructed a phylogenetic tree of about 5,000 human individuals at the genome level. As a result, we were able to identify clusters of ethnic groups, with detectable admixture, that could not be identified by analyzing each of the three data sets separately.
Here, we report the outcome of these big data analyses and discuss the evolutionary significance of human genomic variation. Note that the present study was conducted in collaboration with Katsuhiko Mineta and Kosuke Goto at KAUST.