Differentiating isoform functions with collaborative matrix factorization.
KAUST Department: Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Machine Intelligence & kNowledge Engineering Lab
Embargo End Date: 2021-03-17
Permanent link to this record: http://hdl.handle.net/10754/662248
Abstract: MOTIVATION: Isoforms are alternatively spliced mRNAs of genes. They can be translated into different functional proteoforms, and thus greatly increase the functional diversity of protein variants (or proteoforms). Differentiating the functions of isoforms (or proteoforms) helps in understanding the underlying pathology of various complex diseases at a finer granularity. Because existing functional genomic databases uniformly record annotations at the gene level, and rarely at the isoform level, differentiating isoform functions is more challenging than traditional gene-level function prediction.
RESULTS: Several approaches have been proposed to differentiate the functions of isoforms. They generally follow the multi-instance learning paradigm, viewing each gene as a bag and its spliced isoforms as instances, and push the functions of bags onto instances. These approaches implicitly assume that the collected annotations of genes are complete, and they integrate only multiple RNA-seq datasets. As a result, their performance is compromised. We propose a data integrative solution (called DisoFun) to Differentiate isoform Functions with collaborative matrix factorization. DisoFun assumes that the functional annotations of genes are aggregated from those of key isoforms. It collaboratively factorizes the isoform data matrix and the gene-term data matrix (storing Gene Ontology annotations of genes) into low-rank matrices to simultaneously explore the latent key isoforms and achieve function prediction by aggregating predictions to their originating genes. In addition, it leverages the PPI network and the Gene Ontology structure to further coordinate the matrix factorization. Extensive experimental results show that DisoFun improves the area under the receiver operating characteristic curve and the area under the precision-recall curve of existing solutions by at least 7.7% and 28.9%, respectively. We further investigated DisoFun on four exemplar genes (LMNA, ADAM15, BCL2L1 and CFLAR) with known functions at the isoform level, and observed that DisoFun can differentiate the functions of their isoforms with 90.5% accuracy.
AVAILABILITY AND IMPLEMENTATION: The code of DisoFun is available at mlda.swu.edu.cn/codes.php?name=DisoFun.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
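The idea of jointly factorizing an isoform data matrix and a gene-term matrix can be illustrated with a minimal sketch. This is not the DisoFun objective or implementation: the squared-error loss, the mean aggregation of isoform scores per gene, the plain gradient-descent updates, the function name `joint_factorize`, and all hyperparameters and toy data below are assumptions chosen for illustration, and the PPI-network and Gene Ontology terms mentioned in the abstract are omitted.

```python
import numpy as np

def joint_factorize(X, Y, M, rank=4, alpha=1.0, lam=0.01, lr=0.01, iters=500, seed=0):
    """Illustrative joint factorization (an assumption, not the published method).

    X: (n_isoforms, n_features) isoform data matrix, approximated by U @ V.T
    Y: (n_genes, n_terms) gene-term annotation matrix, approximated by M @ U @ W.T
    M: (n_genes, n_isoforms) row-normalized gene-to-isoform membership,
       so gene scores are the mean of their isoforms' scores (assumed aggregation)
    """
    rng = np.random.default_rng(seed)
    n_iso, n_feat = X.shape
    n_terms = Y.shape[1]
    U = 0.1 * rng.standard_normal((n_iso, rank))    # shared isoform factors
    V = 0.1 * rng.standard_normal((n_feat, rank))   # feature factors
    W = 0.1 * rng.standard_normal((n_terms, rank))  # term factors
    losses = []
    for _ in range(iters):
        Rx = U @ V.T - X        # residual on the isoform data matrix
        Ry = M @ U @ W.T - Y    # residual on the gene-level annotations
        losses.append(np.sum(Rx**2) + alpha * np.sum(Ry**2)
                      + lam * (np.sum(U**2) + np.sum(V**2) + np.sum(W**2)))
        # Gradients of the objective with respect to each factor matrix.
        gU = 2 * Rx @ V + 2 * alpha * M.T @ Ry @ W + 2 * lam * U
        gV = 2 * Rx.T @ U + 2 * lam * V
        gW = 2 * alpha * Ry.T @ (M @ U) + 2 * lam * W
        U -= lr * gU
        V -= lr * gV
        W -= lr * gW
    return U, V, W, losses

# Toy data (hypothetical): 6 isoforms, 5 features, 3 genes, 4 terms.
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 5))
gene_of_isoform = np.array([0, 0, 1, 1, 1, 2])
M = np.zeros((3, 6))
M[gene_of_isoform, np.arange(6)] = 1.0
M /= M.sum(axis=1, keepdims=True)       # mean over each gene's isoforms
Y = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 1]], dtype=float)

U, V, W, losses = joint_factorize(X, Y, M)
isoform_scores = U @ W.T                # isoform-level term scores
gene_scores = M @ isoform_scores        # aggregated back to genes
```

Because `U` is shared between the two reconstructions, gene-level annotations in `Y` shape the isoform factors even though no isoform-level labels are supplied, which is the intuition behind pushing gene annotations down to isoforms.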
Citation: Wang, K., Wang, J., Domeniconi, C., Zhang, X., & Yu, G. (2019). Differentiating isoform functions with collaborative matrix factorization. Bioinformatics. doi:10.1093/bioinformatics/btz847
Sponsors: This work was supported by National Natural Science Foundation of China [61872300, 61873214]; Fundamental Research Funds for the Central Universities [XDJK2019B024]; and Natural Science Foundation of CQ CSTC [cstc2018jcyjAX0228].
Publisher: Oxford University Press (OUP)
Journal: Bioinformatics (Oxford, England)