Attributed heterogeneous network fusion via collaborative matrix tri-factorization
KAUST Department: Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Machine Intelligence & kNowledge Engineering Lab
Online Publication Date: 2020-06-26
Print Publication Date: 2020-11
Embargo End Date: 2022-07-02
Permanent link to this record: http://hdl.handle.net/10754/664146
Abstract: Heterogeneous network based data fusion can encode diverse inter- and intra-relations between objects, and has attracted increasing attention in recent years. Matrix factorization based data fusion models have been proposed to fuse multiple data sources. However, these models generally suffer from two problems: the observed relations between nodes are often insufficient, and information is lost when heterogeneous attributes of diverse network nodes are transformed into ad-hoc homologous networks for fusion. In this paper, we introduce a general data fusion model called Attributed Heterogeneous Network Fusion (AHNF). AHNF first constructs an attributed heterogeneous network composed of different types of nodes and the diverse attribute vectors of these nodes. It uses indicator matrices to differentiate the observed inter-relations from the latent ones, and thus reduces the impact of insufficient relations between nodes. Next, it collaboratively factorizes multiple adjacency matrices and attribute data matrices of the heterogeneous network into low-rank matrices to explore the latent relations between these nodes. In this way, both the network topology and the diverse attributes of nodes are fused in a coordinated fashion. Finally, it uses the optimized low-rank matrices to approximate the target relational data matrix of objects and thereby performs relation prediction. We apply AHNF to predict lncRNA-disease associations using diverse relational and attribute data sources. AHNF achieves a larger area under the receiver operating characteristic curve (0.9367, higher by at least 2.14%) and a larger area under the precision-recall curve (0.5937, higher by at least 28.53%) than competitive data fusion approaches. AHNF also outperforms competing methods in predicting de novo lncRNA-disease associations, and precisely identifies lncRNAs associated with breast, stomach, prostate, and pancreatic cancers.
AHNF is a comprehensive data fusion framework for general attributed multi-type relational data. The code and datasets are available at http://mlda.swu.edu.cn/codes.php?name=AHNF.
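
The idea in the abstract of combining an indicator matrix with a collaborative factorization of a relation matrix and an attribute matrix can be illustrated with a small sketch. This is not the authors' implementation or their optimization procedure. It assumes a single relation matrix R approximated as G1 S G2^T, a single attribute matrix X approximated as G1 U^T (sharing the factor G1), a binary indicator matrix W marking observed entries of R, and plain gradient descent with step halving. All names (`fit`, `masked_loss`, `alpha`, `k1`, `k2`) and the synthetic data are hypothetical.

```python
import numpy as np

def masked_loss(R, W, X, G1, S, G2, U, alpha):
    # Squared error on observed relation entries only (W is a 0/1 indicator),
    # plus a weighted reconstruction error on the node attributes.
    rel = W * (R - G1 @ S @ G2.T)
    att = X - G1 @ U.T
    return float(np.sum(rel ** 2) + alpha * np.sum(att ** 2))

def fit(R, W, X, k1=3, k2=3, alpha=0.1, iters=300, lr=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    n, m = R.shape
    d = X.shape[1]
    # Small random initial low-rank factors (an assumption of this sketch).
    G1 = 0.1 * rng.standard_normal((n, k1))
    S = 0.1 * rng.standard_normal((k1, k2))
    G2 = 0.1 * rng.standard_normal((m, k2))
    U = 0.1 * rng.standard_normal((d, k1))
    loss = masked_loss(R, W, X, G1, S, G2, U, alpha)
    history = [loss]
    for _ in range(iters):
        E = W * (G1 @ S @ G2.T - R)   # masked relation residual
        A = G1 @ U.T - X              # attribute residual
        # Gradients of the loss with respect to each factor.
        gG1 = 2 * (E @ G2 @ S.T + alpha * A @ U)
        gS = 2 * (G1.T @ E @ G2)
        gG2 = 2 * (E.T @ G1 @ S)
        gU = 2 * alpha * (A.T @ G1)
        cand = (G1 - lr * gG1, S - lr * gS, G2 - lr * gG2, U - lr * gU)
        new_loss = masked_loss(R, W, X, *cand, alpha)
        if new_loss < loss:
            # Accept the step only if the objective decreased.
            G1, S, G2, U = cand
            loss = new_loss
            history.append(loss)
        else:
            # Otherwise shrink the step size and retry on the next iteration.
            lr *= 0.5
    return G1, S, G2, history

# Synthetic toy data: 8 nodes of one type, 6 of another, 4 attributes.
rng = np.random.default_rng(1)
R = (rng.random((8, 6)) < 0.3).astype(float)   # relation matrix
W = (rng.random((8, 6)) < 0.7).astype(float)   # 1 = observed, 0 = unobserved
X = rng.random((8, 4))                         # attributes of the first node type

G1, S, G2, history = fit(R, W, X)
scores = G1 @ S @ G2.T   # dense approximation used to rank candidate relations
```

Entries of `scores` at positions where `W` is 0 play the role of predictions for unobserved relations. Because `G1` appears in both reconstruction terms, the attribute data influences the relation scores, which is the sense in which the two factorizations are coupled in this sketch.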
Citation: Yu, G., Wang, Y., Wang, J., Domeniconi, C., Guo, M., & Zhang, X. (2020). Attributed heterogeneous network fusion via collaborative matrix tri-factorization. Information Fusion, 63, 153–165. doi:10.1016/j.inffus.2020.06.012
Sponsors: This work is supported by the Natural Science Foundation of China (61872300 and 61873214).