Attributed heterogeneous network fusion via collaborative matrix tri-factorization
Name:
AHNF.pdf
Size:
656.3Kb
Format:
PDF
Description:
Accepted manuscript
Embargo End Date:
2022-07-02
Type
Article
KAUST Department
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Machine Intelligence & kNowledge Engineering Lab
Date
2020-06-26
Online Publication Date
2020-06-26
Print Publication Date
2020-11
Embargo End Date
2022-07-02
Submitted Date
2019-07-17
Permanent link to this record
http://hdl.handle.net/10754/664146
Abstract
Heterogeneous network based data fusion can encode diverse inter- and intra-relations between objects and has attracted increasing attention in recent years. Matrix factorization based data fusion models have been proposed to fuse multiple data sources. However, these models generally suffer from two problems: the relations observed between nodes are often insufficient, and information is lost when heterogeneous attributes of diverse network nodes are transformed into ad-hoc homologous networks for fusion. In this paper, we introduce a general data fusion model called Attributed Heterogeneous Network Fusion (AHNF). AHNF first constructs an attributed heterogeneous network composed of different types of nodes and the diverse attribute vectors of these nodes. It uses indicator matrices to differentiate the observed inter-relations from the latent ones, and thus reduces the impact of insufficient relations between nodes. Next, it collaboratively factorizes multiple adjacency matrices and attribute data matrices of the heterogeneous network into low-rank matrices to explore the latent relations between these nodes. In this way, both the network topology and diverse attributes of nodes are fused in a coordinated fashion. Finally, it uses the optimized low-rank matrices to approximate the target relational data matrix of objects and thereby predict relations. We apply AHNF to predict lncRNA-disease associations using diverse relational and attribute data sources. AHNF achieves a larger area under the receiver operating characteristic curve (0.9367, higher by at least 2.14%) and a larger area under the precision-recall curve (0.5937, higher by at least 28.53%) than competitive data fusion approaches. AHNF also outperforms competing methods on predicting de novo lncRNA-disease associations, and precisely identifies lncRNAs associated with breast, stomach, prostate, and pancreatic cancers.
AHNF is a comprehensive data fusion framework for universal attributed multi-type relational data. The code and datasets are available at http://mlda.swu.edu.cn/codes.php?name=AHNF.
Citation
Yu, G., Wang, Y., Wang, J., Domeniconi, C., Guo, M., & Zhang, X. (2020). Attributed heterogeneous network fusion via collaborative matrix tri-factorization. Information Fusion, 63, 153–165. doi:10.1016/j.inffus.2020.06.012
Sponsors
This work is supported by Natural Science Foundation of China (61872300 and 61873214).
Publisher
Elsevier BV
Journal
Information Fusion
Additional Links
https://linkinghub.elsevier.com/retrieve/pii/S1566253520303079
DOI
10.1016/j.inffus.2020.06.012
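
Illustrative sketch
The abstract describes masking unobserved relations with an indicator matrix and factorizing a relation matrix together with node attributes into shared low-rank factors. The NumPy sketch below is one minimal way such a scheme could look; it is not the authors' implementation. The objective, the multiplicative update rules, the weight `alpha`, the ranks, and all names (`masked_tri_factorization`, `G1`, `S`, `G2`, `V`) are assumptions made for illustration, and the data are synthetic. It minimizes ||W ∘ (R − G1 S G2ᵀ)||² + alpha ||X − G1 Vᵀ||² over nonnegative factors, where W is 1 for observed entries of R and 0 otherwise.

```python
import numpy as np


def masked_tri_factorization(R, W, X, k1=4, k2=3, alpha=0.1,
                             n_iter=200, eps=1e-9, seed=0):
    """Hypothetical sketch: masked nonnegative tri-factorization of R,
    sharing the row factor G1 with an attribute matrix X (X ~ G1 V^T)."""
    rng = np.random.default_rng(seed)
    n1, n2 = R.shape
    G1 = rng.random((n1, k1))          # row-node factors
    G2 = rng.random((n2, k2))          # column-node factors
    S = rng.random((k1, k2))           # interaction between the two factor spaces
    V = rng.random((X.shape[1], k1))   # attribute loadings
    WR = W * R                         # only observed entries contribute
    history = []
    for _ in range(n_iter):
        # Update G1: relation term plus attribute term (shared factor).
        R_hat = W * (G1 @ S @ G2.T)
        H = G2 @ S.T
        G1 *= (WR @ H + alpha * X @ V) / (R_hat @ H + alpha * G1 @ (V.T @ V) + eps)
        # Update S with G1 and G2 fixed.
        R_hat = W * (G1 @ S @ G2.T)
        S *= (G1.T @ WR @ G2) / (G1.T @ R_hat @ G2 + eps)
        # Update G2 with G1 and S fixed.
        R_hat = W * (G1 @ S @ G2.T)
        A = G1 @ S
        G2 *= (WR.T @ A) / (R_hat.T @ A + eps)
        # Update the attribute loadings V.
        V *= (X.T @ G1) / (V @ (G1.T @ G1) + eps)
        loss = (np.sum((W * (R - G1 @ S @ G2.T)) ** 2)
                + alpha * np.sum((X - G1 @ V.T) ** 2))
        history.append(loss)
    return G1, S, G2, history


# Synthetic example: 20 row nodes, 15 column nodes, 6 attributes per row node.
rng = np.random.default_rng(1)
R = rng.random((20, 4)) @ rng.random((4, 15))      # nonnegative low-rank relations
W = (rng.random(R.shape) < 0.7).astype(float)      # about 70% of entries observed
X = rng.random((20, 6))                            # nonnegative row-node attributes

G1, S, G2, history = masked_tri_factorization(R, W, X)
R_pred = G1 @ S @ G2.T   # scores for all pairs, including unobserved ones
```

Entries of `R_pred` where `W` is 0 serve as scores for unobserved pairs, which is the role the abstract assigns to the approximated target relation matrix. A full model in the spirit of AHNF would factorize several relation and attribute matrices jointly; this sketch uses one of each to keep the idea visible.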