Self-Supervised Multi-Channel Hypergraph Convolutional Network for Social Recommendation
(arXiv:2101.06448v4 [cs.IR], 27 Feb 2022)

Social relations are often used to improve recommendation quality when user-item interaction data is sparse in recommender systems. Most existing social recommendation models exploit pairwise relations to mine potential user preferences. However, real-life interactions among users are very complicated and user relations can be high-order. Hypergraph provides a natural way to model complex high-order relations, while its potentials for improving social recommendation are under-explored. In this paper, we fill this gap and propose a multi-channel hypergraph convolutional network to enhance social recommendation by leveraging high-order user relations. Technically, each channel in the network encodes a hypergraph that depicts a common high-order user relation pattern via hypergraph convolution. By aggregating the embeddings learned through multiple channels, we obtain comprehensive user representations to generate recommendation results. However, the aggregation operation might also obscure the inherent characteristics of different types of high-order connectivity information. To compensate for the aggregating loss, we innovatively integrate self-supervised learning into the training of the hypergraph convolutional network to regain the connectivity information with hierarchical mutual information maximization. The experimental results on multiple real-world datasets show that the proposed model outperforms the SOTA methods, and the ablation study verifies the effectiveness of the multi-channel setting and the self-supervised task. The implementation of our model is available via https://github.com/Coder-Yu/RecQ.


INTRODUCTION
Over the past decade, the social media boom has dramatically changed people's ways of thinking and behaving. It has been revealed that people may alter their attitudes and behaviors in response to what they perceive their friends might do or think, which is known as social influence [7]. Meanwhile, there are also studies [25] showing that people tend to build connections with others who have similar preferences, which is called homophily. Based on these findings, social relations are often integrated into recommender systems to mitigate the data sparsity issue [13,33]. Generally, in a social recommender system, if a user has few interactions with items, the system relies on her friends' interactions to infer her preference and generate better recommendations. Upon this paradigm, a large number of social recommendation models have been developed [12,21,23,55,57,61] and have shown stronger performance compared with general recommendation models.
Recently, graph neural networks (GNNs) [43] have achieved great success in a wide range of areas. Owing to their powerful capability in modeling relational data, GNNs-based models have also shown prominent performance in social recommendation [9,19,40-42,58]. However, a key limitation of these GNNs-based social recommendation models is that they only exploit simple pairwise user relations and ignore the ubiquitous high-order relations among users. Although the long-range dependencies of relations (i.e. the transitivity of friendship), which are also considered high-order, can be captured by using k graph neural layers to incorporate features from k-hop social neighbors, these GNNs-based models are unable to formulate and capture the complex high-order user relation patterns (as shown in Fig. 1) beyond pairwise relations. For example, it is natural to think that two users who are socially connected and also purchased the same item have a stronger relationship than those who are only socially connected, whereas the common purchase information in the former case is often neglected in previous social recommendation models.
Hypergraph [4], which generalizes the concept of edge to make it connect more than two nodes, provides a natural way to model complex high-order relations among users. Despite its great advantages over the simple graph in user modeling, the strengths of hypergraph are under-explored in social recommendation. In this paper, we fill this gap by investigating the potentials of fusing hypergraph modeling and graph convolutional networks, and propose a Multi-channel Hypergraph Convolutional Network (MHCN) to enhance social recommendation by exploiting high-order user relations. Technically, we construct hypergraphs by unifying nodes that form specific triangular relations, which are instances of a set of carefully designed triangular motifs with underlying semantics (shown in Fig. 2). As we define multiple categories of motifs which concretize different types of high-order relations, such as 'having a mutual friend', 'friends purchasing the same item', and 'strangers but purchasing the same item', each channel of the proposed hypergraph convolutional network undertakes the task of encoding a different motif-induced hypergraph. By aggregating the user embeddings learned through multiple channels, we obtain comprehensive user representations which contain multiple types of high-order relation information and have great potential to generate better recommendation results with the item embeddings.
However, despite the benefits of the multi-channel setting, the aggregation operation might also obscure the inherent characteristics of different types of high-order connectivity information [54], as different channels would learn embeddings with varying distributions on different hypergraphs. To address this issue and fully inherit the rich information in the hypergraphs, we innovatively integrate a self-supervised task [15,37] into the training of the multi-channel hypergraph convolutional network. Unlike existing studies which enforce perturbations on graphs to augment the ground-truth [53], we propose to construct self-supervision signals by exploiting the hypergraph structures, with the intuition that the comprehensive user representation should reflect the user node's local and global high-order connectivity patterns in different hypergraphs. Concretely, we leverage the hierarchy in the hypergraph structures and hierarchically maximize the mutual information between the representations of the user, the user-centered sub-hypergraph, and the global hypergraph. The mutual information here measures the structural informativeness of the sub- and the whole hypergraph towards inferring the user features through the reduction in local and global structure uncertainty. Finally, we unify the recommendation task and the self-supervised task under a primary & auxiliary learning framework. By jointly optimizing the two tasks and leveraging the interplay of all the components, the performance of the recommendation task achieves significant gains.
The major contributions of this paper are summarized as follows:
• We investigate the potentials of fusing hypergraph modeling and graph neural networks in social recommendation by exploiting multiple types of high-order user relations under a multi-channel setting.
• We innovatively integrate self-supervised learning into the training of the hypergraph convolutional network and show that a self-supervised auxiliary task can significantly improve the social recommendation task.
• We conduct extensive experiments on multiple real-world datasets to demonstrate the superiority of the proposed model, and we thoroughly investigate the effectiveness of each component with an ablation study.

RELATED WORK

Social Recommendation
As suggested by social science theories [7,25], users' preferences and decisions are often influenced by their friends. Based on this fact, social relations are integrated into recommender systems to alleviate the issue of data sparsity. Early explorations of social recommender systems mostly focus on matrix factorization (MF), which has a nice probabilistic interpretation with a Gaussian prior and is the most commonly used technique in the social recommendation regime. The extensive use of MF marks a new phase in the research of recommender systems. A multitude of studies employ MF as their basic model to exploit social relations, since MF is very flexible in incorporating prior knowledge. The common ideas of MF-based social recommendation algorithms can be categorized into three groups: co-factorization methods [22,46], ensemble methods [20], and regularization methods [23]. Besides, there are also studies using socially-aware MF to model point-of-interest recommendation [48,51,52], preference evolution [39], item ranking [55,61], and relation generation [11,57]. In recent years, the boom of deep learning has broadened the ways to explore social recommendation. Many research efforts demonstrate that deep neural models are more capable of capturing high-level latent preferences [49,50]. Specifically, graph neural networks (GNNs) [63] have achieved great success in this area, owing to their strong capability to model graph data. GraphRec [9] is the first to introduce GNNs to social recommendation by modeling the user-item and user-user interactions as graph data. DiffNet [41] and its extension DiffNet++ [40] model the recursive dynamic social diffusion in social recommendation with a layer-wise propagation structure. Wu et al. [42] propose a dual graph attention network to collaboratively learn representations for two-fold social effects. Song et al. develop DGRec [34] to model both users' session-based interests and dynamic social influences. Yu et al.
[58] propose a deep adversarial framework based on GCNs to address the common issues in social recommendation. In summary, the common idea of these works is to model the user-user and user-item interactions as simple graphs with pairwise connections and then use multiple graph neural layers to capture the node dependencies.

Hypergraph in Recommender Systems
Hypergraph [4] provides a natural way to model complex high-order relations and has been extensively employed to tackle various problems. With the development of deep learning, some studies combine GNNs and hypergraphs to enhance representation learning. HGNN [10] is the first work that designs a hypergraph convolution operation to handle complex data correlations in representation learning from a spectral perspective. Bai et al. [2] introduce hypergraph attention to hypergraph convolutional networks to improve their capacity. However, despite this great capacity in modeling complex data, the potentials of hypergraph for improving recommender systems have rarely been explored. There are only a few studies focusing on the combination of these two topics. Bu et al. [5] introduce hypergraph learning to music recommender systems, which is the earliest attempt. The most recent combinations are HyperRec [38] and DHCF [16], which borrow the strengths of hypergraph neural networks to model the short-term user preference for next-item recommendation and the high-order correlations among users and items for general collaborative filtering, respectively. As for the applications in social recommendation, HMF [62] uses hypergraph topology to describe and analyze the interior relations of the social network in recommender systems, but it does not fully exploit high-order social relations since HMF is a hybrid recommendation model. LBSN2Vec [47] is a social-aware POI recommendation model that builds hyperedges by jointly sampling friendships and check-ins with random walks, but it focuses on connecting different types of entities instead of exploiting the high-order social network structures.

Self-Supervised Learning
Self-supervised learning [15] is an emerging paradigm to learn with ground-truth samples obtained from the raw data. It was first used in the image domain [1,59], where auxiliary supervision signals are created by rotating, cropping, or colorizing images. The latest advances in this area have extended self-supervised learning to graph representation learning [28,29,35,37]. These studies mainly develop self-supervision tasks from the perspective of investigating graph structure. Node properties such as degree, proximity, and attributes, which are seen as local structure information, are often used as the ground truth to fully exploit the unlabeled data [17]. For example, InfoMotif [31] models attribute correlations in motif structures with mutual information maximization to regularize graph neural networks. Meanwhile, global structure information like node-pair distance is also harnessed to facilitate representation learning [35]. Besides, contrasting congruent and incongruent views of graphs with mutual information maximization [29,37] is another way to set up a self-supervised task, which has also shown promising results.
As the research of self-supervised learning is still in its infancy, there are only a few works combining it with recommender systems [24,44,45,64]. These efforts either mine self-supervision signals from future/surrounding sequential data [24,45], or mask attributes of items/users to learn correlations in the raw data [64]. However, these ideas cannot be easily adapted to social recommendation, where temporal factors and attributes may not be available. The most relevant work to ours is GroupIM [32], which maximizes the mutual information between representations of groups and group members to overcome the sparsity problem of group interactions. As a group can be seen as a special social clique, this work can be regarded as a corroboration of the effectiveness of social self-supervision signals.

PROPOSED MODEL

Preliminaries
Let U = {u_1, u_2, ..., u_m} denote the user set (|U| = m), and I = {i_1, i_2, ..., i_n} denote the item set (|I| = n). I(u) is the set of items consumed by user u. R ∈ R^{m×n} is a binary matrix that stores user-item interactions. For each pair (u, i), r_{ui} = 1 indicates that user u consumed item i, while r_{ui} = 0 means that item i is unexposed to user u, or user u is not interested in item i. In this paper, we focus on top-K recommendation, and r̂_{ui} denotes the probability of item i being recommended to user u. As for the social relations, we use S ∈ R^{m×m} to denote the relation matrix, which is asymmetric because we work on directed social networks. In our model, we have multiple convolutional layers, and we use {P^(1), P^(2), ..., P^(L)} ∈ R^{m×d} and {Q^(1), Q^(2), ..., Q^(L)} ∈ R^{n×d} to denote the user and item embeddings of size d learned at each layer, respectively. In this paper, we use bold capital letters to denote matrices and bold lowercase letters to denote vectors.
Definition 1: Let G = (V, E) denote a hypergraph, where V is the vertex set containing N unique vertices and E is the edge set containing M hyperedges. Each hyperedge ε ∈ E can contain any number of vertices and is assigned a positive weight W_εε, and all the weights formulate a diagonal matrix W ∈ R^{M×M}. The hypergraph can be represented by an incidence matrix H ∈ R^{N×M}, where H_iε = 1 if the hyperedge ε ∈ E contains a vertex v_i ∈ V, and 0 otherwise. The vertex and edge degree matrices are diagonal matrices denoted by D and B, respectively, where D_ii = Σ_{ε=1}^{M} W_εε H_iε and B_εε = Σ_{i=1}^{N} H_iε. It should be noted that, in this paper, W_εε is uniformly assigned 1 and hence W is an identity matrix.
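To make Definition 1 concrete, here is a minimal NumPy sketch of a toy incidence matrix and the induced degree matrices, using the paper's uniform hyperedge weights (W = I); the matrix values are invented for illustration only:

```python
import numpy as np

# Incidence matrix H ∈ R^{N×M}: H[i, e] = 1 iff hyperedge e contains vertex i.
# Toy hypergraph: 4 vertices, 2 hyperedges.
H = np.array([[1, 0],
              [1, 1],
              [1, 0],
              [0, 1]], dtype=float)

# With all hyperedge weights uniformly 1 (W = I), the vertex and hyperedge
# degree matrices reduce to row sums and column sums of H, respectively.
D = np.diag(H.sum(axis=1))   # vertex degrees:    D_ii = Σ_e H_ie
B = np.diag(H.sum(axis=0))   # hyperedge degrees: B_ee = Σ_i H_ie

print(np.diag(D))  # [1. 2. 1. 1.]
print(np.diag(B))  # [3. 2.]
```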

Multi-Channel Hypergraph Convolutional Network for Social Recommendation
In this section, we present our model MHCN, which stands for Multi-channel Hypergraph Convolutional Network. In Fig. 3, the schematic overview of our model is illustrated.

Hypergraph Construction.
To formulate the high-order information among users, we first align the social network and the user-item interaction graph in social recommender systems and then build hypergraphs over this heterogeneous network. Unlike prior models which construct hyperedges by unifying given types of entities [5,47], our model constructs hyperedges according to the graph structure. As the relations in social networks are often directed, the connectivity of social networks can be of various types. In this paper, we use a set of carefully designed motifs to depict the common types of triangular structures in social networks, which guide the hypergraph construction. A motif, i.e., a specific local structure involving multiple nodes, was first introduced in [26] and has been widely used to describe complex structures in a wide range of networks. In this paper, we only focus on triangular motifs because of the ubiquitous triadic closure in social networks, but our model can be seamlessly extended to handle more complex motifs. Fig. 2 shows all the used triangular motifs. It has been revealed that M_1 − M_7 are crucial for social computing [3], and we further design M_8 − M_10, which involve user-item interactions, to complement them. Given motifs M_1 − M_10, we categorize them into three groups according to the underlying semantics. M_1 − M_7 summarize all the possible triangular relations in explicit social networks and describe high-order social connectivity like 'having a mutual friend'; we name this group 'Social Motifs'. M_8 − M_9 represent the compound relation 'friends purchasing the same item'. This type of relation can be seen as a signal of a strengthened tie, and we name M_8 − M_9 'Joint Motifs'. Finally, we should also consider users who have no explicit social connections. M_10 is thus non-closed and defines the implicit high-order social relation among users who are not socially connected but purchased the same item; we name M_10 the 'Purchase Motif'.
Under the regulation of these three types of motifs, we can construct three hypergraphs that contain different high-order user relation patterns. We use the incidence matrices H_s, H_j, and H_p to represent these three motif-induced hypergraphs, respectively, where each column of these matrices denotes a hyperedge. For example, in Fig. 3, {u_1, u_2, u_3} is an instance of M_4, and we use ε_1 to denote this hyperedge. Then, according to Definition 1, the column of H_s corresponding to ε_1 has entries of 1 in the rows of u_1, u_2, and u_3, and 0 elsewhere.

Multi-Channel Hypergraph Convolution.
In this paper, we use a three-channel setting, including 'Social Channel (s)', 'Joint Channel (j)', and 'Purchase Channel (p)', in response to the three types of triangular motifs, but the number of channels can be adjusted to adapt to more sophisticated situations. Each channel is responsible for encoding one type of high-order user relation pattern. As different patterns may show different importances to the final recommendation performance, directly feeding the full base user embeddings P^(0) to all the channels is unwise. To control the information flow from the base user embeddings P^(0) to each channel, we design a pre-filter with self-gating units (SGUs), defined as:

P_c^(0) = f_gate^c(P^(0)) = P^(0) ⊙ σ(P^(0) W_g^c + b_g^c),   (1)

where W_g^c ∈ R^{d×d} and b_g^c ∈ R^d are parameters to be learned, c ∈ {s, j, p} represents the channel, ⊙ denotes the element-wise product, and σ is the sigmoid nonlinearity. The self-gating mechanism effectively serves as a multiplicative skip-connection [8] that learns a nonlinear gate to modulate the base user embeddings at a feature-wise granularity through dimension re-weighting, yielding the channel-specific user embeddings P_c^(0).
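The self-gating unit can be sketched in a few lines of NumPy. This is an illustrative implementation of the element-wise gating described above; the toy sizes and random parameters are assumptions, not the paper's settings:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_gate(P, W, b):
    # P ⊙ σ(P W + b): a learned feature-wise gate on the base embeddings.
    return P * sigmoid(P @ W + b)

rng = np.random.default_rng(0)
m, d = 5, 8                                  # toy sizes: 5 users, embedding size 8
P0 = rng.normal(size=(m, d))                 # base user embeddings P^(0)
# One gate per channel c ∈ {s, j, p}, each with its own W ∈ R^{d×d} and b ∈ R^d.
gates = {c: (0.1 * rng.normal(size=(d, d)), np.zeros(d)) for c in "sjp"}
channel_inputs = {c: self_gate(P0, W, b) for c, (W, b) in gates.items()}
```

Because the gate values lie in (0, 1), each channel-specific embedding is a feature-wise attenuated copy of the base embedding.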
Referring to the spectral hypergraph convolution proposed in [10], we define our hypergraph convolution as:

P_c^(l+1) = D_c^{-1} H_c B_c^{-1} H_c^⊤ P_c^(l).   (2)

The difference is that we follow the suggestion in [6,14] to remove the learnable matrix for linear transformation and the nonlinear activation function (e.g. leaky ReLU). By replacing H_c with any of H_s, H_j, and H_p, we can borrow the strengths of hypergraph convolutional networks to learn user representations that encode the high-order information in the corresponding channel. As D_c and B_c are diagonal matrices which only re-scale embeddings, we skip them in the following discussion. The hypergraph convolution can be viewed as a two-stage refinement performing a 'node-hyperedge-node' feature transformation upon the hypergraph structure. The multiplication H_c^⊤ P_c^(l) defines the message passing from nodes to hyperedges, and then premultiplying by H_c aggregates information from hyperedges back to nodes. However, despite the benefits of hypergraph convolution, there is a huge number of motif-induced hyperedges (e.g. there are 19,385 social triangles in the used LastFM dataset), which would cause a high cost to build the incidence matrix H_c. But as we only exploit triangular motifs, we show that this problem can be solved in a flexible and efficient way by leveraging the associative property of matrix multiplication.
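The two-stage 'node-hyperedge-node' refinement can be illustrated with a dense NumPy sketch (a real implementation would use sparse matrices; the toy hypergraph and features here are invented):

```python
import numpy as np

def hypergraph_conv(H, P):
    # One layer of the simplified hypergraph convolution
    # P^(l+1) = D^{-1} H B^{-1} H^T P^(l): node -> hyperedge -> node.
    D_inv = np.diag(1.0 / H.sum(axis=1))   # inverse vertex degrees
    B_inv = np.diag(1.0 / H.sum(axis=0))   # inverse hyperedge degrees
    edge_msg = B_inv @ H.T @ P             # stage 1: average node features per hyperedge
    return D_inv @ H @ edge_msg            # stage 2: push hyperedge features back to nodes

# Toy hypergraph: 3 vertices, 2 hyperedges, scalar node features.
H = np.array([[1, 0],
              [1, 1],
              [0, 1]], dtype=float)
P = np.array([[1.0], [2.0], [3.0]])
print(hypergraph_conv(H, P))
```

In this toy case, vertex 2 (in both hyperedges) receives the mean of the two hyperedge averages, while the other vertices copy their single hyperedge's average.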
Following [60], we let B = S ⊙ S^⊤ and U = S − B be the adjacency matrices of the bidirectional and unidirectional social networks, respectively. We use A_k to represent the adjacency matrix induced by motif M_k, where (A_k)_{i,j} = 1 means that vertex i and vertex j appear in one instance of M_k. As two vertices can appear in multiple instances of M_k, (A_k)_{i,j} is computed by:

(A_k)_{i,j} = #(i and j occur in the same instance of M_k).   (3)

Table 1 shows how to calculate A_k in the form of matrix multiplication. As all the involved matrices in Table 1 are sparse, A_k can be efficiently calculated. Specifically, the basic unit in Table 1 takes the general form XY ⊙ Z, which means A_1 to A_9 may be sparser than Z (i.e. B or U) or as sparse as Z. A_10 could be a little denser, but we can filter out the popular items (consuming popular items might not reflect users' personalized preferences) when calculating A_10, and remove the entries less than a threshold (e.g. 5) in A_10 to keep the calculation efficient. For symmetric motifs, A_k = C_k, and for the asymmetric ones, A_k = C_k + C_k^⊤, where C_k denotes the corresponding product in Table 1. Obviously, without considering self-connection, the summation of A_1 to A_7 is equal to H_s H_s^⊤, as each entry of H_s H_s^⊤ ∈ R^{m×m} also indicates how many social triangles contain the node pair represented by the row and column index of the entry. Analogously, the summation of A_8 and A_9 is equal to H_j H_j^⊤ without self-connection, and A_10 is equal to H_p H_p^⊤. Taking the calculation of A_1 as an example, UU constructs a unidirectional path connecting three vertices, and the operation ⊙ U^⊤ turns the path into a loop, which is an instance of M_1. As A_10 also contains the triangles in A_8 and A_9, we remove this redundancy from A_10. Finally, we use A_s = Σ_{k=1}^{7} A_k, A_j = A_8 + A_9, and A_p = A_10 − A_j to replace H_s H_s^⊤, H_j H_j^⊤, and H_p H_p^⊤ in Eq. (2), respectively. Then we have a transformed hypergraph convolution, defined as:

P_c^(l+1) = D̂_c^{-1} A_c P_c^(l),   (4)

where D̂_c ∈ R^{m×m} is the degree matrix of A_c. Obviously, Eq. (4) is equivalent to Eq. (2) and can serve as a simplified substitute for the hypergraph convolution. Since we follow the design of LightGCN [14], which has subsumed the effect of self-connection, skipping the self-connection in the adjacency matrix does not matter too much. In this way, we circumvent the individual hyperedge construction and computation, and greatly reduce the computational cost.

Figure 3: An overview of the proposed model (1-layer). Each triangle in the left graph is a hyperedge and also an instance of the defined motifs. H_s, H_j, and H_p denote the three motif-induced hypergraphs constructed based on social, joint, and purchase motifs, respectively. The three dotted ellipses denote three ego-networks centered at u_2, which are subgraphs of the three motif-induced hypergraphs, respectively.
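As a worked example of the motif computation, the following sketch builds A_1 for a 3-user directed cycle, assuming the Table 1 formula for M_1 is (UU) ⊙ U^⊤, as the loop-closing description above suggests, and then applies the simplified convolution of Eq. (4):

```python
import numpy as np

# Toy directed social network S over 3 users: 0→1, 1→2, 2→0.
S = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)

B = S * S.T          # bidirectional edges (none in this toy network)
U = S - B            # unidirectional edges

# Motif M1 (a directed 3-cycle): U @ U builds 2-step unidirectional paths,
# and ⊙ U^T closes each path into a loop; M1 is asymmetric, so symmetrize.
C1 = (U @ U) * U.T
A1 = C1 + C1.T       # every user pair co-occurs in the single 3-cycle

# Simplified hypergraph convolution of Eq. (4) on the motif adjacency:
P = np.array([[1.0], [2.0], [3.0]])
D_hat_inv = np.diag(1.0 / A1.sum(axis=1))
smoothed = D_hat_inv @ A1 @ P            # each user averages its motif neighbors
```

In a real dataset, S, U, and A_k would be stored as sparse matrices so that the products above stay cheap.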

Learning Comprehensive User Representations.
After propagating the user embeddings through L layers, we average the embeddings obtained at each layer to form the final channel-specific user representation, P_c^* = (1/(L+1)) Σ_{l=0}^{L} P_c^(l), which avoids the over-smoothing problem [14]. Then we use the attention mechanism [36] to selectively aggregate information from the different channel-specific user embeddings to form the comprehensive user embeddings. For each user u, a triplet of attention weights (α_s^u, α_j^u, α_p^u) is learned to measure the contributions of the three channel-specific embeddings to the final recommendation performance. The attention function f_att is defined as:

α_c^u = exp(a^⊤ · W p_c^{u*}) / Σ_{c'∈{s,j,p}} exp(a^⊤ · W p_{c'}^{u*}),   (5)

where a ∈ R^d and W ∈ R^{d×d} are trainable parameters, and the comprehensive user representation is p_u^* = Σ_{c∈{s,j,p}} α_c^u p_c^{u*}. Note that, since the explicit social relations are noisy and isolated relations are not a strong signal of close friendship [55,56], we discard those relations which are not part of any instance of the defined motifs. Therefore, we do not have a convolution operation directly working on the explicit social network S. Besides, in our setting, the hypergraph convolution cannot directly aggregate information from the items (we do not incorporate the items into the motif-induced hypergraphs). To tackle this problem, we additionally perform a simple graph convolution on the user-item interaction graph to encode the purchase information and complement the multi-channel hypergraph convolution. The simple graph convolution is defined as:

P_r^(l+1) = D_u^{-1} R Q^(l),   Q^(l+1) = D_i^{-1} R^⊤ E^(l),   (6)

where P_r^(0) is the gated user embedding for the simple graph convolution, E^(l) is the combination of the comprehensive user embeddings and P_r^(l), and D_u ∈ R^{m×m} and D_i ∈ R^{n×n} are the degree matrices of R and R^⊤, respectively. Finally, we obtain the final user and item embeddings, P and Q, by combining the comprehensive user embeddings with the layer-averaged outputs of the simple graph convolution, where P^(0) and Q^(0) are randomly initialized.
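The channel attention and aggregation can be sketched as follows. The exact scoring function is our assumption — a softmax over a^⊤ tanh(W p) scores, consistent with the stated parameter shapes (a ∈ R^d, W ∈ R^{d×d}); the tanh is not confirmed by the text:

```python
import numpy as np

def channel_attention(p_channels, a, W):
    # Softmax over per-channel scores a^T tanh(W p); tanh is an assumed choice.
    scores = np.array([a @ np.tanh(W @ p) for p in p_channels])
    alphas = np.exp(scores - scores.max())          # numerically stable softmax
    alphas = alphas / alphas.sum()
    # Comprehensive embedding: attention-weighted sum of channel embeddings.
    p_star = sum(alpha * p for alpha, p in zip(alphas, p_channels))
    return p_star, alphas

rng = np.random.default_rng(1)
d = 4                                               # toy embedding size
p_s, p_j, p_p = (rng.normal(size=d) for _ in range(3))   # channel-specific p_c^{u*}
a, W = rng.normal(size=d), rng.normal(size=(d, d))
p_star, alphas = channel_attention([p_s, p_j, p_p], a, W)
```

The attention weights form a distribution over the three channels, so the comprehensive embedding stays in the convex hull of the channel-specific ones.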

Model Optimization.
To learn the parameters of MHCN, we employ the Bayesian Personalized Ranking (BPR) loss [30], a pairwise loss that promotes an observed entry being ranked higher than its unobserved counterparts:

L_r = Σ_{i∈I(u), j∉I(u)} −log σ(r̂_{u,i} − r̂_{u,j}),   (7)

where Φ denotes the parameters of MHCN, r̂_{u,i} = p_u^⊤ q_i is the predicted score of user u on item i, and σ(·) here is the sigmoid function. Each time, a triplet including the current user u, the positive item i purchased by u, and a randomly sampled negative item j which is disliked by or unknown to u, is fed to MHCN. The model is optimized towards ranking i higher than j in the recommendation list for u. In addition, L2 regularization with the hyper-parameter λ is imposed on Φ to reduce generalization errors.

Figure 4: Hierarchical mutual information maximization on hypergraphs.
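A minimal NumPy sketch of the BPR objective described above (the L2 term on Φ is omitted for brevity; the embeddings and triplets are toy values):

```python
import numpy as np

def bpr_loss(P, Q, triplets):
    # Mean of -log σ(r̂_ui - r̂_uj) over (user, positive, negative) triplets,
    # with scores r̂ = p_u · q_i (dot product of user and item embeddings).
    loss = 0.0
    for u, i, j in triplets:
        x = P[u] @ Q[i] - P[u] @ Q[j]
        loss += -np.log(1.0 / (1.0 + np.exp(-x)))
    return loss / len(triplets)

rng = np.random.default_rng(2)
P = rng.normal(size=(3, 8))   # toy user embeddings
Q = rng.normal(size=(5, 8))   # toy item embeddings
loss = bpr_loss(P, Q, [(0, 1, 2), (1, 0, 4)])
```

The loss shrinks toward zero as the positive item's score pulls ahead of the negative item's score for each sampled triplet.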

Enhancing MHCN with Self-Supervised Learning
Owing to the exploitation of high-order relations, MHCN shows great performance (reported in Tables 3 and 4). However, a shortcoming of MHCN is that the aggregation operations (Eq. 5 and 6) might lead to a loss of high-order information, as different channels would learn embeddings with varying distributions on different hypergraphs [54]. Concatenating the embeddings from different channels could be an alternative, but it uniformly weighs the contributions of different types of high-order information in recommendation generation, which is not in line with reality and led to inferior performance in our trials. To address this issue and fully inherit the rich information in the hypergraphs, we innovatively integrate self-supervised learning into the training of MHCN.
In representation learning scenarios, a self-supervised task usually serves either as a pretraining strategy or as an auxiliary task to improve the primary task [17]. In this paper, we follow the primary & auxiliary paradigm and set up a self-supervised auxiliary task to enhance the recommendation task (primary task). The recent work Deep Graph Infomax (DGI) [37] is a general and popular approach for learning node representations within graph-structured data in a self-supervised manner. It relies on maximizing the mutual information (MI) between node representations and corresponding high-level summaries of graphs. However, we consider that the graph-node MI maximization stays at a coarse level, and there is no guarantee that the encoder in DGI can distill sufficient information from the input data. Therefore, as the graph scale increases, the benefits brought by MI maximization might diminish. To obtain a learning method that better fits our scenario, we inherit the merits of DGI in considering mutual information and further extend the graph-node MI maximization to a fine-grained level by exploiting the hierarchical structure of hypergraphs.
Recall that, for each channel of MHCN, we build the adjacency matrix A_c to capture the high-order connectivity information. Each row in A_c represents a subgraph of the corresponding hypergraph centered around the user denoted by the row index. We can thus induce a hierarchy, 'user node ← user-centered sub-hypergraph ← hypergraph', and create self-supervision signals from this structure.
Our intuition for the self-supervised task is that the comprehensive user representation should reflect the user node's local and global high-order connectivity patterns in different hypergraphs, and this goal can be achieved by hierarchically maximizing the mutual information between the representations of the user, the user-centered sub-hypergraph, and the hypergraph in each channel. The mutual information measures the structural informativeness of the sub- and the whole hypergraph towards inferring the user preference through the reduction in local and global structure uncertainty.
To get the sub-hypergraph representation, instead of averaging the embeddings of the users in the sub-hypergraph, we design a readout function f_out1 : R^{m×d} → R^d, which is permutation-invariant and formulated as:

z_u^c = f_out1(A_c, P_c, u) = (a_u P_c) / sum(a_u),   (8)

where P_c = f_gate^c(P) controls the participating magnitude of P to avoid overfitting and mitigate gradient conflicts between the primary and auxiliary tasks, a_u is the row vector of A_c corresponding to the center user u, and sum(a_u) counts the connections in the sub-hypergraph. In this way, the weight (importance) of each user in the sub-hypergraph is considered when forming the sub-hypergraph embedding z_u^c. Analogously, we define the other readout function f_out2 : R^{m×d} → R^d, which is actually an average pooling that summarizes the obtained sub-hypergraph embeddings into a graph-level representation:

h_c = f_out2(Z_c) = AveragePooling(Z_c).   (9)

We tried to use InfoNCE [27] as our learning objective to maximize the hierarchical mutual information, but we found that the pairwise ranking loss, which has also been proved effective in mutual information estimation [18], is more compatible with the recommendation task. We then define the objective function of the self-supervised task as follows:

L_s = − Σ_{c∈{s,j,p}} Σ_{u∈U} [ log σ(D(p_u^c, z_u^c) − D(p_u^c, z̃_u^c)) + log σ(D(z_u^c, h_c) − D(z̃_u^c, h_c)) ],   (10)

where D(·,·) : R^d × R^d → R is the discriminator function that takes two vectors as the input and scores the agreement between them. We simply implement the discriminator as the dot product of the two representations. Since there is a bijective mapping between the rows of P_c and Z_c, they can be the ground truth of each other. We corrupt Z_c by both row-wise and column-wise shuffling to create the negative examples Z̃_c. We consider that the user should have a stronger connection with the sub-hypergraph centered on her (local structure), so we directly maximize the mutual information between their representations. By contrast, the user would not care much about all the other users (global structure), so we indirectly maximize the mutual information between the representations of the user and the complete hypergraph by regarding the sub-hypergraph as a mediator.
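The two readouts and the pairwise ranking objective can be sketched as follows; the corruption scheme here is simplified to row-wise shuffling only (the text also mentions column-wise shuffling), and all sizes and values are toys:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def readout1(A, P_gated):
    # f_out1: weighted average over each user-centered sub-hypergraph,
    # z_u = (a_u P) / sum(a_u), with a_u the u-th row of the motif adjacency A.
    return (A @ P_gated) / A.sum(axis=1, keepdims=True)

def readout2(Z):
    # f_out2: average pooling into a single graph-level representation.
    return Z.mean(axis=0)

rng = np.random.default_rng(3)
m, d = 6, 4
A = (rng.random((m, m)) < 0.5).astype(float) + np.eye(m)  # toy adjacency; self-loop keeps row sums > 0
P_gated = rng.normal(size=(m, d))        # gated user embeddings for this channel

Z = readout1(A, P_gated)                 # local: sub-hypergraph embeddings
h = readout2(Z)                          # global: hypergraph embedding
Z_neg = rng.permutation(Z)               # corrupted negatives via row shuffling

# Discriminator = dot product; true pairs should outscore corrupted ones.
local_loss = -np.log(sigmoid((P_gated * Z).sum(1) - (P_gated * Z_neg).sum(1))).sum()
global_loss = -np.log(sigmoid(Z @ h - Z_neg @ h)).sum()
```

Both terms are standard pairwise ranking losses: each pushes the score of a true (user, subgraph) or (subgraph, graph) pair above its corrupted counterpart.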
Compared with DGI which only maximizes the mutual information between node and graph representations, our hierarchical design can preserve more structural information of the hypergraph into the user representations (comparison is shown in Section 4.3). Figure 4 illustrates the hierarchical mutual information maximization.
Finally, we unify the objectives of the recommendation task (primary) and the task of maximizing hierarchical mutual information (auxiliary) for joint learning. The overall objective is defined as:

L = L_r + β L_s,   (11)

where β is a hyper-parameter used to control the effect of the auxiliary task, and L_s can be seen as a regularizer that leverages the hierarchical structural information of the hypergraphs to enrich the user representations in the recommendation task for better performance.

Complexity Analysis
In this section, we discuss the complexity of our model.
Model size. The trainable parameters of our model consist of three parts: user and item embeddings, gate parameters, and attention parameters. For the first part, we only need to learn the 0-th layer user embeddings P^(0) ∈ R^{m×d} and item embeddings Q^(0) ∈ R^{n×d}. As for the second part, we employ seven gates: four for MHCN and three for the self-supervised task. Each gate has parameters of size (d + 1) × d, and the attention parameters are of the same size. To sum up, the model size approximates (m + n + 8d)d in total. As min(m, n) ≫ d, our model is fairly light.
Time complexity. The computational cost mainly derives from four parts: hypergraph/graph convolution, attention, self-gating, and mutual information maximization. For the multi-channel hypergraph convolution through L layers, the propagation consumption is less than O(|A^+| L d), where |A^+| denotes the number of nonzero elements in A, and here |A^+| = max(|A_s^+|, |A_j^+|, |A_p^+|). Analogously, the time complexity of the simple graph convolution is O(|R^+| L d). As for the attention and self-gating mechanisms, they both contribute O(m d^2). The cost of mutual information maximization mainly comes from f_out1, which is O(|A^+| d). Since we follow the setting in [14] to remove the learnable matrix for linear transformation and the nonlinear activation function, the time complexity of our model is much lower than that of previous GNNs-based social recommendation models.

Experimental Settings
Datasets. Three real-world datasets: LastFM, Douban, and Yelp [49] are used in our experiments. As our aim is to generate Top-K recommendations, for Douban, which is based on explicit ratings, we leave out ratings less than 4 and assign 1 to the rest. The statistics of the datasets are shown in Table 2. We perform 5-fold cross-validation on the three datasets and report the average results. Baselines. We compare MHCN with a set of strong and commonly-used baselines including MF-based and GNN-based models:
• BPR [30] is a popular recommendation model based on Bayesian personalized ranking. It models the order of candidate items by a pairwise ranking loss.
• SBPR [61] is an MF-based social recommendation model which extends BPR and leverages social connections to model the relative order of candidate items.
• LightGCN [14] is an efficient GCN-based general recommendation model that leverages the user-item proximity to learn node representations and generate recommendations.
• GraphRec [9] is the first GNN-based social recommendation model that models both user-item and user-user interactions.
• DiffNet++ [40] is the latest GCN-based social recommendation method that models the recursive dynamic social diffusion in both the user and item spaces.
• DHCF [16] is a recent hypergraph convolutional network-based method that models the high-order correlations among users and items for general recommendation.
Two versions of the proposed multi-channel hypergraph convolutional network are investigated in the experiments. MHCN denotes the vanilla version and S²-MHCN denotes the self-supervised version.
Metrics. To evaluate the performance of all methods, two relevance-based metrics, Precision@10 and Recall@10, and one ranking-based metric, NDCG@10, are used. We perform item ranking on all the candidate items instead of sampled item sets to calculate these three metrics, which guarantees that the evaluation process is unbiased. Settings. For a fair comparison, we refer to the best parameter settings reported in the original papers of the baselines and then use grid search to fine-tune all the hyperparameters of the baselines to ensure their best performance. For the general settings of all the models, the dimension of latent factors (embeddings) is empirically set to 50, the regularization coefficient is set to 0.001, and the batch size is set to 2000. We use Adam to optimize all these models. Section 4.4 reports the influence of different parameters (i.e., β and the depth L) of MHCN, and we use the best parameter settings in Sections 4.2 and 4.3.
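For reference, the three metrics can be computed per user as below. This is the standard formulation; the helper name and the full-candidate ranking input are our illustration.

```python
import math

def precision_recall_ndcg_at_k(ranked_items, relevant, k=10):
    """Precision@k, Recall@k, and NDCG@k for one user, where the ranking
    is over the full candidate set (no negative sampling)."""
    top_k = ranked_items[:k]
    hits = [1 if item in relevant else 0 for item in top_k]
    precision = sum(hits) / k
    recall = sum(hits) / len(relevant) if relevant else 0.0
    # DCG discounts hits by log2(rank + 1); IDCG is the best possible DCG
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, ndcg
```

Per-user values are then averaged over the test set to produce the numbers reported in Tables 3 and 4.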

Recommendation Performance
In this part, we validate if MHCN outperforms existing social recommendation baselines. Since the primary goal of social recommendation is to mitigate the data sparsity issue and improve the recommendation performance for cold-start users, we respectively conduct experiments on the complete test set and the cold-start test set, in which only the cold-start users with less than 20 interactions are contained. The experimental results are shown in Table 3 and Table 4. The improvement is calculated by subtracting the best performance value of the baselines from that of S²-MHCN and then dividing the difference by the former. Analogously, the S²-improvement is calculated by comparing the performance of MHCN and S²-MHCN. According to the results, we can draw the following conclusions:
• MHCN shows great performance in both the general and cold-start recommendation tasks. Even without self-supervised learning, it beats all the baselines by a fair margin. Meanwhile, self-supervised learning has great ability to further improve MHCN. Compared with the vanilla version, the self-supervised version shows decent improvements in all the cases. Particularly, in the cold-start recommendation task, self-supervised learning brings significant gains. On average, S²-MHCN achieves about 5.389% improvement in the general recommendation task and 9.442% improvement in the cold-start recommendation task compared with MHCN. Besides, it seems that the sparser the dataset, the more improvement self-supervised learning brings.
• GNN-based recommendation models significantly outperform the MF-based recommendation models. Even the general recommendation models based on GNNs show much better performance than MF-based social recommendation models. However, when compared with the counterparts based on the same building block (i.e., MF-based vs. MF-based, GNN-based vs. GNN-based), social recommendation models are still competitive and by and large outperform the corresponding general recommendation models, except LightGCN.
• LightGCN is a very strong baseline. Without considering the two variants of MHCN, LightGCN shows the best or the second best performance in most cases. This can be owed to the removal of the redundant operations, including the nonlinear activation function and transformation matrices. The other baselines such as GraphRec might be limited by these useless operations and fail to outperform LightGCN, though the social information is incorporated.
• Although DHCF is also based on hypergraph convolution, it does not show any competence in all the cases. We are unable to reproduce its superiority reported in the original paper [16]. There are two possible causes which might lead to its failure. Firstly, it only exploits the user-item high-order relations. Secondly, the way hyperedges are constructed in this model is very impractical, which leads to a very dense incidence matrix. The model would then encounter the over-smoothing problem and suffer from heavy computation.

Ablation Study
In this section, we conduct an ablation study to investigate the interplay of the components in S²-MHCN.

According to Fig. 5 (which reports Prec@10, Rec@10, and NDCG@10 on the three datasets), removing the Purchase channel causes the largest performance degradation, dragging MHCN down to the level of LightGCN shown in Table 3. By contrast, removing the Social channel or the Joint channel would not have such a large impact on the final performance. Comparing the Social channel with the Joint channel, we can observe that the former contributes slightly more on LastFM and Yelp, while the latter, in terms of the performance contribution, is more important on Douban.
To further investigate the contribution of each channel when they are all employed, we visualize the attention scores learned along with other model parameters, and draw a box plot to display the distributions of the attention weights. According to Fig. 6, we can observe that, for the large majority of users in LastFM, Social channel has limited influence on the comprehensive user representations. In line with the conclusions from Fig. 5, Purchase channel plays the most important role in shaping the comprehensive user representations. The importance of Joint channel falls between the other two. The possible reason could be that, social relations are usually noisy and the users who are only socially connected might not always share similar preferences.

Investigation of Self-supervised Task.
To investigate the effectiveness of the hierarchical mutual information maximization (MIM), we break this procedure into two parts: local MIM between the user and the user-centered sub-hypergraph, and global MIM between the user-centered sub-hypergraph and the hypergraph. We then run MHCN with either of these two to observe the performance changes. We also compare hierarchical MIM with the node-graph MIM used in DGI to validate the rationality of our design. We implement DGI by referring to the original paper [37]. The results are illustrated in Fig. 7, and we use Disabled to denote the vanilla MHCN. Unlike the bars in Fig. 6, each bar in Fig. 7 represents the case where only the corresponding module is used. As can be seen, hierarchical MIM shows the best performance while local MIM achieves the second best performance. By contrast, global MIM contributes less, but it still shows better performance on Douban and Yelp when compared with DGI. Actually, DGI rarely contributes on the latter two datasets, and we can hardly find a proper parameter that makes it compatible with our task. On some metrics, training MHCN with DGI even lowers the performance. According to these results, we can draw the conclusion that the self-supervised task is effective and our intuition for hierarchical mutual information maximization is more reasonable compared with the node-graph MIM in DGI.
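A minimal numeric sketch of the hierarchical scheme follows, assuming a dot-product discriminator f_D(a, b) = sigmoid(a·b) and row-shuffled corrupted representations as negatives in the DGI style; the exact discriminator and corruption scheme in the paper may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mim_pair_loss(anchor, positive, corrupted):
    """Pairwise MIM loss: the discriminator should score the true pair
    (anchor, positive) above the corrupted pair (anchor, corrupted)."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    pos = sigmoid(dot(anchor, positive))
    neg = sigmoid(dot(anchor, corrupted))
    return -math.log(pos) - math.log(1.0 - neg)

def hierarchical_mim_loss(user, subgraph, graph, user_shuf, sub_shuf):
    # local: user vs. user-centered sub-hypergraph;
    # global: sub-hypergraph vs. whole hypergraph
    local = mim_pair_loss(subgraph, user, user_shuf)
    global_ = mim_pair_loss(graph, subgraph, sub_shuf)
    return local + global_
```

Disabling either term recovers the "local MIM only" / "global MIM only" ablation settings compared in Fig. 7.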

Parameter Sensitivity Analysis
In this section, we investigate the sensitivity of β and the depth L.
As we adopt the primary & auxiliary paradigm, to avoid negative interference from the auxiliary task during gradient propagation, we can only choose small values for β. We search for the proper value in a small interval, empirically from 0.001 to 0.5. We then start our attempts from 0.001, and proceed by gradually increasing the step size. Here we report the performance of S²-MHCN with representative values {0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.5}. As can be seen in Fig. 8, with the increase of the value of β, the performance of S²-MHCN on all the datasets rises. After reaching the peak when β is 0.01, it steadily declines on all the datasets. According to Fig. 8, we can draw the conclusion that even a very small β can promote the recommendation task, while a larger β would mislead it. The benefits brought by the self-supervised task could be easily neutralized, and the recommendation task is sensitive to the magnitude of the self-supervised task. So, choosing a small value is more likely to facilitate the primary task when there is little prior knowledge about the data distribution.
Finally, we investigate the influence of the depth L to find the optimal number of layers for S²-MHCN. We stack hypergraph convolutional layers from the 1-layer to the 5-layer setting. According to Fig. 9, the best performance is achieved with a shallow setting; with the continuing increase of the number of layers, the performance of S²-MHCN declines on all the datasets. Obviously, a shallow structure fits S²-MHCN better. A possible reason is that S²-MHCN aggregates high-order information from distant neighbors, so it is more prone to encounter the over-smoothing problem with the increase of depth. This problem is also found in DHCF [16], which is based on hypergraph modeling as well. Considering that over-smoothed representations could be a pervasive problem in hypergraph convolutional network-based models, we will work against it in the future.

CONCLUSION
Recently, GNN-based recommendation models have achieved great success in social recommendation. However, these methods simply model the user relations in social recommender systems as pairwise interactions, and neglect that real-world user interactions can be high-order. Hypergraph provides a natural way to model high-order user relations, and its potential for social recommendation has not been fully exploited. In this paper, we fuse hypergraph modeling and graph neural networks and then propose a multi-channel hypergraph convolutional network (MHCN) which works on multiple motif-induced hypergraphs to improve social recommendation. To compensate for the aggregating loss in MHCN, we innovatively integrate self-supervised learning into the training of MHCN. The self-supervised task serves as the auxiliary task to improve the recommendation task by maximizing hierarchical mutual information between the user, user-centered sub-hypergraph, and hypergraph representations. The extensive experiments conducted on three public datasets verify the effectiveness of each component of MHCN, and also demonstrate its state-of-the-art performance.