A reinforced collaborative filtering approach based on similarity propagation and score predication graph
EURASIP Journal on Wireless Communications and Networking volume 2016, Article number: 210 (2016)
Abstract
In the era of big data, the rapid development of mobile participatory sensing devices has led to an explosive expansion of data, making information overload a serious problem. In response, personalized recommendation systems for mobile social media have emerged. Collaborative filtering is the most widely used approach in recommendation systems. Nevertheless, it still suffers from several problems, notably severe data sparsity and cold start. Most existing recommendation approaches rely on a single information source and cannot effectively solve the cold start and data sparsity problems. In addition, some approaches proposed to address data sparsity fail to consider the effects of users' influence and prediction order on recommendation accuracy. Accordingly, to increase the categories of information used, we propose a similarity propagation approach based on a heterogeneous network that eases the cold start problem by improving the similarity calculation. In addition, to ease the data sparsity problem, we propose a hybrid collaborative filtering approach based on a score prediction graph that completes the user-item score matrix in a prescribed order. Finally, we conduct validation experiments on the MovieLens dataset. Compared with five state-of-the-art approaches, our approach achieves better mean absolute error, root-mean-square error, recall, and diversity.
Introduction
With the rapid development of the mobile Internet, mobile social media services [1, 2] are increasingly abundant. The mobile phone, as a representative mobile participatory sensing device, has become part of people's daily life. However, in the era of big data, the limited capacity of mobile users to receive information, combined with the explosive growth of information in the mobile environment, makes it difficult for them to find what they need quickly and effectively. Recommendation systems play an indispensable role in solving this problem.
Collaborative filtering is one of the most widely used approaches in recommendation systems. Nevertheless, it still suffers from several problems, notably severe data sparsity and cold start. Researchers have put forward many solutions to these two problems, which fall into two categories. The first category fills the user-item matrix either with default values or with predictions [3]. Filling with default values is inefficient and erases users' personalized information. Filling with predictions estimates missing scores from the nearest neighbors of users or items. It is usually a one-time effort that ignores users' influence and the order in which users are predicted, so a prediction deviation at one node propagates to its nearest neighbors.
The second category improves the user interest model by focusing on one particular aspect of user or item information to reduce data sparseness. For example, item-based collaborative filtering focuses on item scores when selecting nearest neighbors, and content-based collaborative filtering focuses on item content to build the user interest model [4]. Because such solutions rely on a single information source, their recommendations tend to be inaccurate and homogeneous, failing to satisfy users' diversified demands.
Accordingly, in this paper, we first propose a similarity propagation approach based on a heterogeneous network to effectively ease the cold start and data sparsity problems. By combining several types of information, it analyzes users' preferences from multiple perspectives, which overcomes the drawbacks of relying on a single information source. We then propose a score prediction graph (SPGraph) generation approach and derive a prediction node sequence under the principle that the less influence a node has, the earlier it is predicted. Following this sequence, we fill the user-item matrix step by step to generate a recommendation list, which reduces the impact of a node's prediction deviation on its nearest neighbors and thus improves prediction accuracy.
The main contributions of this paper are summarized as follows:

1.
We integrate several types of information and relations into a recommendation heterogeneous network and propose a similarity propagation approach that mitigates the cold start and data sparsity problems caused by relying on a single information source.

2.
We propose an SPGraph-based prediction node sequence generation approach that improves accuracy by reducing the impact of a node's prediction deviation on its nearest neighbors.

3.
We conduct extensive experiments on the MovieLens dataset, which demonstrate that our approach outperforms five state-of-the-art approaches.
The rest of this paper is organized as follows: Section 2 summarizes related work. Section 3 presents the proposed similarity propagation approach. Section 4 details the SPGraph-based collaborative filtering approach. Section 5 presents the experimental data and results along with a thorough analysis. Finally, Section 6 concludes the paper and discusses future work.
Related work
In this section, we briefly review existing recommendation approaches, which fall into four main categories: collaborative filtering recommendation, content-based recommendation, knowledge-based recommendation, and hybrid recommendation.
Collaborative filtering recommendation
Collaborative filtering predicts and recommends the items that target users might like according to the interests of their nearest neighbors, who share similar behavioral characteristics derived from analysis of their behavioral habits [5]. Recent years have witnessed an endless stream of studies on collaborative filtering, which can be divided into two categories: neighborhood-based and model-based. Neighborhood-based approaches, further divided into user-based [6] and item-based [7, 8] approaches, identify users similar to the target user by measuring their feedback on shared items and then compute predictions from these similar users' feedback on other items. They suffer from feedback scarcity in practice, because each user gives feedback on only a limited number of items; this is the data sparsity problem. Model-based approaches, such as aspect models [9], latent factor models [10], Bayesian models [11], and decision trees, alleviate feedback scarcity by learning a global model from the training data and then using it to predict the active user's preference for unknown items. However, most of them incur high computational overhead from tuning the many parameters embedded in the models, which makes them hard to apply to large-scale social networks.
Content-based recommendation
Content-based recommendation matches users' characteristics with item content and has been studied in many papers. For example, Yu et al. [12] proposed a recommendation approach for multiple interests and multiple contents. Hannon et al. [13] put forward the UPR model for Twitter forward recommendation. Wu et al. [14] combined content-based recommendation with system-based recommendation to predict and recommend according to the CCAM model. Ronen et al. [15] studied a content-based feature selection method that is independent of the recommendation system. Most existing content-based recommendation systems describe items with keywords and recommend items according to their text content. However, keyword-based similarity evaluations may be misleading due to the ambiguity of natural language. Moreover, such approaches may produce biased results, the root cause being their single and inaccurate information source.
Knowledge-based recommendation
This kind of recommendation is closely linked, and sometimes interactive, with users' requirements. When users input their requirements, the system works out matching recommendations; if none are found, users must modify their requirements. Burke [16] proposed a constraint-based recommendation system built on a recommendation knowledge base, and in [17] Burke proposed a case-based recommendation approach.
Hybrid recommendation
Hybrid recommendation systems are integrative, parallel, or linear combinations of several recommendation systems, intended to fill the gaps of single recommendation systems. Top-N-based collaborative filtering (TNCF) and majorizing similarity-based collaborative filtering (MSCF) [18], proposed by Song, are hybrid collaborative filtering approaches that integrate score similarity and property similarity. They first compute user similarity and select the top N nearest neighbors of the target user, then predict scores and provide recommendations. This improves accuracy but greatly increases computational complexity.
Collaborative filtering, content-based, and knowledge-based recommendation approaches all rely on a single information source and therefore fail to satisfy users' diversified demands or effectively solve the cold start and data sparsity problems.
Although hybrid recommendation approaches try to overcome the cold start and data sparsity problems by combining several recommendation systems, they are merely linear combinations that incur high complexity and inaccurate predictions.
In addition, the approaches proposed to address data sparsity fail to consider the effects of users' influence and prediction order on recommendation accuracy.
Similarity propagation approach based on heterogeneous networks
In this section, we propose a similarity propagation approach based on heterogeneous networks to overcome the cold start and data sparsity problems. We first define some terms used in our paper. Then we describe our similarity propagation approach based on heterogeneous networks.
Preliminaries
Definition 1
Recommendation heterogeneous network: As shown in Fig. 1, a recommendation heterogeneous network consists of four major entity types, namely users, items, tags, and properties. Six types of relations exist among these entities: UP (between users and properties), UI (between users and items), UT (between users and tags), IP (between items and properties), IT (between items and tags), and H (between homogeneous entities).
A recommendation heterogeneous network can be represented as \( G_r = (V, E, W) \), where \( V = V_u \cup V_i \cup V_t \cup V_p \), \( E = E_{UP} \cup E_{UI} \cup E_{UT} \cup E_{IP} \cup E_{IT} \cup E_H \), and \( W \) is the set of relation weights. \( V \) is the union of \( V_u \), \( V_i \), \( V_t \), and \( V_p \): \( V_u \) is the user set, \( V_i \) the item set, \( V_t \) the tag set, and \( V_p \) the property set. \( E \) is the union of \( E_{UP} \), \( E_{UI} \), \( E_{UT} \), \( E_{IP} \), \( E_{IT} \), and \( E_H \), which contain the user-property, user-item, user-tag, item-property, item-tag, and homogeneous-entity relations, respectively.
We define the following rules to determine whether relations exist between entities.

If a user u possesses the property p, then \( \langle u,p \rangle \in E_{UP} \subseteq E \).

If a user u purchases the item i and grades it as d, then \( \langle u,i \rangle \in E_{UI} \subseteq E \) if and only if \( d > \overline{d} \).

If a user u is tagged as t, then \( \langle u,t \rangle \in E_{UT} \subseteq E \).

If an item i possesses the property p, then \( \langle i,p \rangle \in E_{IP} \subseteq E \).

If an item i is tagged as t, then \( \langle i,t \rangle \in E_{IT} \subseteq E \).
Definition 2
Meta path: A meta path is defined as a path of length 2 between two arbitrary nodes \( v_i \) and \( v_j \) in the recommendation heterogeneous network, denoted as \( {v}_i\overset{l_{im}}{\to }{v}_m\overset{l_{mj}}{\to }{v}_j \), where \( l_{im} \) and \( l_{mj} \) are relation types, which may be the same or different.
There are three types of meta paths between users in the recommendation heterogeneous network, as shown in Fig. 2.
Property 1
Meta paths represent the similarities between entities.
Users purchasing the same item, users labeled with the same tag, and users possessing the same property all share some similarities. The more items, tags, and properties they share, the more similar they are.
Property 2
Similarity can propagate to entities that share no meta path, as long as each is connected to a common entity by at least one meta path.
For example, in Fig. 3, although there is no meta path between \( u_1 \) and \( u_3 \), they still share some similarity because both are connected to \( u_2 \) by meta paths. A random walk path composed of two or more meta paths thus exists between \( u_1 \) and \( u_3 \); a meta path itself is simply a special random walk path composed of a single meta path.
Definition 3
Similarity propagation matrix: The similarity propagation matrix T is defined as follows:
where U is the user set, I is the item set, T is the tag set, and P is the property set.
The similarity propagation matrix is symmetric. \( t_{uv} \in T_{UU} \) is the similarity propagation probability between user u and user v; \( t_{ui} \in T_{UI} \) between user u and item i; \( t_{ut} \in T_{UT} \) between user u and tag t; \( t_{up} \in T_{UP} \) between user u and property p; \( t_{ij} \in T_{II} \) between item i and item j; \( t_{it} \in T_{IT} \) between item i and tag t; \( t_{ip} \in T_{IP} \) between item i and property p; \( t_{mn} \in T_{TT} \) between tag m and tag n; \( t_{tp} \in T_{TP} \) between tag t and property p; and \( t_{pq} \in T_{PP} \) between property p and property q.
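The block form of T did not survive extraction. Assuming the blocks are ordered as users, items, tags, and properties, and that symmetry fills the lower-left blocks with transposes of the blocks listed above, a plausible reconstruction (not a verbatim copy of the original equation) is:

```latex
T =
\begin{pmatrix}
T_{UU}        & T_{UI}        & T_{UT}        & T_{UP} \\
T_{UI}^{\top} & T_{II}        & T_{IT}        & T_{IP} \\
T_{UT}^{\top} & T_{IT}^{\top} & T_{TT}        & T_{TP} \\
T_{UP}^{\top} & T_{IP}^{\top} & T_{TP}^{\top} & T_{PP}
\end{pmatrix}
```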
During the random walk, different types of relations contribute to different degrees and are therefore given different weights: \( w_{up} \), \( w_{ui} \), \( w_{ut} \), \( w_{ip} \), and \( w_{it} \) for \( E_{UP} \), \( E_{UI} \), \( E_{UT} \), \( E_{IP} \), and \( E_{IT} \), respectively. The weight of relations between homogeneous entities is set as \( \beta \). All these parameters are set to 1 by default in the experiments of this paper. Each submatrix of T is initialized as follows.
Initialization of user probability propagation matrix
\( T_{UU} \) is the user probability propagation matrix, and the similarities between users are used as its initial entries. When two users have graded the same item, an improved Pearson coefficient is used to measure their similarity.
The similarity between user \( u_i \) and user \( u_j \) is defined according to Eq. (2):
where P is the set of items graded by both \( u_i \) and \( u_j \), and \( {\overline{r}}_{u_i} \) and \( {\overline{r}}_{u_j} \) are the average scores of \( u_i \) and \( u_j \), respectively.
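Eq. (2) is not reproduced above, and the specific improvement to the Pearson coefficient is not described. As an illustration only, the sketch below computes the standard Pearson correlation over the co-rated set P, centering each score on the user's overall average as the definitions of \( {\overline{r}}_{u_i} \) and \( {\overline{r}}_{u_j} \) suggest. The function name and the dictionary input format are assumptions.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Standard Pearson correlation over co-rated items (not the paper's improved variant).

    ratings_a, ratings_b: dict mapping item id -> score for two users.
    Returns 0.0 when fewer than two items are co-rated or when either
    user's scores show no variance on those items.
    """
    common = set(ratings_a) & set(ratings_b)  # P: items graded by both users
    if len(common) < 2:
        return 0.0
    # Averages over each user's full rating history (r-bar in the text)
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    den_a = sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in common))
    den_b = sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)
```

The same function can be reused for item pairs by passing dictionaries that map users to the scores each item received.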
If the users have no commonly graded items, \( \mathrm{sim}(u_i,u_j) \) is defined according to Eq. (3):
where \( p_i \) is the set of items user \( u_i \) has purchased and \( p_j \) is the set of items user \( u_j \) has purchased.
The submatrix \( T_{UU} \) is then given by:
where \( \beta \) is the weight of the relation between homogeneous entities.
Initialization of user-item probability propagation matrix
\( T_{UI} \), the user-item probability propagation matrix, is defined as follows:
where \( w_{ui} \) is the weight of the relation between users and items.
When user \( u_i \) has purchased item \( i_j \) and graded it as s, \( e_{ui} = 1 \) if s is larger than the threshold value \( \delta \); otherwise, \( e_{ui} = 0 \).
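The formula for \( T_{UI} \) is missing from the text. A natural reading of the surrounding definitions is that each entry equals \( w_{ui} e_{ui} \); the sketch below builds the block under that assumption, using a hypothetical dictionary input keyed by (user, item).

```python
def user_item_block(ratings, users, items, delta, w_ui=1.0):
    """Build T_UI with entries w_ui * e_ui (assumed form).

    ratings: dict (user, item) -> score; missing pairs mean "not purchased".
    e_ui = 1 only when the user graded the item above the threshold delta.
    """
    return [
        [w_ui if (u, i) in ratings and ratings[(u, i)] > delta else 0.0 for i in items]
        for u in users
    ]
```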
Initialization of user-tag probability propagation matrix
\( T_{UT} \) is the user-tag probability propagation matrix. We employ the term frequency–inverse document frequency (TF-IDF) approach to measure the similarity between users and tags: the more often user \( u_i \) uses or is labeled with tag \( t_j \), and the less popular tag \( t_j \) is, the more similar \( u_i \) and \( t_j \) are.
\( T_{UT} \) is defined as follows:
where \( n_{u,t} \) is the number of times user \( u_i \) uses tag \( t_j \), \( n_t^{(u)} \) is the number of times tag \( t_j \) is used, and \( w_{ut} \) is the weight of the relation between users and tags. \( e_{ut} = 1 \) indicates that the user has used the tag, while \( e_{ut} = 0 \) indicates that the user has not.
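The exact TF-IDF expression for \( T_{UT} \) is missing. One common form in tag-based recommendation, assumed here, divides the user's tag count \( n_{u,t} \) by \( \log(1 + n_t^{(u)}) \) so that heavily used tags are down-weighted. The sketch multiplies by \( w_{ut} \) and zeroes entries where \( e_{ut} = 0 \); the function name and input layout are hypothetical.

```python
from math import log


def user_tag_block(tag_counts, users, tags, w_ut=1.0):
    """Build T_UT with entries w_ut * e_ut * n_ut / log(1 + n_t) (assumed form).

    tag_counts: dict (user, tag) -> n_{u,t}, the number of times the user used the tag.
    """
    # n_t: total number of times each tag is used
    tag_totals = {}
    for (_, t), n in tag_counts.items():
        tag_totals[t] = tag_totals.get(t, 0) + n
    block = []
    for u in users:
        row = []
        for t in tags:
            n_ut = tag_counts.get((u, t), 0)
            # e_ut = 0 when the user never used the tag
            row.append(w_ut * n_ut / log(1 + tag_totals[t]) if n_ut > 0 else 0.0)
        block.append(row)
    return block
```

With item-tag counts in place of user-tag counts, the same shape would serve for \( T_{IT} \).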
Initialization of user-property probability propagation matrix
\( T_{UP} \), the user-property probability propagation matrix, is defined as follows:
where \( w_{up} \) is the weight of the relation between users and properties. If user \( u_i \) possesses property \( p_j \), then \( e_{up} = 1 \); otherwise, \( e_{up} = 0 \).
Initialization of item probability propagation matrix
\( T_{II} \) is the item probability propagation matrix, and the similarities between items are used as its initial entries. When items \( I_i \) and \( I_j \) have been graded by a common user, the improved Pearson coefficient is used to measure their similarity.
The similarity between two items, denoted as \( \mathrm{sim}(I_i,I_j) \), is defined according to Eq. (8):
where U is the set of users who have graded both \( I_i \) and \( I_j \), and \( {\overline{r}}_{I_i} \) and \( {\overline{r}}_{I_j} \) are the average grades of items \( I_i \) and \( I_j \), respectively.
If \( I_i \) and \( I_j \) have not been graded by a common user, \( \mathrm{sim}(I_i,I_j) \) is defined according to Eq. (9):
where \( U_i \) is the set of users who have purchased item \( I_i \) and \( U_j \) is the set of users who have purchased item \( I_j \).
The submatrix \( T_{II} \) is then given by:
where \( \beta \) is the weight of the relation between homogeneous entities.
Initialization of item-tag probability propagation matrix
\( T_{IT} \) is the item-tag probability propagation matrix. We employ the TF-IDF approach to measure the similarity between items and tags: the more often item \( I_i \) is labeled with tag \( t_j \), and the less popular tag \( t_j \) is, the more similar \( I_i \) and \( t_j \) are.
\( T_{IT} \) is defined as follows:
where \( n_{i,t} \) is the number of times item \( I_i \) is labeled with tag \( t_j \), \( n_t^{(i)} \) is the number of times tag \( t_j \) is used, and \( w_{it} \) is the weight of the relation between items and tags. \( e_{it} = 1 \) indicates that the item has been labeled with the tag; otherwise, \( e_{it} = 0 \).
Initialization of item-property probability propagation matrix
\( T_{IP} \), the item-property probability propagation matrix, is defined as follows:
where \( w_{ip} \) is the weight of the relation between items and properties. If item \( I_i \) possesses property \( p_j \), then \( e_{ip} = 1 \); otherwise, \( e_{ip} = 0 \).
Initialization of tag probability propagation matrix
\( T_{TT} \), the tag probability propagation matrix, holds the similarities between tags and is defined as follows:
where \( w_{tt} \) is the weight of the relation between tags, N(b) is the tag set containing tag b, and \( n_{b,j} \) is the number of users who label item j with tag b.
Initialization of property-tag probability propagation matrix
\( T_{TP} \), the property-tag probability propagation matrix, is a null matrix because there are no direct relations between properties and tags.
Initialization of property probability propagation matrix
\( T_{PP} \), the property probability propagation matrix, is also a null matrix because no direct relations exist between properties.
Finally, we normalize each row of the propagation probability matrix so that it sums to 1.
Similarity propagation approach
If a path of length d exists between two arbitrary nodes \( v_o \) and \( v_s \) in a heterogeneous network, a random walk from \( v_o \) reaches \( v_s \) in d steps, and the path between them is a d-length random walk path. On arriving at node \( v_t \) during the random walk, the walker either moves to a neighbor of \( v_t \) with the propagation probability between \( v_t \) and that neighbor or restarts with probability \( \alpha \) [19]. Propagation stops once the probability of visiting each node converges. The resulting process is a Markov chain.
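The walk described above can be sketched as a random walk with restart computed by power iteration. This is a minimal illustration, assuming a dense matrix given as nested lists, a restart to a single start node, and the restart factor 0.8 reported in the experiments; it also performs the row normalization described earlier. Rows that are all zero would leak probability mass, which the sketch does not correct.

```python
def normalize_rows(matrix):
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([x / total for x in row] if total > 0 else [float(x) for x in row])
    return out


def random_walk_with_restart(T, start, alpha=0.8, tol=1e-10, max_iter=1000):
    """Visiting probabilities of a walk that, at every step, restarts at
    `start` with probability alpha or otherwise follows the row-normalized T."""
    n = len(T)
    P = normalize_rows(T)
    p = [1.0 if j == start else 0.0 for j in range(n)]
    for _ in range(max_iter):
        # Restart mass goes back to the start node.
        nxt = [alpha if j == start else 0.0 for j in range(n)]
        # Remaining mass is propagated along the transition probabilities.
        for i in range(n):
            if p[i] == 0.0:
                continue
            for j in range(n):
                nxt[j] += (1 - alpha) * p[i] * P[i][j]
        converged = max(abs(a - b) for a, b in zip(nxt, p)) < tol
        p = nxt
        if converged:
            break
    return p
```

On the path graph u1-u2-u3 started at u1, the probabilities converge to 49/60, 1/6, and 1/60: u3 receives a small positive share even though it has no edge to u1, which matches Property 2.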
Random walk paths are made up of meta paths, which represent the similarities between entities. The similarity propagated during the random walk is positively correlated with the number of random walk paths and negatively correlated with their length.
Therefore, the similarity propagated between \( v_i \) and \( v_j \) is defined according to Eq. (14):
where l is a path from \( v_i \) to \( v_j \) and length(γ) is a path of length γ from \( v_i \) to \( v_j \).
Writing the above formula in matrix form gives the similarity matrix:
where \( R_{UU} \) is the user similarity matrix and \( R_{II} \) is the item similarity matrix.
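Eqs. (14) and (15) are not reproduced in the text. A standard way to express a similarity that grows with the number of paths and shrinks with their length is the damped path sum \( R = \alpha \sum_{k \ge 0} (1-\alpha)^k P^k \), whose row s equals the restart-walk probabilities from node s. The sketch below truncates this series; treat it as the conventional random-walk-with-restart formulation, not necessarily the authors' exact Eq. (14). The helper names are hypothetical.

```python
def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def similarity_matrix(P, alpha=0.8, max_len=50):
    """Truncated R = alpha * sum_{k=0}^{max_len} (1 - alpha)^k P^k.

    Entry (i, j) of P^k sums, over all length-k paths from i to j, the product
    of transition probabilities, so more paths raise R[i][j] while the factor
    (1 - alpha)^k shrinks the contribution of longer paths.
    P is assumed to be row-normalized already.
    """
    n = len(P)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    R = [[0.0] * n for _ in range(n)]
    for k in range(max_len + 1):
        factor = alpha * (1 - alpha) ** k
        for i in range(n):
            for j in range(n):
                R[i][j] += factor * power[i][j]
        power = matmul(power, P)
    return R


def sub_block(R, rows, cols):
    """Extract a block such as R_UU (rows = cols = indices of user nodes)."""
    return [[R[i][j] for j in cols] for i in rows]
```

Under this reading, \( R_{UU} \) and \( R_{II} \) are simply the user and item sub-blocks of R.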
SPGraph-based collaborative filtering approach
Based on the user similarity matrix and item similarity matrix deduced from the similarity propagation approach, this section proposes a hybrid collaborative filtering approach based on the score prediction graph (SPGraph). Figure 3 illustrates the framework of our approach, which consists of two stages.
The offline training stage constructs the SPGraph by searching for the nearest neighbors of users or items in the similarity matrix, calculates the centrality of each node, generates the prediction node sequence by the anti-centrality sort principle, and completes the user-item score matrix with the hybrid collaborative filtering approach. The online recommendation stage finds the positive K nearest neighbors of the target user via the similarity propagation approach and predicts the user's scores with the hybrid collaborative filtering approach to form recommendation lists.
Construction of SPGraph
Definition 4
SPGraph: An SPGraph is a homogeneous, weighted, undirected graph generated by nearest neighbor selection in the user similarity matrix or the item similarity matrix. It can be represented as SPGraph = (V, E, W), where V contains entities of one type (either users or items), E contains relations of one type, and W contains the similarities between entities. \( \langle v_i, v_j \rangle \in E \) means that items (or users) \( v_i \) and \( v_j \) have similarity \( w_{ij} \in W \).
We now introduce the SPGraph generation approach using the user similarity matrix as an example; the resulting user score prediction graph is denoted as \( \mathrm{SPG}_U \). Near neighbors with low similarity not only consume computing resources but also reduce prediction accuracy, so we set a threshold value \( \delta \) to eliminate them during nearest neighbor selection. Approach 1 presents the pseudocode of the SPGraph generation approach.
Initially, \( \mathrm{SPG}_U \) contains only the set of isolated user nodes \( V = \{v_1, v_2, \ldots, v_n\} \). Lines 7–12 show that, for every entry \( r_{ij} \in R_{UU} \) greater than \( \delta \), an edge \( \langle v_i, v_j \rangle \) with weight \( w_{ij} = r_{ij} \) is added to \( \mathrm{SPG}_U \). Figure 4(1) briefly illustrates the generation of \( \mathrm{SPG}_U \).
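Approach 1's pseudocode is not reproduced here. The following sketch follows the description of lines 7–12: it starts from isolated nodes and adds an undirected edge of weight \( r_{ij} \) whenever \( r_{ij} > \delta \). The adjacency-dictionary representation and the assumption that \( R_{UU} \) is symmetric (so only the upper triangle is scanned) are choices made for illustration.

```python
def build_spgraph(R, delta):
    """Weighted undirected graph as {node: {neighbor: weight}}.

    Edge <v_i, v_j> with weight r_ij is kept only when r_ij > delta.
    R is assumed symmetric, so each unordered pair is visited once and
    no self-loops are created.
    """
    n = len(R)
    graph = {v: {} for v in range(n)}  # start from isolated nodes
    for i in range(n):
        for j in range(i + 1, n):
            if R[i][j] > delta:
                graph[i][j] = R[i][j]
                graph[j][i] = R[i][j]
    return graph
```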
Generation of prediction node sequence based on anti-centrality sort
As shown in Fig. 4(2), user \( u_5 \) has the lowest centrality. If we predict the score of \( u_5 \) first, its deviation affects only the score prediction of \( u_6 \) to some extent and has little effect on other users. However, if we predict the score of \( u_3 \) first, its deviation directly affects the score predictions of \( u_1 \), \( u_4 \), and \( u_6 \), and \( u_4 \) is affected the most because of its high similarity with \( u_3 \). Following the principle that the less influential a node is (that is, the lower its centrality), the earlier it should be predicted and the smaller the resulting rating error, we propose a prediction node sequence generation approach based on anti-centrality sort. Its pseudocode is as follows.
Lines 4–8 compute the centrality of the nodes in \( \mathrm{SPG}_U \). Lines 11–12 find the node with the lowest centrality and append it to the node sequence array. Lines 14–16 delete that node and its similarity edges from the graph and recompute the centrality of the remaining nodes; lines 10–16 are then repeated until the node sequence is complete. Figure 4(2) shows the generation of the prediction node sequence of \( \mathrm{SPG}_U \).
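The text does not say which centrality measure is used. The sketch below assumes weighted degree centrality (the sum of a node's remaining edge weights) and breaks ties by node identifier; both are assumptions. It repeatedly removes the least central node and recomputes centrality on the remaining graph, as lines 10–16 describe.

```python
def prediction_sequence(graph):
    """Anti-centrality sort over {node: {neighbor: weight}} (no self-loops).

    Centrality is assumed to be weighted degree; ties go to the smaller node id.
    """
    g = {v: dict(nbrs) for v, nbrs in graph.items()}  # work on a copy
    order = []
    while g:
        # Lowest-centrality node among those still in the graph
        v = min(g, key=lambda n: (sum(g[n].values()), n))
        order.append(v)
        # Delete the node and its similarity edges, so centrality is recomputed
        for u in g[v]:
            del g[u][v]
        del g[v]
    return order
```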
Hybrid collaborative filtering approach
Based on the prediction node sequence, we use a hybrid approach that integrates user-based nearest neighbor recommendation (UbCF) with item-based nearest neighbor recommendation (IbCF) to predict the score that user u gives to item i.
With UbCF, the predicted score that user u gives to item i is computed according to Eq. (16):
where Sim(u) is the set of nearest neighbors of user u, sim(u,s) is the similarity between user u and user s, \( r_{s,i} \) is the score that user s gives to item i, and \( {\overline{r}}_s \) is the average score that user s gives to all items.
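Eq. (16) is not reproduced above. Given that it uses \( {\overline{r}}_s \), a likely form is the standard mean-centered user-based prediction \( \overline{r}_u + \sum_{s} \mathrm{sim}(u,s)(r_{s,i} - \overline{r}_s) / \sum_{s} |\mathrm{sim}(u,s)| \) over the neighbors s of u who rated i. The sketch implements that standard formula under this assumption; the dictionary layouts are hypothetical.

```python
def predict_user_based(u, i, ratings, neighbors, sim):
    """Mean-centered user-based prediction (assumed form of Eq. (16)).

    ratings:   {user: {item: score}}
    neighbors: {user: [neighbor users]}      (Sim(u) in the text)
    sim:       {(user, neighbor): similarity}
    Returns None when u has no ratings or no neighbor has rated i.
    """
    if not ratings.get(u):
        return None

    def mean(user):
        scores = ratings[user]
        return sum(scores.values()) / len(scores)

    num = den = 0.0
    for s in neighbors.get(u, []):
        if i in ratings.get(s, {}):
            w = sim[(u, s)]
            num += w * (ratings[s][i] - mean(s))
            den += abs(w)
    if den == 0:
        return None
    return mean(u) + num / den
```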
With IbCF, the predicted score that user u gives to item i is computed according to Eq. (17):
where Sim(i) is the set of nearest neighbors of item i, sim(i,p) is the similarity between item i and item p, and \( r_{u,p} \) is the score user u gives to item p.
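Likewise, a common form of Eq. (17) is the similarity-weighted average of the user's own scores on neighboring items. The sketch assumes that form, uses hypothetical dictionary inputs, and returns None when the user rated none of the neighbors.

```python
def predict_item_based(u, i, ratings, item_neighbors, item_sim):
    """Similarity-weighted item-based prediction (assumed form of Eq. (17)).

    ratings:        {user: {item: score}}
    item_neighbors: {item: [neighbor items]}   (Sim(i) in the text)
    item_sim:       {(item, neighbor): similarity}
    """
    user_ratings = ratings.get(u, {})
    num = den = 0.0
    for p in item_neighbors.get(i, []):
        if p in user_ratings:  # only neighbors the user actually scored
            w = item_sim[(i, p)]
            num += w * user_ratings[p]
            den += abs(w)
    return num / den if den else None
```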
Because UbCF and IbCF depend on the similarity of their respective near neighbor sets, their prediction accuracy differs. For example, when the similarities of the user's near neighbor set are {1, 0.8, 0.9} and those of the item's near neighbor set are {0.4, 0.5, 0.5}, the UbCF-based result is clearly more reliable. We therefore introduce a confidence weight [20] to balance the final prediction: the larger the similarities in a near neighbor set, the larger its confidence weight.
The confidence weight of the user is defined according to Eq. (18):
The confidence weight of the item is defined according to Eq. (19):
A larger confidence weight indicates a more reliable prediction.
Besides, different datasets and users may favor the two recommendation approaches to different degrees. Therefore, a parameter \( \theta \) is introduced to balance the two approaches. The weight \( w_u \) of the UbCF approach and the weight \( w_v \) of the IbCF approach are defined as follows:
Furthermore, when neither the user's nearest neighbor set Sim(u) nor the item's nearest neighbor set Sim(i) is empty, the hybrid recommendation approach is defined as follows:
where \( w_u + w_v = 1 \).
When Sim(i) is empty and Sim(u) is not, the hybrid recommendation approach reduces to UbCF; when Sim(u) is empty and Sim(i) is not, it reduces to IbCF.
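The confidence-weight and balance formulas are not reproduced in the text, so the combination below is hypothetical: it takes each confidence weight to be the mean similarity of the corresponding neighbor set, scales the two by \( \theta \) and \( 1 - \theta \), and rescales so that \( w_u + w_v = 1 \). Only the fallbacks to UbCF, IbCF, or null follow rules stated in the text.

```python
def hybrid_predict(pred_u, pred_i, user_sims, item_sims, theta=0.9):
    """Blend UbCF and IbCF predictions (hypothetical stand-in for the missing formulas).

    pred_u / pred_i: UbCF and IbCF predictions, or None if the neighbor set is empty.
    user_sims / item_sims: similarities of the user's and item's neighbor sets
    (non-empty whenever the matching prediction is not None).
    """
    if pred_u is None and pred_i is None:
        return None          # both neighbor sets empty: pred(u, i) = null
    if pred_i is None:
        return pred_u        # only Sim(u) available: reduce to UbCF
    if pred_u is None:
        return pred_i        # only Sim(i) available: reduce to IbCF
    # Hypothetical confidence weights: mean similarity of each neighbor set
    c_u = sum(user_sims) / len(user_sims)
    c_i = sum(item_sims) / len(item_sims)
    a, b = theta * c_u, (1 - theta) * c_i
    w_u = a / (a + b) if a + b > 0 else 0.5
    w_v = 1 - w_u            # enforce w_u + w_v = 1
    return w_u * pred_u + w_v * pred_i
```

With the neighbor similarities from the earlier example ({1, 0.8, 0.9} for the user and {0.4, 0.5, 0.5} for the item), the blended score leans toward the UbCF prediction even at \( \theta = 0.5 \).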
If both Sim(u) and Sim(i) are empty while the user-item matrix is being filled in the offline training stage, pred(u,i) = null. For online prediction, the similarity propagation approach addresses the cold start problem because every new user or item obtains a corresponding nearest neighbor set. In addition, to improve prediction accuracy, we choose the positive K nearest neighbors instead of the top K nearest neighbors. Positive K is defined as follows:
The value of K varies across users because the number of neighbors that pass the similarity threshold \( \delta \) differs from user to user.
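The formula for positive K is also missing. Reading it as "every neighbor whose similarity exceeds \( \delta \)", which is what the sentence above implies, gives the following sketch; the function name is hypothetical and the default \( \delta = 0.4 \) is the threshold reported in the experiments.

```python
def positive_k_neighbors(target, sims, delta=0.4):
    """All candidates whose similarity to `target` exceeds delta, most similar first.

    sims: {candidate: similarity to target}. K = len(result) varies per user.
    """
    return sorted(
        (v for v, s in sims.items() if v != target and s > delta),
        key=lambda v: sims[v],
        reverse=True,
    )
```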
Experiments and comparison
In this section, we evaluate our approach. We first introduce the experimental dataset, the evaluation metrics, and the parameter settings. Then we compare the performance of our approach with five state-of-the-art approaches.
Datasets, evaluation metrics, and parameter setting
In this experiment, we use the publicly available MovieLens movie dataset, which can be obtained from the MovieLens site [21]. Table 1 lists the details of the dataset.
Because the dataset is very sparse, we preprocess it by deleting users with fewer than 50 rating records and movies rated by fewer than 50 users.
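A minimal sketch of this filter, assuming ratings arrive as (user, movie, score) tuples. It counts records once and drops users and movies below the cutoffs in a single pass; the text does not say whether the filter is repeated until no further records are removed.

```python
from collections import Counter


def filter_sparse(ratings, min_user=50, min_item=50):
    """Keep records whose user and movie both meet the minimum counts."""
    user_counts = Counter(u for u, _, _ in ratings)  # records per user
    item_counts = Counter(i for _, i, _ in ratings)  # raters per movie
    return [
        (u, i, r) for u, i, r in ratings
        if user_counts[u] >= min_user and item_counts[i] >= min_item
    ]
```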
In the experiment, we employ four commonly used evaluation metrics: mean absolute error (MAE), root-mean-square error (RMSE), recall, and diversity. They are defined as follows.
where T is the number of rating records in the test set, \( r_{u,i} \) is the actual score of user u on item i, and \( {\overline{r}}_{u,i} \) is the score predicted by the recommendation system.
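The MAE and RMSE formulas are missing from the text; their standard definitions average the absolute and squared prediction errors over the T test records. A sketch, assuming paired lists of actual and predicted scores:

```python
from math import sqrt


def mae_rmse(actual, predicted):
    """Standard MAE and RMSE over equal-length sequences of test records."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    return mae, rmse
```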
where P(u) is the set of items the target user graded in the test set and R(u) is the set of items recommended by the recommendation system.
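Recall is commonly computed as \( \sum_u |R(u) \cap P(u)| / \sum_u |P(u)| \); assuming that standard definition, a sketch with hypothetical dictionary inputs:

```python
def recall(recommended, relevant):
    """recommended: {user: list of items}  (R(u))
    relevant:    {user: set of items}   (P(u), items graded in the test set)"""
    hits = sum(len(set(recommended.get(u, [])) & items) for u, items in relevant.items())
    total = sum(len(items) for items in relevant.values())
    return hits / total if total else 0.0
```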
where sim(i,j) is the similarity between item i and item j and R(u) is the recommendation list.
The diversity of the recommendation system is the mean value of the diversities of all the users’ recommendation lists and is defined according to Eq. (30).
where U is the user set in the test set.
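A common intra-list form of the per-user diversity, assumed here, is one minus the average pairwise similarity of the items in R(u); Eq. (30) then averages it over the users in U. The sketch takes the item similarity as a function argument, so it could be fed from \( R_{II} \) or any other measure; the function names are hypothetical.

```python
from itertools import combinations


def list_diversity(rec_list, item_sim):
    """1 - average pairwise similarity of the items in one recommendation list."""
    pairs = list(combinations(rec_list, 2))
    if not pairs:
        return 0.0
    return 1 - sum(item_sim(i, j) for i, j in pairs) / len(pairs)


def system_diversity(rec_lists, item_sim):
    """Mean list diversity over all users in the test set."""
    return sum(list_diversity(r, item_sim) for r in rec_lists.values()) / len(rec_lists)
```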
After optimization, the restart factor is set to 0.8, the similarity threshold to 0.4, and the balance factor to 0.9. The weights of the six types of relations are initialized to 1. For lack of space, we omit the optimization process for the threshold and the balance factor; Fig. 5 shows only the effect of the restart factor on the performance of the approach.
Experiment results
In order to confirm whether our approach can perform better than other approaches, we compare our approach with five stateoftheart approaches. More details are provided below:
UbCF and IbCF [22], whose information source is the scores users give to items, are the user-based and item-based nearest neighbor recommendation approaches, respectively.
Hybrid user-item-based collaborative filtering (HCF) [22] is a hybrid recommendation approach that integrates user-based and item-based nearest neighbor recommendation.
TNCF is a hybrid recommendation approach that integrates rating similarity with property similarity and adopts the top-N method to select the nearest neighbors.
MSCF reinforces TNCF by improving its nearest neighbor selection method.
We randomly split the dataset into 10 subsets for 10-fold cross-validation: nine subsets form the training set and the remaining subset is the test set. Every experiment is repeated 20 times for each approach to avoid sample bias. The mean values of the four evaluation metrics are calculated for recommendation list lengths of 20 and 30. The comparison results are summarized in Table 2 and Fig. 5.
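A sketch of the 10-fold split, assuming the rating records fit in a list; the seeded shuffle and the striped fold assignment are illustrative choices so that repeated runs can be reproduced.

```python
import random


def k_fold_splits(records, k=10, seed=0):
    """Yield (train, test) pairs; each of the k folds is the test set exactly once."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]  # k disjoint, near-equal folds
    for f in range(k):
        train = [r for g, fold in enumerate(folds) if g != f for r in fold]
        yield train, folds[f]
```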
Figure 6 shows clearly the performance of the six approaches on MAE, recall, and diversity. It is very clear that the performance of HCF on all the four evaluation metrics is better than UbCF and IbCF because HCF is a hybrid recommendation approach which integrates the userbased nearest neighbor recommendation with the itembased nearest neighbor recommendation.
As for TNCF, introducing property similarity into the similarity computation alleviates the impact of sparse data to some extent, so TNCF performs better than HCF. MSCF, in turn, performs better than TNCF because it improves both the similarity computation and the nearest neighbor selection.
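Why property similarity helps with sparse data can be illustrated with a small example: two movies rated by disjoint sets of users have no co-ratings, so a rating-only similarity is zero, while shared properties still give a usable value. The sketch below is a minimal illustration, not TNCF's actual formula; it assumes a linear combination weighted by `alpha` (setting it to 0.9 to echo the balance factor is our assumption) and Jaccard similarity over genres as the property similarity.

```python
import math


def rating_sim(ra, rb):
    """Cosine similarity over the keys (e.g. users) both rating dicts share."""
    common = ra.keys() & rb.keys()
    if not common:
        return 0.0  # no co-ratings: the sparse-data case
    dot = sum(ra[k] * rb[k] for k in common)
    na = math.sqrt(sum(ra[k] ** 2 for k in common))
    nb = math.sqrt(sum(rb[k] ** 2 for k in common))
    return dot / (na * nb)


def property_sim(pa, pb):
    """Jaccard similarity of two property sets (e.g. movie genres)."""
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


def hybrid_sim(ra, rb, pa, pb, alpha=0.9):
    """alpha * rating similarity + (1 - alpha) * property similarity."""
    return alpha * rating_sim(ra, rb) + (1 - alpha) * property_sim(pa, pb)


# Two movies rated by disjoint users: rating similarity alone is 0,
# but shared genres still yield a non-zero combined similarity.
movie_a = {"user1": 5, "user2": 4}
movie_b = {"user3": 4}
print(rating_sim(movie_a, movie_b))  # 0.0
print(hybrid_sim(movie_a, movie_b, {"Drama", "Romance"}, {"Drama"}))  # about 0.05
```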
Among the six approaches, ours achieves the best performance in terms of MAE, RMSE, recall, and diversity. Specifically, compared with MSCF, our approach improves MAE by 19.3%, RMSE by 1.9%, recall by 7.2%, and diversity by 11.5%. We attribute this to two factors. First, our approach adds other properties and information to the similarity computation, which further reduces the effect of the sparse matrix. Second, it uses the hybrid collaborative filtering approach to predict scores and fills the user-item matrix step by step following the prediction node sequence, which improves prediction accuracy.
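The step-by-step filling can be illustrated with the minimal sketch below. It assumes the prediction node sequence is already given as an ordered list of (user, item) cells; the anti-centrality sort that produces this order is not reproduced, and `row_col_mean` is a hypothetical placeholder for the hybrid collaborative filtering predictor. Because each predicted score is written back before the next prediction is made, the order of the sequence affects the final matrix.

```python
def fill_in_order(R, order, predict):
    """Fill missing cells (None) of a user-item matrix one by one in `order`.

    Each prediction is written back immediately, so later predictions can use
    earlier ones as if they were observed scores. The input is not modified.
    """
    R = [row[:] for row in R]
    for u, i in order:
        if R[u][i] is None:
            R[u][i] = predict(R, u, i)
    return R


def row_col_mean(R, u, i):
    """Hypothetical stand-in predictor: mean of known scores in row u and column i."""
    vals = [v for v in R[u] if v is not None]
    vals += [row[i] for row in R if row[i] is not None]
    return sum(vals) / len(vals) if vals else 0.0


R = [[5, None, 3],
     [None, 1, None]]
a = fill_in_order(R, [(0, 1), (1, 0), (1, 2)], row_col_mean)
b = fill_in_order(R, [(1, 2), (1, 0), (0, 1)], row_col_mean)
print(a[1][2], b[1][2])  # about 2.33 vs 2.0: the fill order changes the result
```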
Conclusions
In this paper, we address the following issues. First, most existing recommendation approaches rely on a single information source and cannot effectively solve the cold start and data sparsity problems. Second, some approaches proposed to address data sparsity do not consider the effects of users’ influence and prediction order on recommendation accuracy. To solve these problems, we propose a similarity propagation approach based on heterogeneous networks and a prediction node sequence generation approach based on anti-centrality sorting. The former integrates various types of information to effectively ease the cold start problem; the latter eases data sparsity by gradually filling the user-item score matrix following the prediction node sequence. We conduct experiments on the MovieLens dataset. Compared with five state-of-the-art approaches, our approach performs better in terms of MAE, RMSE, recall, and diversity.
There are several ways in which our work can be improved. First, more feature extraction methods [23] can be introduced to analyze user preferences. Second, social community discovery [24] and precise semantic analysis methods [25–28] can be introduced into the similarity computation to capture user preferences more accurately and effectively. Third, more methods [29] can be used to filter the training dataset and make it more trustworthy. Fourth, we can implement a Spark-based version of the similarity propagation approach to improve its efficiency.
References
 1.
JJPC Rodrigues, M Oliveira, B Vaidya, New trends on ubiquitous mobile multimedia applications. EURASIP J. Wirel. Commun. Netw. 2010(1), 1 (2010)
 2.
L Yang, X Geng, H Liao, A web sentiment analysis method on fuzzy clustering for mobile social media users. EURASIP J. Wirel. Commun. Netw. 2016(1), 1 (2016)
 3.
JS Breese, D Heckerman, C Kadie, Empirical analysis of predictive algorithms for collaborative filtering, in Proceedings of the Conference on Uncertainty in Artificial Intelligence (Madison, Wisconsin, July 24–26, 1998), pp. 43–52
 4.
W Yang, X Cui, J Liu et al., User’s interests-based movie recommendation in heterogeneous network, in International Conference on Identification, Information, and Knowledge in the Internet of Things (IIKI) (IEEE, Beijing, China, 2015), pp. 74–77
 5.
Y Jiang, J Liu, M Tang, X Liu, An effective web service recommendation method based on personalized collaborative filtering, in 2011 IEEE International Conference on Web Services (ICWS) (IEEE, Washington, DC, USA, 2011), pp. 211–218
 6.
JL Herlocker, JA Konstan, A Borchers, J Riedl, An algorithmic framework for performing collaborative filtering, in SIGIR ’99: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, New York, 1999), pp. 230–237
 7.
G Linden, B Smith, J York, Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput. 7(1), 76–80 (2003)
 8.
B Sarwar, G Karypis, J Konstan, J Riedl, Item-based collaborative filtering recommendation algorithms, in WWW ’01: Proceedings of the 10th International Conference on World Wide Web (ACM, New York, 2001), pp. 285–295
 9.
T Hofmann, Latent semantic models for collaborative filtering. ACM Trans. Inf. Syst. 22(1), 89–115 (2004)
 10.
J Canny, Collaborative filtering with privacy via factor analysis, in SIGIR ’02: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, New York, 2002), pp. 238–245
 11.
Y Zhang, J Koren, Efficient Bayesian hierarchical user modeling for recommendation system, in SIGIR ’07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, New York, 2007), pp. 47–54
 12.
L Yu, L Liu, X Li, A hybrid collaborative filtering method for multiple-interests and multiple-content recommendation in E-Commerce. Expert Syst. Appl. 28(1), 67–77 (2005)
 13.
J Hannon, K McCarthy, B Smyth, Finding useful users on Twitter: Twittomender the followee recommender, in Advances in Information Retrieval (Springer, Berlin Heidelberg, 2011), pp. 784–787
 14.
ML Wu, CH Chang, RZ Liu, Integrating content-based filtering with collaborative filtering using co-clustering with augmented matrices. Expert Syst. Appl. 41(6), 2754–2761 (2014)
 15.
R Ronen, N Koenigstein, E Ziklik et al., Selecting content-based features for collaborative filtering recommenders, in ACM Conference on Recommender Systems (ACM, 2013), pp. 407–410
 16.
R Burke, Knowledge-based recommender systems, in A Kent (ed.), vol. 69, suppl. 32 (Marcel Dekker, New York, 2000), pp. 180–200
 17.
R Burke, The Wasabi Personal Shopper: a case-based recommender system, in Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI) (American Association for Artificial Intelligence, Orlando, Florida, USA, 2000), pp. 844–849
 18.
R Song, A study on hybrid recommendation algorithm (Lanzhou University, 2014)
 19.
M Tang, X Dai, B Cao et al., WSWalker: a random walk method for QoS-aware web service recommendation, in 2015 IEEE 22nd International Conference on Web Services (ICWS) (IEEE, New York, USA, 2015), pp. 591–598
 20.
Z Zheng, H Ma, MR Lyu et al., WSRec: a collaborative filtering based web service recommender system, in 2009 IEEE International Conference on Web Services (ICWS) (IEEE, Los Angeles, CA, USA, 2009), pp. 437–444
 21.
Social Computing Research at the University of Minnesota, MovieLens latest datasets [DB/OL]. http://www.grouplens.org/datasets/movielens/, 2016-01. Accessed 1 Mar 2016
 22.
NP Kumar, Z Fan, Hybrid user-item based collaborative filtering. Procedia Comput. Sci. 60(1), 1453–1461 (2015)
 23.
J Liu, B Li, W Zhang, Feature extraction using maximum variance sparse mapping. Neural Comput. Appl. 21(8), 1827–1833 (2012)
 24.
L Jin, Z Jing et al., Irregular community discovery for social CRM in cloud computing. J. Supercomput. 61(2), 317–336 (2012)
 25.
X Luo, Z Xu, J Yu et al., Building association link network for semantic link on web resources. IEEE Trans. Autom. Sci. Eng. 8(3), 482–494 (2011)
 26.
C Hu, Z Xu, Y Liu et al., Semantic link network-based model for organizing multimedia big data. IEEE Trans. Emerg. Top. Comput. 2(3), 376–387 (2014)
 27.
Z Xu, X Wei, X Luo et al., Knowle: a semantic link network based system for organizing large scale online news events. Future Gener. Comput. Syst. 43, 40–50 (2015)
 28.
Z Xu, X Luo, S Zhang et al., Mining temporal explicit and implicit semantic relations between entities using web search engines. Future Gener. Comput. Syst. 37, 468–477 (2014)
 29.
X Wei, X Luo, Q Li et al., Online comment-based hotel quality automatic assessment using improved fuzzy comprehensive evaluation and fuzzy cognitive map. IEEE Trans. Fuzzy Syst. 23(1), 72–84 (2015)
Acknowledgements
This work is partly supported by the grants of Science-technology Support Plan Projects of Hubei (2014BAA089) and Natural Science Foundation of Hubei (2011CDB072).
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Yin, X., Chen, T., Liu, W. et al. A reinforced collaborative filtering approach based on similarity propagation and score predication graph. J Wireless Com Network 2016, 210 (2016) doi:10.1186/s13638-016-0710-5
Keywords
 Big data
 Recommendation system
 Similarity propagation
 Heterogeneous network
 Score prediction graph