Data preparation
The experimental data come from the micro-blog service of www.sina.com. A crawler system collected a large number of users' micro-blog messages together with their user IDs, and the time, website, and forwarding count of each message were stored in a database. The messages were then processed and scored by the sentiment algorithm, which yields the sentiment values of the messages in different periods. In the experiment, we selected 50 micro-blog users and computed their sentiment values in 10 periods. Denote the object domain as \( X=\left\{{x}_1,{x}_2,\cdots, {x}_n\right\} \), and let the attributes of object \( {x}_i \) be \( \left({x}_{i1},{x}_{i2},\cdots, {x}_{im}\right) \), where n = 50 and m = 10.
Fuzzy clustering
Following fuzzy clustering theory, we normalize the original data and construct the fuzzy equivalence matrix; the resulting dynamic fuzzy clustering diagram is shown in Fig. 2.
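The two steps named here can be sketched as follows. This is a minimal illustration under assumptions that this section does not fix: each feature is rescaled to [0, 1] by range normalization, similarity is taken as one minus the mean absolute difference (one common choice among several), and the fuzzy equivalence matrix is obtained as the max-min transitive closure by repeated squaring. All function names and the toy data are hypothetical and are not the authors' implementation.

```python
def range_normalize(data):
    """Rescale every column of `data` (a list of rows) to [0, 1]."""
    cols = list(zip(*data))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in data
    ]


def similarity_matrix(data):
    """Assumed similarity: r_ij = 1 - mean_k |x_ik - x_jk|."""
    m = len(data[0])
    return [
        [1.0 - sum(abs(a - b) for a, b in zip(xi, xj)) / m for xj in data]
        for xi in data
    ]


def maxmin_compose(a, b):
    """Max-min composition: c_ij = max_k min(a_ik, b_kj)."""
    n = len(a)
    bt = list(zip(*b))  # bt[j][k] == b[k][j]
    return [
        [max(min(a[i][k], bt[j][k]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(r):
    """Square R under max-min composition until it stops changing."""
    while True:
        r2 = maxmin_compose(r, r)
        if r2 == r:
            return r
        r = r2


# Toy data: 4 users, 2 periods (made-up values, not the paper's data).
toy = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.5, 0.5]]
equivalence = transitive_closure(similarity_matrix(range_normalize(toy)))
```

Because max and min only select existing entries, the repeated squaring terminates after finitely many steps, and the result is reflexive, symmetric, and max-min transitive.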
Figure 2 is obtained from the sentiment changes computed from users' network messages. Users express their emotions whatever social circle they belong to, so the sentiment analysis is not directly tied to social categories. As Fig. 2 illustrates, the closer the clustering threshold λ is to 1, the larger the number of classes; once λ falls to a certain value, all samples belong to one class (at λ = 0.72, all users are merged into a single category). The advantage of this clustering method is that λ can be chosen according to actual needs to obtain an appropriate classification.
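The role of λ can be illustrated with a λ-cut of a fuzzy equivalence matrix: two users fall into the same class when their equivalence degree is at least λ. The sketch below uses a made-up 4×4 matrix `T`, not the paper's data, and the helper name `lambda_cut_classes` is hypothetical.

```python
def lambda_cut_classes(t, lam):
    """Partition indices 0..n-1 into classes with t[i][j] >= lam.

    `t` must be a fuzzy equivalence matrix (reflexive, symmetric,
    max-min transitive); the lambda-cut is then an equivalence relation,
    so comparing against one representative per class is sufficient.
    """
    classes = []
    for i in range(len(t)):
        for cls in classes:
            if t[i][cls[0]] >= lam:
                cls.append(i)
                break
        else:
            classes.append([i])
    return classes


# Made-up fuzzy equivalence matrix for four users (illustrative only).
T = [
    [1.0, 0.8, 0.4, 0.4],
    [0.8, 1.0, 0.4, 0.4],
    [0.4, 0.4, 1.0, 0.6],
    [0.4, 0.4, 0.6, 1.0],
]

for lam in (0.9, 0.7, 0.5, 0.4):
    print(lam, len(lambda_cut_classes(T, lam)))
```

In this toy case the number of classes falls from 4 to 1 as λ decreases from 0.9 to 0.4, which mirrors the behavior described for Fig. 2.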
Denote the classification number as r. As shown in Fig. 2, r takes the values 1, 2, 6, 7, 14, 16, 19, 22, 25, 31, 35, 37, 38, 40, 43, 44, 45, 46, 47, 48, and 50. Here r = 1 means that all 50 users are in one class, and r = 50 means that each user forms a class of its own, giving 50 classes. Five of these classifications are analyzed as follows:
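1) r = 2: \( \{x_{21}\} \), \( \{x_{1}\text{–}x_{20},\ x_{22}\text{–}x_{50}\} \).
2) r = 6: \( \{x_{21}\} \), \( \{x_{34}\} \), \( \{x_{29}, x_{49}\} \), \( \{x_{6}\} \), \( \{x_{4}\} \), \( \{x_{1}\text{–}x_{3},\ x_{5},\ x_{7}\text{–}x_{20},\ x_{22}\text{–}x_{28},\ x_{30}\text{–}x_{33},\ x_{35}\text{–}x_{48},\ x_{50}\} \).
3) r = 7: \( \{x_{21}\} \), \( \{x_{34}\} \), \( \{x_{29}, x_{49}\} \), \( \{x_{6}\} \), \( \{x_{4}\} \), \( \{x_{47}\} \), \( \{x_{1}\text{–}x_{3},\ x_{5},\ x_{7}\text{–}x_{20},\ x_{22}\text{–}x_{28},\ x_{30}\text{–}x_{33},\ x_{35}\text{–}x_{46},\ x_{48},\ x_{50}\} \).
4) r = 14: \( \{x_{21}\} \), \( \{x_{34}\} \), \( \{x_{29}, x_{49}\} \), \( \{x_{6}\} \), \( \{x_{4}\} \), \( \{x_{47}\} \), \( \{x_{50}\} \), \( \{x_{38}\} \), \( \{x_{23}, x_{43}\} \), \( \{x_{14}, x_{41}\} \), \( \{x_{8}\} \), \( \{x_{7}\} \), \( \{x_{3}\} \), \( \{x_{1},\ x_{2},\ x_{5},\ x_{9}\text{–}x_{13},\ x_{15}\text{–}x_{20},\ x_{22},\ x_{24}\text{–}x_{28},\ x_{30}\text{–}x_{33},\ x_{35}\text{–}x_{37},\ x_{39}\text{–}x_{40},\ x_{42},\ x_{44}\text{–}x_{46},\ x_{48}\} \).
5) r = 16: \( \{x_{21}\} \), \( \{x_{34}\} \), \( \{x_{29}, x_{49}\} \), \( \{x_{6}\} \), \( \{x_{4}\} \), \( \{x_{47}\} \), \( \{x_{50}\} \), \( \{x_{38}\} \), \( \{x_{23}, x_{43}\} \), \( \{x_{14}, x_{41}\} \), \( \{x_{8}\} \), \( \{x_{7}\} \), \( \{x_{3}\} \), \( \{x_{12}, x_{27}\} \), \( \{x_{9}\} \), \( \{x_{1},\ x_{2},\ x_{5},\ x_{10}\text{–}x_{11},\ x_{13},\ x_{15}\text{–}x_{20},\ x_{22},\ x_{24}\text{–}x_{26},\ x_{28},\ x_{30}\text{–}x_{33},\ x_{35}\text{–}x_{37},\ x_{39},\ x_{40},\ x_{42},\ x_{44}\text{–}x_{46},\ x_{48}\} \).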
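A listing of this kind can be checked mechanically. The short sketch below, using a hypothetical helper `expand`, writes the r = 6 classification above as index ranges and confirms that the six classes are disjoint and cover all 50 users.

```python
def expand(spec):
    """Expand a string such as "1-3, 5, 7-20" into a list of user indices."""
    out = []
    for part in spec.split(","):
        lo, _, hi = part.strip().partition("-")
        out.extend(range(int(lo), int(hi or lo) + 1))
    return out


# The r = 6 classification, in the order listed above.
classes_r6 = [
    expand("21"), expand("34"), expand("29, 49"), expand("6"), expand("4"),
    expand("1-3, 5, 7-20, 22-28, 30-33, 35-48, 50"),
]

sizes = [len(c) for c in classes_r6]
covered = sorted(i for c in classes_r6 for i in c)
print(sizes)                          # [1, 1, 2, 1, 1, 44]
print(covered == list(range(1, 51)))  # True
```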
Clustering effect test and result analysis
To determine the best of the above classifications, we use the F statistic. Let r be the classification number, \( {n}_j \) the number of samples in the jth class, and \( {x}_1^{(j)},{x}_2^{(j)},\cdots, {x}_{n_j}^{(j)} \) the samples of the jth class. The clustering center of the jth class is the vector \( {\overline{x}}^{(j)}=\left({\overline{x}}_1^{(j)},{\overline{x}}_2^{(j)},\cdots, {\overline{x}}_m^{(j)}\right) \), in which \( {\overline{x}}_k^{(j)} \) is the average of the kth feature, namely
$$ {\overline{x}}_k^{(j)}=\frac{1}{n_j}{\displaystyle \sum_{i=1}^{n_j}{x}_{ik}^{(j)}},\kern0.5em \left(k=1,2,\cdots, m\right) $$
(16)
$$ F=\frac{{\displaystyle {\sum}_{j=1}^r{n}_j\left\Vert {\overline{x}}^{(j)}-\overline{x}\right\Vert }/\left(r-1\right)}{{\displaystyle {\sum}_{j=1}^r{\displaystyle {\sum}_{i=1}^{n_j}\left\Vert {x_i}^{(j)}-{\overline{x}}^{(j)}\right\Vert }}/\left(n-r\right)} $$
(17)
where \( \left\Vert {\overline{x}}^{(j)}-\overline{x}\right\Vert =\sqrt{{\displaystyle {\sum}_{k=1}^m{\left({\overline{x}}_k^{(j)}-{\overline{x}}_k\right)}^2}} \) is the distance between \( {\overline{x}}^{(j)} \) and \( \overline{x} \), the mean vector of all n samples, and \( \left\Vert {x_i}^{(j)}-{\overline{x}}^{(j)}\right\Vert \) is the distance between the ith sample of the jth class, \( {x_i}^{(j)} \), and its center \( {\overline{x}}^{(j)} \). The F value follows the F distribution with r − 1 and n − r degrees of freedom. The numerator of F measures the distance between classes, and the denominator measures the distance between samples within a class. The larger the F value, the greater the distance between classes relative to the spread within them, that is, the greater the difference between classes and the better the classification.
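Equations (16) and (17) can be sketched in code as below. This is an illustrative implementation rather than the authors' own. The function names, the toy data, and the `squared` flag are our assumptions: the flag switches to a variant in which both norms are squared, while the default follows Eq. (17) as printed.

```python
import math


def mean_vector(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def f_statistic(data, classes, squared=False):
    """F value of a partition, following Eqs. (16) and (17).

    data    : list of n feature vectors
    classes : list of r index lists that partition range(n), 1 < r < n
    squared : if True, square each norm (assumed variant, not the
              printed form of Eq. 17)
    Raises ZeroDivisionError if every sample coincides with its center.
    """
    n, r = len(data), len(classes)
    if not 1 < r < n:
        raise ValueError("F is only defined for 1 < r < n")
    p = 2 if squared else 1
    overall = mean_vector(data)
    between = within = 0.0
    for idx in classes:
        members = [data[i] for i in idx]
        center = mean_vector(members)  # Eq. (16)
        between += len(members) * distance(center, overall) ** p
        within += sum(distance(x, center) ** p for x in members)
    return (between / (r - 1)) / (within / (n - r))  # Eq. (17)


# Toy data: four one-dimensional samples forming two obvious groups.
toy = [[0.0], [2.0], [10.0], [12.0]]
print(f_statistic(toy, [[0, 1], [2, 3]]))  # natural grouping
print(f_statistic(toy, [[0, 2], [1, 3]]))  # mixed grouping
```

On the toy data the natural grouping yields a much larger F value than the mixed one, which is the behavior the selection rule relies on.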
Based on the above principle, we obtain the following results:
1) If r = 2, then \( n_1 = 1 \), \( n_2 = 49 \), and we obtain \( F_2 = 58.457\,\% \).
2) If r = 6, then \( n_1 = 1 \), \( n_2 = 1 \), \( n_3 = 2 \), \( n_4 = 1 \), \( n_5 = 1 \), \( n_6 = 44 \), and \( F_6 = 76.111\,\% \) can be computed.
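3) If r = 14, then \( n_1 = 1 \), \( n_2 = 1 \), \( n_3 = 2 \), \( n_4 = 1 \), \( n_5 = 1 \), \( n_6 = 1 \), \( n_7 = 1 \), \( n_8 = 1 \), \( n_9 = 2 \), \( n_{10} = 2 \), \( n_{11} = 1 \), \( n_{12} = 1 \), \( n_{13} = 1 \), \( n_{14} = 34 \), and we derive \( F_{14} = 58.441\,\% \).
4) If r = 16, then \( n_1 = 1 \), \( n_2 = 1 \), \( n_3 = 2 \), \( n_4 = 1 \), \( n_5 = 1 \), \( n_6 = 1 \), \( n_7 = 1 \), \( n_8 = 1 \), \( n_9 = 2 \), \( n_{10} = 2 \), \( n_{11} = 1 \), \( n_{12} = 1 \), \( n_{13} = 1 \), \( n_{14} = 2 \), \( n_{15} = 1 \), \( n_{16} = 31 \), and \( F_{16} = 61.236\,\% \) can be obtained.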
From the above data, \( F_6 > F_{16} > F_2 > F_{14} \), so dividing the users into six categories is the best classification.
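The comparison can be reproduced directly from the reported F values; the snippet below only ranks the four numbers given above and introduces no new data.

```python
# F values reported above, keyed by classification number r.
f_values = {2: 58.457, 6: 76.111, 14: 58.441, 16: 61.236}

# Sort the classification numbers by decreasing F value.
ranking = sorted(f_values, key=f_values.get, reverse=True)
best_r = ranking[0]
print(ranking)  # [6, 16, 2, 14]
print(best_r)   # 6
```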
When the 50 users are divided into six classes, the sentiment tendency graph of each user is shown in Fig. 3.
Figure 3 shows that the emotional state of the users in the first class is poor overall: the emotional value is below zero in most cases, that is, negative emotions dominate. Relevant departments should therefore take corresponding measures to keep these users from taking radical actions and to maintain social harmony and stability. The mood of users in the second class is somewhat low early in the month and then slowly improves; despite some fluctuations, their emotional values are positive overall, so these users are in a normal state. The emotional trends of the two users in the third class are consistent: their sentiment values were close to −1 on 2014/12/04, when their negative emotion was strong, but their emotions were favorable most of the time. The emotional states of users in the fourth, fifth, and sixth classes can be analyzed in the same way. Through the emotional charts, we can not only follow the emotional changes of individual users but also find users with similar emotional states through the different classifications.