Artificial immune network clustering based on a cultural algorithm

Deng, Liyuan; Yang, Ping; Liu, Weidong

doi:10.1186/s13638-020-01779-1

Research
Open access
Published: 03 September 2020

Artificial immune network clustering based on a cultural algorithm

Liyuan Deng¹,
Ping Yang¹ &
Weidong Liu¹

EURASIP Journal on Wireless Communications and Networking volume 2020, Article number: 168 (2020) Cite this article

1651 Accesses
3 Citations
Metrics details

Abstract

Data mining technology has been applied in many fields. Prototype-based cluster analysis is an important data mining method, but its ability to discover knowledge is limited because of the need to know the number of target data categories and cluster prototypes in advance. Artificial immune evolutionary network clustering is a clustering method based on network structure. Compared with prototype-based cluster analysis, it has the advantage of realizing unsupervised learning and clustering without any prior knowledge of data. However, artificial immune evolutionary network clustering also has problems such as a lack of guidance in the clustering process, fuzzy boundary sensitivity, and difficulty in determining parameters. To solve these problems, an artificial immune network clustering algorithm based on a cultural algorithm is proposed. First, three kinds of knowledge are constructed: normative knowledge is used to regulate the spatial range of population initialization to avoid blindness; state knowledge is used to distinguish the type of antigen, and immune defense measures are taken to prevent the network structure caused by noise and boundaries from being unclear; and topology knowledge is used to guide the antigen for optimal antibody search. Second, topology knowledge in the cultural algorithm is used to characterize the distribution of antigens and antibodies in space, and elite learning is used to improve the traditional clone mutation operator. Based on the shadow set theory, a method for adaptively determining the compression threshold is proposed. Finally, the results of simulation experiments show that the proposed algorithm can effectively overcome the above problems, and the clustering performances on a synthetic dataset and an actual dataset are satisfactory.

1 Introduction

Data mining is the process of discovering specific information hidden in massive databases through various algorithms [1]. With the accumulation of massive data brought by the development of information technology, the use of data mining technology to transform these data into useful information has been used in various fields [2, 3]. In the field of cloud services, work [4] proposed a distributed cloud service method based on distributed sensitive hashing in multisource data. Work [5] proposed a big data-driven mashup building method that supports economic software developments. In the field of the Internet of Things, work [6] proposed a multidimensional data processing and query method, work [7] studied IoT offloading utilities that support edge computing. In the field of business services, Bismita S. Jena [8] and others analyzed the business information mining technology used on transaction datasets. Zhang et al. [9,10,11] researched related data mining techniques such as business service recommendation and service quality query; Chen et al. [12, 13] researched data for business intelligence and business service computing. Data mining technology is also widely used in e-commerce, media, energy services, automotive engineering, and other fields [14,15,16,17,18,19].

Clustering analysis is an important method of knowledge discovery in data mining [20,21,22]. If we know the number of target data categories and clustering prototypes in advance, we can use a prototype-based cluster analysis method, but this will inevitably limit the ability of clustering analysis to discover knowledge. Therefore, a clustering method without any prior knowledge of data is of course preferable for clustering analysis [23,24,25]. Based on the artificial immune evolutionary network (aiNet) clustering algorithm, this paper proposes a new algorithm of artificial immune network clustering based on a cultural algorithm. It guides the aiNet clustering process by constructing normative knowledge and topological knowledge in the trust space and introduces the principle of immune defense in the human immune system into the algorithm. In this paper, the input taboo threshold is avoided, and the shadow set theory is used to realize the adaptive determination of the compression threshold.

The algorithm is called the cultural evolutionary artificial immune network (CaiNet). The algorithm uses topological knowledge in the cultural algorithm to characterize antigens and antibodies in space [26, 27]. When an antigen searches for the antibody with the highest affinity, it searches through the antibodies in the topological unit where it is located. The algorithm uses immune defense to improve the flexibility of the application and improves the traditional clone mutation operator through elite learning.

Compared with the aiNet algorithm, the CaiNet algorithm has higher average accuracy and smaller variance. In the simulation experiment, the seed dataset is selected as the experimental object. The accuracy rate of the CaiNet algorithm is improved by 5.8%, and the variance is reduced by 3.71%. When the Wine dataset is selected as the experimental object, the accuracy rate of the CaiNet algorithm reaches 98.78%. The CaiNet algorithm has a certain improvement in balance, accuracy, recall rate, and hit rate and has better convergence.

The main innovations of this article are summarized as follows:

1.
We propose using the cultural algorithm to guide the aiNet clustering algorithm and use the topology knowledge in the cultural algorithm to characterize the distribution of antigens and antibodies in the space, which greatly reduces the complexity of the algorithm search.
2.
Based on the concept and theory of the shadow set, we propose a method for adaptive determination of the compression threshold based on the shadow set, which improves the ability of the algorithm to quickly solve the algorithm.
3.
Drawing on the immune defense suppression measures adopted in medicine to avoid excessive epidemic prevention, we propose a new algorithm immune defense mode, which improves the flexibility of algorithm application.

The organizational structure of the paper is as follows. First, we discussed related work in Section 2. Subsequently, we introduce the structure of the CaiNet algorithm in Section 3, define three kinds of knowledge in Section 3.1, design and improve the operation operator in Section 3.2, simplify the optimal antibody search for antigens through topology knowledge steps, formulate the mutation rules of the algorithm, propose a method of adaptive determination of the compression threshold based on the shadow set, and formulate the immune defense criteria of the CaiNet algorithm. The algorithm steps of the CaiNet algorithm are summarized in Section 4. Finally, we select three types of datasets in Section 5 for simulation experiments and evaluate the stability and convergence performance of the CaiNet algorithm.

2 Related work

Artificial immune evolutionary network clustering is a clustering method based on network structure [28]. Compared with the prototype clustering method, it can achieve real unsupervised learning and clustering. Based on the basic aiNet algorithm, Li Jie et al. introduced the concept of taboo cloning in immunology to the artificial immune network clustering algorithm, which solved the problem that aiNet cannot handle the fuzzy boundary of the sample subset [29]. Considering the problem of memory network dynamics and irregular changes caused by the lack of objective function guidance of the aiNet algorithm, Guo Jianhua et al. established the overall objectives and constraints of the memory network by defining quality evaluation standards, thus realizing the guidance of the algorithm, and discussed the value of the compression threshold [30]. To overcome the problem that the monoclonal algorithm easily falls into local optimum, Zhou Yang et al. proposed an evolutionary immune network clustering algorithm based on polyclonal algorithms [31]. Ma Li et al. applied a variety of artificial immune system operators to the clustering process. Based on the basic principles of biological immunity and cloning, they proposed an adaptive multiclone clustering algorithm that automatically adjusts clustering categories by setting affinity functions to increase the antibody population diversity of individuals to expand the search range of the solution and avoid precocity of the algorithm [32]. It has been found in experiments that for unbalanced datasets, clusters of small samples are easily undetectable when using a large taboo threshold. However, the improved aiNet algorithm does not have a unified understanding of the death threshold, compression threshold, and taboo threshold to be input. In many cases, it needs to be determined according to the characteristics of the data, which makes the algorithm more difficult to apply.

This paper defines three kinds of knowledge in the CaiNet algorithm: normative knowledge, topology knowledge, and state knowledge. Normative knowledge provides a code of conduct for evolution, topological knowledge easily guides the expansion of the network in different spaces, and state knowledge is used to control the strength of antigen activation networks in different states. In this paper, the topology unit is used to form the topology knowledge in the cultural algorithm, which simplifies the optimal antibody search step and formulates new mutation rules, which overcomes the limitations of the traditional algorithm. The determination of the compression threshold is the difficulty of most algorithms. Based on the concept and theory of the shadow set, this paper proposes an adaptive determination method for the compression threshold based on the shadow set, which is conducive to the rapid solution of the algorithm. To avoid the unclear structure of the immune network caused by the boundary data in the traditional algorithm, it may prevent the network from accurately expressing the distribution of antigens so that it does not activate the immune network. This article refers to the immune defense suppression measures taken to avoid excessive defense in medicine. For the noise, the boundary, and the antigen inside the cluster, three different methods are used to treat them differently.

To test the effect of the new algorithm, we select three UCI datasets as the experimental objects and compare the average accuracy and variance in the algorithm. The experimental results show that the stability and convergence of the new algorithm and the performance significantly improve.

3 Method

Cultural algorithms use trust space and population space for double-layer evolution. The population space forms different types of knowledge through trial and error in the processing of trust space and then guides the evolution of the population space. The designed algorithm structure is shown in Fig. 1.

3.1 Background knowledge

In this algorithm, three kinds of knowledge are defined. Normative knowledge defines the interval range of the antigen and each generation of antibodies and provides a behavioral rule for evolution. Topological knowledge expresses the distribution of antigens and antibodies in the search space and provides opinions and recommendations for immune recognition suggestions that are helpful to guide the expansion of the network in different spaces. State knowledge records the different states that the antigens may be in and is used to control the strength of the antigen activation network in different states.

Definition 1 Antibody-antigen affinity. Antibody-antigen affinity is the measurement of affinity between the antibody and antigen and is described in detail in formula (1).

$$ f\left(\mathbf{g},\mathbf{b}\right)=\frac{1}{1+\left\Vert \mathbf{g}-\mathbf{b}\right\Vert } $$

(1)

In the formula, ‖⋅‖ represents the Euclidean distance, G represents the antigen collection, and g_i represents a single data sample. B_k represents the immune network, that is, the antibody collection, and b_k,j represents the k antibody in the jth network.

Definition 2 Antibody-antibody affinity. Antibody-antibody affinity is expressed by the Euclidean distance d_i,j between the antibodies, which can form the affinity matrix $ {D}_k={\left({d}_{i,j}\right)}_{N_k\times {N}_k} $ of the network, and N_k represents the number of antibody neurons in the k-th network.

Definition 3 Clone operation. The clone operation selects a part of the antibody with a high affinity to copy. For antibody b_i, the clone operation can be expressed as:

$$ C\left({b}_i\right)=\left[{b}_{i,0},{b}_{i,1},L\kern0.5em ,{b}_{i,n-1}\right],\kern0.5em n= Int\left({N}_c\times \frac{A_i}{\sum \limits_{j=1}^N{A}_j}\right) $$

(2)

where N_c represents the total antibody size after cloning, A_i represents the affinity of the i antibody, and N represents the number of antibodies participating in the clone.

Definition 4 Normative knowledge. Normative knowledge records the spatial range of antibody production; one range is the value interval of each dimension of the antigen, and the other is the value interval of each dimension of the memory network antibody neuron, which is represented by N₀ and N_t, and its formal definition is:

$$ {N}_0=\left\{\;{l}_1,{u}_1;{l}_2,{u}_2;L;{l}_n,{u}_n\;\right\} $$

(3)

$$ {N}_t=\left\{\;{l}_1^t,{u}_1^t;{l}_2^t,{u}_2^t;L;{l}_m^t,{u}_m^t\;\right\} $$

(4)

where l_i represents the lower bound, u_i represents the upper bound, and i represents the ith dimension. N₀ is static knowledge and does not change throughout the clustering process; N_t is dynamic knowledge, which changes with each network change. The superscripts of $ {l}_i^t $ and$ {\mu}_i^t $ in N_t represent the number of iterations.

In the internal image of antigens, antibody neurons are generally distributed in the space determined by all antigens; therefore, the antibody population should be within the space determined by N₀ during initialization. As the network evolves, it should gradually converge because such a network is more refined and clustering is more obvious. To achieve this goal, N_t is used to guide the initialization of the antibody population. When the population is initialized, most of the antibodies (80%) are generated in the specified space. To avoid a suboptimal algorithm solution, some of the antibodies (20%) are also generated in the residual set of N_t relative to N₀, forming a disturbance and preventing the network from falling into the local optimal solution.

Definition 5 Topology knowledge. The topological unit refers to the hypergeometric region with an antibody as the center and l_j as the edge length of the jth dimension. The knowledge about antibody and antigen features contained in all topological units is called topology knowledge. The topological unit represented by antibody b_j can be expressed as:

$$ {T}_i=\left\{{b}_{i,1}-\frac{l_1}{2},{b}_{i,1}+\frac{l_1}{2};{b}_{i,2}-\frac{l_2}{2},{b}_{i,2}+\frac{l_2}{2};L;{b}_{i,m}-\frac{l_m}{2},{b}_{i,m}+\frac{l_m}{2}\right\} $$

(5)

In the formula, $ {b}_{i,j}-\frac{l_j}{2} $ and $ {b}_{i,j}+\frac{l_j}{2} $ represent the upper and lower bounds, respectively, of the topological unit represented by the antibody on the first dimension. By calculating the coordinates of the antibody and antigen in space, we can determine whether it belongs to a topological unit. If an antigen g_j belongs to a topological unit T_i, it is recorded as g_j ∈ T_i. There may be intersections between topological units. When an antigen belongs to more than one topological space, the distance between the antigen and the center of the topological unit is calculated, and the smallest distance is taken as the topological unit of the antigen. Due to the distribution of antibodies, some antigens may not be in any topological units. In this case, the distance between antigens and the center of all topological units is calculated, and the one with the smallest distance is taken as the topological unit.

When the topological unit is determined, the antigen can be mapped into the topological unit. Since antibody b_i is the center of topological unit T_i, and according to the principle of immune network clustering, the antibody is the inner image of the network. Therefore, we call antibody b_i the representative point of the antigen contained in topological unit T_i. In particular, when the topological unit does not contain any antigens, it is deleted.

Definition 6 State knowledge. Without losing generality, the data in the dataset are divided into noise, boundary, and cluster internal points, and state knowledge is used to record the different antigen states.

Topological elements can be regarded as grids with knowledge characteristics. According to the existing grid-based clustering methods, noise and boundary points (including fuzzy boundaries) are significantly different from the data within the cluster. It has been found that the noise and boundary points of the data include but are not limited to the following features: the area where the noise and boundary points are located is generally sparse; the difference between the boundary points and the class interior is that the latter often has close neighbors in multiple directions, that is, the uniformity is relatively good; the density of the area where the boundary points are located generally has a jump. The difference between different point sets mainly lies in the density and uniformity, the noise density is small, the density at the boundary is small and uneven, and the data density inside the cluster is large and evenly distributed. The density is expressed by the number of antigens in the grid, i.e.,

$$ {\rho}_j=\sum \limits_{b_i\in {T}_j}1 $$

(6)

The joint entropy method is used to measure the uniformity of the data distribution in the topological unit. For each antigen b_j,i in the topological unit, the number of antigens is calculated in its ε neighborhood, and it is recorded as ρ_{j, i}, $ \varepsilon =\raisebox{1ex}{${l}_j$}\!\left/ \!\raisebox{-1ex}{$4$}\right. $. l_j is the length of the side of T_j, and

$$ {p}_{j,i}=\frac{\rho_{j,i}}{\rho_j} $$

(7)

is recorded. The entropy of b_j,i can be expressed as:

$$ {H}_{j,i}=-{p}_{j,i}\log {p}_{j,i} $$

(8)

Furthermore, we can obtain the combined entropy of all antigens in T_j

$$ {H}_j=\sum \limits_{b_{j,i}\in {T}_j}{H}_{j,i}=-\sum \limits_{b_{j,i}\in {T}_j}{p}_{j,i}\log {p}_{j,i} $$

(9)

The data can be divided into 3 categories according to prior knowledge, so this is a two-dimensional clustering problem with a known number of categories, which can be solved well using methods such as fuzzy C-means. After the clustering is completed, the antigens in the corresponding topological units can be labeled as noise, boundary points, and cluster internal data.

When a data point is marked incorrectly, the algorithm may be guided in the wrong direction. The distribution of antibodies has randomness, and a clustering algorithm is not always effective. Therefore, misclassification always occurs. To avoid the impact of this situation, the idea of evidence accumulation is introduced. Evidence accumulation refers to adding 1 to the evidence value of an antigen if it is labeled in the same state in the adjacent time sequence, and 1 is subtracted from the evidence value if it is labeled in different states in the adjacent time sequence. Because of the randomness of the antibody, this can greatly reduce the impact of misclassification. According to the above methods, state knowledge can be expressed as:

$$ S=\left\{\kern0.5em {S}_1,{D}_1;{S}_2,{D}_2;L;{S}_i,{D}_i;L;{S}_n,{D}_n\kern0.5em \right\} $$

(10)

where S_i represents the state of the ith antigen, D_i represents the evidence of the state, and D_i is equal to 1 at the initialization phase.

3.2 AiNet clustering based on a cultural algorithm

3.2.1 Optimal antibody search

We use topological units (hypergeometry) to form topology knowledge in cultural algorithms. Topology knowledge includes two parts: antigen and antibody. Therefore, we hope to simplify the optimal antibody search by topology knowledge.

According to topological knowledge, antibodies can be regarded as representative points of antigens in antibody units, and the distance between antibodies with a high affinity and their representative points should be small. Therefore, we can first use a representative point antibody to find the k^' > k antibody with the smallest distance, then calculate the affinity between the k^' antibody and antigen, and take the k antibody with the highest affinity as the optimal k antibody. Its pseudocode is:

For other antigens belonging to T_j, only step 2 is needed to find the optimal K antibody, which can greatly reduce the complexity.

The value of k^’ should be greater than k because there is a certain distance deviation between b_j and g_i. In practice, the greater the difference between k^’ and k, the more accurate the results obtained, and the cost is the expansion of the search range. Considering the uniformity of antibody distribution in the network, k^’ is generally taken as $ \raisebox{1ex}{$3k$}\!\left/ \!\raisebox{-1ex}{$2$}\right. $.

3.2.2 Elite learning variation

In traditional aiNet clustering, antibody improvement is achieved by clone variation, expressed as

$$ {b}_j={b}_j-\alpha\;\left({b}_j-{g}_i\right) $$

(11)

where α represents variability, and the value decreases with increasing b_j and g_i affinity. Formula (11) improves the antibody by reducing the distance between antibody b_j and antigen g_i, but this method still has some limitations, such as b_j being only close to the antigen and not focusing on learning from other antibodies. To make the target antibody obtain the advantage information of outstanding antibodies at the same time, the following variation rules are formulated:

$$ {b}_j={b}_j-\alpha \left[\;{r}_1\left({b}_j-{g}_i\right)+{r}_2\left({b}_j-{b}_0\right)\;\right] $$

(12)

where b₀ represents the antibody with the highest affinity with g_i. In the current network, r₁ and r₂ are weighting factors, meeting the requirement of r₁₊ r₂ = 1; if b₀≡b_j, r₁ = 1. In fact, when r₁ = 1, it degenerates into the mutation strategy of a traditional algorithm.

3.2.3 Compression threshold determination

There is no unified understanding of how to set the compression threshold. The general guidance is to take a very small compression threshold first, for example, 10⁻³, and gradually increase it with the change in the network. There is little discussion on this in the existing literature. According to the concept and theory of shadow sets, we propose an adaptive method to determine the compression threshold.

Shadow sets is a theory proposed by Pedrycz to address fuzzy problems, in which set levels 1,0 and [0,1] are used to describe and simplify fuzzy relationships. The sample points corresponding to level 1 belong to a set completely, [0,1] indicates whether the sample point belongs to a set or not. The 0 corresponding sample point does not belong to a collection at all. The above three levels correspond to the complements of the lower approximation, upper approximation and lower approximation relative to the upper approximation.

The purpose of network compression is to improve the affinity between antibody and antibody, that is, to increase the distance between antibodies and prevent network redundancy caused by a small distance. The smaller the distance, the more likely it is to be compressed, and the greater the distance, the more likely it is to not be compressed. Without losing generality, we use the normalization of distance to express the possibility membership degree of whether the antibody should be compressed. The possibility membership of whether the antibody should be compressed is defined as the mapping of the distance between the antibody and the antigen to the [0,1] closed interval, expressed by the formula:

$$ {u}_{i,j}=\frac{d_{i,j}-{d}_{\mathrm{min}}}{d_{\mathrm{max}}-{d}_{\mathrm{min}}},\kern0.5em i,j=1,2,L\kern0.3em ,n $$

(13)

The objective function is defined as:

$$ \underset{\alpha }{\arg \min }F\left(\alpha \right)=\left|\;{\xi}_1+{\xi}_2-{\xi}_3\;\right|,\kern1em \alpha \in \left(0,0.5\right) $$

(14)

where a is in the range of (0,0.5], $ {\xi}_1=\sum \limits_{u_{i,j}\le \alpha }{u}_{i,j} $, $ {\xi}_2=\sum \limits_{u_{i,j}\ge 1-\alpha}\left(1-{u}_{i,j}\right) $, and ξ₃ = card (I) represents modulo set A, and I = { i | α < u_ij < 1 − α}. When the α value is determined, the part of the antibody satisfying μ_{i, j} ≤ α needs to be compressed.

Obviously, according to this threshold determination method, a certain number of antibodies are compressed each time, which is consistent with the actual situation of network compression in the algorithm. In addition, F(α) is a simple step-like unimodal function that can be quickly solved by methods such as dichotomy.

3.2.4 Immune defense

According to the traditional aiNet clustering method, regardless of the nature of the antigen, the antibody can generate an immune response and then activate the antibody network. This is the main reason for the unclear structure of the immune network due to “abnormal” data such as noise and fuzzy boundaries.

The immune defense mechanism means that the immune system can attack, destroy, and clear “alien components” such as bacteria, viruses, and foreign bodies, which is a very important protection mechanism for the human body. We simulate this process in the algorithm.

To defend against “alien elements,” we must first identify the “alien elements” according to the state knowledge constructed in the cultural algorithm. It is convenient to determine the “alien component,” that is, the parts marked as the noise and boundary in the state knowledge.

In the clustering problem, because the boundary data easily cause the immune network structure to be unclear, it does not activate the immune network, which creates the problem that it may make the network unable to accurately express the distribution of the antigens.

To avoid this problem, three different methods are adopted to treat the antigens in noise, boundary and cluster according to the immune defense inhibition measures taken to avoid overdefense in medicine, namely,

$$ \left\{\begin{array}{l}{g}_i\in {S}^0,\kern1em \mathrm{Clonal}\kern0.5em \mathrm{do}\mathrm{minant}\kern0.5em \mathrm{antibody}\kern0.5em \mathrm{selection}\kern0.5em \mathrm{and}\kern0.5em \mathrm{variation}\\ {}{g}_i\in {S}^1,\kern1em \mathrm{do}\mathrm{minat}\kern0.5em \mathrm{antibody}\kern0.5em \mathrm{selection}\kern0.5em \mathrm{and}\kern0.5em \mathrm{mutation}\\ {}{g}_i\in {S}^2,\kern1em \mathrm{do}\kern0.5em \mathrm{not}\kern0.5em \mathrm{operate}\end{array}\right. $$

(15)

where S⁰, S¹, and S² represent the interior, boundary, and noise antigen set of the cluster, respectively, and the noise and boundary points are defended differently by the immune defense mechanism guided by state knowledge. If noise is no longer involved in the immune process, it is eliminated directly. Boundary points do not participate in the process of cloning to avoid the generation of a large number of cloned antibodies at the boundary and prevent the blurring of network structure at the boundary. The reason why boundary points participate in the selection and variation is to avoid the excessive movement of antibodies to the clustering center, resulting in a lack of affinity between the boundary and antibody network, thus leading to the problem of boundary point misclassification.

3.3 Specific steps

For the final immune network, the minimum spanning tree is generated according to its connected graph. There is a larger weight between the representative antibodies of two different clusters. According to the set pruning threshold, the m connections with larger weights are removed so that m+1 clusters can be obtained. The steps of the CaiNet algorithm are shown as:

After the data points in these units are eliminated, the dataset is recorded as X = {x₁, x₂, ⋯, x_i, ⋯, x_n}, and the clustering is recorded as C₁,C₂,⋯, C_j,⋯,C_m. Next, determine the type of data based on the distance between the data point and the antibody, that is,

$$ {d}^2\left({x}_i,{b}_l\right)=\min \left\{{d}^2\left({x}_i,{b}_k\right),k=1,2,\mathrm{L},m\;\right\},{b}_l\in {C}_j\Rightarrow {x}_i\in {C}_j $$

(16)

4 Experiments

The running configurations include hardware settings (2.70 GHz CPU, 8.0 GB RAM) and software settings (Windows 10 and Python 3.6). Each test is executed 50 times to record their average performances.

4.1 Experimental results on a synthetic dataset

High-dimensional data are not easy to display intuitively, so we use a two-dimensional synthetic dataset to verify the proposed clustering algorithm. There are three clusters in the dataset, two of which have more samples, and the other contains fewer samples. There are fuzzy boundaries between the three clusters, and they contain many instances of sample noise.

As shown in Fig. 2, the minimum spanning tree obtained by CaiNet can be divided into three distinct categories. The nodes of the tree can better reflect the data distribution of the dataset, and the nodes are relatively uniform. According to the algorithm, the last operation before obtaining the final minimum spanning tree is network compression. Therefore, the uniform distribution of the nodes is related to the selection of the compression threshold, which also shows that selecting the threshold using the shadow sets method is effective and can avoid the blindness of choosing a fixed compression threshold. The taboo clone method is not used in the algorithm, but the algorithm is also effective for datasets with fuzzy boundaries, indicating that the immune defense principle can achieve the same effect as the taboo clone.

The algorithm clusters the noise, boundary, and normal data and explicitly eliminates the noise. From the results, we can see that most of the noise in the data can be identified by the algorithm. Since taboo cloning is not used, the new algorithm does not need to set the taboo threshold in advance, which is very convenient and effective in practice.

4.2 Experimental results on a real-world dataset

To test the clustering effect of the algorithm on actual high-dimensional data, we choose the iris, wine, and seeds UCI datasets as experimental objects.

The average correct rate represents the proportion of the data that the algorithm classifies in the cluster correctly. To test the stability of the algorithm, we test the specified algorithm 50 times on the datasets to obtain a variance in the accuracy after 50 times. Obviously, the smaller the variance is, the higher the stability of the algorithm. For these three datasets, the comparison between the CaiNet algorithm and the aiNet clustering algorithm is shown in Table 1. As seen from the table, in the comparative experiment results, the variance in the CaiNet algorithm is smaller than that of the aiNet clustering algorithm, and the average correct rate of the CaiNet algorithm is higher than the average correct rate of the aiNet clustering algorithm, which shows that the CaiNet algorithm is more stable. When the seeds dataset is selected as the experimental object, the variance in the CaiNet algorithm decreases the most, which is 3.71% less than the variance in the aiNet clustering algorithm. The average accuracy of the CaiNet algorithm is the highest, which is 5.8% higher than the average accuracy of the aiNet clustering algorithm. When the wine dataset is selected as the experimental object, the variance in the CaiNet algorithm is the smallest at only 0.66%; at the same time, the average correct rate of the CaiNet algorithm reaches 98.78%. It can be seen that the variance and average correct rate of the CaiNet algorithm are affected by the selected dataset, and the degree of improvement of its algorithm stability is also related to the selected dataset.

Table 1 Comparison of clustering performance between the two algorithms for three datasets.

Full size table

In addition, we test the algorithm convergence performances. In the running time of the simulation experiment, we choose to perform 100 simulation operations. The results are shown in Figs. 3, 4, 5, and 6. As seen in the figures, the CaiNet algorithm has the best balance and the highest accuracy. The CaiNet algorithm also has a higher recall and hit rate than the other methods. Therefore, the CaiNet algorithm has better convergence performance.

4.3 Discussion

We tested and evaluated our proposed CaiNet method with a baseline method named aiNet to prove the advantages of our method. However, several additional points should be noted and further analyzed in detail, which are specified below.

1.
For the three compared datasets in the experiments in Subsection 4.2, i.e., iris, wine, and seeds, their data volumes are all not large enough (the three sample sizes are 150, 178, and 210, respectively). Therefore, in future work, we need to investigate more appropriate and larger datasets to validate the feasibility of our model and method, especially in the big data environment.
2.
Although our CaiNet method performs better than the compared baseline aiNet method, the accuracy of the CaiNet method is still not very high (92.16%, 98.78%, 88.24%). Therefore, we need to seek more efficient improvements to refine our work in this paper.
3.
Clustering is often a time-consuming task that requires a high time complexity, which is often not very suitable for the big data environment. Therefore, lightweight clustering methods are often required. We will further optimize our method to accommodate big data volume.

5 Conclusion

In this paper, cultural knowledge is used to guide the clustering of aiNet, and topology knowledge of the cultural algorithm is used to represent the distribution of antigens and antibodies in the space. Antigens only need to search using the antibodies in the topological unit when finding the highest affinity antibody, which greatly reduces the complexity. Through immune defense, the flexibility of algorithm application is improved. According to the theory of shadow sets, an adaptive method to determine the compression threshold is proposed; the traditional clonal mutation operator is improved by elite learning, which speeds up the convergence of the network. From the simulation experiment, we can see that the accuracy and stability of the improved algorithm have been improved, which proves its effectiveness.

In the future, we will continue to refine our work by considering more complex scenarios, such as multidimensional clustering problems. In addition, how to adapt our method to big data application requirements is another open question that requires intensive study.

Availability of data and materials

The recruited experiment dataset is the clustering analysis of a synthetic dataset.

Abbreviations

aiNet:: Artificial immune evolutionary network
CaiNet:: Cultural evolutionary artificial immune network

References

C. Zhang, M. Yang, J. Lv, W. Yang, An improved hybrid collaborative filtering algorithm based on tags and time factor. Big Data Mining Analytics 1(2), 128–136 (2018)
Google Scholar
H. Liu, H. Kou, C. Yan, L. Qi, Link prediction in paper citation network to construct paper correlated graph. EURASIP J Wireless Commun Network Article number 233 (2019)
F. Marcantoni, M. Diamantaris, S. Ioannidis, J. Polakis. A Large-scale study on the risks of the Html5 WebAPI for mobile sensor-based attacks. In Proc. of World Wide Web Conference (WWW’19), ACM Press, New York, pp. 3063–3071 (2019)
X. Chi, C. Yan, H. Wang, W. Rafique, L. Qi, Amplified LSH-based recommender systems with privacy protection. concurrency and computation: practice and experience (2020). https://doi.org/10.1002/CPE.5681
N. Almarimi, A. Ouni, S. Bouktif, M.W. Mkaouer, R.G. Kula, M.A. Saied, Web service API recommendation for automated mashup creation using multi-objective evolutionary search. Appl. Soft Comput. 85, 105830 (2019)
Google Scholar
W. Zhong, X. Yin, X. Zhang, S. Li, W. Dou, R. Wang, L. Qi, Multi-dimensional quality-driven service recommendation with privacy-preservation in mobile edge environment. Comput. Commun. (2020). https://doi.org/10.1016/j.comcom.2020.04.018
X. Xu, R. Mo, F. Dai, W. Lin, S. Wan, W. Dou, Dynamic resource provisioning with fault tolerance for data-intensive meteorological workflows in cloud. IEEE Transactions on Industrial Informatics (2019). https://doi.org/10.1109/TII.2019.2959258
B.S. Jena, C. Khan, R. Sunderraman, High performance frequent subgraph mining on transaction datasets: A survey and performance comparison. Big Data Mining Analytics 2(3), 159–180 (2019)
Google Scholar
Y. Zhang, K. Wang, Q. He, Covering-based web service quality prediction via neighborhood-aware matrix factorization, IEEE transactions on services computing. (2019). https://doi.org/10.1109/TSC.2019.2891517
Y. Zhang, G. Cui, S. Deng, Efficient query of quality correlation for service composition. IEEE Trans. Serv. Comput.. https://doi.org/10.1109/TSC.2018.2830773,2018
Y. Zhang, C. Yin, Q. Wu, et al., Location-Aware Deep Collaborative Filtering for Service Recommendation, IEEE Transactions on Systems, Man, and Cybernetics: Systems (2019). https://doi.org/10.1109/TSMC.2019.2931723
Y. Chen, N. Zhang, Y. Zhang, X. Chen, W. Wu, X.S. Shen, Energy efficient dynamic offloading in mobile edge computing for internet of things. IEEE Transact Cloud Comput (2019). https://doi.org/10.1109/TCC.2019.2898657
J. Li, T. Cai, K. Deng, X. Wang, T. Sellis, F. Xia, Community-diversified influence maximization in social networks. Inf. Syst. 92, 1–12 (2020)
Google Scholar
T. Cai, J. Li, A.S. Mian, R. Li, T. Sellis, J.X. Yu, Target-aware holistic influence maximization in spatial social networks. IEEE Trans. Knowl. Data Eng. (2020). https://doi.org/10.1109/TKDE.2020.3003047
J. He, M. Han, S. Ji, T. Du, Z. Li, Spreading social influence with both positive and negative opinions in online networks. Big Data Mining Analytics 2(2), 100–117 (2019)
Google Scholar
G. Li, S. Peng, C. Wang, J. Niu, Y. Yuan, An energy-efficient data collection scheme using denoising autoencoder in wireless sensor networks. Tsinghua Sci. Technol. 24(1), 86–96 (2019)
Google Scholar
L. Liu, X. Chen, Z. Lu, L. Wang, X. Wen, Mobile-edge computing framework with data compression for wireless network in energy internet. Tsinghua Sci. Technol. 24(3), 271–280 (2019)
Google Scholar
X. Xu, Y. Chen, X. Zhang, Q. Liu, X. Liu, L. Qi, A Blockchain-Based Computation Offloading Method for Edge Computing in 5G Networks. Software: Practice and Experience (2019). https://doi.org/10.1002/spe.2749
Y. Huang, Y. Chai, Y. Liu, J. Shen, Architecture of next-generation e-commerce platform. Tsinghua Sci. Technol. 24(1), 18–29 (2019)
Google Scholar
K. Yang, K. Maginu, H. Nomura, Cultural algorithm and their application. Int. J. Comput. Math. 87(10), 2143–2157 (2010)
MathSciNet MATH Google Scholar
H. Liu, H. Kou, C. Yan and L. Qi. Keywords-driven and popularity-aware paper recommendation based on undirected paper citation graph. Complexity, Volume 2020, Article ID 2085638, 15 pages, 2020.
J. Qian, M. Ji, A quantum-inspired evolutionary algorithm based on culture and knowledge. System Eng Theory Practice 35(1), 228–238 (2015)
Google Scholar
L. Qi, Q. He, F. Chen, X. Zhang, W. Dou, Q. Ni, Data-driven web APIS recommendation for building web applications. IEEE Transact Big Data (2020). https://doi.org/10.1109/TBDATA.2020.2975587
M. Daneshyari, G.G. Yen, Culture-based multiobjective particle swarm optimization. IEEE Transact Syst Man Cybernet B Cybern 41(2), 553–567 (2011)
Google Scholar
B.Z. Qiu, Y. Yang, X.W. Du, BRINK: An algorithm of boundary points of clusters detection based on local qualitative factor. J Zhengzhou Univ Eng Sci 33(3), 117–121 (2012)
MathSciNet Google Scholar
X. Li, P. Geng, B. Qiu, Clustering boundary points detection technology for attribute data set. Control and Decision 30(1), 171–175 (2015)
Google Scholar
X. Xu, X. Zhang, X. Liu, J. Jiang, L. Qi, M.Z.A. Bhuiyan, Adaptive computation offloading with edge for 5G-envisioned internet of connected vehicles. IEEE Trans. Intell. Transp. Syst. (2020). https://doi.org/10.1109/TITS.2020.2982186
B.Z. Qiu, J.Y. Shen, Grid-based and extend-based clustering algorithm for multi-density. Control Decision 21(9), 1011–1014 (2006)
MATH Google Scholar
B.Z. Qiu, T. Yu, Boundary points detecting based gradient of grid. Microelectron Comput 25(3), 77–80 (2008)
Google Scholar
B.Z. Qiu, F. Yue, J.Y. Shen, BRIM: An Efficient Boundary Points Detecting Algorithm. Advances in Knowledge Discovery and Data Mining (Springer, Berlin, 2007), pp. 761–768
Google Scholar
B.Z. Qiu, S. Wang, in The 7th International Conference on Computational Intelligence and Security. A boundary detection algorithm of clusters based on dual threshold segmentation (IEEE, Sanya, 2011), pp. 1246–1250
Google Scholar
X. Li, P. Geng, B.Z. Qiu, Clustering boundary points detection technology for attribute data set. Control Decision 30(1), 171–175 (2015)
Google Scholar

Download references

Acknowledgements

Not applicable.

Funding

This work was supported by Xi’an Research Institute of High-Technology.

Author information

Authors and Affiliations

Xi’an Research Institute of High-Technology, Xi’an, China
Liyuan Deng, Ping Yang & Weidong Liu

Authors

Liyuan Deng
View author publications
You can also search for this author in PubMed Google Scholar
Ping Yang
View author publications
You can also search for this author in PubMed Google Scholar
Weidong Liu
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

LD finished the algorithm and English writing of the paper. PY and WL finished the experiments. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Liyuan Deng.

Ethics declarations

Competing interests

We declare that there is no conflict of interest regarding this submission.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Deng, L., Yang, P. & Liu, W. Artificial immune network clustering based on a cultural algorithm. J Wireless Com Network 2020, 168 (2020). https://doi.org/10.1186/s13638-020-01779-1

Download citation

Received: 01 May 2020
Accepted: 14 August 2020
Published: 03 September 2020
DOI: https://doi.org/10.1186/s13638-020-01779-1

Artificial immune network clustering based on a cultural algorithm

Abstract

1 Introduction

2 Related work

3 Method

3.1 Background knowledge

3.2 AiNet clustering based on a cultural algorithm

3.2.1 Optimal antibody search

3.2.2 Elite learning variation

3.2.3 Compression threshold determination

3.2.4 Immune defense

3.3 Specific steps

4 Experiments

4.1 Experimental results on a synthetic dataset

4.2 Experimental results on a real-world dataset

4.3 Discussion

5 Conclusion

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords