The definitions of DSACD studied in this paper are given as follows.

### Matrix preprocessing

Let *G*=(*V*,*E*) be the graph, where *V*={*v*_{1},*v*_{2},...,*v*_{n}} is the set of nodes (vertices) and *E* is the set of edges. Let *N*(*u*) be the set of neighbor nodes of node *u*. Let matrix *A*=[*a*_{ij}]_{n×n} be the adjacency matrix of graph *G*, whose elements indicate whether an edge exists between two nodes in *G*: *a*_{ij}=1 indicates that the edge *e*_{ij} exists, and *a*_{ij}=0 indicates that it does not.
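As a concrete illustration, the adjacency matrix of a small undirected graph can be built from an edge list; the graph below is a made-up example, not the paper's Fig. 1:

```python
import numpy as np

# Build the adjacency matrix A of a small undirected graph G = (V, E).
# The edge list is an illustrative placeholder.
n = 4
edges = [(0, 1), (1, 2), (2, 3)]

A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1   # a_ij = 1: edge e_ij exists
    A[j, i] = 1   # the graph is undirected, so A is symmetric

print(A)
```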

For a small graph, a clustering algorithm such as K-means can be applied directly to the adjacency matrix *A* to compute the community structure, and the result is reasonably accurate (see Section 3). However, the adjacency matrix records only the relationships between directly connected nodes; it does not express the relationship between a node and its more distant neighbors. Two nodes in the same community may belong together even if they are not directly connected. Therefore, if the adjacency matrix is used directly as the similarity matrix for community partitioning, the complete community structure cannot be reflected, and clustering it directly loses information.

In this paper, the adjacency matrix is transformed into a similarity matrix that also expresses the relationships between non-adjacent nodes. Based on this, the definitions are given as follows.

### Definition 1

Let a network graph be *G*=(*V*,*E*). For ∀*v*∈*V*, if the length of the shortest path from node *v*_{i} to another node *v*_{j} is *s*, then node *v*_{i} can reach node *v*_{j} through *s* hops. That is, the hop count is the smallest number of edges traversed from node *v*_{i} to *v*_{j}.

As shown in the network of Fig. 1, node *v*_{1} reaches *v*_{2}, *v*_{3}, or *v*_{6} after one hop and arrives at *v*_{4} or *v*_{5} after two hops. For instance, from *v*_{1}→*v*_{2}, the smallest number of traversed edges is 1, so the hop count is 1; from *v*_{1}→*v*_{5}, it is 2, so the hop count is 2.
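Hop counts as in Definition 1 can be computed with a breadth-first search. The adjacency list below is an assumed stand-in consistent with the hops described for Fig. 1 (one hop from *v*_{1} to *v*_{2}, *v*_{3}, *v*_{6}; two hops to *v*_{4}, *v*_{5}); the actual edges of Fig. 1 may differ:

```python
from collections import deque

# Assumed edge structure consistent with the text's description of Fig. 1.
adj = {
    1: [2, 3, 6],
    2: [1, 5],
    3: [1, 4],
    4: [3],
    5: [2],
    6: [1],
}

def hops_from(src, adj):
    """Return a dict mapping each reachable node to its hop count from src,
    i.e., the number of edges on a shortest path (Definition 1)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

print(hops_from(1, adj))  # v2, v3, v6 -> 1 hop; v4, v5 -> 2 hops
```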

### Definition 2

In *G*=(*V*,*E*), the **similarity between two points** *v*_{i} and *v*_{j} is defined by formula 1:

$$ Sim(i,j)=e^{\tau (1-s)},s\geq 1,\tau \in (0,1) $$

(1)

where *s* is the hop count from *v*_{i} to *v*_{j}, and *τ* is the attenuation factor. The node similarity decreases as the hop count *s* increases; *τ* controls the attenuation rate, and the similarity decays faster as *τ* increases.
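A small numeric check of formula 1 (the function name `sim` is ours) shows the behavior just described:

```python
import math

# Formula (1): Sim(i, j) = exp(tau * (1 - s)), where s >= 1 is the hop
# count from v_i to v_j and tau in (0, 1) is the attenuation factor.
def sim(s, tau):
    return math.exp(tau * (1.0 - s))

# Adjacent nodes (s = 1) have similarity 1; similarity decays
# exponentially with the hop count, and faster for larger tau.
print(sim(1, 0.5))  # 1.0
print(sim(2, 0.5))  # exp(-0.5) ~ 0.607
print(sim(2, 0.9))  # exp(-0.9) ~ 0.407: larger tau -> faster decay
```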

### Definition 3

In *G*=(*V*,*E*), its **similarity matrix** *S*=[*s*_{ij}]_{n×n} is calculated from the node similarity between every two points in *G*, where *s*_{ij}=*Sim*(*v*_{i},*v*_{j}), *v*_{i},*v*_{j}∈*V*.

The similarity matrix, obtained by processing the adjacency matrix with the hop count and the attenuation factor, better reflects the relationships between distant nodes in the high-dimensional matrix, and the community detection results improve accordingly. Clearly, the choice of the hop count threshold and the attenuation factor has an important impact on the similarity matrix. The hop count is obtained through the parameter learning process, which is explained in Section 3.6. Section 3 presents experiments on these two parameters to explore their impact on the results.
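Putting Definitions 1-3 together, the similarity matrix can be computed from the adjacency matrix by a per-node BFS followed by formula 1. Treating the hop count threshold as a cutoff beyond which similarity is 0 is our reading of the text, so this is a sketch under that assumption:

```python
import numpy as np
from collections import deque

def similarity_matrix(A, tau=0.5, max_hops=3):
    """Build S (Definition 3) from adjacency matrix A: BFS gives the hop
    count s between each pair, formula (1) converts it to a similarity,
    and pairs beyond the assumed hop threshold are left at 0."""
    n = A.shape[0]
    S = np.zeros((n, n))
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in np.nonzero(A[u])[0]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, s in dist.items():
            if s == 0:
                S[src, v] = 1.0               # a node is fully similar to itself
            elif s <= max_hops:
                S[src, v] = np.exp(tau * (1 - s))
    return S

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
print(similarity_matrix(A, tau=0.5))
```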

### Deep sparse autoencoder

Based on a sparse autoencoder, the structure of deep sparse autoencoder is shown in Fig. 2.

As shown in Fig. 2, the output of the previous layer, i.e., the code *h* after dimension reduction, serves as the input of the next layer; the dimensionality is thus reduced layer by layer.

An autoencoder (AE) [25] is an unsupervised artificial neural network that learns an efficient encoding of the data to express its features. The typical use of an AE is dimensionality reduction.

### Definition 4

As shown in Figs. 3 and 4, given an unlabeled data set \(\{x^{(i)}\}_{i=1}^{m}\), the **automatic encoder** [25] learns a nonlinear code through a two-layer neural network (the input layer is not counted) to express the original data. The training process uses the back-propagation algorithm, and training ends when the difference between the data reconstructed from the learned nonlinear code and the original data is minimized.

The automatic encoder is composed of two parts: an encoder and a decoder. The encoding process maps the input layer to the hidden layer: the input data are reduced in dimensionality to form a code, which is the output of the encoder. The code is then used as the input of the decoder, and the decoded result, which has the same dimension as the input data, is the output of the decoder. The output is compared with the input, the reconstruction error is calculated, and the back-propagation algorithm is used to adjust the weight matrices of the automatic encoder. The reconstruction error is then recalculated, and the process iterates until the iteration limit is reached or the reconstruction error falls below the specified threshold, at which point the output is equal or close to the input. Training a neural network with the back-propagation algorithm in this way is also referred to as minimizing the reconstruction error. Finally, the output of the encoder, i.e., the code, is taken as the output of the automatic encoder.

Specific steps are as follows:

Let *X* be the similarity matrix of the network graph *G* with dimension *n*, used as the input matrix, where *x*_{i}∈(0,1) and *X*∈*R*^{n×n}. *x*_{i}∈*R*^{n×1} represents the *i*th column vector of *X*, *W*_{1}∈*R*^{d×n} is the weight matrix of the input layer [26], and *W*_{2}∈*R*^{n×d} is the weight matrix of the hidden layer [27]. *b*∈*R*^{d×1} is the bias column vector of the hidden layer, and *c*∈*R*^{n×1} is the bias column vector of the input layer [27].

The output *h* of the coding layer is obtained by formula 2:

$$ h_{i}=\tau (W_{1}x_{i}+b) $$

(2)

where *h*_{i}∈*R*^{d×1} is the *i*th column vector of the code. *τ* is the activation function; the sigmoid function [28] is chosen as the activation function *τ*, shown in formula 3.

$$ f(z)=\frac{1}{1+e^{-z}} $$

(3)

In formula 3, *z*=*W*^{T}*X*.

The matrix *h* obtained at this time is a matrix after dimensionality reduction.

The output *z* of the decoding layer is obtained by formula 4:

$$ z_{i}=\tau (W_{2}h_{i}+c) $$

(4)

where *z*_{i}∈*R*^{n×1} is the decoded *i*th column vector, and *τ* is the activation function. The resulting matrix *z* has the same dimension as the input matrix *X*.

Combining formula 2 with formula 4, the reconstruction error is obtained:

$$ \text{error}=\sum_{i=1}^{n}\left \| \tau(W_{2}\tau(W_{1}x_{i}+b)+c)-x_{i} \right \|_{2}^{2} $$

(5)
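Formulas 2, 4, and 5 can be checked end to end for a single column vector. The weights below are random placeholders; in training they would be fitted by back-propagation:

```python
import numpy as np

# Encode (formula 2), decode (formula 4), and the squared reconstruction
# error (one term of formula 5), with sigmoid as the activation.
rng = np.random.default_rng(0)
n, d = 6, 3                               # input dimension n, code dimension d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(scale=0.1, size=(d, n))   # encoder weight matrix
W2 = rng.normal(scale=0.1, size=(n, d))   # decoder weight matrix
b = np.zeros((d, 1))                      # hidden-layer bias
c = np.zeros((n, 1))                      # output-layer bias

x = rng.random((n, 1))                    # one column vector x_i
h = sigmoid(W1 @ x + b)                   # formula (2): code h_i
z = sigmoid(W2 @ h + c)                   # formula (4): reconstruction z_i
error = np.sum((z - x) ** 2)              # one term of formula (5)
print(error)
```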

When the activation function is sigmoid, the output range of the neurons is (0,1). When the output is close to 1, the neuron is called active; when the output is close to 0, it is called inactive [29]. In a sparse autoencoder, a sparsity restriction is added to the hidden layer, meaning that neurons are suppressed most of the time, that is, their output is close to 0. Sparse representations have been successfully applied in many areas, such as target recognition [30, 31], speech recognition [32], and behavior recognition [33]. The sparsity is calculated as follows:

First, the average value \(\widehat {\rho }_{j}\) of the output of the coding layer is calculated, where *h*_{j}(*x*) denotes the output value of the *j*th neuron (*h*_{j}) of the hidden layer when the input is *x* [34]. The average output value of this neuron over the training set is:

$$ \widehat{\rho }_{j}=\frac{1}{m}\sum_{i=1}^{m}[h_{j}(x_{i})] $$

(6)

To achieve sparsity, a sparsity constraint must be added, which is expressed as:

$$ \widehat{\rho }_{j}=\rho $$

(7)

where *ρ* is the sparsity parameter, generally, *ρ*≪1, such as 0.05. When formula 7 is satisfied, the activation value of the hidden layer neurons is mostly close to 0.
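Formula 6 is simply a row-wise mean over the training columns; the matrix below is illustrative, with one nearly inactive and one active neuron:

```python
import numpy as np

# Formula (6): the average activation of each hidden neuron over the m
# training columns. H is a d x m matrix whose columns are codes h(x_i);
# the values are illustrative placeholders.
H = np.array([[0.02, 0.04, 0.06],    # neuron 0: nearly inactive
              [0.90, 0.80, 0.70]])   # neuron 1: active
rho_hat = H.mean(axis=1)             # one average rho_hat_j per neuron
print(rho_hat)
```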

A sparsity limit is added to the reconstruction error, that is, a penalty term is added to the reconstruction error, and values of \(\widehat {\rho }_{j}\) deviating from *ρ* are penalized. The penalty function is as follows:

$$ \sum_{j=1}^{d} \rho \log \frac{\rho}{\widehat{\rho}_{j}} + (1-\rho) \log \frac{1-\rho}{1-\widehat{\rho}_{j}} $$

(8)

where *d* represents the number of hidden layer neurons. This formula is based on Kullback-Leibler divergence (KL [35]), so it can also be written as formula 9:

$$ \sum_{j=1}^{d} KL(\rho \parallel \widehat{\rho}_{j}) $$

(9)

In summary, combining formula 8 and formula 9, each KL term is given by:

$$ KL(\rho \parallel \widehat{\rho}_{j})= \rho \log \frac{\rho}{\widehat{\rho}_{j}} + (1-\rho)\log\frac{1-\rho}{1-\widehat{\rho}_{j}} $$

(10)

When \(\widehat {\rho }_{j}=\rho \), the penalty function \(KL(\rho \parallel \widehat {\rho }_{j})\) is 0. As \(\widehat {\rho }_{j}\) moves away from *ρ*, the function increases monotonically and tends to infinity, as shown in Fig. 5.
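The behavior of the penalty in formula 10 (zero at the target, growing as the average activation drifts away) can be verified numerically; the function name `kl_penalty` is ours:

```python
import numpy as np

# Formula (10): KL-divergence penalty between the target sparsity rho and
# the observed average activation rho_hat, for rho, rho_hat in (0, 1).
def kl_penalty(rho, rho_hat):
    return (rho * np.log(rho / rho_hat)
            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

rho = 0.05
print(kl_penalty(rho, 0.05))   # 0.0 at the target
print(kl_penalty(rho, 0.2))    # positive: neuron more active than desired
print(kl_penalty(rho, 0.9))    # much larger: far from the target
```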

Therefore, minimizing the sparsity penalty term, i.e., formula 10, drives \(\widehat {\rho }_{j}\) close to *ρ*. The reconstruction error is then updated to formula 11:

$$ \text{error}=\sum_{i=1}^{n}\left \| \tau(W_{2}\tau(W_{1}x_{i}+b)+c)-x_{i} \right \|_{2}^{2} + \beta \sum_{j=1}^{d} KL(\rho \parallel \widehat{\rho}_{j}) $$

(11)

where *β* is the weight of the sparse penalty factor.

Training the sparse autoencoder minimizes the reconstruction error, i.e., formula 11, by the back-propagation algorithm.
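The full objective of formula 11 combines the pieces above. The sketch below evaluates it once for random placeholder parameters; an actual trainer would minimize this value (the paper's algorithm uses L-BFGS):

```python
import numpy as np

# Formula (11): reconstruction error (formula 5) plus the KL sparsity
# penalty weighted by beta. Weights and data are random placeholders.
rng = np.random.default_rng(1)
n, d, m = 8, 4, 10
X = rng.random((n, m))                        # m input column vectors
W1 = rng.normal(scale=0.1, size=(d, n))
W2 = rng.normal(scale=0.1, size=(n, d))
b, c = np.zeros((d, 1)), np.zeros((n, 1))
rho, beta = 0.05, 3.0                         # sparsity target and weight

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
H = sigmoid(W1 @ X + b)                       # codes, formula (2)
Z = sigmoid(W2 @ H + c)                       # reconstructions, formula (4)
recon = np.sum((Z - X) ** 2)                  # formula (5)
rho_hat = H.mean(axis=1)                      # formula (6)
kl = np.sum(rho * np.log(rho / rho_hat)
            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
loss = recon + beta * kl                      # formula (11)
print(loss)
```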

### The deep sparse autoencoder for community detection

Based on the deep sparse autoencoder shown in Fig. 2, the data are preprocessed first, and the similarity matrix *S*_{0}∈*R*^{(n×n)} is obtained by formula 1. The similarity matrix is used as the input of the deep sparse autoencoder. The number of layers *T* and the number of nodes per layer {*d*_{0},*d*_{1},*d*_{2},⋯,*d*_{T}∣*d*_{0}=*n*,*d*_{0}>*d*_{1}>*d*_{2}>⋯>*d*_{T}} are then set. The similarity matrix *S*_{0} is input as the first layer's data into the sparse autoencoder whose hidden layer has *d*_{1} nodes. After the first layer of training, the reduced matrix *S*_{1}∈*R*^{(n×d_1)} is obtained; *S*_{1} is then input into the second layer of the deep sparse autoencoder and reduced to obtain *S*_{2}∈*R*^{(n×d_2)}, and so on, until the last layer. The low-dimensional feature matrix *S*_{T}∈*R*^{(n×d_T)} is obtained, and finally, the communities are obtained by K-means clustering. See Algorithms 1 and 2 for the detailed process.
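The layer-by-layer reduction can be sketched as a simple loop. Training each layer is elided here: each encoder below is an untrained random sigmoid projection standing in for a layer fitted by minimizing formula 11, so this only illustrates how the shapes shrink from *d*_{0}=*n* down to *d*_{T}:

```python
import numpy as np

# Stacked dimension reduction of the DSACD pipeline: S_0 (n x n) passes
# through T layers with strictly decreasing widths d_0 > d_1 > ... > d_T.
rng = np.random.default_rng(2)
n = 32
dims = [n, 16, 8, 4]             # d_0 = n, shrinking layer widths (T = 3)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
S = rng.random((n, n))           # placeholder for the real similarity matrix

for d_in, d_out in zip(dims, dims[1:]):
    W = rng.normal(scale=0.1, size=(d_in, d_out))  # untrained stand-in encoder
    S = sigmoid(S @ W)           # output of layer t becomes input of layer t+1

print(S.shape)                   # (n, d_T); K-means would cluster these rows
```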

Algorithm 1 uses the hop count threshold *S*, the attenuation factor *σ*, and formula 1 to compute the similarity matrix *sim* of *A*∈*R*^{(n×n)}: the similarity of each node *x* with the other nodes in *V* is computed to obtain the similarity matrix.

Algorithm 2 uses the deep sparse autoencoder with L-BFGS, with *T* layers, to reduce the dimension of the similarity matrix; the features are extracted, and the low-dimensional feature matrix *S*_{T}∈*R*^{(n×d_T)} is obtained.

The K-means algorithm is then applied to *S*_{T} to obtain the clustering result *Coms*={*C*_{1},*C*_{2},⋯,*C*_{k}}, i.e., the communities, which is returned.

In the proposed algorithm, the inputs include the adjacency matrix *A*∈*R*^{(n×n)} of *G*=(*V*,*E*); *k*, the number of communities; *S*, the hop count threshold; *σ*, the attenuation factor; and *T*, the number of layers of the deep sparse autoencoder, with the nodes in every layer *d*_{t}={*t*=0,1,2,⋯,*T*∣*d*_{0}>*d*_{1}>⋯>*d*_{T}}.