Design of a reservoir for cloud-enabled echo state network with high clustering coefficient

Reservoir computing (RC) is considered a suitable alternative to gradient-descent methods for training recurrent neural networks (RNNs). The echo state network (ESN) is a platform for RC and nonlinear system simulation in the cloud environment with many external users. In past research, the largest eigenvalue of the reservoir connection-weight matrix (the spectral radius) was used to predict reservoir dynamics. Some researchers have shown that the scale-free and small-world characteristics can improve the approximation capability of echo state networks; however, recent studies have highlighted the importance of infrastructures such as clusters and shown that the stability criteria of these reservoirs are altered. In this research, we propose a highly clustered ESN, called HCESN, whose internal neurons are interconnected in clusters. Each cluster contains one backbone node and a number of local nodes. We implemented a classical clustering algorithm, K-means, and three optimization algorithms, namely the genetic algorithm (GA), differential evolution (DE), and particle swarm optimization (PSO), to improve the clustering efficiency of the new reservoir, and compared them with each other. To investigate the spectral radius and predictive power of the resulting reservoirs, we applied them to the laser time series and the Mackey-Glass dynamical system. We demonstrate that the new clustered reservoirs share some characteristics of biological neural systems and complex networks, such as a short average path length, a high clustering coefficient, and a power-law degree distribution. The empirical results show that the PSO-based ESN can strikingly enhance the echo state property (ESP) and achieves a lower chaotic time series prediction error than other works and the original ESN. It can therefore approximate nonlinear dynamical systems and predict chaotic time series.


Introduction
Unlike feed-forward neural networks, recurrent neural networks are costly and challenging to train with traditional methods such as gradient descent because of their feedback loops. The echo state network (ESN) is considered a suitable alternative to gradient-descent algorithms [1] for cloud-based services [2]. In an ESN, both the input and internal connection-weight matrices are fixed, and only the network output matrix is trained, by linear regression; this efficient training scheme gives the ESN highly dynamic behavior with little learning complexity. However, the random structure of the reservoir may reduce its estimation accuracy. Consequently, many effective reservoir-construction schemes have been proposed. Some works construct reservoirs analytically, but such methods are difficult to apply to complex problems or discrete functions. To further improve reservoir performance, other works turn to evolutionary techniques [3][4][5]. Furthermore, the echo state property (ESP) [6] is one of the most remarkable capabilities of the ESN. As stated by Jaeger [7], the ESN has the echo state property only if, after running, the current reservoir state is uniquely determined by the long-time history of inputs. Accordingly, the largest eigenvalue of the reservoir connection matrix, the spectral radius, must be no larger than 1 for the ESP to be maintained [8]. It should be noted that the larger the spectral radius, the slower the network responds to input pulses and the greater the network's memory capacity. Hence, the ESN can achieve better approximation ability and more efficient computing power. In this sense, the spectral radius has an extraordinary effect on the approximation abilities of the ESN.
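In practice, the spectral radius condition described above is typically imposed by rescaling a randomly generated reservoir matrix. A minimal NumPy sketch (the function name and matrix sizes are illustrative, not from the paper):

```python
import numpy as np

def scale_reservoir(W, target_rho):
    """Rescale reservoir matrix W so its spectral radius equals target_rho."""
    rho = max(abs(np.linalg.eigvals(W)))  # current spectral radius
    return W * (target_rho / rho)

rng = np.random.default_rng(0)
W = rng.uniform(-1, 1, size=(100, 100))   # raw random reservoir
W_res = scale_reservoir(W, 0.9)           # spectral radius now below 1, preserving the ESP
```

Because eigenvalues scale linearly with the matrix, a single multiplication suffices to place the spectral radius wherever desired.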
To guarantee the ESP without rescaling the reservoir weights, a reservoir construction method based on eigenvalue decomposition has been proposed, and the convergence of the algorithm is also theoretically guaranteed; however, the eigenvalues are still generated randomly [9].
In recent years, the small-world phenomenon and the scale-free property have been discovered in many real-world complex systems, such as immune systems, biological networks, transport networks, the internet backbone, citation networks, and many others [10][11][12][13][14][15][16][17]. The small-world feature refers to a short characteristic path length combined with a high clustering coefficient. Notably, complex artificial neural networks such as associative memory systems [18] that exploit the small-world phenomenon and the scale-free property [19,20] achieve better efficiency in time and memory capacity than randomly connected networks with a similar number of connections. The chaotic time series prediction problem can also be solved more efficiently using the small-world trait and the scale-free property [21][22][23][24]. It is noteworthy that the forecast precision of a chaotic time series using an ESN improved by a factor of 2400 compared with former methods [25].
Deng and Zhang [26] suggested a complex ESN model with a gradually growing state reservoir, called SHESN, which included many internal nodes with sparse connections. Their experimental results showed that the echo state property could be improved by permitting a large range of acceptable spectral radii.
Jarvis et al. [27] investigated the impact of clusters, hierarchies, and interconnections between clusters on the admissible spectral radius and prediction ability. Their experimental results showed that hierarchy and relatively small cluster sizes increase the admissible range of the spectral radius of the reservoir, and that the size of the entire reservoir and the connections between clusters also affect this range. Classical clustering algorithms have likewise been used to cluster reservoir neurons in [28].
In this article, we propose four new ESNs oriented toward cloud computing and compare them with each other and with previous works; the best network is then selected as the desired reservoir. We show that all four proposed reservoirs have the scale-free property, the small-world trait, and a distributed structure. Neurons in the first network are clustered using the K-means technique, and the resulting reservoir is called HCESN-KM. The second network is clustered by the genetic algorithm (GA), and the resulting reservoir is named HCESN-GA. In the third network, clustering is done using differential evolution (DE), and the resulting reservoir is called HCESN-DE. In the fourth network, neurons are clustered by particle swarm optimization (PSO), and the reservoir is named HCESN-PSO. In each cluster, the significant nodes are treated as backbone neurons and the remaining nodes as local neurons. Hence, the connections between neurons have a small-world topology and follow a power-law distribution, so the resulting models can reflect the learning mechanism of most biological systems. We also applied the new reservoirs to challenging problems such as the Mackey-Glass (MG) and laser time series prediction tasks and compared them with each other and with previous works. The empirical results show that the proposed methods, particularly HCESN-PSO, outperform previous ones in their capability to approximate nonlinear dynamic systems and in the prediction accuracy of chaotic time series.
The rest of the paper is organized as follows. Section 2 briefly describes the classical clustering algorithm and the three evolutionary clustering algorithms. Section 3 explains the proposed state reservoirs based on the small-world and scale-free topologies, and Section 4 analyzes the complexities of the HCESNs. The dynamic approximation ability and the enhanced echo state property are investigated and compared with previous works in Section 5. Finally, the last section is devoted to conclusions.

Classical clustering
Clustering is an unsupervised learning problem because it groups unlabeled data into classes or clusters such that data within the same cluster are as similar as possible and data in different clusters are as different as possible. Hence, a measurement criterion for these similarities must be defined, and a benchmark established for allocating data to particular cluster centers. One such criterion is the Euclidean distance dist(x, y) between two data points x and y: the smaller the distance between x and y, the higher the similarity between them, and vice versa [29]. This method attempts to assign the data in the dataset D to k clusters C_1, …, C_k such that C_i ⊂ D and C_i ∩ C_j = ∅ for 1 ≤ i < j ≤ k. The clustering quality is measured by the sum of squared errors

E = Σ_{i=1}^{k} Σ_{x ∈ C_i} dist(x, c_i)^2,

i.e., the mean square error between all data in C_i and its centroid c_i, summed over the clusters.
Although the K-means clustering algorithm is simple and well known, it may get stuck in local optima, depending on the selection of the initial cluster centers. To overcome this problem, many evolutionary computation techniques have been suggested. In the next sections, the proposed clustering algorithms are described; we employ them for clustering the internal neurons of the proposed HCESNs in Section 3 [29].
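For reference, the K-means procedure described above can be sketched as follows (a minimal illustrative implementation, not the paper's code; the optional `init` argument makes the sensitivity to initial centers explicit):

```python
import numpy as np

def kmeans(X, k, init=None, iters=100, seed=0):
    """Plain K-means. The result depends on the initial centers, which is
    exactly the local-optimum sensitivity discussed in the text."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)] if init is None else init.copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each point to the nearest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move every non-empty cluster's center to the mean of its points
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centers, labels = kmeans(X, 2, init=np.array([[1.0, 1.0], [4.0, 4.0]]))
```

On this well-separated toy dataset the algorithm converges in a few iterations; with a poor `init` it could converge to a worse partition, which motivates the evolutionary variants below.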

Genetic algorithm-based clustering technique
The genetic algorithm (GA) is a population-based, randomized search and optimization algorithm inspired by biological evolution. The algorithm searches multiple paths simultaneously and hence reduces the probability of getting stuck in local optima.
A cluster similarity metric, used as a fitness function, guides the search for optimal cluster centers in the feature space. To apply the genetic algorithm to optimization problems, the unknown variables are encoded as string structures; each string is called a chromosome. Here, each chromosome encodes a fixed number of cluster centers [29]. A set of chromosomes is called a population. Initially, a population representing different candidate solutions is created randomly. Each chromosome is evaluated by an objective or fitness function that is optimized based on the principles of genetics and natural evolution. Inspired by biological operators such as crossover and mutation, a new generation of chromosomes is produced. This process continues until a stopping condition is met [30].
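Under the encoding described above, a chromosome is simply a flat vector of k cluster-center coordinates and its fitness is the clustering error. A minimal sketch (names are illustrative; elitist selection stands in for the full crossover/mutation loop):

```python
import numpy as np

def sse_fitness(chromosome, X, k):
    """Decode a flat chromosome into k cluster centers and return the
    clustering sum of squared errors (lower is better) on data X."""
    centers = chromosome.reshape(k, X.shape[1])
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return (d.min(axis=1) ** 2).sum()

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (50, 2))
k = 3
# a population of random chromosomes, each encoding k centers in 2-D
pop = rng.uniform(0, 1, (10, k * 2))
fitness = np.array([sse_fitness(c, X, k) for c in pop])
best = pop[fitness.argmin()]   # elitism: carry the best chromosome forward
```

In a full GA, `best` would survive into the next generation while crossover and mutation recombine the rest of the population.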

Differential evolution (DE)
This algorithm is a general-purpose optimization method that can find near-optimal solutions to real-world and mathematical problems. Individuals in differential evolution are d-dimensional real vectors v_i, i = 1, …, np, where d is the number of optimization variables and np is the population size. The method was proposed to overcome the local search problem of the genetic algorithm. The main differences between GA and DE are the order of the crossover and mutation operators and how the selection operator works. According to [31], the classic evolutionary process of DE is as follows; for more details, see [32,33].

Generation of initial population
Using a uniform distribution, the value of each individual is selected within the allowed range; in MATLAB, the unifrnd(0,1) function generates a random number in [0,1] from the uniform distribution.

Mutation
The mutation process delivers a donor vector y_i as its outcome by applying a strategy such as y_i = v_{i1} + F·(v_{i2} − v_{i3}) to members of the current generation, where i1 ≠ i2 ≠ i3 ≠ i are random integer indices, and F > 0 is the mutation factor that controls the amplification of the difference d_i = v_{i2} − v_{i3}.

Crossover
In the crossover, to obtain the trial vector u_i(j), each element is taken either from the donor vector y_i(j) or from the target vector v_i(j) according to

u_i(j) = y_i(j) if u_j(0, 1) ≤ CR, otherwise v_i(j),

where u_j(0, 1) denotes a uniform random number in (0, 1) and CR is the crossover rate.

Selection
If the trial vector u_i achieves a fitness at least as good as that of the target vector v_i, it replaces v_i in the next generation:

v′_i = u_i if f(u_i) ≤ f(v_i), otherwise v′_i = v_i,

where v′_i denotes the offspring of v_i for the next generation [31].
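The three DE operators described above combine into one generation step. The following sketch implements the classic DE/rand/1/bin scheme (the sphere test function and parameter values are illustrative choices, not from the paper):

```python
import numpy as np

def de_step(pop, fitness_fn, F=0.8, CR=0.9, rng=None):
    """One DE/rand/1/bin generation: mutation, binomial crossover, greedy selection."""
    if rng is None:
        rng = np.random.default_rng(0)
    np_, d = pop.shape
    new_pop = pop.copy()
    for i in range(np_):
        # mutation: donor from three distinct random individuals
        i1, i2, i3 = rng.choice([j for j in range(np_) if j != i], 3, replace=False)
        donor = pop[i1] + F * (pop[i2] - pop[i3])
        # binomial crossover; force at least one gene from the donor
        mask = rng.uniform(0, 1, d) <= CR
        mask[rng.integers(d)] = True
        trial = np.where(mask, donor, pop[i])
        # greedy selection: keep the trial only if it is at least as good
        if fitness_fn(trial) <= fitness_fn(pop[i]):
            new_pop[i] = trial
    return new_pop

sphere = lambda v: (v ** 2).sum()
pop0 = np.random.default_rng(1).uniform(-5, 5, (20, 4))
pop = pop0.copy()
rng = np.random.default_rng(2)
for _ in range(50):
    pop = de_step(pop, sphere, rng=rng)
```

Because selection is greedy, the best fitness in the population is non-increasing across generations.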

Particle swarm optimization (PSO)
This algorithm is an evolutionary calculation method and a population-based stochastic search process introduced by Kennedy and Eberhart (1995) [34]. PSO models the social behavior of a flock of birds or a school of fish and is a subset of swarm intelligence. Each particle i has a position q_i and a velocity v_i in the Q-dimensional search space G, and it adjusts its position toward its own best position so far (pbest) and the best position found by the whole population (gbest). Initially, the positions and velocities of all individuals are determined randomly. In each step, first the particle velocity and then its position are updated:

v_i(t + 1) = φ·v_i(t) + ω_1·r_1·(pbest_i − q_i(t)) + ω_2·r_2·(gbest − q_i(t)),
q_i(t + 1) = q_i(t) + v_i(t + 1),

where φ is the inertia weight, ω_1 and ω_2 are the acceleration constants, and r_1 and r_2 are drawn from the uniform distribution on [0, 1]. Hence, each particle has a memory keeping its best position.
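The update rules above can be sketched as a minimal PSO loop (the search range, test function, and parameter values are illustrative assumptions):

```python
import numpy as np

def pso_minimize(f, dim, n_particles=20, iters=100, phi=0.7, w1=1.5, w2=1.5, seed=0):
    """Minimal PSO: velocity/position updates with pbest and gbest memories.
    phi is the inertia weight; w1, w2 are the acceleration constants."""
    rng = np.random.default_rng(seed)
    q = rng.uniform(-5, 5, (n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))             # velocities
    pbest = q.copy()
    pbest_val = np.array([f(x) for x in q])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1 = rng.uniform(0, 1, (n_particles, dim))
        r2 = rng.uniform(0, 1, (n_particles, dim))
        v = phi * v + w1 * r1 * (pbest - q) + w2 * r2 * (gbest - q)
        q = q + v
        vals = np.array([f(x) for x in q])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = q[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

best = pso_minimize(lambda x: (x ** 2).sum(), dim=3)
```

With φ = 0.7 and ω_1 + ω_2 = 3.0 the swarm is in the usual stability region and contracts toward the minimum of the sphere function.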
In this section, we consider the three optimization algorithms, GA, DE, and PSO, as well as the K-means clustering algorithm. We apply these algorithms to several standard datasets from the University of California Irvine (UCI) repository, namely Iris, Wine, CMC, Bupa, Vowel, Cancer, and Thyroid, and analyze their results. Here, the Euclidean distance is used in the sum-of-squared-error cost function. Each of the algorithms mentioned above was run 30 times on each UCI dataset in Matlab 2016. The results are shown in Table 1.
The comparison shows that the PSO and K-means algorithms have the lowest and the highest cost functions in six datasets, respectively. On the other hand, the DE algorithm has the lowest cost function on the Wine dataset, and the GA has a lower cost function than the DE algorithm on the Bupa and Vowel datasets. The results in Table 1 show that PSO-based clustering has great potential. In addition, Fig. 1 shows the overall cost function for the K-means, GA, DE, and PSO algorithms with the validity indices CS [35] and DB [36].

HCESN's reservoir architecture
The reservoir architecture is illustrated in Fig. 2, which includes input units (I), internal units (N), and output units (Q).
The network activations of the input units at time step t form the vector u(t) = [u_1(t), u_2(t), …, u_I(t)]^T. The reservoir nodes are sparsely connected, and the output of each internal node is known as a state, denoted x_i(t). The output layer collects all the inputs u_i(t) from layer one and all the states x_i(t) of the state network through a Q × (N + I) output connection matrix W_out, and generates the output vector y(t) = [y_1(t), y_2(t), …, y_Q(t)]^T of the HCESNs. At the same time, the output vector y(t) is fed back to all internal nodes through an N × Q connection-weight matrix W_back. The tanh() activation function is used in the last two layers. The input and feedback weight matrices W_in and W_back are drawn from a uniform random distribution, while the output connection-weight matrix W_out is tuned by supervised learning. The reservoir connection matrix W_res, however, is generated according to our proposed evolutionary rules instead of the merely random methods used in [7,25]. The readout y(t) of the HCESNs is implemented as

y(t) = tanh(W_out [u(t); x(t)] + v(t)),

where v(t) is a threshold noise. Production of the new state reservoirs proceeds as follows:

1. Create an H × H grid space and initialize it.
2. Generate backbone neurons in the grid space from the uniform distribution. The backbone neurons' coordinates on the grid are created randomly, the resulting domains are kept separate, and the distance between backbone neurons must not fall below a specified threshold. The number of backbone neurons is approximately 1% of the number of local neurons [26].
3. Produce synaptic connections for the newly added local neurons using the clustering methods. Here, a classical clustering algorithm, K-means, and three metaheuristic optimization algorithms, the genetic algorithm (GA), differential evolution (DE), and particle swarm optimization (PSO), are used to cluster the internal neurons. The mean of the local neurons in each cluster is called the backbone unit.
Adding new local neurons to the reservoir space turns the backbone neurons N_backbone into a fully connected grid. In this process, we randomly select the coordinates (X, Y) of one of the backbone neurons and place the coordinates x and y of the local neurons associated with that backbone in the grid space so as to produce an unbalanced power-law distribution of outdegree, using the algorithms mentioned above. This spatial distribution of neurons is very similar to that of the human brain [37,38].

4. Apply the preferential attachment rules [11]. These rules cause newly added neurons to attach to the neurons that have the most synaptic connectivity. If the coordinates of the new local neuron are considered the center of a circle whose circumference passes through the locations of the backbone neurons, then the radius of that circle is the Euclidean distance; as a result, the backbone neurons are the farthest candidate neighbors of the new local neurons. Our preferential attachment rules consist of the following steps. Assume that N_link is the number of links of a new local neuron, and that n_candidate and n_current are the numbers of neurons in the candidate region and the current region of the new local neuron, respectively (n_candidate ≤ n_current).

1. If N_link ≥ n_current, then the new local neuron is linked to all neurons in the current region.
2. If n_candidate ≤ N_link < n_current, then the new local neuron is linked to each neuron i of the candidate neighborhood with probability P_i = O_i / Σ_{j∈C_neighbor} O_j [39], where O_i and O_j are the outdegrees of neurons i and j, and C_neighbor is the candidate neighborhood of the new neuron.
3. If N_link < n_candidate, the process of connecting the new local neuron to candidate neurons is the same as before.
Generally speaking, the candidate neurons help improve the network clustering coefficient, while the preferential attachment rules contribute to both the small-world and scale-free features.
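The outdegree-proportional attachment probability above can be sketched as follows (a toy six-neuron example with hypothetical array names):

```python
import numpy as np

def preferential_pick(outdegree, candidates, n_links, rng):
    """Pick n_links distinct neighbors from `candidates`, each chosen with
    probability proportional to its outdegree (P_i = O_i / sum_j O_j)."""
    cand = np.array(candidates)
    weights = outdegree[cand].astype(float)
    return rng.choice(cand, size=n_links, replace=False, p=weights / weights.sum())

rng = np.random.default_rng(0)
outdegree = np.array([1, 1, 1, 50, 1, 1])   # neuron 3 is a highly connected hub
picks = [preferential_pick(outdegree, range(6), 1, rng)[0] for _ in range(200)]
# the hub is selected far more often than any low-degree neuron
```

Repeated over many new neurons, this "rich get richer" rule is what drives the outdegree distribution toward a power law.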

Computational complexity
The HCESN topology with a reservoir grid of H × H = 300 × 300 is generated with the following parameters: the number of interior nodes N_interior = 1000, the number of links per local node N_link = 5, and the number of backbone nodes N_backbone = 10. The spectral radius of this new reservoir is 0.979, and its connection sparsity is 2.105%.

Distributed and hierarchical architecture
As illustrated in Fig. 3, the new reservoir contains 1000 internal neurons in a 300 × 300 grid and 10 clear clusters or domains. Each colored dot denotes an interior neuron created with the natural growth rules and clustering methods. In this paper, the DB and CS clustering indices are used to assess the clustering quality [40]. In each cluster, the backbone neuron is surrounded by a set of local neurons. The clusters act as macro-neurons at the top of the reservoir network hierarchy, and the number of interconnections within clusters is usually much greater than the number between clusters; therefore, the dynamic behavior of each cluster is relatively independent. The input connection matrix W_in between input and interior neurons is determined using a uniform random distribution, and the input connections that are not linked to interior neurons of the network are set to zero. Hence, the input connection-weight distribution in the reservoir is a spatial distribution, and our network architecture has a distributed and hierarchical spatial structure.

Small-world property
In graph theory, the clustering coefficient C and the characteristic path length L are used to describe the small-world phenomenon [10,41]. The naturally growing reservoir network consists of 10^3 internal neurons; accordingly, it has a 10^3 × 10^3 reservoir connection matrix W_res with a connection sparsity of 2.105%. It is therefore a large, complex network. In general, the characteristic path length reflects the size of a graph or complex system; it is the average distance over all pairs of interior nodes.
The average characteristic path length is defined as

L = (1 / (N(N − 1))) Σ_{i ≠ j} d_{ij},

where d_{ij} is the length of the shortest path between nodes i and j. The clustering coefficient is the mean fraction of pairs of neighbors of an interior node that are themselves adjacent, computed per node as

C_i = 2E_i / (K_i(K_i − 1)).

Here, E_i is the number of real edges among the neighbors of node i, K_i is the number of neighbors connected to neuron i, and K_i(K_i − 1)/2 is the maximum possible number of connections between those neighbors. Hence, C_i is the ratio of real neighbor connectivity to the maximum possible connectivity [42]. Deng and Zhang [26] presented a reservoir named SHESN; for this network, they computed the average path length L = 3.7692 and the clustering coefficient C = 0.2303. They also generated a random network of similar size, for which they computed the characteristic path length L_R = 3.2668 and the clustering coefficient C_R = 0.0112. Our experimental results show that the characteristic path lengths L of the four new reservoirs are approximately as short as L_R, while the clustering coefficients C are much larger than C_R. Hence, our new HCESNs are complex networks exhibiting the small-world phenomenon.
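Both small-world measures can be computed directly from an adjacency matrix. A small self-contained sketch (the four-node example graph is illustrative; nodes with fewer than two neighbors are skipped in the average, one common convention):

```python
from itertools import combinations
from collections import deque

def clustering_coefficient(adj):
    """Average of C_i = 2*E_i / (K_i*(K_i-1)) over nodes with >= 2 neighbors,
    for an undirected adjacency matrix adj."""
    n = len(adj)
    cs = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        k = len(nbrs)
        if k < 2:
            continue
        e = sum(adj[a][b] for a, b in combinations(nbrs, 2))  # real neighbor edges
        cs.append(2 * e / (k * (k - 1)))
    return sum(cs) / len(cs)

def avg_path_length(adj):
    """Average shortest-path length over all connected node pairs (BFS from each node)."""
    n = len(adj)
    total, pairs = 0, 0
    for s in range(n):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for t, d in dist.items():
            if t != s:
                total, pairs = total + d, pairs + 1
    return total / pairs

# a triangle (nodes 0,1,2) plus a pendant node 3 attached to node 2
adj = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
```

For this toy graph the clustering coefficient is (1 + 1 + 1/3)/3 = 7/9 and the average path length is 4/3.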
In addition, Table 2 lists the characteristic path length and clustering coefficient for each of the ten clusters in the four networks. According to the table, each cluster is itself a small-world sub-network.

Scale-free property
Recently, it has been observed that specific features of internet topologies can be described using a power law of the form y ∝ x^γ [43], where the exponent γ characterizes some properties of the global network topology. Reservoirs whose connectivity follows a power-law distribution are called scale-free [26]. Deng and Zhang [26] took the gradient of the resulting straight-line plot of (x, y) on a log-log scale as the power-law exponent γ. They also used Pearson's correlation coefficient to verify that the power law held: the closer the correlation coefficient is to 1 in magnitude, the more closely the data follow the power-law distribution [39].
We investigate two types of power-law distribution: the number of nodes versus outdegree, and the outdegree of nodes versus rank, where the rank of a node is its position when the nodes are sorted by outdegree. The correlation coefficients can be calculated with the corrcoef function in MATLAB. For the outdegree-versus-rank relation, the correlation coefficients for the HCESN-KM, HCESN-GA, HCESN-DE, and HCESN-PSO reservoirs, each with a p value of 0, were 0.987, 0.982, 0.986, and 0.984, respectively. The correlation coefficients of the relationship between outdegree and the number of nodes for HCESN-KM, HCESN-GA, HCESN-DE, and HCESN-PSO were 0.968, 0.971, 0.978, and 0.989, respectively. We also calculated the correlation coefficient for each of the 10 clusters; as shown in Table 2, the power-law property holds for each cluster as a low-level sub-network. Hence, the proposed HCESNs have some biological features, such as a scale-free distribution [38].
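The power-law check described above, fitting a line on a log-log scale and reading off the slope γ and the Pearson correlation, can be sketched as follows (an exact synthetic power law is used for illustration):

```python
import numpy as np

def powerlaw_fit(x, y):
    """Fit y ~ x^gamma on a log-log scale; returns (gamma, pearson_r).
    A pearson_r close to +/-1 indicates the power law holds."""
    lx, ly = np.log(x), np.log(y)
    gamma, _ = np.polyfit(lx, ly, 1)     # slope of the log-log line
    r = np.corrcoef(lx, ly)[0, 1]        # Pearson correlation (MATLAB: corrcoef)
    return gamma, r

x = np.arange(1, 50, dtype=float)
y = x ** -2.5                            # exact power law with gamma = -2.5
gamma, r = powerlaw_fit(x, y)
```

For an exact power law the log-log points are perfectly collinear, so the fit recovers γ and |r| = 1.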

Supervised learning
As mentioned in Section 3, to maintain the echo state property, the connection matrix W_res of the new reservoirs must be chosen carefully, while the input and feedback connections (W_in and W_fb) can be determined arbitrarily from the uniform distribution. The output connection matrix W_out must be adjusted by supervised learning, so that the RNN with the echo state property can approximate the following sample dataset of length n_r.
{u(n_1), y_d(n_1)}, {u(n_2), y_d(n_2)}, …, {u(n_r), y_d(n_r)}. After discarding the initial transient n_0, we must find the W_out that minimizes the mean square error (MSE).
MSE = (1 / (n_r − n_0)) Σ_{t = n_0 + 1}^{n_r} (d(t) − W_out x(t))^2,

where u(t) and y_d(t) are the input and desired output vectors at time t, respectively. Note that d(t) = tanh^{-1} y_d(t), x(t) = [x_1(t), x_2(t), …, x_N(t)]^T, and N denotes the number of echo state parameters. A pseudo-inverse matrix is used to solve this linear regression model; hence, W_out is obtained as

W_out = D M^G,

where D = [d(n_0 + 1), …, d(n_r)] is the row vector of targets, the superscript T denotes the transpose, and M^G is the generalized inverse of the (N + 1) × (n_r − n_0)-dimensional matrix M whose columns are the collected states. The generalized inverse M^G has been computed with the pinv function in MATLAB [26].
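The pseudo-inverse readout training can be sketched as follows; here the state-collecting matrix M is filled with synthetic data for illustration, and NumPy's `pinv` plays the role of MATLAB's `pinv`:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 30, 200                      # state dimension, usable time steps (illustrative)
M = rng.normal(size=(N, T))         # state-collecting matrix: one state per column
w_true = rng.normal(size=N)         # an unknown "teacher" readout
d = w_true @ M                      # targets d(t), already inverse-activated

# W_out = D * M^G, with M^G the Moore-Penrose pseudo-inverse (MATLAB: pinv)
W_out = d @ np.linalg.pinv(M)
```

Because the targets lie exactly in the row space of M here, the least-squares solution recovers the teacher readout exactly; with noisy targets it returns the MSE-optimal W_out.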

Mackey-Glass dynamic system
The Mackey-Glass (MG) dynamic system with a long time delay δ is an appropriate test bed for the prediction of nonlinear chaotic systems [26,[44][45][46]], and it is used here as a challenging problem to verify the efficiency of the new HCESNs. The MG delay differential equation, in its standard form, is

ds(t)/dt = a·s(t − δ) / (1 + s(t − δ)^10) − b·s(t),

where s and δ denote the state and the time delay, respectively, and a and b are constants (commonly a = 0.2 and b = 0.1). A chaotic time series occurs in the Mackey-Glass system when δ ≥ 17. The source of the turbulence is that the slightest change in the initial conditions has a significant impact on the system output. To solve differential equations with a constant delay, the dde23 function in MATLAB is used to produce the training and test data. In particular, instead of the default value of the dde23 function, we set the absolute tolerance to 1e-16. For comparison, we perform our experiments on the datasets used by Jaeger and Haas [25] and by Deng and Zhang [26] (Table 3).
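As a rough stand-in for the dde23 integration, the delayed state can be kept in a history buffer and the equation stepped with a fixed-step Euler scheme (the step size, constants, and initial history below are illustrative choices, not the paper's settings):

```python
import numpy as np

def mackey_glass(n_steps, delta=17, a=0.2, b=0.1, dt=1.0, s0=1.2):
    """Integrate ds/dt = a*s(t-delta)/(1 + s(t-delta)**10) - b*s(t) with a
    simple Euler scheme and a constant initial history of s0."""
    lag = int(delta / dt)
    s = np.full(n_steps + lag, s0)          # buffer holds the delayed history
    for t in range(lag, n_steps + lag - 1):
        s_del = s[t - lag]                  # delayed state s(t - delta)
        s[t + 1] = s[t] + dt * (a * s_del / (1 + s_del ** 10) - b * s[t])
    return s[lag:]

series = mackey_glass(1000)
```

With δ = 17 the generated series is bounded but irregular, which is what makes it a standard chaotic prediction benchmark.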

Laser time series
The laser time series is broadly employed for testing a variety of prediction methods on real-world chaotic time series [25,44,[47][48][49]]. We used datasets 18 and 19, as did Deng and Zhang [26].

Formulations of the problem
At this stage, the accuracy of the HCESNs is evaluated over 100 independent runs, either at a specified observation point or over all specified points. In particular, for the MG system, 84 instances following the training dataset of length n_r were selected: {u(n_r + 1), y_d(n_r + 1)}, {u(n_r + 2), y_d(n_r + 2)}, …, {u(n_r + 84), y_d(n_r + 84)}.
Here, y_i(t) denotes the network output in the ith test run. For the laser time series problem, we considered the observation point t = n_r + 84 and performed 100 separate experiments to compute the normalized root mean square error (NRMSE). We then used all 200 data points in the experimental dataset, namely n_t + 1 to n_t + 200, to calculate the NRMSE over 100 independent runs.
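The NRMSE used above normalizes the root mean square error by the standard deviation of the target signal, so that a score of 1 corresponds to always predicting the mean. A minimal sketch:

```python
import numpy as np

def nrmse(y_pred, y_true):
    """Normalized root mean square error: RMSE divided by the standard
    deviation of the target signal (NRMSE = 0 is perfect, 1 matches the
    trivial mean predictor)."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return np.sqrt(np.mean((y_pred - y_true) ** 2)) / np.std(y_true)

y_true = np.array([0.0, 1.0, 2.0, 3.0])
perfect = nrmse(y_true, y_true)                       # exact prediction
baseline = nrmse(np.full(4, y_true.mean()), y_true)   # mean predictor
```

Averaging this score over the 100 independent runs gives the NRMSE84-style figures reported in the tables.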

Enhanced echo state property
According to the definition given by Jaeger in [25], if the maximum eigenvalue of the reservoir connection-weight matrix, |λ_max|(W_res), or spectral radius, is no more than one, then the ESN has echo states. This is fully consistent with the experimental studies of Deng and Zhang [26]: when the network has a spectral radius of more than one, the ESN is not able to work correctly. They nevertheless succeeded in enhancing the echo state property by increasing the admissible spectral radius up to a boundary of 6.0 in SHESN.
Here, we performed four experiments using datasets 1, 2, 18, and 19 on the Mackey-Glass system and the laser time series prediction task to check the echo state properties of the HCESNs. We calculated the NRMSE84 test errors while increasing the spectral radius in steps of 0.1. Our empirical results are illustrated in Figs. 4, 5, 6, and 7. The following parameters were used for testing the HCESNs on datasets 1 and 2.
Reservoir capacity H × H = 200 × 200, number of internal nodes n_in = 500, number of new local node connections n_local = 5, and number of backbone nodes n_backbone = 5; the input connection weights W_in and feedback connections W_fb were drawn from the uniform distribution on [−1, 1], and the output connection weights W_out were obtained by supervised learning. The empirical results on the Mackey-Glass system for the HCESNs are illustrated in Figs. 4 and 5. The parameters used for datasets 18 and 19 are the same as those used for datasets 1 and 2, except that the number of new local neuron connections is n_local = 1; W_in and W_fb were again drawn from the uniform distribution on [−1, 1]. Our empirical results on the laser time series for the HCESNs are illustrated in Figs. 6 and 7.
The results demonstrate that the HCESN based on PSO clustering admits a wider range of spectral radii than the other three networks discussed in this paper. Even at a spectral radius greater than 1, this reservoir can significantly enhance the echo state property.

HCESN's capability for nonlinear approximation

Mackey-Glass system
As mentioned above, for the MG system, increasing the time delay δ makes the system extremely nonlinear; in particular, a chaotic time series occurs when δ ≥ 17. Approximating the Mackey-Glass system with an echo state network, or any other model, therefore becomes nearly impossible as δ grows, and it is undoubtedly a significant challenge. Deng and Zhang [26] also examined this problem. In Table 5, the standard deviations for our HCESNs and other ESNs are listed. As δ increases to 26 and beyond, the difficulty of the nonlinear approximation dramatically increases, meaning that the nonlinear dynamics of the system become much more critical and the problem very complicated to deal with. For instance, when δ = 29, the SHESN and the new HCESNs still perform suitably; however, HCESN-PSO achieves higher accuracy than SHESN (Fig. 8).

Laser time series prediction
In Figs. 6 and 7, the prediction capability of the HCESNs was evaluated on datasets 18 and 19. The prediction accuracy of the HCESNs was then compared with that of SHESN at a spectral radius of 4.0, as reported by Deng and Zhang [26]. The results were also compared with the prediction accuracy of the entirely random ESN reservoir with a spectral radius of 0.9 in [26]. Table 4 lists the NRMSE test errors over 100 independent runs. The standard deviations for the ESN, SHESN, and HCESNs are shown in Table 5 for delays of 17 to 31.
As shown for the ESN in [7,25], if the spectral radius |λ_max|(W_res) is greater than 1, the echo state property (ESP) does not hold. However, Deng and colleagues [26] showed that in the naturally growing SHESN network, permitting a wide range of acceptable spectral radii enhances the ESP.
According to [26], the largest eigenvalue (i = 1) is about 1.5 for both the ESN and the SHESN. The magnitude curve of the ESN reservoir's eigenvalues decays very slowly, with each of the 100 eigenvalues close to 1.5 in magnitude, whereas the SHESN reservoir's eigenvalues decrease sharply, with only 1.2% of all eigenvalues larger than 1. Moreover, the distribution of the eigenvalues follows the power law, with a correlation coefficient of 0.989 at p = 0. We observed this striking phenomenon in the HCESNs as well: we calculated the average of all eigenvalues over 100 independent runs, and the experimental results confirm that the magnitudes of the maximum eigenvalues are less than 1 and follow the power law. The correlation coefficients for HCESN-KM, HCESN-GA, HCESN-DE, and HCESN-PSO were 0.9671, 0.986, 0.988, and 0.992, respectively, as shown in Fig. 9.

Conclusions
In this study, an enhanced ESN was proposed that combines a classical clustering algorithm and three metaheuristic optimization algorithms (GA, DE, and PSO). The results showed that PSO-based clustering is faster and has a lower cost function than the other proposed clustering methods; HCESN-PSO is therefore recommended for reservoir design. Moreover, the results in Section 4.3 show that HCESN-PSO has the shortest characteristic path length and the highest clustering coefficient, exhibiting both the scale-free property and the small-world phenomenon. We investigated two kinds of power-law distribution: the number of nodes versus outdegree, and the outdegree of nodes versus rank. We presented several natural incremental growth rules that yield (a) a short average path length, (b) a high clustering coefficient, (c) the scale-free property, and (d) distributed and hierarchical architectures. We examined all the behaviors of the HCESNs and applied them to the Mackey-Glass system and the laser time series prediction task. The empirical results confirm that, compared with the utterly random ESN of Jaeger [7] and the SHESN proposed by Deng and Zhang, our new HCESN networks, especially HCESN-PSO, which include thousands of neurons or even more, can significantly enhance the echo state property (ESP) and approximate highly complex nonlinear dynamic systems. Such an efficient system is likely to exhibit some biological neural properties, such as the small-world phenomenon and the scale-free distribution. For applications in other areas of research and applied development, we suggest applying the proposed method in environmental and energy studies that use soft computing techniques [50][51][52]. This study also tried to make the HCESN architecture more robust against noise on specific (hub) neurons by searching for the best centers of the backbone neurons.
Undoubtedly, optimization methods with mathematical proofs, together with accurate statistical analysis of the improved echo state property of ESNs, are among the most interesting and important issues to be investigated in the future.