Threshold-driven K-means sector clustering algorithm for wireless sensor networks
EURASIP Journal on Wireless Communications and Networking volume 2024, Article number: 68 (2024)
Abstract
Clustering is an effective method for developing energy-efficient routing protocols for wireless sensor networks (WSNs). In clustered WSNs, cluster heads must handle high traffic and thus consume more energy. Therefore, forming balanced clusters and selecting optimal cluster heads are significant challenges. This paper proposes a sector clustering algorithm based on K-means, called KMSC. KMSC improves efficiency and balances cluster sizes by combining symmetric sector division with K-means. To select cluster heads (CHs), KMSC uses the residual energy and distance to calculate the weight of each node, then selects the node with the highest weight as the CH. Hybrid single-hop and multi-hop communication is used to reduce long-distance transmissions. Furthermore, the impact of the number of sectors, the clustering threshold, and the network size on the performance of KMSC is explored. The simulation results show that KMSC outperforms EECPKmeans, K-means, TSC, LSC, and SEECP in terms of first node dies (FND), half nodes die (HND), and last node dies (LND).
1 Introduction
In IoT systems, inexpensive wireless sensor networks (WSNs) are an effective way to sense targets and collect data. The sensor nodes are connected through 2.4 GHz antennas and have limited energy. Energy efficiency has consistently been a focal point in WSNs because nodes, typically placed in unattended scenarios such as buildings, islands, and battlefields, cannot be replaced or recharged.
Clustering is adopted to improve the energy efficiency of WSNs [1, 2]. In a WSN, nodes are often randomly placed, and clustering algorithms divide the network into clusters of varying sizes. A cluster head (CH) is selected for each cluster through a selection process. Other nodes then join the cluster and send the collected data to the CH. However, due to the random distribution of nodes in the network, clustering algorithms can produce uneven clusters, leading to an imbalance in energy consumption among nodes, which in turn shortens the network's lifespan. Some CHs consume too much energy, leading to premature failure [3,4,5,6]. In addition, CHs that are closer to the sink must relay traffic from other CHs in addition to their own data, which leads to the earlier death of CHs in the vicinity of the sink. This is the hotspot problem. To balance the energy consumption among CHs, unequal clustering algorithms have been proposed. Such an algorithm first calculates the competitive radius of each node based on the residual energy, the distance to the BS, and the number of neighbors before forming unequal clusters [7,8,9]. Because the competitive radius of each cluster is different and is proportional to the distance to the sink, unequal clustering can alleviate the hotspot problem. However, it still cannot completely prevent the premature death caused by the uneven energy consumption of nodes. Some hybrid clustering approaches [10, 11] have been proposed to address the hotspot problem. These methods combine equal and unequal clustering, offering a more balanced solution but at the cost of increased complexity.
Track-based or grid-based clustering [12,13,14,15,16,17] divides the network into tracks or grids, with each forming a cluster. CHs are selected based on residual energy, distance to the sink, and node density. Data are delivered to the sink over multi-hop or single-hop paths between CHs. Because the initial number of clusters generated by the clustering algorithm is fixed, clusters easily become unequal, which is not conducive to extending the network lifetime. In a low-power, randomly deployed WSN, the challenge remains to flexibly partition the network, balance the energy consumption among nodes, and maximize the network's lifespan.
K-means is widely used for cluster formation. Several K-means-based algorithms [18,19,20,21,22] have been proposed to enhance clustering performance. Because the dataset is reassigned repeatedly during the iteration process, the K-means algorithm has a very high time complexity. Moreover, some algorithms incorporate the midpoint algorithm or kernel density estimation (KDE) to refine the selection of the initial centroids for K-means, aiming to achieve more stable clustering outcomes. However, stabilizing the initial centroid selection in this way may delay the necessary CH rotation, potentially leading to premature CH failure.
Based on the previous discussion, unbalanced clustering algorithms or those based on K-means may produce clusters that are too large, leading to uneven energy consumption among CHs. Track-based or grid-based clustering algorithms have limited applicability because their network division is inflexible. Motivated by these considerations, this paper introduces a new sector clustering algorithm named KMSC.
To improve execution efficiency and reduce complexity, the network is segmented into distinct sectors. The KMSC algorithm assesses each sector to determine whether to apply the K-means algorithm, using a predefined clustering threshold denoted as \(n_{\text{th}}^C\). In each cluster, the cluster head is selected based on the total communication distance and the residual energy. The contributions of this work are as follows:

A new flexible and efficient sector clustering algorithm named KMSC, based on K-means, is introduced. By dividing the network into sectors and using the clustering threshold (\(n_{\text{th}}^C\)), KMSC reduces the size of the dataset, thereby improving K-means efficiency. At the same time, KMSC uses \(n_{\text{th}}^C\) to keep cluster sizes balanced.

A redesigned weight calculation method, based on the node's residual energy and the communication distance, is used to improve the selection of cluster heads.

The impact of the number of sectors, the threshold for clustering, and the network size on the performance of KMSC has been explored.

The performance of KMSC is evaluated by using simulation experiments. The results show that KMSC improves energy efficiency and prolongs the network lifetime.
The structure of the paper is as follows: Sect. 1 introduces the background and the problem addressed in the paper. Section 2 introduces the methods and experimental setup used in this work. Section 3 summarizes the existing work. Section 4 reviews the K-means clustering algorithm. Section 5 describes the network model and the energy model, and then presents KMSC. Section 6 gives the simulation results. Section 7 concludes our work and outlines future research directions.
2 Methods/experimental
This work presents a novel sector clustering algorithm, KMSC, designed to enhance the energy efficiency of routing protocols in wireless sensor networks (WSNs). The algorithm is based on the K-means clustering method and incorporates sector-based division to balance the load among cluster heads (CHs). In KMSC, the selection of CHs is determined by a weight calculation that considers the residual energy, the distance from each node to the candidate CH, and the distance to the sink. Nodes with the highest weights are designated as CHs. The experiments were conducted in a simulated environment using MATLAB 2019a. Various performance metrics were monitored and collected, including the first node dies (FND), half nodes die (HND), and last node dies (LND). These metrics are critical for assessing the lifetime and stability of the network.
3 Related work
Numerous clustering protocols have been proposed to realize energy-efficient WSNs.
LEACH [1] is a well-known clustering protocol. In LEACH, a node is selected as the cluster head (CH) if the random number it generates is smaller than a threshold. The rotation of cluster heads is used to balance energy consumption. However, because LEACH's random CH selection overlooks nodes' residual energy, nodes with low residual energy may also be selected as cluster heads, which degrades the performance of LEACH. HEED [3] is a distributed energy-efficient clustering protocol. It selects cluster heads based on the residual energy and the density of nodes, which ensures a uniform distribution of cluster heads across the network. To further balance energy consumption, HEED incorporates inter-cluster multi-hop transmission. AECR [4] produces uniform clusters based on the distribution of nodes, facilitating effective load balancing and prolonging the network lifetime. In response to the high energy consumption caused by repeated clustering, ECRP [6] proposes a strategy based on residual energy, where cluster heads rotate within the cluster to avoid frequent re-clustering. ECRP needs to perform clustering only once during the network's lifetime. However, due to the uneven clusters, high-traffic nodes may die significantly earlier than low-traffic nodes. SEECP [23] adopts a deterministic cluster head selection and rotation strategy to reduce the cost of re-clustering. To balance the energy consumption of cluster heads, SEECP uses a hybrid inter-cluster routing strategy. When the distance to the sink is less than the threshold R, the cluster head delivers the collected data to the sink in a single hop. Otherwise, the cluster head adopts multi-hop transmission. The threshold R in SEECP is calculated using the formula \(R=\sqrt{X_{\text{max}}^2+Y_{\text{max}}^2}\), with \(X_{\text{max}}, Y_{\text{max}}\) denoting the length and width of the network area, respectively.
When multi-hop transmission is used between CHs, a CH near the sink must relay traffic from other CHs in addition to transmitting its own traffic, which leads to the premature death of CHs around the sink. This is the hotspot problem. To counteract it, unequal clustering schemes have been introduced. EADUC [8] is a distributed, energy-aware unequal clustering protocol. It accounts for residual energy, neighbor density, and the distance to the sink when calculating the competition radius for each node. Unequal clusters are then formed with CHs having varying competition radii, leading to smaller clusters closer to the sink and larger ones further away. HCD [11] is a density-based hybrid multi-hop clustering technique. In cluster head elections, HCD uses the residual energy, the number of neighbors, the distance to the sink, and the location of nodes to balance both energy consumption and traffic. HEUC [10] is a hybrid clustering technique that utilizes both equal and unequal clustering to maximize the network lifetime. In HEUC, the network area is segmented into zones. Clusters in the same zone have the same size. In addition, the cluster size increases proportionally to the distance from the zone to the sink. However, HEUC requires a deterministic deployment strategy for node placement, which can greatly restrict the algorithm's scalability and practical applicability.
Recently, fuzzy-logic-based clustering techniques have been proposed. UCT2TSK [24] is an unequal clustering technique grounded in interval type-2 TSK fuzzy logic theory. It takes the residual energy, node density, and distance to the sink as inputs to the interval type-2 TSK fuzzy logic system (FLS), and then selects CHs and sets the competition radius based on the output of the FLS to achieve better network lifetime and throughput. EFUCA [25] presents an FLS that takes the residual energy, average communication distance, and distance to the sink as inputs. In EFUCA, time is divided into rounds, and at the start of each round, each node generates a random number. If this number exceeds a predefined threshold, the node becomes a candidate CH. Subsequently, each candidate CH calculates its rank and competition radius using the FLS. The final CH is determined by the FLS output. EFUC [26] is another clustering technique that employs an FLS. The FLS in EFUC takes the residual energy, distance to the sink, and node density as inputs to calculate the competitive radius of a node. When selecting a CH, the node generates a random probability that indicates its likelihood of becoming the cluster head. Then, based on the competitive radius and the random probability, the final CH is elected. FZC [27] is a clustering technique used to solve the problem of imbalanced energy consumption in heterogeneous WSNs. To improve energy efficiency, FZC employs a zone-based clustering mechanism and incorporates an FLS to dynamically determine the selection of CHs. However, FZC requires a semi-random deployment strategy for nodes, which may impose certain constraints on the network setup and scalability.
ECRPUCA [28] extends the network lifetime by using unequal clustering and ant colony optimization (ACO). This approach divides the network into clusters of variable sizes, taking into account critical parameters such as residual energy, the number of neighbors, distance to the sink, and the number of feedback nodes. ECRPUCA adopts batch-based clustering, which enables it to avoid re-clustering for several rounds, reducing the control overhead. A clustering technique based on the butterfly optimization algorithm (BOA) and ACO is proposed in [29]. The authors take the residual energy, distance to both the sink and neighbors, node centrality, and node degree as inputs for BOA. The output of BOA is used to select the most suitable CH. Furthermore, an ACO-based routing protocol is developed to improve energy efficiency. GEC [30] introduces a game-based clustering technique. Within GEC, nodes are treated as players in the game. Each player uses the length of idle listening to adjust the node's activity state. By keeping nodes asleep as much as possible, energy conservation is significantly enhanced. PSOLEACH [31] finds the Elbow method suitable for small to medium-sized networks, while the Silhouette method performs better for larger networks. It also validates the energy efficiency and packet transmission performance of a PSO-based LEACH compared to LEACH in small-sized WSNs. The work highlights the utility of PSO in improving clustering performance and energy efficiency in WSNs.
In [19], the authors proposed a K-means-based clustering algorithm for WSNs. The authors take energy as the main factor for selecting the CH, and the computational complexity of K-means is also discussed. A modified K-means algorithm was introduced in [32], where CHs are selected based on residual energy and communication distance. EECPKmeans [20] is an energy-efficient clustering protocol based on the K-means algorithm. To produce balanced clusters, EECPKmeans uses the midpoint algorithm to select the initial centroids. It also employs inter-cluster multi-hop transmission to reduce the energy consumption caused by long-distance communication. ISkmeans [21] is an improved soft-k-means clustering technique that is used to balance energy consumption. It adopts kernel density estimation (KDE) [33] and the clustering by fast search and find of density peaks (CFSFDP) [34] algorithm to optimize the selection of initial centroids. To obtain even cluster sizes, ISkmeans reassigns nodes between clusters. In [35], the authors introduce a cluster-based routing protocol that uses a multi-strategy fusion snake optimizer (MSSO) for selecting cluster heads and relay nodes, and a minimum spanning tree for inter-cluster routing. It enhances energy efficiency through dynamic parameter updating, adaptive mutation, and bidirectional search optimization, extending network lifetime. The protocol optimizes clustering and routing by integrating fuzzy c-means and considering multiple factors such as node location, energy, and distance to the base station. The work in [36] proposes a hybrid clustering algorithm that combines the K-means algorithm for cluster formation with a fuzzy logic system for selecting CHs. The algorithm aims to balance the load and minimize energy expenditure by considering residual energy, nearest neighbors, and distance to the sink. The simulation results demonstrate improved performance in terms of energy consumption and network lifetime.
TSC [12] is a clustering algorithm that reduces energy consumption by preventing redundant data transmission. To reduce redundant data and shorten the transmission distance, it divides the network into concentric tracks and triangular sectors. In TSC, each track's triangular sector is structured with a transport chain, thereby reducing the distance that data travel from cluster members to the CH and from the CH to the sink. TSTCS [37] is a clustering technique based on a track tree structure. It forms a hierarchical tree-structured cluster and optimizes the energy consumed for communication between the CH and the sink. LSC [17] is a lightweight sector-based clustering algorithm that was introduced to balance cluster sizes. It selects CHs by considering communication distance and residual energy, thereby extending the network lifetime. EEORP [15] is an energy-optimized routing protocol that adopts a grid-based dynamic CH election method to reduce energy consumption. To enhance the energy efficiency of inter-cluster communication, EEORP constructs data transmission paths using hop count gradients and grid distances. PEGCP [16] combines chain transmission and grid clustering to reduce energy consumption. It uses a grid algorithm to partition the network into virtual cells, referred to as clusters. Chain transmission is used to complete data transfer within and among the clusters.
The aforementioned techniques often result in uneven cluster sizes due to the random distribution of nodes, potentially leading to hotspots. Unequal clustering can mitigate this issue by forming clusters of varying sizes. However, because nodes are randomly placed, it cannot be assured that the node density and energy consumption are proportional to the cluster radius. Additionally, the computational complexity of K-means-based clustering algorithms rises sharply as the number of nodes increases, affecting performance. This work proposes a novel sector-based clustering algorithm that incorporates K-means. It begins with sector division to achieve balanced cluster sizes and uses a clustering threshold to decide whether to execute K-means within a sector. With a reduced node count in each sector, the time complexity of the K-means algorithm is lowered. Finally, CHs are selected based on residual energy and communication distance, which contributes to an extended network lifetime.
4 K-means clustering algorithm
The basic idea of the K-means clustering algorithm is to divide a given dataset into k disjoint clusters, where k is a preset value. The algorithm mainly consists of two phases. Algorithm 1 details the procedure of the basic K-means clustering algorithm.
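Algorithm 1 is not reproduced in this text, so the following is only a minimal Python sketch of the basic procedure. It assumes that the two phases are the usual assignment step and centroid update step, and that initial centroids are drawn at random from the data; the function name `kmeans` and its parameters are illustrative, not taken from the paper.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means on a list of (x, y) tuples (illustrative sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Phase 1 (assignment): attach every point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Phase 2 (update): move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, labels
```

Every iteration touches each point once per centroid, which is why the cost grows with both the dataset size and the number of iterations.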
Table 1 lists the symbols utilized in this work.
5 Threshold-driven K-means sector clustering algorithm
5.1 Network model
We consider a static, homogeneous wireless sensor network. Nodes have the same initial energy and are randomly distributed in the target area. For the purposes of this work, we assume the following conditions are met:

The sink has high processing capability and a constant supply of energy.

Each node's battery is neither replaceable nor rechargeable.

Nodes can obtain location by using GPS or a localization algorithm.
5.2 Energy model
In KMSC, the first-order radio energy model used in [1] is adopted. For a given distance d between the sender and receiver, the energy spent by the sender and by the receiver to deliver an l-bit packet is given by Eqs. (1) and (2), respectively.
In the model, \(E_{\text{ele}}\) denotes the energy dissipated per bit in the electronic circuitry, \(\epsilon _{\text{amp}}\) is the energy consumed for the amplification of the wireless signal, and \(\epsilon _{\text{fs}}\) represents the energy cost associated with free-space signal propagation. \(d_{\text{th}}\) is the distance threshold, which can be calculated using Eq. (3).
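Eqs. (1) to (3) are not reproduced in this text. The sketch below therefore assumes the standard form of the first-order radio model from [1]: a \(d^2\) free-space term below \(d_{\text{th}}\), a \(d^4\) amplifier term above it, and \(d_{\text{th}}=\sqrt{\epsilon_{\text{fs}}/\epsilon_{\text{amp}}}\). The numeric parameter values are commonly used defaults for this model and are assumptions, not the values from Table 2.

```python
import math

# Assumed parameter values (typical for this model, not taken from Table 2).
E_ELE = 50e-9          # J/bit, electronics energy per bit
EPS_FS = 10e-12        # J/bit/m^2, free-space coefficient
EPS_AMP = 0.0013e-12   # J/bit/m^4, amplifier coefficient for long distances

# Assumed form of Eq. (3): the distance at which both amplifier terms are equal.
D_TH = math.sqrt(EPS_FS / EPS_AMP)


def tx_energy(l_bits, d):
    """Energy (J) to transmit l_bits over distance d (assumed form of Eq. (1))."""
    if d < D_TH:
        return l_bits * E_ELE + l_bits * EPS_FS * d ** 2
    return l_bits * E_ELE + l_bits * EPS_AMP * d ** 4


def rx_energy(l_bits):
    """Energy (J) to receive l_bits (assumed form of Eq. (2))."""
    return l_bits * E_ELE
```

With these assumed values, \(d_{\text{th}}\) is roughly 87.7 m, and the fourth-power term above that distance is what makes long single-hop transmissions so expensive for a CH.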
5.3 Parameters and methods for selecting cluster heads
In KMSC, the CH is elected based on the total distance (\(D_w\)) and the residual energy (\(E_r\)).
The energy model suggests that minimizing the distance helps preserve the CH's energy, thereby delaying node death. The total distance in KMSC can be calculated using Eq. (4).
where \(D_{w}^i\) denotes the total distance when node \(s_i\) is chosen as the CH. \(C_n\) represents the number of nodes in a cluster. \(D_{i,\text{BS}}\) is the distance to the sink. \(D_{i,j}\) is the Euclidean distance between \(s_i\) and \(s_j\), which can be calculated using Eq. (5).
where \(s_i.xd\),\(s_j.xd\),\(s_i.yd\),\(s_j.yd\) are the x and y coordinates of \(s_i\) and \(s_j\), respectively.
To prevent nodes with low residual energy from being selected as CHs, the residual energy is considered during the CH selection process, with its value determined by Eq. (6).
where \(E_{\text{ini}}\) is the initial energy, and \(E_d\) is the energy dissipated till now.
Within a cluster, a suitable CH should fulfill two conditions: it must possess adequate energy to receive data and transmit the aggregated data to the sink, and it should keep the total communication distance moderate to minimize energy expenditure. Using the distance and residual energy discussed above, KMSC calculates the weight of each node \(s_i\) in the cluster using Eq. (7) and selects the node with the highest weight as the CH.
where \(E_r^{\text{min}}\) (\(E_r^{\text{min}} > 0\)) is the lowest residual energy in the cluster, \(E_r^{\text{max}}\) (\(E_r^{\text{max}} \le E_0\)) is the highest residual energy, \(D_t^{\text{min}}\) is the minimum distance, \(D_t^{\text{max}}\) is the maximum distance, and \(\alpha\) (\(0 \le \alpha \le 1\)) adjusts the proportion of residual energy in the node's weight.
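Eqs. (4) and (7) are not reproduced in this text, so the following Python sketch only illustrates one plausible reading of the description. It assumes that the total distance of a candidate is the sum of its distances to all cluster members plus its distance to the sink, and that the weight is a min-max normalized combination \(\alpha \cdot \text{energy term} + (1-\alpha) \cdot \text{distance term}\), where shorter total distance scores higher. The function names and the dictionary layout of a node are hypothetical.

```python
import math


def total_distance(i, nodes, sink):
    """Assumed form of Eq. (4): distance to every member plus distance to the sink."""
    pos = nodes[i]["pos"]
    return sum(math.dist(pos, n["pos"]) for n in nodes) + math.dist(pos, sink)


def select_cluster_head(nodes, sink, alpha=0.5):
    """Return the index of the node with the highest weight (assumed form of Eq. (7))."""
    dists = [total_distance(i, nodes, sink) for i in range(len(nodes))]
    energies = [n["energy"] for n in nodes]
    e_min, e_max = min(energies), max(energies)
    d_min, d_max = min(dists), max(dists)

    def weight(i):
        # Min-max normalization; a degenerate range counts as a full score.
        e_term = 1.0 if e_max == e_min else (energies[i] - e_min) / (e_max - e_min)
        d_term = 1.0 if d_max == d_min else (d_max - dists[i]) / (d_max - d_min)
        return alpha * e_term + (1 - alpha) * d_term

    return max(range(len(nodes)), key=weight)
```

Under this reading, \(\alpha = 1\) reduces the choice to the node with the most residual energy, while \(\alpha = 0\) picks the most centrally placed node regardless of its energy.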
5.4 Proposed KMSC algorithm
The KMSC algorithm consists of the following three phases:
Phase 1: Sector division (Algorithm 2)
Phase 2: Cluster formation (Algorithm 3)
Phase 3: Data aggregation (Algorithm 4)
Figure 1 shows the flowchart of KMSC.
Algorithm 2 details the procedure for dividing the network into sectors and assigning nodes to these sectors based on their spatial orientation. atan2 returns an angle in radians between \(-\pi\) and \(\pi\). Using the returned angle, the algorithm sets the sector id for each node. Finally, we obtain a list of \(n * p\) sectors. Figure 2 illustrates the division of sectors applied in KMSC.
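Algorithm 2 itself is not reproduced in this text. As a minimal sketch of the idea, the Python function below assumes that angles are measured around the sink, shifted into \([0, 2\pi)\), and split into \(n * p\) sectors of equal angular width; the name `assign_sectors` and the exact rounding are illustrative assumptions.

```python
import math


def assign_sectors(nodes, sink, p):
    """Group node indices into n*p equal-angle sectors around the sink (sketch)."""
    num_sectors = max(1, round(len(nodes) * p))  # n * p sectors, as in the text
    width = 2 * math.pi / num_sectors            # angular width of one sector
    sectors = [[] for _ in range(num_sectors)]
    for idx, (x, y) in enumerate(nodes):
        angle = math.atan2(y - sink[1], x - sink[0])  # in [-pi, pi]
        angle %= 2 * math.pi                          # shift to [0, 2*pi)
        # Clamp to guard against rounding at the 2*pi boundary.
        sector_id = min(int(angle // width), num_sectors - 1)
        sectors[sector_id].append(idx)
    return sectors
```

For example, with 100 nodes and \(p = 0.1\) this yields 10 sectors of 36 degrees each, so a uniformly random deployment places about 10 nodes in each sector.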
Algorithm 3 outlines the cluster formation process. For each sector, when the number of nodes \(n_{sc} > n_{\text{th}}^C\), the algorithm calculates the number of clusters \(n_c\) as \(n_c=\frac{n_{sc}}{n_{\text{th}}^C}\), where \(n_{\text{th}}^C\) is the clustering threshold. K-means is then used to divide the sector into \(n_c\) clusters, and the centroids of the clusters are selected as candidate CHs. All regular nodes join the nearest candidate CH to form a cluster. Subsequently, for each cluster in the sector, the weight of each node is calculated using Eq. (7), and the node with the highest weight is selected as the CH. Figure 2 is an illustration of Algorithm 3.
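The threshold rule can be sketched in a few lines of Python. The text does not say how \(n_{sc}/n_{\text{th}}^C\) is rounded when it is not an integer, so rounding up is an assumption here, as is the function name `clusters_in_sector`.

```python
import math


def clusters_in_sector(n_sc, n_th):
    """Number of clusters formed in a sector holding n_sc nodes (sketch)."""
    if n_sc == 0:
        return 0                      # empty sector: nothing to cluster
    if n_sc <= n_th:
        return 1                      # at or below the threshold: K-means is skipped
    return math.ceil(n_sc / n_th)     # assumption: round the ratio up
```

With \(n_{\text{th}}^C = 5\), a sector of 10 nodes is split into two clusters, while a sector of 4 nodes stays a single cluster and never runs K-means.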
Figure 2 shows an illustration of KMSC. The figure shows three sectors, among which \(\text{sc}_1\) and \(\text{sc}_3\) each form only one cluster because their number of nodes is lower than the threshold \(n_{\text{th}}^C\) (assuming \(n_{\text{th}}^C=5\)). Because \(\text{sc}_2\) contains 10 nodes, it forms two clusters. Furthermore, because the distance between \(\text{CH}_2\) and the sink is greater than the distance threshold \(d_{\text{th}}\), \(\text{CH}_1\), located in the same sector, forwards the data of \(\text{CH}_2\). The sink is located at the center of the network.
Algorithm 4 describes the data aggregation phase. All data produced by nodes are collected by the cluster head (CH), where they are aggregated before being forwarded to the sink over a direct (single-hop) or relayed (multi-hop) route.
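Algorithm 4 is not reproduced in this text. The sketch below illustrates only the hybrid routing decision: a CH within \(d_{\text{th}}\) of the sink transmits directly, and a CH farther away hands its data to another CH in the same sector. Choosing the nearest CH that is closer to the sink as the relay is an assumption made for illustration, and `next_hop` is a hypothetical name.

```python
import math


def next_hop(ch, sector_chs, sink, d_th):
    """Return the position that a CH forwards its aggregated data to (sketch)."""
    d_sink = math.dist(ch, sink)
    if d_sink <= d_th:
        return sink  # single hop: the sink is within the distance threshold
    # Candidate relays: other CHs in the same sector that are closer to the sink.
    closer = [c for c in sector_chs if c != ch and math.dist(c, sink) < d_sink]
    if not closer:
        return sink  # no relay available: fall back to direct transmission
    return min(closer, key=lambda c: math.dist(ch, c))  # assumption: nearest relay
```

This mirrors the situation in Figure 2, where \(\text{CH}_2\) lies beyond \(d_{\text{th}}\) and its data are relayed by \(\text{CH}_1\) in the same sector.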
5.5 Complexity analysis
KMSC's runtime complexity mainly involves three phases: sector division (Phase 1), cluster formation (Phase 2), and data aggregation (Phase 3). In Phase 1, KMSC needs O(n) operations to calculate the nodes' angles and sector ids, where n is the number of nodes. Then, in Phase 2, KMSC requires \(O(2r*\frac{1}{p}*n_{C})\) operations to execute K-means, where \(n_C \le \frac{1}{n_{\text{th}}^C*p}\) and r is the number of K-means iterations, and O(n) operations to form clusters. Because the average number of nodes per sector is only \(\frac{1}{p}\) (n nodes spread over \(n*p\) sectors), the value of r is small. In Phase 3, KMSC needs O(n) operations to collect data. Thus, the overall time complexity of KMSC is \(O(n+r*\frac{1}{p}*n_{C})\).
6 Simulation results and analysis
6.1 Simulation parameters
We evaluated the performance of KMSC using MATLAB and compared it with other algorithms, including EECPKmeans [20], K-means [19], TSC [12], LSC [17], and SEECP [23]. The network size is 100 m × 100 m, and the sink is located at (50,50). Networks of 100 and 200 nodes are considered, with the nodes randomly scattered in the network area. The results presented are averages over 50 independent experiments. Our simulation parameters are detailed in Table 2.
6.2 Network lifetime
Figure 3 shows the first node death (FND), half node death (HND), and last node death (LND) for each algorithm. In a clustering algorithm, energy consumption should ideally be balanced, which delays the FND. In Fig. 3(a), the average FND of EECPKmeans is 257, which is much earlier than that of the other algorithms. However, its average LND occurs later than that of TSC, LSC, SEECP, and K-means. This indicates that the energy consumption of the EECPKmeans algorithm is not balanced. The main reason is that although the midpoint algorithm can balance the cluster size, it slows down the rotation of cluster heads, causing cluster heads to die earlier. KMSC's FND is about 4.3 times that of EECPKmeans, 1.2 times that of TSC and SEECP, and higher than that of K-means and LSC, indicating that KMSC delays the death of the first node more effectively. Furthermore, KMSC's HND is 1164, surpassing the HND of EECPKmeans, K-means, TSC, SEECP, and LSC, which suggests a slight delay in the death of the first 50% of nodes. In Fig. 3(a), KMSC's average LND is 2943, approximately twice that of EECPKmeans, TSC, and LSC, and three times that of K-means and SEECP.
Instead of using only the residual energy to select the CH as in EECPKmeans, KMSC chooses the CH based on both the residual energy and the total distance within the cluster. In each cluster, only the node with the maximum weight (calculated using Eq. (7)) can be selected as the CH, thereby preventing the early death of the CH due to low residual energy. At the same time, this mechanism avoids selecting nodes that are too far from the cluster center as CHs, thus prolonging the network's lifetime.
In Fig. 3(b), the average FND and HND of all algorithms decrease. This is because an increase in the node density leads to increased traffic on the CHs, accelerating their energy depletion. Although the EECPKmeans algorithm still shows very poor results in terms of average FND, it has a relatively high LND compared to K-means and SEECP, indicating that about half of the nodes can survive longer. Additionally, Fig. 3(b) indicates that KMSC outperforms the other five algorithms in delaying FND, HND, and LND.
Figure 4 presents a comparison of the number of alive nodes between KMSC and the five other algorithms. As depicted in Fig. 4(a), except for EECPKmeans, the alive-node curves of the other five algorithms are approximately vertical. This suggests that in these algorithms, most nodes survive for approximately the same number of rounds before they die. In addition, it can be seen from the figure that KMSC surpasses the other algorithms in balancing energy consumption. The results in Fig. 4(a) also indicate that the EECPKmeans algorithm has a comparatively long network lifetime. In Fig. 4(b), all algorithms exhibit behavior similar to that in Fig. 4(a), and the KMSC algorithm continues to outperform the other five algorithms in balancing energy consumption and prolonging network lifetime.
We evaluated the average FND, HND, and LND of all algorithms when the network size was 200 m × 200 m. As shown in Fig. 5(a), the average FND, HND, and LND of all algorithms have decreased. Figure 5(a) shows that EECPKmeans is more sensitive to changes in network size than the other algorithms, and its performance has decreased significantly. In terms of average FND, EECPKmeans decreased from 257 to 24, a decrease of more than 90%. In terms of average HND and LND, EECPKmeans decreased by 59% and 45.3%, respectively. Meanwhile, the average FND of KMSC decreased from 1110 to 806, a decrease of 27.4%. Its average HND decreased by 6.8%, while its LND decreased from 2943 to 2107, a decrease of 28.4%. For SEECP, the average FND decreased from 965 to 420, a decrease of 56.5%. The average HND and LND of SEECP decreased by 19.6% and 15.5%, respectively. Since LSC uses distance and residual energy to select CHs, its average FND, HND, and LND decreased by 26.9%, 14.1%, and 16.1%, respectively. Compared with LSC, TSC also implements multi-hop inter-cluster transmission to reduce long-distance communication. Its average FND, HND, and LND decreased by 7.3%, 11.8%, and 25.4%, respectively. Due to the increased communication distance between CHs and the sink, the average FND, HND, and LND of K-means decreased by 19.7%, 15.1%, and 11.8%, respectively. In summary, Fig. 5(a) clearly demonstrates that KMSC retains its advantage in terms of FND, HND, and LND when the network size changes.
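The percentage decreases quoted with explicit before and after values can be checked with a short calculation. The helper `percent_decrease` below is a hypothetical name introduced only for this check; the inputs are the round counts stated in the paragraph above.

```python
def percent_decrease(before, after):
    """Relative decrease from before to after, in percent, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


kmsc_fnd = percent_decrease(1110, 806)    # KMSC, average FND
kmsc_lnd = percent_decrease(2943, 2107)   # KMSC, average LND
seecp_fnd = percent_decrease(965, 420)    # SEECP, average FND
eecp_fnd = percent_decrease(257, 24)      # EECPKmeans, average FND
print(kmsc_fnd, kmsc_lnd, seecp_fnd, eecp_fnd)
```

The four values reproduce the quoted 27.4%, 28.4%, and 56.5%, and the EECPKmeans drop comes out at 90.7%, consistent with "more than 90%".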
Figure 5(b) compares the algorithms in terms of the number of alive nodes. It can be seen that the alive-node curves of all algorithms except EECPKmeans are approximately vertical. Although EECPKmeans adopts multi-hop inter-cluster communication, its average FND and average HND are poor because CHs are not rotated in time. Furthermore, Fig. 5(b) shows that the KMSC algorithm outperforms the other algorithms in maximizing network lifetime.
6.3 The impact of \(\alpha\)
In KMSC, \(\alpha\) represents the impact of the residual energy on the selection of CHs. Figure 6 shows the results when \(\alpha\) is set to 0.5 and 0.6. It can be seen that the average FND is 800 when \(\alpha =0.5\), which is about one-third less than the FND for \(\alpha=0.6\). In terms of average HND, there is no significant change. However, when \(n=100\), the average LND increased by about 15%, from 3000 to 3500. When \(n=200\), there was no significant change in the average LND. One possible reason is that nodes with less residual energy but a short distance are also likely to be selected as CHs when \(\alpha =0.5\), causing early death of CHs and thereby lowering the average FND. Correspondingly, because nodes with more residual energy remain regular members of the cluster, the average LND is extended. When the number of nodes increases to 200, the advantage of KMSC in terms of average LND is somewhat diminished due to the increased traffic for CHs.
6.4 The impact of \(n_{\text{th}}^C\)
In KMSC, \(n_{\text{th}}^C\) determines whether to perform K-means in a sector. Figure 7 shows the results when \(n_{\text{th}}^C\) is set to 5 and 8, respectively, with \(p=0.1\). It can be seen that when \(n_{\text{th}}^C = 8\), there is no significant change in the average FND and average HND. However, the average LND decreases from 2943 to 2576 for n=100, and from 3597 to 3102 for n=200, which corresponds to reductions of 12.7% and 13.7%, respectively. The result shows that the number of clusters has a slight effect on the LND. Compared with \(n_{\text{th}}^C = 8\), when \(n_{\text{th}}^C = 5\), there are more clusters in the network and the traffic of each CH is smaller, which reduces energy consumption and helps extend the network's lifetime.
6.5 The impact of p
In KMSC, p is used to determine the number of sectors. Figure 8 shows the results when p is set to 0.05 and 0.1, respectively, with \(n_{\text{th}}^C=5\). As can be seen from the figure, the average FND and HND do not change much. However, compared with \(p=0.05\), the average LND increases by 100% when \(p=0.1\). This demonstrates that the number of sectors has a significant effect on the LND. When \(p=0.1\), the number of sectors is doubled compared with \(p=0.05\), which reduces the number of nodes per sector and thus the energy consumption of each CH. When \(p=0.05\), there are fewer sectors and therefore more nodes per sector, which increases the energy consumption of CHs. Additionally, randomly selected initial centroids can lead to unbalanced clusters, thereby affecting the network’s lifetime.
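The relationship between p and the number of sectors can be sketched as follows. The sketch assumes that the sector count is \(p \cdot n\) rounded to the nearest integer and that sectors are equal-angle slices around the sink. Both are assumptions consistent with the statement that doubling p doubles the number of sectors, not a reproduction of the paper's definition. All names are hypothetical.

```python
import math
from typing import Tuple


def sector_count(n_nodes: int, p: float) -> int:
    """Assumed rule: number of sectors is p * n, at least one."""
    return max(1, round(p * n_nodes))


def sector_index(x: float, y: float, sink: Tuple[float, float], n_sectors: int) -> int:
    """Index of the equal-angle sector (0 .. n_sectors - 1) containing point (x, y)."""
    # atan2 returns an angle in (-pi, pi]; the modulo maps it to [0, 2*pi).
    angle = math.atan2(y - sink[1], x - sink[0]) % (2.0 * math.pi)
    width = 2.0 * math.pi / n_sectors
    # The min() guards against a floating-point result equal to n_sectors.
    return min(int(angle / width), n_sectors - 1)


if __name__ == "__main__":
    for p in (0.05, 0.1):
        print(p, sector_count(100, p))
    sink = (0.0, 0.0)
    for point in [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]:
        print(point, sector_index(point[0], point[1], sink, 4))
```

With \(n=100\), this rule gives 5 sectors for \(p=0.05\) and 10 sectors for \(p=0.1\), so the average number of nodes per sector drops from 20 to 10.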
6.6 Discussion
The previous results show that KMSC better balances cluster size and energy consumption, outperforming existing work in terms of FND, HND, and LND. Meanwhile, the algorithm achieves high execution efficiency because the number of nodes in each sector is limited. Additionally, as can be seen from Figs. 6, 7, and 8, \(\alpha\), \(n_{\text{th}}^C\), and p affect LND more than they affect FND or HND. As in other clustering algorithms, the cluster head selection and rotation mechanisms of KMSC are key to optimizing the network lifetime.
7 Conclusions
To enhance the energy efficiency of wireless sensor networks, this paper introduces a threshold-driven Kmeans sector clustering algorithm named KMSC. KMSC enhances efficiency and equalizes cluster sizes through the use of sector division and the Kmeans method. Within KMSC, a node’s weight is determined by its residual energy and distance, and the node with the highest weight is chosen as the cluster head (CH). The performance of KMSC is assessed through simulation experiments and compared with EECPKmeans, Kmeans, TSC, LSC, and SEECP. The results indicate that KMSC surpasses the other five algorithms in terms of FND, HND, and LND, suggesting that KMSC can balance energy consumption and extend the network’s lifespan.
We also discuss the impact of the parameters \(\alpha\), p, and \(n_{\text{th}}^C\) on KMSC’s performance. In future work, we plan to explore dynamic sector division and a strategic cluster head rotation scheme to distribute energy consumption more evenly.
Availability of data and materials
The datasets used during the study are available from the corresponding author on reasonable request.
Abbreviations
WSNs: Wireless sensor networks
CH: Cluster head
DS: Dataset
FND: First node death
HND: Half node death
LND: Last node death
Funding
This research was funded by Key Scientific Research Projects of Henan (No. 24A520024)
Author information
Contributions
Conceptualization was performed by Bo Zeng; methodology was done by B. Z. and S. S. L.; validation was done by B. Z. and X. F. G.; formal analysis was performed by X. F. G.; the original draft was prepared by B. Z.; review and editing were performed by S. S. L. and X. F. G. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zeng, B., Li, S. & Gao, X. Thresholddriven Kmeans sector clustering algorithm for wireless sensor networks. J Wireless Com Network 2024, 68 (2024). https://doi.org/10.1186/s13638024024032
DOI: https://doi.org/10.1186/s13638024024032