Skip to main content

Threshold-driven K-means sector clustering algorithm for wireless sensor networks

Abstract

The clustering algorithm is an effective method for developing energy efficiency routing protocol for wireless sensor networks (WSNs). In clustered WSNs, cluster heads must handle high traffic, thus consuming more energy. Therefore, forming balanced clusters and selecting optimal cluster heads are significant challenges. The paper proposes a sector clustering algorithm based on K-means called KMSC. KMSC improves efficiency and balances the cluster size by employing symmetric dividing sectors in conjunction with K-means. For the selection of cluster heads (CHs), KMSC uses the residual energy and distance to calculate the weight of the node, then selects the node with the highest weight as CH. A hybrid single-hop and multi-hop communication is utilized to reduce long-distance transmissions. Furthermore, the impact of the number of sectors, the threshold for clustering, and the network size on the performance of KMSC has been explored. The simulation results show that KMSC outperforms EECPK-means, K-means, TSC, LSC, and SEECP in terms of FND, HND, and LND.

1 Introduction

In IOT systems, using inexpensive wireless sensor networks (WSNs) to sense targets and collect data is an effective way. The sensor nodes are connected by a 2.4 G antenna and have limited energy. Energy efficiency has consistently been a focal point in WSNs because nodes, typically placed in unattended scenarios such as buildings, islands, and battlefields cannot be replaced or recharged.

Clustering is adopted to improve the energy efficiency of WSNs [1, 2]. In a WSN, nodes are often randomly placed, and clustering algorithms divide the network into clusters of varying sizes. A cluster head (CH) is selected for each cluster through a clustering selection process. Other nodes then join the cluster and send the collected data to the CH. However, due to the random distribution of nodes in the network, clustering algorithms can result in uneven clustering, leading to an imbalance in energy consumption among nodes, which in turn affects the network’s lifespan. Some CHs consume too much energy, leading to premature failure [3,4,5,6]. In addition, CHs that are closer to the sink must relay traffic from other CHs in addition to their own data, which leads to the earlier death of CHs in the vicinity of the sink. This is the hotspot problem. To balance the energy consumption among CHs, an unequal clustering algorithm has been proposed. This algorithm first calculates the competitive radius of each node based on the residual energy, distance to BS, and the number of neighbors before forming unequal cluster [7,8,9]. Because the competitive radius of each cluster is different, and it is proportional to the distance to the sink, unequal clustering can alleviate hotspot issues. However, it still cannot completely prevent the premature death caused by the uneven energy consumption of nodes. Some hybrid clustering approaches [10, 11] have been proposed to address the hotspot problem. These methods consider both equal and unequal clustering, offering a more balanced solution but at the cost of increased complexity.

Track-based or grid-based clustering [12,13,14,15,16,17] divides the network into tracks or grids, with each forming a cluster. CHs are selected based on residual energy, distance to the sink, and node density. Data are delivered to the sink using multi-hop or single-hop path between CHs. Because the initial number of clusters generated by the clustering algorithm is fixed, it is easy to cause unequal clusters, which is not conducive to extending the network lifetime. In a low power, randomly deployed WSN, the challenge remains to flexibly partition the network, balance the energy consumption among nodes, and maximize the network’s lifespan.

K-means is a widely used for cluster formation. Several K-means-based algorithms [18,19,20,21,22] have been proposed for enhance clustering performance. Due to the repeated assignments of the dataset during the iteration process, the K-means algorithm has a very high time complexity. Moreover, some algorithms incorporate the midpoint algorithm or kernel density estimation (KDE) to refine the selection of the initial centroids for the K-means, aiming to achieve more stable clustering outcomes. However, this approach to stabilizing the initial centroid selection may delay the necessary CH rotation, potentially leading to premature CH failure.

Based on the previous discussion, unbalanced clustering algorithms or those based on k-means may result in some clusters becoming too large, leading to uneven energy consumption among CHs. Track-based or grid-based clustering algorithms, due to inflexible network division, limit their applicability. Inspired by these considerations, this paper introduces a new sector clustering algorithm named KMSC that has been developed.

To improve execution efficiency and reduce complexity, the network is segmented into distinct sectors. The KMSC algorithm assesses each sector to determine the suitability of applying the K-means algorithm, utilizing a predefined clustering threshold denoted as \(n_{\text{th}}^C\). In each cluster, the cluster head is selected based on the total communication distance and the residual energy. The contributions of this work are delineated as follows:

  • A new flexible and efficient sector clustering algorithm named KMSC based on K-means has been introduced. By dividing the network into sectors and using the threshold for clustering (\(n_{\text{th}}^C\)), KMSC reduces the size of the dataset, thereby improving K-means efficiency. Concurrently, KMSC utilizes the \(n_{\text{th}}^C\) to ensure the equitable distribution of cluster sizes.

  • A redesigned weight calculation method, based on the node’s residual energy and the distance of communication, has been engineered to improve the selection process of cluster head.

  • The impact of the number of sectors, the threshold for clustering, and the network size on the performance of KMSC has been explored.

  • The performance of KMSC is evaluated by using simulation experiments. The results show that KMSC improves energy efficiency and prolongs the network lifetime.

The structure of the paper is as follows: Sect. 1 introduces the background and the problem of the paper. Section 2 introduces the method and experimental used in this work. Section 3 summarizes the existing work. Section 5 describes the network model and the energy model. KMSC is proposed in Sect. 5. Section 6 gives the simulation results. Section 7 concludes our work and future research direction.

2 Methods/experimental

This work presents a novel sector clustering algorithm, KMSC, designed to enhance the energy efficiency of routing protocols in wireless sensor networks (WSNs). The algorithm is based on the K-means clustering method incorporating sector-based division to balance the cluster load among cluster heads (CHs). In KMSC, the selection of CHs is determined by a weight calculation that considers the residual energy, the distance from each node to the candidate CH, and distance to the sink. Nodes with the highest weights are designated as CHs. The experiments were conducted in a simulated environment using MATLAB 2019a. Various performance metrics were monitored and collected, including the first node dies (FND), half nodes die (HND), and last node dies (LND). These metrics are critical for assessing the lifetime and stability of the network.

3 Related work

Numerous clustering protocols have been proposed to realize energy-efficient WSNs.

LEACH [1] is a well-known clustering protocol. In LEACH, if the random number produced by the node is smaller than a threshold, it will be selected as the cluster head (CH). The rotation of cluster heads is used to balance energy consumption. However, LEACH’s random CH selection overlooks nodes’ residual energy, some nodes with low residual energy may also be selected as cluster heads, thereby impacting the performance of LEACH. HEED [3] is a distributed energy-efficient clustering protocol. It selects cluster heads based on the residual energy and the density of nodes, which ensures uniform distribution of cluster heads across the network. To further balance energy consumption, HEED incorporates inter-cluster multi-hop transmission. AECR [4] produces a uniform cluster based on the distribution of nodes, facilitating an effective load balance and prolong the network lifetime. In response to the high-energy consumption caused by repeated clustering, ECRP [6] proposes a strategy based on residual energy, where cluster heads rotate within the cluster to avoid frequent reclustering. Before the network ends, ECRP only needs to perform clustering once. However, due to the uneven cluster, the death of high-traffic nodes may be significantly earlier than that of low-traffic nodes. SEECP [23] adopts a deterministic cluster head selection and rotation strategy to reduce the cost of reclustering. In order to balance the energy consumption of cluster heads, SEECP designed a hybrid inter-cluster routing strategy. When the distance to the sink is less than the threshold R, the cluster head delivers the collected data to the sink in a single-hop manner. Otherwise, the cluster head adopts multi-hop transmission. The threshold R in SEECP is calculated using the formula \(R=\sqrt{(}X_{\text{max}}^2+Y_{\text{max}}^2)\), with \(X_{\text{max}}, Y_{\text{max}}\) denoting the length and width of the network area, respectively.

When multi-hop is used between CHs, the CH near the sink not only needs to relay traffic from other CHs, but also needs to transmit its own traffic, which will lead to premature death of CHs around the sink. This is the problem of hotspot. To counteract the hotspot issue, unequal clustering schemes have been introduced. EADUC [8] is a distributed, energy-aware unequal clustering protocol. It accounts for residual energy, neighbor density, and the distance to the sink when calculating the competition radius for each node. Unequal clusters are then formed with CHs having varying competition radius, leading to smaller cluster sizes closer to the sink and larger ones further away. HCD [11] is a density based hybrid multi-hop clustering technique. In cluster head elections, HCD applies the residual energy, the number of neighbors, the distance to sink, and the location of nodes to balance both energy consumption and traffic. HEUC [10] is a hybrid clustering technique that utilizes both equal and unequal clustering to maximize the network lifetime. In HEUC, the network area is segmented into zones. Clusters included in the same zone have the same size. In addition, the cluster size increases proportionally to the distance from zone to sink. However, HEUC necessitates a deterministic deployment strategy for node placement, which can greatly restrict the algorithm’s scalability and practical applicability.

Recently, fuzzy logic-based clustering technique has been proposed. UCT2TSK [24] is an unequal clustering technique grounded in the interval type-2 TSK fuzzy logic theory. It takes the residual energy, node density, and distance to sink as inputs to the interval type-2 TSK fuzzy logic system (FLS), and then selects CHs and sets competition radius based on the output of FIS to achieve better performance in network lifetime and throughput. E-FUCA [25] presents a FLS which takes the residual energy, average communication distance, and distance to sink as inputs. In E-FUCA, time is divided into rounds, and at the start of each round, each node generates a random number. If this number exceeds a predefined threshold, the node becomes a candidate CH. Subsequently, each candidate CH calculate its rank and competition radius using the FLS. The final CH is determined by the FLS output. EFUC [26] is another clustering technique that employs an FLS. The FIS in EFUC takes the residual energy, distance to sink, and the node density as inputs to calculate the competitive radius of node. When selecting a CH, the node generates a random probability that indicates the likelihood of the node becoming the cluster head. Then, based on the competitive radius and the random probability, the final CH is elected. FZC [27] is a clustering technique used to solve the problem of imbalanced energy consumption in heterogeneous WSNs. In order to improve energy efficiency, FZC employs a zone-based clustering mechanism and incorporates a fuzzy logic system (FLS) to dynamically determine the selection of CHs. However, a semi-random deployment strategy is required for nodes when using FZC, which may impose certain constraints on the network setup and scalability.

ECRP-UCA [28] has extended the network lifetime by using unequal clustering and ant colony optimization (ACO) technology. This approach strategically divides the network into clusters of variable sizes, taking into account critical parameters such as residual energy, the number of neighbors, distance to sink, and the number of feedback nodes. ECRP-UCA adopts batch-based clustering, which enables it to avoid reclustering during several rounds, reducing the control overhead. A clustering technique based on butterfly optimization algorithm (BOA) and ACO is proposed in [29]. The author takes the residual energy, distance to both sink and neighbors, node centrality, and the degree of the node as inputs for BOA. The output of BOA is used to select the most proper CH. Furthermore, an ACO-based routing protocol is developed to improve energy efficiency. GEC [30] introduces a game-based clustering technique. Within GEC, nodes are treated as players in the game. The length of idle listening is adopted by the player to adjust the node’s activity state. By keeping nodes asleep as much as possible, energy conservation is significantly enhanced. PSO-LEACH [31] finds the El-bow method suitable for small to medium-sized networks while the Silhouette method performs better for larger networks. It also validates the energy efficiency and packet transmission performance of a PSO-based LEACH compared to LEACH in small-sized WSNs. The work highlights the utility of the PSO in improving clustering performance and energy efficiency in WSNs.

In [19], the authors proposed a K-means-based clustering algorithm for WSNs. The author takes energy as the main factor for selecting CH, and the computational complexity of K-means also is discussed. A modified K-means algorithm was introduced in [32], where CHs are selected based on residual energy and communication distance. EECPK-means [20] is an energy-efficient clustering protocol based on K-means algorithm. In order to produce balanced clusters, EECPK-means uses the midpoint algorithm to select the initial centroids. It also employs inter-cluster multi-hop transmission to alleviate the energy consumption caused by long-distance communication. IS-k-means [21] is an improved soft-k-means clustering technique, which is used to balance the energy consumption. It adopts kernel density estimation (KDE) [33] and clustering by fast search and find of density peaks (CFSFDP) [34] algorithm to optimize the selection of initial centroids. In order to obtain an even cluster size, a node reassignment is adopted by IS-k-means. In [35], the authors introduce a novel cluster-based routing protocol using multi-strategy fusion snake optimizer (MSSO) for selecting cluster heads and relay nodes, and minimum spanning tree for inter-cluster routing. It enhances energy efficiency through dynamic parameter updating, adaptive mutation, and bidirectional search optimization, extending network lifetime. The protocol optimizes clustering and routing by integrating fuzzy c-means and considering multiple factors like node location, energy, and distance to the base station. [36] proposes a hybrid clustering algorithm that combines the K-means algorithm for cluster formation with a fuzzy logic system for selecting CHs. The algorithm aims to balance the load and minimize energy expenditure by considering residual energy, nearest neighbors, and distance to the sink. The simulation results demonstrate improved performance in terms of energy consumption and network lifetime.

TSC [12] is a clustering algorithm that reduces energy consumption by preventing redundant data transmission. In order to reduce redundant data and shortest transmission distance, it divides the network into concentric tracks and triangular sectors. In TSC, each track’s triangular sector is structured with a transport chain, thereby reducing the distance for data to reach from cluster members to CH and from the CH to the sink. TSTCS [37] is a clustering technique based on a track tree structure. It forms a hierarchical tree structure cluster and employs optimal energy consumption for communication between the CH and the sink. LSC [17] is a lightweight sector-based clustering algorithm and was introduced to balance cluster sizes. It selects CHs by considering communication distance and residual energy, thereby extending the network lifetime. EEORP [15] is an energy-optimized routing protocol that adopts a grid-based dynamic CH election method to reduce energy consumption. In order to enhance the energy efficiency of the inter-cluster communication, EEORP constructs data transmission paths using hop count gradients and grid distances. PEGCP [16] combines chain transmission and grid clustering to reduce energy consumption. It uses a grid algorithm to partition the network into virtual cells, referred to as clusters. A chain transmission is used to complete data transfer within and among the clusters.

The aforementioned techniques often result in uneven cluster sizes due to the random distribution of nodes, potentially leading to hotspots. Unequal clustering can mitigate this issue by forming clusters of varying sizes. However, because nodes are randomly placed, it cannot be assured that the node density and energy consumption are proportional to the cluster radius. Additionally, K-means-based clustering algorithms see a sharp rise in computational complexity as node numbers increase, affecting performance. This work proposes a novel sector-based clustering algorithm that incorporates K-means. It begins with sector division to achieve balanced cluster size and uses a clustering threshold to decide on the execution of K-means within a sector. With a reduced node count in each sector, the time complexity of the K-means algorithm is lowered. Finally, CHs are selected based on residual energy and communication distance which contributes to an extended network lifetime.

4 K-means clustering algorithm

The basic idea of the K-means clustering algorithm is to divide a given dataset into k disjoint clusters, where k is a preset value. The algorithm mainly consists of two phases. Algorithm 1 details the procedure of the basic K-means clustering algorithm.

Algorithm 1
figure a

K-means clustering algorithm

Table 1 lists the symbols utilized in this work.

Table 1 Symbol notation

5 Threshold-driven K-means sector clustering algorithm

5.1 Network model

We consider a static, homogeneous wireless sensor network. Nodes have the same initial energy and are randomly distributed in the target area. For the purposes of this work, we assume the following conditions are met:

  • The sink is provided with high processing and a constant supply of energy.

  • The battery equipped by the node is non-replaceable or non-rechargeable.

  • Nodes can obtain location by using GPS or a localization algorithm.

5.2 Energy model

In KMSC, the first-order radio energy model used in [1] is adopted. For a given distance d between the sender and receiver, the energy expenditure for delivering an l-bits packet by both the sender and the receiver is given by Eqs. (1) and (2).

$$\begin{aligned} E_T(l,d)= & {\left\{ \begin{array}{ll} l(E_{\text{ele}} + \epsilon _{fs}d^2), d \le d_{\text{th}} \\ l(E_{\text{ele}} + \epsilon _{amp}d^4), d \ge d_{\text{th}}\end{array}\right. } \end{aligned}$$
(1)
$$\begin{aligned} E_R(l)= & lE_{\text{ele}} \end{aligned}$$
(2)

In the model, \(E_{\text{ele}}\) denotes the dissipated energy per bit in electronic circuitry, \(\epsilon _{\text{amp}}\) is the energy consumed for the amplification of the wireless signal, and \(\epsilon _{\text{fs}}\) represents the energy cost associated with free-space signal propagation. \(d_{\text{th}}\) is the distance threshold, which can be calculated using Eq. (3).

$$\begin{aligned} d_{\text{th}} = \sqrt{\frac{\epsilon _{\text{fs}}}{\epsilon _{\text{amp}}}} \end{aligned}$$
(3)

5.3 Parameters and methods for selecting cluster heads

In KMSC, the CH is elected based on the total distance (\(D_w\)) and the residual energy (\(E_r\)).

The energy model suggests that minimizing the distance helps in preserving the CH’s energy, thereby delaying node death. The total distance in KMSC can be calculated using Eq. (4).

$$\begin{aligned} D_w^i = D_{i,\text{BS}} + \sum _{j=1}^{C_n} D_{i,j} \end{aligned}$$
(4)

where \(D_{w}^i\) denotes the total distance when node \(s_i\) is choose as the CH. \(C_n\) presents the number of nodes in a cluster. \(D_{i,\text{BS}}\) is the distance to sink. \(D_{i,j}\) is the Euclidean distance between \(s_i\) and \(s_j\), which can be calculated by using (5).

$$\begin{aligned} D_{i,j} = \sqrt{(s_i.xd - s_j.xd)^2 + (s_i.yd - s_j.yd)^2}) \end{aligned}$$
(5)

where \(s_i.xd\),\(s_j.xd\),\(s_i.yd\),\(s_j.yd\) are the x and y coordinates of \(s_i\) and \(s_j\), respectively.

To prevent nodes with low residual energy from being selected as CHs, the residual energy is considered during the CH selection process, with its value determined by Eq. (6).

$$\begin{aligned} E_r = E_{\text{ini}} - E_{d} \end{aligned}$$
(6)

where \(E_{\text{ini}}\) is the initial energy, and \(E_d\) is the energy dissipated till now.

Within a cluster, a suitable CH should fulfill two conditions: It must possess adequate energy to receive and transmit aggregated data to the sink, and it should maintain a moderate distance to minimize energy expenditure. By using the distance and residual energy discussed above, KMSC calculates the weight of each node in the cluster, represented as \(s_i\), through (7), and selects the node with the highest weight as the CH.

$$\begin{aligned} W_i = \alpha \frac{E_r^i-E_r^{\text{min}}}{E_r^{\text{max}}-E_r^{\text{min}}} + (\alpha - 1)\frac{D_t^i-D_t^{\text{min}}}{D_t^{\text{max}}-D_t^{\text{min}}} \end{aligned}$$
(7)

where \(E_r^{\text{min}}\) (\(E_ r^{\text{min}} > 0\)) is the lowest residual energy in the cluster, \(E_r^{\text{max}}\) (\(E_r^{\text{max}} \le E_0\)) is the highest residual energy, \(D_t^{\text{min}}\) is the minimum distance, \(D_t^{\text{max}}\) is the maximum distance, \(\alpha\) (\(0 \le \alpha \le 1\)) is used to adjust the proportion of residual energy in the node’s weight.

5.4 Proposed KMSC algorithm

The work of KMSC algorithm consists of the following three phases:

Phase 1: Sectors division (Algorithm 2)

Phase 2: Cluster formation (Algorithm 3)

Phase 3: Data aggregation (Algorithm 4)

Figure 1 shows the flowchart of KMSC.

Fig. 1
figure 1

The flowchart for KMSC; Figure 1 shows the flowchart of KMSC

Algorithm 2 details the procedure for dividing the network into sectors, assigning nodes to these sectors based on their spatial orientation. atan2 returns the radian between \(-\pi\) and \(\pi\). Using the returned angle, the algorithm sets the sector id for each node. Finally, we will obtain a list of \(n * p\) sectors. Figure 2 illustrates the division of sectors applied in KMSC.

Algorithm 2
figure b

Sectors division

Algorithm 3 outlines the cluster formation process. For each sector, when the number of nodes \(n_{sc} > n_{\text{th}}^C\), the algorithm calculates the number of clusters \(n_c\) as \(n_c=\frac{n_{sc}}{n_{\text{th}}^C}\). \(n_{\text{th}}^C\) is the threshold for clustering. The K-means is then used to divide the sector into \(n_c\) clusters, and the centroids of the clusters are selected as candidate CHs. All regular nodes join the nearest candidate CH to form a cluster. Subsequently, for each cluster in the sector, the weight of the node is calculated using Eq. (7), and the node with the highest weight is selected as the CH. Figure 2 is an illustration of Algorithm 3.

Algorithm 3
figure c

Cluster formation

Figure 2 shows an illustration of KMSC. The figure shows three sectors, among which \(\text{sc}_1\) and \(\text{sc}_3\) only forms one cluster because the number of nodes is lower than the threshold \(n_{\text{th}}^C\) (assuming \(n_{\text{th}}^C=5\)). Since the number of nodes in \(\text{sc}_2\) is 10, thus forming two clusters. Furthermore, because the distance between the \(\text{CH}_2\) and sink is greater than the distance threshold \(d_{\text{th}}\), \(\text{CH}_1\) located in the same sector will forward its data. The sink locates at the center of the network.

Fig. 2
figure 2

An illustration of KMSC; Figure 2 illustrates the KMSC algorithm. The network is divided into three sectors, with \(\text{sc}_1\) and \(\text{sc}_3\) forming only one cluster each because the number of nodes is below the threshold \(n_{\text{th}}^C\) (assuming \(n_{\text{th}}^C=5\)). Since the number of nodes in \(\text{sc}_2\) is 10, thus forming two clusters. Furthermore, because the distance between the \(\text{CH}_2\) and sink is greater than the distance threshold \(d_{\text{th}}\), \(\text{CH}_1\), located in the same sector, forwards its data. The sink locates at the center of the network

Algorithm 4 describes the data aggregation phase. All data produced by nodes are collected by the cluster head (CH), where they are aggregated before being forwarded to the sink through a direct or indirect route consisting of one or multiple hops.

Algorithm 4
figure d

Data aggregation

5.5 Complexity analysis

The KMSC’s runtime complexity mainly involves three phases: sectors division (Phases 1), cluster formation (Phase 2), and data aggregation (Phase 3). In Phase 1, KMSC needs o(n) operations to calculate the nodes’ radian and sector id where n is the number of nodes. Then, in Phase 2, KMSC requires \(o(2r*\frac{1}{p}*n_{C})\) operations to execute K-means where \(n_C \le \frac{1}{n_{\text{th}}^C*p}\) and r is the number of iterations of K-means, and o(n) operations to form cluster. Because the average number of nodes in the clustering is \(n*p\), the value of r is small. In Phase 3, KMSC needs o(n) operations to collect data. Thus, the overall time complexity of KMSC is \(o(n+r*\frac{1}{p}*n_{C})\) operations.

6 Simulation results and analysis

6.1 Simulation parameters

We evaluated the performance of KMSC using MATLAB and compared it with other algorithms, including EECPK-means [20], K-means [19], TSC [12], LSC [17], and SEECP [23]. The network size is 100 m x 100 m, and the sink is located at (50,50). The number of nodes is 100 and 200, and randomly scattered in the network area. The results presented are an average derived from 50 independent experiments. Our simulation parameters are detailed in Table 2.

Table 2 Simulation parameters

6.2 Network lifetime

Fig. 3
figure 3

Comparison of FND,HND,LND; Fig. 3 shows the first node death (FND), half node death (HND), and last node death (LND) for each algorithm. Among clustering algorithms, energy consumption is ideally balanced, resulting in a delayed FND. In Fig. 3(a), the average FND of EECPK-means is 257, which is much earlier than the other algorithms. However, compared to TSC, LSC, SEECP, and K-means, the average LND occurs later. This indicates that the energy consumption of the EECPK-means algorithm is not balanced. The main reason is that although the midpoint algorithm can balance the cluster size, it slows down the rotation of cluster head, causing an earlier death of cluster heads. KMSC’s FND is about 4.3 times that of EECPK-means, 1.2 times that of TSC and SEECP, and higher than K-means and LSC, indicating that KMSC can delay the death of the first node more effectively. Furthermore, KMSC’s HND is 1164, surpassing the HND of EECPK-means, K-means, TSC, SEECP, and LSC, which suggests slight delay in the death of the first 50% of nodes. In Fig. 3(a), KMSC’s average LND is 2943, approximately twice that of EECPK-means, TSC, and LSC, and three times that of K-means and SEECP

Figure 3 shows the first node death (FND), half node death (HND), and last node death (LND) for each algorithm. Among clustering algorithms, energy consumption is ideally balanced, resulting in a delayed FND. In Fig. 3(a), the average FND of EECPK-means is 257, which is much earlier than the other algorithms. However, compared to TSC, LSC, SEECP, and K-means, the average LND occurs later. This indicates that the energy consumption of the EECPK-means algorithm is not balanced. The main reason is that although the midpoint algorithm can balance the cluster size, it slows down the rotation of cluster head, causing an earlier death of cluster heads. KMSC’s FND is about 4.3 times that of EECPK-means, 1.2 times that of TSC and SEECP, and higher than K-means and LSC, indicating that KMSC can delay the death of the first node more effectively. Furthermore, KMSC’s HND is 1164, surpassing the HND of EECPK-means, K-means, TSC, SEECP, and LSC, which suggests slight delay in the death of the first 50% of nodes. In Fig. 3(a), KMSC’s average LND is 2943, approximately twice that of EECPK-means, TSC, and LSC, and three times that of K-means and SEECP.

Instead of only using the residual energy to select the CH as in EECP-means, KMSC chooses CH based on both residual energy and the distance produced by the cluster. In each cluster, only the node with the maximum weight (calculated using Eq. (7)) can be selected as the CH, thereby preventing the early death of CH due to low residual energy. Simultaneously, this mechanism also avoids selecting nodes that are too far from the cluster center as CHs, thus prolonging the network’s lifetime.

In Fig. 3(b), the average FND and HND for all algorithms decreased. This is because an increases in the node density leads to increased traffic on the CHs, accelerating their energy depletion. Although the EECPK-means algorithm still showed very poor results in terms of average FND, it has a relatively high LND compared to K-means and SEECP, indicating that about half of the nodes can survive longer. Additionally, it also indicates that KMSC outperforms the other five algorithms in delaying FND, HND, and LND.

Fig. 4
figure 4

Comparison of the number of alive node; Fig. 4 presents a comparison of the number of alive node between KMSC and five other algorithms. As depicted in Fig. 4(a), except for EECPK-means, the alive node curves of the other five algorithms are approximately vertical. This suggests that in these algorithms, most nodes have approximately the same number of rounds before they die. In addition, it can be seen from the figure that the KMSC surpasses other algorithms in energy consumption balance. The results in Fig. 4(a) also indicate that the EECPK-means algorithm has a longer network lifetime. In Fig. 4(b), all algorithms exhibit similar behavior to Fig. 4(a), and the KMSC algorithm continues to outperform the other five algorithms in balancing energy consumption and prolonging network lifetime

Figure 4 presents a comparison of the number of alive node between KMSC and five other algorithms. As depicted in Fig. 4(a), except for EECPK-means, the alive node curves of the other five algorithms are approximately vertical. This suggests that in these algorithms, most nodes have approximately the same number of rounds before they die. In addition, it can be seen from the figure that the KMSC surpasses other algorithms in energy consumption balance. The results in Fig. 4(a) also indicate that the EECPK-means algorithm has a longer network lifetime. In Fig. 4(b), all algorithms exhibit similar behavior to Fig. 4(a), and the KMSC algorithm continues to outperform the other five algorithms in balancing energy consumption and prolonging network lifetime.

Fig. 5
figure 5

The network lifetime when n=200; we evaluated the average FND, HND, and LND of all algorithms when the network size was 200 m x 200 m. As shown in Fig. 5(a), the average FND, HND, and LND of all algorithms have decreased. Figure 5(a) shows that EECPK-means is more sensitive to changes in network size than other algorithms, and its performance has decreased significantly. In terms of average FND, EECPK-means decreased from 257 to 24, a decrease of more than 90%. In terms of average HND and LND, EECPK-means decreased by 59% and 45.3%, respectively. Meanwhile, the average FND of KMSC decreased from 1110 to 806, a decrease of 27.4%. Its average HND decreased by 6.8%, while its LND decreased from 2943 to 2107, a decrease of 28.4%. For SEECP, the average FND decreased from 965 to 420, a decrease of 56.5%. The average HND and LND of SEECP decreased by 19.6% and 15.5%, respectively. Since LSC uses distance and residual energy to select CHs, its average FND, average HND, and average LND decreased by 26.9%, 14.1%, and 16.1%, respectively. Compared with LSC, TSC also implements multi-hop inter-cluster transmission to reduce long-distance communication. Its average FND, average HND, and average LND decreased by 7.3%, 11.8%, and 25.4%, respectively. Due to the increased communication distance between CHs and sink, the average FND, average HND, and average LND of K-means decreased by 19.7%, 15.1%, and 11.8%, respectively. In summary, Fig. 5(a) clearly demonstrates that when changing network size, KMSC still has an advantage in terms of FND, HND, and LND

We evaluated the average FND, HND, and LND of all algorithms when the network size was 200 m x 200 m. As shown in Fig. 5(a), the average FND, HND, and LND of all algorithms have decreased. Figure 5(a) shows that EECPK-means is more sensitive to changes in network size than other algorithms, and its performance has decreased significantly. In terms of average FND, EECPK-means decreased from 257 to 24, a decrease of more than 90%. In terms of average HND and LND, EECPK-means decreased by 59% and 45.3%, respectively. Meanwhile, the average FND of KMSC decreased from 1110 to 806, a decrease of 27.4%. Its average HND decreased by 6.8%, while its LND decreased from 2943 to 2107, a decrease of 28.4%. For SEECP, the average FND decreased from 965 to 420, a decrease of 56.5%. The average HND and LND of SEECP decreased by 19.6% and 15.5%, respectively. Since LSC uses distance and residual energy to select CHs, its average FND, average HND, and average LND decreased by 26.9%, 14.1%, and 16.1%, respectively. Compared with LSC, TSC also implements multi-hop inter-cluster transmission to reduce long-distance communication. Its average FND, average HND, and average LND decreased by 7.3%, 11.8%, and 25.4%, respectively. Due to the increased communication distance between CHs and sink, the average FND, average HND, and average LND of K-means decreased by 19.7%, 15.1%, and 11.8%, respectively. In summary, Fig. 5(a) clearly demonstrates that when changing network size, KMSC still has an advantage in terms of FND, HND, and LND.

Figure 5(b) shows the comparison results of the algorithm in terms of alive node. It can be seen that the alive node curves of all algorithms except EECPK-means are approximately vertical. Although EECPK-means adopts multi-hop inter-cluster communication, its average FND and average HND performance are poor due to the problem of untimely rotation of CHs. Furthermore, Fig. 5(b) shows that KMSC algorithm outperforms other algorithms in maximizing network lifetime.

6.3 The impact of \(\alpha\)

Fig. 6
figure 6

The impact of \(\alpha\); in KMSC, \(\alpha\) represents the impact of the residual energy on the selection of CHs. Figure 6 shows the results when \(\alpha\) is set to 0.5 and 0.6. It can be seen that the average FND is 800 when \(\alpha =0.5\), which is about one-third less than the FND of \(\alpha=0.6\). In terms of average HND, there is no significant change. However, when \(n=100\), the average LND increased by about 15%, from 3000 to 3500. When \(n=200\), there was no significant change in the average LND. One possible reason is that nodes with less residual energy but short distance are also likely to be selected as CHs when \(\alpha =0.5\), causing early death of CHs and then affecting the average FND. Correspondingly, since nodes with more residual energy are normal nodes in the cluster, it is beneficial to extend the average LND. When the number of nodes increases to 200, the advantage of KMSC in terms of average LND is somewhat diminished due to the increased traffic for CHs

In KMSC, \(\alpha\) represents the impact of the residual energy on the selection of CHs. Figure 6 shows the results when \(\alpha\)\(\alpha =0.5\) is set to 0.5 and 0.6. It can be seen that the average FND is 800 when , which is about one-third less than the FND of \(\alpha=0.6\). In terms of average HND, there is no significant change. However, when \(n=100\), the average LND increased by about 15%, from 3000 to 3500. When \(n=200\), there was no significant change in the average LND. One possible reason is that nodes with less residual energy but short distance are also likely to be selected as CHs when \(\alpha =0.5\), causing early death of CHs and then affecting the average FND. Correspondingly, since nodes with more residual energy are normal nodes in the cluster, it is beneficial to extend the average LND. When the number of nodes increases to 200, the advantage of KMSC in terms of average LND is somewhat diminished due to the increased traffic for CHs.

6.4 The impact of \(n_{\text{th}}^C\)

Fig. 7
figure 7

The impact of \(n_{\text{th}}^C\); in KMSC, \(n_{\text{th}}^C\) determines whether to perform K-means in a sector. Figure 7 shows the results when \(n_{\text{th}}^C\) is set to 5 and 8, respectively, with \(p=0.1\). It can be seen that when \(n_{\text{th}}^C = 8\), there is no significant change in the average FND and average HND. However, the average LND decreases from 2943 to 2576 for n=100, and from 3597 to 3102 for n=200, which corresponds to reductions of 12.7% and 13.7%, respectively. The result shows that the number of clusters has a slight effect on the LND. Compared with \(n_{\text{th}}^C = 8\), when \(n_{\text{th}}^C = 5\), there are more clusters in the network, and the traffic of the CH is smaller, which reduces energy consumption and is conducive to extending the network’s lifetime

In KMSC, \(n_{\text{th}}^C\) determines whether to perform K-means in a sector. Figure 7 shows the results when \(n_{\text{th}}^C\) is set to 5 and 8, respectively, with \(p=0.1\). It can be seen that when \(n_{\text{th}}^C = 8\), there is no significant change in the average FND and average HND. However, the average LND decreases from 2943 to 2576 for n=100, and from 3597 to 3102 for n=200, which corresponds to reductions of 12.7% and 13.7%, respectively. The result shows that the number of clusters has a slight effect on the LND. Compared with \(n_{\text{th}}^C = 8\), when \(n_{\text{th}}^C = 5\), there are more clusters in the network, and the traffic of the CH is smaller, which reduces energy consumption and is conducive to extending the network’s lifetime.

6.5 The impact of p

Fig. 8
figure 8

The impact of p; Fig. 8 shows the result when p is set to 0.05 and 0.1, respectively, with \(n_{\text{th}}^C=5\). As can be seen from the figure, the average FND, HND does not change much. However, compared to \(p=0.05\), the average LND increased by 100% when \(p=0.1\). This demonstrates that the number of sectors has a significant effect on the LND. When \(p=0.1\), the number of sectors is doubled compared with \(p=0.05\), which helps to reduce the number of nodes in the sector, thus reducing the energy consumption of CH. When \(p=0.05\), the reduction in the number of sectors means there are more nodes per sector, which increases the energy consumption of CHs. Additionally, the initial centroids selected randomly can lead to unbalanced cluster, thereby affecting the network’s lifetime

In KMSC, p is used to determine the number of sectors. Figure 8 shows the result when p is set to 0.05 and 0.1, respectively, with \(n_{\text{th}}^C=5\). As can be seen from the figure, the average FND, HND does not change much. However, compared to \(p=0.05\), the average LND increased by 100% when \(p=0.1\). This demonstrates that the number of sectors has a significant effect on the LND. When \(p=0.1\), the number of sectors is doubled compared with \(p=0.05\), which helps to reduce the number of nodes in the sector, thus reducing the energy consumption of CH. When \(p=0.05\), the reduction in the number of sectors means there are more nodes per sector, which increases the energy consumption of CHs. Additionally, the initial centroids selected randomly can lead to unbalanced cluster, thereby affecting the network’s lifetime.

6.6 Discussion

From the previous results, KMSC is able to better balance cluster size and energy consumption, outperforming existing work in terms of FND, HND, and LND. Meanwhile, the algorithm achieves high execution efficiency due to the limitation on the number of nodes in each sector. Additionally, as can be seen from Figure 6, Figure 7, and Figure 8, \(\alpha\), \(n_{\text{th}}^C\), and p have the greatest impact on LND. Similar to other clustering algorithms, the cluster head selection and rotation mechanisms of KMSC are key to optimizing the network lifetime.

7 Conclusions

To enhance the energy efficiency of wireless sensor networks, this paper introduces a threshold-driven K-means sector clustering algorithm named KMSC. KMSC enhances efficiency and equalizes cluster sizes through the use of sector division and the K-means method. Within KMSC, a node’s weight is determined by its residual energy and distance, and the node with the highest weight is chosen as the cluster head (CH). The performance of KMSC is assessed through simulation experiments and compared with EECPK-means, K-means, TSC, LSC, and SEECP. The results indicate that KMSC surpasses the other five algorithms in terms of FND, HND, and FND, suggesting that KMSC can balance energy consumption and extend the network’s lifespan.

Moreover, we also discuss the impact of parameters \(\alpha\), p, and \(n_{\text{th}}^C\) on KMSC’s performance. In future work, we plan to explore dynamic sector division and a strategic cluster head rotation to further distribute energy consumption evenly.

Availability of data and materials

The datasets used during the study are available from the corresponding author on reasonable request.

Abbreviations

WSNs:

Wireless sensor networks

CH:

Cluster head

DS:

Dataset

FND:

First node death

HND:

Half node death

LND:

Last node death

References

  1. W.R. Heinzelman, A. Chandrakasan, H. Balakrishnan, Energy-efficient communication protocol for wireless microsensor networks, In: Proceedings of the 33rd annual Hawaii international conference on system sciences. IEEE, pp. 10–pp (2000)

  2. M.Z.A. Bhuiyan, G. Wang, A.V. Vasilakos, Local area prediction-based mobile target tracking in wireless sensor networks. IEEE Trans. Comput. 64(7), 1968–1982 (2015)

    Article  MathSciNet  Google Scholar 

  3. O. Younis, S. Fahmy, Heed: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks. IEEE Trans. Mob. Comput. 3(4), 366–379 (2004)

    Article  Google Scholar 

  4. K. Haseeb, K.A. Bakar, A.H. Abdullah, T. Darwish, Adaptive energy aware cluster-based routing protocol for wireless sensor networks. Wireless Netw. 23(6), 1953–1966 (2016)

    Article  Google Scholar 

  5. C. Chen, F. Rao, X. Zhang, Y. Dong, An asynchronous cluster head rotation scheme for wireless sensor networks, In: International Wireless Communications and Mobile Computing Conference (IWCMC) 2015, 551–556 (2015)

  6. N. Moussa, Z. Hamidi-Alaoui, A.E.B.E. Alaoui, ECRP: an energy-aware cluster-based routing protocol for wireless sensor networks. Wireless Netw. 26(4), 2915–2928 (2020)

    Article  Google Scholar 

  7. G. Chen, C. Li, M. Ye, J. Wu, An unequal cluster-based routing protocol in wireless sensor networks. Wireless Netw. 15(2), 193–207 (2007)

    Article  Google Scholar 

  8. V. Gupta, R. Pandey, An improved energy aware distributed unequal clustering protocol for heterogeneous wireless sensor networks. Eng. Sci. Technol. Int. J. 19(2), 1050–1058 (2016)

    Google Scholar 

  9. H. Li, Y. Liu, W. Chen, W. Jia, B. Li, J. Xiong, Coca: Constructing optimal clustering architecture to maximize sensor network lifetime. Comput. Commun. 36(3), 256–268 (2013)

    Article  Google Scholar 

  10. N.M. Shagari, M.Y.I. Idris, R.B. Salleh, I. Ahmedy, G. Murtaza, A.Q.B.M. Sabri, A hybridization strategy using equal and unequal clustering schemes to mitigate idle listening for lifetime maximization of wireless sensor network. Wireless Netw. 27, 2641–2670 (2021)

    Article  Google Scholar 

  11. T. Han, S.M. Bozorgi, A.V. Orang, A.A.R. Hosseinabadi, A.K. Sangaiah, M.-Y. Chen, A hybrid unequal clustering based on density with energy conservation in wireless nodes. Sustainability 11(3), 746 (2019)

    Article  Google Scholar 

  12. N. Gautam, W.-I. Lee, and J.-Y. Pyun, Track-sector clustering for energy efficient routing in wireless sensor networks, In: Ninth IEEE international conference on computer and information technology, vol. 2. IEEE 2009, 116–121 (2009)

  13. Q. Xu and J. Zhao, Multi-head track-sector clustering routing algorithm in wsn, In: 4th International Conference on Information Technology and Management Innovation. Atlantis Press, pp. 707–713 (2015)

  14. S. Dutt, G. Kaur, S. Agrawal, Energy efficient sector-based clustering protocol for heterogeneous wsn, In: Proceedings of 2nd International Conference on Communication, Computing and Networking: ICCCN 2018, NITTTR Chandigarh, India. Springer, pp. 117–125 (2019)

  15. X. Ren, J. Li, Y. Wu, Y. Chen, H. Sun, Z. Shi, An enhanced energy optimization routing protocol for wsns. Ann. Telecommun. 76, 343–354 (2021)

    Article  Google Scholar 

  16. F. Bouakkaz, M. Derdour, Maximizing wsn life using power efficient grid-chain routing protocol (pegcp). Wireless Pers. Commun. 117(2), 1007–1023 (2021)

    Article  Google Scholar 

  17. B. Zeng, C. Zhao, Y. Zhang, J. Sun, X. Gao, A sector-based energy-efficient lightweight clustering algorithm. IEEE Access 10, 108 285-108 295 (2022)

    Article  Google Scholar 

  18. M. Daviran, R. Ghezelbash, M. Niknezhad, A. Maghsoudi, H. Ghaeminejad, Hybridizing k-means clustering algorithm with harmony search and artificial bee colony optimizers for intelligence mineral prospectivity mapping. Earth Sci. Inf. 16(3), 2143–2165 (2023)

    Article  Google Scholar 

  19. P. Sasikumar, S. Khara, K-means clustering in wireless sensor networks, In: Fourth international conference on computational intelligence and communication networks. IEEE 2012, 140–144 (2012)

  20. A. Ray, D. De, Energy efficient clustering protocol based on k-means (eecpk-means)-midpoint algorithm for enhanced network lifetime in wireless sensor network. IET Wireless Sens. Syst. 6(6), 181–191 (2016)

    Article  Google Scholar 

  21. B. Zhu, E. Bedeer, H.H. Nguyen, R. Barton, J. Henry, Improved soft-k-means clustering algorithm for balancing energy consumption in wireless sensor networks. IEEE Int. Things J. 8(6), 4868–4881 (2020)

    Article  Google Scholar 

  22. M. Daviran, R. Ghezelbash, A. Maghsoudi, Gwokm: a novel hybrid optimization algorithm for geochemical anomaly detection based on grey wolf optimizer and k-means clustering. Geochemistry 84(1), 126036 (2024)

    Article  Google Scholar 

  23. N. Mittal, U. Singh, B.S. Sohi, A stable energy efficient clustering protocol for wireless sensor networks. Wireless Netw. 23, 1809–1821 (2017)

    Article  Google Scholar 

  24. Y. Tao, J. Zhang, L. Yang, An unequal clustering algorithm for wireless sensor networks based on interval type-2 tsk fuzzy logic theory. IEEE Access 8, 197 173-197 183 (2020)

    Article  Google Scholar 

  25. P.S. Mehra, E-FUCA: enhancement in fuzzy unequal clustering and routing for sustainable wireless sensor network. Complex Math. Intell. Syst. 8(1), 393–412 (2021)

    Article  Google Scholar 

  26. C. So-In, S. Phoemphon, P. Aimtongkham, T.G. Nguyen, An energy-efficient fuzzy-based scheme for unequal multihop clustering in wireless sensor networks. J. Ambient Intell. Humaniz. Comput. 12(4), 873–95 (2021)

    Google Scholar 

  27. T. Stephan, K. Sharma, A. Shankar, S. Punitha, V. Varadarajan, P. Liu, Fuzzy-logic-inspired zone-based clustering algorithm for wireless sensor networks. Int. J. Fuzzy Syst. 23, 506–517 (2020)

    Article  Google Scholar 

  28. N. Moussa, Alaoui A. Belrhiti, An energy-efficient cluster-based routing protocol using unequal clustering and improved ACO techniques for WSNs. Peer-to-Peer Networking and Applications. 14(3), 1334–47 (2021)

    Article  Google Scholar 

  29. P. Maheshwari, A.K. Sharma, K. Verma, Energy efficient cluster based routing protocol for WSN using butterfly optimization algorithm and ant colony optimization. Ad Hoc Netw. (2021). https://doi.org/10.1016/j.adhoc.2020.102317

    Article  Google Scholar 

  30. X. Yan, C. Huang, J. Gan, X. Wu, Game theory-based energy-efficient clustering algorithm for wireless sensor networks. Sensors 22(2), 478 (2022)

    Article  Google Scholar 

  31. G. W. Hamaali, K. A. Abduljabbar, D. R. Sulaiman. K-means clustering and pso algorithm for wireless sensor networks optimization. Univ. Thi-Qar J. Eng. Sci. 13(1), 40–50 (2023)

    Google Scholar 

  32. R. Gantassi, Z. Masood, S. Lim, Q. A. Sias, Y. Choi, Performance analysis of machine learning algorithms with clustering protocol in wireless sensor networks, In: International Conference on Artificial Intelligence in Information and Communication (ICAIIC) 2023, 543–546 (2023)

  33. A. Ihsani, T.H. Farncombe, A kernel density estimator-based maximum a posteriori image reconstruction method for dynamic emission tomography imaging. IEEE Trans. Image Process 25, 2233–2248 (2016)

    Article  MathSciNet  Google Scholar 

  34. A. Rodriguez, A. Laio, Clustering by fast search and find of density peaks. Science 344, 1492–1496 (2014)

    Article  Google Scholar 

  35. L. Yang, D. Zhang, L. Li, Q. He, Energy efficient cluster-based routing protocol for WSN using multi-strategy fusion snake optimizer and minimum spanning tree. Sci. Rep. (2024). https://doi.org/10.1038/s41598-024-66703-9

    Article  Google Scholar 

  36. B. Angadi, M.S. Kakkasageri, K-means and fuzzy based hybrid clustering algorithm for wsn. Int. J. Electron. Telecommun. (2023). https://doi.org/10.24425/ijet.2023.147703

    Article  Google Scholar 

  37. J. Naveen, P. Alphonse, S. Chinnasamy, Track-sector-tree clustering scheme for dense wireless sensor networks. Clust. Comput. 22, 12 421-12 428 (2019)

    Article  Google Scholar 

Download references

Funding

This research was funded by Key Scientific Research Projects of Henan (No. 24A520024)

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization was performed by Bo Zeng; methodology was done by B. Z. and S. S. L.; validation was done by B. Z. and X. F. G.; formal analysis was performed by X. F. G.; writing—original draft preparation was prepared by B. Z.; writing—review and editing was performed by S. S. L. and X. F. G. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Bo Zeng.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zeng, B., Li, S. & Gao, X. Threshold-driven K-means sector clustering algorithm for wireless sensor networks. J Wireless Com Network 2024, 68 (2024). https://doi.org/10.1186/s13638-024-02403-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s13638-024-02403-2

Keywords