"The authors would like to retract this article as several portions of this manuscript were published previously by Gu, Wu, and Rao ("Optimizing Cluster Heads for Energy Efficiency in Large-Scale Heterogeneous Wireless Sensor Networks, International Journal of Distributed Sensor Networks 2010, 961591). The authors would like to apologise to Editors and readers."
This article proposes a novel moment-based local cluster division optimization method in wireless sensor networks, and improves the energy efficiency of local cluster area with uneven nodes distribution. In the proposed method, first, each node estimates the higher moment of local sensors' coordinates. Second, the current cluster zone is divided into four quadrant zones with cluster head's (CH) coordinates as central point. Finally, among the divided quadrant zones, the slave CH is selected according to the higher moment to help the master CH optimize data transmission in the local area. To use the higher moment effectively in segmentation of zone, we present a hybrid higher moment method by computing the kurtosis coefficient of the sensors' coordinates. When the coordinates of sensors in a quadrant zone have higher kurtosis coefficient than a threshold, the quadrant zone is considered to be a zone needed to be a slave CH. The simulation results show that the proposed method can increase system throughput, decrease delay and packet loss rate, and enhance the energy efficiency.
Keywords:wireless sensor network; energy efficiency; cluster division; higher moment
Wireless sensor network (WSN)  has found many applications in different areas, such as environmental surveillance, intelligent building, health monitoring, etc. Although there are many important aspects which need to be taken into consideration when we are dealing with the overall network design problem, energy efficiency should be considered as the key design objective among them. Therefore, minimizing the total energy consumption (TEC)  for sensor data gathering is critical to ensuring sustained operations of these large-scale WSNs, even though minimizing TEC does not necessarily maximize network lifetime, which also depends on the balance of residual energy across the network.
How to efficiently utilize sensor nodes to prolong the lifespan of WSNs has been a research topic. Clustering is a key routing technique used to reduce energy consumption. In the cluster scheme, several nodes are elected as cluster heads (CHs), and then CHs aggregate the data from the nodes of their respective cluster, often referred to as leaf nodes (LNs), and forward the fusion data to base station (BS). The most popular routing protocol based on cluster scheme is called LEACH , which is proposed for CH selection by probability. However, LEACH selects the CHs by probability in each round, so the actual number of CHs with the expected number maybe different.
It is a critical problem that how many clusters should be separated. As one of the most dominant factors in clustered WSNs, the selection of CH can impact the network performance. The optimal number of CHs is an important parameter of WSN performance. Network nodes will consume more energy if the number of CHs is too much or too little. Suitable cluster number will prolong the lifetime of WSN and reduce energy consumption in CH selection per round. To solve those disadvantages, a distance-based crowdedness clustering (DCC) algorithm  to determine the CHs in sensor networks under general node distribution is presented, in which the number of CHs in sensor networks under uniform node distribution is optimized through deriving an analytical formula. However, their proposed scheme is not suitable for all the deployment conditions, especially, unevenly distribution phenomenon which is likely in sensor networks.
In this article, we propose a novel moment-based local cluster division (MLCD) optimization method in WSNs. The proposed method improves the energy efficiency of local cluster area with uneven nodes distribution.
The rest of this article is organized as follows. Section 2 introduces the related study. Section 3 gives an energy model, and elaborates the proposed MLCD algorithm. Section 4 makes several experiments on many parameters such as the number and location of CHs, TEC under different nodes scale. Finally, conclusion is summarized in Section 5.
2. Related study
A CH in WSNs is responsible for receiving, processing, and transmitting data from the LNs in its service area to the BS, and hence it consumes much more energy than an LN. The sensor nodes in the close proximity of a CH may also run out of battery quickly due to frequent data forwarding. Therefore, designating an optimal subset of sensor nodes as CHs at appropriate locations is critical to minimizing the TEC for prolonging the lifetime of the entire network. There exist a large number of research efforts in the literature that have been devoted to solving various clustering problems with different objectives.
One commonly adopted way to ensure load balance and meet energy constraint of the entire network is to rotate the role of a CH and form a corresponding cluster on a random and periodical basis among all sensor nodes. Classical LEACH is a traditional hierarchy topology control algorithm. In LEACH, the node is selected to be CH in turns. The alternation mechanisms of CHs [4-10] balance the nodes energy consumption and prolong network lifetime.
Quan et al.  propose a robust energy-aware clustering architecture for large-scale WSNs. Machado et al.  study the clustered WSNs. Shu and Krunz  consider optimization formulations under both deterministic and stochastic setups, and propose two mechanisms for achieving balanced power consumption in the stochastic case: a routing-aware optimal cluster planning and a clustering-aware optimal random relay. Younis and Akkaya  categorize the placement strategies into static and dynamic depending on whether the optimization is performed at the time of deployment or while the network is operational, respectively. Azad and Kamruzzaman  propose a transmission scheme and determine the optimal ring thickness and hop size by formulating network lifetime as an optimization problem. Houngbadji and Pierre  propose a novel distributed address assignment and routing scheme based on a topic clustering system and fractal theory iterated function systems. In , an adaptive energy-efficient multi-sensor scheduling scheme is proposed for collaborative target tracking in WSNs. Kim et al.  propose a novel energy-efficient coverage-time optimized dynamic clustering scheme for two-tiered WSNs used in an outdoor monitoring application of home networking systems.
3. The proposed MLCD method
The problem of determining the optimal number and location of CHs for minimum TEC in sensor networks with unevenly distribution phenomenon is formulated as follows. We consider a WSN where n sensor nodes have been deployed in a bounded L × L (m2) square region and a single BS is located at (xBS,yBS), somewhere inside or outside the network region. The location of each sensor vi, i = 0, 1,..., n - 1, is denoted as (xi, yi). We assume a one-hop communication model for both intra-cluster (from LNs to their slave CHs) and inter-cluster (from CHs to the BS) communication.
We first consider a scenario of even node distribution, and adopt DCC method  to optimize the number of CHs. For uneven node distribution, the intra-cluster energy consumption model may not demonstrate the intra-cluster energy consumption correctly. To solve this problem, we use the higher moment to demonstrate the intra-cluster energy consumption correctly in this article. The intra-cluster data can be translated to CH through the new CH divided by the old CH so as to optimize the data transmission.
We consider the energy consumption for data transmission of each LN, and for idle state, data receiving, processing, and transmission of each CH. Since the energy cost for environment sensing is generally much less than communication and processing tasks, we do not consider sensing energy cost here. Obviously, the TEC depends on the network distribution, the number, and location of CHs, and the compression ratio α at CHs.
3.1. Optimizing the number of CHs in uniform distribution
Researchers have proposed several optimal CH selection algorithms to prolong the lifetime of WSNs. Yang and Sikdar  first apply a sleep-wakeup-based decentralized MAC protocol to LEACH, then present an analytic framework for obtaining the optimal probability with which a node becomes a CH in order to minimize the network's energy consumption. Gu and Wu  optimize the number of CHs in sensor networks under uniform node distribution, and propose a DCC algorithm to determine the CHs in sensor networks under general node distribution.
DCC algorithm first calculates the expected distances from an LN to its CH and from a CH to the BS using an approach similar to the one used in . Since the CHs are uniformly distributed in an L × L (m2) sensor network region, the expected square area covered by each cluster with the CH deployed at (xCH, yCH) can be calculated as based on Voronoi tessellation, where k is the optimal the number of CHs. Furthermore, the LNs are also uniformly and independently deployed in each cluster, where we have E[xLN] = E[xCH] = E[yLN] = E[yCH] = , and E[(xLN)2] = E[(xCH)2] = E[(yLN)2] = E[(yCH)2] = L2/(3k). Therefore, the average squared distance from an LN to its CH within a cluster can be calculated as follow.
Then, the average squared distance from a CH to the BS is similarly given by Equation (2).
The TEC per round denoted by ETot is the sum of the energy consumption ELN of all LNs for data transmission and the energy consumption ECH of all CHs for data receiving, processing, and transmission in one round, which can be defined as Equation (3).
Since ELN only contains transmission energy cost Et and the total number of LNs in the network is n - k, ELN can be estimated as shown in Equation (4).
Similarly, ECH is the total energy cost for the transfer of one unit of data from each CH to the BS in one round, which includes the energy cost Er for receiving, Ep for processing, and Et for transmission. Each of n - k LNs transfers one unit of data to its corresponding CH, which performs processing (aggregation and compression) on the received data and its own sensing data, and sends the compressed aggregated result to the BS. Since there are total n units of input data (including n - k units of data received from LNs and k units of data collected by k CHs themselves), the total energy consumed by k CHs is defined by Equation (5).
Using Equations (1)-(5), we obtain the TEC per round ETot as follows:
The TEC per round ETot can be minimized by selecting an optimal value of k, which is a solution to the first derivation of Equation (6). Following that, the optimal number k of CHs can be calculated as (the negative solution is ignored):
We further verify that the solution to the second derivation of Equation (6) is positive. Therefore, we conclude that the value of k defined in Equation (7) results in the minimum TEC in WSNs with uniform node distribution. Once the optimal number of CHs is obtained, their locations can be determined based on Voronoi tessellation among uniformly distributed sensors.
3.2. The proposed algorithm based on higher moment for uneven distribution
Unimodal distribution has two eigenfunctions, including skewness and kurtosis. Coefficient of skewness is the ratio of the third central moment to the third power of standard deviation. Coefficient of kurtosis is the ratio of the fourth central moment to the fourth power of standard deviation. If the coefficient of kurtosis is above 3, the distribution is sharp and has the excess kurtosis. In this case, the distribution concentrates around the expectation. If the coefficient of kurtosis approaches to 1.8, the distribution curve has a shape of horizontal rectangle. If the coefficient of kurtosis is below 1.8, the distribution curve has a shape of 'U'. The coefficient of kurtosis (Kc) is given in Equation (8).
Uniform distribution is suited for achieving energy balance. However, the performance of WSN is affected by the environment. Since sensor nodes are generally deployed in surveillance areas by airplane and cannonball, node's position is affected by many factors, which leads to unevenly distribution. Maybe nodes in some areas are very thick, while nodes in other areas are very thin. Uneven distribution phenomenon brings many disadvantages in network management, such as unbalanced energy dissipation between nodes, nodes' premature death, and network lifetime short.
According to the energy model and DCC algorithm, in the case of the uneven distribution, the intra-cluster energy consumption model may not demonstrate the intra-cluster energy consumption correctly. To solve this problem, we use kurtosis to demonstrate the intra-cluster energy consumption correctly in this article. The intra-cluster data can be translated to CH through the new CH divided by the old CH so as to optimize the data transportation.
As far as the CHs total number, the optimal CHs number calculated by the cutoff-distance is still available, the optimal node number k of general distribution and uneven distribution is still calculated as the case of even distribution.
The algorithm given above does not consider the energy consumption of CHs idle time. If the energy consumption of CHs idle time EI is considered, and EI = Eelec, the energy consumption of CHs will increase as shown in Equation (9).
where t is the total idle time of the CHs. Excessive CHs will increase the energy consumption of CHs idle time. Therefore, decreasing the CHs number properly will decrease the network energy consumption. Correspondingly, the optimal number k of CHs considering total idle time of the CHs is
We develop a heuristic MLCD algorithm, based on higher moment to solve the CH optimization problem in WSNs under uneven node distribution. Here, the "higher moment" is the threshold that decides which cluster needs to be divided: if the higher moment of LNs' coordinates satisfies the threshold, the slave CH is considered to be located inside this cluster; otherwise, it is not.
In the MLCD algorithm, we first calculate the LNs coordinates' kurtosis coefficient and mean value, each of which is used as the metric. Using this metric, we designate the sensor with the comparatively large number of neighbor nodes as a CH and form a slave cluster of the master cluster with its crowded LNs. Namely, MLCD designates the sensor with the comparative large number of neighbor nodes as a CH and forms a cluster of this CH with all its neighbors considering the energy consumption of the CHs idle time. We repeatedly designate slave CHs with crowded neighbor nodes in the rest of the clusters using the higher moment until there is no quadrant zones satisfied the higher moment metric. We calculate the higher moment using Equation (8) for metric, from which, the node is selected as the CH of the slave cluster.
The specific division process is given as follows. First, in setup phase, the intra-cluster zone is divided into four quadrant zones. The central point (0,0) is the CH's coordinates. Second, the proposed MLCD algorithm collects the intra-cluster coordinate information to calculate the kurtosis of x and y coordinates, respectively. If the kurtosis is below 1.8 as shown in Figure 1, the node may concentrate their distribution not only on their CHs zone, but also on another quadrant zone in current cluster area from the aspect of x and y coordinates. According to cutoff-distance, the elected CH may locate in the zone with dense nodes.
Figure 1. The shape of U with kurtosis below 1.8.
However, the calculated number of CHs considering the idle time energy consumption may decrease. In general node distribution or uneven node distribution, the decrease of the CHs number may cause the uneven communication distance in the cluster zone. The kurtosis can be used to find another quadrant zone with comparatively dense node. If the kurtosis is below 1.8, the nodes' distribution has a shape of 'U'. The quadrant zones with comparatively dense nodes could be discerned, and a sub-cluster is formed in the quadrant. In other words, a slave CH is designated besides the current CH and the nodes in the same quadrant will choose to join either the CH or slave CHs once more. The slave CH collects the data of intra-cluster and sends it to the CH. The proposed algorithm optimizes the local energy consumption, balance the intra-cluster energy consumption. Moreover, the total CHs energy consumption is decreased due to the shortening of idle time.
The choice of slave CHs will consider the kurtosis of node coordinates. If the kurtosis of both x and y coordinates are below 1.8, the slave CH will be selected beyond the mean value of the both coordinates. If either x or y coordinates is below 1.8, the CH will be selected randomly beyond the mean value of the coordinates below 1.8.
This article presents an MLCD method. The pseudo-code of MLCD algorithm is given in Algorithm 1. The CHs idle time is considered to decrease the number of the CHs number. The kurtosis is used to demonstrate the slave cluster zone within the main cluster zone. The complexity of this algorithm is O(n2).
Algorithm 1. MLCD
Input: a sensor network G = (V, E) with n LNs randomly deployed in a L × L (m2) square region and one BS deployed inside or outside the region.
Output: the optimal slave CH and its location with minimum TEC.
1://Initialize cluster setup;
2: for all CHs in sensor network G do
3: Advertise the CH identity CHID;
4: Collect the LNs' coordinates [xi, yi];
5: end for
6://Calculate the LNs' higher moment;
7: for all CHs in sensor network G do
8: Divide the LNs zone from vertical and horizontal direction through the CH node to form 4 quadrants zone;
9: Calculate the [xi, yi]'s kurtosis coefficient [Kcx, Kcy] and mean value coordinates [xmean, ymean];
10: if Kcx > 3 and Kcy > 3 then
11: Designate a node with the absolute value of coordinates above [|xmean|, |ymean|] as slave CH randomly;
12: end if
13: if Kcx > 3 then
14: Designate a node with the absolute value of x coordinates above [|xmean|, yrandom] as slave CH;
15: end if
16: if Kcy > 3 then
17: Designate a node with the absolute value of y coordinates above [yrandom, |ymean|] as slave CH;
18: end if
19: Designate the LNs in the same quadrants with slave CH to form slave cluster;
20: end for
21: return slave CH and coordinates.
4. Performance evaluation
4.1. Implementation and experimental settings
The proposed MLCD algorithm is implemented in Java and runs on a Windows XP desktop equipped with an 8 cores CPU and 4 GB memory. We conduct an extensive set of experiments using a wide variety of simulated sensor networks, in which two deployment scenarios are considered: uniform and uneven distributions. We generate these simulation datasets by varying the size of network regions and the number of initially deployed sensor nodes. The parameters used in the testing sensor networks and the sensor radio characteristics of the energy cost models for wireless communication are tabulated in Table 1. For testing purposes, we consider a fixed value 0.8 for the compression ratio α, which has an impact on the clustering process.
Table 1. WSN communication parameters
We first investigate the optimization problem on the number and location of CHs in a sensor network within a square region of 200 × 200 (m2) under uniform distribution. Figure 2 plots the TEC optimization curve that is calculated by Equation (7) in the network of n = 200 sensor nodes in response to the number k of CHs varying from 1 to 60.
Figure 2. Analytical calculation of TEC per round (a) with n = 200.
We obtain the optimal number of CHs to be k = 6 from Equation (7), which is consistent with the optimal one observed in Figure 2. From Equation (10), the improved algorithm obtains the optimal number of CHs to be k = 4, which is consistent with the optimal one observed in Figure 2 and the TEC decreases as the number of CHs decreases. The TEC increases as the number of CHs moves away from the optimal point. We further study a case with a larger WSN of n = 900 sensor nodes. The optimal number 13 is obtained in DCC algorithm. Our proposed MLCD algorithm obtains the optimal number 11. The TEC also decreases as the number of CHs decreases. The unimodal property of the TEC optimization curve justifies the correctness of our derivation for the optimal number of CHs in WSNs under uniform node distribution.
To evaluate the robustness of our solution, we perform more experiments on networks under uniform distribution with 10 different problem sizes from small to large ones: in the first case, the total number of initial sensor nodes is set to be 10; in the rest 9 cases, it increases from 100 to 900 nodes at an interval of 100 nodes. The optimization results, i.e., the optimal number of CHs, the expected distance from an LN to its CH, and the corresponding minimum TEC in each case, are plotted in Figure 3. We observe that the expected distance from an LN to its CH varies from 32 to 116 m and it decreases when the optimal number of CHs increases.
Figure 3. Analytical calculation of minimum TEC per round in ten cases ranging from small to large scales.
4.2. Case study for general distribution
To better illustrate the optimization process of MLCD in WSNs under uneven node distribution, we calculate and plot the TEC per round under different cut-off distances at each optimization step for a network of n = 200 sensor nodes deployed in the same region, as shown in Figure 4. This three-dimensional optimization curve also features a unimodal property: the TEC is minimized with an optimal number of CHs at a certain cut-off distance.
Figure 4. TEC per round at each optimization step using MLCD and DCC algorithm.
For performance comparison, we adapt DCC to our problem and compare its performance with that of the proposed MLCD algorithm using the same set of ten problem sizes previously considered for uniform distribution. We first determine the optimal number k of CHs and the corresponding minimum TEC using the DCC algorithm. For visual comparison, we plot the performance measurements of TEC produced by these two algorithms in Figure 5. The results produced by the MLCD algorithm outperform those produced by DCC algorithm in all ten cases we studied, which shows the performance superiority of the proposed MLCD algorithm.
Figure 5. Performance comparison between MLDC and DCC in ten cases ranging from small to large scales.
For illustration purposes, we lay out the node distribution and clustering of the sensor network with 100 sensor nodes as shown in Figure 6, in which the unclustered solid node denotes the BS. The clustering results obtained by the MLCD algorithm are marked by the solid lines while the results obtained by the adaptive k-means algorithm are marked by the dashed lines. The MLCD algorithm produces six clusters in this instance.
Figure 6. Visualization of the clustering in a network of 100 nodes using MLCD and adaptive k-means algorithms: the results of MLCD are marked by solid lines and the results of adaptive k-means are marked by dashed lines.
Using the same network instance of 100 sensor nodes, we visually compare the clustering results obtained by the MLCD algorithm and the DCC algorithm, as shown in Figure 7. The 6 clusters marked by the solid lines are obtained by the MLCD algorithm (the same MLCD clusters in Figure 6), and the 14 clusters marked by the dashed lines are obtained by the DCC algorithm. In Figure 7, we observe that MLCD algorithm decrease the CHs number, and achieve a more reasonable clustering result than DCC algorithm in terms of local sensor density. The TEC of MLCD is lower than that of DCC algorithm.
Figure 7. Visualization of the clustering in a network of 100 nodes using MLCD and DCC algorithms: the results of MLCD are marked by solid lines and the results of DCC are marked by dashed lines.
This article examines the problem of determining the optimal number and location of CHs for minimum TEC in sensor networks with uneven nodes distribution from higher moment perspective, improves the analytical formula of DCC to determine the optimal number, and location of CHs in WSNs under uneven distribution, and proposes a zone division clustering algorithm based on higher moment to optimize the intra-cluster communications and location of slave CHs in WSNs under uneven nodes distribution. The simulation results illustrate the performance superiority of the proposed solution in comparison with the clustering schemes based on classical k-means algorithm and DCC algorithm.
The authors declare that they have no competing interests.
This study was supported by National Natural Science Foundation of China (60973117, 60973116, and 60973115), the China Postdoctoral Science Foundation (20110491530), and the Dalian Science and Technology Planning Project of China (2010J21DW019).
WR Heinzelman, A Chandrakasan, H Balakrishnan, Energy-efficient communication protocol for wireless microsensor networks (Proceedings of the 33rd Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, 2000)
H Chen, H Mineno, T Mizuno, Adaptive data aggregation scheme in clustered wireless sensor networks. Comput Commun 31(15), 3579–3585 (2008). Publisher Full Text
HJ Choe, P Ghosh, SK Das, QoS-aware data reporting control in cluster-based wireless sensor networks. Comput Commun 33(11), 1244–1254 (2010). Publisher Full Text
N Kima, J Heo, HS Kim, WH Kwona, Reconfiguration of clusterheads for load balancing in wireless sensor networks. Comput Commun 31(1), 153–159 (2008). Publisher Full Text
H Zhou, Y Wu, Y Hu, GZ Xie, A novel stable selection and reliable transmission protocol for clustered heterogeneous wireless sensor networks. Comput Commun 33(15), 1843–1849 (2010). Publisher Full Text
K Sha, W Shi, Modeling the lifetime of wireless sensor networks. Sensor Lett 3(2), 126–135 (2005). Publisher Full Text
M Younis, K Akkaya, Strategies and techniques for node placement in wireless sensor networks: a survey. Ad Hoc Netw 6(4), 621–655 (2008). Publisher Full Text
T Houngbadji, S Pierre, SubCast: a distributed addressing and routing system for large scale wireless sensor and actor networks. Comput Netw 53(16), 2840–2854 (2009). Publisher Full Text