Learning nodes: machine learning-based energy and data management strategy
EURASIP Journal on Wireless Communications and Networking volume 2021, Article number: 176 (2021)
Abstract
The efficient use of resources in wireless communications has always been a major issue. In the Internet of Things (IoT), the energy resource becomes more critical. A transmission policy that relies on a coordinator is not a viable solution in an IoT network, because each node must report its state to the coordinator for scheduling, which causes serious signaling overhead. Machine learning algorithms can provide an optimal distributed transmission mechanism with little overhead. A node can learn by itself using a machine learning algorithm and make the optimal transmission decision on its own. In this paper, we propose a novel learning Medium Access Control (MAC) protocol with learning nodes. Nodes learn the optimal transmission policy, i.e., the one minimizing the data and energy queue levels, using the Q-learning algorithm. The performance evaluation shows that the proposed scheme improves the queue states and throughput.
1 Introduction
With the advent of the Internet of Things (IoT), wireless communication functions are employed not only in electronic devices but in every ‘Thing’ [1]. These devices are expected to be low-cost and low-power to operate in IoT networks [2]. Also, since a very large number of devices are expected to be deployed in IoT networks, energy should be provided in a sustainable way to maintain long-lasting networks [3]. In this sense, energy will play an important role in providing seamless services with limited resources.
One viable solution is for devices to produce energy by themselves or to receive energy wirelessly, i.e., energy harvesting, which enables devices to obtain energy from various physical phenomena such as wind, sunlight, and Radio Frequency (RF) signals [4]. In a network with energy-harvesting devices, the status of data and energy in the devices may vary [5, 6]. Devices may have different amounts of traffic to transmit. Some devices frequently report their status or send information to the network, while other devices have relatively sparse traffic. The energy sustainability of devices also fluctuates. If a device is located near the power beacon, it receives much energy with minimal loss. However, nodes far from the power beacon obtain only a small amount of energy. Without an appropriate data transmission strategy, devices may suffer from a shortage of energy to transmit or from unnecessary charging, as well as from an accumulated data queue and packet loss. Therefore, a transmission strategy that depends on the status of each device is required.
Transmission policies for energy-harvesting devices have been researched. In [7, 8], the optimal packet scheduling policy for a single energy-harvesting node is studied. The transmission power of a node, which is related to the data rate, is optimized to minimize the total transmission time of the node. The authors in [9, 10] present a transmission policy for point-to-point transmission in the fading channel. By controlling the time sequence, throughput is maximized and the total transmission time is minimized. In [11], a decentralized random access policy is studied to maximize the long-term network utility. Using game theory, nodes decide whether to transmit, remain idle, or discard packets. The optimal new solution is found and a heuristic algorithm is provided. The authors in [12] studied power management policies for dual energy-harvesting links, where the transmitter and the receiver are both energy-harvesting-capable nodes. Considering the battery size and the retransmission index, the packet drop probability (PDP) is modeled. The battery size is shown to strongly affect the PDP performance, as it helps to overcome the randomness of energy availability. Also, the optimal retransmission policy to minimize the PDP is designed. In [13], selective sampling, which decodes the packet with a certain length, is proposed to reduce energy consumption. The selective sampling information is further utilized by piggybacking for more efficient energy use at the receiver. Also, a retransmission strategy and a power allocation scheme that ensure a lower PDP are introduced using a Markov Decision Process (MDP).
Recently, machine learning has drawn much interest as a powerful tool to solve complex problems, e.g., Google’s AlphaGo [14]. This adaptive learning capability can be applied to tackle complex problems. A transmission strategy for energy-harvesting nodes based on machine learning is an attractive research issue. In [15], adaptation of the duty cycle for energy-harvesting sensor nodes is studied. To achieve a balance between the energy supply and the Quality of Service (QoS) requirement, a modified MDP using reinforcement learning is introduced.
Reinforcement learning-based energy management policies for single [16] and multiple [17] nodes are studied. The energy-harvesting node is modeled as continuously creating data and gaining energy from the energy source. Data can be transmitted using a certain amount of energy defined by a conversion function. For a single node, the authors in [16] utilize Q-learning to find the optimal policy for a general conversion function. An extra energy source node providing energy to multiple nodes is considered in [17]. To minimize the average delay of the transmitting nodes, an efficient energy sharing method is presented using the Q-learning algorithm. In [18], a Q-learning-based Medium Access Control (MAC) protocol for underwater sensor networks is studied. Without extra message exchange, a node learns to optimize its backoff slots to reduce collisions through trial and error. By intelligently selecting a backoff slot through Q-learning, low signaling overhead and low complexity can be obtained. Also, the authors design the reward function to be updated from messages, in particular to consider the level of collision from Negative Acknowledgment (NACK). The authors in [19] proposed a machine learning-enabled MAC framework for IoT nodes coexisting with WiFi users. During the rendezvous phase, an intelligent gateway learns the type and expected number of devices, i.e., WiFi and IoT nodes, by monitoring the three-way handshake. Then, the gateway schedules frequency channels to IoT and WiFi devices based on the learning result. In the transmission phase, IoT devices and WiFi users contend for data transmission. The gateway can dynamically adjust the superframe length to achieve enhanced throughput.
In this paper, we propose a new learning MAC protocol in which each node in a network learns a differentiated transmission strategy. We focus on the imbalance between the energy and the data in a node, which stems from the randomness of the arrival rates of energy and data. We propose a MAC protocol with a learning mechanism to mitigate the imbalance problem. The contributions of our work are:

The ‘imbalance’ problem of energy and data management for energy-harvesting nodes in IoT networks is revealed.

Based on the nature of nodes, we classify them into energy-dominant and data-dominant nodes, and, for each type of node, the multi-slot and high-rate transmission strategies are proposed to mitigate the imbalance problem.

We utilize Q-learning to automatically determine the better choice when using the multi-slot and high-rate schemes. Nodes learn their best actions given the energy and data availability.

Performance evaluation shows the learning behavior for stable queue states, and overall improved throughput.
We consider a more realistic environment in which the nodes have various data and energy profiles. Also, different transmission strategies, i.e., multi-slot and high-rate transmission, are proposed. Each node learns and selects a different transmission mechanism based on the evolution of its data and energy queue states. We utilize a Q-learning algorithm for individual nodes to learn the optimal parameters of the proposed learning MAC. As time evolves, a node learns by itself the optimal transmission strategy, which minimizes the data and energy queue levels, so that the nodes in a network harmoniously transmit data while boosting energy efficiency.
2 Proposed learning MAC protocol
We consider a network of devices in which devices report the collected information to the sink node. The nodes are capable of producing electricity by energy harvesting. Harvested energy can be stored in the battery of a node and used to transmit data to the sink node. If the energy is not sufficient to transmit a packet of data, the packet is not transmitted and remains in the data queue. If there is no data to transmit, the harvested energy is stored in the battery until the next data transmission. Dynamic data traffic and energy states of nodes may create unbalanced use of energy and data. Thus, an optimal and balanced transmission strategy is required to minimize the data and energy queue levels. To react to the status of a node considering both energy and data, we define the E-node and the D-node, for which different transmission strategies are employed.
2.1 Energy dominant and data dominant nodes
In an IoT network, different types of nodes coexist depending on their own jobs. Nodes can have different tasks and different energy-harvesting performance. So, the energy and data packet arrival rates can vary from node to node.
In some nodes, the arrival rate of energy is higher than that of data. They tend to have a small number of jobs compared to the amount of energy. We refer to such nodes as energy dominant nodes (E-nodes). These nodes may be located near the power beacon or may transmit data sparsely. E-nodes are likely to have a sufficient amount of energy. However, some of the stored energy will not be used properly and will be wasted. To mitigate the energy waste, an E-node requires a proper transmission scheme.
On the other hand, some nodes may have heavier data arrival rates than energy generation rates. These nodes may be placed far from the wireless power source or shaded by obstacles, and hardly get sufficient energy. We refer to such nodes as data dominant nodes (D-nodes). These nodes suffer from a shortage of energy when they try to transmit data. Then, the nodes tend to wait until sufficient energy arrives, and the length of the data queue may increase. To resolve the energy shortage, an appropriate transmission policy that reduces energy consumption needs to be applied.
2.2 Multi-slot method for energy dominant nodes
We define the E-node as the energy dominant node, which has a larger arrival rate of energy than that of data traffic. E-nodes are likely to have excessive energy and tend to wait for the arrival of data traffic. With a conventional MAC, e.g., Frame Slotted ALOHA (FSA), the unused energy may keep accumulating in the energy queue. So, in order to increase the energy utilization, instead of storing excessive energy, the node may need to increase its energy consumption by transmitting more data.
To use surplus energy efficiently, we propose multi-slot transmission. In the multi-slot contention mechanism, the node can select multiple slots for transmitting data. Then, the node attempts to transmit data in the selected time slots until a data transmission succeeds. If the data transmission succeeds in the middle of the selected time slots, the node quits the procedure. Figure 1 shows an example of the multi-slot transmission of a certain E-node. At first, the E-node selects 4 time slots according to the multi-slot transmission scheme. In the first and the second selected slots, the E-node fails to send data due to collisions with other nodes. Data is sent in the third selected time slot, so the E-node ends the data transmission and does not operate in the fourth selected time slot. Then, the node consumes 3 energy units to transmit one data packet. Since the E-node consumes more energy for the transmitted data unit, the imbalance between energy units and data units that the E-node had at the beginning is reduced from 3 to 2 (Fig. 1). However, the number of slots to be selected should be carefully determined, since it affects the degree of balance between data and energy among the nodes, and heavy use of the multi-slot transmission mechanism may cause a congestion problem.
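The multi-slot procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `multislot_attempt` and the `slot_success` input are hypothetical, and each transmission attempt is assumed to cost exactly one energy unit, as in the Fig. 1 example.

```python
def multislot_attempt(num_selected, slot_success):
    """Try the selected slots in order and stop at the first success.

    slot_success is a list of booleans, one per selected slot, that is
    True when the slot is collision-free (hypothetical input).
    Returns (packets_sent, energy_units_used).
    """
    energy_used = 0
    for ok in slot_success[:num_selected]:
        energy_used += 1  # assumption: one energy unit per attempt
        if ok:
            return 1, energy_used  # success: remaining slots are skipped
    return 0, energy_used  # every selected slot collided


# Example from the text: 4 selected slots, collisions in the first two.
sent, used = multislot_attempt(4, [False, False, True, True])
print(sent, used)  # 1 3
```

The example reproduces the case in the text: one packet leaves the data queue while three energy units leave the energy queue.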
2.3 High-rate method for data dominant nodes
The D-node is defined as the data dominant node, which generates more data traffic than it harvests energy. For the D-node, data packets tend to accumulate in the data queue due to insufficient energy. So, D-nodes are likely to suffer from an energy shortage, and arriving packets can wait long in the queue. In this case, it is desirable to transmit data packets with less energy.
To decrease the energy consumption, we propose high-rate transmission. High-rate transmission shortens the operation time to reduce the energy consumption, since the amount of energy consumed in the active state is known to be significant. In high-rate transmission, the node boosts the data rate by changing the modulation scheme and sacrifices the reliability of transmission. Then, the node can send more data packets in a slot, and the transmission time of a single data packet is reduced. If a node succeeds in the random access, it attempts to transmit as many data packets as possible from the data queue. Figure 2 shows an example of the high-rate transmission of a certain D-node. In the first frame, the D-node succeeds and 3 data packets can be transmitted in a time slot. Then, the D-node unburdens 3 data packets using one energy unit. The imbalance of the D-node is resolved using the high-rate transmission. However, in the second frame, the D-node fails to transmit data due to a collision. As the collision happens, the D-node quits the transmission in the frame. Still, the D-node only consumes one third of an energy unit, as it uses a 3 times higher data rate. However, the data rate of a node should be carefully selected, since the reliability of packet transmission decreases as the rate increases. Thus, appropriate selection of the rate is required.
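The energy accounting of the Fig. 2 example can be written down as a small Python sketch. It assumes that one energy unit powers one slot at the basic rate, so a packet sent at `rate_factor` times the basic rate costs `1/rate_factor` of a unit. The function name and arguments are hypothetical.

```python
from fractions import Fraction


def highrate_frame(rate_factor, backlog, success):
    """Return (packets_sent, energy_used) for one D-node frame.

    Assumption: a packet sent at rate_factor times the basic rate
    occupies 1/rate_factor of a slot and costs 1/rate_factor of an
    energy unit.
    """
    per_packet = Fraction(1, rate_factor)
    if not success:
        # Collision: the first packet is lost and the node quits the frame.
        return 0, per_packet
    packets = min(rate_factor, backlog)  # cannot send more than is queued
    return packets, packets * per_packet


print(highrate_frame(3, 5, True))   # (3, Fraction(1, 1))
print(highrate_frame(3, 5, False))  # (0, Fraction(1, 3))
```

With a rate factor of 3, a successful frame removes three packets for one energy unit and a collided frame costs one third of a unit, matching the two frames described in the text.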
2.4 Channel model
An IoT node utilizes Phase Shift Keying (PSK) for transmitting data to the sink node. We consider the Additive White Gaussian Noise (AWGN) channel during the data transmission. The noise is modeled as a white Gaussian random process with zero mean and Power Spectral Density (PSD) \(N_0/2\). Due to channel error, transmitted data from a node might be lost at the sink node. We assume that the data transmission fails when IoT nodes collide in the same time slot, due to the high level of interference. So, only the noise needs to be considered to determine whether a transmission succeeds in the PHY layer. The Signal-to-Noise Ratio (SNR) can be defined as
\[\mathrm{SNR} = \frac{P_{\mathrm{r}}}{N_0 B}\]
where \(P_{\mathrm{r}}\) and B are the received power and the bandwidth. The total noise power within the bandwidth 2B is \(N=N_0/2\cdot 2B=N_0B\). In terms of energy per bit (\(E_{\mathrm{b}}\)) or energy per symbol (\(E_{\mathrm{s}}\)), the SNR can be written as
\[\mathrm{SNR} = \frac{P_{\mathrm{r}}}{N_0 B} = \frac{E_{\mathrm{s}}}{N_0 B T} = \frac{E_{\mathrm{b}}}{N_0 B T_{\mathrm{b}}}\]
where \(T_{\mathrm{b}}\) and \(T\) are the bit time and symbol time.
We use the symbol error rate to determine the successful transmission of an IoT node. For M-ary PSK, the symbol error rate is modeled following [20] and is expressed through the Q-function \(Q(\cdot )\) of the SNR. If a node wins the MAC layer contention and the transmission is successful, the node gains the reward, which is reflected in the corresponding Q-value matrix.
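To make the rate and reliability trade-off concrete, the following Python sketch evaluates a symbol error rate for M-ary PSK over AWGN. It assumes the widely used high-SNR approximation \(P_{\mathrm{s}} \approx 2Q(\sqrt{2\gamma _{\mathrm{s}}}\sin (\pi /M))\), with \(\gamma _{\mathrm{s}}\) the SNR per symbol on a linear scale, and the exact expression \(Q(\sqrt{2\gamma _{\mathrm{s}}})\) for \(M=2\). Whether this is the precise form taken from [20] is an assumption, and the function names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mpsk_symbol_error_rate(snr_per_symbol, bits_per_symbol):
    """Approximate SER of M-PSK with M = 2**bits_per_symbol (linear SNR)."""
    m = 2 ** bits_per_symbol
    if m == 2:
        return q_function(math.sqrt(2.0 * snr_per_symbol))  # BPSK, exact
    arg = math.sqrt(2.0 * snr_per_symbol) * math.sin(math.pi / m)
    return min(1.0, 2.0 * q_function(arg))  # assumed approximation


for g in (1, 2, 3):
    print(g, mpsk_symbol_error_rate(10.0, g))
```

At a fixed SNR per symbol the error rate grows with the number of bits per symbol, which is the reliability cost of the high-rate scheme.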
2.5 Proposed learning MAC
The proposed learning MAC utilizes both the multi-slot and high-rate schemes. In the learning MAC, nodes operate based on the frame broadcast by the sink node. Nodes select one of the methods by comparing the energy and data queues. If the amount of energy is larger than the number of data packets in the queue, a node chooses the multi-slot scheme. Otherwise, a node selects the high-rate scheme for data transmission. Then, nodes are required to determine the parameters used in the multi-slot and the high-rate scheme. In the multi-slot scheme, nodes need to decide the number of slots to be selected. In the high-rate scheme, the factor for boosting the data rate is needed. The detailed process of parameter setting is described in Sect. 3. Using the methods and the parameters, nodes perform contention in the current frame. At the end of the frame, the energy and data queue states are updated according to the contention result. In the next frame, the node operates based on the updated queue states.
Figure 1 shows an example of the proposed learning MAC protocol with multi-slot and high-rate transmissions. Nodes 1, 2, and 5 are E-nodes, while nodes 3, 4, and 6 are D-nodes. The E-nodes need to select multiple slots to utilize the surplus energy, while the D-nodes need to apply the high-rate strategy to reduce their transmission time. Node 1 selects slots 1 and 7. Since node 1 succeeds in slot 1, it does not transmit in slot 7. A collision between node 2 and node 5 occurs in slot 5. Node 5 recovers from the collision in its second transmission attempt in slot 6. Nodes 3, 4, and 6 complete successful high-rate transfers in slots 3, 4, and 7, transmitting three times more bits per slot to save energy. As mentioned in the strategy, a node should select the transmission policy wisely. Considering the dynamic arrival rates of energy and data of a node, it is desirable to track the optimal MAC parameters, i.e., the number of multiple slots and the transmission rate in a slot.
3 Learning MAC with Q-learning
The performance of the proposed learning MAC is affected by the parameters selected by the node. To set them, a central coordinator could collect the nodes’ status and make the decision. However, the process of collecting the status and making the decision may cause signaling overhead. Also, the status of the nodes varies instantaneously over the contention process. So, a centralized coordination scheme may not be a viable solution. Instead, we apply Q-learning to each node to find its best strategy. Q-learning evaluates the available actions by scoring them based on the results they cause. Based on its own previous choices, a node finds the optimal action. Using Q-learning, each node can learn its current best parameters with minimal interaction.
3.1 Q-learning
Q-learning is a reinforcement learning technique that finds the optimal action by learning from rewards. The agent, i.e., the learner, utilizes Q-learning to learn the optimal policy by interacting with its environment. Let S be the set of possible states and A be the set of possible actions of the agent. The learner senses its state \(s_t \in S\) and chooses an action \(a_t \in A\) based on its state at time t. After the action is taken, the agent moves to the new state \(s_{t+1}\) with probability \(P_{s_t, s_{t+1}}\). Then, the learner receives its reward r(s, a). The objective of the learner is to find the optimal policy \(\pi ^*(s)\), which maximizes the cumulative reward \(r_t = r(s_t, a_t)\) over time. In the considered network and problem setup, the optimality criterion is to minimize the data and energy queue levels of an IoT node.
The total discounted return over an infinite time is
\[V^{\pi }(s) = E\left\{ \sum _{t=0}^{\infty } \gamma ^t r(s_t, \pi (s_t)) \,\Big |\, s_0 = s \right\} \tag{1}\]
where \(\gamma\) is the discount factor from 0 to 1. The value function \(V^{\pi }(s)\) can be further expressed as [21]
\[V^{\pi }(s) = R(s,\pi (s)) + \gamma \sum _{s' \in S} P_{s,s'} V^{\pi }(s') \tag{2}\]
where \(R(s,\pi (s))\) is the expectation of r(s, a) and \(P_{s,s'}\) is the transition probability from state s to \(s'\). Applying the Bellman optimality criterion [22], which shows the existence of at least one optimal strategy, the value function is
\[V^{*}(s) = \max _{a \in A} \left[ R(s,a) + \gamma \sum _{s' \in S} P_{s,s'} V^{*}(s') \right] \tag{3}\]
For a policy \(\pi\), action a is taken at state s. Then, the expected return value, the Q-value, is
\[Q^{\pi }(s,a) = R(s,a) + \gamma \sum _{s' \in S} P_{s,s'} V^{\pi }(s') \tag{4}\]
When the optimal policy \(\pi ^*\) is applied, the Q-value can be defined as
\[Q^{*}(s,a) = R(s,a) + \gamma \sum _{s' \in S} P_{s,s'} V^{*}(s') \tag{5}\]
Plugging Eq. (5) into Eq. (3), we get
\[V^{*}(s) = \max _{a \in A} Q^{*}(s,a) \tag{6}\]
Therefore, the optimal value function can be obtained from the maximized \(Q^*(s, a)\). Using the result of Eq. (6), the Q-value can be expressed as
\[Q^{*}(s,a) = R(s,a) + \gamma \sum _{s' \in S} P_{s,s'} \max _{a' \in A} Q^{*}(s',a') \tag{7}\]
Suppose the learner performs action \(a_t\) in state \(s_t\) at time t. Then, the state changes to \(s_{t+1}\) and the learner receives the immediate reward \(r_{t+1}\). In the Q-learning process, the optimal action can be found iteratively. So, the learner updates the Q-values as follows.
\[Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha \left[ r_{t+1} + \gamma \max _{a \in A} Q(s_{t+1},a) - Q(s_t,a_t) \right] \tag{8}\]
where \(\alpha\) and \(r_{t+1}\) are the learning rate and the reward observed after performing \(a_t\) in \(s_t\). The learning rate affects the update rate, while the discount factor controls the importance of the future values. It is known that the Q-value \(Q(s_t, a_t)\) converges to the optimal value \(Q^*(s,a)\) as the state-action pairs are performed repeatedly.
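The iterative update described above is the standard tabular Q-learning rule, in which the old Q-value is moved toward the observed reward plus the discounted best Q-value of the next state. A minimal Python sketch follows. The table layout (a list of rows, one per state) and the default values of `alpha` and `gamma` are illustrative assumptions, not values taken from the paper.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q is a list of rows (one per state), each a list of Q-values
    (one per action).
    """
    best_next = max(q[next_state])            # max over a of Q(s_{t+1}, a)
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error      # move toward the target
    return q[state][action]


q_table = [[0.0, 0.0], [1.0, 2.0]]
print(q_update(q_table, 0, 1, 1.0, 1, alpha=0.5, gamma=0.9))  # 1.4
```

In the example the target is 1.0 + 0.9 × 2.0 = 2.8, and with a learning rate of 0.5 the entry moves halfway from 0 to 2.8.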
3.2 Network model for Q-learning
We consider a network of N nodes. An energy-harvesting node is equipped with an energy queue and a data queue, which store the harvested energy and the data, respectively. The energy is assumed to be quantized so that it can be buffered in the energy queue. At the basic data rate, a node requires one unit of energy to transmit a data packet.
3.2.1 State
Let the state be the difference between the energy queue level and the data queue level. A high value of the difference denotes a severe imbalance between energy and data. At the beginning of each frame, a node sets its state \(x_t\) to the difference between \(E_{q,t}\) and \(D_{q,t}\), rounded with the nearest integer function \([\cdot ]\), where \(E_{q,t}\) and \(D_{q,t}\) refer to the energy queue and data queue states at frame time t. The state ranges from 0 to M, where M is the maximum capacity of each of the queues.
3.2.2 Action
Depending on the state, a node selects an appropriate action. Since the E-node operates by multi-slot transmission, its available actions are the numbers of slots to be selected, l, up to the maximum number of slots \(l_{\mathrm{max}}\). The D-node uses high-rate transmission, so its available actions are the numbers of bits in a symbol to be used for transmission, g, up to the maximum number of bits in a symbol \(g_{\mathrm{max}}\). The transmission rate is g times the unit rate.
3.2.3 Reward
We define the reward as the change of the difference after a contention. If the difference decreases after a contention, it can be concluded that the node is moving in an appropriate direction. The reward induced by action a at the state \(x_t\) combines two terms, where \(r_1\) and \(r_2\) indicate the weights of the first and the second term. The first term accounts for the change of the difference between energy and data. Decreasing the imbalance between energy and data contributes a positive reward. The second term is for the overall queue states of a node. As the queues build up, the reward decreases.
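The state and reward definitions can be illustrated with the Python sketch below. The concrete forms are assumptions made for illustration: the state is taken as the rounded absolute difference of the two queue levels, and the reward as \(r_1\) times the reduction of that difference minus \(r_2\) times the total backlog. The weights `r1` and `r2` are placeholder values, and the paper's exact expressions may differ.

```python
def queue_state(energy_q, data_q):
    """Rounded absolute queue difference (assumed form of the state)."""
    return int(abs(energy_q - data_q) + 0.5)  # round half up


def reward(prev_state, next_state, next_energy_q, next_data_q,
           r1=1.0, r2=0.1):
    """Assumed reward: balance improvement minus a backlog penalty."""
    balance_gain = prev_state - next_state    # positive when imbalance shrinks
    backlog = next_energy_q + next_data_q     # grows as the queues build up
    return r1 * balance_gain - r2 * backlog


x_before = queue_state(5, 2)
x_after = queue_state(2, 1)
print(x_before, x_after, reward(x_before, x_after, 2, 1))
```

The first term rewards a frame that brings the two queues closer together, and the second term penalizes a node whose queues keep growing even if they are balanced.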
3.2.4 Next state
After the contention in a frame, the queue states of nodes may change. If a node succeeds in contention, both the energy and data queue states decrease. In the multi-slot policy, the amount of energy consumption depends on the number of transmission attempts. For high-rate nodes, the energy consumption varies according to the selected number of bits in a symbol.
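A possible per-frame queue update is sketched below in Python. It assumes that the consumed energy and transmitted packets are removed first, that new arrivals are then added, and that anything beyond the queue capacity M is dropped. The function name, the argument names, and the overflow handling are assumptions.

```python
def next_queues(energy_q, data_q, energy_used, packets_sent,
                energy_arrival, data_arrival, capacity):
    """Return the (energy, data) queue levels at the start of the next frame."""
    energy_q = max(0, energy_q - energy_used) + energy_arrival
    data_q = max(0, data_q - packets_sent) + data_arrival
    # Assumption: arrivals that exceed the capacity M are lost.
    return min(capacity, energy_q), min(capacity, data_q)


# Multi-slot example: 3 energy units spent to send 1 packet, 1 unit of each arrives.
print(next_queues(5, 2, 3, 1, 1, 1, capacity=10))  # (3, 2)
```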
3.3 Proposed Q-learning mechanism
The nodes construct a state-action matrix to manage the Q-values. The rows of the matrix indicate the current states and the columns of the matrix indicate the actions. If a node is an E-node, it uses a multi-slot Q-value matrix \(Q_E\), and if a node is a D-node, it uses a high-rate Q-value matrix \(Q_D\). Using the state-action matrix, the nodes retrieve the Q-values.
The entries of \(Q_E\) and \(Q_D\) are denoted by \(u_{ij}\) and \(v_{ij}\), which are the expected Q-values when the action is taken to change the state from i to j. For example, when a node tries to access the channel, it first checks the difference between the energy level and the data queue length. If a node has more energy than data, it becomes an E-node and uses the \(Q_E\) matrix. Otherwise, the node is treated as a D-node and uses the \(Q_D\) matrix. Then, the optimal action (MAC parameter assignment) in a frame is the action with the largest Q-value for the current state. Since a node can transmit only within the current energy state (\(E_{q,t}\)), the number of slots to be selected is limited by the current energy level, rounded down with the floor function \(\lfloor \cdot \rfloor\).
With the obtained MAC parameter, the nodes perform contention with the selected mode. After each frame, the energy and data queue states of the nodes change as a result of the learning actions. Based on the changed queue states, the nodes update the state-action matrix with the Q-learning update of Sect. 3.1.
The nodes then learn the optimal action at each state by updating the state-action matrix. After a certain learning time, nodes can choose the best action, and the data and energy queue states are expected to be balanced (Fig. 3).
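The per-node procedure of this section can be sketched as a small Python class. It is an illustration under several assumptions: actions are indexed 1 to `l_max` or 1 to `g_max`, the state is the rounded absolute queue difference, queue levels are non-negative, and an epsilon-greedy exploration step is added, which the text does not specify. All names and default values are hypothetical.

```python
import random


class LearningNode:
    """Keeps one Q-matrix per mode and picks the MAC parameter per frame."""

    def __init__(self, max_state, l_max, g_max, epsilon=0.1):
        # Rows are states 0..max_state, columns are actions 1..l_max / 1..g_max.
        self.q_e = [[0.0] * l_max for _ in range(max_state + 1)]
        self.q_d = [[0.0] * g_max for _ in range(max_state + 1)]
        self.epsilon = epsilon  # assumed exploration rate

    def choose(self, energy_q, data_q, rng=random):
        state = min(int(abs(energy_q - data_q) + 0.5), len(self.q_e) - 1)
        if energy_q > data_q:  # E-node: multi-slot scheme
            mode, table = "multislot", self.q_e
            # Slots are capped by floor(E_q): one energy unit per attempt.
            limit = max(1, min(len(table[state]), int(energy_q)))
        else:                  # D-node: high-rate scheme
            mode, table = "highrate", self.q_d
            limit = len(table[state])
        if rng.random() < self.epsilon:
            index = rng.randrange(limit)      # explore
        else:
            row = table[state][:limit]
            index = row.index(max(row))       # exploit the best known action
        return mode, state, index + 1

    def update(self, mode, state, action, reward, next_state,
               alpha=0.1, gamma=0.9):
        table = self.q_e if mode == "multislot" else self.q_d
        old = table[state][action - 1]
        target = reward + gamma * max(table[next_state])
        table[state][action - 1] = old + alpha * (target - old)


node = LearningNode(max_state=10, l_max=4, g_max=3, epsilon=0.0)
node.q_e[2] = [0.1, 0.2, 0.3, 0.9]
print(node.choose(2, 0))  # ('multislot', 2, 2)
```

In the usage example the best entry of the row is the fourth action, but only two energy units are available, so the node falls back to the best action among the first two.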
4 Results and discussion
We conduct simulations to verify the performance enhancement of our transmission strategy for learning nodes. In the simulations, nodes have random arrival rates of data and energy in a predefined range (see Table 1). At the beginning of a frame, a node determines the action based on the Q-matrix and performs channel access. During the contention in a frame, newly arrived data and energy are put into the queues. Then, the Q-matrix is updated according to the queue status of the current and the previous frame. The conventional FSA protocol is chosen as the baseline scheme. In FSA, nodes perform channel access if at least one energy unit and one data unit exist. Otherwise, they operate in the sleep mode. Simulations are conducted for 10,000 frames. The parameters used in the simulation are shown in Table 1.
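For orientation, the FSA baseline contention can be sketched in Python as follows. The sketch assumes saturated nodes (every node always holds at least one energy unit and one data unit), a uniformly random slot choice, and no channel errors; it is not the simulator used for the reported results, and the function names are hypothetical.

```python
import random
from collections import Counter


def fsa_frame(num_nodes, frame_size, rng):
    """Return the number of slots chosen by exactly one node in one frame."""
    picks = [rng.randrange(frame_size) for _ in range(num_nodes)]
    return sum(1 for count in Counter(picks).values() if count == 1)


def expected_successes(num_nodes, frame_size):
    """Mean successes per frame: N * (1 - 1/L) ** (N - 1)."""
    return num_nodes * (1.0 - 1.0 / frame_size) ** (num_nodes - 1)


rng = random.Random(0)
frames = 2000
mean = sum(fsa_frame(20, 20, rng) for _ in range(frames)) / frames
print(mean, expected_successes(20, 20))
```

The closed-form value follows because a node succeeds only when none of the other N - 1 nodes picks its slot, so the simulated mean should approach it as the number of frames grows.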
Figure 4 shows the change of the difference between the energy queue and the data queue over the Q-learning iterations (frames). Figure 4a shows a node with similar energy and data arrival rates. As the Q-learning mechanism evolves, the difference between the queues fluctuates over time. Since nodes perform contention, the difference may occasionally rise suddenly due to randomness. However, the node recovers to below a certain level by using the proposed method. Figure 4b shows a D-node, in which the arrival rate of data is larger than the arrival rate of energy. The difference is alleviated after 200 Q-learning iterations, and the difference of the queue states then stays under control. By transmitting more data with a reduced amount of energy, the imbalance problem is mitigated. Finally, Fig. 4c shows an E-node, with a larger energy arrival rate than data arrival rate. At first, the node starts with a small amount of imbalance. When the difference becomes larger, the degree of imbalance is mitigated after 250 Q-learning iterations (from 150 to 400). By inducing the consumption of excess energy to transmit data, the multi-slot scheme resolves the imbalance problem. The difference fluctuates more than in case (b). As the multi-slot method largely depends on the contention result, its effect on resolving the imbalance is weaker than that of the high-rate method.
Figure 5 shows the energy and data queue states of the individual nodes. The data and energy arrival rates are randomly chosen from 1 to 1.5. In FSA, the frame size is the same as the number of nodes (\(N=20\)), while the frame size is set to 4 times the number of nodes in the proposed scheme due to the multi-slot operation. For the proposed learning MAC, the queue states of the nodes are distributed along the diagonal line. With the learning capability, the nodes take actions to balance the energy and data. Depending on the arrival rates of energy and data, the balancing point is appropriately determined. The proposed scheme mitigates the imbalance and the queue buildup problem. However, in the FSA scheme, the queue states are concentrated in the upper right corner area. Since the nodes suffer from the imbalance problem, data packets and energy build up unnecessarily in the queues. At first, the data queue states are higher than the energy queue states. Energy could be consumed by successful transmissions, but this does not happen due to collisions, and energy starts to accumulate. Then both the energy and data queues become almost full.
Figure 6 shows the saturation throughput for varying frame size. The proposed learning MAC achieves higher throughput than FSA. Since the multi-slot transmission improves the success probability in the contention, the number of packets transmitted in a given time increases. Also, the high-rate transmission contributes to the throughput improvement by saving the transmission time of the nodes. In the proposed learning MAC, the optimal frame size changes for different data and energy rate conditions. As the data arrival rate increases, the use of multi-slot transmission is needed and the optimal frame size increases.
Figure 7 shows the throughput performance for varying numbers of nodes. In this simulation, the frame size is set to 60. To implement different percentages of D-nodes and E-nodes, we generate different energy and data arrival rates. For example, in the D-node dominant case, we set a higher data arrival rate. As the number of nodes increases, the throughput improves, since the proposed MAC manages various imbalances. Also, the E-node dominant case shows better performance than the other cases. Since nodes are expected to use the multi-slot scheme, the amount of transmitted data increases. When D-nodes are dominant, nodes are likely to use the high-rate method. Then, the channel error rate might increase, since the transmission is performed with a reduced amount of bit energy.
5 Conclusion
We have proposed a new learning MAC protocol for energy-harvesting nodes to resolve the imbalance between energy and data. For E-nodes, the multi-slot policy is used to enhance the success probability and energy efficiency. The high-rate policy is utilized for D-nodes to decrease the energy consumption. The optimal MAC parameters, which depend on the data and energy queue states, are automatically learned by the nodes using the Q-learning mechanism. Thus, nodes learn the optimal action for every queue state. The performance evaluation shows that our new learning MAC protocol with learning nodes outperforms FSA in terms of the queue sizes and the network throughput. Nodes are shown to appropriately flush out data and energy in the queues and to achieve better throughput and a lower packet drop rate.
Abbreviations
IoT: Internet of Things
RF: radio frequency
PDP: packet drop probability
MDP: Markov decision process
QoS: quality of service
MAC: medium access control
NACK: negative acknowledgment
E-node: energy dominant node
D-node: data dominant node
FSA: framed slotted ALOHA
PSK: phase shift keying
AWGN: additive white Gaussian noise
PSD: power spectral density
SNR: signal-to-noise ratio
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) through the Korean Government Ministry of Science and ICT (MSIT) under Grant NRF 2021R1A2B5B01002661.
Author information
Contributions
YK wrote the paper and conducted the performance analysis and simulations. TJL wrote and revised the paper, and guided its direction and organization. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kim, Y., Lee, TJ. Learning nodes: machine learning-based energy and data management strategy. J Wireless Com Network 2021, 176 (2021). https://doi.org/10.1186/s13638021020476
DOI: https://doi.org/10.1186/s13638021020476
Keywords
Energy-harvesting
Transmission policy
Q-learning
IoT