Analysis of opportunistic spectrum access in cognitive radio networks using hidden Markov model with state prediction

We propose a study using hidden Markov model (HMM) with state prediction for opportunistic spectrum access (OSA) in cognitive radio (CR) networks. The primary channels are assumed to be operating in a TDMA manner with synchronous time slots of equal length and alternating between idle and busy states. The secondary users (SUs) may use them when they are idle by channel sensing. In contrast to the traditional scheme relying only on channel sensing for exploring spectrum opportunities, the proposed prediction scheme takes advantage of state prediction, channel sensing, and acknowledgments (ACKs) from the receiver in an attempt to maximize the utility. In the prediction scheme, there are three distinct actions: direct skip, sensing then conditional access, and direct access. We impose some constraints on the system parameters and derive thresholds by which we can specify the optimal action. We then conduct simulations to compare the performance of the prediction scheme to that of the traditional scheme. Results show that the former is superior to the latter. We believe the proposed prediction scheme is suitable for the OSA system using spectrum sensing.


Introduction
The spectrum resources are rare, and a large part of the spectrum licensed for various applications are not fully used in times or in spaces. These facts motivate the use of cognitive radio (CR) networks coexisting with the licensed primary networks [1,2]. In CR networks, the secondary users (SUs) are allowed to utilize the frequency channels assigned for the primary users (PUs) when they are currently not being used. This kind of spectrum sharing is referred to as opportunistic spectrum access (OSA) or spectrum overlay. To detect the spectrum opportunities, spectrum sensing may be used. In [3][4][5][6], the authors consider the OSA system in which an ad hoc CR network operating with a primary network for which N frequency channels are assigned. The channels are assumed to be operating in a TDMA manner with synchronous time slots of equal length and alternating between idle and busy *Correspondence: jwwang@asia.edu.tw 1 Department of Photonics and Comm. Engineering, Asia University, No. 500, Lioufong Rd., Wufong District, Taichung City, Taiwan Full list of author information is available at the end of the article states. The SU may use them when they are idle by channel sensing.
To effectively implement the concept of CR networking and alleviate the processing delays, spectrum prediction technique has been employed in the functions of spectrum sensing, spectrum decision, and spectrum mobility [7][8][9][10]. With spectrum prediction, SUs may skip the sensing duty on channels that are predicted to be busy or directly access when channels are predicted to be idle. In this way, sensing time and energy consumption may be reduced.
CR networks can operate in either centralized or distributed mode [11]. In the centralized mode, a central controller is implemented to perform spectrum decisions and resource allocation. In contrast to the centralized mode requiring a central controller, the distributed mode does not rely on any central controller. In this mode, SUs may cooperate by exchanging their own local information to enhance the sensing performance or operate independently to pursue their own benefits [12].
Many researchers have used partially observable Markov decision process (POMDP) to model distributed OSA networks [3][4][5][6]. The idea to apply the POMDP model arises from the considerations that states of the system cannot be totally observed. Optimal action is determined by using the partial information and the history of observations. Assuming that the channel statistics are the same within T time periods, the optimal value functions for each system state are solved for a T-finite horizon optimal problem in a recursive manner. The problem is quite involved because there are a total of 2 N system states [4,5]. In a POMDP model, the actions taken by the agents control the state transitions of the hidden process. Different actions taken in a time change the future behavior of the process, so we have to solve the problem based on finite or infinite horizon.
In this work, we consider the OSA system with distributed mode. We assume that channels are statistically independent and focus on the treatment for a single channel. With imperfect sensing and by considering the fact that actions taken by SUs do not affect the evolution of the channel state, we thus model the system as a hidden Markov model (HMM) [13][14][15][16][17][18][19][20]. An HMM can be considered as consisting two processes: the variation of the hidden states is a Markov process and the observation under a specific hidden state is a random process. This model is suitable for modeling Markov process with imperfect sensing. Operating in this model, SUs try to track the state of the hidden process and use the channel when idle state is detected. As actions do not control the hidden process, the current action does not affect the value function in the future. Thus, in finding optimal policy, no recursive procedures are required. Using the HMM in conjunction with state prediction, we derive thresholds on the probability space for the action policy, which specifies the optimal action achieving maximum expected utility based on the predicted probability.
The rest of this paper is organized as follows. In Section 2, we give a description of the system. Section 3 describes actions and derives the decision rule. Simulation results are given in Section 4 and finally we draw conclusions in Section 5.

System description
The same OSA system as studied in [3][4][5][6] is considered in this work. In the OSA system we assume that a dedicated control channel is available for SUs in the system to exchange signaling messages, as suggested in [21]. We also assume each primary channel is characterized by a two-state Markov chain. As SUs cannot know the channel state exactly, we further model each channel as an HMM, which may be viewed as a discrete-time bi-variable random process {S(t), O(t)}, where t = 1, 2, · · · , is the discrete time, S(t) is the hidden process, and O(t) is the observable process having states as the hidden process.
For the HMM under considerations, we make some assumptions stating as follows [14]. The first one is the Markov property, which means the next state depends only on the current state and independent of previous states. The second one is the stationary property, which states that the transition probabilities are independent of the actual time the transitions take places. The third one is the observation independence assumption; that is, the current observation is statistically independent of previous observations. The hidden process S(t) of a channel is characterized by two transition probabilities α and β, where α is the probability of transition from idle state (s 0 ) to busy state (s 1 ) and β is the probability of transition from s 1 to s 0 . The observable process also has idle and busy states. However, due to imperfect sensing, observations may be incorrect. When a busy state is observed, denoted by o 1 , if the hidden process is actually idle, we say a false alarm occurs and denote this probability by p fa ; otherwise we have a correct detection, denoting this probability by p d . Thus, (1 − p d ) is the miss detection probability. Perfect sensing means p d = 1 and p fa = 0, but it is extremely hard to achieve in practice. A spectrum opportunity detector can be considered as performing a binary hypotheses test. As can be seen from a receiver operating characteristic curve, a smaller value of p fa implies a larger value of (1 − p d ). This results in more packet collisions. Contrarily, a larger p fa implies a smaller (1 − p d ) but a large part of the unused channel resources will be wasted. In general, we would like to make p d as large as possible and p fa as small as possible. However, these are usually conflicting objectives. To solve the problem, we may resort to Neyman-Pearson criterion [22]; that is, determining a value of p fa which is acceptable and seeking a decision strategy on the channel energy that constrains p fa to this value while simultaneously maximizing p d .
Similar to [6,16], we assume energy detection is used for spectrum sensing. Consider two binary hypotheses H 0 and H 1 , where H 0 and H 1 denote the absence and presence of a primary user, respectively. A test statistic T is formulated to distinguish between H 0 and H 1 by a predefined threshold λ as follows: Spectrum sensing is equivalent to detecting channel energy. The threshold λ is determined by the constraint on p fa and reflects the effect of noise uncertainty on spectrum sensing. For the observable process, we may first give a constraint on p fa and then determine p d . Thus values of p d and p fa can be known [8,19]. The state-transition diagram of a hidden process is shown in Figure 1, in which the observable process is also shown.

Actions and decision rule
In the traditional scheme for the OSA system, at the beginning of a time slot, the SU first senses the channel. If the outcome is idle, the SU then transmits over the channel; otherwise it skips the rest of the slot. Thus, no attempts are made to track the state of the channel. In this work, we propose to use state prediction in conjunction with channel sensing. In contrast to the traditional scheme always performing sensing in each slot, the prediction scheme may skip the slot or directly transmit without sensing, so as to avoid sensing as much as possible. As the concept of Bayesian filters plays a vital role in the proposed scheme, before going into further details, we first describe the concept behind the Bayesian filter.

Bayesian filter
Let π (t) = [π 0 (t) π 1 (t)] be the belief probability vector of a channel at time t, where π i (t) is the probability of the hidden process staying in state s i . A Bayesian filter may be used to update current belief given previous belief π (t −1) and current observation o m , that is, where p ij denotes the transition probability, from state s i to state s j , of the hidden process and p(o m |s k ) is the probability of the occurrence of observation o m , given the process staying in state s k . Letπ j (t) denote the one-step ahead prediction probability of the hidden process being in state s j , expressed bŷ Therefore, we may rewrite (2) as , j = 0, 1. (4) These equations show that an update of the Bayesian filter may be viewed as composed of two major steps. The first step is to predict the one-step ahead probability vector by the transition probabilities and the previous belief vector, as in (3). The second step is to update current belief by the predicted probability vector and the observation, as in (4). However, obtaining observations requires channel sensing and doing so consumes energy. If the hidden process is predicted to be idle with high probability, one may avoid channel sensing and directly transmit. Similarly, if the channel is predicted to be busy with high probability, channel sensing may be avoided as well. These considerations are the motivation of this work.

Actions and probability update
In contrast to the traditional scheme relying only on channel sensing, the proposed prediction scheme takes advantage of state prediction, channel sensing, and acknowledgments (ACKs) from the receiver in an attempt to maximize the utility. Consider a particular channel for an SU. Let b(t) andb(t) denote the belief probability and the prediction probability, respectively, of the channel being in idle state at time slot t. As in (3), the SU may predict After the prediction, ifb(t) is quite low, the SU directly skips the time slot for energy savings and then sets b(t) =b(t) as no observations are available. This action is referred to as direct skip, denoted by a 1 . Whenb(t) is quite high, the SU directly transmits over the channel. This action is referred to as direct access, denoted by a 3 . The belief update for this action depends on the transmission result and will be discussed later. Whenb(t) is moderate, the SU first performs channel sensing. If the outcome is o 1 , the SU skips the rest of the slot and updates current belief fromb(t) by The update in (6) tends to lower the prediction probability in the next slot and in turn helps induce action a 1 to be chosen because whenb(t) is moderate, in the denominator, the first term may be quite small compared to the second term. If the sensing outcome is o 0 , the SU then transmits over the channel. This action is referred to as sensing then conditional access, denoted by a 2 , which is also the only action in the traditional scheme.
In this work, as similar to [6], we assume that at the end of each time slot, the corresponding receiver will send an ACK, through the dedicated control channel, to inform the SU whether the transmission is successful. Assume also that packets involved in a collision are collapsed; that is, capture effect is ignored. Upon receiving the ACK after a transmission by action a 2 or a 3 , the SU updates the belief as follows. If a positive ACK is received, this means the channel is definitely in idle state, so the SU sets current belief to b(t) = 1. When a negative ACK (NACK) is received (timeout is equivalent to NACK) after taking action a 2 , as already having an observation o 0 , the SU first updates the belief by and then further updates b(t) by where ε is the channel error rate. When a NACK is received after taking action a 3 , the SU updates b(t) by Note that if the channel is assumed to be error-free, the SU sets b(t) = 0 when a NACK is received. In fact, both updates in (8) and (9) may be as small as zero because the error rate usually is quite small compared to the other quantities.

Formulation of decision rule
Before we can formulate the decision rule, we have to specify the utility for each action. In the OSA system, we assume the SU can earn positive rewards only from successful transmissions. Suppose the SU has to pay a cost for a transmission, as doing so consuming energy. If the transmission is successful, the SU receives a reward. When the transmission collides with a PU's transmission, the SU has to pay a cost for the collision but no further cost is paid when collapsed by the channel noise. As the transmission cost itself is a constant quantity and the channel error rate plays its role as a weighting factor on the utilities, both of them may be incorporated into the reward for a successful transmission, R succ , and the cost for a collision, C coll . Channel sensing consumes energy, so each time the SU senses a channel is assumed to be charged for a cost C sens . In addition, we assume that C coll > R succ > C sens .
For clarity, in the following expressions, we may omit the time variable t onb(t) and b(t) if no ambiguity in time instants. After the prediction probability update described in (5), if the SU takes action a 1 , we assume that the SU gets no rewards and pays nothing. So the utility function is The utility function for action a 3 may be expressed as When action a 2 is taken and an o 0 is observed, the utility as a function of current belief b is If an o 1 is observed, the utility is the sensing cost paid, that is, In order to compare the utility of action a 2 with others, we have to transform the belief b in (12) and (13) (4), we have where p(o 0 |s 0 ) = 1 − p fa and p(o 1 |s 1 ) = p d . With this transformation, the utility when action a 2 is performed may be computed by Having determined all the utility values as a function of b, we now proceed with the formulation of the decision rule.
From (15), U a 2 (b) may be further expressed by By (10), (11), and (16), we make the following observations. The utility functions are all linear over the whole probability space. For any given set of parameters, we can specify a distinct interval in the probability space for each action which maximizes the utility. However, for real applications, we have the following concerns. At the extreme pointb = 1, we have Thus, U a 3 (1) is definitely larger than both U a 2 (1) and U a 1 (1). In addition, we desire that U a 2 (1) > U a 1 (1). To meet this condition, we may set p fa < 1 − (C sens /R succ ). This constraint is reasonable because a smaller p fa is desirable. At the other extreme pointb = 0, we find Clearly, U a 1 (0) is the maximum one. In order to have U a 3 (0) < U a 2 (0), we require C coll (1 − p d ) + C sens < C coll . A setting of p d > C sens /C coll satisfies the inequality. We find this constraint reasonable as a higher p d is desirable. With the above treatments, atb = 1, we have and U a 1 (b) are not parallel to each other and any two lines intersect at one point. For illustrating purposes, we in the rest of this paper focus on this scenario. However, the idea applies to other scenarios as well.
Denote the prediction probabilityb at which U a 3 (b) and U a 2 (b) intersect by η 1 and U a 2 (b) and U a 1 (b) intersect by η 0 . Obviously, depending on the parameters, we have either η 1 > η 0 or η 1 ≤ η 0 . By equating (10) and (16) and solving forb, we obtain Similarly, by equating (11) and (16), we get If η 1 > η 0 , we may formulate the decision rule for the prediction scheme as: We show an illustrating example of the utility functions under this condition in Figure 2. If η 1 ≤ η 0 , we have to specify another threshold η 2 , denoting theb at which U a 1 (b) and U a 3 (b) intersect, that is, Based on η 2 , the decision rule may be formulated as Figure 2 Illustrating the utility functions for the case η 1 > η 0 . Figure 3 shows an illustrating example of the utility functions of which If channels are statistically independent, then extension to the case of multiple channels is straightforward. Suppose the SU is capable of treating a number of channels up to l. If l ≥ N, all the N channels can be treated simultaneously; otherwise the first l channels having the largest utility values are chosen with potential ties being broken randomly.

Effects of transition probabilities
For the prediction scheme, when updating from a given b(t) = 0, we prefer that the SU first chooses action a 1 Figure 3 Illustrating the utility functions for the case η 1 < η 0 . a number of times and then chooses action a 2 . On the other hand, when updating from a given b(t) = 1, we prefer the SU performs action a 3 consecutively before receiving a NACK. In each update, the prediction probabilityb(t) depends heavily on the transition probabilities. We thus examine the effects of the latter on the proposed scheme to see under what conditions would the preferences be satisfied. Suppose action a 1 is consecutively taken n times starting with a given b(t). With some algebra manipulations, we may express the n-step ahead prediction probabilityb(t + n) aŝ From this equation we make some observations, distinguished by different cases, as follows.
reaches the minimum value, which is below β/(α + β), in the sequence ofb(t + n). As the rate of change of the state is fast, the actions in adjacent time slots tend to be different so as to follow the channel state. Performing the same action a 1 (or a 3 ) in consecutive time slots is seldom.
From these observations, we recognize that the case α + β < 1 with may fit the preferences described above. Under this condition, the prediction scheme definitely outperforms the traditional scheme. However, values of α and β are given by PUs and the SU has no choices. But it is clear that the prediction scheme does at least as well as the traditional scheme as long as utilities are of concerns.

Simulation results
In this section, we compare the performance of the prediction scheme to that of the traditional scheme by simulations. Before conducting simulations, we have to set system parameters appropriately so that the imposed constraints are met. With this consideration in mind, we first set C coll = 4, C succ = 3, and C sens = 0.4. For the given values, the constraints imposed in (17) and (18) are p fa < 0.866 and p d > 0.1, respectively. These constraints are not stringent. First, we numerically computed the thresholds. In Figure 4, we show the effects of p d on η 0 and η 1 for p fa = 0.1 and 0.2. From this figure, we observe that for all parameters considered, η 1 is larger than η 0 . This figure also shows that as p d increases, η 1 slightly increases and η 0 decreases. This change gives the SU more chances to perform action a 2 as the channel sensing becomes more reliable. In addition, we can also examine the effects of the transition probabilities on the prediction scheme. For example, when p d = 0.9 and p fa = 0.1, we find η 0 = 0.25 and η 1 = 0.82. Suppose α = β = 0.1. It is easy to check that the condition in (25) is met. In Figure 5, we show the threshold values as a function of p fa . As can be seen from the figure that we have η 1 > η 0 again for all parameters considered. In addition, as p fa increases, η 0 increases while η 1 decreases. This tendency is opposite to that shown in Figure 4. Thus, the probability of performing action a 2 decreases as the channel sensing becomes more unreliable.  With the thresholds numerically determined, we then conducted simulations to evaluate the utilities of the proposed prediction scheme and the traditional scheme. In each run, a total of 10 6 time slots were simulated for each scheme and the utilities obtained were averaged. In Figure 6, we depict the utility values achieved in error-free channels as a function of p d , where U p denotes utility of the prediction scheme and U t the traditional scheme. We see from this figure that when p d ≤ 0.6, U p is zero and U t is negative. Indeed, when the environment is too hostile, U t may become negative, but U p ≥ 0 for all situations. When p d ≥ 0.65, both utilities are larger than zero and the proposed scheme has a gain roughly a magnitude of C sens relative to the traditional scheme. In addition, we also simulated both schemes with channel errors. However, for reasonable error rates, such as 10 −4 or less, the degradation due to channel errors is not significant, several orders less than the utilities achieved by either scheme. Thus, we show the results for error-free channels only.   Figure 7 depicts the utility values as a function of p fa . We observe from this figure that the effects of p fa on the proposed scheme is not significant, but it is on the traditional scheme. This dissimilarity is mainly due to the fact that whenb is high, the proposed scheme may perform action a 3 without sensing while the traditional scheme always senses. A higher p fa may cause the traditional scheme to waste more idle slots. Finally, we show the effects of β on utility values in Figure 8. As β increases, the channel will stay in idle state more often so that the SU has more chances to utilize the channel. Thus, we can see from this figure that as β increases, both U p and U t increase. Moreover, the superiority of the prediction scheme can also be observed.

Conclusions
We proposed a study using HMM with state prediction for spectrum sensing in OSA networks. The SU may exercise one of the three distinct actions: direct skip, sense then conditional access, or direct access. We imposed some constraints on the system parameters and derived thresholds by which we can specify the optimal action which maximizes the utility. We compared the performance of the prediction scheme to that of the traditional scheme. Simulation results show that as trying to avoid sensing as much as possible, the prediction scheme may earn a gain roughly a magnitude of the sensing cost relative to the traditional scheme. Simulation results also show that when the false alarm probability is high, the performance gain can be significant. This suggests that using the prediction scheme instead of relying only on sensing results can alleviate the detriment arising from unreliable sensing, as prediction results become more reliable. With channel errors taken into account, we observe from the simulation results that the effect of channel errors on the superiority of the proposed scheme over the traditional scheme is not significant. This means that superiority exists equally in any kind of channels. Thus, we believe the proposed scheme is suitable for OSA systems using spectrum sensing.