Joint spectrum sensing and access for stable dynamic spectrum aggregation
 Wei Wang^{1, 2},
 Lingcen Wu^{1},
 Zhaoyang Zhang^{1} and
 Lin Chen^{3}
https://doi.org/10.1186/s13638-015-0365-7
© Wang et al.; licensee Springer. 2015
Received: 1 October 2014
Accepted: 17 April 2015
Published: 8 May 2015
Abstract
Spectrum aggregation is an emerging technology for meeting the data rate requirements of broadband services in next-generation wireless communication systems. In a dynamic spectrum environment, where spectrum availability is time-varying, maintaining the stability of spectrum aggregation is challenging. In this paper, we investigate spectrum sensing and access schemes that minimize the number of channel switches in order to achieve stable dynamic spectrum aggregation, taking into account the hardware limitations on spectrum sensing and aggregation capability. We develop an analytical framework for the joint spectrum sensing and access problem based on a partially observable Markov decision process (POMDP). In particular, we derive the reward function by estimating the stability of different spectrum sensing and access strategies. Based on the POMDP framework, we propose a rollout-based suboptimal spectrum sensing and access scheme that approximates the value function of the POMDP, and we propose a differential training method to improve its robustness. We prove that the rollout policy improves on the performance of the base heuristics. The simulation results show that the proposed POMDP-based spectrum sensing and access scheme significantly improves system stability and achieves near-optimal performance with much lower complexity.
Keywords
Cognitive radio, Spectrum aggregation, Spectrum sensing, POMDP
1 Introduction
Spectrum aggregation [1,2] enables the utilization of discrete spectrum bands or fragments to support broadband services. With spectrum aggregation, discrete spectrum bands can provide the same transmission service as a continuous spectrum band. Recently, spectrum aggregation has become one of the key features in LTE-Advanced standardization. The system efficiency and fairness of spectrum aggregation are investigated in [3] and [4], and its energy efficiency is considered in [5].
The introduction of cognitive radio (CR) [6,7] increases spectrum efficiency by utilizing the spectrum dynamically, and further facilitates the application of spectrum aggregation. To exploit the instantaneous spectrum opportunities in a dynamic spectrum environment, the secondary users (SUs) identify available spectrum resources by spectrum sensing and then access the available channels without interrupting primary users (PUs). Dynamic spectrum aggregation (DSA) provides a feasible way to support broadband services in a dynamic spectrum environment. With DSA, multiple available spectrum bands discovered via CR can be aggregated dynamically to fulfill the service requirement.
There are a few existing publications on spectrum sensing and access schemes in dynamic spectrum environments. In [8], a decentralized MAC protocol is proposed for the SUs to sense the spectrum opportunities, and the optimal sensing and channel selection are investigated to maximize the expected total number of bits delivered over a finite number of slots. In [9] and [10], the authors investigate the impact of sensing errors on the system performance and try to alleviate their negative effects. In [11], the authors design a multichannel MAC protocol that adopts a fusion strategy for collaborative spectrum sensing. However, these existing works all focus on the case in which each user uses only a single channel, and do not consider spectrum aggregation. In [12], we propose a Maximum Satisfaction Algorithm (MSA) for admission control and a Least Channel Switch (LCS) strategy for DSA, but the spectrum sensing and access schemes are not considered jointly. In [13], we provide some preliminary results on a general POMDP framework for cognitive radio networks.

In this paper, we study joint spectrum sensing and access for DSA under the following practical issues:

Spectrum Sensing Limitation: Due to the limited spectrum sensing capability, it is not always possible to sense all the bands of a large-span spectrum. Each SU chooses only a subset of channels (i.e., a part of the spectrum) to sense. As a result, the system lacks perfect information on channel availability, which brings new technical challenges for DSA.

Spectrum Aggregation Limitation: Due to the hardware capability, only the spectrum bands within a certain range can be aggregated together for a single user. The spectrum aggregation range leads to an additional constraint when the SUs access the spectrum.

Channel Switch Overhead: When an SU adjusts its access strategy and switches to other channels, the channel switch unavoidably incurs extra system overhead, such as rendezvous and synchronization. A spectrum sensing and access scheme that accounts for this overhead should therefore keep the number of channel switches as small as possible.
Taking the above practical issues into consideration, we propose a decision-theoretic approach that casts the design of joint spectrum sensing and access for stable DSA in the partially observable Markov decision process (POMDP) [14] framework. To provide the reward function for the POMDP framework, the probability of a channel switch is estimated based on a Markov chain. The optimal solution of the POMDP is computationally very intensive due to the curse of dimensionality, i.e., the computational time and storage requirements grow exponentially with the number of channels. We therefore introduce an approximation technique called rollout [15] to design a suboptimal joint spectrum sensing and access scheme. A heuristic is first proposed as the base policy, which greedily chooses the spectrum sensing and access actions to reduce the number of channel switches. By rolling out the base policy, the proposed rollout algorithm approximately calculates the value function defined in the POMDP framework and reduces the number of channel switches. A theoretical analysis is provided to prove the performance improvement of the rollout policy over the heuristic. Furthermore, we propose a differential training method that reduces the sensitivity to approximation errors. The performance of the proposed scheme is evaluated by simulation, which demonstrates that the proposed policies in the POMDP framework significantly reduce the number of channel switches and that the rollout-based scheme achieves near-optimal performance compared to the optimal scheme.
The rest of this paper is organized as follows. Section 2 describes the system model and formulates the problem. Section 3 introduces the POMDP framework and the approach to estimate the access and switching probabilities. In Section 4, the rollout-based suboptimal spectrum sensing and access schemes are proposed. Section 5 provides the performance evaluation by simulation. Finally, Section 6 summarizes this paper.
2 System model and problem formulation
2.1 Dynamic spectrum aggregation model
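We consider a large-span licensed spectrum consisting of N channels, each with the same bandwidth BW. Time is slotted, and the duration of each time slot is T _{ p }. The availability of the channels, which depends on the PU activities, is modeled by the following assumption:
Assumption 1 (Channel Availability).
The channel availability S(t)=[S _{1}(t),…,S _{ N }(t)] evolves as a discrete-time Markov process with 2^{ N } states, where S _{ n }(t)∈{0(occupied),1(idle)} denotes the occupancy state of channel n∈{1,…,N} at time slot t, which is independent across channels. □
Since the channels are independent, the transition probability^{a} from system state i to system state j is the product of the per-channel transition probabilities, \(P_{\textit {ij}} = \prod _{n=1}^{N} \text {Pr}\{ S_{n}(t+T_{p})=j_{n} \mid S_{n}(t)=i_{n} \}\), where i _{ n }∈{0,1} and j _{ n }∈{0,1} are the nth elements of the system states i and j, respectively. For simplicity of expression, we denote P _{ n }(t)= Pr{S _{ n }(t)=1}.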
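To make the channel model concrete, the following minimal Python sketch simulates independent two-state Markov channels and propagates the idle probability P _{ n }(t) by one slot. The names `p01` and `p11` (per-channel probabilities of moving to the idle state from the occupied and idle states, respectively) and both function names are illustrative assumptions, not notation from the paper.

```python
import random

def step_channels(states, p01, p11, rng):
    """Advance every channel by one slot.

    states[n] is 1 (idle) or 0 (occupied); p01[n] and p11[n] are the
    assumed probabilities that channel n is idle in the next slot given
    that it is currently occupied or idle, respectively.
    """
    nxt = []
    for s, a, b in zip(states, p01, p11):
        p_idle_next = b if s == 1 else a
        nxt.append(1 if rng.random() < p_idle_next else 0)
    return nxt

def predict_idle_prob(p_idle, p01, p11):
    """One-slot prediction: P_n(t + T_p) = P_n(t) * p11 + (1 - P_n(t)) * p01."""
    return [p * b + (1.0 - p) * a for p, a, b in zip(p_idle, p01, p11)]
```

Because the channels are independent, the prediction is carried out per channel, which avoids enumerating the exponentially many joint states when only the marginal idle probabilities are needed.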
The SUs sense the presence of PUs and access the spectrum opportunistically in a decentralized manner. Here, we consider the spectrum sensing and access of a single SU^{b}. At the beginning of each time slot t, the SU chooses a set of channels A _{1}(t) to sense.
Assumption 2 (Spectrum Sensing).
Due to the limited spectrum sensing capability, the SU can sense at most L channels, which means that the size of A _{1}(t) is no more than L, i.e., |A _{1}(t)|≤L. When L<N, the SU obtains the availability information of only a subset of channels. □

The sensing result on each sensed channel is the outcome of a binary hypothesis test:

\(\mathcal {H}_{0}\): Null hypothesis indicating that the sensed channel is available.

\(\mathcal {H}_{1}\): Alternative hypothesis indicating that the sensed channel is occupied.
If the SU obtains an incorrect sensing result \(\mathcal {H}_{1}\) when the channel state is \(\mathcal {H}_{0}\), i.e., false alarm, the SU will refrain from transmitting and a spectrum opportunity is wasted. On the other hand, if the SU obtains an incorrect sensing result \(\mathcal {H}_{0}\) when the channel state is \(\mathcal {H}_{1}\), i.e., miss detection, the SU will collide with a PU. Let P _{ f } and P _{ m } denote the probabilities of false alarm and miss detection, respectively.
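The effect of imperfect sensing on the SU's knowledge of a channel can be illustrated with a Bayes update of the idle probability P _{ n }(t). This is a sketch under the convention stated above (a false alarm reports an idle channel as occupied, a miss detection reports an occupied channel as idle); the function name and argument names are assumptions made for illustration.

```python
def update_idle_prob(p_idle, sensed_idle, p_f, p_m):
    """Posterior probability that a channel is idle after one sensing result.

    p_idle      -- prior Pr{channel idle}
    sensed_idle -- True if the detector reports the channel as available
    p_f         -- false alarm probability, Pr{report occupied | idle}
    p_m         -- miss detection probability, Pr{report idle | occupied}
    """
    if sensed_idle:
        num = p_idle * (1.0 - p_f)
        den = num + (1.0 - p_idle) * p_m
    else:
        num = p_idle * p_f
        den = num + (1.0 - p_idle) * (1.0 - p_m)
    # If the observation has zero probability under the prior, keep the prior.
    return num / den if den > 0.0 else p_idle
```

With error-free sensing (P _{ f }=P _{ m }=0), the posterior collapses to 1 or 0, which corresponds to the fully observed case for the sensed channel.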
Based on the spectrum sensing results, the SU aggregates a set of channels A _{2} for the data transmission with spectrum aggregation.
Assumption 3 (Spectrum Aggregation).
Due to the spectrum aggregation limitation, the SU can only aggregate the channels within Γ, which means that the channels in A _{2}(t) are within the frequency range Γ, i.e., D(i,j)≤Γ, ∀i,j∈A _{2}(t), where D(i,j) indicates the frequency distance between channel i and channel j. The total bandwidth of the available channels in A _{2}(t) should satisfy the SU’s bandwidth requirement, denoted as Υ.
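As an illustration of Assumption 3, the sketch below enumerates candidate access sets and checks the bandwidth requirement. It assumes that the channels are adjacent in frequency and indexed in frequency order, and that the range Γ limits the total span of the aggregated block, so a block holds at most ⌊Γ/BW⌋ channels. Both the interpretation of D(i,j) and the function names are assumptions; the paper does not fix them in this section.

```python
def candidate_windows(n_channels, bw_mhz, gamma_mhz):
    """List contiguous channel blocks whose total span fits within gamma_mhz.

    Each block is a list of channel indices; the first index plays the
    role of a starting index for the accessed channel set.
    """
    width = min(int(gamma_mhz // bw_mhz), n_channels)
    if width < 1:
        return []
    return [list(range(start, start + width))
            for start in range(n_channels - width + 1)]

def meets_requirement(states, window, bw_mhz, upsilon_mhz):
    """True if the idle channels in `window` supply at least upsilon_mhz."""
    idle = sum(states[i] for i in window)
    return idle * bw_mhz >= upsilon_mhz
```

For example, with 10 channels of 10 MHz and a 40 MHz range, this interpretation yields seven candidate blocks of four channels each.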
2.2 Problem formulation

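Let R(t) denote the number of available channels in A _{2}(t). In each time slot, the SU acts according to R(t):

If R(t)≥Υ/BW, the SU reselects only A _{1}(t). The spectrum aggregation decision does not change, i.e., A _{2}(t)=A _{2}(t−T _{ p }).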

If R(t)<Υ/BW, the SU has to reselect both A _{1}(t) and A _{2}(t), which triggers a channel switch.
The first two constraints represent the spectrum sensing and spectrum aggregation limitations, respectively, and the last constraint guarantees that the SU’s bandwidth requirement is satisfied.
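For orientation, the objective and the three constraints described in the text can be collected into a single statement. The following is a hedged reconstruction from Assumptions 2 and 3 and the switching rule above, not the paper's own equation: the indicator function, the sum over the slots of a finite horizon, and the exact form of the bandwidth constraint are assumptions.

\[
\begin{aligned}
\min_{\{A_{1}(t),\,A_{2}(t)\}} \quad & \mathbb{E}\Big[ \sum_{t} \mathbf{1}\{A_{2}(t) \neq A_{2}(t-T_{p})\} \Big] \\
\text{s.t.} \quad & |A_{1}(t)| \le L, \\
& D(i,j) \le \Gamma, \quad \forall i,j \in A_{2}(t), \\
& R(t) \cdot BW \ge \Upsilon .
\end{aligned}
\]

Here the expectation is taken over the channel state process and the sensing outcomes, and the indicator counts one channel switch whenever the accessed set changes between consecutive slots.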
3 A POMDP framework for dynamic spectrum aggregation
In this section, we propose a decision-theoretic framework for DSA based on POMDP [14]. In particular, we convert minimizing the number of channel switches into a new objective, i.e., maximizing the time interval between channel switches, and provide an approach to estimate this interval as the reward of the POMDP model, which is challenging in a dynamic spectrum environment.
If the SU were able to sense the whole spectrum accurately, all the elements of S(t) would be known and the optimization problem would be a standard Markov decision process (MDP), since the channel availability state S(t) is a discrete-time Markov process. In practice, however, with limited spectrum sensing ability and sensing errors, the SU obtains only imperfect occupancy states for a subset of the channels, which means that L<N and S(t) is partially and inaccurately observable. As a result, we cast the optimization problem into the POMDP framework, which extends the MDP to the case in which the state of the system is only partially observed by the decision maker.
3.1 POMDP framework
Before discussing the POMDP framework, we first introduce the concept of a control interval, which is composed of a number of consecutive time slots and delimited by channel switches. The length of a control interval is uncertain, since it depends on how long the currently aggregated channels keep satisfying the SU’s bandwidth requirement. Because the joint spectrum sensing and access scheme is designed around this control interval structure, the framework in [8] is no longer suitable.
Now, we define the key components of the POMDP framework for DSA. For simplicity of expression, we adopt A _{1}(m) and A _{2}(m) instead of A _{1}(t _{ s }(m)) and A _{2}(t _{ s }(m)), respectively.
where C _{ i } is the index of the ith sensed channel, ∀i∈{1,…,L}, and C _{ start } is the starting index of the accessed channel set A _{2}(m). Define \(\mathfrak {A}\) as the set of all possible actions, i.e., \(a(m) \in \mathfrak {A}\).
where δ _{ i }(m)= Pr{S(m)=i | H(m)} and H(m)={a(i),Θ(i)}_{ i≥m }.
in which \(\text {Pr}\{ \Theta _{i,A_{1}} (m)= \theta \}\) can be obtained through the information provided by Equations (10) and (11).
It has been proved in [14] that the belief vector Δ(m) is a sufficient statistic for determining the optimal actions for future control intervals.
A policy is said to be stationary if the mapping μ _{ m } depends only on the belief vector Δ(m) and is independent of the number of remaining control intervals m. Denote the set of stationary policies as Π _{ s }; it is usual to restrict the set of policies to Π _{ s } in POMDP. In our framework, a spectrum sensing and access scheme is essentially a policy of this POMDP.
which means that, over the finite time horizon, the longer the control intervals are, the fewer channel switches are expected in total, so our objective can be converted into maximizing the average reward.
For control interval m, a set of accessed channels A _{2}(m) is determined according to the belief vector Δ(m). To evaluate the reward of A _{2}(m), we first define the access probability and the switching probability as follows.
Definition 1 (Access Probability).
Definition 2 (Switching Probability).
Both the access probability ζ and the switching probability ξ can be calculated based on the sensing and access action a, as discussed in detail in the next subsection. We omit a in the notation of both probabilities for simplicity of expression.
3.2 Estimation of access probability and switching probability
In order to obtain the reward function of POMDP, the access probability ζ and the switching probability ξ need to be estimated.
Based on the distribution of R(t), we calculate the access probability ζ and the switching probability ξ in the following two propositions.
Proposition 1 (Calculation of Access Probability).
Proof.
The access probability is the probability of \(R\geq { \frac {\Upsilon }{{BW} }}\), which is also shown by the shaded region in Figure 3. Therefore, Equation (25) is obtained and the proposition holds.
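As a numerical illustration of the quantity in Proposition 1, the access probability can be evaluated exactly when the channels in the accessed set are independent, because R is then a sum of independent Bernoulli variables. The sketch below uses a small dynamic program over that distribution; this is a substitute technique chosen for illustration and is not necessarily the closed form of Equation (25). The function name and the argument `required` (standing for ⌈Υ/BW⌉) are assumptions.

```python
def access_probability(p_idle, required):
    """Pr{R >= required}, where R counts idle channels in the accessed set.

    p_idle[n] is the idle probability of the nth accessed channel and the
    channels are treated as independent.
    """
    dist = [1.0]  # dist[r] = Pr{R = r} over the channels processed so far
    for p in p_idle:
        nxt = [0.0] * (len(dist) + 1)
        for r, mass in enumerate(dist):
            nxt[r] += mass * (1.0 - p)   # this channel is occupied
            nxt[r + 1] += mass * p       # this channel is idle
        dist = nxt
    return sum(dist[required:])
```

With a 20 MHz requirement and 10 MHz channels, `required` would be 2, so the function returns the probability that at least two accessed channels are idle.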
Now we estimate the switching probability ξ by asymptotic analysis, in which the sensing period T _{ p } is equally divided into k slim time spans. The situation within one slim time span \( \frac {{T_{p} }}{k} \) is analyzed first, and then the period T _{ p } is investigated by considering multiple slim time spans.
1) The case with complete and accurate sensing.
There are R available channels and M−R channels occupied by the PUs in set A _{2}(m) at the beginning of period T _{ p }. The sensing period T _{ p } is equally divided into k parts, in which the parameter k is large enough, so that we can assume that only one single channel’s state is altered during one slim time span.
During the slim time span \( \frac {{T_{p} }}{k} \), the number of available channels R in set A _{2}(m) has three possible situations: increased by one, decreased by one, or unchanged. The probabilities of these three situations are denoted by P _{ up }, P _{ down }, and P _{ hold }, respectively.
where \(P_{\textit {ij}}^{n}\) is the transition probability of channel n from state i to state j during time \(\frac {{T_{p} }}{k}\).
Based on P _{ up } and P _{ down }, we have P _{ hold }=1−P _{ up }−P _{ down }.
According to the above analysis, we obtain the expression of switching probability ξ in the following proposition.
Proposition 2 (Calculation of Switching Probability).
Proof.
During the sensing period T _{ p }, there are H=⌈k(1−P _{ hold })⌉=⌈k(P _{ up }+P _{ down })⌉ alterations of channel state in total, of which we assume that l are decreases and H−l are increases in the number of available channels.
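The counting argument in this proof can be sketched numerically. The code below assumes that P _{ up } and P _{ down } stay constant over the period, that each of the H alterations is a decrease with probability P _{ down }/(P _{ up }+P _{ down }) independently of the others, and that a switch is triggered when the count at the end of the period falls below the requirement. These modeling choices, the binomial weighting, and all names are assumptions for illustration; the bounds 0≤R≤M are ignored for brevity.

```python
import math

def switching_probability(r_avail, r_min, k, p_up, p_down):
    """Estimate Pr{available channels drop below r_min by the end of T_p}.

    r_avail -- available channels R at the start of the period
    r_min   -- required number of channels, i.e. ceil(Upsilon / BW)
    k       -- number of slim time spans in the period
    p_up, p_down -- per-span probabilities that R rises or falls by one
    """
    p_change = p_up + p_down
    if p_change <= 0.0:
        return 1.0 if r_avail < r_min else 0.0
    # H alterations in total; rounding guards against floating-point noise.
    n_alter = math.ceil(round(k * p_change, 9))
    q_down = p_down / p_change
    xi = 0.0
    for l in range(n_alter + 1):          # l decreases, H - l increases
        if r_avail + n_alter - 2 * l < r_min:
            xi += math.comb(n_alter, l) * q_down**l * (1.0 - q_down)**(n_alter - l)
    return xi
```

For instance, with R=2, a requirement of 2 channels, and H=4 equally likely up and down moves, a switch needs at least 3 decreases, which has probability 5/16.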
2) The case with partial or inaccurate sensing.
4 Joint spectrum sensing and access: Rollout policy
On the basis of the POMDP framework proposed in Section 3, we can derive the optimal spectrum sensing and access scheme. For optimality, the value function V ^{ m }(Δ) is computed by averaging over all possible state transitions and observations. Since the number of system states grows exponentially with the number of channels, the optimal scheme suffers from the curse of dimensionality and is computationally overwhelming. In this section, we exploit the specific structure of the problem and develop a rollout-based suboptimal spectrum sensing and access scheme with much lower complexity.
4.1 Rollout policy
The central issue in designing the spectrum sensing and access scheme is the calculation of the value function V ^{ m }(Δ), which is also the most computationally intensive part. To reduce the complexity, we adopt an approximation technique that offers an effective and computationally efficient solution. The rollout algorithm [15], an approximate dynamic programming methodology based on policy iteration ideas, has been successfully applied to various domains such as combinatorial optimization [18] and stochastic scheduling [19]. Instead of computing the exact value, the rollout algorithm estimates the value function approximately. Using the Monte Carlo method, the results of a number of randomly generated samples are averaged, and the number of samples is typically smaller than the dimensionality of the total strategy space. When the number of samples is large enough, we obtain a joint spectrum sensing and access scheme with reduced complexity and limited performance loss.
where κ ^{ m }(a) denotes the number of time slots included in the mth last control interval, i.e., the reward, which depends on the chosen action a.
Here, we propose two different heuristics based on our design objective, namely the Bandwidth-Oriented Heuristic (BOH) and the Switch-Oriented Heuristic (SOH).
where P _{ i }=Pr{S _{ i }=1}, which can be updated according to A _{1}(m). Intuitively, the wider the available bandwidth is, the better the SU’s requirement is satisfied, and the less likely a channel switch is to be triggered in the next time slot. However, this heuristic does not take the statistics of the PU traffic into consideration to predict the future dynamic behavior of the channels.
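The idea behind BOH can be sketched in a few lines: among the feasible access sets, pick the one with the largest expected number of idle channels, which is proportional to the expected available bandwidth when all channels have the same bandwidth. The function name and the representation of access sets as lists of channel indices are assumptions; how the sensed set A _{1}(m) is chosen is left out of this sketch.

```python
def boh_select_window(p_idle, windows):
    """Return the access set with the largest expected number of idle channels.

    p_idle[i] is the current estimate of Pr{S_i = 1}; `windows` is a list
    of candidate access sets, each a list of channel indices. Ties are
    resolved in favor of the earliest candidate.
    """
    return max(windows, key=lambda w: sum(p_idle[i] for i in w))
```

Because the score uses only the current idle probabilities, the selection ignores how quickly each channel's state changes, which is the limitation noted above.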
where the calculation of \(p_{\kappa ^{m} }\) involves predicting the access probability ζ and the switching probability ξ. By making full use of the dynamic statistics of the channels, SOH is more sophisticated than BOH and achieves better performance.
with the initial condition \(V_{\mathcal {H}}^{0} (\Delta)=0\).
The rollout policy approximates the value function using the reward of the base policy, and consequently decides the near-optimal action a ^{RL}(m). We prove theoretically that the rollout policy is guaranteed to improve on the performance of the heuristic used as the base policy.
Proposition 3 (Rollout Improving Property).
Proof.
The proposition is proved by backward mathematical induction.
Hence, the proposition holds for m=T.
Therefore, the property holds for m−1. According to the mathematical induction, the proposition is proved.
4.2 Suboptimal spectrum sensing and access
which indicates the expected reward that the SU can accrue over the remaining lifetime of the process starting from the current control interval. The rollout action can then be expressed as \(a^{RL}(m) = \arg \max \limits _{a \in {{\mathfrak {A}}}}Q_{m}(a)\).
However, the Q-factor, which is the key quantity of the rollout policy, may not be known in closed form, which makes the computation of a ^{RL}(m) a nontrivial issue [20]. To overcome this difficulty, we adopt the widely used Monte Carlo method [21].
This rollout-based suboptimal spectrum sensing and access scheme substantially reduces the computational complexity by estimating the value function approximately rather than computing its exact value.
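The Monte Carlo evaluation of the Q-factor described above can be sketched generically as follows. The environment interface is an assumption made for illustration: `simulate(state, action, rng)` is a hypothetical function that returns the reward of one control interval (its length in slots) together with the next state, and `base_policy(state)` is the heuristic. Neither interface is specified in the paper.

```python
import random

def estimate_q(action, state, simulate, base_policy, horizon, n_traj, rng):
    """Monte Carlo estimate of the Q-factor of `action`.

    The candidate action is applied in the first control interval and the
    base policy chooses the action in every later interval.
    """
    total = 0.0
    for _ in range(n_traj):
        s, a, ret = state, action, 0.0
        for _ in range(horizon):
            reward, s = simulate(s, a, rng)  # reward: slots in this interval
            ret += reward
            a = base_policy(s)               # heuristic takes over afterwards
        total += ret
    return total / n_traj

def rollout_action(actions, state, simulate, base_policy, horizon, n_traj, rng):
    """Return the action with the largest estimated Q-factor."""
    return max(actions, key=lambda a: estimate_q(
        a, state, simulate, base_policy, horizon, n_traj, rng))
```

The cost of one decision is proportional to the number of actions times the number of trajectories times the horizon, rather than to the number of system states.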
4.3 Robustness via differential training
In a stochastic environment, the Monte Carlo method of computing the rollout policy is particularly sensitive to the approximation error, which is closely related to the number of trajectories. In this subsection, we adopt differential training [22] in the proposed rollout-based suboptimal scheme to improve the robustness. In the differential training method, we estimate the relative Q-factor difference rather than the absolute Q-factor value, which is a suitable improvement of the recursively generated rollout policy in the context of Monte Carlo-based policy iteration methods.
In order to compute the rollout action a ^{RL}(m)=\( \arg \max \limits _{a \in {{\mathfrak {A}}}}Q_{m}(a)\), the Q-factor differences \(Q_{m}(a_{1})-Q_{m}(a_{2}), ~~\forall a_{1},a_{2}\in \mathfrak {A}\) should be computed accurately, since comparing these differences with 0 is what ranks the candidate actions. Unfortunately, in a stochastic environment, the approximation \(\widetilde {Q}_{m}(a)\) fluctuates around the exact Q-factor value and may be randomly larger or smaller than Q _{ m }(a), so taking the difference of two independent estimates enlarges the approximation error. For example, suppose that a _{1} performs better than a _{2}, so that Q _{ m }(a _{1})−Q _{ m }(a _{2})>0. With the stochastic Monte Carlo method, the estimate \(\widetilde {Q}_{m}(a_{1})\) may be smaller than the exact value Q _{ m }(a _{1}) while \(\widetilde {Q}_{m}(a_{2})\) is larger than the exact value Q _{ m }(a _{2}). It is then quite possible that \(\widetilde {Q}_{m}(a_{1})-\widetilde {Q}_{m}(a_{2})<0\), in which case the SU selects the wrong action for spectrum sensing and access.
The reference \(\widetilde {Q}_{m}(a^{\mathcal {H}})\) fluctuates in the same direction as \(\widetilde {Q}_{m}(a)\), because both are affected by the same approximation error due to the limited number of trajectories. Take the same example, in which a _{1} actually performs better than a _{2} and Q _{ m }(a _{1})>Q _{ m }(a _{2}). If the estimate \(\widetilde {Q}_{m}(a_{1})\) is smaller than the exact value, so is \(\widetilde {Q}_{m1}(a^{\mathcal {H}})\). Similarly, if \(\widetilde {Q}_{m}(a_{2})\) is larger than the exact value, so is \(\widetilde {Q}_{m2}(a^{\mathcal {H}})\). With the differential training operation \(\widetilde {Q}_{m}(a)-\widetilde {Q}_{m}(a^{\mathcal {H}})\), the effect of the approximation error is largely cancelled. It is therefore likely that \(\widetilde {Q}_{m}(a_{1})-\widetilde {Q}_{m1}(a^{\mathcal {H}}) > \widetilde {Q}_{m}(a_{2})-\widetilde {Q}_{m2}(a^{\mathcal {H}})\), and consequently the SU chooses the better action a _{1}.
From the above discussion, the approximate Q-factor difference \(\widetilde {Q}_{m}(a)-\widetilde {Q}_{m}(a^{\mathcal {H}})\) is more robust than the approximate independent Q-factor \(\widetilde {Q}_{m}(a)\). With differential training of the rollout policy, the approximation error caused by the Monte Carlo method is substantially reduced and the proposed suboptimal spectrum sensing and access scheme performs more robustly.
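One way to make the candidate estimate and the reference estimate fluctuate together is to evaluate both on the same random trajectories. The sketch below does this by reusing the same random seed for each pair of trajectories (common random numbers). This pairing mechanism is an assumption chosen for illustration and is not claimed to be the exact procedure of [22]; `simulate` and `base_policy` are the same hypothetical interfaces as in a generic rollout simulator.

```python
import random

def rollout_return(first_action, state, simulate, base_policy, horizon, rng):
    """Total reward of one trajectory: first_action, then the base policy."""
    s, a, ret = state, first_action, 0.0
    for _ in range(horizon):
        reward, s = simulate(s, a, rng)
        ret += reward
        a = base_policy(s)
    return ret

def q_difference(action, state, simulate, base_policy, horizon, n_traj, seed=0):
    """Estimate Q(action) - Q(a_H), with a_H the base policy's own action."""
    ref_action = base_policy(state)
    total = 0.0
    for k in range(n_traj):
        # Same seed for both runs, so both see the same random draws.
        ret_a = rollout_return(action, state, simulate, base_policy,
                               horizon, random.Random(seed + k))
        ret_h = rollout_return(ref_action, state, simulate, base_policy,
                               horizon, random.Random(seed + k))
        total += ret_a - ret_h
    return total / n_traj

def differential_rollout_action(actions, state, simulate, base_policy,
                                horizon, n_traj, seed=0):
    """Pick the action with the largest estimated Q-factor difference."""
    return max(actions, key=lambda a: q_difference(
        a, state, simulate, base_policy, horizon, n_traj, seed))
```

When the noise shared by the two runs is large compared with the true gap between actions, the paired difference can recover the gap from very few trajectories, whereas two independent estimates could easily rank the actions in the wrong order.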
5 Simulation results
Simulation configuration
Parameter                               Value
Total number of channels N              10
Number of sensing channels L            3
Bandwidth per channel BW                10 MHz
Aggregation range Γ                     40 MHz
Bandwidth requirement Υ                 20 MHz
Duration of time slot T _{ p }          2 ms
Probability of false alarm P _{ f }     0.03
Probability of miss detection P _{ m }  0.08
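For readers who want to reproduce the setup, the configuration above can be captured in a small immutable record. The class and field names are assumptions; only the values come from the table. The derived property shows that the bandwidth requirement corresponds to two idle channels.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class SimConfig:
    n_channels: int = 10          # N
    n_sensed: int = 3             # L
    bw_mhz: float = 10.0          # BW
    gamma_mhz: float = 40.0       # aggregation range
    upsilon_mhz: float = 20.0     # bandwidth requirement
    slot_ms: float = 2.0          # T_p
    p_false_alarm: float = 0.03   # P_f
    p_miss: float = 0.08          # P_m

    @property
    def required_channels(self) -> int:
        """Smallest number of idle channels that meets the requirement."""
        return math.ceil(self.upsilon_mhz / self.bw_mhz)
```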
In Figure 6, as the number of sensing channels L increases, the number of channel switches decreases for the BOH, SOH, BOH-based rollout, SOH-based rollout, and optimal POMDP-based schemes. When the whole spectrum can be sensed (L/N=1), these schemes achieve their best performance, because the more channels the SU senses, the more information about the system state it obtains, and a spectrum aggregation action determined from more sensing results is better at minimizing the expected number of channel switches. For the random access scheme, which determines the access channels without considering the sensing results, the performance does not change as L increases.
When L is small, which means that only a small number of channels can be sensed, the performance of all schemes is almost the same because the system performance is limited by L. As L increases, the POMDP-based optimal scheme performs the best, and the rollout-based suboptimal schemes achieve much better performance than the base heuristics and the random scheme. In particular, the SOH-based rollout scheme achieves a performance gain over the BOH-based rollout scheme, which verifies that the choice of the base policy affects the performance of the corresponding rollout policy: the better the heuristic, the better the rollout scheme built on it. Compared with the optimal POMDP-based scheme, the rollout-based suboptimal scheme has only a slight performance loss but significantly reduces the computational complexity.
6 Conclusion
In this paper, we investigate spectrum sensing and access schemes that minimize the number of channel switches in order to achieve stable DSA, taking into consideration the practical limitations of both spectrum sensing and aggregation capability. We develop a POMDP framework for joint spectrum sensing and access. In particular, we derive the reward function by estimating the stability of different spectrum sensing and access strategies. Based on the POMDP framework, we propose a rollout-based suboptimal spectrum sensing and access scheme that approximates the value function of the POMDP. We prove that the rollout policy improves on the performance of the base heuristics. By numerical evaluation, we find that as the number of random trajectories increases, the performance of the proposed rollout-based scheme approaches the optimal performance. When the number of random trajectories is large enough, the proposed scheme performs near-optimally with lower complexity and achieves a significant improvement over the base policy. In the rollout-based schemes, the base heuristic affects the performance of its corresponding rollout policy, and the differential training method improves the robustness to the approximation error.
7 Endnotes
^{a} The transition probabilities can be estimated from the statistics of the channel availabilities in two adjacent slots and are assumed to be known by the SUs [23].
^{b} The schemes proposed in this paper can be easily extended to multiple SU cases by adopting the RTS/CTS scheme [24] for the access coordination between SUs.
^{c} Minimizing the expected number of channel switches can equivalently be treated as maximizing the throughput when the system overhead of channel switches is taken into account.
Declarations
Acknowledgements
This work was supported in part by National Natural Science Foundation of China (No. 61261130585), National Key Basic Research Program (No. 2012CB316006), National Hi-Tech R&D Program (No. 2014AA01A702), Fundamental Research Funds for the Central Universities, and Open Research Fund of State Key Laboratory of Integrated Services Networks (No. ISN1308).
References
[1] W Wang, Z Zhang, A Huang, Spectrum aggregation: Overview and challenges. Net. Protoc. Appl. 2(1), 184–196 (2010).
[2] QinetiQ Ltd, A study of the provision of aggregation of frequency to provide wider bandwidth services. Final report for Office of Communications (Ofcom) (2006). http://www.indepen.uk.com/docs/aggregation[1].pdf.
[3] F Wu, Y Mao, S Leng, et al, A carrier aggregation based resource allocation scheme for pervasive wireless networks. Proc. of IEEE DASC 2011, 196–201 (2011).
[4] H Shajaiah, A Abdel-Hadi, C Clancy, Utility proportional fairness resource allocation with carrier aggregation in 4G-LTE. Proc. of IEEE Milcom 2013, 412–417 (2013).
[5] F Liu, K Zheng, W Xiang, et al, Design and performance analysis of an energy-efficient uplink carrier aggregation scheme. IEEE J. Sel. Areas Commun. 32(2), 197–207 (2014).
[6] J Mitola, G Maguire, Cognitive radio: making software radios more personal. IEEE Pers. Commun. 6(4), 13–18 (1999).
[7] W Wang, KG Shin, W Wang, Joint spectrum allocation and power control for multihop cognitive radio networks. IEEE Trans. Mobile Comput. 10(7), 1042–1055 (2011).
[8] Q Zhao, L Tong, A Swami, Y Chen, Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: a POMDP framework. IEEE J. Sel. Areas Commun. 25(3), 589–600 (2007).
[9] AA El-Sherif, KJR Liu, Joint design of spectrum sensing and channel access in cognitive radio networks. IEEE Trans. Wireless Commun. 10(6), 1743–1753 (2011).
[10] W Wang, K Wu, H Luo, G Yu, Z Zhang, Sensing error aware delay-optimal channel allocation scheme for cognitive radio networks. Telecommun. Sys. 52(4), 1895–1904 (2013).
[11] J Park, P Pawelczak, D Cabric, Performance of joint spectrum sensing and MAC algorithms for multichannel opportunistic spectrum access ad hoc networks. IEEE Trans. Mobile Comput. 10(7), 1011–1027 (2011).
[12] F Huang, W Wang, H Luo, G Yu, Z Zhang, Prediction-based spectrum aggregation with hardware limitation in cognitive radio networks. Proc. of IEEE VTC 2010 (2010).
[13] L Wu, W Wang, Z Zhang, L Chen, A POMDP-based optimal spectrum sensing and access scheme for cognitive radio networks with hardware limitation. Proc. of IEEE WCNC 2012 (2012).
[14] GE Monahan, A survey of partially observable Markov decision processes: Theory, models, and algorithms. Manage. Sci. 28(1), 1–16 (1982).
[15] DP Bertsekas, JN Tsitsiklis, Neuro-dynamic programming: an overview. Proc. of IEEE CDC 1995 (1995).
[16] R Smallwood, E Sondik, The optimal control of partially observable Markov processes over a finite horizon. Oper. Res, 1071–1088 (1971).
[17] BV Gnedenko, AN Kolmogorov, Limit Distributions for Sums of Independent Random Variables (Addison-Wesley, 1954).
[18] DP Bertsekas, JN Tsitsiklis, C Wu, Rollout algorithms for combinatorial optimization. J. Heuristics. 3(2), 245–262 (1997).
[19] DP Bertsekas, DA Castanon, Rollout algorithms for stochastic scheduling problems. J. Heuristics. 5(1), 89–108 (1998).
[20] L Wu, W Wang, Z Zhang, L Chen, A rollout-based joint spectrum sensing and access policy for cognitive radio networks with hardware limitation. Proc. of IEEE Globecom 2012 (2012).
[21] G Tesauro, GR Galperin, On-line policy improvement using Monte Carlo search. Proc. of Neural Inf. Process. Syst. Conf. (1996).
[22] DP Bertsekas, Differential training of rollout policies. Proc. of Allerton Conference on Communication, Control, and Computing (1997).
[23] H Kim, KG Shin, Efficient discovery of spectrum opportunities with MAC-layer sensing in cognitive radio networks. IEEE Trans. Mobile Comput. 7, 533–545 (2008).
[24] G Bianchi, Performance analysis of the IEEE 802.11 distributed coordination function. IEEE J. Sel. Areas Commun. 18(3), 535–547 (2000).
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.