MDP-based handover policy in wireless relay systems

Dang, Xiaoyu; Wang, Jin-Yuan; Cao, Zhe

doi:10.1186/1687-1499-2012-358

Research
Open access
Published: 29 November 2012

EURASIP Journal on Wireless Communications and Networking volume 2012, Article number: 358 (2012)

Abstract

Wireless relay transmission has been considered as a supplementary technology for future wireless communication systems, and handover is a key element in wireless relay transmission for supporting user mobility. A handover failure results in the forced termination of an ongoing call, so the handover scheme has a direct impact on system performance. This article focuses on the handover decision problem in wireless relay transmission systems. First, the architecture of a single-cell relay system is introduced. The handover decision problem is then formulated as a Markov decision process (MDP). In the proposed model, a profit function is used to evaluate the quality of service (QoS) of the chosen serving node, and a cost function is used to model the processing load and signaling cost. The reward function is formulated from the profit and cost functions, and the objective is to maximize the expected total reward per connection. The value iteration algorithm is employed to determine the optimal handover policy. Furthermore, the analysis is generalized to multi-cell relay systems. Numerical results show that the proposed MDP-based handover policy outperforms the nearest distance and biggest channel gain handover policies in wireless relay systems.

1 Introduction

Recently, wireless communications have been evolving rapidly from code division multiple access to orthogonal frequency division multiple access (OFDMA) techniques [1–3]. With the rapid growth of user demands, new-generation wireless communication systems are expected to provide high data rates and wide coverage. However, owing to scarce spectrum resources, the spectral efficiency of new-generation wireless communication systems is very limited [4]. Relay transmission, which deploys multiple relay nodes between a source node and a destination node, has attracted much attention in recent years. Owing to its great potential to improve spectral efficiency and coverage, relay transmission has been considered as a supplementary technology for new-generation wireless communication systems. As a result, the IEEE 802.11 and IEEE 802.16 task groups are actively working on the standardization of relay transmission protocols [5, 6].

The basic idea of wireless relay transmission was introduced by van der Meulen [7] and was comprehensively studied by Cover and El Gamal [8]. Since then, wireless relay transmission has been investigated from various aspects, including information-theoretic capacity [9, 10], diversity [11], outage performance [12], network coding [13], and power allocation [14, 15]. Handover [16, 17], which provides seamless mobility, is also an important issue in wireless relay systems. In conventional cellular systems, handover occurs only when a mobile station (MS) moves to a different cell or to a different sector of the same cell. However, the introduction of relay stations (RSs) into cellular systems creates additional handover scenarios. Figure 1 depicts the various handover scenarios in wireless relay systems [18]. In scenario 1, an MS performs handover between two different RSs in the same cell. In scenario 2, an MS switches its serving node from a base station (BS) to an RS of the same cell, or vice versa. Scenario 3 is identical to the inter-BS handover in conventional cellular systems. In scenario 4, an MS performs handover from an RS to an RS of a different cell. In scenario 5, an MS moves from a BS to an RS of a different cell. Each mobile connection may experience a number of handovers during its lifetime, so an efficient handover decision policy is beneficial to system performance. Therefore, handover is an important problem in wireless relay transmission systems. Handover schemes for wireless relay transmission systems have recently been defined in the baseline document for the draft standard IEEE 802.16j [6].

Handover policies for wireless communications have been reported in the literature. Stemm and Katz [19] proposed a vertical handover scheme in which handover decisions depend only on the presence or absence of beacon packets. A series of studies on handover based on the received signal strength have also been carried out. In [20], handover algorithms based on the received signal strength were reviewed, and advanced techniques (such as hypothesis testing, dynamic programming, and pattern recognition based on neural networks or fuzzy logic) were discussed. An adaptive timer handoff (ATHO) algorithm was proposed in [21], which uses the received signal strength together with a hysteresis timer to mitigate the ping-pong effect in mobile communication systems. In [22], a practical approach based on GSM measurement data was employed, and a handover algorithm was proposed to improve handover performance. Jiang et al. [23] proposed a novel scheme that uses uplink and downlink signals for intra multi-hop relay BS handover with transparent RSs. In [24], an optimized handover scheme for mobile users was proposed to minimize delay and maximize throughput by employing a dwell timer. Although all the above works have investigated handover problems in wireless systems, they use only link quality conditions as the handover decision criteria and consider neither the quality of service (QoS) nor the processing load during handovers. To the best of the authors’ knowledge, few previous studies take QoS and processing load together as handover criteria in wireless relay systems.

In this article, we study the handover policy in wireless relay systems. First, a single-cell relay transmission system is introduced, in which adaptive modulation and coding schemes are employed to improve system performance, and the available transmission rates of both the direct and relay transmission links are analyzed. Because the MSs are mobile, a handover decision problem arises, which is formulated as a Markov decision process (MDP). A profit function is used to evaluate the QoS of the chosen serving node; it represents the benefit that the MS gains by choosing a serving node (i.e., the BS or an RS). A cost function is also considered, which captures the processing and signaling load incurred when the connection switches from the current serving node to another. The reward function is formulated from the profit and cost functions, and the objective of the handover problem is to maximize the expected total reward per connection. An optimal MDP-based handover decision policy is obtained by employing the value iteration algorithm (VIA). Finally, the proposed MDP-based handover policy is generalized to multi-cell relay systems.

The remainder of this article is organized as follows. The system model is described in Section 2. Section 3 formulates the handover decision process as an MDP and presents the optimality equations and the VIA. Extensions to multi-cell systems are discussed in Section 4. Numerical results are presented in Section 5, and conclusions are drawn in Section 6.

2 System model

Consider the downlink transmission of a single-cell relay system composed of one BS, M RSs, and multiple MSs, as shown in Figure 2. The BS can communicate with an MS either directly or with the help of one RS; accordingly, the transmissions are classified into direct transmission and relay transmission. The transmit power of the BS (RS) is equally distributed among its served MSs. Each MS is assumed to operate on a pre-assigned orthogonal channel, such as a non-overlapping time/frequency slot. This assumption holds for most practical systems, such as time division multiple access, frequency division multiple access (FDMA), and OFDMA systems [25]. Therefore, there is no inter-channel interference among the MSs. To maximize system performance, adaptive modulation and coding schemes are employed. Assume that the BS and RSs can transmit data to an MS at W − 1 nonzero fixed transmission rates, denoted by R⁽¹⁾,…,R^(W−1), where, without loss of generality, R^(W−1) > ⋯ > R⁽¹⁾ > 0. If zero is regarded as a special transmission rate R⁽⁰⁾ = 0, the set of downlink transmission rates can be denoted by $Λ = \{R^{(0)}, R^{(1)}, \dots, R^{(W-1)}\}$. To guarantee reliable reception at a given downlink transmission rate, the received signal-to-noise ratio (SNR) must be no less than the corresponding target SNR. For a downlink transmission rate R^(ω) (ω = 0,1,…,W−1), the corresponding target SNR is denoted by γ^(ω). Clearly, these target SNRs should satisfy

0 = γ^{(0)} < γ^{(1)} < \dots < γ^{(W - 1)} < γ^{(W)} ≜ \infty

(1)

When the received SNR satisfies γ ∈ [γ^(ω), γ^(ω+1)), the transmission rate can be at most R^(ω). Mathematically, given a received SNR γ, the transmission rate r can be obtained by

r = ξ (γ) ≜ \sum_{ω = 0}^{W - 1} R^{(ω)} [U (γ - γ^{(ω)}) - U (γ - γ^{(ω + 1)})]

(2)

where U(t−t₀) is the Heaviside unit step function defined by

U (t - t_{0}) = \begin{cases} 1, & \text{for } t \geq t_{0} \\ 0, & \text{for } t < t_{0} \end{cases}

(3)

Note that the relationship between the received SNR γ and the transmission rate r can be seen in Figure 3.
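The staircase mapping of Equations (1) to (3) can be sketched in Python as a literal sum of unit-step differences. This is a minimal sketch; the function names and the example rate and threshold lists are hypothetical illustrations, not the values of Table 1.

```python
import math


def unit_step(t: float, t0: float) -> int:
    """Heaviside step U(t - t0) of Equation (3): 1 if t >= t0, else 0."""
    return 1 if t >= t0 else 0


def rate_from_snr(gamma: float, rates: list[float], thresholds: list[float]) -> float:
    """Equation (2): xi(gamma) = sum_w R^(w) [U(gamma - g^(w)) - U(gamma - g^(w+1))].

    rates      -- [R^(0) = 0, R^(1), ..., R^(W-1)], strictly increasing
    thresholds -- [g^(0) = 0, g^(1), ..., g^(W-1)], strictly increasing, linear scale
    The sentinel g^(W) = infinity from Equation (1) is appended here.
    """
    g = list(thresholds) + [math.inf]
    return sum(
        rates[w] * (unit_step(gamma, g[w]) - unit_step(gamma, g[w + 1]))
        for w in range(len(rates))
    )


# Hypothetical 4-level example (W = 4): exactly one bracket [g^(w), g^(w+1)) is active.
RATES = [0.0, 1.0, 2.0, 3.0]
THRESHOLDS = [0.0, 2.0, 5.0, 10.0]
for snr in (1.0, 3.0, 5.0, 50.0):
    print(snr, rate_from_snr(snr, RATES, THRESHOLDS))
```

Because the thresholds are strictly increasing, at most one difference U(γ − γ^(ω)) − U(γ − γ^(ω+1)) equals 1, so the sum simply picks out the rate of the active SNR bracket.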

For the direct transmission case, the MS receives the signal directly from the BS. The received downlink signal y₀ can be expressed as

y_{0} = \sqrt{\frac{P_{0}}{N_{0}} G_{0}} x_{0} + z_{0}

(4)

where P₀ denotes the total transmit power of the BS, N₀ is the number of MSs served by the BS, x₀ is the unit-energy information symbol transmitted by the BS, G₀ is the channel gain between the BS and the MS, and z₀ is the additive white Gaussian noise. Without loss of generality, we assume that the noise power is the same for all links and is denoted by σ². Then, the received SNR of the direct link, γ₀, can be expressed as

γ_{0} = \frac{P_{0} G_{0}}{N_{0} σ^{2}}

(5)

Therefore, in this case, the maximum transmission rate r₀ can be obtained by

r_{0} = ξ (γ_{0})

(6)

For the relay transmission case, we concentrate on the BS–RS m–MS link, where the transmission consists of two phases. In phase 1, the BS transmits the information to RS m, which is selected to relay it. The received signal at RS m can be expressed as

y_{m}^{(1)} = \sqrt{\frac{P_{0}}{N_{0}^{(1)}} G_{m}^{(1)}} x_{m}^{(1)} + z_{m}^{(1)}

(7)

where $N_{0}^{(1)}$ is the number of RSs served by the BS in phase 1, $G_{m}^{(1)}$ is the channel gain between the BS and RS m, and $z_{m}^{(1)}$ is the additive white Gaussian noise received at RS m. Therefore, the received SNR at RS m can be derived as

γ_{m}^{(1)} = \frac{P_{0} G_{m}^{(1)}}{N_{0}^{(1)} σ^{2}}

(8)

Therefore, the transmission rate $r_{m}^{(1)}$ on the link between the BS and RS m can be obtained by

r_{m}^{(1)} = ξ (γ_{m}^{(1)})

(9)

In phase 2, the amplify-and-forward relay strategy is employed [26]; that is, RS m amplifies $y_{m}^{(1)}$ to the transmit power $P_{m} /N_{m}^{(2)}$ and forwards it to the MS, where P_m is the total transmit power of RS m and $N_{m}^{(2)}$ is the number of MSs served by RS m in phase 2. Accordingly, the signal received by the MS can be expressed as

y_{m}^{(2)} = \sqrt{\frac{P_{m}}{N_{m}^{(2)}} G_{m}^{(2)}} x_{m}^{(2)} + z_{m}^{(2)}

(10)

where

x_{m}^{(2)} = \frac{y_{m}^{(1)}}{|y_{m}^{(1)}|}

(11)

is the normalized (unit-energy) version of the signal that RS m received from the BS in phase 1, $G_{m}^{(2)}$ is the channel gain between RS m and the MS, and $z_{m}^{(2)}$ is the noise received at the MS. Then, the received SNR at the MS in this phase can be expressed as

γ_{m}^{(2)} = \frac{P_{m} G_{m}^{(2)}}{N_{m}^{(2)} σ^{2}}

(12)

Accordingly, the transmission rate $r_{m}^{(2)}$ in phase 2 can be obtained by

r_{m}^{(2)} = ξ (γ_{m}^{(2)})

(13)

Finally, the available transmission rate on the BS–RSm–MS link can be determined as

r_{m} = \frac{1}{2} min (r_{m}^{(1)}, r_{m}^{(2)})

(14)

where the coefficient 1/2 is due to the fact that the two-phase relay transmission uses half of the resources (e.g., time slots, frequency bands, or orthogonal codes) for each phase.
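Equations (5), (8), (12), and (14) combine into a direct-versus-relay rate comparison. The sketch below assumes normalized linear-scale powers, gains, and noise together with a hypothetical 4-level rate table; `xi` locates the SNR bracket with `bisect`, which is equivalent to the step-function sum in Equation (2). All function names are illustrative.

```python
import bisect

# Hypothetical 4-level rate table (not Table 1): R^(w) and target SNRs g^(w), linear scale.
RATES = [0.0, 1.0, 2.0, 3.0]
THRESHOLDS = [0.0, 2.0, 5.0, 10.0]


def xi(gamma: float) -> float:
    """Equation (2): the largest R^(w) with g^(w) <= gamma (thresholds[0] = 0)."""
    w = bisect.bisect_right(THRESHOLDS, gamma) - 1
    return RATES[w]


def snr(power: float, n_served: int, gain: float, noise: float) -> float:
    """Equations (5), (8), (12): (P / N) * G / sigma^2 with equal power sharing."""
    return power * gain / (n_served * noise)


def direct_rate(p0: float, n0: int, g0: float, noise: float) -> float:
    """Equation (6): r_0 = xi(gamma_0)."""
    return xi(snr(p0, n0, g0, noise))


def relay_rate(p0, n0_phase1, g_bs_rs, pm, nm_phase2, g_rs_ms, noise) -> float:
    """Equation (14): r_m = min(r_m^(1), r_m^(2)) / 2 for the two-phase BS-RS m-MS link."""
    r1 = xi(snr(p0, n0_phase1, g_bs_rs, noise))   # Equation (9): BS -> RS m
    r2 = xi(snr(pm, nm_phase2, g_rs_ms, noise))   # Equation (13): RS m -> MS
    return 0.5 * min(r1, r2)


# An MS far from the BS but close to RS m (all numbers hypothetical):
print(direct_rate(p0=10.0, n0=5, g0=0.5, noise=1.0))   # SNR 1.0 -> rate 0.0
print(relay_rate(10.0, 2, 3.0, 1.0, 1, 6.0, 1.0))       # SNRs 15 and 6 -> 0.5 * min(3, 2) = 1.0
```

In this toy case the relay link still wins despite the factor 1/2, which is exactly the kind of trade-off the handover decision in the next section has to resolve.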

Because the MSs are mobile, each MS continuously monitors the whole system and can choose either to communicate directly with the BS or to communicate with the help of one RS, which gives rise to the handover decision problem. The following section solves this handover problem using an MDP.

3 MDP-based handover decision

MDPs, also referred to as stochastic dynamic programs or stochastic control problems, are models for sequential decision making when outcomes are uncertain. An MDP model consists of decision epochs, states, actions, rewards, and transition probabilities [27]. Choosing an action in a state generates a reward and determines the state at the next decision epoch through a transition probability function. Policies (or strategies) prescribe which action to choose under any eventuality at every future decision epoch, and decision makers seek policies that are optimal in some sense. MDPs have been successfully used to solve a variety of problems, including finance [28], admission control [29], and mobility-related issues in mobile communications and wireless sensor networks [30, 31]. Therefore, an MDP is a promising candidate for solving handover control problems in wireless relay systems.

In the following, we describe how the handover decision problem is formulated as an MDP, and then introduce the optimality equations and the VIA. For ease of description, the BS and all RSs are collectively called serving nodes. Specifically, serving node 0 refers to the BS and serving node m (m = 1,…,M) refers to RS m.

3.1 Decision epoch, action, and state

Referring to Figure 4, the sequence T = {1,2,…,Q} represents the times of successive decision epochs, where the random variable Q denotes the time at which the connection terminates. At each decision epoch, the MS has to decide, based on its current state, whether the connection should keep the current serving node or switch to another serving node. The state space of an MS is denoted by S, and the number of states that an MS can possibly be in is finite. The state of the MS contains the current serving node to which the MS is connected and the available transmission rates offered by all serving nodes. Specifically, the state space can be expressed as

S = \{0, 1, 2, \dots, M\} \times R_{0} \times R_{1} \times R_{2} \times \dots \times R_{M}

(15)

where “×” denotes the Cartesian product, {0,1,2,…,M} is the set of possible current serving nodes of the MS, R₀ denotes the set of available transmission rates on the direct transmission link, and R_m (m = 1,…,M) denotes the set of available transmission rates on the BS–RS m–MS link. Clearly, R₀ = Λ. Furthermore, according to Equation (14), R_m can be derived as

R_{m} = \{r| 2 r \in Λ\}, \forall m \in \{1, 2, \dots, M\}

(16)
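Equations (15) and (16) can be checked by enumerating a small state space directly. This sketch assumes a toy rate set Λ = {0, 1, 2} (hypothetical) and uses `Fraction` so that the halved relay rates of Equation (16) stay exact; since every R_m has as many elements as Λ, |S| = (M + 1)|Λ|^(M + 1).

```python
from fractions import Fraction
from itertools import product


def state_space(num_rs: int, rate_set: list[int]) -> list[tuple]:
    """Enumerate S = {0..M} x R_0 x R_1 x ... x R_M (Equation (15))."""
    nodes = range(num_rs + 1)                            # serving node 0 is the BS
    direct_rates = [Fraction(r) for r in rate_set]       # R_0 = Lambda
    relay_rates = [Fraction(r) / 2 for r in rate_set]    # R_m = {r | 2r in Lambda}, Eq. (16)
    return list(product(nodes, direct_rates, *([relay_rates] * num_rs)))


def state_space_size(num_rs: int, num_rates: int) -> int:
    """|S| = (M + 1) * |Lambda|^(M + 1), since every rate set has |Lambda| elements."""
    return (num_rs + 1) * num_rates ** (num_rs + 1)


toy = state_space(num_rs=2, rate_set=[0, 1, 2])
print(len(toy), state_space_size(2, 3))   # 81 81
print(state_space_size(4, 8))             # 163840 for four RSs and the eight rates of Eq. (36)
```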

Based on the current state, the MS has to choose an action at each decision epoch. Since the MS can select its serving node only from the BS and the RSs, the action set is defined as $A ≜ \{0, 1, 2, \dots, M\}$, i.e., A consists of all serving nodes to which the MS can hand over.

3.2 Transition probability

Let the vector s(t) = [i, r₀, r₁, …, r_M] denote the state of the MS at the t-th decision epoch, where i denotes the currently chosen serving node, r₀ denotes the current available transmission rate on the direct transmission link, and r_m (m = 1,…,M) denotes the current available transmission rate on the BS–RS m–MS link. Given the current state s(t) and the chosen action a(t), the probability that the next state is $s (t + 1) = [j, r_{0}^{'}, r_{1}^{'}, \dots, r_{M}^{'}]$ is given by

P [s (t + 1) | s (t), a (t)] = \begin{cases} \prod_{m = 0}^{M} P [r_{m}^{'} | r_{m}, a (t)], & \text{if } a (t) = j \\ 0, & \text{otherwise} \end{cases}

(17)

where $P [r_{m}^{'} |r_{m}, a (t)]$ denotes the transition probability of the transmission rate of serving node m. Although all serving nodes coexist in the system, the transmission rate of each serving node depends only on the number of users supported by that node and the channel gain between the node and the MS. Therefore, the transition probabilities of the transmission rates of the serving nodes in Equation (17) are assumed to be mutually independent. In particular, if the transmission rates are guaranteed for the duration of the connection, the transition probability $P [r_{m}^{'} |r_{m}, a (t)]$ reduces to

P [r_{m}^{'} | r_{m}, a (t)] = \begin{cases} 1, & \text{if } r_{m}^{'} = r_{m} \\ 0, & \text{otherwise} \end{cases}

(18)

Note that the functions (17) and (18) are Markovian because the state transition probability only depends on the current state and action but not on the previous states.
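Equation (17) factorizes into per-node rate transitions once the chosen action fixes the next serving node. The sketch below is a minimal illustration, assuming that each per-node rate kernel is supplied as a callable `rate_kernel(m, r, r_next, action)` (a hypothetical interface); `static_kernel` implements the special case of Equation (18).

```python
from typing import Callable, Sequence

Kernel = Callable[[int, float, float, int], float]


def transition_prob(state: Sequence, action: int, next_state: Sequence,
                    rate_kernel: Kernel) -> float:
    """Equation (17): P[s(t+1) | s(t), a(t)].

    state = (i, r_0, ..., r_M) and next_state = (j, r_0', ..., r_M').
    The next serving node must equal the action; the per-node rate
    transitions are assumed independent, so their probabilities multiply.
    """
    if next_state[0] != action:
        return 0.0
    prob = 1.0
    for m, (r, r_next) in enumerate(zip(state[1:], next_state[1:])):
        prob *= rate_kernel(m, r, r_next, action)
    return prob


def static_kernel(m: int, r: float, r_next: float, action: int) -> float:
    """Equation (18): rates are guaranteed for the whole connection."""
    return 1.0 if r_next == r else 0.0


s = (0, 1.0, 2.0)                                            # on the BS; RS 1 offers a higher rate
print(transition_prob(s, 1, (1, 1.0, 2.0), static_kernel))  # 1.0: hand over to RS 1
print(transition_prob(s, 1, (0, 1.0, 2.0), static_kernel))  # 0.0: next node must be the action
```

The product form depends only on the current state and action, which is the Markov property noted above.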

3.3 Reward function

Let z(s(t), a(t)) denote the reward that the MS receives after the t-th decision epoch. The reward function is defined as

z (s (t), a (t)) = f (s (t), a (t)) - g (s (t), a (t))

(19)

where f(s(t), a(t)) is the profit function, which reflects the QoS provided by the chosen serving node at the t-th decision epoch, and g(s(t), a(t)) is the cost function, which captures the processing and signaling load incurred when the connection switches from the current serving node to another.

Given the current state s(t) = [i, r₀, r₁, …, r_M], where i denotes the current serving node of the connection and r_m denotes the available rate provided by serving node m, the MS chooses an action a(t) ∈ A at each decision epoch. The profit function is defined as

f (s (t), a (t)) = \begin{cases} 1, & \text{if } r_{i} = \max_{k \in \{0, 1, \dots, M\}} \{r_{k}\} \text{ and } a (t) = i, \\ 0, & \text{if } r_{i} = \max_{k \in \{0, 1, \dots, M\}} \{r_{k}\} \text{ and } a (t) \neq i, \\ \dfrac{r_{a (t)} - r_{i}}{\max_{k \in \{0, 1, \dots, M\}} \{r_{k} - r_{i}\}}, & \text{if } r_{i} \neq \max_{k \in \{0, 1, \dots, M\}} \{r_{k}\} \text{ and } r_{a (t)} > r_{i}, \\ 0, & \text{if } r_{i} \neq \max_{k \in \{0, 1, \dots, M\}} \{r_{k}\} \text{ and } r_{a (t)} \leq r_{i} \end{cases}

(20)

The profit function (in terms of transmission rate) can be interpreted as follows. Suppose that the MS is currently connected to serving node i. If serving node i supports the highest rate among all serving nodes, the profit is 1 when the MS keeps node i (a(t) = i) and 0 otherwise. If serving node i does not support the highest rate, the profit is a fraction whose numerator is the MS’s actual rate increase from choosing action a(t) in state s(t) and whose denominator is the MS’s maximum possible rate increase; the profit is 0 if the chosen node offers no rate increase.
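A direct transcription of Equation (20) may help when checking its four cases. This is a sketch, assuming the state is stored as a tuple (i, r_0, …, r_M) in which the rate of node k sits at position k + 1; the function name `profit` is illustrative.

```python
from typing import Sequence


def profit(state: Sequence[float], action: int) -> float:
    """Equation (20): profit f(s, a) of choosing serving node `action`.

    state = (i, r_0, r_1, ..., r_M); rates[k] is the rate offered by node k.
    """
    i = int(state[0])
    rates = state[1:]
    r_max = max(rates)
    r_i = rates[i]
    if r_i == r_max:                      # current node already offers the best rate
        return 1.0 if action == i else 0.0
    r_a = rates[action]
    if r_a > r_i:                         # actual gain over the maximum possible gain
        return (r_a - r_i) / (r_max - r_i)
    return 0.0                            # no rate improvement


s = (0, 1.0, 2.0, 3.0)                    # on the BS (rate 1); RS 2 offers the maximum rate 3
print([profit(s, a) for a in range(3)])   # [0.0, 0.5, 1.0]
```

The profit is always in [0, 1], so it is directly comparable with switching costs expressed on the same scale.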

The cost function g (s (t),a (t)) is defined as

g (s (t), a (t)) = \begin{cases} K_{i, a (t)}, & \text{if } a (t) \neq i \\ 0, & \text{if } a (t) = i \end{cases}

(21)

Equation (21) shows that when the new serving node a(t) is the same as the current serving node i (i.e., a(t) = i), no handover occurs and the cost is zero. When the new serving node a(t) differs from the current serving node i (i.e., a(t) ≠ i), a handover occurs and the cost is K_{i,a(t)}, the switching cost from the current serving node i to the new serving node a(t). The value of K_{i,a(t)} depends on several factors, such as the type of handover and the current traffic load on the serving node.
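Equations (19) and (21) can be sketched together. The switching-cost matrix `K` and the constant placeholder profit in the example are hypothetical; in practice the profit would come from Equation (20).

```python
from typing import Callable, Sequence

ProfitFn = Callable[[Sequence[float], int], float]


def cost(state: Sequence[float], action: int, K: list[list[float]]) -> float:
    """Equation (21): K[i][a] if the MS switches from node i to node a, else 0."""
    i = int(state[0])
    return 0.0 if action == i else K[i][action]


def reward(state: Sequence[float], action: int, K: list[list[float]],
           profit: ProfitFn) -> float:
    """Equation (19): z(s, a) = f(s, a) - g(s, a)."""
    return profit(state, action) - cost(state, action, K)


def flat_profit(state: Sequence[float], action: int) -> float:
    """Hypothetical placeholder for Equation (20)."""
    return 0.5


# Hypothetical 3-node example (BS + 2 RSs) with a uniform switching cost of 0.25.
K = [[0.0 if i == j else 0.25 for j in range(3)] for i in range(3)]
print(reward((0, 1.0, 2.0, 3.0), 0, K, flat_profit))   # 0.5: stay, no cost
print(reward((0, 1.0, 2.0, 3.0), 2, K, flat_profit))   # 0.25: switch, pay K[0][2]
```

Keeping K as a full matrix lets an operator assign different costs per node pair, e.g., larger entries toward an overloaded node.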

3.4 Expected total reward

A decision rule prescribes the action selection in each state at a specified decision epoch. Deterministic Markov decision rules are functions δ(t): S → A, which specify the action choice a(t) when the system occupies state s(t) at the t-th decision epoch. A policy Π = (δ(1), δ(2), …, δ(Q)) is a sequence of decision rules to be used at all decision epochs.

Let v^Π(s(0)) denote the expected total reward from the first decision epoch until the handover decision period elapses, given that the policy Π is used with an initial state s(0). We have

v^{Π} (s (0)) = E_{s (0)}^{Π} [E_{Q} \{\sum_{t = 1}^{Q} z (s (t), a (t))\}]

(22)

where $E_{s (0)}^{Π}$ denotes the expectation with respect to policy Π and initial state s(0), and E_Q denotes the expectation with respect to the random variable Q. Note that a different policy Π or initial state s(0) changes the chosen actions a(t), which in turn changes the transition probability function P[s(t + 1) | s(t), a(t)] used in the expectation $E_{s (0)}^{Π}$. The random variable Q, which denotes the connection termination time, is assumed to be geometrically distributed with mean 1/(1−λ) [32], that is,

P (Q = q) = λ^{q - 1} (1 - λ), \quad \forall q = 1, 2, 3, \dots

(23)

Therefore, Equation (22) can be further written as

v^{Π} (s (0)) = E_{s (0)}^{Π} \{\sum_{q = 1}^{\infty} \sum_{t = 1}^{q} z (s (t), a (t)) λ^{q - 1} (1 - λ)\}

(24)

Since $\sum_{q = 1}^{\infty} \sum_{t = 1}^{q} (\cdot) = \sum_{t = 1}^{\infty} \sum_{q = t}^{\infty} (\cdot)$ , by interchanging the order of the summation, we have

\begin{array}{l} v^{Π} (s (0)) = E_{s (0)}^{Π} \{\sum_{t = 1}^{\infty} \sum_{q = t}^{\infty} z (s (t), a (t)) λ^{q - 1} (1 - λ)\} \\ = E_{s (0)}^{Π} \{\sum_{t = 1}^{\infty} λ^{t - 1} z (s (t), a (t))\} \end{array}

(25)

where λ∈[0,1) is the discount factor of the model.
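The step from Equation (24) to Equation (25) can also be checked numerically: averaging the undiscounted total reward over a geometrically distributed lifetime Q gives the same value as the λ-discounted sum. The reward stream and λ = 0.9 below are hypothetical, and both infinite sums are truncated at a horizon where λ^t is negligible.

```python
def expected_total_over_lifetime(z, lam: float, horizon: int) -> float:
    """Equation (24): sum_q P(Q = q) * sum_{t<=q} z(t), with P(Q = q) = lam^(q-1) (1 - lam)."""
    total, partial = 0.0, 0.0
    for q in range(1, horizon + 1):
        partial += z(q)                                  # sum_{t=1}^{q} z(t)
        total += partial * lam ** (q - 1) * (1 - lam)
    return total


def discounted_total(z, lam: float, horizon: int) -> float:
    """Equation (25): sum_t lam^(t-1) z(t)."""
    return sum(lam ** (t - 1) * z(t) for t in range(1, horizon + 1))


def z(t: int) -> float:
    """Hypothetical reward stream: a handover cost every third epoch."""
    return -0.25 if t % 3 == 0 else 1.0


lam = 0.9
print(expected_total_over_lifetime(z, lam, 2000))
print(discounted_total(z, lam, 2000))
# With z(t) = 1, both equal the mean connection lifetime E[Q] = 1 / (1 - lam) = 10.
```

The two printed values agree to floating-point precision, which is the interchange of summations used in Equation (25).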

Our optimization problem is to maximize the expected total discounted reward. A policy Π^∗ is defined to be optimal in π, the set of all policies, if $v^{Π^{*}} \geq v^{Π}$ for all Π ∈ π. A policy is said to be stationary if δ(t) = δ for all t; a stationary policy has the form Π = (δ, δ, …, δ) and, for convenience, is denoted by δ. Our objective is to determine an optimal stationary deterministic policy δ^∗ that maximizes the expected total discounted reward given by Equation (25).

3.5 Optimality equations and the VIA

In this section, the optimality equations for maximizing the expected total reward are presented, and a VIA is then used to determine a stationary optimal handover policy.

Let v(s) denote the maximum expected total reward, given the initial state s. That is

v (s) = max_{Π \in π} v^{Π} (s)

(26)

Referring to [27], the optimality equations can be written as

v (s) = max_{a \in A} \{z (s, a) + \sum_{s^{'} \in S} λ P [s^{'}| s, a] v (s^{'})\}

(27)

where s′ denotes the next state when action a is selected in state s. The solutions of the optimality equations give the maximum expected total reward v(s) and the MDP optimal policy δ^∗(s), which indicates which serving node to choose given that the current state is s.

The VIA [27, 32] is employed in this article to determine a stationary deterministic optimal policy and the corresponding expected total reward. The steps of the VIA are as follows:

Step (1): Set v⁰(s) = 0 for each state s ∈ S. Specify ε > 0 and set k = 0.
Step (2): For each state s ∈ S, compute $v^{k + 1} (s) = \max_{a \in A} \{z (s, a) + \sum_{s^{'} \in S} λ P [s^{'} | s, a] v^{k} (s^{'})\}$.
Step (3): If ∥v^{k + 1} − v^k∥ < ε(1 − λ)/(2λ), go to Step (4). Otherwise, set k = k + 1 and return to Step (2).
Step (4): For each state s ∈ S, compute the stationary optimal policy $δ (s) \in \arg\max_{a \in A} \{z (s, a) + \sum_{s^{'} \in S} λ P [s^{'} | s, a] v^{k + 1} (s^{'})\}$.
Step (5): Stop.

In the VIA, the norm ∥·∥ is defined as ∥v∥ = max_{s ∈ S} |v(s)|. The VIA converges because the operation in Step (2) is a contraction mapping with modulus λ. Each iteration of the VIA requires O(|A||S|²) operations, and the iterates converge linearly at rate λ.
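Steps (1) to (5) translate almost line by line into code. The sketch below is a generic value iteration over dictionaries, assuming the reward and transition functions are supplied as callables (a hypothetical interface). The toy example hard-codes Equations (20) and (21) for a BS with rate 1, one RS with rate 2, fixed rates as in Equation (18), and a hypothetical switching cost of 0.25; because the rates never change, each state is identified by its serving node alone.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable


def value_iteration(states: List[State], actions: List[int],
                    reward: Callable[[State, int], float],
                    trans: Callable[[State, int], Dict[State, float]],
                    lam: float, eps: float = 1e-6) -> Tuple[Dict, Dict]:
    """VIA of Section 3.5: returns (v, delta), delta a stationary deterministic policy."""
    def q_value(v, s, a):
        # z(s, a) + sum_{s'} lam * P[s' | s, a] * v(s')
        return reward(s, a) + lam * sum(p * v[s2] for s2, p in trans(s, a).items())

    v = {s: 0.0 for s in states}                                             # Step (1)
    threshold = eps * (1 - lam) / (2 * lam) if lam > 0 else math.inf
    while True:
        v_new = {s: max(q_value(v, s, a) for a in actions) for s in states}  # Step (2)
        gap = max(abs(v_new[s] - v[s]) for s in states)                      # sup norm
        v = v_new
        if gap < threshold:                                                  # Step (3)
            break
    delta = {s: max(actions, key=lambda a: q_value(v, s, a)) for s in states}  # Step (4)
    return v, delta                                                          # Step (5)


RATES = (1.0, 2.0)   # r_0 (BS) and r_1 (RS 1), fixed over the connection
K = 0.25             # hypothetical switching cost between the two nodes


def toy_reward(s: int, a: int) -> float:
    r_i, r_a, r_max = RATES[s], RATES[a], max(RATES)
    if r_i == r_max:
        f = 1.0 if a == s else 0.0
    else:
        f = (r_a - r_i) / (r_max - r_i) if r_a > r_i else 0.0
    return f - (K if a != s else 0.0)


def toy_trans(s: int, a: int) -> Dict[int, float]:
    return {a: 1.0}  # the chosen node becomes the serving node; rates stay fixed


v, delta = value_iteration([0, 1], [0, 1], toy_reward, toy_trans, lam=0.9)
print(delta)   # {0: 1, 1: 1}: hand over to the RS once and stay there
```

Here v(1) = 1/(1 − 0.9) = 10 and v(0) = 0.75 + 0.9 × 10 = 9.75, so paying the switching cost once is worthwhile.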

A stationary deterministic optimal policy table can be created with this algorithm. The MSs are assumed to receive information from the serving nodes periodically; the information advertised by each serving node may include, among other parameters, the achievable transmission rate and the handover cost. In each period, an MS decides whether the connection should keep the current serving node or be rerouted to another serving node according to the stationary deterministic optimal policy table.

4 Extensions to multiple cell systems

The proposed MDP-based handover policy can be generalized to multi-cell systems. Assume that there are E cells in the system and that each cell has one BS and N RSs. Let B_e (e = 1,2,…,E) denote the BS in cell e, and let $R_{n}^{(e)} (e = 1, 2, \dots, E, n = 1, 2, \dots, N)$ denote RS n in cell e. Then, the signal-to-interference-plus-noise ratio (SINR) received directly from BS e can be expressed as

γ_{B_{e},M} = \frac{P_{B_{e}} G_{B_{e},M}}{N_{B_{e}} (σ^{2} + I_{B_{e},M})}

(28)

where $P_{B_{e}}$ is the total transmit power of BS e, $N_{B_{e}}$ is the number of nodes (i.e., MSs in the direct transmission case) served by BS e, $G_{B_{e},M}$ is the channel gain from BS e to the MS, and $I_{B_{e},M}$ denotes the inter-cell interference (ICI). Therefore, the maximum available direct transmission rate $r_{B_{e},M}$ (e = 1, 2, …, E) can be obtained by

r_{B_{e},M} = ξ (γ_{B_{e},M})

(29)

For the relay transmission case, the SINR at RS n of cell e in phase 1 can be derived as

γ_{B_{e}, R_{n}^{(e)}}^{(1)} = \frac{P_{B_{e}} G_{B_{e}, R_{n}^{(e)}}}{N_{B_{e}} (σ^{2} + I_{B_{e}, R_{n}^{(e)}}^{(1)})}

(30)

where $N_{B_{e}}$ is the number of nodes (i.e., RSs in phase 1 of the relay transmission case) served by BS e, $G_{B_{e}, R_{n}^{(e)}}$ is the channel gain between the BS and RS n in cell e, and $I_{B_{e}, R_{n}^{(e)}}^{(1)}$ is the ICI. Then, the transmission rate $r_{B_{e}, R_{n}^{(e)}}^{(1)}$ on the BS–$R_{n}^{(e)}$ link in cell e can be obtained by

r_{B_{e}, R_{n}^{(e)}}^{(1)} = ξ (γ_{B_{e}, R_{n}^{(e)}}^{(1)})

(31)

Similarly, in phase 2, the received SINR on the $R_{n}^{(e)}$–MS link in cell e can be expressed as

γ_{B_{e}, R_{n}^{(e)}}^{(2)} = \frac{P_{R_{n}^{(e)}} G_{R_{n}^{(e)}, M}}{N_{R_{n}^{(e)}} (σ^{2} + I_{B_{e}, R_{n}^{(e)}}^{(2)})}

(32)

where $P_{R_{n}^{(e)}}$ is the total transmit power of RS n in cell e, $N_{R_{n}^{(e)}}$ is the number of MSs served by RS n in cell e, and $G_{R_{n}^{(e)}, M}$ and $I_{B_{e}, R_{n}^{(e)}}^{(2)}$ are, respectively, the channel gain and the ICI on the link from RS n to the MS in cell e. Then, the transmission rate $r_{B_{e}, R_{n}^{(e)}}^{(2)}$ on the $R_{n}^{(e)}$–MS link in cell e can be obtained by

r_{B_{e}, R_{n}^{(e)}}^{(2)} = ξ (γ_{B_{e}, R_{n}^{(e)}}^{(2)})

(33)

Therefore, the available rate of the MS on the B_e–$R_{n}^{(e)}$–MS link can be expressed as

r_{B_{e}, R_{n}^{(e)}} = \frac{1}{2} \min (r_{B_{e}, R_{n}^{(e)}}^{(1)}, r_{B_{e}, R_{n}^{(e)}}^{(2)})

(34)
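Equations (28) to (34) differ from the single-cell case only through the ICI term in the denominator. A minimal sketch, assuming the ICI powers are given as inputs (how they are obtained is not specified here) and using a hypothetical 4-level rate table; the function names are illustrative:

```python
import bisect

RATES = [0.0, 1.0, 2.0, 3.0]          # hypothetical R^(w)
THRESHOLDS = [0.0, 2.0, 5.0, 10.0]    # hypothetical target SINRs, linear scale


def xi(gamma: float) -> float:
    """Rate mapping of Equation (2): largest R^(w) with threshold <= gamma."""
    return RATES[bisect.bisect_right(THRESHOLDS, gamma) - 1]


def sinr(power: float, n_served: int, gain: float, noise: float, ici: float) -> float:
    """Equations (28), (30), (32): P * G / (N * (sigma^2 + I))."""
    return power * gain / (n_served * (noise + ici))


def multicell_relay_rate(p_bs, n_bs, g_bs_rs, ici1, p_rs, n_rs, g_rs_ms, ici2, noise):
    """Equation (34): half the smaller of the hop rates of Equations (31) and (33)."""
    r1 = xi(sinr(p_bs, n_bs, g_bs_rs, noise, ici1))
    r2 = xi(sinr(p_rs, n_rs, g_rs_ms, noise, ici2))
    return 0.5 * min(r1, r2)


# Same link without and with inter-cell interference on the first hop (numbers hypothetical):
print(multicell_relay_rate(10.0, 2, 3.0, 0.0, 1.0, 1, 12.0, 0.0, 1.0))  # 1.5
print(multicell_relay_rate(10.0, 2, 3.0, 2.0, 1.0, 1, 12.0, 0.0, 1.0))  # 1.0
```

Setting every ICI term to zero recovers the single-cell Equations (8), (12), and (14).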

Following the MDP formulation in Section 3, the action set can be defined as $A ≜ \{B_{e}, R_{n}^{(e)} (e = 1, \dots, E, n = 1, \dots, N)\}$, and the state space can be expressed as

\begin{array}{l} S = \{B_{1}, R_{1}^{(1)}, \dots, R_{N}^{(1)}, \dots, B_{E}, R_{1}^{(E)}, \dots, R_{N}^{(E)}\} \\ \times R_{B_{1}} \times R_{R_{1}^{(1)}} \times \dots \times R_{R_{N}^{(1)}} \times \dots \times R_{B_{E}} \times R_{R_{1}^{(E)}} \times \dots \times R_{R_{N}^{(E)}} \end{array}

(35)

where $R_{B_{e}} (e = 1, 2, \dots, E)$ denotes the set of available transmission rates of the BS in cell e, and $R_{R_{n}^{(e)}} (e = 1, 2, \dots, E, n = 1, 2, \dots, N)$ denotes the set of available transmission rates of RS n in cell e.

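As in the single-cell system, the handover decision policy can be determined by the VIA. Note that extending the model to multiple cells enlarges the state space of the MDP: by Equation (35), |S| is the number of serving nodes, E(N + 1), multiplied by the product of the sizes of all E(N + 1) rate sets, so it grows exponentially with the number of serving nodes, and the computational complexity increases accordingly.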
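The growth implied by Equation (35) is easy to quantify. Assuming every rate set has the same size |Λ| (as in the single-cell case, where R_m has as many elements as Λ), |S| = E(N + 1)·|Λ|^{E(N + 1)}; the sketch below evaluates it for the eight-rate set of Equation (36) and four RSs per cell.

```python
def multicell_state_count(num_cells: int, rs_per_cell: int, num_rates: int) -> int:
    """|S| from Equation (35), assuming every rate set has num_rates elements."""
    nodes = num_cells * (rs_per_cell + 1)      # E BSs plus E * N RSs
    return nodes * num_rates ** nodes          # |node set| x product of |rate sets|


for cells in (1, 2, 3):
    print(cells, multicell_state_count(cells, 4, 8))
# 1 163840
# 2 10737418240
# 3 527765581332480
```

With |S| above 10^10 already for two cells, a dense VIA sweep costing O(|A||S|²) operations per iteration quickly becomes impractical.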

5 Numerical results

To evaluate the performance of the proposed MDP-based handover policy, we compare it with other policies through computer simulations. Since, to the best of the authors’ knowledge, few previous studies take QoS and processing load together as handover criteria in wireless relay systems, a nearest distance handover policy and a biggest channel gain handover policy are taken as the comparison policies. In the nearest distance handover policy, each MS chooses the nearest serving node at each decision epoch. In the biggest channel gain handover policy, each MS chooses the serving node with the biggest channel gain at each decision epoch. The proposed MDP-based handover policy is then compared with these two policies under the same simulation environment.

A single-cell wireless relay system is used as the test system. All serving nodes and MSs are located in a square area with a side length of 2000 m. The BS is located at the center of the cell, with coordinates (0 m, 0 m). To limit the size of the state space, four RSs are deployed in the cell, located at (600 m, 0 m), (0 m, 600 m), (−600 m, 0 m), and (0 m, −600 m), respectively. The initial positions of all MSs are randomly generated in the cell. Each MS follows a random mobility pattern: its speed is uniformly distributed in [0, 3] m/s and its direction is uniformly distributed in [0, 2π). To ensure that each MS always stays within the coverage of the cell, once an MS reaches the cell boundary, it moves in the direction opposite to its current direction. Referring to [33], the channel gain is modeled as Y/L(l), where Y accounts for shadow fading and follows a lognormal distribution with variance $σ_{s}^{2} = 10$ dB, and L(l) is the path loss between the transmitter and the receiver, calculated from 10 log₁₀ L(l) = 128.1 + 37.6 log₁₀ l, where l (in kilometers) is the distance between the transmitter and the receiver. The background noise level is set to −104 dBm. In the simulation, the set of modulation and coding schemes consists of 1/2-BPSK, 1/2-QPSK, 2/3-QPSK, 5/6-QPSK, 1/2-16QAM, and 2/3-16QAM, whose SNR thresholds for maintaining the target BER of 10⁻³ are given in Table 1 [34]. Then, the set of downlink transmission rates can be denoted as

Λ = \{0, \frac{1}{2} c, c, \frac{4}{3} c, \frac{3}{2} c, \frac{5}{3} c, 2 c, \frac{8}{3} c\}

(36)

where the symbol rate c is set to 640 ksymbols/s. Moreover, a simulation-based method is used to estimate the state transition probabilities; for a more detailed description of this method, readers can refer to [32]. The other simulation parameters are summarized in Table 2. In the following, the system transmission rate (i.e., the average transmission rate per MS) and the expected total reward per connection are used as the performance metrics.
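The propagation and mobility models of this setup can be sketched as follows. This is an illustration only: it treats the quoted 10 dB shadowing figure as the standard deviation of the dB-domain Gaussian (the usual convention for lognormal shadowing; pass its square root if it is meant literally as a variance), and the boundary rule reverses the heading whenever a step would leave the 2000 m × 2000 m square, with an extra clamp near corners that is our own assumption. Function names are hypothetical.

```python
import math
import random

HALF_SIDE_M = 1000.0          # square cell of side 2000 m centred on the BS at (0, 0)


def path_loss_db(distance_km: float) -> float:
    """10 log10 L(l) = 128.1 + 37.6 log10 l, with l in km."""
    return 128.1 + 37.6 * math.log10(distance_km)


def channel_gain(distance_km: float, shadow_std_db: float, rng: random.Random) -> float:
    """Y / L(l): lognormal shadowing Y (Gaussian in dB) divided by the path loss."""
    shadow_db = rng.gauss(0.0, shadow_std_db)
    return 10 ** ((shadow_db - path_loss_db(distance_km)) / 10)


def dbm_to_watt(dbm: float) -> float:
    return 10 ** ((dbm - 30) / 10)


def move(x: float, y: float, heading: float, speed: float, dt: float):
    """One mobility step; reverse the heading if the MS would leave the cell."""
    nx, ny = x + speed * dt * math.cos(heading), y + speed * dt * math.sin(heading)
    if abs(nx) > HALF_SIDE_M or abs(ny) > HALF_SIDE_M:
        heading = (heading + math.pi) % (2 * math.pi)
        nx, ny = x + speed * dt * math.cos(heading), y + speed * dt * math.sin(heading)
        nx = min(max(nx, -HALF_SIDE_M), HALF_SIDE_M)   # safeguard near corners (assumption)
        ny = min(max(ny, -HALF_SIDE_M), HALF_SIDE_M)
    return nx, ny, heading


rng = random.Random(1)
noise_w = dbm_to_watt(-104.0)
g = channel_gain(1.0, 0.0, rng)               # 1 km, shadowing switched off
print(10 * math.log10(1.0 * g / noise_w))     # SNR of a 1 W link to a single MS: about 5.9 dB
print(move(999.0, 0.0, 0.0, 3.0, 1.0))        # bounces back to x = 996
```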

Table 1 SNR thresholds of the modulation and coding schemes for maintaining the target BER of 10⁻³


Table 2 The parameters of the relay system


Figures 5, 6, and 7 illustrate the system transmission rates of the three policies in different scenarios. Figure 5 plots the system transmission rate versus the transmit power of the BS, where the transmit power of each RS is set to 10 mW and the number of MSs in the cell is set to 100. As the transmit power of the BS increases, the received SNR of each MS also increases. Accordingly, higher-rate adaptive modulation and coding schemes are selected, which results in a higher system transmission rate. Moreover, the transmission rate of the proposed MDP-based handover policy is always higher than those of the nearest distance handover policy and the biggest channel gain handover policy, by more than about 25% and 10%, respectively. That is, handover performance in wireless relay systems is improved by employing the proposed MDP-based handover policy.

Figure 6 shows the relationship between the system transmission rate and the variance of the shadow fading. In this simulation, there are 100 MSs in the system, the transmit power of each RS is fixed at 10 mW, the transmit power of the BS is set to 1 W, and the variance of the shadow fading increases from 4 dB to 12 dB. Figure 6 shows that the system transmission rate decreases as the variance of the shadow fading increases. This is because, as the shadow-fading variance grows, the channel conditions become more severe and the received SNR of the MSs decreases, which in turn reduces the achievable transmission rates. Moreover, compared with the other two policies, the proposed MDP-based handover policy consistently achieves the highest transmission rate, which indicates that the proposed policy is well behaved.

Figure 7 depicts the system transmission rate versus the number of MSs, with the transmit power of the BS set to 1 W and the transmit power of each RS set to 10 mW. In Figure 7, the transmission rates of all three policies decrease as the number of MSs increases, because the available power resource must be shared among more MSs. Once again, the proposed MDP-based handover policy always outperforms the other two policies in terms of the system transmission rate, which further confirms that it is well behaved.

Figures 8 and 9 illustrate the expected total rewards of the three policies in different scenarios. Figure 8 shows the expected total reward versus the switching cost between any two different serving nodes (i.e., $K_{i,a(t)}$ with $a(t) \neq i$). In Figure 8, the expected total rewards of all three policies decrease as the switching cost increases. Because the reward is sensitive to the switching cost, the switching cost gives system operators a degree of flexibility: small switching costs can be assigned among serving nodes with light signaling load, while large switching costs can temporarily be assigned to overloaded serving nodes as a load-balancing technique to reduce the traffic load in the system. Moreover, the proposed MDP-based handover policy always achieves the largest expected total reward of the three policies, which means that it performs best in terms of the expected total reward.
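For a discounted MDP, the expected total reward of any fixed stationary deterministic policy (such as a nearest distance or biggest channel gain rule) can be computed exactly by solving the linear system $(I - \lambda P_\pi)\,v = r_\pi$. The sketch below does this for a hypothetical two-node toy model in which the state is the current serving node, the action is the next serving node, and the one-step reward is a made-up profit minus the switching cost $K$ whenever the node changes. None of these numbers come from the paper, and the authors may have evaluated the baseline policies differently (for example, by simulation).

```python
import numpy as np


def evaluate_policy(P, r, policy, lam):
    """Expected total discounted reward v of a stationary deterministic policy.

    P: array (A, S, S) with P[a, s, s2] = Pr(s2 | s, a)
    r: array (S, A) with the one-step reward r(s, a)
    policy: length-S sequence, policy[s] = action taken in state s
    lam: discount factor in [0, 1)
    Solves (I - lam * P_pi) v = r_pi.
    """
    n = r.shape[0]
    P_pi = np.array([P[policy[s], s] for s in range(n)])   # row s of P under policy[s]
    r_pi = np.array([r[s, policy[s]] for s in range(n)])
    return np.linalg.solve(np.eye(n) - lam * P_pi, r_pi)


def toy_handover_mdp(profit, K):
    """Hypothetical toy: state = serving node, action = next serving node.

    Choosing node a hands the MS over to node a with probability 1; the reward
    is profit[a] minus the switching cost K when a differs from the current node.
    """
    profit = np.asarray(profit, dtype=float)
    n = profit.size
    P = np.zeros((n, n, n))
    for a in range(n):
        P[a, :, a] = 1.0
    r = np.tile(profit, (n, 1)) - K * (1.0 - np.eye(n))
    return P, r


if __name__ == "__main__":
    # Policy "always hand over to node 1", evaluated for two switching costs.
    for K in (0.0, 2.0):
        P, r = toy_handover_mdp(profit=[1.0, 3.0], K=K)
        print(K, evaluate_policy(P, r, policy=[1, 1], lam=0.9))
```

For $\lambda = 0.9$, this policy is worth $v = [30, 30]$ when $K = 0$ and $v = [28, 30]$ when $K = 2$: the switching cost lowers the value only in the state where a handover actually occurs.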

Figure 9 shows the expected total reward versus the discount factor λ. The expected total rewards of all three policies increase with the discount factor, which follows directly from Equation (25) or (27). Moreover, compared with the other two policies, the MDP-based handover policy consistently provides the highest expected total reward per connection for all values of λ. For example, when the discount factor is 0.9, the MDP-based handover policy obtains 10% and 4% more expected total reward than the nearest distance handover policy and the biggest channel gain handover policy, respectively. Once again, this confirms that the proposed MDP-based handover policy is well behaved in wireless relay systems.
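The role of λ is easiest to see in value iteration itself. Below is a minimal sketch of the standard value iteration algorithm for a discounted MDP, using the usual stopping rule $\lVert v^{n+1} - v^{n}\rVert < \varepsilon(1-\lambda)/(2\lambda)$ from [27], which yields an ε-optimal stationary deterministic policy. The two-node toy model, its rewards, and the function names are hypothetical; they are not the paper's state space or reward function.

```python
import numpy as np


def value_iteration(P, r, lam, eps=1e-6, max_iter=100_000):
    """Value iteration for a discounted MDP with 0 < lam < 1.

    P: array (A, S, S) with P[a, s, s2] = Pr(s2 | s, a); r: array (S, A).
    Returns an eps-optimal value vector and a stationary deterministic
    policy that is greedy with respect to it.
    """
    v = np.zeros(r.shape[0])
    tol = eps * (1.0 - lam) / (2.0 * lam)
    for _ in range(max_iter):
        # Bellman update: v(s) <- max_a [ r(s, a) + lam * sum_s2 P(s2 | s, a) v(s2) ]
        v_new = (r + lam * np.einsum("ast,t->sa", P, v)).max(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    q = r + lam * np.einsum("ast,t->sa", P, v)
    return v, q.argmax(axis=1)


def toy_handover_mdp(profit, K):
    """Hypothetical toy: state = serving node, action = next serving node."""
    profit = np.asarray(profit, dtype=float)
    n = profit.size
    P = np.zeros((n, n, n))
    for a in range(n):
        P[a, :, a] = 1.0                                   # action a hands over to node a
    r = np.tile(profit, (n, 1)) - K * (1.0 - np.eye(n))    # switching cost when a != s
    return P, r


if __name__ == "__main__":
    P, r = toy_handover_mdp(profit=[1.0, 3.0], K=2.0)
    for lam in (0.5, 0.9):
        v, policy = value_iteration(P, r, lam)
        print(lam, v, policy)
```

In this toy, the optimal policy always hands over to node 1, and its value rises from about [4, 6] at λ = 0.5 to about [28, 30] at λ = 0.9, because each extra step of positive reward is weighted more heavily. Raising the switching cost to $K = 25$ instead makes it optimal to stay at node 0 when λ = 0.9.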

6 Conclusion

This article has focused on the handover decision problem in wireless relay transmission systems. The architecture of a single-cell relay system was first introduced. The handover decision problem was then formulated as an MDP, in which a profit function evaluates the QoS of the chosen serving node and a cost function models the signaling cost. Based on these two functions, a reward function was formulated. To maximize the expected total reward per connection, an optimal stationary deterministic policy was then obtained using the value iteration algorithm (VIA). Furthermore, we have shown that the proposed MDP-based handover policy can be generalized to multi-cell systems. Numerical results show that the proposed MDP-based handover policy performs well in solving the handover problem of wireless relay systems.

Abbreviations

BS:: Base station
FDMA:: Frequency division multiple access
MDP:: Markov decision process
MS:: Mobile station
OFDMA:: Orthogonal frequency division multiple access
QoS:: Quality-of-service
RSs:: Relay stations
SINR:: Signal-to-interference-plus-noise ratio
SNR:: Signal-to-noise ratio
VIA:: Value iteration algorithm

References

[1] Wang J, Chen J: Performance of wideband CDMA systems with complex spreading and imperfect channel estimation. IEEE J. Sel. Areas Commun 2001, 19(1):152-163. 10.1109/49.909617
[2] Wang J-B, Chen M, Wan X, Wei C: Ant-colony-optimization-based scheduling algorithm for uplink CDMA nonreal-time data. IEEE Trans. Veh. Technol 2009, 58(1):231-241.
[3] Zhu H, Wang J: Chunk-based resource allocation in OFDMA systems - Part I: chunk allocation. IEEE Trans. Commun 2009, 57(9):2734-2744.
[4] Wang J-B, Chen M: Optimal uplink packet scheduling for CDMA non-real time data. IEE Electron. Lett 2007, 43(14):764-766. 10.1049/el:20070786
[5] IEEE: Draft Standard for Local and Metropolitan Area Networks - Part 16: Air Interface for Fixed and Mobile Broadband Wireless Access Systems: Multihop Relay Specification. IEEE Standard 2007. 2007. 4312731
[6] IEEE: Part 16: Air Interface for Fixed and Mobile Broadband Wireless Access Systems: Multihop Relay Specification, IEEE P802.16j Base Line Document. 2007.
[7] van der Meulen EC: Three-terminal communication channels. Adv. Appl. Probab 1971, 3(1):120-154. 10.2307/1426331
[8] Cover TM, El Gamal AA: Capacity theorems for the relay channel. IEEE Trans. Inf. Theory 1979, IT-25(5):572-584.
[9] Lei X, Li W: Exact closed-form expression for ergodic capacity of amplify-and-forward relaying in channel-noise-assisted cooperative networks with relay selection. IEEE Commun. Lett 2011, 15(3):332-333.
[10] Ding Y, Zhang J-K, Wong KM: Ergodic channel capacities for the amplify-and-forward half-duplex cooperative systems. IEEE Trans. Inf. Theory 2009, 55(2):713-730.
[11] Kwon U-K, Choi C-H, Im G-H: Full-rate cooperative communications with spatial diversity for half-duplex uplink relay channels. IEEE Trans. Wirel. Commun 2009, 8(11):5449-5454.
[12] Wang J-B, Jiao Y, Chen M, Wang J-Y, Cao Z: Multi-hop free space optical communications using serial decode-and-forward relay transmissions. China Commun 2011, 8(5):102-110.
[13] Ding Z, Leung KK, Goeckel DL, Towsley D: On the study of network coding with diversity. IEEE Trans. Wirel. Commun 2009, 8(3):1247-1259.
[14] Luo J, Blum RS, Cimini LJ, Greenstein LJ, Haimovich AM: Decode-and-forward cooperative diversity with power allocation in wireless networks. IEEE Trans. Wirel. Commun 2007, 6(3):793-799.
[15] Ng TC-Y, Yu W: Joint optimization of relay strategies and resource allocations in cooperative cellular networks. IEEE J. Sel. Areas Commun 2007, 25(2):328-339.
[16] Pabst R, Walke BH, Schultz DC, Herhold P, Yanikomeroglu H, Mukherjee S, Viswanathan H, Lott M, Zirwas W, Dohler M, Aghvami H, Falconer DD, Fettweis GP: Relay-based deployment concepts for wireless and mobile broadband radio. IEEE Commun. Mag 2004, 42(9):80-89. 10.1109/MCOM.2004.1336724
[17] Wu H, Qiao C, De S, Tonguz O: Integrated cellular and ad hoc relaying systems: iCAR. IEEE J. Sel. Areas Commun 2001, 19(10):2105-2115. 10.1109/49.957326
[18] Cho S, Jang EW, Cioffi JM: Handover in multihop cellular networks. IEEE Commun. Mag 2009, 47(7):64-73.
[19] Stemm M, Katz RH: Vertical handoffs in wireless overlay networks. ACM/Baltzer Mobile Netw. Appl 1998, 3(4):335-350. 10.1023/A:1019197320544
[20] Pahlavan K, Krishnamurthy P, Hatami A, Ylianttila M, Makela JP, Pichna R, Vallstron J: Handoff in hybrid mobile data networks. IEEE Personal Commun 2000, 7(2):34-47. 10.1109/98.839330
[21] Huang Y-F, Gao F-B, Hsu H-C: Performance of an adaptive timer-based handoff scheme for wireless mobile communications. In Proceedings of the 10th WSEAS International Conference on Signal Processing, Computational Geometry and Artificial Vision. Stevens Point, Wisconsin, USA; 2010:46-51.
[22] Lin H-P, Juang R-T, Lin D-B: Validation of an improved location-based handover algorithm using GSM measurement data. IEEE Trans. Mobile Comput 2005, 4(5):530-536.
[23] Jiang J, Heng W, Zhang H, Wu G, Wang H: A novel handover scheme for multi-hop transparent relay network. In Proceedings of the International Conference on Wireless Communications & Signal Processing. Nanjing, China; November 2009:1-5.
[24] Ylianttila M, Pande M, Makela J, Mahonen P: Optimization scheme for mobile users performing vertical handoffs between IEEE 802.11 and GPRS/EDGE networks. In Proceedings of the IEEE Global Telecommunications Conference. San Antonio, TX, USA; November 2001:3439-3443.
[25] Wang J-B, Chen H, Chen M, Wang J: Cross-layer packet scheduling for downlink multiuser OFDM systems. Sci. China Ser. F - Inf. Sci 2009, 52(12):2369-2377. 10.1007/s11432-009-0219-1
[26] Li W, Wang J-B, Chen M: Outage probability of dual-hop amplify-and-forward relaying systems over shadowed Nakagami-m fading channel. IEICE Trans. Fund. Electron. Commun. Comput. Sci 2008, 91(11):3403-3405.
[27] Puterman M: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken, NJ; 1994.
[28] Feinberg EA, Schwartz A, Feinberg EA: Handbook of Markov decision processes, ch. Markov Decision Processes in Finance and Dynamic Options. Kluwer, Netherlands; 2002.
[29] Yu F, Krishnamurthy V, Leung VCM: Cross-layer optimal connection admission control for variable bit rate multimedia traffic in packet wireless CDMA networks. IEEE Trans. Signal Process 2006, 54(2):542-555.
[30] Yu F, Wong VWS, Leung VCM: Efficient QoS provisioning for adaptive multimedia in mobile communication networks by reinforcement learning. Mobile Netw. Appl 2006, 11(1):101-110. 10.1007/s11036-005-4464-2
[31] Tham CK, Renaud JC: Multi-agent systems on sensor networks: a distributed reinforcement learning approach. In Proceedings of the Second International Conference on Intelligent Sensors, Sensor Networks and Information Processing. Melbourne, Australia; December 2005:423-429.
[32] Stevens-Navarro E, Lin Y, Wong VWS: An MDP-based vertical handoff decision algorithm for heterogeneous wireless networks. IEEE Trans. Veh. Technol 2008, 57(2):1243-1254.
[33] UMTS: Annex B: Test Environments and Deployment Models. 1998. UMTS 30.03
[34] Moon J, Hong S: Adaptive code-rate and modulation for multi-user OFDM system in wireless communications. In Proceedings of the IEEE 56th Vehicular Technology Conference. Vancouver, Canada; September 2002:1943-1947.


Acknowledgements

The authors would like to acknowledge the funding assistance of National Natural Science Foundation of China (No. 61172078).

Author information

Authors and Affiliations

College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
Xiaoyu Dang & Zhe Cao
National Mobile Communications Research Laboratory, Southeast University, Nanjing, 210096, China
Jin-Yuan Wang


Corresponding author

Correspondence to Jin-Yuan Wang.

Additional information

Competing interests

The authors declare that they have no competing interests.


Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Dang, X., Wang, JY. & Cao, Z. MDP-based handover policy in wireless relay systems. J Wireless Com Network 2012, 358 (2012). https://doi.org/10.1186/1687-1499-2012-358


Received: 29 May 2012
Accepted: 05 November 2012
Published: 29 November 2012
DOI: https://doi.org/10.1186/1687-1499-2012-358


Keywords