Sum rate maximization for multi-cell downlink OFDMA with subcarrier pair-based opportunistic DF relaying

This paper considers multi-cell decode-and-forward (DF) relay-aided orthogonal frequency division multi-access (OFDMA) downlink systems, in which all sources and relays are coordinated by a central controller for resource allocation (RA). The improved subcarrier pair-based opportunistic DF relaying protocol proposed and studied in the IEEE International Conference on Communications, Beijing, 3795–3800, 2008 and IEEE Trans. Signal Process. 61:2512–2524, 2013 is applied. This protocol has a high spectrum efficiency (HSE) in the sense that all unpaired subcarriers are utilized for data transmission during the second time slot (TS). In particular, the sum (over all cells and all destinations) rate maximized problem with a total power constraint in each cell+ is formulated. To solve this problem, an iterative RA algorithm is proposed to optimize mode selection (decision whether the relay should help or not), subcarrier assignment and pairing (MSSAP) and power allocation (PA) in an alternate way. As for the MSSAP stage of each iteration, the formulated problem is decoupled into subproblems with the tentative PA results. Each subproblem can be easily solved by using the optimal results of a linear assignment problem (LAP), which is then solved by the Hungarian Algorithm in polynomial time. As for the PA stage of each iteration, an algorithm based on single-condensation and geometric programming (SCGP) is proposed to optimize PA in polynomial time with the tentative MSSAP results. The proposed algorithm is coordinate ascent (CA)-based and therefore can reach a local optimum in polynomial time. Finally, the convergence and effectiveness of the proposed algorithm, the impact of relay position and total power on the system performance, and the benefits of using subcarrier pairing (SP) and the HSE protocol are illustrated through numerical experiments.


Introduction
High data rate and ubiquitous coverage are strongly required in the next-generation wireless communication networks. To achieve this goal, orthogonal frequency division multiple access (OFDMA) technology is receiving a lot of interest due to its inherent ability to combat frequency-selective multi-path fading and its flexibility in applying dynamic radio resource allocation (RA) for performance improvement. On the other hand, the emerging relaying technology is highly favored by both academia *Correspondence: zhiwen.jin@student.uclouvain.be 1 ICTEAM Institute, École Polytechnique de Louvain, Université Catholique de Louvain, Louvain-la-Neuve 1348, Belgium Full list of author information is available at the end of the article and industry due to its attractive feature of coverage extension and data rate improvement [1]. Therefore, it is promising to incorporate OFDMA with relaying technologies in next-generation cellular systems.
Concerning relay aided OFDM(A) transmissions, two low-complexity yet efficient types of relaying have been proposed in [2] and [3], namely amplify-and-forward (AF) and decode-and-forward (DF). Recently, the DF relaying is receiving a lot of interest due to its simple processing at the relay. To better apply it, [4] and [5] have further proposed the ML decoding to address the error propagation issue. With DF relaying, symbols are transmitted in two time slots (TSs). During the first TS, the source broadcasts symbols on all subcarriers with the relay keeping quiet. During the second TS, except for the relay, the http://jwcn.eurasipjournals.com/content/2014/1/24 source might also broadcast symbols on subcarriers not used by the relay, as will be elaborated later. Each subcarrier used by the relay during the second TS is paired with a subcarrier during the first TS. In particular, these two subcarriers are termed as paired subcarriers, referred to as the relay-aided transmission mode hereafter. Adopting subcarrier-by-subcarrier pairing-based DF relaying, RA problems for downlink OFDMA have been investigated intensively [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25].
In particular, the works of [8][9][10][11] have considered RA for the DF relay-aided OFDMA systems when the destination cannot or hardly hear from the source, meaning, the source-to-destination (S-D) link is unavailable. Every subcarrier during the first TS is paired with a subcarrier during the second TS using the relay-aided transmission mode. Under a total power constraint, it has been proven that the optimum sum rate can be reached by ordered subcarrier pairing, i.e., the best source-to-relay subcarrier should be paired with the strongest relay-to-destination subcarrier, and so on.
Considering the case when the S-D link is available, [6,7,[12][13][14][15][16][17][18][19][20][21][22][23][24][25] have studied RA in systems with opportunistic relaying (sometimes termed as selection relaying). To start with, a low spectrum efficiency (LSE) protocol was studied in [6,[12][13][14][15][16], when only the relay broadcasts symbols during the second TS. Specifically, every symbol can be transmitted in either the relay-aided mode or the direct mode. In the relay-aided mode, a subcarrier during the first TS is paired with a subcarrier during the second TS for the transmission. While in the direct mode, the symbol is transmitted on a subcarrier during the first TS directly to the targeted destination without the help of the relay. Note that using this LSE protocol, unpaired subcarriers during the second TS are actually unused, which causes a waste of the limited spectrum resource. To address this issue, [17][18][19][20][21][22][23][24][25] have proposed and studied improved high spectrum efficiency (HSE) protocols, which allow new symbols to be transmitted on the unpaired subcarriers during the second TS. Note that the HSE protocols do not really modify the implementation of DF relaying during the second TS but simply let the source use unpaired subcarriers during the second TS for direct-mode transmission. To further improve the system performance, a novel subcarrier pairbased opportunistic DF relaying protocol is proposed in [7]. Note that this protocol truly improves the implementation of DF relaying by using transmit beamforming at a subcarrier during the second TS. Specifically, this novel protocol is the same as the HSE protocol, except that the source and relay implement transmit beamforming to transmit the symbol at a paired subcarrier during the second TS.
Note that all these papers consider RA in singlecell situations with the cochannel interference (CCI) modeled as additive background noise, which is reasonable only when the frequency reuse factor is low. Here, the frequency reuse factor is 1/W, where W is the number of cell clusters which cannot use the same frequencies for transmission. However, in next-generation cellular systems, aggressive frequency reuse is recommended due to its ability to achieve higher system capacity [26]. When the frequency reuse factor is high or even up to 1, the CCI becomes a key factor affecting the system performance and thus cannot be ignored [27,28].
Considering the CCI, the models for multi-cell DF relayaided OFDMA systems become much more interesting and challenging than those for single-cell DF relay-aided OFDMA systems. For RA problems in such multi-cell systems, some algorithms have been proposed in [29] and [30] when powers are uniformly allocated to all stations. Several RA algorithms have been proposed in [31][32][33][34] for multi-cell OFDMA systems without DF relaying. However, these methods cannot be extended directly to solve RA problems jointly optimizing transmission mode selection, subcarrier assignment and pairing (MSSAP) as well as power allocation (PA) in multi-cell OFDMA systems with subcarrier pair-based opportunistic DF relaying. Paper [35] has recently considered and formulated the joint RA and scheduling problem in multicell DF relayed OFDMA systems, taking CCI as well as user data rate requirements into account. However, both opportunistic relaying and subcarrier pairing (SP) were not considered there. The CCI from the interfering sources to the considered destination and relay is ignored for practical and simplification consideration. Moreover, to solve this problem, an interference temperature constraint has been imposed to transform the original problem into a standard convex one.
Compared with the above existing works, the contributions of this paper are listed below: • We formulate the sum rate maximized joint RA problem in multi-cell OFDMA downlink systems aided by a DF relay station (RS) in each cell. In particular, the subcarrier pair-based opportunistic relaying is adopted. To start with, the HSE protocol with SP is considered here without signal combining and transmit beamforming. It will be shown that the system performance can be improved by using SP and the HSE protocol. Note that when modeling the inter-cell CCI of a subcarrier in a selected cell, instead of using an additional integer variable to indicate whether a node in an interfering cell transmits data on this subcarrier or not, we use the corresponding power value to do it. This choice is motivated to simplify the system sum rate expression and facilitate the algorithm design. Specifically, the rate expression http://jwcn.eurasipjournals.com/content/2014/1/24 would be a non-linear function of the integer variables when using an integer variable to indicate whether a subcarrier k in a cell n is used or not, while it is a linear function of the integer variables when using the corresponding power value to do it. • We propose an iterative RA algorithm to solve the RA problem, which optimizes MSSAP and the PA in an alternate way with the sum rate that keeps on increasing until convergence. As for the MSSAP stage of each iteration, the formulated problem is decoupled into subproblems with the tentative PA results. Each subproblem can be easily solved by using the optimal results of a linear assignment problem (LAP), which is then solved by the Hungarian Algorithm in polynomial time. As for the PA stage of each iteration, an algorithm based on singlecondensation and geometric programming (SCGP) is proposed to optimize PA in polynomial time with the tentative MSSAP results. The proposed algorithm is coordinate ascent (CA)-based and therefore can reach a local optimum in polynomial time.
The rest of this paper is organized as follows. First, the system model and the problem formulation for the considered system are presented in the next section. Then, the proposed algorithms are described in Section 3. Furthermore, the effectiveness and convergence of the proposed RA algorithm as well as the benefits of using SP and the HSE protocol are illustrated by numerical experiments in Section 4. Finally, some conclusions are drawn in Section 5.

System model
We consider a cellular OFDMA system with N cells coordinated by a central controller for RA. In each cell, the downlink transmission is carried out from a source to U destinations with the help of a relay, which is assumed to be of the DF type. For each link, the channel is assumed to be frequency selective and transformed into K parallel subchannels by using OFDM with sufficiently long cyclic prefix. The data transmission is carried out in two TSs. During the first TS, a symbol is first broadcasted by the source at a subcarrier k, which is in either relay-aided mode or direct mode. Both the relay and the targeted destination receive this symbol. If the relay-aided mode is used, the relay decodes the received symbol and forwards it to the targeted destination over a subcarrier l with the source keeping quiet on this subcarrier during the second TS. Here, the subcarrier l is paired with the subcarrier k for transmitting the same symbol. The destination only decodes the symbol received during the second TS. If the direct mode is used, the targeted destination decodes the symbol received during the first TS. Also, another symbol is broadcast by the source at an unpaired subcarrier l during the second TS, which is received and only decoded by the destination selected for the second TS.
With OFDMA, each subcarrier is allocated to only one destination in each cell. For the relay-aided transmission, subcarrier k during the first TS can be paired with only one subcarrier l during the second TS and vice versa. Throughout this paper, perfect decoding, timing, and carrier synchronization are assumed. The RA is carried out by a central controller which has perfect knowledge of the system channel state information. Moreover, the coherence time of each link is assumed to be sufficiently long for the RA to be implemented. Lastly, we assume that the optimized RA can be correctly distributed to all nodes. Note that an upper bound on the system performances is obtained by assuming the above idealities.
The data transmission procedure in every cell is identical. Therefore, we only analyze the downlink data transmission inside one selected cell n, which is impaired by cochannel interference from the other cells. Specifically in cell n, the symbols are transmitted in either relayaided mode or direct mode, as will be elaborated in the following.
Let us first see more closely the relay-aided data transmission. As illustrated in Figure 1a, the source s n first produces a symbol P k s n ,t 1 x k s n ,t 1 at subcarrier k during the first TS, while the transmitter of the relay r n remains idle. Here, x k s n ,t 1 denotes the normalized symbol (meaning E{|x k s n ,t 1 | 2 } = 1) transmitted by s n at subcarrier k during the first TS and P k s n ,t 1 denotes the corresponding transmit power. Simultaneously in an interfering cell n , a symbol P k s n ,t 1 x k s n ,t 1 is also produced from the interfering source s n at the same subcarrier. Here again, E{|x k s n ,t 1 | 2 } = 1. Note that instead of using an additional integer variable to indicate whether s n transmits data on subcarrier k or not, we use P k s n ,t 1 to do it. Specifically, P k s n ,t 1 > 0 means that s n uses subcarrier k for data transmission during the first TS, and P k s n ,t 1 = 0 means that s n transmits nothing at the subcarrier k during the first TS. This choice is motivated to simplify the system sum rate expression and facilitate the algorithm design. Specifically, when using an integer variable to indicate whether a subcarrier k in a cell n is used or not, the rate expression would contain integer variables in both the denominators and the numerators, which makes it a non-linear function of the integer variables. When using the corresponding power value to do it, the rate expression contains integer variables only in the numerators, which makes it a linear function of the integer variables. At the end of the first TS, the signal received by r n for subcarrier k can be expressed as where v k r n denotes the additive white Gaussian noise (AWGN) at subcarrier k and r n during the first TS. h k s n ,r n denotes the channel frequency response (CFR) for subcarrier k from s n to r n .
During the second TS, as illustrated in Figure 1b, r n reencodes the decoded symbol and forwards P l r n x l r n at a subcarrier l. Here, x l r n = x k s n ,t 1 , meaning that the subcarrier l during the second TS is paired with subcarrier k during the first TS for transmitting the same symbol in cell n. The source s n transmits nothing on this paired subcarrier l, meaning that P l s n ,t 2 = 0. Here, P l r n and P l s n ,t 2 denote the transmit power allocated to r n and s n , respectively, at subcarrier l during the second TS. At the same time, in an interfering cell n , r n and s n also transmit P l r n x l r n and P l s n ,t 2 x l s n ,t 2 at subcarrier l, at most only one power value out of P l r n x l r n , and P l s n ,t 2 x l s n ,t 2 can be zero. More specifically, when subcarrier l is paired with a subcarrier used during the first TS in cell n , P l s n ,t 2 = 0. Otherwise, P l r n = 0. Here again, E{|x l r n | 2 } = 1 and E{|x l s n ,t 2 | 2 } = 1. At the end of the second TS, the signal received by the tar-geted destination d un at subcarrier l can be expressed as where v l d un ,t 2 denotes the AWGN corrupting d un at subcarrier l during the second TS, h l r n ,d un denotes the CFR for subcarrier l from r n to d un . h l s n ,d un denotes the CFR of subcarrier l from s n to d un .
We assume that {v k r n and v l d un ,t 2 } are independent zeromean circular Gaussian random variables with the same variance σ 2 . After some mathematical calculations, the signal-to-interference-plus-noise ratio (SINR) associated with decoding x k s n ,t 1 from y k r n at r n during the first TS is expressed by where f k r n = σ 2 + n ,n =n P k s n ,t 1 G k s n ,r n denotes the sum power of the AWGN and the interference received by r n at subcarrier k during the first TS. G k s n ,r n = |h k s n ,r n | 2 denotes the channel gain of subcarrier k from s n to r n .
Also, the SINR associated with decoding x l r n , which equals x k s n ,t 1 , from y l d un ,r n at d un during the second TS is expressed by ,n =n P l r n G l r n ,d un denotes the sum power of the AWGN and the interference received by d un at subcarrier l during the second TS. G l r n ,d un = |h l r n ,d un | 2 denotes the channel gain of subcarrier l from r n to d un . G l s n ,d un = |h l s n ,d un | 2 denotes the channel gain of subcarrier l from s n to d un .
The maximum achievable rate for the subcarrier pair (k, l) when allocated to d un is given by [3] R kl un,2 = min ln 1 + k r n , ln 1 + l d un ,t 2 (5) in nats/two TSs. Let us describe further the direct data transmission in cell n. During the first TS, s n broadcasts P k s n ,t 1 x k s n ,t 1 at subcarrier k. The targeted destination d un receives signals http://jwcn.eurasipjournals.com/content/2014/1/24 from all sources. Finally, the signal received by d un at subcarrier k can be expressed as where v k d un ,t 1 denotes the AWGN corrupting d un at subcarrier k during the first TS.
During the second time slot, another symbol P l s n ,t 2 x l s n ,t 2 is broadcast by s n at an unpaired subcarrier l and received by a destination d vn . The received signal can be expressed as where v l d vn ,t 2 denotes the AWGN corrupting d vn at subcarrier l during the second TS.
Thus, the achievable sum rate for subcarrier k during the first TS and subcarrier l during the second TS is given by denotes the SINR associated with the decoding of x k

Problem formulation
In order to formulate the RA problem, we now introduce binary variables a kl uvn and b kl un to describe the mode selection, subcarrier assignment, and pairing in both TSs. To be more specific, a kl uvn = 1 indicates that subcarrier k is allocated for data transmission to d un in direct mode during the first TS and so is subcarrier l to d vn during the second TS. Note in direct mode, although used for transmitting two different symbols, subcarrier k and subcarrier l are virtually seen as a subcarrier pair for data transmission during two TSs. b kl un = 1 indicates that subcarrier k used during the first TS is paired with subcarrier l used during the second TS, and they are both allocated for data transmission to d un aided by r n .
We consider maximizing the sum rate of all destinations in all cells under per-cell total power constraints. The optimization variables are the transmission mode for each subcarrier, the subcarrier assignments and pairings, and the subcarrier power allocations at the sources and the relays. According to the system model, the considered RA problem can be formulated as where function F(x, y) is defined as Here, C1, C2, and C3 ensure that each subcarrier k during the first TS can be paired or virtually paired with only one subcarrier l during the second TS in each cell n. Also, subcarrier l can be paired or virtually paired with only one subcarrier k. For each subcarrier pair (k, l), only one mode (direct/relay-aided) can be selected for data transmission. If the direct mode is used, unique destinations (d un and d vn ) should be targeted for subcarrier k and subcarrier l, respectively. If the relay-aided mode is selected, the subcarrier pair (k, l) can be allocated to only one destination. Moreover, C4 and C5 ensure that the consumed sum power for each cell is less than its available sum power. This type of power constraints gives an upper bound of the system performance. In practice, each node (source, relay) in each cell will have an individual power constraint. Finally, C6, C7, and C8 guarantee that no data is transmitted on an unused subcarrier and subcarrier l is used by only one node (either the source or the relay) in each cell during the second TS.

Algorithm overview
Note that problem (11), which contains both integer and continuous variables, is a mixed combinatorial and nonconvex optimization. Here, the non-convexity is due to the existence of the inter-cell co-channel interference terms in (12). In general, an exhaustive search is needed to find the global optimum. Let us consider a system with N cells, U users, and K subcarriers in each cell. There are K NK possible subcarrier pairs, each of which can be assigned to one of the U possible users with two possible transmission modes. Thus, there are (2U) K NK possible MSSAP, which limits the scalability.
In order to make the problem tractable, we opt for a CA approach made of two stages: the MSSAP stage and the PA stage. We propose an iterative algorithm which optimizes the two stages in an alternate way, as depicted by Algorithm 1. Integer m indicates the iteration number, and a variable with the superscript m denotes the value obtained at the end of iteration m. Specifically in the proposed RA algorithm, we first set m = 0 and initialize the power P 0 by uniform power allocation (UPA). Each iteration is made of the MSSAP stage followed by the PA stage.
During the MSSAP stage of iteration m, problem (11) is solved with P = P m−1 and the output delivered is denoted as {A m , B m }. As will be shown in Section 4.2, when P is fixed, problem (11) is decoupled into n subproblems which can be easily solved using the optimal results of n LAP. Note as each LAP can be solved by the Hungarian Algorithm in polynomial time [36], {A m , B m } will be obtained in polynomial time. Finally, we have During the PA stage of iteration m, problem (11) is solved with A = A m−1 and B = B m−1 . The output delivered is denoted as P m . However, when {A, B} is fixed, problem (11) is still non-convex and hard to solve. Fortunately, by using the algorithm based on SCGP proposed in Section 4.3, this non-convex problem can be approximated into a series of standard geometric programs (GP), which can be solved by common GP solvers. As will be shown in Section 4.3, the solutions of the approximated GP problems converge to a local optimum, satisfying the Karush-Kuhn-Tucker (KKT) conditions of the nonconvex problem. Note as each standard GP can be solved in polynomial time [37], P m will be obtained in polynomial time after the convergence of the SCGP algorithm.
Considering the proposed RA algorithm, we now have where E1 and E2 are due to the effects of the MSSAP stage and the PA stage, respectively. This means that http://jwcn.eurasipjournals.com/content/2014/1/24 Algorithm 1 yields non-decreasing sum rates with iterations. Moreover, the optimum sum rate is upper bounded due to the total power constraint in each cell. Therefore, the sum rate values will not increase indefinitely with iterations, meaning, that the iterations will eventually converge. Algorithm 1 will stop when the sum rate increase is below a prescribed value 1 or when m reaches a prescribed value M. Note that, as both the MSSAP stage and the SCGP stage can be solved in polynomial time, problem (11) can be solved in polynomial time. At the end of iteration m, a local optimum of (11) is obtained at the SCGP stage, which is then improved at the MSSAP stage of the next iteration. After that, a better local optimum of (11) can be calculated at the SCGP stage of iteration m + 1. Finally, a good local optimum can be obtained after convergence.

MSSAP optimization
In this subsection, the MSSAP stage for iteration m is considered. After setting the power variable P to P m−1 , R becomes a linear function of all indicators. Note that each cell has independent constraints for its local indicators. Therefore, problem (11) can be decoupled into N subproblems. As for subproblem n 0 , the sum rate of the corresponding cell is maximized, subject to its local constraints. Specifically, subproblem n 0 is formulated as where A n 0 = {a kl uvn |n = n 0 }, B n 0 = {b kl un |n = n 0 }, C n 0 = {c kl n 0 }, and a kl uvn 0 R kl uvn 0 ,1 P m−1 Due to constraints C1 and C2 , we find that the inequality U u=1 U v=1 a kl uvn 0 R kl uvn 0 ,1 + U u=1 b kl un 0 R kl un 0 ,2 ≤ c kl n 0 Q kl n 0 (15) holds, where Q kl n 0 = max{max uv R kl uvn 0 ,1 , max u R kl un 0 ,2 }. The objective function of problem (13) is upper bounded by that of the following problem. max A n 0 ,B n 0 ,C n 0 where R n 0 ,UB = K k=1 K l=1 c kl n 0 Q kl n 0 . According to C1 , C2 , C3 , and C4 , we certainly have c kl n 0 ∈ {0, 1}, ∀k, l. Considering variables A n 0 and B n 0 of (16) take values only based on C n 0 , problem (16) can be reformulated as Let us now consider Theorem 1.

Theorem 1.
An optimal solution of (13) (A * n 0 and B * n 0 ) can be easily obtained using the optimal solution of (17) (c * ,kl n 0 ,UB ). Specifically, A * n 0 and B * n 0 are obtained by assigning for every combination of k and l, all entries in {a kl uvn 0 |∀u, v} and {b kl un 0 |∀u} to zero, except for the one with the metric equal to Q kl n 0 to c * ,kl n 0 ,UB . Proof 1. Let us view R kl uvn 0 ,1 and R kl un 0 ,2 as metric for a kl uvn 0 and b kl un 0 , respectively. Note that ∀c kl n 0 = 0, inequality (15) is tightened when all entries of {a kl uvn 0 |∀u, v} and {b kl un 0 |∀u} are set to zero. While ∀c kl n 0 > 0, inequality (15) is tightened when all entries of {a kl uvn 0 |∀u, v} and {b kl un 0 |∀u} are set to zero, except that the one with the metric equal to Q kl n 0 is assigned to c kl n 0 . We now denote by {C * n 0 } the optimal variables of (17). Thus, ∃A * n 0 , B * n 0 , so that R n 0 A * n 0 , B * n 0 = R UB n 0 C * n 0 . Here, A * n 0 and B * n 0 are obtained by assigning for every combination of k and l, all entries in {a kl uvn 0 |∀u, v} and {b kl un 0 |∀u} to zero, except for the one with the metric equal to Q kl n 0 to c * ,kl n 0 ,UB . It is important to know that {A * n 0 , B * n 0 , C * n 0 } belongs to the feasible set of (13).
We further note that ∀ feasible {A n 0 , B n 0 , C n 0 }, R n 0 (A n 0 , B n 0 ) ≤ R UB n 0 C n 0 ≤ R UB n 0 C * n 0 . Recalling R n 0 A * n 0 , B * n 0 = R UB n 0 C * n 0 , we obtain the optimal variables of problem (13) as {A * n 0 , B * n 0 , C * n 0 }. This concludes the proof of Theorem 1. http://jwcn.eurasipjournals.com/content/2014/1/24 Let us introduce two sets, F and F1 (F1 ⊂ F). Here, F is the feasible sets of (17). F1 is the same as F except that both C1 and C2 are saturated. Note that ∀C n 0 ∈ F\F1, there must ∃k 0 , K l=1 c k 0 l n 0 = 0 or ∃l 0 , K k=1 c kl 0 n 0 = 0. If ∃k 0 , K l=1 c k 0 l n 0 = 0, according to the constraints, there must ∃l 0 such that c k 0 l 0 n 0 = 0 and K k=1 c kl 0 n 0 = 0. Let us denote the sets of these k 0 and l 0 as K 0 and L 0 , respectively. Thus, we can find C n 0 ∈ F1, where c ,kl n 0 = 1, ∀k ∈ K 0 , l ∈ L 0 and c ,kl n 0 = c kl n 0 , ∀k / ∈ K 0 or l / ∈ L 0 , such that R n 0 ,UB C n 0 ≥ R n 0 ,UB C n 0 . The optimal variables of the following problem is also optimal for problem (17).
Note that (18) is the same as (17) except that the feasible set of (18) F1 is a subset of that of (17) F. (18) is actually a LAP. Hence, it can be solved by means of typical LAP algorithms, like the Hungarian Algorithm.

PA optimization
In this subsection, the PA stage for iteration m is considered. After setting the indicators to {A m , B m }, the transmission modes are fixed in all cells. Let us denote by S n (d) (S n (r)) the set of subcarriers in direct (relay-aided) mode during the first TS in cell n. ∀k, the associated destination u during the first TS and the subcarrier l with which it is paired or virtually paired during the second TS are decided. ∀k ∈ S n (d), the associated destination during the second TS is also decided. Thus, the objective function of problem (11) can consequently be rewritten as where variables l, u, and v are all constants. Note that R is a non-convex function due to the presence of interfering power terms in the denominators of k d un ,t 1 , l d vn ,t 2 , k r n , and l d un,t 2 . To solve problem (11), we first replace it with an equivalent complementary geometric program (CGP) that is then addressed by means of the SCGP algorithm.
The equivalent CGP is obtained by formulating problem (11) as a minimization problem. We first formulate problem (11) as follows min P e −R(P) where e −R(P) = n k∈S n (d) Next, an equivalent formulation is obtained by introducing slack variables r = { n,k r , ∀n, k ∈ S n (r)}. Then, problem (20) can be formulated as Problem (21) is made of an objective function and bounding constraints which all are ratios of two posynomials, making the problem belong to the class of CGP [38]. More details about CGP can be found in Appendix 1. Problem (21) can not be made convex and is NP hard [37]. In order to solve it, the SCGP algorithm is now proposed. The proposed SCGP algorithm approximates the non-convex problem (21) into a series of standard GPs. Therefore, it belongs to the class of successive convex approximation methods [39]. The SCGP is described in Algorithm 2. Integer t indicates the current iteration number, and a variable with superscript m denotes the value obtained at the end of iteration m .
Specifically in the proposed SCGP algorithm, we first set m = 0 and initialize the PA vector P 0 with P ini . Each iteration contains an SC stage followed by a standard geometric programming (SGP) stage.
During the SC stage of iteration m , a condensation rule is first introduced and depicted in Algorithm 3. The goal of the condensation rule is to construct a GP approximation of problem (21) by condensing all denominator posynomials {g k d un ,t 1 , g l d vn ,t 2 , g l d un ,t 2 , g k r n } into monomials {g k d un ,t 1 ,g l d vn ,t 2 ,g l d un ,t 2 ,g k r n }. The SC stage: Apply a condensation rule and condense all denominator posynomials in problem (21) {g k d un ,t 1 , g l d vn ,t 2 , g l d un ,t 2 , g k r n } into monomials {g k d un ,t 1 ,g l d vn ,t 2 ,g l d un ,t 2 ,g k r n } using P m −1 as the initial power values; 6: The SGP stage: Solve the condensed problem using a standard GP solver and assign the results to P m ; 7: until ||P m − P m −1 || ≤ 2 , m =M' 8: Output: P t . n,k r . According to proposition 3 in [38], all the approximations satisfy the three conditions proposed in [39] for the convergence of the successive approximation method. By denoting = {P, r }, the three conditions are listed as follows: 1. Bounding condition: ∀ , h n,k i ( ) ≤h n,k i ( ), ∀i = 1, 2, 3. After the SC stage, problem (21) is formulated by a standard GP. The SGP stage amounts to solving the GP by means of a standard GP solver, e.g., the software provided at [40]. The output provided by this stage corresponds to P m .

Tightness condition: At the beginning of iteration m ,
Thanks to the three conditions fulfilled during the SC stage, our proposed SCGP algorithm is a general inner approximation algorithm [39], which will converge to a local optimum satisfying the KKT conditions of problem (21) according to the corollary 1 of [39]. In practice, the iterations of the SCGP algorithm will be stopped when ||P m − P m −1 || ≤ 2 or when m exceeds a prescribed value M .

Algorithm 3 Condensation Rule
1: Input: a posynomial g(x) = i u i (x) to be approximated and an initial input value x 0 ; 2: (1) Evaluate g(x 0 ); 3: (2) Compute the weight for each term of the posynomial, into a monomialg(x):

Numerical experiments
In order to illustrate the benefits of using SP and the high-efficiency protocol, respectively, we now compare the proposed protocol, named P1, with two other protocols, named P2 and P3, respectively. Specifically, P2 is the same as P1 except that subcarrier k in TS1 is always paired with subcarrier l in TS2. P3 is also the same as P1 except that all sources keep silent during TS2. For both P2 and P3, Algorithm 1 can easily be applied by adding additional constraints. Both UPA and best power allocation (BPA) are used to initialize the proposed algorithm.
Here, BPA is the PA method used in [7], where CCI is set to 0. The impact of relay deployment, on average sum rate, is first studied and proper relay positions are selected for further experiments. Then, RA and the sum rate results are presented for one particular channel realization. The convergence of the proposed RA algorithm is illustrated for all three protocols. Finally, the results are provided and discussed for the performance of those protocols averaged over many channel realizations.

System setup
As shown in Figure 2, a multi-cell OFDMA system with N = 2 coordinated cells and K = 32 available subcarriers is considered for illustration. Each cell contains one source, one relay, and U = 5 destinations, which are uniformly distributed in a circular region of 50-m radius. In each cell, the relay is located along the line between the source and the center of the user region. The source-torelay distances in cell 1 and cell 2 are denoted as D sr1 and D sr2 , respectively. We set P n T = P T , ∀n, σ 2 = −65 dBm, 1 = 0.1, and 2 = R ini /100. Here, R ini denotes the sum rate calculated with the initial RA results at the beginning of each iteration.
The channel impulse response (CIR) of each link is drawn randomly from an eight-tap delay line model, where each tap i has a circular complex gaussian distribution with zero mean and variance σ 2 i . We further assume decreases exponentially with a coefficient of 3. Moreover, i σ 2 i = d −2.5 , meaning that the received power decreases exponentially with distance d and the propagation exponent equals to 2.5. Finally, the CFR of each link is computed from its CIR using the K-point FFT.

Impact of relay deployment on average sum rate
We have chosen a set of values for Dsr1 and Dsr2, ranging from 100 to 900 m. For each possible combination of Dsr1 and Dsr2, 100 sets of random realizations of channels are generated and the average optimized sum rate is computed by using the proposed algorithm with the UPA initialization method. We then compare the obtained average sum rates of all combinations to illustrate how the deployment of relays influences the system performance.
As shown in Figure 3, in low SNR scenario when P T = 10 dBm, the average sum rate increases as the relay moves towards the middle between the source and the destination in each cell. This is because the achievable rate for a relay-aided transmission is the minimum between the rate of the source-to-relay link and the rate of the relay-to-destination link. As the relays move towards the middle, both the source-to-relay channels and the relayto-destination channels are in good conditions. Thus, none of the good link is detrimental, and the minimum rate of both links is very likely to be high. However, in high SNR scenario when P T = 40 dBm, the average sum rates for each possible combination of Dsr1 and Dsr2 are almost the same. As interpreted in [7], this is because the direct transmission is more desirable to maximize the sum rate than the relay-aided transmission when the system sum power is very high.
We further study the impact of relay deployment considering uncoordinated out-of-cluster CCI. Specifically, we place one uncoordinated cell on the top of the two coordinated cells and another one at the bottom of the two coordinated cells. In each uncoordinated cell, the source-to-relay distance is set to 300 m and the power is uniformly allocated. Considering the interferences generated from two uncoordinated cells, the proposed resource allocation algorithm is used for the two coordinated cells. The simulation results are given in Figure 4 both when P T = 10 dBm and when P T = 50 dBm. As in Figure 3, the same insights can be found in Figure 4 except that the system sum rate is decreased due to the existence of the out-of-cluster CCI, especially in high SNR scenario when the CCI is quite strong.
In the following simulations, we set Dsr1 = 300 m and Dsr2 = 300 m.

Results for a random realization of channels 4.3.1 Convergence of the proposed RA algorithm
In order to illustrate the convergence of the proposed RA algorithm, we plot the sum rates calculated during the iterations with the three protocols and two initialization methods when P T = 10 dBm and P T = 40 dBm.
As shown in Figure 5, the sum rates for all the three protocols using all the initialization methods converge smoothly to the final rate without much fluctuation. After convergence, the sum rates are improved significantly except when P T = 10 dBm and the BPA initialization method is used. When using protocol P1 and setting P T = 10 dBm, the converged sum rate with the BPA initialization method is higher than that obtained with the UPA initialization method. This is because the BPA is near optimal in low SNR scenario when the CCI is small. Interestingly, in high SNR scenario when P T = 40 dBm, both BPA and UPA lead to the same sum rate after convergence.
Initialized by the UPA method, the optimized sum rate of P1 is much higher then that of P3, both when P T = 10 dBm and when P T = 40 dBm. Thus, the benefit of using the HSE protocol is very high. When P T = 10 dBm, the optimized sum rate of P1 is slightly higher than that of P2. However, when P T = 40 dBm, P1 and P2 leads to the same optimized sum rate. Thus, the benefit of using subcarrier pairing is limited. This can be understood as most of the subcarriers are in direct mode, for which pairing does not have any impact.

Benefits illustration of the HSE protocol
In order to further illustrate the benefit of using the HSE protocol, we now compare the RA results and sum rates of protocol P2 with another protocol P4. Here, P4 is the same as P2 except that all sources keep silent during the second TS. A multi-cell OFDMA system with N = 2 coordinated cells and K = 16 available subcarriers is considered in this subsection. Each cell contains U = 2 destinations. Figure 6 shows the positions of the BSs; the relays and the destinations are expressed in meters. Specifically, the coordinates of source 1 and source 2 are (0, 0) and (0, 1000), respectively. The coordinates of relay 1 and relay 2 are (300, 0) and (300, 1000), respectively. The coordinates of d 11 and sd 12 are (800, 0) and (1200, 0), respectively. The coordinates of d 21 and d 22 are (800, 1000) and (1200, 1000), respectively. Finally, we set P T = 30 dBm.
As shown in Figure 7e,f,g, each cell appears to allocate the dominant part of its power to only a few subcarriers during each TS for both protocols. Once a cell assigns a significant part of its power to a subcarrier during a TS, the other cells tend to allocate only a small part of their power to this subcarrier during this TS, so as to reduce CCI. As shown in Figure 7b,c,d, the optimized indicators for both P2 and P4 are the same. However, it is very interesting to note that compared to P4, P2 exploits the freedom of allocating power to BSs during the second TS, as shown in Figure 7f, and eventually leads to a much higher sum rate after optimization, as shown in Figure 7a. Thus, the effectiveness of using the HSE protocol is illustrated.
Finally, according to Figure 7c,d, no resources are allocated to the remote destinations (d 11 in cell 1 and d 21 in cell 2) for both protocols. Actually, this confirms the unfair nature of sum rate maximization. Therefore, fairness will be considered in our future work.

Results averaged over channel distribution
In order to illustrate the average performance of our proposed RA algorithm, 100 random realizations of channels are generated. Three benchmark algorithms (BA1, BA2, and BA3) are introduced. Here, BA1 applies the RA algorithm proposed in [7], where the CCI is always set to 0. BA2 uses UPA together with our algorithm proposed for MSSAP. BA3 also uses UPA but with the destinations randomly selected for each subcarrier. We denote the distance between two parallel cells as Dcell and increase it from 200 to 5,200 m.
Let us first compare the average performances for the three protocols using the proposed algorithm. As shown in Figure 8, when P T = 10 dbm, the proposed protocol P1 performs slightly better than P2 and P3. When P T = 40 dbm, P1 performs similarly as P2 and much better than P3. This illustrates that the proposed protocol is more effective than the other ones and moreover that there are more benefits of using the high-efficiency protocol than using SP. We then compare our proposed algorithm with the benchmark algorithms. Figure 9 reports the average sum rates for BA1, BA2, BA3, and the proposed algorithm, which are initialized by either UPA or BPA. As shown in Figure 9, for both initialization methods, the proposed algorithm (named as CA in the figure) results in similar sum rates after optimization. Regardless of the value of P T , the proposed algorithm outperforms all three benchmark algorithms. The improvements are especially significant when P T = 40 dbm.
Note that, in low SNR scenario, when P T = 10 dbm, the sum rates are similar regardless of the distance between two cells. This is because the CCI is already low even when  two cells are close to each other. However, in high SNR scenario, when P T = 40 dbm, the sum rates of all the algorithms increase as Dcell increases. This is because the CCI is quite high when two cells are close to each other, and it decreases as they move away from each other. Note that when Dcell = 5200 m, meaning two cells are quite far from each other, BA 1 performs similarly to our proposed algorithms. This is because BA 1 is near-optimal when the CCI is small.

Conclusion
We have considered a multi-cell OFDMA downlink system where the BS transmissions are aided by the cell specific RS. The adopted HSE protocol enables SP. Assuming a central controller, we have formulated the sum rate maximization problem under a per-cell total power constraint, for which an iterative RA algorithm has been proposed to find a local optimum in polynomial time. Through numerical experiments, the convergence and effectiveness of the  proposed algorithm, the impact of relay position and total power on the system performance as well as the benefits of using SP and the HSE protocol has been illustrated. In the future, combining schemes and multi-relay cases will be considered.