Adaptive hierarchical modulation and power allocation for superposition-coded relaying

We propose a relaying scheme based on superposition coding (SC) with discrete adaptive modulation and coding (AMC) for a three-node half-duplex wireless relay system in which each node decodes messages by successive interference cancelation (SIC). Unlike previous works, which assume that the transmission rate of each link achieves the Gaussian channel capacity, we design a practical superposition-coded relaying scheme with discrete AMC that takes into account the effect of decoding errors at each stage of the SIC process at each node. In our scheme, hierarchical modulation (HM) is used to create an SC message composed of one basic and one superposed message with optimized power allocation. We first introduce the proposed scheme without forward error correction (FEC) for the high signal-to-noise ratio (SNR) region and provide the optimal power allocation between the superimposed messages. Next, we extend the uncoded scheme to incorporate FEC to overcome bad channel conditions. The power allocation in this case is based on an approximate expression of the bit error rate (BER). Numerical results show the performance gains of the proposed SC relaying scheme with HM over conventional schemes across a large range of SNRs.


Introduction
Cooperative relaying is a recently developed technique that enhances the overall performance of wireless communication systems in terms of transmission rates and coverage by exploiting spatial diversity [1,2]. In cooperative relaying systems, one or more relay nodes assist the data transmission from the source node to the destination node. Due to the broadcast nature of the wireless medium, relays can receive the signal sent by the source and, in turn, forward their processed version of the signal to the destination. At the destination, the signals received from the relays (and possibly from the source) are combined to improve the received signal-to-noise ratio (SNR). A number of cooperative relaying schemes have been proposed for half-duplex relays, which either transmit or receive the signal but cannot do both at the same time. There are two major types of relaying protocols: amplify-and-forward (AF) and decode-and-forward (DF) [3]. In the AF protocol, relays amplify the received signal and transmit it to the destination. This protocol has advantages in complexity and cost due to its simple design.
However, noise is also enhanced by the relays. In the DF protocol, the relays decode the received signal before forwarding it to the destination. Although the relays require higher capabilities, this protocol can mitigate the effects of noise in the source-to-relay links. Moreover, the relays can employ adaptive modulation and coding (AMC) in the relay-to-destination links, which is one of the notable advantages of the DF protocol. In this paper, we consider the three-node wireless relay system composed of one source, one destination, and one relay as shown in Figure 1, where the relay operates according to the DF protocol. For such a system, the authors of [4-6] have proposed cooperative relaying schemes. In the cooperative diversity (CD) scheme of [4], the destination uses maximum ratio combining (MRC). In [5,6], the destination decodes the combined direct and relayed signals by using the log-likelihood ratio (LLR) for each bit, allowing the source and relay to use different modulation rates.
Recently, a cooperative DF relaying scheme based on superposition coding (SC) has been proposed in [7,8] for the three-node wireless relay system. SC was originally introduced in [9,10] for the broadcast channel, where multiple users are served simultaneously by a single source. The source superimposes the messages for the users with a certain power allocation and broadcasts the resulting signal. Then, successive interference cancelation (SIC) is performed by each user, where the messages intended for users with weaker channels are decoded first, then subtracted from the received signal, prior to decoding their own message. It was shown in [9,10] that SC with SIC actually achieves the capacity of the Gaussian broadcast channel. In the scheme of [7,8], the source generates the SC message composed of one basic and one superposed message, both of which are destined to the destination, and transmits it in the first time slot with an adequate power allocation between the two messages. In the second time slot, after decoding both messages using SIC, the relay forwards only the superposed message to the destination, reducing the relay transmission time and enhancing the overall rate. Using the correctly decoded superposed message, the destination subtracts the contribution of the superposed message from the direct signal received in the first time slot and decodes the basic message from the resulting signal. It is shown that this SC scheme outperforms conventional relaying schemes by using optimum power allocation under the assumption that the transmission rate of each link achieves Gaussian channel capacity. However, practical wireless systems make use of discrete modulation and coding, in which case the analysis in [7] based on the assumption of Gaussian channels is not applicable.
Figure 1 The three-node wireless relay system.
http://jwcn.eurasipjournals.com/content/2013/1/233
Although an example of implementation with discrete modulation was considered in [8], neither power optimization nor channel coding was considered. Thus, the design of a new scheme with optimized power allocation and modulation/coding is required in order to benefit from SC relaying under such practical constraints.
In this paper, we propose a practical superposition-coded relaying scheme using discrete AMC for the three-node wireless relay system. In order to create an SC message, we make use of hierarchical modulation (HM) [11,12]. HM is the technique adopted by digital TV broadcasting that embeds two independent bitstreams, a low-priority one and a high-priority one, using discrete modulations [11]. We first introduce the uncoded SC scheme of [13] with discrete HM, for which we provide here the full analysis for optimizing the power allocation between the two superimposed messages. Such an uncoded scheme can attain better performance than coded schemes in the high SNR regime, since the redundancy introduced by coding becomes useless whenever there are no errors. Next, we extend our uncoded SC scheme in [13] to incorporate FEC for overcoming bad channel conditions. In this case, due to the difficulty of deriving a theoretical formula for the bit error rate (BER) of decoded messages, we make use of an approximate expression of the BER to perform power optimization. We consider the low-density parity-check (LDPC) code as it can provide a performance close to the Shannon limit [14,15], although other coding techniques can be considered as well. Extensive simulation results show the performance gains provided by the proposed uncoded and coded SC schemes with discrete HM, compared to reference CD schemes.
The paper is organized as follows: Section 2 presents the system model, and Section 3 introduces our protocol. In Sections 4 and 5, the throughput expressions of our scheme are derived, and the optimal power allocation, which maximizes the throughput, is analyzed for both uncoded and LDPC-coded cases. Section 6 presents the numerical results and discussions. Finally, conclusions and directions for future work are given in Section 7.

System model
We consider the wireless relay system which consists of three nodes: source S, relay R, and destination D which are shown in Figure 1.
Relay R is assumed to use DF protocol and to be half duplex, i.e., it cannot transmit and receive simultaneously. We assume separate power constraints at nodes S and R as in [3] a .
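Thus, if node S transmits a vector of N symbols $\mathbf{x} = [x(1), \ldots, x(N)]^T$, nodes D and R receive

$$\mathbf{y}_{D1} = h_{SD}\,\mathbf{x} + \mathbf{z}_{D1}, \qquad \mathbf{y}_{R} = h_{SR}\,\mathbf{x} + \mathbf{z}_{R},$$

respectively. When the vector of M symbols $\mathbf{x}_R = [x_R(1), \ldots, x_R(M)]^T$ is transmitted from node R, the received signal at node D is

$$\mathbf{y}_{D2} = h_{RD}\,\mathbf{x}_R + \mathbf{z}_{D2}.$$

In these expressions, $h_i$ ($i \in \{SD, SR, RD\}$) denotes the channel coefficient of link i, the vectors $\mathbf{z}_{D1}$, $\mathbf{z}_R$, and $\mathbf{z}_{D2}$ denote additive white Gaussian noise, and $\gamma_i$ denotes the instantaneous SNR of link i. All channel coefficients $h_i$ and hence instantaneous SNRs are assumed to be constant during each frame consisting of the two steps. However, they undergo flat Rayleigh fading from frame to frame. Node S is assumed to know $\gamma_{SD}$ and $\gamma_{SR}$, and node R is assumed to know $\gamma_{RD}$. Both S and R transmit signals with an average power $E[|x(n)|^2] = 1$ (n = 1, ..., N). Finally, the bandwidth of each link is assumed to be normalized to 1.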
Two types of transmissions will be considered: the uncoded and LDPC-coded cases with the discrete modulations specified in Tables 1 and 2, respectively. In the uncoded case, the SNR thresholds for each modulation level are derived as in [16], for a target BER of $10^{-3}$. On the other hand, the SNR thresholds for the LDPC-coded case are derived based on the frame error rate (FER) approximation used in [17,18], for a target FER of $10^{-3}$.
In addition, the proposed protocol makes use of HM levels 2/4-QAM and 4/16-QAM in the first step, as explained in the next section.

Description of the proposed protocol
We introduce the steps of the proposed scheme, based on [8], when the modulation and coding levels in Tables 1 and 2 and the HM levels 2/4-QAM and 4/16-QAM are available.
The steps of the proposed scheme are as follows (see Figure 2):
Step 1. Denote a basic message and a superposed message created from information bits at node S as $u_b$ and $u_s$, respectively. Moreover, we define L as the number of bits in $u_b$ or $u_s$. In the uncoded scheme, the components of $u_b$ and $u_s$ are mapped to the basic symbols $x_b(n)$ and the superposed symbols $x_s(n)$ (n = 1, ..., N), respectively. In the coded scheme, the LDPC codewords $c_b$ and $c_s$ are generated by encoding $u_b$ and $u_s$, respectively. Then node S maps the LDPC codewords $c_b$ and $c_s$ to the basic symbols $x_b(n)$ and the superposed symbols $x_s(n)$ (n = 1, ..., N), respectively. Using either 2/4-QAM or 4/16-QAM, node S generates the following symbol from the basic and superposed symbols:

$$x(n) = \sqrt{1-\alpha}\,x_b(n) + \sqrt{\alpha}\,x_s(n),$$

where $\alpha$ denotes the power allocation parameter between the basic symbol $x_b(n)$ and the superposed symbol $x_s(n)$. If 2/4-QAM is used, $x_b(n)$ is an in-phase binary phase shift keying (BPSK) symbol, while $x_s(n)$ is a quadrature-phase BPSK symbol, giving the constellation of the superposed signal shown in Figure 3. Letting r be the coding rate used at node S, we thus have L = rN. Note that r = 1 is equivalent to the uncoded case. We set $\alpha \in (0, \frac{1}{2}]$ for 2/4-QAM due to symmetry. On the other hand, when 4/16-QAM is used, both $x_b(n)$ and $x_s(n)$ are independent QPSK symbols, giving the constellation shown in Figure 4. In this case, we have L = 2rN. We assume $\alpha \in (0, \frac{1}{2})$ so that each constellation point stays within the same quadrant as its corresponding basic symbol. Defining the vectors of basic and superposed symbols as $\mathbf{x}_b = [x_b(1), \ldots, x_b(N)]^T$ and $\mathbf{x}_s = [x_s(1), \ldots, x_s(N)]^T$, the received signal at node R is given by

$$\mathbf{y}_R = h_{SR}\left(\sqrt{1-\alpha}\,\mathbf{x}_b + \sqrt{\alpha}\,\mathbf{x}_s\right) + \mathbf{z}_R,$$

where $\mathbf{z}_R$ denotes the noise vector at node R. From $\mathbf{y}_R$, node R first decodes $\mathbf{x}_b$ by treating $\sqrt{\alpha}\,\mathbf{x}_s$ as noise and then obtains the basic message $u_b$ if there is no decoding error. Subtracting the contribution of $\sqrt{1-\alpha}\,\mathbf{x}_b$ from $\mathbf{y}_R$, we obtain

$$\tilde{\mathbf{y}}_R = h_{SR}\sqrt{\alpha}\,\mathbf{x}_s + \mathbf{z}_R,$$

from which R decodes $\mathbf{x}_s$ and obtains the superposed message $u_s$. On the other hand, node D receives

$$\mathbf{y}_{D1} = h_{SD}\left(\sqrt{1-\alpha}\,\mathbf{x}_b + \sqrt{\alpha}\,\mathbf{x}_s\right) + \mathbf{z}_{D1} \qquad (8)$$

in the first step, where $\mathbf{z}_{D1}$ denotes the noise vector at node D, and keeps it in memory.
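Step 2. Given the link qualities, node R transmits $\mathbf{x}_R = [x_R(1), \ldots, x_R(M)]^T$, the remodulated signal of $u_s$ correctly decoded in step 1. The received signal at node D is

$$\mathbf{y}_{D2} = h_{RD}\,\mathbf{x}_R + \mathbf{z}_{D2}, \qquad (9)$$

where $\mathbf{z}_{D2}$ denotes the noise vector at node D in step 2. From $\mathbf{y}_{D2}$, node D decodes $\mathbf{x}_R$ and hence obtains $u_s$. Node D then cancels the contribution of $\mathbf{x}_s$ from $\mathbf{y}_{D1}$ kept in memory in step 1, obtaining

$$\tilde{\mathbf{y}}_{D1} = \mathbf{y}_{D1} - h_{SD}\sqrt{\alpha}\,\mathbf{x}_s = h_{SD}\sqrt{1-\alpha}\,\mathbf{x}_b + \mathbf{z}_{D1}, \qquad (10)$$

from which D finally decodes $\mathbf{x}_b$, obtaining $u_b$.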
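The superposition and SIC operations of step 1 can be illustrated for 2/4-QAM with a minimal, noise-free sketch. The bit-to-symbol mapping (0 to +1, 1 to -1), the function names, and the example channel value are assumptions made for illustration only; they are not taken from the paper.

```python
import cmath
import math


def bpsk(bit):
    """Map a bit to a BPSK amplitude (assumed mapping: 0 -> +1, 1 -> -1)."""
    return 1.0 if bit == 0 else -1.0


def hm_2_4qam(basic_bits, superposed_bits, alpha):
    """Build x(n) = sqrt(1-alpha) x_b(n) + sqrt(alpha) x_s(n).

    The basic symbol is an in-phase BPSK symbol and the superposed symbol
    is a quadrature-phase BPSK symbol, as described for 2/4-QAM.
    """
    if not 0.0 < alpha <= 0.5:
        raise ValueError("alpha must lie in (0, 1/2] for 2/4-QAM")
    amp_b = math.sqrt(1.0 - alpha)
    amp_s = math.sqrt(alpha)
    return [complex(amp_b * bpsk(b), amp_s * bpsk(s))
            for b, s in zip(basic_bits, superposed_bits)]


def sic_decode_2_4qam(received, h, alpha):
    """Relay-side SIC: detect the basic bit, cancel it, detect the superposed bit."""
    basic_hat, superposed_hat = [], []
    for y in received:
        b = 0 if (y / h).real >= 0 else 1
        # Remove the contribution of the detected basic symbol.
        residual = y - h * math.sqrt(1.0 - alpha) * bpsk(b)
        s = 0 if (residual / h).imag >= 0 else 1
        basic_hat.append(b)
        superposed_hat.append(s)
    return basic_hat, superposed_hat


# Hypothetical example values (noise-free channel, coefficient known at R).
h_sr = 0.8 * cmath.exp(0.3j)
u_b = [0, 1, 1, 0]
u_s = [1, 1, 0, 0]
x = hm_2_4qam(u_b, u_s, alpha=0.2)
y_r = [h_sr * symbol for symbol in x]
decoded = sic_decode_2_4qam(y_r, h_sr, alpha=0.2)
```

Each transmitted symbol has unit power for any admissible alpha, in line with the average power constraint of the system model.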

Average throughput
Here, we consider the proposed scheme without FEC. First, we derive the average throughput, defined as the number of correct bits received at node D per unit symbol time. If at least one bit in a message is not decoded correctly, the whole message is discarded. We define $Q_{Rb}$, $Q_{Rs}$, $Q_{Ds}$, and $Q_{Db}$ as the probabilities that the decoded message equals the transmitted one, i.e., of the events $\bar{u}_{Rb} = u_b$, $\bar{u}_{Rs} = u_s$, $\bar{u}_{Ds} = u_s$, and $\bar{u}_{Db} = u_b$, respectively, where $\bar{u}_{kb}$ and $\bar{u}_{ks}$, $k \in \{R, D\}$, represent the basic and superposed messages decoded at node k, while $u_b$ and $u_s$ are the original basic and superposed messages. In the analysis, we assume that the L bits of the superposed message are correctly received at node D if the basic and the superposed messages have been correctly decoded at node R, and if $u_s$ has been correctly decoded at node D. However, in the simulations, all superposed messages will be forwarded to node D even if they were not correctly decoded at node R, since they may still be correctly decoded at node D. Still, it will be shown in Section 6 that the analytical throughput gives a valid approximation of the actual one. Thus, the expected number of bits of the superposed message that node D correctly receives is $L Q_{Rb} Q_{Rs} Q_{Ds}$. With a similar calculation for the basic message, the expected number of bits of the basic message that node D correctly receives after the two steps is equal to $L Q_{Rb} Q_{Rs} Q_{Ds} Q_{Db}$. Since node S transmits the signal of N symbols in step 1 and node R transmits the signal of M symbols in step 2, the duration of the two steps is N + M symbol times, which gives the average throughput in (15).
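As a rough illustration of the bookkeeping above, the following sketch computes the throughput under the assumption that (15) is the sum of the two expected bit counts divided by the N + M symbol times. The exact form of (15) is not reproduced in this text, so the combination rule and the function name are assumptions.

```python
def average_throughput(L, N, M, q_rb, q_rs, q_ds, q_db):
    """Expected correct bits at node D per symbol time (assumed form of (15))."""
    superposed_bits = L * q_rb * q_rs * q_ds      # superposed message
    basic_bits = superposed_bits * q_db           # basic message
    return (superposed_bits + basic_bits) / (N + M)


# Hypothetical error-free example: 2/4-QAM in step 1 (L = N = 500)
# and 16-QAM in step 2 (M = N / 4).
ideal = average_throughput(L=500, N=500, M=125,
                           q_rb=1.0, q_rs=1.0, q_ds=1.0, q_db=1.0)
```

In the error-free case, the two messages of L bits each are delivered in N + M symbol times, so the sketch reduces to 2L / (N + M).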

2/4-QAM
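We derive the average throughput when node S selects 2/4-QAM in step 1. As both the basic and the superposed symbols are BPSK, we have L = N in (15). When node R decodes $\mathbf{x}_b$ from $\mathbf{y}_R$, it decides the basic symbols $x_b(n)$ (n = 1, ..., N) through BPSK symbol detection. In 2/4-QAM, the basic and superposed symbols are transmitted on the in-phase and quadrature-phase components, respectively, so the superposed symbols do not interfere with the detection of the basic symbols. This means that $Q_{Rb}$ is the probability that the message is decoded correctly with BPSK at the SNR $\gamma_{SR}(1-\alpha)$. Thus, denoting by q the probability that each symbol $x_b(n)$ (n = 1, ..., N) is decoded correctly, we have

$$q = 1 - P_{BPSK}(\gamma_{SR}(1-\alpha)), \qquad (16)$$

where $P_{BPSK}(\gamma)$ is the symbol error rate of BPSK in an AWGN channel with SNR $\gamma$, given by [14]

$$P_{BPSK}(\gamma) = \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{\gamma}\right),$$

where erfc(x) is the complementary error function,

$$\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\,dt.$$

Using (16), we obtain

$$Q_{Rb} = q^N = \left(1 - P_{BPSK}(\gamma_{SR}(1-\alpha))\right)^N. \qquad (19)$$

After cancelation of $\mathbf{x}_b$, node R decodes $\mathbf{x}_s$ at the SNR $\gamma_{SR}\alpha$, giving

$$Q_{Rs} = \left(1 - P_{BPSK}(\gamma_{SR}\alpha)\right)^N. \qquad (20)$$

In step 2, node R forwards the N bits to node D using discrete modulations. For clarity of exposition, we only derive the analysis for QPSK and 16-QAM, since these are the most likely to be used as the relay-destination link is of high quality (otherwise, relaying schemes would be of no use). The analysis for BPSK can be derived similarly. Thus, we have M = N/2 when QPSK is used, and M = N/4 for 16-QAM. Denoting by $\Gamma_R$ the SNR threshold for switching between QPSK and 16-QAM (from Table 1, $\Gamma_R$ = 17 dB), $Q_{Ds}$ is given by

$$Q_{Ds} = \begin{cases} \left(1 - P_{QPSK}(\gamma_{RD})\right)^{N/2}, & \gamma_{RD} < \Gamma_R, \\ \left(1 - P_{16QAM}(\gamma_{RD})\right)^{N/4}, & \gamma_{RD} \ge \Gamma_R, \end{cases} \qquad (21)$$

where $P_{QPSK}(\gamma)$ and $P_{16QAM}(\gamma)$ are the symbol error rates of QPSK and 16-QAM, respectively, in an AWGN channel with SNR $\gamma$, given in [14]. As node D decodes $\mathbf{x}_b$ in step 2 at SNR $\gamma_{SD}(1-\alpha)$, we obtain

$$Q_{Db} = \left(1 - P_{BPSK}(\gamma_{SD}(1-\alpha))\right)^N. \qquad (24)$$

By substituting (19), (20), (21), and (24) into (15), we obtain the average throughput with 2/4-QAM in (25) and (26), denoted by $R^{(i)}_{SC4}(\alpha)$ for $\gamma_{RD} < \Gamma_R$ and by $R^{(ii)}_{SC4}(\alpha)$ for $\gamma_{RD} \ge \Gamma_R$, respectively.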
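The derivation above can be turned into a small numerical sketch of the uncoded 2/4-QAM throughput as a function of α. The QPSK and 16-QAM symbol error rates used here are the standard textbook AWGN expressions and may differ in form from the ones used in the paper; the throughput combination (sum of both expected bit counts over N + M) and all function names are likewise assumptions. Only the 17 dB switching threshold comes from the text.

```python
import math

GAMMA_R_DB = 17.0  # QPSK/16-QAM switching threshold quoted from Table 1


def db_to_linear(value_db):
    return 10.0 ** (value_db / 10.0)


def p_bpsk(gamma):
    return 0.5 * math.erfc(math.sqrt(gamma))


def p_qpsk(gamma):
    # Assumed textbook form: two independent binary decisions per symbol.
    p_dim = 0.5 * math.erfc(math.sqrt(gamma / 2.0))
    return 1.0 - (1.0 - p_dim) ** 2


def p_16qam(gamma):
    # Assumed textbook form: two independent 4-PAM decisions per symbol.
    p_dim = 0.75 * math.erfc(math.sqrt(gamma / 10.0))
    return 1.0 - (1.0 - p_dim) ** 2


def throughput_sc4(alpha, n, gamma_sd, gamma_sr, gamma_rd):
    """Uncoded 2/4-QAM throughput for linear-scale SNRs (sketch)."""
    q_rb = (1.0 - p_bpsk(gamma_sr * (1.0 - alpha))) ** n
    q_rs = (1.0 - p_bpsk(gamma_sr * alpha)) ** n
    if gamma_rd < db_to_linear(GAMMA_R_DB):
        m = n / 2.0                      # QPSK in step 2
        q_ds = (1.0 - p_qpsk(gamma_rd)) ** m
    else:
        m = n / 4.0                      # 16-QAM in step 2
        q_ds = (1.0 - p_16qam(gamma_rd)) ** m
    q_db = (1.0 - p_bpsk(gamma_sd * (1.0 - alpha))) ** n
    return n * q_rb * q_rs * q_ds * (1.0 + q_db) / (n + m)


# Hypothetical operating point.
example = throughput_sc4(0.3, 500, db_to_linear(10.0),
                         db_to_linear(15.0), db_to_linear(19.0))
```

At very high SNR on all links, every probability tends to one and the sketch approaches 2N / (N + N/4) = 1.6 bits per symbol time.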

4/16-QAM
For 4/16-QAM, the average throughput can be derived in a similar way. As both the basic and the superposed symbols are QPSK, we have L = 2N in (15). Node R first decodes $\mathbf{x}_b$ from $\mathbf{y}_R$ by treating the components of $\mathbf{x}_s$ as noise, which means that it performs common QPSK symbol detection. Here, it should be noted that, unlike the case of 2/4-QAM, the interference from $\mathbf{x}_s$ degrades the detection performance. If the basic symbol corresponds to the point B in the first quadrant in Figure 4, the resulting transmitted 4/16-QAM symbol is one of the points $S_1$, $S_2$, $S_3$, or $S_4$. Denoting by $q_j$ the probability that point $S_j$ (j = 1, ..., 4) is received in the first quadrant and considering the symmetry of the 4/16-QAM constellation, we have

$$Q_{Rb} = \left(\frac{q_1 + q_2 + q_3 + q_4}{4}\right)^N. \qquad (27)$$

Here, $q_1$ is the probability that both the in-phase and quadrature-phase components of the point $S_1$ stay in the first quadrant, and $q_2$, $q_3$, and $q_4$ are obtained similarly. Substituting these probabilities $q_1$, $q_2$, $q_3$, and $q_4$ into (27), we obtain $Q_{Rb}$ in (32). Since node R decodes $\mathbf{x}_s$ at the SNR $\alpha\gamma_{SR}$, we have

$$Q_{Rs} = \left(1 - P_{QPSK}(\alpha\gamma_{SR})\right)^N. \qquad (33)$$

In step 2, node R transmits the information of 2N bits to node D using either QPSK or 16-QAM. Thus, we have M = N when QPSK is used and M = N/2 when 16-QAM is used, and $Q_{Ds}$ is given by

$$Q_{Ds} = \begin{cases} \left(1 - P_{QPSK}(\gamma_{RD})\right)^{N}, & \gamma_{RD} < \Gamma_R, \\ \left(1 - P_{16QAM}(\gamma_{RD})\right)^{N/2}, & \gamma_{RD} \ge \Gamma_R, \end{cases} \qquad (34)$$

where $\Gamma_R$ is the SNR threshold for switching between QPSK and 16-QAM. As node D decodes $\mathbf{x}_b$ in step 2 at SNR $\gamma_{SD}(1-\alpha)$, we obtain

$$Q_{Db} = \left(1 - P_{QPSK}(\gamma_{SD}(1-\alpha))\right)^N. \qquad (35)$$

By substituting (32), (33), (34), and (35) into (15), we obtain the average throughput with 4/16-QAM in (36) and (37).
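The effect of the superposed symbol on the detection of the basic QPSK symbol can be sketched numerically. The geometry below assumes that the basic point B sits at $(\sqrt{(1-\alpha)/2}, \sqrt{(1-\alpha)/2})$ and that the superposed QPSK symbol shifts each component by $\pm\sqrt{\alpha/2}$; the SNR convention (unit symbol energy, noise variance $1/(2\gamma)$ per dimension) and the function name are also assumptions, and the labelling of the four points may not match Figure 4.

```python
import math


def basic_symbol_correct_prob_4_16qam(alpha, gamma):
    """Probability that a 4/16-QAM symbol stays in the quadrant of its basic symbol."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie in (0, 1/2) for 4/16-QAM")
    d_far = math.sqrt((1.0 - alpha) / 2.0) + math.sqrt(alpha / 2.0)
    d_near = math.sqrt((1.0 - alpha) / 2.0) - math.sqrt(alpha / 2.0)

    def stays_positive(distance):
        # One component keeps its sign under Gaussian noise.
        return 1.0 - 0.5 * math.erfc(distance * math.sqrt(gamma))

    p_far, p_near = stays_positive(d_far), stays_positive(d_near)
    # Four equiprobable points: both components far, mixed (twice), both near.
    q = [p_far * p_far, p_far * p_near, p_near * p_far, p_near * p_near]
    return sum(q) / 4.0


# Hypothetical example: larger alpha moves the inner points closer to the
# decision boundaries and lowers the detection probability.
p_small_alpha = basic_symbol_correct_prob_4_16qam(0.1, 10.0)
p_large_alpha = basic_symbol_correct_prob_4_16qam(0.4, 10.0)
```

Raising the returned value to the power N gives the message-level probability in the sense of (27), under the same assumptions.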

Optimizing the power allocation
The throughput of the proposed scheme depends on the power allocation parameter α, which is optimized next.

2/4-QAM
First, assuming that node S uses 2/4-QAM in step 1, the problem is to select $\alpha \in (0, 1/2]$ that maximizes the throughput $R_{SC4}$, given N, $\gamma_{SD}$, $\gamma_{SR}$, and $\gamma_{RD}$. It can be observed that the value of $\alpha$ maximizing (25) and (26) does not depend on the SNR $\gamma_{RD}$, since the HM is restricted to 2/4-QAM here. Thus, without loss of generality, we assume $\gamma_{RD} \ge \Gamma_R$, where $\Gamma_R$ is the SNR threshold for switching between QPSK and 16-QAM. Differentiating (26) with respect to $\alpha$ yields an expression in terms of a function $G_{SC4}(\alpha)$, defined in (39), such that $G_{SC4}(\alpha) = 0$ is the necessary condition for an optimal $\alpha$. We can prove that solutions of the equation $G_{SC4}(\alpha) = 0$ always exist in $0 < \alpha \le \frac{1}{2}$. The limits in (41) and (42), where $\lim_{x \to c^+} f(x)$ represents the right-hand limit of a function f(x) as x approaches c, show that $\lim_{\alpha \to 0^+} G_{SC4}(\alpha) > 0$ and $G_{SC4}(\frac{1}{2}) < 0$; thus, the intermediate-value theorem guarantees the existence of a solution in the considered range. Although it is difficult to prove the uniqueness of the solution of $G_{SC4}(\alpha) = 0$ in $0 < \alpha \le \frac{1}{2}$, the proposed scheme uses the value $\alpha^*_{SC4}$ obtained by solving the equation $G_{SC4}(\alpha) = 0$ numerically. The numerical results in Section 6 will show that $\alpha^*_{SC4}$ can maximize the throughput $R_{SC4}(\alpha)$.
A Self-archived copy in Kyoto University Research Information Repository https://repository.kulib.kyoto-u.ac.jp
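The two ways of choosing α that are compared later, exhaustive search and numerical solution of a stationarity condition, can be sketched generically. The sketch below does not use the paper's function $G_{SC4}(\alpha)$, whose closed form is not reproduced here; instead it applies bisection to a finite-difference slope of an arbitrary throughput function, which is a swapped-in technique. The function names and the toy objective are hypothetical.

```python
def grid_argmax(f, lo, hi, steps=2000):
    """Exhaustive search for the maximizer of f over the half-open range (lo, hi]."""
    best_alpha, best_value = None, float("-inf")
    for i in range(1, steps + 1):
        alpha = lo + (hi - lo) * i / steps
        value = f(alpha)
        if value > best_value:
            best_alpha, best_value = alpha, value
    return best_alpha


def stationary_point(f, lo, hi, h=1e-6, tol=1e-9):
    """Bisection on a central-difference slope, assuming it changes sign once."""
    def slope(alpha):
        return (f(alpha + h) - f(alpha - h)) / (2.0 * h)

    if not slope(lo) > 0.0 > slope(hi):
        raise ValueError("slope must be positive at lo and negative at hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def toy_throughput(alpha):
    # Toy concave objective with its maximum at alpha = 0.3 (illustration only).
    return -(alpha - 0.3) ** 2


alpha_grid = grid_argmax(toy_throughput, 0.0, 0.5)
alpha_root = stationary_point(toy_throughput, 1e-3, 0.5)
```

Because uniqueness of the stationary point is not proved in the text, comparing the root against a grid search, as done here, is a simple sanity check.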

4/16-QAM
A similar analysis applies in the case where node S uses 4/16-QAM in step 1. Here, the problem is to select $\alpha \in (0, 1/2)$ that maximizes the throughput $R_{SC16}$, given N, $\gamma_{SD}$, $\gamma_{SR}$, and $\gamma_{RD}$. Again, we can see that the value of $\alpha$ maximizing (36) and (37) does not depend on the SNR $\gamma_{RD}$, as the HM is restricted to 4/16-QAM. Thus, we may assume $\gamma_{RD} \ge \Gamma_R$ without loss of generality, where $\Gamma_R$ is the SNR threshold for switching between QPSK and 16-QAM. Differentiating (37) with respect to $\alpha$ yields an expression in terms of a function $G_{SC16}(\alpha)$, defined in (44), such that, for $0 < \alpha < \frac{1}{2}$, $G_{SC16}(\alpha) = 0$ is the necessary condition for an optimal $\alpha$. We can prove that solutions of the equation $G_{SC16}(\alpha) = 0$ always exist in $0 < \alpha < \frac{1}{2}$. By (44), we obtain the limits in (46) and (47), where $\lim_{x \to c^-} f(x)$ represents the left-hand limit of a function f(x) as x approaches c. From (46) and (47), it can be seen that $\lim_{\alpha \to 0^+} G_{SC16}(\alpha) > 0$ and $\lim_{\alpha \to \frac{1}{2}^-} G_{SC16}(\alpha) < 0$; thus, the intermediate-value theorem guarantees the existence of a solution in the considered range. Again, the proposed scheme chooses the value $\alpha^*_{SC16}$ obtained by solving the equation $G_{SC16}(\alpha) = 0$ numerically, which is shown to maximize the throughput $R_{SC16}$ in Section 6.

Average throughput
In this subsection, we consider the average throughput of our scheme with FEC. When an LDPC code is used, it is very difficult to obtain exact closed-form expressions of the BER and FER for the proposed method. In this paper, we therefore assume that bit errors are independent of each other within each frame and that each bit has the same BER. To derive the throughput, we employ the FER approximation (48) used in [17,18], which is a function of the number of message bits and of the SNR $\gamma$. The parameters a and b in (48) are determined by fitting (48) to the FER curve obtained by simulations, as explained in Section 6. Here, we consider the coding rates of 1/2 and 2/5, although other codes and rates can be used similarly in the proposed method. We adopt the code length of 64,800 as in the Digital Video Broadcasting - Satellite - Second Generation (DVB-S2) standard. The fitting parameters a and b are shown in Table 3. The parameter b for QPSK is half of that for BPSK, since QPSK requires double the SNR to achieve the same BER as BPSK. For the same reason, the parameter b for 16-QAM is obtained by dividing the parameter b for BPSK by 7.2. Thus, we can write the FER approximation for QPSK and 16-QAM by using only the parameters a and b for BPSK. We will show in Section 6 that these fitting curves provide a good approximation of the FER obtained by simulations.
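The role of the fitting parameters can be illustrated with a small sketch. The exact form of (48) is not reproduced in this text; the code below assumes a per-bit error rate of the form a·exp(−bγ) combined with the independent-bit-error assumption, which is only one plausible reading. The values a = 3.11 × 10^12 and b = 47.9 (BPSK, rate 1/2) and the scaling of b by 2 and 7.2 are taken from the text; the clipping of the BER at 1/2 and all names are assumptions.

```python
import math

A_BPSK_HALF_RATE = 3.11e12   # fitting parameter a quoted for BPSK, r = 1/2
B_BPSK_HALF_RATE = 47.9      # fitting parameter b quoted for BPSK, r = 1/2
B_DIVISOR = {"BPSK": 1.0, "QPSK": 2.0, "16QAM": 7.2}


def ber_approx(gamma, modulation="BPSK",
               a=A_BPSK_HALF_RATE, b=B_BPSK_HALF_RATE):
    """Assumed exponential BER fit for a linear-scale SNR, clipped to (0, 1/2]."""
    ber = a * math.exp(-(b / B_DIVISOR[modulation]) * gamma)
    return min(ber, 0.5)


def fer_approx(gamma, n_message_bits, modulation="BPSK",
               a=A_BPSK_HALF_RATE, b=B_BPSK_HALF_RATE):
    """FER = 1 - (1 - BER)^n under independent, identically distributed bit errors."""
    ber = ber_approx(gamma, modulation, a, b)
    # expm1/log1p keep precision when the BER is very small.
    return -math.expm1(n_message_bits * math.log1p(-ber))


# Hypothetical example: 32,400 message bits (rate 1/2, code length 64,800).
fer_low_snr = fer_approx(0.5, 32400)
fer_high_snr = fer_approx(1.0, 32400)
```

With this parameterization, the QPSK curve at SNR 2γ coincides with the BPSK curve at SNR γ, matching the halving of b described above.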

2/4-QAM
We derive the average throughput when node S selects 2/4-QAM in step 1. Here, to simplify the description, we assume that nodes S and R use an identical LDPC code. Let r be the coding rate used at node S, giving L = rN in (15). We define $A^{(r)} = a$ and $B^{(r)} = b$ as the parameters in Table 3 for BPSK modulation and coding rate r. For example, if r = 1/2, $A^{(r)} = 3.11 \times 10^{12}$ and $B^{(r)} = 47.9$. Since node R decodes the BPSK signal $\mathbf{x}_b$ at the SNR $\gamma_{SR}(1-\alpha)$, $Q_{Rb}$ is given by (49). Since node R decodes the BPSK signal $\mathbf{x}_s$ at the SNR $\gamma_{SR}\alpha$ after canceling the interference from the signal $\mathbf{x}_b$, $Q_{Rs}$ is given by (50). As for the uncoded case, we describe the analysis for the cases where QPSK or 16-QAM is used at node R in step 2 (the case of BPSK may be derived similarly); in this case, $Q_{Ds}$ is given by (51). As node D decodes the BPSK signal $\mathbf{x}_b$ in step 2 at the SNR $\gamma_{SD}(1-\alpha)$, $Q_{Db}$ is given by (52). By substituting (49), (50), (51), and (52) into (15), we obtain the average throughput for 2/4-QAM, denoted by $\hat{R}^{(i)}_{SC4}(\alpha)$ for $\gamma_{RD} < \Gamma_R$ and by $\hat{R}^{(ii)}_{SC4}(\alpha)$ for $\gamma_{RD} \ge \Gamma_R$, where $\Gamma_R$ is the SNR threshold for switching between QPSK and 16-QAM. Here, we require the conditions (55) and (56) for the probabilities (49), (50), (51), and (52) to have their values in the range $(0, \frac{1}{2}]$. From the conditions (55) and (56), we obtain inequalities for $\alpha$. Here, we assume that $A^{(r)}$, $B^{(r)}$, and $\gamma_{SD}$ satisfy an inequality that is true if $\gamma_{SD} > 0.7$ dB when using $A^{(r)}$ and $B^{(r)}$ in Table 3.
Since the performance of our scheme has very limited gain when $\gamma_{SD} \le 0.7$ dB, this is not a restrictive constraint. Thus, with (58), we obtain the range of $\alpha$, $C \le \alpha \le \frac{1}{2}$, with C defined in (59).

4/16-QAM
For 4/16-QAM, the average throughput can be derived as follows. Here, L = 2rN in (15). Recall that node R decodes x b from the received signal y R with QPSK symbol detection in step 1, treating x s as noise. In the uncoded case, each bit of a basic message is mapped to one constellation point of QPSK symbol corresponding to the basic message, for example, to point B in Figure 4, for which the interference from x s deterministically results from points S 1 , S 2 , S 3 , or S 4 . However, in the coded case, each message bit is encoded by a generator matrix into a number of coded bits which are mapped into any 4/16-QAM constellation point across all symbols. Due to the difficulty of tracking all the constellation points corresponding to each message bit, we take a stochastic approach to approximate the impact of the interference from x s for each message bit as follows. When decoding a basic message bit, we can assume that the overall interference for the message bit results from the aggregation of a large number of interference components with power α, since the length of the LDPC codeword is very long. Therefore, from the central limit theorem, we can see the interference from x s as Gaussian noise with power of α. 
In this way, we can write $Q_{Rb}$ in (60) by using the fitting parameters $A^{(r)}$ and $B^{(r)}$ given by the FER approximation for LDPC-coded BPSK. Since node R decodes the QPSK signal $\mathbf{x}_s$ at SNR $\gamma_{SR}\alpha$, $Q_{Rs}$ is given by (61). Using the SNR threshold $\Gamma_R$ for switching between QPSK and 16-QAM, $Q_{Ds}$ is given by (62). As node D decodes the QPSK signal $\mathbf{x}_b$ in step 2 at SNR $(1-\alpha)\gamma_{SD}$, $Q_{Db}$ is given by (63). By substituting (60), (61), (62), and (63) into (15), we obtain the average throughput with LDPC-coded 4/16-QAM, denoted by $\hat{R}_{SC16}(\alpha)$. Here, we require the conditions (66), (67), and (68) for the probabilities (60), (61), (62), and (63) to have their values in the range $(0, \frac{1}{2}]$. From the conditions (66), (67), and (68), we obtain inequalities for $\alpha$. Here, we assume that $A^{(r)}$, $B^{(r)}$, and $\gamma_{SD}$ satisfy an inequality that is true if $\gamma_{SD} > 4$ dB, which again is not a restrictive constraint. Moreover, $\gamma_{SR}$ is assumed to satisfy an inequality that holds when $\gamma_{SR} > 6$ dB, the region in which our scheme achieves significant gains. Thus, with (70), (71), and (72), we get the region of $\alpha$ given in (73).
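The Gaussian approximation of the interference can be made concrete with a short sketch. It assumes that treating the superposed signal as Gaussian noise of power α leads to the usual signal-to-interference-plus-noise ratio $(1-\alpha)\gamma_{SR} / (1 + \alpha\gamma_{SR})$ for the basic message; whether (60) uses exactly this argument is not visible in this text, and the function name is hypothetical.

```python
def effective_sinr_basic(alpha, gamma_sr):
    """SINR of the basic message when the superposed signal is treated as noise.

    Signal power (1 - alpha) and interference power alpha are both scaled by
    the link SNR gamma_sr (linear scale); the noise power is normalized to 1.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return (1.0 - alpha) * gamma_sr / (1.0 + alpha * gamma_sr)


# Hypothetical example: gamma_SR = 4 (about 6 dB) and alpha = 0.25.
sinr_example = effective_sinr_basic(0.25, 4.0)
```

The effective SINR saturates at $(1-\alpha)/\alpha$ as the link SNR grows, which illustrates why the power split, and not only the link quality, limits the decoding of the basic message in 4/16-QAM.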

2/4-QAM
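First, assuming that node S uses LDPC-coded 2/4-QAM in step 1, the problem is to select $\alpha \in (0, 1/2]$ that maximizes the throughput $\hat{R}_{SC4}$, given r, N, $\gamma_{SD}$, $\gamma_{SR}$, and $\gamma_{RD}$. We assume $\gamma_{RD} \ge \Gamma_R$, where $\Gamma_R$ is the SNR threshold for switching between QPSK and 16-QAM, since the optimal value of $\alpha$ does not depend on the SNR $\gamma_{RD}$. Differentiating the throughput with respect to $\alpha$ yields an expression in terms of a function $\hat{G}_{SC4}(\alpha)$, defined in (75). Given C defined in (59), when $C \le \alpha \le \frac{1}{2}$, $\hat{G}_{SC4}(\alpha) = 0$ is the necessary condition for an optimal $\alpha$. We can prove that solutions of the equation $\hat{G}_{SC4}(\alpha) = 0$ exist in this range. Although it is difficult to prove the uniqueness of the solution of $\hat{G}_{SC4}(\alpha) = 0$ in $C \le \alpha \le \frac{1}{2}$, the proposed scheme uses the value $\hat{\alpha}^*_{SC4}$ obtained by solving the equation $\hat{G}_{SC4}(\alpha) = 0$ numerically.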

4/16-QAM
A similar analysis applies in the case where node S uses LDPC-coded 4/16-QAM in step 1. Here, the problem is to select $\alpha \in (0, 1/2)$ that maximizes the throughput $\hat{R}_{SC16}$, given N, $\gamma_{SD}$, $\gamma_{SR}$, and $\gamma_{RD}$. Here, we assume $\gamma_{RD} \ge \Gamma_R$, where $\Gamma_R$ is the SNR threshold for switching between QPSK and 16-QAM. Differentiating (65) with respect to $\alpha$ yields an expression in terms of a function $\hat{G}_{SC16}(\alpha)$, defined in (80). Based on D and E defined in (73), when $D \le \alpha \le E$, $\hat{G}_{SC16}(\alpha) = 0$ is the necessary condition for an optimal $\alpha$. We can prove that solutions of the equation $\hat{G}_{SC16}(\alpha) = 0$ exist in this range. Again, the proposed scheme chooses the value $\hat{\alpha}^*_{SC16}$ obtained by solving the equation $\hat{G}_{SC16}(\alpha) = 0$ numerically.
6 Numerical results

Reference schemes
We consider three alternatives to the proposed scheme when BPSK, QPSK, and 16-QAM are available for adaptive modulation:
1. Direct transmission. Node S transmits to node D directly without any help from node R. Node S chooses the modulation and coding rate to achieve the best throughput.
2. Multi-hop (MH) transmission. Only the relayed signal is considered at node D, not the direct one. After node R decodes the received signal from node S in step 1, it forwards in step 2 the remodulated signal given the SNR $\gamma_{RD}$ of link RD.
3. CD transmission. We consider the CD scheme proposed in [6], where different modulation rates can be used at node S and node R. In step 1, the signal transmitted by node S is received by both nodes R and D. At node R, the received signal is demodulated and retransmitted to node D in step 2 using the modulation adapted to the SNR $\gamma_{RD}$ of link RD. If node R uses the same modulation as in step 1, node D performs MRC of the signal received from node S in step 1 with the one received from node R in step 2 and decodes the combined signal. Otherwise, node D combines the two signals by using the LLR for each bit before decoding.

FER approximation
When a half-rate LDPC code is used with a modulation scheme among BPSK, QPSK, and 16-QAM, Figure 5 compares the FER obtained by simulations with the FER approximation (48) as a function of $\gamma$. Here, the parameters a and b in Table 3 are used. The parameters a and b are determined as least-squares solutions by considering the shift and the slope of the simulated BER curve in logarithmic scale, respectively. It is shown that the fitting curve provides a good approximation of the simulated FER for each modulation. Figure 6 compares the expression (60) with the parameters a and b for BPSK in Table 3, namely, $A^{(r)} = 3.11 \times 10^{12}$, $B^{(r)} = 47.9$, r = 1/2, and N = 64,800, with the simulated FER of the basic message for various values of $\alpha$ and $\gamma_{SR}$. We can observe a certain gap between the fitting curve and the simulated curve due to the Gaussian approximation of the interference from $\mathbf{x}_s$. However, since both the optimal power allocation parameter obtained by exhaustive search and the proposed power allocation parameter $\hat{\alpha}^*_{SC16}$ obtained numerically from (80) have values less than 0.3, the gap between the approximated and simulated FER of the basic message is not so large. The throughput evaluations presented later show the validity of this approximation.

Performance comparison
Next, the proposed and reference schemes are evaluated in terms of average throughput for the uncoded and LDPC-coded cases under flat Rayleigh fading channels. For each link, instantaneous SNR levels with average denoted by $\gamma^{avg}_i$ ($i \in \{SD, SR, RD\}$) are generated for each frame consisting of the two steps for the relaying schemes but are assumed to be constant during each frame. Figure 7 illustrates the throughput performance of the schemes for N = 500, $\gamma^{avg}_{SD}$ = 10 dB, and $\gamma^{avg}_{RD}$ = 19 dB. Legends SC-2/4QAM (exh.), SC-2/4QAM (num.), and SC-2/4QAM (const.) refer to the simulation curves of the proposed SC scheme using uncoded 2/4-QAM with $\alpha$ given by exhaustive search, the proposed SC scheme with the optimized $\alpha$ determined numerically as $\alpha^*_{SC4}$ in Section 4, and the proposed SC scheme with fixed $\alpha = \frac{1}{2}$, respectively. Note that fixing $\alpha = \frac{1}{2}$ corresponds to a standard QPSK constellation, as considered in [8]. Then, SC-2/4QAM (ana.) refers to the analytical throughput derived in (25) to (26) with optimized $\alpha^*_{SC4}$. In this case, the throughput of the proposed SC scheme using 2/4-QAM outperforms the one using 4/16-QAM, which is why we omit the throughput curve of the latter case. Recall that in the analysis, node R only forwards the superposed message when it has been correctly decoded, while in the simulations, it is always forwarded regardless of its decoding result.
Finally, SC-2/4QAM (Gaus.) refers to the proposed SC scheme but using the optimal α determined in [8] that assumes Gaussian channel capacities.
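The per-frame SNR generation used in such simulations can be sketched as follows. Under flat Rayleigh fading, the instantaneous SNR of a link is exponentially distributed around its average, which is the standard model assumed here; the function names, the seed, and the number of frames are hypothetical and do not come from the paper.

```python
import random


def draw_instantaneous_snr(avg_snr_db, rng):
    """Draw one linear-scale instantaneous SNR for a flat Rayleigh-faded link."""
    mean_linear = 10.0 ** (avg_snr_db / 10.0)
    # expovariate takes the rate, i.e. the inverse of the mean.
    return rng.expovariate(1.0 / mean_linear)


def draw_frame(avg_db_by_link, rng):
    """Draw SNRs for the links of one frame; they stay fixed during the frame."""
    return {link: draw_instantaneous_snr(avg_db, rng)
            for link, avg_db in avg_db_by_link.items()}


rng = random.Random(1)
averages_db = {"SD": 10.0, "SR": 15.0, "RD": 19.0}
frames = [draw_frame(averages_db, rng) for _ in range(20000)]
mean_sd = sum(frame["SD"] for frame in frames) / len(frames)
```

Averaging a per-frame throughput function over such draws gives a Monte Carlo estimate of the average throughput for a given set of average SNRs.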
We find that the proposed method using the numerical solution $\alpha^*_{SC4}$ of (39) closely approaches the throughput of the exhaustive search, although the uniqueness of the solution of $G_{SC4}(\alpha) = 0$ in (39) could not be proved. In addition, we see that the analytical throughput SC-2/4QAM (ana.) closely matches the simulated one, SC-2/4QAM (num.), supporting the validity of the derivations. Moreover, the poor performance of SC-2/4QAM (Gaus.) based on [8] shows the necessity of the proposed schemes and analysis for discrete HM. The proposed method with 2/4-QAM also outperforms the
Again, SC-4/16QAM (ana.) refers to the analytical throughput derived in (36) to (37) with optimized $\alpha^*_{SC16}$. Finally, SC-4/16QAM (Gaus.) refers to the proposed SC scheme but using the optimal $\alpha$ determined in [8] that assumes Gaussian channel capacities.
We can observe that the proposed scheme, where $\alpha^*_{SC16}$ is found numerically from (44), benefits from optimizing the power allocation parameter, as shown by the large improvement over the fixed SC method using the constant power allocation parameter $\alpha = \frac{1}{5}$. Again, the performance of SC-4/16QAM (Gaus.) based on [8] is poor, validating the proposed approach. Figure 9 shows the throughput performance of the schemes with respect to $\gamma^{avg}_{SD}$. When $\gamma^{avg}_{SD}$ is low, the CD scheme outperforms the other relaying schemes, while direct transmission performs best for $\gamma^{avg}_{SD} \ge 18$ dB. The proposed SC scheme using 4/16-QAM bridges the gap between those two extreme cases, i.e., for 10 dB $\le \gamma^{avg}_{SD} \le$ 18 dB. Thus, the proposed scheme is most beneficial when the direct link is worse than the relayed links, as in other relaying schemes, yet not too bad, since it must be of sufficient quality to enable the decoding of the basic message at node D.
Note that the best strategy in Figure 7 is to use the proposed SC scheme using 2/4-QAM with $\alpha^*_{SC4}$ for 11 dB $\le \gamma^{avg}_{SR} \le$ 18 dB, then the CD scheme for $\gamma^{avg}_{SR} > 18$ dB. In Figure 8, we should use direct transmission for $\gamma^{avg}_{SR} \le 18$ dB, then the proposed SC scheme using 4/16-QAM with $\alpha^*_{SC16}$ for $\gamma^{avg}_{SR} > 18$ dB. In Figure 9, the CD scheme should be selected for $\gamma^{avg}_{SD} \le 10$ dB, then our SC scheme using 4/16-QAM with $\alpha^*_{SC16}$ for 10 dB $< \gamma^{avg}_{SD} \le$ 18 dB, and finally direct transmission for $\gamma^{avg}_{SD} > 18$ dB.
Figure 12 Throughput of proposed and benchmark schemes for N = 500, $\gamma^{avg}_{SR}$ = 5 dB, $\gamma^{avg}_{RD}$ = 8 dB.

LDPC-coded case
[...] both coding rates of 1/2 and 2/5. The throughput of the proposed SC scheme with 4/16-QAM is omitted here as 2/4-QAM achieves a better rate. We can observe that the proposed scheme, where $\alpha^{*}_{\mathrm{SC4}}$ is found numerically from (75), achieves nearly the same throughput as the scheme where $\alpha$ is obtained by exhaustive search in the region of interest ($\gamma^{\mathrm{avg}}_{SR} \geq 2.5$ dB), which supports the validity of our solution. The slight decrease of the analytical throughput SC [LDPC,1/2]-2/4 QAM (ana.) compared to the simulated throughput of the proposed scheme SC [LDPC,1/2]-2/4 QAM (num.) is due both to the analytical assumption that the relay forwards only correctly decoded messages and to the fact that $\alpha^{*}_{\mathrm{SC4}}$ is derived from the approximated FER expression.
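The relation between solving a stationarity condition numerically and searching exhaustively over $\alpha$ can be pictured with the minimal sketch below. It is only an illustration under strong assumptions: `toy_throughput` is a hypothetical single-peaked stand-in and not the throughput expression of this paper, the derivative is approximated by a central difference rather than by the analytical expression in (75), and bisection is merely one possible root finder.

```python
def toy_throughput(alpha):
    """Hypothetical single-peaked stand-in for throughput versus alpha.

    This is NOT the throughput expression of the paper; it only gives the
    search routines something smooth to optimize (its peak is at alpha = 1/3).
    """
    return alpha * (1.0 - alpha) ** 2


def exhaustive_search(f, lo, hi, step):
    """Return the grid point in [lo, hi] that maximizes f."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + k * step for k in range(n_steps + 1)]
    return max(grid, key=f)


def stationarity(f, alpha, h=1e-6):
    """Central-difference estimate of G(alpha) = df/dalpha."""
    return (f(alpha + h) - f(alpha - h)) / (2.0 * h)


def solve_stationary(f, lo, hi, tol=1e-9):
    """Bisection on G(alpha) = 0, assuming G(lo) > 0 > G(hi)."""
    if not (stationarity(f, lo) > 0.0 > stationarity(f, hi)):
        raise ValueError("G must change sign from + to - on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stationarity(f, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Compare both approaches on the toy objective.
alpha_grid = exhaustive_search(toy_throughput, 0.01, 0.49, 0.001)
alpha_root = solve_stationary(toy_throughput, 0.01, 0.49)
```

Because a stationarity condition is only necessary for optimality, a coarse exhaustive search of this kind is a natural cross-check on the root returned by the solver, at the cost of one objective evaluation per grid point.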
It is also shown that the proposed method with 2/4-QAM outperforms the conventional relaying methods over most of the $\gamma^{\mathrm{avg}}_{SR}$ range for $\gamma^{\mathrm{avg}}_{SD}$ = 2 dB and $\gamma^{\mathrm{avg}}_{RD}$ = 8 dB, by using a coding rate of 2 [...] [8] is extremely low, showing the necessity of our analysis in the LDPC-coded case, too.
Figure 11 shows the throughput performance of the schemes for varying values of $\gamma^{\mathrm{avg}}_{SR}$ when $\gamma^{\mathrm{avg}}_{SD}$ = 5 dB and $\gamma^{\mathrm{avg}}_{RD}$ = 8 dB, with a coding rate of 1/2. In this case, the numerical solution $\alpha^{*}_{\mathrm{SC16}}$ of (80) lies within the range of 0.25 to 0.29, while the optimal power allocation obtained by exhaustive search ranges from 0.21 to 0.29. Accordingly, we see that the opti- [...] Figure 11 is to perform direct transmission for 7 dB $\leq \gamma^{\mathrm{avg}}_{SR} \leq$ 8 dB, then the proposed SC scheme using 4/16-QAM with $\alpha^{*}_{\mathrm{SC16}}$ for $\gamma^{\mathrm{avg}}_{SR} \geq$ 8 dB.
Figure 12 shows the throughput performance of the schemes with respect to $\gamma^{\mathrm{avg}}_{SD}$, for $\gamma^{\mathrm{avg}}_{SD} >$ 0.7 dB so as to meet conditions (55) to (56), when $r = 1/2$, $\gamma^{\mathrm{avg}}_{SR}$ = 5 dB and $\gamma^{\mathrm{avg}}_{RD}$ = 8 dB. Similar to the uncoded case, the proposed scheme fills the gap between the conventional relaying schemes and direct transmission. The proposed SC scheme using half-rate LDPC-coded 2/4-QAM with $\alpha^{*}_{\mathrm{SC4}}$ should be chosen for 0.7 dB $< \gamma^{\mathrm{avg}}_{SD} \leq$ 2.4 dB, followed by direct transmission for $\gamma^{\mathrm{avg}}_{SD} >$ 2.4 dB.
Finally, we discuss the signaling overhead required to implement the proposed scheme in a practical system. Firstly, node S requires channel state information (CSI) feedback for the SR and SD links in order to optimize the power allocation parameter $\alpha^{*}$, as shown in, e.g., (39), while node R requires CSI feedback for the RD link. However, the reference CD scheme of [6] requires the same amount of CSI feedback, as it must adapt its AMC levels to the state of the three links. Hence, the only additional information needed by the proposed scheme is the actual power allocation parameter $\alpha$, which should be sent from S to nodes R and D in step 1. Note that the proposed scheme may still be implemented without this overhead by fixing $\alpha$ (corresponding to the const. curves in the figures), but significant throughput gains are achieved at the expense of some extra control bits.
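One way to picture the "extra bits for control" is to signal a quantized index of $\alpha$ instead of its exact value. The sketch below is purely illustrative: the paper does not specify how $\alpha$ is conveyed, so the uniform quantizer, the 4-bit width and the range [0, 0.5] are all assumptions, and the function names are hypothetical.

```python
def quantize_alpha(alpha, n_bits=4, lo=0.0, hi=0.5):
    """Map alpha to an integer index that fits in n_bits (assumed uniform quantizer)."""
    levels = 2 ** n_bits
    step = (hi - lo) / levels
    index = int((alpha - lo) / step)
    # Clamp so that alpha == hi (or out-of-range input) still yields a valid index.
    return max(0, min(levels - 1, index))


def dequantize_alpha(index, n_bits=4, lo=0.0, hi=0.5):
    """Reconstruct alpha at the receiver as the midpoint of the signaled cell."""
    step = (hi - lo) / (2 ** n_bits)
    return lo + (index + 0.5) * step
```

With these assumed settings, any $\alpha$ inside the range is reconstructed at nodes R and D to within half a quantization step, so the bit width trades control overhead against how closely the signaled value tracks the optimized one.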

Conclusions
Considering a three-node wireless relay system with and without FEC, we have proposed an SC-based scheme using discrete HM. We have derived the achievable throughput by analysis, taking into account the decoding errors occurring during SIC. Although a closed-form expression of the optimal power allocation parameter could not be derived, we have obtained the necessary conditions for the optimal power allocation, which provide the numerical solution used in the proposed scheme. The simulation results show that the proposed scheme closely approaches the throughput performance obtained by exhaustive search for the optimal power allocation, validating our approach. Moreover, it is shown that over a large range of SNRs, the proposed scheme outperforms the conventional Direct, MH and CD schemes. Thus, relaying based on practical SC is shown to be very effective in a wireless relay system with various discrete modulations and FEC coding.
In future work, we will extend the proposed SC scheme with discrete HM and coding to serve multiple relayed users in the context of scheduling in cellular relay systems.
Endnotes
a Although a total power constraint may be considered for studying the entire system efficiency, setting separate powers reflects the practical system constraints as S and R are distinct entities, each with a given power.
b Although there is a certain gap for $\gamma^{\mathrm{avg}}_{SR} <$ 18 dB, this region is not of interest since the best rate is achieved by direct transmission.