Bounds on the Capacity of the Relay Channel with States at the Source

Abstract—We consider a state-dependent full-duplex relay channel with the channel states available noncausally at only the source, that is, at neither the relay nor the destination. For the discrete memoryless (DM) case, we establish lower and upper bounds on the channel capacity. The lower bound is obtained by a coding scheme that uses a Gel'fand-Pinsker-like binning scheme at the source and partial decode-and-forward at the relay. The upper bound improves upon the one obtained by assuming that the channel state is available at the source, the relay, and the destination. For the Gaussian case, we also establish lower and upper bounds on the capacity. The lower bound is obtained by a coding scheme that consists of a superposition of generalized dirty paper coding (GDPC) and standard DPC at the source, together with partial decode-and-forward at the relay. In this case as well, the upper bound is tighter than the one obtained by assuming that the channel state is available at the source, the relay, and the destination. For the general Gaussian RC and the degraded Gaussian RC, the lower and upper bounds meet in some extreme cases, thereby characterizing the capacity in those cases.


Introduction
In this work, we consider a state-dependent three-terminal full-duplex relay channel (RC) in which the outputs Y 2 at the relay and Y 3 at the destination are controlled by the channel inputs X 1 from the source and X 2 from the relay, along with a random parameter S that represents the channel state, through a given conditional probability W Y2,Y3|X1,X2,S . The channel state S is generated according to a given memoryless probability law Q S and is known, in a noncausal manner, to only the source; that is, neither the relay nor the destination knows the channel states. The considered channel model is shown in Figure 1. In this model, the source wishes to transmit a message W to the destination through the state-dependent RC in n channel uses, with the help of the relay. The destination estimates the message sent by the source from the received channel output. In this work, we study the capacity of this communication model. We refer to this model as the RC with informed source.

Background and Related Work.
Channels whose probabilistic input-output relation depends on random parameters, or channel states, have spurred much interest and can model a large variety of problems, each related to some physical situation of interest. The random states sequence may be known in a causal or noncausal manner. For single user models, the concept of channel states available at only the transmitter dates back to Shannon [1] for the causal channel state case, and to Gel'fand and Pinsker [2] for the noncausal channel state case. In [3], Heegard and El Gamal study a model in which the state sequence is known noncausally to only the encoder or to only the decoder. They also derive achievable rates for the case in which partial channel state information (CSI) is given at varying rates to both the encoder and the decoder. In [4], Costa studies an additive Gaussian channel with additive Gaussian state known at only the encoder, and shows that Gel'fand-Pinsker coding with a specific auxiliary random variable, widely known as dirty paper coding (DPC), achieves the channel capacity. Interestingly, in this case, the DPC removes the effect of the additive channel state on the capacity as if there were no channel state present in the model or the channel state were known to the decoder as well. For a comprehensive review of state-dependent channels and related work, the reader may refer to [5].
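For concreteness, Costa's dirty paper result can be checked numerically. The sketch below (our own illustration; the parameter values are hypothetical) evaluates the standard achievable-rate expression R(α) = I(U;Y) − I(U;S) for the auxiliary variable U = X + αS on Y = X + S + Z, and confirms that, at Costa's optimal α = P/(P + N), the rate coincides with the no-state capacity ½ log(1 + P/N):

```python
import math

def dpc_rate(P, Q, N, alpha):
    """Achievable rate (bits/channel use) of Costa's scheme with
    auxiliary variable U = X + alpha*S on the channel Y = X + S + Z,
    where X ~ N(0, P), S ~ N(0, Q), Z ~ N(0, N) are independent."""
    num = P * (P + Q + N)
    den = P * Q * (1 - alpha) ** 2 + N * (P + alpha ** 2 * Q)
    return 0.5 * math.log2(num / den)

P, Q, N = 1.0, 10.0, 1.0
alpha_opt = P / (P + N)                   # Costa's optimal inflation factor
r_opt = dpc_rate(P, Q, N, alpha_opt)
r_no_state = 0.5 * math.log2(1 + P / N)   # capacity without any state

# DPC removes the state entirely: the two rates coincide.
print(r_opt, r_no_state)                  # both equal 0.5
```

Any other choice of α is strictly worse, which is why the inflation parameter plays a central role in the generalized DPC schemes discussed in this paper.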
For multiuser channels, different state-dependent channel models under different setups are investigated in the literature. One key element in the study of state-dependent multiuser channels is whether the parameters controlling the channel are known to all or only some of the users in the communication model. For example, the broadcast channel (BC) with states available at the transmitter is studied in [6,7], and the multiaccess channel (MAC) with partial state information at all encoders and full state information at the decoder is studied in [8].
In the Gaussian setup, Costa's result that a known additive state does not affect capacity, as long as full knowledge of this state is available at the transmitter, originally shown for a single-user channel, has been shown to continue to hold for a number of multiuser channels, including the Gaussian BC [6], the Gaussian MAC [6], the physically degraded relay channel (RC) [9], and the physically degraded relay broadcast channel (RBC) [10]. For these channels, the key feature enabling complete mitigation of the interference is the symmetric availability of the state at all the encoders. If the state is available at only some encoders (i.e., in the asymmetric case), complete mitigation of the interference can hardly be hoped for and, in general, one has to expect some rate penalty due to the lack of knowledge of the state at the noninformed encoders. For example, the state-dependent multiaccess channel (MAC) with only one informed encoder is considered in [11][12][13][14][15][16][17], and the state-dependent relay channel (RC) with only an informed relay is considered in [18,19]. For all these models, in the Gaussian case, the informed encoder applies a slightly generalized DPC (GDPC) [11,13] in which the channel input random variable and the channel state random variable are negatively correlated. Also, in these models, the uninformed encoders benefit from the GDPC applied by the informed encoders, because the negative correlation between the codewords at the informed encoders and the channel state can be interpreted as partial state cancellation. For the state-dependent MAC with one informed encoder and one noninformed encoder, with the former sending both a common message and a private message and the latter sending only the common message, the capacity region for the Gaussian case is obtained by deriving a nontrivial outer bound that makes it possible to characterize the rate loss due to not knowing the state at the noninformed encoder [15,16].
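The partial-cancellation interpretation of GDPC can be made concrete with a minimal single-link sketch (the function and parameter values below are our own hypothetical illustration): if the informed encoder spends part of its power on a component ρ√(P/Q)·S that is negatively correlated with the state, an uninformed receiver effectively sees the state scaled by (1 + ρ√(P/Q)), whose power is strictly smaller for ρ < 0:

```python
import math

def residual_interference(P, Q, rho):
    """Variance of the effective state seen by an uninformed receiver.
    The informed encoder writes X = X' + rho*sqrt(P/Q)*S, with X'
    independent of S, so the receiver sees (1 + rho*sqrt(P/Q)) * S."""
    return Q * (1 + rho * math.sqrt(P / Q)) ** 2

P, Q = 4.0, 9.0
print(residual_interference(P, Q, 0.0))    # no correlation: full power Q = 9.0
print(residual_interference(P, Q, -0.5))   # negative correlation: reduced to 4.0
```

The uninformed encoders thus face a weaker effective interference, which is the sense in which they "benefit" from the GDPC applied at the informed encoder.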
Two upper bounds that use similar bounding techniques, but with different proofs, are derived in [17] for another case of degraded message sets for the state-dependent MAC with one informed encoder, obtained by swapping the roles of the encoders in [15,16], and in [18,19] for the case of the state-dependent RC with informed relay. The state-dependent MAC studied in [17] has some connection with the model for the RC with informed relay studied in [18,19], when considering transmission from the source and the relay to the destination, that is, the multiaccess part of the relay channel. The model shown in Figure 1 may find application in cooperative information embedding and data hiding [20][21][22][23][24], fading in time-varying cooperative wireless channels, and interfering signals in interference environments. The channel state may also model the information gained by some specific terminals by means of cognition in certain cognitive systems [25][26][27]. In this application, one common assumption, made, for example, in [25,26], is that some transmitters know the signals sent by some other transmitters noncausally; they can then use that knowledge to increase the system spectral efficiency. The problem of collaborative signal transmission in the presence of interference and with some cognitive terminals is studied, for example, in [28] for an interference channel with a cognitive transmitter or degraded message sets, and in [29] for a state-dependent cognitive interference channel. Interference relay channels and relay strategies for relaying in the presence of an outside interference are also studied, for example, in [30][31][32]. In the model we study, the source may model a cognitive radio that cooperates with the relay, as shown in Figure 2.

Main Contributions.
For the discrete memoryless (DM) case, we derive a lower bound on the capacity of the general state-dependent RC with informed source. This lower bound is obtained by a coding scheme that uses techniques of rate-splitting at the source, regular-encoding sliding-window decoding [33] for decode-and-forward (DF) relaying [34, Theorem 4], and a Gel'fand-Pinsker-like binning scheme. In this coding scheme, the relay decodes the information sent by the source only partially, that is, partial-DF [35]. By specializing this achievability result to the case in which the relay decodes all the information sent by the source, we readily obtain a lower bound for the case in which the relay operates in full-DF [34].

Figure 2: Example relay channel with asymmetric cognition capabilities. Only the source T 1 knows the interference from the competing source T 0 . The source helps the relay T 2 to cancel the effect of the interference.
Furthermore, we also consider memoryless Gaussian models in which each of the relay node and the destination node experiences on its link an additive Gaussian outside interference in addition to additive Gaussian noise. The interferences are known noncausally to only the source, and play the role of additive channel states. We first focus on the model in which the links to the relay and to the destination are corrupted by the same interference. For this model we derive a lower bound on the channel capacity, based on a coding scheme that combines techniques of generalized DPC [11,13] and partial decode-and-forward relaying. Then, we focus on the case of independent interferences. For this case, we obtain a lower bound on the channel capacity by using a coding scheme that combines carefully the techniques of carbon copying onto dirty paper (CC) [36], interference reduction at the source, and decode-and-forward relaying. We also discuss a case of correlated interferences.
For the Gaussian model with independent interferences studied in this paper, we mention that a major difference from the Gaussian multicast problem studied in [36] is that, here, one of the two receivers (the relay) also helps the other receiver (the destination) by relaying the information intended for it from the source. Also, the carbon copying onto dirty paper technique that we employ as part of our coding scheme differs from the CC in [36] in that (i) the codeword sent by the informed transmitter is (negatively) correlated with the channel state, and (ii) the noise terms at the two receivers have different variances. We refer to the employed CC with negative correlation between the codewords at the source and the channel state as generalized carbon copying onto dirty paper (GCC). We show that our lower bound is larger than the one obtained by a similar combination of regular CC and decode-and-forward relaying. This improvement comes from the fact that the proposed GCC improves upon regular CC through the allowed negative correlation, in the same way that GDPC improves upon regular DPC.
Finally, we mention that in this work the established lower bounds are compared to the well-known max-flow min-cut bound, or cut-set bound. In Section 3.2, we comment on why deriving nontrivial upper bounds on the channel capacity for the present scenario is a challenging task, and is more involved than for the seemingly similar informed-relay scenario [18,19] or the related MAC with asymmetric CSI and degraded message sets [12,16,17].

Outline and Notation.
An outline of the remainder of this paper is as follows. Section 2 describes the communication model that we consider in this work. Section 3 provides lower bounds on the capacity of the general DM RC with informed source. Section 4 provides lower bounds on the capacity of the Gaussian RC with informed source. This section also contains some discussions as well as some numerical results for illustration purposes. Finally, Section 5 concludes the paper.
We use the following notation throughout the paper. Upper case letters, for example, X, denote random variables; lower case letters, for example, x, denote realizations of random variables; and calligraphic letters, for example, X, designate alphabet sets. The probability distribution of a random variable X is denoted by P X (x); sometimes, for convenience, we write it as P X . The shorthand notation X j i indicates a sequence of random variables (X i , X i+1 , . . . , X j ), and x j i denotes a particular realization of such a sequence. For convenience, the n-vector x n will occasionally be denoted by the boldface notation x as well. We use the notation E X [·] to denote the expectation of the random variable X. The set of probability distributions defined on an alphabet X is denoted by P (X). The probability distribution of a random variable Y given X is denoted by P Y |X . The Gaussian distribution with mean μ and variance σ 2 is denoted by N (μ, σ 2 ). Finally, throughout the paper, the logarithm function is to base 2, and the complement to unity of a scalar u ∈ [0, 1] is denoted by ū, that is, ū = 1 − u.

System Model and Definitions
In this section, we formally present our communication model and the definitions related to it. As shown in Figure 1, we consider a state-dependent relay channel denoted by W Y2,Y3|X1,X2,S whose outputs Y 2 ∈ Y 2 and Y 3 ∈ Y 3 for the relay and the destination, respectively, are controlled by the channel inputs X 1 ∈ X 1 from the source and X 2 ∈ X 2 from the relay, along with a random state parameter S ∈ S. It is assumed that the channel state S i at time instant i is independently drawn from a given distribution Q S and the channel states S n are noncausally known at the source.
The source wants to transmit a message W to the destination with the help of the relay, in n channel uses. The message W is assumed to be uniformly distributed over the set W = {1, . . . , M}. The information rate R is defined as (log M)/n bits per transmission.
An (M, n) code for the state-dependent relay channel with informed source consists of an encoding function at the source, φ n 1 : W × S n → X n 1 , a sequence of encoding functions at the relay, φ 2,i : Y i−1 2 → X 2 for i = 1, . . . , n, and a decoding function at the destination, ψ n : Y n 3 → W. Let an (M, n) code be given. The sequences X n 1 and X n 2 from the source and the relay, respectively, are transmitted across a state-dependent relay channel modeled as a memoryless and time-invariant channel, in the sense that Pr[y 2,i , y 3,i | x i 1 , x i 2 , s i , y i−1 2 , y i−1 3 ] = W Y2,Y3|X1,X2,S (y 2,i , y 3,i | x 1,i , x 2,i , s i ) for all i = 1, . . . , n.
The destination estimates the message sent by the source from the channel output Y n 3 . The average probability of error is defined as P n e = Pr[ψ n (Y n 3 ) ≠ W]. An (ε, n, R) code for the state-dependent RC with informed source is a (2 nR , n)-code (φ n 1 , φ n 2 , ψ n ) having average probability of error P n e not exceeding ε. A rate R is said to be achievable if there exists a sequence of (ε n , n, R)-codes with lim n→∞ ε n = 0. The capacity C of the state-dependent RC with informed source is defined as the supremum of the set of achievable rates.
The channel is said to be physically degraded if the conditional distribution W Y2,Y3|X1,X2,S factorizes as W Y2,Y3|X1,X2,S = W Y2|X1,X2,S · W Y3|Y2,X2,S .

The Discrete Memoryless RC with Informed Source
In this section, we assume that the alphabets S, X 1 , X 2 , Y 2 , Y 3 in the model are all discrete and finite.

Lower Bound on Channel Capacity.
Let P denote the set of admissible joint probability distributions P S,U,U1,X1,X2,Y2,Y3 , where U ∈ U and U 1 ∈ U 1 are auxiliary random variables whose alphabet sizes are bounded as in (8a) and (8b), respectively.
The following theorem provides a lower bound on the capacity of the state-dependent DM RC with informed source.

Theorem 1. The capacity of the state-dependent DM RC with informed source is lower-bounded by
where the maximization is over all probability distributions P S,U,U1,X1,X2,Y2,Y3 ∈ P .
Proof of Theorem 1. A formal proof of Theorem 1 with a complete error analysis is given in Appendix A. We now give a description of the random coding scheme that we use to obtain the lower bound given in Theorem 1. This coding scheme is based on a combination of the techniques of rate-splitting [37], regular-encoding sliding-window decoding for DF [33], and a variation of Gel'fand-Pinsker binning.
We split the message W to be transmitted into two independent parts, W = (W r , W d ), where W r is sent through the relay at rate R r and W d is sent directly to the destination at rate R d . The total rate is then R = R r + R d . Transmission is performed over B + 1 blocks, each of length n. During each of the first B blocks, the source encodes a message w i = (w r,i , w d,i ) and sends it over the channel, where w r,i ∈ [1, 2 nRr ], w d,i ∈ [1, 2 nRd ] and i = 1, . . . , B denotes the index of the block. During the last block, the source sends w r,B+1 = 1 and some w d,B+1 ∈ [1, 2 nRd ]. For fixed n, the average rate R d + R r (B/(B + 1)) over B + 1 blocks approaches R as B → +∞.
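The convergence of the per-block average rate R d + R r (B/(B + 1)) to R = R r + R d can be verified directly (the rate split below is a hypothetical example):

```python
def average_rate(R_r, R_d, B):
    """Average rate of the block-Markov scheme: fresh relayed
    information W_r is sent in B of the B+1 blocks, while the
    direct part W_d is sent in every block."""
    return R_d + R_r * B / (B + 1)

R_r, R_d = 0.6, 0.4           # hypothetical rate split, R = R_r + R_d = 1.0
rates = [average_rate(R_r, R_d, B) for B in (1, 10, 100, 1000)]

# The sequence increases monotonically toward R = 1.0.
print(rates == sorted(rates))   # True
```

The per-block rate loss R r /(B + 1) vanishes as the number of blocks grows, which is why the block-Markov structure costs nothing asymptotically.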
We generate two statistically independent codebooks (codebooks 1 and 2) by following the steps outlined below twice. We will use these codebooks for blocks with odd and even indices, respectively.
(1) We generate M i.i.d. codewords {x 2 (w r ′ )} at the relay, indexed by w r ′ = 1, . . . , M, each with i.i.d. components drawn according to P X2 .

(2) For each x 2 (w r ′ ), we generate a collection of JM i.i.d. auxiliary codewords {u(w r ′ , w r , j)} at the source, indexed by w r = 1, . . . , M and j = 1, . . . , J. Each codeword u(w r ′ , w r , j) has i.i.d. components drawn according to P U|X2 given x 2 (w r ′ ). For notational convenience, we sometimes denote u(w r ′ , w r , j) as u w r ′ ,w r , j . In the collection {u w r ′ ,w r , j } of auxiliary codewords for a given x 2 (w r ′ ), w r indexes bins and j indexes sequences within a particular bin.

(3) For each x 2 (w r ′ ) and each u w r ′ ,w r , j , we generate a collection of J 1 M 1 i.i.d. codewords {u 1w d , j 1 (u w r ′ ,w r , j )} at the source, indexed by w d = 1, . . . , M 1 and j 1 = 1, . . . , J 1 . Each codeword u 1w d , j 1 (u w r ′ ,w r , j ) has i.i.d. components drawn according to P U1|U,X2 given (x 2 (w r ′ ), u w r ′ ,w r , j ). In the collection of auxiliary codewords {u 1w d , j 1 (u w r ′ ,w r , j )} for a given u w r ′ ,w r , j , w d indexes bins and j 1 indexes sequences within a particular bin.

Encoding.
The encoders at the source and the relay encode messages using codebook 1 for blocks with odd indices, and codebook 2 for blocks with even indices. This is done because some of the decoding steps are performed jointly over two adjacent blocks, and so having independent codebooks makes the error events corresponding to these blocks independent and their probabilities easier to evaluate.
We denote by (w r,k , w d,k ) the message pair to be sent from the source node at the beginning of block k = 1, . . . , B + 1. We pick up the story in block i − 1, i ∈ {0, . . . , B}. First, let us assume that the relay has correctly decoded message w r,i−1 and that the destination has correctly decoded both messages w r,i−2 and w d,i−2 at the end of block i − 1. We will show that our code construction allows the relay to correctly decode message w r,i and the destination to correctly decode both messages w r,i−1 and w d,i−1 at the end of block i (with a probability of error ≤ ε). Thus, the information state (w r,i−1 , w d,i−1 ) propagates forward, and a recursive calculation of the probability of error can be made, yielding a probability of error ≤ (B + 1)ε.
Continuing with the strategy, let s[i] and (w r,i , w d,i ) be the state sequence in block i and the new message pair to be sent at the beginning of block i, respectively. At the beginning of block i, the relay knows w r,i−1 and sends x 2 (w r,i−1 ). The source looks in the bin indexed by w r,i for the smallest j ∈ {1, . . . , J} such that u(w r,i−1 , w r,i , j) is jointly typical with s[i] given x 2 (w r,i−1 ); it then looks in the bin indexed by w d,i for the smallest j 1 ∈ {1, . . . , J 1 } such that u 1w d,i , j 1 (u(w r,i−1 , w r,i , j)) is jointly typical with s[i] given (x 2 (w r,i−1 ), u(w r,i−1 , w r,i , j)). The source then transmits an input sequence x 1 [i] generated on the basis of the selected auxiliary codewords, x 2 (w r,i−1 ), and the state sequence s[i]. For convenience, we list the codewords at the source and the relay that are used for transmission in the first four blocks in Table 1.

Decoding.
Decoding is based on a combination of joint typicality and sliding-window. The decoding procedures at the end of block i are as follows.
(1) The relay knows w r,i−1 and declares that w r,i is sent if there is a unique w r,i ∈ {1, . . . , M} such that u wr,i−1 ,wr,i , j is jointly typical with y 2 [i] given x 2 (w r,i−1 ) for some j ∈ {1, . . . , J}, where y 2 [i] denotes the output sequence at the relay in block i. One can show that the decoding error in this step is small for sufficiently large n if condition (11) holds.

(2) The destination knows w r,i−2 and decodes w r,i−1 and w d,i−1 based on the information received in block i − 1 and block i. It declares that the pair (ŵ r,i−1 , ŵ d,i−1 ) is sent if the corresponding codewords are jointly typical with y 3 [i − 1] given x 2 (w r,i−2 ), and x 2 (ŵ r,i−1 ) is jointly typical with y 3 [i]. One can show that the decoding error in this step is small for sufficiently large n if condition (12) holds.

The analysis of the probability of error of this scheme, the details of how (9) is obtained from (11) and (12), as well as the proof that the rate (9) is not altered if the sizes of the alphabets of the auxiliary random variables U and U 1 are restricted as in (8a) and (8b), are given in Appendix A.
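For intuition about the typicality tests used in the decoding steps above, the toy sketch below implements the weak typicality check for a single memoryless binary source (the alphabet, pmf, and threshold are hypothetical; the actual decoders test joint typicality of tuples of sequences):

```python
import math

def is_typical(seq, pmf, eps=0.05):
    """Weak typicality test: the sequence seq is eps-typical for the
    memoryless law pmf if |-(1/n) log2 p(seq) - H| < eps, where H is
    the entropy of the source."""
    n = len(seq)
    H = -sum(p * math.log2(p) for p in pmf.values())
    logp = sum(math.log2(pmf[x]) for x in seq)
    return abs(-logp / n - H) < eps

pmf = {0: 0.8, 1: 0.2}
balanced = [0] * 8000 + [1] * 2000   # empirical law matches pmf
skewed = [1] * 10000                 # empirical law far from pmf
print(is_typical(balanced, pmf))     # True
print(is_typical(skewed, pmf))       # False
```

The decoders search the received codebook for the unique candidate whose codewords pass such a (joint) test together with the channel output, which is what drives conditions (11) and (12).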
The rate in Theorem 1 requires the relay to decode only one part of the message sent by the source. It readily specializes to the case in which the relay fully decodes the message sent by the source, that is, full-DF. This is obtained by sending only the message W r in the above coding scheme, that is, by setting R d = 0. The result is stated in the following corollary.

EURASIP Journal on Wireless Communications and Networking
where the maximization is over all probability distributions of the indicated form and U ∈ U is an auxiliary random variable with bounded alphabet size. The formal proof of Corollary 2 is similar to that of Theorem 1 and, hence, is omitted for brevity.
We close this section by noting that, here, we have focused on the case in which the relay decodes and forwards (either fully or partially) the source message. The relay can employ other relaying schemes to assist the source, such as estimate-and-forward [34], amplify-and-forward [38][39][40], or combinations of these schemes. However, in general, none of these schemes truly extracts the potential benefits of cooperation, even in the standard case in which the channel is state-independent, since none of them outperforms all the others (in terms of achievable rate) for all possible choices of the channel parameters. The analysis of these alternative schemes in the context of the considered RC with informed source is beyond the scope of this work.

Comments on Upper Bounding Techniques.
As we indicated in the introduction, the model for the RC with informed source that we study in this paper seemingly exhibits some similarities with the RC with informed relay considered in [18,19], and it also connects with the MAC with asymmetric CSI and degraded message sets [12,16,17]. In [18,19], the authors derive a nontrivial upper bound on the capacity of the RC with informed relay, that is, one which is strictly tighter than the cut-set bound. A similar upper bounding technique is developed for multiaccess channels with asymmetric CSI and degraded message sets in [15][16][17].
However, establishing a nontrivial upper bound for the present scenario is comparatively more involved. Partly, this is due to the following reason. In the two models mentioned above, the uninformed encoder transmits a codeword that is a function of only the message to be transmitted. In the present scenario, however, the uninformed encoder (the relay) can potentially obtain some information about the channel states from the sequence received in the past from the informed encoder (the source). That is, at time i, the input X 2,i of the relay can potentially depend on the channel states through the past channel outputs Y i−1 2 ; and since Y i−1 2 depends on the channel states in a noncausal manner (through the source codeword X n 1 (W, S n )), and not only through the current state S i , so does the input of the relay, potentially.
A trivial upper bound on the capacity of the general DM RC with informed source is obtained by assuming that the channel states are also available at the relay and the destination, that is, the max-flow min-cut bound or cut-set bound, where the maximization is over all distributions of the form

The Gaussian RC with Informed Source
In this section, we consider a state-dependent three-terminal full-duplex RC in which the channel states and the noise are additive and Gaussian. Furthermore, we extend the discrete memoryless model considered in Section 3 to accommodate two state sequences: one state sequence affects the transmission to the relay and the other affects the transmission to the destination. In this model, the channel states represent additive Gaussian interferences which are assumed to be known (noncausally) to only the source. We first consider a scenario in which the transmissions to the relay and to the destination are corrupted by the same interference.
We derive a lower bound on the channel capacity for this scenario in Section 4.2. Then, we consider the case of two independent interferences in Section 4.3. In Section 4.4 we discuss a case of correlated interferences.

Channel Model.
For the state-dependent Gaussian RC, the channel outputs Y 2,i and Y 3,i at time instant i, for the relay and the destination, respectively, are related to the channel input X 1,i from the source, the channel input X 2,i from the relay, and the channel states S 2,i and S 3,i by

Y 2,i = X 1,i + S 2,i + Z 2,i , Y 3,i = X 1,i + X 2,i + S 3,i + Z 3,i , (18)

where S 2,i models the interference on the link to the relay and S 3,i models the interference on the link to the destination. The channel states S 2,i and S 3,i are zero mean Gaussian random variables with variance Q, and only the source knows the state sequences S n 2 and S n 3 (noncausally). The noises Z 2,i and Z 3,i are zero mean Gaussian random variables with variances N 2 and N 3 , respectively, and are mutually independent and independent from the state sequences (S n 2 , S n 3 ) and the channel inputs (X n 1 , X n 2 ). We consider the following individual power constraints on the average transmitted power at the source and the relay:

(1/n) Σ n i=1 E[X 1,i 2 ] ≤ P 1 , (1/n) Σ n i=1 E[X 2,i 2 ] ≤ P 2 . (19)

The definition of a code for this channel is the same as that given in Section 2, with the additional constraint that the channel inputs should satisfy the power constraints (19). For convenience, we define the following functions R 1 (·), R 2 (·), and σ(·), which we will use throughout the remaining sections.
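A minimal Monte Carlo sketch of this Gaussian model, assuming outputs of the additive form Y 2 = X 1 + S 2 + Z 2 and Y 3 = X 1 + X 2 + S 3 + Z 3 , with independent Gaussian inputs and hypothetical parameter values; for independent terms, the output variances should be P 1 + Q + N 2 and P 1 + P 2 + Q + N 3 , which the simulation confirms empirically:

```python
import math
import random

random.seed(1)
n = 200_000
P1, P2, Q, N2, N3 = 1.0, 1.0, 2.0, 0.5, 1.0   # hypothetical powers/variances

var2 = var3 = 0.0
for _ in range(n):
    x1 = random.gauss(0, math.sqrt(P1))   # source input
    x2 = random.gauss(0, math.sqrt(P2))   # relay input
    s2 = random.gauss(0, math.sqrt(Q))    # state on the link to the relay
    s3 = random.gauss(0, math.sqrt(Q))    # state on the link to the destination
    y2 = x1 + s2 + random.gauss(0, math.sqrt(N2))
    y3 = x1 + x2 + s3 + random.gauss(0, math.sqrt(N3))
    var2 += y2 * y2 / n
    var3 += y3 * y3 / n

# Expected: Var(Y2) = P1 + Q + N2 = 3.5 and Var(Y3) = P1 + P2 + Q + N3 = 5.0.
print(var2, var3)
```

This is only a sanity check on the channel law; the coding schemes of the following sections shape X 1 to be correlated with the states rather than independent of them.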
Definition 2.

Case of One Interference.
In this section, we let S 2,i = S 3,i = S i in (18); that is, we consider the case in which the relay and the destination are corrupted by the same interference:

Y 2,i = X 1,i + S i + Z 2,i , Y 3,i = X 1,i + X 2,i + S i + Z 3,i . (21)

The results obtained in Section 3 for the DM case can be applied to memoryless channels with discrete time and continuous alphabets using standard techniques [41]. We use the lower bound established in Theorem 1 to compute a lower bound on the capacity of the Gaussian model (21).
The following theorem provides a lower bound on the channel capacity of the model (21).

Proof. A formal proof of Theorem 3 is given in Appendix B.
An informal proof is as follows. As we outlined after Theorem 1, we decompose the message W to be sent from the source into two parts W r and W d . The input X n 1 from the source is divided accordingly into two independent parts, that is, X n 1 = X n 1r + X n 1d , where X n 1r carries message W r and has power constraint nP 1r and X n 1d carries message W d and has power constraint nP 1d , with P 1 = P 1r + P 1d . The message W r is sent through the relay at rate R r and the message W d is sent directly to the destination at rate R d . The total rate is R G = R r + R d . Since message W d is to be decoded by only the destination, it is precoded against the interference on the link to the destination, using a standard DPC. The message W r , however, experiences the same interference S n but different noise terms on its way to the relay and to the destination, and it is precoded against the interference S n through a GDPC. The GDPC can be interpreted as a partial cancellation of the interference, by the source for the relay, combined with standard DPC [11,13]. The relay benefits from this cancellation and can then transmit more reliably to the destination, and so the source benefits in turn.
More formally, we decompose the source input random variable X 1 as X 1 = X 1r + X 1d , where X 1r is zero mean Gaussian with variance P 1r , is independent from X 1d , and is correlated with both the relay input X 2 and the state S, with E[X 1r X 2 ] = ρ 12 √(P 1r P 2 ) and E[X 1r S] = ρ 1s √(P 1r Q), for some ρ 12 ∈ [−1, 1], ρ 1s ∈ [−1, 1]; and X 1d is zero mean Gaussian with variance P 1d , and is independent from both the relay input X 2 and the state S.
(Note that X 1d,i can also be chosen to be negatively correlated with the state S i . The rate achievable in this case can be obtained in a straightforward manner from the analysis in Section 4.2.) For the GDPC, we choose the auxiliary random variable U as U = X 1r + αS, for some α ∈ R. For the standard DPC, we choose the auxiliary random variable U 1 as U 1 = X 1d + α 1 S, with α 1 the corresponding optimal (Costa) inflation parameter. Let ξ := 1 − ρ 2 12 − ρ 2 1s and ρ := ρ 1s . As will become clear from the proof in Appendix B, allowable values of the correlation coefficients ρ 12 and ρ 1s are such that the covariance matrix of the vector (S, X 1r , X 1d , X 2 , Z 2 , Z 3 ) has a nonnegative determinant, for nonzero P 1r , P 1d , Q, and for all values of α such that R 1 (P 1r , Q, N 2 + P 1d , ξ, ρ, α) and R 1 (P 1r , Q, N 3 + P 1d , ξ, ρ, α) are real and nonnegative. The result in Theorem 3 readily specializes to the case of full DF at the relay.
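The role of the condition ξ ≥ 0 can be seen from the covariance matrix of (S, X 1r , X 2 ) alone (X 1d , Z 2 , and Z 3 are independent of these and only scale the full determinant): with the correlations above, the 3 × 3 determinant factors as Q · P 1r · P 2 · ξ, so the joint Gaussian law is valid exactly when ξ ≥ 0. A quick numerical check with hypothetical values:

```python
import math

def cov_det(P1r, P2, Q, rho12, rho1s):
    """Determinant of the covariance matrix of (S, X1r, X2) with
    E[X1r*X2] = rho12*sqrt(P1r*P2), E[X1r*S] = rho1s*sqrt(P1r*Q),
    and S independent of X2."""
    c1s = rho1s * math.sqrt(P1r * Q)
    c12 = rho12 * math.sqrt(P1r * P2)
    # Expansion of det([[Q, c1s, 0], [c1s, P1r, c12], [0, c12, P2]]).
    return Q * (P1r * P2 - c12 ** 2) - c1s * (c1s * P2)

P1r, P2, Q = 2.0, 3.0, 5.0
rho12, rho1s = 0.6, -0.5
xi = 1 - rho12 ** 2 - rho1s ** 2
# The determinant equals Q * P1r * P2 * xi, so validity <=> xi >= 0.
print(abs(cov_det(P1r, P2, Q, rho12, rho1s) - Q * P1r * P2 * xi) < 1e-9)  # True
```

In other words, the power spent on correlating X 1r with S and with X 2 trades off inside the unit disk ρ 12 2 + ρ 1s 2 ≤ 1.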

Corollary 4. The capacity of the model (21) for the state-dependent Gaussian RC with informed source is lower-bounded by
where the functions R 1 (·) and R 2 (·) are defined as in Definition 2 and the maximization is over parameters

Case of Two Independent Interferences.
In this section, we consider a model in which the links source-to-relay and source-to-destination are corrupted by independent interferences, that is, the model (18) of Section 4.1 with S 2,i and S 3,i mutually independent zero mean Gaussian random variables with variance Q.
It is interesting to observe that if the relay operates in a decode-and-forward scheme the channel model (28) has a (high-level) connection with the Gaussian multicast model studied in [36], in that the signal sent by the informed transmitter (i.e., the source) experiences different interferences on its way to the two receivers. However, a major difference from [36] is that, here, one receiver (the relay) also helps the transmitter by relaying the information to the other receiver, that is, a relay setup. Also, unlike the model in [36], the noise terms at the receivers have different variances.
The following theorem provides a lower bound on the channel capacity of the model (28). The coding scheme that we use to establish this theorem builds upon the scheme of [36] (named carbon copying onto dirty paper (CC) therein), and it also generalizes it as will become clear from the proof.

Theorem 5. The capacity of the channel model (28) is lower-bounded by
where Ñ 2 := N 2 + P 1d + Q/2, Ñ 3 := N 3 + P 1d + Q/2, and the maximization is over parameters:

Proof. A formal proof of Theorem 5 is given in Appendix C.
In the proof, we develop a coding scheme that uses decode-and-forward relaying and a carbon-copying-like scheme.
As can be seen from the proof, the applied CC scheme differs from [36] in two respects. First, the signal sent by the informed transmitter is correlated with the channel state. Second, the noise terms at the two receivers have different variances. One direct consequence of the latter dissimilarity is that, in our case, for the transmission of the message that is sent to the two receivers at the same time, one cannot derive an optimal choice of Costa's parameter (i.e., one that permits removing the effect of the interference simultaneously for the two links via one single DPC as in [36]). (This explains why Costa's parameter α a is left to be optimized over in the encoding step (C.1) in Appendix C, as opposed to [36], where the choice of Costa's parameter is optimal for both receivers.) We outline the coding scheme in the following.

4.3.1. Outline of the Coding Scheme.
Let S n a := (S n 2 + S n 3 )/2 and S n d := (S n 2 − S n 3 )/2. The states S n a and S n d are mutually independent, and they can be used to represent the interferences on the link to the relay and the link to the destination as S n 2 = S n a + S n d and S n 3 = S n a − S n d . We denote by T the total transmission time. We divide this time into two transmission periods of durations νT and ν̄T, respectively, with 0 ≤ ν ≤ 1. Also, we decompose the message W to be transmitted from the source into two submessages, W a and W d . At time i, the input X 1,i from the source is divided accordingly into two independent parts, that is, X 1,i = X 1a,i + X 1d,i , where X 1a,i carries message W a and has power constraint P 1a , and X 1d,i carries message W d and has power constraint P 1d , with P 1 = P 1a + P 1d . The message W a is sent to both the relay and the destination at the same time, during the two periods νT and ν̄T, and it is precoded against the interference S a,i . The precoding for message W a is performed through a GDPC similar to that in Section 4.2. The message W d is sent to only the relay during the period νT and to only the destination during the period ν̄T. Hence, it is precoded against the interference on the link to the relay during the period νT and against the interference on the link to the destination during the period ν̄T. The precoding for message W d is performed through a standard DPC. The relay decodes and forwards the message W a during both transmission periods and also decodes and forwards message W d during the period νT.
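The decomposition into S a and S d is easy to verify numerically (the value of Q below is hypothetical): S 2 = S a + S d and S 3 = S a − S d hold by construction, and for independent S 2 , S 3 of variance Q, the components S a and S d are uncorrelated with variance Q/2 each, consistent with the Q/2 terms appearing in Theorem 5.

```python
import math
import random

random.seed(2)
Q, n = 4.0, 100_000
va = vd = cov = 0.0
for _ in range(n):
    s2 = random.gauss(0, math.sqrt(Q))
    s3 = random.gauss(0, math.sqrt(Q))
    sa, sd = (s2 + s3) / 2, (s2 - s3) / 2   # S_a, S_d as defined in the text
    # The decomposition recovers the original states exactly.
    assert abs((sa + sd) - s2) < 1e-12 and abs((sa - sd) - s3) < 1e-12
    va += sa * sa / n
    vd += sd * sd / n
    cov += sa * sd / n

# For i.i.d. S2, S3 ~ N(0, Q): Var(S_a) = Var(S_d) = Q/2, Cov(S_a, S_d) = 0.
print(va, vd, cov)
```

The common part S a is what the GDPC targets on both links simultaneously, while the antisymmetric part S d is left as residual noise of power Q/2 at each receiver.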
During the period νT, the relay sends a superposition of two independent Gaussian codewords, X_{2,i} = X_{2a,i} + X_{2d,i}, where codeword X_{2a,i} enhances the transmission of message W_a by the source to the destination, and codeword X_{2d,i} carries message W_d to the destination. (Note that during this period the destination decodes message W_d from only the transmission by the relay. More specifically, the decoder at the destination treats the codeword X_{1d,i} from the source as part of the noise, as can be seen from the decoding procedure in Appendix C.) Thus, the relay shares its power between X_{2a,i} and X_{2d,i}, that is, X_{2a,i} is sent with power constraint P_2a and X_{2d,i} is sent with power constraint P_2d, with P_2a + P_2d ≤ P_2. During the period (1 − ν)T, the relay sends only X_{2a,i}, with power constraint P_2. In this case, the destination obtains message W_d from the direct transmission from the source. A block diagram of the communication protocol is shown in Figure 3.

Remark 1. The rate of Theorem 5 includes the rates of the following schemes.
(i) Full-DF with precoding against the interference S_a^n using a GDPC. This is obtained by setting ν = 0 and P_1d = 0 in (29).
(ii) Partial-DF with precoding for the message sent through the relay against the interference S_a^n using a GDPC, and precoding for the additional message to be decoded only by the destination against the interference on the link to the destination using a standard DPC. This is obtained by setting ν = 0 in (29).
(iii) Time-sharing DPC between the relay and the destination. This is obtained by setting P_1a = P_2a = 0 during both transmission periods.
Remark 2. So far, we have assumed that the parameters P_1a, P_1d, and (ρ_12 = √(1 − ξ − ρ²), ρ_1s = ρ, α_a) are identical for the two periods νT and (1 − ν)T. One can obtain larger achievable rates by allowing the source to send with different powers and to use different parameters (ρ_12, ρ_1s, α_a) for the two periods. However, the numerical computation of the obtained rates becomes tedious in this case.

Extreme Cases.
We now focus on the behavior of the above bounds in some trivial extreme cases.
(1) In the limit of strong interference (i.e., Q → ∞), the lower bound (29) reduces to a rate that can also be achieved using a standard DPC at the source and keeping the relay off.  (2) For Q = 0, the lower bound (29) reduces to the one achievable with partial DF in the standard interference-free Gaussian RC. Also, in this case, setting P_1d = 0 in (29), the resulting lower bound meets the cut-set bound for the degraded Gaussian case and yields the capacity of the standard degraded Gaussian RC [34, Theorem 5].
(3) If P_2 = 0, we obtain the capacity of the model (28). In this case, the effect of the interference on the link to the destination is removed by a regular DPC at the source.
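Item (3) relies on a regular DPC removing the interference on the source-to-destination link. As a hedged illustration of why DPC achieves the interference-free rate, the following sketch evaluates Costa's classical rate expression at the inflation factor α = P/(P + N); the symbols P, Q, and N are generic illustrative powers here, not the specific parameters of model (28):

```python
import math

def costa_rate(P, Q, N, alpha):
    """Achievable rate (bits/channel use) of Costa's DPC for
    Y = X + S + Z with input power P, interference power Q, noise power N."""
    num = P * (P + Q + N)
    den = P * Q * (1 - alpha) ** 2 + N * (P + alpha ** 2 * Q)
    return 0.5 * math.log2(num / den)

P, Q, N = 5.0, 10.0, 2.0          # illustrative powers
alpha_star = P / (P + N)          # Costa's optimal inflation factor
awgn_capacity = 0.5 * math.log2(1 + P / N)

# At alpha_star the interference is effectively removed:
print(costa_rate(P, Q, N, alpha_star) - awgn_capacity)  # ~0
```

With any other α (e.g., α = 0, i.e., no precoding), the rate falls strictly below the interference-free capacity, which is the gap that DPC closes.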

Discussion
where h_2 and h_3 may model known channel coefficients in fading environments. Proceeding similarly to the model (28) in Section 4.3, let S_a^{(k)} = h_a S^{(k)} and S_d^{(k)} = h_d S^{(k)}. It is interesting to note that the coding scheme that we used to establish Theorem 5 does not apply directly to the model (35), because the noise will be correlated with both the channel input and the CSI in the steps in which S_d^{(k)}, k = 1, 2, is treated as part of the noise and (S_a^{(k)})^n as CSI at the encoder. For example, observe that, for the model (35), treating S_d^{(1)} as part of the noise as we did in (C.7), (C.8) during the period νT, the resulting noise term (X_{1d}^{(1)} + S_d^{(1)} + Z_2) will be correlated with both the input X_{1a}^{(1)} and the CSI S_a^{(1)}. The same applies to treating S_d^{(1)} as part of the noise as in (C.12), (C.13) during the period νT, and to treating S_d^{(2)} as part of the noise as in (C.21), (C.22) during the period (1 − ν)T.
In [36] the authors develop a coding scheme for a Gaussian multicast problem with independent interferences known noncausally to the transmitter; they also mention that, with some modifications (especially, the common randomness that is mentioned below), their coding scheme also applies to a model in which the interferences are scaled differently on the links to the two receivers (one which is similar to (35), but for the multicast problem). In [36], the direct application of the coding scheme developed for the independent-interferences case to the model with scaled interferences incurs (only) a correlation between the noise and the CSI, and the authors mention that such a correlation does not reduce the rates relative to the case in which they are independent if the encoder and the decoders have access to a source of common randomness. The underlying code construction is based on the lattice strategies of [42], in particular the Inflated Lattice Lemma [42, Lemma 6]. In fact, the code construction for the Gaussian multicast problem with independent interferences in [36] can be seen as essentially a careful superposition of two DPCs; thus, an appropriate superposition of two lattice codes with good quantization properties, each designed as in [42], achieves the same rates asymptotically in the dimension of the employed lattices.
In our case, however, as we already mentioned, there is also a correlation between the noise and the channel input. This correlation is due to the fact that the source input is chosen to be (negatively) correlated with the CSI, for the sake of reducing its effect on the relay-to-destination link. While it is possible to remove this correlation by transforming the channel into an equivalent channel in which the (equivalent) channel input is independent of the (equivalent) CSI, it remains to be proved that the bounds on the rates R_a^{(k)} and R_d^{(k)}, k = 1, 2, in the proof of Theorem 5, which are established using random binning arguments, can be achieved using a proper choice of linear lattice strategies at the source and codes at the relay. (The transformation can be obtained by dividing the source input into two independent parts: one part which is proportional to the known CSI S_a^{(k)}, k = 1, 2, and thus is considered as part of the equivalent CSI, and another part which is independent of it and is considered as the channel input for the equivalent channel.)

Numerical Examples and Discussion.
In this section we discuss some numerical examples for the general Gaussian case.

Generalized DPC versus Standard DPC.
First, we illustrate the rates given in Theorem 3 and Corollary 4, and the efficiency of the coding ideas used therein, through a numerical example. Figure 4 depicts the evolution of the lower bound (22) for a numerical example for the Gaussian RC model (21), as a function of the signal-to-noise ratio (SNR) at the relay, SNR = P_1/N_2 (in decibels). Also shown for comparison are: the rate of Corollary 4, the cut-set bound (16) computed here for the Gaussian channel model (21), and the trivial lower bound obtained by treating the interference at the relay and the destination as unknown noise. Investigating the two curves depicting the rates of Theorem 3 and Corollary 4 shows that, as expected, splitting the message to be transmitted into two parts, and so having the relay decode only one part of it, is beneficial at small SNR. However, this improvement vanishes at large SNR, as the relay can then decode all the information transmitted by the source, and therefore there is no benefit from splitting the message in that range of SNR.
Furthermore, Figure 4 also shows the rate obtained with rate-splitting and standard DPC at the source, that is, the special case of (22) obtained by setting ρ := ρ_1s = 0. Comparing this rate to that of Theorem 3 (which is based on a coding scheme that employs GDPC or, equivalently, partial cancellation of the interference combined with standard DPC), it can be seen that GDPC always improves upon standard DPC. This means that, even if only the source knows the interference, both the source and the relay benefit from this knowledge. This is made possible by having the source partially cancel the interference for the relay. More specifically, the relay benefits since its transmit signal faces less interference on its way to the destination, and the source benefits in turn since the advantage of being assisted by a more efficient relay favourably counterbalances the loss incurred by spending some power on partially cleaning the channel for the relay.
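The partial-cancellation effect of GDPC can be made concrete with a small computation: writing X_1r = ρ_1s √(P_1r/Q) S + X̃ (a standard decomposition consistent with E[X_1r S] = ρ_1s √(P_1r Q); the numbers below are illustrative), the state component seen at the relay has power (√Q + ρ_1s √P_1r)², so a negative ρ_1s shrinks it:

```python
import math

def residual_interference_power(P1r, Q, rho_1s):
    """Power of the state component seen at the relay when the source
    codeword X_1r satisfies E[X_1r S] = rho_1s * sqrt(P1r * Q)."""
    return (math.sqrt(Q) + rho_1s * math.sqrt(P1r)) ** 2

P1r, Q = 4.0, 9.0   # illustrative powers
print(residual_interference_power(P1r, Q, 0.0))    # 9.0: no cancellation
print(residual_interference_power(P1r, Q, -0.5))   # 4.0: partial cancellation
```

The residual interference power drops from Q to (√Q + ρ_1s √P_1r)², at the cost of the fraction ρ_1s² of the source power spent on cancellation, which is the trade-off discussed above.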
It is worth mentioning that the improvement brought by GDPC (over standard DPC) is mainly visible at large SNR. This is because, as a prerequisite for the DF relaying strategy (or its variants), the relay can assist the source efficiently only if it decodes the source transmit symbol reliably (at least partially). If it does not, that is, at low SNR, it is not worth the source spending power on facilitating the relay transmission by (partially) cleaning the channel for it, as this would come at the cost of power that could instead be allocated to strengthening the source transmit signal so that the relay can decode it more reliably.

Figure 5 depicts the evolution of the lower bound (29) for two numerical examples for the Gaussian RC model (28), as a function of the signal-to-noise ratio (SNR) at the relay, SNR = P_1/N_2 (in decibels). The figure also shows the cut-set bound (16) computed here for the Gaussian channel model (28), and the curves corresponding to some other achievable rates obtained as special cases of the rate in Theorem 5, as mentioned in Remark 1, all shown for comparison purposes. Furthermore, in order to show the improvement brought by GCC over standard CC, Figure 5 also shows the rate obtained by the latter scheme, that is, the special case of (29) obtained by setting ρ = 0. We mention that the latter scheme is the one that can be obtained by a natural, but careful, extension of the initial CC [36], which was developed for a multicast setup as we already mentioned, to the relay setup.

Generalized CC versus Standard CC.
It is interesting to observe that, just as the scheme that uses GDPC improves upon the one that uses regular DPC in Theorem 3 as we explained previously, here also the coding scheme that employs GCC improves upon the one that employs regular CC. Moreover, the range of SNR for which the improvement is visible corresponds to when the multiaccess part of the bound in (29) is operative, that is, when the obtained rate is given by the information that the source and the relay together can transfer to the destination.
The coding scheme that we developed for the case of one interference (21) in Section 4.2 also applies to the model (28); the allowed rate is obtained by setting ν = 0 in (29), as we indicated in Remark 1. This rate is shown by the dash-dotted curve in Figure 5. Comparing this rate with (29), it is insightful to observe that, because the model (28) comprises two interferences, time-sharing the superimposed GDPC and regular DPC applied during the period νT with those applied during the period (1 − ν)T (i.e., varying ν ∈ [0, 1]) is advantageous. While the achieved improvement is well expected, since optimizing over ν as in Theorem 5 can only increase the rate (relative to the one obtained by fixing ν = 0 in (29), that is, by applying the coding scheme of Theorem 3 to the model (28)), it is insightful to comment on this improvement. Investigating the effect of fixing ν = 0, it can be seen that this causes the information sent through the relay to suffer from the interference S_{d,i} = (S_{2,i} − S_{3,i})/2 during the entire transmission time (recall that this interference is treated as an unknown noise with power Q/2 at the decoder). In fact, with the coding scheme used to establish Theorem 5 as well, the message W_a sent through the relay suffers from the same interference during the entire transmission time. However, in this latter case, the relay also helps in transmitting message W_d. For small SNR, however, there is no benefit from relaying message W_d as well; this explains why the two schemes give the same rate in that range of SNR, that is, the optimal choice of ν in Theorem 5 is zero for small SNR.
(Figure 6 shows the lower bound (29) for the model (28), together with the maximizing ρ_12 = √(1 − ξ − ρ²) and ρ_1s := ρ. Numerical values: P_1 = 5 dB, P_2 = 20 dB, Q = 10 dB, N_3 = 10 dB.)
Another numerical example is shown in Figure 6. For this numerical example, the figure also shows the variations of the maximizing ρ_1s := ρ and ρ_12 = √(1 − ξ − ρ²) in (29), as a function of the SNR. This shows how the informed source allocates its power among combating the interference for the relay (related to the value of ρ_1s), sending signals that are coherent with the transmission from the relay (related to the value of ρ_12), and sending additional information (related to the value of 1 − ρ_12² − ρ_1s²).
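The three-way power allocation described above can be sketched as a simple bookkeeping function; the split below follows directly from the substitution ξ := 1 − ρ_12² − ρ_1s² (parameter values are illustrative):

```python
def power_split(P1a, rho_12, rho_1s):
    """Split the source power P1a into a relay-coherent part, a
    state-cancellation part, and a fresh-information part (fraction xi)."""
    xi = 1 - rho_12 ** 2 - rho_1s ** 2
    return rho_12 ** 2 * P1a, rho_1s ** 2 * P1a, xi * P1a

coherent, cancel, fresh = power_split(5.0, 0.6, -0.5)  # illustrative values
print(coherent, cancel, fresh)   # the three parts sum to P1a = 5.0
```

Since the three fractions sum to one by the definition of ξ, no power is wasted in the decomposition; the optimization in (29) is over how to divide it.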

Conclusion
In this paper, we consider a state-dependent three-terminal full-duplex relay channel (RC) with the states of the channel known noncausally at only the source, that is, neither at the relay nor at the destination. We refer to this communication model as the state-dependent RC with informed source. This setup may model a basic scenario of cooperation over a wireless network in which only the sources are cognizant of the states of the channel. We study this problem in the discrete memoryless (DM) setup and in the Gaussian setup. For the Gaussian setup, the channel states model additive Gaussian outside interferences. For the DM case, we establish a lower bound on the channel capacity. This lower bound is obtained by a coding scheme that uses techniques of rate-splitting at the source, regular-encoding sliding-window decoding [33] for decode-and-forward (DF) relaying [34, Theorem 4], and a Gel'fand-Pinsker like binning scheme. Due to the rate-splitting at the source, this lower bound is better than the one obtained by assuming that the relay decodes the source message fully, that is, full-DF combined with a Gel'fand-Pinsker like binning scheme.
For the Gaussian setup, we consider channel models in which each of the relay node and the destination node experiences on its link an additive Gaussian outside interference in addition to additive Gaussian noise. The interferences are known noncausally to only the source, and play the role of additive channel states. We focus on the case of one interference corrupting both the transmission to the relay and that to the destination, and also on the case of two independent interferences, each corrupting one link. We establish lower bounds on the channel capacity for each of these two models. Furthermore, we also discuss a case of correlated interferences. For the case of one interference, the applied coding scheme combines techniques of rate-splitting and generalized dirty paper coding [11, 13] at the source, and decode-and-forward relaying. For the case of two independent interferences, as part of our coding scheme we employ a carbon-copying onto dirty paper (CC) scheme that builds carefully upon the original CC [36], which was developed for a multicast setup; it also generalizes it by allowing negative correlation between the codewords at the source and the known channel states. For both studied Gaussian models, the uninformed relay benefits from the allowed negative correlation, and so the source benefits in turn.

Appendices
Throughout this section we denote the set of strongly jointly ε-typical sequences [43, Chapter 14.2] with respect to the distribution P_{X,Y} as T_ε^n(P_{X,Y}).

A. Proof of Theorem 1
Consider the random coding scheme that we described in Section 3.1. We first show that the average probability of error goes to zero as n → ∞.
A.1. Analysis of Probability of Error. Fix a probability distribution P_{S,U,U_1,X_1,X_2,Y_2,Y_3} satisfying (6). Let (w_{r,i−1}, w_{d,i−1}) and (w_{r,i}, w_{d,i}) be the message pair sent in block i − 1 and the message pair sent in block i, respectively. Let s[i] denote any random state sequence observed in block i. As we already outlined after Theorem 1, at the beginning of block i the relay has decoded w_{r,i−1} and transmits x_2(w_{r,i−1}), and the source transmits a vector x_1(w_{r,i−1}, w_{r,i}, w_{d,i}) with i.i.d. components (generated as described in Section 3.1). The average probability of error is bounded as

Pr(error) ≤ Pr(s ∉ T_ε^n(Q_S)) + Σ_{s ∈ T_ε^n(Q_S)} Pr(s) Pr(error | s).    (A.1)

The first term, Pr(s ∉ T_ε^n(Q_S)), on the RHS of (A.1) goes to zero as n → ∞, by the strong asymptotic equipartition property (AEP) [43]. Thus, it is sufficient to upper bound the second term on the RHS of (A.1).
We now examine the probabilities of the error events associated with the encoding and decoding procedures. The error event is contained in the union of the following error events, where the events E_{1i} and E_{2i} correspond to encoding errors at block i; the events E_{3i} and E_{4i} correspond to decoding errors at the relay at block i; and the events E_{5i}, E_{6i}, E_{7i}, and E_{8i} correspond to decoding errors at the destination at block i.
Let E_{1i} be the event that there is no sequence u(w_{r,i−1}, w_{r,i}, j) jointly typical with s[i] given x_2(w_{r,i−1}). To bound the probability of the event E_{1i}, we use a standard argument [2]. More specifically, for u(w_{r,i−1}, w_{r,i}, j) and s[i] generated independently given x_2(w_{r,i−1}) with i.i.d. components drawn according to P_{U|X_2} and Q_S, respectively, the probability that u(w_{r,i−1}, w_{r,i}, j) is jointly typical with s[i] given x_2(w_{r,i−1}) is greater than (1 − ε)2^{−n(I(U;S|X_2)+ε)} for sufficiently large n. There is a total of J such u's in each bin. The probability of the event E_{1i}, that is, the probability that there is no such u, is therefore bounded as

Pr(E_{1i}) ≤ [1 − (1 − ε)2^{−n(I(U;S|X_2)+ε)}]^J.    (A.3)

Taking the logarithm on both sides of (A.3) and substituting J using (10), we obtain ln(Pr(E_{1i})) ≤ −(1 − ε)2^{nε}. Thus, Pr(E_{1i}) → 0 as n → ∞. Let E_{2i} be the event that there is no sequence u_1(w_{d,i}, j_1 | u(w_{r,i−1}, w_{r,i}, j)) jointly typical with s[i] given x_2(w_{r,i−1}) and u(w_{r,i−1}, w_{r,i}, j).
Proceeding similarly to the event E_{1i} above, it can be shown that, conditioned on E_{1i}^c, the complement of E_{1i}, we have Pr(E_{2i} | E_{1i}^c) → 0 as n → ∞. Let E_{3i} be the event that u(w_{r,i−1}, w_{r,i}, j[i]) and y_2[i] are not jointly typical given x_2(w_{r,i−1}), even though they are generated jointly with (s[i], x_2(w_{r,i−1})) and with the source input x_1(w_{r,i−1}, w_{r,i}, w_{d,i}). Then Pr(E_{3i} | E_{1i}^c, E_{2i}^c) → 0 as n → ∞, by the Markov Lemma [43].
Let E_{4i} be the event that u(w'_{r,i}, j) and y_2[i] are jointly typical given x_2(w_{r,i−1}) for some w'_{r,i} ∈ {1, . . . , M} and j ∈ {1, . . . , J} with w'_{r,i} ≠ w_{r,i}. Using the union bound and standard arguments on jointly typical sequences, the probability of the event E_{4i} conditioned on E_{1i}^c, E_{2i}^c, E_{3i}^c can easily be bounded. For decoding the triple (w_{r,i−1}, j[i − 1], w_{d,i−1}) at the destination, let E_{5i} be the union of two error events; conditioned on the events E_{1i}^c, . . . , E_{5i}^c, the probability of the event E_{6i} can be bounded using the union bound. Next, conditioned on ∩_{k=1}^{6} E_{ki}^c, the probability of the event E_{7i} can be bounded using the union bound; and, conditioned on ∩_{k=1}^{7} E_{ki}^c, the probability of the event E_{8i} can be bounded using the union bound as well. From the above, we have that the average probability of error goes to zero for sufficiently large n if the rate R is chosen to satisfy the constraints (A.15a)-(A.15e). The set of rates defined by (A.15a)-(A.15e) does not change if one adds two additional rate bounds. Then, we obtain the rate in Theorem 1 by applying Fourier-Motzkin elimination (FME) (see, e.g., [44]) to eliminate the variables R_r and R_d from the obtained system of rate inequalities. Note that eliminating the variable R_r by the FME algorithm adds the constraint (7a), and eliminating the variable R_d adds the constraints (7b) and (7c). This proves the achievability of the rate (9) in Theorem 1 for every measure P_{S,U,U_1,X_1,X_2,Y_2,Y_3} of the form (6) that satisfies (7a), (7b), (7c), that is, P_{S,U,U_1,X_1,X_2,Y_2,Y_3} ∈ P.
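The behaviour of the bound on Pr(E_1i) can be checked arithmetically: with single-candidate success probability p ≈ 2^{−n(I(U;S|X_2)+ε)} and J ≈ 2^{n(I(U;S|X_2)+2ε)} candidates per bin (the exact value of J is fixed in (10); the figures below are illustrative), the inequality (1 − p)^J ≤ exp(−Jp) gives a doubly exponentially small error probability:

```python
import math

n, I, eps = 200, 0.5, 0.05   # illustrative blocklength, I(U;S|X_2), and slack

p = (1 - eps) * 2.0 ** (-n * (I + eps))   # success prob. of one candidate u
J = 2.0 ** (n * (I + 2 * eps))            # number of u's per bin (cf. (10))

# (1 - p)^J <= exp(-J*p), so ln Pr(E_1i) <= -J*p = -(1 - eps)*2^(n*eps).
log_bound = -J * p
print(log_bound)   # about -(1 - eps) * 2**(n*eps): vanishing error probability
```

The extra ε of rate in the bin size J is exactly what turns the per-candidate failure probability into an exponent that itself grows exponentially in n.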
It remains to show that the rate (9) is not altered if one restricts the random variables U and U_1 to have their alphabet sizes limited as in (8a), (8b). This is done by invoking the support lemma [45, page 310], as follows.
Fix a distribution μ ∈ P of (S, U, U_1, X_1, X_2, Y_2, Y_3). To prove the bound (8a) on |U|, it suffices to show that the following functionals of μ can be preserved with another measure μ' ∈ P: for given (s, x_1, x_2) ∈ S × X_1 × X_2, the marginals r_{s,x_1,x_2} obtained by marginalizing the measure μ, together with the two information functionals appearing in (9). Observing that there is a total of |S||X_1||X_2| + 2 functionals in (A.20), this is ensured by a standard application of the support lemma; and this shows that the cardinality of the alphabet of the auxiliary random variable U can be limited as indicated in (8a) without altering the rate (9).
Once the alphabet of U is fixed, we apply similar arguments to bound the alphabet of U_1; this time, |S||X_1||X_2|(|S||X_1||X_2| + 2) − 1 functionals must be satisfied in order to preserve the joint distribution of S, U, X_1, X_2, and two more functionals must be preserved as well, yielding the bound on |U_1| indicated in (8b).

B. Proof of Theorem 3
In this proof, we compute the lower bound in Theorem 1 using an appropriate jointly Gaussian distribution on S, U, U_1, X_1, X_2. We assume that X_1 and X_2 are zero-mean Gaussian with variance P_1 and P_2, respectively. The random variable X_2 is independent of S, as shown by the distribution in Theorem 1. The random variable X_1 is decomposed as X_1 = X_{1r} + X_{1d}, where: X_{1r} is zero-mean Gaussian with variance P_{1r}, is independent of X_{1d}, and is jointly Gaussian with both S and X_2, with E[X_{1r}S] = ρ_{1s}√(P_{1r}Q) and E[X_{1r}X_2] = ρ_{12}√(P_{1r}P_2), for some ρ_{1s} ∈ [−1, 1] and ρ_{12} ∈ [−1, 1]; and X_{1d} is zero-mean Gaussian with variance P_{1d} = P_1 − P_{1r}, and is independent of both the relay input X_2 and the state S. In what follows we will also use the covariances σ_{12} = E[X_1X_2] = ρ_{12}√(P_{1r}P_2) and σ_{1s} = E[X_1S] = ρ_{1s}√(P_{1r}Q).

EURASIP Journal on Wireless Communications and Networking
As we outlined after Theorem 3, we consider U = X_{1r} + αS, where α denotes a scale parameter whose range will be specified below.
(i) Let us first compute the first term of the minimization in (9). Consider the term [I(U; Y_2 | X_2) − I(U; S | X_2)]. To evaluate the conditional mutual information I(U; Y_2 | X_2), let us denote by E[Y_2 | U, X_2] the optimal linear estimator of Y_2 given (U, X_2) under the minimum mean square error criterion; the coefficient of U in this estimator is

γ_1 = (P_{1r} − σ_{12}²/P_2 + (α + 1)σ_{1s} + αQ) / (P_{1r} − σ_{12}²/P_2 + 2ασ_{1s} + α²Q).

For convenience, let us define Δ as the denominator of the expression of γ_1 (for notational convenience, we omit the dependency of Δ on the parameters α, ρ_{12}, and ρ_{1s}), that is, Δ := P_{1r} − σ_{12}²/P_2 + 2ασ_{1s} + α²Q. Then I(U; Y_2 | X_2) follows, where (a) holds since the vector (U, X_2, Y_2) is Gaussian, and (b) holds by using the fact that E[Y_2 | X_2] = (σ_{12}/P_2)X_2 together with a cross term that evaluates to γ_1(P_{1r} + (α + 1)σ_{1s} + αQ), substituting γ_1 using (B.4), and straightforward algebra. Similarly, to evaluate the conditional mutual information I(U; S | X_2), let E[S | U, X_2] be the optimal linear estimator of S given (U, X_2) under the minimum mean square error criterion, as in (B.10); I(U; S | X_2) then follows. For the computation of the first term of the minimization in (9), it remains to compute [I(U_1; Y_3 | U, X_2) − I(U_1; S | U, X_2)]. Noting that Y_3 = X_{1d} + U + X_2 + (1 − α)S + Z_3, this difference can easily be evaluated (B.13). (ii) Consider now the second term of the minimization in (9). This term can be written as the sum of the first term of the minimization in (9) and two additional terms. Observe that Y_3 − X_2 = X_1 + S + Z_3 and Y_2 − X_2 = X_1 + S + Z_2 differ only through the noise terms; the required quantity is therefore readily obtained by substituting N_2 with N_3 on the RHS of (B.12).
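The closed-form expression for γ_1 can be cross-checked against a direct linear-MMSE computation from the covariance structure fixed at the start of this appendix. The sketch below uses illustrative parameter values (chosen so that the covariance matrix is valid) and reads the extracted expression for γ_1 as the ratio given above:

```python
import numpy as np

# Illustrative parameters (chosen so that the covariance matrix is valid).
P1r, P1d, P2, Q, N2 = 4.0, 1.0, 9.0, 2.0, 1.0
rho_12, rho_1s, alpha = 0.3, -0.4, 0.5
sig12 = rho_12 * np.sqrt(P1r * P2)   # E[X_1r X_2]
sig1s = rho_1s * np.sqrt(P1r * Q)    # E[X_1r S]

# Covariance of the base vector v = (X_1r, X_1d, X_2, S, Z_2).
Sigma = np.diag([P1r, P1d, P2, Q, N2])
Sigma[0, 2] = Sigma[2, 0] = sig12
Sigma[0, 3] = Sigma[3, 0] = sig1s

# Y_2 = X_1 + X_2 + S + Z_2, U = X_1r + alpha*S, and X_2, as linear maps of v.
y2 = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
u = np.array([1.0, 0.0, 0.0, alpha, 0.0])
x2 = np.array([0.0, 0.0, 1.0, 0.0, 0.0])

# LMMSE: E[Y_2 | U, X_2] = gamma1*U + c*X_2, via the normal equations.
B = np.stack([u, x2])
gamma1_lmmse, _ = np.linalg.solve(B @ Sigma @ B.T, B @ Sigma @ y2)

# Closed-form expression from the proof.
Delta = P1r - sig12 ** 2 / P2 + 2 * alpha * sig1s + alpha ** 2 * Q
gamma1_formula = (P1r - sig12 ** 2 / P2 + (alpha + 1) * sig1s + alpha * Q) / Delta

print(gamma1_lmmse, gamma1_formula)  # the two agree
```

Solving the normal equations reproduces the stated coefficient, confirming that Δ is indeed the conditional variance of U given X_2 and that the numerator is the corresponding conditional cross-covariance with Y_2.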

C. Proof of Theorem 5
As we outlined after Theorem 5, we divide the total transmission time into two periods of duration νT and (1 − ν)T, respectively. Also, we decompose the message W to be transmitted into two submessages W_a and W_d. During the period νT, the source transmits message W_a to both the relay and the destination, and message W_d to only the relay. The relay decodes-and-forwards both messages during this period. During the period (1 − ν)T, the source transmits message W_a to both the relay and the destination, and message W_d to only the destination. The relay decodes-and-forwards only message W_a during this period. The relay operates in a full-duplex mode during both transmission periods.
Fix nonnegative P_1a, P_1d, P_2a, P_2d such that P_1a + P_1d ≤ P_1 and P_2a + P_2d ≤ P_2; ρ_12 ∈ [−1, 1], ρ_1s ∈ [−1, 1], and ν ∈ [0, 1]. Let R_a^{(1)} and R_a^{(2)} denote the rates at which message W_a is transmitted to the destination during the period νT and the period (1 − ν)T, respectively. Similarly, let R_d^{(1)} and R_d^{(2)} denote the rates at which message W_d is transmitted to the destination during these periods. The total rate during the period νT is T_1 = R_a^{(1)} + R_d^{(1)}, and the total rate during the period (1 − ν)T is T_2 = R_a^{(2)} + R_d^{(2)}. We assume that the channel states S_2^{(1)} and S_3^{(1)} during the period νT, and the channel states S_2^{(2)} and S_3^{(2)} during the period (1 − ν)T, are zero-mean Gaussian with variance Q. Also, we let S_a^{(k)} = (S_2^{(k)} + S_3^{(k)})/2 and S_d^{(k)} = (S_2^{(k)} − S_3^{(k)})/2, k = 1, 2. The encoding and transmission scheme during the two periods is as follows.
Decoding. Decoding at the relay during the period νT and decoding at the destination during both transmission periods exploit successive cancellation. The details of the computation of some of the mutual information terms in this section are very similar to those in the proof of Theorem 3 in Appendix B, and hence we omit them for brevity. Also, since all the random variables are i.i.d., we sometimes omit the time index. Furthermore, we use the functions defined in Definition 2 and the substitutions: ξ := 1 − ρ_12² − ρ_1s², ρ := ρ_1s, N_2' := N_2 + P_1d + Q/2, and N_3' := N_3 + P_1d + Q/2.