The capacity of a class of state-dependent relay channel with orthogonal components and side information at the source and the relay

In this paper, a class of state-dependent relay channels with orthogonal channels from the source to the relay and from the source and the relay to the destination is studied. The two orthogonal channels are corrupted by two independent channel states $S_R$ and $S_D$, respectively, which are known to both the source and the relay. Lower bounds on the capacity are established for the channel with either non-causal or causal channel state information. Further, we show that the lower bound with non-causal channel state information is tight if the receiver output $Y$ is a deterministic function of the relay input $X_r$, the channel state $S_D$, and one of the source inputs $X_D$, i.e., $Y = f(X_D, X_r, S_D)$, and the relay output $Y_r$ is controlled only by the source input $X_R$ and the channel state $S_R$, i.e., the channel from the source to the relay is governed by the conditional probability distribution $P_{Y_r|X_R,S_R}$. The capacity for this class of semi-deterministic orthogonal relay channels is characterized exactly. The results are then extended to the Gaussian case, modeling the channel states as additive Gaussian interferences. The capacity is characterized for the channel when the channel state information is known non-causally. However, when the channel state information is known causally, the capacity cannot be characterized in general. In this case, a lower bound on the capacity is established, and the capacity is characterized when the power of the relay is sufficiently large. Some numerical examples of the causal state information case are provided to illustrate the impact of the channel state and the role of the relay in information transmission and in cleaning the channel state.


Introduction
We consider a state-dependent relay channel with orthogonal components as shown in Figure 1. The channel from the source to the relay and the channel from the source and the relay to the destination are assumed orthogonal. The source wants to send a message $W$ to the destination with the help of the relay in $n$ channel uses. Through a given memoryless probability law $P_{Y_r|X_R,X_r,S_R} P_{Y|X_D,X_r,S_D}$, the channel output $Y_r^n$ at the relay is controlled by the source input $X_R^n$, the relay input $X_r^n$, and the channel state $S_R^n$, while the channel output $Y^n$ at the destination is controlled by the source input $X_D^n$, the relay input $X_r^n$, and the channel state $S_D^n$. The state sequences $S_R^n = (S_{R,1}, S_{R,2}, \dots, S_{R,n})$ and $S_D^n = (S_{D,1}, S_{D,2}, \dots, S_{D,n})$ are independent and identically distributed (i.i.d.) with $S_{R,i} \sim Q_{S_R}(s_{R,i})$ and $S_{D,i} \sim Q_{S_D}(s_{D,i})$, respectively. We assume that $S_R$ and $S_D$ are independent. The channel state information about $S_R$ and $S_D$ is known to the source and the relay causally (that is, only $S_R^i$ and $S_D^i$ are known before transmission $i$ takes place) or non-causally (that is, the entire sequences $S_R^n$ and $S_D^n$ are known before communication commences). The destination estimates the message sent by the source from its received channel output $Y^n$. In this paper, we study the capacity of this model.

Background
In many communication models, the communicating parties typically have some knowledge of the communication channel or attempt to learn about the channel. State-dependent channels have attracted wide attention in recent years [1]. Shannon first considered a single-user channel in which the channel state information was causally known to the transmitter [2]; the capacity of this channel was characterized. Gel'fand and Pinsker [3] derived a method to determine the capacity of a channel when the channel state information was non-causally known to the transmitter; this method was later called the Gel'fand-Pinsker (GP) coding scheme. In [4], Costa studied a Gaussian channel with additive white Gaussian noise (AWGN) and additive Gaussian interference known non-causally to the transmitter; it was demonstrated that with dirty paper coding (DPC), capacity can be achieved as if no interference existed in the channel.
Extensions to multiple-user channels were performed by Gel'fand and Pinsker in [5], where it was shown that interference cancellation was possible in the Gaussian broadcast channel (BC) and the Gaussian multiple-access channel (MAC). In multiple-user state-dependent channels, the channel state information may be known to all the users or only some of them. In [6], Sigurjonsson and Kim characterized the capacity of a degraded broadcast channel and the capacity of a physically degraded relay channel where the channel state information was causally known to the transmitters. Inner bounds for the two-user BC with non-causal side information at the transmitter were derived by extending Marton's achievable scheme to the state-dependent channels in [7]. In [8], Steinberg derived the inner and outer bounds for a degraded BC with non-causal side information and characterized the capacity region when the side information was provided to the transmitter in a causal manner. In [9], information-theoretic performance limits for three classes of two-user state-dependent discrete memoryless BCs with non-causal side information at the encoder were derived.
The state-dependent two-user MAC with state information non-causally known to one of the encoders was considered in [10] and [11]. For the MAC with asymmetric channel state information at the transmitters and full channel state information at the decoder, a single-letter capacity region was characterized when the channel state available at one of the encoders was a subset of the channel state available at the other encoder [12]. However, for the general case, only inner and outer bounds were derived. It is not easy to characterize the explicit capacity region for general state-dependent MACs even when the channel state information is known to all transmitters. Capacity regions are only characterized in some special cases, e.g., the Gaussian MAC with additive interference known to both encoders [13]. In some cases where cooperation between the transmitters is allowed, capacity regions are also characterized. For example, in [14], an explicit capacity region was characterized for the MAC with one informed encoder which transmitted a common message and a private message, while the uninformed encoder only transmitted a common message; in [15], the capacity region was derived for a two-user dirty paper Gaussian MAC with conferencing encoders.
Relay channels capture both the MAC and BC characteristics. State-dependent relay channels were studied in [16][17][18][19][20][21]. Zaidi et al. [16] studied the relay channel with non-causal channel state information known only to the relay. The lower and upper bounds were derived by a coding scheme at the relay that used a combination of codeword splitting, Gel'fand-Pinsker binning, and decode-and-forward (DF) relaying. When the channel state information was known only at the source node, lower and upper bounds were obtained in [17][18][19][20]. In [17], the coding scheme for the lower bound used techniques of rate splitting at the source, partial decode-and-forward (PDF) relaying, and a GP-like binning scheme. In order to derive the lower bound on the capacity, [18] proposed two achievable schemes: (i) state description, by which the source described the channel state to the relay and the destination, and (ii) analog input description, by which the source first computed the appropriate input that the relay would send had the relay known the channel state and then transmitted this appropriate input to the relay. With the same achievable schemes proposed in [18], the authors of [19] obtained two corresponding lower bounds for the state-dependent relay channel with orthogonal components and channel state information known non-causally to the source. A similar orthogonal relay channel that was corrupted by an interference known non-causally to the source was considered in [20], in which several transmission strategies were proposed, assuming the interference had structure. Akhbari et al. [21] considered a state-dependent relay channel in three different cases: only the relay, only the source, or both the source and the relay knew the channel state information non-causally. Lower bounds on the capacity were established based on GP coding and compress-and-forward (CF) relaying for the three cases.
Figure 1 Orthogonal relay channel with state information available at both the source and the relay.

Motivation
State-dependent channels with state information available at the encoders can be used to model many systems, such as information embedding [22][23][24] which enables encoding a message into a host signal, computer memories with defective cells [25], communication systems with cognitive radios, etc. For the above examples, we are more interested in the communication systems with cognitive radios. In order to improve the frequency spectrum efficiency in wireless systems, some secondary users which are capable of acquiring some knowledge about the primary communication are introduced into an existing primary communication system [26]. Obtaining the knowledge of the primary communication, the secondary users can adapt their coding schemes to mitigate the interference caused by the primary communication. In such models, the channel state can be viewed as the signals of the primary communication and the informed encoders can be viewed as cognitive users [11].
For the state-dependent relay channel with orthogonal components considered in this paper, the channel states $S_R$ and $S_D$ are viewed as the signals of the corresponding primary communication; the source and the relay are viewed as the secondary users which are capable of acquiring the channel state information. Thus, the model studied in this paper can be viewed as a secondary relay communication with a cognitive source and a cognitive relay. We are interested in studying the capacity of this model.
However, it is difficult to characterize the explicit capacity of relay channels even if the channel is state-independent. The capacity of the state-independent relay channel is only characterized for some special channels, e.g., the physically degraded/reversely degraded relay channel [27], a class of deterministic relay channels [28], and a class of relay channels with orthogonal components [29]. To the best of our knowledge, explicit capacity results for state-dependent relay channels with channel state information known to a part of the transmitters or all the transmitters were derived mainly in two cases: (i) physically degraded relay channels with state information causally known to both the source and the relay and (ii) Gaussian physically degraded relay channels with the channel state information non-causally known to the source and the relay. However, when the relay channel is corrupted by the channel state, explicit capacity has not been characterized yet, even if the channel state has structure and the structure is known to the source. In this paper, we try to find some capacity results for the state-dependent relay channel with orthogonal components.

Main contributions and organization of the paper
We investigate a state-dependent relay channel with orthogonal components, where the source communicates with the relay through a channel (say channel 1) orthogonal to another channel (say channel 2) through which the source and the relay communicate with the destination. We assume that channel 1 and channel 2 are affected by two independent channel states $S_R$ and $S_D$, respectively. The channel state information about $S_R$ and $S_D$ is known to both the source and the relay non-causally or causally. In this setup, the main results of this paper are summarized as follows:
(1) A lower bound on the capacity of the channel is established when the channel state information is known to the source and the relay non-causally. The achievability is based on superposition coding at the source, PDF relaying at the relay, and cooperative GP coding at the source and the relay.
(2) When the channel state information is known to the source and the relay causally, an achievable rate of this channel is derived in a similar way as in the non-causal channel state information case, except that the auxiliary random variables $U$ and $U_r$ are independent of the channel states $S_R$ and $S_D$.
(3) We show that the exact capacity of the channel with non-causal channel state information at the source and the relay can be characterized if the receiver output $Y$ is a deterministic function of the relay input $X_r$, the channel state $S_D$, and one of the source inputs $X_D$, i.e., $Y = f(X_D, X_r, S_D)$, and the relay output $Y_r$ is controlled only by the source input $X_R$ and the channel state $S_R$, i.e., the channel from the source to the relay is governed by the conditional probability distribution $P_{Y_r|X_R,S_R}$.
(4) Explicit capacity is also characterized for the Gaussian orthogonal relay channel with additive Gaussian interferences known to the source and the relay non-causally.
(5) For the Gaussian orthogonal relay channel with additive interferences known to the source and the relay causally, the capacity is derived when the power of the relay is sufficiently large.
The rest of the paper is organized as follows. In Section 2, we present the system model and some definitions as well as notations that will be used throughout the paper. Section 3 is devoted to establishing single-letter expressions for the lower bounds on the capacity of the discrete memoryless state-dependent orthogonal relay channel with channel state information known to the source and the relay either non-causally or causally. In Section 4, we show that when the channel state information is known non-causally, $Y = f(X_D, X_r, S_D)$, and the channel from the source to the relay is governed by the conditional probability distribution $P_{Y_r|X_R,S_R}$, the lower bound derived in Section 3 is tight; thus, the capacity is characterized exactly. In Section 5, the results are extended to the Gaussian case. In Section 6, some numerical results are provided to illustrate the impact of the additive interferences and the role of the relay in information transmission and in cleaning the interference. In Section 7, we conclude this paper.

Notations and problem setup
Throughout this paper, random variables will be denoted by capital letters, while deterministic realizations thereof will be denoted by lower case letters. Vectors will be denoted by boldface letters. The shorthand notation $x_i^j$ is used to abbreviate $(x_i, x_{i+1}, \dots, x_j)$, $x^i$ is used to abbreviate $(x_1, x_2, \dots, x_i)$, and $x_i$ is used to denote the $i$th element of $x^n$, where $1 \le i \le j \le n$. The probability law of a random variable $X$ will be denoted by $P_X$, and the conditional probability distribution of $Y$ given $X$ will be denoted by $P_{Y|X}$. The alphabet of a scalar random variable $X$ will be designated by the corresponding calligraphic letter $\mathcal{X}$. The cardinality of a set $\mathcal{J}$ will be denoted by $|\mathcal{J}|$. $T_\varepsilon^n(X)$ denotes the set of strongly $\varepsilon$-typical sequences $x^n \in \mathcal{X}^n$, while $A_\varepsilon^n(X)$ denotes the set of weakly $\varepsilon$-typical sequences $x^n \in \mathcal{X}^n$, where $\varepsilon > 0$. $E(\cdot)$ denotes expectation; $I(\cdot\,;\cdot)$ denotes the mutual information between two random variables. $\mathcal{N}(0, \sigma^2)$ denotes a Gaussian distribution with zero mean and variance $\sigma^2$.
As shown in Figure 1, we consider the state-dependent relay channel with orthogonal components denoted by $P_{Y,Y_r|X_R,X_D,X_r,S_D,S_R}$, where $Y \in \mathcal{Y}$ and $Y_r \in \mathcal{Y}_r$ are the channel outputs at the destination and the relay, respectively. $X_R \in \mathcal{X}_R$ and $X_D \in \mathcal{X}_D$ are the orthogonal channel inputs from the source, while $X_r \in \mathcal{X}_r$ is the channel input from the relay. $S_R \in \mathcal{S}_R$ and $S_D \in \mathcal{S}_D$ denote the random channel states that corrupt channel 1 and channel 2, respectively. The channel states $S_{R,i}$ and $S_{D,i}$ at time instant $i$ are independently drawn from the distributions $Q_{S_R}$ and $Q_{S_D}$, respectively. The channel state information about $S_R$ and $S_D$ is known to both the source and the relay non-causally or causally.
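The message $W$ is uniformly distributed over the set $\mathcal{W} = \{1, 2, \dots, M\}$. The source transmits the message $W$ to the destination with the help of the relay in $n$ channel uses. Let $X_R^n = (X_{R,1}, \dots, X_{R,n})$ and $X_D^n = (X_{D,1}, \dots, X_{D,n})$ be the channel inputs of the source, and let $X_r^n = (X_{r,1}, \dots, X_{r,n})$ be the channel input of the relay. The relay channel is said to be memoryless and to have orthogonal components if

$$P(y^n, y_r^n \mid x_R^n, x_D^n, x_r^n, s_R^n, s_D^n) = \prod_{i=1}^{n} P_{Y_r|X_R,X_r,S_R}(y_{r,i} \mid x_{R,i}, x_{r,i}, s_{R,i})\, P_{Y|X_D,X_r,S_D}(y_i \mid x_{D,i}, x_{r,i}, s_{D,i}). \tag{1}$$

An $(M, n)$ code for the state-dependent relay channel with channel state information non-causally known to the source and the relay consists of an encoding function at the source

$$\varphi^n : \{1, 2, \dots, M\} \times \mathcal{S}_R^n \times \mathcal{S}_D^n \to \mathcal{X}_R^n \times \mathcal{X}_D^n, \tag{2}$$

a sequence of encoding functions at the relay

$$\varphi_{r,i} : \mathcal{Y}_r^{i-1} \times \mathcal{S}_R^n \times \mathcal{S}_D^n \to \mathcal{X}_r \tag{3}$$

for $i = 1, 2, \dots, n$, and a decoding function at the destination $\phi^n : \mathcal{Y}^n \to \{1, 2, \dots, M\}$.
The information rate $R$ is defined as $R = \frac{1}{n} \log_2 M$ bits per transmission. An $(\varepsilon_n, n, R)$ code for the state-dependent relay channel with orthogonal components and non-causal state information is a code having average probability of error smaller than $\varepsilon_n$, i.e., $\Pr\{W \neq \phi^n(Y^n)\} \le \varepsilon_n$. The rate $R$ is said to be achievable if there exists a sequence of $(\varepsilon_n, n, R)$ codes with $\lim_{n \to \infty} \varepsilon_n = 0$. The capacity of the channel is defined as the supremum of the set of achievable rates.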
The definition of an $(\varepsilon_n, n, R)$ code for the state-dependent relay channel with orthogonal components and causal channel state information at the source and the relay is similar to that for the channel with non-causal state information, except that the encoders consist of sequences of maps $\{\varphi_i\}_{i=1}^{n}$ and $\{\varphi_{r,i}\}_{i=1}^{n}$, where $i$ is the time index. Thus, the encoder mappings in (2) and (3) are replaced by

$$\varphi_i : \{1, 2, \dots, M\} \times \mathcal{S}_R^i \times \mathcal{S}_D^i \to \mathcal{X}_R \times \mathcal{X}_D$$

and

$$\varphi_{r,i} : \mathcal{Y}_r^{i-1} \times \mathcal{S}_R^i \times \mathcal{S}_D^i \to \mathcal{X}_r,$$

respectively, where $i = 1, 2, \dots, n$. The definitions of achievable rate and capacity remain the same as in the non-causal state information case.

Discrete memoryless case
In this section, it is assumed that the alphabets $\mathcal{X}_D$, $\mathcal{X}_R$, $\mathcal{X}_r$, $\mathcal{Y}_r$, $\mathcal{Y}$, $\mathcal{S}_D$, and $\mathcal{S}_R$ are finite. Lower bounds on the capacity of the channel with non-causal and with causal channel state information are established. In the proofs of the lower bounds in the discrete memoryless case, strong typicality is used.

Non-causal channel state information
The following theorem provides a lower bound on the capacity of the state-dependent orthogonal relay channel with channel state information non-causally known to the source and the relay.
Theorem 1 For the orthogonal relay channel with channel state information non-causally known to both the source and the relay, the rate in (6) is achievable, where the maximization is over all measures of the form (7), and $U_r \in \mathcal{U}_r$ and $U \in \mathcal{U}$ are auxiliary random variables with bounded cardinality.
Remark 1 Since the source and the relay know the channel state information non-causally, with PDF relaying, they can transmit the messages to the destination cooperatively with GP coding, namely, cooperative GP coding. The source communicates with the relay treating $s_R^n$ as a time-sharing sequence, again because the channel state information is known to both the source and the relay non-causally.

Outline of the proof of Theorem 1
We now give a description of the coding scheme to derive the lower bound in Theorem 1. Detailed error analysis of Theorem 1 is given in Appendix 1. The achievable scheme is based on the combination of superposition coding at the source, PDF relaying at the relay, and cooperative GP coding at the source and the relay.
We consider a transmission over $B + 1$ blocks, each of $n$ channel uses. A sequence of $B$ messages $w(k) = (w_D(k), w_R(k))$, $k = 1, 2, \dots, B$, with $w_D(k) \in [1, 2^{nR_D}]$, $w_R(k) \in [1, 2^{nR_R}]$, and $R = R_D + R_R$, is sent over the channel in $n(B + 1)$ transmissions. During each of the first $B$ blocks, the source encodes $w_D(k) \in [1, 2^{nR_D}]$ and sends it over the channel. Since both the source and the relay know the channel state sequence $s_R^n(k)$, the source encodes $w_R(k) \in [1, 2^{nR_R}]$ by treating $s_R^n(k)$ as a time-sharing sequence [30]. The message $w_R(k)$ is expressed as a unique set of sub-messages $\{m_{s_R}(k) : s_R \in \mathcal{S}_R\}$, whose codewords are sent over the channel multiplexed according to the state sequence $s_R^n(k)$. The relay demultiplexes the received sequence $y_r^n(k)$ into sub-sequences according to the state sequence $s_R^n(k)$ and decodes each sub-message $m_{s_R}(k)$. Consequently, $w_R(k)$ is decoded at the relay. The coding scheme is illustrated in Figure 2. With PDF relaying, the relay re-encodes $w_R(k)$ and sends it to the destination cooperatively with the source. In the last block $B + 1$, no new message is sent, and we let $w(B + 1) = (w_D(B + 1), w_R(B + 1)) = (1, 1)$. The average information rate $R\,B/(B + 1)$ of the message over $B + 1$ blocks approaches $R$ as $B \to \infty$.
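The rate loss caused by the final block, in which no new message is sent, can be checked numerically. The following Python sketch evaluates the average rate $R\,B/(B+1)$ for a few block counts. The function name `effective_rate` is hypothetical and chosen for illustration only; the sketch assumes nothing beyond the block structure described above.

```python
def effective_rate(rate: float, num_blocks: int) -> float:
    """Average rate when B message blocks are sent over B + 1 channel blocks.

    `rate` is the per-block rate R, and `num_blocks` is B.
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    return rate * num_blocks / (num_blocks + 1)


if __name__ == "__main__":
    # The loss factor B / (B + 1) vanishes as the number of blocks grows.
    for b in (1, 10, 100, 1000):
        print(b, effective_rate(1.0, b))
```

With $B = 1$, half of the rate is lost, while for large $B$ the loss becomes negligible, which is why the scheme is analyzed in the limit $B \to \infty$.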

Codebook generation
Fix a measure $P_{S_R,S_D,X_r,U_r,U,X_R,X_D}$ of the form (7).
For each $s_R \in \mathcal{S}_R$, randomly and independently generate $2^{nR_{s_R}}$ sequences. These sequences constitute the sub-codebook associated with the state $s_R$.

Encoding
We pick up the story in block $k$. Let $w(k) = (w_D(k), w_R(k))$ be the new message to be sent from the source node at the beginning of block $k$. The encoding at the beginning of block $k$ is as follows.
(i) The relay knows $w_R(k-1)$ (this will be justified below) and searches for the smallest $j_r(k) \in \{1, 2, \dots, 2^{nR_{r,s}}\}$ such that $u_r^n(w_R(k-1), j_r(k))$ is jointly typical with $s_D^n(k)$. If no such $j_r(k)$ exists, an error is declared and $j_r(k)$ is set to 1. By the covering lemma [31], this error probability tends to 0 as $n$ approaches infinity if $R_{r,s}$ satisfies the corresponding covering condition. Given $u_r^n(w_R(k-1), j_r(k))$, $s_R^n(k)$, and $s_D^n(k)$, the relay sends a vector $x_r^n(k)$ with i.i.d. components drawn according to the marginal $P_{X_r|U_r,S_D,S_R}$.
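(ii) The source also knows $w_R(k-1)$ and $s_D^n(k)$ and thereby knows $u_r^n(w_R(k-1), j_r(k))$. The source then searches for the smallest index $j_d(k)$ that satisfies the corresponding joint typicality condition. If no such $j_d(k)$ exists, an error is declared and $j_d(k)$ is set to 1. By the covering lemma [31], this error probability tends to 0 as $n$ approaches infinity if $R_{d,s}$ satisfies the corresponding covering condition. To send $w_R(k)$, the source expresses it as a unique set of sub-messages $\{m_{s_R}(k) : s_R \in \mathcal{S}_R\}$ and stores each corresponding codeword in a first-in-first-out (FIFO) buffer of length $n$. A multiplexer is used to choose a symbol at each transmission time $i \in [1, n]$ from one of the FIFO buffers according to the state $s_{R,i}(k)$. Then, the chosen symbol is transmitted.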
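The FIFO multiplexing at the source and the matching demultiplexing at the relay can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not of the coding itself: the function names `multiplex` and `demultiplex`, the dictionary layout, and the noiseless link in the usage example are all assumptions made for the sketch.

```python
from collections import deque


def multiplex(state_seq, codewords):
    """Pick, at each time i, the next symbol from the FIFO buffer of state s_i.

    `codewords` maps each state value to a codeword of length n, so a buffer
    can never run empty within n transmissions.
    """
    buffers = {state: deque(cw) for state, cw in codewords.items()}
    return [buffers[state].popleft() for state in state_seq]


def demultiplex(state_seq, received):
    """Group the received symbols into one sub-sequence per state value."""
    sub_sequences = {}
    for state, symbol in zip(state_seq, received):
        sub_sequences.setdefault(state, []).append(symbol)
    return sub_sequences


if __name__ == "__main__":
    state_seq = [0, 1, 1, 0, 1]                      # s_R^n with n = 5
    codewords = {0: ["a0", "a1", "a2", "a3", "a4"],  # codeword for state 0
                 1: ["b0", "b1", "b2", "b3", "b4"]}  # codeword for state 1
    sent = multiplex(state_seq, codewords)
    print(sent)
    print(demultiplex(state_seq, sent))              # noiseless link assumed
```

Because both ends know the state sequence, the relay recovers, for each state value, a prefix of the corresponding codeword in order, which is what allows each sub-message to be decoded separately.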

Decoding
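At the end of block $k$, the relay and the destination observe $y_r^n(k)$ and $y^n(k)$, respectively.
(i) Knowing $s_R^n(k)$, the relay demultiplexes $y_r^n(k)$ into sub-sequences according to the state sequence, where the sub-sequence associated with the state $s_R$ has length at least $n(1-\varepsilon)p(s_R)$ with high probability, and decodes each sub-message $m_{s_R}(k)$ from the corresponding sub-sequence. By the law of large numbers (LLN) and the packing lemma [31], the error probability of each decoding step approaches 0 as $n \to \infty$. Therefore, the total error probability in decoding $\hat{w}_R(k)$ approaches 0 for sufficiently large $n$ if the corresponding rate condition is satisfied.
(ii) Observing $y^n(k)$, the destination looks for a pair of message estimates that is jointly typical with its observation. If there is no such pair or it is not unique, an error is declared. By the packing lemma [31], it can be shown that for sufficiently large $n$, decoding is correct with high probability if the corresponding rate conditions are satisfied.
Combining (11) to (13), the message is decoded correctly with high probability at the end of block $k$ if the resulting rate constraint holds. The detailed analysis of the error probability is given in Appendix 1.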

Causal channel state information
In many practical communication systems, the state sequences are not known to the encoders in advance. For the case in which the channel state information is provided to the source and the relay causally, the capacity is lower bounded as in the following theorem.
Theorem 2 The capacity of the orthogonal relay channel with channel state information causally known to both the source and the relay is lower bounded by a single-letter expression in which $U_r \in \mathcal{U}_r$ and $U \in \mathcal{U}$ are auxiliary random variables with bounded cardinality.
Remark 2 The achievable rate in Theorem 2 is obtained by specializing the expression in Theorem 1 to the case where the auxiliary random variables $U$ and $U_r$ are independent of $S_D$ and $S_R$. This is similar to the relation between the expression for the capacity of the state-dependent channel with causal channel state information introduced by Shannon [2] and its non-causal counterpart, the Gel'fand-Pinsker channel [3].
Proof The achievability proof is derived in a similar way as in the non-causal channel state information case, except that the auxiliary random variables $U$ and $U_r$ are independent of the channel states $S_D$ and $S_R$, and the channel inputs of the source and the relay are restricted to the mappings $x_D = f_D(u, s_D)$ and $x_r = f_r(u_r, s_D, s_R)$, respectively, where $f_D(\cdot)$ and $f_r(\cdot)$ are deterministic functions. The details are omitted for brevity.

Semi-deterministic orthogonal relay channel with non-causal channel state information
In this section, we show that the lower bound derived in Theorem 1 is tight for a class of semi-deterministic orthogonal relay channels, where the output $Y$ of the destination is a deterministic function of $X_D$, $X_r$, and $S_D$, i.e., $Y = f(X_D, X_r, S_D)$, and the output $Y_r$ of the relay node is controlled only by $X_R$ and $S_R$, i.e., the channel from the source to the relay is governed by the conditional distribution $P_{Y_r|X_R,S_R}$. This assumption is reasonable in many cases; e.g., when the two orthogonal channels use two different frequency bands, the received signal $Y_r$ at the relay node will not be affected by its input signal $X_r$. The channel can be expressed as

$$P_{Y,Y_r|X_R,X_D,X_r,S_D,S_R}(y, y_r \mid x_R, x_D, x_r, s_D, s_R) = P_{Y_r|X_R,S_R}(y_r \mid x_R, s_R)\, 1\{y = f(x_D, x_r, s_D)\}, \tag{18}$$

where $f(\cdot)$ is a deterministic function and $1\{\cdot\}$ denotes the indicator function. The channel state information about $S_R$ and $S_D$ is known to both the source and the relay non-causally. The capacity of this class of semi-deterministic orthogonal relay channels is characterized in the following theorem.
Theorem 3 The capacity of the channel (18) with the channel state information known non-causally to the source and the relay is given by (19), where the maximization is over all joint distributions of the form (20), $U_r \in \mathcal{U}_r$ is an auxiliary random variable with bounded cardinality, and $1\{\cdot\}$ denotes the indicator function.
Proof The achievability follows from Theorem 1. First note that the joint distribution (20) can also be written in an equivalent factorized form with an additional requirement. Note that, when $P_{U_r,Y,S_D}(u_r, y, s_D)$ is fixed, all the terms on the right-hand side (RHS) of (19) are fixed except for $I(X_R; Y_r|S_R)$, which is independent of $P_{X_r,X_D|S_D,U_r,Y}(x_r, x_D \mid s_D, u_r, y)$. Therefore, the maximization over all joint distributions of the form (20) can be replaced by the maximization only over those distributions in which $x_r$ and $x_D$ are deterministic functions of $(s_D, u_r, y)$, i.e., distributions of the form (24) for some mappings $g_r : (u_r, s_D) \to x_r$ and $g_d : (y, u_r, s_D) \to x_D$, subject to (23). Thus, we only have to prove the achievability of the rate that satisfies (19) for some distribution of the form (24). The achievability follows directly from Theorem 1 by taking $U = Y$, since $Y = f(X_D, X_r, S_D)$; by letting $X_R$ be independent of $U_r$ and $X_r$, considering the fact that $Y_r$ is determined only by $X_R$ and $S_R$; and by setting $x_r = g_r(u_r, s_D)$ and $x_D = g_d(y, u_r, s_D)$. Note that with these choices of the random variables, once we choose the stochastic kernels $P_{X_R|S_R}$ and $P_{U_r,Y|S_D}$ and the two deterministic mappings $g_r : (u_r, s_D) \to x_r$ and $g_d : (y, u_r, s_D) \to x_D$, then, combined with $Q_{S_D} Q_{S_R}$ and the channel law, the joint distribution (24) for which (23) is satisfied is determined.
The proof of the converse is as follows.
Consider an $(\varepsilon_n, n, R)$ code with an average error probability $P_e^{(n)} \le \varepsilon_n$. By Fano's inequality, we have $H(W|Y^n) \le n\delta_n$, where $\delta_n \to 0$ as $n \to +\infty$. Thus,

$$nR = H(W) \le I(W; Y^n) + n\delta_n. \tag{26}$$

After defining the auxiliary random variable $U_{r,i}$, we obtain the bound (27), where the second inequality follows from the fact that $S_D^n$ and $S_R^n$ are independent of $W$. The two terms in (27) are calculated separately. For the first term, (a) holds since $X_{R,i}$ is a function of $(W, S_D^n, S_R^n)$, and (b) follows from the fact that conditioning reduces entropy and from the Markov chain induced by the channel. For the second term, (a) holds since $X_{r,i}$ is a function of $(Y_r^{i-1}, S_D^n, S_R^n)$, and (b) follows from the fact that conditioning reduces entropy.
From (26) to (29), we obtain the first bound. The bound on $I(W; Y^n)$ that yields the second term in (19) is proved in (31), where (a) holds due to Csiszár and Körner's sum identity; (b) follows since $S_{D,i}$ is independent of $(W, S_{D,i+1}^n)$; and (c) follows from the fact that $H(Y_i \mid W, Y^{i-1}, S_{D,i+1}^n) \ge I(Y_i; S_{D,i} \mid W, Y^{i-1}, S_{D,i+1}^n)$.
By (26) and (31), we obtain the second bound; the two bounds together are stated in (33). Introduce a time-sharing random variable $T$, which is uniformly distributed over $\{1, 2, \dots, n\}$, and denote the corresponding collection of time-shared random variables. Considering the first bound in (33), the last step of its single-letterization follows from the fact that $T$ is independent of all the other variables and from a Markov chain. The second bound in (33) is treated similarly. Therefore, for a given sequence of $(\varepsilon_n, n, R)$ codes with $\varepsilon_n$ going to zero as $n$ goes to infinity, there exists a measure of the form $P_{S_D,S_R,X_r,X_R,X_D} = Q_{S_D} Q_{S_R} P_{X_r|S_D,S_R} P_{X_R,X_D|X_r,S_D,S_R}$ such that the rate $R$ essentially satisfies (19).
Since $I(X_R; Y_r|S_R)$ is determined by the joint distribution $P_{X_R,S_R,Y_r}$ and the other three terms on the RHS of (19) are independent of $P_{X_R,S_R,Y_r}$, the maximum in (19) taken over all joint probability mass functions $P_{S_D,S_R,X_r,U_r,X_R,X_D,Y_r,Y}$ is equivalent to that taken over all joint probability mass functions of the form (20). The bound on the cardinality of $\mathcal{U}_r$ can be proven in a similar way as in Theorem 1. It is omitted here for brevity.
This concludes the proof.

Memoryless Gaussian case
In this section, we study a state-dependent Gaussian relay channel with orthogonal components in which the channel states and the noise are additive and Gaussian. As shown in Figure 3, we consider the state-dependent Gaussian orthogonal relay channel, where channel 1 (dashed line) uses a frequency band different from that used by channel 2 (solid line). The two orthogonal channels, channel 1 and channel 2, are corrupted by two independent additive Gaussian interferences $S_R$ and $S_D$, respectively, which are known to the source and the relay. The channel is described by (37) and (38), where $Y_r$ and $Y$ are the channel outputs at the relay and the destination, respectively; the Gaussian i.i.d. random variables $(X_R, X_D)$ and $X_r$ are the channel inputs from the source and the relay, subject to the average power constraints $P$ at the source and $\gamma P$ at the relay. The additive interferences $S_R$, $S_D$ and the noises $Z_r$, $Z_d$ are assumed to be zero-mean i.i.d. Gaussian with fixed variances. Further, we assume that $S_R$, $S_D$, $Z_r$, and $Z_d$ are mutually independent. As in the discrete memoryless case, we discuss the capacity of the channel when the additive interference sequences are known to the source and the relay non-causally and causally, respectively.

Channel state information non-causally known to the source and the relay
For the channel shown in Figure 3, when the channel state information is known non-causally to the source and the relay, using cooperative DPC, the capacity is characterized as in the following theorem.
Theorem 4 The capacity of the Gaussian orthogonal relay channel with the channel state information non-causally known to both the source and the relay is given by (39).
As in many other dirty paper channels with channel state information known non-causally at the encoders, with dirty paper coding, the capacity of the channel considered here is the same as that of the state-independent relay channel with orthogonal components. In fact, (39) also characterizes the capacity of the state-independent Gaussian orthogonal relay channel. Therefore, regardless of whether the channel state information is known causally or non-causally to the source and the relay, (39) serves as an upper bound on the capacity of the channel shown in Figure 3.
Proof We only need to prove the achievability of (39), since the expression in (39) characterizes the capacity of the state-independent orthogonal relay channel [29], which serves as an upper bound on the capacity of the channel considered in this paper. For the channel given by (37) and (38), we evaluate the achievable rate in (6) with a choice of jointly Gaussian random variables $U$, $U_r$, $S_R$, $S_D$, $X_R$, $X_D$, and $X_r$ in which $X_D$ is correlated with $X_r$ through the coefficient $\rho$ and its remaining component is independent of $X_r$. The parameter $\beta$ is the fraction of the source power allocated to $X_D$, while $\bar{\beta} = 1 - \beta$ is the fraction of the source power allocated to $X_R$. The parameter $\rho$ is the correlation coefficient between $X_r$ and $X_D$. With these definitions of the random variables, the achievable rate in (39) follows by direct calculation, which is omitted for brevity.
However, the calculation above is somewhat algebraic. Proceeding similarly to Costa's dirty paper coding, we extend the result in Theorem 1 for the discrete memoryless (DM) case to memoryless channels with discrete time and continuous alphabets by standard arguments [32]. An alternative proof is outlined in Appendix 2.

Channel state information known at the source and the relay causally
When the channel state information is known to the source and the relay causally, the capacity is not characterized in general. The following theorem gives a lower bound on the capacity.
Theorem 5 For the Gaussian orthogonal relay channel with the channel state information causally known to the source and the relay, the rate in (44) is achievable.
Since the interference $S_R$ is additive and known to both the source and the relay, the relay can remove $S_R$ completely before decoding the message from the source. Consequently, the interference $S_R$ does not affect the achievable rate.
Remark 5 The source and the relay expend parts $\rho_{d,s}^2 \beta P$ and $\rho_{r,s}^2 \gamma P$ of their power, respectively, to clean $S_D$ from the channel and use the remaining power for cooperative information transmission. This differs from many other dirty paper channels with non-causal channel state information at the transmitters, where the channel states can be completely cleaned by choosing appropriate auxiliary random variables, e.g., by dirty paper coding. If $Q_D = 0$, the entire power of the source and the relay is used for information transmission, i.e., $\rho_{r,s} = \rho_{d,s} = 0$. The rate then reduces to the capacity of the state-independent relay channel with orthogonal components given in [29], since $S_R$ does not affect the achievable rate.
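The power accounting in Remark 5 can be illustrated with a small Python sketch. The function name `power_split` and the example values are hypothetical, and the code assumes that the power of $X_D$ is $\beta P$ and the power of $X_r$ is $\gamma P$, so that the parts left for information transmission are $(1-\rho_{d,s}^2)\beta P$ and $(1-\rho_{r,s}^2)\gamma P$.

```python
def power_split(P, beta, gamma, rho_ds, rho_rs):
    """Split the power of X_D and X_r as described in Remark 5.

    Assumptions (illustrative only): X_D has power beta*P and X_r has
    power gamma*P; rho_ds and rho_rs are the coefficients rho_{d,s} and
    rho_{r,s} that govern how much power is spent on cleaning S_D.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if not (-1.0 <= rho_ds <= 1.0 and -1.0 <= rho_rs <= 1.0):
        raise ValueError("correlation coefficients must lie in [-1, 1]")

    source_clean = rho_ds ** 2 * beta * P    # source power spent on S_D
    relay_clean = rho_rs ** 2 * gamma * P    # relay power spent on S_D
    return {
        "source_clean": source_clean,
        "source_info": beta * P - source_clean,
        "relay_clean": relay_clean,
        "relay_info": gamma * P - relay_clean,
    }
```

For instance, `power_split(10.0, 0.5, 1.0, 0.0, 0.0)` corresponds to the case $Q_D = 0$ in the remark, where nothing is spent on cleaning and the full powers $\beta P$ and $\gamma P$ carry information.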
Proof The result in Theorem 2 for the discrete memoryless case can be extended to memoryless channels with discrete time and continuous alphabets using standard techniques [32]. The proof follows by evaluating the lower bound of Theorem 2 with the following jointly Gaussian input distribution. Fix $0 \le \beta \le 1$, $-1 \le \rho_{d,s}, \rho_{r,s}, \rho_{d,r} \le 1$ and … With these definitions, it can be easily verified that $U \sim \mathcal{N}(0, (1-\rho_{d,s}^2)\beta P)$, $X_r \sim \mathcal{N}(0, \gamma P)$, and $X_D \sim \mathcal{N}(0, \beta P)$. Note that $U$, $U_r$, and $U'$ are independent of $S_D$. It is also evident from these definitions that $E[X_R^2] + E[X_D^2] \le P$ and $E[X_r^2] \le \gamma P$. Straightforward algebra shows that evaluating the lower bound in Theorem 2 with the above choices gives the lower bound in Theorem 5. The computation details are omitted for brevity.
We next characterize the capacity of the state-dependent Gaussian orthogonal relay channel with causal channel state information when the power of the relay is sufficiently large. As shown in Theorem 5, a part of the relay's power is used to clean the interference $S_D$. When the power of the relay is sufficiently large, the interference $S_D$ can be cleaned completely and the capacity of the channel can be determined as shown in the following theorem.
Theorem 6 For the Gaussian orthogonal relay channel with the additive interference sequences known at the source and the relay causally, when the power of the relay satisfies
$$\gamma P \ge \left(\frac{P}{4N} + \frac{N}{4P} + \frac{Q_D}{P} + \frac{1}{2}\right) P, \qquad (45)$$
the capacity is given by (46).
Remark 6 When the power of the relay is sufficiently large such that the interference $S_D$ is completely cleaned by the relay using part of its power, and its remaining power is sufficiently large such that the relay-destination link does not constrain the achievable rate, the message sent from the source is split into two parts: one part is sent directly to the destination through a point-to-point source-destination channel, and the other is sent to the destination through a two-hop source-relay-destination channel with DF relaying. The two parts are sent independently, and the rate can be expressed as the sum of the rates of the source-destination channel and the two-hop source-relay-destination channel (the rate of the latter is constrained by the source-relay link).
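The relay power condition of Theorem 6, $\gamma \ge \frac{P}{4N} + \frac{N}{4P} + \frac{Q_D}{P} + \frac{1}{2}$, is easy to evaluate numerically. The following Python sketch does so; the function names and the example values are hypothetical and serve only as an illustration.

```python
def relay_power_threshold(P, N, Q_D):
    """Smallest gamma with gamma >= P/(4N) + N/(4P) + Q_D/P + 1/2."""
    if P <= 0 or N <= 0 or Q_D < 0:
        raise ValueError("P and N must be positive, Q_D non-negative")
    return P / (4.0 * N) + N / (4.0 * P) + Q_D / P + 0.5


def relay_power_sufficient(gamma, P, N, Q_D):
    """True if the relay power gamma*P meets the condition of Theorem 6."""
    return gamma >= relay_power_threshold(P, N, Q_D)
```

With the hypothetical values $P = 10$, $N = 1$, and $Q_D = 5$, the threshold evaluates to $2.5 + 0.025 + 0.5 + 0.5 = 3.525$, so a relay with $\gamma = 4$ satisfies the condition while a relay with $\gamma = 1$ does not.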
Proof It is easy to verify that if $\gamma P \ge Q_D$, then for any fixed $\beta$, $R_1(\beta, \rho)$ is maximized when $\rho = \rho_1^*$. Denote the maximum of $R_1(\beta, \rho)$ by $R_1^*(\beta)$. Therefore, we have … Next, we show the condition under which $R_2(\beta, \rho_1^*)$ is always larger than $R_1^*(\beta)$ for any $\beta$. Let … The inequality in (51) is equivalent to … It is easy to show that if … the inequality in (52) holds for any $\beta$. Thus, if $\gamma \ge \frac{P}{4N} + \frac{N}{4P} + \frac{Q_D}{P} + \frac{1}{2}$, the following inequality is always satisfied for any $\beta$: … For any $\beta$, we have … Therefore, the rate in (46) is achievable.
As mentioned in Remark 3, (39) serves as an upper bound on the capacity of the channel considered here. The converse proof follows by proving that (46) matches the upper bound in (39) if the condition in (45) is satisfied. We denote the two terms on the RHS of (39) as … Let $C(\beta, \rho) = \min\{C_1(\beta, \rho), C_2(\beta, \rho)\}$. Similar to the steps from (47) to (55), it is easy to prove that for any $\beta$, if … Next, we have to prove that for any $\beta$, under the condition $\gamma \ge \frac{P}{4N} + \frac{N}{4P} + \frac{1}{2}$, $C(\beta, \rho)$ is maximized when $\rho = 0$. Denote the maximum of $C(\beta, \rho)$ by $C^*(\beta)$, i.e., $C^*(\beta) = \max_{-1 \le \rho \le 1} C(\beta, \rho)$. This can be proven by contradiction. Assume that $C(\beta, \rho)$ is maximized when $\rho = \rho'$ ($\rho' \ne 0$). By (59), we get (60). However, we also have (61). From (57), it is easy to verify that for any $\beta$, $C_1(\beta, \rho)$ is maximized when $\rho = 0$. Thus, (60) and (61) are contradictory. This proves that for any $\beta$, $C^*(\beta) = C(\beta, 0)$. Thus, the maximization problem in (39) reduces to the maximization problem in (46). This completes the proof.
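The two thresholds on $\gamma$ that appear in this proof can be written more compactly. The following is only an algebraic rearrangement of those expressions, not an additional result:

```latex
\frac{P}{4N} + \frac{N}{4P} + \frac{1}{2}
  = \frac{P^2 + N^2 + 2PN}{4PN}
  = \frac{(P+N)^2}{4PN},
\qquad
\frac{P}{4N} + \frac{N}{4P} + \frac{Q_D}{P} + \frac{1}{2}
  = \frac{(P+N)^2}{4PN} + \frac{Q_D}{P}.
```

In this form, the threshold used for the causal case exceeds the one used for the upper bound by exactly $Q_D / P$, i.e., by an additional relay power of $Q_D$, which equals the power of the interference $S_D$.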

Numerical examples
In this section, we provide some numerical examples for the achievable rate in Theorem 5.
With these examples, we show the impact of the channel state and the role of the relay in information transmission and in cleaning the channel state. For $\gamma = 1$, Figure 4 compares the capacity of the state-independent ($Q_R = Q_D = 0$) relay channel with orthogonal components with the achievable rate derived in Theorem 5. The larger the power of the additive interference, the more power the source and the relay spend on cleaning the interference, which results in a lower achievable rate. As the power $P$ of the source and the relay increases, a larger amount of interference can be cleaned, leaving more power for information transmission. Consequently, the achievable rate approaches the capacity of the state-independent relay channel with orthogonal components as $P$ increases. This can also be verified from (44): when $P \gg Q_D$, the impact of the additive interference $S_D$ is negligible with respect to $P$, and the maximization problem in (44) approximates that in (39) by taking $\rho_{r,s} \to 0$ and $\rho_{d,s} \to 0$.
For $P/N = 30$, Figure 5 shows the role of the relay in cleaning the channel state. As the power of the relay increases, the achievable rate increases. In particular, when the power of the relay is sufficiently large such that the channel state can be cleaned completely and the relay-destination link does not become the bottleneck for the achievable rate, the achievable rate matches the upper bound. This has been proven in Theorem 6, and Figure 5 illustrates the result.

Conclusions
In this paper, we consider a state-dependent relay channel with orthogonal channels from the source to the relay and from the source and the relay to the destination. The orthogonal channels are affected by two independent channel states, respectively, and the channel state information is known to both the source and the relay either non-causally or causally. In the non-causal state information case, the lower bound on the capacity of the channel is established with superposition coding at the source, PDF relaying at the relay, and cooperative GP coding at the source and the relay. We further show that if the output of the destination $Y$ is a deterministic function of the relay input $X_r$, the channel state $S_D$, and one of the source inputs $X_D$, i.e., $Y = f(X_D, X_r, S_D)$, and the relay output $Y_r$ is restricted to be controlled by only the source input $X_R$ and the channel state $S_R$, the lower bound is tight, and the capacity can be characterized exactly. As for the causal channel state information case, a lower bound on the capacity is also derived. The expression for the achievable rate in the causal state information case can be interpreted as a special case of that for the achievable rate in the non-causal state information case, where the auxiliary random variables $U$ and $U_r$ are independent of $S_R$ and $S_D$. This is similar to the relation between the expression for the capacity of the state-dependent channel with causal channel state information introduced by Shannon [2] and its non-causal counterpart, the Gel'fand-Pinsker channel [3].
Further, we investigate the Gaussian state-dependent relay channel with orthogonal components, modeling the channel states as additive Gaussian interferences. The capacity is characterized when the additive interference sequences are known non-causally. The expression for the capacity is the same as that for the capacity of the state-independent relay channel with orthogonal components. This observation is similar to the results for the multiple user state-dependent channels shown in [6]. When the state information is known causally, however, the capacity is not characterized in general. In this case, with carefully chosen auxiliary random variables, an achievable rate is derived. It is shown that when the power of the relay is sufficiently large, the capacity can be characterized exactly. Finally, two numerical examples are given to illustrate the impact of the channel state and the role of the relay in information transmission and in cleaning the state. The numerical results show that the larger the power of the additive interference, the more power the source and the relay spend on cleaning the interference, which results in a lower achievable rate. However, as the power $P$ increases, the impact of the interference becomes negligible if $P \gg Q_D$, and the achievable rate approaches the capacity of the state-independent relay channel with orthogonal components. The numerical results also illustrate that when the power of the relay satisfies $\gamma P \ge \left(\frac{P}{4N} + \frac{N}{4P} + \frac{Q_D}{P} + \frac{1}{2}\right) P$, the capacity of the channel can be characterized.

Appendix 1
Proof of Theorem 1: Analysis of probability of error
The average probability of error is given by (63). By the AEP, the first term on the RHS of (63) goes to 0 as $n \to \infty$. It is therefore sufficient to upper bound the second term on the RHS of (63). We now examine the probabilities of the error events associated with the encoding and decoding steps. The error event is contained in the union of the following error events: … where $E_{1k}$ and $E_{2k}$ correspond to the encoding steps in block $k$, $E_{3 s_R k}$ and $E_{4 s_R k}$ correspond to decoding the message $m_{s_R}(k)$ at the relay in block $k$ given $S_R = s_R$, and the events $E_{5k}$ and $E_{6k}$ correspond to decoding $w_D(k)$ at the destination in block $k$. The probability of error $\Pr(\text{error} \mid s_D^n, s_R^n)$ is upper bounded as …
For $u_r^n(w_R(k-1), j_r(k))$ and $s_D^n(k)$ generated independently with i.i.d. components according to $P_{U_r}$ and $Q_{S_D}$, respectively, the probability that a given $u_r^n(w_R(k-1), j_r(k))$, $j_r(k) \in \{1, 2, \ldots, 2^{nR_{r,s}}\}$, is jointly typical with $s_D^n(k)$ is greater than $(1-\varepsilon)\, 2^{-n(I(U_r; S_D) + \delta(\varepsilon))}$ for $n$ sufficiently large. There are $2^{nR_{r,s}}$ such $u_r^n$'s in each bin. Therefore, the probability of event $E_{1k}$ is bounded by
$$\Pr(E_{1k}) \le \left[1 - (1-\varepsilon)\, 2^{-n(I(U_r; S_D) + \delta(\varepsilon))}\right]^{2^{nR_{r,s}}}. \qquad (64)$$
Taking the logarithm on both sides of (64) and using the inequality $\ln(x) \le x - 1$, we obtain $\ln \Pr(E_{1k}) \le -(1-\varepsilon)\, 2^{n(R_{r,s} - I(U_r; S_D) - \delta(\varepsilon))}$. Thus, if $R_{r,s} > I(U_r; S_D) + \delta(\varepsilon)$, $\Pr(E_{1k}) \to 0$ as $n \to \infty$, where $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$.
Let $E_{2k}$ be the event that there is no sequence $u^n(w_D(k), j_d(k) \mid w_R(k-1), j_r(k))$ jointly typical with $s_D^n(k)$, given $u_r^n(w_R(k-1), j_r(k))$, i.e., … Similar to the analysis of the probability of the event $E_{1k}$, if $R_{d,s} > I(U; S_D \mid U_r) + \delta(\varepsilon)$, $\Pr(E_{2k}) \to 0$ as $n \to \infty$.
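The covering argument for $E_{1k}$ can be checked numerically. The Python sketch below assumes, as in the text, that each of the $2^{nR_{r,s}}$ sequences is jointly typical with the state sequence with probability at least $p = (1-\varepsilon)\, 2^{-n(I + \delta)}$, and compares $\ln (1-p)^{2^{nR_{r,s}}}$ with the relaxation obtained from $\ln(1-p) \le -p$. The function names and all numerical values are hypothetical, and rates and mutual information are measured in bits.

```python
import math


def exact_log_failure(n, rate, mutual_info, eps=0.01, delta=0.01):
    """ln of (1 - p)^M with p = (1-eps) 2^{-n(I+delta)} and M = 2^{n*rate}."""
    p = (1.0 - eps) * 2.0 ** (-n * (mutual_info + delta))
    m = 2.0 ** (n * rate)
    return m * math.log1p(-p)


def log_failure_bound(n, rate, mutual_info, eps=0.01, delta=0.01):
    """Relaxation via ln(1 - p) <= -p: -(1-eps) 2^{n(rate - I - delta)}."""
    return -(1.0 - eps) * 2.0 ** (n * (rate - mutual_info - delta))
```

When `rate` exceeds `mutual_info + delta`, the bound tends to minus infinity as `n` grows, so the failure probability vanishes; when `rate` is smaller, the bound tends to zero and no longer forces the probability to be small.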
For each $s_R \in \mathcal{S}_R$, let $E_{3 s_R k}$ be the event that $x$ …
Since the destination has successfully decoded $u_r^n(w_R(k-1), \hat{j}_r(k))$, the destination can peel off $U_r$ to make the channel to the destination equivalent to … The destination finds a unique pair $(\hat{w}_D(k), \hat{j}_d(k))$ such that
$$\left(u^n(\hat{w}_D(k), \hat{j}_d(k) \mid \hat{w}_R(k-1), \hat{j}_r(k)),\; u_r^n(\hat{w}_R(k-1), \hat{j}_r(k)),\; y'^n(k)\right) \in A_\varepsilon^n(U, U_r, Y').$$
If there is no such pair or it is not unique, an error is declared. By the packing lemma [31], it can be shown that for sufficiently large $n$, decoding is correct with high probability if … From (11) and (83), $R_D \le I(U; Y' \mid U_r) - I(U; S_D \mid U_r)$ is achievable. The proof will continue after the following lemma.
… where the inequality holds because $U_0$ and $U_r$ are conditionally independent given $S_D$ and conditioning reduces entropy. Thus, if $R_D \le I(U; Y' \mid U_r) - I(U; S_D \mid U_r)$ is achievable, then … is achievable.
This completes the proof of Lemma 1. By (82), we define … With Costa's dirty paper coding, let … Then, by Lemma 1, … is achievable. The decoding at the relay is the same as that in the DM case, except that the interference sequence $s_R^n(k)$ can be subtracted before decoding $w_R(k)$, since $s_R^n(k)$ is additive and known to the relay. This makes the channel from the source to the relay equivalent to $Y_r - S_R = X_R + Z_r$. Therefore, … is achievable. Combining (81), (88) and (90), we show that $R = R_R + R_D$ is achievable when the following condition is satisfied: … This completes the proof.
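The subtraction step at the relay can be illustrated with a short Monte Carlo sketch in Python. It assumes an additive model $Y_r = X_R + S_R + Z_r$ with independent zero-mean Gaussian terms; the function name, the default powers, and the sample size are hypothetical and are not taken from the paper.

```python
import random


def relay_powers(n=20000, P_R=1.0, Q_R=4.0, N=1.0, seed=0):
    """Empirical power of Y_r before and after subtracting the known S_R.

    Assumed model (illustrative): Y_r = X_R + S_R + Z_r, with X_R, S_R,
    and Z_r independent zero-mean Gaussians of power P_R, Q_R, and N.
    """
    rng = random.Random(seed)
    raw = 0.0
    cleaned = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, P_R ** 0.5)
        s = rng.gauss(0.0, Q_R ** 0.5)
        z = rng.gauss(0.0, N ** 0.5)
        y = x + s + z
        raw += y * y                   # power seen without state knowledge
        cleaned += (y - s) * (y - s)   # power after removing the known S_R
    return raw / n, cleaned / n
```

Under these assumptions, the first returned value should be close to $P_R + Q_R + N$ and the second close to $P_R + N$, which reflects the observation that $S_R$ does not affect the rate of the source-relay link.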