Section 3.1 derives the necessary and sufficient condition for the optimality of beamforming in MIMO systems under mean feedback. Section 3.2 addresses the numerical evaluation of this condition.

### 3.1. Derivation of beamforming optimality condition

We start with the results presented in [13]. Venkatesan *et al*. [13] proved that the optimal transmit covariance *Q* has the same eigenvectors as the channel mean matrix **H**_{μ}. Let

{\mathbf{H}}_{\mathbf{\mu}}={U}_{\mu}{\Sigma}_{\mu}{V}_{\mu}^{H}

(7)

be the singular value decomposition (SVD) of the channel mean matrix **H**_{μ}, and

Q={U}_{Q}{\Sigma}_{Q}{U}_{Q}^{H}

(8)

be the spectral decomposition of *Q*. We have

{U}_{Q}={V}_{\mu}.

(9)

Substituting (7)-(9) into (5) and noting that the statistics of **H**_{w} remain unchanged after multiplication by a unitary matrix, we have

C=\underset{{\Sigma}_{Q}:Tr\left({\Sigma}_{Q}\right)\le P}{max}E\left(log\mid {I}_{{N}_{R}}+\frac{\left({A}_{2}{\mathbf{H}}_{\mathbf{w}}+{A}_{1}{\Sigma}_{\mu}\right){\Sigma}_{Q}{\left({A}_{2}{\mathbf{H}}_{\mathbf{w}}+{A}_{1}{\Sigma}_{\mu}\right)}^{H}}{{\sigma}^{2}}\mid \right).

(10)

Let *L* = *A*_{1}Σ_{μ} + *A*_{2}**H**_{w}. Substituting *L* into (10) yields

\begin{array}{c}C=\underset{{\Sigma}_{Q}:Tr\left({\Sigma}_{Q}\right)\le P}{max}E\left(log\mid {I}_{{N}_{R}}+\frac{L{\Sigma}_{Q}{L}^{H}}{{\sigma}^{2}}\mid \right)\hfill \\ \phantom{\rule{2.77695pt}{0ex}}\phantom{\rule{2.77695pt}{0ex}}\phantom{\rule{2.77695pt}{0ex}}=\underset{{\Sigma}_{Q}:Tr\left({\Sigma}_{Q}\right)\le P}{max}E\left(log\mid {I}_{{N}_{R}}+\sum _{i=1}^{{N}_{T}}\frac{{L}_{\bullet i}{L}_{\bullet i}^{H}{\lambda}_{i}^{Q}}{{\sigma}^{2}}\mid \right),\hfill \end{array}

(11)

where *L*_{•i} represents the *i*th column of the matrix *L* and {\lambda}_{i}^{Q} is the *i*th entry along the diagonal of Σ_{Q}. Now, we constrain the covariance matrix *Q* to have unit rank in order to derive the condition for beamforming optimality. Without loss of generality, assume that

\begin{array}{c}{\Sigma}_{Q}=\mathsf{\text{diag}}\left\{{\lambda}_{1}^{Q},{\lambda}_{2}^{Q},...,{\lambda}_{{N}_{T}}^{Q}\right\}\hfill \\ \phantom{\rule{1em}{0ex}}\phantom{\rule{2.77695pt}{0ex}}=\mathsf{\text{diag}}\left\{P-p,{\beta}_{2}p,{\beta}_{3}p,...,{\beta}_{{N}_{T}}p\right\},\hfill \end{array}

(12)

where the sum of {*β*_{i}, *i* = 2,...,*N*_{T}} is 1. Formula (11) can be further expressed as

C=\underset{{\Sigma}_{Q}:Tr\left({\Sigma}_{Q}\right)\le P}{max}E\left(log\mid {I}_{{N}_{R}}+\left(P-p\right)\frac{{L}_{\bullet 1}{L}_{\bullet 1}^{H}}{{\sigma}^{2}}+\sum _{i=2}^{{N}_{T}}{\beta}_{i}p\frac{{L}_{\bullet i}{L}_{\bullet i}^{H}}{{\sigma}^{2}}\mid \right).

(13)

As the function *C* is concave with respect to *p*, the necessary condition for beamforming optimality is ∂*C*/∂*p*|_{p = 0} ≤ 0. Differentiating *C*(*p*) with respect to *p* and noting that the derivative of the function log |*A* + *xB*| at *x* = 0 is Tr(*A*^{-1}*B*), we have

\begin{array}{c}\partial C/\partial p{\mid}_{p=0}\hfill \\ =Tr\left\{E\left[{\left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right)}^{-1}\right]\cdot E\left(\sum _{i=2}^{{N}_{T}}\frac{{\beta}_{i}}{{\sigma}^{2}}{L}_{\bullet i}{L}_{\bullet i}^{H}\right)\right\}-Tr\left\{E\left[{\left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right)}^{-1}\cdot \frac{1}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right]\right\}.\hfill \end{array}

(14)
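The log-determinant derivative identity used to obtain (14) can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original derivation; the matrices `A` and `B` are arbitrary real symmetric test matrices chosen only for illustration, and the derivative is approximated by a central finite difference.

```python
import numpy as np

def logdet(M):
    """Natural log of |det(M)|, computed stably via slogdet."""
    _, value = np.linalg.slogdet(M)
    return value

rng = np.random.default_rng(0)
n = 4                                   # illustrative matrix size (assumption)
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)             # symmetric positive definite test matrix
S = rng.standard_normal((n, n))
B = S + S.T                             # symmetric perturbation direction

h = 1e-6                                # finite-difference step
fd = (logdet(A + h * B) - logdet(A - h * B)) / (2 * h)
exact = np.trace(np.linalg.solve(A, B))  # Tr(A^{-1} B) without forming A^{-1}

print(fd, exact)
```

The two printed values should agree to within the finite-difference error.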

The second term on the right-hand side of (14) can be further written as

\begin{array}{c}Tr\left\{E\left[{\left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right)}^{-1}\cdot \frac{1}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right]\right\}\hfill \\ =\frac{1}{P}E\left\{Tr\left[{\left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right)}^{-1}\cdot \left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}-{I}_{{N}_{R}}\right)\right]\right\}\hfill \\ =\frac{1}{P}E\left\{Tr\left[{I}_{{N}_{R}}-{\left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right)}^{-1}\right]\right\}\hfill \\ =\frac{1}{P}\left\{{N}_{R}-\left({N}_{R}-1+E\left(\frac{1}{1+P\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}/{\sigma}^{2}}\right)\right)\right\}\hfill \\ =\frac{1}{P}\left[1-E\left(\frac{1}{1+P\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}/{\sigma}^{2}}\right)\right].\hfill \end{array}

(15)

Proceeding with the derivation of (14), we have

\begin{array}{c}\partial C/\partial p{\mid}_{p=0}\hfill \\ =Tr\left\{E\left[{\left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right)}^{-1}\right]\cdot E\left(\sum _{i=2}^{{N}_{T}}\frac{{\beta}_{i}}{{\sigma}^{2}}{L}_{\bullet i}{L}_{\bullet i}^{H}\right)\right\}-\frac{1}{P}\left[1-E\left(\frac{1}{1+P\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}/{\sigma}^{2}}\right)\right]\hfill \\ =Tr\left\{E\left[{\left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right)}^{-1}\right]\cdot \left[\sum _{i=2}^{{N}_{T}}\frac{{\beta}_{i}}{{\sigma}^{2}}\left({A}_{2}^{2}{I}_{{N}_{R}}+{A}_{1}^{2}{\left({\lambda}_{i}^{\mu}\right)}^{2}{D}_{i}\right)\right]\right\}-\frac{1}{P}\left[1-E\left(\frac{1}{1+P\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}/{\sigma}^{2}}\right)\right],\hfill \end{array}

(16)

where *A*_{1} and *A*_{2} are defined in (2), {\lambda}_{i}^{\mu} is the *i*th diagonal element of the matrix Σ_{μ}, *D*_{i} = diag(0,...,1,...,0) is an *N*_{R} × *N*_{R} matrix with the *i*th diagonal element being 1, and ||·||_{F} stands for the Frobenius norm. Since the sum of {*β*_{i}} is 1, we have

\begin{array}{c}\partial C/\partial p{\mid}_{p=0}\hfill \\ =Tr\left\{E\left[{\left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right)}^{-1}\right]\cdot \left[\frac{{A}_{2}^{2}}{{\sigma}^{2}}{I}_{{N}_{R}}+\sum _{i=2}^{{N}_{T}}\left(\frac{{\beta}_{i}}{{\sigma}^{2}}{A}_{1}^{2}{\left({\lambda}_{i}^{\mu}\right)}^{2}{D}_{i}\right)\right]\right\}-\frac{1}{P}\left[1-E\left(\frac{1}{1+P\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}/{\sigma}^{2}}\right)\right]\hfill \\ =Tr\left\{E\left[{\left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right)}^{-1}\right]\right\}\frac{{A}_{2}^{2}}{{\sigma}^{2}}+Tr\left\{E\left[{\left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right)}^{-1}\right]\cdot \sum _{i=2}^{{N}_{T}}\left(\frac{{\beta}_{i}}{{\sigma}^{2}}{A}_{1}^{2}{\left({\lambda}_{i}^{\mu}\right)}^{2}{D}_{i}\right)\right\}\hfill \\ -\frac{1}{P}\left[1-E\left(\frac{1}{1+P\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}/{\sigma}^{2}}\right)\right].\hfill \end{array}

(17)

To further derive the above formula, we consider the following matrix inversion lemma [14].

*Lemma 1*: For the matrix *B* = *A* + **xy**^{H}, where *A* is invertible and **x** and **y** are two vectors, the inverse of *B* is {B}^{-1}={A}^{-1}-\frac{{A}^{-1}\mathbf{x}{\mathbf{y}}^{H}{A}^{-1}}{1+{\mathbf{y}}^{H}{A}^{-1}\mathbf{x}}.
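Lemma 1 can be verified numerically for the special case needed in the derivation, namely *A* equal to the identity and a rank-one term built from a single column. The sketch below is illustrative only: the helper name `rank_one_update_inverse`, the dimension `n_r`, and the value `rho` (standing for *P*/σ²) are assumptions made for the example. It also checks the trace value *N*_{R} - 1 + 1/(1 + ρ||*L*_{•1}||²) used in (15).

```python
import numpy as np

def rank_one_update_inverse(A_inv, x, y):
    """Return (A + x y^H)^{-1} from A^{-1}, following Lemma 1."""
    Ax = A_inv @ x                      # A^{-1} x
    yA = y.conj() @ A_inv               # y^H A^{-1}
    return A_inv - np.outer(Ax, yA) / (1.0 + y.conj() @ Ax)

rng = np.random.default_rng(0)
n_r, rho = 3, 2.0                       # illustrative size and P / sigma^2
col = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)

# Special case of the derivation: A = I, x = rho * col, y = col.
lemma_inv = rank_one_update_inverse(np.eye(n_r), rho * col, col)
direct_inv = np.linalg.inv(np.eye(n_r) + rho * np.outer(col, col.conj()))

# Closed-form trace of the inverse used in (15).
norm_sq = np.vdot(col, col).real
trace_closed_form = n_r - 1 + 1.0 / (1.0 + rho * norm_sq)

print(np.allclose(lemma_inv, direct_inv))
print(np.trace(lemma_inv).real, trace_closed_form)
```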

Proceeding with the derivation of (17), the second term on the right-hand side of (17) can be further expressed as

\begin{array}{c}Tr\left\{E\left[{\left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right)}^{-1}\right]\cdot \sum _{i=2}^{{N}_{T}}\left(\frac{{\beta}_{i}}{{\sigma}^{2}}{A}_{1}^{2}{\left({\lambda}_{i}^{\mu}\right)}^{2}{D}_{i}\right)\right\}\\ =\frac{{A}_{1}^{2}}{{\sigma}^{2}}\sum _{i=2}^{{N}_{T}}\left\{{\beta}_{i}{\left({\lambda}_{i}^{\mu}\right)}^{2}\cdot Tr\left[E\left[{\left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right)}^{-1}\right]\cdot {D}_{i}\right]\right\}\\ =\frac{{A}_{1}^{2}}{{\sigma}^{2}}\sum _{i=2}^{{N}_{T}}\left\{{\beta}_{i}{\left({\lambda}_{i}^{\mu}\right)}^{2}\cdot Tr\left[E\left({I}_{{N}_{R}}-\frac{\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}}{1+\frac{P}{{\sigma}^{2}}\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}}\right)\cdot {D}_{i}\right]\right\}.\end{array}

(18)

Since *D*_{i} = diag(0,...,1,...,0) with the *i*th diagonal element being 1, we have

Tr\left[E\left({I}_{{N}_{R}}-\frac{\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}}{1+\frac{P}{{\sigma}^{2}}\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}}\right)\cdot {D}_{i}\right]=E\left[1-\frac{\frac{P}{{\sigma}^{2}}\mid {L}_{i1}{\mid}^{2}}{1+\frac{P}{{\sigma}^{2}}\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}}\right],\phantom{\rule{0.3em}{0ex}}\mathsf{\text{for}}\phantom{\rule{0.3em}{0ex}}i\ge 2.

(19)

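In consideration of {L}_{\bullet 1}={A}_{1}{\lambda}_{1}^{\mu}{\mathbf{e}}_{1}+{A}_{2}{\left({\mathbf{H}}_{\mathbf{w}}\right)}_{\bullet 1}, where **e**_{1} is the first column of the identity matrix and (**H**_{w})_{•1} stands for the first column of the matrix **H**_{w}, the entries *L*_{i1} with *i* ≥ 2 are identically distributed, and it is found that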

E\left[1-\frac{\frac{P}{{\sigma}^{2}}\mid {L}_{i1}{\mid}^{2}}{1+\frac{P}{{\sigma}^{2}}\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}}\right]=E\left[1-\frac{\frac{P}{{\sigma}^{2}}\mid {L}_{21}{\mid}^{2}}{1+\frac{P}{{\sigma}^{2}}\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}}\right]=...=E\left[1-\frac{\frac{P}{{\sigma}^{2}}\mid {L}_{{N}_{R}1}{\mid}^{2}}{1+\frac{P}{{\sigma}^{2}}\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}}\right]\phantom{\rule{1em}{0ex}}\mathsf{\text{for}}i\ge \mathsf{\text{2}}\mathsf{\text{.}}

(20)
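The symmetry stated in (20) can be illustrated by simulation. The sketch below assumes, since (2) is not reproduced in this section, that *A*_{1}² = *K*/(*K* + 1) and *A*_{2}² = 1/(*K* + 1) and that the entries of **H**_{w} are i.i.d. circularly symmetric complex Gaussian with unit variance; the numerical values of `N_R`, `K`, `lam1`, and `rho` (standing for *P*/σ²) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N_R, K, lam1, rho = 4, 1.0, 1.5, 2.0    # illustrative values (assumptions)
A1 = np.sqrt(K / (K + 1.0))             # assumed weight of the mean component
A2 = np.sqrt(1.0 / (K + 1.0))           # assumed weight of the scattered component
n = 200_000                             # number of Monte Carlo samples

# First column of H_w: i.i.d. CN(0, 1) entries, one row per sample.
Hw1 = (rng.standard_normal((n, N_R)) + 1j * rng.standard_normal((n, N_R))) / np.sqrt(2)
L1 = A2 * Hw1
L1[:, 0] += A1 * lam1                   # only the first entry carries the mean

power = np.abs(L1) ** 2
norm_sq = power.sum(axis=1, keepdims=True)
# Estimates of E[rho |L_i1|^2 / (1 + rho ||L_1||^2)] for i = 1, ..., N_R.
ratios = (rho * power / (1.0 + rho * norm_sq)).mean(axis=0)

print(ratios)
```

The estimates for *i* = 2,...,*N*_{R} should coincide up to Monte Carlo error, while the estimate for *i* = 1 differs because that entry contains the mean term.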

Therefore, without loss of generality, (18) can be further written as

\begin{array}{c}Tr\left\{E\left[{\left({I}_{{N}_{R}}+\frac{P}{{\sigma}^{2}}{L}_{\bullet 1}{L}_{\bullet 1}^{H}\right)}^{-1}\right]\cdot \sum _{i=2}^{{N}_{T}}\left(\frac{{\beta}_{i}}{{\sigma}^{2}}{A}_{1}^{2}{\left({\lambda}_{i}^{\mu}\right)}^{2}{D}_{i}\right)\right\}\hfill \\ =\frac{{A}_{1}^{2}}{{\sigma}^{2}}\sum _{i=2}^{{N}_{T}}\left\{{\beta}_{i}{\left({\lambda}_{i}^{\mu}\right)}^{2}E\left[1-\frac{\frac{P}{{\sigma}^{2}}\mid {L}_{21}{\mid}^{2}}{1+\frac{P}{{\sigma}^{2}}\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}}\right]\right\}\hfill \\ =\frac{{A}_{1}^{2}}{{\sigma}^{2}}\sum _{i=2}^{{N}_{T}}\left({\beta}_{i}{\left({\lambda}_{i}^{\mu}\right)}^{2}\right)\cdot E\left[1-\frac{\frac{P}{{\sigma}^{2}}\mid {L}_{21}{\mid}^{2}}{1+\frac{P}{{\sigma}^{2}}\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}}\right],\hfill \end{array}

(21)

where *L*_{ij} is the (*i*, *j*)th entry of the matrix *L*. Beamforming is optimal only if ∂*C*/∂*p*|_{p = 0} ≤ 0 holds for every admissible choice of {*β*_{i}}, i.e., only if the maximum value of (16) over {*β*_{i}} is less than or equal to 0. With the singular values in Σ_{μ} arranged in descending order, as is conventional for the SVD, (21) achieves its maximum value when *β*_{2} = 1 and *β*_{j} = 0 for *j* > 2, and so does (16). Thus, combining (16), (17), and (21) and multiplying by *P* > 0, the necessary condition for beamforming optimality is given by

F\left(P,{\sigma}^{2},{N}_{T},{N}_{R},{\Sigma}_{\mu},K\right)\le 0,

(22)

where

\begin{array}{c}F\left(P,{\sigma}^{2},{N}_{T},{N}_{R},{\Sigma}_{\mu},K\right)\hfill \\ =\frac{{A}_{2}^{2}P\left({N}_{R}-1\right)}{{\sigma}^{2}}-1+\left(1+\frac{{A}_{2}^{2}P}{{\sigma}^{2}}\right)E\left(\frac{1}{1+P\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}/{\sigma}^{2}}\right)+\frac{P}{{\sigma}^{2}}{A}_{1}^{2}{\left({\lambda}_{2}^{\mu}\right)}^{2}\left[1-E\left(\frac{P\mid {L}_{21}{\mid}^{2}/{\sigma}^{2}}{1+P\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}/{\sigma}^{2}}\right)\right].\hfill \end{array}

(23)

The function *F*(·) is referred to as the beamforming function.
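As a rough illustration of how the beamforming function in (23) might be evaluated, the following sketch estimates the two expectations by Monte Carlo simulation using only the Python standard library. It is not the evaluation method of this paper (Section 3.2 uses PDFs instead). The function name `beamforming_function` is hypothetical, and because (2) is not reproduced in this section the sketch assumes *A*_{1}² = *K*/(*K* + 1), *A*_{2}² = 1/(*K* + 1), and unit-variance complex Gaussian entries in **H**_{w}.

```python
import math
import random

def beamforming_function(P, sigma2, n_r, lam1, lam2, K, n_samples=20000, seed=0):
    """Monte Carlo estimate of F in (23); hypothetical helper for illustration."""
    if n_r < 2:
        raise ValueError("n_r must be at least 2 so that the entry L_21 exists")
    rng = random.Random(seed)
    a1_sq = K / (K + 1.0)               # assumed A_1^2
    a2_sq = 1.0 / (K + 1.0)             # assumed A_2^2
    rho = P / sigma2
    std = math.sqrt(a2_sq / 2.0)        # std per real dimension of A_2 * CN(0, 1)
    mean_first = math.sqrt(a1_sq) * lam1
    e_inv = 0.0                         # accumulates 1 / (1 + rho * ||L_1||^2)
    e_ratio = 0.0                       # accumulates rho |L_21|^2 / (1 + rho ||L_1||^2)
    for _ in range(n_samples):
        col = [complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(n_r)]
        col[0] += mean_first            # first entry carries A_1 * lambda_1
        z1 = abs(col[1]) ** 2
        z2 = sum(abs(c) ** 2 for c in col)
        e_inv += 1.0 / (1.0 + rho * z2)
        e_ratio += rho * z1 / (1.0 + rho * z2)
    e_inv /= n_samples
    e_ratio /= n_samples
    return (a2_sq * rho * (n_r - 1) - 1.0
            + (1.0 + a2_sq * rho) * e_inv
            + rho * a1_sq * lam2 ** 2 * (1.0 - e_ratio))

# No mean component (K = 0): F is positive, so beamforming is not optimal.
print(beamforming_function(P=1.0, sigma2=1.0, n_r=2, lam1=0.0, lam2=0.0, K=0.0))
# Strong rank-one mean (K = 10, lam2 = 0): F is negative, so (22) holds.
print(beamforming_function(P=1.0, sigma2=1.0, n_r=2, lam1=2.0, lam2=0.0, K=10.0))
```

Only the sign of the returned value matters for the test in (22); the number of samples controls how reliably that sign is resolved when *F* is close to zero.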

Finally, the necessary condition expressed by (22) is also the sufficient condition because the second derivative of *C*(*p*) is equal to or less than 0, i.e., \frac{{\partial}^{2}C\left(p\right)}{\partial {p}^{2}}\le 0. The sufficiency proof is the same as that in [6]. Thus, it is concluded that the necessary and sufficient condition for beamforming optimality is given by the inequality (22). When the condition (22) is satisfied, we can adopt the beamforming strategy at the transmitter to achieve the maximum capacity of MIMO systems under mean feedback. The beamforming strategy includes three steps:

**Step 1**: The transmitter obtains the mean feedback **H**_{μ} from the receiver and performs the SVD in (7) to obtain the matrix of right singular vectors *V*_{μ}.

**Step 2**: The optimal power allocation matrix is set as {\Sigma}_{Q,\mathsf{\text{BF}}}=\mathsf{\text{diag}}{\left\{P,0,...,0\right\}}_{{N}_{T}\times {N}_{T}}.

**Step 3**: The original user data \mathbf{x}\prime =\left\{{x}_{i}^{\prime},i=1,2,...,{N}_{T}\right\} are encoded with an i.i.d. Gaussian code, so each data stream {x}_{i}^{\prime} is i.i.d. Gaussian distributed. The transmitter then applies a linear transformation to obtain the transmitted signal **x**, i.e., \mathbf{x}={V}_{\mu}{\Sigma}_{Q,\mathsf{\text{BF}}}^{1/2}{\mathbf{x}}^{\prime}, so that for unit-variance streams the covariance of **x** is {V}_{\mu}{\Sigma}_{Q,\mathsf{\text{BF}}}{V}_{\mu}^{H}, in agreement with (8).
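The three steps can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function name `beamforming_signal` and the example mean matrix are hypothetical, the data streams are taken to be unit-variance complex Gaussian, and the precoder is applied as *V*_{μ}Σ^{1/2} so that the transmit covariance is *V*_{μ}Σ_{Q,BF}*V*_{μ}^{H}, which is the form in (8).

```python
import numpy as np

def beamforming_signal(H_mu, P, rng, n_symbols=1):
    """Build the beamformed transmit signal from the channel mean (illustrative)."""
    n_t = H_mu.shape[1]
    # Step 1: SVD of the mean feedback; the columns of V are right singular vectors.
    _, _, Vh = np.linalg.svd(H_mu)
    V = Vh.conj().T
    # Step 2: all power on the first eigen-direction.
    Sigma_bf = np.zeros((n_t, n_t))
    Sigma_bf[0, 0] = P
    # Step 3: i.i.d. unit-variance complex Gaussian streams, then linear precoding.
    x_prime = (rng.standard_normal((n_t, n_symbols))
               + 1j * rng.standard_normal((n_t, n_symbols))) / np.sqrt(2)
    x = V @ np.sqrt(Sigma_bf) @ x_prime   # elementwise sqrt is exact for a diagonal matrix
    return x, V, Sigma_bf

rng = np.random.default_rng(0)
# Hypothetical 2 x 3 mean matrix (N_R = 2 receive, N_T = 3 transmit antennas).
H_mu = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
P = 1.0
x, V, Sigma_bf = beamforming_signal(H_mu, P, rng, n_symbols=5)

Q = V @ Sigma_bf @ V.conj().T             # resulting transmit covariance
print(np.linalg.matrix_rank(Q), np.trace(Q).real)
```

Every transmitted vector is a scalar multiple of the first column of `V`, which is what makes the strategy beamforming.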

### 3.2. Numerical evaluation for beamforming optimality condition

Let *z*_{1} = |*L*_{21}|^{2} and {z}_{2}=\parallel {L}_{\bullet 1}{\parallel}_{F}^{2}. The probability density function (PDF) of *z*_{1} is given by

{f}_{{z}_{1}}\left(x\right)=\left\{\begin{array}{c}\left(K+1\right){e}^{-\left(K+1\right)x},\mathsf{\text{for}}\phantom{\rule{2.77695pt}{0ex}}x\ge 0;\hfill \\ 0,\phantom{\rule{1em}{0ex}}\mathsf{\text{else}}.\hfill \end{array}\right.

(24)

Let *z*_{3} = *z*_{2} - *z*_{1}. The PDF of *z*_{3} can then be obtained by transforming the PDF of a non-central chi-squared distributed random variable, and it is given by

{f}_{{z}_{3}}\left(x\right)=\left\{\begin{array}{c}\exp\left[-\left(K+1\right)x-K{\left({\lambda}_{1}^{\mu}\right)}^{2}\right]\sum _{i=0}^{\infty}{K}^{i}{\left(K+1\right)}^{i+{N}_{R}-1}{\left({\lambda}_{1}^{\mu}\right)}^{2i}\frac{{x}^{i+{N}_{R}-2}}{i!\left(i+{N}_{R}-2\right)!},\phantom{\rule{0.3em}{0ex}}\mathsf{\text{for}}\phantom{\rule{0.3em}{0ex}}x\ge 0;\hfill \\ 0,\phantom{\rule{1em}{0ex}}\mathsf{\text{else}}.\hfill \end{array}\right.

(25)
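Before a series such as (25) is used in numerical integration, it is worth checking that the truncated sum behaves like a PDF. The sketch below is an illustration using only the Python standard library; the helper names `pdf_z3` and `trapezoid`, the truncation length, and the parameter values are assumptions. It verifies that the series integrates to approximately 1 and that its mean matches (*K*(λ_{1}^{μ})² + *N*_{R} - 1)/(*K* + 1), the value obtained by integrating (25) term by term.

```python
import math

def pdf_z3(x, K, lam1, n_r, n_terms=100):
    """Truncated series for the PDF in (25); requires n_r >= 2."""
    if x < 0.0:
        return 0.0
    # i = 0 term of the series, without the exponential factor.
    term = (K + 1.0) ** (n_r - 1) * x ** (n_r - 2) / math.factorial(n_r - 2)
    ratio = K * (K + 1.0) * lam1 ** 2 * x
    total = 0.0
    for i in range(n_terms):
        total += term
        term *= ratio / ((i + 1) * (i + n_r - 1))   # term_{i+1} / term_i
    return math.exp(-(K + 1.0) * x - K * lam1 ** 2) * total

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n sub-intervals."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        s += f(a + k * h)
    return s * h

K, lam1, n_r = 2.0, 1.0, 3              # illustrative parameters (assumptions)
upper = 15.0                            # the tail beyond this point is negligible here
mass = trapezoid(lambda x: pdf_z3(x, K, lam1, n_r), 0.0, upper, 3000)
mean = trapezoid(lambda x: x * pdf_z3(x, K, lam1, n_r), 0.0, upper, 3000)
expected_mean = (K * lam1 ** 2 + n_r - 1) / (K + 1.0)

print(mass, mean, expected_mean)
```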

As *z*_{3} is independent of *z*_{1}, the joint PDF of (*z*_{1}, *z*_{3}) is the product of (24) and (25):

{f}_{{z}_{1},{z}_{3}}\left(x,y\right)=\left\{\begin{array}{c}\exp\left[-\left(K+1\right)\left(x+y\right)-K{\left({\lambda}_{1}^{\mu}\right)}^{2}\right]\sum _{i=0}^{\infty}{K}^{i}{\left(K+1\right)}^{i+{N}_{R}}{\left({\lambda}_{1}^{\mu}\right)}^{2i}\frac{{y}^{i+{N}_{R}-2}}{i!\left(i+{N}_{R}-2\right)!},\phantom{\rule{0.3em}{0ex}}\mathsf{\text{for}}\phantom{\rule{0.3em}{0ex}}x,y\ge 0;\hfill \\ 0,\phantom{\rule{0.3em}{0ex}}\mathsf{\text{for}}\phantom{\rule{0.3em}{0ex}}x<0\phantom{\rule{0.3em}{0ex}}\mathsf{\text{or}}\phantom{\rule{0.3em}{0ex}}y<0,\hfill \end{array}\right.

(26)

where (*i* + *N*_{R} - 2)! = Γ(*i* + *N*_{R} - 1) and Γ(·) is the Gamma function. As the Jacobian determinant of (*z*_{1}, *z*_{3}) with respect to (*z*_{1}, *z*_{2}) is 1, the joint PDF of (*z*_{1}, *z*_{2}) is given by

{f}_{{z}_{1},{z}_{2}}\left(x,y\right)=\left\{\begin{array}{c}\exp\left[-\left(K+1\right)y-K{\left({\lambda}_{1}^{\mu}\right)}^{2}\right]\sum _{i=0}^{\infty}{K}^{i}{\left(K+1\right)}^{i+{N}_{R}}{\left({\lambda}_{1}^{\mu}\right)}^{2i}\frac{{\left(y-x\right)}^{i+{N}_{R}-2}}{i!\left(i+{N}_{R}-2\right)!},\phantom{\rule{0.3em}{0ex}}\mathsf{\text{for}}\phantom{\rule{0.3em}{0ex}}0\le x\le y;\hfill \\ 0,\phantom{\rule{1em}{0ex}}\mathsf{\text{else}}.\hfill \end{array}\right.

(27)

Therefore, with formulas (25) and (27), the condition for beamforming optimality given by (22) can be evaluated numerically.