We aim to design the analog precoder \({{\mathbf{F}}_{\mathrm{RF}}}\) and digital precoder \({{\mathbf{F}}_{\mathrm{BB}}}\), so as to maximize the total sum rate *R*, which can be written as

$$\begin{aligned} \left( {{{\mathbf{F}}_{\mathrm{RF}}},{{\mathbf{F}}_{\mathrm{BB}}},{\mathbf{P}}} \right)&= \mathop {\arg {\mathrm{max}} }\limits _{\left( {{{\mathbf{F}}_{\mathrm{RF}}},{{\mathbf{F}}_{\mathrm{BB}}},{\mathbf{P}}} \right) } R\\&\quad s.t.{\left| {\left. {{\mathbf{F}}_{\mathrm{RF}}^{i,j}} \right| } \right. ^2} = \frac{1}{{{N_t}}},\\&\quad \left\| {{{\mathbf{F}}_{\mathrm{RF}}}{{\mathbf{F}}_{\mathrm{BB}}}} \right\| _F^2 = K{N_s},\\&\quad {{\mathbf{F}}_{\mathrm{RF}}} = {{\mathrm{blk(}}}{{{{{\bar{\mathbf{a }}}}}}_1},{{{{{\bar{\mathbf{a }}}}}}_2}, \ldots ,{{{{{\bar{\mathbf{a }}}}}}_N}{{\mathrm{)}}},\\&\quad \left\| {\left. {\mathbf{P}} \right\| } \right. _F^2 = {P_N}.\end{aligned}$$

(7)

Since the nonzero elements of the analog precoding matrix are usually realized by phase shifters [34], the nonzero elements of \({{\mathbf{F}}_{\mathrm{RF}}}\) must satisfy constant-modulus constraints. Unfortunately, these constant-modulus constraints are non-convex, which makes the whole optimization problem non-convex. In other words, it is difficult to find the globally optimal solution of problem (7).

### Analog precoding design

In the case of multiple users, the inter-user interference can be effectively eliminated by using the baseband BD technology. After removing the interference between users, *R* in (7) can be rewritten as

$$\begin{aligned} R = {\log _2}\left( \left| {{{\mathbf{I}}_{K{N_r}}} + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{H }}{{\mathbf{F}}_{\mathrm{RF}}} {{\mathbf{F}}_{\mathrm{BB}}}{\mathbf{F}}_{\mathrm{BB}}^H{\mathbf{F}}_{\mathrm{RF}}^H{{\mathbf{H }}^H}} \right| \right) . \end{aligned}$$

(8)

This means we should find a solution \({{\mathbf{F}}_{\mathrm{RF}}}\) that maximizes *R* as far as possible. Based on (1), the analog precoding matrix design is limited by the constant-amplitude and block-diagonal constraints. However, these non-convex constraints make it difficult to maximize the capacity in (8). Owing to the special block-diagonal structure of the analog precoding matrix \({{\mathbf{F}}_{\mathrm{RF}}}\), we observe that the precoding on different sub-antenna arrays is independent. Inspired by [33, 34], we can decompose the complicated optimization problem (8) into a series of sub-rate optimization problems, each of which is much easier to solve.

In other words, considering the antenna sub-arrays connected to the RF chains one by one, we first optimize the sum rate of the selected sub-array while all the remaining sub-arrays are turned off. After that, we select the second sub-array and optimize its sum rate, and so on.

The traditional SIC method optimizes in a fixed recursive order, but the channel state of each antenna sub-array is different. We can therefore sort the *N* antenna sub-arrays according to their channel capacities before optimization, so that the optimization order follows this capacity-based ranking.
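This capacity-based ordering step can be sketched in numpy. In the sketch below, all sizes and the SNR factor `rho` (standing in for \(P_N/(\sigma^2 K N_s)\)) are made-up, and each sub-array capacity is measured with every other sub-array switched off:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, KNr, rho = 4, 4, 6, 0.5   # sub-arrays, antennas per sub-array, receive dims, SNR factor (made-up)
H = (rng.standard_normal((KNr, N * M)) + 1j * rng.standard_normal((KNr, N * M))) / np.sqrt(2)

# Capacity of sub-array n with all other sub-arrays switched off:
# C_n = log2 |I + rho * H_n H_n^H|, where H_n holds the M columns of H hitting sub-array n.
C = []
for n in range(N):
    Hn = H[:, n * M:(n + 1) * M]
    C.append(np.log2(np.abs(np.linalg.det(np.eye(KNr) + rho * Hn @ Hn.conj().T))))

order = np.argsort(C)[::-1]      # strongest sub-array is optimized first
```

The rest of the SIC procedure then visits the sub-arrays in `order` instead of in index order.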

\({C_n}\) is defined as the capacity of the *n*th antenna sub-array in the millimeter wave massive MIMO systems, where \(n = 1,2, \ldots ,N\). After the optimization sequence is determined, we will perform the above-mentioned SIC process until the last antenna sub-array is optimized. During the calculation, we assume that the digital precoding matrix is fixed. Then the objective in (8) can be expressed as follows

$$\begin{aligned} {{\mathbf{F}}_{\mathrm{RF}}}&= \arg \mathop {{\mathrm{max}} }\limits _{{{\mathbf{F}}_{\mathrm{RF}}}} {C_{{\mathrm{max}} }}\\&\quad s.t.{\left| {\left. {{\mathbf{F}}_{\mathrm{RF}}^{i,j}} \right| } \right. ^2} = \frac{1}{{{N_t}}},\\&\quad {{\mathbf{F}}_{\mathrm{RF}}} = {{\mathrm{blk(}}}{{{{{\bar{\mathbf{a }}}}}}_1},{{{{{\bar{\mathbf{a }}}}}}_2}, \ldots ,{{{{{\bar{\mathbf{a }}}}}}_N}{{\mathrm{)}}},\end{aligned}$$

(9)

where \({C_{{\mathrm{max}} }} = \sum \nolimits _{n = 1}^N {{{\log }_2}(1 + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}} {\mathbf{H }}{{\mathbf{F}}_{\mathrm{RF}}}{\mathbf{F}}_{\mathrm{RF}}^H{{\mathbf{H }}^H}) = {C_1} + {C_2} + \cdots + {C_N}\). After the analog precoding is obtained, the optimal digital precoding matrix is solved by the baseband BD technology.

We can partition the analog precoding matrix \({{\mathbf{F}}_{\mathrm{RF}}}\) at the BS as \({{\mathbf{F}}_{\mathrm{RF}}} = ({\mathbf{F}}_{\mathrm{RF}}^{N - 1},{\mathbf{F}}_{\mathrm{RF}}^N)\), where \({\mathbf{F}}_{\mathrm{RF}}^N\) is the *N*th column of \({{\mathbf{F}}_{\mathrm{RF}}}\) and \({\mathbf{F}}_{\mathrm{RF}}^{N - 1}\) is the \(NM \times (N - 1)\) matrix containing its first \(N - 1\) columns. Then the sum rate in (9) can be rewritten as

$$\begin{aligned} {C_{{\mathrm{max}} }}&= {\log _2}\left( \left| {{{\mathbf{I}}_{K{N_r}}} + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{H }}[({\mathbf{F}}_{\mathrm{RF}}^{N - 1}{\mathbf{F}}_{\mathrm{RF}}^N)]{{[({\mathbf{F}}_{\mathrm{RF}}^{N - 1}{\mathbf{F}}_{\mathrm{RF}}^N)]}^H}{{\mathbf{H }}^H}} \right| \right) \\&= {\log _2}\left( \left| {{{\mathbf{I}}_{K{N_r}}} + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{HF}}_{\mathrm{RF}}^{N - 1}{\mathbf{F}}_{\mathrm{RF}}^{N - {1^H}}{{\mathbf{H }}^H} + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{HF}}_{\mathrm{RF}}^N{\mathbf{F}}_{\mathrm{RF}}^{{N^H}}{{\mathbf{H }}^H}} \right| \right) .\end{aligned}$$

(10)

Define auxiliary matrix

$$\begin{aligned} {{\mathbf{S}}_{N - 1}} = {{\mathbf{I}}_{K{N_r}}} + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{HF}}_{\mathrm{RF}}^{N - 1}{\mathbf{F}}_{\mathrm{RF}}^{N - {1^H}}{{\mathbf{H }}^H}. \end{aligned}$$

(11)

Since \(\left| {{\mathbf{I + XY }}} \right| = \left| {{\mathbf{I + YX }}} \right|\), by defining \({\mathbf{X}} = {\mathbf{S}}_{N - 1}^{ - 1}{\mathbf{HF}}_{\mathrm{RF}}^N\) and \({\mathbf{Y }} = {\mathbf{F}}_{\mathrm{RF}}^{{N^H}}{{\mathbf{H }}^H}\), (10) can be simplified as

$$\begin{aligned} {C_{{\mathrm{max}} }}&{\mathop {=}\limits ^{(a)}} {\log _2}\left( \left| {{{\mathbf{S}}_{N - 1}}} \right| \right) + {\log _2}\left( \left| {{{\mathbf{I}}_{K{N_r}}} + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{S}}_{N - 1}^{ - 1}{\mathbf{HF}}_{\mathrm{RF}}^N{\mathbf{F}}_{\mathrm{RF}}^{{N^H}}{{\mathbf{H }}^H}} \right| \right) \\&{\mathop {=}\limits ^{(b)}} {\log _2}\left( \left| {{{\mathbf{S}}_{N - 1}}} \right| \right) + {\log _2}\left( \left| {1 + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{F}}_{\mathrm{RF}}^{{N^H}} {{\mathbf{H }}^H}{\mathbf{S}}_{N - 1}^{ - 1}{\mathbf{HF}}_{\mathrm{RF}}^N} \right| \right) .\end{aligned}$$

(12)
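Step (b) in (12) relies on the determinant identity \(\left| {{\mathbf{I + XY }}} \right| = \left| {{\mathbf{I + YX }}} \right|\) (Sylvester's determinant identity), which trades a large determinant for a small one. A quick numerical sanity check with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
# X is tall and Y is wide, so I + XY is 5x5 while I + YX is only 2x2.
X = rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))
Y = rng.standard_normal((2, 5)) + 1j * rng.standard_normal((2, 5))

d_big = np.linalg.det(np.eye(5) + X @ Y)
d_small = np.linalg.det(np.eye(2) + Y @ X)
assert np.isclose(d_big, d_small)
```

In (12), `Y @ X` collapses to a scalar because \({\mathbf{F}}_{\mathrm{RF}}^N\) is a single column, which is what turns the determinant into the `1 + ...` term.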

Obviously, the second term on the right side of (b) in (12), \({\log _2}( {1 + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{F}}_{\mathrm{RF}}^{{N^H}}{{\mathbf{H }}^H}{\mathbf{S}}_{N - 1}^{ - 1}{\mathbf{HF}}_{\mathrm{RF}}^N} )\), is the achievable sub-rate of the *N*th antenna sub-array, and the first term has the same form as (8). Further, we can decompose \({\log _2}(\left| {{{\mathbf{S}}_{N - 1}}} \right| )\) by the same method as in (12), obtaining

$$\begin{aligned} {\log _2}(\left| {{{\mathbf{S}}_{N - 2}}} \right| ) + {\log _2}\left( \left| {1 + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{F}}_{\mathrm{RF}}^{N - {1^H}}{{\mathbf{H }}^H}{\mathbf{S}}_{N - 2}^{ - 1}{\mathbf{HF}}_{\mathrm{RF}}^{N - 1}} \right| \right) . \end{aligned}$$

(13)

Then, after *N* such decompositions, the total sum rate in (9) can be written as

$$\begin{aligned} {C_{{\mathrm{max}} }} = \sum \limits _{n = 1}^N {{\log }_2}\left( 1 + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{F}}_{\mathrm{RF}}^{{n^H}}{{\mathbf{H }}^H}{\mathbf{S}}_{n - 1}^{ - 1}{\mathbf{HF}}_{\mathrm{RF}}^n\right) , \end{aligned}$$

(14)

where \({{\mathbf{S}}_n} = {{\mathbf{S}}_{n - 1}} + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{HF}}_{\mathrm{RF}}^n{\mathbf{F}}_{\mathrm{RF}}^{{n^H}}{{\mathbf{H }}^H}\) and \({{\mathbf{S}}_0} = {{\mathbf{I}}_{K{N_r}}}\).
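The telescoped form (14) can be verified numerically. The sketch below (all sizes and the SNR factor `rho`, standing in for \(P_N/(\sigma^2 K N_s)\), are made-up) draws a random channel and a random block-diagonal constant-modulus precoder, then checks that the sum of per-sub-array rates equals the full log-determinant:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, KNr, rho = 3, 4, 6, 0.5   # sub-arrays, antennas per sub-array, receive dims, SNR factor (made-up)
H = (rng.standard_normal((KNr, N * M)) + 1j * rng.standard_normal((KNr, N * M))) / np.sqrt(2)

# Block-diagonal analog precoder: column n acts only on its own sub-array,
# with constant-modulus entries of magnitude 1/sqrt(M).
F = np.zeros((N * M, N), dtype=complex)
for n in range(N):
    F[n * M:(n + 1) * M, n] = np.exp(1j * rng.uniform(0, 2 * np.pi, M)) / np.sqrt(M)

# Direct sum rate: log2 |I + rho * H F F^H H^H|.
direct = np.log2(np.abs(np.linalg.det(np.eye(KNr) + rho * H @ F @ F.conj().T @ H.conj().T)))

# Telescoped form (14): accumulate S_n column by column and sum the sub-rates.
S = np.eye(KNr, dtype=complex)
total = 0.0
for n in range(N):
    f = F[:, n:n + 1]
    gain = (f.conj().T @ H.conj().T @ np.linalg.solve(S, H @ f))[0, 0]
    total += np.log2(np.abs(1 + rho * gain))
    S = S + rho * (H @ f) @ (H @ f).conj().T

assert np.isclose(direct, total)
```

Each loop iteration corresponds to one SIC stage: the sub-rate of column *n* is computed against the interference-plus-identity matrix accumulated from the previously optimized columns.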

According to the analysis above, the capacity of the first antenna sub-array to be optimized can be expressed as

$$\begin{aligned} {C_{n,{\mathrm{max}} }} = {\log _2}\left( \left| {1 + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{F}}_{\mathrm{RF}}^{{n^H}}{{\mathbf{T}}_{n - 1}}{\mathbf{F}}_{\mathrm{RF}}^n} \right| \right) , \end{aligned}$$

(15)

where \({C_{n,{\mathrm{max}} }} = {\mathrm{max}} \left\{ {{C_1},{C_2}, \ldots ,{C_N}} \right\}\) corresponds to the first antenna sub-array that needs to be optimized, and \({{\mathbf{T}}_{n - 1}} = {{\mathbf{H }}^H}{\mathbf{S}}_{n - 1}^{ - 1}{\mathbf{H }}\). Therefore, (15) can be rewritten as

$$\begin{aligned} {C_{n,{\mathrm{max}} }} = {\log _2}\left( \left| {1 + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{F}}_{\mathrm{RF}}^{{n^H}}{{\mathbf{G}}_{n - 1}}{\mathbf{F}}_{\mathrm{RF}}^n} \right| \right) , \end{aligned}$$

(16)

where \({{\mathbf{G}}_{n - 1}} \in {{\mathbb{C}}^{M \times M}}\) is the corresponding sub-array of \({{\mathbf{T}}_{n - 1}}\) by only keeping the rows and columns of \({{\mathbf{T}}_{n - 1}}\) from the \((M(n - 1) + 1)\)th one to the (*Mn*)th one, respectively. It can be presented as

$$\begin{aligned} {{\mathbf{G}}_{n - 1}} = {\mathbf{R }}{{\mathbf{T}}_{n - 1}}{{\mathbf{R }}^H} = {\mathbf{R }}{{\mathbf{H }}^H}{\mathbf{S}}_{n - 1}^{{{\mathrm{- 1}}}}{\mathbf{H }}{{\mathbf{R }}^H}, \end{aligned}$$

(17)

where \({\mathbf{R }} = {\left[ {\begin{array}{*{20}{l}} {{{\mathbf{0}}_{M \times M(n - 1)}}}\\ {{{\mathbf{I}}_M}}\\ {{{\mathbf{0}}_{M \times M(N - n)}}} \end{array}} \right] ^T}\) is the corresponding selection matrix. Define the singular value decomposition (SVD) of \({{\mathbf{G}}_{n - 1}}\) as \({{\mathbf{G}}_{n - 1}} = {\mathbf{V }{\varvec{\Sigma }} }{{\mathbf{V }}^H}\), where \({\varvec{\Sigma }} \in {{\mathbb{C}}^{M \times M}}\) is the diagonal matrix of singular values of \({{\mathbf{G}}_{n - 1}}\), and \({\mathbf{V }} \in {{\mathbb{C}}^{M \times M}}\) is the matrix of its right singular vectors.

The optimal solution of (16) can be obtained as

$$\begin{aligned} {\mathbf{F}}_{\mathrm{RF}}^{{N_{{\mathrm{opt}}}}} = {\left[ {\begin{array}{*{20}{l}} 0\\ {{{{{{\bar{\mathbf{a }}}}}}_{N,{\mathrm{opt}}}}}\\ 0 \end{array}} \right] _{NM \times 1}}, \end{aligned}$$

(18)

where \({{{{\bar{\mathbf{a }}}}}_{N,{\mathrm{opt}}}} \in {{\mathbb{C}}^{M \times 1}}\) is the first column of \({\mathbf{V }}\). Since the elements of \({{{{\bar{\mathbf{a }}}}}_{N,{\mathrm{opt}}}}\) do not obey the constant-modulus constraint in Sect. 3, the analog precoding vector \({\mathbf{F}}_{\mathrm{RF}}^{{N_{{\mathrm{opt}}}}}\) cannot be chosen directly as \({{{{\bar{\mathbf{a }}}}}_{N,{\mathrm{opt}}}}\). Instead, by minimizing the MSE between \({\mathbf{F}}_{\mathrm{RF}}^{{N_{{\mathrm{opt}}}}}\) and the constrained solution \({\mathbf{F}}_{\mathrm{RF}}^N\), we find that each element of \({\mathbf{F}}_{\mathrm{RF}}^N\) shares the phase of the corresponding element of \({\mathbf{F}}_{\mathrm{RF}}^{{N_{{\mathrm{opt}}}}}\).

Matrices \({\varvec{\Sigma }}\) and \({\mathbf{V }}\) are, respectively, separated into following two parts:

$$\begin{aligned} {\varvec{\Sigma }} = \left[ {\begin{array}{*{20}{c}} {{{\varvec{\Sigma}}_1}}&0\\ 0&{{{\varvec{\Sigma}}_2}} \end{array}} \right] ,\quad {\mathbf{V }} = \left[ {\begin{array}{*{20}{c}} {{{\mathbf{v}}_1}}&{{{\mathbf{v}}_2}} \end{array}} \right] . \end{aligned}$$

(19)

Further, the \({C_{n,{\mathrm{max}} }}\) given by (16) can also be rewritten as

$$\begin{aligned} {C_{n,{\mathrm{max}} }}&= {\log _2}\left( \left| {1 + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{F}}_{\mathrm{RF}}^{{n^H}}{\mathbf{V }}{\varvec{\Sigma }} {{\mathbf{V }}^H}{\mathbf{F}}_{\mathrm{RF}}^n} \right| \right) \\&= {\log _2}\left( \left| {1 + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{F}}_{\mathrm{RF}}^{{n^H}}{{\mathbf{v}}_1}{{\varvec{\Sigma}}_1}{\mathbf{v}}_1^H{\mathbf{F}}_{\mathrm{RF}}^n + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{F}}_{\mathrm{RF}}^{{n^H}}{{\mathbf{v}}_2}{{\varvec{\Sigma}}_2}{\mathbf{v}}_2^H{\mathbf{F}}_{\mathrm{RF}}^n} \right| \right) .\end{aligned}$$

(20)

In order to find the \({\mathbf{F}}_{\mathrm{RF}}^n\) closest to \({\mathbf{F}}_{\mathrm{RF}}^{{n_{{\mathrm{opt}}}}}\), we reasonably assume that \({\mathbf{F}}_{\mathrm{RF}}^n\) is approximately orthogonal to \({{\mathbf{v}}_2}\), i.e., \({\mathbf{F}}_{\mathrm{RF}}^{{n^H}}{{\mathbf{v}}_2} \approx 0\). Using \(\left| {{\mathbf{I + XY }}} \right| = \left| {{\mathbf{I + YX }}} \right|\) together with the high signal-to-noise-ratio (\({{\mathrm{SNR}}}\)) approximation, i.e.,

$$\begin{aligned} {\left( 1 + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{{\varvec{\Sigma}}_1}\right) ^{ - 1}}\frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{{\varvec{\Sigma}}_1} \approx 1. \end{aligned}$$

(21)

Thus, (20) can be expressed as

$$\begin{aligned} {C_{n,{\mathrm{max}} }}&\approx {\log _2}\left( \left| {1 + \frac{{{P_N}{{\varvec{\Sigma}}_1}}}{{{\sigma^2}K{N_s}}}{\mathbf{F}}_{\mathrm{RF}}^{{n^H}}{{\mathbf{v}}_1}{\mathbf{v}}_1^H{\mathbf{F}}_{\mathrm{RF}}^n} \right| \right) \\&\approx {\log _2}\left( \left| {1 + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{{\varvec{\Sigma}}_1}} \right| \right) + {\log _2}\left( \left| {{\mathbf{F}}_{\mathrm{RF}}^{{n^H}}{{\mathbf{v}}_1}{\mathbf{v}}_1^H{\mathbf{F}}_{\mathrm{RF}}^n} \right| \right) .\end{aligned}$$

(22)

From (22), we observe that maximizing \({C_{n,{\mathrm{max}} }}\) is equivalent to maximizing the squared inner product between the two vectors \({\mathbf{F}}_{\mathrm{RF}}^{{n_{{\mathrm{opt}}}}}\) and \({\mathbf{F}}_{\mathrm{RF}}^n\). Based on this fact, the optimization problem (15) is equivalent to

$$\begin{aligned} \mathop {\arg \min }\limits _{{\mathbf{F}}_{\mathrm{RF}}^n \in \zeta } \left\| {{\mathbf{F}}_{\mathrm{RF}}^{{n_{{\mathrm{opt}}}}} - {\mathbf{F}}_{\mathrm{RF}}^n} \right\| _2^2. \end{aligned}$$

(23)

The function of MMSE in all antenna sub-arrays can be expressed as

$$\begin{aligned} &{\mathrm{E}}\left\{ {\left\| {{\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}} - {{\mathbf{F}}_{\mathrm{RF}}}} \right\| _F^2} \right\} \\&\quad = {{\mathrm{tr}}}\left\{ {{{({\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}} - {{\mathbf{F}}_{\mathrm{RF}}})}^H}({\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}} - {{\mathbf{F}}_{\mathrm{RF}}})} \right\} \\&\quad = 2N - {{\mathrm{tr}}}\left\{ {2{\mathop {\mathrm{Re}}\nolimits } \left[ {{{({{\mathbf{F}}_{\mathrm{RF}}})}^H}{\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}}} \right] } \right\} \\&\quad = 2N - 2\sum \limits _{n = 1}^N {\sum \limits _{m = 1}^{{N_t}} {{\mathop {\mathrm{Re}}\nolimits } } } \left\{ {\left| {{{\mathbf{F}}_{\mathrm{RF}}}(m,n)} \right| \left| {{\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}}(m,n)} \right| {e^{j\varphi (m,n)}}} \right\} ,\end{aligned}$$

(24)

where \(\varphi (m,n) = \angle {{\mathbf{F}}_{\mathrm{RF}}}(m,n) - \angle {\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}}(m,n)\). It is clear that when \(\varphi (m,n) = 0\), the objective function is minimized.

Therefore, the analog precoding matrix can be chosen as

$$\begin{aligned} {{{{\bar{\mathbf{a }}}}}_n} = \frac{1}{{\sqrt{M} }}{e^{j\angle {{{{{\bar{\mathbf{a }}}}}}_{n,opt}}}}, \end{aligned}$$

(25)

where \(\angle {{{{\bar{\mathbf{a }}}}}_{n,opt}}\) represents the phase vector of \({{{{\bar{\mathbf{a }}}}}_{n,opt}}\).
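The phase-only projection in (25) can be checked numerically. The sketch below uses a random vector as a stand-in for the unconstrained optimum \({{{{\bar{\mathbf{a }}}}}_{n,{\mathrm{opt}}}}\) and verifies that keeping its phases while fixing the modulus to \(1/\sqrt{M}\) is closer to it than any of a large batch of random constant-modulus candidates:

```python
import numpy as np

rng = np.random.default_rng(3)
M = 8
# Stand-in for the unconstrained optimum (the top singular vector of G_{n-1}).
a_opt = rng.standard_normal(M) + 1j * rng.standard_normal(M)

# (25): keep the phases, set every modulus to 1/sqrt(M).
a_cm = np.exp(1j * np.angle(a_opt)) / np.sqrt(M)

# No tested constant-modulus vector is closer to a_opt than the phase projection.
d_best = np.linalg.norm(a_opt - a_cm)
for _ in range(1000):
    cand = np.exp(1j * rng.uniform(0, 2 * np.pi, M)) / np.sqrt(M)
    assert np.linalg.norm(a_opt - cand) >= d_best - 1e-12
```

This matches the element-wise argument in (24): with the moduli fixed, the distance is minimized exactly when every phase difference \(\varphi(m,n)\) is zero.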

Therefore, the sum rate optimization problem can be transformed into a series of sub-rate optimization problems that are optimized one by one. After that, according to the idea of SIC after sorting, we only need to continuously update \({{\mathbf{S}}_N}\), and the process is shown in Fig. 2.

According to the capacity ranking \({\mathrm{max}} \left\{ {{C_1},{C_2}, \ldots ,{C_N}} \right\}\), \({\mathbf{F}}_{\mathrm{RF}}^{1,{\mathrm{max}} }\) denotes the analog precoding corresponding to the first antenna sub-array to be optimized, and \({\mathbf{F}}_{\mathrm{RF}}^{2,{\mathrm{max}} }\) the second. This process is repeated until the last antenna sub-array is optimized.

### Digital precoding design

Based on the above solution process, the analog precoding matrix \({{\mathbf{F}}_{\mathrm{RF}}}\) can be obtained. In order to obtain the best digital precoding, BD technology is adopted. The main idea of BD is to divide the MU-MIMO channel into multiple SU-MIMO channels. If the signal intended for the *k*th user is guaranteed to lie in the null space of the other users' channels, the inter-user interference is eliminated. First, the effective channel matrix \({{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k}}\) can be expressed as

$$\begin{aligned} {{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k}} = {{\mathbf{H}}_k}{{\mathbf{F}}_{\mathrm{RF}}},k \in \{ 1,2, \ldots ,K\} . \end{aligned}$$

(26)

In order to eliminate interference, the constraint can be expressed as

$$\begin{aligned} {{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,j}}{\mathbf{F}}_{\mathrm{BB}}^k = 0,\forall j \ne k. \end{aligned}$$

(27)

To get the digital precoder, \({{{{\tilde{\mathbf{H }}}}}_k}\) can be defined as

$$\begin{aligned} {{{{\tilde{\mathbf{H }}}}}_k} = {\left[ {{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,1}^T, \ldots ,{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k - 1}^T,{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k + 1}^T, \ldots ,{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,K}^T} \right] ^T}. \end{aligned}$$

(28)

Then, the digital precoding \({\mathbf{F}}_{\mathrm{BB}}^k\) should fall in the null space of \({{{{\tilde{\mathbf{H }}}}}_k}\). Taking the SVD of \({{{{\tilde{\mathbf{H }}}}}_k}\) gives

$$\begin{aligned} {{{{\tilde{\mathbf{H }}}}}_k} = {{{{\tilde{\mathbf{U }}}}}_k}{{{{\tilde{\varvec{\Sigma }}}}}_k}{\left[ {{{{\tilde{\mathbf{V }}}}}_k^{(1)},{{{\tilde{\mathbf{V }}}}}_k^{(0)}} \right] ^H}, \end{aligned}$$

(29)

where \({{{{\tilde{\mathbf{U }}}}}_k}\) and \({{{{\tilde{\varvec{\Sigma }}}}}_k}\) are the matrix of left singular vectors and the diagonal singular value matrix of \({{{{\tilde{\mathbf{H }}}}}_k}\), respectively. \({{{\tilde{\mathbf{V }}}}}_k^{(1)} = {{{{\tilde{\mathbf{V }}}}}_k}(:,1:(K - 1){N_s})\) and \({{{\tilde{\mathbf{V }}}}}_k^{(0)} = {{{{\tilde{\mathbf{V }}}}}_k}(:,(K - 1){N_s} + 1:end)\) are orthogonal bases of the row space and the null space of \({{{{\tilde{\mathbf{H }}}}}_k}\), respectively. Then we have

$$\begin{aligned} {{{{{\tilde{\mathbf{H }}}}}}_k}{{{\tilde{\mathbf{V }}}}}_k^{(0)}&= {{{{{\tilde{\mathbf{U }}}}}}_k}{{{{{\tilde{\varvec{\Sigma }}} }}}_k}{\left[ {{{{\tilde{\mathbf{V }}}}}_k^{(1)},{{{\tilde{\mathbf{V }}}}}_k^{(0)}} \right] ^H}{{{\tilde{\mathbf{V }}}}}_k^{(0)}\\&= {{{{{\tilde{\mathbf{U }}}}}}_k}{{{{{\tilde{\varvec{\Sigma }}} }}}_k}{({{{\tilde{\mathbf{V }}}}}_k^{(1)})^H}{{{\tilde{\mathbf{V }}}}}_k^{(0)}\\&= 0.\end{aligned}$$

(30)

The matrix \({{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k}}{{{\tilde{\mathbf{V }}}}}_k^{(0)}\) is called the equivalent channel. Its SVD gives

$$\begin{aligned} {{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k}}{{{\tilde{\mathbf{V }}}}}_k^{(0)} = {{\mathbf{U}}_k}{{\mathbf{S}}_k}{\left[ {{\mathbf{V}}_k^{(1)},{\mathbf{V}}_k^{(0)}} \right] ^H}, \end{aligned}$$

(31)

where \({{\mathbf{S}}_k}\) is the diagonal singular value matrix of the equivalent channel \({{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k}}{{{\tilde{\mathbf{V }}}}}_k^{(0)}\). To eliminate inter-user interference, we take the \({\mathbf{V}}_k^{(1)}\) corresponding to the nonzero singular values as the precoding matrix, so the final digital precoding matrix is given by

$$\begin{aligned} {\mathbf{F}}_{\mathrm{BB}}^k = {{{\tilde{\mathbf{V }}}}}_k^{(0)}{\mathbf{V}}_k^{(1)}. \end{aligned}$$

(32)
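The digital precoding steps (26) through (32) can be sketched in numpy as follows. Random matrices stand in for the effective channels \({{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k}}\), all sizes are hypothetical, and the null-space split index is taken as the generic rank \((K-1)N_r\) of the stacked matrix; the assertions confirm that inter-user interference (27) is numerically zero:

```python
import numpy as np

rng = np.random.default_rng(4)
K, Nr, Nt_eff, Ns = 3, 2, 8, 2   # users, rx antennas per user, effective tx dims, streams (made-up)

# Random stand-ins for the effective channels H_int,k = H_k F_RF of (26).
H_int = [rng.standard_normal((Nr, Nt_eff)) + 1j * rng.standard_normal((Nr, Nt_eff))
         for _ in range(K)]

F_BB = []
for k in range(K):
    # (28): stack the other users' effective channels.
    H_tilde = np.vstack([H_int[j] for j in range(K) if j != k])
    # (29): the trailing right singular vectors span the null space of H_tilde.
    _, _, Vh = np.linalg.svd(H_tilde)
    V0 = Vh.conj().T[:, (K - 1) * Nr:]          # generic rank of H_tilde is (K-1)*Nr
    # (31)-(32): precode along the dominant directions of the equivalent channel.
    _, _, Vh_eq = np.linalg.svd(H_int[k] @ V0)
    V1 = Vh_eq.conj().T[:, :Ns]
    F_BB.append(V0 @ V1)

# (27): each user's precoder lies in the null space of every other user's channel.
for k in range(K):
    for j in range(K):
        if j != k:
            assert np.linalg.norm(H_int[j] @ F_BB[k]) < 1e-9
```

Because \(\mathbf{V}_k^{(0)}\) already nulls the other users, the inner SVD only has to maximize the *k*th user's own rate over the remaining dimensions.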

There are two types of BD algorithms: average power allocation and water-filling power allocation. Since the transmission capacity of each channel is usually different, average power allocation wastes communication resources and may even reduce capacity. In the water-filling method, each user's channel is divided into *N* independent sub-channels, each of which can be treated as a channel of bandwidth *B*. According to the Shannon formula, the capacity of the *k*th sub-channel is

$$\begin{aligned} C(k) = B{\log _2}\left( 1 + {\left| {{f_k}} \right| ^2}\frac{{{p_k}}}{{{n_0}}}\right) , \end{aligned}$$

(33)

where \({{p_k}}\), \(\left| {{f_k}} \right|\), and \({{n_0}}\) are the transmission power, frequency response, and noise power of the *k*th sub-channel, respectively. When *N* is large enough, the SNR of each sub-channel can be regarded as constant. With the channel SNRs known, we can assign different powers to the different sub-channels to achieve the maximum sum rate. Therefore, the maximum sum capacity can be expressed as

$$\begin{aligned} &{\mathrm{max}}\; C = \sum \limits _{k = 1}^N {B{\log _2}\left( 1 + {{\left| {{f_k}} \right| }^2}\frac{{{p_k}}}{{{n_0}}}\right) } \\ & s.t.\left\{ {\begin{array}{*{20}{l}} {\sum \limits _{k = 1}^N {{p_k} = {P_N}} }\\ {{p_k} \ge 0\;(k = 1,2, \ldots ,N)} \end{array},} \right.\end{aligned}$$

(34)

where \({{P_N}}\) is the total power. According to the Lagrangian multiplier algorithm, the power \({{p_k}}\) is:

$$\begin{aligned} {p_k} = \frac{B}{\lambda } - \frac{{{n_0}}}{{{{\left| {{f_k}} \right| }^2}}}, \end{aligned}$$

(35)

where \(\lambda\) is the Lagrangian multiplier and \(\frac{B}{\lambda }\) is called the water level of the water-filling principle.
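A minimal water-filling sketch, with a hypothetical helper `water_fill` that bisects on the water level \(B/\lambda\) directly (clipping \(p_k\) at zero so weak sub-channels can be switched off):

```python
import numpy as np

def water_fill(gains, P_total, B=1.0, n0=1.0):
    """Water-filling: p_k = max(B/lambda - n0/|f_k|^2, 0), with the water level
    B/lambda found by bisection so that sum(p_k) == P_total."""
    floor = n0 / np.asarray(gains, dtype=float)   # per-sub-channel floor n0/|f_k|^2
    lo, hi = 0.0, floor.max() + P_total           # bracket that contains the water level
    for _ in range(200):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - floor, 0.0).sum() > P_total:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.5 * (lo + hi) - floor, 0.0)

gains = np.array([2.0, 1.0, 0.25])   # |f_k|^2 for three sub-channels (made-up values)
p = water_fill(gains, P_total=3.0)

assert np.isclose(p.sum(), 3.0)      # total power constraint met
assert p[0] >= p[1] >= p[2]          # stronger sub-channels receive at least as much power
```

For these made-up gains the weakest sub-channel sits below the water level and receives zero power, which is exactly the behavior the clipped closed form (35) describes.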

The water-filling principle attains the theoretical maximum sum rate and yields better communication quality, and is therefore widely used. The whole process of the proposed algorithm is shown in Table 1.