
A fast and low-complexity matrix inversion scheme based on CSM method for massive MIMO systems


Massive multiple-input multiple-output (MIMO), also known as very-large MIMO, is an attractive technique for 5G and can provide higher rates and power efficiency than 4G. Linear precoding schemes can achieve near-optimal performance and are therefore more attractive than nonlinear precoding schemes. However, conventional linear precoding schemes in massive MIMO systems, such as regularized zero-forcing (RZF) precoding, suffer from high computational complexity because they require the inversion of a large matrix. To solve this problem, we exploit the asymptotically orthogonal channel property of massive MIMO systems and combine the Cholesky decomposition with the Sherman-Morrison lemma, proposing a CSM (Cholesky and Sherman-Morrison)-based precoding scheme for the matrix inversion. Results are evaluated numerically in terms of bit error rate (BER) and average sum rate. Compared with the Neumann series approximation of the matrix inverse, CSM-based precoding achieves better performance with fewer operations in massive MIMO configurations.

1 Introduction

Massive multiple-input multiple-output (MIMO), i.e., MIMO with large numbers of transmit and/or receive antennas, is widely accepted as one of the key enabling technologies for next-generation (e.g., 5G) wireless communication systems [1, 2]. However, as the number of dimensions grows, conventional MIMO algorithms may no longer be computationally efficient, and new methods are needed.

Realizing massive MIMO systems in practice requires solving several challenges, one of which is finding a low-complexity, near-optimal precoding scheme. Conventional precoding methods can be divided into nonlinear and linear precoding. The optimal scheme is nonlinear dirty paper coding (DPC) [3], which can effectively eliminate the interference between users and achieve optimal performance. However, the most serious drawback of nonlinear precoding schemes is their high complexity, which makes them unfriendly to hardware implementation. Other nonlinear precoding schemes, such as lattice-aided precoding [4], can achieve near-optimal capacity with reduced complexity, but they remain unaffordable when the dimension of the MIMO system is large or the modulation order is high. Fortunately, owing to characteristics of massive MIMO systems such as the asymptotic orthogonality of the columns of the channel matrix [5] and channel hardening [6], linear precoding schemes such as RZF and MMSE precoding can also achieve near-optimal capacity, offering a better trade-off between complexity and performance. However, these schemes require the inversion of a large matrix. To address this problem, [7] proposed Neumann-based precoding, built on the Neumann series approximation, which reduces complexity by converting the matrix inversion into a series of matrix-vector multiplications. Subsequently, [8–10] proposed SOR-based, LSQR-based and TPE-based schemes, respectively, all of which are based on the Neumann series. However, the complexity reduction achieved by these algorithms is limited, and they do not exploit the fact that the matrix to be inverted is positive definite Hermitian.

In this paper, we propose CSM-based precoding to reduce the complexity of the matrix inversion in classical RZF and MMSE precoding. This is motivated by the fact that the matrix to be inverted in RZF or MMSE precoding is positive definite Hermitian and tends to be diagonally dominant in massive MIMO systems [1], which makes it possible to use the Cholesky decomposition [11] and the Sherman-Morrison lemma [12]. We also show that CSM-based precoding achieves better performance than Neumann-based and SOR-based precoding at a computational complexity no higher than either. The average sum rate and BER of the algorithms are evaluated and compared via numerical simulations.

Notation: lower-case and upper-case boldface letters denote vectors and matrices, respectively; \((\cdot)^T\), \((\cdot)^H\), \((\cdot)^{-1}\), \(\det(\cdot)\) and \(tr(\cdot)\) denote the transpose, conjugate transpose, matrix inverse, determinant and trace, respectively; \(\mathbb{C}\) denotes the set of complex numbers, and \(\boldsymbol{I}_N\) is the N×N identity matrix.

2 System model

We consider the typical downlink transmission of a massive multi-user MIMO system, where the base station (BS) in each cell is equipped with N antennas and communicates with K single-antenna users (K ≪ N) [1]. We also assume that the BS has perfect channel state information (CSI), which is a common assumption in massive MIMO systems [13, 14]; in practice, CSI can be acquired through pilot training [15]. More importantly, we assume that a time division duplex (TDD) protocol is used, so that the uplink and downlink channels are reciprocal. In TDD massive MIMO systems, the BS estimates the uplink channel from the pilots that the users send in the uplink, and the downlink CSI then follows directly from channel reciprocity.

In the downlink, the received signal at the ith user in the jth cell is

$$ {y_{j,i}} = \boldsymbol{h}_{j,i} \cdot \boldsymbol{x}_{j} + n_{j,i} $$

where \(\boldsymbol{x}_{j}\) is the transmit signal of the jth cell after precoding, \(\boldsymbol{h}_{j,i}\) is the random channel vector from the BS of the jth cell to its ith user, and \(n_{j,i}\) is additive white Gaussian noise (AWGN) with distribution \(CN(0,{\sigma _{n}^{2}})\). The channel vector from the BS of the jth cell to the ith user can be written as

$$ \boldsymbol{h}_{\boldsymbol{j,i}} = {\kappa_{j,i}}\boldsymbol{R}_{\boldsymbol{j,i}}^{1/2}{\boldsymbol{z}_{\boldsymbol{j,i}}} $$

where \(\boldsymbol{z}_{j,i}\) is the small-scale fading vector, whose entries are independent and identically distributed (i.i.d.) zero-mean circularly-symmetric complex Gaussian random variables \(CN(0,1)\); \(\kappa_{j,i}\) is the large-scale fading coefficient of the ith user, which accounts for path loss and shadow fading, changes slowly, remains constant over a coherence interval and is known a priori; and \(\boldsymbol{R}_{j,i}\) is the channel covariance matrix.

Thus, the received vector in the jth cell, denoted by \(\boldsymbol{y}_{j} = [y_{j,1}, y_{j,2}, \ldots, y_{j,K}]^T \in \mathbb{C}^{K\times 1}\), is given by

$$ \boldsymbol{y}_{\boldsymbol{j}} = \boldsymbol{H}_{\boldsymbol{j}} \cdot \boldsymbol{x}_{\boldsymbol{j}} + \boldsymbol{n}_{\boldsymbol{j}} $$

where \(\boldsymbol{H}_{j} \in \mathbb{C}^{K\times N}\) is the downlink channel matrix, whose ith row corresponds to the channel vector \(\boldsymbol{h}_{j,i}\) of the ith user (i = 1, ..., K), and which therefore contains the small-scale fading, the large-scale fading coefficients and the channel covariance. \(\boldsymbol{x}_{j} \in \mathbb{C}^{N\times 1}\) is the signal vector after precoding, which satisfies the power constraint \(E[\left \| \boldsymbol {x}_{\boldsymbol {j}} \right \|_{2}^{2}] \le K\), and \(\boldsymbol{n}_{j} \in \mathbb{C}^{K\times 1}\) is the AWGN vector with distribution \(CN(\boldsymbol{0},{\sigma _{n}^{2}}\boldsymbol{I}_{K})\).
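The stacked model above can be simulated directly. The following Python sketch is a minimal illustration, not part of the original method: it assumes uncorrelated Rayleigh fading (\(\boldsymbol{R}_{j,i} = \boldsymbol{I}_N\)) with the same κ for every user, and the helpers `cn`, `rayleigh_channel` and `receive` are hypothetical names. Only the Python standard library is used.

```python
import math
import random


def cn(var=1.0):
    """Draw one circularly-symmetric complex Gaussian sample from CN(0, var)."""
    s = math.sqrt(var / 2)  # real and imaginary parts each have variance var/2
    return complex(random.gauss(0.0, s), random.gauss(0.0, s))


def rayleigh_channel(K, N, kappa=1.0):
    """K x N channel matrix H_j whose row i is kappa * z_{j,i}.

    Assumption: R_{j,i} = I_N (no spatial correlation) and one kappa for all users.
    """
    return [[kappa * cn() for _ in range(N)] for _ in range(K)]


def receive(H, x, noise_var):
    """Received vector y_j = H_j x_j + n_j with n_j ~ CN(0, noise_var * I_K)."""
    return [sum(h * xn for h, xn in zip(row, x)) + cn(noise_var) for row in H]


random.seed(1)
K, N = 4, 64
H = rayleigh_channel(K, N)
x = [cn(K / N) for _ in range(N)]  # hypothetical precoded vector with E||x||^2 = K
y = receive(H, x, noise_var=0.1)
```

Each entry of `x` has variance K/N, so the expected transmit energy matches the constraint \(E[\|\boldsymbol{x}_j\|_2^2] \le K\) with equality.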

For massive MIMO systems, linear precoding is usually adopted, so we have

$$ \boldsymbol{x}_{\boldsymbol{j}} = \boldsymbol{T}_{\boldsymbol{j}} \cdot \boldsymbol{s}_{\boldsymbol{j}}, $$

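where \(\boldsymbol{T}_{j} = [\boldsymbol{t}_{j,1}, \boldsymbol{t}_{j,2}, \ldots, \boldsymbol{t}_{j,K}] \in \mathbb{C}^{N\times K}\) is the precoding matrix and \(\boldsymbol{s}_{j} = [s_{j,1}, s_{j,2}, \ldots, s_{j,K}]^T \in \mathbb{C}^{K\times 1}\) is the transmitted signal vector for the K users. We also denote the total transmit power constraint at the BS in the jth cell as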

$$ tr\left({{\boldsymbol{T}_{j}} \cdot \boldsymbol{T}_{j}^{H}} \right) = {P_{j}}, $$

where \(P_j\) is the total transmit power in the jth cell.

In the next section, we analyze the classic massive MIMO precoding methods (i.e., RZF and MMSE precoding), as well as the proposed CSM-based scheme.

3 Low-complexity linear-precoding scheme in massive MIMO

In this section, we first give the basic background of conventional RZF and MMSE precoding. Then, we review the Cholesky decomposition [11] and the Sherman-Morrison lemma [12]. After that, we compare the computational complexity and performance of CSM-based precoding with those of Neumann-based and SOR-based precoding. Finally, we give the pseudo code of the CSM-based precoding algorithm.

3.1 Conventional RZF and MMSE precoding

The conventional RZF precoding and MMSE precoding matrix can be expressed, respectively, as [6]:

$$ \boldsymbol{T}_{\boldsymbol{RZF}} = {\rho_{RZF}} \cdot {\boldsymbol{H}^{H}} \cdot {(\boldsymbol{H} \cdot {\boldsymbol{H}^{H}} + \phi \cdot {\boldsymbol{I}_{K}})^{- 1}} $$
$$ \boldsymbol{T}_{\boldsymbol{MMSE}} = {\rho_{MMSE}} \cdot {\boldsymbol{H}^{H}} \cdot {(\boldsymbol{H} \cdot {\boldsymbol{H}^{H}} + \left({{{\sigma_{n}^{2}}} \cdot {n_{t}}} \right) \cdot {\boldsymbol{I}_{K}})^{- 1}} $$

Thus, the classical RZF and MMSE precoding matrices in the jth cell are

$$ \boldsymbol{T}_{\boldsymbol{RZF,j}} = \rho_{RZF,j} \cdot \boldsymbol{H}_{\boldsymbol{j}}^{H} \cdot {(\boldsymbol{H}_{\boldsymbol{j}} \cdot {\boldsymbol{H}_{\boldsymbol{j}}^{H}} + \phi \cdot {\boldsymbol{I}_{K}})^{- 1}} $$
$$ = \rho_{RZF,j} \cdot \boldsymbol{H}_{\boldsymbol{j}}^{H} \cdot \boldsymbol{W}_{\boldsymbol{RZF,j}}^{- 1} $$
$$ \boldsymbol{T}_{\boldsymbol{MMSE,j}} = \rho_{MMSE,j} \cdot \boldsymbol{H}_{\boldsymbol{j}}^{H} \cdot {(\boldsymbol{H}_{\boldsymbol{j}} \cdot {\boldsymbol{H}_{\boldsymbol{j}}^{H}} + \left({{{\sigma_{n}^{2}}} \cdot {n_{t}}} \right) \cdot {\boldsymbol{I}_{K}})^{- 1}} $$
$$ = \rho_{MMSE,j} \cdot \boldsymbol{H}_{\boldsymbol{j}}^{H} \cdot \boldsymbol{W}_{\boldsymbol{MMSE,j}}^{- 1} $$

where \({\boldsymbol {W}_{\boldsymbol {RZF,j}}} = {({\boldsymbol {H}_{\boldsymbol {j}}} \cdot \boldsymbol {H}_{\boldsymbol {j}}^{H} + \phi \cdot {\boldsymbol {I}_{\boldsymbol {K}}})}\) and \({\boldsymbol {W}_{\boldsymbol {MMSE,j}}} = {({\boldsymbol {H}_{\boldsymbol {j}}} \cdot \boldsymbol {H}_{\boldsymbol {j}}^{H} + \left ({{{\sigma _{n}^{2}}} \cdot {n_{t}}} \right) \cdot {\boldsymbol {I}_{\boldsymbol {K}}})}\). Here, ϕ is the regularization parameter, which can be adaptively selected according to the CSI, \({\sigma _{n}^{2}}\) is the noise power and \(n_t\) is the number of transmit antennas [6, 16]. The factors \(\rho_{RZF,j}\) and \(\rho_{MMSE,j}\) are the power normalization factors that make RZF and MMSE precoding satisfy the power constraint, and they can be computed by

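$$ \rho_{RZF,j} = \sqrt{\frac{K}{tr\left(\boldsymbol{H}_{\boldsymbol{j}}^{H} \cdot \boldsymbol{W}_{\boldsymbol{RZF,j}}^{- 1} \cdot \boldsymbol{W}_{\boldsymbol{RZF,j}}^{- 1} \cdot \boldsymbol{H}_{\boldsymbol{j}}\right)}} $$
$$ \rho_{MMSE,j} = \sqrt{\frac{K}{tr\left(\boldsymbol{H}_{\boldsymbol{j}}^{H} \cdot \boldsymbol{W}_{\boldsymbol{MMSE,j}}^{- 1} \cdot \boldsymbol{W}_{\boldsymbol{MMSE,j}}^{- 1} \cdot \boldsymbol{H}_{\boldsymbol{j}}\right)}} $$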
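For reference, the conventional RZF precoder with an exact inverse can be sketched as below. This is an illustrative baseline under assumptions of our own: Gauss-Jordan elimination stands in for direct inversion, the symbols are assumed to have unit power (\(E[\boldsymbol{s}\boldsymbol{s}^H] = \boldsymbol{I}_K\)), and ρ is chosen so that \(tr(\boldsymbol{T}\boldsymbol{T}^H) = K\), which gives \(E[\|\boldsymbol{x}\|_2^2] = K\). The function names are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Complex matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def herm(A):
    """Conjugate transpose A^H."""
    return [[v.conjugate() for v in col] for col in zip(*A)]


def inverse(A):
    """Exact inverse by Gauss-Jordan elimination with partial pivoting (O(K^3))."""
    n = len(A)
    M = [list(row) + [1.0 + 0j if i == j else 0j for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def rzf_precoder(H, phi):
    """T = rho * H^H (H H^H + phi I_K)^{-1}, with rho chosen so that tr(T T^H) = K."""
    K = len(H)
    HH = herm(H)
    W = matmul(H, HH)
    for k in range(K):
        W[k][k] += phi  # W_RZF = H H^H + phi I_K
    T = matmul(HH, inverse(W))
    power = sum(abs(v) ** 2 for row in T for v in row)  # tr(T T^H) before scaling
    rho = math.sqrt(K / power)
    return [[rho * v for v in row] for row in T]


random.seed(2)
K, N, snr = 4, 64, 10.0
H = [[complex(random.gauss(0, 0.5 ** 0.5), random.gauss(0, 0.5 ** 0.5)) for _ in range(N)]
     for _ in range(K)]
T = rzf_precoder(H, phi=K / snr)  # phi = K/SNR as in the simulation section
```

The explicit K×K inverse inside `rzf_precoder` is the expensive step that the CSM-based scheme replaces.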

Dropping the cell index j for brevity, the precoded signal vector x can then be written as

$$ \boldsymbol{x} = {\boldsymbol{T}_{\boldsymbol{RZF}}} \cdot \boldsymbol{s} = {\rho} \cdot {\boldsymbol{H}^{H}} \cdot {\boldsymbol{W}_{\boldsymbol{RZF}}^{- 1}} \cdot \boldsymbol{s} $$


$$ \boldsymbol{x} = {\boldsymbol{T}_{\boldsymbol{MMSE}}} \cdot \boldsymbol{s} = {\rho} \cdot {\boldsymbol{H}^{H}} \cdot {\boldsymbol{W}_{\boldsymbol{MMSE}}^{- 1}} \cdot \boldsymbol{s} $$

Both expressions require the inversion of the K×K matrix \(\boldsymbol{W}_{RZF}\) or \(\boldsymbol{W}_{MMSE}\), whose computational complexity is undesirably high for large K; computing this inverse directly would consume substantial hardware resources. Therefore, by combining the properties of \(\boldsymbol{W}_{RZF}\) and \(\boldsymbol{W}_{MMSE}\) with the Cholesky decomposition [11] and the Sherman-Morrison lemma [12], we design a low-complexity precoding scheme to solve this problem.

3.2 CSM-based precoding

To reduce the computational complexity of precoding, we propose a CSM-based scheme that avoids the complicated inversion of a large matrix in RZF or MMSE precoding. First, we verify that \(\boldsymbol{W}_{RZF}\) and \(\boldsymbol{W}_{MMSE}\) are positive definite Hermitian matrices. For an arbitrary nonzero row vector \(\boldsymbol{t} \in \mathbb{C}^{1\times K}\), we have

$$ \boldsymbol{t} \cdot \boldsymbol{W}_{\boldsymbol{RZF}} \cdot {\boldsymbol{t}^{H}} = \boldsymbol{t} \cdot (\boldsymbol{H} \cdot {\boldsymbol{H}^{H}} + \phi \cdot {\boldsymbol{I}_{K}}) \cdot {\boldsymbol{t}^{H}} $$
$$ = \boldsymbol{t} \cdot \boldsymbol{H} \cdot {(\boldsymbol{t} \cdot \boldsymbol{H})^{H}} + \boldsymbol{t} \cdot (\phi \cdot {\boldsymbol{I}_{K}}) \cdot {\boldsymbol{t}^{H}} > 0 $$


$$ \boldsymbol{t} \cdot \boldsymbol{W}_{\boldsymbol{MMSE}} \cdot {\boldsymbol{t}^{H}} = \boldsymbol{t} \cdot (\boldsymbol{H} \cdot {\boldsymbol{H}^{H}} + \left({{{\sigma_{n}^{2}}} \cdot {n_{t}}} \right) \cdot {\boldsymbol{I}_{K}}) \cdot {\boldsymbol{t}^{H}} $$
$$ = \boldsymbol{t} \cdot \boldsymbol{H} \cdot {(\boldsymbol{t} \cdot \boldsymbol{H})^{H}} + \boldsymbol{t} \cdot (\left({{{\sigma_{n}^{2}}} \cdot {n_{t}}} \right) \cdot {\boldsymbol{I}_{K}}) \cdot {\boldsymbol{t}^{H}} > 0. $$


$$ {\boldsymbol{W}_{\boldsymbol{RZF}}^{H}} = {(\boldsymbol{H} \cdot {\boldsymbol{H}^{H}} + \phi \cdot {\boldsymbol{I}_{K}})^{H}} = \boldsymbol{W}_{\boldsymbol{RZF}} $$


$$ {\boldsymbol{W}_{\boldsymbol{MMSE}}^{H}} = {(\boldsymbol{H} \cdot {\boldsymbol{H}^{H}} + \left({{{\sigma_{n}^{2}}} \cdot {n_{t}}} \right) \cdot {\boldsymbol{I}_{K}})^{H}} = \boldsymbol{W}_{\boldsymbol{MMSE}}. $$

Hence, \(\boldsymbol{W}_{RZF}\) and \(\boldsymbol{W}_{MMSE}\) are positive definite Hermitian matrices. Because RZF and MMSE precoding differ only in the regularization term, their decomposition and inversion processes are the same, so we use RZF precoding as an example to demonstrate the CSM-based scheme. Using the Cholesky decomposition [11], we decompose \(\boldsymbol{W}_{RZF}\) as

$$ \boldsymbol{W}_{\boldsymbol{RZF}} = \boldsymbol{L} \cdot {\boldsymbol{L}^{H}}, $$

where L is a lower triangular matrix with real positive diagonal entries. It then follows that

$$ {\boldsymbol{W}_{\boldsymbol{RZF}}^{- 1}} = {({\boldsymbol{L}^{H}})^{- 1}} \cdot {\boldsymbol{L}^{- 1}}. $$
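The factorization \(\boldsymbol{W}_{RZF} = \boldsymbol{L}\boldsymbol{L}^H\) can be computed with the textbook Cholesky-Banachiewicz recursion, sketched below in Python. This is a generic software illustration, not the FPGA implementation referred to in [17], and `cholesky` is a hypothetical helper name. Because the recursion succeeds (all pivots positive) only when the Hermitian input is positive definite, it also serves as a numerical check of the property proved above.

```python
import math


def cholesky(W):
    """Return lower-triangular L with real positive diagonal such that W = L L^H.

    Raises ValueError if the Hermitian matrix W is not positive definite.
    """
    K = len(W)
    L = [[0j] * K for _ in range(K)]
    for i in range(K):
        for j in range(i + 1):
            # s = W[i][j] - sum_{m<j} L[i][m] * conj(L[j][m])
            s = W[i][j] - sum(L[i][m] * L[j][m].conjugate() for m in range(j))
            if i == j:
                if s.real <= 0:  # a non-positive pivot means W is not positive definite
                    raise ValueError("matrix is not positive definite")
                L[i][i] = complex(math.sqrt(s.real), 0.0)
            else:
                L[i][j] = s / L[j][j]
    return L


# An arbitrary diagonally dominant Hermitian example standing in for W_RZF.
W = [[4 + 0j, 1 - 1j, 0.5j],
     [1 + 1j, 5 + 0j, 1 + 0j],
     [-0.5j, 1 + 0j, 6 + 0j]]
L = cholesky(W)
```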

Thus, computing the inverse of \(\boldsymbol{W}_{RZF}\) reduces to computing the inverse of L, since \(({\boldsymbol{L}^{H}})^{-1} = ({\boldsymbol{L}^{-1}})^{H}\). We compute \(\boldsymbol{L}^{-1}\) by applying the Sherman-Morrison lemma [12] iteratively, which we introduce next.

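Suppose \(\boldsymbol{A}\) is an invertible square matrix and \(\boldsymbol{x}\), \(\boldsymbol{y}\) are column vectors. Suppose furthermore that \(1 + \boldsymbol{y}^H \cdot \boldsymbol{A}^{-1} \cdot \boldsymbol{x} \neq 0\) and that \(\boldsymbol{A} + \boldsymbol{x} \cdot \boldsymbol{y}^H\) is invertible. Then, the Sherman-Morrison formula states that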

$$ {\left(\boldsymbol{A} + \boldsymbol{x} \cdot {\boldsymbol{y}^{H}}\right)^{- 1}} = {\boldsymbol{A}^{- 1}} - \frac{{{\boldsymbol{A}^{- 1}} \cdot \boldsymbol{x} \cdot {\boldsymbol{y}^{H}} \cdot {\boldsymbol{A}^{- 1}}}}{{1 + {\boldsymbol{y}^{H}} \cdot {\boldsymbol{A}^{- 1}} \cdot \boldsymbol{x}}}. $$
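The formula is easy to check numerically. The Python sketch below, with a hypothetical helper `sherman_morrison`, applies it to a diagonal \(\boldsymbol{A}\), whose inverse is trivial; the example data are arbitrary. Multiplying the result by \(\boldsymbol{A} + \boldsymbol{x}\boldsymbol{y}^H\) should give the identity.

```python
def sherman_morrison(A_inv, x, y):
    """Return (A + x y^H)^{-1} given A^{-1} and column vectors x, y (as lists)."""
    n = len(A_inv)
    Ax = [sum(A_inv[i][k] * x[k] for k in range(n)) for i in range(n)]              # A^{-1} x
    yA = [sum(y[k].conjugate() * A_inv[k][j] for k in range(n)) for j in range(n)]  # y^H A^{-1}
    denom = 1 + sum(y[k].conjugate() * Ax[k] for k in range(n))                     # 1 + y^H A^{-1} x
    if abs(denom) < 1e-12:
        raise ZeroDivisionError("1 + y^H A^{-1} x = 0: the updated matrix is singular")
    # Rank-one correction: A^{-1} - (A^{-1} x)(y^H A^{-1}) / denom
    return [[A_inv[i][j] - Ax[i] * yA[j] / denom for j in range(n)] for i in range(n)]


# Example: A = diag(2, 3, 4), so A^{-1} = diag(1/2, 1/3, 1/4).
d = [2.0, 3.0, 4.0]
A_inv = [[1 / d[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
x = [1 + 0j, 1j, 0j]
y = [0.5 + 0j, 0j, 1 + 0j]
B = [[(d[i] if i == j else 0) + x[i] * y[j].conjugate() for j in range(3)] for i in range(3)]  # A + x y^H
B_inv = sherman_morrison(A_inv, x, y)
```

Only matrix-vector products and one outer product are needed, which is what makes repeated rank-one updates cheaper than a fresh inversion.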

This lemma suggests that, instead of computing a complex matrix inverse directly, we can iteratively perform a sequence of simple rank-one updates, which reduces the computational complexity.

To keep the computational complexity low, we further decompose L as

$$ \boldsymbol{L} = \boldsymbol{D} + \boldsymbol{L}^{'}, $$

where \(\boldsymbol{D} = diag(l_{1,1}, l_{2,2}, \cdots, l_{K,K})\) is the diagonal part of L, and \({\boldsymbol {L}^{'}} = (\boldsymbol {l}_{1}^{'},\boldsymbol {l}_{2}^{'}, \cdot \cdot \cdot, \boldsymbol {l}_{K - 1}^{'},\boldsymbol{0})\) is obtained from L by setting its diagonal entries to zero, i.e., \(\boldsymbol{L}'\) is the strictly lower triangular part of L, and its last column is zero.

Based on this idea, we can compute the inverse of L. First, L is rewritten in the following form:

$$ \boldsymbol{L} = \boldsymbol{D} + {\boldsymbol{L}^{'}} $$
$$ = \boldsymbol{D} + \boldsymbol{l}_{1}^{'} \cdot {\boldsymbol{e}_{1}} + \boldsymbol{l}_{2}^{'} \cdot {\boldsymbol{e}_{2}} + \cdot \cdot \cdot + \boldsymbol{l}_{K - 1}^{'} \cdot {\boldsymbol{e}_{K - 1}} + \boldsymbol{0} \cdot {\boldsymbol{e}_{K}}, $$

where \(\boldsymbol{e}_i\) is the ith row of the identity matrix \(\boldsymbol{I}_K\) and \(\boldsymbol{l}_i'\) is the ith column of \(\boldsymbol{L}'\). Thus, the inverse can be computed by

$$ {\boldsymbol{L}^{- 1}} = {({\boldsymbol{F}_{K - 2}} + \boldsymbol{l}_{K - 1}^{'} \cdot {\boldsymbol{e}_{K - 1}})^{- 1}} $$
$$ = {({\boldsymbol{F}_{K - 2}})^{- 1}} - \frac{{{{({\boldsymbol{F}_{K - 2}})}^{- 1}} \cdot \boldsymbol{l}_{K - 1}^{'} \cdot {\boldsymbol{e}_{K - 1}} \cdot {{({\boldsymbol{F}_{K - 2}})}^{- 1}}}}{{1 + {\boldsymbol{e}_{K - 1}} \cdot {{({\boldsymbol{F}_{K - 2}})}^{- 1}} \cdot \boldsymbol{l}_{K - 1}^{'}}}, $$

where \({\boldsymbol {F}_{K - 2}} = \boldsymbol {D} + \sum \limits _{i = 1}^{K - 2} {\boldsymbol {l}_{i}^{'} \cdot {\boldsymbol {e}_{i}}}\). We then continue with the inverse of \(\boldsymbol{F}_{K-2}\):

$$ {\boldsymbol{F}_{K - 2}}^{- 1} = {({\boldsymbol{F}_{K - 3}} + \boldsymbol{l}_{K - 2}^{'} \cdot {\boldsymbol{e}_{K - 2}})^{- 1}} $$
$$ = {({\boldsymbol{F}_{K - 3}})^{- 1}} - \frac{{{{({\boldsymbol{F}_{K - 3}})}^{- 1}} \cdot \boldsymbol{l}_{K - 2}^{'} \cdot {\boldsymbol{e}_{K - 2}} \cdot {{({\boldsymbol{F}_{K - 3}})}^{- 1}}}}{{1 + {\boldsymbol{e}_{K - 2}} \cdot {{({\boldsymbol{F}_{K - 3}})}^{- 1}} \cdot \boldsymbol{l}_{K - 2}^{'}}}, $$

where \({\boldsymbol {F}_{K - 3}} = \boldsymbol {D} + \sum \limits _{i = 1}^{K - 3}{\boldsymbol {l}_{i}^{'} \cdot {\boldsymbol {e}_{i}}}\). Continuing in the same way, we finally obtain

$$ {\boldsymbol{F}_{1}}^{- 1} = {({\boldsymbol{F}_{0}} + \boldsymbol{l}_{1}^{'} \cdot {\boldsymbol{e}_{1}})^{- 1}} $$
$$ = {({\boldsymbol{F}_{0}})^{- 1}} - \frac{{{{({\boldsymbol{F}_{0}})}^{- 1}} \cdot \boldsymbol{l}_{1}^{'} \cdot {\boldsymbol{e}_{1}} \cdot {{({\boldsymbol{F}_{0}})}^{- 1}}}}{{1 + {\boldsymbol{e}_{1}} \cdot {{({\boldsymbol{F}_{0}})}^{- 1}} \cdot \boldsymbol{l}_{1}^{'}}}, $$

where \(\boldsymbol{F}_0 = \boldsymbol{D}\). Because D is diagonal, its inverse is obtained simply by taking the reciprocals of its diagonal entries, which are nonzero since the Cholesky factor L has a positive diagonal. Therefore, the whole inversion reduces to basic arithmetic and a simple iteration.

Finally, since \(\boldsymbol{F}_{K-1} = \boldsymbol{L}\), the inverse of L is obtained after K−1 iterations.
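The recursion from \(\boldsymbol{F}_0 = \boldsymbol{D}\) to \(\boldsymbol{F}_{K-1} = \boldsymbol{L}\) can be sketched as follows. This Python code is our illustration of the steps above rather than the authors' implementation; `csm_lower_inverse` is a hypothetical name and indices are 0-based. One observation used in the comments: because every \(\boldsymbol{F}_{k-1}\) is lower triangular, row k of its inverse is zero in every position where \(\boldsymbol{l}_k'\) can be nonzero, so \(\boldsymbol{e}_k \boldsymbol{F}_{k-1}^{-1} \boldsymbol{l}_k' = 0\) and each denominator equals 1; the code nevertheless evaluates it as written in the formula.

```python
def csm_lower_inverse(L):
    """Invert lower-triangular L = D + sum_k l'_k e_k with K-1 Sherman-Morrison updates.

    F_0 = D, F_k = F_{k-1} + l'_k e_k, and F_{K-1} = L (the last column of L' is zero).
    Indices are 0-based, so the update loop runs over k = 0, ..., K-2.
    """
    K = len(L)
    # F_0^{-1} = D^{-1}: reciprocals of the diagonal of L.
    F_inv = [[1 / L[i][i] if i == j else 0j for j in range(K)] for i in range(K)]
    for k in range(K - 1):
        # l'_k: column k of L below the diagonal; e_k: row k of I_K.
        l = [L[i][k] if i > k else 0j for i in range(K)]
        # u = F^{-1} l'_k (only entries k+1..K-1 of l'_k are nonzero).
        u = [sum(F_inv[i][m] * l[m] for m in range(k + 1, K)) for i in range(K)]
        # e_k F^{-1} is simply row k of F^{-1}.
        v = list(F_inv[k])
        # 1 + e_k F^{-1} l'_k; u[k] is 0 because F^{-1} stays lower triangular.
        denom = 1 + u[k]
        F_inv = [[F_inv[i][j] - u[i] * v[j] / denom for j in range(K)] for i in range(K)]
    return F_inv


# An arbitrary lower-triangular example with a nonzero diagonal.
L = [[2 + 0j, 0j, 0j, 0j],
     [1 - 1j, 3 + 0j, 0j, 0j],
     [0.5j, 2 + 0j, 1 + 0j, 0j],
     [1 + 0j, -1 + 0j, 0.5 + 0.5j, 4 + 0j]]
L_inv = csm_lower_inverse(L)
```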

3.3 Complexity

We measure computational complexity by the number of required complex multiplications, which dominate the total computational cost compared with other operations.

According to [17], the Cholesky decomposition can be computed quickly and accurately in hardware such as FPGAs, so we neglect its computational complexity. In the numerator \({{{({\boldsymbol {F}_{i-1}})}^{- 1}} \cdot \boldsymbol {l}_{i}^{'} \cdot {\boldsymbol {e}_{i}} \cdot {{({\boldsymbol {F}_{i-1}})}^{- 1}}}\) of each Sherman-Morrison step above, the column vector \({\boldsymbol {l}_{i}^{'}}\) and the row vector \(\boldsymbol{e}_i\) contain many zero entries (\(\boldsymbol{e}_i\) has only one nonzero entry). Exploiting sparse matrix-vector computation [18, 19], the complexity of each update term \(\frac {{{{({\boldsymbol {F}_{i - 1}})}^{- 1}} \cdot \boldsymbol {l}_{i}^{'} \cdot {\boldsymbol {e}_{i}} \cdot {{({\boldsymbol {F}_{i - 1}})}^{- 1}}}}{{1 + {\boldsymbol {e}_{i}} \cdot {{({\boldsymbol {F}_{i - 1}})}^{- 1}} \cdot \boldsymbol {l}_{i}^{'}}}\) is \(O(4K+1)\). Thus, after K−1 iterations, the complexity of computing \(\boldsymbol{L}^{-1}\) is \(O(4K^2-3K-1)\), and the computational complexity of CSM-based RZF precoding is \(O(4K^2)\).
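As a worked example of this count for the two user numbers simulated in Section 4 (K = 16 and K = 32); the \(K^3\) values are only the order-of-growth reference for direct inversion used in the text, not exact operation counts:

$$ (K-1)(4K+1) = 4K^2 - 3K - 1, \qquad K = 16:\ 15 \cdot 65 = 975 \ \text{vs.}\ 16^3 = 4096, \qquad K = 32:\ 31 \cdot 129 = 3999 \ \text{vs.}\ 32^3 = 32768. $$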

In [7], Neumann-based precoding approximates the inverse by the truncated series

$$ {\boldsymbol{W}^{- 1}} \approx \sum\limits_{n = 0}^{i - 1} {{{({\boldsymbol{I}_{K}} - \boldsymbol{D} \cdot \boldsymbol{W})}^{n}} \cdot \boldsymbol{D}}, $$

where i is the number of series terms and \(\boldsymbol {D} = diag(\frac {1}{w_{1,1}},\frac {1}{w_{2,2}}, \cdots,\frac {1}{w_{K,K}})\) contains the reciprocals of the diagonal entries of W (this D differs from the diagonal part of L in Section 3.2). When i ≥ 3, the complexity of Neumann-based precoding is \(O(K^3)\), so the complexity reduction relative to exact RZF precoding is limited. By contrast, the computational complexity of the proposed CSM-based precoding is \(O(K^2)\). When i = 2, the complexity of Neumann-based precoding drops to \(O(K^2)\), but its performance degrades considerably.
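For comparison, the truncated Neumann approximation can be sketched in a few lines of Python. The helper names are hypothetical and `terms` plays the role of i. The series converges only if the spectral radius of \(\boldsymbol{I}_K - \boldsymbol{D}\cdot\boldsymbol{W}\) is below 1, which the diagonal dominance of W in massive MIMO is meant to provide; the small test matrix is an arbitrary diagonally dominant Hermitian example. Every term is formed here with a general K×K product for simplicity, although the n = 1 term only needs a diagonal scaling, which is why the cost is \(O(K^2)\) for i = 2 and \(O(K^3)\) from i = 3 on.

```python
def matmul(A, B):
    """Complex matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def neumann_inverse(W, terms):
    """Approximate W^{-1} by sum_{n=0}^{terms-1} (I_K - D W)^n D with D = diag(1/w_kk)."""
    K = len(W)
    d = [1 / W[k][k] for k in range(K)]
    # E = I_K - D W (row k of W scaled by 1/w_kk).
    E = [[(1 if i == j else 0) - d[i] * W[i][j] for j in range(K)] for i in range(K)]
    term = [[d[i] if i == j else 0j for j in range(K)] for i in range(K)]  # n = 0: D
    total = [row[:] for row in term]
    for _ in range(1, terms):
        term = matmul(E, term)  # (I_K - D W)^n D, built from the previous term
        total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(total, term)]
    return total


def residual(approx, W):
    """Largest |entry| of approx * W - I, which equals the largest |entry| of (I_K - D W)^terms."""
    K = len(W)
    P = matmul(approx, W)
    return max(abs(P[i][j] - (1 if i == j else 0)) for i in range(K) for j in range(K))


W = [[4 + 0j, 1 - 1j, 0.5j],
     [1 + 1j, 5 + 0j, 1 + 0j],
     [-0.5j, 1 + 0j, 6 + 0j]]
errors = [residual(neumann_inverse(W, terms), W) for terms in (1, 2, 3)]
```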

In [8], SOR-based precoding, an improvement on the Neumann-based method, computes \(\boldsymbol{W}^{-1} \cdot \boldsymbol{s}\) by iterating

$$ {\boldsymbol{t}^{(i + 1)}} = {\left(\boldsymbol{D} + \omega \boldsymbol{L}\right)^{- 1}}\left(\omega \boldsymbol{s} + \left((1 - \omega)\boldsymbol{D} - \omega {\boldsymbol{L}^{H}}\right){\boldsymbol{t}^{(i)}}\right), $$

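where D and L are the diagonal part and the strictly lower triangular part of W, respectively, so that \(\boldsymbol{W} = \boldsymbol{D} + \boldsymbol{L} + \boldsymbol{L}^{H}\) (these D and L are not the Cholesky quantities of Section 3.2); \(\boldsymbol{t}^{(i+1)}\) denotes the result after the (i+1)th iteration; and ω is the relaxation parameter, which can be computed by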

$$ \omega = a \cdot {e^{- b \cdot (N/K)}} + c. $$

Here, a = 0.404, b = 0.323 and c = 1.035. Analyzing the computational complexity and performance of the SOR scheme, we find that with i = 3 iterations its sum-rate performance is near-optimal, but its BER performance is worse than that of CSM-based precoding, even though its complexity, \(O(4K^2)\), is the same as that of the CSM-based scheme. Therefore, CSM-based precoding has lower computational complexity than Neumann-based precoding and the same complexity order as SOR-based precoding, while achieving better performance than both.
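The SOR update can be implemented as a forward sweep in which already-updated entries supply the \(\boldsymbol{D} + \omega\boldsymbol{L}\) part. The Python sketch below is an illustration, not the code of [8]: `sor_omega` and `sor_solve` are hypothetical names, the start vector \(\boldsymbol{t}^{(0)} = \boldsymbol{0}\) is our assumption, and the ratio in the ω formula is taken as N/K (BS antennas over users).

```python
import math


def sor_omega(N, K, a=0.404, b=0.323, c=1.035):
    """Relaxation parameter omega = a * exp(-b * N/K) + c (ratio taken as N/K)."""
    return a * math.exp(-b * N / K) + c


def sor_solve(W, s, omega, iters):
    """Approximate t = W^{-1} s with SOR, where W = D + L + L^H is Hermitian.

    Each sweep solves (D + omega L) t_new = omega s + ((1 - omega) D - omega L^H) t_old
    by forward substitution, overwriting t in place.
    """
    K = len(W)
    t = [0j] * K  # t^{(0)} = 0 (assumption)
    for _ in range(iters):
        for k in range(K):
            new_part = sum(W[k][j] * t[j] for j in range(k))         # L t^{(i+1)}: already updated
            old_part = sum(W[k][j] * t[j] for j in range(k + 1, K))  # L^H t^{(i)}: previous sweep
            t[k] = (1 - omega) * t[k] + omega * (s[k] - new_part - old_part) / W[k][k]
    return t


W = [[4 + 0j, 1 - 1j, 0.5j],
     [1 + 1j, 5 + 0j, 1 + 0j],
     [-0.5j, 1 + 0j, 6 + 0j]]
s = [1 + 0j, 1j, -1 + 0j]
omega = sor_omega(N=256, K=16)  # about 1.037 for the 256 x 16 setup
t = sor_solve(W, s, omega, iters=3)
```

Because each sweep touches every entry of W once, one SOR iteration costs on the order of K^2 multiplications.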

3.4 Pseudo code of CSM-based precoding algorithm

In this part, we give the pseudo code of only the core of CSM-based precoding, namely the computation of the inverse of matrix W.
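Putting the steps of Section 3.2 together, the core of CSM-based RZF precoding can be sketched in Python: form \(\boldsymbol{W} = \boldsymbol{H}\boldsymbol{H}^H + \phi\boldsymbol{I}_K\), factor \(\boldsymbol{W} = \boldsymbol{L}\boldsymbol{L}^H\), invert L by K−1 Sherman-Morrison updates, then use \(\boldsymbol{W}^{-1} = (\boldsymbol{L}^{-1})^H\boldsymbol{L}^{-1}\) and \(\boldsymbol{x} = \rho\boldsymbol{H}^H\boldsymbol{W}^{-1}\boldsymbol{s}\). This is our reconstruction for illustration, not the authors' pseudo code; all function names are hypothetical, the symbols are assumed to have unit power, and ρ is set so that \(tr(\boldsymbol{T}\boldsymbol{T}^H) = K\).

```python
import math
import random


def matmul(A, B):
    """Complex matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def herm(A):
    """Conjugate transpose A^H."""
    return [[v.conjugate() for v in col] for col in zip(*A)]


def cholesky(W):
    """Lower-triangular L with W = L L^H (W Hermitian positive definite)."""
    K = len(W)
    L = [[0j] * K for _ in range(K)]
    for i in range(K):
        for j in range(i + 1):
            s = W[i][j] - sum(L[i][m] * L[j][m].conjugate() for m in range(j))
            L[i][j] = complex(math.sqrt(s.real), 0.0) if i == j else s / L[j][j]
    return L


def csm_lower_inverse(L):
    """L^{-1} from F_0 = D via K-1 Sherman-Morrison updates F_k = F_{k-1} + l'_k e_k."""
    K = len(L)
    F_inv = [[1 / L[i][i] if i == j else 0j for j in range(K)] for i in range(K)]
    for k in range(K - 1):
        u = [sum(F_inv[i][m] * L[m][k] for m in range(k + 1, K)) for i in range(K)]  # F^{-1} l'_k
        v = list(F_inv[k])                                                            # e_k F^{-1}
        denom = 1 + u[k]                                                              # 1 + e_k F^{-1} l'_k
        F_inv = [[F_inv[i][j] - u[i] * v[j] / denom for j in range(K)] for i in range(K)]
    return F_inv


def csm_rzf_precode(H, s, phi):
    """x = rho * H^H W^{-1} s with W = H H^H + phi I_K inverted by the CSM steps."""
    K = len(H)
    HH = herm(H)
    W = matmul(H, HH)
    for k in range(K):
        W[k][k] += phi
    L_inv = csm_lower_inverse(cholesky(W))
    W_inv = matmul(herm(L_inv), L_inv)  # W^{-1} = (L^H)^{-1} L^{-1} = (L^{-1})^H L^{-1}
    T = matmul(HH, W_inv)
    rho = math.sqrt(K / sum(abs(v) ** 2 for row in T for v in row))  # tr(T T^H) = K after scaling
    x = [rho * sum(T[n][k] * s[k] for k in range(K)) for n in range(len(T))]
    return x, W, W_inv


random.seed(3)
K, N, snr = 4, 32, 10.0
H = [[complex(random.gauss(0, 0.5 ** 0.5), random.gauss(0, 0.5 ** 0.5)) for _ in range(N)]
     for _ in range(K)]
s = [1 + 0j, -1 + 0j, 1j, -1j]  # hypothetical QPSK-like symbols
x, W, W_inv = csm_rzf_precode(H, s, phi=K / snr)
```

The Cholesky step omits a positive-definiteness check because W with φ > 0 is positive definite, as shown in Section 3.2.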

4 Simulation result

We provide simulation results for the average sum rate and BER of the proposed CSM-based precoding in a 256×16 and a 256×32 massive MIMO system. For convenience, we set the regularization parameter ϕ = K/SNR. RZF precoding with exact matrix inversion serves as the benchmark. We compare the performance of RZF precoding, the CSM-based scheme, Neumann-based precoding and SOR-based precoding in a single cell.

Figure 1 compares the average sum rate of RZF precoding, the CSM-based scheme, Neumann-based precoding and SOR-based precoding. The figure shows that the CSM-based scheme performs as well as RZF precoding and better than both SOR-based and Neumann-based precoding. As i increases, e.g., to i = 3, at which the computational complexities of Neumann-based and SOR-based precoding are \(O(K^3)\) and \(O(4K^2)\), respectively, the performance of both schemes improves to different degrees, but it remains close to and below that of the CSM-based scheme. Hence, CSM-based precoding achieves the best performance among them.

Fig. 1 Average sum rate performance comparison for 256×16 massive MIMO system in Rayleigh fading channels

Figures 2 and 3 compare the BER performance in Rayleigh fading channels. As the number of users grows, all precoding schemes suffer a severe BER loss due to the limited number of BS antennas in practical systems, but the CSM-based scheme loses less than Neumann-based and SOR-based precoding, indicating that it is the most robust of the three. As i increases, the BER of Neumann-based and SOR-based precoding improves to some extent, but CSM-based precoding still outperforms both even at i = 3 and stays close to RZF. In addition, as the SNR increases, the performance of the proposed CSM-based precoding improves faster.

Fig. 2 BER performance comparison for 256×16 massive MIMO system in Rayleigh fading channels

Fig. 3 BER performance comparison for 256×32 massive MIMO system in Rayleigh fading channels

5 Conclusions

In this paper, we exploit the channel properties of massive MIMO systems together with the Cholesky decomposition and the Sherman-Morrison lemma to propose a CSM-based scheme that reduces the computational complexity from \(O(K^3)\) to \(O(K^2)\). At the same time, CSM-based precoding achieves near-optimal performance by decomposing and iteratively computing the inverse of the large matrix in RZF or MMSE precoding. Using RZF precoding as an example, simulation results show that, as the SNR increases with N/K fixed, CSM-based precoding achieves better BER and average sum rate than Neumann-series precoding and Neumann-derived schemes such as SOR-based precoding. Moreover, CSM-based precoding approaches the near-optimal performance of RZF precoding in Rayleigh fading channels.


  1. EG Larsson, O Edfors, F Tufvesson, TL Marzetta, Massive MIMO for next generation wireless systems. IEEE Commun. Mag. 52(2), 186–195 (2014).


  2. L Lu, GY Li, AL Swindlehurst, A Ashikhmin, R Zhang, An overview of massive MIMO: benefits and challenges. IEEE J. Sel. Topics Signal Process. 8(5), 742–758 (2014).


  3. MH Costa, Writing on dirty paper (corresp.). IEEE Trans. Inf. Theory. 29(3), 439–441 (1983).


  4. JH Lee, Lattice precoding and pre-distorted constellation in degraded broadcast channel with finite input alphabets. IEEE Trans. Commun. 58(5), 1315–1320 (2010).


  5. F Rusek, D Persson, BK Lau, EG Larsson, TL Marzetta, O Edfors, F Tufvesson, Scaling up MIMO: opportunities and challenges with very large arrays. IEEE Signal Process. Mag. 30(1), 40–60 (2013).


  6. A Chockalingam, B Sundar Rajan, Large MIMO Systems (Cambridge University Press, Indian Institute of Science, Bangalore, 2004).


  7. H Prabhu, J Rodrigues, O Edfors, F Rusek, in Wireless Communications and Networking Conference (WCNC), 2013. Approximative matrix inverse computations for very-large MIMO and applications to linear pre-coding systems (IEEE, Shanghai, 2013), pp. 2710–2715.


  8. T Xie, Q Han, H Xu, Z Qi, W Shen, in 81st Vehicular Technology Conference (VTC Spring). A Low-Complexity Linear Precoding Scheme Based on SOR Method for Massive MIMO Systems (IEEE, Glasgow, 2015), pp. 1–5.


  9. T Xie, Z Lu, Q Han, J Quan, B Wang, in 82nd Vehicular Technology Conference (VTC Fall). Low-Complexity LSQR-Based Linear Precoding for Massive MIMO Systems (IEEE, Boston, 2015), pp. 1–5.


  10. A Muller, A Kammoun, E Bjornson, M Debbah, in 8th Sensor and Multichannel Signal Processing Workshop (SAM). Efficient Linear Precoding for Massive MIMO Systems using Truncated Polynomial Expansion (IEEE, A Coruña, 2014), pp. 273–276.


  11. A Rontogiannis, V Kekatos, K Berberidis, A square-root adaptive V-BLAST algorithm for fast time-varying MIMO channels. IEEE Signal Process. Lett. 13(5), 265–268 (2006).


  12. J Sherman, WJ Morrison, Adjustment of an inverse matrix corresponding to change in the elements of a given column or a given row of the original matrix (abstract). Ann. Math. Statist. 20, 621 (1949).


  13. TL Marzetta, Noncooperative cellular wireless with unlimited numbers of base station antennas. IEEE Trans. Wirel. Commun. 9(11), 3590–3600 (2010).


  14. L Dai, J Wang, Z Wang, P Tsiaflakis, M Moonen, Spectrum and energy-efficient OFDM based on simultaneous multi-channel reconstruction. IEEE Trans. Signal Process. 61(23), 6047–6059 (2013).


  15. L Dai, Z Wang, C Pan, Z Yang, Wireless positioning using TDS-OFDM signals in single-frequency networks. IEEE Trans. Broadcast. 58(2), 236–246 (2012).


  16. BL Ng, JS Evans, SV Hanly, D Aktas, Distributed downlink beamforming with cooperative base station. IEEE Trans. Inf. Theory. 54(12), 5491–5499 (2008).


  17. C-J Wei, C-S Zhang, J Liu, A new method of solving inverse matrix based on Cholesky matrix. Electron. Des. Eng. 22(1), 159–164 (2014).


  18. N Goharian, D Grossman, T El-Ghazawi, in Proceedings: International Conference on Information Technology: Coding and Computing (ITCC), 2001. Enterprise text processing: A sparse matrix approach (IEEE, Las Vegas, 2001).


  19. N Goharian, A Jain, Q Sun, Comparative analysis of sparse matrix algorithms for information retrieval. J. Syst. Cybern. Inform. 1(1) (2010).



This work was supported by the 863 Program of China under Grant No. 2015AA01A703, the Fundamental Research Funds for the Central Universities under Grant No. 2014ZD03-02, NSFC (No. 61571055) and the fund of SKL of MMW (No. K201501). The authors are grateful to their instructor and to the above funding sources for supporting this research.

Competing interests

The authors declare that they have no competing interests.

Author information



Corresponding author

Correspondence to Yue Xu.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Xu, Y., Zou, W. & Du, L. A fast and low-complexity matrix inversion scheme based on CSM method for massive MIMO systems. J Wireless Com Network 2016, 251 (2016).




  • Massive MIMO
  • Cholesky-decomposition
  • Sherman-Morrison lemma
  • Neumann series
  • RZF
  • CSM-based precoding