Low computational complexity methods for decoding of STBC in the uplink of a massive MIMO system

Reducing the computational complexity of modern wireless communication systems, such as massive MIMO configurations, is of utmost interest. In this paper, we propose algorithms that can be used to accelerate matrix inversion and reduce the complexity of common spatial multiplexing schemes in massive MIMO systems. Specifically, we investigate the performance of the proposed methods in systems that utilize STBC (space-time block code) in the uplink of dynamic massive MIMO systems under different scenarios. A multi-user system is considered in which the base station is equipped with a large number of antennas and each user has two antennas. In addition, users can enter or exit the system dynamically. For a given space-time block coding/decoding scheme, the proposed methods significantly reduce the computational complexity of the receiver. The first approach uses the Neumann series to approximate the inverse matrix for linear decoders. The second approach reduces the computational complexity of the STBC decoders when a user is added to the system or removed from it. In the proposed schemes, the matrix inversion for ZF and MMSE decoding is derived from the inverse of a partitioned matrix and the Woodbury matrix identity. Furthermore, the suggested techniques can be utilized when the number of users is fixed but the CSI changes for a particular user. The mathematical equations for both approaches are derived, and the complexity of the suggested methods is compared to that of direct computation of the inverse matrix. Moreover, the performance of the proposed algorithms is evaluated in terms of the system BER (bit error rate). Evaluations confirm the effectiveness of the proposed approaches.


Introduction
Massive MIMO (multiple-input multiple-output) has been explored in recent years as one of the underlying technologies for new generations of wireless communication systems [1]. In a massive MIMO configuration for cellular communications, the BS (base station) is equipped with a large number of antennas and simultaneously serves multiple users. In such configurations, high capacity, energy efficiency, and high reliability can be achieved via relatively simple signal processing techniques [2].
Additionally, when the number of antennas at the BS is very large, the uplink communication channels become asymptotically orthogonal. Therefore, when multiple users transmit signals in the same frequency band and the same time slots, virtual point-to-point SIMO (single-input multiple-output) links are established in which each user has a single antenna and the BS has multiple antennas. As a result, intra-cell/inter-cell interference can be largely eliminated using simple linear signal processing methods such as ZF (zero forcing) or MMSE (minimum mean square error) decoders [3]. Moreover, because the capacity of multiple-antenna systems is proportional to the minimum number of transmit and receive antennas [4], using one antenna at the transmitter lowers the overall throughput of the system. Spatial multiplexing methods can be used to increase the total capacity of the system. For instance, one solution to improve the diversity gain of each user in the uplink is to use multiple antennas along with STBC (space-time block code) at the user side [4][5][6][7][8]. It has been shown that by using a good space-time block code with full diversity and a linear receiver, the intercellular interference problem can be solved to a large extent [4]. For a massive MIMO system with two antennas at the user terminal, a sufficient condition for designing a good STBC with linear receivers is studied in [4]. Its performance in terms of attainable throughput is also investigated.
It is worth mentioning that many benefits of various massive MIMO configurations come at the price of high computational complexity. For example, when the number of users increases, the linear STBC decoding methods such as ZF and MMSE algorithms require inverting a matrix with large dimensions. Therefore, computationally efficient methods must be developed to cope with this challenge and make the hardware implementation feasible.
In [9][10][11][12][13][14], researchers have explored ideas that aim to reduce computational complexity in different scenarios. Consider a cellular system with M users that are connected to the BS simultaneously, some of whom are moving at high speed. Complexity reduction has been investigated for the cases in which a user is added to the cell or removed from it, as well as the case in which a user's CSI (channel state information) changes. In these circumstances, calculating the exact inverse of the decoder matrix using conventional methods such as Cholesky decomposition imposes a high computational load on the system. In this paper, we propose approaches to reduce the computational complexity at the receiver. One technique is to approximate the inverse matrix, for example with the Neumann series. Moreover, we propose calculating the exact inverse matrix by using available information and matrix inversion identities to update the current inverse matrix.

Methods
In this work, the STBC scheme presented in [4] is adopted for a massive MIMO system and low complexity matrix inversion techniques are proposed and evaluated at the receiver of the uplink of the considered configuration. In other words, we will explore solutions to recover data from the received signal with lower computational complexity and without significant performance degradation.
One possible approach is to approximate the inverse of the decoder matrix. For example, the Neumann series has been used to calculate the inverse of a matrix at the receiver [10]. In the same work, it has been demonstrated that as long as the number of BS antennas is much larger than the number of users, the BLER (block error rate) is similar to the case in which an exact inverse is calculated, while the required computation is reduced by one order of magnitude. Here, we examine the complexity and the BER performance of this method for the considered system model for different numbers of terms computed in the series, which is referred to as the order of the Neumann series.
The next approach is proposed for a dynamic massive MIMO system. By dynamic we mean that the users are entering the system or exiting from it. In this situation, it is not necessary to recalculate the inverse of the linear decoder matrix and the existing inverse matrix is updated. For the selected STBC scheme, based on the matrix inversion lemmas such as the inverse of a partitioned matrix and the Woodbury formula [15], we propose and evaluate low-complexity methods to speed up STBC ZF and MMSE decoders. Update equations are derived for the cases that a user is added to or removed from the system as well as the case in which the channel estimate of a user is changed.
Algorithms are evaluated and compared in terms of BER performance and computational complexity. The proposed algorithms need fewer computations, which naturally leads to a reduction in the run time of an SDR (software-defined radio) program or in the complexity of the implemented hardware. These algorithms can be used not only in a slow fading environment in which the set of active users changes, but also in fast fading channels with frequent changes to the user channel estimates.

System model
Consider the uplink of a cellular multi-user massive MIMO system in which the BS is equipped with N antennas and serves M users (M < N), each of which has two antennas, as illustrated in Fig. 1. The channel is assumed to follow a Rayleigh small-scale fading model together with a large-scale path loss and shadowing model. The channel gain between the jth antenna of the mth user and the nth antenna of the BS is formulated as $\beta_{nmj} h_{nmj}$ (1 ≤ n ≤ N, 1 ≤ m ≤ M, j = 1, 2), where $\beta_{nmj}$ is related to the large-scale path loss and shadowing and $h_{nmj}$ denotes the small-scale fading. It is assumed that $\beta_{nmj} = \beta_m$ for n = 1, …, N and j = 1, 2. In addition, to normalize the average power, we assume that $\beta_1 = 1$ and $\beta_1 \ge \beta_2 \ge \dots \ge \beta_M$. Based on the Rayleigh fading model, $h_{nmj}$ is assumed to be an i.i.d. (independent and identically distributed) zero-mean, circularly symmetric complex Gaussian random variable with unit variance. Furthermore, the fading coefficients and the large-scale channel gains from the mth user to the BS are expressed as $H_m = [h_{nmj}]_{N \times 2}$ and $L_m = \beta_m I_2$, respectively. Suppose STBC is adopted by each subscriber in the cell, and the code of the mth user is expressed as $X_m$ with size 2 × S. With these assumptions, the received signal at the base station over S time slots, $Y_{N \times S}$, is written as follows:
$$Y = \sqrt{\rho/2}\,\sum_{m=1}^{M} H_m L_m X_m + W \tag{1}$$
with the energy restriction $E\{\mathrm{tr}(XX^{H})\} = 2S$. The superscripts T and H represent the matrix transpose and Hermitian operators, respectively. Also, ρ denotes the received SNR, and the factor $\sqrt{1/2}$ is used to normalize the transmitted signal energy to 1 per time slot. $W_{N \times S}$ represents the noise, whose entries are i.i.d. zero-mean, circularly symmetric complex Gaussian random variables with unit variance. Next, we explain the STBC coding and decoding algorithms.

Coding matrix for each user
The transmitted signal matrix $\tilde{X}$ is an STBC that is sent from two transmit antennas over S time slots. In this paper, we choose S = 2, and the corresponding STBC for the mth user, $\tilde{X}_m$, is designed following the scheme of [4]. For linear decoders such as ZF and MMSE filters, when N is large enough, it is desired that the columns of $\tilde{H}_m$ be asymptotically orthogonal. Applying this orthogonality criterion together with the energy constraint $E\{\mathrm{tr}(\tilde{X}\tilde{X}^{H})\} = 4$, the coding constants are obtained as shown in [4].

Linear decoding for each user
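Let us define $\tilde{G}$ as a matrix with dimensions 2N × 4M and x as a 4M-dimensional vector. Hence, the received signal can be rewritten in vectorized form as
$$\mathrm{vec}(Y) = \sqrt{\rho/2}\,\tilde{G}x + \mathrm{vec}(W). \tag{7}$$
Let $Q_{ZF}$ and $Q_{MMSE}$ be the ZF and MMSE decoder matrices, respectively:
$$Q_{ZF} = (\tilde{G}^{H}\tilde{G})^{-1}\tilde{G}^{H}, \tag{8}$$
$$Q_{MMSE} = \left(\tilde{G}^{H}\tilde{G} + (2/\rho) I_{4M}\right)^{-1}\tilde{G}^{H}. \tag{9}$$
Multiplying (7) by these matrices from the left, we have
$$Q\,\mathrm{vec}(Y) = \sqrt{\rho/2}\,Q\tilde{G}x + Q\,\mathrm{vec}(W), \tag{10}$$
where $Q = Q_{ZF}$ or $Q = Q_{MMSE}$. The lth transmitted symbol of the mth user, $x_{ml}$, is estimated by minimum-distance detection on $[Q\,\mathrm{vec}(Y)]_p$, where the minimum is taken over the signal constellation of the mth user and $[Q\,\mathrm{vec}(Y)]_p$ is the pth element of $Q\,\mathrm{vec}(Y)$.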
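The decoding step can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the vectorized model vec(Y) = sqrt(ρ/2)·G x + noise with a (2/ρ) regularizer for MMSE, and it uses a random complex Gaussian matrix `G` as a stand-in for the effective 2N × 4M channel matrix (the real matrix also embeds the coding constants of [4], which are not reproduced here). The function name `linear_decoders` and all parameter values are hypothetical.

```python
import numpy as np

def linear_decoders(G, rho):
    """Return (Q_zf, Q_mmse) for the assumed model y = sqrt(rho/2) * G @ x + noise."""
    GH = G.conj().T
    gram = GH @ G                                   # 4M x 4M Gram matrix
    n = G.shape[1]
    Q_zf = np.linalg.solve(gram, GH)                # (G^H G)^{-1} G^H
    Q_mmse = np.linalg.solve(gram + (2.0 / rho) * np.eye(n), GH)
    return Q_zf, Q_mmse

rng = np.random.default_rng(0)
N, M, rho = 64, 4, 10.0                             # illustrative sizes only
shape = (2 * N, 4 * M)
# Stand-in channel matrix: i.i.d. unit-variance complex Gaussian entries (assumption).
G = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
x = rng.choice([-1.0, 1.0], size=4 * M)             # BPSK symbols
noise = (rng.standard_normal(2 * N) + 1j * rng.standard_normal(2 * N)) / np.sqrt(2)
y = np.sqrt(rho / 2) * (G @ x) + noise

Q_zf, Q_mmse = linear_decoders(G, rho)
# For BPSK, minimum-distance detection reduces to the sign of the real part.
x_hat_zf = np.sign((Q_zf @ y).real)
x_hat_mmse = np.sign((Q_mmse @ y).real)
print("ZF symbol errors:", int(np.sum(x_hat_zf != x)))
print("MMSE symbol errors:", int(np.sum(x_hat_mmse != x)))
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical convenience here; the cost of that solve is exactly what the methods proposed below aim to reduce.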

Proposed algorithms
It is noted that the computational load of Eqs. (8) and (9) lies mostly in the inverse matrix calculation. Conventional methods for computing the inverse matrix, such as Cholesky decomposition, impose high computational complexity on the system and require O(M^3) operations, which is difficult to implement [16,17]. Therefore, we investigate matrix inversion methods that have lower computational complexity and lead to a feasible receiver for a massive MIMO system.

Inverse matrix approximation using Neumann series
From Eqs. (8) and (9), it can be seen that decoding the received signal involves computing the inverse of the following matrix:
$$Z = \tilde{G}^{H}\tilde{G} \tag{12}$$
for the ZF decoder (the MMSE decoder adds the term $(2/\rho) I_{4M}$), where $\tilde{G}$ is a 2N × 4M matrix that contains the coding constants as well as the channel coefficients; from now on, for brevity, it will be called the channel matrix. Considering that the matrix Z, with dimensions 4M × 4M, is almost diagonal, an algorithm that is efficient in terms of hardware constraints is used to approximate the inverse [18]. It is proven in [19] that if Z is decomposed as Z = D + E, where D is a diagonal matrix containing the diagonal entries of Z and E is the corresponding hollow matrix (zero diagonal), then the Neumann series can be used to calculate its inverse as follows:
$$\tilde{Z}_R^{-1} = \sum_{r=0}^{R-1} \left(-D^{-1}E\right)^{r} D^{-1}, \tag{13}$$
where R is the number of terms computed in the series and $\tilde{Z}_R^{-1}$ is the R-term approximation of $Z^{-1}$. The convergence of (13) is guaranteed only if the maximum modulus of the eigenvalues of the matrix $I - D^{-1}Z = -D^{-1}E$ is less than 1, in which case the approximation approaches equality as R → ∞ [19]. Moreover, the smaller these eigenvalue moduli, the faster the convergence, which is the case when the ratio α = N/M is high. The minimum value of this ratio for convergence of the method with high probability is 5.83 [9]. Here, given that each user is equipped with two antennas, the effective ratio is half that of the single-antenna case. For example, if N = 640, a maximum of 55 double-antenna users can simultaneously communicate with the BS, whereas for a system with single-antenna users the Neumann series converges for as many as 110 users.
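The convergence condition can be checked numerically. The sketch below is an illustration under stated assumptions: it draws an i.i.d. complex Gaussian matrix as a stand-in for the channel matrix (the coding constants are not modeled) and computes the largest eigenvalue modulus of I − D⁻¹Z for a tall and a nearly square configuration. The helper names `neumann_spectral_radius` and `random_gram` are hypothetical.

```python
import numpy as np

def neumann_spectral_radius(Z):
    """Largest eigenvalue modulus of I - D^{-1} Z, where D = diag(Z)."""
    d = np.diag(Z)
    iteration_matrix = np.eye(Z.shape[0]) - Z / d[:, None]  # row i scaled by 1/d_i
    return np.max(np.abs(np.linalg.eigvals(iteration_matrix)))

def random_gram(rows, cols, rng):
    """Gram matrix G^H G of an i.i.d. complex Gaussian G (illustrative stand-in)."""
    G = (rng.standard_normal((rows, cols))
         + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)
    return G.conj().T @ G

rng = np.random.default_rng(1)
# Rows-to-columns ratio of 20: the series is expected to converge.
radius_tall = neumann_spectral_radius(random_gram(640, 32, rng))
# Rows-to-columns ratio of 2: well below the 5.83 threshold quoted above.
radius_square = neumann_spectral_radius(random_gram(640, 320, rng))
print("ratio 20:", radius_tall)
print("ratio  2:", radius_square)
```

A radius below 1 means the truncated series improves with every added term; a radius above 1 means adding terms makes the approximation worse.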
The Neumann series is a low-complexity iterative method. Therefore, contrary to conventional inverse computation methods, it is hardware friendly [19]. As an example, for R = 3, the approximation is
$$\tilde{Z}_3^{-1} = D^{-1} - D^{-1}ED^{-1} + D^{-1}ED^{-1}ED^{-1}. \tag{14}$$
Computing the first term requires M divisions, while calculating the second and third terms requires 3M^2 − 3M and 16M^3 − 2M real-valued multiplications, respectively. These counts exploit the zeros on the diagonal of E and the fact that each term of (14) is Hermitian. Now, let us define the matrix $W = D^{-1}E$ (not to be confused with the noise matrix of the system model) and rewrite (13) as
$$\tilde{Z}_R^{-1} = \sum_{r=0}^{R-1} (-W)^{r} D^{-1}. \tag{15}$$
For the Neumann series with R = 3, substituting (8) and (9) in (10), we have
$$Q\,\mathrm{vec}(Y) \approx \tilde{Z}_3^{-1} t = D^{-1}t - W D^{-1}t + W^{2} D^{-1}t, \tag{16}$$
where $t = \tilde{G}^{H}\mathrm{vec}(Y)$ is a 4M-dimensional vector. As can be seen, the first term is obtained by multiplying the diagonal matrix $D^{-1}$ by the vector t, and the approximation improves with each additional term, while the computational complexity increases slightly.
In Section 5, the efficiency of this approach will be examined in terms of system BER and its computational complexity.
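The term-by-term evaluation described above can be sketched with matrix-vector products only, so that no matrix inverse or matrix-matrix product is formed. This is a hedged illustration: `neumann_solve` is a hypothetical helper, the channel matrix is a random Gaussian stand-in, and the received vector is arbitrary test data rather than a simulated STBC transmission.

```python
import numpy as np

def neumann_solve(Z, t, R):
    """Approximate Z^{-1} t with the first R terms of the Neumann series."""
    d = np.diag(Z).real                 # diagonal of Z (real for a Hermitian Z)
    E = Z - np.diag(np.diag(Z))         # hollow part, Z = D + E
    term = t / d                        # first term: D^{-1} t
    result = term.copy()
    for _ in range(R - 1):
        term = -(E @ term) / d          # apply (-D^{-1} E) to the previous term
        result += term
    return result

rng = np.random.default_rng(2)
N, M = 320, 8                           # illustrative sizes
shape = (2 * N, 4 * M)
G = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
y = rng.standard_normal(2 * N) + 1j * rng.standard_normal(2 * N)

Z = G.conj().T @ G
t = G.conj().T @ y
exact = np.linalg.solve(Z, t)

rel_err = {}
for R in (2, 3, 4):
    approx = neumann_solve(Z, t, R)
    rel_err[R] = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    print(f"R = {R}: relative error = {rel_err[R]:.3e}")
```

Each extra term costs one product of a 4M × 4M matrix with a vector, which is why the per-term cost grows quadratically rather than cubically in M.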

Inverse matrix updating
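In some situations, the decoding can be done without recalculating the inverse of the decoder matrix Z. For example, a user may be added to or removed from the system, or the channel estimate may change for a particular user. Under such conditions, the computational complexity can be greatly reduced by updating the inverse matrix instead of recalculating it. The proposed solutions are based on the inverse of a partitioned matrix and the Woodbury matrix identity. Suppose the matrix Z is partitioned as
$$Z = \begin{bmatrix} A & B \\ C & D \end{bmatrix}, \tag{17}$$
where A and D are square matrices. The inverse of Z is given as
$$Z^{-1} = \begin{bmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{bmatrix}, \tag{18}$$
where
$$F_{11} = (A - BD^{-1}C)^{-1},\quad F_{12} = -F_{11}BD^{-1},\quad F_{21} = -D^{-1}CF_{11},\quad F_{22} = D^{-1} + D^{-1}CF_{11}BD^{-1}. \tag{19}$$
In addition, using the Woodbury formula, we have
$$(A - BD^{-1}C)^{-1} = A^{-1} + A^{-1}B\,(D - CA^{-1}B)^{-1}\,CA^{-1}. \tag{20}$$
Hence, the equations given in (19) can be equivalently written as
$$F_{11} = A^{-1} + A^{-1}B\,(D - CA^{-1}B)^{-1}\,CA^{-1},\quad F_{12} = -A^{-1}B\,(D - CA^{-1}B)^{-1},\quad F_{21} = -(D - CA^{-1}B)^{-1}CA^{-1},\quad F_{22} = (D - CA^{-1}B)^{-1}. \tag{21}$$
Next, the algorithms for updating the ZF and MMSE decoder matrices are described for different scenarios.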
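The two identities used here can be verified numerically. The following sketch builds a random Hermitian positive definite matrix (an arbitrary test matrix, not a simulated channel), assembles the inverse from its blocks using the standard Schur-complement form of the partitioned inverse, and compares it with a direct inverse. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 12, 4                                   # sizes of the A and D blocks
X = rng.standard_normal((40, n + k)) + 1j * rng.standard_normal((40, n + k))
Z = X.conj().T @ X                             # Hermitian positive definite test matrix

A, B = Z[:n, :n], Z[:n, n:]
C, D = Z[n:, :n], Z[n:, n:]

A_inv = np.linalg.inv(A)
S_inv = np.linalg.inv(D - C @ A_inv @ B)       # inverse of the k x k Schur complement of A

# Partitioned inverse expressed through A^{-1} and the small Schur complement.
F11 = A_inv + A_inv @ B @ S_inv @ C @ A_inv
F12 = -A_inv @ B @ S_inv
F21 = -S_inv @ C @ A_inv
F22 = S_inv
Z_inv_blocks = np.block([[F11, F12], [F21, F22]])

# Woodbury identity: the same top-left block computed from the other Schur complement.
woodbury_lhs = np.linalg.inv(A - B @ np.linalg.inv(D) @ C)

print("block inverse matches:", np.allclose(Z_inv_blocks, np.linalg.inv(Z)))
print("Woodbury matches:", np.allclose(woodbury_lhs, F11))
```

The practical point is that once the inverse of A is available, only a k × k matrix has to be inverted, with k much smaller than the full dimension.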

Adding a user
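Let us examine the case where a user is added to the cell covered by the BS. Suppose that the initial channel matrix is $[\tilde{G}]_{2N \times 4M}$, and let the channel matrix of the user that enters the system be $[G_a]_{2N \times 4}$. The new inflated matrix is denoted as $G_e = [\tilde{G}\ \ G_a]$.
Thus, the ZF decoding matrix defined in (12) is given as
$$Z_e = G_e^{H}G_e = \begin{bmatrix} \tilde{G}^{H}\tilde{G} & \tilde{G}^{H}G_a \\ G_a^{H}\tilde{G} & G_a^{H}G_a \end{bmatrix},$$
and the resulting matrix has dimensions 4(M + 1) × 4(M + 1). Using (21), the inverse of the decoding matrix is calculated as follows:
$$Z_e^{-1} = \begin{bmatrix} A^{-1} + A^{-1}B\,(D - B^{H}A^{-1}B)^{-1}B^{H}A^{-1} & -A^{-1}B\,(D - B^{H}A^{-1}B)^{-1} \\ -(D - B^{H}A^{-1}B)^{-1}B^{H}A^{-1} & (D - B^{H}A^{-1}B)^{-1} \end{bmatrix},$$
where $A^{-1} = (\tilde{G}^{H}\tilde{G})^{-1}$ is the inverse matrix before updating, $B = [\tilde{G}^{H}G_a]_{4M \times 4}$, and $D = G_a^{H}G_a$. For the MMSE decoder, the same update applies with $A^{-1} = (\tilde{G}^{H}\tilde{G} + (2/\rho) I_{4M})^{-1}$ and $D = G_a^{H}G_a + (2/\rho) I_4$.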
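A possible NumPy sketch of the inflation update follows. It assumes the new user's four columns are appended at the end of the channel matrix and that the current inverse is Hermitian; `add_user` and its `reg` parameter (0 for ZF, 2/ρ for MMSE) are hypothetical names, and the channel matrices are random stand-ins.

```python
import numpy as np

def add_user(Z_inv, G, G_a, reg=0.0):
    """Inverse of [G G_a]^H [G G_a] + reg*I, built from the current inverse Z_inv."""
    B = G.conj().T @ G_a                                   # 4M x 4 cross term
    D = G_a.conj().T @ G_a + reg * np.eye(G_a.shape[1])    # 4 x 4 block of the new user
    AB = Z_inv @ B                                         # A^{-1} B
    S_inv = np.linalg.inv(D - B.conj().T @ AB)             # only a 4 x 4 inversion
    F11 = Z_inv + AB @ S_inv @ AB.conj().T
    F12 = -AB @ S_inv
    return np.block([[F11, F12], [F12.conj().T, S_inv]])

rng = np.random.default_rng(4)
N, M, rho = 32, 3, 10.0                                    # illustrative sizes
G = (rng.standard_normal((2 * N, 4 * M))
     + 1j * rng.standard_normal((2 * N, 4 * M))) / np.sqrt(2)
G_a = (rng.standard_normal((2 * N, 4))
       + 1j * rng.standard_normal((2 * N, 4))) / np.sqrt(2)
reg = 2.0 / rho

Z_inv_zf = np.linalg.inv(G.conj().T @ G)
Z_inv_mmse = np.linalg.inv(G.conj().T @ G + reg * np.eye(4 * M))

Z_e_inv_zf = add_user(Z_inv_zf, G, G_a)                    # ZF update
Z_e_inv_mmse = add_user(Z_inv_mmse, G, G_a, reg=reg)       # MMSE update
print(Z_e_inv_zf.shape, Z_e_inv_mmse.shape)
```

The Hermitian symmetry of the problem is used to obtain the lower-left block as the conjugate transpose of the upper-right block, which saves one matrix product.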

Removing a user
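Now we consider the scenario in which a user is removed from the cell. The current channel matrix is written as $\tilde{G} = [G_f\ \ G_r]$, where $G_r$ is the channel matrix of the user to be removed and $G_f$ is the channel matrix after the removal of the user. In this case, updating the ZF decoding matrix involves calculating $Z_f^{-1} = (G_f^{H}G_f)^{-1}$. Using the inverse of a partitioned matrix, before the user is removed we have
$$Z^{-1} = \begin{bmatrix} G_f^{H}G_f & G_f^{H}G_r \\ G_r^{H}G_f & G_r^{H}G_r \end{bmatrix}^{-1} = \begin{bmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{bmatrix},$$
where $F_{11}$ has dimensions 4(M − 1) × 4(M − 1) and $F_{22}$ has dimensions 4 × 4. Since $F_{11} - F_{12}F_{22}^{-1}F_{21}$ equals the inverse of the top-left block of Z, to update the inverse of the ZF decoding matrix we partition the current inverse and find the updated inverse as
$$Z_f^{-1} = F_{11} - F_{12}F_{22}^{-1}F_{21}.$$
Also, for the MMSE decoder, we need to compute $Z_f^{-1} = (G_f^{H}G_f + (2/\rho) I_{4(M-1)})^{-1}$. Before the user is removed, we have
$$Z^{-1} = \left(\tilde{G}^{H}\tilde{G} + (2/\rho) I_{4M}\right)^{-1} = \begin{bmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{bmatrix}.$$
Therefore, similar to what we derived for the ZF decoder, we have
$$Z_f^{-1} = F_{11} - F_{12}F_{22}^{-1}F_{21},$$
where $F_{11}$, $F_{12}$, $F_{21}$, and $F_{22}$ are obtained from partitioning the current inverse matrix.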
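The deflation update can be sketched as below, assuming the user to be removed occupies the last four rows and columns of the current inverse. The helper `remove_user` is a hypothetical name and the channel matrix is a random stand-in; the same routine serves ZF and MMSE, because the regularization term only changes which inverse is passed in.

```python
import numpy as np

def remove_user(Z_inv, k=4):
    """Inverse after deleting the last k rows/columns of the underlying matrix."""
    F11 = Z_inv[:-k, :-k]
    F12 = Z_inv[:-k, -k:]
    F21 = Z_inv[-k:, :-k]
    F22 = Z_inv[-k:, -k:]
    # F11 - F12 F22^{-1} F21; only a k x k system has to be solved.
    return F11 - F12 @ np.linalg.solve(F22, F21)

rng = np.random.default_rng(5)
N, M, rho = 32, 4, 10.0                                    # illustrative sizes
G = (rng.standard_normal((2 * N, 4 * M))
     + 1j * rng.standard_normal((2 * N, 4 * M))) / np.sqrt(2)
G_f = G[:, :-4]                                            # channel matrix after removal
reg = 2.0 / rho

Z_inv_zf = np.linalg.inv(G.conj().T @ G)
Z_inv_mmse = np.linalg.inv(G.conj().T @ G + reg * np.eye(4 * M))

Z_f_inv_zf = remove_user(Z_inv_zf)
Z_f_inv_mmse = remove_user(Z_inv_mmse)
print(Z_f_inv_zf.shape, Z_f_inv_mmse.shape)
```

If the departing user is not stored in the last columns, a symmetric permutation of the rows and columns of the inverse brings it there before the update.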

Updating a user
When a new channel estimate is obtained for a particular user, i.e., its CSI is updated, the number of rows and columns of the channel matrix remains the same. In this case a two-step approach is suggested. In the proposed method, first the rows and columns associated with the updated user are deleted by utilizing the proposed algorithm for removing a user. Then, using the proposed algorithm for adding a user, we apply the new channel coefficients and update the inverse matrix. In other words, in ZF decoding, we first use Eqs. (28) and (29) to remove the rows and columns of the specific user. Then, we use (23) and (24) for the final update of the inverse matrix. Clearly, for the MMSE decoder equations (30) and (31) are used first, and then (26) is applied to find the inverse of the updated channel matrix.
In the next section, we evaluate the proposed techniques in different scenarios.
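The two-step CSI update (remove, then add) can be sketched in a single routine. This is an illustration under assumptions: the refreshed user is taken to occupy the last four columns, `update_user_csi` and `reg` (0 for ZF, 2/ρ for MMSE) are hypothetical names, and the channel matrices are random stand-ins.

```python
import numpy as np

def update_user_csi(Z_inv, G_f, G_new, reg=0.0):
    """Refresh the inverse when the user in the last columns gets a new channel.

    Z_inv : current inverse, with the refreshed user in the last k rows/columns
    G_f   : channel columns of the users whose CSI is unchanged
    G_new : new channel estimate of the refreshed user
    """
    k = G_new.shape[1]
    # Step 1 (deflation): drop the rows and columns of the old channel estimate.
    F11, F12 = Z_inv[:-k, :-k], Z_inv[:-k, -k:]
    F21, F22 = Z_inv[-k:, :-k], Z_inv[-k:, -k:]
    A_inv = F11 - F12 @ np.linalg.solve(F22, F21)
    # Step 2 (inflation): append the new channel estimate.
    B = G_f.conj().T @ G_new
    D = G_new.conj().T @ G_new + reg * np.eye(k)
    AB = A_inv @ B
    S_inv = np.linalg.inv(D - B.conj().T @ AB)
    top_left = A_inv + AB @ S_inv @ AB.conj().T
    top_right = -AB @ S_inv
    return np.block([[top_left, top_right], [top_right.conj().T, S_inv]])

rng = np.random.default_rng(6)
N, M = 32, 4                                               # illustrative sizes

def random_channel(cols):
    return (rng.standard_normal((2 * N, cols))
            + 1j * rng.standard_normal((2 * N, cols))) / np.sqrt(2)

G_f, G_old, G_new = random_channel(4 * (M - 1)), random_channel(4), random_channel(4)
G_before = np.hstack([G_f, G_old])
G_after = np.hstack([G_f, G_new])

Z_inv_before = np.linalg.inv(G_before.conj().T @ G_before)
Z_inv_after = update_user_csi(Z_inv_before, G_f, G_new)    # ZF case (reg = 0)
print(Z_inv_after.shape)
```

Because the update is applied twice, any error in the starting inverse (for example from a truncated Neumann series) passes through two update stages.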

Numerical results and discussion
In this section, we evaluate and compare the computational complexity of the proposed algorithms as well as the BER performance of the system in the assumed configurations. The next two sub-sections include the computational complexity of the ZF STBC decoder in the uplink of a massive MIMO system and the system BER performance when the proposed algorithms are utilized. It should be noted that similar results can be obtained for the case of MMSE decoder.

Complexity analysis
For a massive MIMO system with N = 320 antennas at the BS and M = 8, 16, 24, 32 users, the computational complexity is studied in terms of the number of arithmetic operations. We consider scenarios in which a user is added to or removed from the system, as well as the case in which the channel estimate of a user has changed. Assuming that the matrix whose inverse needs to be updated has dimensions 4M × 4M, and that K is the number of rows and columns added to or removed from the matrix, the number of computations needed for the ZF decoder is summarized in Table 1. The second and third rows of this table show the computational complexity of the decoder when the inverse matrix is approximated using the Neumann series. Also, the inflated channel matrix refers to the case in which a user is added to the system (M_new = M + 1), and the deflated matrix represents the case in which a user is removed from the system (M_new = M − 1). For the signal model and the STBC scheme used in this paper, K = 4. The 6th row of Table 1 corresponds to the case in which a new channel estimate is obtained for a particular user. For the case in which a user enters the system, the complexity of the decoder is compared for different numbers of users and different methods of inverse matrix calculation in Table 2. Moreover, in Table 3, the complexity reduction is compared when a user exits the system. Table 4 compares the computational complexity of the two-stage update algorithm with that of the exact algorithm and the inverse matrix approximation algorithm.
Table 1, row 6 (updating new CSI): 2K^3 + K^2(16M − 3) + (K + 1)(4(M − 1))^2 + (4K + 1)(4M)^2 + 4M(4N − 1) + 8M(8M − 1)
Mousavi and Pourrostam, EURASIP Journal on Wireless Communications and Networking (2020) 2020:111
As can be seen, in all three scenarios, applying the proposed update techniques for matrix inversion results in a considerable reduction in computational complexity.
In addition, the inverse matrix approximation method has lower computational complexity than the updating method. This complexity reduction is obtained at the cost of BER performance degradation, which is examined in the next section.
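As a small worked example, the operation count listed for the CSI-update case can be evaluated for the system sizes considered here. The sketch assumes the table entry reads 2K^3 + K^2(16M − 3) + (K + 1)(4(M − 1))^2 + (4K + 1)(4M)^2 + 4M(4N − 1) + 8M(8M − 1); the function name `csi_update_ops` is hypothetical, and the other rows of Table 1 are not reproduced.

```python
def csi_update_ops(M, N, K=4):
    """Operation count for the CSI-update case, as read from the table entry."""
    return (2 * K**3
            + K**2 * (16 * M - 3)
            + (K + 1) * (4 * (M - 1))**2
            + (4 * K + 1) * (4 * M)**2
            + 4 * M * (4 * N - 1)
            + 8 * M * (8 * M - 1))

# System sizes from the complexity study: N = 320 antennas, K = 4.
counts = {M: csi_update_ops(M, N=320) for M in (8, 16, 24, 32)}
for M, ops in counts.items():
    print(f"M = {M:2d}: {ops} operations")
```

The dominant terms grow with M^2, in contrast to the cubic growth of a direct inversion of the 4M × 4M matrix.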

Simulation results
In this section, we present the simulation results to evaluate and compare the proposed algorithms in terms of decoder BER. For update scenarios, $\tilde{Z}_3^{-1}$ (the three-term Neumann series approximation) is used as the initial inverse matrix. In the simulations, we use BPSK modulation and assume that the channel model is flat fading.
In the first simulation, we examine the efficiency of utilizing the Neumann series in the given system. Here, the number of antennas at the BS is set to N = 320 and the number of users is M = 16. The BER performance of the system is evaluated by changing the order of the Neumann series, i.e., R = 2, 3, 4. As can be seen in Fig. 2, a higher-order Neumann series yields better BER performance. However, it should be noted that each additional term in the series increases the computational complexity by O(M^2).
For the simulation of the proposed updating algorithms, one user is added to or removed from the system. The number of antennas at the BS is set to N = 320 and the number of current users is M = 8, 16, resulting in a ratio α ≫ 1, which guarantees the convergence of (13) with very high probability. In all subsequent simulations, three decoding algorithms are used:
I. Calculating the exact inverse of the current matrix and then applying the update algorithms.
II. Approximating the current matrix inverse with the Neumann series and then applying the update algorithms.
III. Using the Neumann series without applying the update algorithms.
Figure 3 shows the BER performance after adding a user to the system. As can be seen in Fig. 3a, for the system with M = 8 users, both the Neumann series and the proposed update algorithm incur only a slight performance loss compared to the exact case, and the curves almost overlap. Fig. 3b shows that when the number of simultaneous users is increased, i.e., for M = 16, the performance loss of the approximation method becomes more noticeable.
Simulation results for the case in which a user is removed from the system are depicted in Fig. 4. As can be seen in this figure, the number of users has a direct effect on the performance of the approximation method. Comparing panels (a) and (b), the BER curves of the system with M = 8 users nearly overlap for all decoding methods, whereas for M = 16 users the BER performance of the approximation method degrades compared to the exact inverse calculation. In Fig. 5, which corresponds to a matrix update without adding or removing a user, the BER performance of the approximation method again degrades relative to the exact algorithm as the number of users increases. Notice that when a user is added or removed, decoding algorithm II has better BER performance than III, whereas when the channel matrix of a specific user is updated, algorithm III performs slightly better than II. The reason is that the update algorithm is performed twice in this case, and since the initial inverse in II is the Neumann series approximation, error propagation occurs.
Fig. 4 BER performance comparison for the deflation update with M = 9, 17 current users. The BS is equipped with N = 320 antennas and serves M = 9, 17 users. Assuming that a user is removed from the system, the proposed algorithm is used to update the inverse matrix for decoding. Panel (a) shows the results for M = 9, and panel (b) shows the comparison for M = 17, after a user exits the cell. As expected, the BER performance of the exact inverse matrix update algorithm is better than that of the approximation method.

Conclusion
In this paper, methods for efficiently calculating and updating the inverse of a matrix in the decoding of space-time codes in large-scale MIMO systems were evaluated. At the receiver, the Neumann series is used to approximate the inverse of a matrix with large dimensions. Moreover, by utilizing matrix inversion identities, efficient algorithms are proposed to update the inverse matrix for the ZF and MMSE STBC decoders when users enter or exit the system, as well as when a new channel estimate is obtained for a user. For STBC ZF decoding, the proposed methods are investigated from two perspectives: reduction of the computational complexity and the BER performance of the system. Based on the complexity analysis and the simulation results, the update algorithms have better BER performance than the approximation method, while approximation of the inverse matrix imposes less computational complexity on the system. It is worth mentioning that a similar approach is also applicable when more than one user is added to or removed from the system. However, as the number of users to be added or removed increases, the reduction in computational complexity decreases.
Last, but not least, it should be noted that although the proposed methods are investigated for the case of STBC, they can be generalized for common spatial multiplexing schemes in massive MIMO systems.