Multiuser MIMO downlink transmission with BEM-based limited feedback over doubly selective channels

This article studies the problem of limited feedback design for heterogeneous multiuser (MU) transmissions over time- and frequency-selective (doubly selective) multiple-input multiple-output (MIMO) downlink channels. Under a doubly selective propagation condition, a basis expansion model (BEM) is deployed as a fitting parametric model for capturing the time variation of the MU downlink channels and for reducing the number of channel parameters. The resulting dimension reduction in the time-variant channel representation, in turn, translates into a reduced feedback load of channel state information (CSI) to the base station (BS). To produce limited feedback information, vector quantization of the BEM coefficients is performed at the mobile terminals under the assumption that perfect BEM coefficient estimation has been established by existing algorithms. Then, the output indices of the quantized BEM coefficient vectors are sent to the BS via error-free, zero-latency feedback links. To assess the feasibility of the BEM-based limited feedback design in an MU network with an arbitrary number of active users, the resulting sum-rate performance of the network is evaluated by employing block-diagonalization precoding and greedy scheduling at the BS. The numerical results show that the BEM-based limited feedback scheme significantly alleviates the detrimental effect of outdated CSI feedback, which is likely to occur when the conventional block-fading assumption is used in MU transmissions over (fast) time-varying channels.


Introduction
Besides the well-known time, frequency and code divisions in wireless communications, spatial separation has recently been recognized as a new signal dimension for further system performance enhancement, especially in multiuser (MU) transmissions. In the so-called spatial division multiple access (SDMA), the use of multiantenna arrays allows the base station (BS) to simultaneously transmit multiple data streams to multiple users by exploiting this new signal dimension [1,2]. Among several MU transmission techniques, dirty paper coding (DPC) [3] is well known to be an optimal MU encoding strategy whose performance achieves the capacity limit of MU broadcast channels. However, the optimal performance of DPC comes at the cost of impractically high complexity (in a large user pool). As a low-complexity linear alternative, block-diagonalization (BD) precoding [4,5] is a suboptimal MU encoding scheme with a realizable implementation.
In the literature, most existing precoding techniques [2,4-6] assume MU downlink channels to be homogeneous and time-invariant within a transmission block/burst (i.e., the block-fading assumption). However, in an MU network with rapidly moving nodes (e.g., users in cars/trains in long-term evolution (LTE) systems), the resulting time selectivity of the channel impulse response (CIR) introduces a large number of channel parameters. This induces a very high channel state information (CSI) feedback load for precoding and scheduling at the BS when time-varying channels are taken into account. In addition, time-selective channels give rise to the problem of outdated CSI feedback [7], which can severely degrade the system performance. To deal with such channels, [8,9] proposed a minimum mean squared error-based beamforming algorithm for homogeneous MU transmissions over multiple-input single-output, spatially correlated, frequency-flat, time-selective channels. Specifically, that technique uses full feedback of channel distribution information and an iterative beamforming process to provide stable MU transmissions over these channels.
Unlike [8,9], this paper is concerned with limited CSI feedback design for BD precoding and greedy scheduling over spatially uncorrelated, doubly selective, multiple-input multiple-output (MIMO) downlink channels with heterogeneous users (i.e., mobile terminals with different numbers of receive antennas and different receiver noise powers). Over the doubly selective channels, a basis expansion model (BEM) [10,11] is used as a fitting parametric model for capturing the time variation of the channels and for reducing the number of channel parameters. Specifically, to generate limited feedback information, vector quantization (VQ) of the BEM coefficients is performed at the mobile terminals under the assumption that perfect BEM coefficient estimation has been established by existing algorithms. Then, the output indices of the quantized BEM coefficient vectors are sent to the BS via feedback links. To investigate the performance of the limited feedback scheme in an MU network with an arbitrary number of mobile terminals, BD precoding and greedy scheduling are deployed accordingly in the MU network.
The rest of the paper is organized as follows. Section 2 delineates the system and channel models. The suggested BEM-based limited feedback for BD-based heterogeneous MU transmissions is presented in Section 3. Simulation results and relevant discussions are located in Section 4. Finally, Section 5 provides some concluding remarks.
Notations: $(\mathbf{X})^T$ and $(\mathbf{X})^H$ denote the transpose and conjugate transpose (Hermitian operator) of the matrix $\mathbf{X}$, respectively. $E(\cdot)$ stands for the expectation operator. $\mathrm{tr}(\mathbf{X})$, $|\mathbf{X}|$, and $\|\mathbf{X}\|$ denote the trace, determinant and Frobenius norm of the matrix $\mathbf{X}$, respectively.

System and Channel Models

A. Transmitted signal model
Consider a heterogeneous MU MIMO LTE downlink channel, where the BS is equipped with $N_t$ transmit antennas, and different terminals have different numbers of receive antennas and different signal-to-noise ratios (SNRs). Orthogonal frequency division multiplexing (OFDM) modulation with an $N$-point fast Fourier transform (FFT) is employed for the downlink multi-carrier transmission. After the inverse FFT (IFFT) and cyclic prefix (CP) insertion, the transmitted baseband signal of the mth OFDM symbol at the pth transmit antenna is obtained from the $N$ precoded subcarriers of that symbol at the time indices $n \in \{-N_g, \ldots, 0, \ldots, N-1\}$, where $N_g$ denotes the CP length and $X^{(p)}_{k,m}$ is the kth precoded (data-modulated) subcarrier.
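For reference, a standard form of the OFDM baseband signal that is consistent with the definitions above is sketched below. The symbol $x^{(p)}_{n,m}$ for the time-domain sample and the unitary $1/\sqrt{N}$ scaling are assumptions, since normalization conventions differ between references.

```latex
x^{(p)}_{n,m} \;=\; \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X^{(p)}_{k,m}\, e^{\,j 2\pi n k / N},
\qquad n \in \{-N_g, \ldots, 0, \ldots, N-1\}
```

Because the exponential is periodic in $n$ with period $N$, the samples with $n < 0$ repeat the tail of the IFFT output, which is exactly the cyclic prefix.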

B. Doubly selective channel model
In this article, for each pair of the pth transmit antenna (at the BS) and the $r_u$th receive antenna of the uth user (having $R_u$ receive antennas), the lth (time-variant) channel tap gain, which includes the effect of the transmit-receive filters and doubly selective propagation, is denoted by $h^{(r_u,p)}_{l,n,m}$, where n and m denote the time and OFDM symbol indices, respectively. In the considered downlink channels, a BEM [10] is employed to capture the time variation of the channels. With the aid of the BEM, the lth time-variant channel tap gain between the pth transmit antenna and the $r_u$th receive antenna of the uth user at the nth time instant in the mth OFDM symbol can be represented as [11]
$$h^{(r_u,p)}_{l,n,m} = \sum_{q=1}^{Q} b_{n+mN_s,q}\, c^{(r_u,p)}_{q,l}, \qquad l = 0, \ldots, L-1, \qquad (2)$$
where $N_s = N + N_g$ denotes the OFDM symbol length after CP insertion and $n = 0, \ldots, N_s - 1$. The mobile users' speeds are assumed to be unchanged within $M$ OFDM symbols (over a duration of a number of LTE frames). $L$ denotes the channel length, $b_{n+mN_s,q}$ is the value of the qth basis function of the BEM in use, $c^{(r_u,p)}_{q,l}$ are the BEM coefficients of the channel fitting, and $Q$ is the number of basis functions used in the basis expansion modeling.
Table 1. Implementation steps of the BEM-based limited feedback
Step 1: BEM coefficients are estimated at the mobile terminals using existing techniques [10,11,15].
Step 2: The vector of the BEM coefficients is partitioned as in (7).
Step 3: The limited feedback information consists of the indices of the quantized vectors of BEM coefficients, which are determined by (10).
Step 4: With the limited feedback information, the BS can recover the CSI using (11) and (14).
In the simulation section of this article, the time-variant multipath channels $h^{(r_u,p)}_{l,n,m}$ are first generated by the modified Jakes model [12] and then fitted (approximated) by the discrete prolate spheroidal BEM (DPS-BEM) [10], i.e., using a linear combination of $Q$ basis functions as shown in (2).
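The linear-combination structure of the BEM can be illustrated with a minimal sketch. The function name `bem_tap_gains` and the toy polynomial basis in the usage note are hypothetical stand-ins chosen for illustration; they are not the DPS basis of [10].

```python
from typing import List, Sequence


def bem_tap_gains(basis: Sequence[Sequence[complex]],
                  coeffs: Sequence[complex]) -> List[complex]:
    """Synthesize one time-variant tap gain from its BEM coefficients.

    basis[t][q] is the value of the q-th basis function at the absolute
    time index t = n + m * N_s, and coeffs[q] is the BEM coefficient of
    that basis function for the tap under consideration.
    """
    num_basis = len(coeffs)
    gains = []
    for row in basis:
        if len(row) != num_basis:
            raise ValueError("each basis row must hold one value per coefficient")
        # Linear combination of the Q basis-function values at this time index.
        gains.append(sum(b * c for b, c in zip(row, coeffs)))
    return gains
```

For example, with the toy basis `[[1.0, t] for t in range(4)]` (a constant and a linear ramp) and coefficients `[2.0, 0.5]`, the synthesized tap gain grows linearly over the four time indices, so two coefficients describe four channel samples.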

C. Received signal model
Over the aforementioned time-variant multipath downlink channels, after CP removal, the nth received sample in the mth OFDM symbol at the $r_u$th receive antenna of the uth user, $y^{(r_u)}_{n,m}$, is the superposition of the multipath-filtered signals from the $N_t$ transmit antennas and additive white Gaussian noise with variance $\sigma_u^2$ at the uth user. It is assumed that different terminals may experience different receiver noise powers in the considered heterogeneous MU system.
After performing the FFT at the users, the kth subcarrier in the mth OFDM symbol at the $r_u$th receive antenna of the uth user consists of the desired subcarrier component, an inter-carrier interference (ICI) term caused by the channel time variation, and the receiver noise in the frequency domain, i.e., the $N$-point FFT (with kernel $\exp(-j2\pi nk/N)$) of the time-domain noise samples.
In the considered MU network with an arbitrary number of users, precoding and scheduling are performed for each subcarrier in each OFDM symbol in an LTE frame. For notational simplicity, unless otherwise indicated, the OFDM symbol index m and the subcarrier index k are omitted in the subsequent formulations. As a result, the received kth subcarriers at the uth user can be represented as in (5). It is assumed that the BS has an average transmit power constraint $\mathrm{tr}(\boldsymbol{\Delta}) \le P_\Sigma$, where the covariance matrix of the transmitted signal is defined as $\boldsymbol{\Delta} = E(\mathbf{X}_{k,m}\mathbf{X}^H_{k,m})$. In (5), the ICI can be neglected since its power is much smaller than that of the subcarrier of interest under the considered LTE system settings [13] with a normalized Doppler frequency below 0.1. In particular, as shown in [14], the ICI power $P_{\mathrm{ICI}}$ is upper bounded as in (6), where $f_D = v f_c / c$ is the Doppler frequency of the channel, $v$ is the mobile speed of the terminals, $c = 3 \times 10^8$ m/s is the speed of light, $f_c$ denotes the carrier frequency, and $T_s$ stands for the OFDM symbol duration. For instance, the bound can be evaluated under the LTE system settings [13] with $N = 128$, $f_c = 2$ GHz, $f_s = N/T_s = 1.92$ MHz and a mobile terminal speed of $v = 400$ km/h. Simulation results of the ICI-to-signal power ratio over the range of normalized Doppler frequencies smaller than 0.1 in Figure 1 are in good agreement with the upper bound of (6).
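The Doppler quantities quoted above can be checked with a short calculation. The sketch below uses the settings stated in the text ($N = 128$, $f_c = 2$ GHz, $f_s = 1.92$ MHz, $v = 400$ km/h); the function names are hypothetical, and $T_s = N / f_s$ is taken from the relation $f_s = N/T_s$, so the CP is not counted in $T_s$.

```python
SPEED_OF_LIGHT_M_S = 3.0e8  # value of c used in the text


def doppler_frequency_hz(speed_kmh: float, carrier_hz: float) -> float:
    """Doppler frequency f_D = v * f_c / c, with v converted from km/h to m/s."""
    speed_m_s = speed_kmh / 3.6
    return speed_m_s * carrier_hz / SPEED_OF_LIGHT_M_S


def normalized_doppler(speed_kmh: float, carrier_hz: float,
                       fft_size: int, sampling_hz: float) -> float:
    """Normalized Doppler frequency f_D * T_s, with T_s = N / f_s."""
    symbol_duration_s = fft_size / sampling_hz
    return doppler_frequency_hz(speed_kmh, carrier_hz) * symbol_duration_s


f_d = doppler_frequency_hz(400.0, 2.0e9)                 # about 740.7 Hz
f_d_ts = normalized_doppler(400.0, 2.0e9, 128, 1.92e6)   # about 0.049
```

At 400 km/h the normalized Doppler frequency is about 0.049, which lies inside the range below 0.1 for which the ICI is treated as negligible. The same function gives about 0.0012 at 10 km/h, the threshold quoted in the simulation section.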

Multiuser MIMO Transmission with BEM-Based Limited Feedback
In this section, a limited feedback design over time- and frequency-selective (doubly selective) channels is suggested to reduce the CSI feedback load and to alleviate the detrimental effect of outdated CSI feedback (which is likely to occur when the block-fading assumption is used in MU transmissions). More specifically, a BEM [10,11] is used as a fitting parametric model of the doubly selective channels. The use of the BEM considerably reduces the number of parameters needed to represent the time-variant channels.
Unlike [5,6], which use BD precoding in MU transmission for a fixed number of homogeneous users, this section applies BD precoding and greedy scheduling to an MU network with an arbitrary number of heterogeneous users (supporting various types of terminals with different numbers of receive antennas and different SNRs).

A. BEM-based limited feedback
In the limited feedback design, it is assumed that BEM coefficient estimation has been established at the users (using existing algorithms [10,11,15]); then, VQ of the available BEM coefficient estimates is performed using a predetermined Linde-Buzo-Gray (LBG) codebook [16]. Owing to the possibly large number $Q$ of BEM coefficients (e.g., at high mobile user speeds), a partition of the BEM coefficient vector $\mathbf{c}^{(r_u,p)}_l$ helps to reduce the codebook pre-generation complexity and the codebook's cardinality under a required VQ distortion level. In particular, the partition can be expressed by
$$\mathbf{c}^{(r_u,p)}_l = \big[ (\mathbf{c}^{(r_u,p)}_{1,l})^T, \ldots, (\mathbf{c}^{(r_u,p)}_{Q/V,l})^T \big]^T, \qquad (7)$$
where $\mathbf{c}^{(r_u,p)}_{x,l} = [c^{(r_u,p)}_{(x-1)V+1,l}, \ldots, c^{(r_u,p)}_{xV,l}]^T$, $x \in \{1, \ldots, Q/V\}$, and $V$ is the length of each partitioned BEM coefficient subvector.
In the LBG codebook generation, for each resolvable path l, the LBG algorithm [16] (using $10^5$ training BEM coefficient vectors) is employed to pre-generate the codebook $G = \{\mathbf{g}^{(j)}_{x,l},\ j = 1, \ldots, 2^B\}$, where $\mathbf{g}^{(j)}_{x,l} = [g^{(j)}_{x,l,1}, \ldots, g^{(j)}_{x,l,V}]^T$, $l = 0, \ldots, L-1$, and $B$ is the number of binary bits representing each codevector in the LBG codebook.
As illustrated in Figure 2, the distributions of a given BEM coefficient under different mobile speeds are numerically shown to differ. As a result, to attain low distortion in the VQ of the BEM coefficients, an LBG codebook should be pre-generated for each possible target mobile speed using the LBG algorithm [16] with training vectors of BEM coefficients corresponding to that speed. Then, for each mobile terminal with a known speed, the LBG codebook $G$ whose target speed is closest to the known speed should be deployed for the VQ of the BEM coefficients.
In practice, it is very difficult to estimate the actual speeds of mobile terminals exactly. In addition, the memory capacity of each mobile terminal's receiver limits the number of pre-generated LBG codebooks $G$ (corresponding to different target speeds) that can be pre-stored in the terminal. Therefore, a mismatch between the actual mobile speed and the target speed of the LBG codebook in use always exists in the VQ of the BEM coefficients at the mobile terminals. The effect of this speed mismatch on the performance of the considered MU network is numerically investigated in Section 4.
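The codebook selection rule described above (use the codebook whose target speed is closest to the terminal's speed estimate) can be sketched as follows. The function name and the grid of target speeds in the usage note are hypothetical; the text does not specify which target speeds are pre-stored.

```python
from typing import Sequence


def select_codebook_speed(estimated_speed_kmh: float,
                          target_speeds_kmh: Sequence[float]) -> float:
    """Return the pre-stored target speed closest to the estimated speed.

    Ties are resolved in favor of the earlier entry in target_speeds_kmh.
    """
    if not target_speeds_kmh:
        raise ValueError("at least one pre-generated codebook is required")
    return min(target_speeds_kmh,
               key=lambda target: abs(target - estimated_speed_kmh))
```

With an assumed grid of `[50.0, 100.0, 200.0]` km/h, a terminal estimating 130 km/h would use the 100 km/h codebook, leaving a 30 km/h mismatch of the kind studied in the simulations.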
Using an LBG codebook $G$ for a given target speed, the VQ of these partitioned subvectors $\mathbf{c}^{(r_u,p)}_{x,l}$ is performed as in (10). Then, the indices $i^{(r_u,p)}_{x,l}$ of the quantized subvectors of the BEM coefficients are sent to the BS via error-free feedback links. Based on the feedback indices, the BS can determine the quantized versions of the BEM coefficients as in (11), where $x = 1, \ldots, Q/V$ and $l = 0, \ldots, L-1$. With these quantized versions $\hat{\mathbf{c}}^{(r_u,p)}_l$ of the BEM coefficients, the channel frequency response (CFR) matrix of the uth user can be determined at the BS. It is noted that the basis functions $\{b_{n+mN_s,q}\}_{q=1}^{Q}$ are known at both the BS and the users.
As shown in (15), the channel response at each subcarrier in each OFDM symbol in the current LTE time slots can be determined using the quantized versions $\hat{c}^{(r_u,p)}_{q,l}$ of the BEM coefficients $c^{(r_u,p)}_{q,l}$. As mentioned above, these BEM coefficients are assumed to be perfectly estimated by existing BEM-based channel estimation algorithms [10,11] using pilot signals from the previous LTE time slots.
Once the BEM-based limited feedback information is available at the BS, the quantized versions of the user channel responses $\hat{\mathbf{H}}_{k,m,u}$ are naively treated as perfect CSI in the BD precoding and greedy scheduling processes presented in the next subsections.
The use of a BEM significantly reduces the complexity of quantizing the time-variant CSI. In particular, the number of doubly selective CIR parameters at each mobile terminal is $R_u N_t L M N$ over a duration of $M$ OFDM symbols, where $R_u$, $N_t$, $L$, and $N$ denote the number of receive antennas, the number of transmit antennas, the channel length and the FFT size, respectively. As shown in Table 1, the BEM helps to reduce this large number $R_u N_t L M N$ to $R_u N_t L Q$, where $Q$ is the number of basis functions (used in the BEM), which is much smaller than $MN$. As a result, the BEM significantly reduces the CSI feedback load from each mobile terminal to the BS. The downside, at both the BS and the terminals, is the extra memory required to store the related basis function values. In this work, it is assumed that perfect estimation of the BEM coefficients has been established. In particular, the BEM coefficients for the current OFDM symbols can be estimated by existing techniques [10,11,15] within the duration of the previous OFDM symbols.
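The quantization and recovery steps can be sketched as below. Since the quantization rule itself is not reproduced here, the sketch assumes the usual nearest-codevector rule under squared Euclidean distance, which is the distortion measure normally paired with LBG codebooks; all function names and the toy codebook in the usage note are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[complex]


def partition(coeffs: Vector, sub_len: int) -> List[List[complex]]:
    """Split the Q BEM coefficients of one tap into Q/V subvectors of length V."""
    if sub_len <= 0 or len(coeffs) % sub_len != 0:
        raise ValueError("V must be positive and divide Q")
    return [list(coeffs[i:i + sub_len]) for i in range(0, len(coeffs), sub_len)]


def nearest_index(sub: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codevector closest to sub (assumed squared-error rule)."""
    def distortion(codevector: Vector) -> float:
        return sum(abs(a - b) ** 2 for a, b in zip(sub, codevector))
    return min(range(len(codebook)), key=lambda j: distortion(codebook[j]))


def encode(coeffs: Vector, codebooks: Sequence[Sequence[Vector]],
           sub_len: int) -> List[int]:
    """Terminal side: one feedback index per subvector (codebooks[x] serves subvector x)."""
    subs = partition(coeffs, sub_len)
    return [nearest_index(sub, codebooks[x]) for x, sub in enumerate(subs)]


def decode(indices: Sequence[int],
           codebooks: Sequence[Sequence[Vector]]) -> List[complex]:
    """BS side: rebuild the quantized coefficient vector from the feedback indices."""
    quantized: List[complex] = []
    for x, j in enumerate(indices):
        quantized.extend(codebooks[x][j])
    return quantized
```

With a toy one-bit codebook `[[0, 0], [1, 1]]` for each of two subvectors (so $V = 2$, $Q = 4$), the coefficients `[0.9, 1.1, 0.1, -0.2]` are encoded as the indices `[1, 0]`, and the BS recovers `[1, 1, 0, 0]`. Only the indices cross the feedback link, because both sides hold the same codebooks.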
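The size of this reduction can be made concrete with a small calculation. The values below are taken from the simulation settings ($N_t = 4$, $L = 5$, $N = 128$, $Q = 18$, a two-antenna terminal); choosing $M = 140$ (one LTE frame) is an assumption made only for illustration, and the function names are hypothetical.

```python
def cir_parameter_count(rx_antennas: int, tx_antennas: int, channel_len: int,
                        num_symbols: int, fft_size: int) -> int:
    """Number of time-variant CIR parameters, R_u * N_t * L * M * N."""
    return rx_antennas * tx_antennas * channel_len * num_symbols * fft_size


def bem_parameter_count(rx_antennas: int, tx_antennas: int, channel_len: int,
                        num_basis: int) -> int:
    """Number of BEM coefficients, R_u * N_t * L * Q."""
    return rx_antennas * tx_antennas * channel_len * num_basis


cir_count = cir_parameter_count(2, 4, 5, 140, 128)   # 716800
bem_count = bem_parameter_count(2, 4, 5, 18)         # 720
```

Under these assumed values the BEM representation is roughly three orders of magnitude smaller than the sample-by-sample CIR description.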

B. BD precoding
With the quantized versions of the CSI $\hat{\mathbf{H}}_u$, this subsection applies the BD precoding process to the considered heterogeneous MU system (supporting various types of users with different numbers of receive antennas and different SNRs). In BD precoding [4], the inter-user interference in (5) can be eliminated by pre-multiplying the $R_u$ data subcarrier streams of the uth user with precoding matrices. Specifically, let $\mathbf{s}_{k,m,u} \in \mathbb{C}^{R_u \times 1}$ and $\mathbf{W}_{k,m,u} \in \mathbb{C}^{N_t \times R_u}$ be the transmitted symbol vector and precoding matrix corresponding to subcarrier k in the mth OFDM symbol of the uth user, respectively. For a set of $U$ (selected/scheduled) users, the transmitted subcarriers at the BS are
$$\mathbf{X}_{k,m} = \sum_{u=1}^{U} \mathbf{W}_{k,m,u}\,\mathbf{s}_{k,m,u}. \qquad (16)$$
Using (5) and (16), the received subcarriers of the uth selected/scheduled user can be represented in vector form as the sum of the desired signal, the inter-user interference from the other $U - 1$ users and the receiver noise, where $u = 1, \ldots, U$.
To attain the zero-forcing constraint in (18), $\mathbf{W}_{k,m,u}$ must lie in the null space of $\mathbf{H}^{\perp}_{k,m,u}$ under the dimension condition $N_t \ge \sum_{u=1}^{U} R_u$. To obtain a basis set of the null space, the singular value decomposition of $\mathbf{H}^{\perp}_{k,m,u}$ is determined as in (20), where $\boldsymbol{\Gamma}_u$ contains the first $n_u = \sum_{u'=1, u' \ne u}^{U} R_{u'}$ right singular vectors corresponding to the non-zero singular values, and $\boldsymbol{\Omega}_u \in \mathbb{C}^{N_t \times (N_t - n_u)}$ contains the last $(N_t - n_u)$ right singular vectors corresponding to the zero singular values of $\mathbf{H}^{\perp}_{k,m,u}$. As shown in Appendix A, one can use (20) to deduce that $\mathbf{H}^{\perp}_{k,m,u}\boldsymbol{\Omega}_u = \mathbf{0}$, where $\boldsymbol{\Omega}_u^H \boldsymbol{\Omega}_u = \mathbf{I}$. As a result, the columns of $\boldsymbol{\Omega}_u$ form a basis set of the null space of $\mathbf{H}^{\perp}_{k,m,u}$. With limited CSI feedback, the BS naively treats $\{\hat{\mathbf{H}}_{k,m,u}\}_{u=1}^{U}$ as perfect CSI and the condition (18) becomes $\hat{\mathbf{H}}_{k,m,u}\mathbf{W}_{k,m,u'} = \mathbf{0}$ for all $u \ne u'$ and $1 \le u, u' \le U$. As a result, the BD precoding matrix of the uth user can be chosen as $\mathbf{W}_{k,m,u} = \mathbf{P}_{k,m,u}\mathbf{R}_{k,m,u}$, where $\mathbf{R}_{k,m,u} \in \mathbb{C}^{R_u \times R_u}$ can be any matrix that satisfies the sum-power constraint and $\mathbf{P}_{k,m,u}$ in (23) is obtained using the last $R_u$ columns of the matrix $\boldsymbol{\Omega}_u$.
In the absence of inter-user interference (after BD precoding), the resulting received signal at the uth user contains only its own precoded data streams and the receiver noise, and the precoders are chosen to maximize the system sum-rate under the sum-power constraint. With the affine constraints and convex objective function, this problem can be efficiently solved by using the Karush-Kuhn-Tucker (KKT) optimality conditions [17]. For the sake of simplified computations, let $\mathbf{D}_{k,m,u} = \hat{\mathbf{H}}_{k,m,u}\mathbf{P}_{k,m,u}$ be the effective channel for the uth user after precoding. The solution of $\mathbf{C}_{k,m,u}$ under the KKT conditions can be obtained using the eigen-decomposition of the related matrices. In particular, one can perform an eigen-decomposition that yields a diagonal matrix containing the eigenvalues $(e_{u,1}, \ldots, e_{u,R_u})$ and a unitary matrix of eigenvectors. As shown in Appendix B, the solution of $\mathbf{C}_{k,m,u}$ can then be determined as in (27), where $(x)^+ = \max(x, 0)$.
Applying the sum-power constraint and the trace property $\mathrm{tr}(\mathbf{ABC}) = \mathrm{tr}(\mathbf{BCA})$ to (27), the water level $g$ can be determined. Given a set of selected users, the above BD precoding process attempts to eliminate the inter-user interference and maximize the system sum-rate. As mentioned above, the feasibility of the suggested BEM-based limited feedback scheme is investigated in an MU network with an arbitrary number of active users. In particular, the limited feedback links provide CSI not only for precoding but also for scheduling at the BS. Under the sum-rate performance metric, scheduling performs user selection with reasonable complexity to maximize the system sum-rate. The considered scheduling technique is addressed in detail in the next subsection.
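For orientation, the null-space construction used at the start of this subsection can be summarized in the usual BD form of [4]. This is a sketch under assumptions: $\mathbf{H}_{k,m,u'}$ denotes the (quantized) CSI of user $u'$ available at the BS, the stacked definition of $\mathbf{H}^{\perp}_{k,m,u}$ is the customary one and is assumed here, and $\boldsymbol{\Theta}_u$ and $\boldsymbol{\Sigma}_u$ (left singular vectors and singular values) are names introduced only for this illustration.

```latex
% Aggregate channel of all scheduled users except user u (assumed definition)
\mathbf{H}^{\perp}_{k,m,u} =
  \big[\, \mathbf{H}_{k,m,1}^T \;\cdots\; \mathbf{H}_{k,m,u-1}^T \;\;
          \mathbf{H}_{k,m,u+1}^T \;\cdots\; \mathbf{H}_{k,m,U}^T \,\big]^T
  \in \mathbb{C}^{n_u \times N_t}

% SVD with the right singular vectors split into range and null-space parts
\mathbf{H}^{\perp}_{k,m,u} =
  \boldsymbol{\Theta}_u \, \boldsymbol{\Sigma}_u \,
  \big[\, \boldsymbol{\Gamma}_u \;\; \boldsymbol{\Omega}_u \,\big]^H
  \quad\Longrightarrow\quad
  \mathbf{H}^{\perp}_{k,m,u}\, \boldsymbol{\Omega}_u = \mathbf{0}

% Precoder confined to the null space, so other users see no interference
\mathbf{W}_{k,m,u} = \mathbf{P}_{k,m,u}\, \mathbf{R}_{k,m,u},
  \qquad \mathbf{P}_{k,m,u} \text{ built from the last } R_u
  \text{ columns of } \boldsymbol{\Omega}_u
```

The implication in the second line holds because the columns of $\boldsymbol{\Sigma}_u$ that multiply $\boldsymbol{\Omega}_u^H$ are zero when $\mathbf{H}^{\perp}_{k,m,u}$ has full row rank $n_u$.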
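The water-filling power allocation referred to above can be sketched as follows. This is an illustration under assumptions: `gains` stands for the eigenvalues of the effective channels after normalization by the receiver noise variance, all scheduled users' eigenmodes are pooled so that one water level applies to the total power, and the function name is hypothetical.

```python
from typing import List, Sequence


def water_filling(gains: Sequence[float], total_power: float) -> List[float]:
    """Return powers p_i = max(g - 1/gains[i], 0) that sum to total_power.

    The water level g is found by trying the n strongest modes, from all
    modes downwards, until the weakest active mode receives positive power.
    """
    if total_power <= 0 or not gains or any(e <= 0 for e in gains):
        raise ValueError("need positive power and positive gains")
    inverse_gains = sorted(1.0 / e for e in gains)  # strongest mode first
    level = 0.0
    for n in range(len(inverse_gains), 0, -1):
        # Water level if exactly the n strongest modes are active.
        level = (total_power + sum(inverse_gains[:n])) / n
        if level > inverse_gains[n - 1]:
            break
    return [max(level - 1.0 / e, 0.0) for e in gains]
```

For instance, `water_filling([2.0, 1.0, 0.1], 1.0)` assigns powers 0.75 and 0.25 to the two strong modes and switches off the weakest one, with a water level of 1.25.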

C. Greedy scheduling
Given a precoding technique, the purpose of scheduling (user selection) is to find a set of users among all active users that maximizes the system sum-rate. A simple optimal method for user selection is exhaustive search, but it incurs impractically high complexity when the number of users is large. To avoid this, greedy scheduling [18] is considered herein. After performing the aforementioned BD precoding technique on a given user set $S$ (i.e., a set of users' indices), the resulting sum-rate $\xi_{\mathrm{BD}}(S)$ can be determined at the BS (for scheduling/user selection) from the effective channels of the users in $S$, where $\mathbf{P}_{k,m,u}$ and $\mathbf{C}_{k,m,u}$ are determined by (23) and (27), respectively.
Then, the detailed implementation of the greedy scheduling for $U_a$ active users can be described in the following steps:
1) Initialization: $A_0 = \{1, 2, \ldots, U_a\}$ is the set of all active users' indices. $S_0 = \emptyset$ is the set of selected users, initially empty. $v = 0$ stands for the number of selected users, initially set to zero. $R_0 = 0$ is the system sum-rate of the selected users, initially set to zero.
2) Repetition: • Let u* be the index of a selected user in the current iteration. Specifically, the index u* can be determined as follows:

3) Stop the user selection process.
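The steps above can be sketched in code. Because the details of the repetition step are not reproduced here, the sketch assumes the common greedy rule: in each iteration, add the remaining user that yields the largest sum-rate, and stop as soon as no addition increases the sum-rate. The function `toy_sum_rate`, its gains and its equal power split are hypothetical placeholders for $\xi_{\mathrm{BD}}(S)$, not the BD sum-rate of this article.

```python
import math
from typing import Callable, List, Sequence, Tuple


def greedy_schedule(active_users: Sequence[int],
                    sum_rate: Callable[[List[int]], float]
                    ) -> Tuple[List[int], float]:
    """Greedy user selection driven by a caller-supplied sum-rate function."""
    remaining = list(active_users)   # indices of users not yet selected
    selected: List[int] = []         # set of selected users, initially empty
    best_rate = 0.0                  # sum-rate of the selected set
    while remaining:
        # Evaluate the sum-rate obtained by adding each remaining user.
        trials = [(sum_rate(selected + [u]), u) for u in remaining]
        candidate_rate, candidate = max(trials, key=lambda t: t[0])
        if candidate_rate <= best_rate:
            break                    # no user improves the sum-rate: stop
        selected.append(candidate)
        remaining.remove(candidate)
        best_rate = candidate_rate
    return selected, best_rate


# Hypothetical stand-in for the BD sum-rate: equal power split, scalar gains.
TOY_GAINS = {1: 3.0, 2: 2.0, 3: 0.5, 4: 0.1}
TOTAL_POWER = 4.0


def toy_sum_rate(users: List[int]) -> float:
    share = TOTAL_POWER / len(users)
    return sum(math.log2(1.0 + TOY_GAINS[u] * share) for u in users)
```

With the toy rate function, `greedy_schedule([1, 2, 3, 4], toy_sum_rate)` selects users 1 and 2 and then stops, since adding either weak user would dilute the power of the strong ones. In an actual BD system the supplied sum-rate function would also have to respect the dimension condition on the total number of receive antennas.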
With a set S of scheduled users and corresponding precoding matrices, the actual achievable sum-rate of the system is

Simulation Results and Discussions
Following the 3GPP-LTE system settings [13], the BD-based heterogeneous MU transmission using the suggested BEM-based limited feedback scheme over doubly selective MIMO-OFDM downlink channels is simulated as follows. With $L = 5$ channel taps and an exponentially decaying power-delay profile [10], the time-variant multipath channels are first generated by the modified Jakes model [12] and then fitted (approximated) by the DPS-BEM [10] using $Q$ basis functions. Unless otherwise stated, the considered heterogeneous MU network has $U_a = 4$ active users with mobile speeds of 200 km/h and $Q = 18$, where $U_a/2$ users are equipped with a single receive antenna ($R_u = 1$ for $u = 1, \ldots, U_a/2$) and the remaining users have two receive antennas ($R_u = 2$ for $u = U_a/2 + 1, \ldots, U_a$). The BS is equipped with four transmit antennas ($N_t = 4$). Following the frame format in the 3GPP-LTE system settings [13], one LTE frame consists of 20 time slots, each containing seven OFDM symbols (i.e., 140 OFDM symbols in one LTE frame). In addition, a 128-point FFT and a carrier frequency of $f_c = 2$ GHz are used for the simulated multicarrier transmissions. The CP length of each OFDM symbol is set to 10 samples [13]. Unless otherwise indicated, the average transmit power constraint is $P_\Sigma = 10$ and the receiver noise variance is $\sigma_u^2 = 1$. In the figures illustrating the simulation results, each plotted point of the sum-rate performance is obtained by averaging over 500 independent channel realizations.
Figure 3 shows the sum-rate performance of the BD-based MU transmission with the BEM-based limited feedback versus the number of active users. For comparison, the sum-rate performance of DPC (curve a) is also provided by using the iterative algorithm in [20].
As observed in Figure 3, the BEM-based limited feedback scheme (curve c) offers a significant sum-rate gain relative to the case of using full CSI feedback but assuming the channels to be time-invariant (curve d) within one LTE frame (i.e., the block-fading assumption). Furthermore, the sum-rate performance of the BEM-based limited feedback scheme with $B = 10$ bits is only slightly lower than that of the ideal case where the BS uses full feedback of perfect time-variant CSI (curve b). As can be seen from curve d, over time-varying channels, i.e., in LTE systems with mobile users, the detrimental effect of outdated CSI feedback incurs a considerable sum-rate loss when the block-fading assumption is used in precoding and scheduling.
Figure 4 presents the sum-rate performance of the BD-based heterogeneous MU transmission versus the total number of feedback bits $B_\Sigma$ for each receive antenna. With $Q = 18$ DPS-BEM coefficients and $V = 2$ for a mobile user speed of 200 km/h, the resulting total number of feedback bits for one receive antenna is $B_\Sigma = N_t L (Q/V) B = 180B$. As can be seen from curve b, the BEM-based limited feedback scheme using $B_\Sigma = 180$ bits can provide a better sum-rate performance than the BD-based MU transmission under the block-fading assumption (curve c).
In Figure 5, the sum-rate results of the MU transmission under different CSI feedback schemes are plotted versus the normalized Doppler frequency $f_D T_s$. As can be seen, the BEM-based limited feedback (curve b) provides a stable sum-rate performance that is robust over a wide range of user speeds. In contrast, the use of the block-fading assumption (curve c) incurs a significant sum-rate loss (due to the detrimental effect of outdated CSI feedback) when $f_D T_s > 0.0012$ (i.e., when the mobile user speeds are higher than 10 km/h).
Figure 6 shows the sum-rate performance of the BD-based MU transmission versus the number of DPS basis functions $Q$ (used for fitting the considered time-variant channels). As can be observed, using $Q = 18$ DPS basis functions for each time-variant channel tap gain allows the BEM-based limited feedback scheme (curve b) to provide a sum-rate performance comparable to that of the ideal case where the BS uses full feedback of perfect time-variant CSI (curve a). With $B = 10$ bits and $Q = 4$ DPS basis functions, the BEM-based limited feedback scheme (curve b) outperforms the case of using the block-fading assumption (curve c).
Figure 7 shows the sum-rate performance of homogeneous (using zero-forcing (ZF) precoding [21]) and heterogeneous (using BD precoding) MU transmissions versus the number of transmit antennas. In this figure, the considered homogeneous MU system has four (active) single-antenna users ($R_u = 1$, $u = 1, \ldots, 4$), and the considered heterogeneous one consists of four active users where $R_u = 1$, $u = 1, \ldots, 4$. Under these system settings, at the cost of higher complexity, the BD-based heterogeneous MU transmission provides a higher system sum-rate than the ZF-based homogeneous MU transmission.
To pre-generate an LBG codebook $G$ to be used at a given mobile speed, a set of $10^5$ training BEM coefficient vectors corresponding to the target speed is employed by the LBG algorithm [16]. Under an ideal scenario, each mobile terminal is assumed to know its actual speed exactly and uses the corresponding LBG codebook for the VQ of the BEM coefficient vectors. In practice, however, each mobile terminal may have only an estimate of its speed and chooses the LBG codebook whose target speed is closest to the estimated speed.
To investigate the robustness of the LBG-based CSI quantization against the aforementioned user speed mismatch, Figure 8 shows the sum-rate performance of the BD-based MU transmissions when the actual user speeds are uniformly distributed in the range $[v - \delta, v + \delta]$, where $v$ is the target speed of the LBG codebook in use and $\delta$ refers to the speed mismatch level. In this figure, there are four heterogeneous users with different receiver noise powers $\{\sigma_u^2\}_{u=1}^{4} = \{1, 1, 2, 2\}$. As can be seen, given an LBG codebook dedicated to a target mobile speed (i.e., $v = 100$ km/h), using the pre-generated LBG codebook for mobile terminals with actual speeds uniformly distributed around the target speed $v$ incurs only a slight sum-rate loss (the values of curve b decrease slightly as $\delta$ increases from 0 to 40 km/h).
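The feedback-load expression $B_\Sigma = N_t L (Q/V) B$ can be evaluated directly. The sketch below uses the settings stated in the text ($N_t = 4$, $L = 5$, $Q = 18$, $V = 2$); the function name is hypothetical.

```python
def feedback_bits_per_rx_antenna(tx_antennas: int, channel_len: int,
                                 num_basis: int, sub_len: int,
                                 bits_per_index: int) -> int:
    """Total feedback bits per receive antenna, B_sigma = N_t * L * (Q / V) * B."""
    if num_basis % sub_len != 0:
        raise ValueError("V must divide Q")
    return tx_antennas * channel_len * (num_basis // sub_len) * bits_per_index


bits_b1 = feedback_bits_per_rx_antenna(4, 5, 18, 2, 1)    # 180
bits_b10 = feedback_bits_per_rx_antenna(4, 5, 18, 2, 10)  # 1800
```

One bit per codevector index therefore corresponds to the 180-bit operating point, and $B = 10$ bits per index corresponds to 1800 bits per receive antenna.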

Conclusion
This article introduced a BEM-based limited feedback design for MU transmissions over doubly selective MIMO downlink channels. By employing a BEM to capture the channel's time variation, the resulting feedback load of BEM coefficients is significantly smaller than that of the CIR or CFR. Over time-varying channels, the BEM-based limited feedback helps to reduce the detrimental effect of outdated CSI feedback (which arises when the block-fading assumption is used in MU transmissions) and to provide stable sum-rate performance for heterogeneous users with a wide range of mobile speeds.
Figure 8. Sum-rate performance as the actual user speed value is uniformly distributed around the target speed value of the used LBG codebook $G$, where $v = 100$ km/h is the predetermined speed of the used LBG codebook.