Efficient Transmission Schemes for Multiuser MIMO Downlink with Linear Receivers and Partial Channel State Information

The downlink of a multiuser MIMO system is considered, in which the base station (BS) and the user terminals are both equipped with multiple antennas. Efficient transmission schemes based on zero-forcing (ZF) linear receiver processing, eigenmode transmission, and partial channel state information (CSI) at the BS transmitter are proposed. The proposed schemes utilize a handshaking procedure between the BS and the users to select (schedule) a subset of users and determine the precoding matrix at the BS. The advantage of the proposed limited feedback schemes lies in enabling relatively low-complexity user scheduling algorithms and high sum-rate throughput, even for a small pool of users. For large user pools, and when the number of antennas at each user terminal is at least equal to the number of antennas at the BS, we show that the proposed scheme is asymptotically optimum.


Introduction
Increasing demand for broadband wireless services calls for much higher throughputs in future wireless communication systems. It has been shown that with the use of multiple antennas at the transmitter (Tx) and the receiver (Rx), the capacity of a point-to-point communication link increases linearly with min{M, N}, where M is the number of Tx antennas and N is the number of Rx antennas [1,2]. Recently, there has been great interest in multiuser multiple-input multiple-output (MU-MIMO) systems and in transmission strategies that would enable similar capacity gains in a multiuser environment [3][4][5]. In a multiuser downlink with the base station (BS) equipped with multiple antennas, multiple users can be served simultaneously. In fact, it has been shown that achieving the MU-MIMO downlink sum capacity requires transmitting to several users simultaneously [6]. Since the number of users in the system is usually greater than the maximum number that can be served simultaneously through spatial multiplexing, user selection is required. User selection (or scheduling) favours users that experience better propagation conditions while being sufficiently separated in space. Such user scheduling leads to multiuser diversity gain [7,8], which increases with the number of users awaiting transmission.
It has been shown that the capacity of the MU-MIMO downlink can be achieved by dirty paper coding (DPC) [6], which is a transmitter multiuser encoding strategy based on interference presubtraction. DPC requires a nonlinear search for optimal precoding matrices as well as noncausal channel coding for the users, which is practically impossible in real-time systems. Therefore, suboptimum transmission strategies such as different forms of beamforming have been considered in the literature. In MU-MIMO beamforming, linear or nonlinear transmitter precoding algorithms together with user scheduling are designed to maximize the system's sum rate or some other related objective function (e.g., the sum rate under a fairness constraint). Unfortunately, most beamforming algorithms considered assume the availability of perfect channel state information at the transmitter, which presents a big challenge to their practical implementation ([9,10] and references therein contain an overview of the subject).
To overcome this challenge, suboptimal MU-MIMO downlink transmission based on partial channel state information (CSI) has been studied in the literature. Some of the proposed approaches can be applied only to systems with single-antenna user terminals [11][12][13][14][15][16], while others accommodate multiple-antenna user terminals [17][18][19][20][21][22][23]. When multiple-antenna user terminals are considered, it is often assumed that all user terminals have the same number of antennas. This might not be true in practice. However, schemes that rely on this assumption may use antenna selection to meet the requirement. Most of the existing MU-MIMO downlink schemes using partial CSI fall under three main categories.
(1) Transmission schemes based on the availability of quantized channel state information at the BS: the quantized CSI is used to implement a variant of beamforming at the BS. See [12] and references therein for further details.
(2) User scheduling and precoder selection from a codebook of vectors/matrices known a priori to both the BS and the users based on partial CSI: the scheme proposed in [17], called transmit beam matching (TBM), is one example, which extends the per-user unitary rate control (PU2RC) [12,24] approach to multiple-antenna users. PU2RC is Samsung Electronics' proposal to the 3rd Generation Partnership Project (3GPP). The approach of [17] is characterized by the relatively low-complexity structure of PU2RC, and it uses the channel matrix pseudoinverse operation in order to minimize interstream interference at each user's terminal. However, when users have fewer antennas than the base station, the pseudoinverse operation cannot completely eliminate interstream interference, which leads to some performance degradation. A similar approach called random precoding has been introduced in [19].
(3) Eigenmode transmission with limited feedback: One example is [20], which employs singular value decomposition (SVD) of user channel matrices and data transmission on the eigenmode with the largest gain.Another example is [25], in which the authors propose a combination of zero-forcing beamforming (ZF-BF) with eigenmode transmission.
All schemes mentioned above use precoding at the BS. In addition to precoding at the BS, multiple-antenna users can use their antennas to process their received signal vector using relatively low-complexity linear schemes such as zero-forcing (ZF) and minimum mean squared error (MMSE) processing and send back some sort of channel quality indicator (CQI), for example, SINR or rate, to the BS. One example is [21], in which a MIMO downlink scheme with opportunistic feedback is proposed. In this scheme, users use ZF linear processors and send back the quality indicator for each spatial channel to the base station according to an opportunistic feedback protocol. The main contribution of [21] lies in its feedback protocol and not in the transmission scheme itself.
In this paper, we present a transmission scheme for the MU-MIMO downlink using eigenmode transmission and ZF linear processing, which only requires partial CSI and falls under the third category mentioned above. We assume that all users have the same number of Rx antennas. Under this assumption, two transmission strategies are proposed, depending on whether the number of Rx antennas at each user terminal is smaller than the number of transmit antennas or not. For systems where the number of Rx antennas is greater than or equal to the number of Tx antennas, one user is selected to receive data through eigenmode transmission and its right eigenvector matrix is used for precoding, while the other selected users use ZF linear processing. When the number of Rx antennas of each user terminal is less than the number of Tx antennas at the base station, partial CSI at the base station is used to design a precoding matrix such that the number of interfering streams at the selected user terminals (after Rx preprocessing) is reduced to the number of Rx antennas, so that ZF receiver processing can be efficiently applied. Analytical expressions and approximations are derived for the sum rate of the proposed scheme and also for time division multiplexing (TDM) with eigenmode transmission.
For the case of N ≥ M (N denotes the number of Rx antennas at each user terminal; M denotes the number of Tx antennas at the BS), our work is distinct from [20] in the following aspects. (1) In our proposed scheme, the users do not need to send back their channel singular vectors as required in the scheme of [20]; only one user is asked to send back its right singular vector matrix. (2) The scheme presented here results in zero interuser and interstream interference, whereas the scheme of [20] does not. (3) In our scheme, the user selection criterion is straightforward, and there is no need for a greedy search algorithm to select users as required by the scheme introduced in [20]. Compared to [25], what distinguishes our work is the use of ZF receiver processing and the lower complexity of our user scheduling and eigenmode assignment, as opposed to the high complexity of the exhaustive search needed to find the threshold value (denoted by t in [25]). Parts of this work have been presented in [26,27]. This paper generalizes our proposed scheme to any number (greater than one) of Tx antennas (at the BS) and Rx antennas (at each user terminal) and provides further analysis of the proposed scheme's sum rate.
The paper is organized as follows. In Section 2, the system model for the multiuser MIMO downlink is described. Two well-known transmission schemes based on limited feedback are briefly outlined in Sections 3 and 4. Section 5 describes the proposed transmission techniques along with an asymptotic analysis for the case of N ≥ M. Numerical results are provided in Section 6, and Section 7 concludes the paper.
Throughout this paper, upper case and lower case bold characters denote matrices and vectors, respectively. $(\cdot)^H$ denotes the conjugate transpose of the matrix argument. $E\{\cdot\}$ is the expectation operation. $\mathrm{Tr}(\cdot)$ denotes the trace of the matrix argument.

System Model
Figure 1 shows the block diagram of a MU-MIMO downlink. Consider a Gaussian MIMO broadcast channel where the base station is equipped with M antennas, and there are K homogeneous users each equipped with N antennas. A quasi-static Rayleigh flat fading model is assumed for the channel, where the channel gains do not change within a frame and change independently from frame to frame following a complex Gaussian distribution. The kth user receives the following signal vector:
$$y_k = H_k x + n_k, \tag{1}$$
where $H_k \in \mathbb{C}^{N \times M}$ is the downlink channel gain matrix between the base station and the kth user, and $n_k \in \mathbb{C}^{N \times 1}$ is the noise vector. The elements of both $H_k$ and $n_k$ are assumed to be independently and identically distributed (i.i.d.) zero-mean unit-variance complex Gaussian, $\mathcal{CN}(0, 1)$. The vector $x$ is the transmitted signal vector such that $\mathrm{Tr}(E[x x^H]) = P_T$. Hence, the average signal-to-noise ratio (SNR) equals $P_T$, which also defines the average power constraint of the base station. The data symbol vector $s$ is a size $M \times 1$ vector. When precoding is used, the precoding matrix is denoted by $\Psi$, where $x = \Psi s$, and in the case of spatial multiplexing $\Psi = I_{M \times M}$. Let the total (sum) rate delivered by the base station to the users during one time slot be $R$. Then the expected throughput of the system is obtained by taking the ensemble average of $R$ over the channel matrices, that is, $R_{\mathrm{Ave}} = E_{H_1,\ldots,H_K}\{R\}$. Throughout the paper, the terms system throughput and sum rate are used interchangeably.

Eigenmode Transmission
Consider the singular value decomposition (SVD) of the kth user's channel gain matrix
$$H_k = U_k \Sigma_k V_k^H, \tag{2}$$
where $U_k = [u_1^{(k)} \cdots u_N^{(k)}]$ and $V_k = [v_1^{(k)} \cdots v_M^{(k)}]$ are the N × N left singular vector and M × M right singular vector unitary matrices, respectively. The matrix $\Sigma_k$ is an N × M diagonal matrix with nonnegative numbers (singular values) on its diagonal. Consider data transmission to only one user at any given time. When the transmitter has knowledge of $H_k$, it precodes the transmitted signal by $V_k$, while the kth receiver uses $U_k^H$ as its receive processing matrix. Therefore, the channel is diagonalized into parallel interference-free channels, also called eigenchannels [28], where the gain of each channel equals its corresponding singular value. In this case, the rate delivered to user k (in bits/s/Hz) is obtained as
$$R_k = \sum_{m=1}^{M} \log_2\left(1 + \rho_m \left(\sigma_m^{(k)}\right)^2\right), \tag{3}$$
where $\sigma_m^{(k)}$ is the mth singular value of $H_k$, $\rho_m$ denotes the power given to the mth data stream, and $\sum_{m=1}^{M} \rho_m = P_T$. The optimum power distribution over the spatial channels is obtained through water-filling [28]. For the case of equal power allocation we have $\rho_m = P_T/M$. This transmission scheme has been considered within the context of time-division multiplexing (TDM), where the users send back their achievable rate, $R_k$, to the base station and the base station selects the user with the largest achievable transmission rate in each time slot. Compared to multiuser MIMO schemes in which multiple users are served simultaneously, this scheme is highly suboptimal, as it does not take full advantage of multiuser diversity: some of the eigenmodes of the selected user's channel matrix might be very weak.
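The difference between equal power allocation and water-filling over the eigenchannels can be illustrated with a short sketch. It assumes unit noise variance and a per-stream rate of log2(1 + ρ_m σ_m²); the function names and the example singular values are hypothetical and chosen only for illustration.

```python
import math

def water_filling(gains, total_power):
    """Powers p_m maximizing sum log2(1 + p_m * g_m) subject to sum p_m = total_power.

    gains: eigenchannel power gains g_m = sigma_m**2 (unit noise variance assumed).
    """
    order = sorted(range(len(gains)), key=lambda m: gains[m], reverse=True)
    powers = [0.0] * len(gains)
    # Try the n strongest eigenchannels, dropping the weakest one until
    # every active channel receives non-negative power.
    for n in range(len(order), 0, -1):
        active = order[:n]
        if gains[active[-1]] <= 0:
            continue
        level = (total_power + sum(1.0 / gains[m] for m in active)) / n
        if level - 1.0 / gains[active[-1]] >= 0:
            for m in active:
                powers[m] = level - 1.0 / gains[m]
            break
    return powers

def eigenmode_rate(gains, powers):
    """Sum rate in bits/s/Hz over parallel eigenchannels."""
    return sum(math.log2(1.0 + p * g) for p, g in zip(powers, gains))

# Illustrative (made-up) singular values of one user's channel.
singular_values = [1.8, 0.9, 0.2]
gains = [s * s for s in singular_values]
P_T = 3.0
equal_powers = [P_T / len(gains)] * len(gains)
wf_powers = water_filling(gains, P_T)
rate_equal = eigenmode_rate(gains, equal_powers)
rate_wf = eigenmode_rate(gains, wf_powers)
```

In this example water-filling switches off the weakest eigenmode and redistributes its power, which is the effect alluded to above when some eigenmodes of the selected user are very weak.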

Zero-Forcing Receiver Processing and Scheduling based on Partial Side Information
In the case of N ≥ M, with spatial multiplexing at the base station, where an independent data stream is transmitted from each Tx antenna, and ZF receiver processing at each user terminal, the scheduled users can detect their data without interstream interference. ZF receiver processing at the kth user is applied by multiplying the received signal by
$$G_k = \left(H_k^H H_k\right)^{-1} H_k^H. \tag{4}$$
The postprocessing SNR of the mth data stream at user k is then given as [29]
$$\gamma_m^{(k)} = \frac{\rho}{\left[\left(H_k^H H_k\right)^{-1}\right]_{mm}}, \tag{5}$$
where $\rho = P_T/M$ and $[A]_{mm}$ denotes the mth diagonal term of the matrix $A$. Once the base station has been informed of the postprocessing SNR of a specific data stream by all users, it assigns that data stream to the user with the highest postprocessing SNR. Therefore, the sum rate (in bits/s/Hz) is given by
$$R = \sum_{m=1}^{M} \log_2\left(1 + \gamma_m^{(k_m)}\right), \tag{6}$$
where $k_m = \arg\max_k \gamma_m^{(k)}$. While this scheme is asymptotically optimal [30], that is, its sum rate approaches $R_{\mathrm{DPC}}$, the sum rate of the DPC scheme, as $K \to \infty$, for a small pool of users it achieves a relatively poor sum rate.
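A minimal sketch of this per-stream scheduling rule is given below. It assumes the standard ZF postprocessing SNR ρ / [(H^H H)^{-1}]_mm for a full-column-rank N × M channel; the helper names and the two toy channel matrices are hypothetical, and the small Gauss-Jordan routine merely stands in for a linear algebra library.

```python
import math

def conj_transpose(A):
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (intended for small matrices)."""
    n = len(A)
    aug = [[complex(x) for x in row] + [complex(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def zf_snrs(H, rho):
    """Postprocessing SNR of each stream after ZF: rho / [(H^H H)^-1]_mm."""
    W_inv = inverse(matmul(conj_transpose(H), H))
    return [rho / W_inv[m][m].real for m in range(len(W_inv))]

def schedule(channels, rho):
    """Assign each stream to the user reporting the largest ZF SNR."""
    snrs = [zf_snrs(H, rho) for H in channels]
    M = len(snrs[0])
    chosen = [max(range(len(channels)), key=lambda k: snrs[k][m]) for m in range(M)]
    rate = sum(math.log2(1.0 + snrs[chosen[m]][m]) for m in range(M))
    return chosen, rate

# Two users with toy 2 x 2 channels (made-up values).
channels = [[[2, 0], [0, 1]], [[1, 0], [0, 3]]]
chosen, rate = schedule(channels, rho=1.0)
```

Here user 0 is assigned the first stream and user 1 the second, since each has the larger gain on that stream.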

The Proposed Transmission Scheme: Eigenmode Transmission with Zero-Forcing Receiver Processing
In the next subsections, our proposed transmission scheme is presented for two scenarios. In the first scenario, each user terminal has at least as many antennas as the base station (M ≤ N), and in the second scenario the base station has more antennas than each user terminal (M > N).

Case N ≥ M: Precoding with Right Singular Vector Matrix
The proposed scheme is presented in an algorithmic form as follows.
(1) All the users perform the SVD of their own channel and report back a single rate value evaluated according to
$$r_{k,L} = \sum_{i=1}^{L} \log_2\left(1 + \rho \lambda_i^{(k)}\right), \tag{8}$$
where $\rho = P_T/M$. The parameter $L$ is evaluated beforehand based on the system parameters and will be discussed in the next subsection. The $\lambda_i^{(k)}$ are the ordered eigenvalues of the matrix $W_k = H_k^H H_k$, which is a complex Wishart matrix [31]; $\lambda_1^{(k)}$ is the largest eigenvalue.
(2) The base station scheduler selects the user with the largest $r_{k,L}$ (user $k_s$) and asks that user to send its $V_{k_s}$ matrix to the BS. The matrix $V_{k_s}$ is obtained through the SVD of the selected user's channel matrix and is then used as the precoding matrix, $\Psi = V_{k_s}$. User $k_s$ will receive its data through the first $L$ ($L < M$) data streams (encompassing data symbols $s_1, s_2, \ldots, s_L$), using $U_{k_s}^H$ as its receiver processing matrix (eigenmode transmission).
(3) User $u$ ($u \neq k_s$) will estimate its equivalent channel, which at this stage is $\tilde{H}_u = H_u V_{k_s}$. Then all users (except user $k_s$) will apply ZF linear processing using the estimated equivalent channel and send back the postprocessing SNR of data streams $L + 1$ to $M$ to the base station.
(4) For each of the remaining M − L data streams, the base station selects the user with the highest postprocessing SNR.

Finding the Optimum Number of Eigenmodes (L).
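Since the precoding matrix, $\Psi$, in this case is a unitary matrix, the statistics of the equivalent channel $\tilde{H}_u = H_u \Psi$ are the same as those of $H_u$. Assuming that the first $L$ data symbols have been assigned to user $k_s$ and the remaining $M - L$ to users with ZF receivers, which have the highest postprocessing SNR, the average sum rate is obtained by taking the ensemble average of the rate contribution from eigenmode transmission on the first $L$ eigenmodes, $R_{\mathrm{eig}}(L)$, and the rate contribution from the remaining $M - L$ data streams using linear ZF receiver processing, $R_{\mathrm{ZF}}(M - L)$, over a large number of channel realizations:
$$R_{\mathrm{Ave}} = E\{R_{\mathrm{eig}}(L)\} + E\{R_{\mathrm{ZF}}(M - L)\},$$
where
$$R_{\mathrm{eig}}(L) = \max_{1 \le k \le K} r_{k,L}, \qquad R_{\mathrm{ZF}}(M - L) = \sum_{m=L+1}^{M} \log_2\left(1 + \xi_{m,v_m^*}\right), \qquad \xi_{m,v} = \frac{\rho}{\left[\left(\tilde{H}_v^H \tilde{H}_v\right)^{-1}\right]_{mm}},$$
and $[\cdot]_{mm}$ denotes the mth diagonal term of its matrix argument. The user $v_m^*$ is the user which has the largest postprocessing SNR for the mth data stream among $K - 1$ users (user $k_s$ has been removed from the set $\{1, \ldots, K\}$), that is,
$$v_m^* = \arg\max_{v \in \{1,\ldots,K\} \setminus \{k_s\}} \xi_{m,v}.$$
The probability density function (pdf) of $\xi_{m,v_m^*}$ for a square system $M = N$ is obtained using order statistics and is given by [29]
$$f_{\xi}(\xi) = \frac{K-1}{\rho}\, e^{-\xi/\rho} \left(1 - e^{-\xi/\rho}\right)^{K-2}, \quad \xi \ge 0, \tag{11}$$
which is independent of the data stream's index, $m$. For a nonsquare system ($N > M$), the exponential functions in (11) are replaced with chi-square distribution functions with $2(N - M + 1)$ degrees of freedom [29]. Using (11), the expected throughput contribution from ZF Rx processing is obtained as
$$E\{R_{\mathrm{ZF}}(M - L)\} = (M - L) \int_0^{\infty} \log_2(1 + \xi)\, f_{\xi}(\xi)\, d\xi, \tag{12}$$
which for the case of $M = N$ is further simplified to [29]
$$E\{R_{\mathrm{ZF}}(M - L)\} = \frac{M - L}{\ln 2} \sum_{k=1}^{K-1} (-1)^{k+1} \binom{K-1}{k} e^{k/\rho} E_1\!\left(\frac{k}{\rho}\right), \tag{13}$$
where $E_1(\cdot)$ is the exponential integral function [32]. To obtain the expected throughput of the eigenmode transmission, the pdf of the ordered eigenvalues of $W_k$ is required. The joint pdf of the ordered eigenvalues is given by [33]
$$p_{\lambda}(\lambda_1, \ldots, \lambda_M) = K_{M,N}^{-1}\, e^{-\sum_{i=1}^{M} \lambda_i} \prod_{i=1}^{M} \lambda_i^{N-M} \prod_{i<j} (\lambda_i - \lambda_j)^2, \quad \lambda_1 \ge \cdots \ge \lambda_M \ge 0, \tag{14}$$
where $K_{M,N} = \prod_{i=1}^{M} \Gamma(N - i + 1)\, \Gamma(M - i + 1)$ is the product of $2M$ Gamma functions.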
For L > 1, a closed form analytical expression for the average throughput contribution from eigenmode transmission, E{R eig (L)}, is very complicated to evaluate.However, a close approximation for E{R eig (L)} can be obtained using the following proposition.
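Proposition 1. For a Gaussian MIMO broadcast channel with M transmit antennas and K users each equipped with N ≥ M receive antennas, a close approximation to the average sum rate of eigenmode transmission on the first L eigenmodes, $E\{R_{\mathrm{eig}}(L)\}$, is given by (15), where $\sigma_{r_{k,L}}$ and $\mu_{r_{k,L}}$ are obtained using (16) from the achievable rate on the ith eigenmode, $1 \le i \le L$.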
Proof. See the appendix.
In summary, to find the optimum $L$, one has to find the smallest eigenvalue, $\lambda_t$, for which $E\{R_{\mathrm{eig}}(t)\} > E\{R_{\mathrm{ZF}}(M - t)\}$. Then the optimum value for $L$ is $L = t$. To obtain $R_{\mathrm{eig}}(t)$, the marginal pdf, CDF, and joint pdfs of $\lambda_l^{(k)}$, $1 \le l \le t$, are required, which can be obtained using (14). $E\{R_{\mathrm{eig}}(t)\}$ is then approximated using (15). Based on (12) and (15), the optimum $L$ value depends on $M$, $N$, $\rho$, and $K$. For a system with a specific number of Tx and Rx antennas, $L$ can be evaluated beforehand for different values of $\rho$ and $K$ and stored in a lookup table to be used later.
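As an illustration of the ZF term that enters this comparison, the sketch below evaluates the expected ZF contribution for a square system by numerical integration. It assumes that, for M = N, the ZF postprocessing SNR of one user is exponentially distributed with mean ρ (chi-square with two degrees of freedom), so that the largest of n = K − 1 such SNRs has pdf (n/ρ) e^{−x/ρ} (1 − e^{−x/ρ})^{n−1}. The function names, the integration range, and the step count are assumptions made for this sketch.

```python
import math

def max_snr_pdf(x, rho, n):
    """pdf of the largest of n i.i.d. exponential SNRs with mean rho."""
    e = math.exp(-x / rho)
    return (n / rho) * e * (1.0 - e) ** (n - 1)

def expected_zf_stream_rate(rho, n, steps=20000):
    """E{log2(1 + max SNR)} for one stream, by composite Simpson integration."""
    upper = rho * (math.log(n) + 40.0)  # the pdf mass beyond this point is negligible
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.log2(1.0 + x) * max_snr_pdf(x, rho, n)
    return total * h / 3.0

def expected_zf_rate(M, L, rho, K):
    """Expected rate of the M - L ZF streams, each scheduled among K - 1 users."""
    return (M - L) * expected_zf_stream_rate(rho, K - 1)

# Example: M = N = 3, one eigenmode stream (L = 1), rho = 10 (10 dB), K = 30 users.
zf_part = expected_zf_rate(3, 1, 10.0, 30)
```

A lookup table for L could then be filled by comparing such values with an estimate of the eigenmode contribution for each candidate L; that second term is not computed here.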

Scaling Law of Sum Rate of the Proposed Scheme.
In this subsection, the asymptotic behaviour of the average sum rate of the proposed scheme described in Section 5.1 is investigated for systems with a large number of users. We start with the following lemma.
Lemma 1. For fixed M, N, and ρ one has, where ln(·) is the natural logarithm.
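Proof. An upper bound for $r_{k,L}$ can be written in terms of the trace of $W_k$. Using the definition of the trace of a matrix,
$$\mathrm{Tr}(W_k) = \sum_{i=1}^{N} \sum_{j=1}^{M} \left| h_{ij}^{(k)} \right|^2,$$
which is a chi-square random variable with $2MN$ degrees of freedom. Since $R_{\mathrm{eig}}(L) = \max_{1 \le k \le K} r_{k,L}$, the limit stated in the lemma follows from the results of [30,34], and that completes the proof.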
As the sum capacity (achievable with DPC) for L data streams asymptotically increases with L ln[ln(LK)] [35], R eig (L), in general is not asymptotically optimum.However, for the case of L = 1 we present the following theorem.

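Theorem 1. The proposed scheme with L = 1 is asymptotically optimal, in the sense that the ratio of its average sum rate to that of DPC tends to one as K → ∞.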
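Proof. For L = 1 we have
$$R_{\mathrm{Ave}} = E\left\{\log_2\left(1 + \rho \lambda_{1,\max}\right)\right\} + E\{R_{\mathrm{ZF}}(M - 1)\},$$
where $\lambda_{1,\max} = \max_{k=1,\ldots,K} \lambda_1^{(k)}$. When K is very large, the first term scales according to Lemma 1, and according to [36]
$$E\{R_{\mathrm{DPC}}\} \le M\, E\left\{\log_2\left(1 + \rho \lambda_{1,\max}\right)\right\}. \tag{23}$$
The asymptotic behaviour of the ZF contribution is given in [30]. Considering (23) together with this result, the limit of the ratio of $R_{\mathrm{Ave}}$ to $E\{R_{\mathrm{DPC}}\}$ as $K \to \infty$ is at least one, and since DPC has the optimum scaling sum rate, this ratio cannot be greater than one.
EURASIP Journal on Wireless Communications and Networking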
The above lemma and theorem make one expect that as the number of users increases, the optimum L value will decrease to one, which is confirmed by simulations in Section 6.

Case M > N: Null Space Precoding with Singular Vector Selection
In this section, the general algorithm proposed for this case is presented, before a novel scheme for the specific case of M = 3 Tx and N = 2 Rx antennas is discussed.
Assume that the precoding matrix consists of M vectors, each selected from the right singular vector matrix of a selected user (one user may contribute more than one vector), given in general form by
$$\Psi = \left[ v_{p_1}^{(k_1)} \; v_{p_2}^{(k_2)} \; \cdots \; v_{p_M}^{(k_M)} \right], \tag{26}$$
where $k_i \in \{1, \ldots, K\}$ and $p_i \in \{1, \ldots, M\}$. The signal vector at the kth user, $k \in \{k_1, \ldots, k_M\}$, after eigenmode Rx preprocessing (multiplying the received signal vector by $U_k^H$) is
$$U_k^H y_k = \Sigma_k V_k^H \Psi s + U_k^H n_k = \Delta_k s + U_k^H n_k, \qquad \Delta_k = \Sigma_k V_k^H \Psi. \tag{27}$$
Considering the fact that the last $M - N$ columns of $\Sigma_k$ are all zero columns, and also that $\left(v_i^{(k)}\right)^H v_j^{(k)} = 0$ for $i \neq j$, it can be shown that when $\Psi$ contains the $M - N$ rightmost vectors of $V_k$, the nonzero terms of $\Delta_k$ form the following submatrix:
$$\bar{\Delta}_k = \bar{\Sigma}_k \bar{V}_k^H \bar{\Psi}, \tag{28}$$
where $\bar{\Delta}_k$ is an N × N matrix, $\bar{\Sigma}_k$ is an N × N square diagonal matrix with the N singular values on its diagonal, $\bar{V}_k$ contains only the first N columns of $V_k$, and $\bar{\Psi}$ contains N vectors that belong to the matrices $V_{k_i}$, where $k_i \in \{k_1, \ldots, k_M\}$ and $k_i \neq k$. In this case, (27) can be rewritten as
$$U_k^H y_k = \bar{\Delta}_k \bar{s} + U_k^H n_k, \tag{29}$$
where $\bar{s}$ is a size-N vector obtained by eliminating $M - N$ terms from $s$. Then user k uses $G_k = \bar{\Delta}_k^{-1}$ as its receiver processing matrix to detect N out of the total M transmitted data streams.
For the kth receiver to be able to detect its data using ZF receiver processing, the number of interfering data streams (after Rx preprocessing) must not be greater than N. In other words, the matrix $\Delta_k$ must have $M - N$ zero columns. This further implies that the precoding matrix needs to contain $M - N$ basis vectors of the null space (the space spanned by the rightmost $M - N$ vectors of $V_k$) of each selected user's channel matrix. Therefore, $\lfloor M/(M - N) \rfloor$ users can be served simultaneously ($\lfloor \cdot \rfloor$ denotes the floor of its argument). Consequently, to take greater advantage of multiuser diversity, N should be as close as possible to M, with the best case being N = M − 1. When N < M/2, only one user can be served at a time and this scheme becomes identical to TDM.
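The user count implied by this null-space constraint is easy to tabulate. The following lines are a small sketch; the function name is hypothetical.

```python
def served_users(M, N):
    """Number of simultaneously served users, floor(M / (M - N)), for 0 < N < M."""
    if not 0 < N < M:
        raise ValueError("this case requires 0 < N < M")
    return M // (M - N)

# (M, N) pairs mapped to the number of users that can be served together.
table = {(M, N): served_users(M, N) for M in range(2, 7) for N in range(1, M)}
```

For example, M = 3 and N = 2 allows three users, whereas M = 5 and N = 2 (N < M/2) allows only one, which is the TDM case.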
Since the postprocessing SNR of each data stream in this case depends on the precoding matrix and on each selected user's Σ and V matrices, finding users with channel conditions that maximize the sum rate based on partial CSI is not straightforward. Nevertheless, a heuristic approach is to adopt a two-stage user selection, where in the first stage a set of users is selected based on a channel quality indicator (CQI), for example, the largest singular value. In the next stage, the selected users send back their full CSI to the BS, and the BS broadcasts their CSI to all users. Then, knowing the CSI of the selected users, each user outside the set of selected users substitutes itself sequentially for each of the selected users and evaluates the resulting sum rate for each substitution. If a user finds that substituting itself for one of the selected users increases the sum rate, it informs the BS. The BS then updates the user set according to the suggestion of the user that has reported the maximum increase in the sum rate. Our results show that this scheme, with first-stage user selection based on the largest eigenvalue, achieves a higher sum rate than TDM, while the gap between its sum rate and that of the optimum DPC increases as the number of antennas increases. In the following subsection we present an efficient transmission scheme for the special case of M = 3 and N = 2.
The Case of M = 3 and N = 2. Considering the general idea discussed for null space precoding based on eigenvector selection, in this case we consider two possibilities for the precoding matrix. One possibility is to construct $\Psi$ using three vectors, each taken from the right singular vector matrix of a distinct user's channel matrix. In that case, three users can be served and each user sees only one interfering data stream. However, in order to find the set of users which maximizes the sum rate, either the base station requires full channel state information of all users, which results in considerably increased complexity compared to limited feedback schemes, or an approach similar to the one discussed in the previous section has to be applied. The second option is to construct $\Psi$ using right singular vectors of two selected users' channel matrices. Assume that users $k_s$ and $k_p$, where $k_s, k_p \in \{1, \ldots, K\}$, are the users that will ultimately be scheduled by the proposed algorithm. In the proposed scheme, which is based on a heuristic approach, the precoding matrix is built from the right singular vectors of these two users. The reasoning behind this choice of precoding matrix will be clarified once the algorithm is presented. Here are the steps of the proposed algorithm.
(1) Each user performs the SVD of its channel matrix and sends back $\sigma_1^{(k)}$ to the base station.
(2) The base station selects the user with the largest $\sigma_1^{(k)}$, user $k_s$, and asks that user for its $V_{k_s}$ matrix. To detect its data, user $k_s$ uses $U_{k_s}^H$ as its receiver processing matrix. As seen in (31), the interference caused by the first data stream to the second and third data streams after Rx processing at user $k_s$ has been canceled. Therefore, a ZF linear receiver can be used, and the postprocessing SNR of the second data stream, $\gamma_2^{(k_s)}$, follows from [29] and is expressed in terms of the 2 × 2 matrix $\Omega$ with entries $\chi_1, \chi_2, \chi_3, \chi_4$. Thus, the achievable rate for this user will be $\log_2\left(1 + \gamma_2^{(k_s)}\right)$.
(3) The base station broadcasts $V_{k_s}$ and $\sigma_1^{(k_s)}$ to all users.
(4) For now, let us assume that user k is the second selected user. Once selected, user k uses $U_k^H$ as its receiver processing matrix. The interfering effect of $s_2$ on the other data streams is then canceled, and the first data stream can be detected using a matched filter. To detect the third data stream, the effect of the first detected data stream, $\sigma_1^{(k)} \beta_k \hat{s}_1$, is subtracted out ($\hat{s}_1$ denotes the first detected data symbol). Canceling the effect of the first data stream is possible due to the knowledge of $v_3^{(k_s)}$ at user k, which enables it to evaluate $\beta_k$. The SNR for the third data stream, $s_3$, is then obtained after interference cancelation and matched filtering (ignoring error propagation).
Considering (32) and the third step of the algorithm, user k has all the information required to evaluate the rate of user $k_s$ as well as its own rate. Therefore, it will send back a sum rate value, $R_k$, that is achieved by scheduling data transmission to itself and user $k_s$.
(5) The base station selects the second user, user $k_p$, which has the largest $R_k$, and asks that user to send back the right singular vectors required for the precoding matrix. At this stage, data transmission to the selected users begins. User $k_p$ will receive its data from the first and third Tx antennas, and user $k_s$ will receive its data from the second Tx antenna.

Numerical Results
In this section, the expected throughputs of the proposed schemes are compared to limited feedback MIMO downlink techniques using transmit beam matching (TBM) [17], which is a modified version of PU2RC for multiple-antenna users, zero-forcing beamforming (ZF-BF) using channel vector quantization (CVQ) [18,37,38], spatial multiplexing with zero-forcing receiver processing, and TDM with eigenmode transmission for different numbers of antennas, users, and SNR values. The throughput of the DPC scheme is also given as an upper bound on the sum rate. The sum rate curves for DPC have been obtained using the iterative water-filling algorithm introduced in [39]. In the following, we consider two case examples in which M = N, and one example for the case M > N.
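The Case of M = 2 and N = 2. In this case, we find the optimum choice for L in terms of maximizing the average sum rate. Using (14) it can easily be shown that the distributions of the ordered eigenvalues, $\lambda_1$ and $\lambda_2$, are
$$f_{\lambda_1}(\lambda) = e^{-\lambda}\left(\lambda^2 - 2\lambda + 2\right) - 2e^{-2\lambda}, \qquad f_{\lambda_2}(\lambda) = 2e^{-2\lambda},$$
respectively. The cumulative distribution functions (CDFs) of the eigenvalues are then as follows:
$$F_{\lambda_1}(\lambda) = 1 - e^{-\lambda}\left(\lambda^2 + 2\right) + e^{-2\lambda}, \qquad F_{\lambda_2}(\lambda) = 1 - e^{-2\lambda}.$$
To schedule users in this case we consider three possibilities.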
(i) The proposed scheme with L = 1.
The average rate for this scheme is given by (37).
(ii) Selecting the user k which has the largest $r_{k,2}$ in (8) and serving only that user in each time slot (TDM with eigenmode transmission).
According to Proposition 1, the average sum rate for this scheme can be approximated by (38), where $\sigma_{r_{k,2}}$ and $\mu_{r_{k,2}}$ are obtained using (16).
(iii) ZF receiver processing with partial CSI.
The average sum rate for this scheme is given by (39).
According to Figure 2, the proposed scheme with L = 1 achieves a considerably higher sum rate compared to ZF linear processing and TDM. Furthermore, Figure 2 compares the average sum rate of the proposed scheme with that of TBM. For the TBM scheme, a codebook size of 4 has been considered, where each codebook entry is a 2 × 2 unitary matrix, and it is assumed that each user sends back to the base station 8 SNR values, corresponding to all vectors of all unitary matrices in the codebook. As the figure shows, even for a very small user pool, for example, K = 5 users, the proposed scheme has a large sum rate advantage over the other limited feedback schemes that are plotted. Sum rate curves obtained using the analytical expressions of (37), (38), and (39) are in good agreement with the simulation results.
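For the 2 × 2 case, the eigenmode part of the L = 1 rate can be checked numerically. The sketch below assumes that the largest eigenvalue of a 2 × 2 complex Wishart matrix has CDF F(x) = 1 − e^{−x}(x² + 2) + e^{−2x}, which follows from integrating the joint pdf e^{−(λ1+λ2)}(λ1 − λ2)² over λ2, and that the maximum over K users has pdf K F^{K−1} f (standard order statistics). The function names, seed, and integration settings are assumptions of this sketch.

```python
import math
import random

def largest_eig_cdf(x):
    """CDF of the largest eigenvalue of a 2x2 complex Wishart matrix (M = N = 2)."""
    return 1.0 - math.exp(-x) * (x * x + 2.0) + math.exp(-2.0 * x)

def largest_eig_pdf(x):
    """Derivative of largest_eig_cdf."""
    return math.exp(-x) * (x * x - 2.0 * x + 2.0) - 2.0 * math.exp(-2.0 * x)

def sample_largest_eig(rng):
    """Largest eigenvalue of H^H H for a 2x2 H with i.i.d. CN(0, 1) entries."""
    h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(4)]
    trace = sum(abs(v) ** 2 for v in h)            # Tr(H^H H)
    det = abs(h[0] * h[3] - h[1] * h[2]) ** 2      # det(H^H H) = |det H|^2
    return 0.5 * (trace + math.sqrt(max(trace * trace - 4.0 * det, 0.0)))

def expected_best_eigenmode_rate(rho, K, upper=60.0, steps=20000):
    """E{max_k log2(1 + rho * lambda_1^(k))} by composite Simpson integration."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        pdf_max = K * largest_eig_cdf(x) ** (K - 1) * largest_eig_pdf(x)
        total += weight * math.log2(1.0 + rho * x) * pdf_max
    return total * h / 3.0

rng = random.Random(0)
samples = [sample_largest_eig(rng) for _ in range(20000)]
empirical_cdf_at_2 = sum(s <= 2.0 for s in samples) / len(samples)
empirical_rate = sum(math.log2(1.0 + s) for s in samples) / len(samples)
```

The Monte Carlo estimates can be compared with `largest_eig_cdf(2.0)` and `expected_best_eigenmode_rate(1.0, 1)` as a sanity check; adding the expected rate of the single ZF stream would give the full L = 1 sum rate.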
The Case of M = 3 and N = 3. We consider four possibilities for this case.
(i) The proposed scheme with L = 1.
The average sum rate for this scheme is obtained analogously to the case of M = N = 2, using the pdf and CDF of $\lambda_1$ for M = N = 3.
(ii) The proposed scheme with L = 2.
In this case, the average sum rate is obtained using Proposition 1, where $\sigma_{r_{k,2}}$ and $\mu_{r_{k,2}}$ are obtained using (16), and the marginal and joint eigenvalue distributions are given by (43).
(iii) Selecting the user k which achieves the largest rate and serving only that user in each time slot (TDM with eigenmode transmission).
The average sum rate in this case is approximated by (15) with L = M = 3 and using [40], where more simplified expressions (for the case M = L) have been given for $\sigma_{r_{k,3}}$ and $\mu_{r_{k,3}}$.
(iv) ZF receiver processing scheme using partial side information.
The average sum rate for this scheme is obtained from the order statistics of the ZF postprocessing SNR [29].
The average sum rates of the four cases considered above are compared in Figure 3. As seen in the figure, the proposed scheme with L = 2 achieves a higher average sum rate than the case of L = 1 as long as there are up to K = 12 users in the system. When there are more users in the system, the proposed scheme with L = 1 achieves a higher sum rate. The intersection of the average sum rate curves for L = 1 and L = 2 can be explained by the fact that for a small pool of users it is less likely that a subset of users with high ZF postprocessing SNR (good channel conditions) exists in the system, and therefore transmitting on the first two noninterfering eigenmodes to one user leads to a higher sum rate. For larger user pools, and in agreement with Theorem 1, multiuser diversity makes it more likely that a user subset can be found that achieves a higher sum rate than eigenmode transmission on the first two eigenmodes to one user. According to Figure 3, the proposed scheme achieves a considerably higher sum rate compared to ZF receiver processing. For transmit beam matching (TBM), a codebook size of 4 has been considered, where each codebook entry is a 3 × 3 unitary matrix, and it is assumed that each user sends back 12 SNR values to the base station. Sum rate curves obtained using the analytical expressions given above are in good agreement with the simulation results.
In Table 1, the optimum L values for M = 4 to M = 7 antennas are given for systems with equal numbers of Tx and Rx antennas, along with the percentage sum rate increase achieved by the proposed scheme over the transmission schemes using ZF receiver processing and TDM, when there are K = 30 users available in the system and at ρ = 10 dB SNR. The gains of the proposed scheme over ZF receiver processing and over TDM with eigenmode transmission (TDM in brief) have been normalized to the sum rates of these schemes, respectively (i.e., $(R_{\mathrm{Ave}} - E\{R_{\mathrm{ZF}}(M)\})/E\{R_{\mathrm{ZF}}(M)\} \times 100$ for the case of ZF Rx processing). As seen in Table 1, the proposed scheme provides a significant sum rate increase over ZF receiver processing and TDM for different numbers of antennas. For example, for the case of M = 6 Tx and N = 6 Rx antennas, the proposed scheme exceeds the sum rate achieved by the ZF receiver processing scheme by about 29%.
The Case of M = 3 and N = 2. This example was explained in detail in Section 5.2. Figure 4 shows the average sum rate advantage of the proposed scheme over two well-known limited feedback schemes. As seen in the figure, the proposed scheme achieves a higher sum rate than TBM and zero-forcing beamforming (ZF-BF) with channel vector quantization (CVQ). The proposed scheme has an advantage of over 1 bit/s/Hz over TBM and ZF-BF with CVQ, even for small user pools (K < 10).

Comparison of Feedback Requirement for Different Schemes. In limited feedback schemes, there is usually a tradeoff between the sum rate and the feedback load. An example of this tradeoff is seen in the PU2RC scheme, which has two feedback modes. In the mode that achieves the higher average sum rate, the SINRs of all codewords are sent back to the base station; in the other mode, only the largest SINR and the index of its corresponding codeword are sent back. In ZF-BF with CVQ, each user sends back the index of a selected quantization vector along with its corresponding SINR lower bound [18,37]. In the transmission scheme based on spatial multiplexing at the base station with linear receiver processing at each user terminal, each user sends back M SNR values to the base station. In TDM with eigenmode transmission, each user sends back only one real value (a rate value), after which the user with the highest reported rate is asked to send back its right singular matrix, which for a system with M = N has 2M^2 real terms.
In our proposed scheme and for the case of N ≥ M, users send back information in three stages. In the first stage, all users send back a single rate value; in the second stage, one user sends back an N × N matrix of complex values; and in the third stage, all users except one send back M − L SNR values. This amount of feedback is larger than that required in TDM with eigenmode transmission, yet it is comparable to that of PU2RC and of spatial multiplexing at the base station with ZF receiver processing at the user terminals, described in Section 4.
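The three-stage count described above can be tallied as a number of real values fed back per scheduling round. The following is a minimal sketch, assuming each complex matrix entry counts as two real values (consistent with the 2M^2 figure quoted for TDM); the function names are hypothetical, and quantization and codeword-index bits are ignored.

```python
def feedback_proposed(K: int, M: int, N: int, L: int) -> int:
    """Real values fed back by the proposed scheme when N >= M."""
    stage1 = K                   # every user reports one rate value
    stage2 = 2 * N * N           # one user reports an N x N complex matrix
    stage3 = (K - 1) * (M - L)   # remaining users report M - L SNR values
    return stage1 + stage2 + stage3


def feedback_tdm(K: int, M: int) -> int:
    """TDM with eigenmode transmission, assuming M = N."""
    return K + 2 * M * M         # K rate values plus one right singular matrix


def feedback_zf_rx(K: int, M: int) -> int:
    """Spatial multiplexing with ZF receivers: M SNR values per user."""
    return K * M


if __name__ == "__main__":
    K, M, N, L = 30, 3, 3, 1     # example sizes chosen for illustration
    print(feedback_proposed(K, M, N, L))  # 106
    print(feedback_tdm(K, M))             # 48
    print(feedback_zf_rx(K, M))           # 90
```

For these example sizes the proposed scheme needs more feedback than TDM but stays in the same range as ZF receiver processing, in line with the comparison in the text.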
For the proposed scheme in the case of M = 3 and N = 2, each user needs to feed back only one real value to the base station in the first stage. In the second stage, one user needs to send back a 2 × 2 matrix, and in the third stage all users except one need to send back one rate value. Finally, the second selected user sends back two vectors to the base station. This amount of feedback is larger than that required in TDM with eigenmode transmission. Yet it is less than that of ZF-BF with CVQ [37], since, apart from the two selected users, all users send back only two real values over two stages.

Conclusion
We have proposed limited-feedback MIMO downlink transmission schemes for a system in which the base station and each user terminal are equipped with M (> 1) and N (> 1) antennas, respectively. For the case of N ≥ M, one user receives data through eigenmode transmission on its L strongest eigenmodes (L < M is a predetermined value that maximizes the average sum rate), while each of the remaining M − L data streams is assigned to the user with the highest ZF receiver postprocessing SNR. We have shown that in this case the average sum rate of the proposed scheme scales with ln[ln(MK)] (K is the number of users in the system), which is asymptotically optimal. In the case of M > N, the precoding matrix consists of right singular vectors of at least two and at most M users, such that the number of interfering streams at each selected user terminal is reduced to the number of its receive antennas; hence, the interstream interference can be effectively removed using ZF receiver processing. The results show that the proposed schemes lead to a higher average sum rate than a number of well-known limited feedback schemes, especially for a small pool of users.

Figure 2: Sum rate of the proposed scheme compared to a number of multiuser MIMO techniques for M = 2 Tx and N = 2 Rx antennas at 10 dB SNR.

Figure 3: Sum rate of the proposed scheme for L = 1 and 2 compared to a number of multiuser MIMO techniques for M = 3 Tx and N = 3 Rx antennas at 10 dB SNR.

Figure 4: Sum rate of the proposed scheme compared to multiuser MIMO techniques using TBM (modified PU2RC), ZF-BF with CVQ, and TDM with eigenmode transmission for M = 3 Tx and N = 2 Rx antennas at 10 dB SNR.

Table 1: Optimum L values and the percentage increase of the proposed scheme's sum rate over ZF and TDM schemes for different numbers of antennas.