Determinant-based MIMO scheduling with reduced pilot overhead

In a multiuser multiple-input multiple-output (MU-MIMO) cellular system with many candidate users, it is critical to select a user group which maximizes the overall throughput of the system. However, the optimal scheduling strategy (exhaustive user selection) is computationally prohibitive when the total number of users is large. In this article, we propose a determinant-based user selection algorithm which reduces the search complexity without much performance degradation. A key contribution of this article is to use a matrix determinant as a measure of both orthogonality and channel quality in user selection. Linear precoding schemes (zero-forcing beamforming or block diagonalization) widely used in MU-MIMO systems require two sets of pilots to estimate both the raw and the effective channels, which results in increased pilot overhead. In order to reduce the overhead, we also propose a new pilot scheme with only one set of pilots, which is another key contribution of this article. The new pilot scheme is combined with the proposed scheduling algorithm. Simulation results show that the proposed scheduling algorithm with the new pilot scheme reduces computational complexity and pilot overhead with negligible performance degradation compared to exhaustive scheduling with a conventional pilot scheme.


Introduction
Multiuser multiple-input multiple-output (MU-MIMO) systems have drawn much attention recently because of their higher capacity compared to single-user MIMO (SU-MIMO) systems [1]. In multiuser single-input single-output (MU-SISO) systems, it was shown that allocating the entire power to the user with the best channel is the optimal strategy [2]. In MU-MIMO systems, by contrast, allocating power to multiple users simultaneously is known to be better. However, MU-MIMO systems raise some practical issues. An MU-MIMO system needs a more complicated user selection process than an SU-MIMO system: it must find the user group which maximizes the throughput, whereas an SU-MIMO system finds only the best user. Finding the best group of users is not simple because a linear precoding matrix that removes inter-user interference needs to be computed for each candidate group of users. In particular, the optimal (exhaustive) user selection scheme is computationally prohibitive when the total number of users in a given system (cell) is large. Many suboptimal user scheduling algorithms have been proposed to reduce the computational complexity of the optimal user selection scheme.
A suboptimal scheduling algorithm using Gram-Schmidt orthogonalization (GSO) for MU-MISO systems was proposed in [3]. However, it cannot be used in MU-MIMO systems since GSO applies only to vector channels, whereas user channels in conventional MU-MIMO systems are represented by matrices. Two low complexity user scheduling algorithms that can be applied to MU-MIMO systems, the capacity-based algorithm and the Frobenius norm-based algorithm, were proposed in [4]. These schemes achieve performance close to the optimal scheduling algorithm but still with high complexity: the capacity-based algorithm needs frequent computation of the singular value decomposition (SVD), and the Frobenius norm-based algorithm needs heavy GSO computation. Therefore, our motivation is to further reduce the complexity of the user selection algorithm in MU-MIMO systems.
Inter-user interference occurs when the resources are allocated to multiple users in MU-MIMO systems. The inter-user interference must be eliminated since it limits the sum-capacity of the system. Dirty paper coding (DPC) is the optimal non-linear precoding technique to avoid the inter-user interference, but its complexity is prohibitive [5]. Zero-forcing beamforming (ZFBF) and block diagonalization (BD) were proposed to reduce the complexity as linear precoding techniques [6][7][8]. The ZF scheme eliminates the inter-user interference by using the pseudo-inverse of the selected users' channel matrix in multiuser MISO systems. The BD scheme removes the interference by finding the null space of the selected users' concatenated channel matrices which excludes the intended user's channel matrix. It can be considered as an extended scheme of the ZF scheme.
The linear precoding techniques require channel state information (CSI) at the transmitter. Channel estimation has to be done at each receiver by embedding pilots in the signal, and the estimated channel has to be fed back to the transmitter. However, ZF and BD need two sets of pilots. One is the common pilot, which is used to estimate the raw channel. The other is the dedicated pilot, which is used to estimate the effective channel, i.e., the combination of the raw channel and the precoding matrix. An SU-MIMO system and a random beamforming (RBF) system (an MU-MIMO system) do not need dedicated pilots since the beamforming matrix is available at the receiver. In the ZF and BD schemes, however, the receiver does not know the beamforming (precoding) matrix, since the beamforming matrices depend on the selected users and each receiver does not know which users were selected at the transmitter. Therefore, ZF and BD need dedicated pilots to estimate the effective channel. The large pilot overhead due to the two sets of pilots reduces the data rate, which may make MU-MIMO less attractive.
There exist many low complexity MU-MIMO scheduling algorithms since the optimal scheduling scheme is too computationally complex to be used in practical systems [3,4,9-12]. In [12], an MU-MIMO scheduling algorithm based on the chordal distance was proposed. The chordal distance measures the distance between subspaces [13], and thus can be used for MU-MIMO systems. In selecting users, the degree of orthogonality measured by the chordal distance is not enough, and the channel power of each user also matters. Therefore, both the chordal distance and the channel power should be considered simultaneously to select users. In [12], a weighting factor, a, between the chordal distance and the channel power was used.
In this article, we propose two new methods. One is a low complexity MU-MIMO scheduling algorithm, and the other is a new pilot scheme which uses only one set of (dedicated) pilots. The new user selection (scheduling) scheme proposed in this article is based on a determinant property and an iterative matrix computation, which is different from [3,4,9-12]. Simulation results compare the combination of the two methods with the existing scheduling and pilot schemes. The rest of the article is organized as follows. The system model section introduces the system model. A new low complexity MU-MIMO user selection scheme with BD is proposed in the section on the low complexity MU-MIMO scheduling algorithm. The complexity of the proposed algorithm is analyzed in the section on computational complexity analysis. The section on low overhead pilot design for BD presents a new pilot scheme that uses a single set of pilots with BD. Numerical results are shown in the section on simulation results. Finally, conclusions are given in the last section.

System model
We consider an MU-MIMO downlink channel with a single base station (BS) which has $M$ transmit antennas, and $K_T$ users with $N$ receive antennas each. We assume that the receivers estimate their channels perfectly and that the BS knows the exact CSI of all the users. The broadcast channel of the MU-MIMO system is given by

$$y_i = H_i x + n_i, \quad i = 1, \ldots, K, \qquad (1)$$

where $K$ is the number of simultaneously selected users, $H_i$ is the $N \times M$ channel matrix of the $i$th user (selected by a scheduler from $K_T \ge K$ users), $n_i$ is the complex white Gaussian noise vector for the $i$th selected user (each element of $n_i$ is i.i.d. complex Gaussian with zero mean and unit variance), and $y_i$ is the received signal vector at the $i$th selected user. Note that the entries of $H_i$ are independent identically distributed (i.i.d.) complex Gaussian with zero mean and unit variance. The transmitted signal, $x$, is expressed as

$$x = \sum_{l=1}^{K} \sqrt{P_l} \, V_l s_l, \qquad (2)$$

where $s_l$ is the symbol vector of the $l$th selected user with $E[\|s_l\|^2] = 1$, $P_l$ is the power allocated to the $l$th selected user, and $V_l$ is the precoding matrix for the $l$th selected user.

Block diagonalization
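The received signal (1) can be divided into the desired signal, the interference signal, and the noise as follows:

$$y_i = \sqrt{P_i} \, H_i V_i s_i + \sum_{l=1, l \ne i}^{K} \sqrt{P_l} \, H_i V_l s_l + n_i. \qquad (3)$$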
In contrast to SU-MIMO, the interference term in (3) is non-zero if the channel matrices of the selected users are not orthogonal to each other in MU-MIMO. There are two kinds of interference mitigation/cancellation techniques. The first kind is the orthogonal RBF scheme [9]. It selects the users who have maximum signal to interference plus noise ratio (SINR) for each beam of a fixed random unitary matrix. We get better performance as the number of users increases since the users who have larger SINR tend to be selected in the case. The per user unitary rate control (PU2RC) system [10] was proposed as an extended version of RBF.
The second kind is the ZFBF scheme, together with the BD scheme, which extends ZFBF from MU-MISO systems to MU-MIMO systems. The ZFBF and BD schemes eliminate the interference signal by using a precoding matrix. In the BD scheme, the precoding matrix $V_i$ is selected such that $H_i V_j = 0$, $\forall i \ne j$. In order to maintain the power constraint, $V_i$ is taken to be an orthonormal basis for the null space of the matrix formed by stacking all $H_j$, $j \ne i$, matrices together. This removes the interference term in (3). However, at most $K = \lceil M/N \rceil$ users can be selected simultaneously because of the null space constraint, where $\lceil a \rceil$ is the smallest integer not smaller than $a$. In the BD scheme, we can express (3) as

$$y_i = \sqrt{P_i} \, H_i V_i s_i + n_i. \qquad (4)$$
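The null-space construction described above can be illustrated with a short NumPy sketch. It is only a sketch under stated assumptions: the function name `bd_precoders`, the rank tolerance, and the random test channels are illustrative choices rather than part of the original scheme, and at least two users are assumed.

```python
import numpy as np

def bd_precoders(channels):
    """Return one precoder per user whose columns form an orthonormal basis
    of the null space of the other users' stacked channels (needs >= 2 users)."""
    precoders = []
    for i in range(len(channels)):
        # Stack every channel except user i's: shape ((K-1)*N, M).
        others = np.vstack([h for j, h in enumerate(channels) if j != i])
        # Right singular vectors beyond the numerical rank span the null space.
        _, s, vh = np.linalg.svd(others)
        rank = int(np.sum(s > 1e-10 * s.max()))  # tolerance is an assumption
        precoders.append(vh[rank:].conj().T)
    return precoders

rng = np.random.default_rng(0)
M, N, K = 8, 4, 2
channels = [(rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
            for _ in range(K)]
V = bd_precoders(channels)

# Largest residual interference H_i V_j over all pairs i != j.
leak = max(np.linalg.norm(channels[i] @ V[j])
           for i in range(K) for j in range(K) if i != j)
```

With $M = 8$ and $N = 4$, each precoder has $M - N = 4$ orthonormal columns, and `leak` is at the level of floating-point round-off.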

Power allocation
In the BD scheme, an MU-MIMO system can be divided into $K$ independent parallel SU-MIMO systems as in (4). Water-filling-based power allocation can be used since the BS has perfect channel information of all the receivers. When we assume that each receiver receives $N$ streams, there are $N \cdot K$ streams in total over which we apply water-filling. The capacity with water-filling [11,12] is written in terms of the total transmit power $P$, the number of total streams $n$, the set $\mathcal{S}$ of the selected users, the set $H(\mathcal{S})$ of channels in $\mathcal{S}$, the singular value $\lambda_i$ ($i = 1, \ldots, n$) of the $i$th channel of $H(\mathcal{S})$, and the coefficient $\gamma_i$, which satisfies the water-filling condition. The sum rate of the MU-MIMO system with BD and water-filling among the selected users then follows from this capacity and from $V(\mathcal{S})$, the set of precoding matrices in $\mathcal{S}$.
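As an illustration of water-filling over parallel streams, the sketch below implements the textbook water-filling rule; it is not necessarily the exact expression used in this article. It assumes unit noise variance and takes per-stream power gains (squared singular values) as input. The names `water_filling` and `sum_rate` and the example gains are hypothetical.

```python
import numpy as np

def water_filling(gains, total_power):
    """Textbook water-filling over parallel streams with positive power
    gains (squared singular values) and unit noise variance."""
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]           # strongest stream first
    inv = 1.0 / gains[order]
    powers = np.zeros_like(gains)
    for active in range(len(gains), 0, -1):
        # Water level if only the `active` strongest streams are used.
        level = (total_power + inv[:active].sum()) / active
        if level > inv[active - 1]:           # weakest active stream still gets power
            powers[order[:active]] = level - inv[:active]
            break
    return powers

def sum_rate(gains, powers):
    """Sum of log2(1 + gain * power) over all streams."""
    return float(np.sum(np.log2(1.0 + np.asarray(gains) * np.asarray(powers))))

gains = np.array([4.0, 1.0, 0.1])
p = water_filling(gains, total_power=1.0)
rate_wf = sum_rate(gains, p)
rate_equal = sum_rate(gains, np.full(3, 1.0 / 3.0))
```

In this example the weakest stream receives no power, and the water-filling rate exceeds the rate obtained with equal power on all three streams.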

Low complexity MU-MIMO scheduling algorithm
The optimal MU-MIMO scheduling with BD is performed by an exhaustive search for the user set that maximizes the sum rate. Therefore, the optimal scheduling is computationally prohibitive if $K_T \gg K$, since the number of all possible groups of selected users grows combinatorially with $K_T$. An important consideration in selecting users in MU-MIMO systems with ZFBF or BD is the orthogonality among the channels of the selected users. If the orthogonality among the selected users is low, the beamforming direction of a user is misaligned with its own channel, since the precoding matrix is confined to the null space of the channels of the other selected users. That is, the precoding matrix of ZFBF or BD eliminates not only the undesired interference signal but also part of the intended signal, and the reduction of the desired signal is roughly proportional to the distance between the user's own channel and the precoding matrix. For example, if the channels of the selected users are orthogonal to each other, the desired signal does not suffer degradation because the precoding matrix is perfectly aligned with the channel of the desired signal. In the other extreme case, if the channels of the selected users lie in the same direction or plane, the desired signal is lost completely since the precoding matrix is orthogonal to the channel of the desired signal.
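To give a feel for how quickly exhaustive search grows, the following sketch counts candidate groups under the simplifying assumption that every group contains exactly $K$ users; the function name and the example values are illustrative only.

```python
from math import comb

def num_user_groups(total_users, group_size):
    """Number of distinct groups of exactly `group_size` users
    that can be formed from `total_users` candidates."""
    return comb(total_users, group_size)

small_cell = num_user_groups(20, 4)    # K_T = 20, K = 4
large_cell = num_user_groups(100, 4)   # K_T = 100, K = 4
```

Here `small_cell` is 4845 and `large_cell` is 3921225, and each candidate group would require its own precoder computation.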
The orthogonality among the selected users was estimated by GSO in [3]. The scheduling algorithm in [3] has good performance with low complexity, but it can be used only in MU-MISO systems since GSO can be applied to vectors only. On the other hand, the capacity-based algorithm and the Frobenius norm-based algorithm were proposed in [4] for MU-MIMO systems. The capacity-based algorithm selects a user that achieves maximum capacity first, and successively finds another user which has maximum sum capacity based on greedy algorithm. However, it requires SVD computation at each step, and still has high complexity. The Frobenius norm-based algorithm has lower complexity than the capacity-based algorithm. It finds a receiver set with the sum of Frobenius norm of orthogonal elements computed by a projection matrix among the candidate users' channels instead of computing the sum capacity directly by finding a precoding matrix with SVD.
In this article, a determinant is used to estimate the channel power of users and the chordal distance between different user channels simultaneously, giving an MU-MIMO user selection algorithm with low complexity. The absolute value of the determinant of a matrix corresponds to the volume of the parallelepiped spanned by its row (or column) vectors, which can be thought of as a measure of both the channel quality and the distance between user channels. For example, Figure 1 shows the relationship between orthogonality and determinant through the volume of a geometric object spanned by vectors. Note that the volume in Figure 1a is larger than that in Figure 1b even though the edges of (b) have the same length as those of (a). If the vector channels of a given set of receivers are considered as the spanning vectors of the parallelepiped of Figure 1, the volume of the parallelepiped increases as the vector channels become closer to orthogonal, assuming that each vector has fixed magnitude. We can also extend the argument from a vector to a matrix: the volume of the geometric object gets larger as the orthogonality among the users increases even when matrix channels are used. Therefore, the determinant-based scheme can be regarded as an extension of the GSO scheme in [3]. It should also be noted that a determinant is related to the channel quality as well as to the distance between the selected user channels. If the spanning vectors in Figure 1 are longer, the volume also increases. Hence, a determinant is useful for user selection because it measures both distance and channel quality.
In summary, it is critical to have orthogonality between different user channels for the MU-MIMO system with BD to be effective. The chordal distance and the channel quality are independent of each other. Even if the selected users have large channel power, the volume may be small when the chordal distance among the selected users is small. Thus, the volume of the parallelepiped spanned by vector or matrix channels is related to both the chordal distance and the channel power.
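The dependence of the volume on both orthogonality and vector length can be checked numerically. The sketch below takes the volume of the parallelepiped spanned by the columns of a matrix $B$ as $\sqrt{\det(B^H B)}$; the helper names and the two-dimensional test vectors are illustrative assumptions.

```python
import numpy as np

def volume(B):
    """Volume of the parallelepiped spanned by the columns of B."""
    gram = B.conj().T @ B
    return float(np.sqrt(np.linalg.det(gram).real))

def two_vectors(angle_deg, length=1.0):
    """Two column vectors of equal length separated by the given angle."""
    a = np.deg2rad(angle_deg)
    return length * np.array([[1.0, np.cos(a)],
                              [0.0, np.sin(a)]])

v_orth = volume(two_vectors(90))                 # orthogonal, unit length
v_skew = volume(two_vectors(30))                 # same lengths, 30 degrees apart
v_strong = volume(two_vectors(30, length=2.0))   # 30 degrees apart, doubled length
```

The orthogonal pair spans the largest volume for a given length (1 versus 0.5 here), while doubling the length of both skewed vectors raises the volume from 0.5 to 2.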
The first step of the proposed algorithm is to find the user which maximizes capacity among the candidates in the set $\mathcal{T}$. The next user, chosen to maximize the determinant together with the selected user set $\mathcal{S}$, is then added in a successive fashion without computing a precoding matrix by SVD. The volume $S$ of the figure spanned by a general $m \times n$ complex matrix $B$ is given by [14] $S = \sqrt{\det(B^H B)}$.
The determinant of the combined matrix of the channel $H(\mathcal{S})$ of the selected set and the channel $H_n$ of the $n$th candidate user factors, by (9) and (10), into the terms of (11). Note that the first term $\det(H(\mathcal{S})H(\mathcal{S})^H)$ in (11) has nothing to do with $H_n$, the channel of a candidate user in $\mathcal{T}$, so it can be skipped in each iteration of the user scheduling process. In the scheduling algorithm, an extra user is selected that maximizes the volume in (11) together with the previously selected user set $\mathcal{S}$. The user $s_i$ that is selected at the $i$th iteration is dropped from $\mathcal{T}$ and added to $\mathcal{S}$. The sum-rate is also computed at each iteration for the updated $\mathcal{S}$ by (5) and (6). If the sum-rate of the $i$th iteration is smaller than that of the $(i-1)$th iteration, the algorithm stops with the result of the $(i-1)$th iteration.
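The factorization of the combined determinant can be checked numerically with the standard block-determinant (Schur complement) identity. The sketch assumes that the matrix multiplying the candidate channel is the projector $X = I - H(\mathcal{S})^H (H(\mathcal{S}) H(\mathcal{S})^H)^{-1} H(\mathcal{S})$; this is the form implied by the identity, not a quotation of the article's equations. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_channel(n, m):
    """i.i.d. complex Gaussian channel with unit-variance entries."""
    return (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)

def null_projector(H_sel):
    """Projector onto the orthogonal complement of the row space of H_sel."""
    gram_inv = np.linalg.inv(H_sel @ H_sel.conj().T)
    return np.eye(H_sel.shape[1]) - H_sel.conj().T @ gram_inv @ H_sel

H_sel = rand_channel(2, 8)   # channel of the already selected set
H_n = rand_channel(2, 8)     # channel of a candidate user
X = null_projector(H_sel)

# Determinant of the stacked (combined) channel matrix.
stacked = np.vstack([H_sel, H_n])
lhs = np.linalg.det(stacked @ stacked.conj().T).real

# Product of the term that ignores H_n and the term that depends on it.
rhs = (np.linalg.det(H_sel @ H_sel.conj().T).real
       * np.linalg.det(H_n @ X @ H_n.conj().T).real)
```

Since `lhs` and `rhs` agree up to round-off, ranking candidates by the second factor alone gives the same ordering as ranking them by the full determinant.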
The detailed user scheduling process is described in Algorithm 1. It should be noted that fewer than K users can be selected in the user scheduling process. This may occur when a new user which is close to the subspace of existing users reduces the null subspace in the BD scheme. In the DPC scheme, all K users are always used, but the number of selected users in BD can be smaller than K.
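A simplified sketch of the greedy selection order is given below. It makes several assumptions that are not taken from Algorithm 1: the first user is ranked by $\log\det(I + \frac{P}{M} H H^H)$, later users by $\det(H_n X H_n^H)$ with $X$ the projector onto the orthogonal complement of the selected channels, and the sum-rate stopping rule is omitted, so exactly `max_users` users are returned when enough candidates exist. The function name `select_users` and the toy channels are hypothetical.

```python
import numpy as np

def select_users(channels, max_users, power=1.0):
    """Greedy determinant-based selection (sketch without the sum-rate stop rule)."""
    M = channels[0].shape[1]
    remaining = list(range(len(channels)))

    def capacity(H):
        # Single-user log-det capacity with equal power over M antennas (natural log).
        G = np.eye(H.shape[0]) + (power / M) * H @ H.conj().T
        return np.linalg.slogdet(G)[1]

    first = max(remaining, key=lambda k: capacity(channels[k]))
    selected = [first]
    remaining.remove(first)

    while len(selected) < max_users and remaining:
        H_sel = np.vstack([channels[k] for k in selected])
        # Projector onto the orthogonal complement of the selected channels.
        X = np.eye(M) - H_sel.conj().T @ np.linalg.inv(H_sel @ H_sel.conj().T) @ H_sel
        best = max(remaining,
                   key=lambda k: np.linalg.det(channels[k] @ X @ channels[k].conj().T).real)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example with M = 4, N = 2: user 1 shares user 0's subspace, user 2 is orthogonal.
I2, Z2 = np.eye(2), np.zeros((2, 2))
toy = [2.0 * np.hstack([I2, Z2]), 1.5 * np.hstack([I2, Z2]), np.hstack([Z2, I2])]
chosen = select_users(toy, max_users=2)
```

In the toy example the strongest user (index 0) is picked first. User 1 has more channel power than user 2 but lies in the same subspace as user 0, so the orthogonal user 2 is picked second.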

Computational complexity analysis
The key contribution of the proposed scheduling algorithm is to achieve low complexity without sacrificing performance. In this section, we compare the proposed scheduling algorithm with the existing scheduling algorithms which include the optimal scheduling algorithm and other low complexity scheduling algorithms in terms of complexity. The complexity is counted as the number of flops, denoted as ψ. A flop is defined as a real floating point operation [15]. Each of a real addition, a multiplication, and a division is counted as one flop [4]. A complex addition and multiplication have two flops and six flops, respectively. Although flop counting cannot show complexity precisely in practical systems, it can indicate a rough order of the computational complexity.

Complexity of typical matrix operation
We assume $K_T \gg K$, $N \le M$, and $K = M/N$ in this section. For an $N \times M$ complex-valued matrix $H$, the flop counts of typical matrix operations are given by Shen et al. [4].

Optimal scheduling algorithm
A BS performs exhaustive search over all possible user sets in the optimal scheduling algorithm. Thus the order of the optimal scheduling algorithm is given by

Suboptimal scheduling algorithm
We analyze the computational complexity of the capacity-based algorithm, the Frobenius norm-based algorithm [4], and the chordal distance-based algorithm [12].
The complexity of the capacity-based algorithm is expressed by
The complexity of the Frobenius norm-based algorithm is given by
The complexity of the chordal distance-based scheduling algorithm is

Determinant-based scheduling algorithm
As for the determinant-based scheduling algorithm, the complexity is given by (16). Computing $I + \frac{P}{M} H H^H$, which is a positive definite matrix, takes $8MN^2 + N^2 + N$ flops. We need $8M^2N + 8MN^2$ flops for $H_n X H_n^H$, which is also a positive definite matrix. To reduce computation, we can use the Cholesky decomposition, which factors a positive definite matrix $A$ into the product of a lower triangular matrix $L$ and its Hermitian transpose, $A = L L^H$. With the Cholesky decomposition, the determinant of a positive definite matrix can be computed by

$$\det(A) = \prod_{i=1}^{n} l_{ii}^2,$$

where $l_{ii}$ is the $i$th diagonal entry of $L$. The Cholesky decomposition of an $n \times n$ matrix needs $n(n-1)$ real additions, $n(n-1)$ real multiplications, $\frac{n(n-1)}{2}$ real divisions, $\frac{1}{6}n(n-1)(n-2)$ complex additions, $\frac{1}{6}n(n-1)(n-2)$ complex multiplications, and $n$ real square root calculations. To compute the determinant, an additional $n$ real multiplications are needed [11]. Therefore, the number of flops to compute $\det(H_n X H_n^H)$ follows by adding these counts, with $n = N$, to the flops needed to form $H_n X H_n^H$.

The number of flops for water-filling and the update of $X$ is omitted since it depends not on $K_T$, but on $K$, which is much smaller than $K_T$. From the section on the complexity of typical matrix operations, the complexity order of the singular value decomposition plus water-filling is $O(M^3)$. Since the complexity order of the inverse in computing $X$ and of the water-filling (with singular value decomposition) is $O(M^3)$, the overall complexity of the $X$ update and the water-filling is $O(KM^3)$, so it does not change the overall complexity order of (16) because $K \ll K_T$.

As shown in the above analysis, all the low complexity scheduling algorithms have complexity proportional to $K_T$, while the optimal scheduling algorithm does not. The determinant-based scheduling algorithm has the lowest computational complexity among the scheduling algorithms that are considered. It is interesting to note that its complexity depends not on $K$, but on $M$ and $K_T$ only. Therefore, the determinant-based scheduling algorithm has an advantage in systems with large $K$.
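Computing a determinant through the Cholesky factor can be sketched in a few lines of NumPy. The test matrix $I + H H^H$ is an arbitrary positive definite example chosen for illustration, and the function name is hypothetical.

```python
import numpy as np

def det_via_cholesky(A):
    """Determinant of a Hermitian positive definite matrix A = L L^H,
    computed as the product of the squared diagonal entries of L."""
    L = np.linalg.cholesky(A)
    return float(np.prod(np.diag(L).real ** 2))

rng = np.random.default_rng(3)
H = (rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))) / np.sqrt(2)
A = np.eye(4) + H @ H.conj().T       # Hermitian positive definite by construction

d_chol = det_via_cholesky(A)
d_ref = np.linalg.det(A).real        # general-purpose determinant for comparison
```

The two values agree up to round-off. The Cholesky route uses only the diagonal of the triangular factor, which is why it suits repeated determinant evaluations of small positive definite matrices.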

Low overhead pilot design for BD
Pilot signals are used for a receiver to estimate its CSI, and the pilot overhead may decrease the data throughput. There are two types of pilots in practical implementation of MU-MIMO systems. One is common pilots and the other is dedicated pilots. The common pilots are used to estimate the raw channel without the precoding matrix, and the dedicated pilots are used to estimate the effective channel which is a combination of the raw channel and the precoding matrix. Figure 2 shows a conventional MU-MIMO transmitter block diagram describing where the common pilot and the dedicated pilot signals are inserted.
In an open loop SU-MIMO system or a closed loop SU-MIMO system with a beamforming matrix, the channel is estimated with the common pilots. Note that a receiver computes or knows the beamforming matrix in closed loop SU-MIMO. The common pilot alone is also enough for both RBF and PU2RC, which are MU-MIMO systems. These two schemes use a fixed unitary beamforming matrix which is known to both the BS and the receivers. Therefore, the effective channel can be easily computed at the receiver by using the known beamforming matrix.
In the ZFBF and BD schemes, the effective channel cannot be estimated with common pilots only which are used to estimate raw channel since receivers do not know the beam-forming matrix for the selected users determined by the BS. Thus, the dedicated pilots are needed to estimate the effective channel which is the combined channel of the raw channel and the beamforming matrix. The additional dedicated pilots are overhead which reduces the overall data throughput. The requirement of the additional pilot overhead is a disadvantage of ZFBF and BD although they have performance gain compared to SU-MIMO or other MU-MIMO systems.
We propose a reduced pilot scheme based on matrix inversion for ZF and BD in this section. In the proposed scheme, only the dedicated pilot is used. We assume a block fading channel and zero feedback delay for simplicity. First, we define an $M \times M$ unitary pilot matrix $\Phi$, which is given by $\Phi = [\phi_1^T, \ldots, \phi_K^T]^T$, where $\phi_i$ is an $N \times M$ pilot matrix and $K = M/N$ for simplicity. The matrices $\phi_i$ ($i = 1, \ldots, K$) are orthonormal to each other. Note that each row of $\phi_i$ is a time-domain sequence of length $M$, and these sequences are orthogonal to each other. The received pilot signal at the $k$th user at time slot $t$ is the pilot matrix $\Phi$ passed through the precoding matrix $V^{(t)}$ and the channel matrix $H_k^{(t)}$ of the $k$th user at time slot $t$. Note that $V^{(t)}$ at time slot $t$ is given by where $I_M$ is an $M \times M$ identity matrix. The channel of the $k$th user at time slot $t$ is estimated by (21). Each receiver estimates its effective channel by the dedicated pilot $\Phi$, which means that the raw channel is not estimated. In fact, each receiver does not have to know its own raw channel since the effective channel is enough to decode the data. Note that the BS needs to estimate the raw channel of each receiver to construct the precoding matrices.
The BS estimates the raw channel of each receiver by multiplying (21) by the inverse of the precoding matrix, which gives (22). The concern in (22) is whether $V^{-1}$ exists or not. If $V^{-1}$ does not exist, the raw channel in (22) may not be obtained precisely. In order for $V^{-1}$ to exist, $V$ should be an $M \times M$ square matrix, and it has to have full rank. We use two methods to deal with the inverse issue. First, the user scheduling algorithm selects a set of $K$ users for which the inverse $V^{-1}$ exists. When the user selection algorithm fails to find such a set of users, we use a generalized (Moore-Penrose) inverse [15] instead of an ordinary inverse, which may degrade the system performance when this occurs. In the simulations, the ordinary inverse was observed to fail to exist only rarely, so the impact on the performance is negligible.
Our proposed scheduling algorithm selects users who are near-orthogonal to each other, and thus the precoding matrices also tend to be near-orthogonal to each other. By well known properties of an i.i.d. Rayleigh fading matrix channel, the channel matrix is of full rank with probability one [16,17]. In our simulations, it was empirically observed that the inverse of $V$ almost always exists, which may be due to the fact that the proposed algorithm selects near-orthogonal users. In case the inverse does not exist, we can simply keep re-selecting users until the corresponding $V^{-1}$ exists. This way, the existence of the inverse can be achieved with high probability. In the proposed reduced pilot scheme, only one pilot (the dedicated pilot) is used, so that the overall throughput is improved.
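The single-pilot idea can be sketched numerically under simplifying assumptions that go beyond the text: a noise-free received pilot of the form $H_k V \Phi$, a unitary DFT matrix standing in for $\Phi$, and a random unitary matrix standing in for the full $M \times M$ precoding matrix $V$. The function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 8, 4

def crandn(*shape):
    """i.i.d. complex Gaussian entries with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Unitary pilot matrix (a normalized DFT matrix is used here as an example).
Phi = np.fft.fft(np.eye(M)) / np.sqrt(M)

H_k = crandn(N, M)                  # raw channel of user k
V, _ = np.linalg.qr(crandn(M, M))   # stand-in for the full precoding matrix

# Receiver side: noise-free precoded pilot, then effective-channel estimate.
Y_k = H_k @ V @ Phi
G_k = Y_k @ Phi.conj().T            # equals H_k V because Phi is unitary

def recover_raw_channel(G, V):
    """BS side: undo the precoder, using the pseudo-inverse if V is not invertible."""
    if V.shape[0] == V.shape[1] and np.linalg.matrix_rank(V) == V.shape[0]:
        return G @ np.linalg.inv(V)
    return G @ np.linalg.pinv(V)

H_hat = recover_raw_channel(G_k, V)
```

In this noise-free setting the receiver recovers the effective channel exactly and the BS recovers the raw channel from it. With noise or a rank-deficient $V$, the recovered raw channel would only be approximate.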

Simulation results
In this section, we compare the performance of the proposed scheduling algorithm with the optimal scheduling algorithm and the low complexity scheduling algorithms, which include the capacity-based algorithm, the Frobenius norm-based algorithm, and the chordal distance-based algorithm. Figure 3 shows the sum-capacity comparison of the various scheduling algorithms with respect to $K_T$ and SNR when $M = 8$, $N = 4$. We can see in Figure 3 that the determinant-based scheduling algorithm has performance similar to the optimal scheduling algorithm. At high SNR, its performance is also similar to or higher than that of the other low complexity scheduling algorithms, with the lowest complexity. Figure 4 compares the sum-capacity of the various scheduling algorithms with respect to $K_T$ and SNR when $M = 8$, $N = 2$. Compared to Figure 3, $K$ is twice as large in this case. Figure 4 also shows an advantage of the determinant-based scheduling algorithm. It achieves almost the same performance as the capacity-based algorithm and the Frobenius norm-based algorithm, it is close to the optimal scheduling algorithm, and it performs better than the chordal distance-based algorithm over the whole SNR range. Interestingly, our proposed algorithm can be considered as a capacity upper bound at high SNR since the determinant of a matrix is an approximation of the channel capacity at high SNR.
Figures 5 and 6 show the effect of the reduced pilot scheme on the optimal scheduling algorithm and the determinant-based scheduling algorithm in terms of sum-capacity. It is observed that the degradation due to the reduced pilot scheme is almost negligible in terms of sum capacity. There is a slight gap at low SNR between the two-pilot scheme and the one-pilot scheme, but the two schemes have almost the same performance at high SNR. The small degradation in the low SNR regime is due to the fact that the scheduling algorithm with two pilots selects fewer than $K$ users at low SNR, and it may be better to choose fewer than $K$ users at low SNR. In the reduced pilot scheme, $K$ users always have to be selected, which causes a minor degradation. On the other hand, $K$ users tend to be selected at high SNR even in the scheduling algorithm with two pilots. In practical systems, the throughput increases when the proposed pilot scheme is used, since more data can be transmitted in place of the common pilot signals, although the proposed pilot scheme has a slight capacity degradation compared to the conventional pilot scheme of Figure 2. Let us discuss why fewer than $K$ users may perform better at low SNR. It was shown in [1] that TDMA performs as well as DPC (the optimal MU-MIMO scheme) below about 5 dB of SNR. It was also shown in [18] that linear precoding such as BD performs worse than DPC, so linear precoding is expected to be worse than TDMA at low SNR (below 5 dB). Hence, a system with fewer than $K$ users may perform better than a system with $K$ users.
Figure 7 compares the run time of the various low complexity algorithms when $M = 8$, $N = 2$. It is observed that the determinant-based scheduling algorithm has the shortest run time among the low complexity scheduling algorithms. It has a significant gain over the capacity-based algorithm and the Frobenius norm-based algorithm in terms of run time, and it is slightly better than the chordal distance-based algorithm. However, it achieves better sum capacity than the chordal distance-based algorithm, as can be seen in Figures 3 through 7.

Conclusions
In this article, we propose a low complexity MU-MIMO scheduling algorithm with BD and a reduced pilot scheme. A key contribution of this article is that the user selection algorithm uses the determinant of a matrix composed of the users' channel matrices, so that both the orthogonality and the channel quality of the selected users are measured. Its performance is close to that of the optimal scheduling algorithm, and it has an advantage over the other low complexity scheduling algorithms in terms of both sum-capacity and computational complexity.
Another key contribution of the article is a new pilot scheme which reduces pilot overhead by using only one set of pilots, called dedicated pilots. The simulations show that the performance of the new single-pilot scheme is comparable to that of the conventional two-pilot scheme. The proposed scheduling and pilot reduction methods appear promising for practical implementation in next generation wireless systems such as 3GPP LTE-Advanced.