Practical decentralized high-performance coordinated beamforming for both downlink and uplink in time-division duplex systems

Coordinated beamforming (CBF) has been studied in hope of mitigating the inter-cell interference experienced by cell-edge users. Unfortunately, due to the limitations and/or impracticalities of the proposed designs, the expected performance gains have yet to be realized. Relying on channel soundings from the users and equivalent channel soundings from the cell sites (all on the same frequency), both downlink and uplink decentralized frameworks (and various example designs) are proposed in this paper for the practical transceiver and signaling design of a K-pair system desiring to employ CBF. Remarkably, the proposed singular value decomposition (SVD) example design for both frameworks is equivalent to a centralized interference alignment (IA) design. Furthermore, three other proposed design examples achieve better bit error rate, mean square error, and sum capacity performances than the proposed IA-equivalent SVD example design, respectively. In addition, higher sum capacities than the generalized iterative approach, a centralized minimum mean square error CBF design, are numerically observed. Clearly, practical CBF designs which can deliver the expected performance gains are finally available.


Introduction
Coordinated multipoint transmission/reception (CoMP) [1][2][3] is currently attracting a lot of attention (e.g., in Long Term Evolution-Advanced (LTE-A)). It, being based on network multiple-input and multiple-output (MIMO), has multiple cell sites that coordinate their transmissions or receptions (intra-site CoMP can involve only one cell site but this technicality is ignored here for the sake of clarity). If successfully implemented, CoMP can mitigate the inter-site interference, e.g., the inter-cell interference experienced by cell-edge users. In addition, it can improve the coverage and spectral efficiency of the next generation cellular systems [1]. Based on [3], CoMP is divided into four types: joint transmission, dynamic cell selection, coordinated scheduling, and coordinated beamforming (CBF). CBF, the type studied in this paper, has the involved cell sites that coordinate their transmissions/receptions and transceiver designs in order to minimize the inter-site interference experienced by their users (each user's data comes from or goes to only one of the cell sites). The coordination of the multiple cell sites must not incur too high of a load on the network. Yet it must provide the desired performance. Thus far, no scheme has achieved this difficult goal. On the contrary, the gap between gains envisaged with perfect channel feedback (in academia and industry) and with practical feedback schemes remains high [4]. So, the search continues for a practical and high-performing CoMP scheme.
There are four fundamental reasons why most of the proposed CBF designs are impractical. Firstly and most importantly, the vast majority of the proposed CBF designs, in comparison with their said benefits, require too much information exchange between the nodes. Many of the proposed designs assume that a central processing unit has full perfect channel state information (CSI) and performs the entire transceiver design (e.g., [5][6][7]). Others assume that some type of iterative process can occur between the cell sites and/or the users (e.g., [8][9][10]). All in all, most designs assume that a large amount of information (whether it be CSI, precoders, decoders, etc.) can be transported through the network. This is especially true with regard to the CSI if the system is frequency-division duplex (FDD) in nature.
Secondly, many of the proposed CBF transceiver designs are too complex. Due to the number of design variables and the coupling between them, the majority of the designs are iterative in nature (e.g., [5][6][7][8][9][10][11]). Though some have proofs of convergence, there is generally no guarantee on how many iterations (equivalently, how much time) are needed. As the channels are time-varying, this is highly undesirable. Thirdly, many of the designs do not decouple the transceiver designs of the cells involved. Their designs for one cell's (a cell site and its user(s)) precoder(s) and decoder(s) are generally heavily dependent on the modulation and coding schemes (MCSs), number of data streams, transmit powers, etc. of the other cells. This can greatly complicate the schedulers. Last but not least, most of the designs are done using an average total power constraint per transmitter (e.g., [5][6][8][9][10][11]). Since, in practice, each antenna is connected to its own power amplifier, this is not sufficient; the instantaneous power per antenna must be adequately constrained.
In this work, both downlink and uplink frameworks (and various example designs) are proposed for the practical transceiver and signaling design of a K-pair system desiring to employ CBF (the downlink framework has been partially presented in [12]). Relying on channel soundings from the users and equivalent channel soundings from the cell sites, all on the same frequency (possible in time-division duplex (TDD) systems), both frameworks are able to overcome the four critical issues listed above. Firstly, the frameworks are decentralized and require a low amount of information exchange. No central processing unit is used, nor is there any explicit feedback or feed-forward of CSI, precoders, or decoders; each node obtains all the CSI it needs from the channel soundings/equivalent channel soundings. Each node designs its own precoder or decoder. Furthermore, time synchronization of the cells is essentially only needed for the initial channel sounding.
Secondly, the frameworks' transceiver designs at each cell site and user are of low complexity. At a cell site, the transceiver design consists of only two steps: (a) using a singular value decomposition (SVD) to form the nulling portion of the precoder or decoder (the nulling action is essentially a block diagonalization [13] of the overall system channel matrix) and (b) applying a single-user MIMO closed-form solution. At a user, the design is even simpler: apply a single-user MIMO closed-form solution. Thirdly, the frameworks completely decouple the transceiver designs of the different pairs. It is proved that their design for a pair does not depend on the MCSs, number of data streams, transmit powers, etc. of the other pairs. Lastly, the frameworks allow limits to be imposed on the instantaneous transmit power of each antenna.
Even though the proposed frameworks are practical, their performances are comparable to those of designs which are not. It is proven that one of the proposed example designs (specifically, the SVD design) for both frameworks is equivalent to a centralized interference alignment (IA) design. Furthermore, it is shown numerically that three other proposed examples achieve better bit error rate, mean square error, and sum capacity performances than the proposed IA-equivalent SVD example, respectively. In addition, higher sum capacities than the generalized iterative approach, a centralized minimum mean square error (MMSE) CBF design, are numerically observed for some proposed examples. Clearly, these are practical CBF designs with low information exchange overhead which are able to deliver the long-awaited performance gain.
There are other CBF schemes in the literature which result in zero inter-site interference. They are however, to the best of our knowledge, significantly different from our proposal. For example, the papers with single-antenna receivers (e.g., [7,9]) do not need a decoder. The papers with all multi-antenna nodes (e.g., [14] and other IA schemes) run into the four practicality issues discussed before. Lastly, the vast majority of them are purely transceiver designs; they do not consider signaling as is done here.
We have one last introductory comment. The strong aspects of this work are in the practical side: the local channel information (obtained via sounding in TDD mode) is adequate, and each cell site can construct its precoder and decoder locally without further information exchange among different cell sites and users. However, the practicality of the proposed approach may be reduced if some practical challenges faced in a large scale network (such as many antennas, a lot of users, channel estimation error, and block diagonalization error) cannot be properly addressed. Fortunately, the new developments on cloud-radio access network (C-RAN), massive MIMO, spatial user scheduling, etc., have addressed a lot of these practical issues and have helped to demonstrate the importance and timeliness of this work. These practical challenges and their possible solutions will be briefly discussed in the Conclusions section.
The notation of this paper is as follows. All boldface letters indicate vectors (lower case) or matrices (upper case). $\mathbf{A}'$, $\bar{\mathbf{A}}$, $\mathbf{A}^*$, $\mathrm{tr}(\mathbf{A})$, $E(\mathbf{A})$, and $\mathrm{rank}(\mathbf{A})$ stand for the transpose, conjugate, conjugate transpose, trace, expectation, and rank of $\mathbf{A}$, respectively. $\lambda_{\max}(\mathbf{A})$ denotes the largest eigenvalue of $\mathbf{A}$. $[\mathbf{a}]_i$ denotes the $i$th element of $\mathbf{a}$. diag[…] denotes the diagonal matrix with elements […] on the main diagonal. $\mathbf{I}_r$ signifies an $r \times r$ identity matrix. $\mathbf{0}$ signifies a zero matrix with proper dimension. $\mathbf{A} > 0$ denotes that $\mathbf{A}$ is a positive definite matrix. $(a)^+ \triangleq \max(0, a)$. $\mathcal{CN}(\mu, \sigma^2)$ denotes a complex normal random variable with mean $\mu$ and variance $\sigma^2$.

System model
The system considered has K cell site-user pairs; the k th cell site and user only want to send data to each other (k = 1,…, K). The k th cell site and k th user have b k and u k antennas, respectively. It is assumed that b k > u l , ∀k, ∀l. Since the cell sites have more antenna elements than the users, the cell sites will carry out the interference mitigation task for both downlink and uplink scenarios in the proposed frameworks.
In the downlink scenario (see Figure 1), the received signal vector at the $k$th user is given by

$$\mathbf{y}_k = \sum_{l=1}^{K} \mathbf{H}_{kl}\mathbf{F}_l\mathbf{s}_l + \mathbf{n}_k. \qquad (1)$$

There are $m_k$ data streams for the $k$th pair where $m_k \le u_k$. The source (data) to be transmitted from the $k$th cell site to the $k$th user, $\mathbf{s}_k$, is $m_k \times 1$ and is characterized by its positive definite source covariance matrix, $\boldsymbol{\Phi}_{s_k} = E(\mathbf{s}_k\mathbf{s}_k^*) = \sigma_{s_k}^2\mathbf{I}_{m_k}$; $\mathbf{s}_k$ is precoded by $\mathbf{F}_k$, the $b_k \times m_k$ precoder, and then transmitted. The channel from the $k$th cell site to the $l$th user is $u_l \times b_k$ and is denoted by $\mathbf{H}_{lk}$. The noise vector, $\mathbf{n}_k$, is $u_k \times 1$ and is characterized by its noise covariance matrix, $\boldsymbol{\Phi}_{n_k} = E(\mathbf{n}_k\mathbf{n}_k^*) = \beta_k\mathbf{I}_{u_k}$. The sources and noises of different nodes are all independent of each other and zero-mean. Once the $k$th user receives $\mathbf{y}_k$, it applies its $m_k \times u_k$ decoder, $\mathbf{G}_k$, to process the received vector.
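The notation and definitions for the uplink scenario (see Figure 2) are analogous to those for the downlink. An underline is added below the downlink variables to obtain the corresponding uplink ones. For convenience, the antenna numbers at the cell site and the user are again denoted by $b_k$ and $u_k$, respectively. Thus, the received signal vector at the $k$th cell site is

$$\underline{\mathbf{y}}_k = \sum_{l=1}^{K} \underline{\mathbf{H}}_{kl}\underline{\mathbf{F}}_l\underline{\mathbf{s}}_l + \underline{\mathbf{n}}_k, \qquad (2)$$

where $\underline{\mathbf{s}}_l$, $\underline{\boldsymbol{\Phi}}_{s_l} = \underline{\sigma}_{s_l}^2\mathbf{I}_{\underline{m}_l}$, $\underline{m}_l$ ($\underline{m}_l \le u_l < b_l$), and $\underline{\mathbf{F}}_l$ are the source vector, source covariance matrix, number of data streams, and precoder of the $l$th user, respectively; $\underline{\mathbf{n}}_k$, $\underline{\boldsymbol{\Phi}}_{n_k} = \underline{\beta}_k\mathbf{I}_{b_k}$, and $\underline{\mathbf{G}}_k$ are the noise vector, noise covariance matrix, and decoder of the $k$th cell site; and $\underline{\mathbf{H}}_{kl}$ is the uplink channel matrix from the $l$th user to the $k$th cell site. Note that the downlink and uplink channels are reciprocal, i.e., $\underline{\mathbf{H}}_{kl} = \mathbf{H}_{lk}'$, $\forall k$, $\forall l$.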

Proposed framework for the downlink scenario
The proposed framework has five phases. In the first phase, the users perform channel soundings so that each cell site can estimate its reverse channels, i.e., so that the $k$th cell site can estimate $\underline{\mathbf{H}}_{kl}$, $\forall l$. Due to reciprocity, the $k$th cell site can thus have an estimate of $\mathbf{H}_{lk} = \underline{\mathbf{H}}_{kl}'$, $\forall l$. In the second phase, each of the cell sites uses its channel estimates to design its own precoder. The $k$th cell site designs its precoder $\mathbf{F}_k = \mathbf{F}_{k,L}\mathbf{F}_{k,R}$ by first designing its $b_k \times d_k$ left precoder $\mathbf{F}_{k,L}$ for inter-pair interference mitigation and then its $d_k \times m_k$ right precoder $\mathbf{F}_{k,R}$ for performance enhancement. The parameter $d_k$ denotes the maximum number of data streams allowed for the $k$th pair where both intra-data-stream and inter-pair interference can be mitigated (see feasibility condition in (23a)).
To avoid the intra-data-stream interference for the $k$th pair, we need $b_k \ge d_k \ge m_k$. The condition required for mitigating the inter-pair interference is a bit more complicated. Here, we choose for the $k$th cell site the $d_k$ columns of $\mathbf{F}_{k,L}$ to be an orthonormal basis for the null space of

$$\mathbf{A}_k = \begin{bmatrix} \mathbf{H}_{1k}' & \cdots & \mathbf{H}_{k^-k}' & \mathbf{H}_{k^+k}' & \cdots & \mathbf{H}_{Kk}' \end{bmatrix}', \qquad (3)$$

where $k^- = k - 1$ and $k^+ = k + 1$. This can easily be done using the SVD. Note that $d_k = b_k - \sum_{l \ne k} u_l$ when $\mathbf{A}_k$ is full rank. The restriction on the antenna setup due to the need for $d_k$ to be greater than or equal to $m_k$ is discussed in more detail in the section on feasibility conditions, Section 4.2.
Since each cell site picks its left precoder in this way, the nulling constraint

$$\mathbf{A}_k\mathbf{F}_k = \mathbf{A}_k\mathbf{F}_{k,L}\mathbf{F}_{k,R} = \mathbf{0} \qquad (4)$$

is satisfied for any $\mathbf{F}_{k,R}$. There will be no inter-pair interference at the users (note that the receiver processing is not taken into account when calculating the null space because we want to minimize the signaling load). From the perspective of $\mathbf{F}_{k,R}$, the entire system is simply a single-user MIMO system where $\mathbf{F}_{k,R}$ and $\mathbf{H}_{kk}\mathbf{F}_{k,L}$ are the equivalent precoder and channel matrix, respectively. Using the nulling constraint in (4), (1) reduces to

$$\mathbf{y}_k = \mathbf{H}_{kk}\mathbf{F}_{k,L}\mathbf{F}_{k,R}\mathbf{s}_k + \mathbf{n}_k. \qquad (5)$$

Using $\mathbf{H}_{kk}\mathbf{F}_{k,L}$, the $k$th cell site will thus design $\mathbf{F}_{k,R}$ to optimize its own link subject to some power constraint on $\mathbf{F}_k$. In addition to $\mathbf{H}_{kk}\mathbf{F}_{k,L}$, the noise covariance matrix $\boldsymbol{\Phi}_{n_k} = E(\mathbf{n}_k\mathbf{n}_k^*)$ of the $k$th user may also be employed in the design of $\mathbf{F}_k$ if its estimate is available at the $k$th cell site (see Section 3).
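The second phase can be illustrated numerically. The following is a minimal sketch, assuming i.i.d. complex Gaussian channels and hypothetical sizes (K = 3, eight antennas per cell site, two per user); the helper names `null_space_basis` and `complex_gaussian` are illustrative and not from the paper. It stacks the channels from one cell site to the other users, takes the right singular vectors beyond the rank as the left precoder, and checks that an arbitrary right precoder leaves no leakage at the other users.

```python
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Return a matrix whose orthonormal columns span the null space of A."""
    _, s, vh = np.linalg.svd(A)                  # full SVD: vh is (cols x cols)
    rank = int((s > tol * s[0]).sum()) if s.size else 0
    return vh[rank:].conj().T                    # right singular vectors beyond the rank

def complex_gaussian(rng, rows, cols):
    """i.i.d. CN(0, 1) entries (an assumed channel model for illustration)."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

rng = np.random.default_rng(0)
K, b, u, m = 3, 8, 2, 2                          # hypothetical system sizes

# H[l][k]: downlink channel from cell site k to user l (u x b)
H = [[complex_gaussian(rng, u, b) for k in range(K)] for l in range(K)]

k = 0
# Stack the channels from cell site k to every other user
A_k = np.vstack([H[l][k] for l in range(K) if l != k])
F_kL = null_space_basis(A_k)                     # b x d_k left precoder
d_k = F_kL.shape[1]

# Any right precoder keeps the nulling property
F_kR = complex_gaussian(rng, d_k, m)
F_k = F_kL @ F_kR
leakage = max(np.abs(H[l][k] @ F_k).max() for l in range(K) if l != k)
print("d_k =", d_k, "| leakage negligible:", leakage < 1e-10)
```

Because the nulling is done entirely by the left precoder, the right precoder can be chosen afterwards by any single-user method without reintroducing interference.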
In the third phase, each of the cell sites performs an equivalent channel sounding with either its designed precoder or left precoder. The reason why there are two choices is to allow two different decoder designs.
Depending on the case, the $k$th user can estimate $\mathbf{H}_{kk}\mathbf{F}_k$ or $\mathbf{H}_{kk}\mathbf{F}_{k,L}$. With $\mathbf{H}_{kk}\mathbf{F}_k$, the $k$th user can design the MMSE decoder, while with $\mathbf{H}_{kk}\mathbf{F}_{k,L}$, it can design the SVD one. Since the designed precoder causes no interference to the other users, this and the final two phases do not need synchronization among the pairs. Furthermore, orthogonal pilots are not needed in this phase. In the fourth phase, each user uses its noise covariance matrix $\boldsymbol{\Phi}_{n_k}$ and the estimate of $\mathbf{H}_{kk}\mathbf{F}_k$ or $\mathbf{H}_{kk}\mathbf{F}_{k,L}$ from phase 3 to design its decoder $\mathbf{G}_k$ (see Section 3). Once this is finished, the fifth and final phase, the data transmission, can occur. The five phases are summarized in Table 1.

Proposed framework for the uplink scenario
The proposed framework for the uplink scenario also has five phases. In the first phase, the users perform channel soundings so that each cell site can estimate its channels, i.e., so that the $k$th cell site can estimate $\underline{\mathbf{H}}_{kl}$, $\forall l$. In the second phase, each of the cell sites uses its channel estimates to design its own decoder. In particular, the $k$th cell site partitions its decoder as $\underline{\mathbf{G}}_k = \underline{\mathbf{G}}_{k,L}\underline{\mathbf{G}}_{k,R}$, where the $d_k \times b_k$ right decoder $\underline{\mathbf{G}}_{k,R}$ is employed for inter-pair interference mitigation and the $\underline{m}_k \times d_k$ left decoder $\underline{\mathbf{G}}_{k,L}$ for performance enhancement. The parameter $d_k$ denotes the maximum number of data streams allowed for the $k$th pair where both intra-data-stream and inter-pair interference can be mitigated (see feasibility condition in (23b)).
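To avoid the intra-data-stream interference for the $k$th pair, we need $b_k \ge d_k \ge \underline{m}_k$. For mitigating the inter-pair interference, we choose for the $k$th cell site the $d_k$ rows of its right decoder $\underline{\mathbf{G}}_{k,R}$ to be an orthonormal basis of the left null space of

$$\underline{\mathbf{A}}_k = \begin{bmatrix} \underline{\mathbf{H}}_{k1} & \cdots & \underline{\mathbf{H}}_{kk^-} & \underline{\mathbf{H}}_{kk^+} & \cdots & \underline{\mathbf{H}}_{kK} \end{bmatrix}. \qquad (6)$$

The restriction on the antenna setup due to the need for $d_k$ to be greater than or equal to $\underline{m}_k$ is discussed in more detail in the section on feasibility conditions, Section 4.2.

Table 1 (phase 2 entries). Downlink: the cell site uses (3) to design $\mathbf{F}_{k,L}$ to satisfy (4) and designs $\mathbf{F}_{k,R}$ using (5) (see Section 3). Uplink: the cell site uses (6) to design $\underline{\mathbf{G}}_{k,R}$ to satisfy (7) and designs $\underline{\mathbf{G}}_{k,L}$ using (8).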
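Since each cell site picks its right decoder in this way, the nulling condition

$$\underline{\mathbf{G}}_k\underline{\mathbf{A}}_k = \underline{\mathbf{G}}_{k,L}\underline{\mathbf{G}}_{k,R}\underline{\mathbf{A}}_k = \mathbf{0} \qquad (7)$$

is satisfied for any $\underline{\mathbf{G}}_{k,L}$. There will be no inter-pair interference at the cell sites after the right decoders (note that the transmitter processing is not taken into account when calculating the null space because we want to minimize the signaling load).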
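After decoding by $\underline{\mathbf{G}}_{k,R}$, (2) becomes

$$\underline{\mathbf{G}}_{k,R}\underline{\mathbf{y}}_k = \underline{\mathbf{G}}_{k,R}\underline{\mathbf{H}}_{kk}\underline{\mathbf{F}}_k\underline{\mathbf{s}}_k + \underline{\mathbf{G}}_{k,R}\underline{\mathbf{n}}_k. \qquad (8)$$

For the $k$th pair, the entire system simply becomes a single-user MIMO system where $\underline{\mathbf{G}}_{k,L}$, $\underline{\mathbf{G}}_{k,R}\underline{\mathbf{y}}_k$, $\underline{\mathbf{G}}_{k,R}\underline{\mathbf{H}}_{kk}$, and $\underline{\mathbf{G}}_{k,R}\underline{\mathbf{n}}_k$ are the equivalent decoder, received signal vector, channel matrix, and noise vector, respectively. Given $\underline{\mathbf{G}}_{k,R}$, its estimate of $\underline{\mathbf{H}}_{kk}$, and its estimate of $\underline{\boldsymbol{\Phi}}_{n_k}$, the $k$th cell site thus designs its left decoder $\underline{\mathbf{G}}_{k,L}$ (see Section 3).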
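The uplink right decoder can be sketched in the same way. The example below is an illustration under assumed i.i.d. complex Gaussian channels and hypothetical sizes; variable names such as `H_ul` and `G_kR` are not from the paper. It builds the uplink channels from the downlink ones by reciprocity (transpose), computes orthonormal rows spanning the left null space of the stacked interfering channels via an SVD, and confirms that the other users' signals vanish after the right decoder.

```python
import numpy as np

rng = np.random.default_rng(3)
K, b, u = 3, 8, 2                                # hypothetical system sizes

def complex_gaussian(rows, cols):
    """i.i.d. CN(0, 1) entries (an assumed channel model for illustration)."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

# H_dl[l][k]: downlink channel from cell site k to user l (u x b)
H_dl = [[complex_gaussian(u, b) for k in range(K)] for l in range(K)]
# Reciprocity: uplink channel from user l to cell site k is the transpose (b x u)
H_ul = [[H_dl[l][k].T for l in range(K)] for k in range(K)]

k = 0
# Interfering uplink channels seen at cell site k, placed side by side
A_ul = np.hstack([H_ul[k][l] for l in range(K) if l != k])   # b x (K-1)u

# Rows g with g @ A_ul = 0 correspond to null vectors of A_ul.T
_, s, vh = np.linalg.svd(A_ul.T)
rank = int((s > 1e-10 * s[0]).sum())
G_kR = vh[rank:].conj()                          # d_k x b, orthonormal rows

# Residual interference from every other user after the right decoder
residual = max(np.abs(G_kR @ H_ul[k][l]).max() for l in range(K) if l != k)
print("G_kR shape:", G_kR.shape, "| residual negligible:", residual < 1e-10)
```

Since the stacked uplink matrix is the transpose of the stacked downlink matrix under reciprocity, the transpose of a downlink left precoder would be an equally valid right decoder.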
In the third phase, each of the cell sites performs an equivalent channel sounding using the transpose of its designed right decoder so that its user can estimate the equivalent channel, i.e., so that the $k$th user can estimate $\underline{\mathbf{G}}_{k,R}\underline{\mathbf{H}}_{kk}$. Note that due to the nature of $\underline{\mathbf{G}}_{k,R}$, this equivalent channel sounding causes no interference at users $l$, $\forall l \ne k$, and does not have to be synchronized with that of the other pairs. Furthermore, orthogonal pilots are not needed here. In the fourth phase, each user uses its estimate of the equivalent channel from phase 3 to design its precoder subject to some power constraint (the $k$th user uses $\underline{\mathbf{G}}_{k,R}\underline{\mathbf{H}}_{kk}$ to design $\underline{\mathbf{F}}_k$). In addition to $\underline{\mathbf{G}}_{k,R}\underline{\mathbf{H}}_{kk}$, the $k$th user may also use an estimate of the equivalent noise covariance matrix, $E(\underline{\mathbf{G}}_{k,R}\underline{\mathbf{n}}_k\underline{\mathbf{n}}_k^*\underline{\mathbf{G}}_{k,R}^*)$, if such an estimate is available. Once the fourth phase is finished, the fifth and final phase, the data transmission, can occur. The five phases for the uplink are summarized in Table 1.

Conventional and equivalent single-user MIMO systems
As mentioned in Section 2, the nulling constraint (4) (or (7)) decouples the K-pair system into K independent equivalent single-user MIMO systems as described by (5) for the downlink (or (8) for the uplink). The remaining task is to design the precoder and decoder of each equivalent single-user MIMO system to facilitate efficient data transmission. In this subsection, we compare a conventional single-user MIMO system (shown in Figure 3) with the equivalent single-user MIMO systems for both downlink and uplink scenarios. Firstly, the corresponding variables for the conventional and equivalent single-user MIMO systems are listed in Table 2. Secondly, differences with respect to power constraints and noise covariance are discussed.
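Regarding the power constraints, there exists no difference between the conventional and uplink equivalent single-user MIMO systems, but there appears to be some difference between the conventional and downlink equivalent single-user MIMO systems. The average total power (ATP) constraint for the conventional single-user MIMO system is

$$\mathrm{tr}\{E(\mathbf{F}\mathbf{s}\mathbf{s}^*\mathbf{F}^*)\} = \mathrm{tr}\{\mathbf{F}\boldsymbol{\Phi}_s\mathbf{F}^*\} \le P. \qquad (9a)$$

The constraint is named as such because it constrains the average power, $\mathrm{tr}\{E(\mathbf{F}\mathbf{s}\mathbf{s}^*\mathbf{F}^*)\}$, to $P$. Likewise, the ATP constraint for the downlink equivalent single-user MIMO system is

$$\mathrm{tr}\{\mathbf{F}_{k,R}\boldsymbol{\Phi}_{s_k}\mathbf{F}_{k,R}^*\} \le P_k. \qquad (9b)$$

The downlink equivalent single-user MIMO system, however, is not the original downlink system. Does (9b) really constrain the ATP of the $k$th cell site to $P_k$? Surprisingly, yes:

$$\mathrm{tr}\{\mathbf{F}_k\boldsymbol{\Phi}_{s_k}\mathbf{F}_k^*\} = \mathrm{tr}\{\mathbf{F}_{k,L}\mathbf{F}_{k,R}\boldsymbol{\Phi}_{s_k}\mathbf{F}_{k,R}^*\mathbf{F}_{k,L}^*\} = \mathrm{tr}\{\mathbf{F}_{k,R}\boldsymbol{\Phi}_{s_k}\mathbf{F}_{k,R}^*\mathbf{F}_{k,L}^*\mathbf{F}_{k,L}\} = \mathrm{tr}\{\mathbf{F}_{k,R}\boldsymbol{\Phi}_{s_k}\mathbf{F}_{k,R}^*\} \qquad (10)$$

because $\mathbf{F}_{k,L}$ is chosen using (3) and thus guarantees $\mathbf{F}_{k,L}^*\mathbf{F}_{k,L} = \mathbf{I}_{d_k}$.

The instantaneous array power (IAP) constraint for the conventional single-user MIMO system, (11a), is named as such because it constrains the instantaneous sum power of the antenna array (and hence the instantaneous peak power of each antenna) of the transmitter. The physical meaning of $L$ can be understood from the following two special cases. Firstly, if the precoder $\mathbf{F}$ is a unitary matrix, one obtains from (11a) the expression (11b), which represents the average power of each data stream. Thus, the ATP in (9a) is equivalent to the IAP in (11a) if $P = mL$ when $\mathbf{F}$ is unitary. Secondly, if a constant amplitude modulation scheme is used and the system is fully loaded, i.e., $\mathbf{s}^*\mathbf{s} = m\sigma_s^2$, one obtains from (11a) and (11b) a quantity which represents the upper bound of the spatial average of the instantaneous antenna sum power. From (11a), the IAP constraint for the downlink equivalent single-user MIMO system, (12), follows. Again, the downlink equivalent single-user MIMO system is not the original downlink system. Does (12) really constrain the IAP of the $k$th cell site to $L_k \max_{\mathbf{s}_k} \mathbf{s}_k^*\mathbf{s}_k$? The answer is yes: since $\mathbf{F}_{k,L}^*\mathbf{F}_{k,L} = \mathbf{I}_{d_k}$, we have $\mathbf{s}_k^*\mathbf{F}_k^*\mathbf{F}_k\mathbf{s}_k = \mathbf{s}_k^*\mathbf{F}_{k,R}^*\mathbf{F}_{k,R}\mathbf{s}_k$ for every $\mathbf{s}_k$.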
Thus, we conclude that the ATP and IAP constraints can also be employed, without any modification, for the downlink equivalent single-user MIMO system. Summarized in Table 3 are the ATP and IAP constraints for the conventional and equivalent single-user MIMO systems.
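Regarding the noise covariance, there exists no difference between the conventional and downlink equivalent single-user MIMO systems, but there appears to be some difference between the conventional and uplink equivalent single-user MIMO systems. The noise covariance is $E(\mathbf{n}\mathbf{n}^*) = \beta\mathbf{I}$ for the former and $E(\underline{\mathbf{G}}_{k,R}\underline{\mathbf{n}}_k\underline{\mathbf{n}}_k^*\underline{\mathbf{G}}_{k,R}^*)$ for the latter. However,

$$\underline{\mathbf{G}}_{k,R}\underline{\mathbf{G}}_{k,R}^* = \mathbf{I}_{d_k} \qquad (14)$$

because the rows of $\underline{\mathbf{G}}_{k,R}$ are chosen to be an orthonormal basis for the left null space of $\underline{\mathbf{A}}_k$ in (6). Applying (14), $E(\underline{\mathbf{G}}_{k,R}\underline{\mathbf{n}}_k\underline{\mathbf{n}}_k^*\underline{\mathbf{G}}_{k,R}^*) = \underline{\beta}_k\underline{\mathbf{G}}_{k,R}\underline{\mathbf{G}}_{k,R}^* = \underline{\beta}_k\mathbf{I}_{d_k}$, so the noise covariance for both the conventional single-user MIMO system and the uplink equivalent single-user MIMO system is just a scalar times an identity matrix. Summarized in Table 3 are the covariance matrices for the conventional and equivalent single-user MIMO systems.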
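These invariance claims are easy to check numerically. The sketch below is only an illustration: it uses a random matrix with orthonormal columns (obtained from a QR factorization) as a stand-in for a left precoder, an arbitrary right precoder, and hypothetical values for the source power and noise level. It compares the average and instantaneous transmit power of the full and equivalent downlink precoders, and the uplink equivalent noise covariance against a scaled identity.

```python
import numpy as np

rng = np.random.default_rng(1)
b, d, m = 8, 4, 2                                # hypothetical dimensions

def complex_gaussian(rows, cols):
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

# Stand-in left precoder: any b x d matrix with orthonormal columns
F_L, _ = np.linalg.qr(complex_gaussian(b, d))
F_R = complex_gaussian(d, m)                     # arbitrary right precoder
F = F_L @ F_R

sigma2 = 0.7                                     # assumed per-stream source power
Phi_s = sigma2 * np.eye(m)

# Average total power of the full precoder vs. the equivalent one
atp_full = np.trace(F @ Phi_s @ F.conj().T).real
atp_equiv = np.trace(F_R @ Phi_s @ F_R.conj().T).real

# Instantaneous array power for one random symbol vector
s = complex_gaussian(m, 1)
iap_full = np.linalg.norm(F @ s) ** 2
iap_equiv = np.linalg.norm(F_R @ s) ** 2

# Uplink: a right decoder with orthonormal rows keeps white noise white
G_R = F_L.T                                      # d x b, orthonormal rows
beta = 0.3                                       # assumed noise level
noise_cov = G_R @ (beta * np.eye(b)) @ G_R.conj().T

print(np.isclose(atp_full, atp_equiv),
      np.isclose(iap_full, iap_equiv),
      np.allclose(noise_cov, beta * np.eye(d)))
```

All three comparisons rely only on the orthonormality of the nulling matrix, not on how it was obtained.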

Example designs
The main conclusion of the previous section and Tables 2 and 3 is that for the ATP or IAP constraints, the downlink and uplink equivalent single-user MIMO systems can be treated as conventional single-user MIMO systems. Thus, all example designs in this subsection (a) are given using the notation of the conventional single-user MIMO system (F, G, H, etc.) and (b) are applicable to both downlink and uplink equivalent single-user MIMO systems.
Presented here are practical minimum mean square error (PMMSE), practical maximum mutual information (PMMI), practical minimum bit error rate (PMBER), and practical SVD (PSVD) designs for the precoder and decoder. Each design is subject to either the ATP or the IAP constraint. The word 'practical' is used as a reminder that these designs are for the proposed practical decentralized downlink and uplink frameworks which have been discussed in Section 2 and summarized in Table 1. Based on Figure 3 and Tables 2 and 3, the four design approaches are outlined in Table 4.
The first three approaches (PMMSE, PMMI, or PMBER) in Table 4 are formulated as optimization problems. The cost functions of PMMSE and PMMI are mean square error (to be minimized) and mutual information (to be maximized), respectively. The PMBER approach maximizes for each pair a lower bound for the minimum distance between symbol hypotheses and is an approximate alphabet-independent minimum bit error rate (BER) design. For the PMMSE design subject to ATP or IAP constraint (denoted as PMMSE-ATP or PMMSE-IAP, respectively), the solution can be readily obtained by applying Lemma 1 or 3 of [15]. Similarly, Lemmas 2 and 4 of [15] can be applied for the PMMI-ATP and PMMI-IAP problems, respectively, and Lemmas 5 and 6 of [15] for the PMBER-ATP and PMBER-IAP problems, respectively. The closed-form solutions of these three approaches are provided in [15] and summarized in Appendix for the convenience of the readers.
Not only are these closed-form solutions optimum for their respective problems in Table 4 (see proofs in [15]), they also fit well into the proposed frameworks. The only requirement which is not met is that the transmitter (i.e., the cell site for the downlink scenario or the user for the uplink scenario) needs an estimate of the noise covariance of the receiver, which increases the network load. As such, some systems may prefer to use other closed-form solutions instead. One such option is to adopt the PSVD approach (the last design listed in Table 4), where the transmitter does not need to know the noise covariance of the receiver. In PSVD, the precoder and decoder can be derived directly by performing an SVD of the channel, as given in (15) and shown in Table 4. In (15), vectors t 1 … t m and matrix W are given in Table 4. The PSVD-ATP and PSVD-IAP only differ in the αs they use. If L = P/m, the two αs are the same and thus the PSVD-ATP and PSVD-IAP are the same. Alternatively, instead of using the SVD decoder in (15), the MMSE decoder in (29) can also be employed for the PSVD approach. Note that the decoder is designed at the receiver. Thus, in this case, the transmitter still does not need to estimate the noise covariance, unlike in the other three approaches in the Appendix.

Table 4. Four designs of precoder F and decoder G subject to either the ATP or the IAP constraint (columns: approach for designing precoder F and decoder G; constraint).

Optimality
In the following, the optimality of the PMMSE solution for the downlink and uplink frameworks will be proven. The optimality of the PMMI and PMBER results can be established following the same procedure and is therefore omitted. Consider the downlink equivalent single-user MIMO system first. Its MMSE cost function and power constraint, (16) and (17), follow from Tables 2 and 3. Since $\mathbf{F}_k = \mathbf{F}_{k,L}\mathbf{F}_{k,R}$ and $\mathbf{I}_{d_k} = \mathbf{F}_{k,L}^*\mathbf{F}_{k,L}$, (16) and (17) can be rewritten as (18) and (19). Note that (18) and (19) define the PMMSE problem for the $k$th pair's data transmission in the downlink framework. Since $\mathbf{F}_{k,L}$ has been determined previously for inter-pair interference cancellation and is known, the optimal solution $\{\mathbf{G}_k, \mathbf{F}_{k,R}\}$ for the downlink equivalent problem in (16) and (17) (see [15]) can be used to construct the optimal decoder and precoder $\{\mathbf{G}_k, \mathbf{F}_k = \mathbf{F}_{k,L}\mathbf{F}_{k,R}\}$ of the $k$th pair in the downlink framework.

Next, consider the uplink equivalent single-user MIMO system, whose cost function and power constraint are (20) and (21). Since $\underline{\mathbf{G}}_k = \underline{\mathbf{G}}_{k,L}\underline{\mathbf{G}}_{k,R}$, (20) can be rewritten as (22). Note that (22) and (21) define the PMMSE problem for the $k$th pair's data transmission in the uplink framework. Since $\underline{\mathbf{G}}_{k,R}$ has been determined previously for inter-pair interference cancellation and is known, the optimal solution $\{\underline{\mathbf{G}}_{k,L}, \underline{\mathbf{F}}_k\}$ for the uplink equivalent problem in (20) and (21) (see [15]) can be used to construct the optimal decoder and precoder $\{\underline{\mathbf{G}}_k = \underline{\mathbf{G}}_{k,L}\underline{\mathbf{G}}_{k,R}, \underline{\mathbf{F}}_k\}$ of the $k$th pair in the uplink framework.

Feasibility conditions
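Regardless of the scenario, the $k$th pair's data transmission will only be feasible if its equivalent channel ($\mathbf{H}_{kk}\mathbf{F}_{k,L}$ in downlink and $\underline{\mathbf{G}}_{k,R}\underline{\mathbf{H}}_{kk}$ in uplink) has sufficient rank. The goal of this subsection is thus to derive necessary and sufficient conditions for

$$\mathrm{rank}(\mathbf{H}_{kk}\mathbf{F}_{k,L}) \ge m_k, \qquad (23a)$$

$$\mathrm{rank}(\underline{\mathbf{G}}_{k,R}\underline{\mathbf{H}}_{kk}) \ge \underline{m}_k. \qquad (23b)$$

Data transmission for the $k$th pair is feasible in the downlink framework if and only if it is feasible in the uplink framework. The reason is twofold. First, since $\underline{\mathbf{A}}_k = \mathbf{A}_k'$, $\mathbf{F}_{k,L}'$ is a valid choice for $\underline{\mathbf{G}}_{k,R}$, and $\underline{\mathbf{G}}_{k,R}'$ is a valid choice for $\mathbf{F}_{k,L}$. Second, with $\underline{m}_k = m_k$ and $\underline{\mathbf{G}}_{k,R} = \mathbf{F}_{k,L}'$, (23a) holds if and only if (23b) holds. As such, the following will only focus on the downlink. Without loss of generality, let $0 < m_k \le u_k$, and let $\mathbf{A}_k$ and $\mathbf{H}_{kk}\mathbf{F}_{k,L}$ be full rank. By observing that (23a) holds if and only if $d_k \ge m_k$ and by applying the rank-nullity theorem, the necessary and sufficient condition for (23a),

$$b_k - \sum_{l \ne k} u_l \ge m_k, \qquad (24a)$$

is obtained. Interestingly, the feasibility of data transmission depends solely on the number of antennas and data streams, not on the particular $\mathbf{F}_{k,L}$ chosen or the channel realization (assuming $\mathbf{A}_k$ and $\mathbf{H}_{kk}\mathbf{F}_{k,L}$ are full rank). Similarly, the necessary and sufficient condition for (23b) is

$$b_k - \sum_{l \ne k} u_l \ge \underline{m}_k. \qquad (24b)$$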
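Because feasibility depends only on antenna and stream counts, it can be checked with a few lines of arithmetic. The sketch below assumes the condition takes the form b_k minus the sum of the other users' antenna counts being at least m_k, which is what the rank-nullity theorem gives when the stacked interfering channel matrix is full rank; the function names are illustrative. The configurations are those listed in the numerical results, plus one hypothetical infeasible setup for contrast.

```python
def null_space_dim(b, u, k):
    """d_k = b_k - sum of the other users' antennas (full-rank channels assumed)."""
    return b[k] - (sum(u) - u[k])

def feasible(b, u, m):
    """True if every pair satisfies 0 < m_k <= u_k and d_k >= m_k."""
    return all(0 < m[k] <= u[k] and null_space_dim(b, u, k) >= m[k]
               for k in range(len(b)))

# (cell-site antennas, user antennas, data streams) per pair
configs = {
    "A-1": ([4, 4], [2, 2], [1, 1]),
    "A-2": ([4, 4], [2, 2], [2, 2]),
    "B":   ([8, 8], [4, 4], [4, 4]),
    "C":   ([8] * 4, [2] * 4, [2] * 4),
    # Hypothetical counterexample: three pairs leave no null space at the cell sites
    "infeasible": ([4] * 3, [2] * 3, [1] * 3),
}

results = {name: feasible(*cfg) for name, cfg in configs.items()}
for name, ok in results.items():
    print(name, ok)
```

With three pairs, four cell-site antennas, and two user antennas, the null space has dimension zero, so not even one stream per pair can be supported.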

Equivalencies between downlink and uplink frameworks
Let , and let (24a) and (24b) hold. Also, let both downlink and uplink be under the same ATP or IAP constraint. Then, there are actually equivalencies between the performances of the downlink and uplink frameworks for the $k$th pair:
a) Let $\underline{\mathbf{G}}_{k,R} = \mathbf{F}_{k,L}'$ and let one of the optimum closed-form solutions in (29) to (34) be employed. If the $k$th pair uses the same solution for both downlink and uplink, its downlink and uplink mean square error (MSE) per stream, signal-to-interference-plus-noise ratio (SINR) per stream, and mutual information are the same.
b) For a given power constraint (ATP or IAP), the lowest achievable sum MSE (derived from PMMSE) and highest achievable mutual information (derived from PMMI) for the $k$th pair are the same for the downlink and uplink frameworks.
Here is a rough sketch of the proof. When the optimum closed-form solutions in (29) to (34) are used, the MSE per stream, SINR per stream, and mutual information are essentially functions of only the eigenvalues of the matrix $\boldsymbol{\Xi}$ in (28). With appropriate variable mappings, this matrix is given by (25a) and (25b) for the downlink and uplink frameworks, respectively. In (25b), $\underline{\mathbf{G}}_{k,R} = \mathbf{F}_{k,L}'$ and $\underline{\mathbf{H}}_{kk} = \mathbf{H}_{kk}'$ have been employed.
As $\boldsymbol{\Xi}_k$ and $\underline{\boldsymbol{\Xi}}_k$ have the same non-zero eigenvalues, point (a) follows because the MSE per stream, SINR per stream, and mutual information are functions of these non-zero eigenvalues. Point (b) can be proved from point (a) and the fact that solutions (29) to (34) are optimum in their respective sense (see Table 4 and the proofs in [15]).

Equivalencies among some optimal solutions
Consider the downlink scenario (the uplink scenario is analogous and is omitted). It has been shown in (A4) that PMMSE-IAP and PMMI-IAP are equivalent (i.e., they have the same precoder and decoder). It has also been shown (see the example designs) that if the power constraint parameters in (9a) and (11a) are related by $L = P/m$, the PSVD-ATP and PSVD-IAP are equivalent. Interestingly, if the MMSE decoder is employed for the PSVD approach and $\boldsymbol{\Phi}_{n_k} = \beta_k\mathbf{I}_{u_k}$, the PMMSE-IAP, PMMI-IAP, and PSVD-IAP are equivalent for any $L_k$. Thus, if $L_k = P_k/m_k$, $\boldsymbol{\Phi}_{n_k} = \beta_k\mathbf{I}_{u_k}$, and the MMSE decoder is employed for the PSVD approach, the PMMSE-IAP, PMMI-IAP, PSVD-IAP, and PSVD-ATP are equivalent. When there is only one data stream per pair, PMMSE-ATP, PMMI-ATP, and PMBER-ATP are exactly the same (since $\boldsymbol{\Lambda}$ in (28) is just a scalar in this case, (30), (32), and (34) will all yield the same $\boldsymbol{\Omega}$ in (29)). When, in addition, $\boldsymbol{\Phi}_{n_k} = \beta_k\mathbf{I}_{u_k}$ and the MMSE decoder is employed for the PSVD approach, PSVD-ATP is also optimal in the MMSE and maximum information rate senses. These equivalencies (summarized in Table 5) can be derived, with some work, using (28) and the closed-form solutions for the PMMSE-IAP and PMMI-IAP problems.

Relationship to interference alignment
Firstly, we will show that some of our example designs satisfy the IA conditions. In the downlink scenario (the uplink scenario is analogous and is omitted), a set of precoders $\{\mathbf{F}_k\}$ and decoders $\{\mathbf{G}_k\}$ achieves IA [16] when

$$\mathrm{rank}(\mathbf{G}_k\mathbf{H}_{kk}\mathbf{F}_k) = m_k, \quad \forall k, \qquad (26a)$$

$$\mathbf{G}_k\mathbf{H}_{kl}\mathbf{F}_l = \mathbf{0}, \quad \forall l \ne k, \ \forall k. \qquad (26b)$$

Let (24a) hold for every pair. Looking at the downlink framework, one can easily see that all of its implementations satisfy (26b); the left precoders result in $\mathbf{A}_k\mathbf{F}_k = \mathbf{0}$, $\forall k$, or equivalently $\mathbf{H}_{kl}\mathbf{F}_l = \mathbf{0}$, $\forall l \ne k$, $\forall k$. In addition, one can easily see that some of its implementations (e.g., PSVD-ATP in Section 3.2) satisfy (26a). As a result, constructive proofs are obtained for the feasibility of IA in the downlink scenario when (24a) holds for every pair. Thus, the PSVD-ATP is an IA-equivalent implementation.
Secondly, we will show that an IA solution satisfies the nulling constraint of our schemes when each pair's number of data streams is equal to its user's number of antennas, i.e., m k = u k . With m k = u k , any {F k ,G k } which achieves IA must therefore satisfy (a) (26a) ∀k; (b) G k −1 exists, ∀k; and finally (c) the nulling constraint H kl F l = 0, ∀l ≠ k, ∀k (equivalently A k F k = 0, ∀k).
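The relationship to IA can be checked numerically on a small system. The following sketch is a simplified stand-in for the PSVD design: it uses unit-norm singular vectors of the equivalent channel as right precoder and decoder, without the power scaling and stream weighting of Table 4, and assumes i.i.d. complex Gaussian channels with K = 2, four cell-site antennas, and two user antennas. It tests the usual IA conditions, namely zero leakage between pairs after decoding and a full-rank desired link for every pair.

```python
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Orthonormal columns spanning the null space of A, via the SVD."""
    _, s, vh = np.linalg.svd(A)
    rank = int((s > tol * s[0]).sum()) if s.size else 0
    return vh[rank:].conj().T

rng = np.random.default_rng(2)
K, b, u, m = 2, 4, 2, 2                          # fully loaded, as in case A-2

def complex_gaussian(rows, cols):
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

# H[l][k]: downlink channel from cell site k to user l (u x b)
H = [[complex_gaussian(u, b) for k in range(K)] for l in range(K)]

F, G = [], []
for k in range(K):
    A_k = np.vstack([H[l][k] for l in range(K) if l != k])
    F_L = null_space_basis(A_k)                  # nulling part of the precoder
    U, s, Vh = np.linalg.svd(H[k][k] @ F_L)      # equivalent single-user channel
    F_R = Vh.conj().T[:, :m]                     # simplified SVD right precoder
    F.append(F_L @ F_R)
    G.append(U[:, :m].conj().T)                  # SVD decoder

# Zero inter-pair leakage after decoding
leakage = max(np.abs(G[k] @ H[k][l] @ F[l]).max()
              for k in range(K) for l in range(K) if l != k)
# Full-rank desired link for every pair
ranks = [int(np.linalg.matrix_rank(G[k] @ H[k][k] @ F[k])) for k in range(K)]

print("leakage negligible:", leakage < 1e-10, "| ranks:", ranks)
```

Here the leakage vanishes before decoding as well, since the precoders alone null the cross channels, which is the stronger nulling constraint used by the proposed frameworks.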
Finally, the example transceiver designs in the Appendix are optimal for their metrics under their power constraints and the nulling constraints $\mathbf{A}_k\mathbf{F}_k = \mathbf{0}$, $\forall k$. However, the IA-equivalent implementation (such as PSVD) is not designed under those criteria and conditions and, therefore, may not be optimal. Thus, the performance of an example transceiver design in the Appendix will be at least as good as that of the IA-equivalent PSVD design for its given metric and power constraint, e.g., the PMMSE-ATP will obtain an MSE at least as small as the MSE of the IA-equivalent PSVD design (see [5,16]). On the other hand, the example designs in the Appendix may not be able to achieve easily what general IA designs can [17,18].

Numerical results
To demonstrate the performance of the proposed frameworks, this section presents simulation results for typical CBF K-pair systems under the downlink scenario. The results for the uplink scenario are not presented due to the equivalencies (Section 4.3) and the similarities in the results. Three configurations are considered. In configuration A, K = 2, b k = 4, u k = 2, ∀k; in configuration B, K = 2, b k = 8, u k = 4, ∀k; and in configuration C, K = 4, b k = 8, u k = 2, ∀k. In configuration A, two cases are considered. In case A-1, m k = 1, ∀k (partially loaded), and in case A-2, m k = 2, ∀k (fully loaded). In configurations B and C, only the fully loaded case is presented. Therefore, m k = 4, ∀k, in configuration B and m k = 2, ∀k, in configuration C. As (24a) is satisfied for all cases, data transmission using the proposed framework is feasible in all of them. In every case, the source covariance matrices are identity matrices. Each data stream consists of uncoded BPSK-modulated symbols. The noises are independent and identically distributed CN(0,ε) random variables and Φ nk = β k I uk = εI uk , ∀k. The channel elements are independent and identically distributed CN(0,1) random variables in each case.
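The simulation setup described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' simulation code: the names CONFIGS, noise_variance, cn, and draw_case are hypothetical, and the number of symbols per realization (n_sym) is an assumption, since the text does not state it. The noise variance is obtained by inverting the SNR definition used in this section, SNR = 10 log10(P/ε).

```python
import numpy as np

# (K, b_k, u_k, m_k) for the four cases considered in this section.
CONFIGS = {
    "A-1": (2, 4, 2, 1),
    "A-2": (2, 4, 2, 2),
    "B":   (2, 8, 4, 4),
    "C":   (4, 8, 2, 2),
}

def noise_variance(P, snr_db):
    """Invert SNR = 10*log10(P/eps) to obtain the noise variance eps."""
    return P / 10 ** (snr_db / 10)

def cn(rng, shape, var=1.0):
    """i.i.d. circularly symmetric complex Gaussian samples, CN(0, var)."""
    return np.sqrt(var / 2) * (rng.standard_normal(shape)
                               + 1j * rng.standard_normal(shape))

def draw_case(rng, name, P, snr_db, n_sym):
    """One realization: channels, BPSK symbols, and receiver noise."""
    K, b, u, m = CONFIGS[name]
    eps = noise_variance(P, snr_db)
    # H[(k, l)] is the u x b channel from cell site l to user k.
    H = {(k, l): cn(rng, (u, b)) for k in range(K) for l in range(K)}
    # Uncoded BPSK symbols, so each source covariance is the identity.
    s = [rng.choice([-1.0, 1.0], size=(m, n_sym)) for _ in range(K)]
    # Noise covariance eps * I at every user.
    n = [cn(rng, (u, n_sym), eps) for _ in range(K)]
    return H, s, n, eps

rng = np.random.default_rng(0)
H, s, n, eps = draw_case(rng, "A-2", P=1.0, snr_db=10.0, n_sym=20000)
print("noise variance:", eps)
print("empirical noise power:", np.mean(np.abs(n[0]) ** 2))
```

A precoder and decoder design would then be applied to H, and the MSE, BER, and capacity metrics averaged over repeated calls to draw_case.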
For each case, five designs under the ATP condition are considered: the GIA-ATP, PMMSE-ATP, PMMI-ATP, PMBER-ATP, and PSVD-ATP. For comparison, the PMMSE design under the IAP condition (i.e., PMMSE-IAP) is also included. The MMSE decoder is employed for all designs. The GIA-ATP (see [5]) is a centralized MMSE design and is included as a performance benchmark. Note that the designs considered here are but a subset of the possible implementations. One can derive others using results from [19,20]. The various equivalence relations among different designs under special conditions are discussed in Section 4.4 and summarized in Table 5.
Because PMMSE-IAP's F k,R = √(L k ) V k , ∀k, its tr{F k F k * } = m k L k , ∀k. In addition, its L k 's are chosen so that the maximum instantaneous antenna power of each cell site is equal to the total average power P for a cell site under the ATP constraint divided by its number of antennas. Note that the average power under the IAP constraint will thus be upper bounded by P. For the sake of comparison, perfect CSI is used in the GIA-ATP. In addition, no errors are incurred by the channel soundings and the equivalent channel soundings in the proposed designs.
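The power bookkeeping in this paragraph can be illustrated with a short sketch. It is not the paper's selection rule for L k , whose exact expression is not reproduced here. It assumes that the left precoder F k,L and V k both have orthonormal columns, that the per-antenna power is read off the diagonal of F k F k * (identity source covariance), and that the cap is P divided by the number of cell-site antennas. The function names are hypothetical.

```python
import numpy as np

def orthonormal_columns(rng, rows, cols):
    """Random complex matrix with orthonormal columns (via reduced QR)."""
    Z = rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))
    Q, _ = np.linalg.qr(Z)
    return Q

def scale_to_antenna_cap(F_L, V, P):
    """Pick L so that the largest per-antenna power equals P / b."""
    b = F_L.shape[0]
    B = F_L @ V                                   # precoder for L = 1
    per_antenna = np.sum(np.abs(B) ** 2, axis=1)  # diag(B B^*)
    L = (P / b) / per_antenna.max()
    return L, np.sqrt(L) * B

rng = np.random.default_rng(1)
b, n, m, P = 4, 2, 2, 1.0       # n: null-space dimension in configuration A
F_L = orthonormal_columns(rng, b, n)
V = orthonormal_columns(rng, n, m)
L, F = scale_to_antenna_cap(F_L, V, P)
total = np.trace(F @ F.conj().T).real
print("L =", L, " total power =", total, " m*L =", m * L)
```

Because every antenna carries at most P divided by the number of antennas, the total power m k L k cannot exceed P, and it is strictly smaller whenever the antennas are loaded unevenly. This is consistent with the PMMSE-IAP transmitting less average power than the ATP designs.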
The sum MSEs, system BERs, and sum capacities for case A-1 versus signal-to-noise ratio (SNR) ≜ 10 log10(P/ε) are plotted in Figure 4a,b,c, respectively. They are obtained by averaging over 15 channel realizations. First, let us look at the sum MSEs of the six designs in Figure 4a. The GIA-ATP and the PMMSE-IAP provide the best and worst performances, respectively. The other four designs result in exactly the same performance (because there is only one data stream per pair and these four designs are equivalent; see Table 5). Furthermore, the sum MSEs of all designs converge as the SNR increases. The better performance of the GIA-ATP is expected; it is a centralized MMSE design and its precoders do not necessarily need to null out the interfering channels. The PMMSE-IAP's performance is behind the others because its average total power per cell site is less than P.
Next, let us look at the system BER results. All of the BERs are very good and the performance order of the designs is the same as with the sum MSEs. For the sum capacity results, the designs are still in the same performance order. In addition, all of the curves have approximately the same slope. Though the GIA-ATP is an MMSE design, it has the highest sum capacity. This can be attributed to its being a centralized design while the others are distributed. Moreover, the PMMI-ATP cannot do any waterfilling between data streams because there is only one data stream.
The sum MSEs, system BERs, and sum capacities for case A-2 versus SNR are plotted in Figure 5a,b,c, respectively. Since each pair has more than one data stream, the equivalencies between PMMSE-ATP, PMMI-ATP, PMBER-ATP, and PSVD-ATP no longer hold. Note that the performance of each of the three example designs (PMMSE-ATP, PMMI-ATP, and PMBER-ATP) in the Appendix is better than that of PSVD-ATP (the equivalent centralized IA design) under its metric and power constraint. Granted, the PMBER-ATP design uses an approximate minimum BER metric. Consequently, it is observed that the PSVD-ATP does slightly outperform it at low SNRs.
First, let us look at the sum MSEs of the six designs. The performance order is, from best to worst, the GIA-ATP, PMMSE-ATP, PSVD-ATP, PMMI-ATP, PMBER-ATP, and PMMSE-IAP. The sum MSEs of all designs converge as the SNR increases. Because m k = 2 and Φ nk = ε I 2 , ∀k, the PSVD-ATP is MMSE subject to IAP in (12) with L k = P/2, ∀k. There are two interesting remarks: (a) the optimum performances under ATP in (9b) and IAP in (12) are similar when the average total powers of the two are the same and (b) the difference in the performances of the PMMSE-IAP and PSVD-ATP is due to the value of L k used in (12).
For the system BER results, the performance order is, from best to worst, the GIA-ATP, PMBER-ATP and PMMSE-ATP, PSVD-ATP, PMMI-ATP, and PMMSE-IAP. Interestingly, the PMBER-ATP, using an approximate minimum BER design, provides excellent results. In addition, the PMMSE-ATP, though designed for MSE, provides essentially the same BER results as the PMBER-ATP.
For the sum capacity results, the performance order is dependent on the SNR. Among the five decentralized practical designs, the PMMI-ATP has the largest sum capacity (for all SNRs) because it is designed to maximize the mutual information. The PMMSE-IAP has the smallest sum capacity because its transmitted power is less than that used by other designs under the ATP condition. PMBER-ATP has the second smallest sum capacity because it is designed to maximize only the minimum eigenvalue of the matrix shown in Table 4.
Remarkably, the GIA-ATP, though it is a centralized design, does not always have the highest sum capacity. Moreover, two other much simpler decentralized designs, PSVD-ATP and PMMSE-ATP, achieve sum capacities similar to that of the centralized GIA-ATP. In fact, PSVD-ATP has a slightly larger sum capacity than GIA-ATP and PMMSE-ATP for high SNRs. This is because PSVD-ATP is equivalent to PSVD-IAP (if L k = P/2, ∀k) and, furthermore, PSVD-IAP is equivalent to PMMI-IAP (since m k = 2 and Φ nk = ε I 2 , ∀k). Thus, the PSVD-ATP is max information rate subject to IAP in (12), i.e., PMMI-IAP, with L k = P/2, ∀k.
Note that Figure 4a,b,c presents the single data stream results and Figure 5a,b,c presents the two data stream results. Comparing Figure 4a with Figure 5a, the MSEs in Figure 4a are smaller than half of the sum MSEs (i.e., the corresponding average MSEs over the two data streams) in Figure 5a. Comparing Figure 4b with Figure 5b, the BERs in Figure 4b are smaller than the corresponding average BERs over the two data streams in Figure 5b. Comparing Figure 4c with Figure 5c, the capacities in Figure 4c are larger than half of the sum capacities (i.e., the corresponding average capacities over the two data streams) in Figure 5c. All of the above observations are due to the fact that each of the two communication pairs in configuration A is an equivalent 2 × 2 single-user MIMO system, and the two eigenchannel gains of this equivalent 2 × 2 system are usually very different. Thus, one of the two data streams in case A-2 (results presented in Figure 5a,b,c) must go through the eigenchannel with the smaller channel gain, whereas the single data stream in case A-1 can always use the eigenchannel with the larger channel gain (results presented in Figure 4a,b,c). Hence, the per-stream performances in Figure 4a,b,c are generally better than those in Figure 5a,b,c.
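The gap between the two eigenchannel gains can be illustrated with a small Monte Carlo sketch. It assumes that the equivalent 2 × 2 single-user channel of each pair in configuration A has i.i.d. CN(0,1) entries, which is a modeling assumption made here for illustration. The function name and the number of trials are hypothetical.

```python
import numpy as np

def mean_eigenchannel_gains(u=2, n=2, trials=2000, seed=0):
    """Average squared singular values of u x n i.i.d. CN(0,1) channels."""
    rng = np.random.default_rng(seed)
    Heff = (rng.standard_normal((trials, u, n))
            + 1j * rng.standard_normal((trials, u, n))) / np.sqrt(2)
    # Singular values come back sorted in descending order for each trial.
    sv = np.linalg.svd(Heff, compute_uv=False)
    return np.mean(sv ** 2, axis=0)

gains = mean_eigenchannel_gains()
print("mean gain, strongest eigenchannel:", gains[0])
print("mean gain, weakest eigenchannel:  ", gains[1])
```

Under this assumption the strongest eigenchannel is several times stronger on average than the weakest one, so a second stream that is forced onto the weaker eigenchannel lowers the per-stream averages.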
To demonstrate the usefulness of the proposed schemes, we also present the numerical results of two larger systems. In configuration B, the number of antennas at each user is twice that in configuration A; in configuration C, the number of users is twice that in configuration A. Consequently, the number of antennas at the cell site in configuration B or C needs to be twice that in configuration A as well. The MSEs, system BERs, and sum capacities for configuration B are plotted in Figure 6a,b,c, respectively. In addition, the MSEs, system BERs, and sum capacities for configuration C are plotted in Figure 7a,b,c, respectively. Although the systems are larger, the observations made for configuration A can also be made for configurations B and C. Moreover, each of these systems is twice the size (in terms of the number of antennas) of the system represented by Figure 5a,b,c.

Conclusions
Two frameworks (and various example designs) are proposed for the practical transceiver and signaling design of a K-pair system desiring to employ CBF. Though one is for the downlink scenario and the other for the uplink scenario, they are very similar. Firstly, both of them use the same mechanisms (e.g., channel soundings from the users, equivalent channel soundings from the cell sites, decoupling the system into K single-user MIMO systems) and have the same feasibility conditions. Secondly, there exist equivalencies between their performances. Thirdly, both have implementations which are constructive proofs for IA. For example, one of the example designs, the PSVD, is shown to be equivalent to a centralized IA design. Unlike [21], there is no difficulty dealing with more than one data stream per pair. Fourthly, optimum closed-form solutions can be given for both of them. Fifthly, the performances of these optimum closed-form solutions in their corresponding design metrics are at least as good as those of the centralized IA-equivalent PSVD design. For example, the information rates of the max information rate closed-form solutions (PMMI) are at least as high as those of the centralized IA-equivalent PSVD design. Sixthly, the numerical results show that they both have implementations which obtain higher sum capacities than the GIA (a centralized MMSE approach). Clearly, they are both frameworks for practical low-information exchange CBF designs which are able to deliver the long-awaited performance gain. Over the years, there has been much debate over whether to use TDD or FDD. In light of CBF and this paper's proposed designs, it becomes clear that the ability of TDD to support channel soundings in the reverse direction is a great underused advantage. We envision that this ability will be a key for implementing and fully harnessing the benefits of other MIMO techniques as well.
Both LTE and Worldwide Interoperability for Microwave Access (WiMAX) utilize TDD, as does the newly proposed C-RAN network [22][23][24]. It may not be long before reverse channel sounding enabled MIMO techniques, such as this paper's proposed designs, are employed. The usefulness of CBF is not limited to mitigating the inter-site interference between cell sites of a cellular system. It can be used whenever multiple transmissions are using the same frequency at the same time. Due to the practical decentralized nature of this paper's proposed designs, it seems possible that CBF will be used to mitigate the interference that macrocells and femtocells cause to each other [25]. In addition, it seems possible that it will be used in non-cellular systems as well (e.g., ad hoc networks, mesh networks).
Although only the analysis of a K-pair system is presented in this paper, the five phases of the proposed frameworks in Table 1 can be extended to deal with a multiuser scenario where each cell site needs to talk to multiple users in its cell simultaneously. The closed-form optimal solutions (e.g., the ones in the Appendix) for the K-pair system are no longer available for the K-multiuser system. Many multiuser precoder-decoder designs are available for the K-multiuser system, but they may require additional signaling load. Investigation will be needed to determine how to trade off between performance and signaling load for the K-multiuser system. The number of users which can be served simultaneously is limited by the number of antennas of the cell sites. If many users exist in the same cell, some kind of user scheduling or selection scheme [26,27] is required. One possibility is frequency multiplexing, a natural solution in orthogonal frequency-division multiplexing (OFDM) systems like LTE.
Currently, the practicality of the proposed approach is somewhat limited by the fact that the antenna setup is restricted and, therefore, only a small number of cell or user antennas can be supported. However, the proposed approach is very promising for a future large-scale network, because the current research trend is massive MIMO [28] where huge cell site antenna arrays are employed.
There is a concern about the effectiveness of zero forcing used for block diagonalization in a cellular system since the users may not be at the same distance from the cell site. The problem can be mitigated to a certain extent by power control where the received powers of all users are controlled to be at the same level at the cell site. Smart scheduling, like frequency multiplexing, can also be employed to group users with similar SNRs together. In addition, it is shown in [28] that the channel matrix for a massive MIMO system tends to be well conditioned and, therefore, the zero-forcing technique may become more appealing as the number of antennas increases. There is also a concern about the effects due to the channel estimation error. It is shown in [28] that as the number of antennas increases, the thermal noise can be averaged out so that the system is predominantly limited by interference from other transmitters. In summary, all these practical limiting factors are reduced as the number of antennas increases. We conclude that the more massive MIMO and TDD technologies advance, the more practical and promising our proposed approach will become.
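The conditioning trend attributed to [28] can be illustrated numerically. The sketch below is only an illustration under an i.i.d. CN(0,1) channel model with a fixed total of four user antennas. The antenna counts, the number of trials, and the function name are assumptions made here, not values from [28]. The median is reported rather than the mean because the condition number of a square Gaussian matrix is heavy tailed.

```python
import numpy as np

def median_condition_number(users, antennas, trials=200, seed=0):
    """Median 2-norm condition number of users x antennas CN(0,1) channels."""
    rng = np.random.default_rng(seed)
    H = (rng.standard_normal((trials, users, antennas))
         + 1j * rng.standard_normal((trials, users, antennas))) / np.sqrt(2)
    sv = np.linalg.svd(H, compute_uv=False)   # descending per trial
    return float(np.median(sv[:, 0] / sv[:, -1]))

for antennas in (4, 16, 64, 256):
    print(antennas, round(median_condition_number(4, antennas), 2))
```

As the number of cell-site antennas grows relative to the number of user antennas, the ratio of the largest to the smallest singular value approaches one, so inverting or nulling the channel amplifies noise less.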
Endnote a A part of this work has been presented in IEEE Sarnoff Symposium 2011 (see [12]).
where M ≤ m is chosen so that ω i >0 when i ≤ M and ω i =0 when M < i ≤ m. For the PMMSE-IAP and PMMI-IAP problems, Hence, the optimum closed-form solutions provided for the PMMSE-IAP and PMMI-IAP problems are the same. For the PMBER-ATP problem, For the PMBER-IAP problem, For the PMMI-ATP problem, where J ≤ m is chosen so that ω i >0 when i ≤ J and ω i = 0 when J < i ≤ m.