MSE minimized joint transmission in coordinated multipoint systems with sparse feedback and constrained backhaul requirements

In a joint transmission coordinated multipoint (JT-CoMP) system, a shared spectrum is utilized by all neighbor cells. In the downlink, a group of base stations (BSs) coordinately transmit the users' data to avoid serious interference at the users in the boundary of the cells, thus substantially improving area fairness. However, this comes at the cost of high feedback and backhaul load. In a frequency division duplex system, all users at the cell boundaries have to collect and send feedback of the downlink channel state information (CSI). In centralized JT-CoMP, although it is capable of perfect coordination, a central coordination node has to send the computed precoding weights and corresponding data to all cells, which can overwhelm the backhaul resources. In this paper, we design a JT-CoMP scheme by which the sum of the mean square error (MSE) at the boundary users is minimized, while feedback and backhaul loads are constrained and the load is balanced between BSs. Our design is based on the singular value decomposition of the CSI matrix and optimization of a binary link selection matrix to provide sparse feedback and constrained backhaul links. For comparison, we adopt the previously presented schemes for feedback and backhaul reduction in the physical layer. Extensive numerical evaluations show that the proposed scheme can reduce the MSE by at least 25%, compared to the adopted and existing schemes.

IoT is needed. (c) Higher system reliability and quicker round trip times are to be available in transportation systems and industrial process control. To fulfill these requirements, a combination of various new techniques such as massive multiple input multiple output (MMIMO), dense small cells, cooperative communications such as device-to-device (D2D) and coordinated multipoint (CoMP), advanced air interface, additional spectrum at higher frequencies (mm-wave), and integrated access and backhaul (IAB) is needed [1][2][3][4][5].
One of the most promising concepts to cover all the above requirements is ultra-dense networks (UDN) with a frequency reuse of (close to) one, where more small base stations (BSs) are deployed within the service area. However, using the same frequency for all BSs exposes the users to severe inter-cell interference (ICI), especially at the cell edges. Recently, to reap the benefits of UDN, the cloud radio access network (CRAN) architecture has been proposed, where ICI can be effectively mitigated by employing the CoMP technique [6,7]. Initially, CoMP was introduced for LTE-A by the third generation partnership project (3GPP) to mitigate ICI for cell-edge users [8]. Qualcomm has implemented a fifth-generation (5G) CoMP testbed and showed a fourfold increase in system capacity [9]. Currently, CoMP is considered one of the potential technologies for 5G cellular networks. Accordingly, 5G enhanced some aspects of CoMP such as control signaling and channel feedback [10,11].
There are three main categories of downlink CoMP: joint transmission (JT), dynamic point selection (DPS), and coordinated scheduling/beamforming (CS/CB). In JT-CoMP, the data related to a user is available at all serving BSs and is transmitted simultaneously by each BS. This transmission can be coherent or non-coherent. In coherent transmission (also known as a multiple input multiple output (MIMO) network), the signal strength is enhanced by precoding the data to exploit the phase and amplitude information of each channel. We consider coherent JT-CoMP, since it generally outperforms the other options in system performance [12].
In JT-CoMP, a group of BSs forms a virtual antenna array distributed across multiple cells. In the downlink, two or more geographically separated BSs cooperate to jointly and coordinately transmit to cell-edge users, where the improvement is most needed, and exploit the interference as a useful signal. The signals are superposed at the user position so as to maximize the desired signal (constructive superposition) and at the same time minimize the ICI (destructive superposition). This requires that accurate channel state information (CSI) is available at the transmitter side. In time division duplex (TDD) transmission, CSI is acquired from the reciprocity of uplink and downlink channels, while in frequency division duplex (FDD) transmission, users need to feed back the CSI received from all serving BSs to the BS with the strongest link. In the centralized approach of FDD JT-CoMP, the CSI for all users is aggregated in the central coordination node (CCN) to calculate the precoding weights for the subsequent downlink transmission. Through the backhaul, the precoding weights and the users' data are sent from the CCN to the corresponding BSs. Finally, each BS transmits a weighted combination of all users' data to the users.
The throughput of downlink JT-CoMP heavily relies on the quality of the CSI at the transmitters. Feedback latency and limited reliability of the feedback channel degrade the system performance [13,14]. Other impairments, such as imperfect carrier and sampling frequencies among BSs, cause a mismatch between the precoder and the actual channel, which limits the potential gains of JT-CoMP [15,16]. In practical implementations of JT-CoMP, feedback and backhaul loads are two main challenges that need to be addressed properly. As system performance depends on the CSI quality, CSI has to be sent back at very low latency so that it does not become outdated before being used for precoding. Therefore, a large amount of CSI is required, which poses a considerable feedback load. The users' data needs to be available at all serving BSs. Moreover, the precoding weights have to be sent to the corresponding BSs, which poses a heavy burden on the backhaul traffic. Also, increasing the number of cooperating BSs to improve the spectral efficiency increases the backhaul load [17][18][19].
Although CoMP is one of the main solutions to mitigate ICI and has been considered from 4G to B5G, feedback and backhaul loads have been identified as two of the key challenges for its practical implementation and have prevented CoMP from truly taking off. It is expected that in the 5G era, a powerful fiber backhaul is available at least for macro-cells in the CRAN architecture. Such a deployment can inherently provide the low-latency and high-capacity backhaul needed for JT-CoMP, while connections of small cells might range from fiber to various relaying and IAB approaches, which have lower capacity but might be more cost-effective [3,5].
It is highly desirable to reduce the feedback and backhaul loads by routing users' data to a limited number of BSs. This reduction can be done in the medium access control (MAC) layer [20][21][22][23][24] or in the physical (PHY) layer [7][25][26][27]. Furthermore, caching at the BSs, which operates in the upper layers of the network, has recently been considered to reduce the transmission latency and the total backhaul bandwidth consumption [28]. In this paper, we aim to reduce feedback and backhaul loads simultaneously in JT-CoMP using PHY layer schemes.
In the PHY layer based schemes, limited feedback and backhaul precoders are designed with respect to sum-rate maximization [25][26][27] or maximizing the number of users admitted to the network [7]. In [25], absolute and relative thresholding was proposed for feedback load reduction in the FDD downlink. In absolute thresholding, only CSI with a corresponding signal to noise ratio (SNR) exceeding a predefined threshold is sent back, whereas in relative thresholding, the threshold is set based on the strongest channel. It has been shown that the latter technique provides a good trade-off between sum-rate performance and feedback overhead. An absolute threshold is used for selective feedback in [26], where for limiting the user data exchange through the backhaul, two schemes including scheduling and precoding techniques are employed. The proposed framework has a good sum-rate performance with limited system overhead. The idea of relative thresholding is followed by [27], where the precoder is designed using a successive second order cone programming (SSOCP) to maximize the sum-rate for all cell-edge users.
Recently, the 3GPP initiated a standardization activity to employ codebook-based precoding at BSs with an aim to decrease CSI feedback overhead to satisfy the spectral efficiency requirement of future cellular systems. In 3GPP LTE, codebook type I was introduced [29] and for more accurate CSI feedback to better support the transmission in new radio (NR), codebook type II was introduced in 3GPP Release 15 [30]. This throughput gain comes at the expense of a significant increase in feedback overhead. To this end, Release 16 introduces enhanced CSI feedback by compressing the CSI report in the frequency domain and extending the codebook type II to support MIMO channels with rank larger than two [31]. These enhancements increase throughput and reduce CSI feedback overhead [32].
In this paper, we design a novel JT-CoMP transmission scheme based on singular value decomposition (SVD) to minimize the sum mean square error (MSE) at boundary users. Since in practical networks, especially mobile systems, continuous rate adaptation cannot be implemented and the rate is selected from a limited discrete set [33][34][35], we consider the MSE criterion for system performance evaluation. We optimize a binary link selection matrix, in which each element corresponds to the link between an antenna of a BS and an antenna of a user. If an element is one, the corresponding antenna serves the user; otherwise, it is not involved in the transmission. We propose a two-layer recursive optimization method. In the inner layer, the SVD of the CSI matrix is utilized to design a precoder fulfilling a sum-power constraint. In the outer layer, the link selection matrix is designed, providing the required feedback and backhaul load reductions and load balancing between BSs. To obtain a further reduction of the feedback load, we consider a CSI codebook based limited feedback strategy, where each user selects a codeword from a CSI codebook and feeds back its index to the serving BS. The CCN collects all the codeword indexes and calculates the precoding matrix. Random vector quantization and uniform quantization are employed for quantizing the channel direction information and the phase ambiguity, respectively.
We compare our scheme with two recent works in this area [26,27], which have a performance goal close to that of our design. Moreover, we adopt two existing precoders, zero forcing (ZF) and Wiener, in our two-layer optimization scheme to provide sparsity constraints on feedback and backhaul. As shown, in terms of MSE our scheme outperforms [26,27] by at least 30% and the adopted ZF and Wiener precoders by at least 25%. The key contributions of this paper are summarized as follows:
• In the previously presented approaches, only the average backhaul or feedback loads are controlled, while the hardware must be available for the worst-case scenario. In our scheme, the feedback and backhaul loads are strictly constrained.
• As load balancing has a key role in radio resource optimization, we consider a constraint on the number of users that are served by a specific BS. This association constraint between BSs and users reduces the maximum load in each BS.
• The proposed scheme is not sensitive to the type of receiver and has the same performance in receivers with different receive filters, while the performance of the adopted ZF and Wiener precoders depends on the type of receive filter. Therefore, an advantage of the proposed scheme is that it performs well in simplified receivers, such as receivers using no receive filter.
• We employ a CSI codebook based feedback strategy to further reduce the feedback load. In this regard, random vector quantization and uniform quantization are used for the channel direction and phase information. It is shown that by employing only 6 bits for quantization, a performance close to that of full CSI feedback is attainable. In this technique, users employ different CSI codebooks to independently quantize their CSI.
• The proposed scheme has good convergence properties. It converges after transmitting 5 subframes, in the worst case.
The remainder of the paper is organized as follows. In Sect. 2, preliminaries including notation and system model are presented. Section 3 is devoted to designing the proposed JT-CoMP transmission scheme. In Sect. 4, benchmark schemes are explained, and Sect. 5 numerically evaluates the proposed scheme and its efficiency in comparison with the benchmarks. Conclusions are drawn in Sect. 6.

Notation
In this paper, scalar variables are denoted by small italic letters e.g. x and vector variables by small italic bold letters e.g. x . Sets are denoted by calligraphic letters e.g. W .
The absolute value of a scalar variable, or the number of elements in a set or matrix, is denoted by |.|, and the largest integer not exceeding x is denoted by ⌊x⌋. The Euclidean and [...] Table 1. Figure 1 shows a schematic form of the considered network, including 3 neighboring cells and 3 users at the common boundary of the cells. There is a cluster area in the middle of the cells, in which there would be high interference if there were no medium division or coordination between the cells. This is shown as a gray area in Fig. 1.

System model and network structure
In downlink transmission of the JT-CoMP scheme, BSs are coordinated and jointly transmit as a single virtual multi-antenna transmitter with distributed antennas. To this end, data for the users in the cluster center is sent to all BSs via a backhaul link. Each BS transmits a linear combination of users' data with a proper precoding weight. Precoding weights are calculated in the CCN, based on the CSI of all links between BSs and cluster centered users.
In TDD transmission, CSI can be implicitly estimated at the BSs based on channel reciprocity. In FDD, however, CSI is estimated by the users and sent back to the BSs. All CSI is then sent from the BSs to the CCN to calculate the precoding weights. Figure 2 shows the downlink of the mentioned system. In general, there are $N_b$ BSs, each with $N_t$ antennas, serving $N_u$ users, each with $N_r$ antennas. Thus, there are in total $N_B = N_b N_t$ transmit and $N_U = N_u N_r$ receive antennas. The channel gain at the u-th user is denoted by $H_u \in \mathbb{C}^{N_r \times N_B}$, $1 \le u \le N_u$. The aggregated data symbols are denoted by $x = [x_1^T \cdots x_{N_u}^T]^T \in \mathbb{C}^{N_U}$, where $x_u \in \mathbb{C}^{N_r}$ is the data symbol for the u-th user and $E\{x x^H\} = \sigma_x^2 I_{N_U}$. Each BS transmits a linear combination of all precoded data symbols. The precoding matrix corresponding to the u-th user, which belongs to different BSs, is denoted by $W_u \in \mathbb{C}^{N_B \times N_r}$, $1 \le u \le N_u$, and the aggregated precoding matrix is denoted by $W = [W_1 \cdots W_{N_u}] \in \mathbb{C}^{N_B \times N_U}$.

Fig. 1 A simple schematic model for the considered network

The signal transmitted by all BSs is $W_1 x_1 + \cdots + W_{N_u} x_{N_u}$, and the received signal at the u-th user is

$$y_u = H_u W_u x_u + \sum_{j \ne u} H_u W_j x_j + n_u, \quad (1)$$

where $n_u$ is complex Gaussian noise with zero mean and variance $\sigma_n^2$. The signal to interference and noise ratio (SINR) at the u-th user, $\gamma_u$, is given by (2). By considering the aggregated channel gain as $H = [H_1^T \cdots H_{N_u}^T]^T \in \mathbb{C}^{N_U \times N_B}$, the aggregated received signal is

$$y = H W x + n, \quad (3)$$

where $n = [n_1^T \cdots n_{N_u}^T]^T$ is the noise vector. The receive filter at the u-th user is denoted by $g_u \in \mathbb{C}^{N_r \times N_r}$ and the detected signal at the u-th receiver is $\hat{x}_u = g_u y_u$. If the receive filters are aggregated as $G = \mathrm{diag}(g_1 \cdots g_{N_u}) \in \mathbb{C}^{N_U \times N_U}$, the detected symbols are computed as

$$\hat{x} = G y. \quad (4)$$

This paper aims to minimize the weighted sum MSE at all users, given by (5), where $a_u$ is the non-negative weight of the u-th user. To make the MSE calculation meaningful, $\alpha = \mathrm{diag}(\alpha_1 I_{N_r} \cdots \alpha_{N_u} I_{N_r}) \in \mathbb{R}_{+}^{N_U \times N_U}$ is considered, where $\alpha_u$ is a scalar factor which can be regarded as an automatic gain control (AGC) gain in the receiver [36][37][38][39]. In Sect. 3.4, the traditional receive filters and the AGC scalar factor are stated for the proposed system.
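The signal model in (3) and (4) can be illustrated numerically. The following is a minimal sketch, not the paper's evaluation code: it assumes unit user weights, unit-power i.i.d. symbols (σ_x² = 1), white zero-mean noise, and the common error definition αGy − x, for which the sum MSE equals ‖αGHW − I‖²_F + σ_n²‖αG‖²_F. The function names `sum_mse` and `monte_carlo_mse` are hypothetical.

```python
import numpy as np

def sum_mse(H, W, G, alpha, sigma_n2):
    """Closed-form sum MSE of alpha*G*y - x for y = H W x + n (assumed form,
    unit user weights, sigma_x^2 = 1, white noise of variance sigma_n2)."""
    A = alpha @ G                                  # AGC gain times receive filter
    E = A @ H @ W - np.eye(H.shape[0])             # residual of the signal path
    return np.linalg.norm(E, "fro") ** 2 + sigma_n2 * np.linalg.norm(A, "fro") ** 2

def monte_carlo_mse(H, W, G, alpha, sigma_n2, trials=20000, seed=0):
    """Empirical estimate of the same quantity by simulating (3) and (4)."""
    rng = np.random.default_rng(seed)
    n_U = H.shape[0]

    def cgauss():
        # circularly symmetric complex Gaussian samples with unit variance
        return (rng.standard_normal((n_U, trials))
                + 1j * rng.standard_normal((n_U, trials))) / np.sqrt(2)

    x = cgauss()                                   # data symbols
    n = np.sqrt(sigma_n2) * cgauss()               # receiver noise
    y = H @ W @ x + n                              # aggregated received signal
    x_hat = alpha @ G @ y                          # scaled detected symbols
    return np.mean(np.sum(np.abs(x_hat - x) ** 2, axis=0))

# Example: 3 single-antenna users, 3 BSs with 2 antennas each, channel inversion
rng = np.random.default_rng(1)
H = (rng.standard_normal((3, 6)) + 1j * rng.standard_normal((3, 6))) / np.sqrt(2)
W = np.linalg.pinv(H)                              # placeholder precoder, H W = I
I3 = np.eye(3)
closed_form = sum_mse(H, W, I3, I3, 0.1)
simulated = monte_carlo_mse(H, W, I3, I3, 0.1)
```

With the channel-inverting placeholder precoder the signal term vanishes, so only the noise term remains and the two estimates should agree closely.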

Design of a novel JT-CoMP transmission scheme with sparse feedback and constrained backhaul
This section aims to design a novel scheme for downlink transmission in a centralized JT-CoMP system. The goal of the design is to minimize the MSE in (5). As stated before, the drawback of JT-CoMP is its feedback and backhaul loads. It is therefore desirable to design a transmission scheme in which the CSI requirement and backhaul load are constrained. Towards this aim, we define a binary link selection matrix $S \in \{0,1\}^{N_B \times N_U}$, whose $(i,j)$-th element is 1 when the link between the i-th transmit antenna and the j-th receive antenna is active and 0 when the link is idle. The column-wise sub-matrix of the link selection matrix related to the b-th BS is denoted by $S_b$. The sparse precoding matrix is defined as $\hat{W} = W \circ S$, where $\circ$ denotes the element-wise product, and the backhaul load reduction $r_{bl}$ is proportional to the cardinality of the set of zero elements of $\hat{W}$, i.e. it is the fraction of the $N_B N_U$ elements of $\hat{W}$ that are zero.

Similarly, the feedback load reduction is proportional to the number of zeros in the sparse aggregated channel matrix $\hat{H}$. We assume the equivalent feedback of $\hat{W}$ as $\hat{H} = H \circ S^T$, which results in a feedback load reduction $r_{fl}$ equal to the fraction of zero elements of $\hat{H}$. Note that in our proposed scheme, the selection matrices for feedback and precoding are transposes of each other, so the feedback and backhaul load reduction ratios are the same, i.e. $r_{fl} = r_{bl}$. The quantity nnzc($S_b$) gives the number of users that are served by the b-th BS. The load of the BSs may be balanced by bounding nnzc($S_b$) for every BS [40].

In the following, a general optimization problem to find W and S is set up and solved. The transmission scheme is briefly shown in Fig. 3. At the beginning of the transmission, $S = 1_{N_B \times N_U}$, i.e. we start with the non-sparse case. Substituting (3) and (4) in (5) and considering $\sigma_x^2 = 1$ and $E\{n\} = 0$, the MSE in (5) can be computed in closed form, where $\sigma_n^2$ is the noise variance. The detailed steps of the MSE computation are described in "Appendix 1". The goal is to minimize the MSE provided that the total transmission power is constrained to $P_t$, the feedback and backhaul loads are constrained, and the load is balanced between BSs. Thus the optimization problem is set up as (11), subject to the total power constraint C(1) and the constraints C(2) and C(3) on the feedback and backhaul loads and on the load balance.

Remark 1 Note that S and W are calculated at the CCN, which is not necessarily aware of the receiver and its filter; thus the CCN can only minimize the part M1 in (11).
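The role of the link selection matrix can be made concrete with a small sketch. It assumes that sparsification is an element-wise mask (Ŵ = W ∘ S and Ĥ = H ∘ Sᵀ), that the load reduction ratio is the fraction of idle links, and that the rows of S belonging to BS b are its N_t antennas; the helper names are hypothetical and the matrix values are made up for illustration.

```python
import numpy as np

def sparsify_precoder(W, S):
    """Zero the precoding weights of idle links (element-wise mask, assumed)."""
    return W * S

def sparsify_feedback(H, S):
    """Zero the CSI entries that are not fed back; H is N_U x N_B, S is N_B x N_U."""
    return H * S.T

def load_reduction(S):
    """Fraction of idle links, used here for both feedback and backhaul."""
    return 1.0 - np.count_nonzero(S) / S.size

def users_per_bs(S, n_t, n_r):
    """Number of users with at least one active link to each BS."""
    n_B, n_U = S.shape
    counts = []
    for b in range(n_B // n_t):
        S_b = S[b * n_t:(b + 1) * n_t, :]                  # antennas of BS b
        served = S_b.reshape(n_t, n_U // n_r, n_r).any(axis=(0, 2))
        counts.append(int(served.sum()))
    return counts

# Example: 3 BSs with 2 antennas each, 3 single-antenna users
S = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 0],
              [0, 0, 0],
              [0, 0, 1],
              [1, 0, 1]])
W = np.ones((6, 3), dtype=complex)
W_hat = sparsify_precoder(W, S)
r = load_reduction(S)            # 11 of the 18 links are idle
load = users_per_bs(S, n_t=2, n_r=1)
```

A load balancing constraint of the kind described above would then simply bound every entry of `load`.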
Remark 2 Users estimate the channel H and find S from the received data, and feed back a composite and sparse version of the CSI, which also contains the information about the receive filter, i.e. $H_f = (\alpha G H) \circ S^T$ is sent back and aggregated in the CCN. The CCN receives the sparse composite CSI and estimates the full composite CSI as in (15), where the auxiliary channel matrix may be an old version of $H_f$ or long term channel statistics (e.g. the received signal strength indicator (RSSI)).

Remark 3 As seen, problem (11) is too complicated to be solved directly, especially because of its Boolean parameters. Thus, we use a sub-optimum solution by converting it to a two-layer optimization procedure. In the inner layer, W is calculated considering a fixed S and subject to constraint C(1). In the outer layer, S is found subject to constraints C(2) and C(3).

In the following, we first explain the two-layer optimization scheme, and then discuss considerations about traditional receive filters.

Inner layer optimization: precoder design
In this section, we aim to design a robust precoder by minimizing the part M1 in (11) for different types of receive filter, such that it also performs well in a receiver with no filter. We design the precoder weights based on the composite channel gains, H. If $\mathcal{W} = \{\hat{W} \mid \mathrm{Tr}(\hat{W}^H \hat{W}) \le P_t\}$ is the set of all possible weights satisfying the total power constraint C(1), then by considering fixed S and a = I, the problem (11) can be written as (17). Applying the triangle inequality, an upper bound for the objective function is acquired as (18). Now, instead of optimizing the objective function in (17), we optimize its upper bound. First, we consider the section F1 of (18) and adapt Theorem 2 from [41] to minimize it, which gives problem (19).

Theorem 1 Let $\mathcal{W}$ denote a nonempty convex set. Then $U_W = V_H$ and $V_W = U_H$ are optimal for the problem (19), where $U_H$, $V_H$ are unitary matrices from the SVD of H and $U_W$, $V_W$ are obtained from the SVD of W.
The proof of Theorem 1 is similar to the proof of Theorem 2 in [41]. However, in [41], the scalar factor in MSE computation is omitted and the Theorem is proved for perfect and statistical CSI.
Consider the SVD of the composite channel as $H = U_H \Sigma_H V_H^H$, where $\Sigma_H$ contains the singular values of the composite channel in decreasing order and the unitary matrices $U_H \in \mathbb{C}^{N_U \times N_U}$ and $V_H \in \mathbb{C}^{N_B \times N_B}$ are rotation matrices. Similarly, the SVD of the precoder is $W = U_W \Sigma_W V_W^H$, where $\Sigma_W$ contains the singular values of the weights. Based on Theorem 1, the left and right singular vectors of the optimal precoder are equal to the right and left singular vectors of the composite channel, which simplifies the norm in problem (19). Since the Frobenius norm is invariant with respect to unitary transformations [42], the unitary factors can be dropped, and the section F1 is simplified to the problem (22), which involves the singular values only.

Second, we minimize the section F2 of (18). In this regard, an auxiliary vector, constructed by stacking the columns of $V_H \cdot U_H^{*}$ one below the other, is defined, and the elements of W corresponding to idle links (zero elements of S) are identified. Using these definitions, the section F2 of (18) can be rewritten. To evaluate the power constraint, $\mathrm{Tr}(\hat{W}^H \hat{W})$ is computed, and from (23) and (26) the total power constraint C(1) can be rewritten. Finally, by adding (25) to problem (22), the sparse precoding design problem is obtained, subject to the rewritten power constraint.

To solve the above problem, the sequential quadratic programming (SQP) optimization method [43,44] can be used. In this method, the search direction d is obtained by solving the quadratic sub-problem

$$\min_{d} \; \nabla f(W)^T d + \tfrac{1}{2} d^T \phi \, d \quad \text{subject to:} \quad \nabla g(W)^T d + g(W) \le 0,$$

where $f(W)$ and $g(W)$ are the objective and constraint functions and $\phi$ is a symmetric positive definite matrix. From a starting point $W_k$, the new point is generated as $W_{k+1} = W_k + \varepsilon d$, where $\varepsilon$ is a non-negative scalar step size.
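The singular-vector alignment of Theorem 1 can be checked numerically. The sketch below builds a precoder with U_W = V_H and V_W = U_H and verifies that ‖HW − I‖_F then depends on the singular values only. The singular values of the weights are a placeholder (scaled channel inversion), not the SQP solution of the paper, and `aligned_precoder` is a hypothetical helper that assumes N_U ≤ N_B.

```python
import numpy as np

def aligned_precoder(H, w_sv):
    """Build W = V_H Sigma_W U_H^H from the SVD H = U_H Sigma_H V_H^H.

    w_sv holds the (real, non-negative) singular values chosen for W.
    """
    U, s, Vh = np.linalg.svd(H)                 # full SVD of the channel
    n_U, n_B = H.shape
    Sigma_W = np.zeros((n_B, n_U))
    Sigma_W[:n_U, :n_U] = np.diag(w_sv)
    W = Vh.conj().T @ Sigma_W @ U.conj().T
    return W, s

rng = np.random.default_rng(0)
H = (rng.standard_normal((3, 6)) + 1j * rng.standard_normal((3, 6))) / np.sqrt(2)
P_t = 1.0

s = np.linalg.svd(H, compute_uv=False)
w_sv = np.sqrt(P_t / np.sum(1.0 / s ** 2)) / s   # placeholder: inversion at power P_t
W, _ = aligned_precoder(H, w_sv)

# With aligned singular vectors, H W = U_H diag(s * w_sv) U_H^H, so the
# Frobenius norm of H W - I reduces to a norm over the singular values.
lhs = np.linalg.norm(H @ W - np.eye(3), "fro")
rhs = np.linalg.norm(s * w_sv - 1.0)
power = np.trace(W.conj().T @ W).real            # equals the sum of w_sv squared
```

In the paper's scheme the entries of `w_sv` would instead be the optimization variables of the inner layer.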

Outer layer optimization: selection matrix design
Finding the optimal solution of the outer layer of the problem (11) might be computationally too cumbersome, as it involves Boolean constraints. One naive way to find S is to search exhaustively among all possible combinations of the selection matrix for the one that gives the best MSE. Although exhaustive search might be the only mechanism for a truly optimum selection of the users to be served by each BS under the load balancing constraint, its computational complexity grows quickly and it becomes impractical. An alternative is to use convex optimization techniques. Since the elements of S are binary variables, the optimization problem is NP-hard. To solve it, it is possible to use the concept of linear programming relaxation [45], where the constraint that each element must be a binary variable is relaxed to the weaker constraint that each element is a real number in the interval [0, 1]. The performance of convex optimization with relaxation is very close to the optimal one based on exhaustive search. Although its complexity is not as prohibitive as that of the exhaustive search, it is still high, being approximately of order $O(n^3)$, where $n = N_B N_U$ [46].
To deal with this issue, a Greedy algorithm that attempts to approximate the optimal solution can be implemented. Greedy algorithms have been widely applied in wireless communication, particularly in scheduling for CoMP systems, where the objective is to select the set of users that maximizes a certain metric function [47]. The Greedy algorithm, which is guaranteed to converge, has two main advantages. First, it allows a considerable reduction in complexity, requiring roughly $O(n^2)$ operations. Second, it can be applied to a wide range of metrics of interest [48]. We use the following Greedy algorithm to find a local optimum for our problem subject to C(2) and C(3), where the metric function $f(H, S)$ is defined in (30).

The principle of the proposed Greedy algorithm is as follows. We start with an initial selection matrix obtained by randomly selecting $Q = N_B N_U (1 - r_{fl})$ elements of S to be one and the remaining elements to be zero, provided that the conditions C(2) and C(3) are fulfilled. Then, we select the first zero-valued element and look for a one-valued element whose exchange with the selected zero-valued element reduces the metric function. If such elements exist, S is updated by exchanging the zero-valued element with the one-valued element that gives the largest reduction in the metric function. This process is repeated for the other zero-valued elements, and therefore the selection matrix is designed in a way that the MSE is minimized. The algorithm is summarized in Algorithm 1.
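The swap-based search described above can be sketched as follows. This is an illustrative reading of the textual description, not a transcription of Algorithm 1: the metric is passed in as a callable, the constraints C(2) and C(3) are abstracted into a hypothetical `feasible` callback, and the toy metric in the example (negative sum of selected link gains) merely stands in for the MSE-based metric of the paper.

```python
import numpy as np

def greedy_selection(S0, metric, feasible=lambda S: True, max_passes=10):
    """Swap idle and active links while the metric decreases."""
    S = S0.copy()
    best = metric(S)
    for _ in range(max_passes):
        improved = False
        for z in zip(*np.nonzero(S == 0)):            # idle links at pass start
            best_swap, best_val = None, best
            for o in zip(*np.nonzero(S == 1)):        # currently active links
                S[z], S[o] = 1, 0                     # tentative exchange
                if feasible(S):
                    val = metric(S)
                    if val < best_val:
                        best_swap, best_val = o, val
                S[z], S[o] = 0, 1                     # undo the exchange
            if best_swap is not None:                 # keep the best exchange
                S[z], S[best_swap] = 1, 0
                best, improved = best_val, True
        if not improved:
            break
    return S, best

# Toy example: 4 transmit antennas, 3 receive antennas, Q = 5 active links
gains = np.arange(1, 13, dtype=float).reshape(4, 3)   # made-up link gains
S0 = np.zeros((4, 3), dtype=int)
S0.flat[:5] = 1                                       # arbitrary initial selection
S_opt, value = greedy_selection(S0, lambda S: -float(np.sum(gains * S)))
```

Because every exchange keeps the number of active links fixed at Q, the load reduction ratio of the initial matrix is preserved throughout the search.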
The computational complexity of the pseudo code in Algorithm 1 is computed in "Appendix 2". The overall complexity of Algorithm 1 is $O(C N_B^2 N_U^3)$, where C is a constant. Thus, the complexity is a function of the number of coordinated BSs, the number of users and the number of transmit and receive antennas. It increases with the square of the number of BSs and transmit antennas and with the third power of the number of users and receive antennas.

Limited feedback using CSI codebook
For a further reduction of the feedback load, we consider a CSI codebook based limited feedback strategy, where each user selects a codeword from a pre-designed CSI codebook and feeds back its index to the serving BS [7,49]. The CCN collects all the codeword indexes sent from different BSs and calculates the precoding matrix. Different CSI codebooks are employed by the users, so they can independently quantize their per-BS channel direction information (CDI). The per-BS CDI for the u-th user can be expressed as $\bar{h}_{u,b} = h_{u,b} / \|h_{u,b}\|$, where $h_{u,b} \in \mathbb{C}^{N_r \times N_t}$ is the CSI of the links spanning from the b-th BS to the u-th user, i.e. $h_{u,b}(i,j) = H_u(i,j)$, $1 \le i \le N_r$, $(b-1)N_t + 1 \le j \le bN_t$. Random vector quantization (RVQ) is considered for quantizing the per-BS CDIs, where the quantized version of the CDI is given by [49] as

$$\hat{h}_{u,b} = \arg\max_{c_n \in \mathcal{C}_{u,b}} \left| \bar{h}_{u,b} c_n^H \right|,$$

where $\mathcal{C}_{u,b}$ is the CSI codebook used by the u-th user to quantize the CDI of the b-th BS, which consists of $2^{B_{CDI}}$ codewords. The codeword $c_n \in \mathbb{C}^{N_r \times N_t}$ is a random vector with unit norm, and $B_{CDI}$ denotes the number of bits used for quantizing the CDI.
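A minimal sketch of RVQ-based CDI feedback for the single receive antenna case (N_r = 1), where each per-BS channel is a vector of length N_t. The codebook construction from normalized complex Gaussian vectors is the usual RVQ choice and is assumed here; `rvq_codebook` and `quantize_cdi` are hypothetical helper names.

```python
import numpy as np

def rvq_codebook(n_bits, dim, rng):
    """Draw 2**n_bits random unit-norm complex codewords of length dim."""
    C = (rng.standard_normal((2 ** n_bits, dim))
         + 1j * rng.standard_normal((2 ** n_bits, dim)))
    return C / np.linalg.norm(C, axis=1, keepdims=True)

def quantize_cdi(h, codebook):
    """Return the index and codeword maximizing |h_bar c^H|, plus that gain."""
    h_bar = h / np.linalg.norm(h)                 # channel direction information
    gains = np.abs(codebook.conj() @ h_bar)       # |h_bar c_n^H| for every codeword
    idx = int(np.argmax(gains))
    return idx, codebook[idx], float(gains[idx])

rng = np.random.default_rng(0)
codebook = rvq_codebook(n_bits=4, dim=2, rng=rng)  # 16 codewords for N_t = 2

# A channel that is a scaled, phase-rotated copy of codeword 5 is quantized
# to that codeword; only the index would be fed back to the BS.
h = 3j * codebook[5]
idx, c_hat, gain = quantize_cdi(h, codebook)
```

Note that the selection is insensitive to a common phase rotation of the channel.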
We assume that the per-BS channel norm, $\|h_{u,b}\|$, is perfectly known at the CCN and is not included in the feedback information. The knowledge of these scales can be obtained at each BS by averaging the per-BS channels [50]. After aggregating all the CDI indexes in the CCN, the CSI is reconstructed, and the global CDI quantization error $\varepsilon_u$ is computed from the normalized per-BS channel norm and the normalized per-BS quantization gain $\mu_{u,b} = |\bar{h}_{u,b} \hat{h}_{u,b}^H|$. The angle between the per-BS CDI and its codeword is denoted as $\varphi_{u,b}$, i.e. $e^{j\varphi_{u,b}} = \bar{h}_{u,b} \hat{h}_{u,b}^H / |\bar{h}_{u,b} \hat{h}_{u,b}^H|$, which is referred to as the phase ambiguity (PA). As expected, in the perfectly quantized CDI condition, $\varepsilon_u = 0$.
In contrast to a single point transmission system, where the PA does not affect the CDI quantization performance, in coherent transmission, the PA affects the co-phasing of the system and degrades the feedback scheme performance. This is owing to the fact that the codeword selection in (35) only maximizes the magnitude and ignores its phase. The performance degradation due to the PA is more severe for cell-edge users [51].
The PA is uniformly distributed in [−π, π], and employing a uniform quantizer for PA quantization is optimal [7]. In this regard, the PA can be fed back with the aid of a few bits by using a scalar uniform quantizer. By considering $B_{PA}$ bits to quantize the PA, the quantized PA is the nearest of $2^{B_{PA}}$ uniformly spaced phase levels.
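A scalar uniform phase quantizer of this kind can be sketched with the standard library alone. The placement of the levels at multiples of 2π/2^B is an assumption made for illustration (the paper's exact level set is not reproduced here), and `quantize_phase` is a hypothetical name.

```python
import math

def quantize_phase(phi, n_bits):
    """Map a phase to the nearest of 2**n_bits uniformly spaced levels.

    Returns the level index (what would be fed back) and the quantized
    phase, reported in the interval [-pi, pi).
    """
    levels = 2 ** n_bits
    step = 2 * math.pi / levels
    wrapped = (phi + math.pi) % (2 * math.pi) - math.pi   # wrap to [-pi, pi)
    idx = round(wrapped / step) % levels                  # nearest level, circular
    q = idx * step
    if q >= math.pi:                                      # report in [-pi, pi)
        q -= 2 * math.pi
    return idx, q

def circular_error(a, b):
    """Smallest absolute angular distance between two phases."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)

# With 3 bits the levels are spaced pi/4 apart, so the error is at most pi/8
idx_a, q_a = quantize_phase(0.3, 3)
idx_b, q_b = quantize_phase(-3.0, 3)
worst = max(circular_error(p / 100.0, quantize_phase(p / 100.0, 3)[1])
            for p in range(-314, 315))
```

Each additional bit halves the level spacing and therefore the worst-case co-phasing error.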

Receiver considerations
In this section, traditional receive filters are adopted for the transmission scheme. Receivers use stream specific pilots to estimate the effective channel of each user, which includes the precoding weights and feedback (similar to the implicit channel in the 3GPP standard). Common linear filters such as the matched filter (MF) or the minimum mean square error (MMSE) filter may be used in receivers to combat channel distortion and noise. In addition, aiming for a low-complexity receiver, we consider a receiver that uses no filter at all, i.e. the receive filter at the u-th user is $g_u^{No} = I_{N_r}$. In the MF, the filter is designed to maximize the signal portion of the received signal, and as the interference is not minimized, it is useful in low-noise conditions [36]. In our sparse system, the MF receive filter for the u-th user is obtained from the effective channel of that user.

The MMSE receive filter is designed to minimize the MSE and finds a good tradeoff between the signal portion and the interference [52]. To compute the MMSE filter in the sparse system, by setting $\alpha_u = 1$, $u = 1, \ldots, N_u$ in (5), the MSE for the u-th user can be expressed as $E\{\|g_u y_u - x_u\|^2\}$. To minimize the MSE, we take the complex conjugate derivative [53] with respect to $g_u^H$ and set it to zero, which gives

$$g_u E\{y_u y_u^H\} - E\{x_u y_u^H\} = 0, \quad (41)$$

where the expectation in the first term is the variance of the signal received by the u-th user and the expectation in the second term is the cross-correlation of the data symbol with the received signal. Noting (1), these expectations are computed in (42), where $R_{x_u} = \sigma_x^2 I_{N_r}$ and $R_{n_u} = \sigma_n^2 I_{N_r}$ are the variances of the data symbols and noise for the u-th user. Setting the derivative (41) to zero gives the MMSE receive filter in the sparse system. Note that the received signal variance according to (42) has three parts: desired signal, interference and noise. The desired part can be computed from the effective channel directly, while the interference needs explicit channel estimation and some information about the precoding weights. However, it is possible to compute the received signal variance directly from the two-dimensional (frequency-time) received signal vector (similar to reference signals in LTE Release 14 [54, Section 6.10]).
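The two receive filters can be sketched from the received signal model in (1). The code below assumes the textbook forms: the MF as the Hermitian of the effective channel H_u W_u, and the MMSE filter as the cross-correlation of x_u and y_u times the inverse covariance of y_u. The function names and the toy matrices are hypothetical.

```python
import numpy as np

def mf_filter(H_u, W_u):
    """Matched filter: Hermitian of the effective channel H_u W_u (assumed form)."""
    return (H_u @ W_u).conj().T

def mmse_filter(H_u, W, u, n_r, sigma_x2=1.0, sigma_n2=0.1):
    """MMSE filter g_u = E{x_u y_u^H} (E{y_u y_u^H})^{-1} for y_u = H_u W x + n_u."""
    W_u = W[:, u * n_r:(u + 1) * n_r]              # precoder columns of user u
    HW = H_u @ W                                   # desired signal plus interference
    R_y = sigma_x2 * HW @ HW.conj().T + sigma_n2 * np.eye(H_u.shape[0])
    R_xy = sigma_x2 * (H_u @ W_u).conj().T
    return R_xy @ np.linalg.inv(R_y)

# Toy single-user example with n_r = 2 and an effective channel diag(2, 1)
H_u = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0]])
W = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
g_mf = mf_filter(H_u, W)
g_mmse = mmse_filter(H_u, W, u=0, n_r=2, sigma_n2=1e-9)
equalized = g_mmse @ H_u @ W                       # close to identity at low noise
```

At vanishing noise and without interference the MMSE filter inverts the effective channel, while the MF only rotates it.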
Finally, the scalar factor α in (5) is considered as the AGC gain and is used to adapt the input signal to the dynamic range of the analog to digital converter (ADC) [36]. Similar to the MMSE receive filter computation, the optimum value of the scalar factor can be calculated by minimizing the MSE.

Two-layer optimization with inner ZF precoder
In this section, we adopt the conventional ZF precoder as the inner precoder for the proposed two-layer optimization scheme. The ZF precoder is designed to remove the interference completely and performs well at high SNRs [36]. Using the composite channel in (15), the ZF precoder is designed as in (46), where $\beta_{ZF}$ is used for power control and, under the sum-power constraint $P_t$, is computed as in (47). In the outer layer of the proposed two-layer optimization, we compute the metric function for the Greedy algorithm based on the inner ZF precoder and design the selection matrix in a way that the metric function is minimized. By substituting the closed form precoding matrix from (46) in (11), a new metric function (48) for the Greedy algorithm is obtained. The remaining steps of the Greedy algorithm are the same as in Algorithm 1, except that the metric function is replaced with (48) and steps 2 and 3 are removed.
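For illustration, the conventional ZF precoder with sum-power scaling can be written as below. The right pseudo-inverse form W = β Hᴴ(HHᴴ)⁻¹ and the scaling β = sqrt(P_t / Tr(W₀ᴴW₀)) are the textbook expressions and are assumed to correspond to (46) and (47); a random matrix stands in for the composite channel.

```python
import numpy as np

def zf_precoder(H, P_t):
    """Zero-forcing precoder scaled so that Tr(W^H W) = P_t (textbook form)."""
    W0 = H.conj().T @ np.linalg.inv(H @ H.conj().T)     # right pseudo-inverse
    beta = np.sqrt(P_t / np.trace(W0.conj().T @ W0).real)
    return beta * W0, beta

rng = np.random.default_rng(0)
H = (rng.standard_normal((3, 6)) + 1j * rng.standard_normal((3, 6))) / np.sqrt(2)
W_zf, beta_zf = zf_precoder(H, P_t=2.0)

effective = H @ W_zf                                    # beta_zf times identity
tx_power = np.trace(W_zf.conj().T @ W_zf).real
```

The effective channel is a scaled identity, so all inter-user interference is removed and the receiver only has to undo the gain β.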

Two-layer optimization with inner Wiener precoder
Similar to the previous section, we adopt the conventional Wiener precoder as the inner precoder for the proposed two-layer optimization scheme. The Wiener precoder minimizes the interference and maximizes the signal to interference and noise ratio (SINR). This precoder performs better than ZF, especially at low SNRs [36]. The Wiener precoder is derived as in (50), where $\beta_{WF}$ controls the transmitter power and, for the sum-power $P_t$, is computed as in (51). The metric function (52) for the Greedy algorithm in the outer layer is computed by substituting the closed form precoding matrix (50) in (11). The Greedy algorithm for the outer layer is obtained from Algorithm 1 by replacing the metric function with (52) and removing steps 2 and 3. Note that computing the Wiener precoding matrix in (50) requires the receive filter to be known in the CCN, which contradicts designing the precoder using only the composite channel. Therefore, the Wiener precoder cannot be used directly as the inner precoder in the proposed transmission system, and here we consider it only for comparison purposes.
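A hedged sketch of a Wiener (regularized inversion) precoder follows. It uses the common transmit Wiener filter form W = β Hᴴ(HHᴴ + ξI)⁻¹ with ξ = N_U σ_n² / P_t for unit-power symbols; this textbook form is an assumption and may differ in detail from (50) and (51), for example in how the receive filter enters the composite channel.

```python
import numpy as np

def wiener_precoder(H, P_t, sigma_n2):
    """Regularized channel inversion scaled to the sum power P_t (assumed form)."""
    n_U = H.shape[0]
    xi = n_U * sigma_n2 / P_t                           # regularization factor
    W0 = H.conj().T @ np.linalg.inv(H @ H.conj().T + xi * np.eye(n_U))
    beta = np.sqrt(P_t / np.trace(W0.conj().T @ W0).real)
    return beta * W0, beta

rng = np.random.default_rng(0)
H = (rng.standard_normal((3, 6)) + 1j * rng.standard_normal((3, 6))) / np.sqrt(2)

W_wf, beta_wf = wiener_precoder(H, P_t=2.0, sigma_n2=0.5)
tx_power = np.trace(W_wf.conj().T @ W_wf).real

# Without noise the regularization vanishes and the precoder reduces to ZF
W_0, beta_0 = wiener_precoder(H, P_t=2.0, sigma_n2=0.0)
effective_0 = H @ W_0
```

The regularization trades residual interference against noise enhancement, which is why this precoder is expected to help most at low SNR.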

Selective feedback precoder
In the selective feedback technique [26], users with weak links are prevented from feeding back their CSI to the CCN and each user feeds back at least its strongest CSI. By exploiting a binary feedback index matrix, the coefficients of the channel matrix whose CSI is below a specified threshold are replaced with zeros. This technique can be categorized as an absolute thresholding approach for feedback load reduction.
To overcome the backhaul overhead of routing users' data to several BSs, two schemes are proposed in [26]: one based on MAC layer scheduling and the other based on physical layer precoding. In this paper, we consider the latter, in which the channel matrix is vectorized, its zero elements are eliminated, and the precoder is designed using the ZF approach.
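The absolute thresholding rule described above can be sketched as follows. The link gains, the threshold of -110 dB, and the function names are hypothetical values chosen for the example; only the rule itself (drop links below the threshold, but always keep each user's strongest link) comes from the text.

```python
def selective_feedback_mask(gain_db, threshold_db):
    """Binary feedback index matrix: 1 = link is fed back, 0 = link is dropped.

    gain_db[u][b] is the link gain in dB from BS b to user u.
    """
    mask = []
    for row in gain_db:
        keep = [1 if g >= threshold_db else 0 for g in row]
        # Each user always reports at least its strongest link.
        keep[row.index(max(row))] = 1
        mask.append(keep)
    return mask

def feedback_load_reduction(mask):
    """Fraction of links that are not fed back."""
    total_links = sum(len(row) for row in mask)
    return 1 - sum(sum(row) for row in mask) / total_links

# Hypothetical link gains (dB) for 3 users and 3 BSs.
gains = [[-98.0, -112.0, -125.0],
         [-121.0, -103.0, -109.0],
         [-130.0, -126.0, -124.0]]

mask = selective_feedback_mask(gains, threshold_db=-110.0)
reduction = feedback_load_reduction(mask)   # 5 of 9 links dropped
```

The third user has no link above the threshold and still reports its strongest one. Because the mask depends on the channel realization, a fixed threshold controls the load only on average.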

SSOCP based relative thresholding precoder
In relative thresholding, users feed back only the CSI of links whose channel value lies within a threshold relative to the strongest BS. In [27], an SSOCP based precoder for maximizing the weighted sum-rate is proposed, in which the long-term channel statistics are used to model the statistical interference for the unknown CSI. The precoder design problem is formulated under a per-antenna power constraint, where $\gamma_u$ is the SINR of the $u$-th user and $P_a$ is the per-antenna power limit. For comparison with the sum-power constraint $P_t$ in (12), we consider $P_t = N_B P_a$.
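The relative thresholding rule differs from the absolute one only in its reference level, as the following sketch shows. The gains, the 10 dB window, and the name `relative_threshold_mask` are hypothetical; the SSOCP precoder design of [27] itself is not reproduced here.

```python
def relative_threshold_mask(gain_db, window_db):
    """Keep links whose gain is within window_db of the user's strongest link."""
    mask = []
    for row in gain_db:
        strongest = max(row)
        mask.append([1 if strongest - g <= window_db else 0 for g in row])
    return mask

# Hypothetical link gains (dB) for 3 users and 3 BSs.
gains = [[-98.0, -112.0, -125.0],
         [-121.0, -103.0, -109.0],
         [-130.0, -126.0, -124.0]]

mask_10db = relative_threshold_mask(gains, window_db=10.0)
mask_0db = relative_threshold_mask(gains, window_db=0.0)   # strongest link only
```

A user far from all BSs (third row) reports every link, because all of its links are similarly weak.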

Results and discussion
The numerical evaluation program is developed in MATLAB, based on the 3GPP time-coherent channel model [55]. The performance of the proposed scheme is evaluated by Monte-Carlo simulation. The inner optimization is performed using the SQP method, and the outer layer is based on the Greedy algorithm. The simulation parameters are summarized in Table 2.

Channel model
Consider a JT-CoMP scenario in which $N_u = 3$ single-antenna users at the cluster center are served by $N_b = 3$ cooperating BSs, each with $N_t = 1$ or $2$ antennas. The cell radius is $R = 500$ m and the cell-edge SNR is variable. As in the example of Fig. 4, users are dropped uniformly at the cluster center, along an ellipse with semi-major and semi-minor axes of length $R/16$ and $(h/2)/16$, respectively, where $h = \frac{\sqrt{3}}{2} R$ is the height of the hexagon of the cluster area. We consider the 3GPP channel model [55,56]. The fading channel model includes the path-loss component $\gamma_{PL} = 128.1 + 37.6 \log_{10}(R)$ dB ($R$ in km), the shadow fading component $\gamma_{SF} \sim \mathcal{N}(0, 8\ \text{dB})$, and a Rayleigh fast fading component $\Gamma$, which is simulated as a circularly symmetric complex Gaussian random variable $\mathcal{CN}(0, 1)$. In the i.i.d. channel between the BSs and the users, $G = 1$ is the gain of the antennas at the BSs and $C \in \mathbb{R}^{N_T \times N_T}$ is the correlation matrix of the antennas at the BSs, with correlation $\rho = 0.5$ between all antenna pairs. We consider a time-coherent channel model, in which the CSI varies only because of user movement and the channel coefficient of the new CSI follows Clarke's model [57]. The channel evolves in time according to (54), where $\rho = J_0(2\pi f_d \triangle t)$ is the channel correlation coefficient. Here, $J_0(\cdot)$ is the zero-order Bessel function, $f_d = v f_c / c$ is the Doppler frequency for a user moving at velocity $v$, $f_c = 2$ GHz is the carrier frequency, $c$ is the velocity of propagation, and $\triangle t$ is the elapsed time. The value of $\triangle t$ is set to 1 ms, the FDD uplink/downlink frame duration.
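To give a feel for the size of the temporal correlation, the sketch below evaluates $\rho = J_0(2\pi f_d \triangle t)$ for the pedestrian speed used later in the simulations ($v = 5$ km/h). Two points are assumptions: the propagation velocity is taken as $c = 3 \times 10^8$ m/s, and the update in `evolve` is the common first-order Gauss-Markov form, which may differ from the paper's exact time-evolution equation. All function names are hypothetical.

```python
import math
import random

def bessel_j0(x, terms=30):
    """Zero-order Bessel function of the first kind via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -((x / 2) ** 2) / ((k + 1) ** 2)
    return total

def clarke_correlation(speed_kmh, carrier_hz, dt_s, c=3.0e8):
    """Return (rho, f_d) for Clarke's model; c = 3e8 m/s is an assumed value."""
    f_d = (speed_kmh / 3.6) * carrier_hz / c   # Doppler frequency in Hz
    return bessel_j0(2 * math.pi * f_d * dt_s), f_d

def evolve(h_prev, rho, rng):
    """Assumed first-order Gauss-Markov update with a CN(0, 1) innovation."""
    innovation = complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
    return rho * h_prev + math.sqrt(1 - rho ** 2) * innovation

rho, f_d = clarke_correlation(speed_kmh=5.0, carrier_hz=2.0e9, dt_s=1.0e-3)

rng = random.Random(0)
h_old = complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
h_new = evolve(h_old, rho, rng)
```

With these values the Doppler frequency is about 9.3 Hz and the correlation is about 0.999, so the channel changes only slightly over one 1 ms frame.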
The receiver noise power is $N_0 = k_B T_0 B_n$ W, where $k_B = 1.38 \times 10^{-23}$ J/K is Boltzmann's constant, $T_0 = 290$ K is the operating temperature, and $B_n = 10$ MHz is the system bandwidth. The number of channel realizations is $10^3$, and the maximum BS power at a cell-edge SNR of 10 dB is 22.8 dBm (0.19 W).
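The quoted transmit power can be checked with a short link-budget calculation. The sketch assumes that the cell-edge SNR is defined from the path loss alone at the cell radius $R = 0.5$ km, with unit antenna gain and no shadowing; under that assumption the numbers in the text are reproduced. Variable names are illustrative.

```python
import math

K_B = 1.38e-23   # Boltzmann's constant in J/K, as quoted in the text
T0 = 290.0       # operating temperature in K
B_N = 10e6       # system bandwidth in Hz

noise_w = K_B * T0 * B_N
noise_dbm = 10 * math.log10(noise_w / 1e-3)

def path_loss_db(distance_km):
    """3GPP path loss used in the text, distance in km."""
    return 128.1 + 37.6 * math.log10(distance_km)

edge_snr_db = 10.0
cell_radius_km = 0.5

# Assumed definition: SNR at the cell edge = Tx power - path loss - noise power.
tx_dbm = edge_snr_db + noise_dbm + path_loss_db(cell_radius_km)
tx_w = 10 ** (tx_dbm / 10) / 1000
```

The noise power is about -104 dBm and the path loss at the cell edge is about 116.8 dB, which gives roughly 22.8 dBm, or 0.19 W.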

Simulation results
This section presents the numerical evaluation of the designed JT-CoMP scheme. The general form of the network structure is depicted in Fig. 1. The time-invariant and time-variant channel models follow (54) and (55). To evaluate the proposed scheme comprehensively, we consider three stages. In the first stage, the proposed precoder is compared with the adopted ZF and Wiener precoders in Figs. 5 and 6, and with the selective feedback [26] and SSOCP based relative thresholding [27] precoders in Fig. 9. In the second stage, the performance of the proposed scheme is analyzed in Figs. 10, 11, and 12, which investigate the effect of load reduction, the probability distribution of the MSE, and the time convergence of the algorithm. In the third stage, the effect of limited feedback on the proposed scheme is analyzed in Figs. 14 and 15.

Comparison to other schemes
In Fig. 5, the performance of the adopted ZF precoder, the adopted Wiener precoder, and the proposed scheme is compared over a wide range of edge SNRs for a receiver without a receive filter. Note that, throughout the simulations, the SNR is defined before any receive filter. The system is in [...] (55) [...] The proposed scheme is not sensitive to the type of receiver, while the performance of the adopted ZF and Wiener precoders depends on the receive filter. Therefore, an advantage of the proposed scheme is that it performs well with simplified receivers, such as receivers using no receive filter. In Fig. 6, the performance of the proposed scheme is compared with the adopted ZF and Wiener precoders in the sparse feedback (SFB) -constrained backhaul (CBH) configuration. We consider load reductions of $r_{fl} = r_{bl} = 0.33$, and the receiver has no receive filter. The channel is time-variant with $\triangle t = 1$ ms and $v = 5$ km/h. The achievable MSE is depicted for a wide range of edge SNRs. The results in this figure show that the proposed precoder outperforms the adopted ZF and Wiener precoders by at least 25% in terms of MSE. The superior performance of the proposed scheme also holds for the MF and MMSE receive filters, but because of space limitations, the simulation results for other common filters are not shown.
Substituting $\hat{W}_u$ for $W_u$ in (2) gives the SINR for the SFB-CBH configuration. In Fig. 7, the cumulative distribution function (CDF) of the SINR of the proposed scheme is compared with that of the adopted ZF and Wiener precoders at edge SNRs of 5 dB and 10 dB. The load reduction is $r_{fl} = r_{bl} = 0.11$ and the channel parameters are the same as in Fig. 6. The results in this figure show that, at an edge SNR of 5 dB, the proposed precoder outperforms the adopted ZF and Wiener precoders by 6.49 dB and 3.85 dB at the 80% point, respectively. At an edge SNR of 10 dB, the gains of the proposed precoder are 6.36 dB and 4.73 dB, respectively.
To evaluate the performance of the individual users in the proposed scheme, we define the MSE difference as $\triangle \text{MSE} = \text{MSE}_m - \text{MSE}_{\acute{m}}$, where user $m$ experiences the best MSE and user $\acute{m}$ the worst one in a given channel realization. The computation of the MSE at each user is described in "Appendix 1". The CDF of the MSE difference is shown in Fig. 8. Based on this result, the proposed scheme has less variance than the ZF precoder in the SFB-CBH configuration, although in this configuration its CDF of the MSE difference is only slightly better than that of the Wiener precoder. As expected, in the FFB-FBH configuration, the ZF precoder distributes an equal MSE to the users, hence the difference becomes zero.
Fig. 7 CDF comparison of the proposed precoder with adopted ZF and Wiener precoders in terms of the users' SINR. The channel is time-variant and $r_{fl} = r_{bl} = 0.11$, $N_t = 1$, $N_r = 1$, $\triangle t = 1$ ms and $v = 5$ km/h
In Fig. 9, the MSE of the proposed scheme is compared with that of the selective feedback [26] and SSOCP based relative thresholding [27] precoders in a time-variant channel with an edge SNR of 10 dB, $N_t = 1$, and the SFB-CBH configuration. In the selective feedback precoder, changing $r_{fl}$ from 0 to 60% requires changing the absolute threshold level from −100 to −120 dB, while in the SSOCP precoder, the relative threshold level must be changed from 0 to 11 dB. Note that in these precoders only the average loads can be controlled by adjusting the threshold value, whereas in the proposed precoder the loads can be controlled strictly. As seen, the proposed scheme outperforms the selective feedback and SSOCP precoders by at least 30% in terms of MSE. Based on numerical evaluations, we can also conclude that the proposed scheme has better MSE performance than the 3GPP Release 15 codebook type II precoder; these numerical comparisons are omitted here because of limited space. However, Releases 16 and 17 allow more advanced and effective CSI reporting.

Performance analysis of the proposed scheme
In Fig. 10, the performance of the proposed scheme is compared across various configurations with respect to feedback and backhaul load reductions. The receiver has no receive filter and the channel is time-variant. As expected, the proposed precoder performs best in the FFB-FBH configuration, and the system performance degrades as $r_{fl}$ and $r_{bl}$ increase. It is worth noting that, for $r_{fl} = r_{bl} = 0.11$, the MSE increases by 40%. To isolate the effect of backhaul load reduction, a sparse feedback-full backhaul (SFB-FBH) configuration is considered, where $r_{fl} = 0.11$ and $r_{bl} = 0$. As expected, the proposed precoder performs better in this configuration than in SFB-CBH with equal $r_{fl}$, and slightly worse than in the FFB-FBH configuration.
In Fig. 11, the CDF of the MSE of the proposed scheme is shown for different feedback and backhaul load reduction values in the SFB-CBH configuration. The edge SNR is 10 dB, and the $\mu$ values in the legend give the average MSE. As seen, the average MSE increases with the feedback and backhaul load reductions. Figure 12 depicts the convergence behavior of the proposed scheme. An SFB-CBH configuration with $r_{fl} = 0.11$ and $r_{bl} = 0.11$ is assumed, and the MSE is shown for different edge SNRs. As seen, the scheme converges after an acceptable number of precoded data transmissions. In the worst case, the MSE converges after 5 subframes.
Based on numerical evaluations, the SINR performance of the proposed scheme in the SFB-CBH configuration decreases slightly with increasing $\triangle t$, which can be regarded as the CSI reporting period. These numerical comparisons are omitted here because of limited space.

Performance of the proposed scheme in the CRAN network
To evaluate the performance of the proposed scheme in 5G and B5G systems, a scenario in an ultra-dense CRAN is considered, in which both users and BSs are uniformly distributed over a square area of 400 m × 400 m. To provide seamless coverage, the density of BSs is anticipated to reach 40−50 BSs/km$^2$ [7,58]; therefore, the numbers of BSs and users are set to 8 and 14, giving densities of 50 BSs/km$^2$ and 87 users/km$^2$. Because the number of users that each backhaul link can support is limited, each user is assumed to be served by its three nearest BSs [7]. Each BS is assumed to have $N_t = 8, 16, 32$ transmit antennas, and the users are equipped with $N_r = 2$ receive antennas. Figure 13 compares the performance of the proposed scheme in the SFB-CBH ($r_{fl} = r_{bl} = 0.3$) and FFB-FBH configurations. As expected, the proposed precoder performs best when $N_t = 32$, and the MSE increases as the number of transmit antennas decreases. It is worth noting that when the number of transmit antennas is high, the performance degradation arising from feedback load reduction is negligible.
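The densities and the three-nearest-BS association in this scenario can be reproduced with a small sketch. The random seed, the helper names `drop` and `nearest_bs`, and the use of Euclidean distance as the association metric are assumptions made for illustration.

```python
import math
import random

SIDE_M = 400.0
N_BS, N_USERS = 8, 14

area_km2 = (SIDE_M / 1000.0) ** 2     # 0.16 km^2
bs_density = N_BS / area_km2          # BSs per km^2
user_density = N_USERS / area_km2     # users per km^2

def nearest_bs(user_xy, bs_xy, k=3):
    """Indices of the k BSs closest to the user (Euclidean distance)."""
    ranked = sorted(range(len(bs_xy)), key=lambda b: math.dist(user_xy, bs_xy[b]))
    return ranked[:k]

rng = random.Random(0)

def drop(n):
    """Uniformly drop n nodes in the square area (coordinates in metres)."""
    return [(rng.uniform(0, SIDE_M), rng.uniform(0, SIDE_M)) for _ in range(n)]

bs_xy = drop(N_BS)
user_xy = drop(N_USERS)
serving = [nearest_bs(u, bs_xy) for u in user_xy]   # 3 serving BSs per user
```

The user density evaluates to 87.5 users/km$^2$, which the text quotes as 87.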

Feedback quantization effect
In Fig. 14, the performance of the proposed quantization scheme is depicted for the SFB-CBH configuration with $r_{fl} = 0.167$, $N_t = 2$, $N_r = 1$, perfect PA, and a varying number of bits for CDI quantization. The achievable MSE is plotted over a wide range of edge SNRs for a receiver without a receive filter. There is a performance gap between perfect CDI and quantization with $B_{CDI}$ bits. However, even with only a few bits for CDI quantization, the performance of the CSI codebook based feedback improves significantly, and with $B_{CDI} = 8$ bits the performance loss is negligible. The impact of the number of CDI and PA quantization bits on system performance implies that only a small number of bits is necessary to benefit from a CSI codebook based quantization scheme in the proposed precoder. In particular, taking the total number of quantization bits for each link from a BS to a user as $B = B_{CDI} + B_{PA} N_t$, quantization with $B = 6$ bits attains a performance close to that of full CSI feedback.

Conclusion
For a centralized JT-CoMP FDD downlink system, we designed a novel sparse feedback and constrained backhaul transmission scheme and investigated its performance. To design the precoder matrix while providing feedback and backhaul load reductions, under a total power constraint and with load balancing between BSs, a suboptimal two-layer optimization method was proposed. In the inner layer, the SVD of the CSI matrix was used to design the precoder matrix subject to a sum-power constraint and pre-known idle links. In the outer layer, the Greedy algorithm was used to design the link selection matrix, providing the required feedback and backhaul load reductions and load balancing between BSs. In addition, sparse feedback and constrained backhaul schemes were introduced by adopting the ZF and Wiener precoders. To further reduce the feedback load, a CSI codebook based limited feedback strategy was considered. Numerical evaluations show that the proposed scheme achieves a performance gain in terms of MSE over the adopted ZF, Wiener, selective feedback, and SSOCP based relative thresholding precoders.
[...] $(H_u W_i)(H_u W_i)^H g_u^H + \sigma_n^2 g_u g_u^H$. (61)
[...] The computational complexity of $g(W)$ and $\nabla g(W)$ is $O(C N_U)$ and $O(Q N_U^2)$, respectively, where $Q$ is the number of zero elements in the matrix $S$. From the simulations, the SQP sub-problem converges within $C < 100$ iterations with no further improvement. By assuming $Q \ll N_B N_U$, the overall computational complexity of solving the problem (29) [...]

Methods/experimental
The purpose of this study was to design the downlink of a centralized JT-CoMP system with sparse feedback and constrained backhaul links. The system consists of neighboring cells and users at the common boundary, or cluster area, in the middle of the cells. The channels are assumed to be time-variant, following the 3GPP channel model. The performance of the system in terms of MSE was optimized using a two-layer method comprising an inner SVD precoder design and an outer Greedy link selection approach. Furthermore, sparse feedback and constrained backhaul schemes based on the ZF and Wiener precoders were defined and used as benchmarks for the proposed scheme.