Low-Complexity QL-QR Decomposition Based Beamforming Design for Two-Way MIMO Relay Networks

In this paper, we investigate the optimization problem of joint source and relay beamforming matrices for a twoway amplify-and-forward (AF) multi-input multi-output (MIMO) relay system. The system consisting of two source nodes and two relay nodes is considered and the linear minimum meansquare- error (MMSE) is employed at both receivers. We assume individual relay power constraints and study an important design problem, a so-called determinant maximization (DM) problem. Since this DM problem is nonconvex, we consider an efficient iterative algorithm by using an MSE balancing result to obtain at least a locally optimal solution. The proposed algorithm is developed based on QL, QR and Choleskey decompositions which differ in the complexity and performance. Analytical and simulation results show that the proposed algorithm can significantly reduce computational complexity compared with their existing two-way relay systems and have equivalent bit-error-rate (BER) performance to the singular value decomposition (SVD) based on a regular block diagonal (RBD) scheme.


I. INTRODUCTION
Recently, wireless relay networks have been the focus of a lot of research because the relaying transmission is a promising technique which can be applied to extend the coverage or increase the system capacity. There are various cooperative relaying schemes have been proposed, such as amplify-and-forward (AF) [1] and [2], decode-and-forward (DF) [3], denoise-and-forward (DNF) [4], and compress-andforward (CF) [5] cooperative relaying protocols. Among these approaches, AF is most widely used due to without detecting the transmitted signal. Therefore, an AF relay scheme requires a less processing power at the relays compared to other schemes.
In one-way relaying (OWR) approach, to completely exchange information between two base stations, four time slots are required in uplink (UL) and downlink (DL) communications, which leads to a loss of one-half spectral resources [6]. In order to solve this problem, a two-way relaying approach has been considered in [7], [8], and [9]. In a typical twoway relaying scheme, the communication is completed in two steps. First, the transmitters send their symbols to two relays, Wei  simultaneously. After receiving the signals, each relay processes them based on an efficient relaying scheme to produce new signals. Then the processed signals are broadcasted to both receiver nodes.
Multi-input multi-output (MIMO) relay systems have been investigated in [10]- [13]. It is shown that, by employing multiple antennas at the transmitter and/or the receiver, one can significantly improve the transmission reliability by leveraging spatial diversity. Relay precoder design methods have been investigated in [14], [15], [16]. A problem in designing optimal beamforming vectors for multicasting is challenging due to its nonconvex nature. In [14], the authors propose a transceiver precoding scheme at the relay node by using zero-forcing (ZF) and MMSE criteria with certain antenna configurations. The information theoretic capacity of the multi-antenna multicasting is studied in [15], along with the achievable rates using lower complexity transmission schemes, as the number of antennas or users goes to infinity. In [16], the authors propose an alternative method to characterize the capacity region of two-way relay channel (TWRC) by applying the idea of rate profile.
Joint optimization of the relay and source nodes for the MIMO TWRC have been studied in [9], [17]. In [9], the authors develop a unified framework for optimizing two-way linear non-regenerative MIMO relay systems and show that the optimal relay and source matrices have a general beamforming structure. The joint source node and relay precoding design for minimizing the mean squared error in a MIMO two-way relay (TWR) system is studied in [17].
Since singular value decomposition (SVD) and/or generalized SVD (GSVD) are widely used to find the orthogonal complement to solve an optimization problem [2], [9], [16], [29], but their computational complexities are extremely high. In order to reduce the complexity, the SVD can be replaced with a less complex QR decomposition [18] in this work. However, this approach leads to degrading the BER performance. In addition, it is difficult to realize in TWRC. In this paper, we investigate the joint source and relay precoding matrix optimization for a two way-relay amplify-and-forward relaying system where two source nodes and two relay nodes are equipped with multiple antennas. Also, in order to apply the QL/QR decomposition to the TWRC, we design a three part relay filter. Compared with existing works such as [9]- [14], the contributions of this paper can be summarized as follows. Firstly, we investigate a two-way MIMO relay system using the criteria which minimizes an MSE of the signal waveform estimation for both two source nodes. We prove an optimal sum-MSE solution can be obtained as the Winer filter while signal-to-noise-ratio (SNR) at both source nodes are equivalent [20] which leading to an MSE balancing result. Secondly, we propose a new cooperative scenario, i.e., the QL-QR compare with the Choleskey decomposition which significantly reduces the computational complexity of the optimal design. In this proposed design, the channels of its left side are decomposed by the QL decomposition while those of its right side factorized by the QR decomposition. And the equivalent noise covariance is decomposed by the Choleskey decomposition. We also design the three part relay filter, which is comprised by a left filer, a middle filter, and a right filter, to efficiently combine two source nodes and the relay nodes. By these approaches, the received signals at both two source nodes are able to be redeemed as either lower or upper triangular matrices. Stemming from one of the properties of triangular matrices such that their determinant is identical to the multiplication of their eigenvalues, we are able to straightforwardly solve the optimization problem as a determinant maximization problem. Also, we can obtain the BER performance equivalent to that of SVD-RBD scheme.
The rest of this paper is organized as follows. Section II describes a system model of the TWRC and raises a sum-MSE problem. In Section III, we propose an iterative QL-QR algorithm and a joint optimal beamforming design. In Sections IV, we discuss the computational complexity of an efficient channel model. The simulation results are presented to show the excellent performance of our proposed algorithm for the TWRC in Section V. Section VI concludes this paper.
Notations: A * and A T denote the conjugate and the transpose of a matrix A, respectively. d(.) denotes a diagonal matrix and an N × N identity matrix is denoted by I N . E(.) stands for the statistical expectation and (.) H denotes a Hermitian transpose of a given matrix. tr(.) and R(.) denotes matrix trace and the range of a matrix.

II. SYSTEM MODEL AND SUM-MSE
We consider a TWRC consisting of two source nodes S 1 and S 2 , and two relay nodes R 1 and R 2 as shown in Fig.  1. The source and relay nodes are equipped with M and N antennas, respectively. We adopt the relay protocol with two time slots introduced in [14]. In the first time slot, the information vector x i ∈ C M×1 is linearly processed by a precoding matrix V i ∈ C M×M and then be transmitted to the relay nodes. The received signals at R i , i ∈ {1, 2}, can be expressed as where y Ri ∈ C N ×1 , i ∈ {1, 2}, indicates the received signal vector, H i,j ∈ C N ×M , i, j ∈ {1, 2}, represents the channel matrix from source j to relay i, as shown in Fig.1, s i ∈ C M×1 is the transmitted symbol vector from S i , and n Ri ∼ CN (0, σ 2 Ri I N ) represents the additive white Gaussian noise (AWGN) vector with zero mean and variance σ 2 Ri at relay node i. The term s i is subject to a power constraint, To find an appropriate power normalization vector ρ R , we express the total transmission power at the source node with V i as In this paper, we assume that each transmit antenna satisfies the unity transmission power constraint. To satisfy the power constraint, we propose the following power normalization vector In the second time slot, after power normalization, the relay node R i linearly amplifies y Ri with an N × N matrix F i and then broadcasts the amplified signal vector x Ri to source nodes 1 and 2. The signals transmitted from relay node i can be expressed as Using (1) and (4), the received signal vectors at S 1 and S 2 can be, respectively, written as where H T i,j , i, j ∈ {1, 2}, indicates the M × N channel matrix from the relay node i to the source node j, and n i , i ∈ {1, 2}, is an M × 1 noise vector at S i . We assume that the relay nodes perfectly know the channel state information (CSI) of H i,j . The relay node R i performs the optimizations of F i and V i , and then transmits the information to the source nodes 1 and 2. Since source node i knows its own transmitted signal vector s i and full CSI, the self-interference components in (5) can be efficiently cancelled. The effective received signal vectors are given by where are the equivalent MIMO channels seen at source nodes S 1 and S 2 , respectively. The vectors are the equivalent noises at source node S 1 and S 2 , respectively.
Due to the lower computational complexity, linear receivers are applied at source node i to retrieve the transmitted signals sent from the other nodes. The estimated signal waveform vector is given as , which can be further written as The sum-MSE of the two source nodes in the proposed system model can be written as: Note that the sum-MSE minimization criterion measures the overall transmission performance of both the DL and the UL. Since the two data streams are transmitted at different directions during the two time slots are considered in the TWR network.

III. JOINT SOURCE AND RELAY BEAMFORMING DESIGN
In this section, we develop an iterative QL-QR algorithm by using the MSE balancing result. The QL-QR algorithm involves two steps, i.e., the linear receiver matrix optimization and the joint source and relay beamformer design.

A. Proposed Optimal Detector and Optimization Problem
We would like to find the jointly optimal beamforming vectors W i , V i and F i such as the following sum-MSE is minimized According to (4), we consider the following transmission power constraint at relay node where . The P Ri denotes the power constraint at the relay node R i , and P R is the total relay power. The transmission power constraint at two source nodes can be written as where P i is the available power at the ith source node. According to (10), (11) and (12), the joint optimization problem of the sum-MSE can be formulated as follows: It is shown in [20] that at the optimum, SN R 1 = SN R 2 holds true, thus leading to an SNR balancing result. Otherwise, if SN R 1 > SN R 2 , then P 2 can be reduced to retain SN R 1 = SN R 2 , and this reduction of P 2 will not violate the power constraint, i.e., In Fig. 2, we show two examples of the SNR regions with α 1 = 0.5 and α 2 = 0.3, where ω i ∈ [0, 1] is a Lagrange multiplier weight value and α i ∈ [0, 1] is an SNR weight value. We have assumed the sum of SNR is a constant value. It is clear that the SNR region of α 1 is larger than that of α 2 . For further details, see [20]. As discussed in [21], the optimization problems have the performance matrix that are functions of SNR , namely the MSE at the output of a linear-MMSE (LMMSE) filter of each user By these two approaches, the max-min optimization problem in (13) can be efficiently written as where i ∈ 1, 2. Since the optimization problem (16) is nonconvex, it is difficult to obtain the globally optimal solution. In this paper, we present a locally optimal solution of the joint optimization problem over W i , V i and F i where i = 1, 2, which can be solved by three stages, i.e., 1 : The linear receiver weighted matrices are optimized with the fixed source precoding matrix V i and relay amplifying matrices F i (W i is not in constraints (17) and (18)). 2 : With given W i and fixed F i , update V i . 3 : With given W i and V i , obtain suboptimal F i to solve (16).
Lemma 1 : For any fixed V i and F i , the minimization problems in (16) are convex quadratic problems and the optimal W i can be obtained as the Wiener filter which is used to decode s i shown as follows Proof : For source node i, the MSE can be further expressed as: Based on (20), the derivation of an optimal MSE detection matrix W opt i is equivalent to solving the following equation: Then, we may obtain closed-form solution of W i , which is This completes the proof. With the optimal W o 1 fixed, the outer minimization problem in (16) can be rewritten as where MSE o 1 is the MSE matrix using W o 1 . By substituting (19) into (8), we have Note that the matrix inversion lemma is used to obtain (24).

B. Joint Optimal Source and Relay Beamforming Matrices Design and Iterative Algorithm
In this section, we focus on the source and relay beamforming matrices design and develop an iterative algorithm which is suboptimal for the general case, but has a much lower computational complexity. For the fixed F i , the source precoding matrix V i is optimized by solving the following problem min V1,V2 The Lagrangian function associated with the problem (25) is given by Case 1: When µ i = 0, making the derivative of L V with respect to V 2 be zero, we obtain Since V 2 and Φ are nonsingular matrices, (27) can be represented as Simplifying (28), I M > 0 and V H 2 ΦV 2 ≥ 0. Consequently, in Case 1, the optimal solution is not existent.

Case 2:
When µ i > 0, we rewrite the Lagrangian function as Since V H 2 and Φ are nonsingular matrices, multiply both sides by V H 2 −1 and Φ −1 , we have Due to Φ is Hermitian and positive definite, we apply the Choleskey decomposition of Φ = Ω H Ω, where Ω is a lower triangular matrix. Consequently, we represent (31) as By the definition of the matrix identity as for any M × N matrix X, we can rewrite (32) as Solving (35) for V 2 , we obtain (32) as where ∇ = Υ H Ω − 1 2 . Obviously, the precoding matrix V 1 can be obtained in the same way.  to S 1 and S 2 (output). 1 Lemma 2: The optimal relay filter constructive of F L , F R and F D matrices i.e., for R 1 and R 2 can be designed as Proof : For F L,i and F R,i , the proof is similar as Theorem 3.1 in [22]. For F D,i , using Theorem 2 in [9], the structure of F D,i is optimal for the two cases of (a) : (37) Case a: If N = 2, the optimal F D,i is a diagonal matrix given as If N = 2a, a = 2, 3, ..., the optimal F D,i is a 2 × 2 block diagonal matrix given as where F D,i,1 and F D,i,2 are N 2 × N 2 matrices. Case b: The optimal F D,i is defined as Discussion: In Case b, since F ⋆ D,i is optimal, but the computational complexity will be considerably increased compared with Case a, so we exclude it.
For Case a, Before we develop a numerical method to solve vector F D,i , let us have some insights into the structure of this suboptimal relay beamforming matrix. To simplify relay beamforming matrix F D,i , we introduce a following property: Property 1: The statistical behavior of a unitary matrix U remains unchanged when multiplied by any unitary matrix T independent of U. In other worlds, TU has the same 1 distribution as U, i.e., in (36), Now, let us introduce the following QL decompositions where Q L,i for i = 1, 2, is a unitary matrix with a dimension C N ×N , and {L 1 , L 2 } ∈ C N ×M are lower triangular matrices. Similarly, let us introduce another decomposition, namely, QR decomposition as where Q R,i ∈ C N ×M for i = 1, 2, is a unitary matrix, and {R 1 , R 2 } ∈ C M×M are upper triangular matrices. Substituting (42), (43), and (36) back into (6), i.e,. for S 1 , we may get equivalent received signals shown as, where H 1 = L T 1 F D,1 R 1 + L T 2 F D,2 R 2 and n 1 = L T 1 F D,1 n R1 +L T 2 F D,2 n R2 +n 2 are efficient channel and noise coefficients, obtained from the covariance of n 1 , we have For fixed V 1 and V 2 , using (44) and Property 1, the optimal problem (23) becomes where the lemma tr(A + B) = tr(A) + tr(B) has been used.
Since the matrix C 1 is Hermitian and positive definite, we can decompose this matrix using Cholesky factorization as where Ξ 1 denote a lower triangular matrix. By substituting (49) back into (48), we can simply rewrite the optimal problem as where a = denotes n has nothing to do with the maximum solution and B 1 = H H 1 Ξ −1 1 . Thus, the optimal problem can be represented as the determinant maximization of |B 1 | 2 .
In Case a, since F D,i is the block diagonal matrix, its determinant can be written as Let A, B, C, and D be an N 4 × N 4 matrix. We can define detF D,i,i , for i ∈ 1, 2 as, where I stands for an N 4 × N 4 identity matrix. In (52), to obtain maximum detF D,i,i , we should minimize BA −1 D. Let us introduce the SVD of B, A, and D as where b i , d i , and a i are the diagonal elements of Σ B , Σ D and Σ A , respectively. To simplify our discussion, we assume F D,i,i is a semi-positive matrix, thus, we have the minimum solution as b i d i = 0. Interestingly, if both b i and d i are 0, F D,i,i is a diagonal matrix. Otherwise, it is a lower/upper triangular matrix. In addition, for S 1 , the equivalent channel H 1 , since the terms L T 1 , L T 2 , R 1 , and R 2 are upper triangular matrices, the optimal F D,i should be an upper triangular matrix. Since the equivalent channel H 2 , L 1 , L 2 , R T 1 , and R T 2 are lower triangular matrices for S 2 , the optimal F D,i is a lower triangular matrix. Therefore, if and only if F D,i is a diagonal matrix, the sum-MSE is optimal in our proposed method. This completes the proof for Lemma 2.
Consequently, we have where F D,1 , R 1 , L 2 , F D,2 , and R 2 , respectively. Since Ξ 1 and Ξ −1 1 are also lower triangular matrices, we have where ξ i is the diagonal element of Ξ −1 1 . Now, the optimization problem can be reformulated as max FD,1,FD,2 is a convex problem for beamformer vectors F D,1 and F D,2 , which can be efficiently solved by the interior-point method [25].
In summary, we outline the iterative beamforming design algorithm as follows (QL − QR Algorithm): i,j , for i, j = 1, 2, set n = 0; 2. Repeat: 1: for n ← n + 1 do 2: for given F  Since in the QL-QR algorithm, the solution of each subproblem is optimal, we conclude that the total MSE value is decreased as the number of iterations increases. Meanwhile, the total MSE is lower bounded.
Discussion: The extended two system models are shown in Fig. 4, which are the multipair scenario with two relay nodes and K pair source nodes, and the Z (Z should be even number) relay nodes scenario with two source nodes. In Figure  4(a), each pair sources and two relay nodes can be seen as a group. Since each pair source nodes are independent with each other, we can design that there are K RFs , LFs and one CF equipped at each relay nodes. Therefor, the extended system model (a) can be seen as K parallel of our proposed system model. In Figure 4(b), two source nodes and every two relay nodes can be seen as one group. Obviously, the extended system model (b) can be seen as Z 2 parallel of our proposed system model.

IV. COMPUTATIONAL COMPLEXITY ANALYSIS
In this section, we measure the performance of the proposed QL-QR scheme in terms of the computational complexity compared with existing algorithms by using the total number of floating point operations (FLOPs). A flop is defined as a real floating operation, i.e., a real addition, multiplication, division, and so on. In [26], the authors show the computational complexity of the real Choleskey decomposition. For complex numbers, a multiplication followed by an addition needs 8 FLOPs, which leads to 4 times its real computation. According to [27], the required number of FLOPs of each matrix is described as follows: 1. Multiplication of m × n and n × p complex matrices: 8mnp − 2mp; 2. Multiplication of m × n and n × m complex matrices: 4nm × (m + 1); 3. SVD of an m × n(m ≤ n) complex matrix where only Σ is obtained: 32(mn 2 − n 3 /3); 4. SVD of an m × n(m ≤ n) complex matrix where only Σ and Λ are obtained: 32(nm 2 + 2m 3 ), 5. SVD of an m × n(m ≤ n) complex matrix where U ,Σ, and Λ are obtained: 8(4n 2 m + 8nm 2 + 9m 3 ); 6. Inversion of an m × m real matrix using Gauss-Jordan elimination: 2m 3 − 2m 2 + m; 7. Cholesky factorization of an m × m complex matrix: 8m 3 /3. 8. QR or QL decomposition of an m × n conplex matrix 16 n 2 m − nm 2 + 1 3 m 3 .
For the conventional RBD method [28], the authors consider a linear MU-MIMO precoding scheme for DL MIMO systems. For the non-regenerative MIMO relay systems [29], the authors investigate a precoding design for a 3-node MIMO relay network. In [2], a relay-aided system based on a quasi-EVD channel is proposed. We compare the required number of FOLPs of our proposed method with conventional precoding algorithm, such as the conventional RBD, the non-regenerative MIMO relay system, and the CD-BD algorithm as shown in Tables I, II, III, and IV, respectively, under the assumption that For instance, the (2, 2, 2) × 6 case denotes a system with three users (K = 3), where each user is equipped with two antennas (N i = 2) and the total number of transmit antennas is six (N T = 2 × 3 = 6). The required number FLOPs of the QL-QR algorithm, the conventional RBD, the non-regenerative  MIMO relay system, and the CD-BD algorithm are counted as 33530, 40824, 45306, 34638, respectively. From these results, we can see that the reduction in the number of FLOPs of our proposed precoding method is 17.87%, 25.99%, and 3.20% on an individual basis compared to the conventional RBD, the non-regenerative MIMO relay systems, and the CD-BD algorithm. Thus, our proposed QL-QR algorithm exhibits lower complexity than conventional algorithms. In addition, the complexity reduces as N i and N T increase with fixed K. We summarize our calculation results of the required number of FLOPs of the alternative methods in Tables I, II, III, and IV and show them in Figures 5 and 6. Figure 5 shows the computational complexity where N i = 2 and a value of K varies. And Figure 6 shows the computational 8 complexity where K = 4 and a value of N i varies. For the conventional RBD method, the orthogonal complementary vector V k,0 requires K times SVD operations. If only V k,0 is obtained, it is not computationally efficient. In step 5, the efficient channel H ef f = H i P a i is decomposed by the SVD with a dimension R ef f ×N T , where R ef f is the rank of H ef f . In the nonregenerative MIMO relay method and the CD-BD algorithm, two SVD operations are performed for the channels from the source to relay and from relay to the destination, and then the efficient channel covariance matrix is measured. In the nonregenerative MIMO relay method, the authors compute A using the EVD an then they diagonalize G. In the CD-BD algorithm, the authors calculate V a i by the SVD of H † mse and then they structure V b i by using the Choleskey decomposition. In our proposed QL-QR algorithm, we take advantage of QL and QR decompositions instead of the SVD operation, and then we compute an efficient channel as well as decompose a noise covariance matrix by the Choleskey decomposition. Finally, we calculate the determinant of B 2 i to solve an optimization problem. Obviously, our proposed QL-QR algorithm outperforms conventional algorithms in the light of the computational complexity.

V. SIMULATION RESULTS
In this section, we study the performance of the proposed QL-QR algorithm for two-way MIMO relay networks. All the simulations are performed on the assumption that all the channels are the Rayleigh fading channel and they are independently generated following ∼ CN (0, 1). The noise variances σ 2 i are equally given as σ 2 . The total relay power constraint can be written as where a ∈ [0, 1] is an auxiliary value as well as a power allocation coefficient between two relay nodes. All the simulation results are averaged over 1000 channel trials. In Fig. 7, we compare the sum mutual information (SMI) of various MU-MIMO schemes where full CSI is known at each node. We set P 1 = P 2 = 10 dB, M = 1, and an equal power budget for the two relays (a = 0.5 is assumed). The negative SMI is adopted in [16] which can be defined as In our proposed method, the SMI shown in the simulation results is calculated as −2 log 2 |B 2 i | by using (49), (56), and (57). It can be observed that the proposed QL-QR algorithm has the same SMI performance as an optimal solution in [16]. Figure 8 shows the performance of our proposed SMI performance versus the number of the relays, T which is even. We consider a practical scenario with different relay power constraints. We set P R = 30 dB and a = 0.5. It is clear that, for different values of P 1 and P 2 , a solution of our proposed QL-QR algorithm shows better performance than a max-power solution. Figure 9 exhibits the BER performance of the BD water filling, the RBD, the SVD-RBD, and our proposed QL-QR method, where the QPSK modulation is made use of. As   pointed out in [31], the BER performance for a MIMO precoding system is actually determined by the energy of the transmitted signal. To simplify our discussion, we assume a = 0. In the RBD, det HH where H ∈ C N ×M , for M < N , is an equivalent channel matrix with its eigenvalues λ i . In our proposed QL-QR method, for source node S 1 , we have det Under the stipulaton that detF D,i = 1, we are able to easily obtain λ i = ς 1 . Therefore, our proposed QL-QR method has the same BER performance as the SVD-RBD method.

VI. CONCLUSIONS
This paper studies joint optimization problem of an AF based on the MIMO TWRC, where two source nodes exchange Step Operations FLOPS Case: (2, 2, 2) × 6  [29].
Step Operations FLOPS Case:  Step Operations FLOPS Case: (2, 2, 2) × 6 1 their messages with two relay nodes. A relay filter has been designed, which is able to efficiently join the source and the relay nodes. Our main contribution is that the optimal beamforming vectors can efficiently be computed using determinant maximization techniques through an iterative QL-QR algorithm based on a MSE balancing method. Our proposed QL-QR algorithm can significantly reduce the computational complexity and has the equivalent BER performance to the SVD-BD algorithm.