Low complexity frequency domain hybrid-ARQ chase combining for broadband MIMO CDMA systems

In this article, we investigate efficient minimum mean square error (MMSE) frequency domain equalization (FDE)-based iterative (turbo) packet combining for cyclic prefix (CP)-CDMA MIMO with Chase-type ARQ. We introduce two turbo packet combining schemes: (i) In the first scheme, namely "chip-level turbo packet combining", chip-level MMSE-FDE and packet combining are jointly performed at the chip-level. (ii) In the second scheme, namely "symbol-level turbo packet combining", chip-level MMSE-FDE and despreading are separately carried out for each transmission, then packet combining is performed at the level of the soft demapper. The key idea of the proposed schemes is to exploit the diversity among all transmissions with a very low cost by introducing new variables recursively computed. The complexity and performances are evaluated for some representative antenna configurations and load factors (i.e., number of orthogonal codes with respect to the spreading factor) to show the gains offered by the proposed techniques.


I. INTRODUCTION
Space-time (ST) multiplexing-oriented multiple-input multiple-output (MIMO) and hybrid automatic repeat request (ARQ) are two core technologies used in the emerging code division multiple access (CDMA)-based wireless packet access standards [1]. In ST multiplexing architectures, independent data streams are sent over multiple antennas to increase the transmission rate [2]. In hybrid-ARQ, erroneous data packets are kept in the receiver to help decode the retransmitted packet, using packet combining techniques (e.g., see [3] and references therein).
To support heterogeneous data rates in CDMA systems, multiple spreading codes can simultaneously be allocated to the same user when a high data rate is requested [4]. This method is often referred to as "multi-code transmission," and has been considered in the high speed packet access (HSPA) system [5]. In MIMO CDMA systems, multi-code transmission offers a spectral efficiency that increases linearly with the number of spreading codes and transmit antennas. This is achieved by assigning the same spreading code group to all transmit antennas. However, in severely frequency selective fading wireless channels, the performance of this scheme can dramatically deteriorate due to co-antenna interference (CAI) and inter-chip interference (ICI). This results in a large delay (due to multiple transmissions) when an ARQ protocol is used in the link layer. Motivated by this limitation, we investigate efficient hybrid-ARQ receiver schemes that reduce the number of ARQ rounds required to correctly decode a data packet in MIMO CDMA ARQ systems with multi-code transmission.
Recently, cyclic-prefix (CP) aided single carrier (SC) CDMA transmission with chip-level minimum mean square error (MMSE)-based frequency domain equalization (FDE) has been introduced [6]. It is a transceiver scheme that achieves attractive performance at an affordable computational cost. Turbo MMSE-FDE for CP-CDMA has then been proposed to cope with severe ICI [7]. In [8], MMSE-FDE has been applied to perform packet combining for multi-code CP-CDMA systems with ARQ operating over severely frequency selective fading channels. It has recently been demonstrated that ARQ presents an important source of diversity in MIMO systems [9]. In particular, it has been shown in [9] that for both short and long-term static ARQ channel dynamics, multiple transmissions improve the diversity order of the corresponding MIMO ARQ channel. The case of block-fading MIMO ARQ, i.e., where multiple fading blocks are observed within the same ARQ round, has been reported in [10]. Information rates and turbo MMSE packet combining strategies for the frequency selective fading MIMO ARQ channel have been investigated in [11]. Turbo MMSE packet combining for broadband MIMO ARQ systems with co-channel interference (CCI) has recently been reported in [12] and [13] using time and frequency domain combining methods, respectively.
In this paper, we consider Chase-type ARQ with multi-code CP-CDMA MIMO transmission over a broadband wireless channel. We propose two iterative (turbo) packet combining schemes where, at each ARQ round, the data packet is decoded by iteratively exchanging soft information in the form of log-likelihood ratios (LLRs) between the soft-input-soft-output (SISO) packet combiner and the SISO decoder. In the first turbo packet combining scheme, we exploit the fact that both the CP chip-word and data packet are retransmitted at each ARQ round. This allows us to view each transmission as a group of virtual receive antennas, and build up a virtual MIMO channel that takes into account both multi-antenna and multi-round transmission. We therefore perform combining of multiple transmissions jointly with chip-level soft MMSE FDE. This scheme is called chip-level packet combining. In the second scheme, both chip-level soft MMSE FDE and despreading are separately carried out for each transmission. Combining is then performed at the level of the soft symbol demapper. We analyze both the computational complexity and memory required by the proposed techniques, and show that they are less sensitive to the ARQ delay, i.e., the maximum number of ARQ rounds. Finally, we evaluate and compare the throughput performance of the proposed schemes for some representative load factors (i.e., number of parallel codes with respect to the spreading factor) and antenna configurations.
Throughout this paper, $(\cdot)^{\top}$ and $(\cdot)^{H}$ denote the transpose and conjugate transpose of the argument, respectively. $\mathrm{diag}\{\mathbf{x}\} \in \mathbb{C}^{n \times n}$ and $\mathrm{diag}\{\mathbf{X}_1, \cdots, \mathbf{X}_m\} \in \mathbb{C}^{mn_1 \times mn_2}$ denote the diagonal matrix and block diagonal matrix constructed from $\mathbf{x} \in \mathbb{C}^{n}$ and $\mathbf{X}_1, \cdots, \mathbf{X}_m \in \mathbb{C}^{n_1 \times n_2}$, respectively. For $\mathbf{x} \in \mathbb{C}^{TN}$, $\mathbf{x}_f$ denotes the discrete Fourier transform (DFT) of $\mathbf{x}$, i.e., $\mathbf{x}_f = \mathbf{U}_{T,N}\mathbf{x}$, with $\mathbf{U}_{T,N} = \mathbf{U}_T \otimes \mathbf{I}_N$, where $\mathbf{I}_N$ is the $N \times N$ identity matrix, $\mathbf{U}_T$ is a unitary $T \times T$ matrix whose $(m, n)$th element is $\frac{1}{\sqrt{T}} e^{-j 2\pi mn/T}$, $j = \sqrt{-1}$, and $\otimes$ denotes the Kronecker product.
The rest of this paper has the following structure. In Section II, we present the CP-CDMA MIMO ARQ transmission scheme, then provide its corresponding communication model. In Section III, we derive the two iterative soft MMSE FDE-aided packet combining schemes we propose in this paper. Section IV analyzes the complexity and memory size required by both schemes, then focuses on the comparison of their throughput performance. The paper is concluded in Section V.
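As an illustration of the DFT notation introduced above, the following minimal Python sketch builds a unitary T × T DFT matrix and its Kronecker extension with the N × N identity, then applies it to a stacked vector. The function names (`unitary_dft_matrix`, `kron_identity`, `max_deviation_from_identity`) are hypothetical helpers chosen for this sketch, and the 1/√T scaling is the one required for the matrix to be unitary.

```python
import cmath
import math


def unitary_dft_matrix(T):
    """T x T unitary DFT matrix with (m, n) entry exp(-j*2*pi*m*n/T) / sqrt(T)."""
    scale = 1.0 / math.sqrt(T)
    return [[scale * cmath.exp(-2j * math.pi * m * n / T) for n in range(T)]
            for m in range(T)]


def kron_identity(U, N):
    """Kronecker product of U with the N x N identity matrix."""
    T = len(U)
    out = [[0j] * (T * N) for _ in range(T * N)]
    for m in range(T):
        for n in range(T):
            for a in range(N):
                out[m * N + a][n * N + a] = U[m][n]
    return out


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def max_deviation_from_identity(A):
    """Largest entry of |A^H A - I|; close to zero when A is unitary."""
    n = len(A)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            entry = sum(A[k][i].conjugate() * A[k][j] for k in range(n))
            worst = max(worst, abs(entry - (1.0 if i == j else 0.0)))
    return worst


if __name__ == "__main__":
    U = kron_identity(unitary_dft_matrix(4), 2)   # T = 4, N = 2
    x_f = matvec(U, [1.0] * 8)                    # DFT of a constant vector
    print(max_deviation_from_identity(U), x_f[:2])
```

With this ordering, entries m·N, ..., m·N + N − 1 of the transformed vector hold frequency bin m for each of the N stacked streams.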

A. CP-CDMA MIMO ARQ Transmission Scheme
We consider a single-user multi-code CP-CDMA transmission scheme over a broadband MIMO channel with an ARQ protocol in the upper layer, where the ARQ delay is $K$. An information block is first encoded using a $\rho$-rate encoder, then interleaved with the aid of a semi-random interleaver $\Pi$ to produce the coded and interleaved frame $\mathbf{b}$, which is serial-to-parallel converted to $N_T$ sub-streams and spatially multiplexed over the $N_T$ transmit antennas. $T_s$ denotes the length of the symbol block transmitted over each antenna (indexed by $t$). Each sub-stream is then symbol mapped onto the elements of constellation $S$, where $|S| = 2^M$.
For each antenna, the symbol block is passed through a serial-to-parallel converter and a spreading module which consists of $C$ orthogonal codes. The same spreading matrix, whose columns are the codes $\mathbf{w}_1, \cdots, \mathbf{w}_C$, is used for each transmit antenna, where $\mathbf{w}_n$ is a Walsh code of length $N$ (i.e., the spreading factor), and $C \le N$ is the number of multiplexed codes.
The rate of this space-time code (STC) is denoted $R$. The $C$ parallel chip-streams on each antenna are then added together to construct a block of $T_c$ chips. The chips at the output of the $N_T$ transmit antennas are arranged in a chip matrix $\mathbf{X}$, where $s_{t,n,i}$ denotes the symbol transmitted by antenna $t$ at channel use (c.u.) $i$ using Walsh code $\mathbf{w}_n$.
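The multi-code spreading step described above can be sketched as follows for a single antenna and a single channel use. This is a minimal illustration under stated assumptions: the Walsh codes are taken as rows of a Sylvester-Hadamard matrix, and a 1/√N scaling is applied at both the spreader and the despreader so that despreading returns the transmitted symbol exactly in the absence of channel distortion. The function names are hypothetical.

```python
def walsh_codes(N):
    """Rows of the N x N Sylvester-Hadamard matrix (N must be a power of two)."""
    if N < 1 or N & (N - 1):
        raise ValueError("N must be a power of two")
    H = [[1]]
    while len(H) < N:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def spread(symbols, codes):
    """Superimpose the code-multiplexed symbols into one block of N chips.

    symbols[n] is carried by codes[n]; the 1/sqrt(N) factor is an assumed
    normalization, not taken from the paper.
    """
    N = len(codes[0])
    return [sum(s * w[c] for s, w in zip(symbols, codes)) / N ** 0.5
            for c in range(N)]


def despread(chips, code):
    """Correlate a block of N chips with one Walsh code."""
    N = len(code)
    return sum(x * w for x, w in zip(chips, code)) / N ** 0.5


if __name__ == "__main__":
    codes = walsh_codes(8)[:4]                     # N = 8, C = 4 (half load)
    symbols = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]   # one symbol per code
    chips = spread(symbols, codes)
    print([despread(chips, w) for w in codes])
```

Because the codes are orthogonal, each despreader output depends only on the symbol carried by its own code as long as the chips arrive undistorted; a frequency selective channel breaks this orthogonality, which is the motivation for chip-level equalization before despreading.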
Transmitted chips are independent (infinitely deep interleaving assumption), and the chip energy is normalized. A CP chip-word is appended to $\mathbf{X}$ to construct the chip matrix $\mathbf{X}'$ to be transmitted. We consider Chase-type ARQ: when the decoding outcome is erroneous at ARQ round $k$, the receiver feeds back a negative acknowledgment (NACK) message, then the transmitter completely retransmits chip matrix $\mathbf{X}'$ in the next round. A successful decoding incurs the feedback of a positive acknowledgment (ACK) message. The transmitter then stops the transmission of the current frame and moves on to the next frame. Fig. 1 depicts the considered CP-CDMA MIMO transmission scheme with ACK/NACK.
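The ACK/NACK protocol described above can be sketched as a simple control loop. This is only an illustrative skeleton: `channel`, `combine`, and `decode` are hypothetical callables standing in for the physical channel, the packet combiner, and the decoder with error detection, and the receiver keeps a single combining state rather than a list of past observations.

```python
def chase_arq(frame, channel, combine, decode, max_rounds):
    """Retransmit the same frame until ACK or until max_rounds is reached.

    channel(frame, k)      -> observation received at round k (hypothetical)
    combine(state, obs)    -> updated combining state (hypothetical)
    decode(state)          -> (ok, decoded) decoding outcome (hypothetical)
    Returns (ack, rounds_used, decoded).
    """
    state = None
    for k in range(1, max_rounds + 1):
        observation = channel(frame, k)      # identical frame at every round
        state = combine(state, observation)  # fold the new round into the state
        ok, decoded = decode(state)
        if ok:
            return True, k, decoded          # ACK: move on to the next frame
        # NACK: the transmitter resends the same frame in the next round
    return False, max_rounds, None


if __name__ == "__main__":
    frame = [1, -1, 1]
    noise = {1: [-1.5, 0.0, 0.0], 2: [0.2, 0.0, 0.0]}  # toy, deterministic

    def channel(f, k):
        return [a + b for a, b in zip(f, noise[k])]

    def combine(state, obs):
        return obs if state is None else [a + b for a, b in zip(state, obs)]

    def decode(state):
        hard = [1 if v >= 0 else -1 for v in state]
        return hard == frame, hard           # genie-aided error detection

    print(chase_arq(frame, channel, combine, decode, max_rounds=4))
```

In the toy run, the first round is decoded in error, and adding the second observation to the combining state corrects the decision, so the frame is acknowledged at round 2.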

B. Communication Model
The broadband MIMO propagation channel connecting the $N_T$ transmit and the $N_R$ receive antennas is frequency selective and is modeled by discrete taps indexed by $l$. We assume a quasi-static block fading channel, i.e., the channel is constant over an information block and independently changes from block to block. The $N_R \times N_T$ channel matrix characterizing the $l$th discrete tap at ARQ round $k$ is denoted $\mathbf{H}_l^{(k)}$, and is made of zero-mean circularly symmetric complex Gaussian random entries. The average channel energy per receive antenna is normalized. At the receiver side, after removing the CP-word at ARQ round $k$, a DFT is applied on received signals. This yields $T_c$ frequency domain components grouped in block $\mathbf{y}_f^{(k)}$, which can be expressed as $\mathbf{y}_f^{(k)} = \mathbf{\Lambda}^{(k)} \mathbf{x}_f + \mathbf{n}_f^{(k)}$, where vectors $\mathbf{x}_f$ and $\mathbf{n}_f^{(k)}$ group the DFTs of transmitted chips and thermal noise at round $k$, respectively, and $\mathbf{\Lambda}^{(k)}$ is the channel frequency response (CFR) matrix at ARQ round $k$.
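For a CP-based block transmission, the per-bin CFR matrix is commonly obtained as the DFT of the tap matrices. The sketch below follows that standard form as an assumption, since the paper's own expression is not reproduced here: bin $i$ is computed as the sum over taps $l$ of the tap matrix weighted by $e^{-j 2\pi i l / T_c}$. The function name `cfr_matrices` and the list-of-rows matrix layout are choices made for this example.

```python
import cmath
import math


def cfr_matrices(taps, T_c):
    """Per-bin channel frequency response of a tapped-delay-line MIMO channel.

    taps[l] is the N_R x N_T matrix of tap l (list of rows).
    Returns a list of T_c matrices, one N_R x N_T matrix per frequency bin.
    """
    n_r, n_t = len(taps[0]), len(taps[0][0])
    cfr = []
    for i in range(T_c):
        # Phase rotation applied to tap l at frequency bin i.
        phases = [cmath.exp(-2j * math.pi * i * l / T_c)
                  for l in range(len(taps))]
        cfr.append([[sum(p * H[r][t] for p, H in zip(phases, taps))
                     for t in range(n_t)]
                    for r in range(n_r)])
    return cfr


if __name__ == "__main__":
    # Two-tap single-antenna channel with equal taps: a spectral null appears.
    two_tap = [[[1 + 0j]], [[1 + 0j]]]
    print([abs(L[0][0]) for L in cfr_matrices(two_tap, 4)])
```

The two-tap example shows why severe frequency selectivity is harmful: one of the four bins is completely nulled, so a single transmission carries no information on that bin and additional ARQ rounds with independent fading help fill it in.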

III. ITERATIVE RECEIVERS FOR CP-CDMA MIMO ARQ
In this section, we present two efficient algorithms for performing turbo packet combining for CP-CDMA MIMO ARQ systems: i) chip-level turbo packet combining, and ii) symbol-level turbo packet combining. In both schemes, signals received in multiple ARQ rounds are processed using soft MMSE FDE. Transmitted data blocks are decoded, at each ARQ round, in an iterative fashion through the exchange of soft information, in the form of LLR values, between the soft packet combiner (i.e., the soft equalizer and demapper operating over ARQ rounds) and the SISO decoder.

A. Chip-Level Turbo Packet Combining
To exploit the diversity available in the signals received over rounds $1, \cdots, k$, i.e., $\{\mathbf{y}_{f_i}^{(1)}, \cdots, \mathbf{y}_{f_i}^{(k)}\}_{i=0}^{T_c-1}$, we view each ARQ round $k$ as an additional group of $N_R$ virtual receive antennas. The MIMO ARQ system can therefore be considered as a point-to-point MIMO link with $N_T$ transmit and $kN_R$ receive antennas, where the $T_c k N_R \times 1$ chip-level virtual received signal vector $\underline{\mathbf{y}}_f^{(k)}$ is constructed by stacking, for each frequency bin, the signals received at rounds $1, \cdots, k$. The frequency domain communication model after $k$ rounds then has the same form as the single-round model, with the CFR matrices and noise vectors of rounds $1, \cdots, k$ stacked accordingly. Soft ICI cancellation and frequency domain MMSE filtering are jointly performed over all ARQ rounds. We call this concept "chip-level turbo packet combining". A direct implementation requires a huge computational cost since the complexity of computing MMSE filters is cubic in the ARQ delay. In addition, the required receiver memory size scales linearly with the ARQ delay because all CFRs $\{\mathbf{\Lambda}_i^{(1)}, \cdots, \mathbf{\Lambda}_i^{(k)}\}_{i=0}^{T_c-1}$ are required at round $k$ [14]. In the following, we introduce an efficient turbo MMSE implementation algorithm for chip-level combining where both receiver complexity and memory requirements are quite insensitive to the ARQ delay.
Let $\tilde{\mathbf{x}}$ and $\sigma^2_{t,i}$ denote the conditional mean of $\mathbf{x}$ and the conditional variance of $x_{t,i}$, respectively. Soft MMSE processing can be written in a compact forward-backward filtering structure as in [15]. By using the matrix inversion lemma [16], soft MMSE chip-level packet combining at round $k$ can be expressed in terms of a forward filter and a backward filter, both in $\mathbb{C}^{T_cN_T \times T_cN_T}$, which produce the equalized frequency domain vector $\mathbf{z}_f^{(k)}$. These filters depend on the $N_T \times N_T$ unconditional covariance of transmitted chips, which is computed as the time average of conditional covariance matrices, and on variables that are updated at each round according to recursions (20) and (21). Note that recursions (20) and (21) are an important ingredient of the proposed chip-level combining algorithm since they make both complexity and memory requirements less sensitive to the ARQ delay.
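The idea of replacing the stacked virtual-antenna model by per-round updates can be illustrated with the sketch below. It is not a reproduction of recursions (20) and (21); it assumes that the quantities carried across rounds are, for each frequency bin, the accumulated Gram matrix of the CFRs and the accumulated matched-filter output, and that the noise variance is the same at every round. The class and function names are hypothetical.

```python
def herm_product(A, B):
    """A^H B for matrices stored as lists of rows."""
    rows, cols = len(A[0]), len(B[0])
    return [[sum(A[r][i].conjugate() * B[r][j] for r in range(len(A)))
             for j in range(cols)]
            for i in range(rows)]


class ChipLevelAccumulator:
    """Per-bin running sums over ARQ rounds (assumed form of the recursions)."""

    def __init__(self, n_bins, n_t):
        # One N_T x N_T matrix and one length-N_T vector per frequency bin.
        self.gram = [[[0j] * n_t for _ in range(n_t)] for _ in range(n_bins)]
        self.matched = [[0j] * n_t for _ in range(n_bins)]

    def update(self, cfr, received):
        """Fold one ARQ round into the sums.

        cfr[i] is the N_R x N_T CFR matrix of bin i at this round,
        received[i] is the length-N_R received vector of bin i.
        """
        for i, (lam, y) in enumerate(zip(cfr, received)):
            g = herm_product(lam, lam)
            m = herm_product(lam, [[v] for v in y])
            for a in range(len(g)):
                self.matched[i][a] += m[a][0]
                for b in range(len(g)):
                    self.gram[i][a][b] += g[a][b]

    def memory_real_values(self):
        """Stored real values: 2 * n_bins * N_T * (N_T + 1)."""
        n_bins, n_t = len(self.gram), len(self.matched[0])
        return 2 * n_bins * n_t * (n_t + 1)


if __name__ == "__main__":
    acc = ChipLevelAccumulator(n_bins=1, n_t=2)
    acc.update([[[1 + 1j, 2 + 0j]]], [[3 + 0j]])   # round 1, N_R = 1
    acc.update([[[0.5 + 0j, -1j]]], [[1j]])        # round 2
    print(acc.gram[0], acc.matched[0], acc.memory_real_values())
```

After any number of rounds, the accumulated sums equal the Gram matrix and matched-filter output of the stacked $kN_R$-antenna model, while the storage stays at $2T_cN_T(N_T+1)$ real values, which matches the chip-level memory figure quoted in the complexity analysis.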
These issues are discussed in detail in Section IV. The inverse DFT (IDFT) is then applied to $\mathbf{z}_f^{(k)}$ to obtain the equalized time domain chip sequence. After despreading, the extrinsic LLR value $\phi^{(e)}_{t,j,m}$ corresponding to coded and interleaved bit $b_{t,j,m}$, $\forall\, t, j, m$, is computed from the despreading module output $r_{t,j}$, the corresponding gain, and the residual interference variance $\theta$. $\phi^{(a)}_{t,j,m'}$ denotes the a priori LLR value corresponding to $b_{t,j,m'}$. $\lambda_{m'}\{s\}$ is an operator that extracts the $m'$th bit labeling symbol $s \in S$, and $S^m_\beta$ is the set of symbols whose $m$th bit is equal to $\beta$, i.e., $S^m_\beta = \{s : \lambda_m\{s\} = \beta\}$. The obtained extrinsic LLR values are de-interleaved and fed to the SISO decoder. The proposed low complexity algorithm is summarized in Table I.
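A soft demapper of the kind described above can be sketched as follows. The expression used here is the standard extrinsic LLR for a Gaussian-approximated equalizer output and is an assumption, not a transcription of the paper's equation: each candidate symbol is scored by $-|r - g s|^2/\theta$ plus the a priori LLRs of the other bits that equal one, and the LLR is the difference of the log-sums over the two symbol subsets. The names `extrinsic_llrs` and `log_sum_exp`, and the dictionary layout of the constellation, are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def extrinsic_llrs(r, gain, variance, constellation, apriori):
    """Extrinsic LLRs of the M bits labeling one despread symbol.

    r             despreading module output (complex)
    gain          equivalent gain applied to the symbol (assumed real or complex)
    variance      residual interference plus noise variance (assumed > 0)
    constellation dict mapping bit tuples (b_0, ..., b_{M-1}) to symbols
    apriori       a priori LLRs, one per bit position
    """
    n_bits = len(apriori)
    llrs = []
    for m in range(n_bits):
        ones, zeros = [], []
        for bits, s in constellation.items():
            metric = -abs(r - gain * s) ** 2 / variance
            # Only the other bit positions contribute a priori information.
            metric += sum(bits[q] * apriori[q] for q in range(n_bits) if q != m)
            (ones if bits[m] == 1 else zeros).append(metric)
        llrs.append(log_sum_exp(ones) - log_sum_exp(zeros))
    return llrs


if __name__ == "__main__":
    a = 1 / math.sqrt(2)
    # Gray-labeled QPSK used purely as an example mapping.
    qpsk = {(b0, b1): complex(a * (1 - 2 * b0), a * (1 - 2 * b1))
            for b0 in (0, 1) for b1 in (0, 1)}
    print(extrinsic_llrs(qpsk[(0, 1)], 1.0, 0.5, qpsk, [0.0, 0.0]))
```

With this sign convention a positive LLR favors bit value one, so a noiseless observation of the symbol labeled (0, 1) yields a negative first LLR and a positive second LLR.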

B. Symbol-Level Turbo Packet Combining
In this combining scheme, the receiver performs chip-level space-time frequency domain equalization separately for each ARQ round, then combines multiple transmissions at the level of the soft demapper. At each iteration of ARQ round $k$, soft ICI cancellation and MMSE filtering are performed similarly to (17) using communication model (9). Extrinsic information is computed using the despreading module outputs corresponding to all ARQ rounds. In general, this requires the inversion of the $k \times k$ covariance matrix of residual interference plus noise. By observing that the despreading module outputs obtained at different transmissions are independent, the extrinsic LLR value $\phi^{(e)}_{t,j,m}$ corresponding to coded and interleaved bit $b_{t,j,m}$ can be expressed in terms of a per-symbol metric $\xi^{(k)}_{t,j}(s)$, $s \in S$, which is recursively computed over ARQ rounds starting from $\xi^{(0)}_{t,j}(s) = 0$ (24). Note that this recursive implementation relaxes both the complexity and memory requirements. The proposed low complexity algorithm is summarized in Table II.
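The recursive metric update used for symbol-level combining can be sketched as below. The form of the update is an assumption made for illustration: under independent despreader outputs across rounds, the joint Gaussian exponent is a sum of per-round terms $|r^{(k)} - g^{(k)} s|^2/\theta^{(k)}$, so it suffices to keep one running real value per candidate symbol. The class name `SymbolLevelCombiner` and its methods are hypothetical.

```python
class SymbolLevelCombiner:
    """Running per-symbol metrics for one (antenna, symbol index) pair."""

    def __init__(self, constellation):
        self.constellation = list(constellation)
        self.xi = [0.0] * len(self.constellation)   # metric before round 1 is 0

    def combined_metrics(self, r, gain, variance):
        """Metrics including the current round, without storing its output.

        Can be called at every turbo iteration of the current round, since
        the stored values only cover the previous rounds.
        """
        return [x + abs(r - gain * s) ** 2 / variance
                for x, s in zip(self.xi, self.constellation)]

    def close_round(self, r, gain, variance):
        """Commit the final despreader output of the round to the running sum."""
        self.xi = self.combined_metrics(r, gain, variance)


if __name__ == "__main__":
    combiner = SymbolLevelCombiner([1 + 0j, -1 + 0j])   # BPSK as a toy example
    combiner.close_round(r=0.2 + 0j, gain=1.0, variance=1.0)
    print(combiner.combined_metrics(r=0.9 + 0j, gain=1.0, variance=0.5))
```

The combined metrics would then replace the single-round distance in the demapper. Storage is one real value per candidate symbol, i.e., $2^M$ values per transmitted symbol, regardless of how many rounds have been combined.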

A. Complexity Evaluation
In this subsection, we briefly analyze both the computational cost and memory requirements of the proposed packet combining schemes. First, note that both algorithms have nearly identical implementations. The only difference comes from the packet combining steps, which have approximately the same implementation cost. In the following, we focus on the number of arithmetic additions and memory required to perform recursions (20), (21), and (24).
The main idea in the proposed algorithms is to exploit the diversity available in multiple transmissions without explicitly storing the required soft channel outputs (i.e., signals and CFRs) or decisions (i.e., filter outputs) corresponding to all ARQ rounds. This is performed with the aid of recursions (20), (21), and (24), and translates into a memory requirement of $2T_cN_T(N_T+1)$ and $T_sN_T2^M$ real values for chip-level and symbol-level turbo combining, respectively. Note that in both schemes, the required memory size is insensitive to the ARQ delay. The number of rounds only influences the number of arithmetic additions required in the update procedures corresponding to recursions (20), (21), and (24). At each ARQ round, the chip-level turbo combining algorithm involves the additions needed to update the variables of recursions (20) and (21). The symbol-level turbo combining scheme requires $T_sN_TN_{iter}2^M$ arithmetic additions to update $\xi^{(k)}_{t,j}(s)$ at each round, where $N_{iter}$ denotes the number of turbo iterations.
Table III summarizes the maximum number of arithmetic additions and memory size required by both schemes. Note that the number of additions does not have a great impact on receiver computational complexity. The required memory size is the major implementation constraint to take into account when choosing between chip-level and symbol-level combining. In the case of low-order modulations (i.e., $M \le 2$), symbol-level combining requires less memory than chip-level combining independently of the spreading factor $N$, number of codes $C$, and number of transmit antennas $N_T$.
For high-order modulations (i.e., $M \ge 3$), the required memory size mainly depends on system parameters. For instance, when $M = 4$, $N_T = 4$, and the system is fully loaded (i.e., $N = C$), chip-level combining requires less memory than symbol-level combining. When the load factor is reduced to 50% (i.e., $C/N = 1/2$), symbol-level combining becomes more attractive than chip-level combining.
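The memory comparison above can be checked numerically with the short sketch below. It uses the two memory expressions stated in the text and assumes that the chip block length and the symbol block length are related by $T_c = T_s N / C$ (each channel use carries $C$ symbols per antenna over $N$ chips); this relation is an assumption of the example, chosen because it is consistent with the full-load and half-load cases discussed. Function names and the value $T_s = 128$ are illustrative.

```python
def chip_level_memory(T_s, N_T, N, C):
    """Real values stored by chip-level combining: 2 * T_c * N_T * (N_T + 1)."""
    if (T_s * N) % C:
        raise ValueError("T_s * N must be a multiple of C")
    T_c = T_s * N // C            # assumed relation between T_c and T_s
    return 2 * T_c * N_T * (N_T + 1)


def symbol_level_memory(T_s, N_T, M):
    """Real values stored by symbol-level combining: T_s * N_T * 2^M."""
    return T_s * N_T * 2 ** M


if __name__ == "__main__":
    T_s, N = 128, 16              # illustrative block length and spreading factor
    for label, C in (("full load", 16), ("half load", 8)):
        chip = chip_level_memory(T_s, N_T=4, N=N, C=C)
        symbol = symbol_level_memory(T_s, N_T=4, M=4)
        print(label, "chip-level:", chip, "symbol-level:", symbol)
```

Under this assumption the chip-level figure scales with $N/C$, which explains why lowering the load factor shifts the balance toward symbol-level combining, while the symbol-level figure grows exponentially with the modulation order $M$.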

B. Performance Evaluation
In this subsection, we evaluate the throughput performance of the proposed CP-CDMA MIMO ARQ turbo combining schemes. Following [17], we define the throughput as $\eta = \mathbb{E}\{\mathcal{R}\} / \mathbb{E}\{\mathcal{K}\}$, where $\mathcal{R}$ is a random variable (RV) that takes the value $R$ when the packet is correctly received or zero when the packet is erroneous after $K$ ARQ rounds, and $\mathcal{K}$ is a RV that denotes the number of rounds used for transmitting one data packet. We use Monte Carlo simulations for evaluating $\eta$.
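A Monte Carlo estimate of the throughput can be organized as in the sketch below, assuming the throughput is the ratio of the mean reward to the mean number of rounds per packet (the usual renewal-reward form, taken here as an assumption). The callable `round_succeeds` is a hypothetical placeholder for the complete transmit, combine, and decode chain at a given round; it is not part of the paper.

```python
import random


def estimate_throughput(rate, max_rounds, round_succeeds, n_packets, seed=0):
    """Estimate mean reward divided by mean number of ARQ rounds.

    rate            reward earned when a packet is decoded correctly
    max_rounds      ARQ delay (maximum number of rounds per packet)
    round_succeeds  callable (k, rng) -> bool, True if decoding succeeds at
                    round k (hypothetical stand-in for the receiver chain)
    """
    rng = random.Random(seed)
    total_reward = 0.0
    total_rounds = 0
    for _ in range(n_packets):
        for k in range(1, max_rounds + 1):
            total_rounds += 1
            if round_succeeds(k, rng):
                total_reward += rate      # packet delivered at round k
                break
        # A packet still in error after max_rounds earns no reward.
    return total_reward / total_rounds


if __name__ == "__main__":
    # Toy success model: each round succeeds with a probability growing in k.
    toy = lambda k, rng: rng.random() < min(1.0, 0.3 * k)
    print(estimate_throughput(rate=2.0, max_rounds=4,
                              round_succeeds=toy, n_packets=10000))
```

Dividing the summed reward by the summed number of rounds gives the same value as dividing the two sample means, since both are taken over the same number of packets.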
The $E_c/N_0$ ratio appearing in all figures is the signal to noise ratio (SNR) per chip per receive antenna. We use Max-Log-maximum a posteriori (MAP) for SISO decoding. The number of turbo iterations is set to three. In all scenarios, we consider the matched filter bound (MFB) throughput performance of the corresponding CP-CDMA MIMO ARQ channel to evaluate the ICI cancellation capability achieved by the proposed techniques.
In Fig. 2, we report throughput performance curves for a balanced MIMO configuration, i.e., $N_R = N_T = 2$. We observe that both combining schemes have similar throughput performance for quarter and half loads. In the case of full load, chip-level combining outperforms symbol-level combining in the low SNR region. For instance, the performance gap is around 0.6 dB at $\eta = 12.5$ bit/s/Hz throughput. Also, note that for all configurations, the slopes of the throughput curves of both techniques are asymptotically similar to that of the MFB. Therefore, both combining schemes asymptotically achieve the diversity order of the corresponding CP-CDMA MIMO ARQ channel.
In Fig. 3, we provide throughput curves when only one receive antenna ($N_R = 1$) is used, i.e., an unbalanced MIMO configuration. In this scenario, chip-level combining clearly outperforms symbol-level combining for half and full loads. The performance gap is about 3 dB at $\eta = 12.5$ bit/s/Hz for a full load configuration. This suggests that chip-level turbo combining can be used for high speed downlink CDMA MIMO systems with high loads. Note that both techniques fail to achieve the full diversity order in the case of half and full loads.

V. CONCLUSIONS
In this paper, efficient turbo receiver schemes for multi-code CP-CDMA transmission with ARQ operating over a broadband MIMO channel were investigated. Two packet combining algorithms were introduced. The chip-level technique performs packet combining jointly with chip-level MMSE FDE. The symbol-level scheme combines multiple transmissions at the level of the soft demapper. We analyzed the complexity and memory size required by both techniques, and showed that, from an implementation point of view, chip-level combining is more attractive than symbol-level combining for systems with high modulation order and load factor (number of codes with respect to the spreading factor). We also investigated the throughput performance. Simulations demonstrated that both techniques