System value-based optimum spreading sequence selection for high-speed downlink packet access (HSDPA) MIMO

This article proposes the use of system value-based optimization with a symbol-level minimum mean square error equalizer and successive interference cancellation, which achieves a system value upper bound (UB) close to the Gaussian UB for the high-speed downlink packet access system without incurring any significant computational cost. It is shown that by removing multi-code channels with low gains, the available energy is used more efficiently and a higher system throughput, close to the system value UB, is observed. The performance of the developed method is comparable to that of the orthogonal frequency division multiplexing-based long-term evolution scheme, without the need to build any additional infrastructure. This reduces the cost of the system to both operators and consumers without sacrificing quality.


Introduction
Wireless communication systems known as multiple-input multiple-output (MIMO) systems, which have multiple transmit and receive antennas, can be used to exploit the diversity and the multiplexing gains of wireless channels to increase their spectral efficiency. As an extension to Shannon's capacity [1], the MIMO channel capacity bound was obtained by Foschini and Gans [2] and Telatar [3] independently. Assuming that perfect channel state information (CSI) is available at the transmitter, the MIMO system capacity upper bound (UB) can be obtained using the eigen modes of the MIMO channel matrix by performing water-filling (WF) over the spatial sub-channels. An important MIMO system design consideration is to operate the system close to its capacity UB. The objective of this article is to show how the high-speed downlink packet access (HSDPA) MIMO system can operate close to its capacity UB.
The third generation partnership project (3GPP) has developed the HSDPA system, given in the Release 5 specification [4] of the Universal Mobile Telecommunications System, as a multi-code wide-band code division multiple access (CDMA) system. To further increase the data rate, the HSDPA system introduced new features [5] such as adaptive modulation and coding and fast scheduling. The standardization of the Dual Stream Transmit Diversity (D-TxAA) HSDPA MIMO system for a single user in 3GPP Release 7 [6] further improved the downlink throughput without requiring a new spectrum or any additional bandwidth.
*Correspondence: m.gurcan@imperial.ac.uk. Intelligent Systems and Networks Group, Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, UK
In [7], measurements are carried out to evaluate the performance of the standardized 3G HSDPA MIMO system with a CDMA transmission. It is shown that the current systems are utilizing only about 40% of the available downlink capacity. The capacity curve is approximately 10 dB away from the capacity UB [8] at high signal-to-noise ratios. There is an opportunity to improve the HSDPA system capacity, when operating over frequency selective channels, by enhancing the equal energy allocation scheme specified in the HSDPA MIMO standard [6].
The frequency selectivity problem, which causes a large drop in throughput for the HSDPA due to inter-symbol interference (ISI), is not a major problem for the orthogonal frequency division multiplexing (OFDM)-based systems [long-term evolution (LTE) advanced and WiMAX], as they use a guard period to deal with the ISI. If the throughput reduction problem is not solved in the HSDPA system, the OFDM-based systems will have the upper hand over the HSDPA system in urban environments. The HSDPA single-input single-output (SISO) system has been the main focus of the study in [9], which provides tools to combat frequency selectivity and bring the HSDPA SISO performance close to that of the OFDM-based systems. Should the ISI problem be solved for the HSDPA MIMO-based systems by using throughput optimization methods, the current HSDPA MIMO system would achieve throughputs close to the LTE advanced without the need to change the whole infrastructure. This is the focus of the current investigation.
http://jwcn.eurasipjournals.com/content/2013/1/74

Current investigation and related work
The downlink throughput optimization for the HSDPA multi-code CDMA system [10] considers the signature sequence and the power allocation for downlink users. 3GPP standardized an approach to spread the transmission symbols by using a given fixed set size of orthogonal variable spreading factor (OVSF) signature sequences. A MIMO system requires a signature sequence set size higher than the given single set of OVSF signature sequences available for a SISO system. 3GPP standardized a method to increase the OVSF set size by multiplying the given set with precoding weights and then concatenating the weighted sets of the spreading sequences. Each concatenated spreading sequence is used to transmit one symbol and is orthogonal to the remaining set of spreading sequences available at the transmitter for the transmission of other symbols. However, the spreading sequences' orthogonality is lost at the receiving end after transmission over frequency selective multipath channels. In [11,12], it is proposed that a linear minimum mean square error (MMSE) equalizer followed by a de-spreader could be used to restore partial orthogonality between the receiver de-spreading and the matched filter sequences in the detection process after receiving signals transmitted over a multipath channel. Recent developments have shown that linear MMSE equalizers suffer from a self-interference (SI) problem caused by ISI and multiple access interference, when operating over multipath channels. SI reduces the system throughput performance, but good receiver design will minimize the degradation caused by the SI. When encountering SI various versions of interference cancellers could be used in conjunction with non-optimal receivers to improve the system throughput for the HSDPA system over frequency selective parallel channels. 
In [13], it is shown that a successive interference cancellation (SIC) scheme performs better than a parallel interference cancellation scheme, when the signal-to-noise ratio (SNR) differs over each frequency selective parallel channel. The works reported in [14][15][16] focus on the use of linear MMSE equalizers and SIC in reduction of the overall SI.
A two-stage SIC detection scheme with transmitter power optimization is examined in [16,17] to improve the throughput performance for multi-code downlink transmission. In [18], the power at the transmitter and a two-stage SIC receiver are jointly and iteratively optimized for a multi-code MIMO system. However, at each iteration of the SIC, the equalizer coefficient and the power allocation calculations require an inversion of a covariance matrix for the received signal. The dimension of the covariance matrix is usually large and, as such, the iterative power allocation, the linear MMSE equalizer and the SIC implementations at the receiver become computationally expensive.
The focus of the article is on an HSDPA MIMO-based radio downlink system, which has a number of parallel SISO or MIMO frequency selective channels over which data are transmitted. The data are represented by a number of data symbols, which are spread by a group of spreading sequences when using the HSDPA system either with or without a SIC scheme. A set of signature sequences generated from the OVSF codes with precoding, as specified in the 3GPP Release 7, will be considered. A receiver with a symbol-level linear MMSE equalizer will be examined to jointly optimize the transmission energy allocation and the receiver for a single user system either with or without a SIC.
At the receiver, each spreading sequence $\mathbf{s}_k$ has a system value $\lambda_k$, which is associated with the SNR $\gamma_k$ at the output of each de-spreading unit. The system value $\lambda_k$ for each spreading sequence depends on the transmission multipath channel and also on the availability of the SIC scheme. The implementation of each HSDPA MIMO system can be a non-SIC scheme as shown in Figure 1 or a SIC-based receiver scheme as shown in Figure 2. The non-SIC and the SIC-based receivers have different ways of determining the system values for a given number of spreading sequences, a given frequency selective multipath channel and a total transmission energy. This article will outline MIMO transceiver structures for the non-SIC and SIC-based receivers to introduce the system value concept. The system value UB for the HSDPA MIMO system will be presented. This system value capacity UB is close to the capacity of an additive white Gaussian noise channel. It will also be shown how the system values will be used to determine the total transmission rates for both the non-SIC and the SIC-based SISO and MIMO systems to maximize the total transmission rate.
The objective of the total transmission rate maximization for a given total number of spreading sequences will be to bring the downlink throughput close to the system value UB. This will be achieved by retaining the spreading sequences with the highest system values for a given total received SNR corresponding to a given total transmission energy $E_T$. A given number of sequences will be ordered so that the corresponding system values are used at the transmitter in ascending order. The optimum number $K^*$ of signature sequences will be determined to select the first $K^*$ ordered signature sequences, which maximize the transmission throughput. The receivers will operate in a sequence, where the detection is ordered in the descending order of the corresponding system values for the SIC-based systems.
As shown in [19], the WF optimization is generally used for parallel channels with different sub-channel gains to provide optimum sub-channel selection, energy distribution, and also channel ordering. The iterative WF sum-capacity optimization is extensively examined in [20][21][22][23] and is proven to converge to the sum capacity UB of the multiple-access channel [20] to provide a UB for non-discrete rates. Other sub-channel removal methods have been studied in [24][25][26] to determine the number of active data streams. In [24], the eigen decomposition of the covariance matrix is used to isolate the "bad" data streams so that the sum MSE is minimized. In [25], it is suggested that low signal-to-interference and noise ratio (SINR) streams will be switched off to focus the available power on the remaining streams during the iterative power allocation process. In [26], the removal of sub-channels is proposed to improve the capacity when the rounding of the discrete rate does not improve the system throughput. The WF and channel removal schemes neither use the system value concept for signature sequence selection nor use rate adjustment to maximize the total throughput.
Figure 2 Receiver system block diagram for SIC-based detection. The block diagram for the SIC-based HSDPA MIMO system is given.
In this article, three bit rate adjustment methods will be considered with the appropriate energy allocation schemes. These methods will be applicable to both SIC and non-SIC-based receivers, when using discrete and non-discrete rates. Initially, an iterative WF algorithm will be proposed with a sub-channel removal for the selection of signature sequences. The system values will be used to maximize the throughput for non-discrete rate allocation by accounting for the channel SINRs corresponding to the received signature sequences instead of using only the channel gains to find the water levels. When using discrete rates the signature sequence selection scheme will be further extended to optimize the total rate for the HSDPA system downlink. The system values will be used to select an optimum number of spreading signature sequences from a given total number of sequences without any prior energy allocation. The chosen optimum number of sequences will be loaded with discrete rates using both the equal SINR allocation methods proposed in this article and the equal energy allocation schemes as specified in the current HSDPA standard. The equal SINR and energy loading schemes will use the mean and the minimum of system values for a given total energy to transmit the symbols at the required discrete rates. These three methods will be named as the iterative WF-based continuous bit loading method, the mean system value-based discrete bit loading method, and the minimum system value-based discrete bit loading method.
The mean and minimum system value-based methods will require different and equal transmission energy allocations, respectively. The iterative energy allocation methods will be described for the mean system value-based discrete bit loading systems.
The link throughput improvements for these three methods will be described, when considering the receiver design, power control, and signature sequence selection algorithms. A complexity reduction method will be presented for covariance matrix inversions. The results show that the HSDPA MIMO system, using the optimization methods proposed in this article, achieves a system throughput close to the system value capacity UB for the frequency selective channels. The results are then comparable with the LTE system, without incurring the cost of building new infrastructure.
In Section 3, two HSDPA MIMO system models will be described for receivers with the non-SIC and the SIC-based MMSE de-spreading units. In Section 4, the system value formulation will be presented and the MMSE filter coefficient calculations will be given. The system value UB concept for both the non-SIC and the SIC-based receivers will be presented in Section 5. The formulation of a simplified iterative covariance matrix for use in the design of the SIC-based receivers with MMSE equalizers will be described in Appendix 2 to support the material presented in Section 5. The system value-based sum capacity/throughput maximization methods for optimum signature sequence selection, energy allocation, and rate maximization will be described in Section 6. These schemes will be based on the iterative WF and the mean and the minimum system value optimization methods. Finally, the results will be described in Section 7 before the conclusions are given in Section 8.

Notation
$a$ is a scalar, $\mathbf{a}$ is a column vector, and $\mathbf{A}$ is a matrix. The identity matrix with dimension $L$ is given as $\mathbf{I}_L$.

Transmitter and a non-SIC-Based receiver model
The HSDPA MIMO system model used in the following sections will be briefly described in this section for both the non-SIC and the SIC-based receivers. Initially, a non-SIC-based multi-code CDMA MIMO downlink transmission system will be considered with $N_T$ transmit antennas and $N_R$ receive antennas, with their respective indices represented by $n_t$ and $n_r$. Given the spreading factor $N$ of the system, the maximum number $K$ of spreading sequences satisfies the relationship $K \le \min(N_T, N_R)\,N$, where each spreading sequence index is represented by $k$. When selecting the optimum number $K^*$ of spreading sequences, weak channels corresponding to a specific set of signature sequences will be excluded to maximize the total rate. The system under consideration will operate with the selected optimum number $K^*$ of spreading sequences. Each spreading sequence will transmit a symbol operating at a discrete rate chosen from a set of rates according to the CSI updated at regular transmission time intervals (TTIs). In the system model, each parallel binary bit packet $\mathbf{u}_k$ for $k = 1, \ldots, K^*$ of length $N_U$ will be encoded to produce a length-$B$ vector $\mathbf{d}_k$ and mapped to quadrature amplitude modulation (QAM) symbols each carrying $b = \log_2 M$ bits, where $M$ is the chosen constellation size. The encoding rate $r_{\mathrm{code}} = N_U / B$ will be used to obtain a realizable discrete rate of $b_p = r_{\mathrm{code}} \times \log_2 M$ bits per symbol, where $p = 1, \ldots, P$ indexes the available discrete bit rates. The bit rate for each spreading sequence is represented by $b_{p_k}$ for $k = 1, \ldots, K^*$.
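A non-SIC-based system model is shown in Figure 1. In the CDMA system, the number of symbols transmitted per packet is given by $N^{(x)} = \frac{\mathrm{TTI}}{N T_c}$, where $T_c$ is the chip period and $N T_c$ is the symbol period. In each parallel channel, the mapped packet of symbols corresponding to $\mathbf{d}_k$ over 1 TTI is represented by an $N^{(x)}$-long vector $\mathbf{x}_k$ for $k = 1, \ldots, K^*$, where each symbol in $\mathbf{x}_k$ carries unity average energy. The symbols over the $K^*$ parallel channels are stored in an $N^{(x)} \times K^*$-dimensional matrix
$$\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_{K^*}] = [\mathbf{y}(1), \ldots, \mathbf{y}(\rho), \ldots, \mathbf{y}(N^{(x)})]^T,$$
where $\mathbf{y}(\rho) = [y_1(\rho), \ldots, y_{K^*}(\rho)]^T$ contains the symbols transmitted over the $K^*$ channels in the symbol period $\rho$. Each spreading sequence will have an energy allocated, where the assigned energies are stored in a $K^* \times K^*$-dimensional amplitude matrix $\mathbf{A} = \mathrm{diag}(\sqrt{E_1}, \ldots, \sqrt{E_{K^*}})$. The energy weighted symbols will then be spread by signature sequences (spreading codes), which are represented by an $(N_T N) \times K^*$ signature sequence matrix
$$\mathbf{S} = [\mathbf{S}_1^T, \ldots, \mathbf{S}_{N_T}^T]^T = [\mathbf{s}_1, \ldots, \mathbf{s}_{K^*}],$$
where $|\mathbf{s}_k|^2 = 1$ and $\mathbf{S}_{n_t} = [\mathbf{s}_{1,n_t}, \mathbf{s}_{2,n_t}, \ldots, \mathbf{s}_{K^*,n_t}]$ is the $N \times K^*$ spreading sequence matrix of the $n_t$th antenna. The length-$N$ transmit signal vector at antenna $n_t$ is given by $\mathbf{z}_{n_t}(\rho) = \mathbf{S}_{n_t}\mathbf{A}\mathbf{y}(\rho)$ for the symbol period $\rho$. The vector $\mathbf{z}_{n_t}(\rho)$ will then be fed to a pulse shaping filter at integer multiples of $T_c$ before up-conversion to the desired carrier frequency. The length-$(N N_T)$ MIMO transmit signal vector is given by
$$\mathbf{z}(\rho) = [\mathbf{z}_1^T(\rho), \ldots, \mathbf{z}_{N_T}^T(\rho)]^T = \mathbf{S}\mathbf{A}\mathbf{y}(\rho).$$
Assuming the clocks at the transmitter and the receiver are fully synchronized, the signals arriving at the receive antennas will first be down-converted to the baseband before being sampled every $T_c$ at the output of the receiver chip matched filter.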
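As a quick numerical check of the packet dimensions above, the following Python sketch evaluates $N^{(x)} = \mathrm{TTI}/(N T_c)$ and $b_p = r_{\mathrm{code}} \log_2 M$. The values used (a 2 ms TTI, a 3.84 Mcps chip rate and $N = 16$) are the usual HSDPA figures and are assumed here rather than taken from this article; the rate-1/2 code with 16-QAM is purely illustrative, and the function names are hypothetical.

```python
from fractions import Fraction
from math import log2


def symbols_per_packet(tti_s: Fraction, spreading_factor: int, chip_rate_hz: int) -> int:
    """N^(x) = TTI / (N * T_c), where T_c = 1 / chip_rate_hz."""
    n_x = tti_s * chip_rate_hz / spreading_factor
    if n_x.denominator != 1:
        raise ValueError("TTI is not an integer number of symbol periods")
    return int(n_x)


def bits_per_symbol(n_u: int, b_len: int, m: int) -> float:
    """b_p = r_code * log2(M), with r_code = N_U / B."""
    return (n_u / b_len) * log2(m)


# Assumed HSDPA numerology: TTI = 2 ms, 3.84 Mcps, spreading factor N = 16.
n_x = symbols_per_packet(Fraction(2, 1000), 16, 3_840_000)
# Illustrative rate-1/2 code with 16-QAM (N_U = 960 information bits, B = 1920 coded bits).
b_p = bits_per_symbol(n_u=960, b_len=1920, m=16)
print(n_x, b_p)
```

Exact rational arithmetic is used for the symbol count so that a TTI that is not a whole number of symbol periods is reported instead of being silently rounded.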
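The receiver matched filtered signal vector for each symbol period will be represented by an $N_R(N + L - 1)$-long vector $\mathbf{r}(\rho) = [\mathbf{r}_1^T(\rho), \ldots, \mathbf{r}_{n_r}^T(\rho), \ldots, \mathbf{r}_{N_R}^T(\rho)]^T$, where $L$ is the number of resolvable paths in a multipath wireless channel. The samples at the output of the chip matched filter of the $n_r$th antenna are represented by an $(N + L - 1)$-length vector $\mathbf{r}_{n_r}(\rho) = [r_{n_r,1}(\rho), \ldots, r_{n_r,(N+L-1)}(\rho)]^T$. The $N_R(N + L - 1) \times N^{(x)}$-dimensional matched filter matrix $\mathbf{R}$ is formed by taking $\mathbf{r}(\rho)$ as its $\rho$th column such that $\mathbf{R} = [\mathbf{r}(1), \ldots, \mathbf{r}(\rho), \ldots, \mathbf{r}(N^{(x)})]$. The $N_R(N + L - 1) \times K^*$-dimensional MMSE linear de-spreading filter matrix $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_{K^*}]$ contains the de-spreading filter coefficients $\mathbf{w}_k$, each of which is calculated using (10) for $k = 1, \ldots, K^*$. With $\mathbf{r}(\rho)$ and $\mathbf{W}$, the estimate $\hat{\mathbf{y}}(\rho)$ of the transmitted symbol vector $\mathbf{y}(\rho)$ can be found as follows:
$$\hat{\mathbf{y}}(\rho) = \mathbf{W}^H \mathbf{r}(\rho).$$
The vectors $\hat{\mathbf{y}}(\rho)$ are used to form the $N^{(x)} \times K^*$ matrix $\hat{\mathbf{X}} = [\hat{\mathbf{x}}_1, \ldots, \hat{\mathbf{x}}_{K^*}] = [\hat{\mathbf{y}}(1), \ldots, \hat{\mathbf{y}}(N^{(x)})]^T$ by using $\hat{\mathbf{x}}_k^T = \mathbf{w}_k^H \mathbf{R}$ to de-spread the received signal of the $k$th channel. The de-spread signals pass through the decision device, where the signals are quantized, de-mapped and decoded to form binary data vectors $\mathbf{u}_{k,D}$ for $k = 1, \ldots, K^*$.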
At the output of each receiver, the mean square error (MSE) between the transmitted symbol $y_k(\rho)$ and the estimated symbol $\hat{y}_k(\rho)$ is given by $\varepsilon_k = E\{|\hat{y}_k(\rho) - y_k(\rho)|^2\}$. When the MSE is minimized, it has a relationship with the SINR $\gamma_k$ and the system value $\lambda_k$ as $\varepsilon_k = \frac{1}{1+\gamma_k} = 1 - \lambda_k$. Therefore, the system value is given by
$$\lambda_k = 1 - \varepsilon_k = \frac{\gamma_k}{1 + \gamma_k}.$$

The SIC-based receiver model
Figure 2 illustrates the system model for a SIC-based receiver, which collects the received signals $\mathbf{r}_1(\rho)$ to $\mathbf{r}_{N_R}(\rho)$ to formulate the received signal vector $\mathbf{r}(\rho)$. The receiver processes and cancels the signals channel by channel to ensure that the SI is minimized. Starting from channel $K^*$ and setting the received signal matrix $\mathbf{R}_{K^*} = \mathbf{R}$, the receiver obtains the reduced matrix $\mathbf{R}_{k-1}$ for the next channel by subtracting from $\mathbf{R}_k$ the contribution $\boldsymbol{\Phi}_k$ of the $k$th channel. This contribution is formed from $\hat{\mathbf{x}}_{k,D}$, the detected stream of the current symbol period, together with the detected streams with ISI symbols received in the previous and the next symbol periods. The $N_R(N + L - 1)$-dimensional receiver matched filter sequences $\mathbf{q}_k$, $\mathbf{q}_{k,1}$ and $\mathbf{q}_{k,2}$ that weight these streams are given in (8), (31), and (32) for the current, previous, and next symbols, respectively.
At each $k$th channel, the estimated symbol vector $\hat{\mathbf{x}}_{k,D}$ is generated by using each MMSE de-spreading vector $\mathbf{w}_k$ from (15) to yield a de-spread signal vector $\hat{\mathbf{x}}_k^T = \mathbf{w}_k^H \mathbf{R}_k$ and an estimated bit stream $\mathbf{u}_{k,D}$. The decoded bit vector $\mathbf{u}_{k,D}$ is re-coded at the receiver and re-modulated to regenerate the transmit symbol vector $\hat{\mathbf{x}}_{k,D}$ at the output of the decision device. The vector $\hat{\mathbf{x}}_{k,D}$ is used to form $\boldsymbol{\Phi}_k$, which is required to generate $\mathbf{R}_{k-1}$ for the next channel. This process of cancelling the detected symbols continues from $k = K^*$ to $k = 1$. The next section will introduce the system value and the de-spreading filter coefficient calculations for both the SIC and the non-SIC-based systems.

System value and MMSE de-spreading filter coefficient formulations
In this section, the system values and the corresponding MMSE de-spreading filter coefficients are expressed in terms of the received signal vector r(ρ).

System values for a non-SIC-based receiver
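The received signal vector $\mathbf{r}(\rho)$ over the symbol period $\rho$ is given in terms of the transmitted signal vector $\mathbf{z}(\rho)$ as
$$\mathbf{r}(\rho) = \mathbf{H}\mathbf{z}(\rho) + \mathbf{H}_1\mathbf{z}(\rho - 1) + \mathbf{H}_2\mathbf{z}(\rho + 1) + \mathbf{n}(\rho), \tag{4}$$
and the received signal matrix $\mathbf{R}$ is formed by collecting $\mathbf{r}(\rho)$ for $\rho = 1, \ldots, N^{(x)}$. Here $\mathbf{H}_1$ and $\mathbf{H}_2$ are the channel matrices for the previous and the next symbol blocks defined below, and the $N_R(N + L - 1)$-dimensional vector $\mathbf{n}(\rho)$ contains the concatenated noise samples at the output of the receiver chip matched filters. The $N_R(N + L - 1) \times N_T N$ matrix $\mathbf{H}$ represents the overall MIMO channel convolution matrix formed as follows:
$$\mathbf{H} = \begin{bmatrix} \mathbf{H}^{(1,1)} & \cdots & \mathbf{H}^{(1,N_T)} \\ \vdots & \ddots & \vdots \\ \mathbf{H}^{(N_R,1)} & \cdots & \mathbf{H}^{(N_R,N_T)} \end{bmatrix}.$$
The channel convolution matrix between each pair of antennas, $\mathbf{H}^{(n_r,n_t)}$, is determined by the channel impulse response vector $\mathbf{h}^{(n_r,n_t)} = [h_0^{(n_r,n_t)}, \ldots, h_{L-1}^{(n_r,n_t)}]^T$. It is assumed that the signals from each $n_t$th transmit antenna to each $n_r$th receive antenna undergo the same channel condition for the packet duration with $L$ resolvable paths, and that the channel conditions are obtained from the feedback of pilot signals. The corresponding $(N + L - 1) \times N$ channel convolution matrix between the pair of antennas is formed by placing $\mathbf{h}^{(n_r,n_t)}$ in each column, delayed by one chip from one column to the next:
$$\mathbf{H}^{(n_r,n_t)} = \begin{bmatrix} h_0^{(n_r,n_t)} & & \\ \vdots & \ddots & \\ h_{L-1}^{(n_r,n_t)} & & h_0^{(n_r,n_t)} \\ & \ddots & \vdots \\ & & h_{L-1}^{(n_r,n_t)} \end{bmatrix}.$$
The spatiotemporal MIMO channel matrices for the previous symbol block and the next symbol block are given as
$$\mathbf{H}_1 = \left(\mathbf{I}_{N_R} \otimes (\mathbf{J}_{N+L-1}^T)^N\right)\mathbf{H}, \qquad \mathbf{H}_2 = \left(\mathbf{I}_{N_R} \otimes \mathbf{J}_{N+L-1}^N\right)\mathbf{H},$$
where $\mathbf{J}_{N+L-1}$ is the $(N + L - 1) \times (N + L - 1)$ shifting matrix with ones on its first sub-diagonal and zeros elsewhere, and $\otimes$ denotes the Kronecker product. When multiplied with a matrix, $(\mathbf{J}_{N+L-1}^T)^N$ shifts the columns of the matrix up by $N$ chips and fills the empty contents with zeros, while $\mathbf{J}_{N+L-1}^N$ shifts the columns of the matrix down by $N$ chips and fills the empty contents with zeros.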
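The block structure of the channel convolution matrix and the two shifted matrices can be sketched numerically. The Python fragment below is a minimal illustration assuming NumPy; the function names and the tap layout `h_taps[n_r][n_t]` are hypothetical, and the Kronecker-product form used for the previous- and next-symbol matrices is an assumption consistent with the shifting behaviour described above.

```python
import numpy as np


def conv_matrix(h, n):
    """(N+L-1) x N convolution matrix: column i holds h delayed by i chips."""
    h = np.asarray(h, dtype=complex)
    out = np.zeros((n + len(h) - 1, n), dtype=complex)
    for i in range(n):
        out[i:i + len(h), i] = h
    return out


def mimo_conv_matrix(h_taps, n):
    """Block matrix with H^(nr,nt) in block row nr and block column nt."""
    return np.block([[conv_matrix(h, n) for h in row] for row in h_taps])


def isi_matrices(h_full, n_r, n, n_paths):
    """Previous-symbol (shift up) and next-symbol (shift down) channel matrices."""
    size = n + n_paths - 1
    # J has ones on the first sub-diagonal, so J^N shifts a column down by N chips.
    j_n = np.linalg.matrix_power(np.eye(size, k=-1), n)
    eye_r = np.eye(n_r)
    h_prev = np.kron(eye_r, j_n.T) @ h_full  # (J^T)^N: shift up by N chips
    h_next = np.kron(eye_r, j_n) @ h_full    # J^N: shift down by N chips
    return h_prev, h_next


# Toy example: one transmit and one receive antenna, L = 2 paths, N = 4 chips.
H = mimo_conv_matrix([[[1.0, 0.5]]], n=4)
H_prev, H_next = isi_matrices(H, n_r=1, n=4, n_paths=2)
```

In this toy case only the last tap of the previous symbol leaks into the first chip of the current window, and only the first tap of the next symbol reaches its last chip, which is what the single non-zero entries of `H_prev` and `H_next` represent.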
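The $N_R(N + L - 1) \times K^*$-dimensional receiver matched filter signature sequence matrix $\mathbf{Q}$ is calculated as follows:
$$\mathbf{Q} = \mathbf{H}\mathbf{S} = [\mathbf{q}_1, \ldots, \mathbf{q}_{K^*}]. \tag{8}$$
The system value for the spread spectrum system based on a receiver without the SIC scheme is given by
$$\lambda_k = E_k\,\mathbf{q}_k^H \mathbf{C}^{-1} \mathbf{q}_k. \tag{9}$$
In (9), the covariance matrix $\mathbf{C}$ is calculated using (30) in terms of $\mathbf{Q}$ and the noise covariance matrix $E\{\mathbf{n}(\rho)\mathbf{n}^H(\rho)\}$ as shown in Appendix 1.
The normalized MMSE de-spreading coefficients $\mathbf{w}_k$ for $k = 1, \ldots, K^*$, when the MSE per channel is minimized, can be formed in terms of $\mathbf{C}$ as shown below:
$$\mathbf{w}_k = \frac{\mathbf{C}^{-1}\mathbf{q}_k}{\mathbf{q}_k^H \mathbf{C}^{-1} \mathbf{q}_k}. \tag{10}$$
These coefficients are then stored in the matrix $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_{K^*}]$.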
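A minimal NumPy sketch of the non-SIC calculation is given below. It assumes the system value takes the form $\lambda_k = E_k \mathbf{q}_k^H \mathbf{C}^{-1} \mathbf{q}_k$ and that the covariance matrix of Appendix 1 (not reproduced in this section) is $\mathbf{C} = \mathbf{Q}\mathbf{A}^2\mathbf{Q}^H + \mathbf{Q}_1\mathbf{A}^2\mathbf{Q}_1^H + \mathbf{Q}_2\mathbf{A}^2\mathbf{Q}_2^H + 2\sigma^2\mathbf{I}$; both the function name and this form of $\mathbf{C}$ are assumptions made for illustration.

```python
import numpy as np


def non_sic_system_values(q, q1, q2, energies, sigma2):
    """Return (lam, w) for a non-SIC MMSE receiver.

    q, q1, q2 hold the matched filter sequences for the current, previous
    and next symbols as columns. The covariance form is an assumption.
    """
    energies = np.asarray(energies, dtype=float)
    a2 = np.diag(energies)  # A^2 = diag(E_1, ..., E_K*)
    c = (q @ a2 @ q.conj().T + q1 @ a2 @ q1.conj().T + q2 @ a2 @ q2.conj().T
         + 2 * sigma2 * np.eye(q.shape[0]))
    c_inv_q = np.linalg.solve(c, q)  # columns are C^{-1} q_k
    quad = np.real(np.einsum("ik,ik->k", q.conj(), c_inv_q))  # q_k^H C^{-1} q_k
    lam = energies * quad  # system values
    w = c_inv_q / quad     # normalised so that w_k^H q_k = 1
    return lam, w


rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))
Z = np.zeros_like(Q)  # toy case without ISI
lam, W = non_sic_system_values(Q, Z, Z, [1.0, 2.0, 0.5], sigma2=0.1)
```

Solving the linear system once for all columns avoids forming the explicit inverse, which is usually the better-conditioned choice when only the products with the sequences are needed.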

System values and MMSE de-spreading filter coefficients for a SIC-based receiver
Similar to the received signal vector $\mathbf{r}(\rho)$ which is constructed in (4), a SIC-based received signal vector is formed to improve the SINR at the output of each receiver. For the SIC scheme, the system value $\lambda_k$ for $k = 1, \ldots, K^*$ is determined using the following equation:
$$\lambda_k = E_k\,\mathbf{q}_k^H \mathbf{C}_k^{-1} \mathbf{q}_k, \tag{11}$$
where $\mathbf{q}_k$ is the $k$th column of $\mathbf{Q}$ in (8). The covariance matrix $\mathbf{C}_k$ is initialized as $\mathbf{C}_0 = 2\sigma^2 \mathbf{I}_{N_R(N+L-1)}$ and then iteratively constructed for $k = 1, \ldots, K^*$ using the following relationship:
$$\mathbf{C}_k = \mathbf{D}_k + E_k\,\mathbf{q}_k \mathbf{q}_k^H, \tag{12}$$
where
$$\mathbf{D}_k = \mathbf{C}_{k-1} + E_k\,\mathbf{q}_{k,1}\mathbf{q}_{k,1}^H + E_k\,\mathbf{q}_{k,2}\mathbf{q}_{k,2}^H. \tag{13}$$
After all iterations $k = 1, \ldots, K^*$ have been completed, the covariance matrix given in (30) is set to be $\mathbf{C} = \mathbf{C}_{K^*}$. When calculating the system values for the SIC scheme, each system value $\lambda_k$ in (11) for $k = 1, \ldots, K^*$ involves one matrix inversion $\mathbf{C}_k^{-1}$, which requires high computational complexity. By applying the matrix inversion lemma to $\mathbf{D}_k$ in (13) and $\mathbf{C}_k$ in (12), an iterative covariance matrix inversion method is formed by constructing the inverse matrices $\mathbf{C}_k^{-1}$ and $\mathbf{D}_k^{-1}$ using (33) and (34), respectively, as a function of $\mathbf{C}_{k-1}^{-1}$ as shown in Appendix 2, so that the total number of matrix inversions required to obtain $\lambda_k$ for $k = 1, \ldots, K^*$ reduces to 1.
The inverse matrices $\mathbf{C}_k^{-1}$ and the corresponding system values $\lambda_k$ are calculated iteratively, so that the system value $\lambda_k$ given in (11) is reorganized using (34) to simplify the SINR $\gamma_k$ at the output of the $k$th SIC receiver to the following form
$$\gamma_k = E_k\,\mathbf{q}_k^H \mathbf{D}_k^{-1} \mathbf{q}_k,$$
using the steps in (35) to (38) given in Appendix 2. Therefore, $\gamma_k$ can be calculated when $\mathbf{D}_k^{-1}$ is obtained using (33).
The MMSE linear equalizer de-spreading filter coefficients $\mathbf{w}_k$ for the $k$th SIC receiver in (10) are expressed in terms of $\mathbf{C}_k$ as
$$\mathbf{w}_k = \frac{\mathbf{C}_k^{-1}\mathbf{q}_k}{\mathbf{q}_k^H \mathbf{C}_k^{-1} \mathbf{q}_k} \tag{15}$$
for $k = 1, \ldots, K^*$.
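The iterative inversion idea can be illustrated with a short NumPy sketch. It is not the recursion of Appendix 2, which is not reproduced in this section; instead it assumes that each step adds the rank-one terms $E_k\mathbf{q}_{k,1}\mathbf{q}_{k,1}^H$, $E_k\mathbf{q}_{k,2}\mathbf{q}_{k,2}^H$ and $E_k\mathbf{q}_k\mathbf{q}_k^H$ to $\mathbf{C}_{k-1}$, and it uses the Sherman-Morrison form of the matrix inversion lemma as a stand-in. The function names are hypothetical.

```python
import numpy as np


def rank_one_update(m_inv, v, e):
    """Sherman-Morrison: inverse of (M + e v v^H) from M^{-1}, M Hermitian."""
    mv = m_inv @ v
    return m_inv - e * np.outer(mv, mv.conj()) / (1 + e * np.real(v.conj() @ mv))


def sic_system_values(q, q1, q2, energies, sigma2):
    """System values and SINRs for k = 1..K*, with only C_0 inverted directly."""
    dim, k_star = q.shape
    c_inv = np.eye(dim) / (2 * sigma2)  # C_0^{-1}, trivial because C_0 is diagonal
    lam = np.empty(k_star)
    gamma = np.empty(k_star)
    for k in range(k_star):
        d_inv = c_inv
        for v in (q1[:, k], q2[:, k]):  # fold in the two ISI terms to get D_k^{-1}
            d_inv = rank_one_update(d_inv, v, energies[k])
        gamma[k] = energies[k] * np.real(q[:, k].conj() @ d_inv @ q[:, k])
        c_inv = rank_one_update(d_inv, q[:, k], energies[k])  # C_k^{-1}
        lam[k] = energies[k] * np.real(q[:, k].conj() @ c_inv @ q[:, k])
    return lam, gamma


rng = np.random.default_rng(1)
Q = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))
Q1 = 0.3 * rng.standard_normal((6, 3))
Q2 = 0.3 * rng.standard_normal((6, 3))
E = np.array([1.0, 0.7, 1.5])
lam, gamma = sic_system_values(Q, Q1, Q2, E, sigma2=0.05)
```

Under these assumptions the outputs satisfy $\lambda_k = \gamma_k/(1+\gamma_k)$ for every $k$, which is a convenient consistency check on any implementation of the recursion.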

Sum capacity optimization using system values
The main focus of this article is to find the optimum number $K^*$ of spreading sequences, which maximizes the total rate, where the $K^*$ selected sequences form a subset of the $K$ spreading sequences available for transmission. The total rate $b_T = \sum_{k=1}^{K^*} b_{p_k}$ is maximized by minimizing the total MSE $\varepsilon_T = \sum_{k=1}^{K^*} \varepsilon_k$, where $b_{p_k}$ is the number of bits allocated to each spreading sequence symbol for $k = 1, \ldots, K^*$. The total MSE minimization criterion has been studied in [24,27,28] and can be expressed in terms of the Lagrangian dual objective function:
$$L(\varepsilon_k, E_k, \lambda) = \sum_{k=1}^{K^*} \varepsilon_k + \lambda\left(\sum_{k=1}^{K^*} E_k - E_T\right),$$
where $\lambda$ is the Lagrangian multiplier. The minimization of the total MSE using the above equation provides solutions for $E_k$ and the Lagrangian multiplier $\lambda$, subject to the energy constraint $\sum_{k=1}^{K^*} E_k \le E_T$. Since $b_{p_k}$ is expressed as a function of $\varepsilon_k$ and $E_k$, $b_T = \sum_{k=1}^{K^*} b_{p_k}$ will be determined only after energy allocation, which could be computationally expensive when an iterative energy calculation is required. Therefore, this article uses the system value optimization originally presented in [9], where the system value $\lambda_k$ of the $k$th channel is calculated using (9) and (11) for the non-SIC and the SIC-based receivers, respectively. Differing from [9], in this article a method is proposed to calculate the discrete rate for each spreading sequence using the mean system value $\lambda_{\mathrm{mean}}$ prior to allocating the energy for each sequence.
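The mean system value $\lambda_{\mathrm{mean}}$ is calculated by allocating energies equally such that $E_k = \frac{E_T}{K^*}$ and then obtaining the system value $\lambda_k$ from (9) for the non-SIC receiver or (11) for the SIC receiver, using the following equation:
$$\lambda_{\mathrm{mean}} = \frac{1}{K^*}\sum_{k=1}^{K^*} \lambda_k.$$
The total system capacities for the MMSE receivers for both the SIC and the non-SIC-based receivers are then given as
$$C_T = \sum_{k=1}^{K^*} \log_2\left(1 + \frac{\gamma_k}{\Gamma}\right) = \sum_{k=1}^{K^*} \log_2\left(1 + \frac{\lambda_k}{\Gamma(1 - \lambda_k)}\right),$$
where $\Gamma$ is the gap value. To relate the system values to discrete bit rate optimization, one can use the discrete bit rate and its SINR relationship $b_{p_k} = \log_2\left(1 + \frac{\gamma_k}{\Gamma}\right)$. Thus, the target SINR can be expressed as a function of the discrete rate $b_{p_k}$ as follows:
$$\gamma_k^*(b_{p_k}) = \Gamma\left(2^{b_{p_k}} - 1\right),$$
and the corresponding target system value $\lambda_k^*$ expressed as a function of $b_{p_k}$ can be obtained using
$$\lambda_k^*(b_{p_k}) = \frac{\gamma_k^*(b_{p_k})}{1 + \gamma_k^*(b_{p_k})} = \frac{\Gamma\left(2^{b_{p_k}} - 1\right)}{1 + \Gamma\left(2^{b_{p_k}} - 1\right)}. \tag{19}$$
The next section will provide a detailed description of the system value-based throughput optimization methods for both the non-SIC and the SIC-based spread spectrum MIMO systems.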
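The mapping between discrete rates, target SINRs and target system values can be made concrete with a few lines of Python. The sketch assumes the usual gap approximation $b = \log_2(1 + \gamma/\Gamma)$ together with $\lambda = \gamma/(1+\gamma)$; the function names are hypothetical and the default gap of 1 (0 dB) is chosen only for illustration.

```python
from math import log2


def target_sinr(bits: float, gap: float = 1.0) -> float:
    """Target SINR for a discrete rate: gamma* = gap * (2**b - 1)."""
    return gap * (2.0 ** bits - 1.0)


def target_system_value(bits: float, gap: float = 1.0) -> float:
    """Target system value: lambda* = gamma* / (1 + gamma*)."""
    g = target_sinr(bits, gap)
    return g / (1.0 + g)


def sum_capacity(system_values, gap: float = 1.0) -> float:
    """Sum over k of log2(1 + gamma_k / gap), with gamma_k = lam_k / (1 - lam_k)."""
    return sum(log2(1.0 + lam / (gap * (1.0 - lam))) for lam in system_values)


# A rate of 2 bits per symbol with unit gap needs gamma* = 3, i.e. lambda* = 0.75.
lam_star = target_system_value(2.0)
total = sum_capacity([0.75, 0.5])
```

Because the target system value depends only on the rate and the gap, a table of targets for all available discrete rates can be computed once and reused in every iteration of the loading algorithms.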

System value-based discrete and WF algorithm-based non-discrete bit loading
In this section, an iterative WF algorithm and two discrete bit loading algorithms will be presented using the system value approach. These methods operate with a given total energy $E_T$ when implemented with or without the proposed SIC receiver. First, the iterative WF algorithms will be presented for continuous bit loading. Two iterative discrete bit loading methods will then be proposed to maximize the total rate without the need for any prior energy allocation. These discrete bit loading methods maximize the total rate by jointly allocating the discrete rate and then selecting the optimum number $K^*$ of ordered spreading sequences. The first discrete bit loading algorithm will use the mean system value $\lambda_{\mathrm{mean}}$ to determine the optimum number $K^*$ of spreading sequences and to select the sequences prior to allocating the energy for each sequence. The second discrete bit loading method will use the minimum system value $\lambda_{\min}$ to select the optimum number of sequences.
The system values will be ordered in ascending order for all combinations of $K_{\mathrm{opt}} = K, \ldots, 1$ for both discrete bit loading methods prior to selecting the optimum number of signature sequences. The temporary number $K_{\mathrm{opt}}$ of optimum spreading sequences is used as an initial value for each loop in an iterative sequence number optimization process.
For the discrete bit loading methods with $\lambda_{\mathrm{mean}}$ and $\lambda_{\min}$, margin adaptive (MA) loading (equal rate) algorithms will be considered initially so that all spreading sequences have the same rate $b_p = b_{p_k}$ for $k = 1, \ldots, K^*$ by using the target system values identified in (19) in terms of the available discrete rates. The total transmission rate is $R_T = K^* b_p$. Then, the two-group (TG) rate adaptive optimization will be described for both cases to use the wasted (residual) energy caused by quantization loss, by loading a certain number of channels, $m$, with the next discrete rate $b_{p+1}$ to further increase the total rate to $R_{T,\mathrm{TG}} = (K^* - m)\,b_p + m\,b_{p+1}$.
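The two total-rate expressions can be checked with a small Python example. The function names and the numerical values (ten sequences, rates of 2 and 2.5 bits per symbol, three upgraded channels) are illustrative assumptions only.

```python
def ma_rate(k_star: int, b_p: float) -> float:
    """Margin adaptive (equal rate) total: R_T = K* * b_p."""
    return k_star * b_p


def tg_rate(k_star: int, m: int, b_p: float, b_next: float) -> float:
    """Two-group total: (K* - m) channels at b_p and m channels at b_{p+1}."""
    if not 0 <= m <= k_star:
        raise ValueError("m must lie between 0 and K*")
    return (k_star - m) * b_p + m * b_next


r_ma = ma_rate(10, 2.0)            # all ten channels at 2 bits per symbol
r_tg = tg_rate(10, 3, 2.0, 2.5)    # three channels upgraded to 2.5 bits per symbol
```

With $m = 0$ the two-group rate reduces to the margin adaptive rate, so the TG scheme can only match or improve on the MA total for the same $K^*$.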

Iterative WF-based continuous bit loading
The iterative WF was originally developed to remove sub-channels which would be assigned negative energies, and to maximize the total rate. This section describes the iterative WF optimization, which finds the optimum number of sub-channels $K^*_{\mathrm{WF}}$ using the system values for continuous unequal bit loading. This iterative WF algorithm can also be applied to the HSDPA system with and without SIC by using the system value $\lambda_k$ formulation given in (11) and (9), respectively. The algorithm first allocates energies to the channels before the rates and the optimum number of channels are determined. The iterative WF starts with $K_{\mathrm{opt}} = K$, where $K_{\mathrm{opt}}$ is the temporary optimum number of codes. In each $K_{\mathrm{opt}}$th iteration, the WF calculates the channel SINR per energy unit $[\mathbf{g}]_k$ and assigns energies $E_k$ for $k = 1, \ldots, K_{\mathrm{opt}}$. The signature sequences are reordered starting with those signature sequences which have the lowest channel SINR. The first sub-channel is removed if it was assigned a negative energy. When there are no more sub-channels with negative energies, energies are allocated iteratively until they converge. This continues unless a channel with negative energy is detected during the process. In the latter case, the corresponding sub-channel will be removed and the energies are recalculated as before. The algorithm will return the optimum number of coded channels $K^*$, their respective allocated energies and signature sequences, the covariance matrix $\mathbf{C}$ or $\mathbf{C}_k$ and the MMSE receiver coefficients.
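The iterative WF algorithm initializes $K_{\mathrm{opt}} = K$ and the procedure is summarized as follows:
1. Initialize the loop counter as $I = 1$. There are $K_{\mathrm{opt}}$ energies $E_k$ and $K_{\mathrm{opt}}$ vectors $\mathbf{q}_k$, $\mathbf{q}_{k,1}$, and $\mathbf{q}_{k,2}$, and the index vector $\mathbf{k}_{\mathrm{order}}$ is of length $K_{\mathrm{opt}}$.
2. Perform energy allocation:
(a) Calculate the channel SINR per energy unit, $g_k = \frac{\gamma_k}{E_k} = \frac{\lambda_k}{E_k(1 - \lambda_k)}$, by finding $\lambda_k$ from (9) for non-SIC or (11) when using the SIC receiver.
(b) Determine the WF constant $K_{\mathrm{WF}} = \frac{1}{K_{\mathrm{opt}}}\left(E_T + \sum_{k=1}^{K_{\mathrm{opt}}} \frac{1}{g_k}\right)$ and allocate the energies $E_k = K_{\mathrm{WF}} - \frac{1}{g_k}$ for $k = 1, \ldots, K_{\mathrm{opt}}$.
3. Perform the signature sequence reordering procedure:
(a) Find the term $c_k$, the index of the $k$th smallest element of $\mathbf{g}$, and store it in the vector $\mathbf{k}_{\mathrm{order}} = [c_1, \ldots, c_{K_{\mathrm{opt}}}]$.
(b) Reorder the sequences $\mathbf{q}_k$, $\mathbf{q}_{k,1}$, and $\mathbf{q}_{k,2}$ as well as the energies $E_k$ according to $\mathbf{k}_{\mathrm{order}}$ for $k = 1, \ldots, K_{\mathrm{opt}}$.
4. Carry out the channel removal process: If $E_1 < 0$, remove this channel by setting $K_{\mathrm{opt}} = K_{\mathrm{opt}} - 1$ and discarding the first sequences $\mathbf{q}_1$, $\mathbf{q}_{1,1}$, and $\mathbf{q}_{1,2}$, and repeat the process from step 1. Otherwise, set $K^* = K_{\mathrm{opt}}$ and the counter $I = I + 1$, and repeat the process from step 2 until $I = I_{\max}$ is reached.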
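The steps above can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: the channel SINR per energy unit is supplied by a caller-provided function `gain_fn(active, energies)` (a hypothetical interface standing in for the system value calculation), the water level is the standard water-filling constant, and energies are reset to an equal split after each removal.

```python
def iterative_wf(k_total, e_total, gain_fn, i_max=20):
    """Iterative water-filling with removal of the weakest negative-energy channel.

    gain_fn(active, energies) must return the channel SINR per energy unit g_k
    for the sequences listed in `active` (hypothetical callback).
    """
    active = list(range(k_total))  # indices of the retained sequences
    energies = [e_total / k_total] * k_total
    while True:
        removed = False
        for _ in range(i_max):
            g = gain_fn(active, energies)
            level = (e_total + sum(1.0 / x for x in g)) / len(active)  # WF constant
            energies = [level - 1.0 / x for x in g]
            order = sorted(range(len(active)), key=lambda i: g[i])     # weakest first
            active = [active[i] for i in order]
            energies = [energies[i] for i in order]
            if energies[0] < 0 and len(active) > 1:
                active = active[1:]  # drop the weakest channel and start again
                energies = [e_total / len(active)] * len(active)
                removed = True
                break
        if not removed:
            return active, energies


# Toy example with fixed gains that do not depend on the allocated energies.
gains = [0.1, 1.0, 2.0, 4.0]
active, energies = iterative_wf(4, 1.0, lambda act, e: [gains[i] for i in act])
```

In the toy example the two weakest channels are removed in turn and the unit energy is shared between the remaining two, with more energy given to the stronger channel.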

System value-based signature sequence ordering for discrete loading
This section will describe the use of system values for ordering the signature sequences to maximize the system capacity by determining and selecting the optimum number of signature sequences for receivers with and without the SIC scheme. The signature sequence ordering process starts by setting $K_{\mathrm{opt}} = K$ and continues by iteratively adjusting $K_{\mathrm{opt}} = K_{\mathrm{opt}} - 1$ until $K_{\mathrm{opt}} = 1$ is reached.
In each iteration, the system values are calculated, then the signature sequences (or coded channels) are ordered, and the signature sequence containing the smallest system value is removed. This generates a new set of selected and ordered signature sequences for each $K_{\mathrm{opt}}$th iteration. By allocating energies equally to all selected spreading sequences $k = 1, \ldots, K_{\mathrm{opt}}$ for that iteration, the system values are obtained from (9) or (11) for the non-SIC and SIC cases, respectively. These system values are stored in a $K_{\mathrm{opt}}$-length vector $\boldsymbol{\lambda} = [\lambda_1, \ldots, \lambda_{K_{\mathrm{opt}}}]$. The mean system value $\lambda_{\mathrm{mean}}$ and the minimum system value $\lambda_{\min}$ for each $K_{\mathrm{opt}}$ iteration are stored as the elements $[\boldsymbol{\lambda}_{\mathrm{mean}}]_{K_{\mathrm{opt}}}$ and $[\boldsymbol{\lambda}_{\min}]_{K_{\mathrm{opt}}}$ of two $K$-length vectors, respectively, where $\boldsymbol{\lambda}_{\mathrm{mean}}$ and $\boldsymbol{\lambda}_{\min}$ are initialized as $\mathbf{0}_K$.
The system values given in $\boldsymbol{\lambda}$ are sorted in ascending order for the current $K_{\mathrm{opt}}$ iteration and are stored in the $K_{\mathrm{opt}}$th column of the $K \times K$ matrix $\boldsymbol{\lambda}_{\mathrm{store}}$, i.e., in $[\boldsymbol{\lambda}_{\mathrm{store}}]_{1:K_{\mathrm{opt}},K_{\mathrm{opt}}}$. The indices of the ordered system values are stored in a $K_{\mathrm{opt}}$-length vector $k_{\mathrm{order}}$, where the indices range from 1 to $K_{\mathrm{opt}}$. The next step is to find the $K_{\mathrm{opt}}$-length vector $k_{\mathrm{select}} = [a_1, \ldots, a_k, \ldots, a_{K_{\mathrm{opt}}}]$, which contains the indices of the selected subset of the signature sequences used in the current $K_{\mathrm{opt}}$ iteration. These are also ordered according to the ascending order of the system values using $k_{\mathrm{order}}$. Next, $k_{\mathrm{select}}$ is stored in the $K_{\mathrm{opt}}$th column of the $K \times K$ upper triangular matrix $K_{\mathrm{seq}}$, i.e., in $[K_{\mathrm{seq}}]_{1:K_{\mathrm{opt}},K_{\mathrm{opt}}}$. The vector $k_{\mathrm{select}}$ is initialized as $k_{\mathrm{select}} = [1, \ldots, K]$ and $K_{\mathrm{seq}}$ is initialized as $\mathbf{0}_{K \times K}$.
Let $Q_{\mathrm{orig}}$, $Q_{\mathrm{orig1}}$, and $Q_{\mathrm{orig2}}$ denote the original unmodified receiver signature sequence matrices of $Q$, $Q_1$, and $Q_2$, whose column order is the same as that of $S$. The reordering procedure is carried out by setting $Q$, $Q_1$, and $Q_2$ to the columns of $Q_{\mathrm{orig}}$, $Q_{\mathrm{orig1}}$, and $Q_{\mathrm{orig2}}$ indexed by $k_{\mathrm{select}}$. The signature sequence removal is completed by removing the first element of $k_{\mathrm{select}}$, so that the vector length is reduced to $K_{\mathrm{opt}} - 1$, and by removing the first columns of $Q$, $Q_1$, and $Q_2$, so that the received signature sequence matrix dimension becomes $N_R(N + L - 1) \times (K_{\mathrm{opt}} - 1)$. This reduced matrix is used to calculate the system values, and to order and remove the spreading sequence with the smallest system value in the next iteration, by setting $K_{\mathrm{opt}} = K_{\mathrm{opt}} - 1$ and repeating the process until $K_{\mathrm{opt}} = 1$.
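The procedure can be summarized as follows.
1. Find the system values for each $K_{\mathrm{opt}}$, from $K_{\mathrm{opt}} = K$ down to $K_{\mathrm{opt}} = 1$, using the following steps.
(a) Allocate energy equally to each signature sequence such that $E_k = E_T / K_{\mathrm{opt}}$ for $k = 1, \ldots, K_{\mathrm{opt}}$. Form the amplitude matrix $A$.
(b) Find $\lambda_k$ for $k = 1, \ldots, K_{\mathrm{opt}}$ using (9) and $C$ from (30) for non-SIC, or $\lambda_k$ from (11) and $C_k$ from (12) for SIC. Store $\boldsymbol{\lambda} = [\lambda_1, \ldots, \lambda_{K_{\mathrm{opt}}}]$.
(c) Store the minimum system value $[\boldsymbol{\lambda}_{\min}]_{K_{\mathrm{opt}}} = \min(\boldsymbol{\lambda})$ and the mean system value $[\boldsymbol{\lambda}_{\mathrm{mean}}]_{K_{\mathrm{opt}}} = \frac{1}{K_{\mathrm{opt}}} \sum_{k=1}^{K_{\mathrm{opt}}} \lambda_k$.
2. Reorder the signature sequences and remove the signature sequence with the minimum $\lambda_k$ for each $K_{\mathrm{opt}}$ iteration:
(a) Find the indices of the $k$th smallest elements of $\boldsymbol{\lambda}$ for $k = 1, \ldots, K_{\mathrm{opt}}$ and store them in $k_{\mathrm{order}}$.
(b) Store the system values in $[\boldsymbol{\lambda}_{\mathrm{store}}]_{1:K_{\mathrm{opt}},K_{\mathrm{opt}}}$ in ascending order.
(c) Find the vector $k_{\mathrm{select}} = [a_1, \ldots, a_k, \ldots, a_{K_{\mathrm{opt}}}]$, which contains the indices of the selected subset of the signature sequences, ordered according to $k_{\mathrm{order}}$. Store the reordered sequence index $a_k$ in $[K_{\mathrm{seq}}]_{k,K_{\mathrm{opt}}}$.
3. If $K_{\mathrm{opt}} > 1$, remove the signature sequence with the smallest system value, set $K_{\mathrm{opt}} = K_{\mathrm{opt}} - 1$, and repeat steps 1 and 2. Otherwise, the optimum signature sequence identification for the discrete loading schemes is performed, as described in the next two sections.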
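The ordering and removal loop summarized above can be sketched as follows. Because (9) is not reproduced in this section, the sketch assumes the non-SIC system value takes the form $\lambda_k = E_k q_k^H C^{-1} q_k$ with $C = Q A^2 Q^H + 2\sigma^2 I$, and it ignores the ISI matrices $Q_1$ and $Q_2$. The function names and the use of 0-based indices are choices made for this illustration only.

```python
import numpy as np

def system_values(Q, energies, sigma2):
    """Assumed non-SIC form lambda_k = E_k q_k^H C^{-1} q_k (ISI ignored)."""
    C = (Q * energies) @ Q.conj().T + 2 * sigma2 * np.eye(Q.shape[0])
    Cinv_Q = np.linalg.solve(C, Q)
    return energies * np.real(np.sum(Q.conj() * Cinv_Q, axis=0))

def order_and_remove(Q_orig, total_energy, sigma2):
    """Greedy ordering: drop the weakest sequence at every K_opt."""
    K = Q_orig.shape[1]
    k_select = np.arange(K)               # 0-based column indices of Q_orig
    lam_mean = np.zeros(K)
    lam_min = np.zeros(K)
    K_seq = {}                            # K_opt -> ordered index vector
    for K_opt in range(K, 0, -1):
        Q = Q_orig[:, k_select]
        E = np.full(K_opt, total_energy / K_opt)   # equal energy allocation
        lam = system_values(Q, E, sigma2)
        order = np.argsort(lam)           # ascending system values
        k_select = k_select[order]
        lam_mean[K_opt - 1] = lam.mean()
        lam_min[K_opt - 1] = lam.min()
        K_seq[K_opt] = k_select.copy()
        k_select = k_select[1:]           # remove the smallest system value
    return lam_mean, lam_min, K_seq
```

The dictionary `K_seq` plays the role of the columns of the matrix $K_{\mathrm{seq}}$: entry `K_seq[K_opt]` holds the ordered indices for that iteration.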

Mean system value-based discrete bit loading algorithm
To achieve the same SINR distribution at the output of each de-spreading unit, so that a higher $b_p$ is selected for equal rate loading, the transmission energies need to be adjusted to achieve a target (fixed) SINR at each receiver. The discrete transmission rate is identified using the mean of the system values, $\lambda_{\mathrm{mean}}$. This method operates with the energy constraint $\sum_{k=1}^{K^*} E_k \le E_T$ to identify the optimum number $K^*$ of signature sequences and to select the transmission signature sequences that maximize the total transmission rate $R_{T,\mathrm{mean}}$.
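With the relationship between the target system value $\lambda^*$ and the bit rates $b_p$ in (19), a set of target system values, stored in the $P$-length vector $\boldsymbol{\lambda}^*$ and corresponding to all bit rates $b_p$, is generated. In Section 6.2, the ordered signature sequences for the different numbers of signature sequences are given in $K_{\mathrm{seq}}$ for all combinations $K_{\mathrm{opt}} = K, \ldots, 1$. For these values, the rate to be transmitted, $b_{p_k}$, is identified by comparing $[\boldsymbol{\lambda}_{\mathrm{mean}}]_{K_{\mathrm{opt}}}$ with the target system values in $\boldsymbol{\lambda}^*$ for all $K_{\mathrm{opt}}$ combinations. The optimum number of codes, $K^*$, is selected from the $K_{\mathrm{opt}}$ combination that gives the highest total rate $R_{T,\mathrm{mean}} = K^* b_p$. This algorithm returns the total rate $R_{T,\mathrm{mean}}$, the optimum number of codes $K^*$, and the selected and ordered signature sequence matrix $S^{(\mathrm{mean})}$. The algorithm is described below:
1. For the set of bit rates $\{b_p\}_{p=1}^{P}$, find the corresponding target system values using (19).
2. For each $K_{\mathrm{opt}}$, identify the bit rate $[b_{\mathrm{mean}}]_{K_{\mathrm{opt}}}$ by comparing $[\boldsymbol{\lambda}_{\mathrm{mean}}]_{K_{\mathrm{opt}}}$ with the target system values in $\boldsymbol{\lambda}^*$.
3. Store the total rate $[R_{\mathrm{temp,mean}}]_{K_{\mathrm{opt}}} = K_{\mathrm{opt}} \times [b_{\mathrm{mean}}]_{K_{\mathrm{opt}}}$ for $K_{\mathrm{opt}} = 1, \ldots, K$.
4. Select the optimum number of signature sequences $K^*$ as the $K_{\mathrm{opt}}$ that gives the highest total rate, so that $R_{T,\mathrm{mean}} = \max(R_{\mathrm{temp,mean}})$.
5. Construct the signature sequence matrix $S^{(\mathrm{mean})}$ from the selected and ordered signature sequences.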
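The rate selection in the steps above can be sketched as follows. Since (19) is not reproduced here, the sketch assumes the mapping $\gamma^* = \Gamma(2^{b_p} - 1)$ and $\lambda^* = \gamma^*/(1 + \gamma^*)$, and it assumes that the selected rate is the largest $b_p$ whose target does not exceed the mean system value. Both are assumptions of this illustration, and the function names are hypothetical.

```python
import numpy as np

def target_system_values(bit_rates, gap=1.0):
    """Assumed mapping: gamma* = gap*(2^b - 1), lambda* = gamma*/(1 + gamma*)."""
    gamma = gap * (2.0 ** np.asarray(bit_rates, dtype=float) - 1.0)
    return gamma / (1.0 + gamma)

def select_rate(lam_stat, bit_rates, gap=1.0):
    """lam_stat[K_opt - 1] is the mean system value obtained with K_opt codes.

    Returns (K_star, bit rate per code, total rate).
    """
    bit_rates = np.asarray(bit_rates, dtype=float)   # ascending order
    lam_target = target_system_values(bit_rates, gap)
    K = len(lam_stat)
    b_sel = np.zeros(K)
    for i, lam in enumerate(lam_stat):
        feasible = np.nonzero(lam_target <= lam)[0]
        if feasible.size:
            b_sel[i] = bit_rates[feasible[-1]]       # largest supported rate
    R_temp = np.arange(1, K + 1) * b_sel             # K_opt x b for each K_opt
    K_star = int(np.argmax(R_temp)) + 1
    return K_star, b_sel[K_star - 1], R_temp[K_star - 1]

rates = np.arange(0.5, 6.5, 0.5)
K_star, b_star, R_total = select_rate([0.90, 0.88, 0.30], rates)
```

In this example two codes at 3 bits per symbol give a higher total rate than one code at 3 bits or three codes at 0.5 bits, so $K^* = 2$ is selected. The same function applies to the minimum system value-based loading when the vector of minimum system values is passed instead.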
The TG optimization can be used to further maximize the total rate by loading $m_{\mathrm{mean}}$ channels with $b_{p+1}$, so that the total rate becomes $R_{T,\mathrm{TGmean}} = (K^*_{\mathrm{mean}} - m_{\mathrm{mean}})\, b_{p,\mathrm{mean}} + m_{\mathrm{mean}}\, b_{p+1,\mathrm{mean}}$. For the mean system value-based optimization method, the number of channels $m_{\mathrm{mean}}$ that are loaded with the next discrete rate $b_{p+1,\mathrm{mean}}$ is obtained by finding the maximum $m_{\mathrm{mean}}$ that satisfies the following inequality

Energy allocation for non-SIC
This section describes the energy allocation schemes for the mean system value-based discrete bit loading for both the non-SIC receiver and the SIC receiver with equal rate or TG allocation. When allocating equal rates, the bit rates of all channels are equal, i.e., $b_{p_k} = b_p$ for $k = 1, \ldots, K^*$. When using the TG allocation, the bit rates are allocated as $b_{p_k} = b_p$ for $k = 1, \ldots, (K^*_{\mathrm{mean}} - m_{\mathrm{mean}})$ and $b_{p_k} = b_{p+1}$ for $k = (K^*_{\mathrm{mean}} - m_{\mathrm{mean}} + 1), \ldots, K^*_{\mathrm{mean}}$. With $K^*$, $b_{p_k}$, and $\lambda^*(b_{p_k})$ obtained in Section 6.3, the transmission energies for the non-SIC scheme are calculated iteratively, where $i$ denotes the iteration number. The term $C^{-1}_{i-1}$ is calculated by inverting $C_{i-1}$ given in (30), which is a function of $E_{k,(i-1)}$ for $k = 1, \ldots, K^*_{\mathrm{mean}}$, with $E_{k,0} = E_T / K^*_{\mathrm{mean}}$ initialized for all channels. The iteration continues until the energies converge to fixed values or the maximum number of iterations, $I_{\max}$, is reached.
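A minimal sketch of such a fixed-point energy iteration is given below. The update rule used here, $E_{k,i} = \lambda^*_k / (q_k^H C^{-1}_{i-1} q_k)$, is an assumption that follows from taking the system value as $\lambda_k = E_k q_k^H C^{-1} q_k$; the article's own update equation is not reproduced in this section, and the ISI terms are ignored. The function name and the stopping tolerance argument are hypothetical.

```python
import numpy as np

def equal_sinr_energies(Q, lam_target, total_energy, sigma2,
                        max_iter=100, tol=1e-3):
    """Assumed update E_k,i = lambda*_k / (q_k^H C_{i-1}^{-1} q_k), ISI ignored."""
    M, K = Q.shape
    E = np.full(K, total_energy / K)          # E_k,0 = E_T / K for all channels
    for _ in range(max_iter):
        # Covariance built from the energies of the previous iteration.
        C = (Q * E) @ Q.conj().T + 2 * sigma2 * np.eye(M)
        d = np.real(np.sum(Q.conj() * np.linalg.solve(C, Q), axis=0))
        E_new = lam_target / d
        # Stop when the summed energy change is a small fraction of E_T.
        if np.sum(np.abs(E_new - E)) <= tol * total_energy:
            return E_new
        E = E_new
    return E

Q_demo = np.eye(4)[:, :2]                     # two orthogonal sequences
E_demo = equal_sinr_energies(Q_demo, np.array([0.5, 0.5]), 1.0, 0.02,
                             max_iter=200, tol=1e-9)
```

For orthogonal sequences the fixed point is $E_k = 2\sigma^2 \lambda^*/(1 - \lambda^*)$, which provides a simple check of the iteration. Note that every iteration requires a new matrix solve, which is the cost the SIC-based method is designed to avoid.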

Energy allocation for SIC
As the iterative calculation of the energy $E_{k,i}$ depends on $C^{-1}_{i-1}$, which requires the energies $E_{k,(i-1)}$ for $k = 1, \ldots, K^*$ in each iteration $i$, the SIC-based energy allocation method was developed to simplify the calculation so that $E_{k,i}$ depends only on $E_{k,(i-1)}$ and the stored covariance matrix inverse $C^{-1}_{k-1}$, which is a function of $E_{k-1,I_{\max}}$. The inverse covariance matrix $C^{-1}_k$ is calculated once per spreading sequence after the energy $E_{k,I_{\max}}$ has been obtained.
The energies for the SIC-based receiver can be calculated iteratively from $E_1$ to $E_{K^*_{\mathrm{mean}}}$, without any need to invert a matrix in each energy iteration, by rearranging (14) into the form given in (33). By using (33), the energy calculation given in (23) can be simplified to an expression in which the weighting factors $\xi, \xi_1, \xi_2, \xi_3, \xi_4, \xi_5$, and $\xi_6$ are constructed from $C^{-1}_{k-1}$, $q_k$, $q_{k,1}$, and $q_{k,2}$ using (36). The covariance matrix inverse used for the calculation of $E_1$ is initialized as $C^{-1}_0 = \frac{1}{2\sigma^2} I_{N_R(N+L-1)}$. The terms $\zeta_{1,(i-1)}$ and $\zeta_{2,(i-1)}$ are calculated using (37) as a function of $E_{k,(i-1)}$, while $\gamma^*_k$ is the target SINR calculated as a function of $b_{p_k}$ using (20). The iterations of $E_{k,i}$ continue until the energy converges to a fixed value or $I_{\max}$ is reached. Then, $C^{-1}_k$ is calculated in terms of $E_{k,I_{\max}}$ using (34). This process is repeated for all selected transmission channels $k = 1, \ldots, K^*_{\mathrm{mean}}$. Once the energies are allocated, the transmitter provides the receiver with the allocated energies. The next section describes the minimum system value-based discrete bit loading schemes.
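The idea of avoiding a full matrix inversion per iteration can be illustrated with a simplified sketch that ignores the ISI terms $q_{k,1}$ and $q_{k,2}$. Under that simplification, which is not the article's full method, each energy follows in closed form from $E_k = \gamma^*_k / (q_k^H C^{-1}_{k-1} q_k)$, and $C^{-1}_k$ is obtained from $C^{-1}_{k-1}$ with the Sherman-Morrison rank-one update instead of (34) to (37). The function name is hypothetical.

```python
import numpy as np

def sic_energies(Q, gamma_target, sigma2):
    """Simplified SIC allocation without ISI (q_k,1 and q_k,2 terms omitted)."""
    M, K = Q.shape
    C_inv = np.eye(M) / (2 * sigma2)          # C_0^{-1} = I / (2 sigma^2)
    E = np.zeros(K)
    for k in range(K):
        q = Q[:, k]
        u = C_inv @ q
        d = np.real(np.vdot(q, u))            # q^H C_{k-1}^{-1} q
        E[k] = gamma_target[k] / d            # meets the target SINR directly
        # Sherman-Morrison: inverse of C_{k-1} + E_k q q^H, no new inversion.
        C_inv = C_inv - E[k] * np.outer(u, u.conj()) / (1 + E[k] * d)
    return E, C_inv

rng = np.random.default_rng(0)
Q_sic = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))
E_sic, C_inv_sic = sic_energies(Q_sic, [1.0, 2.0, 3.0], 0.02)
```

The recursively updated inverse agrees with a direct inversion of $Q A^2 Q^H + 2\sigma^2 I$, which shows why one stored inverse per spreading sequence is sufficient in this simplified setting.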

Minimum system value-based discrete bit loading algorithm
An equal energy loading method is adopted in the current HSDPA standards to load a discrete rate to each spreading sequence. Equal energy allocation produces varying SINRs at the receivers, but makes it simpler to allocate energies than the equal SINR loading scheme. As the SINR of the channel with the minimum SINR is chosen as the target SINR to guarantee the quality of service, this is also referred to as the minimum system value-based discrete bit loading method. This section describes how to select the optimum number of signature sequences and the corresponding signature sequences to maximize the total rate for the HSDPA downlink. For the minimum system value-based discrete bit loading, the transmission energies are allocated equally as $E_k = E_T / K^*$, and there is no iterative energy adjustment. In contrast to the mean system value-based discrete bit loading, the minimum system value $\lambda_{\min}$ is used to determine the transmission rate for each spreading sequence.
With $\lambda_{\min}$ for all $K_{\mathrm{opt}}$ combinations and the ordering of the signature sequences given in $K_{\mathrm{seq}}$, as described in Section 6.2, the bit rate $b_p$ is selected in a similar way to the mean system value-based loading, except that $\lambda_{\min}$ is compared with the target system values. The algorithm returns the optimum number of codes $K^*_{\min}$, the total rate $R_{T,\min} = K^*_{\min} b_p$, and the ordered signature sequence matrix $S^{(\min)}$. The minimum system value-based loading is summarized below:
1. For the set of bit rates $\{b_p\}_{p=1}^{P}$, find the corresponding target system values using (19).
2. For each $K_{\mathrm{opt}}$, identify the bit rate by comparing $[\boldsymbol{\lambda}_{\min}]_{K_{\mathrm{opt}}}$ with the target system values in $\boldsymbol{\lambda}^*$.
3. Store the total rate $R_{\mathrm{temp,min}}$ for $K_{\mathrm{opt}} = 1, \ldots, K$.
The remaining steps follow the mean system value-based algorithm, with $\lambda_{\min}$ used in place of $\lambda_{\mathrm{mean}}$.
Again, a TG allocation can be performed to further increase the total rate. For the equal energy allocation, the channels that have system values $\lambda_k > [\boldsymbol{\lambda}^*]_{p+1}$, where $p$ is the index of $b_{p,\min}$, are loaded with the next discrete rate $b_{p+1,\min}$. The total rate for the minimum system value TG allocation is $R_{T,\mathrm{TGmin}} = (K^*_{\min} - m_{\min})\, b_{p,\min} + m_{\min}\, b_{p+1,\min}$. The next section provides the simulation results and discusses the performance of the different loading algorithms.
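The TG step for the equal energy allocation reduces to a count of the channels whose system values exceed the next target, as the following minimal sketch shows. The function name, the 0-based rate index `p`, and the example numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def two_group_equal_energy(lam, lam_target, bit_rates, p):
    """Two-group loading under equal energy.

    lam:        system values of the selected channels
    lam_target: target system value for each entry of bit_rates
    p:          0-based index of the rate chosen from the minimum system value
    Returns (m, total_rate), where m channels are loaded with bit_rates[p + 1].
    """
    lam = np.asarray(lam, dtype=float)
    if p + 1 >= len(bit_rates):               # already at the highest rate
        return 0, len(lam) * bit_rates[p]
    m = int(np.sum(lam > lam_target[p + 1]))  # channels supporting b_{p+1}
    total = (len(lam) - m) * bit_rates[p] + m * bit_rates[p + 1]
    return m, total

m_min, R_tg_min = two_group_equal_energy(
    [0.55, 0.70, 0.80], [0.29, 0.50, 0.65, 0.75], [0.5, 1.0, 1.5, 2.0], p=1)
```

Here two of the three channels exceed the target for 1.5 bits per symbol, so the total rate rises from 3 to 4 bits per symbol.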

Results
Two separate experimental systems were developed using the Matlab and the National Instruments (NI) LabVIEW platforms with the parameters listed in Table 1.
The proposed system value optimization methods, both with and without the SIC implementation, were tested using the Matlab and LabVIEW simulation packages with the following parameters: a spreading factor of $N = 16$, a full number of spreading sequences $K_f = 2N$, an additive white noise variance of $\sigma^2 = 0.02$, and a gap value of 0 dB. A set of discrete rates $\{b_p\}_{p=1}^{P}$, ranging from 0.5 to 6 bits per symbol in steps of 0.5, was considered for transmission over a $2 \times 2$ MIMO HSDPA system. The OVSF codes, precoded according to 3GPP Release 7 as given in [6], were used as spreading sequences.
The objective of using the two experimental platforms is to cross-check the system performance obtained from the Matlab simulation environment against that obtained from the LabVIEW environment. A real-time channel emulator was implemented by modifying the National Instruments FPGA channel emulation software. This emulator is fed with vectors containing the channel impulse response samples, which are externally generated from power delay profiles (PDPs) specified by standardization organizations such as the ITU and 3GPP. Two industry-standard profiles, known as the pedestrian A and pedestrian B PDPs and shown in Tables 2 and 3, were adopted in this article as specified by the ITU [29].
The pedestrian A and B PDPs correspond to channel impulse responses taken at non-regular intervals with a resolution of 10 ns. The PDP given in the ITU specification, as shown in Tables 2 and 3, can be written as
$$P(t) = \sum_{i} P_i\, \delta(t - \tau_i),$$
where $P_i$ is the linear power (not the logarithmic scale) at delay $\tau_i$. This PDP is sampled with a sampling rate of $1/T_c$, where $T_c = 260$ ns is the chip period. The new PDP is given as
$$P_{T_c}(t) = \sum_{l=0}^{L-1} P_{T_c,l}\, \delta(t - l T_c),$$
where $P_{T_c,l}$ is the power component at the $l$th chip period and $L$ is the length of the sampled PDP. $P_{T_c,l}$ is given as the sum of all power in $P(t)$ in the time interval between $t = lT_c - \frac{T_c}{2}$ and $t = lT_c + \frac{T_c}{2}$, such that
$$P_{T_c,l} = \sum_{i:\; lT_c - \frac{T_c}{2} \le \tau_i < lT_c + \frac{T_c}{2}} P_i.$$
The pedestrian A and B channels shown in Tables 2 and 3 are re-sampled at the chip period intervals as shown in Table 4. After sampling, the power is normalized so that the PDP has a unity power gain. This produces the normalized square-root PDP, given in vector form as
$$h = \frac{1}{\sqrt{\sum_{l=0}^{L-1} P_{T_c,l}}} \left[ \sqrt{P_{T_c,0}}, \ldots, \sqrt{P_{T_c,L-1}} \right]^T.$$
The two PDPs sampled at chip period intervals for the pedestrian A and B channels were produced as $h_{\mathrm{pedA}} = [0.9923, 0.1034, 0.0683]^T$ and $h_{\mathrm{pedB}} = [0.6369, 0.5742, 0, 0.3623, 0, 0.253, 0, 0, 0, 0.2595, 0, 0, 0, 0, 0.047]^T$ at the regular chip period of $T_c = 260$ ns, which corresponds to the HSDPA system operating at 3.84 Mchips/s. The pedestrian A channel has a short delay spread of 3 chip periods, and the pedestrian B PDP corresponds to a delay spread of 15 chip periods. The channel impulse response samples taken at the regular chip period intervals of $T_c = 260$ ns were used in the Matlab and the LabVIEW test environments. The pedestrian A and B PDPs were specifically chosen because their channel impulse responses result in short and long ISI, respectively, in the detection processes. In Table 4, the pedestrian A and B PDPs taken at chip period intervals are listed; individual impulse responses are generated from them by applying complex Gaussian random variables to each coefficient of the square root of the PDP.
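The re-sampling described above can be sketched as a binning of the tap powers into chip periods followed by normalization. The tap delays and powers used in the example are the standard ITU pedestrian A values, which are assumed here to match Table 2; the function name and argument names are hypothetical.

```python
import numpy as np

def resample_pdp(delays_ns, powers_db, chip_ns=260.0):
    """Bin a tapped-delay-line PDP into chip periods; return the unit-power
    square-root PDP."""
    delays = np.asarray(delays_ns, dtype=float)
    p_lin = 10.0 ** (np.asarray(powers_db, dtype=float) / 10.0)  # dB -> linear
    # Tap i falls in chip bin l when l*Tc - Tc/2 <= tau_i < l*Tc + Tc/2.
    bins = np.floor(delays / chip_ns + 0.5).astype(int)
    p_chip = np.zeros(bins.max() + 1)
    np.add.at(p_chip, bins, p_lin)            # sum the power within each bin
    return np.sqrt(p_chip / p_chip.sum())     # normalize to unity power gain

# Assumed ITU pedestrian A taps: delays in ns, relative powers in dB.
h_ped_a = resample_pdp([0, 110, 190, 410], [0.0, -9.7, -19.2, -22.8])
```

With these assumed taps, the first two taps fall into the first chip period and the result is approximately $[0.9923, 0.1034, 0.0683]$, in agreement with $h_{\mathrm{pedA}}$ above.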
Each entry in columns 2 and 3 of Table 4 corresponds to a non-zero square-root PDP coefficient of the pedestrian channel impulse response vectors $h_{\mathrm{pedA}}$ and $h_{\mathrm{pedB}}$. The entries $[h_{\mathrm{pedA}}]_{l+1}$ and $[h_{\mathrm{pedB}}]_{l+1}$ in Table 4 identify the square-root PDP coefficients for the non-zero elements of these vectors with index $l + 1$. Individual impulse responses are generated from the PDP given in Table 4 by applying complex Gaussian random variables with coefficients $a_l$ and $b_l$, and each response is then normalized. Each coefficient $a_l$ and $b_l$, for $l = 0, \ldots, L - 1$, is drawn from a normal distribution with zero mean and unity variance. Tables 5 and 6 list six sets of MIMO impulse responses generated from the pedestrian A and B PDPs, respectively, to produce results for the experimental systems. The entries $[h_{i,j}]_{l+1}$ in Tables 5 and 6 identify the amplitudes for the non-zero elements of the vectors $h_{i,j}$ with index $l + 1$. These responses were used in the Matlab and LabVIEW environments to obtain a set of mean total throughput versus signal-to-noise ratio curves for the pedestrian A and B channels. It was observed that the Matlab and LabVIEW experimental environments produced almost identical results.
Results were produced for the throughput UBs and for the different optimization strategies for discrete rates, in terms of the system throughput in bits per symbol against the total SNR per symbol period per receiver antenna for $2 \times 2$ MIMO. The total received SNR is expressed in dB as
$$10 \log_{10}\!\left( \frac{\mathrm{Trace}\left(Q A^2 Q^H\right)}{2\sigma^2 N_R} \right)\ \mathrm{dB},$$
where $N_R = 2$ is the total number of receiver antennas. For the UB throughput examination, the system value and the iterative WF UBs were simulated using the methods described in Sections 5 and 6.1, respectively. The corresponding curves for the water-filling and the system value UBs, both with and without the SIC scheme, are labeled SIC WF UB, SIC SV UB, WF UB, and SV UB. Figure 3 shows the results for the WF UBs and the system value UBs for both the non-SIC and the SIC schemes for the pedestrian A channel. The proposed system value UB achieves the same system capacity as the iterative WF for the systems with and without SIC.
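The channel generation and the total SNR expression can be sketched as follows. The normalization equation is not reproduced in this section, so the sketch assumes that each realization is scaled to unit energy; the function names are hypothetical.

```python
import numpy as np

def random_channel(h_sqrt_pdp, rng):
    """One impulse response: square-root PDP times complex Gaussian taps."""
    n = len(h_sqrt_pdp)
    a = rng.standard_normal(n)                # real parts, zero mean, unit var
    b = rng.standard_normal(n)                # imaginary parts
    h = np.asarray(h_sqrt_pdp, dtype=float) * (a + 1j * b)
    return h / np.linalg.norm(h)              # assumed unit-energy scaling

def total_snr_db(Q, energies, sigma2, n_rx):
    """10 log10( Trace(Q A^2 Q^H) / (2 sigma^2 N_R) ) in dB."""
    signal = np.real(np.trace((Q * energies) @ Q.conj().T))
    return 10.0 * np.log10(signal / (2.0 * sigma2 * n_rx))

rng = np.random.default_rng(1)
h_demo = random_channel([0.9923, 0.0, 0.1034], rng)
snr_demo = total_snr_db(np.eye(2), np.array([1.0, 1.0]), 0.5, 2)
```

Zero entries of the square-root PDP remain zero in every realization, so the delay structure of the profile is preserved.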
Moreover, the system value UB is a good alternative to the WF UB owing to its simplicity and its shorter processing time for calculating the system capacity. In the same figure, it is shown that the SIC UB achieves a much higher sum capacity, especially at a high input SNR, where the total available energy is greater and the energy per channel is higher. Without SIC, a higher interference is introduced to the other parallel channels above a given total SNR, and the system capacity saturates at an asymptotic value. To improve the sum capacity, the SIC-based receiver cancels the interference corresponding to the detected symbols, starting from those which have the highest system value. As the SIC UB achieves a much higher sum capacity than the non-SIC system, it is used as the ultimate UB when comparing the performance and the improvements obtained through the different optimization strategies in the rest of this section.
Discrete bit rate allocation methods based on the use of the mean and the minimum system values, for the equal SINR (ES) and equal energy (EE) cases, respectively, were simulated. The curve labels were appended with either FULL or OPT for the configurations corresponding to the systems with the full and the optimum number of spreading sequences. The signature sequence ordering for a given set of total receiver SNRs was implemented using the algorithm described in Section 6.2. The optimum number of spreading sequences and the data rates to be transmitted for the mean and minimum system value-based algorithms were calculated using the methods described in Sections 6.3 and 6.4, respectively.
The mean system value-based rate allocation requires iterative energy calculations, which were produced using the methods described for the non-SIC and the SIC-based systems in Sections 6.3.1 and 6.3.2, respectively. Iterative energy allocation methods were used to achieve equal SINR levels at the output of the de-spreading units. For the non-SIC ES transmission energy allocation, the iterative power allocation stops either when the sum of the differences between the current and the previous energies in the energy iteration loop is less than 0.1% of the total energy, i.e., $\Delta E = \sum_{k=1}^{K} |E_{k,i} - E_{k,i-1}| \le 0.001 E_T$, or when the maximum number of iterations $I_{\max}$ is reached. The energy of each coded channel $E_k$ for the SIC ES allocation iterates until $\Delta E_k = |E_{k,i} - E_{k,i-1}| \le 0.001 \frac{E_T}{K}$. The processes described above were repeated for various total signal-to-noise ratios at the output of the de-spreading units for channels with the pedestrian A and B PDPs.
In Figure 4, the results are shown for the two-group equal SINR allocation using the optimum sub-channel selection and the SIC optimization strategies when transmitting spread signals over the pedestrian A channel. The improved system for the equal SINR allocation with SIC achieves the system throughputs corresponding to the curves SIC TG ES OPT and SIC TG ES FULL, and these throughputs are very close to the SIC UB. It is not necessary for the SIC-based receiver to determine the optimum number of spreading sequences when allocating equal SINR, as the SIC scheme reduces these interferences. The SIC TG ES OPT scheme provides a 3-dB improvement over the transmission system with the TG ES FULL strategy. The TG ES OPT scheme, on the other hand, provides a 1.5-dB enhancement over the TG ES FULL scheme when the total SNR is 35 dB.

Figure 4 The two-group equal SINR throughput results for SIC and optimum signature sequence selection for the pedestrian A channel. Results for the two-group equal SINR allocation with the use of optimum sub-channel selection and SIC optimization strategies, transmitted over the pedestrian A channel, are shown.

Figure 5 shows the pedestrian A results for a system with the optimum number of ordered spreading sequences, the SIC receiver, and the discrete bit loading method based on the minimum system value. It is shown that the SIC TG EE OPT scheme has a 4.5-dB improvement over the TG EE FULL-based system before the system throughput saturates at the total SNR value of 35 dB. The use of an optimum number of ordered signature sequences at the total SNR of 35 dB results in the TG EE OPT scheme having a 2.5-dB improvement over the TG EE FULL scheme. The performance of the receiver with the SIC TG EE FULL scheme is enhanced by 3 dB over the TG EE FULL scheme using the full number of spreading sequences.
It is observed that the system with the TG equal energy (EE) allocation, SIC, and the optimum number of spreading sequences approaches the non-SIC system value UB. It is further noted that, at the total SNR value of 35 dB, a 3-dB difference is observed compared with the SIC UB before the system throughput diverges.

Figure 5 The minimum system value-based discrete bit loading system throughput versus total SNR for the pedestrian A channel. The optimization strategies using optimum sub-channel selection and SIC for the TG with minimum system value loading are shown.

Figure 6 shows the simulation results corresponding to data transmitted over the pedestrian B channel. The system throughput saturates for the TG ES FULL scheme at a lower total SNR (at 30 dB) compared with the pedestrian A channel. At the total discrete data rate of 100 bits per symbol, the SIC TG ES OPT scheme provides 7- and 4-dB improvements over the systems with the TG ES FULL and TG ES OPT schemes, respectively. At the total discrete rate of 120 bits per symbol, more than a 10-dB improvement is observed when using the SIC TG EE OPT scheme with the optimum number of spreading sequences over the TG EE FULL scheme. An 8-dB improvement is achieved by using the optimum number of ordered spreading sequences. Around the total SNR value of 30 dB, the SIC TG EE OPT receiver with the optimum number of channels produces a 3-dB improvement over the TG EE OPT scheme without the SIC receiver. For the pedestrian B channel, the SIC TG EE OPT scheme for the TG discrete bit loading method produces a throughput that exceeds the throughput of the TG ES OPT scheme with the optimum number of spreading sequences. The combined use of the SIC scheme with the selection of the optimum number of signature sequences achieves a system throughput close to the system value UB.

Figure 6 Total throughput versus total received SNR results for the pedestrian B channel when using SIC-based receivers and the optimum signature selection scheme. The results show greater improvements when using SIC-based receivers and optimum sub-channel selection over the pedestrian B channel.
The results extracted from Figures 3, 4, 5, and 6 are tabulated for the pedestrian A and B channels in Tables 7 and 8, respectively. The entries in Tables 7 and 8 express the SNRs for specific data rates together with the total discrete rates at specific signal-to-noise ratios. The SIC scheme provides higher throughputs for both the pedestrian A and B channels at an SNR of 35 dB. Specific entries extracted from Tables 7 and 8 are shown in Table 9. The reason the TG EE FULL scheme achieves 29.7 and 82% of the SIC TG ES performances for the pedestrian A and B channels, respectively, is that the PDP lengths or delay spreads of the pedestrian A and B channels are 3 and 15 chip periods, respectively. The HSDPA system that uses the equal energy discrete bit loading method without the optimum number of spreading sequences suffers from a reduction in the total throughput, compared with an HSDPA MIMO system with the optimum number of ordered spreading sequences, when encountering multipath channels with PDP lengths approaching the processing gain, $N$, of the system. The proposed method of finding the optimum number of ordered signature sequences improves the performance of equal energy loading systems.

Conclusion
This article has developed and proposed algorithms that maximize the system throughput while reducing the computational cost. Reduced-complexity system value UBs are proposed, which achieve the same sum capacity as iterative WF. In terms of complexity reduction, the use of system values proposed in this article finds the rates and provides the optimum sub-channel selection before power allocation is performed. This eliminates the requirement to undertake iterative searches for the optimum bit rates combined with computationally intensive iterative power allocation for the equal SNR (ES) allocation. The optimum number of signature sequences can produce the maximum system throughput close to the system value capacity UB. The proposed SIC not only increases the system throughput, but also simplifies the covariance matrix inversion process required for both the EE and the ES allocations. The computational reduction is especially significant for the ES allocation, where iterative energy allocation is required. It is shown that a system throughput close to the SIC UB is achieved for both the pedestrian A and B channels by using the SIC-based receivers for the ES allocation. The SIC schemes with the full and the optimum number of channels produce identical total rate results when plotted against the total signal-to-noise ratio. It was observed that the signature sequence ordering was not essential for the equal SNR discrete bit loading algorithm. The identification of the optimum number of signature sequences for the equal energy allocation scheme significantly improves the total system throughput. The resultant scheme with the equal energy allocation, when using an SIC-based receiver with the ordered optimum number of signature sequences, achieves a system throughput close to the non-SIC UB.
Mobile radio channels with a longer channel impulse response length, measured in terms of the number of chip period intervals, suffer severe sum capacity throughput degradations compared with the system value UBs for equal energy loading HSDPA MIMO systems without the optimum number of ordered spreading sequences. The influence of the Doppler frequency on the performance of the proposed HSDPA system is currently under investigation and will be reported in future publications.
The results presented in this article confirm that the proposed optimum signature sequence selection scheme for the SIC receiver provides a significant performance improvement for the HSDPA system. As it is now possible to obtain a system throughput near the UB, the proposed schemes with HSDPA will achieve results comparable to LTE without incurring a significant additional cost to modify the existing HSDPA infrastructure.

Appendix 1
The receiver matched filter sequences $q_k = [Q]_k$, $q_{k,1} = [Q_1]_k = \left(I_{N_R} \otimes (J^T_{N+L-1})^N\right) q_k$, and $q_{k,2} = [Q_2]_k = \left(I_{N_R} \otimes J^N_{N+L-1}\right) q_k$ are used to determine the covariance matrix $C$. The covariance matrix $C$ of the received signal, of dimension $N_R(N + L - 1) \times N_R(N + L - 1)$, is constructed using the following equation
$$C = Q_e \left(I_3 \otimes A^2\right) Q_e^H + 2\sigma^2 I_{N_R(N+L-1)} \qquad (30)$$
where $\otimes$ is the Kronecker product, $Q_e = [Q, Q_1, Q_2]$ is the extended $Q$ matrix of dimension $N_R(N + L - 1) \times 3K^*$, $Q_1$ represents the previous symbol period components, and $Q_2$ represents the ISI from the next symbol period, formed as
$$Q_1 = \left(I_{N_R} \otimes (J^T_{N+L-1})^N\right) Q = \left[q_{1,1}, \ldots, q_{k,1}, \ldots, q_{K^*,1}\right] \qquad (31)$$
$$Q_2 = \left(I_{N_R} \otimes J^N_{N+L-1}\right) Q = \left[q_{1,2}, \ldots, q_{k,2}, \ldots, q_{K^*,2}\right]. \qquad (32)$$
In the received signal model, $y(\rho - 1)$ and $y(\rho + 1)$ are the ISI components from the previous and next symbol periods, and $n(\rho)$ is the additive white Gaussian noise component with $E\left(n(\rho)\, n^H(\rho)\right) = 2\sigma^2 I_{N_R(N+L-1)}$, where $\sigma^2 = \frac{N_0}{2}$ is the noise variance per dimension.
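The construction of $Q_1$, $Q_2$, and $C$ can be sketched as follows. The sketch assumes that $J_{N+L-1}$ is the shift matrix with ones on the first subdiagonal, that $A^2$ is the diagonal matrix of the energies $E_k$, and that $C$ has the form $Q_e (I_3 \otimes A^2) Q_e^H + 2\sigma^2 I$. These definitions are not stated in this appendix, so they are assumptions of the illustration; the function names are hypothetical.

```python
import numpy as np

def shift_matrix(n):
    """Assumed J_n: ones on the first subdiagonal (shifts a vector down by one)."""
    return np.eye(n, k=-1)

def extended_covariance(Q, energies, sigma2, n_rx, N):
    """Build Q_1, Q_2 and C = Q_e (I_3 kron A^2) Q_e^H + 2 sigma^2 I."""
    rows = Q.shape[0] // n_rx                     # N + L - 1 chips per antenna
    J_N = np.linalg.matrix_power(shift_matrix(rows), N)
    Q1 = np.kron(np.eye(n_rx), J_N.T) @ Q         # previous-symbol components
    Q2 = np.kron(np.eye(n_rx), J_N) @ Q           # next-symbol components
    Qe = np.hstack([Q, Q1, Q2])                   # extended matrix, 3K columns
    A2 = np.kron(np.eye(3), np.diag(energies))    # I_3 kron A^2
    C = Qe @ A2 @ Qe.conj().T + 2 * sigma2 * np.eye(Q.shape[0])
    return C, Q1, Q2

rng = np.random.default_rng(0)
Q_app = rng.standard_normal((12, 3)) + 1j * rng.standard_normal((12, 3))
C_app, Q1_app, Q2_app = extended_covariance(
    Q_app, np.array([1.0, 2.0, 3.0]), 0.02, n_rx=2, N=4)
```

In this example $N = 4$ and $L = 3$, so the tail of the previous symbol (the last $L - 1 = 2$ chips) appears at the start of each antenna block of $Q_1$, and the head of the next symbol appears at the end of each block of $Q_2$.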