Blind Direct Multiuser Detection for Uplink MC-CDMA: Performance Analysis and Robust Implementation

We consider the problem of blind (i.e., without training sequences) linear mitigation of multiple-access interference in the uplink of quasi-synchronous multicarrier code-division multiple-access (MC-CDMA) systems. In the first part of the paper, we present the analytical performance assessment of the recently proposed blind two-stage multiuser detector, whose synthesis requires only the knowledge of the spreading code of the desired user. The analysis allows one to evaluate the actual performance when the receiver's parameters are estimated by resorting to a finite data record. Based on this analysis, in the second part of the paper, we propose to improve the performance of the two-stage detector by adding a quadratic constraint in the first-stage synthesis, which exploits the knowledge of the spreading codes of the active users within the cell of interest. It is shown analytically that incorporation of such a quadratic constraint improves the receiver robustness against errors in the estimated statistics of the received data, although it slightly reduces the interference suppression capabilities of the two-stage detector. The effectiveness of the proposed receiver is further corroborated by computer simulation results.


INTRODUCTION
The wideband direct-sequence code-division multiple-access (DS-CDMA) technique has emerged in recent years as the preferred air interface for providing voice and multimedia services in third-generation mobile communications. However, the use of DS-CDMA technology does not seem to be realistic [1] for very high data-rate multimedia services (at speeds of the order of several hundred megabits per second) due to the severe multipath-induced interchip and intersymbol interference, as well as because of synchronization difficulties. In order to alleviate these drawbacks, a large body of research has focused on the multicarrier CDMA (MC-CDMA) technology [2], which integrates the advantages of multicarrier transmission systems, such as orthogonal frequency-division multiplexing (OFDM), with those of DS-CDMA. As discussed in [2], MC-CDMA systems can be categorized in two major types, according to whether the code spreading is performed in the time or frequency domain. The MC-CDMA system considered in this paper, originally proposed in [3], is based on frequency-domain spreading, which consists of copying each information symbol over the N subcarriers and multiplying it by a user-specific code. Besides representing an inherent form of frequency diversity, transmission over the N subcarriers allows one to cope with interchip and intersymbol interference more effectively than in DS-CDMA systems by lowering the data rate by serial-to-parallel (S/P) conversion and introducing a cyclic prefix (CP) in the transmitted data. Additionally, since the symbol rate on each subcarrier is much lower than the chip rate in a DS-CDMA system with comparable processing gain, the synchronization task is easier in MC-CDMA and, therefore, it is reasonable to consider a quasi-synchronous (QS) uplink [4,5], with a beneficial impact on system performance and capacity.
Early papers on MC-CDMA reception [3,6] deal with synchronous downlink transmission, wherein the receiver can be implemented by means of simple diversity-combining strategies [7], such as orthogonal restoring combining (ORC), equal gain combining (EGC), maximal-ratio combining (MRC), or minimum mean-square error combining (MMSEC) (see also [8]). In addition to the knowledge of the spreading code and timing of the user to be demodulated, the ORC, MRC, and MMSEC receivers also require the knowledge of the corresponding channel impulse response. When employed in the asynchronous uplink channel, MC-CDMA with these simple diversity-combining strategies can still perform better [9] than both DS-CDMA with a comparable value of the processing gain and RAKE reception, and MC-CDMA schemes with time-domain spreading. However, due to the presence of severe multiple-access interference (MAI), diversity-combining schemes tend to exhibit exceedingly large values of the bit error rate (BER) floor in certain scenarios, even for a QS uplink [8]. To drastically improve the performance in this case, more sophisticated reception strategies, such as multiuser detection (MUD) techniques, are needed. Among these, the use of a linear MMSE receiver was originally proposed in [6,10] to mitigate MAI in the synchronous downlink of an MC-CDMA system; in the asynchronous uplink scenario, this detector significantly outperforms [11] all the diversity-combining schemes, requiring the same a priori information (i.e., code, timing, and channel of each user to be demodulated) with a slightly increased complexity. A fractionally-spaced version of the MMSE (FS-MMSE) receiver, which does not require timing information, is proposed in [11], at the price of a further increase in complexity over the MMSE detector, while still requiring the knowledge of the desired channel impulse response.
Most of the above-mentioned diversity-combining and MMSE MUD techniques rely on channel estimation, which can be performed by resorting to bandwidth-consuming training sequences. To avoid waste of resources, a subspace-based blind (i.e., without requiring training sequences) version of the linear MMSE receiver for a QS-MC-CDMA system is proposed in [12], where the channel of the desired user is estimated on the basis of the eigenstructure properties of the received autocorrelation matrix; such a receiver belongs to the class of indirect blind MUD techniques, where the channel is first estimated and then the estimate is plugged into the corresponding nonblind detector. By extending some of the concepts originally proposed in [13], which have proven fruitful in the area of joint multiuser detection and equalization in asynchronous DS-CDMA systems, a direct MUD technique is proposed in [14], where the detector's parameters are extracted from the received data without performing an explicit channel identification. This receiver consists of two stages: the former performs a suitable prefiltering of the received signal in order to mitigate MAI; the latter exploits the constant modulus (CM) property of the transmitted symbol sequence to recover the desired signal. Since the direct two-stage receiver requires only the knowledge of the code of the desired user, it is a blind and delay-independent MUD technique.
In this paper, with reference to the QS uplink of an MC-CDMA system, we first provide the analytical performance assessment of the direct blind MUD two-stage receiver proposed in [14], aimed at evaluating the performance degradation, in terms of signal-to-interference-plus-noise ratio (SINR) at the output of the first stage, when the receiver is implemented by using a finite data record. The analysis allows one to identify sufficient conditions assuring that the second stage, based on the CM criterion, converges to the extraction of the desired symbol. The analysis, moreover, allows one to derive a new optimization criterion aimed at improving the robustness of the two-stage receiver when it is implemented by using very short data records. The new criterion is based on the assumption, which is reasonable in the uplink, that the base station receiver has knowledge not only of the desired spreading code, but also of the spreading codes of a group of users, for example, the users within its cell. This same assumption, considered in the context of DS-CDMA systems, leads to the synthesis of the so-called group-blind receivers [15,16]; although, in principle, these receivers could be extended to the MC-CDMA case, they would fall into the class of indirect methods, wherein channel identification is first performed for all the known users by a costly eigendecomposition; moreover, they would require oversampling the received signal and/or employing an array of sensors at the receiver. Since our method, instead, is a direct one, it does not require any explicit eigenstructure-based channel estimation step; moreover, it does not require oversampling and/or multiple sensors at the receiver and, hence, it is inherently simpler.
The paper is organized as follows. Section 2 introduces the basic signal model of the considered QS-MC-CDMA system. Section 3 briefly reviews the two-stage approach proposed in [14] and presents the performance analysis in terms of SINR at the output of both stages. Section 4 proposes and analyzes the robust version of the two-stage receiver. Section 5 is devoted to the numerical performance analysis carried out by means of Monte Carlo computer simulations. Finally, conclusions are drawn in Section 6.

THE QUASI-SYNCHRONOUS MC-CDMA UPLINK MODEL
In the rest of the paper, we will use the following notations. Upper- and lower-case bold letters denote matrices and vectors, respectively; the superscripts $*$, $T$, $H$, $-1$, and $\dagger$ denote the conjugate, the transpose, the conjugate transpose, the inverse, and the Moore-Penrose inverse of a matrix, respectively; $\mathbb{C}$, $\mathbb{R}$, and $\mathbb{Z}$ are the fields of complex, real, and integer numbers, respectively; $\mathbb{C}^n_r$ and $\mathbb{R}^n_r$ ($\mathbb{C}^n$ and $\mathbb{R}^n$) denote the vector spaces of all $n$-column random (deterministic) vectors with complex and real coordinates, respectively; similarly, $\mathbb{C}^{n\times m}_r$ and $\mathbb{R}^{n\times m}_r$ ($\mathbb{C}^{n\times m}$ and $\mathbb{R}^{n\times m}$) denote the vector spaces of all the $n \times m$ random (deterministic) matrices with complex and real elements, respectively; $0_n$, $O_{n\times m}$, and $I_n$ denote the $n$-column zero vector, the $n \times m$ zero matrix, and the $n \times n$ identity matrix, respectively; $\mathrm{trace}(A)$ denotes the trace of a square matrix $A$; $\mathrm{rank}(A)$ and $\mathcal{R}(A)$ denote the rank and the column space of any matrix $A$; $\langle A, B\rangle \triangleq \mathrm{trace}(AB^H)$ will denote the inner product in $\mathbb{C}^{n\times m}$ and $\|A\| \triangleq \sqrt{\mathrm{trace}(AA^H)}$ the induced (Frobenius) norm; $A = \mathrm{diag}[A_{11}, A_{22}, \ldots, A_{nn}]$ is the block diagonal matrix wherein $\{A_{ii}\}_{i=1}^{n}$ are diagonal matrices; the subscript $c$ stands for continuous-time signal and $E[\cdot]$ denotes statistical averaging; and, finally, $\star$ and $i \triangleq \sqrt{-1}$ denote (linear) convolution and the imaginary unit, respectively.
We consider (see Figure 1) the baseband equivalent of an MC-CDMA uplink with $N$ subcarriers. The information symbol $b_j(n)$ emitted by the $j$th user in the $n$th ($n \in \mathbb{Z}$) symbol interval multiplies the frequency-domain spreading code $c_j \triangleq [c_j^{(0)}, c_j^{(1)}, \ldots, c_j^{(N-1)}]^T \in \mathbb{C}^N$; the resulting $N$-length sequence is subject to the inverse discrete Fourier transform (IDFT), thus producing the block $\tilde{u}_j(n) = W_{\mathrm{IDFT}}\, c_j\, b_j(n) \in \mathbb{C}^N$. A CP of length $L_{cp}$, consisting of a replica of the last $L_{cp}$ symbols of $\tilde{u}_j(n)$, is inserted at the beginning of $\tilde{u}_j(n)$, thus obtaining the vector $u_j(n) = T_{cp} W_{\mathrm{IDFT}}\, c_j\, b_j(n)$, where $P \triangleq L_{cp} + N$ and $T_{cp} \triangleq [I_{cp}^T, I_N]^T \in \mathbb{R}^{P\times N}$, with $I_{cp} \in \mathbb{R}^{L_{cp}\times N}$ obtained by drawing out the last $L_{cp}$ rows of the identity matrix $I_N$. The block $u_j(n)$ is subject to parallel-to-serial (P/S) conversion, and the resulting sequence $\{u_j^{(m)}(n)\}_{m=0}^{P-1}$ (to avoid notational complications, $u_j^{(m)}(n)$ denotes the $(m+1)$th component of the vector $u_j(n)$, for $m = 0, 1, \ldots, P-1$) feeds a digital-to-analog (D/A) converter with impulse response $\psi_c(t)$, operating at rate $1/T_c = P/T_s$, where $T_s$ and $T_c$ denote the symbol and the sampling period, respectively. The continuous-time signal at the D/A output is therefore given by (1), where $\tau_j = d_j T_c + \beta_j$, with $d_j \in \{0, 1, \ldots, P-1\}$ and $\beta_j \in [0, T_c)$, represents the transmission delay of the $j$th user. The signal (1) is transmitted over a multipath channel modeled as a linear time-invariant system with impulse response $h_{c,j}(t)$. Denoting with $\phi_c(t)$ the impulse response of the receiving filter and assuming that ideal carrier-frequency recovery is carried out at the receiver, the (overall) received baseband signal in the uplink channel (i.e., mobile to base station) can be expressed as follows: where $J$ is the number of users picked up by the base-station receiver, $g_{c,j}(t) \triangleq \psi_c(t) \star h_{c,j}(t) \star \phi_c(t)$ is the impulse response (including transmitting filter, physical channel, and receiving filter) of the composite channel of the $j$th user, and $v_c(t)$ represents the additive noise at the output of the receiving filter. The following assumptions will be considered throughout the paper: (A1) the information symbols $b_j(n)$ are mutually independent zero-mean and independent identically distributed (i.i.d.) sequences, with equal variance $\sigma_b^2 \triangleq E[|b_j(k)|^2]$; (A2) the additive noise $v_c(t)$ is a zero-mean wide-sense stationary complex proper process, which is independent of the sequences $b_j(n)$, for $j \in \{1, 2, \ldots, J\}$; (A3) the composite channel impulse response $g_{c,j}(t)$ of the $j$th user spans $L_j$ sampling periods, that is, $g_{c,j}(t) \equiv 0$ for $t \notin [0, L_j T_c)$, with $L_j$ within one symbol interval, that is, $L_j < P$.
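As a minimal sketch of the transmitter chain described above (frequency-domain spreading, unitary IDFT, CP insertion), the following standard-library Python builds one transmitted block of length $P = L_{cp} + N$. The function names, the length-8 binary code, and the sign convention of the IDFT exponent are illustrative assumptions, not taken from the paper.

```python
import cmath

def unitary_idft(x):
    """Unitary inverse DFT: y[m] = N**-0.5 * sum_k x[k] * exp(+2j*pi*k*m/N)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n ** 0.5
            for m in range(n)]

def mc_cdma_block(symbol, code, l_cp):
    """Return the length-(L_cp + N) block corresponding to T_cp W_IDFT c_j b_j(n)."""
    spread = [symbol * chip for chip in code]   # copy the symbol over the N subcarriers
    block = unitary_idft(spread)                # frequency domain -> time domain
    n = len(block)
    return block[n - l_cp:] + block             # prepend the last L_cp samples (CP)

# Hypothetical example: N = 8 subcarriers, L_cp = 2, one unit-energy symbol.
code = [1, -1, 1, 1, -1, 1, -1, -1]
tx = mc_cdma_block(1 + 0j, code, l_cp=2)
```

Because the IDFT is unitary, the energy of the block without the CP equals $N|b_j(n)|^2$ for unit-modulus chips, which is a convenient sanity check on the scaling.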
In the sequel, we assume that, without loss of generality, the desired user is the first one ($j = 1$) and that, with reference to the uplink of a QS-MC-CDMA system [4,5,12], the first $J_{in}$ out of $J$ users are within the cell of interest (referred to as in-cell users) and attempt to synchronize their transmissions by resorting to a local reference clock (obtained, e.g., with the help of a GPS device) or to a pilot signal transmitted by the base station, whereas the remaining $J_{out} \triangleq J - J_{in}$ users are outside the cell of interest (referred to as out-of-cell users). Moreover, according to [17,19], we reasonably assume that (A4) the CP length $L_{cp}$ satisfies the inequality $L_{cp} \geq \max_{j\in\{1,2,\ldots,J_{in}\}} [L_j + d_j + 1]$; under this assumption, it can be shown that, by exploiting the structure of the matrices $\tilde{G}_j(p)$, for $p \in \{0, 1, 2\}$, the interblock interference (IBI) contribution for each in-cell user can be completely discarded by dropping the first $L_{cp}$ components of $\tilde{r}(k)$. This operation can be accomplished in matrix form by defining the matrix $R_{cp} \triangleq [O_{N\times L_{cp}}, I_N] \in \mathbb{R}^{N\times P}$ and forming at the receiver the product $r(k) \triangleq R_{cp}\tilde{r}(k) \in \mathbb{C}^N$. According to (A4), it results that $R_{cp}\tilde{G}_j(1) = O_{N\times P}$, for $j \in \{1, 2, \ldots, J_{in}\}$, which, in turn, implies that, after CP removal, the received signal is given by where $G_j(p) \triangleq R_{cp}\tilde{G}_j(p)T_{cp} \in \mathbb{C}^{N\times N}$, for $p \in \{0, 1, 2\}$ and $j \in \{1, 2, \ldots, J\}$, and $v(k) \triangleq R_{cp}\tilde{v}(k) \in \mathbb{C}^N$. Moreover, the signatures $G_j(0)W_{\mathrm{IDFT}}\,c_j$ of the in-cell users can be parameterized as follows (see [12] for details): where $W_{\mathrm{DFT}} \triangleq W_{\mathrm{IDFT}}^{-1}$ denotes the unitary symmetric DFT matrix and

$\in \mathbb{C}^{N\times N}$, and we have defined the full-column rank matrix, which accounts for the unknown delay $d_j$, and the vector $g_j \triangleq [g_j(0), g_j(1), \ldots, g_j(L_j)]^T \in \mathbb{C}^{L_j+1}$, which collects the unknown channel coefficients. Finally, by substituting (8) in (7), we obtain the model (9), where the $N$-column vector $d(k)$ represents the overall disturbance (MAI plus noise), whose vectors and matrices (the out-of-cell ones having dimension $3J_{out}$) collect all the interfering symbols and signatures of the in-cell and out-of-cell users, respectively. Some comments are now in order about model (9). First, observe that, since the out-of-cell users are QS with respect to a different base station, the CP removal does not assure the complete elimination of their IBI. Moreover, note that assumption (A4) requires only upper bounds (rather than the exact knowledge) on the channel orders and delays of the in-cell users. This is a reasonable assumption in the considered scenario since (i) in general, depending on the transmitted signal parameters (carrier frequency and bandwidth) and application (indoor or outdoor), the maximum channel multipath spread is known; (ii) for QS cellular systems, the delays of the in-cell users are confined to a small uncertainty interval, whose support can typically be predicted [19].

PERFORMANCE ANALYSIS OF THE BLIND TWO-STAGE RECEIVER
This section provides a detailed analysis of the two-stage detector (see Figure 2) recently proposed in [14]. In particular, our analysis consists of two steps: firstly, we present an analysis of the SINR at the output of the first stage when the receiver's parameters are computed from $K$ samples of the received vector $r(k)$; secondly, we investigate the relationship between the potential for "interference capture" of the CM-based second stage and the SINR at the output of the first stage. To lay the groundwork, we briefly review in Section 3.1 the two-stage approach of [14].

The blind two-stage receiver
In the framework of linear blind and delay-independent MUD, the problem of detecting the desired user symbol $b_1(k)$ consists of synthesizing, without requiring knowledge of the timings and channel impulse responses of all the active users (including the desired one), a linear filter. The two-stage detector (see Figure 2) proposed in [14] for QS-MC-CDMA systems is based on factorizing the overall receiver weight vector as $f = Fu$, where the weight vector $u \in \mathbb{C}^{L_{cp}}$ in the second stage is determined according to the well-known CM criterion (11) (see, e.g., [20]), with $\gamma \triangleq E[|b_1(k)|^4]/\sigma_b^2$ being the second-order dispersion coefficient of the desired symbol sequence $b_1(k)$; whereas the output of the first stage $x(k)$ is a linear transformation of $r(k)$, that is, $x(k) = F^H r(k)$, which, accounting for (9), can be expressed by means of the concise vector model (12). Moreover, by observing that, under assumption (A4), for the in-cell users, $Q_j$ can be factorized as in (13), where the full-column rank matrix $Q_{j,2}$ is unknown (it depends on $L_j$ and $d_j$), (12) can be rewritten as (14), where $\Upsilon_j \triangleq C_j\Pi \in \mathbb{C}^{N\times L_{cp}}$ is a known matrix and $\bar{g}_j \triangleq Q_{j,2}\,g_j \in \mathbb{C}^{L_{cp}}$ is the unknown signature of the $j$th in-cell user, for $j \in \{1, 2, \ldots, J_{in}\}$.
A careful choice of $F \in \mathbb{C}^{N\times L_{cp}}$ must assure MAI-plus-noise mitigation at the input of the second stage so as to avoid the interference capture phenomenon [21] typical of the CM criterion. Such a choice is pursued in [13,14] by solving the linearly constrained optimization problem (15), where the linear matrix constraint is aimed at preserving the desired symbol $b_1(k)$ and requires neither channel nor timing knowledge. The solution of (15) can be canonically decomposed [13] as in (16)-(17), where $R_{rr}$ and $R_{dd} \in \mathbb{C}^{N\times N}$ are the statistical correlation matrices of $r(k)$ and $d(k)$, respectively. We will refer to the receiver based on (16)-(17) as the optimal two-stage receiver. Observe that, while $F^{(0)}_{opt}$ depends only on the desired code and, thus, can be evaluated offline, $F^{(a)}_{opt}$ must be estimated from the received data by resorting to a consistent estimate of $R_{rr}$. In this case, if one resorts to batch algorithms, the computational complexity of the first stage is basically dominated by the matrix inversion in (17), which is of order $O[(N - L_{cp})^3]$. On the other hand, reasoning as in [13], the matrix $F^{(a)}_{opt}$ can also be estimated by means of a simple and effective recursion, similar to the well-known RLS algorithm, with a complexity per symbol interval of order only $O[(N - L_{cp})^2]$. The disturbance suppression capability of the optimal first stage (16) can be analyzed by following the guidelines given in [13] under the assumption that the noise $v(k)$ is white with variance $\sigma_v^2$. It can be shown [13] that, in the high signal-to-noise ratio (SNR) region, that is, as $\sigma_v^2/\sigma_b^2 \to 0$, the filtering matrix $F_{opt}$ is able to achieve perfect disturbance cancellation if and only if condition (C1) holds, with $D \triangleq J_{in} + 3J_{out} - 1$ representing the total number of MAI signatures (see the signal model (9)-(10)).
In this case, the first stage behaves as a blind zero-forcing detector. By using straightforward rank inequalities, it can be easily seen that the equality $\mathrm{rank}(B_1^H G) = \mathrm{rank}(G)$ requires that $N - L_{cp} \geq D$, that is, the number of degrees of freedom $N - L_{cp}$ for disturbance suppression must be greater than or equal to $D$.
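The dimensioning rule above can be checked numerically. The sketch below only encodes $D = J_{in} + 3J_{out} - 1$ and the requirement $N - L_{cp} \geq D$; it is a necessary check on the degrees of freedom, not a test of the rank condition itself, and the function names and example parameter values are illustrative assumptions.

```python
def mai_signature_count(j_in, j_out):
    """Total number D of MAI signatures: (J_in - 1) in-cell plus 3 per out-of-cell user."""
    return j_in + 3 * j_out - 1

def has_enough_degrees_of_freedom(n_subcarriers, l_cp, j_in, j_out):
    """True when N - L_cp >= D, i.e. zero-forcing is not ruled out by dimension alone."""
    return n_subcarriers - l_cp >= mai_signature_count(j_in, j_out)

# Hypothetical scenarios: the first leaves 24 degrees of freedom for D = 21 signatures,
# the second leaves only 8 for D = 11.
ok = has_enough_degrees_of_freedom(32, 8, j_in=10, j_out=4)
too_loaded = has_enough_degrees_of_freedom(16, 8, j_in=6, j_out=2)
```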

Ideal performance analysis
A different measure of the MAI-plus-noise suppression capability achieved by the first stage, which can be more directly related to the interference capture phenomenon [21] of the second stage, is the SINR at the output of the first stage, which, for an arbitrary $F \in \mathbb{C}^{N\times L_{cp}}$, is defined, on the basis of (14), as in (18). Since, from (14), (18) can also be written as in (19), maximizing $\mathrm{SINR}^{(I)}(F)$ under the constraint $F^H\Upsilon_1 = I_{L_{cp}}$ is equivalent to minimizing $E[\|x(k)\|^2]$ under the same constraint; hence, the maximum value of the (constrained) SINR at the output of the first stage can be obtained by substituting (16) in (18), or in (19), and is given by (20) in terms of the residual disturbance power at the output of the first stage.
We now focus on the interference capture of the CM-based filter employed in the second stage. To this end, we initially observe that, accounting for (14), the output of the second stage can be written as in (22) and, thus, for a given $F \in \mathbb{C}^{N\times L_{cp}}$ and an arbitrary $u \in \mathbb{C}^{L_{cp}}$, the SINR at the output of the second stage can be defined accordingly. Since a closed-form expression for the solution of the minimization problem (11) is not available, the interference capture behavior of CM-based filters is typically studied by assuming that the gradient descent (GD) algorithm is employed to minimize the CM cost function. Along this line, Schniter and Johnson Jr. have derived in [21] a sufficient condition, expressed in terms of SINR, which assures that, in a noiseless multiuser scenario, the GD-based minimization of the CM cost function safely extracts the desired symbol. In the following, we recall this result (we refer to [21] for further details), particularizing it to our framework.
Theorem 1. Assume that, in addition to (A1), the sequences the GD minimization of the CM cost function, initialized with u 0 , will converge, in the absence of noise, to a solution extracting the desired symbol b 1 (k).
In practice, Theorem 1 represents a sufficient condition assuring that, in the high SNR region, the desired symbol is extracted, provided that conditions (C2) and (C3) are fulfilled. As pointed out in [21], the gain condition (C2) is not critical if the value of $\mathrm{SINR}^{(II)}(u_0)$ is far enough from its critical value $1 + \sqrt{2}$; in this case, extraction of the desired symbol is guaranteed also for a value of $u_0^H F^H R_{rr} F u_0$ lying in a bounded interval around $(2\gamma)/(\kappa_b + 2)$. Note that, for a given filtering matrix $F$, condition (C2) can be blindly satisfied by suitably scaling the initial weight vector $u_0$; for this reason, in the sequel, we will essentially concentrate on condition (C3).
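To make the GD minimization of the CM cost concrete, here is a minimal batch sketch in standard-library Python. It assumes the usual CM cost $E[(|u^H x(k)|^2 - \gamma)^2]$ (the paper's exact expression (11) is not reproduced here), a noiseless single-user input $x(k) = \bar{g}_1 b_1(k)$ with QPSK symbols, and illustrative values for the signature, step size, and initialization.

```python
def cm_gradient_descent(samples, u0, gamma, step, n_iter):
    """Batch gradient descent on the sample CM cost mean((|u^H x|^2 - gamma)^2)."""
    u = list(u0)
    k_total = len(samples)
    for _ in range(n_iter):
        grad = [0j] * len(u)
        for x in samples:
            y = sum(ui.conjugate() * xi for ui, xi in zip(u, x))   # y = u^H x
            err = abs(y) ** 2 - gamma                              # CM error term
            for i, xi in enumerate(x):
                # derivative w.r.t. conj(u_i), constant factor absorbed in the step
                grad[i] += err * y.conjugate() * xi / k_total
        u = [ui - step * gi for ui, gi in zip(u, grad)]
    return u

# Hypothetical noiseless scenario: signature g, unit-modulus QPSK symbols (gamma = 1).
g = [1 + 0j, 0.5 + 0j]
qpsk = [1, 1j, -1, -1j]
samples = [[gi * b for gi in g] for b in qpsk]
u = cm_gradient_descent(samples, u0=[0.5 + 0j, 0j], gamma=1.0, step=0.1, n_iter=200)
gain = abs(sum(ui.conjugate() * gi for ui, gi in zip(u, g)))   # |u^H g| after descent
```

In this toy case the overall gain $|u^H g|$ settles at one, that is, the output has the modulus of the transmitted symbols; with interferers present, whether the descent locks onto the desired user depends on the initialization, which is what conditions (C2) and (C3) address.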
The last step of our analysis is to relate condition (C3) to the SINR at the output of the first stage, that is, to express $\mathrm{SINR}^{(II)}(u)$ as a function of $\mathrm{SINR}^{(I)}(F)$. To this aim, we restrict our attention to the subset of matrices $F$ that satisfy the constraint $F^H\Upsilon_1 = I_{L_{cp}}$; in this case, one obtains (24). The denominator of $\mathrm{SINR}^{(II)}(u)$ in (24) cannot be explicitly expressed in terms of $\mathrm{trace}(F^H R_{dd} F)$: in general, letting $\lambda_{\max}$ denote the maximum eigenvalue of $F^H R_{dd} F$, one is able to derive [18] only the bound $u^H F^H R_{dd} F u \leq \lambda_{\max}\|u\|^2$, which, utilized in (24), leads to (25), where $\rho(u) \triangleq (u^H \bar{g}_1)/(\|u\| \cdot \|\bar{g}_1\|)$ represents the correlation coefficient between the weight vector $u$ and the desired signature $\bar{g}_1$. Accounting for (25) and observing that $\lambda_{\max} \leq \mathrm{trace}(F^H R_{dd} F)$, the SINR at the output of the second stage can be related to the SINR at the output of the first stage as in (26), which shows that, for an arbitrary $u \in \mathbb{C}^{L_{cp}}$, the minimum value of the SINR at the output of the second stage is proportional to the SINR at the output of the first stage. By using the lower bound (26), condition (C3) can be translated into an equivalent condition on the SINR at the output of the first stage; indeed, condition (C3) is verified if (27) holds. It is worthwhile to note that, under condition (C1), the proposed first stage behaves as a blind zero-forcing detector in the high SNR region, that is, $\mathrm{SINR}^{(I)}(F_{opt}) \to \infty$ and, thus, the sufficient condition (27) is certainly fulfilled by using the optimal two-stage receiver.

Performance analysis for finite sample size
The aim of this subsection is to investigate the SINR degradation when the first stage is synthesized by using the sample correlation matrix of $r(k)$, estimated over $K$ symbol intervals, that is, when the adaptive part (17) of the filtering matrix is evaluated as in (28), with the sample correlation matrix $\hat{R}_{rr}$ given by (29). In this case, since the overall matrix $\hat{F}_{opt}$ is random, the expectations in (18) must be evaluated also with respect to $\hat{F}_{opt}$. To this end, we rewrite (18), with $F = \hat{F}_{opt}$, as in (30), where the last equality accounts for the constraint $\hat{F}_{opt}^H\Upsilon_1 = I_{L_{cp}}$. The starting point of the analysis is to find a simple expression for the adaptive matrix $\hat{F}^{(a)}_{opt}$, which is more suited to our purposes. By substituting (9) in (29), one obtains (31), whose terms represent sample estimates of the symbol variance $\sigma_b^2$, the cross-correlation matrix between the disturbance vector $d(k)$ and the desired vector (at the output of the first stage) $\bar{g}_1 b_1(k)$, and the correlation matrix of $d(k)$, respectively. Obviously, for a finite $K$, the sample cross-correlation matrix $\hat{R}$ is nonzero even if the disturbance $d(k)$ is statistically independent of the desired symbol $b_1(k)$ (see assumptions (A1) and (A2)). By substituting (31) in (28), we obtain, after tedious but straightforward matrix algebra, expression (32), which shows that the estimate $\hat{F}^{(a)}_{opt}$ is composed of two terms: the former represents a sample estimate of the optimal matrix $F^{(a)}_{opt}$ given by (17), while the latter is the perturbation resulting from the nonzero sample cross-correlation matrix $\hat{R}$.
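For reference, a sample correlation matrix over $K$ snapshots can be sketched as below. This assumes the usual time-average estimator $K^{-1}\sum_{k=0}^{K-1} r(k)r^H(k)$; the function name and the two-snapshot example are illustrative and not taken from the paper.

```python
def sample_correlation(snapshots):
    """Time-average estimate (1/K) * sum_k r(k) r(k)^H from K length-N snapshots."""
    k_total = len(snapshots)
    n = len(snapshots[0])
    r_hat = [[0j] * n for _ in range(n)]
    for r in snapshots:
        for a in range(n):
            for b in range(n):
                # entry (a, b) accumulates r_a * conj(r_b)
                r_hat[a][b] += r[a] * complex(r[b]).conjugate() / k_total
    return r_hat

# Hypothetical K = 2 snapshots whose cross terms cancel exactly.
r_hat = sample_correlation([[1, 1j], [1, -1j]])
```

With only a few snapshots the off-diagonal terms generally do not cancel, which is the finite-sample effect the analysis attributes to the nonzero sample cross-correlation.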
To simplify the analysis, following [22], we resort in (32) to the approximation (33), that is, we replace the sample correlation matrix $\hat{R}_{dd}$ with the exact one $R_{dd}$. As noted in [22] and confirmed by simulation results not reported here, this approximation is rather poor for very low values of the sample size, that is, for $K \approx N - L_{cp}$, whereas, for moderate to large values of the sample size, for example, $K \geq 3(N - L_{cp})$, the effect on the SINR of replacing $\hat{R}_{dd}$ with $R_{dd}$ is marginal, since the matrix $\hat{R}$ is the principal cause of the SINR degradation.
In Appendix A, it is shown that, by invoking assumptions (A1) and (A2), one obtains (34), where $\mathrm{SINR}^{(I)}_{\max}$ is given by (20). Under the assumption that the noise $v(k)$ is white with variance $\sigma_v^2$, it is interesting to note that, as $\sigma_v^2/\sigma_b^2 \to 0$ and under condition (C1), it results that $\mathrm{SINR}^{(I)}_{\max} \to \infty$ and, thus, expression (34) reduces to its limit (35), which shows that, due to the effect of the finite sample size $K$, the SINR saturates to a fixed value even when $\sigma_v^2/\sigma_b^2 \to 0$. In this case, by using (27), we observe that the second stage can safely extract the desired symbol $b_1(k)$ if the sample size $K$ satisfies inequality (36). Relation (36) allows one to draw two interesting conclusions. Firstly, the minimum sample size $K_{\min}$ required to avoid the interference capture in the second stage increases linearly with the number of degrees of freedom $N - L_{cp}$ for disturbance suppression which, in turn, increases linearly with the total number $D$ of the MAI signatures in order to fulfill condition (C1): this ultimately implies that $K_{\min}$ increases linearly with $D$. Secondly, in the case of a finite sample size, the initial weight vector $u_0$ plays an important role in determining the overall performance of the two-stage receiver. In fact, if $u_0$ is mistakenly chosen so as to be nearly orthogonal to the unknown signature $\bar{g}_1$, that is, $|\rho(u_0)| \approx 0$, the extraction of the desired symbol requires an exceedingly large sample size. Therefore, in setting the initial vector $u_0$, one has to find, in principle, an approximation that is close to a scaled version of $\bar{g}_1$ across all possible scenarios of interest. In practice, one can only resort to some reasonable ad hoc choices. In macrocellular systems, typical multipath intensity profiles show [7] that most of the average power is concentrated within the first sampling interval: in this scenario, a reasonable approximation [23] of the channel vector $g_1$ is given by $g_1 \approx [1, a, \ldots, a]^T$, with $a \triangleq 1/L_{1,\max}$, where $L_{1,\max}$ represents a known upper bound of the desired channel order $L_1$. In our case, accounting for the structure of the composite channel vector $\bar{g}_1$, we have chosen in Section 5 an initialization for the second stage that depends on $d_{1,\max}$, a known upper bound of the desired transmission delay $d_1$. This choice was verified by computer simulations to lead to acceptable values of $|\rho(u_0)|$.
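As a small illustration of how the quality of an initialization can be gauged, the sketch below computes the correlation coefficient $\rho(u) = u^H g/(\|u\|\,\|g\|)$ between the flat-tail approximation $[1, a, \ldots, a]^T$, $a = 1/L_{1,\max}$, and a channel vector. The decaying channel taps and the choice of $L_{1,\max}$ tail entries are hypothetical, and the paper's actual initialization of $u_0$ (which also accounts for the delay bound) is not reproduced.

```python
def correlation_coefficient(u, g):
    """rho(u) = u^H g / (||u|| * ||g||) for equal-length complex vectors."""
    inner = sum(complex(ui).conjugate() * gi for ui, gi in zip(u, g))
    norm_u = sum(abs(ui) ** 2 for ui in u) ** 0.5
    norm_g = sum(abs(gi) ** 2 for gi in g) ** 0.5
    return inner / (norm_u * norm_g)

def flat_tail_profile(l_max):
    """Approximation [1, a, ..., a] with a = 1/l_max and l_max tail entries (assumed length)."""
    a = 1.0 / l_max
    return [1.0] + [a] * l_max

# Hypothetical decaying channel of order 3 versus its flat-tail approximation.
true_channel = [1.0, 0.4, 0.2, 0.1]
guess = flat_tail_profile(3)
rho = correlation_coefficient(guess, true_channel)
```

A value of $|\rho|$ close to one indicates an initialization well aligned with the signature, whereas a value near zero corresponds to the unfavorable case discussed above.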

ROBUST VERSION OF THE BLIND TWO-STAGE RECEIVER
The analysis carried out in Section 3.3 shows that the SINR degradation at the output of the first stage due to the finite sample size is basically attributable to the effect of the sample cross-correlation matrix $\hat{R}$ between the disturbance vector $d(k)$ and the desired vector $\bar{g}_1 b_1(k)$; moreover, this degradation increases as the number of degrees of freedom $N - L_{cp}$ increases. A simple and effective way to reduce the SINR degradation is thus to suitably reduce the degrees of freedom, which is equivalent to adding constraints to the optimization problem (15). On the other hand, for a fixed disturbance suppression level, reducing the number of degrees of freedom entails a reduction of the total number of MAI signatures that the two-stage receiver is able to handle. In this section, our goal is to add an appropriate constraint in the synthesis of the first stage in order to gain robustness against finite sample-size effects without significantly compromising its MAI suppression capability.

The robust receiver
We start by considering the sample power $P_{out} \triangleq K^{-1}\sum_{k=0}^{K-1} x^H(k)x(k)$ at the output of the first stage which, accounting for (14) and (31), can be expressed as in (38). Observe that, accounting for (10), matrix $\hat{R}$ can be explicitly written as in (39), where $\hat{R}_{in}$ is the sample cross-correlation matrix between the symbol vector $b_{in}(k)$ of the interfering in-cell users and the desired vector $b_1(k)\bar{g}_1$, whereas $\hat{\Xi}$ represents the sample cross-correlation matrix between the residual disturbance (out-of-cell MAI plus noise) and the desired vector. It is important to observe that, since the spreading codes of all the in-cell users are available at the base station, the matrix $H_{in}$ is partially known at the receiving side; in fact, taking into account parameterization (8) and (13), one obtains (40), where $Q_{in} \triangleq [\Upsilon_2, \Upsilon_3, \ldots, \Upsilon_{J_{in}}] \in \mathbb{C}^{N\times(J_{in}-1)L_{cp}}$ and $G_{in} \triangleq \mathrm{diag}[\bar{g}_2, \bar{g}_3, \ldots, \bar{g}_{J_{in}}] \in \mathbb{C}^{(J_{in}-1)L_{cp}\times(J_{in}-1)}$. By substituting (39) and (40) in (38) and imposing the linear constraint $F^H\Upsilon_1 = I_{L_{cp}}$, one obtains (41). This relation suggests a simple strategy to exploit the knowledge of the matrix $Q_{in}$ for partially mitigating the sample cross-correlation between the disturbance and the desired vector. To this end, observe that, by invoking the Cauchy-Schwarz inequality, one obtains (42) (see [18]), from which it results that, by imposing that $F$ satisfies the quadratic constraint $\|F^H Q_{in}\|^2 \leq \epsilon_0$, with $\epsilon_0$ being a nonnegative number, the squared modulus of the contribution to the output power $P_{out}$ due to the sample cross-correlation between the in-cell MAI and the desired vector is at most equal to $\epsilon_0\|G_{in}\hat{R}_{in}\|^2$. This means that the magnitude of the second term in (41) can be deterministically bounded by appropriately choosing the value of $\epsilon_0$. Based on this consideration, we propose to modify (15) and to choose the filtering matrix $F$ so as to solve the optimization problem (43), with a linear equality constraint and a quadratic inequality constraint. Similarly to (16), the linear
equality constraint F H Υ 1 = I Lcp gives to the solution of (43) the canonical structure where the matrix F (a)   rob ∈ C (N−Lcp)×Lcp r turns out to be the solution of the quadratically constrained optimization problem subject to [F (0) opt − B 1 F (a) ] H Q in 2 ≤ 0 , whose solution is given by (see Appendix B) where µ 0 ≥ 0 is the Lagrange multiplier, which is chosen so as to satisfy the equation It should be observed that, unlike linearly and quadratically constrained minimum power beamforming techniques [24], which are well-known reception strategies in the context of array processing, the amount of loading induced by the quadratic constraint in (43 I N , and depends on the spreading codes of the in-cell active users.When µ 0 = 0, matrix (46) degenerates into the adaptive matrix F (a)  opt given by (28): this corresponds to the case where 0 → ∞, that is, when the quadratic constraint is inactive.On the other hand, the value of the Lagrange multiplier µ 0 cannot be chosen arbitrarily large or, equivalently, the constraint value 0 cannot be chosen arbitrarily small.In fact, in order to assure that the constrained optimization problem (45) admits a solution, the constraint value 0 must satisfy the condition min B shows that, when the matrix Q in is full-row rank, 6 a reasonable choice for the constraint value is 0 ≥ trace[(F (0)  opt ) Unfortunately, the optimal value of µ 0 is related to 0 by means of the transcendental equation (47) and, thus, it can be evaluated only numerically [24,25].This can be accomplished by observing that (47) can be equivalently written as follows: where Assuming that (49) is not satisfied when µ 0 = 0, that is, g(0) > β 0 , the following iterative procedure can be used to determine the optimal value of the Lagrange multiplier µ 0 : starting with µ (0) 0 = 0, let µ (1)  0 = µ (0) 0 + ∆µ 0 , . . 
., µ ( ) 0 = µ ( −1) 0 + ∆µ 0 , where ∆µ 0 is a small positive number.At the th step, compute g(µ ( ) 0 ) and compare it with the threshold β 0 : if f (µ ( ) 0 ) ≤ β 0 , then choose µ ( ) 0 as the optimal value of the Lagrange multiplier µ 0 ; else, perform the ( + 1)th iteration and repeat the procedure.
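The incremental search for the Lagrange multiplier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_multiplier`, the iteration cap `max_iter`, and the toy function `toy_g` are hypothetical and not part of the paper; in particular, `toy_g` is an arbitrary decreasing function used as a stand-in for the actual constraint function, whose expression is not reproduced here.

```python
from typing import Callable


def find_multiplier(g: Callable[[float], float], beta0: float,
                    delta_mu: float, max_iter: int = 1_000_000) -> float:
    """Return the first mu on the grid 0, delta_mu, 2*delta_mu, ... with g(mu) <= beta0."""
    if delta_mu <= 0:
        raise ValueError("delta_mu must be a small positive number")
    for step in range(max_iter + 1):
        # Same grid as mu^(l) = mu^(l-1) + delta_mu, computed without accumulating round-off.
        mu = step * delta_mu
        if g(mu) <= beta0:
            return mu
    raise RuntimeError("threshold beta0 not reached within max_iter steps")


def toy_g(mu: float) -> float:
    """Hypothetical stand-in for the constraint function (NOT the paper's g)."""
    return 4.0 / (1.0 + mu) ** 2


# toy_g(0) = 4 > beta0 = 1, so the search starts from an unsatisfied constraint.
mu_opt = find_multiplier(toy_g, beta0=1.0, delta_mu=0.01)
```

The step size trades accuracy against cost: each grid point requires one evaluation of the constraint function, so a smaller step gives a multiplier closer to the boundary at the price of more iterations.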
A final remark is now in order about the computational load of the robust version of the first stage. For a given value of the Lagrange multiplier $\mu_0$, the synthesis of the robust filtering matrix $F^{(a)}_{rob}$ in (46) involves essentially the same computational complexity required to estimate in batch mode the optimal matrix $F^{(a)}_{opt}$ in (17); furthermore, reasoning as in [13], one can estimate $F^{(a)}_{rob}$ adaptively by means of RLS-based algorithms, with computational requirements per symbol interval of order $O[(N - L_{cp})^2]$. When the above-mentioned iterative procedure is used to determine the optimal value of $\mu_0$, this quadratic complexity must be multiplied by the number of iterations involved.

Performance analysis for finite sample size
In this subsection we provide a first-order analysis of the SINR at the output of the first stage synthesized by using the robust filtering matrix (46): this analysis is aimed at showing the SINR enhancement provided by using the quadratic constraint in (43) as well as the impact of this constraint on the number of degrees of freedom for disturbance suppression.
Accounting for (18) and reasoning as in Section 3.3 and in Appendix A (see, in particular, (30) and (A.1)), the SINR [...] $\mathrm{SINR}^{(I)}(F_{rob}) \approx \mathrm{SINR}^{(I)}(F_{opt}) \approx \mathrm{SINR}^{(I)}_{max}$ and, thus, adopting the quadratic constraint in (43) is practically useless: this typically happens in the low SNR region and/or when condition (C1) is close to being violated.
Our analysis is conservative: indeed, it applies only to very small values of $\Omega(\mu_0)$ (compared to $N - L_{cp}$). However, it should be noted that even a small decrease of $N - L_{cp}$ in the denominator of (34) can lead to a nonnegligible increase of $\mathrm{SINR}^{(I)}(F_{rob})$ with respect to $\mathrm{SINR}^{(I)}(F_{opt})$. In fact, consider a small perturbation $0 < \Omega(\mu_0) \ll N - L_{cp}$ of the degrees of freedom $N - L_{cp}$. Accounting for (34) and (52), one obtains relation (55), which shows that the relative SINR variation is greater than $\Omega(\mu_0)$ by a factor $\mathrm{SINR}^{(I)}(F_{opt})/K$, which can be valuable for low values of the sample size $K$ and/or for high values of $\mathrm{SINR}^{(I)}(F_{opt})$. For example, referring to the scenario considered in Example 1 (see Section 5), it turns out that, for SNR = 25 dB and $K = 250$ symbols, $\mathrm{SINR}^{(I)}(F_{opt}) = 10.0952$ (expressed in natural units) and $\Omega(\mu_0) = 3.6736$, which, accounting for (55), lead to a relative SINR variation of about 15%. According to Theorem 1 and accounting for the discussion reported in Section 3.3, this SINR enhancement is expected to improve the performance of the CM algorithm in the second stage by lowering, with respect to (36), the minimum sample size $K_{min}$ required to avoid the interference capture.
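The 15% figure quoted above can be checked with a short calculation. The sketch below assumes that the relative SINR variation is read as the product of $\Omega(\mu_0)$ and the factor $\mathrm{SINR}^{(I)}(F_{opt})/K$, as the sentence describing (55) suggests; the function name `relative_sinr_variation` is hypothetical and the exact form of (55) is not reproduced here.

```python
def relative_sinr_variation(omega: float, sinr_opt: float, sample_size: int) -> float:
    """Relative SINR variation, read as omega * SINR / K (assumed form, see lead-in)."""
    return omega * sinr_opt / sample_size


# Values quoted in the text for SNR = 25 dB and K = 250 symbols.
variation = relative_sinr_variation(omega=3.6736, sinr_opt=10.0952, sample_size=250)
print(f"{100 * variation:.1f}%")  # prints "14.8%", i.e., about 15%
```

The same expression makes the dependence on the sample size explicit: halving $K$ doubles the relative variation for fixed $\Omega(\mu_0)$ and SINR.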

SIMULATION RESULTS
To confirm the results of the analysis previously carried out and to give more insight into the achievable performance of the two-stage receiver proposed in [14] (referred to as TS in the plots) as well as that of its robust implementation (referred to as robust TS in the plots), we present in this section the results of Monte Carlo computer simulations and compare them with the analytical results.
In all the experiments, the following common simulation setting is adopted. The QS-MC-CDMA network employs $N = 32$ subcarriers, with a CP of length $L_{cp} = 8$, and QPSK symbol modulation, which implies that the dispersion coefficient to be used in the CM cost function (11) is $\gamma = 1$; the frequency-domain spreading codes are length-32 Walsh-Hadamard sequences. The multipath channel of the $j$th user is $g_{c,j}(t) = \sum_{m=1}^{4} \beta_{m,j}\, \varphi_c(t - \tau_{m,j})$, where $\varphi_c(t)$ is a Nyquist-shaped pulse with 35% roll-off, the first path ($m = 1$) is assumed to be deterministic with amplitude $\beta_{1,j} = 1$ and propagation time $\tau_{1,j} = 0$, the remaining path gains $\beta_{m,j}$, for $m = 2, 3, 4$, are modeled as mutually independent complex circular Gaussian zero-mean random variables, with standard deviation 0.3, whereas the corresponding propagation times $\tau_{m,j}$ are modeled as mutually independent random variables, uniformly distributed over $L_j + 1 = 5$ sampling periods, for $j = 1, 2, \ldots, J$. The (integer) transmission delays $d_j$ are modeled as discrete random variables, assuming equiprobable values in $\{0, 1, 2\}$, for $j = 1, 2, \ldots, J_{in}$, and in $\{0, 1, \ldots, P-1\}$, for $j = J_{in}+1, J_{in}+2, \ldots, J$.
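As an illustration of the channel model just described, the following Python sketch draws the path gains, propagation times, and in-cell transmission delay of one user. Several details are assumptions made only for this example: the function names are hypothetical, the standard deviation 0.3 is interpreted as that of the complex gain (so each real component has standard deviation $0.3/\sqrt{2}$), and the propagation times are taken as continuous values in $[0, 5]$ sampling periods.

```python
import random
from typing import List, Tuple


def draw_channel(rng: random.Random, num_paths: int = 4, sigma: float = 0.3,
                 max_delay: float = 5.0) -> Tuple[List[complex], List[float]]:
    """Draw path gains and propagation times (in sampling periods) for one user."""
    gains = [complex(1.0, 0.0)]  # first path: deterministic, unit amplitude
    delays = [0.0]               # first path: zero propagation time
    per_component = sigma / 2 ** 0.5  # assumption: sigma is the std of the complex gain
    for _ in range(num_paths - 1):
        # Circular complex Gaussian: independent, identically distributed real and imaginary parts.
        gains.append(complex(rng.gauss(0.0, per_component), rng.gauss(0.0, per_component)))
        # Assumption: delays are continuous and uniform over max_delay sampling periods.
        delays.append(rng.uniform(0.0, max_delay))
    return gains, delays


def draw_in_cell_delay(rng: random.Random) -> int:
    """Integer transmission delay of an in-cell user, equiprobable in {0, 1, 2}."""
    return rng.choice([0, 1, 2])


rng = random.Random(0)  # fixed seed so that a trial can be reproduced
gains, delays = draw_channel(rng)
d_j = draw_in_cell_delay(rng)
```

Using one `random.Random` instance per trial keeps independent Monte Carlo runs reproducible without touching the global random state.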
The additive noise samples $v^{(\ell)}(k)$ in (3) are modeled as mutually independent complex circular zero-mean white Gaussian processes, with variance $\sigma_v^2$, and the SNR of the desired user at the detector input is defined according to (7). We considered a severe near-far scenario: in all the experiments, the path gains of each user channel are adjusted so that each interfering in-cell user is 10 dB stronger than the user of interest ($j = 1$), whereas each out-of-cell user is received with the same power as the desired user (worst case). Unless otherwise specified, the number of out-of-cell users is fixed to $J_{out} = 4$. All the results are obtained by carrying out 100 independent trials, with each run using a different set of noise samples and, for each user, a different set of transmission delays, channel parameters (path gains and propagation delays), and data sequences.
Example 1 (SINR performance of the first stage). In this example, we resort to Monte Carlo simulations to evaluate the SINR performance of the first stage of the robust TS and compare it with that of the first stage of the TS receiver; moreover, the obtained results are compared with the analytical formulas (34) and (52). The number of active users is fixed to $J = 16$ and, after estimating the adaptive matrices $F^{(a)}_{opt}$ and $F^{(a)}_{rob}$ on the basis of the given data record of length $K$, the output SINR is evaluated using (18). As to the robust receiver, in order to validate the first-order analysis of Section 4.2, the Lagrange multiplier $\mu_0$ was chosen so as to satisfy the relation $\mu_0$ [...]. Figure 3 reports the values of SINR as a function of SNR ranging from 0 to 30 dB, with a sample size $K = 250$ symbols. In this case, the order of magnitude of $\mu_0$ varies from $10^{-4}$ (low values of SNR) to $10^{-6}$ (high values of SNR). It can be seen that, even though vanishingly small values of $\mu_0$ are employed, the robust TS assures a valuable enhancement of the SINR at the output of the first stage with respect to its TS counterpart; in particular, both first stages exhibit practically the same performance for low values of SNR, whereas the SINR increase provided by the incorporation of the quadratic constraint becomes more evident for moderate to high values of SNR. Observe that, taking into account the small value used for the sample size $K$, the absolute and relative behaviors of the two first stages are well predicted by the analytical results. To further corroborate the analysis, we evaluated the performance of the two considered first stages as a function of the sample size $K$ (in symbols) ranging from 100 to 500, in the high SNR region, that is, for SNR = 30 dB; in this region, the order of magnitude of the Lagrange multiplier is $10^{-6}$. The results in Figure 4 show good agreement between experimental and analytical results and, in particular, show that the first stage of the robust TS appreciably outperforms its TS counterpart for all the considered values of the sample size. In this experiment, according to (36), we also evaluated the (average) minimum sample size $K_{min}$ required to avoid the interference capture in the second stage when the CM is initialized with the vector $u_0$ given by (37), with $d = 2$ and $L_{1,max} + 1 = 6$. Results show that, for the TS receiver, the interference capture in the second stage is surely avoided if a minimum sample size of 219 symbols is used, whereas for the robust TS receiver, $K_{min}$ turns out to be equal to 187 symbols.
Example 2 (SER performance of the overall receiver). In this example, we present the Monte Carlo performance analysis of the overall TS receivers, together with a comparison with both nonblind (i.e., the exact knowledge of the channel impulse response and transmission delay of the desired user is assumed) and blind versions of the subspace-based MMSE detector recently proposed in [12] (referred to as MMSE and blind MMSE in the plots, respectively). As the (overall) performance measure, we resorted to the symbol error rate (SER) at the output of the considered receivers. After estimating the receiver weights (i.e., the correlation matrix $R_{rr}$) in batch mode on the basis of the given data record of length $K$, an independent record of $K_{ser} = 10^5$ symbols is considered to evaluate the SER at the output of the considered receivers. For the blind receivers, the equalized symbols are first rotated and scaled before evaluating the SER. The Lagrange multiplier $\mu_0$ was chosen according to the algorithm described in Section 4.1, with $\Delta\mu_0 = 10^{-6}$, whereas the estimate of the optimal weight vector $u_{opt}$ in (11) is obtained by resorting to the GD method, initialized by using a properly scaled (in accordance with condition (C2)) version of the vector $u_0$ given by (37), with $d_{1,max} = 2$ and $L_{1,max} + 1 = 6$, where the complex gradient vector [26] (with respect to $u^*$) of the CM cost function $E[(\gamma - |u^H x(k)|^2)^2]$ is estimated from the received data in batch mode (see [27] for details).
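A batch gradient-descent update for the CM cost function can be sketched as follows. This is a minimal illustration, not the implementation of [27]: the expectation is replaced by a sample average over the data record, the gradient with respect to $u^*$ is obtained by direct differentiation as $-2\,(\gamma - |y|^2)\,y^*\,x$ with $y = u^H x$, and the function names, step size, and toy data are all hypothetical.

```python
from typing import List, Sequence

Vector = List[complex]


def output(u: Sequence[complex], x: Sequence[complex]) -> complex:
    """Second-stage output y = u^H x."""
    return sum(ui.conjugate() * xi for ui, xi in zip(u, x))


def cm_cost(u: Sequence[complex], data: Sequence[Sequence[complex]],
            gamma: float = 1.0) -> float:
    """Sample average of (gamma - |u^H x(k)|^2)^2 over the data record."""
    return sum((gamma - abs(output(u, x)) ** 2) ** 2 for x in data) / len(data)


def cm_gradient(u: Sequence[complex], data: Sequence[Sequence[complex]],
                gamma: float = 1.0) -> Vector:
    """Batch estimate of the complex gradient of the CM cost with respect to u^*."""
    grad = [0j] * len(u)
    for x in data:
        y = output(u, x)
        err = gamma - abs(y) ** 2
        for i, xi in enumerate(x):
            grad[i] += -2.0 * err * y.conjugate() * xi
    return [g / len(data) for g in grad]


def gd_step(u: Sequence[complex], data: Sequence[Sequence[complex]],
            step: float, gamma: float = 1.0) -> Vector:
    """One gradient-descent update u <- u - step * gradient."""
    grad = cm_gradient(u, data, gamma)
    return [ui - step * gi for ui, gi in zip(u, grad)]


# Toy record: unit-modulus symbol on the first entry, small leakage on the second.
data = [[1 + 0j, 0.5j], [1j, 0.5 + 0j], [-1 + 0j, -0.5j], [-1j, 0.2 + 0j]]
u0 = [0.5 + 0j, 0j]          # hypothetical initialization
u1 = gd_step(u0, data, step=0.1)
```

With $\gamma = 1$, as used for QPSK in the simulations, the cost vanishes whenever every output sample has unit modulus, which is why the weight vector that passes only the first entry of the toy record is a stationary point.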
In the first part of this example, the SER of the robust TS detector is first evaluated as a function of the quadratic constraint value $\epsilon_0 = \delta \, \mathrm{trace}[(F^{(0)}_{opt})^H Q_{in} Q_{in}^H F^{(0)}_{opt}]$, with $\delta$ ranging from 2 to 22. Figure 5 reports the SER of the robust TS receiver for different values of SNR, where the number of active users is $J = 16$ and the sample size is fixed to $K = 250$ symbols. It is apparent that, for low values of SNR, the best performance is achieved for $\delta_{opt} = 4$, whereas, for moderate values of SNR, the optimal choice of $\delta$ turns out to be $\delta_{opt} = 6$; moreover, observe that, except for $\delta = 2$, the SER gracefully degrades as $\delta$ deviates from its optimal value for all the considered values of SNR. Similar considerations apply to Figure 6, where the SER of the robust TS detector is depicted for different values of the sample size $K$ (in symbols) for a number of users $J = 16$ and SNR = 20 dB. It is shown here that, in the considered scenario, the optimal value of $\delta$ is practically independent of the sample size. Finally, in Figure 7, we report the SER of the robust TS receiver for different values of the number $J_{out}$ of out-of-cell users; in this experiment, the number of in-cell users is fixed to $J_{in} = 12$, and the sample size and the SNR are set to $K = 250$ symbols and SNR = 20 dB, respectively. Results show that, for a fixed number of in-cell users, the SER is not considerably affected by increasing or decreasing the number of out-of-cell users, provided that the total number of MAI signatures is less than or equal to the number of degrees of freedom for disturbance suppression.
The second part of this example is devoted to the comparison between the TS receivers and both nonblind and blind versions of the subspace-based MMSE detector proposed in [12]. In the first experiment, we evaluated the SER of the considered receivers as a function of SNR ranging from 5 to 30 dB. The number of active users is $J = 16$ and the sample size is fixed to $K = 250$ symbols. The quadratic constraint value is chosen equal to $\epsilon_0 = 12 \, \mathrm{trace}[(F^{(0)}_{opt})^H Q_{in} Q_{in}^H F^{(0)}_{opt}]$. From Figure 8, it can be observed that, for high values of SNR (i.e., SNR ≥ 25 dB), the robust TS receiver performs as well as or better than the MMSE receivers, assuring an SER well below $10^{-3}$ for SNR = 20 dB, whereas the performance of the TS receiver is quite unsatisfactory, showing an SER floor of about $3 \times 10^{-3}$ for high values of SNR. It should be observed that, although the blind MMSE receiver outperforms the robust TS for values of SNR ≤ 20 dB, its implementation is much more computationally expensive (two eigendecompositions are involved) and, in the considered scenario, also requires the additional knowledge of the number $J_{out}$ of out-of-cell users. The second experiment investigates the convergence behavior of the detectors under comparison. We have considered the same simulation setting described in the previous experiment (with $J = 16$ active users and [...] $K \ge 200$, it requires 100 symbols more than its robust counterpart. Finally, we have reported in Figure 10 the values of SER as a function of the number $J$ of active users ranging from 14 to 20, where the SNR is set to 20 dB and the sample size is fixed to $K = 350$ symbols. The quadratic constraint value is chosen as $\epsilon_0 = \delta \, \mathrm{trace}[(F^{(0)}_{opt})^H Q_{in} Q_{in}^H F^{(0)}_{opt}]$, with $\delta$ being equal to the number of in-cell users, except for $J = 19$ and $J = 20$, where $\delta = 16$ and $\delta = 23$, respectively. Results of Figure 10 confirm the above observations, showing that, in comparison with the TS detector, the robust TS receiver assures a substantial performance gain for small to moderate values of the number of users, that is, for $J \le 20$. Finally, observe that, as the number of users exceeds 16, the robust TS receiver performs comparably to or better than the blind MMSE detector, exhibiting performance close to that of the nonblind MMSE receiver.

CONCLUSIONS
In this paper, we have theoretically analyzed the performance of the two-stage receiver recently proposed in [14] for the QS uplink of an MC-CDMA system, when the receiver's parameters are estimated by using a finite sample size. Results of this analysis have suggested the formulation of a robust version of the two-stage receiver, which is based on the introduction of a suitable quadratic constraint in the synthesis of the first stage. This constraint is constructed by exploiting, in the uplink, the knowledge of the spreading codes of the in-cell users. The theoretical analysis has shown that the incorporation of the quadratic constraint has the effect of slightly reducing the degrees of freedom for disturbance suppression of the first stage, thus gaining robustness against errors in the estimated statistics of the received data. Moreover, results of computer simulations have shown that, even when small sample sizes are considered, the proposed receiver performs comparably to the nonblind MMSE receiver, outperforming the two-stage detector proposed in [14] in moderately loaded cells with strong out-of-cell MAI. Finally, our current research is aimed at investigating the feasibility of implementing the first stage of the robust two-stage receiver with recursive least squares updating, where the optimal value of the Lagrange multiplier $\mu_0$ is adaptively adjusted at each step.

APPENDICES

A. DERIVATION OF SINR FOR THE OPTIMAL TWO-STAGE RECEIVER
To evaluate the expectation in the denominator of (30), we resort to the conditional expectation rule, as written in (A.1); moreover, we observe that $F_{opt}$, being estimated from $\{r(k)\}_{k=0}^{K-1}$, turns out to be statistically independent of $d(k)$, provided that $k \ge K + 2$ (see the signal model (7)). Thus, one obtains (A.2). By substituting (33) in the denominator of (A.1) and invoking assumptions (A1) and (A2), we obtain, after rearrangement, (A.3). From (A.3), accounting again for (A1) and (A2), one has (A.4). Finally, by substituting (A.4) in (A.2) and the result in (A.1), we get (34).

B. SOLUTION OF THE QUADRATICALLY CONSTRAINED MINIMIZATION PROBLEM
The problem consists of minimizing the real-valued scalar function $f(F^{(a)}) \triangleq \mathrm{trace}[\ldots]$ (B.1) of the matrix $F^{(a)} \in \mathbb{C}^{(N-L_{cp}) \times L_{cp}}$, subject to the constraint $g(F^{(a)}) \le \epsilon_0$, where [...] $\mathrm{vec}(B_1^H R_{rr} F^{(0)}_{opt}) \in \mathbb{C}^{(N-L_{cp})L_{cp}}$, subject to the constraint $g(f^{(a)}) \le \beta_0$, where $g(f^{(a)}) \triangleq f^{(a)H} A f^{(a)} - f^{(a)H} q - q^H f^{(a)}$ (B.4). In the following, we assume that the constraint value $\beta_0$ is set so as to satisfy condition (B.5). For example, under the assumption that the matrix $Q_{in}$ is full-row rank, the matrix $A$ turns out to be positive definite and, thus, the function $g(f^{(a)})$ is strictly convex; in this case, it is easily seen that $g(f^{(a)})$ assumes its minimum value for $f^{(a)} = A^{-1} q$, which implies that $\min_{f^{(a)}} g(f^{(a)}) = -q^H A^{-1} q < 0$ and, therefore, an acceptable choice for the constraint value is $\beta_0 \ge 0$ or, equivalently, $\epsilon_0 \ge \mathrm{trace}[(F^{(0)}_{opt})^H Q_{in} Q_{in}^H F^{(0)}_{opt}]$. In order to solve the optimization problem (B.3)-(B.4), we resort to the method of Lagrange multipliers [25]. The Lagrangian for the problem at hand is defined as follows: [...] where $\nabla_{(f^{(a)})^*}(\cdot)$ represents the complex gradient operator [26] with respect to $(f^{(a)})^*$, and either $\mu_0 = 0$ or the inequality constraint is satisfied with equality [25]. Since our aim is to estimate the detector's parameters from the received data by using small to moderate values of the sample size, we reasonably assume that the optimal solution $f^{(a)}_{opt} = \mathrm{vec}(F^{(a)}_{opt})$, corresponding to $\mu_0 = 0$, does not allow the inequality constraint to be satisfied, that is, $g(f^{(a)}_{opt}) > \beta_0$; in this case, a solution $f^{(a)}_{rob}$ of the problem (B.3)-(B.4) necessarily occurs on the boundary of the constraint region, that is, $g(f^{(a)}_{rob}) = \beta_0$. Since, in general, the Hermitian matrix $A$ is positive semidefinite and $\mu_0 \ge 0$, the matrix $\mathcal{R}_{rr} + \mu_0 A$ is positive definite; [...]
In order to simplify the analysis, our aim is to obtain a first-order approximation of $F^{(a)}_{rob}$ and, thus, we restrict our attention to the case where [...]. In this case, the matrix $(I_{N-L_{cp}} + \mu_0 \Phi B_1)^{-1}$ is well approximated by the first two terms of expansion (C.3), that is, by neglecting the summands of order $o(\mu_0 \Phi B_1)$, one has $(I_{N-L_{cp}} + \mu_0 \Phi B_1)^{-1} \approx I_{N-L_{cp}} - \mu_0 \Phi B_1$ (C.5). By substituting (C.5) in (C.2) and neglecting the summand of order $o(\mu_0 \Phi B_1)$, one obtains, after some manipulations, a first-order approximation of $F^{(a)}_{rob}$.

Figure 3: SINR at the output of the first stage versus SNR (first example, K = 250).

Figure 4: SINR at the output of the first stage versus sample size K (first example, SNR = 30 dB).

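By using the properties of the Kronecker product [28], the optimization problem (B.1)-(B.2) is equivalent to the minimization of the real-valued scalar function $f(f^{(a)}) \triangleq f^{(a)H} \mathcal{R}_{rr} f^{(a)} - f^{(a)H} p_{rr} - p_{rr}^H f^{(a)}$ (B.3) of the vector $f^{(a)} \triangleq \mathrm{vec}[F^{(a)}] \in \mathbb{C}^{(N-L_{cp})L_{cp}}$, with $\mathcal{R}_{rr} \triangleq I_{L_{cp}} \otimes (B_1^H R_{rr} B_1) \in \mathbb{C}^{(N-L_{cp})L_{cp} \times (N-L_{cp})L_{cp}}$ and $p_{rr} \triangleq \mathrm{vec}(B_1^H R_{rr} F^{(0)}_{opt})$.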
$E_{F_{rob}}\{\mathrm{trace}[F_{rob}^H R_{dd} F_{rob}]\} = \mathrm{trace}[F_{opt}^H R_{dd} F_{opt}] + \cdots$