A Unified Approach to List-Based Multiuser Detection in Overloaded Receivers

A wireless communication system is overloaded when the number of transmitted signals exceeds the number of receive antennas. The resulting cochannel interference (CCI) causes linear detection techniques to perform poorly. We develop a unified approach to the separation and detection of the user signals in an overloaded system using a novel iterative list-based multiuser detector. It combines a linear preprocessor with a nonlinear list detector and approximates optimum joint maximum-likelihood detection at lower complexity. Complexity savings are achieved first by exploiting the spatial separation of the users to mitigate CCI in the preprocessor stage and second by estimating the residual CCI in the subsequent list detection stage. The proposed list detection algorithm is applied to receivers with either a uniform circular array or a uniform linear array. The preprocessor is implemented using either a special-purpose spatial filter to mitigate the CCI or maximum ratio diversity combining to achieve diversity gain. Simulation results and a complexity analysis indicate that the approach is suitable for practical application.


INTRODUCTION
The use of multiple receive antennas allows significant increases in capacity and reliability of wireless data transfer by exploiting spatial diversity [1-4]. Space-time processing for the detection of the signals from multiple users is now receiving considerable attention. Wireless systems where the number of signals to be resolved exceeds the number of receive antennas are referred to as overloaded systems [5]. Severe cochannel interference (CCI) occurs in such systems. Under overload, the receive antenna array's number of degrees of freedom is exceeded. This causes linear detection techniques to perform poorly [2, 6]. Multiuser detection (MUD) of the user signals is then difficult.
Comprehensive fundamental work on MUD is available in [7]. Here, we restrict ourselves to reviewing literature specifically focused on MUD in the overloaded case. Signal separation and detection in overloaded environments has been shown to be possible by exploiting the response differences among the users' received cochannel signals [4]. In [8, 9], maximum likelihood approaches to blind MUD in nonoverloaded receivers with antenna arrays were studied. This work was extended to the overloaded case in [5, 10], which showed that under overload, linear detection algorithms suffer severe degradation and that joint maximum likelihood (JML) detection is optimum. JML requires an exhaustive search over all possible symbol combinations. Due to the search complexity, JML is not feasible for most applications. Therefore, reduced complexity algorithms that achieve near JML performance are of significant interest. This is particularly important under overloaded conditions.
Several reduced complexity algorithms have been developed. In [6, 10-14], a high-altitude receiver with symbol-synchronous signals impinging on a circular antenna array is considered. This is often referred to as the "base station in the sky" model. For this model it has been shown that a preprocessor at the receiver can improve performance of reduced complexity detection [5, 15]. The work of [6, 11-14] employs a spatial filter as a preprocessor to mitigate CCI. It achieves no diversity gain since it employs beam forming.
The detectors in [11-13] use either successive or parallel interference cancellation following preprocessing. Compared to JML, complexity is low but the performance is poor if the user signals have similar energies. In contrast, spatially reduced search joint detection (SRSJD) [6], when used with a circular array, achieves near JML performance. It employs a beam former as a preprocessor and reduces complexity by searching a reduced-state trellis constructed over the subset of signals with "dominant" energy in each beam. (The term "dominant" refers to a user signal that has significantly more energy than other signals.) The search relies on delayed-decision feedback sequence estimation (DDFSE) [16] and is efficiently done using the Viterbi algorithm [17]. SRSJD requires the users' overall channel matrix as seen at the receiver to have a "trellis-oriented" form, which is achieved by only a few array geometries such as circular arrays. (A matrix is said to be "trellis-oriented" if it has a diagonal banded structure.)
Recently, we have developed two iterative list-based parallel detection algorithms for use under overloaded conditions. These employ list feedback of the best estimates [14, 18]. One, known as parallel symbol detection with reduced complexity interference estimation (PSD-RCIE) [14], uses the linear beam former of [6] as its preprocessor. The second, known as parallel symbol detection with parallel interference cancellation (PSD-PIC) [18], uses maximum ratio combining (MRC) in the preprocessing stage. A linear spatial beam former employed by a receiver with an M-element array can cancel at most M − 1 interfering signals [19] and provides no diversity gain. On the other hand, MRC maximizes the instantaneous signal-to-noise ratio (SNR) at the combiner output [20] but fails to eliminate CCI under overload. In both cases, the residual CCI level increases with the receiver overload factor.
In the detection stage, PSD-RCIE explicitly estimates the residual CCI based on a trellis representation and is hence restricted to trellis-oriented array geometries. PSD-PIC does not have this limitation. Following MRC, it performs iterative parallel interference cancellation (PIC) coupled with joint list-based detection of the user symbols. Both algorithms use estimates of the residual CCI to cancel interference. In both instances, a list of the most likely symbols in each interval is obtained by searching over the signal symbols with "dominant" energy. This is done for each received signal and creates a list for each. These per-signal lists are combined into a global list which is fed back to obtain improved symbol estimates. After several iterations, the global list is output by the detector. The iterative approach has the advantage that symbol detection is possible even with inaccurate estimates of the residual CCI.
In this paper, we develop a unified list-based, iterative approach to MUD in overloaded receivers that includes the PSD-RCIE and PSD-PIC approaches we proposed in [14, 18] as special cases. The algorithm is here applied to receivers with either a uniform circular array (UCA) or a uniform linear array (ULA) but can easily be extended to an arbitrary geometry. Both a linear spatial prefilter and an MRC-based diversity combiner are considered as preprocessors. Performance is evaluated using Monte Carlo simulation. The results show that our MUD approach outperforms existing reduced complexity algorithms and approximates JML at lower complexity, especially under heavy overload.
In Section 2, the system model and the receiver structure are introduced. Spatial filtering and diversity combining are discussed in Section 3. Symbol detection is described in Section 4 and performance is evaluated in Section 5. Complexity is analyzed in Section 6. Conclusions are drawn in Section 7.

SYSTEM MODEL AND RECEIVER STRUCTURE
Consider a single-input multiple-output (SIMO) communication system with an M-element arbitrary receive array and D single-antenna users. The receiver load factor is f = D/M, where f > 1 under overload. The D users are assumed to transmit QAM signals which are incident on all receive antennas. For simplicity, we consider symbol-synchronous signals with no intersymbol interference present in the channel. (The extension to the symbol-nonsynchronous case is straightforward.) Figure 1 shows a model of the proposed receiver. At each antenna, the received signal is passed through a filter matched to the transmitted pulse shape and then sampled at symbol rate to give the M × 1 received signal vector

$$x = As + z,$$

where $s = [s_1, s_2, \ldots, s_D]^T$ is the D × 1 symbol vector containing the user symbols, $s_d$. Each user symbol $s_d$ is independent and uniformly drawn from an alphabet $\mathcal{A}$. The vector s is multiplied by the M × D composite array response matrix $A = [a(\theta_1), a(\theta_2), \ldots, a(\theta_D)]$, with $a(\theta_d)$ being the M × 1 array steering vector for the dth user. (In a more complex channel, the matrix A also includes the channel response.) We assume that A is computed by a channel estimator which estimates the direction of arrival for each of the D signals.
The quantity z is an M × 1 temporally uncorrelated noise vector with zero mean and autocorrelation $\Phi_{zz} = E[zz^H]$, where E[·] denotes expectation. For spatially uncorrelated noise, $\Phi_{zz} = \sigma_z^2 I$, where $\sigma_z^2$ denotes the noise variance and I is the M × M identity matrix. Throughout this paper, any time dependence in equations is dropped for convenience.

Uniform circular array
The UCA has isotropic antenna elements equispaced on a circle with radius R as shown in Figure 2. Following [21], the array steering vector for each of the D signals is denoted $a(\theta_d) = [a_1(\theta_d), \ldots, a_M(\theta_d)]^T$, with components

$$a_m(\theta_d) = \exp\!\left( j\,\frac{2\pi R}{\lambda}\,\sin(\vartheta_d)\,\cos(\theta_d - \varphi_m) \right), \quad m = 1, \ldots, M,$$

where $\theta_d$ is the estimated azimuthal angle of arrival (AOA), $\vartheta_d$ is the elevation (or depression) angle, λ is the wavelength at the carrier frequency, and $\varphi_m = 2\pi(m - 1)/M$ is the angle of the mth element in azimuth [22].
For simplicity, only azimuth is considered ($\vartheta_d = 90°$). However, the results can easily be extended to three dimensions.
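As an illustration of the azimuth-only case, the following minimal sketch builds a UCA steering vector. It assumes the common convention in which the mth component is exp(j(2πR/λ)cos(θ − φ_m)); the function name `uca_steering`, the argument `radius_wl` (R expressed in wavelengths), and the sign of the exponent are assumptions made for this example.

```python
import cmath
import math

def uca_steering(theta, M, radius_wl):
    """Azimuth-only UCA steering vector (assumed sign convention).

    theta     -- azimuth angle of arrival in radians
    M         -- number of array elements
    radius_wl -- array radius in wavelengths, R / lambda
    """
    k = 2.0 * math.pi * radius_wl
    # Element m (0-based) sits at azimuth phi = 2*pi*m/M on the circle.
    return [cmath.exp(1j * k * math.cos(theta - 2.0 * math.pi * m / M))
            for m in range(M)]

# Example: M = 5 elements, R = 0.2 wavelengths, AOA of 30 degrees.
a = uca_steering(math.radians(30.0), M=5, radius_wl=0.2)
```

Stacking one such vector per user as the columns of a matrix gives the composite array response for this geometry.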

Uniform linear array
In the ULA configuration, isotropic antenna elements are located in a straight line with equal spacing, B, between the elements, as in Figure 2 [23]. The array steering vector for each signal is again denoted $a(\theta_d)$, with components as given in [21].
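For illustration only, a ULA steering vector can be sketched as follows. The convention used here is an assumption, not taken from [21]: the AOA is measured from broadside, the first element is the phase reference, and the mth component is exp(−j2π(m − 1)(B/λ)sin θ). The names `ula_steering` and `spacing_wl` are hypothetical.

```python
import cmath
import math

def ula_steering(theta, M, spacing_wl):
    """ULA steering vector under an assumed broadside convention.

    theta      -- angle of arrival from broadside, in radians
    M          -- number of array elements
    spacing_wl -- element spacing in wavelengths, B / lambda
    """
    phase_step = 2.0 * math.pi * spacing_wl * math.sin(theta)
    # Element 0 is the phase reference; phase grows linearly along the array.
    return [cmath.exp(-1j * phase_step * m) for m in range(M)]

# Example: M = 5 elements spaced 3 wavelengths apart, AOA of 36 degrees.
a = ula_steering(math.radians(36.0), M=5, spacing_wl=3.0)
```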

PREPROCESSOR
The estimated array response matrix A and the received signal vector x, following matched filtering, are input to a preprocessor as shown in Figure 1. It exploits the spatial separation of the users to mitigate CCI effects so as to enable complexity reduction in the subsequent MUD stage. We will consider two approaches, but we first find an alternate form of the JML criterion that lends itself to suboptimal approximation.
If no intersymbol interference is present, JML leads to the symbol-by-symbol detector given by [10]

$$\hat{s} = \arg\min_{s \in \mathcal{A}^D} (x - As)^H \Phi_{zz}^{-1} (x - As), \tag{4}$$

where (·)^H denotes Hermitian transpose. The minimization requires a search over all $|\mathcal{A}|^D$ possible transmit symbol combinations. The resulting complexity mandates approximation.
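To make the cost of the exhaustive search concrete, here is a minimal sketch of a brute-force JML detector. It assumes spatially white noise, so that the metric reduces to the squared Euclidean distance between x and As; the function name `jml_detect` and the toy values are assumptions for illustration.

```python
import itertools

def jml_detect(x, A, alphabet):
    """Exhaustive JML search assuming white noise (metric ||x - A s||^2).

    x        -- received vector, length M
    A        -- M x D array response matrix as nested lists
    alphabet -- iterable of candidate symbol values
    """
    M, D = len(A), len(A[0])
    best, best_metric = None, float("inf")
    # Visits all len(alphabet)**D symbol combinations.
    for s in itertools.product(alphabet, repeat=D):
        metric = 0.0
        for m in range(M):
            r = x[m] - sum(A[m][d] * s[d] for d in range(D))
            metric += abs(r) ** 2
        if metric < best_metric:
            best, best_metric = s, metric
    return best, best_metric

# Toy overloaded case: M = 2 antennas, D = 3 BPSK users, no noise.
A = [[1.0, 0.5, 0.25j], [0.3, 1.0, 0.7]]
s_true = (1, -1, 1)
x = [sum(A[m][d] * s_true[d] for d in range(3)) for m in range(2)]
s_hat, metric = jml_detect(x, A, alphabet=(-1, 1))
```

The loop body runs once per symbol combination, which is why the search becomes impractical as the number of users or the alphabet size grows.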
The key to approximating (4) is to find a transform that maps the M × 1 received vector x into the D × 1 vector $y = [y[1], y[2], \ldots, y[D]]^T$ such that

$$y = Hs + n, \tag{5}$$

where n denotes the transformed noise. We call y the transformed receive vector and H the user channel matrix. There are two interpretations possible for the transform of (5), either spatial filtering or diversity combining. (Note that both are essentially projection operations.) In each case, the solution is a D × M complex weight matrix W such that

$$y = Wx. \tag{6}$$

Spatial filtering
A spatial filter exploits the fact that user signals incident on the antenna array with greater spread in AOA interfere with each other less than signals that are closely spaced in AOA.CCI from users reasonably widely spaced in AOA can thus be effectively reduced.This is essentially a beam forming operation.
The matrix W can be derived from the JML criterion of (4) by choosing y and H such that [6]

$$H^H H = A^H \Phi_{zz}^{-1} A, \qquad H^H y = A^H \Phi_{zz}^{-1} x. \tag{7}$$

This satisfies the mapping of (5) and yields the JML detector in the form

$$\hat{s} = \arg\min_{s \in \mathcal{A}^D} \| y - Hs \|^2. \tag{8}$$

From (7), we find $W = (H^H)^{\dagger} A^H \Phi_{zz}^{-1}$, where (·)† denotes the pseudoinverse. The matrix W is a trellis-oriented multiple-input multiple-output (MIMO) beam former since each row places a beam in the direction of only one transmitted signal [6]. It increases the number of observation samples and acts as a noise whitening interference rejection filter. The elements of y denote the received signal in each of the D beams and the dth row of H shows the energy contributions to the received signal in the dth beam.
Figure 3(a) shows the form of H for a receiver employing a spatial filter as a preprocessor.The receiver has an M = 5-element UCA front end with radius R = 0.2λ.Data is received from D = 6 equal energy users uniformly spaced in AOA.We see that most of the energy is concentrated on or near the main diagonal of H, resulting in a banded structure, where in each row only a few elements contain most of the energy.

Diversity combining
In contrast to (7), if we consider (5) from the viewpoint of diversity combining, we seek to combine the multiple replicas of the received information-bearing signal in an advantageous way. MRC is the classical and optimal [24] diversity combining technique. The combiner output is a weighted linear combination of the signal replicas. For MRC with perfect channel estimation, the optimum weight matrix in (6) is $W = A^H$ [24].
MRC tries to map the receive vector x into y such that each user has maximum SNR in one of the components of y. Defining the channel matrix H such that $H = WA = A^H A$ allows us to write the JML detector as in (8), the difference being the definitions of W and H in the two cases.
The row elements of H denote the energy contribution from the D users to the received signal in which the SNR of the corresponding user is maximized.
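The MRC preprocessing step can be sketched in a few lines. This is a minimal illustration assuming perfect knowledge of A, with the weight matrix taken as the Hermitian transpose of A and the channel matrix formed as the product of the weight matrix and A; the helper names `hermitian` and `mrc_preprocess` and the toy values are assumptions.

```python
def hermitian(A):
    """Conjugate transpose of an M x D matrix given as nested lists."""
    return [[complex(A[m][d]).conjugate() for m in range(len(A))]
            for d in range(len(A[0]))]

def mrc_preprocess(A, x):
    """Return (W, y, H) with W = A^H, y = W x, and H = W A."""
    M, D = len(A), len(A[0])
    W = hermitian(A)
    y = [sum(W[d][m] * x[m] for m in range(M)) for d in range(D)]
    H = [[sum(W[d][m] * A[m][u] for m in range(M)) for u in range(D)]
         for d in range(D)]
    return W, y, H

# Toy overloaded example: M = 2 antennas, D = 3 users (arbitrary values).
A = [[1, 1j, 1], [1, 1, -1]]
x = [1 + 1j, 0]
W, y, H = mrc_preprocess(A, x)
```

In this sketch the diagonal of H holds each user's combined energy, while the off-diagonal entries show how much of every other user leaks into that user's combiner output.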
In Figure 3(c), the form of H is illustrated for a receiver using MRC as a preprocessor. The antenna array is an M = 5-element ULA. Again D = 6 users transmit equal energy signals. The users are uniformly spaced within the array's view angle, defined as θ_max = ±60°. Hence the users' azimuth AOAs are θ_d = {±60°, ±36°, ±12°} with d = 1, 2, ..., 6. The antenna elements are spaced at distance B = 3λ apart. In contrast to Figure 3(a), the energy is not uniformly concentrated along the main diagonal of H as there are elements with "high" energy further away from the main diagonal. (At this stage, the term "high" refers to an intuitive definition of matrix elements with significant energy. The mathematical definition is given later.) Thus H does not have a banded structure and is not trellis-oriented.

Spatial filtering versus diversity combining
The beam forming spatial filter works best if relatively closely spaced antenna elements are available to form beam patterns.To ensure sufficient correlation, the element spacing should be within half a wavelength at the carrier frequency.This follows from the Nyquist sampling theorem [25].We note that a linear spatial filter cannot cancel more than D = M − 1 interfering cochannel users (see, e.g., [19]).In overloaded receivers, the advantage of beam forming tends to be lost as there will still be significant CCI.
In contrast, diversity combining requires little or no cross-correlation between the antenna elements.If a signal at one element goes through a deep fade, it is then unlikely that the other elements encounter a deep fade for the same signal at the same time.Hence combining the signals from different elements can improve receive performance as there is nearly always good reception at one of them.Antenna spacing is usually on the order of several carrier frequency wavelengths and does not satisfy the Nyquist sampling theorem.As a result, spatial aliasing and grating lobes occur [26] when the array properties are considered.This is offset by the diversity gain attained.We will see that our unified MUD algorithm works well with both types of preprocessors.

Sparsity pattern
The two examples of the channel matrix H in Figures 3(a) and 3(c) show that only a few elements in each row contain most of the signal energy. Therefore, we can derive a sparsity matrix, P, that contains unity entries for elements with "high" energy and zeros for elements with "low" energy [6]. (We describe the selection of matrix elements with "high" and "low" energy later. Here it is only an intuitive definition.) The sparsity matrix is a D × D matrix whose dth row is the sparsity pattern p[d], where each element $p_{du}$ corresponds to the element $h_{du}$ in H for d, u = 1, ..., D. Its use allows reduced complexity approximations to the JML detector of (4). The sparsity matrices for Figures 3(a) and 3(c) are shown in Figures 3(b) and 3(d), respectively. For the dth row, the enumeration set $U_e[d]$ and the complementary interference set contain the column (user) indices of the elements with "high" and "low" energy, respectively.
The quality of the sparsity matrix found depends on the criterion used to choose its elements. A so-called desired energy to interference ratio (DEIR) criterion was used in [6, 14]. In [18], the strongest energy to interference ratio (SEIR) was used.
Here we present a different approach to the construction of P that appears robust over a wider range of cochannel users than the DEIR and SEIR criteria. It is based on two empirically chosen thresholds, $T_1$ and $T_2$, which determine the complexity-performance tradeoff of subsequent MUD. Because this approach considers energy separation of the preprocessed user signals, it is limited to scenarios where sufficient separation can be achieved. It tends to perform poorly if, after preprocessing, the user signals have too similar energies, since either too few or too many signals with high energy would then be selected. This can occur under extreme overload when using a linear preprocessor. The optimum choice of $T_1$ and $T_2$ is an open research topic.
In general, the choice depends on the desired complexity/performance tradeoff, the receive antenna geometry, the type of preprocessor, the number of receive antennas M, and the number of cochannel users D.
We first compute the signal energy to average interference ratio (SEAIR) and use the empirical threshold $T_1$ to ensure sufficient separation between high-energy and low-energy signals. The SEAIR is a ratio whose numerator represents a high-energy signal and whose denominator is the average interference energy. The second threshold, $T_2$, is applied to the SSSER. For the dth row, the selection yields τ[d] and ω[d] as the sets of high- and low-energy user symbol vectors, respectively. The low-energy sets, ω[d], are referred to as interfering symbol sets, since they correspond to residual CCI which degrades the detection of the high-energy symbol sets, τ[d].
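The idea of a sparsity matrix can be illustrated with a deliberately simplified selection rule. The sketch below is not the SEAIR/SSSER procedure described above: as a stand-in, it marks an element as "high" energy when its energy is at least a fixed fraction of the largest energy in its row. The function name `sparsity_matrix` and the parameter `rel_threshold` are hypothetical.

```python
def sparsity_matrix(H, rel_threshold=0.25):
    """Build a 0/1 sparsity matrix from H using a simple stand-in rule.

    An entry is 1 if |h_du|^2 >= rel_threshold * max_u |h_du|^2 in its row.
    This is an illustrative criterion only, not the thresholds T1 and T2.
    """
    P = []
    for row in H:
        energies = [abs(h) ** 2 for h in row]
        peak = max(energies)
        P.append([1 if e >= rel_threshold * peak else 0 for e in energies])
    return P

# Toy 3 x 3 channel matrix with most energy on or near the diagonal.
H = [[1.0, 0.6, 0.1],
     [0.2, 1.0, 0.7],
     [0.6, 0.1, 1.0]]
P = sparsity_matrix(H)
```

Each row of P then identifies the users that a branch would enumerate; the remaining users in that row are treated as residual CCI.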

SYMBOL DETECTION
We now describe the proposed list-based MUD algorithm, the so-called parallel detection with interference estimation (PD-IE) algorithm. As shown in Figure 1, it operates on the preprocessor output and takes the transformed receive vector y, the channel matrix H, and the estimated sparsity matrix P as inputs. A structural block diagram is shown in Figure 4. It uses Q iterations to compute an ordered global list of symbol vectors $S = \{s^{(1)}, s^{(2)}, \ldots, s^{(L)}\}$, where $s^{(l)}$ is the lth D × 1 symbol vector in the list. (The list S is ordered from most to least likely.)
The rows of the inputs y (D × 1), H (D × D), and P (D × D) are first reordered to produce y′ (D × 1), H′ (D × D), and P′ (D × D), as indicated by the row ordering block in Figure 4. (Reordering the input quantities improves performance in subsequent detection stages.) This ordering is in terms of the SEIR [18] criterion, which is defined as

$$\mathrm{SEIR}[d] = \frac{\max_u |h_{du}|^2}{\sum_{u \notin U_e[d]} |h_{du}|^2}.$$

The numerator denotes the signal power of the strongest user in the dth row h[d] ∈ H, and the denominator is the overall power of the signals outside the enumeration set $U_e[d]$. The reordering is in order of decreasing SEIR. In Figures 3(c) and 3(d), the rows {1, 2, 3, 4, 5, 6} of y, H, and P become rows {3, 5, 1, 2, 6, 4} of y′, H′, and P′, respectively.
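The row ordering step can be sketched as follows. This is a minimal illustration assuming that the signal powers are given by the squared magnitudes of the elements of H and that the enumeration set of each row is read from the unity entries of the matching row of P; the names `seir` and `seir_row_order` are hypothetical.

```python
def seir(h_row, enum_set):
    """Strongest-element power over the power outside the enumeration set."""
    strongest = max(abs(h) ** 2 for h in h_row)
    interference = sum(abs(h) ** 2 for u, h in enumerate(h_row)
                       if u not in enum_set)
    # With no interference outside the set, treat the ratio as unbounded.
    return strongest / interference if interference > 0 else float("inf")

def seir_row_order(H, P):
    """Return row indices sorted by decreasing SEIR."""
    scores = []
    for d in range(len(H)):
        enum_set = {u for u, p in enumerate(P[d]) if p}
        scores.append(seir(H[d], enum_set))
    return sorted(range(len(H)), key=lambda d: scores[d], reverse=True)

# Toy example (0-based indices): row 1 is the cleanest, row 0 the noisiest.
H = [[1.0, 0.5, 0.5],
     [0.1, 1.0, 0.1],
     [0.2, 0.2, 1.0]]
P = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
order = seir_row_order(H, P)
```

Applying `order` to the rows of y, H, and P gives the reordered quantities used by the detection branches.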

Symbol estimation
The key to successful detection in overloaded receivers is to estimate and cancel residual CCI. We use D parallel detection branches as shown in Figure 4. Each branch corresponds to one user and performs CCI cancellation and symbol estimation. Figure 5 shows two implementations. In Figure 5(a), residual CCI is estimated explicitly using the trellis implementation as we proposed in [14]. In contrast, Figure 5(b) illustrates joint detection as we described in [18]. (We use the term "joint detection" because the user symbols and the residual CCI are jointly estimated using PIC techniques.) Both implementations include identical high-energy symbol estimators and take y′, H′, P′, and the tentative global list S as inputs. In addition, y, H, and P are needed for estimation of the residual CCI in Figure 5(a).
Each of the D symbol estimators outputs a branch list $S_{br}[d]$ whose kth element combines the estimates $\omega^{(k)}[d]$ and $\tau^{(k)}[d]$ of the low- and high-energy symbol sets. We search over all high-energy symbol sets τ[d] and compute the Euclidean error metric

$$e^{(i,j)}[d] = \left| y'[d] - \hat{y}^{(i,j)}[d] \right|^2, \tag{16}$$

where y′[d] is the dth component of y′ and $\hat{y}^{(i,j)}[d]$ is the (i, j)th "candidate component" used as an approximation of y′[d]. Values for $\hat{y}^{(i,j)}[d]$ are computed as the sum of an "enumeration component" $\hat{y}_e^{(j)}[d]$ and an "interference component" $\hat{y}_{if}^{(i)}[d]$,

$$\hat{y}^{(i,j)}[d] = \sum_{u \in U_e[d]} h'_{du} s_u + \sum_{u \notin U_e[d]} h'_{du} \hat{s}_u^{(i)},$$

where $h'_{du}$ is an element of h′[d] ∈ H′. The values $s_u$ in the enumeration component are drawn from the jth high-energy symbol set $\tau^{(j)}[d]$ with $j = 1, 2, \ldots, |\mathcal{A}|^{|\tau[d]|}$. The values $\hat{s}_u^{(i)}$ in the interference component are estimates of the residual CCI, drawn from the ith list element $\omega^{(i)}[d]$. We then find the vectors $s_{br}^{(k)}[d]$ by choosing symbol values from the (i, j) symbol combination with the kth smallest error metric, where $\min^{(k)}$ denotes the kth smallest value.
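The per-branch search can be sketched as follows. This is a simplified illustration under stated assumptions: the residual CCI estimates are supplied as full-length symbol vectors, only the enumerated (high-energy) positions are searched over the alphabet, and the K candidates with the smallest squared error are kept. The names `branch_candidates`, `enum_idx`, and `cci_list` are hypothetical.

```python
import itertools

def branch_candidates(y_d, h_row, enum_idx, cci_list, alphabet, K):
    """Return the K best (metric, symbol_vector) pairs for one branch.

    y_d      -- dth component of the (reordered) transformed receive vector
    h_row    -- dth row of the (reordered) channel matrix
    enum_idx -- indices of the high-energy users that are enumerated
    cci_list -- symbol vectors whose non-enumerated entries estimate the CCI
    """
    D = len(h_row)
    scored = []
    for est in cci_list:
        # Interference component from the current CCI estimate.
        y_if = sum(h_row[u] * est[u] for u in range(D) if u not in enum_idx)
        for combo in itertools.product(alphabet, repeat=len(enum_idx)):
            # Enumeration component from the trial high-energy symbols.
            y_e = sum(h_row[u] * s for u, s in zip(enum_idx, combo))
            metric = abs(y_d - (y_e + y_if)) ** 2
            vec = list(est)
            for u, s in zip(enum_idx, combo):
                vec[u] = s
            scored.append((metric, tuple(vec)))
    scored.sort(key=lambda t: t[0])
    return scored[:K]

# Toy branch: users 0 and 1 are enumerated, user 2 is residual CCI.
h_row = [1.0, 0.5, 0.1]
y_d = 1.0 * 1 + 0.5 * (-1) + 0.1 * 1
best = branch_candidates(y_d, h_row, [0, 1], [(1, 1, 1)], (-1, 1), K=1)
```

Only len(alphabet) raised to the number of enumerated users trial combinations are evaluated per CCI estimate, rather than all combinations over the D users.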
To illustrate estimation of the residual CCI, we consider two examples, one for explicit CCI estimation and the other using joint detection.

Symbol estimation with explicit CCI estimation
Consider a UCA with a banded sparsity matrix P as illustrated in Figure 3(b). The dth CCI estimator in Figure 5(a) has the inputs y, H, P, p′[d] ∈ P′, and the global tentative symbol list S. It uses the iterative tail-biting delayed decision feedback sequence estimation (ITB-DDFSE) algorithm of [6] to compute estimates of the residual CCI. It constructs a spatial trellis from P and employs the Viterbi algorithm to find the minimum cost path through it.
In order to minimize computational complexity, we first create the list $S_{in}[d]$ from S in each receiver branch using the sparsity pattern p′[d]. Its size is $K_d$, with $1 \le K_d \le L$. Its elements contain the nonredundant high-energy symbol sets together with the best initial estimates of the residual CCI. Hence, in the kth symbol vector of the dth list, $\tau_{in}^{(k)}[d]$ is a high-energy symbol set that is nonredundant in $S_{in}[d]$ and the low-energy symbol set $\omega_{in}^{(k)}[d]$ is the best initial estimate of the residual CCI chosen from S. (The best initial estimate $\omega_{in}^{(k)}[d]$ can easily be found from S because the elements in S are ordered from most to least likely.) The list $S_{in}[d]$ is input to the dth CCI estimator in Figure 5(a). The extension of the trellis example to other signal types is straightforward.
The states at the cth stage of the trellis are defined as in [14]. Note that for the chosen example, τ[c = 1] = {s_6 s_1 s_2} are the high-energy symbols. They are represented by fixed states in the trellis and initialized with the kth value $\tau_{in}^{(k)}[d]$. The corresponding low-energy symbol sets $\omega_{in}^{(k)}[d]$ are used as initial estimates of the residual CCI and are stored in the partial state estimate ν[c]. The sequence of overall i → j transitions through the trellis is illustrated in Figure 6. The resulting estimates of the residual CCI are stored as the list W[d], which is output by the dth CCI estimator, as shown in Figure 5(a).

Symbol estimation with joint detection
We next consider a ULA with a nonbanded sparsity matrix P as shown in Figure 3(d). In this case the symbol estimator of Figure 5(b) is used, with the branch lists $S_{br}[d]$ obtained according to (16) and (17). Each list $S_{br}[d]$ serves as input to the (d + 1)th high-energy symbol estimator in the ($q_{pic}$ + 1)th iteration. For $q_{pic}$ = 1, the tentative global list S is chosen as the input. From the input list to the dth symbol estimator, the list of estimates of the residual CCI, W[d], is obtained using the sparsity pattern p′[d] ∈ P′. After the $Q_{pic}$th iteration, the branch lists $S_{br}[d]$ are output by the symbol estimators. We have found that $Q_{pic}$ = 2 to 5 works well.

List combining
The D branch lists S br [d] are output by the symbol estimators and input to a list combiner (cf. Figure 4).The symbols in each branch vector s br [d] ∈ S br [d] contain estimates of both the low-and high-energy symbol sets ω[d] and τ [d].Here instead of an exhaustive search over all symbol combinations as in (8), only the high-energy symbol sets τ[d] are searched using the error metric of (16).Because of the estimation process, the JML vector s satisfying (8) may not be included in the D branch lists S br [d].By searching and combining the branch lists, we can find improved estimates with high probability of including the desired symbol vector s.In [14], we proposed a list combining algorithm that finds the L-member tentative ordered global list S of most likely symbol estimate vectors s (l) ∈ S, l = 1, 2, . . ., L. We briefly summarize the algorithm here.
The list combiner in Figure 4 takes as inputs y, H, P , and the D branch lists S br [d].For the qth global iteration, the tentative global list S and the corresponding list of error metrics E = {e (1) , e (2) , . . ., e (L) } are stored and S is fed back to the D detector branches.If q = Q (Q is arbitrarily set), S is output by the detector as an estimate of the ordered list of most likely symbol vectors.Typically, only Q = 2 or 3 iterations are necessary.A decision device then selects the first element s (1) ∈ S as the best estimate.Alternatively, S can be used to provide soft information to subsequent receiver stages such as error control decoders.List combining is done in two stages: initial update and iterative search over the estimates of the high-energy symbol sets τ [d].In the initial update, the stored lists S and E are updated with the symbol vectors and error metrics obtained in the current iteration.The iterative search combines the estimates of the high-energy symbol sets τ [d] with the symbols stored in S.This typically requires Q lc = 2 or 3 iterations.The algorithm uses dynamic programming principles and is summarized in Algorithm 1.
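The effect of list combining can be illustrated with a much simpler stand-in than Algorithm 1. The sketch below merges the stored global list with all branch lists, removes duplicates, scores every candidate with the full squared Euclidean metric, and keeps the L best; it omits the iterative search over high-energy symbol sets. The function name `combine_lists` and the toy values are assumptions.

```python
def combine_lists(y, H, stored, branch_lists, L):
    """Merge candidate symbol vectors and keep the L with smallest metric.

    y, H         -- transformed receive vector and channel matrix
    stored       -- current global list (tuples of symbols)
    branch_lists -- one list of candidate tuples per detection branch
    """
    candidates = set(stored)
    for lst in branch_lists:
        candidates.update(lst)

    def metric(s):
        # Full Euclidean error metric ||y - H s||^2 over all D components.
        return sum(abs(y[d] - sum(H[d][u] * s[u] for u in range(len(s)))) ** 2
                   for d in range(len(y)))

    ranked = sorted(candidates, key=metric)[:L]
    return ranked, [metric(s) for s in ranked]

# Toy example with an identity channel and two branches.
H = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, -1.0]
S, E = combine_lists(y, H, [(1, 1)], [[(1, -1), (-1, -1)], [(1, 1)]], L=2)
```

The returned list is ordered from smallest to largest metric, matching the convention that the first element is the most likely symbol vector.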

PERFORMANCE EVALUATION
Analytical performance bounds for PD-IE are difficult to obtain due to the iterative and list reduction processes. Hence, we use Monte Carlo simulation to compare performance to other MUD algorithms under overload. We assume D single-antenna users transmitting equal power symbol-synchronous QPSK (4-QAM) signals. The signals are incident on a receiver with an M-element UCA or ULA where D > M. For simplicity, we assume the same phase reference is used for all signals. The SNR at each receive antenna is defined as the ratio of signal to noise variances, $\mathrm{SNR} = 10 \log_{10}(\sigma_s^2/\sigma_z^2)$, where $\sigma_s^2$ is the average received power per signal. Simulations are stopped after one user experiences 50 errors.
Algorithm 1: List combining (excerpt).
2. Corresponding to $S_{br}$, define the list of error metrics $E_{br} = \{e_{br}^{(1)}, e_{br}^{(2)}, \ldots, e_{br}^{(K)}\}$. Compute each $e_{br}^{(k)} \in E_{br}$ as $e_{br}^{(k)} = \| y - H s_{br}^{(k)} \|^2$.
...
Define the lists $S_{cand}$ and $E_{cand} = \{e_{cand}^{(1)}, e_{cand}^{(2)}, \ldots, e_{cand}^{(L)}\}$. These store D × 1 candidate symbol vectors and corresponding error metrics.
7. For each iteration $q_{lc} = 1, 2, \ldots, Q_{lc}$ and all $j = 1, 2, \ldots, J_d$ elements of T[d]:
...
(iii) Update the list of minimum error metrics $E_{min}$, where $e^{(i)} \in E$ is the ith element in E. Update the corresponding list $S_{min}$ by choosing the l = 1, 2, ..., L symbol vectors from $S_{cand}$ and S with minimum error metric $e_{min}^{(l)}$.
(iv) Set $S = S_{min}$ and $E = E_{min}$.

UCA
Figure 7 shows the relative performance of the PD-IE, SRSJD, and JML algorithms at SNR = 10 dB. The receiver employs an M = 5-element UCA front end with radius R = 0.2λ. We use the linear beam former of (7) as a spatial filter in the preprocessing stage of the detector. The SEAIR and SSSER thresholds for derivation of the sparsity matrix P are empirically set to $T_1$ = 2 and $T_2$ = 0.1, respectively, for up to 100% overload (D ≤ 10). For higher overload factors (D > 10), we set $T_1$ = 2 and $T_2$ = 0.5. These thresholds determine which elements in each row of the channel matrix H are treated as high-energy. The matrix P is used for both the PD-IE and SRSJD algorithms. SRSJD performs two iterations around the tail-biting trellis as suggested in [6]. Simulations run with more iterations achieved only marginal performance improvements for the increase in SRSJD complexity. The choices of the PD-IE parameters are shown in Table 1. In order to compare the two PD-IE symbol estimators using either explicit CCI estimation (Figure 5(a)) or joint detection (Figure 5(b)), we set $Q_{itb}$ = 2 and adjust the iteration parameter $Q_{pic}$ so that both approaches have similar complexity. Complexity values are presented in Table 1 as the number of real squaring operations per output symbol vector.
From Figure 7 it can be seen that the symbol error rate (SER) essentially increases with the number of users D. This is due to residual CCI in the filtered received signal, which increases with the overload factor of the receiver. The somewhat better performance for odd numbers of users, for example, D = 7, 9, is an artifact of the UCA geometry, as in these cases there are no user signals received from opposite AOAs. Note that the AOA dependence of the UCA is not observed if the SER performance is dominated by the residual CCI. This occurs under heavier overload (e.g., D = 11 users as shown in Figure 7).
JML is the optimum detector and achieves the lowest SER.SRSJD approximates JML up to D = 8 users but fails for D > 8. PD-IE outperforms SRSJD at the cost of higher complexity and achieves near JML performance when using a global list S of size L = 2D.For L = D, performance is impaired due to the increased probability of the transmitted symbols not being in the list S. At a similar complexity, symbol estimation with explicit CCI estimation slightly outperforms joint detection in PD-IE for L = 2D, but performance is worse for L = D.This arises because the trellis-based CCI estimation process can outperform the PIC technique if the correct high-energy symbols are already contained in the global list S. In contrast, joint detection is able to better estimate the CCI for smaller list sizes L because it jointly estimates both the CCI and the high-energy symbols.
Figure 8 illustrates SER versus SNR performance curves for PD-IE using the same receiver setup as in Figure 7 with D = 12 users, employing symbol estimators with either explicit CCI estimation or joint detection. The iteration parameters are set to give comparable complexity for PD-IE with explicit CCI estimation and PD-IE with joint detection, as shown in Table 1. The SER in Figure 8 decreases with increasing SNR until it reaches an error floor. Its minimum value is dominated by the probability of the correct symbol values not being included in the branch lists $S_{br}[d]$, which explains the higher error floor for the smaller list size L = D in contrast to L = 2D. Increasing the list size L reduces the error floor because more symbol combinations are considered as candidates. This of course increases PD-IE complexity. At low SNR (SNR < 10 dB), the performance results are similar for both PD-IE symbol estimator implementations, whereas at higher SNR (SNR ≥ 15 dB), joint detection clearly outperforms explicit CCI estimation in PD-IE. This can be explained by the different symbol estimation processes considered. Since PD-IE with explicit CCI estimation relies on correct estimates of the residual CCI, its SER performance is sensitive to CCI estimation errors. These are more likely to occur if the global list S contains only erroneous symbols and the list size L is small. The explicit CCI estimation process then has too few degrees of freedom to estimate all the CCI accurately, and significant residual CCI remains. In contrast, PD-IE with joint detection reestimates both the residual CCI and the high-energy symbol values during the iterative PIC process. It has more degrees of freedom and thus a higher probability of finding the correct symbol estimates even if the list S is small or initially contains only erroneous estimates. Increasing the size of S from L = D to 2D reduces the superiority of joint detection due to better explicit CCI estimation in PD-IE. This is observed in Figure 8.
Results are shown for D = 9 and 12 users (50% and 100% overload).All other parameters remain unchanged.It can be seen that increasing the number of iterations, Q pic , significantly improves detection performance for D = 12 users.In contrast, performance improvements are much smaller for D = 9 users as Q pic increases.This is expected because increasing Q pic yields more accurate estimation of the residual CCI which is more critical at higher levels of overload.Furthermore, it is evident that more iterations (increased Q pic ) yield a lower error floor as the SNR increases.

ULA

COMPLEXITY
We now consider the computational complexity of PD-IE.
As a measure of this we use the number of real squaring operations in the calculation of the Euclidean error metrics, as this is usually the most hardware-intensive operation [6,10,14,17,18]. The complexity of PD-IE depends on many parameters. Among these are the number of users D, the alphabet size |A|, the number of high-energy symbols $|\tau[d]|$, the number of iterations $Q_{\mathrm{itb}}$ or $Q_{\mathrm{pic}}$, $Q_{\mathrm{lc}}$, and Q, and the sizes of the lists $S_{\mathrm{br}}[d]$ and S.
The overall complexity of PD-IE, C, can be expressed as the sum of the complexities of the symbol estimator and the list combiner, namely, $C_{\mathrm{se}}$ and $C_{\mathrm{lc}}$, respectively. From the block diagram in Figure 4, we find […] where $C_{\mathrm{se}}$ = […] where […] is the complexity of each CCI estimator, with $K_d$ being the size of the input list $S_{\mathrm{in}}[d]$ and $T[c]$ denoting the number of transitions at the cth trellis stage defined in (21). For joint detection as in Figure 5 […] The complexity of the list combining algorithm (Algorithm 1) is given by […] where $J_d$, K, and $L_d$ are the sizes of the lists $T[d]$, $S_{\mathrm{br}}$, and $S_{\mathrm{cand}}$, respectively. Note that K and $J_d$ may vary in each of the Q global iterations, whereas $L_d$ may change in each of the $Q_{\mathrm{lc}}$ list combining iterations.
In Table 2, the complexity of the JML [10], SRSJD [6], and PD-IE algorithms is compared for receivers with an M = 8-element UCA. The array radius is R = λ/4 and the linear beam former of (7) is used as a preprocessor. JML requires $2M|A|^{D}$ real squarings, while SRSJD needs only $2Q_{\mathrm{itb}} D |A|^{\mu[c]+1}$ real squarings [6]. Complexity values for PD-IE are shown for $|\tau[d]| = 3$ high-energy symbols, obtained by adjusting the SEAIR and SSSER thresholds. The global list S has size L = 2D. We use […] for PD-IE with explicit CCI estimation and $Q_{\mathrm{pic}} = 3$, $Q_{\mathrm{lc}} = Q = 2$ iterations for PD-IE with joint detection. Both list size and iteration parameters were chosen empirically to achieve good detection performance at low complexity. In general, these parameters provide a complexity-performance tradeoff, and their values may thus be chosen according to practical restrictions and requirements. The results of Table 2 clearly show that JML has extremely high complexity, increasing exponentially with the number of users. SRSJD achieves the lowest complexity. Its complexity increases linearly within the subsets of users where $\mu[d]$ is constant and depends exponentially on the number of subsets. PD-IE provides complexity savings of several orders of magnitude over JML but has higher complexity than SRSJD. This is the price to pay for the better performance of PD-IE (cf. Figure 7). The comparison of symbol estimation with explicit CCI estimation and joint detection in PD-IE indicates that joint detection of user symbols and residual CCI has complexity advantages over explicit CCI estimation. This is expected because explicit CCI estimation requires an additional trellis stage for each additional user, whereas for joint detection, the complexity of each symbol estimator remains constant. This can be seen in Table 2 from the increasing complexity ratio $C_{\mathrm{se}}/C_{\mathrm{lc}}$ for explicit CCI estimation and the decreasing values for joint detection. Similar results are found when a ULA is used.
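The two closed-form squaring counts quoted above can be compared numerically. The following is a minimal sketch, assuming the JML count reads $2M|A|^{D}$ and the SRSJD count reads $2Q_{\mathrm{itb}} D |A|^{\mu+1}$ with a single constant value of μ for all trellis stages. The function names and the example values of $Q_{\mathrm{itb}}$ and μ are illustrative and not taken from the paper.

```python
def jml_squarings(num_antennas: int, alphabet_size: int, num_users: int) -> int:
    """Real squarings for an exhaustive JML search: 2 * M * |A|**D."""
    return 2 * num_antennas * alphabet_size ** num_users


def srsjd_squarings(q_itb: int, num_users: int, alphabet_size: int, mu: int) -> int:
    """Real squarings for SRSJD: 2 * Q_itb * D * |A|**(mu + 1).

    Assumption: mu is the same at every trellis stage, so the count is a
    single product rather than a stage-dependent expression.
    """
    return 2 * q_itb * num_users * alphabet_size ** (mu + 1)


def growth_table(num_antennas, alphabet_size, user_counts, q_itb, mu):
    """Rows of (D, JML count, SRSJD count) for the given numbers of users."""
    return [
        (d,
         jml_squarings(num_antennas, alphabet_size, d),
         srsjd_squarings(q_itb, d, alphabet_size, mu))
        for d in user_counts
    ]
```

Calling `growth_table(8, 2, [8, 12, 16], q_itb=2, mu=2)` for BPSK shows the JML count multiplying by $|A|$ with every added user, while the SRSJD count grows only linearly in D for fixed μ.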

CONCLUSION
In this paper, a unified algorithmic structure for the separation and detection of multiple cochannel signals in an overloaded SIMO environment is proposed. The detection algorithm is applied to receivers with either a UCA or a ULA.
A linear preprocessor employing either spatial beam forming or diversity combining is used to reduce the amount of CCI in the received signals. Due to the overloaded environment and the linear preprocessing, residual CCI is still present. The detection of the user symbols is done using the proposed PD-IE algorithm. It estimates the residual CCI and performs nonlinear iterative list detection of the user symbols. Performance is evaluated using Monte Carlo simulation. PD-IE is shown to approximate the optimum JML detector with significantly lower complexity and to outperform existing low-complexity algorithms. Comparison to the SRSJD algorithm shows that PD-IE yields better performance at the cost of some increase in complexity. Unlike JML, whose complexity is exponential in the number of users, PD-IE has a much lower rate of complexity increase. Complexity savings become more significant when the number of receive antennas is large. PD-IE simulation results suggest that joint detection and CCI estimation has advantages over explicit CCI estimation. It achieves a better performance-complexity tradeoff, yields simpler implementation, and most importantly, it can be used with arbitrary receive array


Figure 3: (a) Spectral square root $(H^{H}H)^{1/2}$ of H and (b) sparsity matrix P for a 5-element UCA. The users are uniformly spaced in AOA. (c) Spectral square root $(H^{H}H)^{1/2}$ of H and (d) sparsity matrix P for a 5-element ULA. The user AOAs are uniform within $\theta_{\max} = \pm 60^{\circ}$. There are D = 6 equal energy users. Elements with "1" in P are obtained by using the SEAIR and SSSER criteria with thresholds $T_1 = 2$ and $T_2 = 0.1$, respectively.
(a), and $\tau[1] = \{s_1, s_3\}$ and $\omega[1] = \{s_2, s_4, s_5, s_6\}$ in Figure 3(c). Similar results are obtained for all other values of d. Note that different numbers of users and receive antennas, D and M, as well as different antenna array geometries and element spacing, may change the empirically determined thresholds $T_1$ and $T_2$. However, once $T_1$ and $T_2$ have been set for a given M, the algorithm appears robust over a wide range of D.
br [d], . . ., $s^{(L)}_{\mathrm{br}}[d]$} of (D × 1) symbol vectors $s^{(k)}_{\mathrm{br}}[d]$, where k = 1, 2, . . ., L. Each vector $s^{(k)}_{\mathrm{br}}[d]$ contains estimates of the high- and low-energy symbol sets $\tau[d]$ and $\omega[d]$, respectively, and can be decomposed into […], where $\omega^{(k)}[d]$ and $\tau^{(k)}[d]$ are the estimated low- and high-energy user symbol sets in the dth detection branch. (The symbol sets $\tau[d]$ and $\omega[d]$ for each branch list $S_{\mathrm{br}}[d]$ are derived from $p[d] \in P$.) We consider the low-energy sets $\omega^{(k)}[d]$ as residual CCI and obtain them by an interference estimation process. The high-energy sets $\tau^{(k)}[d]$ are found by an exhaustive search over all possible $|A|^{|\tau[d]|}$ symbol combinations $\tau[d]$, where $|\tau[d]| = |U_e[d]|$ is the number of signals in the dth enumeration set $U_e[d]$. This is done by the high-energy symbol estimators shown in Figure 5. Each such estimator takes the list $W[d] = \{\tilde{\omega}^{(1)}[d], \dots, \tilde{\omega}^{(I_d)}[d]\}$ and the quantities $y[d] \in y$, $h[d] \in H$, and $p[d] \in P$ as inputs. The list $W[d]$ contains estimates of the residual CCI, with the tilde notation $\tilde{(\cdot)}$ denoting nonredundant list elements. (Storing only the nonredundant elements $\tilde{\omega}^{(i)}[d] \in W$, i = 1, 2, . . ., $I_d$, ensures that the complexity of high-energy symbol estimation is minimal.) The list size is $I_d$ with $1 \le I_d \le L$.
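The exhaustive search over the high-energy symbols described above can be sketched as follows. This is an illustrative sketch only: it assumes a scalar branch model in which the dth observation equals the dth channel row applied to the symbol vector plus noise, BPSK symbols by default, and residual CCI estimates supplied as dictionaries. The function name `branch_list` and its argument layout are hypothetical and not the paper's interface.

```python
from itertools import product


def branch_list(y_d, h_row, high_idx, cci_candidates, list_size,
                alphabet=(-1.0, 1.0)):
    """Return up to list_size (metric, symbols) pairs, smallest metric first.

    y_d            : observation in branch d (assumed scalar model)
    h_row          : dth channel row, one gain per user
    high_idx       : user indices of the high-energy symbols (the set tau[d])
    cci_candidates : nonredundant residual CCI estimates, each a dict
                     {user index: symbol} covering the low-energy users
    """
    scored = []
    for cci in cci_candidates:
        # Contribution of the low-energy symbols, treated as residual CCI.
        interference = sum(h_row[u] * s for u, s in cci.items())
        # Exhaustive search over all |A|**|tau[d]| high-energy combinations.
        for combo in product(alphabet, repeat=len(high_idx)):
            signal = sum(h_row[u] * s for u, s in zip(high_idx, combo))
            metric = abs(y_d - signal - interference) ** 2
            symbols = dict(cci)
            symbols.update(zip(high_idx, combo))
            scored.append((metric, symbols))
    scored.sort(key=lambda pair: pair[0])
    return scored[:list_size]
```

Because the loop runs once per nonredundant CCI estimate, the number of metric evaluations is $I_d \cdot |A|^{|\tau[d]|}$, which is why storing only nonredundant list elements keeps the search small.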
(a). It operates on a spatial trellis having D stages indexed by c = 1, 2, . . ., D. It starts and ends in a fixed state. Note that both fixed states contain the high-energy symbol set $\tau^{(k)}_{\mathrm{in}}[d]$ and are equivalent due to the tail-biting trellis structure. The trellis is applied to each of the $K_d$ symbol vectors $s^{(k)}_{\mathrm{in}}[d] \in S_{\mathrm{in}}[d]$. Figure 6 depicts an example trellis for the CCI estimator of Figure 5(a) for the M = 5 antenna, D = 6 user environment of Figures 3(a) and 3(b) using BPSK signaling.

Figure 4: Block diagram of the parallel detector with interference estimation (PD-IE).

Figure 5: The dth symbol estimator in the PD-IE in Figure 4 using (a) explicit CCI estimation and (b) joint detection.

Figure 6: ITB-DDFSE trellis for explicit CCI estimation in symbol estimator #1 in Figure 5(a). The trellis is shown for the UCA example in Figures 3(a) and 3(b) using BPSK signals.
(b) is needed. It uses an iterative PIC approach to jointly find estimates of the low- and high-energy symbol sets $\omega[d]$ and $\tau[d]$. The required inputs to the dth symbol estimator are the tentative global list S and the dth row components of y, H, and P. The symbol estimators compute D tentative branch lists $S_{\mathrm{br}}[d]$ by searching over the high-energy symbols $\tau[d]$ using (

Initial Update
1. Define a list of D × 1 branch symbol vectors, $S_{\mathrm{br}}$. Initialize the elements $s^{(k)}_{\mathrm{br}} \in S_{\mathrm{br}}$ with the nonredundant symbol vectors from the D branch lists $S_{\mathrm{br}}[d]$. Note that k = 1, 2, . . ., K and 1 ≤ K ≤ LD.
3. Define the list of L tentative minimum error metrics, $E_{\min}$, and the corresponding list of D × 1 symbol vectors, $S_{\min}$. Obtain the elements $e^{(l)}_{\min} \in E_{\min}$ by searching $e_{\mathrm{br}}$, $e^{(i)}$, l = 1, 2, . . ., L, where $e^{(i)}$ is the ith element in E, obtained in the (q − 1)th iteration. For q = 1, choose E = {∞}. Find the elements $s^{(l)}_{\min} \in S_{\min}$ by choosing symbol values from the corresponding lists $S_{\mathrm{br}}$ and S.
4. Set S = $S_{\min}$ and E = $E_{\min}$.
Iterative Search
5. Define the d = 1, 2, . . ., D lists $T[d]$. Find the elements $\tau^{(j)}[d] \in T[d]$ by using $p[d] \in P$ to select the nonredundant high-energy symbol sets from $S_{\mathrm{br}}[d]$. Note that j = 1, 2, . . ., $J_d$ and $J_d \le L$.
6. Define the lists $S_{\mathrm{cand}}$ = {s […] $\in T[d]$ of the d = 1, 2, . . ., D lists, $T[d]$,
   (i) Use $p[d] \in P$ to find the estimates of the low-energy symbol sets $\omega[d]$ in the list S and copy the nonredundant sets into $S_{\mathrm{cand}}$. The resulting list $S_{\mathrm{cand}}$ has size $L_d$ with $1 \le L_d \le L$.
   (ii) For each element $s^{(k)}_{\mathrm{cand}} \in S_{\mathrm{cand}}$, k = 1, 2, . . ., $L_d$, do
      (a) Copy the high-energy symbol set estimate τ […]
      (b) Compute the error metric, $e^{(k)}_{\mathrm{cand}}$ = ‖y − H s […]
      (c) Update the tentative list $E_{\min}$ by finding the l smallest metrics, […] $e_{\mathrm{cand}}$, $e^{(i)}$, l = 1, 2, . . ., L,
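The core operation of the list combiner, ranking candidate symbol vectors by their Euclidean error metric and keeping the L best nonredundant ones, can be illustrated with a short sketch. It is a simplified single pass under stated assumptions, not a reproduction of Algorithm 1: the iterative substitution of high-energy symbol sets is omitted, and the names `error_metric` and `combine_lists` are hypothetical.

```python
import heapq


def error_metric(y, H, s):
    """Squared Euclidean error ||y - H s||^2 for plain Python sequences."""
    return sum(
        abs(y_r - sum(h * x for h, x in zip(row, s))) ** 2
        for y_r, row in zip(y, H)
    )


def combine_lists(y, H, candidates, list_size):
    """Keep the list_size distinct candidates with the smallest error metrics.

    candidates : iterable of symbol vectors, e.g. gathered from all branch
                 lists and from the previous global list (assumed input)
    Returns a list of (metric, symbol vector) pairs, smallest metric first.
    """
    unique = {tuple(s) for s in candidates}  # drop redundant vectors
    scored = [(error_metric(y, H, s), s) for s in unique]
    return heapq.nsmallest(list_size, scored, key=lambda pair: pair[0])
```

Scoring each distinct vector only once mirrors the emphasis on nonredundant list elements in the text, since every metric evaluation costs real squarings.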

Figure 7: SER of the worst user versus number of cochannel signals at SNR = 10 dB for a 5-element UCA using JML, SRSJD, and PD-IE algorithms. Iteration parameters for PD-IE are shown in Table 1.

Figure 8: SER of the worst user versus SNR for PD-IE with list sizes L = D and 2D using a 5-element UCA with D = 10 and 12 users. The iteration parameters are set to give comparable complexity for PD-IE with explicit CCI estimation and PD-IE with joint detection as shown in Table 1.

Figure 9 depicts SER versus SNR curves for a receiver with an M = 6-element ULA with element spacing B = 3λ. The users are randomly allocated to D equal size sectors within the array's view angle of $\theta_{\max} = \pm 60^{\circ}$. (For nonfading memoryless channels, the ULA is highly selective in AOA. We therefore use random user spacing into equal size sectors to obtain comparable results for different numbers of users.) The transmitted signals are incident with random phase on the antenna array. We set the SEAIR and SSSER thresholds to $T_1 = 2$ and $T_2 = 0.1$, respectively. The detection algorithm is PD-IE with joint detection of user symbols and residual CCI. The iterative PIC process uses either $Q_{\mathrm{pic}} = 1$ or $Q_{\mathrm{pic}} = 5$ iterations. The global list S has size L = 2D. Results are shown for D = 9 and 12 users (50% and 100% overload). All other parameters remain unchanged. It can be seen that increasing the number of iterations, $Q_{\mathrm{pic}}$, significantly improves detection performance for D = 12 users. In contrast, performance improvements are much smaller for D = 9 users as $Q_{\mathrm{pic}}$ increases. This is expected because increasing $Q_{\mathrm{pic}}$ yields more accurate estimation of the residual CCI, which is more critical at higher levels of overload. Furthermore, it is evident that more iterations (increased $Q_{\mathrm{pic}}$) yield a lower error floor as the SNR increases. Better CCI estimation comes at the cost of increased complexity.
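The user placement described above, one user per equal-size sector within ±60° and a random carrier phase per signal, can be sketched as follows. The sketch assumes the standard narrowband plane-wave model for a ULA with the AOA measured from broadside. The function names and the seeded generator are illustrative, not part of the paper's simulation setup.

```python
import cmath
import math
import random


def sector_aoas(num_users, theta_max_deg=60.0, rng=random):
    """Draw one AOA (degrees) uniformly inside each of D equal-size sectors."""
    width = 2.0 * theta_max_deg / num_users
    return [-theta_max_deg + (d + rng.random()) * width
            for d in range(num_users)]


def ula_response(num_elements, spacing_wavelengths, theta_deg, phase=0.0):
    """Array response of a ULA to a plane wave arriving from theta_deg.

    Assumption: narrowband model, AOA measured from broadside, so the
    phase step between adjacent elements is 2*pi*(B/lambda)*sin(theta).
    """
    step = 2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * (phase + m * step)) for m in range(num_elements)]


def random_channel(num_elements, spacing_wavelengths, num_users, rng=random):
    """One column per user: sector AOA plus a uniformly random carrier phase."""
    return [
        ula_response(num_elements, spacing_wavelengths, theta,
                     phase=2.0 * math.pi * rng.random())
        for theta in sector_aoas(num_users, rng=rng)
    ]
```

For the setup in the text one would call `random_channel(6, 3.0, 12, rng=random.Random(1))`. With the large spacing B = 3λ, small changes in AOA rotate the element phases quickly, which is consistent with the remark that the ULA is highly selective in AOA.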

Figure 9: SER of the worst user versus SNR for PD-IE using an M = 6-element ULA with element spacing B = 3λ. There are D = 9 and 12 cochannel users. The size of the global list S is L = 2D.

criteria are equivalent if, in the dth row $h[d] \in H$, the diagonal element $h_{dd}$ has the most signal power, $|h_{dd}|^2 = \max_{1 \le u \le D} |h_{du}|^2$.) Both use a threshold which, if chosen poorly, erroneously treats signals with low energy as high-energy signals, and results in higher detection complexity than necessary for a given level of performance. A poor choice can also lead to considering strong signals as low-energy signals, which results in lower complexity at the cost of poorer overall performance.

Table 1: Parameters and complexity for PD-IE simulations in Figures 7 and 8 using an M = 5-element UCA.

Table 2: Comparison of computational complexity for a receiver with an M = 8-element UCA.
