Blind Separation of Two Users Based on User Delays and Optimal Pulse-Shape Design



Introduction
We consider the problem of multiuser separation in wireless networks via approaches that do not use scheduling. This problem is of interest, for example, when traffic is generated in a bursty fashion, in which case fixed bandwidth allocation would result in poor bandwidth utilization. Lack of scheduling results in collisions, that is, users overlapping in time and/or frequency. To separate the colliding users, one could enable multiuser separation via receive antenna diversity, or via code diversity, as in code-division multiple-access (CDMA) systems. However, the former requires expensive hardware, since multiple transceiver front ends involve significant cost. Further, the use of multiple antennas might not be possible on small terminals or devices. CDMA systems require bandwidth expansion, which consumes more spectral resources and also introduces frequency-selective fading. In the following, we narrow our field of interest to random-access systems that, for the aforementioned reasons, cannot exploit antenna diversity and that are inexpensive in terms of bandwidth. In such systems, the use of different power levels by the users can enable user separation by exploiting the capture effect [1] or successive interference cancellation (SIC) [2]. Different power levels can result from different distances between the users and the destination, or could be intentionally assigned to users in order to facilitate user separation. While the former case, when it arises, makes the separation problem much easier, the latter approach might not be efficient, as low-power users suffer from noise and channel effects. In the following, we focus on the most difficult scenario: separating a collision of equal-power users. Nearly equal received powers also result from power control, which is widely used; hence this scenario is of practical interest.
A delay-division multiple-access approach that exploits the random delays introduced by the transmitters was proposed in [3]. The approach of [3] considers transmissions of isolated frames. It requires that users have distinct delays, assumes full channel knowledge at the receiver, and exploits the edges of a frame over which users do not overlap. Pulse-shape waveform diversity was considered in [4] to separate multiple users in a blind fashion. In [4], the received signal is oversampled and its polyphase components are viewed as independent mixtures of the user signals. User separation is achieved by solving a blind source separation problem. Although no specifics on waveform design are given in [4], the examples used in the simulations of [4] consider wideband waveforms for the users. However, if large bandwidth is available, then CDMA would probably be a better alternative to blind source separation. Pulse-shape diversity is also employed in [5,6], addressing situations in which the pulse-shape waveforms have bandwidth constraints.
In this paper, we follow the oversampling approach of [4], with the following differences. First, we introduce an intentional half-symbol delay between the two users. Second, both users use the same optimally designed pulse-shape waveform. Third, we use successive interference cancellation in combination with blind source separation to further improve the separation performance.
The paper is organized as follows. In Section 2, we describe the problem formulation. The proposed blind method is presented in Section 3. The pulse-shape design is derived in Section 4. Simulation results validating the proposed method are presented in Section 5, and concluding remarks are given in Section 6.
Notation. Bold capitals denote matrices. Bold lower-case symbols denote vectors. The superscript $T$ denotes transposition. The superscript $\dagger$ denotes the pseudoinverse. $\operatorname{Diag}\{\mathbf{v}\}$ denotes the diagonal matrix whose diagonal elements are the elements of $\mathbf{v}$. $\lfloor\cdot\rfloor$ denotes rounding down to the nearest integer. $\operatorname{Tr}(\cdot)$ denotes the trace of its argument. $\operatorname{Arg}\{\cdot\}$ denotes the phase of its argument.

Problem Formulation
We consider a distributed antenna system in which $K$ users transmit simultaneously to a base station. Although most of this paper studies the case $K = 2$, for reasons explained later, we keep the $K$-user notation throughout. Narrow-band transmission is assumed, so that the channel between any user and the base station undergoes flat fading. In addition, quasi-static fading is assumed, that is, the channel gains remain constant over several symbol periods.
The transmitted signal of user $k$ is of the form

$$x_k(t) = \sum_{i} s_k(i)\, p(t - iT_s), \tag{1}$$

where $s_k(i)$ is the $i$th symbol of user $k$; $T_s$ is the symbol period; and $p(t)$ is a pulse-shaping function with support $[-\bar{L}T_s, \bar{L}T_s]$, where $\bar{L}$ is an integer. The continuous-time baseband received signal $y(t)$ can be expressed as

$$y(t) = \sum_{k=1}^{K} a_k\, x_k(t - \tau_k)\, e^{j2\pi F_k t} + w(t), \tag{2}$$

where $a_k$ denotes the complex channel gain between the $k$th user and the base station; $\tau_k$ denotes the delay of the $k$th user; $F_k$ is the carrier frequency offset (CFO) of the $k$th user, arising from relative motion or from a mismatch between the receiver and transmitter oscillators; and $w(t)$ represents noise.
Our objective is to obtain an estimate of each user sequence, $s_k(i)$, $i = 0, 1, \ldots$, up to a complex scalar multiple that is independent of $i$. The estimation is based on the received signal only; the channel gains, CFOs, and user delays are assumed to be unknown. In addition to the scalar ambiguity, there is a permutation ambiguity, that is, the order of the users may be lost. These ambiguities are inherent in blind estimation problems and are considered trivial.
We should note that, typically, in high-speed communication systems the main lobes of the pulse-shape functions overlap by 50% [7]. This extended time support allows for better frequency concentration or, equivalently, lower spectral occupancy for the transmission of each symbol. However, it introduces intersymbol interference (ISI). Examining $x_k(t)$ for $t \in [iT_s, (i+1)T_s)$ (see (1)), we note the contribution of the $i$th symbol, the contribution of symbol $i+1$ due to the main lobe of $p(t - (i+1)T_s)$, and also the contributions of symbols $i+l$, $l = \ldots, -2, -1, 2, 3, \ldots$, due to the sidelobes of $p(t - (i+l)T_s)$. If $p(t)$ is a Nyquist pulse and samples are taken at times $iT_s$, $i = 1, 2, \ldots$, the overlap does not play any role. However, when we take more than one sample per symbol interval, ISI effects are to be expected.
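Sampling the received signal $y(t)$ at times $t = iT_s + mT_s/P$, we obtain

$$y(iT_s + mT_s/P) = \sum_{k=1}^{K} a_k\, e^{j2\pi f_k (iP + m)} \sum_{n} s_k(n)\, p\big((i-n)T_s + mT_s/P - \tau_k\big) + w(iT_s + mT_s/P) \tag{3}$$

$$= \sum_{k=1}^{K} (\tilde{s}_k * h_{m,k})(i) + w_m(i), \tag{4}$$

where $f_k = F_kT_s/P$ ($|F_kT_s| \le 0.5$) is the normalized CFO between the $k$th user and the base station, $\tilde{s}_k(i) = s_k(i)e^{j2\pi f_k iP}$, "$*$" denotes convolution, $w_m(i) = w(iT_s + mT_s/P)$, and $h_{m,k}(i)$ is defined as

$$h_{m,k}(i) = a_k\, e^{j2\pi f_k (iP + m)}\, p(iT_s + mT_s/P - \tau_k). \tag{5}$$

EURASIP Journal on Wireless Communications and Networking

The $m$th polyphase component, $y_m(i) = y(iT_s + mT_s/P)$, $m = 1, \ldots, P$, can be expressed as

$$y_m(i) = \sum_{k=1}^{K}\sum_{l=-\bar{L}}^{\bar{L}-1} h_{m,k}(l)\,\tilde{s}_k(i-l) + w_m(i). \tag{6}$$

Let us form the vector $\mathbf{y}(i) = [y_1(i), \ldots, y_P(i)]^T$. It holds that

$$\mathbf{y}(i) = \mathbf{A}\mathbf{s}(i) + \mathbf{w}(i), \tag{7}$$

where $\mathbf{A}$ is a $P \times 2\bar{L}K$ matrix whose entries are the $h_{m,k}(l)$'s; $\mathbf{s}(i) = [\tilde{s}_1(i-\bar{L}+1), \ldots, \tilde{s}_1(i+\bar{L}), \ldots, \tilde{s}_K(i-\bar{L}+1), \ldots, \tilde{s}_K(i+\bar{L})]^T$; and $\mathbf{w}(i) = [w_1(i), \ldots, w_P(i)]^T$. This is a $P \times 2\bar{L}K$ instantaneous multiple-input multiple-output (MIMO) problem. Under certain assumptions, provided in the following section, the channel matrix $\mathbf{A}$ is identifiable, and the vector $\mathbf{s}(i)$ can be recovered up to certain ambiguities. In particular, for each $k$, we get $2\bar{L}$ different versions of $s_k$, that is, $s_k(i-\bar{L}+1)e^{j2\pi f_k(i-\bar{L}+1)P}, \ldots, s_k(i+\bar{L})e^{j2\pi f_k(i+\bar{L})P}$, each within a scalar ambiguity. The effects of the CFO on the separated signals can be mitigated by using any of the existing single-CFO estimation techniques (e.g., [8]-[13]) or a simple phase-locked loop (PLL) [14].

Blind User Separation

(A1) Each of the elements of $\mathbf{w}(i)$, as a function of $i$, is a zero-mean, complex Gaussian stationary random process with variance $\sigma_w^2$, and is independent of the inputs.
(A2) For each $k$, $s_k(\cdot)$ is independent and identically distributed (i.i.d.) with zero mean and nonzero kurtosis, that is, $E\{|s_k(i)|^4\} - 2\big(E\{|s_k(i)|^2\}\big)^2 - |E\{s_k^2(i)\}|^2 \neq 0$. The $s_k$'s are mutually independent, and each user has unit transmission power.
(A6) Either the CFOs are distinct, or the user delays are distinct.
(A7) $p(t) > 0$ for $t \in (-T_s, T_s)$, and $p(t) = 0$ only for $t = iT_s$, $i = -\bar{L}, \ldots, -1, 1, \ldots, \bar{L}$.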
Under assumption (A2), it is easy to verify that the rotated input signals $\tilde{s}_k(\cdot)$ are also i.i.d. with zero mean and nonzero kurtosis, and that the $\tilde{s}_k$'s are mutually independent for different $k$'s. Assumptions (A1) and (A2) are needed for blind MIMO estimation based on (7). Assumptions (A3)-(A7) guarantee that the virtual MIMO channel matrix $\mathbf{A}$ in (7) has full rank with high probability. Assumption (A3) can actually be relaxed: as discussed later (see (18)), the contributions of low-value columns of $\mathbf{A}$ in (7) can be viewed as noise, which effectively reduces the dimensionality of the problem. (A5) and (A7) guarantee that $p(iT_s + mT_s/P - \tau_k)$ is nonzero for all allowable values of $i$, $m$, and $k$. To see the effect of (A6), let us write the channel matrix as

$$\mathbf{A} = [\mathbf{h}_1(\bar{L}-1), \ldots, \mathbf{h}_1(-\bar{L}), \ldots, \mathbf{h}_K(\bar{L}-1), \ldots, \mathbf{h}_K(-\bar{L})], \tag{8}$$

where $\mathbf{h}_k(l)$ is formed by stacking $h_{m,k}(l)$ in (5) for different $m$'s, that is,

$$\mathbf{h}_k(l) = [h_{1,k}(l), \ldots, h_{P,k}(l)]^T, \tag{9}$$

and consider the case in which all users have the same delay, that is, $\tau_k = \tau$, $k = 1, \ldots, K$. If the CFOs are distinct, $\mathbf{A}$ has full column rank. Even if the CFOs are not distinct, the columns of the channel matrix can be viewed as having been drawn independently from an absolutely continuous distribution, and thus the channel matrix has full rank with probability one [15].
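The polyphase model can be checked numerically. The sketch below is a minimal illustration, not the authors' code: it builds $\mathbf{A}$ from $h_{m,k}(l)$ in (5), with columns ordered as $\mathbf{h}_k(\bar{L}-1), \ldots, \mathbf{h}_k(-\bar{L})$ for each user (an assumed ordering), stacks the rotated symbols $\tilde{s}_k(i-l)$ accordingly, and compares $\mathbf{A}\mathbf{s}(i)$ with samples taken directly from the noiseless continuous-time received signal. The Hann-shaped pulse $\cos^2(\pi t/(2T_s))$ on $[-T_s, T_s]$ (so $\bar{L} = 1$), the delays drawn from $(0, T_s/P)$, the CFO values, and the helper names `pulse` and `virtual_channel` are hypothetical choices for illustration only.

```python
import numpy as np

def pulse(t, Ts=1.0):
    # Hypothetical illustrative Nyquist pulse: cos^2(pi t / (2 Ts)) on [-Ts, Ts] (so Lbar = 1).
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= Ts, np.cos(np.pi * t / (2 * Ts)) ** 2, 0.0)

def virtual_channel(a, tau, f, P, Lbar, Ts=1.0):
    # Columns h_k(l), l = Lbar-1, ..., -Lbar, for each user k, with entries
    # h_{m,k}(l) = a_k exp(j 2 pi f_k (l P + m)) p(l Ts + m Ts / P - tau_k), m = 1..P.
    m = np.arange(1, P + 1)
    cols = [ak * np.exp(2j * np.pi * fk * (l * P + m)) * pulse(l * Ts + m * Ts / P - tk, Ts)
            for ak, tk, fk in zip(a, tau, f)
            for l in range(Lbar - 1, -Lbar - 1, -1)]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
K, P, Lbar, Ts, N = 2, 4, 1, 1.0, 64
a = np.exp(2j * np.pi * rng.random(K))          # unit-amplitude channel gains
tau = rng.uniform(0.0, Ts / P, K)               # delays in (0, Ts/P)
f = np.array([1e-3, 3e-3]) / P                  # normalized CFOs f_k = F_k Ts / P
s = (rng.choice([-1.0, 1.0], (K, N)) + 1j * rng.choice([-1.0, 1.0], (K, N))) / np.sqrt(2)
n = np.arange(N)
s_rot = s * np.exp(2j * np.pi * f[:, None] * n * P)   # tilde s_k(n) = s_k(n) e^{j 2 pi f_k n P}

A = virtual_channel(a, tau, f, P, Lbar, Ts)
i = N // 2
# Direct samples of the noiseless y(t) at t = i Ts + m Ts / P, with F_k = f_k P / Ts.
F = f * P / Ts
t_m = i * Ts + np.arange(1, P + 1) * Ts / P
y_direct = np.array([sum(a[k] * np.exp(2j * np.pi * F[k] * t)
                         * np.sum(s[k] * pulse(t - n * Ts - tau[k], Ts)) for k in range(K))
                     for t in t_m])
# s(i) = [tilde s_k(i - Lbar + 1), ..., tilde s_k(i + Lbar)] for each user.
s_vec = np.concatenate([s_rot[k, i - Lbar + 1:i + Lbar + 1] for k in range(K)])
y_model = A @ s_vec
print(A.shape, np.max(np.abs(y_direct - y_model)))
```

Any other pulse with support $[-\bar{L}T_s, \bar{L}T_s]$ can be substituted for `pulse`, provided `Lbar` is changed accordingly.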

Channel Estimation and User Separation.
One can apply to (7) any blind source separation algorithm (e.g., [16]) to obtain an estimate of the channel matrix, $\hat{\mathbf{A}}$, which is related to the true matrix as

$$\hat{\mathbf{A}} = \mathbf{A}\mathbf{P}\boldsymbol{\Lambda}, \tag{10}$$

where $\mathbf{P}$ is a column permutation matrix and $\boldsymbol{\Lambda}$ is a complex diagonal matrix. The method of [16] requires the fourth-order cumulants of $\mathbf{y}(i)$. Accordingly, the estimate of the decoupled signals $\mathbf{s}(i)$, within permutation and diagonal complex scalar ambiguities, is (neglecting noise)

$$\hat{\mathbf{s}}(i) = \hat{\mathbf{A}}^{\dagger}\mathbf{y}(i) \approx \boldsymbol{\Lambda}^{-1}\mathbf{P}^T\mathbf{s}(i). \tag{11}$$

Denoting by $\theta_{k,l}$ the diagonal element of $\operatorname{Arg}\{\boldsymbol{\Lambda}\}$ that corresponds to the phase ambiguity of user $k$ with delay $l$, the separated signals can be expressed as

$$\hat{\tilde{s}}_k(i-l) = c_{k,l}\, e^{-j\theta_{k,l}}\, s_k(i-l)\, e^{j2\pi f_k (i-l)P}, \tag{12}$$

where $c_{k,l} > 0$ is a real scale factor. At this point, the users' signals have been decoupled, and all that is left is to mitigate the CFO in each recovered signal. This can be achieved with any of the existing single-CFO estimation methods, such as [8]-[13]. Alternatively, if the CFO is very small, we can estimate it and simultaneously mitigate its effects using a PLL. Note that even a very small CFO needs to be compensated for reliable symbol recovery. For example, for 4-ary quadrature amplitude modulation (4QAM) signals without CFO compensation, even if the normalized CFO $Pf_k = F_kT_s$ is as small as 0.001, the constellation will have rotated by a quarter turn, onto a wrong constellation point, after $0.25/0.001 = 250$ symbols.
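The rotation figure quoted above follows from the per-symbol phase increment $2\pi F_kT_s$. The short sketch below (the function name is hypothetical) reproduces the 250-symbol figure and, as an added check, shows that for 4QAM the first wrong decisions appear earlier, once the rotation passes $\pi/4$ (about 125 symbols), because each decision region spans $\pi/2$.

```python
import numpy as np

def symbols_until_rotation(cfo_norm, angle):
    # The phase advances by 2*pi*F_k*Ts per symbol, where cfo_norm = F_k*Ts = P*f_k.
    return angle / (2 * np.pi * cfo_norm)

cfo = 0.001
n_quarter = symbols_until_rotation(cfo, np.pi / 2)   # onto a neighboring 4QAM point
n_boundary = symbols_until_rotation(cfo, np.pi / 4)  # first crossing of a decision boundary
print(round(n_quarter), round(n_boundary))          # 250 125

# Cross-check by rotating the 4QAM point e^{j pi/4} symbol by symbol.
k = np.arange(400)
z = np.exp(1j * np.pi / 4) * np.exp(2j * np.pi * cfo * k)
wrong = (z.real < 0) | (z.imag < 0)    # outside the first-quadrant decision region
first_error = int(np.argmax(wrong))    # index of the first wrong decision (about 125)
```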
If the CFO is large, a PLL does not suffice, with the severity of the problem depending on the modulation scheme. In this case, the phase of the estimated channel matrix $\hat{\mathbf{A}}$ can be used to obtain a CFO estimate. If $p(t) > 0$ for all $t$, then it can be easily seen that $\operatorname{Arg}\{\hat{\mathbf{A}}\} = \boldsymbol{\Psi}\mathbf{P}$, with $\boldsymbol{\Psi}$ given in (13), where $\mathbf{1}_N$ is a $(1 \times N)$ vector with all elements equal to one. The least-squares estimates of the CFOs, $\hat{f}_k$, are then given by (14), where $\Psi_{p,k}$ is the $(p,k)$th element of $\boldsymbol{\Psi}$.
Noting that the decoupled signals $\hat{\tilde{s}}_k(i-l)$ in (12) are permuted (see (11)) in the same manner as the estimated CFOs in (14), we can use the $\hat{f}_k$'s to compensate for the effect of the CFO in the decoupled signals in (12) and obtain estimates of the input signals as

$$\hat{s}_k(i-l) = \hat{\tilde{s}}_k(i-l)\, e^{-j2\pi \hat{f}_k (i-l)P}. \tag{15}$$

To resolve the user permutation and shift ambiguities, one can use user IDs embedded in the data [17].
Although in theory, under the conditions stated above, the matrix $\mathbf{A}$ has full rank for any number of users $K$, its condition number may become very high when the CFO or delay differences between users are small. This problem escalates as $K$ increases. Further, for large $K$, the oversampling factor $P$ must be large. However, as $P$ increases, neighboring samples of the pulse-shape function become close to each other, and the condition number of $\mathbf{A}$ increases. Therefore, the shape of the pulse-shape function limits the oversampling factor one can use, and thus the number of users one can separate. Since these issues are difficult to deal with, we next focus on the two-user case. Further, we propose to introduce an intentional delay of $T_s/2$ between the two users, in addition to any small random delays that exist in the system.
The performance of user separation depends on the pulse-shape function and on the locations of the samples. Although uniform sampling was described above, nonuniform sampling can also be used, in which case the expressions require some straightforward modifications. If the samples fall in a low-value region of the pulse, the corresponding polyphase components will suffer from a low signal-to-noise ratio. Also, if the sampling points are close to each other, the condition number of $\mathbf{A}$ increases. Therefore, one should select the sampling points so that the corresponding pulse samples are all above some threshold and the sampling points are as far apart as possible. The effect of the pulse shape and the optimal shape design are discussed in the following section.

Pulse-Shape Design
In this section, we first investigate the effect of the pulse shape on the condition number of $\mathbf{A}$. Since the condition number of a matrix tends to increase as the correlation between its columns increases, we look at the correlation between the columns of $\mathbf{A}$.
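Let us partition the channel matrix $\mathbf{A}$ into two submatrices, $\mathbf{A}_P$ and $\mathbf{A}_I$, containing, respectively, the columns of $\mathbf{A}$ corresponding to the main lobe and those corresponding to the sidelobes of the pulse. We can rewrite (7) as

$$\mathbf{y}(i) = \mathbf{A}_P\mathbf{s}_P(i) + \mathbf{A}_I\mathbf{s}_I(i) + \mathbf{w}(i), \tag{16}$$

where

$$\mathbf{A}_P = [\mathbf{h}_1(0), \mathbf{h}_1(-1), \ldots, \mathbf{h}_K(0), \mathbf{h}_K(-1)], \tag{17}$$

with $\mathbf{h}_k(l)$ as defined in (9). Correspondingly, $\mathbf{s}_P(i) = [\tilde{s}_1(i), \tilde{s}_1(i+1), \ldots, \tilde{s}_K(i), \tilde{s}_K(i+1)]^T$, and $\mathbf{s}_I(i)$ contains the remaining entries of $\mathbf{s}(i)$. If the sidelobes of the pulse are very low, then $\mathbf{A}_I\mathbf{s}_I(i)$ can be treated as noise, and (16) can be written as

$$\mathbf{y}(i) = \mathbf{A}_P\mathbf{s}_P(i) + \mathbf{w}(i). \tag{18}$$

4.1. Pulse Effects.
In order to maintain a well-conditioned $\mathbf{A}_P$, the correlation coefficients between its columns should be low. Let us further divide $\mathbf{A}_P$ into $\mathbf{A}_0 = [\mathbf{h}_1(0), \ldots, \mathbf{h}_K(0)]$ and $\mathbf{A}_{-1} = [\mathbf{h}_1(-1), \ldots, \mathbf{h}_K(-1)]$. The elements of $\mathbf{h}_k(0)$ are samples from the decreasing part of the main lobe of the pulse, whereas the elements of $\mathbf{h}_k(-1)$ are samples from the increasing part. Thus, the correlation coefficient of $\mathbf{h}_k(0)$ and $\mathbf{h}_{k'}(-1)$ is smaller than that of $\mathbf{h}_k(0)$ and $\mathbf{h}_{k'}(0)$, or that of $\mathbf{h}_k(-1)$ and $\mathbf{h}_{k'}(-1)$. We therefore focus on the effects of the pulse on the column correlations within $\mathbf{A}_0$ and $\mathbf{A}_{-1}$.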
Proposition 1. Let $p(t)$ be a Nyquist pulse that is positive within its main lobe, that is, $p(t) > 0$ for $t \in (-T_s, T_s)$. We further assume that $p(t)$ is an even function with very low sidelobes. For $\tau_{k_1}$ and $\tau_{k_2}$ ($\tau_{k_1} \neq \tau_{k_2}$) in $(0, T_s/P)$, the absolute value of the correlation coefficient between $\mathbf{h}_{k_1}(0)$ and $\mathbf{h}_{k_2}(0)$ is upper bounded as follows:

where $\Delta t$ is the sampling interval, that is, $\Delta t = T_s/P$, and $\mathbf{h}'_k(0)$ is given by

where $p'(t)$ denotes the first-order derivative of $p(t)$.
Proof.See the appendix.
When $P$ is large, the following approximation holds:

Thus, for fixed $E_p$ and $p(0)$, the correlation coefficient between $\mathbf{h}_{k_1}(0)$ and $\mathbf{h}_{k_2}(0)$ decreases with increasing $\int_0^{T_s}[p'(t)]^2\,dt$. It can be shown that the same holds for the correlation coefficient between $\mathbf{h}_{k_1}(-1)$ and $\mathbf{h}_{k_2}(-1)$.
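As a rough illustration of why small delay differences make $\mathbf{A}$ ill conditioned, the sketch below computes the normalized correlation $|\mathbf{h}_{k_1}(0)^T\mathbf{h}_{k_2}(0)|/(\|\mathbf{h}_{k_1}(0)\|\,\|\mathbf{h}_{k_2}(0)\|)$ for a hypothetical Hann-shaped pulse $\cos^2(\pi t/(2T_s))$, with the channel gains and CFO phase factors omitted (a simplifying assumption), so that $\mathbf{h}_k(0)$ reduces to $[p(mT_s/P - \tau_k)]_{m=1}^{P}$. It does not evaluate the bound of Proposition 1; the function names are illustrative.

```python
import numpy as np

def pulse(t, Ts=1.0):
    # Hypothetical illustrative pulse: cos^2(pi t / (2 Ts)) on [-Ts, Ts].
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= Ts, np.cos(np.pi * t / (2 * Ts)) ** 2, 0.0)

def corr_h0(tau1, tau2, P, Ts=1.0):
    # |h1^T h2| / (||h1|| ||h2||) with h_k(0) = [p(m Ts / P - tau_k)], m = 1..P
    # (channel gains and CFO phase terms omitted for simplicity).
    m = np.arange(1, P + 1)
    h1 = pulse(m * Ts / P - tau1, Ts)
    h2 = pulse(m * Ts / P - tau2, Ts)
    return abs(h1 @ h2) / (np.linalg.norm(h1) * np.linalg.norm(h2))

P = 8
diffs = [0.001, 0.01, 0.05, 0.1]                # delay differences in units of Ts
rhos = [corr_h0(0.01, 0.01 + d, P) for d in diffs]
for d, r in zip(diffs, rhos):
    print(f"delta tau = {d:.3f} Ts -> |rho| = {r:.6f}")
```

For this pulse, the correlation stays close to 1 for all delay differences within $(0, T_s/P)$ and approaches 1 as the delay difference shrinks.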
Because $p(t)$ should be a Nyquist pulse with small sidelobes and $p(t) > 0$ for $t \in (-T_s, T_s)$, it should hold that

where $\epsilon$ is small.
There are additional constraints that the pulse should satisfy, the most important of which is a bandwidth constraint. Most commercial systems, for example, IEEE 802.11a, IEEE 802.11b, and IEEE 802.11g wireless local-area networks (WLANs) [18], specify a spectral mask that dictates the maximum allowable spectrum or, equivalently, the maximum symbol rate. This leads to a constraint of the form $|P(f)|^2 \le M(f)$ for all $f$, where $P(f)$ is the Fourier transform of $p(t)$, and $M(f)$ denotes the spectral mask.
The problem of (24a)-(24e) is not easy to solve. Next, we take steps towards reformulating it as a convex optimization problem. Let $\mathbf{p} = [p(0), p(T_s/\xi), \ldots, p((L-1)T_s/\xi)]^T$ be a vector containing samples of $p(t)$ taken in $[0, \bar{L}T_s]$ with sampling interval $\Delta t = T_s/\xi$, in which case $L = \bar{L}\xi + 1$ ($\xi$ is an integer representing the number of samples in each symbol interval). The objective function (24a) is then equivalent to maximizing the quadratic form $\mathbf{p}^T\boldsymbol{\Gamma}\mathbf{p}$, where the positive semidefinite matrix $\boldsymbol{\Gamma}$ is the discrete-time counterpart of $\int_0^{T_s}[p'(t)]^2\,dt$. As $p(t)$ is an even function, its Fourier transform can be represented as $P(f) = \mathbf{v}^T(f)\mathbf{p}$, with power spectral density (PSD) equal to $|\mathbf{v}^T(f)\mathbf{p}|^2$. Hence, the constraint (24b) is equivalent to

$$|\mathbf{v}^T(f)\mathbf{p}|^2 \le M(f), \quad \forall f. \tag{27}$$

Because (27) involves an infinite number of constraints, we sample $|\mathbf{v}^T(f)\mathbf{p}|^2$ in the frequency domain:

$$|\mathbf{v}^T(f_n)\mathbf{p}|^2 \le M(f_n), \quad n = 1, \ldots, N, \tag{28}$$

where $N$ is the number of frequency samples $f_n$ in $[0, 1/(2\Delta t)]$. For (28) to be a good approximation of (27), $N$ should be on the order of $15L$ [19].
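The discretization above can be sketched as follows. This copy does not show the explicit forms of $\boldsymbol{\Gamma}$ and $\mathbf{v}(f)$, so both are assumptions here: $\boldsymbol{\Gamma}$ approximates $\int_0^{T_s}[p'(t)]^2\,dt$ by forward differences over the $\xi + 1$ samples in $[0, T_s]$, and $\mathbf{v}(f) = \Delta t\,[1, 2\cos(2\pi f\Delta t), \ldots, 2\cos(2\pi f(L-1)\Delta t)]^T$ is the Riemann-sum Fourier transform of an even pulse. The hypothetical Hann-shaped test pulse $\cos^2(\pi t/(2T_s))$ satisfies $\int_0^{T_s}[p'(t)]^2\,dt = \pi^2/(8T_s)$ and $P(0) = T_s$, which gives a simple check; $\xi = 16$, $\bar{L} = 4$, and $N = 15L$ mirror the design example.

```python
import numpy as np

def derivative_energy_matrix(L, xi, dt):
    # Hypothetical Gamma: forward differences over the first xi+1 samples (t in [0, Ts]),
    # so that p^T Gamma p = sum_n (p[n+1] - p[n])^2 / dt ~ int_0^Ts p'(t)^2 dt.
    D = np.zeros((xi, L))
    D[np.arange(xi), np.arange(xi)] = -1.0
    D[np.arange(xi), np.arange(1, xi + 1)] = 1.0
    return D.T @ D / dt

def v_vec(f, L, dt):
    # Hypothetical cosine vector for an even pulse: P(f) ~ v(f)^T p with
    # v(f) = dt * [1, 2 cos(2 pi f dt), ..., 2 cos(2 pi f (L-1) dt)].
    n = np.arange(L)
    v = 2.0 * np.cos(2 * np.pi * f * n * dt)
    v[0] = 1.0
    return dt * v

Ts, Lbar, xi = 1.0, 4, 16
L, dt = Lbar * xi + 1, Ts / xi                   # L = 65 samples on [0, Lbar*Ts]
t = np.arange(L) * dt
p = np.where(t <= Ts, np.cos(np.pi * t / (2 * Ts)) ** 2, 0.0)   # illustrative pulse

Gamma = derivative_energy_matrix(L, xi, dt)
energy = p @ Gamma @ p                           # close to pi^2 / 8 for this pulse
P0 = v_vec(0.0, L, dt) @ p                       # integral of p(t), equal to Ts here
freqs = np.linspace(0.0, 1 / (2 * dt), 15 * L)   # N = 15 L frequency samples
psd = np.array([(v_vec(fn, L, dt) @ p) ** 2 for fn in freqs])
```

Each row of (28) is then a quadratic constraint $\mathbf{p}^T\mathbf{v}(f_n)\mathbf{v}^T(f_n)\mathbf{p} \le M(f_n)$, which is what makes the trace reformulation below possible.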
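In the discrete-time domain, (24c) is equivalent to a constraint, (29), requiring the energy of the samples of $\mathbf{p}$ outside the main lobe to be at most $\epsilon$, where $\epsilon$ is small and these samples are selected by $\mathbf{a}_1 = [0, \ldots, 0, 1, \ldots, 1]^T$, which has $\xi + 1$ leading zeros. Define $\mathbf{l}_j = [0, \ldots, 0, 1, 0, \ldots, 0]^T$, with the $j$th element equal to 1. Equation (24d) is equivalent to

$$\mathbf{l}_j^T\mathbf{p} = 0, \quad j = i\xi + 1,\; i = 1, \ldots, \bar{L}. \tag{30}$$

Hence, the problem of (24a)-(24e) can be reformulated as problem (31a)-(31e), which maximizes $\mathbf{p}^T\boldsymbol{\Gamma}\mathbf{p}$ over $\mathbf{p}$ subject to the discretized constraints above; in particular, the Nyquist condition reads

$$\mathbf{l}_j^T\mathbf{p} = 0, \quad j = i\xi + 1,\; i = 1, \ldots, \bar{L}. \tag{31d}$$

Since its objective (31a) is the maximization of a convex function, problem (31a)-(31e) is not a convex optimization problem. Letting $\mathbf{G} = \mathbf{p}\mathbf{p}^T$, $\mathbf{G}$ should be a positive semidefinite matrix of rank 1, and every quadratic term becomes linear in $\mathbf{G}$; for example, $\mathbf{p}^T\boldsymbol{\Gamma}\mathbf{p} = \operatorname{Tr}(\boldsymbol{\Gamma}\mathbf{G})$ and $(\mathbf{l}_j^T\mathbf{p})^2 = \operatorname{Tr}(\mathbf{G}\mathbf{l}_j\mathbf{l}_j^T)$. The problem of (31a)-(31e) is thus equivalent to problem (32a)-(32g) in $\mathbf{G}$, whose Nyquist constraints read $\operatorname{Tr}(\mathbf{G}\mathbf{l}_j\mathbf{l}_j^T) = 0$ and whose last constraint, (32g), is $\operatorname{rank}(\mathbf{G}) = 1$. However, the constraint (32g) is not convex. By dropping it, we obtain a semidefinite relaxation of the primal problem [20]. The resulting convex optimization problem, (33), is a semidefinite program in $\mathbf{G} \succeq 0$ with the same trace-form objective and constraints, for example, $\operatorname{Tr}(\mathbf{G}\mathbf{l}_j\mathbf{l}_j^T) = 0$.

Because the constraint $\operatorname{rank}(\mathbf{G}) = 1$ is dropped, the resulting solution $\mathbf{G}^*$ might not have unit rank. In this case, we apply an eigendecomposition to $\mathbf{G}^*$:

$$\mathbf{G}^* = \sum_{\mu=1}^{L}\lambda_\mu\mathbf{u}_\mu\mathbf{u}_\mu^T, \tag{34}$$

where $\lambda_1$ is the largest eigenvalue of $\mathbf{G}^*$ and $\mathbf{u}_1$ is the corresponding eigenvector. As $\mathbf{G}^* \succeq 0$, its eigenvalues satisfy $\lambda_\mu \ge 0$ for $\mu \in \{1, \ldots, L\}$. If

$$\lambda_1 \gg \sum_{\mu=2}^{L}\lambda_\mu, \tag{35}$$

then $\mathbf{p}^* = \sqrt{\lambda_1}\,\mathbf{u}_1$ can result in a good pulse shape. Indeed, if (35) holds, then

$$\mathbf{p}^{*T}\boldsymbol{\Gamma}\mathbf{p}^* = \lambda_1\mathbf{u}_1^T\boldsymbol{\Gamma}\mathbf{u}_1 \approx \operatorname{Tr}(\boldsymbol{\Gamma}\mathbf{G}^*), \tag{36}$$

which indicates that $\int_0^{T_s}[p'(t)]^2\,dt$ in the problem of (24a)-(24e) is maximized. Moreover, $\mathbf{p}^*$ can be shown to satisfy (31b) and (31c). Also, since $\lambda_\mu \ge 0$,

$$|\mathbf{v}^T(f_n)\mathbf{p}^*|^2 = \lambda_1|\mathbf{v}^T(f_n)\mathbf{u}_1|^2 \le \sum_{\mu=1}^{L}\lambda_\mu|\mathbf{v}^T(f_n)\mathbf{u}_\mu|^2 = \mathbf{v}^T(f_n)\mathbf{G}^*\mathbf{v}(f_n), \tag{37}$$

and the right-hand side satisfies the trace form of the mask constraint (28) in the relaxed problem. This indicates that the PSD of $\mathbf{p}^*$ will be under the IEEE 802.11 mask. In the same way, we can prove a corresponding bound, (38), which further indicates that $\mathbf{p}^*$ has small sidelobes. Moreover, $\epsilon$ is small, and the validity of (38) implies that $\mathbf{l}_j^T\mathbf{p}^* \approx 0$ for $j = i\xi + 1$, $i = 1, \ldots, \bar{L}$, which indicates that, if we sample at intervals $T_s$, the interference from neighboring symbols can be neglected.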

Pulse Design Examples.
In this section, we demonstrate the performance of a pulse designed as described in Section 4.2. We take 16 samples per symbol, that is, $\xi = 16$, and set $\bar{L} = 4$. We then obtain $L = \bar{L}\xi + 1 = 65$ and $N = 15L = 975$ samples in the time and frequency domains, respectively. We take $\epsilon$ to be $3 \times 10^{-5}$. In Figure 1, we show the ratio $\eta = \lambda_1/\sum_{\mu=2}^{L}\lambda_\mu$ of the resulting matrix $\mathbf{G}^*$ at different symbol rates, where $\lambda_1$ is the largest eigenvalue of $\mathbf{G}^*$. The smallest $\eta$ is above $10^2$, which means that the condition (35) is satisfied. Therefore, $\mathbf{p}^* = \sqrt{\lambda_1}\,\mathbf{u}_1$ is a good choice of pulse shape.
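Extracting $\mathbf{p}^*$ and the ratio $\eta$ from a relaxed solution can be sketched as follows. The input here is a synthetic, nearly rank-one $\mathbf{G}^*$ built from a hypothetical Hann-shaped pulse plus a small random perturbation, standing in for the output of an SDP solver, which is not shown. Since the sign of an eigenvector is arbitrary, the sketch fixes it by requiring $p(0) > 0$; the function name `rank_one_pulse` is illustrative.

```python
import numpy as np

def rank_one_pulse(G):
    # Eigendecomposition of the relaxed solution G*: p* = sqrt(lambda_1) u_1, and
    # eta = lambda_1 / sum_{mu >= 2} lambda_mu measures how close G* is to rank one.
    G = 0.5 * (G + G.T)                      # symmetrize against round-off
    lam, U = np.linalg.eigh(G)               # eigenvalues in ascending order
    lam = np.clip(lam, 0.0, None)            # G* is PSD; clip tiny negative round-off
    lam1, u1 = lam[-1], U[:, -1]
    rest = lam[:-1].sum()
    eta = np.inf if rest == 0.0 else lam1 / rest
    p = np.sqrt(lam1) * u1
    return (p if p[0] >= 0 else -p), eta     # fix the eigenvector sign so that p(0) > 0

# Usage with a synthetic, nearly rank-one G (not an actual SDP solution).
rng = np.random.default_rng(1)
L, xi = 65, 16
t = np.arange(L) / xi                        # time in units of Ts
p_true = np.where(t <= 1.0, np.cos(np.pi * t / 2) ** 2, 0.0)
E = rng.standard_normal((L, 3))
G_star = np.outer(p_true, p_true) + 1e-6 * E @ E.T
p_star, eta = rank_one_pulse(G_star)
print(f"eta = {eta:.3g}")
```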
For symbol rate 10 M/sec, or equivalently $T_s = 10^{-7}$ sec, the designed time-domain pulse is shown in Figure 2. For comparison, the Isotropic Orthogonal Transform Algorithm (IOTA) pulse [21] is also shown in the same figure. The corresponding PSDs, along with the IEEE 802.11 spectral mask, are given in Figure 3. The figures show that the proposed pulse decreases faster than the IOTA pulse within $[0, T_s]$; the larger the value of $|p'(t)|$, the faster $p(t)$ decreases. In Figure 3, the PSD of the proposed pulse is under the 802.11 mask, while the PSD of the IOTA pulse violates the mask at $f = 22$ MHz. For symbol rate 12.19 M/sec, or $T_s = 0.82 \times 10^{-7}$ sec, the obtained pulse is given in Figures 4 and 5, together with the raised cosine pulse with roll-off factor 1. In the frequency domain, the proposed pulse is under the 802.11 mask, while in the time domain it is narrower than the raised cosine pulse. Note that at this symbol rate the IOTA pulse cannot meet the mask constraint.

SER Performance.
In this section, we demonstrate the performance of the proposed user separation approach via simulations. We consider a two-user system. The channel coefficients $a_1$ and $a_2$ are taken to be zero-mean complex with unit amplitude and phase randomly distributed in $[0, 2\pi]$. The CFOs are chosen randomly in the range $[0, 0.001/T_s]$. The input signals are 4-QAM sequences of 1024 symbols. The estimation results are averaged over 100 independent channels, with 10 Monte Carlo runs for each channel. One user is intentionally delayed by half a symbol and, in addition, small delays, taken randomly from the interval $[-T_s/8, T_s/8]$, are introduced to each user. In our simulations, we combine the blind source separation method with SIC [2]. For blind source separation, the Joint Approximate Diagonalization of Eigenmatrices (JADE) algorithm was used, downloaded from http://perso.telecom-paristech.fr/∼cardoso/Algo/Jade/jade.m. We first apply JADE to decouple the users and then correct the CFOs of the decoupled users. Subsequently, the strongest user, that is, the one whose decoupled signal shows the best concentration around the nominal constellation points, is deflated from the received polyphase components in order to detect the other user. SIC requires that the first user be detected very reliably. To achieve this, the sampling points are chosen around the peak of one user's signal, so that ISI and interuser interference effects are minimized.
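The text does not specify how the "best concentration around the nominal constellation" is measured. As one hypothetical choice, the sketch below normalizes each decoupled, CFO-corrected stream to unit power, removes the residual phase ambiguity with the standard fourth-power estimator for 4QAM, and scores the stream by its mean squared distance to the nearest 4QAM point; the stream with the smallest score is detected and deflated first. The metric and the function names are assumptions, not the authors' implementation.

```python
import numpy as np

QAM4 = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))   # unit-power 4QAM points

def concentration_error(z):
    # Mean squared distance to the nearest 4QAM point after removing the
    # amplitude ambiguity and the phase ambiguity (modulo pi/2).
    z = np.asarray(z, dtype=complex)
    z = z / np.sqrt(np.mean(np.abs(z) ** 2))
    phi = np.angle(-np.mean(z ** 4)) / 4                     # fourth-power phase estimate
    z = z * np.exp(-1j * phi)
    d = np.abs(z[:, None] - QAM4[None, :]).min(axis=1)
    return float(np.mean(d ** 2))

def first_user_for_sic(streams):
    # Index of the decoupled stream to detect and deflate first.
    return int(np.argmin([concentration_error(z) for z in streams]))

rng = np.random.default_rng(2)
sym = QAM4[rng.integers(0, 4, 1024)]
clean = 0.7 * np.exp(1j * 0.3) * sym                           # scaled and rotated, noiseless
noisy = sym + 0.3 * (rng.standard_normal(1024) + 1j * rng.standard_normal(1024))
print(first_user_for_sic([noisy, clean]))                      # 1
```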
Eliminating CFO effects from the decoupled users can be done via a PLL if the CFO is small, or via a PLL initialized with a good CFO estimate if the CFO is large, since the PLL by itself would not converge in that case. For the latter case, since we sample around the peak of one user, the CFO estimation formula (14) requires a small modification before it is applied. Let the $P$ sampling points occur at $\delta_1, \delta_2, \ldots, \delta_P$, and let $\boldsymbol{\Psi}'$ be the phase of the channel matrix corresponding to these sampling points. The least-squares estimate of the CFO $f_k$ can be obtained as

$$\hat{f}_k = \frac{T_s}{2\pi P}\,\frac{\sum_{p=1}^{P}(\delta_p - \bar{\delta})(\Psi'_{p,k} - \bar{\Psi}'_k)}{\sum_{p=1}^{P}(\delta_p - \bar{\delta})^2}, \tag{40}$$

where $\Psi'_{p,k}$ is the $(p,k)$th element of $\boldsymbol{\Psi}'$, $\bar{\delta}$ is the mean of the $\delta_p$'s, and $\bar{\Psi}'_k$ is the mean of the $k$th column of $\boldsymbol{\Psi}'$. In this experiment, the pulse has time support $[-4T_s, 4T_s]$. We take $P = 7$ samples per received symbol, evenly spaced over the interval $[-3T_s/8, 3T_s/8]$ with sampling period $T_s/8$, which yields $P = 7$ polyphase components. In order to sample around the peak of one user, we used the true shift values; in a realistic scenario, this information would be obtained via synchronization pilots [17].
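The modified least-squares estimate fits a straight line to the phase of one column of the estimated channel matrix as a function of the sampling offsets $\delta_p$: the slope estimates $2\pi F_k$, and scaling it by $T_s/(2\pi P)$ gives $f_k = F_kT_s/P$. The sketch below assumes that the phases can be unwrapped along increasing $\delta_p$ and that the pulse samples are positive; it also applies the amplitude threshold $\phi = \alpha\|\mathbf{h}_k(l)\|_\infty$ with $\alpha = 0.2$ that is used in the CFO experiments. The function name `cfo_ls` and the test values are hypothetical.

```python
import numpy as np

def cfo_ls(h, delta, Ts, P, alpha=0.2):
    # LS fit of the phase of one channel-matrix column, sampled at offsets delta[p],
    # to the line c + 2*pi*F_k*delta[p]; returns f_k = F_k*Ts/P.
    # Entries with |h[p]| <= alpha*max|h| (threshold phi) are discarded as unreliable.
    h = np.asarray(h, dtype=complex)
    delta = np.asarray(delta, dtype=float)
    keep = np.abs(h) > alpha * np.abs(h).max()
    d = delta[keep]
    psi = np.unwrap(np.angle(h[keep]))          # assumes delta is sorted
    slope = np.sum((d - d.mean()) * (psi - psi.mean())) / np.sum((d - d.mean()) ** 2)
    return Ts / (2 * np.pi * P) * slope

Ts, P = 1e-7, 7
delta = np.arange(-3, 4) * Ts / 8               # 7 offsets over [-3Ts/8, 3Ts/8]
F = 0.3 / Ts                                    # normalized CFO F*Ts = 0.3
amp = np.cos(np.pi * delta / (2 * Ts)) ** 2     # illustrative positive pulse samples
h = amp * np.exp(1j * (1.0 + 2 * np.pi * F * delta))
h_bad = h.copy()
h_bad[0] = 1e-3 * np.exp(2.5j)                  # weak, unreliable entry (dropped by threshold)
f_hat, f_hat_bad = cfo_ls(h, delta, Ts, P), cfo_ls(h_bad, delta, Ts, P)
```

With uniform offsets $\delta_p = pT_s/P$, the same fit reduces to estimating the phase slope per polyphase index, which is $2\pi f_k$.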
The symbol error rate (SER) performance at $T_s = 10^{-7}$ sec, that is, at symbol rate 10 M/sec, using the waveform of Figure 2, is shown in Figure 6 along with the performance corresponding to the IOTA pulse. The proposed pulse yields a lower SER, with an SNR advantage of approximately 4 dB over the IOTA pulse.
In Figure 7, we show the SER versus SNR at different symbol rates. First, taking $T_s = 0.82 \times 10^{-7}$ sec, or equivalently a symbol rate of 12.19 M/sec, we compare the SER performance of the proposed pulse and the raised cosine pulse with roll-off factor 1. The proposed pulse performs better: it achieves SER = 0.01 at 25 dB SNR, while the raised cosine pulse needs 30 dB SNR to achieve the same SER. In the same figure, we show the SER performance of the proposed pulse at symbol rate 11 M/sec, where it achieves an SER of 0.01 at 15 dB SNR.
In Figure 8, we show the SER performance for different values of the oversampling factor $P$ at different symbol rates. For $P = 4$, the samples are taken evenly within the interval $[-3T_s/10, 3T_s/10]$ of each received symbol with sampling period $T_s/5$. For symbol rate 12.19 M/sec and SNR higher than 25 dB, the SER performance improves when $P$ is increased from 4 to 7. For symbol rates of 10 M/sec and 11 M/sec, the SER performance remains almost the same as $P$ increases.
To demonstrate the effect of the proposed pulse on the condition number of the system matrix, Figure 9 shows the condition number of $\mathbf{A}_P$ for the proposed and the IOTA pulses, averaged over 100 random channel realizations, with $P = 4$. For a fair comparison, the CFOs and random delays were set to be the same for both pulses, and no noise was added to the data. The estimated $\mathbf{A}_P$'s were collected from the JADE output, and their condition numbers were calculated. The proposed pulse consistently results in a lower condition number than the IOTA pulse.
Next, we show the effect of user delays on performance. As before, one user is delayed by a half-symbol interval and, in addition, a random delay $\tau$ is added to each user to model random delays introduced at the transmitter. In this experiment, the range of the random delay $\tau$ is increased from $[-T_s/8, T_s/8]$ to $[-T_s/5, T_s/5]$. For random delays within $[-T_s/5, T_s/5]$, to prevent the delay difference of the two users from being too small, we select the delays so that their difference is no less than a threshold $\tau_d = T_s/5$. The resulting SER performance is shown in Figure 10. When the range of $\tau$ increases from $[-T_s/8, T_s/8]$ to $[-T_s/5, T_s/5]$, the performance becomes worse. This is because, by increasing the range of the random delay, the signals of the two users overlap by a larger amount, which results in a higher condition number of the channel matrix $\mathbf{A}$. The best performance would be obtained with just the half-symbol delay and no random delays; however, this is not a realistic case.
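The delay generation used in these experiments can be sketched with simple rejection sampling. The uniform distribution, the redraw loop, and applying the threshold $\tau_d$ to the random parts of the delays are assumptions, since the text only states that the delays are taken randomly and that their difference is no less than $\tau_d$; the function name is hypothetical. Setting `intentional=0` gives the random-delay-only case considered next.

```python
import numpy as np

def draw_delays(rng, Ts, half_range, tau_d, intentional=0.5):
    # Draw random delays for two users uniformly in [-half_range, half_range],
    # redrawing until they differ by at least tau_d; user 2 additionally gets an
    # intentional delay of intentional*Ts (0.5 = half a symbol, 0 = none).
    while True:
        tau = rng.uniform(-half_range, half_range, size=2)
        if abs(tau[0] - tau[1]) >= tau_d:
            return tau + np.array([0.0, intentional * Ts])

rng = np.random.default_rng(3)
Ts = 1.0
draws = np.array([draw_delays(rng, Ts, Ts / 5, Ts / 5) for _ in range(1000)])
random_part = draws - np.array([0.0, 0.5 * Ts])   # remove the intentional half-symbol delay
```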
Next, to show the advantage of the intentional half-symbol delay, we consider a case without intentional delay, with random user delays only. The random delays of both users are taken within $[-T_s/8, T_s/8]$. To prevent performance degradation, we restricted the smallest delay difference between the two users to be no less than $\tau_d = T_s/5$. We compare the SER performance of the proposed pulse with that of the IOTA and raised cosine pulses at different symbol rates (Figure 11). First, comparing with the corresponding curves in Figure 10, one can see that without the intentional delay the SER performance degrades. In particular, for the proposed pulse to achieve SER 0.01, we need an SNR of 17 dB and 30 dB for symbol rates 10 M/sec and 12.19 M/sec, respectively. Second, the SER performance of the proposed pulse is still better than that of the IOTA and raised cosine pulses at the corresponding symbol rates.
Finally, we show the effect of CFOs on performance (see Figure 12). To highlight the effect of the CFOs, SER results were obtained without intentional delay, with random delays taken in the interval $[-T_s/8, T_s/8]$, and with the delay difference of the two users set to be no less than $\tau_d = T_s/5$. The normalized CFOs were chosen randomly within the range $[0, \mathrm{CFO}_r]$ for $\mathrm{CFO}_r = 0.3$ and $\mathrm{CFO}_r = 0.001$. For $\mathrm{CFO}_r = 0.3$, we restricted the smallest difference between the two CFOs to be no less than $\mathrm{CFO}_d = 0.1$; for $\mathrm{CFO}_r = 0.001$, we set no threshold on the CFO difference of the two users. For $\mathrm{CFO}_r = 0.3$, the CFO is quite large, and the PLL by itself is not enough to remove the CFO in the decoupled users. Therefore, we first used the method described in Section 3.2 to estimate the CFOs and then used the PLL to compensate for the residual CFO.
The quality of the CFO estimates depends on the accuracy of the channel matrix estimate. Since low-magnitude elements of the channel matrix correspond to low values of the pulse and are therefore susceptible to errors, we set a threshold $\phi = \alpha\|\mathbf{h}_k(l)\|_\infty$ and, for CFO estimation, use only the elements of $\mathbf{h}_k(l)$ whose amplitudes are greater than $\phi$. In this experiment, we took $\alpha = 0.2$. The CFO effects were eliminated via a PLL initialized with the CFO estimate of (40). One can see that the larger $\mathrm{CFO}_r$ gives better performance. It is important to note, however, that large CFOs involve bandwidth expansion. The percentage of bandwidth expansion can be calculated as $\mathrm{CFO}_r/(T_sW)$, where $W = 11$ MHz is the bandwidth of the pulse. For $\mathrm{CFO}_r = 0.3$ and $T_s = 1/(\text{symbol rate})$, the percentages of bandwidth expansion for symbol rates 10 M/sec, 11 M/sec, and 12.19 M/sec are, respectively, 27.27%, 30%, and 33.25%.
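The bandwidth-expansion percentages quoted above can be reproduced directly from $\mathrm{CFO}_r/(T_sW)$ with $W = 11$ MHz; the function name below is illustrative.

```python
def bandwidth_expansion_percent(cfo_r, symbol_rate, W=11e6):
    # Percentage bandwidth expansion CFO_r / (Ts * W), with Ts = 1 / symbol_rate
    # and W = 11 MHz the pulse bandwidth, as stated in the text.
    Ts = 1.0 / symbol_rate
    return 100.0 * cfo_r / (Ts * W)

for rate in (10e6, 11e6, 12.19e6):
    print(f"{rate / 1e6:g} M/sec: {bandwidth_expansion_percent(0.3, rate):.2f}%")
```

This prints 27.27%, 30.00%, and 33.25%, matching the values in the text.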

Conclusions
A blind $K$-user separation scheme has been proposed that relies on intentional user delays and optimal pulse-shape waveform design, and that combines blind user separation with SIC. The proposed approach achieves low SER at reasonable SNR levels. Simulation results for the case $K = 2$ have confirmed that the proposed pulse design leads to better SER performance than conventional pulse-shape waveforms. The intentional delay was equal to half a symbol interval, which means that the users still overlap significantly during their transmissions. The intentional delay is needed because, although small differences in user delays and CFOs preserve the identifiability of the problem, in practice they may not suffice to separate the users. Also, although the proposed approach can work for any number of users, as the number of users increases, the CFO and delay differences become smaller, which makes the separation more difficult. In our experiments, small CFO differences did not affect performance. Although introducing large intentional CFO differences among users could help, doing so would increase the effective bandwidth. A new ALOHA-type protocol that separates second-order collisions based on the ideas described in this paper, along with a software-defined radio implementation, can be found in [17].
In the last step of (A.12), we assumed that $P$ is large, that $\tau_k$ and $\tau_l$ are small, and that $p(T_s) = 0$.
In the same way, $\mathbf{h}_{k_1}(0)^T\mathbf{h}_{k_1}(0)$ can be approximated as

3.1. Assumptions. The following assumptions are sufficient for user separation.

Figure 3: Pulse-shapes in the frequency domain for symbol rate 10 M/sec.

Figure 5: Pulse-shapes in the frequency domain for symbol rate 12.19 M/sec.

Figure 6: SER performance for different pulse-shapes for symbol rate 10 M/sec, with CFOs randomly chosen within the range $[0, 0.001/T_s]$.

Figure 7: SER performance for different pulse-shapes and different symbol rates, with CFOs randomly chosen within the range $[0, 0.001/T_s]$.

Figure 8: SER performance comparison for different oversampling factors $P$, with CFOs randomly chosen within the range $[0, 0.001/T_s]$.

Figure 11: SER performance comparison for random delay only at different symbol rates, with CFOs randomly chosen within the range $[0, 0.001/T_s]$.

Figure 12: SER performance comparison for random delay only and different amounts of CFO.