Digital Receiver Design for Transmitted Reference Ultra-Wideband Systems

A complete detection, channel estimation, synchronization, and equalization scheme for a transmitted reference (TR) ultra-wideband (UWB) system is proposed in this paper. The scheme is based on a data model which admits a moderate data rate and takes both the interframe interference (IFI) and the intersymbol interference (ISI) into consideration. Moreover, the bias caused by the interpulse interference (IPI) in one frame is also taken into account. Based on the analysis of the stochastic properties of the received signals, several detectors are studied and evaluated. Furthermore, a data-aided two-stage synchronization strategy is proposed, which obtains sample-level timing in the range of one symbol at the first stage and then pursues symbol-level synchronization by looking for the header at the second stage. Three channel estimators are derived to achieve joint channel and timing estimates for the first stage, namely, the linear minimum mean square error (LMMSE) estimator, the least squares (LS) estimator, and the matched filter (MF). We check the performance of different combinations of channel estimation and equalization schemes and try to find the best combination, that is, the one providing a good tradeoff between complexity and performance.


Introduction
Ultra-wideband (UWB) techniques can provide high speed, low cost, and low complexity wireless communications with the capability to overlay existing frequency allocations [1]. Since UWB systems employ ultrashort low duty cycle pulses as information carriers, they suffer from stringent timing requirements [1, 2] and complex multipath channel estimation [1]. Conventional approaches require a prohibitively high sampling rate of several GHz [3] and an intensive multidimensional search to estimate the parameters for each multipath echo [4].
Detection, channel estimation, and synchronization problems are always entangled with each other. A typical approach to address these problems is detection-based signal acquisition [5]. A locally generated template is correlated with the received signal, and the result is compared to a threshold. How to generate a good template is the task of channel estimation, whereas how to decide the threshold is the goal of detection. Due to the multipath channel, the complexity of channel estimation grows quickly as the number of multipath components increases, and because of the fine resolution of the UWB signal, the search space is extremely large.
Recent research works on detection, channel estimation, and synchronization methods for UWB have focused on low sampling rate methods [6-9] or noncoherent systems, such as transmitted reference (TR) systems [5, 10], differential detectors (DDs) [11], and energy detectors (EDs) [9, 12]. In [6], a generalized likelihood ratio test (GLRT) for frame-level acquisition based on symbol rate sampling is proposed, which works with no or small interframe interference (IFI) and no intersymbol interference (ISI). The whole training sequence is assumed to be included in the observation window without knowing the exact starting point. Due to its low duty cycle, a UWB signal belongs to the class of signals that have a finite rate of innovation [7]. Hence, it can be sampled below the Nyquist sampling rate, and the timing information can be estimated by standard methods. The theory is developed under the simplest scenario, and extensions are currently envisioned [13]. The timing recovery algorithm of [8] makes cross-correlations of successive symbol-long received signals, in which the feedback controlled delay lines are difficult to implement. In [9], the authors address a timing estimation comparison among different types of transceivers, such as stored-reference (SR) systems, ED systems, and TR systems. The ED and the TR systems belong to the class of noncoherent receivers. Although their performances are suboptimal due to the noise contaminated templates, they attract more and more interest because of their simplicity. They are also more tolerant to timing mismatches than SR systems. The algorithms in [9] are based on the assumption that frame-level acquisition has already been achieved. Two-step strategies for acquisition are described in [14, 15]. In [14], the authors use a different search strategy in each step to speed up the procedure, namely a bit reversal search for the first step and a linear search for the second step. Meanwhile, the two-step procedure in [15] finds the
block which contains the signal in the first step, and aligns with the signal at a finer resolution in the second step. Both methods are based on the assumption that coarse acquisition has already been achieved to limit the search space to the range of one frame and that there are no interferences in the signal.
From a system point of view, noncoherent receivers are considered to be more practical since they can avoid the difficulty of accurate synchronization and complicated channel estimation. One main obstacle for TR systems and DD systems is the implementation of the delay line [16]. The longer the delay line is, the more difficult it is to implement. For DD systems [11], the delay line is several frames long, whereas for TR systems, it can be only several pulses long [17], which is much shorter and easier to implement [18]. ED systems do not need a delay line, but suffer from multiple access interference [19], since they can only adopt a limited number of modulation schemes, such as on-off keying (OOK) and pulse position modulation (PPM). A two-stage acquisition scheme for TR-UWB systems is proposed in [5], which employs two sets of direct-sequence (DS) code sequences to facilitate coarse timing and fine aligning. The scheme assumes no IFI and ISI. In [20], a blind synchronization method for TR-UWB systems executes a MUSIC-like search in the signal subspace to achieve high-resolution timing estimation. However, the complexity of the algorithm is very high because of the matrix decomposition.
Recently, a multiuser TR-UWB system that admits not only interpulse interference (IPI), but also IFI and ISI was proposed in [21]. The synchronization for such a system is at a low-rate sample level. The analog parts can run independently without any feedback control from the digital parts. In this paper, we develop a complete detection, channel estimation, synchronization, and equalization scheme based on a data model modified from [21]. Moreover, the performance of different kinds of detectors is assessed. A two-stage synchronization strategy is proposed to decouple the search space and speed up synchronization. The property of the circulant matrix in the data model is exploited to reduce the computational complexity. Different combinations of channel estimators and equalizers are evaluated to find the one with the best tradeoff between performance and complexity. The results confirm that the TR-UWB system is a practical scheme that can provide moderate data rate communications (e.g., in our simulation setup, the data rate is 2.2 Mb/s) at a low cost.
The paper is organized as follows. In Section 2, the data model presented in [21] is summarized and modified to take the unknown timing into account. Further, the statistics of the noise are derived. The detection problem is addressed in Section 3. Channel estimation, synchronization, and equalization are discussed in Section 4. Simulation results are shown and assessed in Section 5. Conclusions are drawn in Section 6.
Notation. We use upper (lower) bold face letters to denote matrices (column vectors).

Asynchronous Single User Data Model
The asynchronous single user data model derived in the following paragraphs uses the data model in [21] as a starting point. We take the unknown timing into consideration and modify the model in [21] accordingly.

Single Frame.
In a TR-UWB system [10, 21], pairs of pulses (doublets) are transmitted in sequence as shown in Figure 1. The first pulse in the doublet is the reference pulse, whereas the second one is the data pulse. Since both pulses go through the same channel, the reference pulse can be used as a "dirty template" (noise contaminated) [8] for correlation at the receiver. One frame period T_f holds one doublet. Moreover, N_f frames constitute one symbol period T_s = N_f T_f, which carries a symbol s_i ∈ {−1, +1}, spread by a pseudorandom code c_j ∈ {−1, +1}, j = 1, 2, ..., N_f, which is repeatedly used for all symbols. The polarity of a data pulse is modulated by the product of a frame code and a symbol. The two pulses are separated by some delay interval D_m, which can be different for each frame. The delay intervals are in the order of nanoseconds and D_m ≪ T_f. The receiver employs multiple correlation branches corresponding to different delay intervals. To simplify the system, we use a single delay and one correlation branch, which implies D_m = D. Figure 1 also presents an example of the receiver structure for a single delay D. The integrate-and-dump (I&D) integrates over an interval of length T_sam. As a result, one frame yields P = T_f/T_sam samples, where P is assumed to be an integer.
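As a small numerical check of these definitions, the sample counts per frame and per symbol can be computed as below; T_f = 30 ns and N_f = 15 are the values used later in the simulation section, while T_sam = 2 ns is an illustrative assumption.

```python
# Sketch: samples per frame P = T_f/T_sam and per symbol L_s = N_f*P.
# T_sam = 2 ns in the example below is an assumed, illustrative value.
def sample_counts(T_f, T_sam, N_f):
    P = T_f / T_sam
    assert P == int(P), "T_f is assumed to be an integer multiple of T_sam"
    P = int(P)
    L_s = N_f * P          # symbol length in terms of number of samples
    return P, L_s
```

With T_f = 30 ns, T_sam = 2 ns, and N_f = 15 this gives P = 15 samples per frame and L_s = 225 samples per symbol.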
The received one-frame signal (the jth frame of the ith symbol) at the antenna output is r(t) = h(t − τ) + s_i c_j h(t − D − τ) + n(t), where τ is the unknown timing offset, h(t) = h_p(t) ∗ g(t) of length T_h, with h_p(t) the UWB physical channel and g(t) the pulse shape resulting from all the filter and antenna effects, and n(t) is the bandlimited additive white Gaussian noise (AWGN) with double-sided power spectral density N_0/2 and bandwidth B. Without loss of generality, we may assume that the unknown timing offset τ in (1) is in the range of one symbol period, τ ∈ [0, T_s), since we know the signal is present by detection at the first step (see Section 3) and propose to find the symbol boundary before acquiring the package header (see Section 4). Then, τ can be decomposed as τ = δT_sam + ε, where δ = ⌊τ/T_sam⌋ ∈ {0, 1, ..., L_s − 1} denotes the sample-level offset in the range of one symbol, with L_s = N_f P the symbol length in terms of number of samples, and ε ∈ [0, T_sam) represents the fractional offset. Sample-level synchronization consists of estimating δ. The influence of ε will be absorbed in the data model and becomes invisible, as we will show later.
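The decomposition of the timing offset into a sample-level part and a fractional part can be sketched as follows; the numbers in the example are illustrative only.

```python
# Sketch of the timing decomposition tau = delta*T_sam + eps:
# delta is the integer sample-level offset in {0, ..., L_s - 1},
# eps is the fractional offset in [0, T_sam).
def decompose_offset(tau, T_sam, L_s):
    k = int(tau // T_sam)
    delta = k % L_s                # tau < T_s makes the modulo a no-op
    eps = tau - k * T_sam
    return delta, eps
```

For example, tau = 7.3 ns with T_sam = 2 ns gives delta = 3 samples and eps = 1.3 ns.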
Based on the received signal r(t), the correlation branch of the receiver correlates r(t) with its D-delayed version and integrates over intervals of length T_sam, computing the samples x[n] = ∫ over ((n−1)T_sam, nT_sam] of r(t)r(t − D) dt. Note that n_0[n] is the noise autocorrelation term, and n_1[n] encompasses the signal-noise cross-correlation term and the noise autocorrelation term. Their statistics will be analyzed later. Taking ε into consideration, we can define the channel correlation function R(Δ, m) similarly as in [21], where h(t) = 0 when t > T_h or t < 0. Therefore, the first term in (3) can be expressed through s_i c_j R(0, m); the second term can be rewritten in a similar way, so that x[n] becomes a combination of R(0, m) and R(2D, m + D/T_sam) terms, where P_h = T_h/T_sam is the channel length in terms of number of samples, and R(0, m) is always nonnegative.
Although R(2D, m + D/T_sam) is always very small compared to R(0, m), we do not ignore it, to make the model more accurate. We also take the two bias terms into account, which are the cause of the IPI and are independent of the data symbols and the code. Now we can define the P_h × 1 channel energy vector h with entries h_m built from R(0, m) ≥ 0, and the P_h × 1 bias vector b with entries b_m. Note that these entries will change as a function of ε, although ε is not visible in the data model. As we stated before, sample-level synchronization is limited to the estimation of δ. Using (8) and (9), x[n] can be represented in terms of h and b. Now we can turn to the noise analysis. A number of papers have addressed the noise analysis for TR systems [22-25]. The noise properties are summarized here, and more details can be found in Appendix A. We start by making the assumptions that D ≫ 1/B, T_sam ≫ 1/B, and that the time-bandwidth product 2BT_sam is large enough. Under these assumptions, the noise autocorrelation term n_0[n] can be assumed to be a zero-mean white Gaussian random variable with variance σ_0² = N_0²BT_sam/2. The other noise term n_1[n] includes the signal-noise cross-correlation and the noise autocorrelation, and can be interpreted as a random disturbance of the received signal. Let us define two other P_h × 1 channel energy vectors h′ and h′′ with entries h′_m and h′′_m, to be used in the variance of n_1[n]. Using those definitions and under the earlier assumptions, n_1[n] can also be assumed to be a zero-mean Gaussian random variable whose variance depends on the data symbol, the frame code, the channel correlation coefficients, and the noise level. This indicates that all the noise samples are uncorrelated with each other but have different variances. Note that the noise model is as complicated as the signal model.
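The two noise variances used in this model can be wrapped in a small helper; the σ_1² form below is the P = 1 (no oversampling) expression 2N_0E_f + σ_0² given later in the detection section, and all numbers in the usage are illustrative.

```python
def sigma0_sq(N0, B, T_sam):
    # Variance of the noise-noise term n0[n] under the large 2*B*T_sam
    # assumption: sigma_0^2 = N0^2 * B * T_sam / 2.
    return N0 ** 2 * B * T_sam / 2.0

def sigma1_sq(N0, E_f, s0_sq):
    # Per-frame variance for the P = 1 special case:
    # sigma_1^2 = 2*N0*E_f + sigma_0^2 (cross term plus noise-noise term).
    return 2.0 * N0 * E_f + s0_sq
```

For instance, N_0 = 2, B = 10, T_sam = 0.5 (arbitrary consistent units) gives σ_0² = 10, and E_f = 3 then gives σ_1² = 22.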

Multiple Frames and Symbols.
Now let us extend the data model to multiple frames and symbols. We assume the channel length P_h is not longer than the symbol length L_s. A single symbol with timing offset τ will then spread over at most three adjacent symbol periods. Define an L_s-long sample vector collecting the samples of one symbol period. By stacking M + N − 1 such received sample vectors into an ML_s × N matrix X, where N indicates the number of samples in each row of X and M denotes the number of sample vectors in each column of X, we obtain the decomposition in (14), where N_1 is the noise matrix similarly defined as X, and the structure of the other matrices is illustrated in Figure 2. We first define a code matrix C. It is a block Sylvester matrix of size (L_s + P_h − P) × P_h, whose columns are shifted versions of the extended code vector. Then we define a bias matrix B of size (L_s + P_h − P) × N_f, made up of shifted versions of the bias vector b with a shift step of P samples, as shown in Figure 3. The matrix B_δ of size ML_s × (MN_f + 2N_f) also has M + 2 block columns, the nonzero parts of which are obtained from the bias matrix B in the same way as C_δ. Since the bias is independent of the data symbols and the code, it is the same for each frame. Each column of the resulting matrix B_δ 1_{(MN_f+2N_f)×N} is the same and has a period of P samples. Defining b_f to be the P × 1 bias vector for one such period, we obtain the corresponding periodic decomposition. Note that b_f is also a function of δ, but since it is independent of the code, we cannot extract the timing information from it.
Recalling the noise analysis of the previous section, the noise matrix N_1 has zero mean and contains uncorrelated samples with different variances. The matrix Λ, which collects the variances of the elements of N_1, is built from H′_δ and H′′_δ, which have exactly the same structure as B_δ, only using h′ and h′′ instead of b. They all have the same periodic property if multiplied by 1. Defining h′_f and h′′_f to be the two P × 1 vectors for one such period, we obtain the corresponding periodic decompositions.

Detection
The first task of the receiver is to detect the existence of a signal. In order to separate the detection and the synchronization problems, we assume that the transmitted signal starts with a training sequence and assign the first segment of the training sequence to detection only. In this segment, we transmit all "+1" symbols and employ all "+1" codes. This is equivalent to sending only positive pulses for some time. This kind of training sequence bypasses the code and the symbol sequence synchronization. Therefore, we do not have to consider timing issues when we handle the detection problem. The drawback is the presence of spectral peaks as a result of the periodicity. This can be solved by employing a time hopping code for the frames. We omit this in our discussion for simplicity. It is also possible to use a signal structure other than TR signals for detection, such as a positive pulse training with an ED. Although the ED doubles the noise variance due to the squaring operation, the TR system wastes half of the energy to transmit the reference pulses. Therefore, they would have a similar detection performance for the same signal-to-noise ratio (SNR), that is, the ratio of the symbol energy to the noise power spectral density. We keep the TR structure for detection in order to avoid additional hardware for the receiver.
In the detection process, we assume that the first training segment is 2M_1 symbols long, and the observation window is M_1 symbols long (equivalently, M_1L_s = M_1N_fP samples). We collect all the samples in the observation window, calculate a test statistic, and examine whether it exceeds a threshold. If not, we jump to the next successive observation window of M_1 symbols. The 2M_1-symbol-long training segment makes sure that there will be at least one moment at which the M_1-symbol-long observation window is full of training symbols. In this way, we speed up our search procedure by jumping M_1 symbols. Once the threshold is exceeded, we skip the next 2M_1 symbols in order to be out of the first segment of the training sequence, and we are ready to start the channel estimation and synchronization at the sample level (see Section 4). There will be situations where the observation window only partially overlaps the signal. However, for simplicity, we will not take these cases into account when we derive the test statistic. If such a case happens and the test statistic is larger than the threshold, we declare the existence of a signal, which is true. Otherwise, we miss the detection and shift to the next observation window, which is then full of training symbols, giving us a second chance to detect the signal. Therefore, we do not have to distinguish the partially overlapped cases from the fully overlapped case. We will derive the test statistic using only the two hypotheses indicated below, but the evaluation of the detection performance will take all the cases into account.
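The jump-by-M_1-symbols search described above can be sketched as follows; `test_statistic` and the threshold `gamma` are placeholders for the detector derived below, and the example statistic in the usage (a plain sum) is purely illustrative.

```python
def search_signal(samples, win_len, test_statistic, gamma):
    """Slide a non-overlapping observation window of win_len samples over the
    stream; return the start index of the first window whose test statistic
    exceeds gamma, or None if no window triggers the detector."""
    for start in range(0, len(samples) - win_len + 1, win_len):
        if test_statistic(samples[start:start + win_len]) > gamma:
            return start
    return None
```

Because the windows advance by a full window length, at least one window falls entirely inside the 2M_1-symbol training segment, which is exactly the H_1 case analyzed below.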

Detection Problem Statement.
Since we only have to tell whether the whole observation window contains a signal or not, the detection problem is simplified to a binary hypothesis test. We first define the M_1N_fP × 1 sample vector x, which collects all the samples in the observation window, up to sample index (k + M_1 − 1)N_fP. The hypotheses are as follows.
(1) H_0: there is only noise. Under H_0, according to the analysis from the previous section, x is modeled as x = n_0, approximately distributed as N(0, σ_0²I), where n_0 is the noise vector with entries n_0[n], and the symbol ∼ with superscript a indicates "approximately distributed according to". The Gaussian approximation for x is valid based on the assumptions in the previous section.
(2) H_1: signal plus noise is occupying the whole observation window. Under H_1, the data model (14) and the noise model (17) can be easily specified according to the all-"+1" training sequence. We define H_δ having the same structure as B_δ, only taking h instead of b. It also has a period of P samples in each column, if multiplied by 1. Defining h_f to be the P × 1 vector for one such period, and selecting M = M_1 and N = 1 for (14) while taking (16), (18), (19), and (22) into the model, the sample vector x can be decomposed into a signal part plus a zero-mean noise vector n_1 with uncorrelated entries, whose element-wise variances are collected in λ. Due to the all-"+1" training sequence, the impact of the IFI is to fold the aggregate channel response into one frame, so the frame energy remains constant. Normally, the channel correlation function is quite narrow, so R(D, m) ≪ R(0, m) and R(2D, m) ≪ R(0, m). Defining the P × 1 frame energy vector z_f with entries z_f[i], i = 1, ..., P, and the frame energy E_f = 1_P^T z_f, we can simplify x and λ. Based on the analysis above and the assumptions from the previous section, x can still be assumed to be a Gaussian vector, in agreement with [23], where diag(a) indicates a square matrix with a on the main diagonal and zeros elsewhere.

Detector Derivation.
The test statistic is derived using H_0 and H_1. It is suboptimal, since it ignores the other cases, but it is still useful, as analyzed before. The Neyman-Pearson (NP) detector [26] compares the likelihood ratio to a threshold γ, where γ is found by constraining the probability of false alarm P_FA. The test statistic T(x) is derived by taking the stochastic properties of x under the two hypotheses into the likelihood ratio L(x) in (29) and eliminating constant values; a detailed derivation is presented in Appendix B. The threshold γ is then found to satisfy the P_FA constraint. Hence, for each observation window, we calculate the test statistic T(x) and compare it with the threshold γ. If the threshold is exceeded, we announce that a signal is detected. The test statistic depends not only on the noise knowledge σ_0² but also on the composite channel energy profile z_f[i]. All data samples make a weighted contribution to the test statistic, since they have different means and variances. The larger z_f[i]/σ_0² is, the heavier the weighting coefficient is. If we would like to employ T(x), we have to know σ_0² and z_f[i] first. Note that σ_0² can be easily estimated when there is no signal transmitted. However, the estimation of the composite channel energy profile z_f[i] is not as easy, since it appears in both the mean and the variance of x under H_1.
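Under the Gaussian approximation used in this section, choosing γ for a target P_FA amounts to inverting the Q-function at the H_0 mean and variance of the test statistic. A minimal numerical sketch (the Gaussian model of T(x) under H_0 is the assumption here):

```python
import math

def q_func(x):
    # Right-tail probability Q(x) of a standard Gaussian.
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def np_threshold(mean0, var0, p_fa):
    """Find gamma such that Q((gamma - mean0)/sqrt(var0)) = p_fa,
    by bisection over a +-10-sigma bracket."""
    std = math.sqrt(var0)
    lo, hi = mean0 - 10.0 * std, mean0 + 10.0 * std
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if q_func((mid - mean0) / std) > p_fa:
            lo = mid   # threshold too low: false-alarm rate still too high
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a zero-mean, unit-variance statistic, a target P_FA of 0.5 yields γ = 0, and P_FA = Q(2) yields γ = 2, as expected.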

Detection Performance Evaluation.
Until now, the optimal detector for the earlier binary hypothesis test has been derived. The performance of this detector under real circumstances has to be evaluated by taking all the cases into account. As described before, there are moments where the observation window partially overlays the signal. They can be modeled as additional hypotheses H_j, j = 2, ..., M_1N_fP. Applying the same test statistic T(x) under these hypotheses, including H_1, we denote the probability of detection under H_j by P_D,j. We would obtain P_D,1 > P_D,j, j = 2, ..., M_1N_fP: since the observation window collects the maximum signal energy under H_1 and the test statistic is optimized to detect H_1, it has the highest probability of detecting the signal. Furthermore, if we miss the detection under H_j, j = 1, ..., M_1N_fP, we still have a second chance to detect the signal with a probability of P_D,1 in the next observation window, recalling that the training sequence is 2M_1 symbols long. Therefore, the total probability of detection for this testing procedure is P_D,j + (1 − P_D,j)P_D,1, j = 1, ..., M_1N_fP, which is larger than P_D,1 and not larger than P_D,1 + (1 − P_D,1)P_D,1. Since all hypotheses H_j, j = 1, ..., M_1N_fP, have equal probability, the overall probability of detection P_Do for the detector T(x) satisfies P_D,1 < P_Do < P_D,1 + (1 − P_D,1)P_D,1. Since the analytical evaluation of P_Do is very complicated, we just derive the theoretical performance of P_D,1 under H_1. In the simulations section, we will obtain the total P_Do by Monte Carlo simulations and compare it with P_D,1 and P_D,1 + (1 − P_D,1)P_D,1, which can be used as bounds for P_Do.
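The two-chance bounds on the overall detection probability quoted above are straightforward to compute; a minimal sketch:

```python
def pdo_bounds(p_d1):
    """Bounds on the overall detection probability P_Do: a miss in one window
    still leaves a second, fully overlapping window detected with
    probability P_D,1 (so P_D,1 < P_Do < P_D,1 + (1 - P_D,1) * P_D,1)."""
    lower = p_d1
    upper = p_d1 + (1.0 - p_d1) * p_d1
    return lower, upper
```

For example, P_D,1 = 0.9 bounds P_Do between 0.9 and 0.99.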
A theoretical evaluation of P_D,1 is carried out by first analyzing the stochastic properties of T(x). As T(x) is composed of two parts, we can denote them by T_1(x) and T_2(x). First, we have to know the probability density function (PDF) of T(x). However, due to the correlation between the two parts, it can only be found in an empirical way, by generating enough samples of T(x) and making a histogram to depict the relative frequencies of the sample ranges. Therefore, we simply assume that T_1(x) and T_2(x) are uncorrelated and that T(x) is a Gaussian random variable. The mean (variance) of T(x) is then the sum of the weighted means (variances) of the two parts. The larger the sample number M_1N_fP is, the better the approximation is, but also the longer the detection time is; there is a tradeoff. In summary, T(x) is approximated by a Gaussian distribution. The mean and the variance of T_1(x) can be easily obtained based on the assumption that x is a Gaussian vector. The stochastic properties of T_2(x) are much more complicated; more details are discussed in Appendix C.
All the performance approximations are summarized in Table 1, where the function Q(•) is the right-tail probability function for a Gaussian distribution. A special case occurs when P = 1, which means that one sample is taken per frame (T_sam = T_f). For this case, where no oversampling is used, we have a constant energy E_f and a constant noise variance σ_1² = 2N_0E_f + σ_0² for each frame. Then the weighting parameters for each sample in the detector are exactly the same. We can eliminate them and simplify the test statistic to the form T′(x) in (39). Therefore, T_2(x)/σ_0² will follow a central Chi-squared distribution under H_0, and T_2(x)/σ_1² will follow a noncentral Chi-squared distribution under H_1. The threshold for T_2(x) and the probability of detection under H_1 then follow from the functions Q_χ²_ν(x) and Q_χ²_ν(λ)(x), the right-tail probability functions for a central and a noncentral Chi-squared distribution, respectively. The statistics of T_1(x) can be obtained from Table 1 by multiplying the means with σ_1²/E_f and the variances with σ_1⁴/E_f². As a result, the threshold can be easily obtained. The P_D,1 of T′(x) can be evaluated in the same way as that of T(x) in Table 1.
The theoretical contributions of T_1(x) and T_2(x) to T′(x) are assessed in Figure 4. The simulation parameters are set to M_1 = 8, N_f = 15, T_f = 30 ns, T_p = 0.2 ns, and B ≈ 2/T_p. For the definition of E_p/N_0, we refer to Section 5. The detector based on T_1(x) (dashed lines) plays a key role in the performance of the detector based on T′(x) (solid lines) under H_1. For low SNR, they are almost the same, since T_1(x) can be directly derived by ignoring the signal-noise cross-correlation term in the noise variance under H_1. There is a small difference between them for medium SNRs. T_2(x) (dotted lines) has a performance loss of about 4 dB compared to T′(x). Thanks to the ultra-wide bandwidth of the signal, the weighting parameter N_0/σ_0² greatly reduces the influence of T_2(x) on T′(x); it enhances the performance of T′(x) only slightly in the medium SNR range. According to these simulation results and the impact of the weighting parameter N_0/σ_0², we can employ T_1(x) instead of T′(x). It has a much lower calculation cost and almost the same performance as T′(x).
Furthermore, the influence of the oversampling rate P on the P_D,1 of T(x) can be ignored, because the oversampling only affects the performance of T_2(x), which has only a very small influence on T(x). In Section 5, we will evaluate the P_D,1 of T(x) using the IEEE UWB channel model by a quasi-analytical method and also by Monte Carlo simulations. Based on the simulation results in this section, we can predict that for small P (P > 1), the P_D,1 for T(x) will be more or less the same as the P_D,1 for T′(x) or T_1(x).

Channel Estimation, Synchronization, and Equalization
After successful signal detection, we can start the channel estimation and synchronization phase. The sample-level synchronization finds the symbol boundary (estimates the unknown offset δ), and the result can later on be used for symbol-level synchronization to acquire the header. This two-stage synchronization strategy decomposes a two-dimensional search into two one-dimensional searches, reducing the complexity. The channel estimates and the timing information can be used for the equalizer construction.
Finally, the demodulated symbols can be obtained.

Bias Estimation.
As we have seen in the asynchronous data model, the bias term is undesired. It does not carry any useful information, but it disturbs the signal. We will show later on that this bias seriously degrades the channel estimation performance. The second segment of the training sequence consists of "+1, −1" symbol pairs employing a random code. The total length of the second segment should be M_1 + 2N_s symbols, which includes the budget for jumping 2M_1 symbols after the detection. The "+1, −1" symbol pairs can be used for bias estimation as well as channel estimation.
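Because the "+1, −1" pairs give the data part zero mean while the bias repeats every symbol, the bias estimate reduces to a plain average of symbol-long sample vectors; a sketch under that assumption:

```python
def estimate_bias(symbol_vectors):
    """Average L_s-long received vectors over an even number of '+1, -1'
    training symbols: the sign-alternating data part cancels out,
    while the symbol-independent bias survives the average."""
    n = len(symbol_vectors)
    L_s = len(symbol_vectors[0])
    return [sum(v[i] for v in symbol_vectors) / n for i in range(L_s)]
```

In a toy noiseless example where each symbol vector is s*h + b with s alternating between +1 and −1, the average returns b exactly.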
Since the bias is independent of the data symbols and the useful signal part has zero mean, due to the "+1, −1" training symbols, we can estimate the L_s × 1 bias vector of one symbol by averaging the received symbol-long sample vectors.

Channel Estimation.
To take advantage of the second segment of the training sequence, we stack the data samples into a new data matrix, which is equivalent to picking only the odd columns of X in (14) with M = 2 and N = 2N_s − 1. As a result, each column depends on the same symbols, which leads to a great simplification of the decomposition in (14), given in (46), where N_1 is the noise matrix similarly defined as X. For simplicity, we only count the noise autocorrelation term, with zero mean and variance σ_0², into N_1, where σ_0² can be easily estimated in the absence of a signal. Because we jump into this second segment of the training sequence after detecting the signal, we do not know whether the symbol s_k is "+1" or "−1". Rewriting (46) in another form leads to (47), where the vector h_ssδ of length 2L_s blends the timing and the channel information: it contains two channel energy vectors with different signs, s_k h and −s_k h, located according to δ, where circshift(a, n) circularly shifts the values in the vector a by |n| elements (down if n > 0 and up if n < 0). According to (47), and assuming the channel energy has been normalized, the linear minimum mean square error (LMMSE) estimate of h_ssδ is given in (49). Defining a(m : n) to refer to elements m through n of a, we can obtain a symbol-long LMMSE channel estimate as in (50). According to a property of circulant matrices, C_s can be decomposed as C_s = FΩF^H, where F is the normalized DFT matrix of size 2L_s × 2L_s, and Ω is a diagonal matrix with the frequency components of the first row of C_s on the diagonal. Hence, the matrix inversion in (49) can be simplified dramatically. Moreover, the bias term actually does not have to be removed in (49), since it is implicitly removed when we calculate (50). Therefore, we do not have to estimate the bias term explicitly for channel estimation and synchronization. When the SNR is high, the LMMSE estimator in (49) can be replaced by the least squares (LS) estimator in (52), which is equivalent to a deconvolution of the
code sequence in the frequency domain.
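The frequency-domain shortcut exploited here (a circulant matrix is diagonalized by the DFT, so inverting it is an element-wise division of spectra) can be sketched with a naive DFT. This toy parameterizes the circulant by its first column, which is an assumption about the convention, and it omits the noise term:

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

def circ_conv(c, h):
    # y = C h for a circulant C with first column c (circular convolution).
    N = len(c)
    return [sum(c[(n - m) % N] * h[m] for m in range(N)) for n in range(N)]

def ls_deconv(c, y):
    """LS/deconvolution estimate: divide out the circulant's eigenvalues,
    which are the DFT of its first column (assumed nonzero)."""
    C, Y = dft(c), dft(y)
    return idft([Y[k] / C[k] for k in range(len(Y))])
```

In a noiseless toy example, `ls_deconv` recovers h exactly from y = circ_conv(c, h); in practice the division amplifies noise at weak code frequencies, which is why the LMMSE form is preferred at low SNR.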
On the other hand, when the SNR is low, the LMMSE estimator reduces to the expression in (53), which is equivalent to a matched filter (MF). The MF can also be processed in the frequency domain. The LMMSE estimator in (49), the LS estimator in (52), and the MF in (53) all have a similar computational complexity. However, for the LMMSE estimator, we have to estimate σ_0² and the channel energy. As an example, we show the performance of these channel estimates under high SNR conditions (the simulation parameters can be found in Section 5). Figure 5 indicates the symbol-long channel estimate ĥ_δ with bias removal (implicitly obtained) and |ĥ_ssδ(1 : L_s)| without bias removal, where

EURASIP Journal on Wireless Communications and Networking
ĥ_ssδ = F(Ω^HΩ + σ_0²I)^(−1)Ω^HF^H X1_{N_s} for the LMMSE estimator and ĥ_ssδ = (1/σ_0²)FΩ^HF^H X1_{N_s} for the MF. When the SNR is high, the LMMSE estimator is expected to have a similar performance as the LS estimator; thus, we omit the LS estimator in Figure 5. The MF for ĥ_δ (dashed line) has a higher noise floor than the LMMSE estimator for ĥ_δ (solid line), since its output is the correlation of the channel energy vector with the code autocorrelation function. The bias term lifts the noise floor of the channel estimate resulting from the LMMSE estimator (dotted line) and distorts the estimation, while it does not have much influence on the MF (dashed line with + markers). The stars in the figure represent the real channel parameters as a reference. The position of the highest peak of each curve in Figure 5 indicates the timing information, and the area around this highest peak is the most interesting part, since it shows the estimated channel energy profile. Although the LMMSE estimator without bias suppresses the estimation errors over the whole symbol period, it has a similar performance as all the other estimators in the interesting part.

Sample-Level Synchronization.
The channel estimate ĥ_δ has a duration of one symbol, but the true channel will generally be much shorter than the symbol period. We would like to detect the part that contains most of the channel energy and cut out the rest in order to be robust against noise. This basically means that we have to estimate the unknown timing δ. Define the search window length as L_w in terms of the number of samples (L_w > 1). The optimal length of the search window depends on the channel energy profile and the SNR; we will show the impact of different window lengths on the estimation of δ in the next section. Define ĥ_wδ = [ĥ_sδ^T, −ĥ_sδ(1 : L_w − 1)^T]^T, and define the δ estimate as in (54). This is motivated as follows. According to the definition of ĥ_sδ, when δ > L_s − P_h, ĥ_sδ will contain channel information partially from s_k h and partially from −s_k h, which have opposite signs. In order to estimate δ, we circularly shift the search window to check all possible sample positions in ĥ_sδ and find the position where the search window contains the maximum energy. If we did not adjust the signs of the two parts, the δ estimate would be incorrect whenever the real δ is larger than L_s − P_h, because the two parts would cancel each other once both are encompassed by the search window. That is why we construct ĥ_wδ by inverting the sign of the first L_w − 1 samples of ĥ_sδ and appending them to the end of ĥ_sδ. Moreover, the estimator (54) benefits from averaging the noise before taking the absolute value.
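A minimal sketch of this sliding-window search follows. It assumes one plausible reading of (54), namely that the window contents are summed coherently before the absolute value is taken; the function and variable names are ours.

```python
import numpy as np

def estimate_delay(h_s, L_w):
    """Sliding-window timing estimate sketched from the text: extend h_s
    with sign-inverted copies of its first L_w - 1 samples (to handle
    channel energy wrapping past the symbol boundary), then pick the
    window position whose coherent sum has maximum magnitude."""
    h_w = np.concatenate([h_s, -h_s[:L_w - 1]])
    # Window sums: average the noise first, then take the absolute value.
    sums = np.convolve(h_w, np.ones(L_w), mode="valid")  # length == len(h_s)
    return int(np.argmax(np.abs(sums)))
```

The second test case below shows why the sign inversion matters: channel energy that wraps past the symbol boundary appears with a flipped sign, and the extension restores a coherent window sum.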

Equalization and Symbol-Level Synchronization.
Based on the channel estimate ĥ_δ and the timing estimate δ̂, we select a part of ĥ_δ to build three different kinds of equalizers. Since the MF equalizer cannot handle IFI and ISI, we only select the first P samples (the frame length in terms of the number of samples) of circshift(ĥ_δ, −δ̂) as ĥ_p. The code matrix C is specified by setting P_h = P. The estimated bias b̂_s can be used here. We skip the first δ̂ data samples and collect the remaining data samples in a matrix X of size L_s × N as in the data model (14) but with M = 1, from which the MF equalizer produces the estimated symbol vector ŝ. Moreover, we also construct a zero-forcing (ZF) equalizer and an LMMSE equalizer by replacing h with ĥ, which collects the first P̂_h samples (the channel length estimate in terms of the number of samples) of circshift(ĥ_δ, −δ̂), and by using δ' = (L_s − δ̂) mod L_s in the data model (14). The channel length estimate P̂_h can be obtained by setting a threshold (e.g., 10% of the maximum value of ĥ_δ) and counting the number of samples beyond it in ĥ_δ. These equalizers can resolve the IFI and the ISI to achieve a better performance at the expense of a higher computational complexity. The estimated bias b̂_s can also be used. We collect the samples in a data matrix X of size 2L_s × N similar to the data model (14) with M = 2. Then the ZF and LMMSE equalizers follow, where Φ = C_δ'(I_4 ⊗ ĥ) and Ŝ is a 4 × N symbol matrix. We can choose either the second or the third row of Ŝ as the demodulated symbol sequence.
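The ZF and LMMSE equalizers are linear solves against the effective channel matrix. Here is a generic sketch for a model X = ΦS + N with unit-power ±1 symbols; the function name is ours, and in the text's setup Φ would be C_δ'(I_4 ⊗ ĥ).

```python
import numpy as np

def zf_lmmse_equalize(X, Phi, sigma2):
    """Zero-forcing and LMMSE linear equalizers for X = Phi @ S + N,
    assuming i.i.d. unit-power symbols.  Returns hard +/-1 decisions."""
    S_zf = np.linalg.pinv(Phi) @ X                    # zero forcing
    G = np.linalg.inv(Phi.conj().T @ Phi
                      + sigma2 * np.eye(Phi.shape[1])) @ Phi.conj().T
    S_lmmse = G @ X                                   # LMMSE
    return np.sign(S_zf.real), np.sign(S_lmmse.real)
```

At high SNR the two solutions coincide, which is consistent with the overlapping ZF/LMMSE curves reported in Section 5.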
Until now, the sample-level synchronization has established the boundaries of the symbols. However, it is not able to reveal the boundary of the training header, since the second segment of the training sequence just employs pairs of "+1, −1" symbols. After the sample-level synchronization, demodulation is triggered. The third segment of the training sequence is a known training symbol pattern; once we find the matching symbol pattern, we can identify the training header, and symbol-level synchronization is accomplished. To summarize the training segments used in each stage, the overall structure of the training sequence is shown in Figure 6.
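The header search at this second stage is a plain pattern match over the demodulated symbol stream. A sketch (the header pattern in the test is a made-up example, not the paper's actual training pattern):

```python
import numpy as np

def find_header(symbols, header):
    """Symbol-level synchronization sketch: scan a demodulated +/-1
    symbol stream for the known training header and return the index
    of the first exact match, or -1 if the header is not found."""
    s = np.asarray(symbols)
    h = np.asarray(header)
    for i in range(len(s) - len(h) + 1):
        if np.array_equal(s[i:i + len(h)], h):
            return i
    return -1
```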

Simulation Results
We evaluate the performance of different detectors and of different combinations of channel estimation and equalization schemes for a single-user, single-delay TR-UWB system. We use a Gaussian second-derivative pulse, which is 0.2 ns wide. The delay interval D between the two pulses in a doublet is 4 ns. The first segment of the training sequence is 2M_1 = 16 symbols long, all of which are composed of positive pulses; hence, the observation window includes M_1 = 8 symbols. The second segment of the training sequence has M_1 + 2N_s = 38 symbols and employs a pseudonoise (PN) code sequence. The code length N_f is 15. The frame period T_f is 30 ns. The IEEE UWB channel model CM3 [27] is employed and truncated to 90 ns, which represents an NLOS channel. The oversampling rate P is 3, which results in T_sam = 10 ns. We define E_p/N_0 as the received aggregate pulse energy to noise ratio.
In T(x), the noise samples may be mistakenly weighted heavily under real circumstances, whereas T_1(x) accumulates all the frame samples in the observation window, which is equivalent to equal weighting. According to these results, we can employ T_1(x) because of its simplicity and its performance similar to T(x). The performance gap for symbol-long estimates between the LS/LMMSE estimators (dashed/solid lines) and the MF is large. When we concentrate on the channel estimates in a limited range, such as 30 ns (lines with • markers) or 90 ns (lines with ♦ markers), the gap between the MF and the LS/LMMSE estimators becomes smaller. The MSE for δ estimation is also assessed with different values of L_w based on the different channel estimators. From Figure 10, we see that the δ estimates based on the MF (dotted lines), LS (dashed lines), and LMMSE (solid lines) channel estimates with the same L_w have similar performance, and L_w = 30 ns is the best choice among all. The MSE for δ with L_w = 30 ns (lines with • markers) saturates after the SNR reaches 10 dB. This is because we use NLOS
channels, where the first path may not be the strongest, and a fractional timing offset always remains. Meanwhile, the differences in the MSE for channel estimation over a 90 ns range based on the different methods (lines with ♦ markers) are quite small around 10 dB in Figure 9; this channel estimate will be employed to construct the equalizer. As a result, we choose the MF as the channel estimator.
Furthermore, combinations of the MF channel estimator with different equalizers are investigated, employing L_w = 30 ns for synchronization. Figure 11 shows the BER performance. The BER for the MF equalizer (lines with • markers) approaches 0 after 12 dB, while the BERs for the ZF (lines with ∗ markers) and the LMMSE equalizers approach 0 after 10 dB. Hence, the MF equalizer is 2 dB worse than the ZF and the LMMSE equalizers; all of them employ 90 ns long channel estimates. The curves of the ZF equalizer and the LMMSE equalizer overlay each other, and the bias does not have much impact on them: they have almost the same performance. As a result, the optimal combination considering cost and performance would be the MF channel estimator with a ZF equalizer. According to the results above, we remark that the IFI after the integrate-and-dump operation is not severe in our simulation setup, since the channel energy attenuates exponentially and one frame contains most of the energy; the performance differences between the equalizers are therefore not very pronounced. However, the LMMSE equalizer has the potential to handle more serious IFI and ISI. The effects of the bias on the BER performance can be ignored, but they have to be taken into account for the channel estimation (done implicitly; see Section 4.1). When we want to shorten the frame length to achieve a higher data rate, more interference will be generated, and we will then need a more accurate data model to handle this interference.

Conclusions
We have proposed a complete solution for signal detection, channel estimation, synchronization, and equalization in a TR-UWB system. The scheme is based on a data model which takes IPI, IFI, and ISI into account and relaxes the frame-time requirements to allow for higher data rate communications. Several detectors based on a specific training scheme are derived and assessed. We find that the simple detector, which sums up all the samples in the observation window and compares the result with a threshold, gives a good balance between performance and cost. Moreover, the joint channel and timing estimation is achieved in three different ways. The circulant-matrix property of the data model is exploited to reduce the complexity of the algorithms. A two-stage synchronization strategy is then proposed to first achieve sample-level synchronization and subsequently symbol-level synchronization. Last but not least, three kinds of equalizers are derived. We evaluate different combinations of channel estimation and equalization schemes using the IEEE UWB channel model CM3; the results show that the TR-UWB system can be implemented at low cost and achieves moderate data rate communications.

A. Noise Analysis
The noise autocorrelation term n_0[n] involves n(t), which is band-limited AWGN with autocorrelation function R_n(τ). As a result, we obtain results similar to those in [24, 25, 30], where σ_0² = N_0²BT_sam/2. These aggregate noise samples are uncorrelated with each other, recalling that T_sam ≫ 1/B. This assumption is usually satisfied by UWB signals (e.g., in our case T_sam = 10 ns and B ≈ 2/T_p = 10 GHz, so 2BT_sam = 200). Also, n_0[n] and n_1[n] can be assumed to be Gaussian random variables by invoking the sampling theorem and the central limit theorem [28].
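The decorrelation claim can be checked numerically: generate white noise, band-limit it to B, integrate-and-dump over T_sam, and measure the correlation of adjacent dump outputs. The simulation grid (100 GHz) and the ideal low-pass are our own choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 100e9                      # simulation sample rate (our choice)
B = 10e9                        # noise bandwidth, B ~ 2/T_p from the text
T_sam = 10e-9                   # aggregate sample period from the text
n = 1_000_000                   # 10 us of simulated noise
white = rng.standard_normal(n)

# Band-limit by zeroing spectral content above B (ideal low-pass sketch).
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
spec = np.fft.rfft(white)
spec[freqs > B] = 0.0
bandlimited = np.fft.irfft(spec, n)

# Integrate-and-dump over T_sam, then correlate adjacent dump outputs.
per = int(round(T_sam * fs))    # 1000 grid points per dump
dumps = bandlimited[: (n // per) * per].reshape(-1, per).sum(axis=1)
rho = np.corrcoef(dumps[:-1], dumps[1:])[0, 1]
# 2 * B * T_sam = 200 >> 1, so rho comes out close to zero.
```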

B. Detector Derivation
In summary, the statistics of x in (31) are

C. Statistic of the Detectors
C.1. Detector T_1(x). Since x is assumed to be a Gaussian vector, T_1(x) also follows a Gaussian distribution. Actually, if the condition z_f[i]/N_0 ≪ BT_sam/4 is satisfied, which means the signal-to-noise ratio (SNR) is low, the term 2N_0 z_f[i] can be ignored in the variance of x under H_1, and then T_1(x) can be derived directly.
C.2. Detector T_2(x). Since the different entries of x have different weighting factors in T_2(x), we collect the data samples bearing the same weighting factor into the same group. Therefore, there are P groups of data samples, and they are assumed to be uncorrelated. Each group sum Σ_{n=(k−1)N_f}^{(k+M_1−1)N_f−1} x²[nP + i] follows a Chi-squared distribution. However, T_2(x) is still assumed to be a Gaussian variable, as it is the sum of the weighted groups. Then we can obtain the distributions under H_0 and H_1, where χ²_ν is the central Chi-squared pdf with ν degrees of freedom, which has mean ν and variance 2ν, and χ²_ν(λ) is the noncentral Chi-squared pdf with ν degrees of freedom and noncentrality parameter λ, which has mean ν + λ and variance 2ν + 4λ.
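For detectors approximated as Gaussian under H_0, the false-alarm constraint fixes the threshold through the inverse Q-function. A self-contained sketch for a sum detector, assuming zero-mean i.i.d. samples of variance sigma2 under H_0 (our simplification; the paper's H_0 statistics also carry a bias term):

```python
import math

def q_inv(p):
    """Inverse Gaussian tail function Q^{-1}(p), via bisection on
    Q(x) = 0.5 * erfc(x / sqrt(2)); avoids a SciPy dependency."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2)) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def t1_threshold(sigma2, n_samples, p_fa):
    """Threshold gamma such that P(T1 > gamma | H0) = p_fa when T1 sums
    n_samples i.i.d. zero-mean Gaussian samples of variance sigma2."""
    return math.sqrt(n_samples * sigma2) * q_inv(p_fa)
```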

Figure 1 :
Figure 1: The transmitted UWB signal and the receiver structure.
The shift step is one sample. Its structure is shown in Figure 3. The matrix C_δ' of size ML_s × (MP_h + 2P_h) is composed of M + 2 block columns, where δ' = (L_s − δ) mod L_s, δ ∈ {0, 1, ..., L_s − 1}. As long as more than two sample vectors (M > 2) are stacked in every column of X, the nonzero parts of the block columns will contain M − 2 code matrices C. The nonzero parts of the first and last two block columns result from splitting the code matrix C according to δ': C_i(2L_s − i + 1 : 2L_s, :) = C(1 : i, :) and C_i(1 : L_s + P_h − P − i, :) = C(i + 1 : L_s + P_h − P, :), where A(m : n, :) refers to rows m through n of A. The overlaps between frames and symbols observed in C_δ' indicate the presence of IFI and ISI.
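Since circshift appears in both the channel model and the equalizer construction, here is a one-line NumPy equivalent of the definition given in Section 4.1 (down for n > 0, up for n < 0):

```python
import numpy as np

def circshift(a, n):
    """Circularly shift vector a by |n| elements: down (toward higher
    indices) if n > 0, up if n < 0, matching the text's definition."""
    return np.roll(np.asarray(a), n)
```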

Figure 3 :
Figure 3: The structure of the code matrix C and the bias matrix B.

Figure 4 :
Figure 4: Performance comparison between T(x) and its components T_1(x) and T_2(x).

Figure 5 :
Figure 5: The symbol-long channel estimate ĥ_δ with bias removal and |ĥ_ssδ(1 : L_s)| without bias removal, when the SNR is 18 dB.

Figure 6 :
Figure 6: The signal structure of the training sequence.

Figure 7 :
Figure 7: Experimental and theoretical P_{D,1} performance comparison for T(x) with P = 3 and for T_1(x).

Figure 8 :
Figure 8: Experimental P_{D,o} for T(x) with P = 3 and for T_1(x).

Figure 9 :
Figure 9: MSE performance for channel estimation with different lengths.

Figure 10 :
Figure 10: MSE performance for δ estimation with various L_w's.

Figure 11 :
Figure 11: BER performance for the MF channel estimator combined with different equalizers.

Table 1 :
Statistical Analysis and Performance Evaluation for Different Detectors, P > 1, T_sam = T_f/P.
Here H(f) is the Fourier transform of h(u − τ), u ∈ [(n − 1)T_sam, nT_sam], which is a segment of the aggregate channel. Since the bandwidth B of n(t) is assumed to be much larger than the bandwidth of h(u − τ), u ∈ [(n − 1)T_sam, nT_sam], n_0[n] has approximately zero mean, as a result of R_n(D) ≈ 0 under the assumption D ≫ 1/B.