Low-complexity synchronization algorithms for orthogonally modulated IR-UWB systems

Timing synchronization is a major issue for any communication system since it is essential to ensure stable operation and reliable performance. In this paper, we compare two low-complexity synchronization algorithms for impulse radio ultra-wideband (IR-UWB) systems employing orthogonal pulse shape modulation (PSM). The two widely adopted modulation schemes for IR-UWB systems are binary pulse amplitude modulation and binary pulse position modulation. However, the possibility of generating orthogonal UWB pulses in recent years has motivated the use of orthogonal PSM, which is particularly attractive as a high-order modulation and for its possible robustness against inter-symbol interference (ISI), and is therefore the focus of this paper. Relying on the unique signal format, the first algorithm applies a simple overlap-add operation followed by energy detection to achieve synchronization. This approach is semi non-data-aided (NDA) because a part of the signal is specifically reserved to enable synchronization. The other algorithm exploits the discriminating nature of well-designed polarity codes and employs a series of code word matching and averaging operations to achieve synchronization. This approach is fully NDA as there is no need to interrupt the data transmission. Based on the judicious change in the phase of the transmitted signal applied for synchronization purposes, the second algorithm can also be used to extract synchronized aggregate templates. These templates are then used in demodulation, resulting in a low-complexity non-coherent alternative to complex Rake receivers. The two compared timing algorithms rely on simple overlap-add operations and thus remain operational under practical UWB settings. Simulation results are provided to demonstrate the efficient performance of the proposed timing estimators.


Introduction
Ultra-wideband (UWB) radio has seen growing interest among researchers since its approval as a commercial technology for data communications as well as for radar applications by the Federal Communications Commission (FCC) in 2002 [1]. A large 7.5 GHz swathe of spectrum, from 3.1 GHz to 10.6 GHz (which accounts for the name ultra-wideband), with an extremely low power spectral density of −41.25 dBm/MHz is allocated for UWB communications. Impulse radio UWB (IR-UWB) is one potential candidate for implementing UWB systems, characterized by discontinuous data transmission using trains of nanosecond-level pulses. The interest in IR-UWB is attributed to its many unique features, such as its ability to coexist with licensed RF systems in underlay mode, a simple baseband transceiver, low probability of interception and detection, high ranging resolution, and the ability to exploit rich multipath diversity, to mention just a few [2].
The aforementioned attractive features, however, come at the cost of equally demanding design challenges such as dense multipath channel estimation, precise synchronization, operation under severe interference from existing systems, multiple-access support, and receiver design. The stringent timing requirements pose a major challenge to the deployment of IR-UWB systems, and timing accuracy is fundamental to ensure their satisfactory performance. As multiple pulses, each located in its own frame, are used to represent one information-bearing symbol in IR-UWB, synchronization is typically performed in two stages. During the first stage, called the acquisition stage, a coarse synchronization is carried out to quickly identify the symbol starting frame. The second stage, known as the tracking stage, aims at refining the acquisition stage estimate and reducing the timing mismatch to less than a chip duration.
http://jwcn.eurasipjournals.com/content/2013/1/199
Although synchronization is a tough task in any communication system, it becomes much more challenging in IR-UWB due to the need for nanosecond-level precision and the low-power impulsive UWB pulses. Indeed, the fine resolution obtained thanks to the wide signal bandwidth results in a large search space for the synchronization, while the extremely low transmission power means that a long sequence must be processed in order to develop a reliable synchronization criterion. The many resolvable multipaths due to short UWB pulses can also cause the receiver to lock onto more than one arriving multipath component, thus resulting in multiple acquisition phases. Last but not least, the transmitted signal is distorted by antennas and unknown frequency-selective dense multipath channels [3,4], which further complicates the already challenging task. Different types of receivers need different levels of synchronization accuracy. The optimal coherent receiver (Rake) needs to align a locally generated template with the incoming received signal with an accuracy on the order of the reciprocal of the signal bandwidth, which for UWB is on the order of tens of picoseconds. The low-complexity non-coherent receivers (transmitted reference [5], differential detector [6], etc.) slightly relax the synchronization requirements and typically need an accuracy in the nanosecond range [7]. Nevertheless, in both cases, the synchronization requirements remain very strict. It was shown in [8][9][10] that a slight misalignment on the order of nanoseconds can severely degrade the IR-UWB system performance.

Overview of existing methods
A number of algorithms treat timing synchronization as part of channel estimation and aim at the joint estimation of the timing offset and channel taps [11][12][13]. In [11], this is done using the maximum likelihood criterion, whereas a least squares-based method looking for the minimum Euclidean distance between the received signal samples and a local replica of their noiseless components is presented in [12]. However, the formidably high sampling rate of up to several gigahertz raises concerns over their implementation. Besides, very fast analog-to-digital converters (ADCs) are needed in [12] as it is a fully digital approach. Treating timing estimation as a harmonic retrieval problem, a subspace-based method has been proposed in [13]. The implementation complexity involved in subspace analysis, along with possibly ill-conditioned Vandermonde systems for closely spaced multipaths, limits its application in realistic UWB channels. The above-mentioned algorithms require certain assumptions, such as the absence of inter-frame interference (IFI) and inter-symbol interference (ISI), a known multipath channel, and the absence of time-hopping (TH) codes, which are rather optimistic for practical UWB settings.
The design of low-complexity synchronization schemes using either symbol-rate or frame-rate sampling is, therefore, highly motivated in UWB in order to reduce the implementation complexity. One simple approach for synchronization in impulse radio is based on matched filtering of the received signal with a locally generated 'clean template' and peak-picking the correlation samples. Evidently, the reference template must encompass the multipath channel effect, which is unknown at synchronization, thus requiring the cumbersome task of channel estimation. A scheme known as timing with dirty templates (TDT) was proposed in [14] to tackle this issue by utilizing a pair of successive symbol-long segments of the received signal, where one segment serves as the template for the other. However, the main drawback of this approach is its poor performance due to the noise-on-noise effect from the dirty templates. An algorithm using orthogonal pulses in an alternating manner and then applying the TDT algorithm is presented in [15]. Relying on the periodic transmission of non-zero mean symbols, joint timing and template recovery algorithms via energy detection have been developed in [16,17], with universal applicability in the presence of ISI and multiuser interference (MUI). However, this asymmetric modulation degrades the received signal-to-noise ratio (SNR), thus deteriorating the bit error rate (BER) performance. Also, these algorithms need a much longer sequence for reliable synchronization. Another class of synchronization algorithms capitalizes on the fine correlation properties of binary codes. One such algorithm with improved performance using fewer symbols is proposed in [18], which can be utilized under both non-data-aided (NDA) and data-aided (DA) scenarios.
Exploiting the discriminative nature of similar binary codes, several other timing algorithms resilient to different types of interference, such as IFI and ISI [19], MUI [20], and the near-far problem [21], as well as a low-complexity demodulator [22], have also been proposed.
The above-referenced methods inevitably apply serial searching over all possible candidate time shifts. The large search space of IR-UWB systems, a consequence of their extremely wide bandwidth, means that such a linear search leads to an increased mean synchronization time (MST). A class of algorithms skipping the serial search technique has therefore been developed [23,24]. In [23], it is shown that bin reversal search is the most efficient search technique with much reduced MST. Two-stage synchronization is adopted in [24], where the first stage performs a rapid coarse search and reduces the search space to a small subset, while the second stage identifies the exact timing using serial search in this subset. A class of optimal search strategies is presented in [25], where fundamental limits on the achievable MST are also provided, and it is shown that conventional serial search results in the maximum MST. A promising algorithm is proposed in [26], which uses orthogonal UWB pulses and avoids searching. These rapid synchronization algorithms, however, mostly focus on coarse acquisition and are less accurate than serial search-based approaches.

Contributions
One common characteristic of the synchronization techniques in the literature is that they are valid only for binary pulse amplitude modulation (BPAM) and/or binary pulse position modulation (BPPM), the two most popular modulation schemes in IR-UWB. These modulation schemes were widely adopted in the early years of UWB technology because of the difficulty of generating appropriate UWB pulses that respect the severe FCC power constraints, which limited the choice. However, in recent years, the possibility of generating multiple mutually orthogonal and spectrally efficient pulses with the same widths [27,28] has encouraged the use of alternative orthogonal modulation (OM) schemes [29] for IR-UWB. Pulse shape modulation (PSM) is an interesting OM scheme in which information is conveyed by the shape of the pulse [30,31]. These OM schemes are particularly attractive as high-rate multidimensional modulations, compared to high-order PAM and PPM [32]. This feature, along with the possible robustness of PSM against MUI and ISI, makes it the focus of the research work presented in this paper.
The main objective of this paper is to develop and compare low-complexity NDA synchronization algorithms for IR-UWB systems employing orthogonal PSM. NDA algorithms are preferred as they do not interrupt the data transmission and can operate under 'cold start-up' scenarios where the receiver is not aware of the transmission start time. The target is improved performance with low complexity rather than reduced MST; therefore, we assume a linear search for simplicity in our algorithms.
Most of the low-complexity algorithms presented in the second paragraph of Section 1.1 cannot be applied in PSM-modulated IR-UWB systems. For example, the algorithms based on matched filtering the received signal with either 'clean' or 'dirty' templates will not function for PSM in the NDA context as we are not aware of the orthogonal pulse being received. The algorithms based on the alternate periodic transmission of nonzero mean symbols followed by energy detection will also not function as there will be no zero-mean region after the first order averaging in the observation signal in the case of PSM, contrary to PAM. Similarly, the algorithms which benefit from the fine correlation properties of binary codes will result in inferior performance because in the NDA context, the two neighboring symbols may have different pulses and thus the impulsive nature of correlation is lost.
In order to deal with these issues, we developed in [33] an energy detection-based synchronization (EDS) algorithm exploiting first-order averaging and a judiciously designed transmitted signal, such that the synchronization time can be estimated by a simple overlap-add operation followed by energy detection. This algorithm has the advantage of achieving synchronization with no a-priori knowledge and remains equally valid for higher-order PSM. It serves as a reference in this paper for a new synchronization algorithm, which is proposed along with a new demodulation technique based on a synchronized aggregate template (SAT).
The two key contributions of this paper can be then summarized as follows.
1. The main limitation of the EDS algorithm is that the judicious change in the signal format results in the loss of one frame per symbol, thereby making it a semi-NDA algorithm. Thus, we propose another approach, which exploits the discriminating nature of well-designed binary codes and does not incur any data loss. It estimates the timing offset by code matching followed by aggregation of received signal segments and energy detection. This new code matching-based synchronization (CMS) algorithm performs much better than EDS, especially for a relatively high number of symbols. Both of these algorithms remain functional under the practical UWB settings of an unknown channel, pulse distortions by antennas, TH spreading, the presence of IFI and a moderate ISI, and even when multiple users are present.
2. In the course of establishing synchronization using the CMS algorithm, we also obtain as a by-product an aggregate template which can be used to develop a non-coherent demodulation scheme for PSM, similar to the one proposed for BPAM in [16].
To the best of our knowledge, EDS and CMS are the first synchronization algorithms proposed in the literature in the framework of PSM-modulated IR-UWB signals.
The rest of this paper is organized as follows: Section 2 outlines the signal model, propagation channel, and synchronization preliminaries. The two synchronization algorithms and the non-coherent demodulation procedure along with the merits and feasibility discussion are presented in Section 3. Numerical results are provided in Section 4 to compare and validate the two algorithms, while conclusions are drawn in Section 5.

Notation
$\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ represent the integer floor and ceiling operations, respectively, and $[\cdot]_B$ denotes the modulo operation with base $B$.

PSM transmission model
For a typical IR-UWB system in a single-user scenario, equipped with TH codes and employing orthogonal PSM, the transmitted signal can be expressed as

$$s(t) = \sum_{i} \sum_{j=0}^{N_f-1} \psi_{d(i)}(t - iT_s - jT_f - c_jT_c), \quad d(i) \in \{0, 1, \ldots, M-1\}, \tag{1}$$

where $M$ is the modulation order. Each pulse has a duration $T_\psi$ and satisfies $\int_0^{T_\psi} \psi_m(t)\psi_n(t)\,dt = E_\psi\,\delta_{mn}$, where $E_\psi = \int_0^{T_\psi} \psi_m^2(t)\,dt$ is the pulse energy. Due to the severe limitations imposed by the FCC on transmission power, the effective SNR per symbol is increased by repeating UWB pulses over $N_f$ frames, with one pulse per frame, to represent each data symbol $d(i)$. The symbol duration is thus $T_s = N_fT_f$, where $T_f$ is the frame duration. Spectrum smoothing and multiple access are established by time shifting the UWB pulses at multiples of the chip duration $T_c$ using user-specific pseudo-random TH codes $\{c_j\}_{j=0}^{N_f-1}$, with $c_j \in [0, N_h)$.
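As a rough illustration of the transmission model, the following Python sketch builds a sampled TH-PSM waveform with one pulse per frame. The two three-sample pulses, the number of samples per chip, and the helper name `psm_th_signal` are assumptions made for this example; they are not the B-spline pulses or any code used in the paper.

```python
import numpy as np

def psm_th_signal(d, psi, c, n_chips, chip_len):
    """Sampled TH-PSM signal: N_f = len(c) frames per symbol, one pulse per frame.

    d        : symbol indices, d[i] in {0, ..., M-1}
    psi      : (M, pulse_len) array of orthogonal pulses (toy pulses here)
    c        : TH code, c[j] in [0, N_h), one entry per frame
    n_chips  : chips per frame (N_c); chip_len : samples per chip
    """
    n_frames = len(c)
    frame_len = n_chips * chip_len           # T_f in samples
    sym_len = n_frames * frame_len           # T_s in samples
    pulse_len = psi.shape[1]
    s = np.zeros(len(d) * sym_len)
    for i, m in enumerate(d):
        for j in range(n_frames):
            start = i * sym_len + j * frame_len + c[j] * chip_len
            s[start:start + pulse_len] += psi[m]
    return s

# Toy orthogonal pulses, normalized to unit energy (assumed for illustration).
psi = np.array([[1.0, 2.0, 1.0], [1.0, -1.0, 1.0]])
psi /= np.linalg.norm(psi, axis=1, keepdims=True)

rng = np.random.default_rng(1)
N_f, N_c, N_h, chip_len = 13, 15, 5, 3
c = rng.integers(0, N_h, size=N_f)           # pseudo-random TH code
d = rng.integers(0, 2, size=4)               # four binary PSM symbols
s = psm_th_signal(d, psi, c, N_c, chip_len)
```

Because each pulse stays inside its own frame, the total signal energy equals the number of transmitted pulses times the (unit) pulse energy.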

Reception model
The UWB indoor propagation channel is frequency selective and can be modeled by a stochastic tapped delay line [3]. The frequency-selective nature of the channel can lead to a distortion of the transmitted pulse which varies from path to path. A typical UWB channel impulse response can be expressed as $h(t) = \sum_{l=0}^{L-1} \lambda_l f_l(t - \tau_l)$, where $\{\lambda_l, \tau_l\}_{l=0}^{L-1}$ are the channel path gains and delays, respectively, satisfying $\tau_l < \tau_{l+1}, \forall l$. The function $f_l(t)$ includes the combined effect of the individual pulse distortion and the transmit/receive antennas. The UWB channel is also assumed to be quasi-static, i.e., the channel taps remain invariant over a block of several symbols but may vary from block to block. For simplicity of exposition, we may represent the channel as a weighted sum of time-shifted Dirac delta functions, i.e., $h(t) = \sum_{l=0}^{L-1} \lambda_l \delta(t - \tau_l)$. This simplification does not affect the proposed algorithms as they are unaffected by the channel and antenna characteristics. In order to isolate the propagation delay $\tau_0$ from the channel delays, the channel response can be rewritten as $h(t) = \sum_{l=0}^{L-1} \lambda_l \delta(t - \tau_{l,0} - \tau_0)$, with $\tau_{l,0} = \tau_l - \tau_0$. The received signal is then obtained as the convolution product $s(t) * h(t)$, corrupted by an additive white Gaussian noise $n(t)$ with double-sided power spectral density $N_0/2$, i.e., $r(t) = s(t) * h(t) + n(t)$. To develop our synchronization algorithms, we assume that both IFI and ISI are absent. This condition can be easily met by choosing $(N_h - 1)T_c + T_g \le T_f$, where $T_g$ is the duration of the aggregate received pulse. Note that this assumption is only imposed for analytic simplicity, and we will show the robustness of our algorithms against IFI and a moderate ISI with numerical results.
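The no-IFI/ISI condition above is easy to check numerically. The sketch below is only an illustration: it assumes that $T_g$ equals the excess delay of the last path plus the pulse duration, and the tap delays and function names are hypothetical values chosen for the example.

```python
def aggregate_pulse_duration(tap_delays_ns, t_psi_ns):
    """T_g taken as the excess delay of the last path plus the pulse duration (assumption)."""
    return (max(tap_delays_ns) - min(tap_delays_ns)) + t_psi_ns

def is_ifi_free(n_h, t_c, t_g, t_f):
    """Check the condition (N_h - 1) T_c + T_g <= T_f from the text."""
    return (n_h - 1) * t_c + t_g <= t_f

T_psi = T_c = 1.28          # ns
T_f, N_h = 19.2, 5          # ns, TH code range

short_tg = aggregate_pulse_duration([0.0, 2.5, 7.0, 11.0], T_psi)   # hypothetical taps
long_tg = aggregate_pulse_duration([0.0, 2.5, 7.0, 18.0], T_psi)    # hypothetical taps

print(is_ifi_free(N_h, T_c, short_tg, T_f))
print(is_ifi_free(N_h, T_c, long_tg, T_f))
```

With these example delays, the shorter channel satisfies the condition while the longer one spills into the next frame.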

Problem formulation
In practical scenarios, the receiver is unaware of the transmission starting time and the channel propagation delay $\tau_0$. We assume that the receiver initiates the synchronization at time $t_0 \ge \tau_0$, and we set $\tau_0 = 0$ as it only serves as a reference. Denoting $t_0 = NT_s - t_\phi$, with $N = \lceil t_0/T_s \rceil$, the observation signal can be written as $x(t) = r(t + t_0)$, $t \ge 0$. As the receiver aims at aligning to the starting time of the first information symbol after $t_0$, i.e., to the time $t = t_0 + t_\phi$, the required synchronization parameter to be estimated is $t_\phi \in [0, T_s)$.

Energy detection based algorithm
In order to achieve synchronization with energy detection, we first judiciously modify the conventional PSM symbol format in (1), as given in (5) to (7), where $\alpha_i^m \in \{0, 1\}$ and $\beta_i^m, \gamma_i \in \{\pm 1\}$. From (5), it is clear that two changes have been made. First, the starting frame of each symbol is reserved and can be regarded as an information-free pulse. Without loss of generality, we set $c_0 = 0$ hereafter. Second, pulses with alternating phase are used to represent a particular symbol, i.e., the data symbol $d(i)$ is transmitted using $\psi_{d(i)}(t)$ and $-\psi_{d(i)}(t)$ alternately. A graphical explanation of these changes is given in Figure 1.
This modified transmitted symbol correspondingly results in the modified received symbol $p_{R,d(i)}(t)$ of (8). Given (8), a simple energy detection-based algorithm is proposed that exploits the judiciously designed signal format of (5). First, we take $K$ segments of length $T_s$ from the received signal $x(t)$, given by

$$x_k(t) = x(t + kT_s), \quad t \in [0, T_s), \quad k = 0, 1, \ldots, K-1. \tag{9}$$

As both IFI and ISI are assumed to be absent, each segment $x_k(t)$ of size $T_s$ spans at most two successive symbols $p_{R,d(i)}(t)$. Letting $i = N + k + q$, where $q = 0$ or $q = -1$, (9) can be rewritten as

$$x_k(t) = \sum_{q=-1}^{0} p_{R,d(N+k+q)}(t - qT_s - t_\phi) + \eta_k(t), \tag{10}$$

where $\eta_k(t)$ is the noise in the $k$-th segment. Next, the mean of the observation signal is found using the sample mean estimator obtained from the $K$ segments as follows:

$$\bar{x}(t) = \frac{1}{K}\sum_{k=0}^{K-1} x_k(t) = \frac{1}{K}\sum_{k=0}^{K-1}\sum_{q=-1}^{0} p_{R,d(N+k+q)}(t - qT_s - t_\phi) + \bar{\eta}(t), \tag{11}$$

where $\bar{\eta}(t)$ is the averaged noise. Ignoring the noise for brevity and substituting (8) in (11), we get (12). From (7), it can be seen that the alternating phase terms of the symbols with $d(k) = m$ cancel in pairs, where $p_m$ is the total number of symbols in a sequence of length $K$ having $d(k) = m$. As $K$ is sufficiently large, the sample mean approximates 0 even when $p_m$ is odd. Exploiting this fact, (12) can be simplified to (16). From (16), it is clear that $\bar{x}(t)$ has a non-zero region only around $t_\phi$. Exploiting the zero guards, the objective function to estimate $t_\phi$ can be developed as

$$\hat{t}_\phi = \arg\max_{\tau \in [0, T_s)} J(\tau), \quad J(\tau) = \int_0^{T_I} \bar{x}^2\big([t + \tau]_{T_s}\big)\,dt, \tag{17}$$

where $[\cdot]_{T_s}$ is included as $\bar{x}(t)$ has size $T_s$ while the integration in (17) needs a periodic extension of $\bar{x}(t)$, and $T_I$ is the integration interval. In the sequel, we show that $J(\tau)$ achieves its unique maximum only at $t_\phi$, i.e., $J(t_\phi) = \int_0^{T_I} g_0^2(t)\,dt$. Let $\tilde{t} = \tau - t_\phi$ be the relative misalignment between $t_\phi$ and the candidate time shift $\tau$, with $\tilde{t} \in (-T_s, T_s)$. As the value of $\tilde{t}$ leads to different results, we consider the two cases $\tilde{t} \in (-T_s, 0]$ and $\tilde{t} \in (0, T_s)$ separately.
Specifically, if $\tilde{t} \in (-T_s, 0]$, the objective function $J(\tau)$ can be expanded and, recalling that $g(t)$ has a finite non-zero support within $[0, T_g]$, rearranged. Clearly, the objective function $J(\tau)$ is lower bounded by the positive integral $\int_{T_I + \tilde{t}}^{T_s + \tilde{t}} g_0^2(t)\,dt > 0$, thus yielding a unique maximum if and only if (iff) $\tilde{t} = 0$, i.e., $\tau = t_\phi$. Likewise, following the same steps when $\tilde{t} \in (0, T_s)$, a similar expression is obtained. Again, by a similar argument, we can conclude that $J(\tau)$ achieves its maximum iff $\tau = t_\phi$, thus validating the algorithm.
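The overlap-add and energy detection steps described above can be sketched in discrete time as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy pulses, the five-tap channel, the use of `psi[0]` as the reserved reference pulse, the absence of TH codes, and the names `eds_transmit` and `eds_estimate` are all choices made for the example.

```python
import numpy as np

def eds_transmit(d, psi, n_frames, frame_len):
    """Modified symbol format: frame 0 carries a fixed reference pulse (psi[0],
    assumed); frames 1..N_f-1 carry psi[d(i)] with a sign that alternates
    between successive occurrences of the same symbol value."""
    sym_len = n_frames * frame_len
    pulse_len = psi.shape[1]
    s = np.zeros(len(d) * sym_len)
    sign = np.ones(psi.shape[0])
    for i, m in enumerate(d):
        base = i * sym_len
        s[base:base + pulse_len] += psi[0]            # reserved, information-free frame
        for j in range(1, n_frames):
            start = base + j * frame_len
            s[start:start + pulse_len] += sign[m] * psi[m]
        sign[m] = -sign[m]                            # alternate phase for symbol value m
    return s

def eds_estimate(x, sym_len, n_seg, win_len):
    """Average n_seg symbol-long segments, then maximize the windowed energy
    of the periodically extended mean over all candidate shifts."""
    x_bar = x[:n_seg * sym_len].reshape(n_seg, sym_len).mean(axis=0)
    energy = x_bar ** 2
    ext = np.concatenate([energy, energy[:win_len - 1]])      # periodic extension
    J = np.convolve(ext, np.ones(win_len), mode="valid")      # J[tau], tau = 0..sym_len-1
    return int(np.argmax(J)), J

rng = np.random.default_rng(0)
N_f, L_f, K = 8, 16, 400                 # frames/symbol, samples/frame, segments
L_s = N_f * L_f
psi = np.array([[1.0, 2.0, 1.0], [1.0, -1.0, 1.0]])
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
h = np.array([1.0, 0.0, 0.6, 0.0, -0.3])             # toy multipath channel

d = rng.integers(0, 2, size=K + 3)
r = np.convolve(eds_transmit(d, psi, N_f, L_f), h)
r += 0.1 * rng.standard_normal(r.size)

t_phi = 37                               # true offset in samples
x = r[L_s - t_phi:]                      # receiver starts t_phi samples before a symbol
L_I = psi.shape[1] + h.size - 1          # integration window = aggregate pulse length
t_hat, J = eds_estimate(x, L_s, K, L_I)
print(t_hat)
```

The data-bearing frames average out because of the alternating sign, so the mean retains only the reserved frame, and the energy window locks onto it.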

Demodulation
The detection statistic for the $i$-th symbol in a conventional correlation-based Rake receiver is given by (21), where $v_m(t)$ is the reference signal, with $\{\hat{\lambda}_l, \hat{\tau}_{l,0}\}_{l=0}^{L-1}$ representing the estimated channel parameters, and $p_{T,m}(t)$ is as given in (5) with $\gamma_i = 1$.
Note that the above-mentioned synchronization method can also be used for synchronization in BPSK-modulated IR-UWB systems, simply by reserving the first frame, as the other modification of alternate phase change is inherently available in BPSK. It is worth mentioning at this point that the reservation of one frame for synchronization purposes does not influence the BER performance to a large extent. For example, considering modified binary PSM signaling as in (5), the BER using the correlation-based detector of (21) is given by (22), where $\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^{\infty} e^{-t^2}\,dt$ is the complementary error function. The term $(N_f - 2)/N_f$ is the result of one reserved frame; however, as $N_f$ is chosen sufficiently large in IR-UWB in order to increase the effective SNR per symbol, it is reasonable to approximate this term by 1. Also, it is important to clarify at this point that this information-free frame is adopted only during the synchronization phase. After the synchronization is done, the transmitter returns to the conventional PSM scheme. As this synchronization phase constitutes a very small fraction (say less than 5%) of the total transmission time [16], the effect on the overall demodulation performance will be negligible.
Although the Rake receiver is considered to be optimal, unfortunately it needs L parallel correlators which make its implementation unfeasible for practical UWB channels. Also, the performance of Rake receiver is very sensitive to mistiming [8,9] and channel estimation errors [34]. These limitations of the Rake receiver motivate the use of non-coherent receivers for UWB [5][6][7], where the correlation between the received signal and a template derived from the received signal itself is performed. Another interesting alternative to the Rake receiver is proposed in [16] where a template, called synchronized aggregate template (SAT), is achieved as a by-product of the synchronization algorithm. This SAT-based receiver has much lower complexity and exhibits very attractive performance in the SNR range of practical interest. In the following section, we will show that by carefully designing the transmitted signal, we can develop a low-complexity SAT-based receiver for PSM-IR-UWB systems along with an improved synchronization algorithm.

Code matching based algorithm and SAT extraction
Analyzing the mean of the observation signal in (12), it is evident that the symbol-long segments of $\bar{x}(t)$ contain a version of $p_{R,d(i)}(t)$ circularly shifted by $t_\phi$. Due to the careful change brought to the signal format by introducing $\gamma_i$ in (5), the second part of the summation in (12) cancels out, leaving behind only the first frame. However, if we assume $\gamma_i = 1$, i.e., do not modify the original PSM signal format, and if $t_\phi$ is estimated correctly, then with binary PSM (12) results in the average $\bar{p}_R(t)$ of (23). Applying the law of large numbers, one can see that $\lim_{K\to\infty} \bar{p}_R(t) = p_R(t)$, i.e., $\bar{p}_{R,0}(t) = p_{R,0}(t) = v_0(t)$ and $\bar{p}_{R,1}(t) = p_{R,1}(t) = v_1(t)$. Clearly, if we can separate the two parts of $\bar{p}_R(t)$ in (23), we obtain the two reference signals required for demodulation in (21) in the case of binary PSM. This motivates an alternative solution that not only provides the timing estimate but also helps in extracting the reference signals for demodulation.
Owing to the fact that the demodulator in (21) makes its decision on the basis of the UWB waveform orthogonality, we may change the phase of the transmitted waveform in a way that not only allows estimating the synchronization parameter but also separates the two template waveforms. Thus, instead of using $\psi_{d(i)}(t)$ for modulation, we multiply $\psi_{d(i)}(t)$ by $\beta_i$, defined in (24), where $Q \ge K$, with $K$ being the number of symbols used for synchronization and $Q$ being the number of symbols used for SAT recovery. As $d(i)$ are independent and identically distributed (i.i.d.) symbols taking the values $\{0, 1\}$ equiprobably, we can split them into two groups, denoted as $G_0(i) := \{i : d(i) = 0\}$ and $G_1(i) := \{i : d(i) = 1\}$. Choosing $Q$ sufficiently large and using (24), the mean of the transmitted pulses can be shown to take the form in (25), and a similar result can be shown for the second $(1/Q)$-normalized average. Thus, by judiciously changing the phase of the basic UWB pulse and performing two separate averaging operations once $\hat{t}_\phi$ is known, the SAT can be recovered from (12) as in (26). Once the SAT is recovered, we may proceed with the demodulation procedure in (21). It is worth mentioning that under the condition of large $Q$ and equiprobable symbols, only $Q$ out of $2Q$ symbols used for SAT recovery are modulated by twice the amplitude of the others. As this value of $Q$ is very small compared to the channel coherence time [16], the received SNR will not be greatly degraded and thus the impact on BER performance will be negligible.
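To make the template-based decision concrete, the sketch below correlates each symbol-long segment with one template per pulse shape and picks the largest correlation. It is a simplified illustration under assumptions: the templates are taken as noiseless received symbols rather than recovered SATs, $\beta_i = 1$ for all symbols, timing is perfect, and the pulses, channel, and the name `template_demod` are hypothetical.

```python
import numpy as np

def template_demod(x, templates):
    """Correlate every symbol-long segment of x with each template and
    return the index of the template with the largest correlation."""
    sym_len = templates.shape[1]
    n_sym = x.size // sym_len
    segs = x[:n_sym * sym_len].reshape(n_sym, sym_len)
    return np.argmax(segs @ templates.T, axis=1)

rng = np.random.default_rng(2)
N_f, L_f = 4, 16                          # frames per symbol, samples per frame
L_s = N_f * L_f
psi = np.array([[1.0, 2.0, 1.0], [1.0, -1.0, 1.0]])
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
h = np.array([1.0, 0.0, 0.6, 0.0, -0.3])  # toy multipath channel

# One noiseless received symbol per pulse shape plays the role of the template.
templates = np.zeros((2, L_s))
for m in range(2):
    g = np.convolve(psi[m], h)            # aggregate received pulse
    for j in range(N_f):
        templates[m, j * L_f:j * L_f + g.size] = g

d = rng.integers(0, 2, size=50)
x = templates[d].reshape(-1) + 0.2 * rng.standard_normal(50 * L_s)
d_hat = template_demod(x, templates)
print(np.count_nonzero(d_hat != d))
```

A single correlator per candidate waveform replaces the bank of per-path correlators of a Rake receiver, which is where the complexity saving comes from.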
This alternation of the symbol phase according to (24) can therefore effectively extract the SATs for demodulation as long as we know $t_\phi$. Thus, the next target is to estimate the synchronization parameter while preserving the phase alternation. To that end, the frames within a symbol are first multiplied by a bipolar code $b$ having the periodic autocorrelation function defined in (27), with $k = 0, 1, 2, \ldots$. Many sequences exhibit this autocorrelation property (ACP), such as maximum length shift register sequences (m-sequences), Barker codes, etc. Applying these modifications to the transmitted symbol, we accordingly obtain the received and observation symbols in (28) and (29). A graphical explanation of the changes applied to the transmitted signal can be seen in Figure 2. The mean of the observation signal is then computed as before. Next, we take frame-long segments from $x_k(t)$ and compensate for the random TH delays and the binary code $\{b_j\}_{j=0}^{N_f-1}$, followed by the signal aggregation operation.
where $\tau \in [0, T_s)$ is the candidate time shift and the noise term is ignored for brevity hereafter. The synchronization parameter $t_\phi$ and the candidate shift $\tau$ can both be expressed as an integer multiple of $T_f$ plus a remnant, i.e., $t_\phi = n_\phi T_f + \epsilon_\phi$ and $\tau = n_\tau T_f + \epsilon_\tau$, so that the relative misalignment at any specific time shift can be denoted as $\tilde{n} = n_\tau - n_\phi$ and $\tilde{\epsilon} = \epsilon_\tau - \epsilon_\phi$. With this notation, (30) can be rewritten as (31), and substituting $p_{R,d(i)}(t)$ from (28) gives (32). It is worth noting here that the dual purposes served by TH codes (i.e., spectrum smoothing and multiple access) can be equally achieved by the user-specific orthogonal polarity codes $b$. Thus, we assume $c_m = c_j = 0, \forall m, j$. As a result, (32) can be simplified to (33). As the value of $\tilde{\epsilon}$ leads to different results, we first consider the case $\tilde{\epsilon} \in [-\epsilon_\phi, 0]$. Since the observed segments $\bar{x}_b(t, \tau)$ have a finite support $[0, T_f]$, only finite values of $i, j$ contribute non-zero summands in (32) under the assumption of no IFI and ISI; these are identified in (34). Also, the explanation of the averaging operation in (25) carries over to the present case. Applying these simplifications, (33) can be expressed as (35). Capitalizing on the ACP of the bipolar code $b$ defined in (27), (35) simply reduces to (36). It is easy to observe that when $\tilde{\epsilon} = 0$, the term corresponding to $n_\tau = n_\phi + 1$ in (36) disappears because $T_g \le T_f$ in the absence of IFI. Applying the energy detection operation afterwards with an integration interval equal to $T_g$, it is thus clear that $\bar{x}_b(t, \tau)$ achieves its maximum energy of $N_f^2 E_g$ only when $\tilde{n} = 0$ and $\tilde{\epsilon} = 0$, i.e., $\tau = t_\phi$, where $E_g = \int_0^{T_g} g^2(t)\,dt$ is the energy of the aggregate received segment.
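The autocorrelation property invoked in this derivation can be checked numerically for a candidate code. The sketch below uses the length-13 Barker code purely as an example; the code used in the paper's simulations is specified differently, and the helper name `periodic_acf` is an assumption.

```python
import numpy as np

def periodic_acf(b):
    """Periodic autocorrelation R[k] = sum_j b[j] * b[(j + k) mod N_f]."""
    b = np.asarray(b)
    return np.array([int(np.dot(b, np.roll(b, -k))) for k in range(b.size)])

barker13 = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1])
acf = periodic_acf(barker13)
print(acf)
```

The zero-lag value equals the code length $N_f$, while every other lag stays at most 1 in magnitude, which is what makes the aggregated energy peak sharply at the correct frame alignment.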
Similarly, when $\tilde{\epsilon} \in (0, T_f - \epsilon_\phi)$, we can show, following the same procedure, that (37) holds. Again, by a similar argument, we can conclude that $\bar{x}_b(t, \tau)$ attains its maximum iff $\tau = t_\phi$. Building on the above analysis, the timing offset $t_\phi$ can be estimated in NDA mode using the following optimization:

$$\hat{t}_\phi = \arg\max_{\tau \in [0, T_s)} J(\tau), \quad J(\tau) = \int_0^{T_I} \bar{x}_b^2(t, \tau)\,dt. \tag{38}$$

Note that $\bar{x}(t)$ has a size $T_s$, whereas the integration in (38) requires its periodic extension; thus, a $[\cdot]_{T_s}$ operation is included.
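The code matching, aggregation, and energy detection steps can be sketched in discrete time as below. This is an illustrative sketch under assumptions, not the authors' implementation: TH codes are set to zero as in the analysis, the phase term $\beta_i$ is omitted (plain binary PSM already has a non-zero mean), the polarity code is the Barker-13 code, and the pulses, channel, and function names are hypothetical.

```python
import numpy as np

def psm_code_transmit(d, psi, b, frame_len):
    """Binary PSM with a frame-level polarity code b and no TH shifts."""
    n_frames = len(b)
    sym_len = n_frames * frame_len
    pulse_len = psi.shape[1]
    s = np.zeros(len(d) * sym_len)
    for i, m in enumerate(d):
        for j in range(n_frames):
            start = i * sym_len + j * frame_len
            s[start:start + pulse_len] += b[j] * psi[m]
    return s

def cms_estimate(x, b, frame_len, n_seg, win_len):
    """Average symbol-long segments, undo the polarity code frame by frame for
    each candidate shift, aggregate the frames, and maximize the energy."""
    n_frames = len(b)
    sym_len = n_frames * frame_len
    x_bar = x[:n_seg * sym_len].reshape(n_seg, sym_len).mean(axis=0)
    t = np.arange(win_len)
    J = np.empty(sym_len)
    for tau in range(sym_len):
        # Circular indices of the first win_len samples of each frame.
        idx = (tau + np.arange(n_frames)[:, None] * frame_len + t[None, :]) % sym_len
        x_b = (b[:, None] * x_bar[idx]).sum(axis=0)      # code-matched aggregation
        J[tau] = np.sum(x_b ** 2)                        # energy detection
    return int(np.argmax(J)), J

rng = np.random.default_rng(3)
b = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1])   # example polarity code
L_f, K = 16, 200
L_s = b.size * L_f
psi = np.array([[1.0, 2.0, 1.0], [1.0, -1.0, 1.0]])
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
h = np.array([1.0, 0.0, 0.6, 0.0, -0.3])                     # toy multipath channel

d = rng.integers(0, 2, size=K + 3)
r = np.convolve(psm_code_transmit(d, psi, b, L_f), h)
r += 0.5 * rng.standard_normal(r.size)

t_phi = 100                               # true offset in samples
x = r[L_s - t_phi:]
L_I = psi.shape[1] + h.size - 1
t_hat, J = cms_estimate(x, b, L_f, K, L_I)
print(t_hat)
```

Unlike the energy detection approach, all $N_f$ frames contribute coherently to the decision statistic here, which is consistent with the better noise tolerance claimed for CMS.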

Discussion
In terms of a-priori knowledge, both algorithms need minimal information. The EDS algorithm, however, has an edge over the second one because it is totally blind in the sense that it does not need any knowledge whatsoever about the transmitted signal. The CMS algorithm, on the other hand, does need a-priori knowledge of the user-specific TH and binary codes. From the applicability viewpoint, the first algorithm is more promising in the case of $M$-ary PSM, as $\gamma_i$ in (6) alternately changes the phase for all modulation indices $m = 0, 1, \ldots, M - 1$. Therefore, increasing the modulation order does not affect the algorithm as long as the first frame is reserved and carries the same waveform throughout. The integration region $T_I$ has an impact on the synchronization accuracy of both algorithms as it is responsible for the signal energy capture. Ideally, it should be equal to the channel delay spread plus the UWB pulse duration, i.e., $T_I = T_g$. However, if this value is not known, we can set it to $T_I = T_f - N_hT_c$ in the presence of TH codes and $T_I = T_f$ otherwise, provided that it captures sufficient energy. The implementation complexity mainly amounts to two factors: (a) the shifting of the observation signal in (9) and (29) and (b) the maximization of the objective function $J(\tau)$ in (17) and (38), respectively, for the two algorithms. The shifting can be done in the analog as well as in the digital domain. Analog approaches have the advantage of avoiding the sampling, whose rate can be very high in the UWB regime. However, they need analog delay lines (on the order of the symbol duration) for shifting, which can be demanding, especially for low-power circuits. Nonetheless, chips implementing analog delays from 20 to 2,000 ns are available and can be used to implement the algorithms [35]. On the other hand, the digital implementation is relatively simple from the signal processing viewpoint, and digital operations can be performed efficiently in modern on-chip technologies.
However, it requires the UWB receiver to digitize the signal at the Nyquist rate (usually several gigahertz). Thus, the primary concern in a digital implementation is the design of ultra-fast ADCs. Parallel ADCs can be used to this end, where each ADC operates at a fraction of the effective sampling frequency [36]. Nevertheless, if ultra-fast ADCs are available [37], both algorithms can be implemented in fully digital form. As far as the maximization itself is concerned, a continuous search over $[0, T_s)$ would result in prohibitive complexity. In practice, the objective function $J(\tau)$ is evaluated over a grid of finite equispaced values $\tau = nT_\delta$, where $n \in [0, \lfloor T_s/T_\delta \rfloor)$ and $T_\delta$ is the step interval.
The estimated synchronization parameter will then be $\hat{t}_\phi = \hat{n}T_\delta$, with an ambiguity of $T_\delta$. It is worth mentioning that synchronization at any precision can be achieved by the proposed algorithms, constrained only by the affordable complexity.
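The effect of the step interval on the estimate can be illustrated with a small grid search. In the sketch below, `toy_J` is a hypothetical stand-in for the objective function (any function peaking at the true offset would do), and the numerical values are chosen only for the example.

```python
import numpy as np

def grid_search(J, T_s, T_delta):
    """Evaluate J on the grid tau = n * T_delta within [0, T_s) and return the maximizer."""
    n = np.arange(int(np.ceil(T_s / T_delta)))
    taus = n * T_delta
    taus = taus[taus < T_s]                  # guard against rounding at the upper edge
    values = np.array([J(tau) for tau in taus])
    return float(taus[np.argmax(values)])

T_s, t_phi = 249.6, 100.3                    # ns; t_phi is the (unknown) true offset

def toy_J(tau):
    # Hypothetical objective with a single peak at t_phi.
    return -abs(tau - t_phi)

coarse = grid_search(toy_J, T_s, 0.5)        # 0.5 ns step
fine = grid_search(toy_J, T_s, 0.1)          # 0.1 ns step
print(coarse, fine)
```

The residual error is bounded by the step interval, so halving $T_\delta$ doubles the number of objective evaluations for a proportional gain in resolution.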
From the perspective of possible application area, we envision a similar context and application framework for our methods as described in [16,17] for UWB ad hoc networks such as wireless sensor networks. Consider a single piconet consisting of multiple nodes. A likely configuration of the overall protocol is outlined in [16] (see paragraph before section 4 in [16]). Under multiuser interference, the essence of the proposed algorithms is that at any time, there is only one node (but not the same one all the time) which transmits peculiar signal format. This node is designated as a 'master' node and takes the responsibility of synchronizing the other nodes designated as 'slave' nodes. This is effectively the case with star or clustered topologies of ad hoc networks.

Simulations and comparisons
In this section, simulations are carried out to evaluate the performance of the two synchronization algorithms in terms of the probability of acquisition (P A ), the normalized mean square error (NMSE), and the BER. In all ensuing simulations, specially designed B-spline-based orthogonal UWB pulses with duration T ψ = 1.28 ns are used [28]. Each symbol consists of N f = 13 frames, while each frame contains N c = 15 chips. The chip duration T c is the same as the pulse duration T ψ , resulting in a frame duration of T f = 19.2 ns. The binary code is selected as b = 202 in decimal for the CMS algorithm. We use TH codes randomly taking integer values from [0, N h ), where N h = 5. The multipath channel employed in the simulations is the CM1 indoor channel proposed by the IEEE 802.15.3a working group [4], which has an RMS delay spread of 5 ns. The synchronization parameter t φ is randomly generated from a uniform distribution over [0, T s ) at each Monte Carlo trial.
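As a quick consistency check of the timing parameters above, the following sketch derives the frame and symbol durations and draws the random quantities for one Monte Carlo trial. It assumes the usual relation T s = N f T f and one TH code per frame; the helper name `draw_trial` is hypothetical and not part of the original simulation code.

```python
import random

# Values quoted in the simulation setup (all durations in nanoseconds).
T_PSI = 1.28   # pulse duration
N_F = 13       # frames per symbol
N_C = 15       # chips per frame
N_H = 5        # TH codes take integer values in [0, N_H)

T_C = T_PSI        # chip duration equals the pulse duration
T_F = N_C * T_C    # frame duration: 15 * 1.28 = 19.2 ns
T_S = N_F * T_F    # symbol duration (assumed N_f * T_f): 249.6 ns


def draw_trial(rng):
    """Draw one TH code per frame (an assumption) and a timing offset
    distributed uniformly over [0, T_S)."""
    th_codes = [rng.randrange(N_H) for _ in range(N_F)]
    t_phi = rng.random() * T_S
    return th_codes, t_phi


th_codes, t_phi = draw_trial(random.Random(0))
```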

Synchronization performance
We first evaluate the synchronization accuracy of the proposed algorithms in terms of the acquisition probability, which is defined as $P_A = \Pr\{|\hat{t}_\phi - t_\phi| \le T_\delta\}$. The integration interval T I is set equal to T g . The SNR is defined as P g /σ², where P g is the received power per pulse (after the convolution of the transmitted pulse with the channel impulse response). The synchronization performance is assessed under three different conditions, namely (1) absence of IFI, (2) addition of moderate IFI and ISI, and (3) introduction of MUI. The no-IFI condition is met by truncating the channel beyond T f − N h T c = 12.8 ns. Moderate IFI is introduced by extending the channel delay spread up to T f , hence spreading an N h T c -long tail of the dispersed pulse into the subsequent frame. The tail of the last frame in each symbol also spreads into the first frame of the subsequent symbol, thereby inducing a small ISI as well. Finally, MUI is introduced by two interfering users who transmit conventional PSM symbols. As N h = 5, the user-specific TH codes cannot widely separate the three users; thus, severe interference exists among them.
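The acquisition probability defined above can be estimated empirically as the fraction of Monte Carlo trials whose timing error stays within T δ . The sketch below is a minimal illustration; the function name is hypothetical, and it assumes a plain absolute error without wrapping the error around the symbol boundary.

```python
def acquisition_probability(estimates, truths, t_delta):
    """Fraction of trials with |t_hat - t_phi| <= t_delta."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for t_hat, t_phi in zip(estimates, truths)
               if abs(t_hat - t_phi) <= t_delta)
    return hits / len(estimates)


# Made-up example values (ns), for illustration only.
p_a = acquisition_probability(
    estimates=[0.0, 19.2, 100.0, 57.6],
    truths=[5.0, 19.2, 50.0, 70.0],
    t_delta=19.2,
)
```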
We set the step interval T δ = T f , and the resulting performance curves are shown in Figures 3, 4, and 5. The two algorithms are also compared with those of [16] and [18], respectively. The reason for this choice is that the method in [16] is also based on energy detection, like our first algorithm, while the one in [18] exploits bipolar codes, much like our second scheme, thus providing a good basis for relative comparison. It is worth mentioning, however, that both of these reference algorithms deal with IR-UWB systems employing BPAM and are used only to provide a performance benchmark. A pronounced improvement in performance is evident with the proposed algorithms compared to the reference algorithms under all operating conditions. Between the two approaches, the CMS algorithm exhibits much better performance, thanks to the fine ACP of bipolar codes. We can observe that the performance degradation is more severe in the case of interfering users than in the case of IFI. However, the performance is not degraded dramatically compared to the ideal case of no interference, and the synchronization parameter can still be estimated with reasonable precision. This demonstrates the robustness of the proposed schemes under practical operating conditions. Also, in the case of the CMS algorithm, we have considered the worst-case scenario where the two interfering users employ the same binary codes as the desired user. Therefore, it is reasonable to expect that choosing orthogonal binary codes for different users in the CMS algorithm, and a larger separation by TH codes in both algorithms, may further improve the performance under MUI. Figure 6 employs the NMSE metric to compare the performance of the proposed algorithms. The NMSE curves decrease monotonically for both algorithms before reaching an error floor. This error floor is expected since synchronization is performed with a finite resolution of either T f or 3T c only.
The error floor is approximately 6 × 10^−4 for the T f resolution case, corresponding to a timing error standard deviation of around 6.11 ns, which is less than the intended accuracy of T f = 19.2 ns. Similarly, for the 3T c resolution case, the error floor is around 4 × 10^−5, resulting in a timing error standard deviation of about 1.58 ns, which is again less than the intended accuracy of 3T c = 3.84 ns. This figure also illustrates the ability of both algorithms to achieve synchronization at the desired resolution.
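The quoted standard deviations can be reproduced from the error floors if the NMSE is taken as the mean square timing error normalized by T s ², with T s = N f T f = 249.6 ns. This normalization is an assumption inferred from the reported numbers, not a definition given in this section. A short check:

```python
import math


def timing_std_from_nmse(nmse, t_s):
    """Convert an NMSE value (assumed normalized by t_s**2) into an RMS
    timing error, which equals the standard deviation for an unbiased
    estimator."""
    return math.sqrt(nmse) * t_s


T_S = 13 * 15 * 1.28  # N_f * N_c * T_c = 249.6 ns

std_tf = timing_std_from_nmse(6e-4, T_S)   # T_f resolution case, about 6.11 ns
std_3tc = timing_std_from_nmse(4e-5, T_S)  # 3*T_c resolution case, about 1.58 ns
```

Both values fall below the corresponding step intervals (19.2 ns and 3.84 ns), in agreement with the text.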
Next, the effect of the observation window length K on the acquisition probability can be seen in Figure 7. The performance clearly improves with increasing K because the signal averaging operation employed by both algorithms mitigates the noise more effectively for large K. Figure 7 also demonstrates the impact of different step intervals T δ . The higher the desired synchronization precision, the larger the number of symbols K needed. Nevertheless, any level of precision can again be achieved with the proposed algorithms. In Figure 8, the performance is compared in both line-of-sight (LOS, CM1) and non-line-of-sight (NLOS, CM2) propagation channels of the IEEE 802.15.3a standard with T δ = T f . The frame duration T f is increased to 25.6 ns so that enough of the channel energy can still be captured in the case of the NLOS channel. The simulation results in Figure 8 show that the performance is almost identical in LOS and NLOS channels. Finally, the effect of the number of frames N f on the acquisition probability of the proposed algorithms is shown in Figure 9. Due to the correlation properties of bipolar codes, the performance of the CMS algorithm is expected to improve with the code length, which is confirmed by Figure 9. However, the number of frames does not have any significant effect on the performance of the EDS algorithm. In fact, the performance may degrade because, with increasing N f , there are more discrete bins to be searched linearly, which increases the probability of a wrong estimate.
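The benefit of a longer observation window can be illustrated with a noise-only toy experiment: averaging K symbol-long segments of white Gaussian noise reduces the residual noise power by roughly a factor of K. The sketch below is a simplified stand-in for the overlap-add averaging step, not the estimators of the paper; the segment length, the seed, and the function names are arbitrary assumptions.

```python
import random


def overlap_add_average(samples, seg_len, k):
    """Average k consecutive segments of seg_len samples each."""
    return [sum(samples[j * seg_len + i] for j in range(k)) / k
            for i in range(seg_len)]


def mean_power(x):
    """Average of the squared sample values."""
    return sum(v * v for v in x) / len(x)


rng = random.Random(0)  # fixed seed so the illustration is repeatable
seg_len = 200           # samples per symbol-long segment (arbitrary)
noise = [rng.gauss(0.0, 1.0) for _ in range(64 * seg_len)]  # unit variance

p4 = mean_power(overlap_add_average(noise, seg_len, 4))    # near 1/4
p64 = mean_power(overlap_add_average(noise, seg_len, 64))  # near 1/64
```

Because the signal component repeats from symbol to symbol while the noise does not, the same averaging raises the effective SNR seen by the energy detector as K grows.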

BER performance
We now translate the synchronization performance into BER performance. Figures 10 and 11 show the BER obtained using the demodulator of (21) after estimating the synchronization parameter with the EDS and CMS algorithms, respectively. We suppose that channel estimation is performed after synchronization and that it is error free. We average over 10^3 channel realizations, where in each realization, after estimating t φ , we demodulate 10^3 symbols using an all-Rake (A-Rake) receiver. We also plot the BER of the A-Rake under perfect timing as a reference. As the CMS algorithm can synchronize precisely with far fewer symbols, we can achieve a BER similar to the case of perfect timing even with K = 32, and almost similar with K = 16. However, the EDS algorithm needs a relatively large number of symbols to achieve reasonable BER performance.
Finally, we compare the SAT-based receiver (26) with the selective-Rake (S-Rake), which is a more practical solution than the A-Rake. Synchronization is first performed using the CMS algorithm, and the symbols are then demodulated using the SAT-based receiver. The results shown in Figure 12 demonstrate that the SAT-based receiver performs better than an S-Rake with five fingers corresponding to the strongest paths under perfect timing. The SAT-based receiver can outperform the S-Rake simply by increasing the number of averaging symbols Q, while the S-Rake depends on the number of fingers to capture sufficient energy. At high SNR, the SAT-based receiver with reasonable averaging can even come very close to the ideal A-Rake.
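The dependence of the S-Rake on the number of fingers can be made concrete by computing the fraction of channel energy collected by the strongest paths. The following sketch uses made-up tap gains purely for illustration; the function name is hypothetical, and the taps are not drawn from the CM1 model used in the simulations.

```python
def srake_energy_fraction(taps, n_fingers):
    """Fraction of total multipath energy captured by the n_fingers
    strongest taps, which are the paths an S-Rake would combine."""
    energies = sorted((abs(a) ** 2 for a in taps), reverse=True)
    return sum(energies[:n_fingers]) / sum(energies)


# Hypothetical tap gains, not a CM1 realization.
toy_taps = [1.0, -0.5, 0.5, 0.25, 0.25]

frac_two = srake_energy_fraction(toy_taps, 2)              # two strongest paths
frac_all = srake_energy_fraction(toy_taps, len(toy_taps))  # A-Rake limit: 1.0
```

In a dense multipath channel many weak paths carry the remaining energy, which is why a small number of fingers leaves part of the energy uncollected.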

Conclusions
In this paper, the issue of synchronization in time-hopping IR-UWB systems employing pulse shape modulation is addressed, and two low-complexity algorithms are compared. The first algorithm exploits a judiciously designed signal format to enable synchronization using a simple overlap-add operation followed by energy detection. This algorithm is particularly interesting for systems using high-order orthogonal PSM. On the other hand, the modified signal format results in the loss of one frame per symbol, so an alternative algorithm is proposed to avoid this small data loss. Exploiting the impulsive autocorrelation function of bipolar codes, we develop a synchronization criterion using a series of code matching, overlap-add, and energy detection operations. Based on this criterion, a new low-complexity NDA synchronization algorithm is then proposed. Code matching and averaging greatly suppress the interference and noise, resulting in improved performance. Both proposed algorithms remain functional in the presence of TH codes, an unknown channel, and distortion due to the Tx/Rx antennas. Simulation results confirm the precise synchronization of the two algorithms and their robustness in the presence of IFI and MUI. Furthermore, a new low-complexity demodulation scheme is derived using synchronized aggregate templates. This receiver bypasses the cumbersome task of channel estimation and can collect the full multipath energy. It also inherently captures the pulse distortion caused by antennas and other receiver effects. Results show that it can achieve performance comparable to the widely adopted Rake receiver in the medium-to-high SNR range. In the future, we plan to investigate synchronization performance in the presence of severe IFI and ISI. Also, the large search space due to the fine timing resolution of UWB systems leads to an increased synchronization time. Thus, it would be interesting to investigate rapid synchronization schemes for UWB.
Additionally, an analytical performance analysis of the SAT receiver is needed to draw conclusions about its advantages and drawbacks compared to the conventional Rake receiver.