Synchronization in 5G networks: a hybrid Bayesian approach toward clock offset/skew estimation and its impact on localization

Clock synchronization has always been a major challenge when designing wireless networks. This work focuses on tackling the time synchronization problem in 5G networks by adopting a hybrid Bayesian approach for clock offset and skew estimation. Furthermore, we provide an in-depth analysis of the impact of the proposed approach on a synchronization-sensitive service, i.e., localization. Specifically, we expose the substantial benefit of belief propagation (BP) running on factor graphs (FGs) in achieving precise network-wide synchronization. Moreover, we take advantage of Bayesian recursive filtering (BRF) to mitigate the time-stamping error in pairwise synchronization. Finally, we reveal the merit of hybrid synchronization by dividing a large-scale network into local synchronization domains and applying the most suitable synchronization algorithm (BP- or BRF-based) on each domain. The performance of the hybrid approach is then evaluated in terms of the root mean square errors (RMSEs) of the clock offset, clock skew, and the position estimation. According to the simulations, in spite of the simplifications in the hybrid approach, RMSEs of clock offset, clock skew, and position estimation remain below 10 ns, 1 ppm, and 1.5 m, respectively.

Network-wide synchronization in WSNs has been addressed in [7,9,10] by employing the Belief Propagation (BP) algorithm. Typically, BP runs on a Factor Graph (FG) corresponding to the network and calculates the marginals at each node by iteratively exchanging beliefs between neighboring nodes [14]. The algorithm is advantageous in the sense that it is fully distributed and estimates the clock offset and skew with high accuracy. However, the time required to compute the pairwise conditional probability distribution functions (pdfs) needed for FG, and then conducting the iterative message passing, can be considered as a potential drawback rendering its practical applicability limited.
Pairwise synchronization is mostly conducted by exchanging time-stamps between the nodes using the Precision Time Protocol (PTP) [15]. To perform network synchronization in a layer-by-layer manner, PTP is then combined with the Best Master Clock Algorithm (BMCA), whose role is to determine the Master Node (MN) in the network. While this combination operates sufficiently robust in tree-structured networks with medium time-sensitivity (sub-µ s range), BMCA's poor performance in networks with mesh topology on one hand, and uncertainty in time-stamping on the other hand, render the algorithm futile in highly time-sensitive (sub-hundred ns range) loopy networks.
Despite the attempts in [11,16] to address the time-stamping uncertainty (or error) by the virtue of Kalman filtering, this approach is not optimal in the Bayesian sense since all the information available from time-stamps is not utilized. Instead, the Bayesian Recursive Filtering (BRF) utilized in [17] can be employed to capture all the available information in time-stamps, thereby optimally rectifying the time-stamping error. We have already revealed the outstanding merit of BRF in the mitigation of time-stamping error in [18].
Although all aforementioned techniques have made invaluable contribution, none of them alone can be expected to meet the global and local time precision aimed by 5G for accurate localization [19]. Instead, a combination of these algorithms is more likely to deliver a superior performance owing to diverse network typologies (mesh, tree, or a combination thereof ) [16]. In particular, to successfully achieve precise network synchronization, it is suggested by [20] that the architecture of a large-scale network should consist of common synchronization areas and multiple synchronization domains. Therefore, equipping networks with different synchronization algorithms (or a combination thereof ) appears to be a balanced approach, whereby each domain can, based on its topology and capabilities, leverage the most suitable algorithm. In this manner, it is easier to satisfy the requirement of the relative time error in the synchronization domains while keeping the absolute time error low. This is particularly of interest in applications where ultra-high time accuracy is required in a specific synchronization domain, e.g., positioning services.
In [16], we have introduced and thoroughly described the idea of hybrid synchronization, whereby clock offset can be precisely estimated and correspondingly corrected. The extension to incorporate the clock skew estimation was proposed in [18]. In this paper, we expand on [16,18] and design a hybrid synchronization algorithm based on asymmetric time-stamp exchange to allow for accurate localization [21]. The merit of asymmetric time-stamp for localization has been revealed in [3,22,23]. Furthermore, the fine time measurement standard introduced in [24] allows for implementation of such timestamp exchange mechanism. Given that, in addition to analysis of clock offset and skew estimation, we examine the impact of the proposed hybrid approach on a localization algorithm based on the technique presented in [22].
The contribution of this paper can then be briefly summarized as follows: • We present the principles of BP-based network-wide and BRF-based pairwise synchronization based on asymmetric time-stamp exchange. • We develop a hybrid statistical synchronization algorithm by combining the two above-mentioned Bayesian approaches. • We analyze the performance of the hybrid approach when estimating the clock offset and skew as well as its impact on a localization algorithm.
The rest of this paper is structured as follows: In Sect. 2, the system model is introduced. Section 3 deals with the estimation methods for network-wide, pairwise, and hybrid synchronization. We present and discuss the simulation results in Sect. 4. Section 5 is devoted to the impact of hybrid synchronization on MU localization. Finally, Sect. 6 concludes this work and points to the future work.

Notation
The boldface capital A and lower case a letters denote matrices and vectors, respectively. a(n) indicates the nth element of vector a . A T represents the transposed of matrix A . I N denotes a N dimensional identity matrix. N (x|µ, �) indicates a random vector x distributed as Gaussian with mean vector µ and covariance matrix . diag(x 1 , . . . , x K ) denotes a diagonal matrix with the diagonal elements given by (x 1 , . . . , x K ). The symbol ∼ stands for "is distributed as, " and the symbol ∝ represents the linear scalar relationship between two functions.

System model
In this section, we firstly present the clock model for each node in the communication network. Then, we explain the components constructing the clock offset in details. Finally, the time-stamp exchange mechanism is described comprehensively.

Clock model
The clock behavior for each node i is modeled as [25] where c i (t) shows the local time at each node, t represents the reference time, γ i denotes the clock skew, and θ i is the clock offset. We consider the parameter γ i as random and varying over time. However, it is common to assume that it remains unchanged in the course of one synchronization period [9,11]. Moreover, θ i consists of several components, all thoroughly discussed in the following subsection. In light of above-mentioned points, time synchronization can be deemed equivalent to estimation of γ i and θ i (or transformations thereof ) for each node. Corrections are then applied such that, ideally, all the clocks show the same time as the reference time t.

Clock offset decomposition
We decompose the clock offset θ i as shown in Fig. 1, thereby elaborating on its constituent components. The parameter t i /t j is the time it takes for a packet to leave the transmitter after being time-stamped (the term "time-stamp" is refered to hardware time-stamping hereafter), d ij /d ji denote the propagation delay, and r i /r j represents the time that a packet needs to reach the time-stamping point upon arrival at the receiver. Generally, meaning that the packets sent from node i to node j do not experience the same delay as the packets sent from node j to node i. In particular T ij = t i + r j , and R ij = t j + r i are random variables due to multiple hardware-related random independent processes and can, therefore, be assumed i.i.d. Gaussian random variables distributed as N (µ T , σ 2 T ) and N (µ R , σ 2 R ), , respectively [7,9,10]. Conversely, d ij and d ji are usually assumed to be deterministic and symmetric ( d ij = d ji ) [7]. Figure 2 depicts the histogram of the clock offset and its Gaussian fit for 5000 packet exchange between two Commercial Off-The-Shelf (COTS) street nodes. 1 In particular, the variance of offset turns out to be around 9 ns, what is crucial to know if we are to reduce the error in the clock offset/skew estimation.

Time-stamp exchange mechanism
We employ the asymmetric time-stamping mechanism introduced in [21] and shown in Fig. 3. It functions as follows: node j transmits a sync message wherein the local time c j (t k 1 ) is incorporated. Node i receives the packet and records the local reception time c i (t k 2 ) . After a certain time, the process repeats again with c j (t k 3 ) and c i (t k 4 ). Subsequently, at local time c i (t k 5 ) , node i sends back a sync message to node j with c i (t k 2 ), c i (t k 4 ) and c i (t k 5 ) incorporated. Upon reception, node j records the local time c j (t k 6 ). Given that, the relation between local clocks can be written as: where (t k 1 , t k 3 )/t k 6 and t k 5 /(t k 2 , t k 4 ) are the time points where neighboring nodes j and i send/ receive the sync messages, respectively. Stacking the weighted sum of (2), (3) and (4) for K rounds of time-stamp exchange gives where W ji and W ij are K × 2 matrices with the kth row being  [3,10]. Finally, In concrete terms, what (5) implicitly states is that for given ξ i and ξ j , the probability that we measure W ji and W ij is equal to . This can be expressed as

Methods of clock offset and skew estimation
In this section, first the principles of BP-based network-wide synchronization are described. Subsequently, we introduce the BRF-based pairwise synchronization. Lastly, we present an approach, where both techniques are employed in a hybrid manner.

Network-wide offset and skew estimation
In network-wide synchronization, we aim to synchronize each node with a global MN. Alternatively, we can restate the problem as estimation of parameters γ i and θ i (or vector parameter ξ i ), based on the observation matrices W ji and W ij . Mathematically, this is translated to the following marginal calculation: where I i denotes the set of neighboring nodes of node i and M is total number of the nodes in the network. Consequently, ξ i can be estimated as Unfortunately, the computation cost and complexity of the marginal pdf in (7) are extremely high. Instead, as a compromise, one can resort to approximating the integrand of (7). This is carried out in the sequel with the aid of variational methods.

Variational methods
The basic idea underpinning variational methods is to approximate an intractable complex distribution p(x) by a straightforward tractable distribution q(x) . To this end, one can minimize the discrepancy measure Kullback-Leibler (KL) divergence between p(x) and q(x) , given by [14] The minimization is then achieved by drawing on the Bethe method, which imposes the following structure on q(x) [26]: where x j and x i are neighboring nodes. The structure in (10) can be appropriately represented by FG. Furthermore, to efficiently infer the marginal beliefs, BP is typically run on the FG [14]. Therefore, in the sequel, we briefly describe FG and BP.

Factor graph
An FG is a bipartite graph that depicts a pdf with the factorized form, e.g., that of (10). In particular, an FG comprises several nodes known as variable nodes, and a number of factor nodes, each being a function of its neighboring variable nodes (Fig. 4).
We construct the graphical model in Fig. 4, where a number of APs are backhauled by a mesh network, each represented by ξ i . The main objective is then to compute the marginal illustrated in (7). Adopting the approximation outlined in Sect. 3.1.1, the conditional probability under the integral of (7) turns into where p(ξ i ) indicates the Gaussian distributed prior knowledge on ξ i and p(W ji , W ij |ξ i , ξ j ) is the pairwise conditional probability computed from (6). In the The FG corresponding to an exemplifying network. Note that, the FG is drawn only for the backhaul network to avoid unnecessary complexity. To draw the FG of the whole network, one can simply consider APs as variable nodes connected to their corresponding backhaul nodes via factor nodes sequel, we briefly describe the principles of BP as an efficient algorithm to obtain the estimation in (8).

Belief propagation
BP is a technique which relies primarily on the exchange of beliefs between neighboring nodes to infer the marginals. This inference is proved to be exact when the graphs are singly connected and approximate if they contain loops [14]. While generally there is no guarantee that the algorithm converges in the loopy graphs, [9,10] have indicated that, if there exist at least one MN in the network, the convergence of BP is certain. Figure 5 depicts the principles of the message passing in BP for the nodes ξ i and ξ j . For the sake of simplicity, we denote the factor p(W ji , W ij |ξ i , ξ j ) with p ij . The message from a factor vertex p ij to a variable vertex ξ i in iteration l is then given by [14] where (l) ξ j →p ij (ξ j ) denotes the message from a variable node ξ j to the variable vertex p ij and is given by where b (l) (ξ i ) denotes the marginal belief of variable node ξ i in the lth iteration. It is expected that the result of the integral in (12) is Gaussian distributed as its arguments are also Gaussian distributed. We note that, in practice, both (12) and (13) are locally computed at each node and only Fig. 6.
j→i ) denote the message sent from j to i. Considering (12) and (13), the covariance matrix Q (l) j→i can be calculated by [10,18,27] where (12)

Fig. 5 Message passing principles in BP
and Q j is the covariance matrix of p(ξ j ) . Furthermore, where µ j represents the mean vector of p(ξ j ). It should be noted that Q j and µ j remain unchanged during the message updating process. The BP algorithm initializes the message from node j to node i as . Node j computes its outgoing message to node i according to (15) and (17)  Finally, the clock skew and offset estimation can be computed by (16) Fig. 6 i→j = pij →ξ j and j→i = pij →ξ i are the BP messages exchanged between physical nodes in practice

Pairwise offset and skew estimation
In pairwise synchronization, one node is assumed to be the MN. In particular, in Fig. 3, instead of a global reference c(t) = t, we take node j as MN. We can then introduce the transformations For the sake of simplicity, as done in [28], we assume d ij = d ij , R k ij = R k ij , and T k ij = T k ij . This is valid owing to γ j ≈ 1 and the value of d ij + T k ij and d ij − R k ij being low. Finally, (2), (3) and (4) Fig. 7). Similar to (7) the pdf corresponding to the kth state can be written as . Following the steps explained in "Appendix", (29) can be simplified to The term p(ξ k i |c 1:k−1 ij ) is known as prediction step, while the term p(c k ij |ξ k i ) is referred to as measurement update or correction step [29]. Considering the clock properties discussed in Sect. 2.1, it is typical in wireless networks to assume that ξ k i is Gaussian distributed [3,9,28]. Given this assumption, in the sequel, we show that the relation between the states is linear, implying that the marginal in (30) is also Gaussian distributed.

Prediction
Assuming constant skew in one synchronization period ( = K rounds of time-stamp exchange), a reasonable prediction for ξ k i is given by [11] where A =

Correction
To obtain the correction term in (30), we conduct the following mathematical manipulations. Subtracting (26) from (27) leads to while weighted sum of (26)-(28) gives (29)  T ij 2 + σ 2 R ij and 2σ 2 T ij , , respectively. This is straightforward to observe since they are linear subtraction of independent random processes. Alternatively, we can write (33) and (34) in matrix form as

Consequently,
where µ c = B −1 ij r ij and c = B −1 ij R ij B −T ij .

Estimation
Considering (32) Fig. 8 A summary of the BRF-based pairwise synchronization and joint sync&loc. The satellite information is used to initialize the joint sync&loc algorithm described in Sect. 5 Figure 8 summarizes this recursive process.

Hybrid synchronization
Given Sects. 3.1 and 3.2, to ensure a low end-to-end synchronization error at the global level, BP can be run over the backhaul network. At the same time, we can employ the BRF algorithm to perform synchronization between the backhaul nodes and the APs at the edge of the network where fast and frequent synchronization is required to keep the relative time error small. This is, in particular, crucial to a number of applications such as localization as will be discussed in Sect. 5.
The steps of the hybrid synchronization are described in algorithm 1. Firstly, step 1 determines the network sections suitable for BP and BRF (they are labeled as BP-nodes and BRF-nodes, respectively). Then, step 2 initiates the time-stamp exchange mechanism (Fig. 3) and, correspondingly, the BRF algorithm at BRF-nodes.
Step 3 triggers the time-stamp exchange among the BP-nodes, thereby collecting the required time-stamps to construct the matrices W ji and W ij . Step 4 is where the BP iterations commence and continue until it converges, or the maximum number of iterations L is achieved. In step 5, the outgoing messages are computed by each BP-node using (15) and (17). They are then sent to their corresponding nodes. Step 6 updates each node's belief. Lastly, in steps 7-10, we check for the convergence by comparing the difference between clock offset and skew estimation in iterations (l) and (l − 1) with a predefined small value ǫ . If the algorithm is converged, the clock offset and skew estimation are calculated by means of (18) and (21), respectively. Note that step 2 and steps 3-11 can run simultaneously.

Convergence analysis
Convergence of hybrid synchronization algorithm depends on the behavior of BRF, and BP. In particular, at the edge of the network where we aim to locally synchronize the APs using BRF the convergence is of no meaning. Nevertheless, as a measure to evaluate the estimator's performance, given the set of linear equations presented in Sect. 3.2, we can refer to BRF with Gaussian parameters as minimum variance unbiased estimator [30].
Given that, convergence of Algorithm 1 depends solely on the convergence of BP which is of crucial importance for global synchronization.
While it is known that BP converges to the exact marginal on loop-free FGs, its convergence on loopy FGs is highly conditional. In the context of clock synchronization, detailed convergence analysis of loopy BP has been conducted in [7,9,10,31]. For the set of message passing formulas presented in this paper, we can leverage on [10, Lemma 1] and [10,Lemma 2] to prove that the mean vector η

Simulation results and discussion
In this section, we evaluate the performance of the hybrid synchronization algorithm proposed in this work. Detailed analysis of its impact on the achievable performance of the joint sync&loc algorithm at the edge of the network is left to the next section. Figure 4 exemplifies a wireless network where the algorithm proposed in this work can be applied. It comprises a number of APs, all backhauled by a wireless mesh network and delivering services to MUs. The following scenarios are simulated: a) synchronizing the whole network using only BP (the APs in Fig. 4 are assumed to be variable nodes connected to the mesh network via factor nodes), b) performing hybrid synchronization as described in Algorithm 1, where we synchronize the mesh backhaul network by means of BP and the APs at the edge of the network using BRF, and c) carrying out synchronization across the rounds of time-stamp exchange K. Scenario (a) is considered as the   Table 1. Figure 9 shows the RMSEs of the clock offset and skew estimation versus the number of message passing iterations for scenario (a). The RMSEs of offset and skew are indicated in nanosecond (ns) and parts per million (ppm), respectively. As can be observed, BP converges after four iterations and achieves an offset and skew RMSE below 7 ns and 0.2 ppm, respectively. As shown in [7,14], when there exist at least one MN in the network, the convergence is guaranteed. However, the value to which BP converges in loopy networks is deemed to be approximate. Note that, although this simulation setup reveals the potential performance of BP, the nodes, and particularly the APs, must wait at least four message passing iterations in addition to K rounds of time-stamp exchange (required for obtaining the conditional probabilities) to be fully synchronized. This is particularly unfavorable in certain synchronization-based services such as localization, where continuous time alignment is essential for accurate estimation of the MUs' positions. Therefore, it is necessary that the APs synchronize themselves to the backhaul network more frequently to be able to deliver those services at an increased performance as required in 5G networks. Figure 10 depicts the RMSEs of the clock offset and skew estimation versus the number of message passing and BRF iterations for scenario (b). We can observe a slight deterioration in performance (RMSE increases by 2-3 ns for the offset and 0.5-0.6 ppm for the skew) compared to scenario (a). In fact, this is the cost of economizing on the number of BP iterations as well as rounds of time-stamp exchange. To clarify, BP commences  only when the nodes have already conducted K rounds of time-stamp exchange (to construct W ij and W ji ). Even then, it takes four iterations, or n if there are n nodes between an AP and the MN, to estimate the clock parameters and correspondingly perform synchronization. Conversely, BRF is faster, as it is directly applied after each round of time-stamp exchange and runs independently (does not need any information from the other network sections as BP does). Therefore, it can conduct more iterations, thereby continuously fulfilling the requirement of very low relative time error on a local level. Given the above-mentioned properties for BP and BRF, the hybrid approach sacrifices a fraction of global accuracy to rapidly achieve synchronization at a local level, which is crucial to a number of applications such as MU localization. Figure 11 presents the RMSEs of the clock offset and skew estimation versus the rounds of time-stamp exchange K. As can be observed, the RMSEs of both offset and skew estimation decrease as K grows, indicating that the higher number of time-stamp exchanges leads to a more accurate estimation. The gradient is, however, slightly smaller for the APs owing to the fact that their RMSEs comprises two components, i.e., the synchronization error of the backhaul mesh network and the error arising when synchronizing APs with their corresponding backhaul nodes. Although the former decreases as K grows, the latter remains constant resulting in a slower decline of RMSEs of clock offset and skew estimation at the APs.

Network synchronization
We note that the network in Fig. 9 is only a random example picked to lucidly convey the fundamental concepts of hybrid synchronization introduced in this work. The intuitions obtained from above simulations are still valid even if we replace the network by any other network with arbitrary size. Nevertheless, while the size of the network, in particular the backhaul network, does not play a role when locally synchronizing adjacent APs, it can prolong the time of convergence for BP depending on the number of nodes between node i and the MN.

Impact of hybrid synchronization on localization
To evaluate the impact of hybrid synchronization on the localization accuracy, we draw on the idea of joint synchronization and localization (sync&loc) introduced in [22]. In particular, in this section we focus on the edge of the communication network, as shown in Fig. 12, where the APs, on one hand, synchronize themselves with the backhaul nodes, i.e., the serving Base Stations (BSs). On the other hand, they perform joint sync&loc by exchanging time-stamps with MUs to which they have Line-of-Sight (LoS) connection (Fig. 12). Each MU i is assumed to exchange time-stamps with two APs, i.e., j and l. 2 In the sequel, we briefly describe the principles of joint sync&loc.

Joint MU synchronization and localization
The principles of Bayesian joint sync&loc are akin to those described in (30), (31), and (35). To incorporate the location estimation into the algorithm, we need to redefine ξ i as where x i /v x i and y i /v y i denote the position/velocity of the MU i on the x and y axes, respectively. In particular, location-related parameters appear when expanding the propagation delay d ij (or d ji ) as where x j and y j represent the known position of the jth AP on the x and y axes, respectively. Furthermore, each AP is assumed to be equipped with an N-element Uniform Linear Array (ULA) antenna and is able to perform Angle of Arrival (AoA) estimation in each round of time-stamp exchange. This estimation is given by where ϕ ij denotes the true AoA and n ϕ ∼ N (0, σ 2 ϕ ) is the zero mean Gaussian noise stemming from the AoA estimation algorithm. For the sake of simplicity, in our simulations we rely on the Cramer-Rao Bound (CRB) of AoA estimation, derived in [32], to calculate σ ϕ while ϕ ij is computed knowing the exact trajectory of MU i and the location of AP j. Moreover, the nonlinearity in (41) and (42) is dealt with by resorting to Taylor expansion. The details of the approximation can be found in (43)-(46), where v c represents the speed of light and ( x k i , y k i ) denotes the position of MU i predicted by the prediction step in the kth round of joint sync&loc. We note that similar set of equations, i.e., (41)-(45), and (46), can be written for the second AP serving MU i, i.e., AP l.
Given the new prediction and measurement equations, it is clear that A, Q n , in (31) and B ij , r ij , R ij in (35) require adjustment to account for the location parameters added to ξ i . The former can readily be updated using motion equations [22,33], while the latter is adapted with the aid of time-stamp exchange and AoA measurements. The adjusted matrices and vectors are given in (47)-(49), where represents the time between two consecutive rounds of joint sync&loc and c i (t) j /c i (t) l denotes the time-stamp of MU i when exchanged with AP j/l (this is to distinguish the time-stamps sent by MU i to its two serving APs).

Performance analysis
We perform our analysis for the pedestrian scenario shown in Fig. 12. In particular, in this scenario, a MU (the pedestrian) moves with a constant velocity of 2 m/s ( ≈ 7 km/h) and takes the turns randomly until it exits the map. During its journey, we assume that the MU exchanges time-stamps with two APs to which it has LoS connection. Each AP performs AoA estimation as well. Both time-stamps and the AoA estimations are then combined by means of the joint sync&loc algorithm, described in Sect. 5 and depicted in Fig. 8, to estimate the vector variable ξ i . The simulation is conducted for two cases: (1) the APs synchronize themselves with the backhaul network by means of BP, corresponding to scenario (a) in Sect. 4.1, and (2) using the hybrid approach proposed in this work, corresponding to scenario (b) in Sect. 4.1. Figure 13 depicts the RMSEs of the clock offset and position estimation of the MU versus the uncertainty in time-stamping. As can be noticed, both RMSEs increase with growth of σ T . Although generally the RMSEs for (b) are larger, the difference in the rate of growth appears to be small, i.e., 0.05 m/ns for the RMSE of position and 0.3 for that of clock offset. This is a negligible cost at which we reduce the complexity of the algorithm by one BP iteration and K rounds of time-stamp exchange. Furthermore, the APs at the edge are able to perform localization immediately without waiting for the BP iterations. Consequently, as shown in Fig. 13, with only 3 more iterations of BRF, the gap between RMSEs can be halved (dotted curves lie in the middle of the solid ones).
(43) d ij v c ≈ a k j,0 + a k j,x (x i − x k i ) + a k j,y (y i − y k i ), arctan( r 3 = c j (t k 6 ) − a k j,0 + a k j,x x k i + a k j,y y k i +c l (t k 1 ) + c l (t k 3 ) + 2c l (t k 6 ) − c l (t k 6 ) − a k l,0 + a k l,x x k l + a k l,y y k i r 4 = ϕ k j − b k j,0 + b k j,x x k i + b k j,y y k i r 5 = ϕ k l − b k l,0 + b k l,x x k i + b k l,y y k i (49)