Joint communication and positioning based on soft channel parameter estimation

A joint communication and positioning system based on maximum-likelihood channel parameter estimation is proposed. The parameters of the physical channel, needed for positioning, and the channel coefficients of the equivalent discrete-time channel model, needed for communication, are estimated jointly using a priori information about pulse shaping and receive filtering. The paper focusses on the positioning part of the system. It is investigated how soft information for the parameter estimates can be obtained. On the basis of confidence regions, two methods for obtaining soft information are proposed. The accuracy of these approximative methods depends on the nonlinearity of the parameter estimation problem, which is analyzed by so-called curvature measures. The performance of the two methods is investigated by means of Monte Carlo simulations. The results are compared with the Cramer-Rao lower bound. It is shown that soft information aids the positioning. Negative effects caused by multipath propagation can be mitigated significantly even without oversampling.


Introduction
Interest in joint communication and positioning is steadily increasing [1]. Synergetic effects like improved resource allocation and new applications like location-based services or a precise location determination of emergency calls are attractive features of joint communication and positioning. Since the system requirements of communication and positioning are quite different, it is a challenging task to combine them: Communication aims at high data rates with little training overhead. Only the channel coefficients of the equivalent discrete-time channel model, which includes pulse shaping and receive filtering in addition to the physical channel, need to be estimated for data detection. In contrast, positioning aims at precise position estimates. Therefore, parameters of the physical channel like the time of arrival (TOA) or the angle of arrival (AOA) need to be estimated as accurately as possible [2,3]. Significant training is typically spent for this purpose.
In this paper, a joint communication and positioning system based on maximum-likelihood channel parameter estimation is suggested [4]. The estimator exploits the fact that channel and parameter estimation are closely related. The parameters of the physical channel and the channel coefficients of the equivalent discrete-time channel model are estimated jointly by utilizing a priori information about pulse shaping and receive filtering. Hence, training symbols that are included in the data burst aid both communication and positioning.
On the one hand, in [5][6][7], it is proposed to use a priori information about pulse shaping and receive filtering in order to improve the estimates of the equivalent discrete-time channel model. However, the information about the physical channel is neglected in these publications. On the other hand, channel sounding is performed in order to estimate the parameters of the physical channel [8][9][10]. But, to the authors' best knowledge, the proposed parameter estimation methods are not applied for estimation of the equivalent discrete-time channel model. The estimator proposed in this paper combines both approaches: Channel estimation is mandatory for communication purposes. By exploiting a priori information about pulse shaping and receive filtering, the channel coefficients can be estimated more precisely and positioning is enabled. Hence, synergy is created.
This paper focusses on the positioning part of the proposed joint communication and positioning system. Most positioning methods suffer from a bias introduced by multipath propagation. Multipath mitigation is, thus, an important issue. The proposed channel parameter estimator performs multipath mitigation in two ways: First, the maximum-likelihood estimator is able to take all relevant multipath components into account in order to minimize the modeling error. Second, soft information can be obtained for the parameter estimates. Soft information corresponds to the variance of an estimate and is a measure of reliability. This information can be exploited by a weighted positioning algorithm in order to improve the accuracy of the position estimate.
On the basis of confidence regions, two different methods for obtaining soft information are proposed: The first method is based on a linearization of the nonlinear parameter estimation problem, and the second method is based on the likelihood concept. For linear estimation problems, an exact covariance matrix can be determined in closed form. For nonlinear estimation problems, as is the case for channel parameter estimation, there are different approximations to the covariance matrix, which are based on a linearization. These approximate covariance matrices are generated by most nonlinear least-squares solvers (e.g., the Levenberg-Marquardt method) anyway and can be used after further analysis [11]. Confidence regions based on the likelihood method are more robust than those based on approximate covariance matrices since they do not rely on a linearization, but they are also more complex to calculate. Heuristic optimization methods like genetic algorithms or particle swarm optimization offer a convenient procedure to determine the likelihood confidence region as demonstrated in [12]. Both methods are only approximate, and their accuracy depends on the nonlinearity of the estimation problem. In [13], Bates and Watts introduce curvature measures that indicate the amount of nonlinearity. These measures can be used to diagnose the accuracy of the proposed methods.
The remainder of this paper is organized as follows: The system and channel model is described in Section 2. The relationship between channel and parameter estimation is explained and the nonlinear metric of the maximum-likelihood estimator is derived. General aspects concerning nonlinear optimization are discussed. In Section 3, the concept of soft information is introduced. Based on confidence regions, two methods for obtaining soft information concerning the parameter estimates are proposed. In order to further analyze the proposed methods, the curvature measures of Bates and Watts are introduced in Section 4. The curvature measures are calculated for the parameter estimation problem and a first analysis of the problem is given. Afterward, positioning based on the TOA is explained in Section 5, and the performance of the two soft information methods is investigated by means of Monte Carlo simulations. The results are compared with the Cramer-Rao lower bound. Finally, conclusions are drawn in Section 6.

System and channel model
Throughout this paper, the discrete-time complex baseband notation is used. Let $x[k]$ denote the $k$th modulated and coded symbol of a data burst of length $K$. Some symbols $x[k]$ are known at the receiver side ("training symbols"), whereas others are not known ("data symbols"). It is assumed that data and training symbols can be separated perfectly at the receiver side. The received sample $y[k]$ at time index $k$ can be written as
$$y[k] = \sum_{l=0}^{L} h_l[k]\, x[k-l] + n[k], \qquad (1)$$
where $h_l[k]$ is the $l$th channel coefficient of the equivalent discrete-time channel model with effective channel memory length $L$, and $n[k]$ is a Gaussian noise sample with zero mean and variance $\sigma_n^2$. The noise process is assumed to be white. In Figure 1, the relationship between the physical channel and the equivalent discrete-time channel model is shown. The input/output behavior of the continuous-time channel is exactly represented by the equivalent discrete-time channel model, which is described by an FIR filter with coefficients $h_l[k]$. The delay elements $z^{-1}$ correspond to the sampling rate $1/T$. In this paper, only symbol-rate sampling $T = T_s$ is considered, where $T_s$ is the symbol duration.^a The channel coefficients $h_l[k]$ are samples of the overall impulse response of the continuous-time channel. This impulse response is given by the convolution of the known pulse shaping filter $g_{Tx}(\tau)$, the unknown physical channel $c(\tau, t)$, and the known receive filter $g_{Rx}(\tau)$. Since the convolution is associative and commutative, pulse shaping and receive filtering can be combined: $g(\tau) = g_{Tx}(\tau) * g_{Rx}(\tau)$, where $*$ denotes the convolution.
The physical channel can be modeled by a weighted sum of delayed Dirac impulses:
$$c(\tau, t) = \sum_{\mu=1}^{M} f_\mu(t)\, \delta(\tau - \tau_\mu(t)), \qquad (2)$$
where $M$ is the number of resolvable propagation paths. The parameters $f_\mu(t)$ and $\tau_\mu(t)$ denote the complex amplitude and the propagation delay of the $\mu$th path at time $t$, respectively. Without loss of generality, it is assumed that the multipath components are sorted according to ascending delay: $\tau_1(t) < \tau_2(t) < \cdots < \tau_M(t)$. The delay of the first arriving path is called TOA. Positioning is based on the assumption that the TOA corresponds to the distance between transmitter and receiver. This is only true if a line-of-sight (LOS) path exists. In urban or indoor environments, the LOS path is often blocked. In these so-called non-LOS (NLOS) scenarios, the modeling error reduces the positioning accuracy significantly. Additionally, positioning typically suffers from a bias introduced by multipath propagation even if a LOS path exists. In order to analyze the multipath mitigation ability of the proposed soft channel parameter estimator, this paper restricts itself to LOS scenarios. However, the influence of NLOS is discussed in Section 5.2.
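Given $c(\tau, t)$ and $g(\tau)$, the overall channel impulse response $h(\tau, t)$ can be written as
$$h(\tau, t) = c(\tau, t) * g(\tau) = \sum_{\mu=1}^{M} f_\mu(t)\, g(\tau - \tau_\mu(t)). \qquad (3)$$
After symbol-rate sampling of (3) at $\tau = lT_s$ and $t = kT_s$, the channel coefficients can be represented as
$$h_l[k] = h(lT_s, kT_s) = \sum_{\mu=1}^{M} f_\mu(kT_s)\, g(lT_s - \tau_\mu(kT_s)). \qquad (4)$$
In the following, it is assumed that the channel is quasi time-invariant over the training length (block fading). Thus, the time index $k$ in (4) can be omitted.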
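The mapping from physical path parameters to discrete-time channel coefficients can be sketched in a few lines. The following is a minimal illustration, assuming block fading (no time index) and a Gaussian combined pulse $g(\tau) \sim \exp(-(\tau/T_s)^2)$ as an example; the function names and the example path values are hypothetical and not taken from the paper.

```python
import math

def gaussian_pulse(tau, t_s=1.0):
    """Example combined pulse g(tau) ~ exp(-(tau/T_s)^2) (assumed for illustration)."""
    return math.exp(-(tau / t_s) ** 2)

def channel_coefficients(paths, mem_len, t_s=1.0, pulse=gaussian_pulse):
    """Evaluate h_l = sum_mu f_mu * g(l*T_s - tau_mu) for l = 0..L.

    paths   : list of (f_mu, tau_mu), complex amplitude and delay of each path
    mem_len : channel memory length L, so L + 1 coefficients are returned
    """
    return [sum(f * pulse(l * t_s - tau, t_s) for f, tau in paths)
            for l in range(mem_len + 1)]

# Hypothetical two-path channel with an excess delay of 0.81 symbol durations.
h = channel_coefficients([(1.0 + 0.0j, 2.0), (0.5j, 2.81)], mem_len=10)
```

Because the pulse is known, each coefficient depends only on the three real parameters per path (real part, imaginary part, delay), which is what makes joint estimation possible.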
For simulation of communication systems, it is sufficient to consider excess delays. Without loss of generality, $\tau_1 = 0$ can be assumed then. The effective channel memory length $L$ is, therefore, determined by the excess delay $\tau_M - \tau_1$ plus the effective width $T_g$ of $g(\tau)$.
In case of positioning based on the TOA, however, it is important to take into account that $\tau_1 = d/c$, where $d$ is the distance between transmitter and receiver and $c$ is the speed of light. Denoting the maximum possible delay by $\tau_M^{\max}$, the maximum possible channel memory length can be pre-calculated according to (5). This channel memory length covers all possible propagation scenarios including the worst case. Hence, the channel impulse response is embedded in a sequence of zeros as shown in Figure 2.

Channel parameter estimation
Channel estimation is mandatory for data detection. Typically, training symbols are inserted in the data burst for estimation of the equivalent discrete-time channel model. If the channel is quasi time-invariant over the training sequence (block fading), least-squares channel estimation (LSCE) can be applied. In this paper, a training preamble of length $K_t$ is assumed. For the interval $L \le k \le K_t - 1$, the received samples according to (1) can be expressed in vector/matrix notation as
$$y = Xh + n, \qquad (6)$$
where $X$ is the training matrix with Toeplitz structure, $h = [h_0, \ldots, h_L]^T$ is the channel coefficient vector, and $n$ is a zero-mean Gaussian noise vector with covariance matrix $C_n = \sigma_n^2 I$. The least-squares channel estimates are given by
$$\check{h} = (X^H X)^{-1} X^H y = h + \varepsilon. \qquad (7)$$
Using the assumptions above, the estimation error $\varepsilon$ is zero mean and Gaussian with covariance matrix $C_\varepsilon = \sigma_n^2 (X^H X)^{-1}$ [14]. For a pseudo-random training sequence, the matrix $X^H X$ becomes a scaled identity matrix with scaling factor $K_t - L$, and the covariance matrix of the estimation error reduces to
$$C_\varepsilon = \frac{\sigma_n^2}{K_t - L}\, I.$$
The main idea of joining communication and positioning is based on the relationship in (4). If the parameters of the physical channel are stacked into a vector $\theta = [\mathrm{Re}\{f_1\}, \mathrm{Im}\{f_1\}, \tau_1, \mathrm{Re}\{f_2\}, \ldots, \tau_M]^T$, (4) can be rewritten as
$$h_l(\theta) = \sum_{\mu=1}^{M} f_\mu\, g(lT_s - \tau_\mu), \qquad 0 \le l \le L. \qquad (8)$$
The parameters $\theta$ can be estimated by fitting the model function (8) to the least-squares channel estimates $\check{h}_l$. Hence, the channel estimates are not only used for data detection, but they are also exploited for positioning. Furthermore, refined channel estimates $\hat{h}_l$ are obtained by evaluating (8) for the parameter estimate $\hat{\theta}$ [4].^b On the one hand, positioning is enabled since the TOA $\tau_1$ is estimated. On the other hand, data detection can be improved because refined channel estimates are obtained.
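As an illustration of least-squares channel estimation from a training preamble, the following sketch builds the Toeplitz training matrix for the interval $L \le k \le K_t - 1$ and solves the normal equations. It is a minimal pure-Python example under assumed values (a short BPSK preamble, a hypothetical four-tap channel, no noise); the helper names are not from the paper.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * w for u, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ls_channel_estimate(x, y, mem_len):
    """Least-squares estimate (X^H X)^-1 X^H y from samples L <= k <= K_t - 1."""
    k_t = len(x)
    # Toeplitz training matrix: row for time k holds x[k], x[k-1], ..., x[k-L].
    rows = [[x[k - l] for l in range(mem_len + 1)] for k in range(mem_len, k_t)]
    obs = y[mem_len:k_t]
    n = mem_len + 1
    gram = [[sum(r[i].conjugate() * r[j] for r in rows) for j in range(n)]
            for i in range(n)]
    rhs = [sum(r[i].conjugate() * v for r, v in zip(rows, obs)) for i in range(n)]
    return solve_linear(gram, rhs)

# Hypothetical noise-free example with a pseudo-random BPSK preamble.
random.seed(0)
L, K_t = 3, 64
x = [complex(random.choice((-1, 1))) for _ in range(K_t)]
h_true = [0.1 + 0.0j, 1.0 + 0.5j, -0.3j, 0.05 + 0.0j]
y = [sum(h_true[l] * x[k - l] for l in range(L + 1) if k - l >= 0)
     for k in range(K_t)]
h_ls = ls_channel_estimate(x, y, L)
```

With noise added to `y`, the entries of `h_ls` would scatter around the true coefficients with a variance of roughly $\sigma_n^2/(K_t - L)$ when the training sequence is close to pseudo-random.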
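The maximum-likelihood estimate $\hat{\theta}$ is given by the set $\theta$ that maximizes the likelihood function [14]
$$p(\check{h}; \theta) \propto \exp\!\left(-(\check{h} - h(\theta))^H C_\varepsilon^{-1} (\check{h} - h(\theta))\right), \qquad (9)$$
where $\check{h}$ denotes the least-squares channel estimates and $C_\varepsilon$ the covariance matrix of their estimation error. For LSCE with pseudo-random training, this is equivalent to maximizing the likelihood function
$$p(y; \theta) \propto \exp\!\left(-\frac{\lVert y - X h(\theta) \rVert^2}{\sigma_n^2}\right) \qquad (10)$$
with respect to $\theta$. The second approach in (10) may seem more natural to some readers since the parameters are estimated directly from the received samples. But since both approaches are equivalent, as proven in the "Appendix", it seems more convenient to the authors to apply the first approach: Channel estimates are usually already available in communication systems, and the metric derived from (9) is less complex than the metric derived from (10). Hence, only the first approach is considered in the following.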
Since the noise is assumed to be Gaussian, the maximum-likelihood estimator corresponds to the least-squares estimator:
$$\hat{\theta} = \arg\min_{\theta} \Omega(\theta), \qquad \Omega(\theta) = \sum_{l=0}^{L} \left| \check{h}_l - h_l(\theta) \right|^2, \qquad (11)$$
where $\check{h}_l$ are the least-squares channel estimates. The minimization of the metric $\Omega(\theta)$ in (11) cannot be solved in closed form since $\Omega(\theta)$ is nonlinear. An optimization method has to be applied. In order to choose a suitable optimization method to find $\hat{\theta}$, different system aspects have to be taken into account, and a tradeoff depending on the requirements has to be found. The goal is to find the global minimum of $\Omega(\theta)$.
Unfortunately, $\Omega(\theta)$ has many local minima due to the superposition of random multipath components. Consequently, the optimization method of choice should be either a global optimization method or a local optimization method in combination with a good initial guess, i.e., an initial guess that is sufficiently close to the global optimum. Both choices involve different benefits and drawbacks. To find a good initial guess is difficult and, therefore, may be seen as a drawback itself. But in case a priori knowledge in form of a good initial guess is available, a search in the complete search space would be unnecessary. For channel parameter estimation, it is suggested to divide the problem into an acquisition and a tracking phase. In the acquisition phase, a global optimization method is applied, and in the tracking phase, the parameter estimate of the last data burst may be used as an initial guess for a local optimization method. This is suitable for channels that do not change too rapidly from data burst to data burst. In this paper, particle swarm optimization (PSO) [15][16][17] is suggested for the acquisition phase, and the Levenberg-Marquardt method (LMM) [18,19] is proposed for the tracking phase.
PSO is a heuristic optimization method that is able to find the global optimum without an initial guess and without gradient information. PSO is easy to implement because only function evaluations have to be performed. So-called particles move randomly through the search space and are attracted by good fitness values $\Omega(\theta)$ in their past and of their neighbors. In this way, the particles explore the search space and are able to find the global optimum. It is a drawback that PSO does not assure global convergence. There is a certain probability (depending on the signal-to-noise ratio) that PSO converges prematurely to a local optimum (outage). Furthermore, PSO is sometimes criticized because many iterations are performed in comparison to gradient-based optimization algorithms.
The LMM belongs to the standard nonlinear least-squares solvers and relies on a good initial guess. The gradient of the metric has to be supplied by the user. For the LMM, convergence to the optimum in the neighborhood of the initial guess is assured. Second derivative information is used to speed up convergence: The LMM varies smoothly between the inverse-Hessian method and the steepest descent method depending on the topology of the metric [18]. Furthermore, an approximation to the covariance matrix of the parameter estimates is calculated inherently by the LMM. The LMM is designed for small residual problems. For large residual problems (at low signal-to-noise ratio), it may fail (outage).
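To make the acquisition phase more concrete, the following is a minimal sketch of a global-best PSO applied to a single-path version of the least-squares metric. It is not the PSO variant of [16]: the inertia and attraction constants, the particle count, the search bounds, and the Gaussian pulse are all assumptions chosen for illustration. Every evaluated parameter set is stored with its fitness in a table, since such a table can later be reused to build a likelihood confidence region.

```python
import math
import random

def omega(theta, h_obs, t_s=1.0):
    """Single-path metric: sum_l |h_obs[l] - f*g(l*T_s - tau)|^2, theta = [Re f, Im f, tau]."""
    f = complex(theta[0], theta[1])
    tau = theta[2]
    return sum(abs(h - f * math.exp(-((l * t_s - tau) / t_s) ** 2)) ** 2
               for l, h in enumerate(h_obs))

def pso_minimize(metric, bounds, n_particles=30, n_iter=200,
                 inertia=0.7, c_own=1.5, c_swarm=1.5, seed=1):
    """Global-best PSO; returns best position, best fitness, evaluation table, history."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    fit = [metric(p) for p in pos]
    table = [(tuple(p), f) for p, f in zip(pos, fit)]
    best_pos = [p[:] for p in pos]
    best_fit = fit[:]
    g = min(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    history = [g_fit]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                # Attraction towards the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_own * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c_swarm * rng.random() * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = metric(pos[i])
            table.append((tuple(pos[i]), f))
            if f < best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], f
                if f < g_fit:
                    g_pos, g_fit = pos[i][:], f
        history.append(g_fit)
    return g_pos, g_fit, table, history

# Hypothetical noise-free observations of one path with f = 0.8 - 0.3j, tau = 2.4 T_s.
h_obs = [(0.8 - 0.3j) * math.exp(-(l - 2.4) ** 2) for l in range(11)]
theta_hat, fit_hat, table, history = pso_minimize(
    lambda th: omega(th, h_obs), bounds=[(-2.0, 2.0), (-2.0, 2.0), (0.0, 10.0)])
```

Only function evaluations are needed, which is why no gradient of the metric appears anywhere in the sketch. As noted in the text, such a search can still stop at a local optimum, so the returned fitness should be inspected before the estimate is trusted.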

Definition of soft information
The concept of soft information is already widely applied: In the area of communication, soft information is used for decoding, detection, and equalization. In the field of navigation, soft information is exploited for sensor fusion [20]. This paper aims at obtaining soft information for the parameter estimates in order to improve the positioning accuracy before sensor fusion is applied.
Soft information is a measure of reliability of the (hard) estimates. The intention is to determine the a posteriori distribution of the estimates. Hence, the (hard) estimate is the mean of the distribution, and the soft information corresponds to the variance of the distribution. For linear estimation problems with known noise covariance matrix, the a posteriori distribution of the estimates can be determined in closed form [14]. If the noise is Gaussian distributed, the estimator is, furthermore, a minimum variance unbiased (MVU) estimator. However, only few problems are linear. A popular estimator for more general problems is the maximum-likelihood estimator as already described in Section 2.2 for channel parameter estimation. The maximum-likelihood estimator is asymptotically (for a large number of observations or at a high signal-to-noise ratio) unbiased and efficient [14]. Furthermore, an asymptotic a posteriori distribution can be determined. For Gaussian noise with covariance matrix $C = \sigma^2 I$, the asymptotic covariance matrix of the estimates is given by the inverse of the Fisher information matrix evaluated at the true parameters [14]. The parameter estimate $\hat{\theta}$ given by (11) is asymptotically distributed as follows:
$$\hat{\theta} \overset{a}{\sim} \mathcal{N}\!\left(\theta, I^{-1}(\theta)\right), \qquad (12)$$
where $I(\theta)$ is the Fisher information matrix with entries
$$[I(\theta)]_{ij} = \frac{2}{\sigma^2}\, \mathrm{Re}\!\left\{ \sum_{l=0}^{L} \left(\frac{\partial h_l(\theta)}{\partial \theta_i}\right)^{\!\star} \frac{\partial h_l(\theta)}{\partial \theta_j} \right\}, \qquad (13)$$
in which the star $\star$ denotes the complex conjugate. Given the Jacobian matrix of (8),
$$J(\theta) = \frac{\partial h(\theta)}{\partial \theta^T}, \qquad [J(\theta)]_{lj} = \frac{\partial h_l(\theta)}{\partial \theta_j}, \qquad (14)$$
the Fisher information matrix can be written as
$$I(\theta) = \frac{2}{\sigma^2}\, \mathrm{Re}\!\left\{ J^H(\theta) J(\theta) \right\}, \qquad (15)$$
and the asymptotic covariance matrix as
$$C_{asymp} = I^{-1}(\theta) = \frac{\sigma^2}{2} \left( \mathrm{Re}\!\left\{ J^H(\theta) J(\theta) \right\} \right)^{-1}. \qquad (16)$$
The variance of parameter $\theta_m$ is given by the $m$th diagonal entry of the asymptotic covariance matrix: $\sigma_{\theta_m}^2 = [C_{asymp}]_{mm}$. In general, the true value of the parameters is not known. Therefore, the asymptotic covariance matrix cannot be determined and an approximation has to be found. Different approximate covariance matrices are given in the literature that should be used with caution since the approximation may be very poor [11,21].
In the following section, a short description of confidence regions is included because they are closely related to soft information: Some of the confidence regions rely on the approximate covariance matrices mentioned above.

Confidence regions
In [11], Donaldson and Schnabel investigate different methods to construct confidence regions and confidence intervals. Confidence regions and intervals are closely related to soft information since they also indicate reliability: The estimated parameters $\hat{\theta}$ do not coincide with the true parameters $\theta$ because of the measurement noise. A confidence region indicates the area around the estimated parameters in which the true parameters lie with a specific probability. This probability is called the confidence level and is often expressed as a percentage. A commonly used confidence level is 95%.
For linear problems with Gaussian noise, the confidence regions are elliptical and can be determined exactly by the covariance matrix $C_{linear}$, which can be computed in closed form [14]. The linear confidence region consists of all parameter vectors $\theta$ that satisfy the following formula:
$$(\theta - \hat{\theta})^T C_{linear}^{-1} (\theta - \hat{\theta}) \le P\, F_{P,N-P}^{1-\alpha}, \qquad (17)$$
in which $P = 3M$ is the number of parameters, $N = L + 1$ is the number of observations, $1 - \alpha$ is the confidence level, and $F_{P,N-P}^{1-\alpha}$ is the corresponding quantile of the Fisher distribution with $P$ and $N - P$ degrees of freedom. According to [11], the most common method to determine a confidence region for a nonlinear problem consists of the linearization of the problem in order to obtain an approximate covariance matrix. In this paper, the following approximate covariance matrix is applied:^c
$$C_{approx} = C_{asymp} \Big|_{\theta = \hat{\theta},\ \sigma^2 = s^2}. \qquad (18)$$
The only difference between $C_{approx}$ in (18) and $C_{asymp}$ in (16) is that the Jacobian matrix is evaluated at the parameter estimate $\hat{\theta}$ instead of the true parameter $\theta$ and that the variance $\sigma^2$ is estimated by the residual variance $s^2 = \Omega(\hat{\theta})/(N - P)$. When $C_{linear}$ in (17) is replaced by $C_{approx}$ in (18), an approximate confidence region for a nonlinear problem is obtained as
$$(\theta - \hat{\theta})^T C_{approx}^{-1} (\theta - \hat{\theta}) \le P\, F_{P,N-P}^{1-\alpha}. \qquad (19)$$
On the one hand, the computational complexity is quite low and the results are very similar to the well-known linear case. On the other hand, the approximation can be very poor and should be used with caution [11,21]. Another (more complex) way to determine a confidence region is the likelihood method [11]: All parameter vectors $\theta$ that satisfy
$$\Omega(\theta) - \Omega(\hat{\theta}) \le s^2 P\, F_{P,N-P}^{1-\alpha} \qquad (20)$$
are included in the likelihood confidence region. This region does not have to be elliptical but can be of any form. The likelihood method is approximate for nonlinear problems as well but more precise and robust than the linearization method since it does not rely on linearization. There is an exact method, called the lack-of-fit method, that is neglected in this paper due to its high computational complexity and because the likelihood method is already a good approximation according to [11].
The accuracy of the linearization and the likelihood method strongly depends on the problem and on the parameters. Donaldson and Schnabel [11] suggest using the curvature measures of Bates and Watts [13], which are introduced in Section 4, as a diagnostic tool. With these measures, it can be evaluated whether the corresponding method is applicable or not.

Proposed methods to obtain soft information
After this excursion to confidence regions, the way of employing this knowledge for obtaining soft information is now discussed. The first and straightforward idea is to use the variances of the approximate covariance matrix C approx in (18). This method is simple, and many optimization algorithms like the LMM already compute and output C approx or similar versions of it. But without further analysis (see Sections 4 and 5), it is questionable whether this method is precise enough.
The second idea is based on the likelihood confidence regions. Generally, it is quite complex to generate the likelihood confidence region since many function evaluations have to be performed in the surrounding of the parameter estimates $\hat{\theta}$. However, heuristic optimization algorithms like PSO perform many function evaluations in the whole search space anyway, and therefore, they are well suited to determine the likelihood confidence region [12]. A drawback of heuristic algorithms (many function evaluations are required until convergence) is transformed into an advantage with respect to likelihood confidence regions. The procedure proposed in [12] is as follows: In every iteration, each particle determines its fitness $\Omega(\theta)$, which is stored with the corresponding parameter set $\theta$ in a table. After the optimum $\hat{\theta}$ with fitness $\Omega(\hat{\theta})$ is found, all parameter sets $\theta$ that fulfill
$$\Omega(\theta) - \Omega(\hat{\theta}) \le s^2 P\, F_{P,N-P}^{1-\alpha}$$
are selected from the table and form the likelihood confidence region. It can be observed that the density of points near the parameter estimate $\hat{\theta}$ is higher than at the border of the likelihood confidence region. The reason is that the particles are attracted by good fitness values near the optimum and oscillate in its neighborhood before convergence occurs. Hence, all selected points $\theta$ form a distribution with mean and variance, where the mean coincides with the parameter estimate $\hat{\theta}$. Therefore, the variance of this distribution can be used as soft information.
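The table-based procedure can be sketched as follows. This is a minimal illustration, not the implementation of [12]: the function name is hypothetical, the quantile of the Fisher distribution is assumed to be supplied by the caller (for example from a statistics table), and the spread is measured around the parameter estimate itself, which is an assumption motivated by the statement that the mean of the selected points coincides with the estimate.

```python
def likelihood_soft_information(table, theta_hat, omega_hat, n_obs, f_quantile):
    """Select the likelihood confidence region from a table and return per-parameter variances.

    table      : list of (theta, omega) pairs stored during the optimization
    theta_hat  : parameter estimate (the optimum found)
    omega_hat  : fitness at theta_hat
    n_obs      : number of observations N
    f_quantile : quantile of the F distribution with (P, N - P) degrees of freedom
    """
    n_par = len(theta_hat)
    s2 = omega_hat / (n_obs - n_par)                 # residual variance s^2
    threshold = omega_hat + s2 * n_par * f_quantile  # likelihood region bound
    region = [theta for theta, om in table if om <= threshold]
    variances = [sum((t[d] - theta_hat[d]) ** 2 for t in region) / len(region)
                 for d in range(n_par)]
    return region, variances

# Hypothetical one-parameter table; 4.96 approximates the 95% quantile for (1, 10).
demo_table = [((0.0,), 1.0), ((0.1,), 1.2), ((-0.1,), 1.2), ((1.0,), 5.0)]
region, soft = likelihood_soft_information(demo_table, (0.0,), 1.0, 11, 4.96)
```

In this toy table the point at 1.0 lies outside the region and is discarded, so only the three points close to the optimum contribute to the variance.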
In Section 5, the performance of both methods is evaluated and compared. Prior to that the curvature measures of Bates and Watts [13] are introduced for further analysis and understanding.

Introduction to curvature measures
In [13], Bates and Watts describe nonlinear least-squares estimation from a geometric point of view and introduce measures of nonlinearity. These measures indicate the applicability of a linearization and its effects on inference. Hence, the accuracy of the confidence regions described in Section 3 can be evaluated using these measures. In the following, the most important aspects of the so-called curvature measures are presented.
First, the nonlinear least-squares problem is reviewed: A set of parameters $\theta = [\theta_1, \ldots, \theta_P]^T$ shall be estimated from a set of observations $\check{h} = [\check{h}_0, \ldots, \check{h}_{N-1}]^T$ with
$$\check{h}_l = h_l(\theta) + \varepsilon_l,$$
where $h_l(\theta)$ is a nonlinear function of the parameters $\theta$ and $\varepsilon_l$ is additive zero mean measurement noise with variance $\sigma_\varepsilon^2$. The least-squares estimate is given by the value $\hat{\theta}$ that minimizes the sum of squares of residuals
$$S(\theta) = \sum_{l=0}^{N-1} \left| \check{h}_l - h_l(\theta) \right|^2, \qquad (25)$$
which corresponds to the metric of the maximum-likelihood estimator in the case of Gaussian measurement noise. The sum of squares in (25) can also be written as
$$S(\theta) = \lVert \check{h} - h(\theta) \rVert^2.$$
Since the function $h(\theta)$ is nonlinear, the solution locus will be a curved surface. For inference, the solution locus is approximated by a tangent plane with a uniform coordinate system. The tangent plane at a specific point $h(\theta_0)$ can be described by a first-order Taylor series
$$h(\theta) \approx h(\theta_0) + J(\theta_0)(\theta - \theta_0),$$
where $J(\theta_0)$ is the Jacobian matrix as defined in (14) evaluated at $\theta_0$. The informational value of inference concerning the parameter estimates highly depends on the closeness of the tangent plane to the solution locus. This closeness in turn depends on the curvature of the solution locus. Therefore, the measures of nonlinearity proposed by Bates and Watts indicate the maximum curvature of the solution locus at the specific point $h(\theta_0)$. It is important to note that there are two different kinds of curvatures since two different assumptions are made concerning the tangent plane. First, it is assumed that the solution locus is planar at $h(\theta_0)$ and, hence, can be replaced by the tangent plane (planar assumption). Second, it is assumed that the coordinate system on the tangent plane is uniform (uniform coordinate assumption), i.e., the coordinate grid lines mapped from the parameter space remain equidistant and straight in the sample space. It might happen that the first assumption is fulfilled, but the second assumption is not. Then, the solution locus is planar at the specific point $h(\theta_0)$, but the coordinate grid lines are curved and not equidistant.
If the planar assumption is not fulfilled, the uniform coordinate assumption is not fulfilled either.
In order to determine the curvatures, Bates and Watts introduce so-called lifted lines. Similar to the fact that each point $\theta_0$ in the parameter space maps to a point $h(\theta_0)$ on the solution locus in the sample space, each straight line in the parameter space through $\theta_0$,
$$\theta(m) = \theta_0 + m v,$$
maps to a lifted line on the solution locus
$$h_v(m) = h(\theta_0 + m v),$$
where $v$ can be any non-zero vector in the parameter space. The tangent vector of the lifted line for $m = 0$ at $\theta_0$ is given by
$$\dot{h}_v = \frac{\mathrm{d} h_v(m)}{\mathrm{d} m}\bigg|_{m=0} = J(\theta_0)\, v.$$
The set of all tangent vectors (for all possible vectors $v$) forms the tangent plane. For measuring curvatures, second-order derivatives are needed additionally. The second-order derivative of the function $h(\theta)$ is the Hessian $H$, which is a three-dimensional tensor. The $l$th face of the Hessian is, thus, a $P \times P$ matrix
$$H_l = \frac{\partial^2 h_l(\theta)}{\partial \theta\, \partial \theta^T}\bigg|_{\theta = \theta_0}.$$
The second-order derivative of the lifted line is given by
$$\ddot{h}_v = v^T H v,$$
in which the tensor product is performed such that
$$[\ddot{h}_v]_l = v^T H_l\, v.$$
The derivatives of the lifted line $\dot{h}_v$ and $\ddot{h}_v$ can be interpreted physically: If a point moves along the lifted line $h_v(m)$ in the sample space, where $m$ denotes the time, then $\dot{h}_v$ and $\ddot{h}_v$ denote the instantaneous velocity and instantaneous acceleration at time $m = 0$, respectively. The acceleration can be decomposed in three parts as shown in Figure 3. $\ddot{h}_v^P$ is parallel to the velocity vector $\dot{h}_v$ and, thus, parallel to the tangent plane. It corresponds to the change in velocity of the moving point. $\ddot{h}_v^N$ is normal to the tangent plane and describes the change in direction of the velocity vector $\dot{h}_v$ normal to the tangent plane. $\ddot{h}_v^G$ is parallel to the tangent plane and normal to the velocity vector $\dot{h}_v$. It corresponds to the geodesic acceleration and indicates the change in direction of the velocity vector $\dot{h}_v$ parallel to the tangent plane. Based on these acceleration components, the curvatures of the solution locus at $\theta_0$ can be determined:
$$K_v^N = \frac{\lVert \ddot{h}_v^N \rVert}{\lVert \dot{h}_v \rVert^2} \qquad (36)$$
is the normal curvature in direction of $v$ and is called intrinsic curvature, and
$$K_v^T = \frac{\lVert \ddot{h}_v^P + \ddot{h}_v^G \rVert}{\lVert \dot{h}_v \rVert^2} \qquad (37)$$
is the tangential^d curvature in direction of $v$ and is called parameter-effects curvature.
The curvatures are divided into normal and tangential components since each component has a different influence on the accuracy of the linear approximation. On the one hand, the intrinsic curvature is an intrinsic property of the solution locus. It only affects the planar assumption. On the other hand, the parameter-effects curvature only influences the uniform coordinate assumption and depends on the specific parameterization of the problem. Hence, a reparameterization may change the parameter-effects curvature but not the intrinsic curvature. In order to assess the effect of the curvatures on inference, they should be normalized. A suitable scaling factor is the so-called standard radius $\rho = s\sqrt{P}$ since its square $\rho^2 = s^2 P$ appears on the right hand side in (19) and (20), which describe the confidence regions. The relative curvatures are given by the curvatures (36) and (37) multiplied with the standard radius:
$$\gamma_v^N = K_v^N \rho, \qquad \gamma_v^T = K_v^T \rho.$$
If the relative curvatures are small compared with $1/\sqrt{F_{P,N-P}^{1-\alpha}}$ for all possible directions $v$, then the corresponding assumptions are valid. Hence, it is sufficient to determine the maximum relative curvatures^e
$$\Gamma^N = \max_v \gamma_v^N, \qquad \Gamma^T = \max_v \gamma_v^T$$
and to compare them to $1/\sqrt{F_{P,N-P}^{1-\alpha}}$ in order to assess the accuracy of the confidence regions [11]. If the confidence region based on the linearization method (19) with the approximate covariance matrix shall be applied, both the planar assumption and the uniform coordinate assumption have to be fulfilled. That means that the maximum relative curvatures $\Gamma^N$ and $\Gamma^T$ have to be small compared with $1/\sqrt{F_{P,N-P}^{1-\alpha}}$. The confidence region based on the likelihood method (20) is more robust since only the planar assumption needs to be fulfilled and only $\Gamma^N$ needs to be small compared with $1/\sqrt{F_{P,N-P}^{1-\alpha}}$.

Figure 3: Example for the decomposition of the acceleration vector $\ddot{h}_v$ with respect to the velocity vector $\dot{h}_v$.
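The decomposition of the acceleration into tangential and normal parts can be checked numerically. The sketch below is an illustration only: it assumes a real-valued model function (complex observations could be stacked into real and imaginary parts), approximates all derivatives by central finite differences with an assumed step size, and evaluates the curvatures in one given direction rather than maximizing over all directions. The function names and the circle example are hypothetical.

```python
import math

def _dot(a, b):
    return sum(u * w for u, w in zip(a, b))

def directional_curvatures(model, theta0, v, step=1e-4):
    """Return (normal, tangential) curvature of the lifted line h(theta0 + m*v) at m = 0."""
    def line(m):
        return model([t + m * d for t, d in zip(theta0, v)])
    h0, hp, hm = line(0.0), line(step), line(-step)
    vel = [(a - b) / (2 * step) for a, b in zip(hp, hm)]             # velocity
    acc = [(a - 2 * c + b) / step ** 2 for a, b, c in zip(hp, hm, h0)]  # acceleration
    # Orthonormal basis of the tangent plane from the Jacobian columns (Gram-Schmidt).
    basis = []
    for j in range(len(theta0)):
        up = model([t + (step if i == j else 0.0) for i, t in enumerate(theta0)])
        dn = model([t - (step if i == j else 0.0) for i, t in enumerate(theta0)])
        col = [(a - b) / (2 * step) for a, b in zip(up, dn)]
        for q in basis:
            c = _dot(col, q)
            col = [u - c * w for u, w in zip(col, q)]
        norm = math.sqrt(_dot(col, col))
        if norm > 1e-12:
            basis.append([u / norm for u in col])
    # Tangential part = projection onto the tangent plane; normal part = remainder.
    acc_t = [0.0] * len(acc)
    for q in basis:
        c = _dot(acc, q)
        acc_t = [u + c * w for u, w in zip(acc_t, q)]
    acc_n = [a - t for a, t in zip(acc, acc_t)]
    speed2 = _dot(vel, vel)
    return math.sqrt(_dot(acc_n, acc_n)) / speed2, math.sqrt(_dot(acc_t, acc_t)) / speed2

# Unit circle with two parameterizations: same locus, different parameter effects.
circle = lambda th: [math.cos(th[0]), math.sin(th[0])]
circle_sq = lambda th: [math.cos(th[0] ** 2), math.sin(th[0] ** 2)]
k_n1, k_t1 = directional_curvatures(circle, [1.0], [1.0])
k_n2, k_t2 = directional_curvatures(circle_sq, [1.0], [1.0])
```

Both parameterizations trace the same unit circle, so the normal (intrinsic) curvature is 1 in both cases, while only the second one has a non-zero tangential (parameter-effects) curvature. This mirrors the statement that a reparameterization may change the parameter-effects curvature but not the intrinsic curvature.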

Analysis of the parameter estimation problem
In the following, the parameter estimation problem is analyzed by calculating the maximum relative curvatures and by plotting the confidence regions (19) and (20) for different signal-to-noise ratios (SNRs). The system setup is as follows: A training preamble of length K t = 256 is assumed that covers 10% of the data burst of length K = 2,560. A pseudo-random sequence of BPSK symbols is used as training. Since this paper concentrates on the positioning part of the proposed joint communication and positioning system, it is sufficient to focus on the channel estimation and to neglect the data detection. A Gaussian pulse shape g(τ) = gT x (τ) * gR x (τ)~exp (-(τ/ T s ) 2 ) is assumed. After receive filtering, the noise process is slightly colored, but we have verified that the correlation is negligible with respect to receiver processing. The training sequence is transmitted over the physical channel and at the receiver side channel parameter estimation as suggested in Section 2.2 is performed. For the purpose of curvature analysis, only PSO as described in [16] with I = 50 particles and a maximum number of T = 8,000 iterations is applied for solving the nonlinear metric Ω(θ). PSO delivers the likelihood confidence region automatically as explained in Section 3.3. The approximate co variance matrix is calculated afterward according to (18). A confidence level of 95% is applied (a = 0.05). Since the curvature measures depend on the parameter set θ and also on the noise samples, simulations are performed for a fixed channel model at different SNRs. Two different channel models are assumed: A single-path channel (M = 1) and a two-path channel (M = 2) with a small excess delay (Δτ 2 : = τ 2 -τ 1 = 0.81T s ), both with a memory length L = 10. The parameters of the channels are given in Table 1. Furthermore, the maximum relative curvatures Γ N and Γ T for different SNRs and the value of 1/ F 0.95 P,N−P are listed in Table 1. 
It can be concluded that the planar assumption is always fulfilled since $\Gamma^N$ is much smaller than $1/\sqrt{F^{0.95}_{P,N-P}}$ in all cases. This means the likelihood method is always accurate. For the single-path channel, the uniform coordinate assumption is also fulfilled for all SNRs (see Table 1), i.e., the confidence regions based on the linearization method and the approximate covariance matrices are accurate. This is confirmed by Figure 4a, b, c. In Figure 4, the confidence regions based on the linearization method (black ellipse) and the likelihood method (filled dots) are plotted for the parameter combination of the real part $\theta_1$ and the delay $\theta_3$ of the LOS path, normalized with respect to the symbol duration $T_s$. Both regions are similar for the single-path channel. In case of the two-path channel, a different situation is observed as shown in Figure 4d, e, f. The uniform coordinate assumption is violated at low SNR since $\Gamma^T$ is not much smaller than $1/\sqrt{F^{0.95}_{P,N-P}}$ (see Table 1). The shape of the likelihood confidence region differs strongly from the ellipse generated by the approximate covariance matrix. Only at high SNR do both shapes coincide. For the two-path channel, the uniform coordinate assumption is valid from approximately 35-40 dB upward. For different channel realizations, different results are obtained. It should be mentioned again that the curvature measures strongly depend on the parameter set θ and on the noise samples. The larger the excess delay $\Delta\tau_2$, the lower is the nonlinearity of the problem, i.e., the uniform coordinate assumption is already valid at lower SNR, and vice versa. In summary, the confidence regions based on the linearization method are not accurate at low SNR in a multipath scenario. Hence, the soft information based on the approximate covariance matrix may lead to inaccurate results. The influence of soft information on positioning is investigated in the following section.

Positioning based on the time of arrival
There are many different approaches to determine the position, e.g., multiangulation, multilateration, fingerprinting, and motion sensors. This paper focusses on radiolocation based on the TOA, which is also called multilateration. Furthermore, two-dimensional positioning is considered in the following. An extension to three dimensions is straightforward. The position $p = [x, y]^T$ of a mobile station (MS) is determined relative to B reference objects (ROs) whose positions $p_b = [x_b, y_b]^T$, $1 \le b \le B$, are known. For each RO, the estimated TOA yields a pseudo-range

$r_b = d_b(p) + \eta_b$, (42)

where $\eta_b$ denotes the pseudo-range error. The true distance between the bth RO and the MS is a nonlinear function of the position p given by

$d_b(p) = \sqrt{(x - x_b)^2 + (y - y_b)^2}$. (43)

Thus, positioning is again a nonlinear problem (see endnote f). There are alternative ways to solve the set of nonlinear equations described by (42) and (43). In this paper, two different approaches are considered: the iterative Taylor series algorithm (TSA) [22] and the weighted least-squares (WLS) method [23,24].
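The TSA is based on a linearization of the nonlinear function (43). Given a starting position $\hat{p}_0$ (initial guess), the pseudo-ranges can be approximated by a first-order Taylor series

$r \approx d(\hat{p}_0) + J(\hat{p}_0)\,(p - \hat{p}_0)$,

in which J(p) is the Jacobian matrix of (43) with entries

$[J(p)]_{b,1} = (x - x_b)/d_b(p)$, $[J(p)]_{b,2} = (y - y_b)/d_b(p)$,

where $[x_b, y_b]^T$ denotes the position of the bth RO. Defining $\Delta r_0 = r - d(\hat{p}_0)$ and $\Delta p_0 = p - \hat{p}_0$ results in the linear relationship $\Delta r_0 \approx J(\hat{p}_0)\,\Delta p_0$, which can be solved according to the least-squares approach with the weighting matrix W:

$\Delta\hat{p}_0 = (J(\hat{p}_0)^T W J(\hat{p}_0))^{-1} J(\hat{p}_0)^T W \Delta r_0$.

The estimate is updated according to $\hat{p}_{i+1} = \hat{p}_i + \Delta\hat{p}_i$, and the procedure is repeated until the correction $\Delta\hat{p}_i$ is smaller than a given threshold. If the initial guess is close to the true position, few iterations are needed. If the starting position is far from the true position, many iterations may be necessary. Additionally, the algorithm may diverge. Hence, finding a good initial guess is crucial. For the numerical results shown in Section 5.2, the position estimate of the WLS method is used as the initial guess for the TSA.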
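As an illustration of the iteration described above, the following minimal Python sketch implements a weighted Gauss-Newton update for two-dimensional TOA positioning. It is a sketch under stated assumptions, not the implementation of [22]: the function name `tsa_position`, its arguments, the tolerance, and the iteration limit are hypothetical, and the weighting matrix is assumed to be diagonal so that only its diagonal entries are passed.

```python
import math

def tsa_position(ros, ranges, weights, p0, tol=1e-9, max_iter=50):
    """Weighted Gauss-Newton (Taylor-series) iteration for 2-D TOA positioning.

    ros     : list of known RO positions (x_b, y_b)
    ranges  : measured pseudo-ranges r_b
    weights : diagonal of W (assumed diagonal, e.g. 1 / variance of r_b)
    p0      : initial guess (x, y)
    """
    x, y = p0
    for _ in range(max_iter):
        # Accumulate the 2x2 normal equations (J^T W J) dp = J^T W dr.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for (xb, yb), r, w in zip(ros, ranges, weights):
            d = math.hypot(x - xb, y - yb)
            if d == 0.0:
                continue  # Jacobian is undefined exactly at an RO; skip link
            jx, jy = (x - xb) / d, (y - yb) / d  # Jacobian row of d_b(p)
            dr = r - d                           # range residual
            a11 += w * jx * jx
            a12 += w * jx * jy
            a22 += w * jy * jy
            g1 += w * jx * dr
            g2 += w * jy * dr
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-15:
            raise ValueError("singular geometry: normal equations not solvable")
        # Closed-form solution of the symmetric 2x2 system.
        dx = (a22 * g1 - a12 * g2) / det
        dy = (a11 * g2 - a12 * g1) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < tol:
            return (x, y)
    raise RuntimeError("TSA did not converge within max_iter iterations")
```

With noise-free ranges and a reasonable starting point, the iteration recovers the true position in a few steps. The explicit exception on non-convergence mirrors the remark that the algorithm may diverge for a poor initial guess.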
The WLS method [23,24] solves the set of nonlinear equations described by (42) and (43) in closed form. Hence, this method is non-iterative and less costly than the TSA. The basic idea is to transform the original set of nonlinear equations into a set of linear equations. For this purpose, one RO is selected as reference. Without loss of generality, the first RO is chosen here. By subtracting the squared distance of the first RO from the squared distances of the remaining ROs, a linear least-squares problem is obtained, which is solved in closed form using a weighting matrix W'. Both the TSA and the WLS method apply a weighting matrix that contains the variances of the pseudo-range errors. Reliable pseudo-ranges have higher weights than unreliable ones and, thus, have a stronger influence on the estimation results. Typically, the true variances are not known. They can only be estimated as described in Section 3: For each link b, the variance of the TOA $\sigma^2_{\tau_{1,b}}$ is determined via the linearization method (see endnote g) or the likelihood method. This TOA variance is transformed into a pseudo-range variance $\sigma^2_{\eta_b}$ by a multiplication with $c^2$. If no information about the estimation error η is available, the weighting matrices correspond to the identity matrix I (no weighting at all).
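The linearization step can be sketched as follows. Subtracting the squared-distance equation of the reference RO from that of RO b cancels the quadratic terms in x and y and leaves $2(x_b - x_1)x + 2(y_b - y_1)y = (x_b^2 + y_b^2) - (x_1^2 + y_1^2) - r_b^2 + r_1^2$. The Python sketch below solves these equations by weighted least squares. It is only an illustration: the function name `linear_ls_position` is hypothetical, and the diagonal per-equation weights are a simplifying assumption that does not reproduce the weighting matrix W' of [23,24].

```python
def linear_ls_position(ros, ranges, weights=None, ref=0):
    """Closed-form 2-D position from squared-range differences.

    ros     : list of known RO positions (x_b, y_b)
    ranges  : measured pseudo-ranges r_b
    weights : optional per-RO weights (assumed diagonal; None = unweighted)
    ref     : index of the RO used as reference
    """
    xr, yr = ros[ref]
    rr = ranges[ref]
    kr = xr * xr + yr * yr
    a11 = a12 = a22 = g1 = g2 = 0.0
    for b, ((xb, yb), r) in enumerate(zip(ros, ranges)):
        if b == ref:
            continue
        w = 1.0 if weights is None else weights[b]
        # One linear equation ax * x + ay * y = rhs per non-reference RO.
        ax, ay = 2.0 * (xb - xr), 2.0 * (yb - yr)
        rhs = (xb * xb + yb * yb) - kr - r * r + rr * rr
        a11 += w * ax * ax
        a12 += w * ax * ay
        a22 += w * ay * ay
        g1 += w * ax * rhs
        g2 += w * ay * rhs
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-15:
        raise ValueError("ROs are collinear: position not identifiable")
    # Solve the symmetric 2x2 normal equations in closed form.
    return ((a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det)
```

Because no iteration or starting point is needed, such a closed-form estimate is a natural initial guess for an iterative refinement.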
The Cramer-Rao lower bound (CRLB) provides a benchmark to assess the performance of the estimators [14]:

$E\{\|\hat{p} - p\|^2\} \ge \mathrm{CRLB}(p) = \mathrm{tr}(I^{-1}(p))$,

where

$I(p) = J(p)^T W J(p)$

is the Fisher information matrix. If the estimator is unbiased, its mean squared error (MSE) is larger than or equal to the CRLB. If the MSE attains the CRLB, the estimator is a minimum variance unbiased (MVU) estimator.
The positioning accuracy depends on the geometry between the ROs and the MS and, thus, varies with the position p. This effect is called geometric dilution of precision (GDOP) [22,25]. In order to separate the influence of the geometry from the influence of the estimation errors η on the positioning accuracy, it is assumed that all pseudo-ranges are affected by the same error variance $\sigma^2_\eta = 1$, i.e., W = I. Given this assumption, the GDOP is the square root of the CRLB:

$\mathrm{GDOP}(p) = \sqrt{\mathrm{tr}((J(p)^T J(p))^{-1})}$.
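To make the geometry dependence concrete, the following Python sketch evaluates the GDOP for a given RO layout and MS position. It assumes W = I and a scalar bound given by the trace of the inverse of $J^T J$, where the rows of J are the unit vectors pointing from each RO to the MS. The function name `gdop` is hypothetical.

```python
import math

def gdop(ros, p):
    """GDOP at position p for ROs at known positions (2-D, W = I assumed)."""
    x, y = p
    a11 = a12 = a22 = 0.0
    for xb, yb in ros:
        d = math.hypot(x - xb, y - yb)
        jx, jy = (x - xb) / d, (y - yb) / d  # Jacobian row for this RO
        a11 += jx * jx
        a12 += jx * jy
        a22 += jy * jy
    det = a11 * a22 - a12 * a12
    # The trace of the inverse of a symmetric 2x2 matrix is (a11 + a22) / det.
    return math.sqrt((a11 + a22) / det)
```

For four ROs in the corners of a square, the value at the center is 1, whereas ROs clustered in one corner yield a much larger value for a distant MS because the unit vectors become nearly parallel.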

Numerical results
In the following, the overall performance of the proposed system concept using soft information is evaluated. For this purpose, two scenarios with different GDOP as shown in Figure 5 are considered. The ROs are denoted by black circles and the GDOP is illustrated by contour lines. For both scenarios, B = 4 ROs are located inside a square region with side length $\sqrt{2}R$, where $R = 2T_s c$ is the distance from every RO to the center of the region. For the first scenario, the ROs are placed in the lower left part of the region, which results in a large GDOP on average. The second scenario has a small GDOP on average since the ROs are placed in the corners of the region. For the communication links between the MS and the ROs, the same setup as described in Section 4.2 is applied. Furthermore, power control is assumed, i.e., the SNR for all links is the same. All results reported throughout this paper are for one-shot measurements.
Three different channel models with memory length L = 10 are investigated: a single-path channel (M = 1), a two-path channel (M = 2) with large excess delay ($\Delta\tau_2 \in [T_s, 2T_s]$), and a two-path channel (M = 2) with small excess delay ($\Delta\tau_2 \in [T_s/10, T_s]$). For all channel models, the LOS delay $\tau_{1,b}$ for each link b is calculated from the true distance $d_b(p)$. The excess delay of the multipath component $\Delta\tau_2$ for both two-path channels is determined randomly in the corresponding interval. The smaller the excess delay is, the more difficult it is to separate the different propagation paths. The power of the multipath component is half the power of the LOS component. The phase of each component is generated randomly between 0 and 2π. For each link, channel parameter estimation is performed and soft information based on the linearization method and on the likelihood method is obtained. For PSO, I = 50 particles and a maximum number of iterations T = 8,000 are applied (see endnote h). The estimated LOS delays $\hat{\tau}_{1,b}$ are converted to pseudo-ranges $r_b$, and the position of the MS is estimated with the TSA and the WLS method applying the different soft information methods. For comparison, positioning without soft information is performed. The position estimate of the WLS method is used as the initial guess for the TSA. Furthermore, in the WLS method, the RO with the best weighting factor is chosen as reference.
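The conversion from delay estimates to weighted pseudo-ranges can be sketched in Python as follows. The sketch assumes that each pseudo-range is the estimated LOS delay multiplied by the speed of light, that its variance is the TOA variance multiplied by $c^2$, and that the weights are the inverse pseudo-range variances. The function name `pseudo_ranges_and_weights` and the constant `C_LIGHT` are hypothetical.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s

def pseudo_ranges_and_weights(toa_est, toa_var=None):
    """Map estimated LOS delays (s) to pseudo-ranges (m) and diagonal weights.

    toa_est : estimated LOS delays, one per RO
    toa_var : estimated TOA variances (s^2), or None if no soft
              information is available (identity weighting)
    """
    ranges = [C_LIGHT * t for t in toa_est]
    if toa_var is None:
        weights = [1.0] * len(toa_est)  # no weighting at all
    else:
        # Pseudo-range variance is c^2 times the TOA variance; the weight
        # is assumed to be its inverse.
        weights = [1.0 / (C_LIGHT * C_LIGHT * v) for v in toa_var]
    return ranges, weights
```

Only the ratio of the weights affects a weighted least-squares solution, so a common scaling error in all variances leaves the position estimate unchanged.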
The performance of the estimators is evaluated by Monte Carlo simulations and the results are compared with the Cramer-Rao lower bound (CRLB). On the one hand, simulations are performed over SNR since the accuracy of the soft information methods depends on the SNR. In each run, a new MS position p is determined randomly inside the region of Figure 5. On the other hand, simulations are performed over space for a fixed SNR in order to assess the influence of the GDOP. A fixed 4 × 4 grid of MS positions is applied in this case.
Different channel realizations are generated during the Monte Carlo simulations. Since different channel realizations result in different weighting matrices W, a mean CRLB, $\overline{\mathrm{CRLB}} = E\{\mathrm{CRLB}(p)\}$, is introduced, where the expectation is taken with respect to the channel realizations. For the simulations over SNR, the expectation is additionally taken with respect to the random positions p.
The simulation results are shown in Figure 6. There are eight different graphs (6a, b, c, d, e, f, g, h) arranged in an array with two columns and four rows. In the first column, the results for the simulations over SNR are shown. The second column contains the results for the simulations over space at 30 dB. In each row, the results for a fixed simulation setup are illustrated. All graphs show the root mean squared error (RMSE) of $\hat{p}$ normalized with respect to $d_s = cT_s$ for positioning without soft information ("wo"), with soft information from the likelihood method ("like"), and with soft information from the linearization method ("lin"). The square root of the mean CRLB (normalized with respect to $d_s$), which is denoted simply as CRLB in the following, is plotted for comparison ("crlb"). Curves labeled with "L" were obtained for the first scenario with large average GDOP, and curves labeled with "S" were obtained for the second scenario with small average GDOP.
First, the results for the single-path channel are discussed because this scenario represents an optimal case: Both soft information methods are accurate (see Section 4.2) and, due to power control, the pseudo-range errors for all ROs should be the same. Hence, positioning without and with weighting is expected to perform equally well. The first row of Figure 6 contains the results for the WLS method, whereas the second row shows the results for the TSA. As expected, the RMSE curves for positioning without soft information and with soft information from the likelihood and the linearization method coincide. Furthermore, the TSA is an MVU estimator since the RMSE approaches the CRLB for all SNRs and for all positions. The WLS method performs worse: There is a certain gap between the CRLB and the RMSE. In Figure 6b, it can be observed that this gap depends on the position and, thus, on the GDOP: The larger the GDOP is, the larger is the gap. Hence, the gap between RMSE and CRLB in Figure 6a is smaller for the second scenario ("S") since the GDOP is smaller on average. For the two-path channels, a similar behavior of the WLS method was observed. Therefore, only the results for the TSA are considered in the following due to its superior performance.
Figure 5: (a) Scenario with large average GDOP. (b) Scenario with small average GDOP.
The third and fourth rows of Figure 6 show the simulation results for the two-path channels with large and small excess delay, respectively. It was observed in Section 4.2 that the likelihood method is generally accurate even for multipath channels. In contrast, the accuracy of the linearization method depends on the excess delay and the SNR: The smaller the excess delay, the higher is the nonlinearity of the problem and the less accurate is the linearization method, whereas the accuracy increases with SNR. Hence, the likelihood method is expected to outperform the linearization method, and both methods are expected to perform equally well only at very high SNR. Surprisingly, the linearization and the likelihood method show approximately the same performance in all cases. The linearization method even performs slightly better in most cases. Only for very low SNR and a small excess delay does the likelihood method outperform the linearization method. The likelihood method seems to be more susceptible to the GDOP. Hence, the inaccuracy of the covariance matrices at low SNR barely influences the positioning accuracy. Actually, it seems that the absolute value of the weights in the weighting matrices W and W' is not crucial; rather, a correct ratio of the weights is relevant. Thus, rough soft information is sufficient as long as the ratio of the pseudo-range variances is accurate. This is fulfilled even for the inaccurate covariance matrices of the linearization method. Hence, it is suggested to apply the linearization method because of its lower computational complexity.
For the two-path channel with large excess delay (Figure 6e, f), the RMSE with or without soft information is almost the same since the multipath components can already be separated by the estimator quite well. For a small excess delay (Figure 6g, h), the RMSE with soft information is much closer to the CRLB than without soft information. With respect to SNR, a gain of approximately 7-10 dB is achieved (see Figure 6g). Furthermore, positioning with soft information is less susceptible to the GDOP (see Figure 6h). Thus, soft information is well suited to mitigate severe multipath propagation. The smaller the excess delay is, the more important it is to apply soft information for positioning.
The influence of the GDOP can be neglected for the scenario with small average GDOP. The curves labeled with "S" indicate that, even for one-shot estimation without oversampling, a positioning error (RMSE) much smaller than the distance corresponding to the symbol duration, $d_s$, is achieved for all channel models.
For all simulations, a LOS path has been assumed so far. Hence, the estimated TOA corresponds to the distance between transmitter and receiver. However, in urban or indoor environments, the LOS path is often blocked as already mentioned in Section 2.1. Therefore, the influence of NLOS propagation is discussed here. In case of NLOS, a modeling error is introduced that reduces the positioning accuracy significantly. The proposed soft channel parameter estimator does not take a priori information about the physical channel (e.g., probability of NLOS) into account and, hence, is not able to detect such a modeling error. The obtained soft information can only be used to mitigate multipath propagation. In order to mitigate NLOS effects, further processing is required (e.g., [24]).
Nevertheless, multipath mitigation is an important issue. The multipath mitigation ability of the proposed soft channel parameter estimation has been presented for M = 2 paths for clarity and simplicity. The influence of the number of multipath components is as follows: The complexity of the soft channel parameter estimator increases with the number of multipath components. Furthermore, the reliability of the estimates decreases with M. Hence, the positioning accuracy deteriorates. If M is large and the scatterers are closely spaced (dense multipath), the estimator becomes biased and the positioning accuracy saturates. In general, it is suggested to consider only the dominant paths if M is large.
It was mentioned before that the TSA may diverge. Divergence occurred for large GDOP when the initial guess was far from the true position (see endnote i). This happened only rarely. The initial guess is determined by the WLS method, which is very susceptible to the GDOP. Hence, the starting position may be far away from the true position for large GDOP.
As mentioned in Section 2.2, PSO does not assure global convergence. For both two-path channels, PSO sometimes converges prematurely. In most of these cases, it converges to a boundary of the search space, such that the premature convergence can be detected (outage). In Figure 7, the outage rates are shown for both two-path channels: The dashed lines (i) and (iii) denote the probability that the delay estimation fails for one RO, and the solid lines (ii) and (iv) denote the probability that two or more ROs fail. If the delay estimation fails for one RO, the position of the MS can nevertheless be determined since only three ROs are necessary for positioning in two dimensions. Only if two or more ROs fail does the position estimation fail, too. By adding more ROs, the outage rate for positioning can be made arbitrarily small. The outage rates for the two channel models differ significantly. For the two-path channel with large excess delay ($\Delta\tau_2 \in [T_s, 2T_s]$), the outage rates (i) and (ii) are negligible. In contrast, the outage rates (iii) and (iv) for the two-path channel with small excess delay ($\Delta\tau_2 \in [T_s/10, T_s]$) are quite high at low SNR but decrease significantly with increasing SNR. The smaller the excess delay is, the higher is the probability that PSO converges prematurely.
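The effect of additional ROs on the positioning outage can be illustrated with a simple binomial model. The Python sketch below assumes that the delay estimation fails independently on each link with the same probability q. This independence is an assumption made here for illustration and is not stated in the paper. The function name `positioning_outage` is hypothetical.

```python
from math import comb, isclose

def positioning_outage(q, num_ros, min_ros=3):
    """Probability that fewer than min_ros links deliver a valid delay.

    q       : per-link failure probability (assumed independent and equal)
    num_ros : number of ROs B
    min_ros : ROs required for a position fix (three in two dimensions)
    """
    p_ok = 1.0 - q
    # Sum the binomial probabilities of having 0 .. min_ros - 1 valid links.
    return sum(
        comb(num_ros, k) * p_ok**k * q ** (num_ros - k)
        for k in range(min_ros)
    )
```

Under this model, a per-link failure probability of 0.1 with B = 4 ROs gives a positioning outage of 0.0523, and the outage decreases as further ROs are added.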

Conclusions
In this paper, a channel parameter estimator based on the maximum-likelihood approach is proposed for joint communication and positioning. The parameters of the physical channel (e.g., TOA) and the equivalent discrete-time channel model are estimated jointly. In order to mitigate multipath propagation effects and to improve the positioning accuracy, soft information concerning the parameter estimates is used. Two different methods to obtain soft information are proposed: The linearization and the likelihood method.
The accuracy of the methods depends on the nonlinearity of the parameter estimation problem, which is evaluated by the curvature measures of Bates and Watts. It is shown that the likelihood method is always accurate for the parameter estimation problem. The linearization method is only accurate in a single-path channel or at high SNR for a multipath channel. Nevertheless, Monte Carlo simulations for a two-dimensional positioning problem show that this has very little influence on the positioning. The positioning algorithms that exploit the soft information obtained by the linearization and the likelihood method perform equally well. For severe multipath propagation, the RMSEs for the weighted positioning algorithms are closer to the CRLB than the RMSE of positioning without weighting. A gain of approximately 7-10 dB can be achieved. Hence, multipath propagation effects can be mitigated significantly, even for one-shot estimation without oversampling. Based on these results, it is suggested to apply the linearization method because of its lower computational complexity.
Endnotes
a For oversampling with factor J, it follows that $T = T_s/J$.
b The mean squared error of the channel estimates $\hat{h}_l$ is reduced in comparison to the mean squared error of the least-squares channel estimates $h_l$ if the number of parameters, 3M, is less than the number of channel coefficients, L + 1, to be estimated. For simulation results, please refer to [4].
c $C_{approx}$ corresponds to $V_a$ in [11] for a complex-valued problem instead of a real-valued problem.
d The superscript T that denotes tangential should not be mistaken for the superscript T that denotes the transpose of a matrix.
e In [13], a simplified method to determine the maximum relative curvatures is introduced based on linear transformations of the coordinates in the parameter and the sample space. This method is not considered here because it is beyond the scope of this paper.
f In a two-dimensional TOA scenario, at least three ROs are required. For positioning in three dimensions, a fourth RO is needed.
g For the linearization method, the variance of the TOA corresponds to the third diagonal entry of the approximate covariance matrix $C_{approx}$.
h Furthermore, channel parameter estimation was performed for the LMM described in [18] with the true parameters θ as initial guess. Since PSO and the LMM provided approximately the same performance, only PSO is considered here for conciseness.
i The outliers due to divergence were not considered in the calculation of the RMSE.