The novelty of our proposed algorithm lies in the use of a high-order approximation of the nonlinear derivative of the log-likelihood function and of an innovative iterative correction process that refines the approximate solution obtained via the *QR* transformation adopted from matrix computation theory. Most importantly, this approach indeed yields very satisfying results. We now present our scheme in detail.

### 3.1 Formulation of the maximum-likelihood estimation

From (5), a log-likelihood function can be derived as

\begin{array}{rl}ln\Lambda &=-Nln\left(\pi {\sigma}_{w}^{2}\right)-\frac{1}{{\sigma}_{w}^{2}}{\left\|\mathbf{r}-\frac{1}{\sqrt{N}}{\mathbf{D}}_{\delta}{\mathbf{F}}_{N}^{H}\mathbf{X}{\mathbf{F}}_{v}\mathbf{h}\right\|}^{2}\\ &=-Nln\left(\pi {\sigma}_{w}^{2}\right)-\frac{1}{{\sigma}_{w}^{2}}\left({\mathbf{r}}^{H}\mathbf{r}-\frac{2}{\sqrt{N}}\text{Re}\left\{{\mathbf{r}}^{H}{\mathbf{D}}_{\delta}{\mathbf{F}}_{N}^{H}\mathbf{X}{\mathbf{F}}_{v}\mathbf{h}\right\}+\frac{1}{N}{\mathbf{h}}^{H}{\mathbf{F}}_{v}^{H}{\mathbf{X}}^{H}{\mathbf{F}}_{N}{\mathbf{D}}_{\delta}^{H}{\mathbf{D}}_{\delta}{\mathbf{F}}_{N}^{H}\mathbf{X}{\mathbf{F}}_{v}\mathbf{h}\right)\\ &=-Nln\left(\pi {\sigma}_{w}^{2}\right)-\frac{1}{{\sigma}_{w}^{2}}\left({\mathbf{r}}^{H}\mathbf{r}-\frac{2}{\sqrt{N}}\text{Re}\left\{{\mathbf{r}}^{H}{\mathbf{D}}_{\delta}{\mathbf{F}}_{N}^{H}\mathbf{X}{\mathbf{F}}_{v}\mathbf{h}\right\}+{\mathbf{h}}^{H}{\mathbf{F}}_{v}^{H}{\mathbf{X}}^{H}\mathbf{X}{\mathbf{F}}_{v}\mathbf{h}\right),
\end{array}

(7)

where Re{⋅} means real part. Now, by setting ∂ ln *Λ*/∂**h** = **0**, we can obtain a solution for **h** that will render a maximum ln *Λ* for a fixed *δ*. This is just an ML estimate of **h** at a fixed *δ* given by[10]

\widehat{\mathbf{h}}=\frac{1}{\sqrt{N}}{\left({\mathbf{F}}_{v}^{H}{\mathbf{X}}^{H}\mathbf{X}{\mathbf{F}}_{v}\right)}^{-1}{\mathbf{F}}_{v}^{H}{\mathbf{X}}^{H}{\mathbf{F}}_{N}{\mathbf{D}}_{\delta}^{H}\mathbf{r}.

(8)

A constant-modulus training sequence has been proven optimal for channel estimation[32]. The Chu sequence[33], for example, falls into this category. We shall use a Chu sequence given by{X}_{k}={e}^{j\pi m{k}^{2}/N}, with *m* being any integer relatively prime to *N*. This results in **X**^{H}**X** = **I**_{N}. Then, (8) can be simplified to

\widehat{\mathbf{h}}=\frac{1}{N\sqrt{N}}{\mathbf{F}}_{v}^{H}{\mathbf{X}}^{H}{\mathbf{F}}_{N}{\mathbf{D}}_{\delta}^{H}\mathbf{r}.

(9)
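As a quick numerical sanity check (our own illustration, with *N* and *m* chosen arbitrarily), the unit-modulus property of the Chu sequence that yields **X**^{H}**X** = **I**_{N} can be verified directly:

```python
import numpy as np

# Illustrative check of the Chu-sequence property used to go from (8)
# to (9). N and m below are arbitrary; m must be relatively prime to N.
N, m = 16, 1
k = np.arange(N)
X_diag = np.exp(1j * np.pi * m * k**2 / N)   # X_k = e^{j pi m k^2 / N}
X = np.diag(X_diag)

# Every X_k has unit modulus, so X^H X collapses to the identity.
assert np.allclose(np.abs(X_diag), 1.0)
assert np.allclose(X.conj().T @ X, np.eye(N))
```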

Next, setting\frac{\partial ln\Lambda}{\partial \delta}=0 leads to

\text{Im}\left\{{\mathbf{r}}^{H}\mathbf{Q}{\mathbf{D}}_{\delta}{\mathbf{F}}_{N}^{H}\mathbf{X}{\mathbf{F}}_{v}\mathbf{h}\right\}=0,

(10)

where **Q** = diag{0, 1, …, *N* - 1} and Im{⋅} means imaginary part. Replacing the **h** in (10) by the\widehat{\mathbf{h}} of (9), we find

\begin{array}{rl}\text{Im}\left\{{\mathbf{r}}^{H}\mathbf{Q}{\mathbf{D}}_{\delta}{\mathbf{F}}_{N}^{H}\mathbf{X}{\mathbf{F}}_{v}{\mathbf{F}}_{v}^{H}{\mathbf{X}}^{H}{\mathbf{F}}_{N}{\mathbf{D}}_{\delta}^{H}\mathbf{r}\right\}&=\text{Im}\left\{{\mathbf{r}}^{H}{\mathbf{D}}_{\delta}\mathbf{G}{\mathbf{D}}_{\delta}^{H}\mathbf{r}\right\}\\ &=\sum _{m=0}^{N-1}\sum _{n=0}^{N-1}\text{Im}\left\{{r}_{m}^{*}{r}_{n}{g}_{m,n}{e}^{j2\pi \delta \left(m-n\right)/N}\right\}=0,\end{array}

(11)

where *g*_{m,n} is the (*m* + 1, *n* + 1)th element of an *N* × *N* matrix **G** given by

\begin{array}{l}\mathbf{G}=\mathbf{Q}{\mathbf{F}}_{N}^{H}\mathbf{X}{\mathbf{F}}_{v}{\mathbf{F}}_{v}^{H}{\mathbf{X}}^{H}{\mathbf{F}}_{N}=\left\{{g}_{m,n}\right\},\\ {g}_{m,n}=m\sum _{l=0}^{v-1}\left(\sum _{k=0}^{N-1}{X}_{k}{e}^{-j2\pi \left(l-m\right)k/N}\right)\left(\sum _{{k}^{\prime}=0}^{N-1}{X}_{{k}^{\prime}}^{\ast}{e}^{j2\pi \left(l-n\right){k}^{\prime}/N}\right),\phantom{\rule{1em}{0ex}}m,n=0,1,\dots ,N-1.\end{array}

(12)

Since (11) is now channel independent, we have decoupled *δ* from **h**, and (11) can thus be solved for *δ* alone. However, (11) is highly nonlinear in *δ* and has an infinite number of solutions, of which we desire only the one that yields the global maximum of ln *Λ*. This task is not possible by analytical means, but we can resort to numerical methods.

### 3.2 The approximation approach

We now expand the exponential term *e*^{j 2π(m - n)δ/N} in (11) into an infinite series (Taylor series expansion) and truncate the series beyond terms of order higher than *K* to obtain

\begin{array}{ll}\phantom{\rule{6.5pt}{0ex}}{e}^{j2\pi \left(m-n\right)\delta /N}& =1+j\frac{2\pi \left(m-n\right)}{N}\delta -\frac{1}{2!}{\left[\frac{2\pi \left(m-n\right)}{N}\right]}^{2}{\delta}^{2}-\cdots \\ \phantom{\rule{1em}{0ex}}+\frac{1}{K!}{\left[j\frac{2\pi \left(m-n\right)}{N}\right]}^{K}{\delta}^{K}.\end{array}

(13)

Substituting the approximate expression of (13) for the exponential term in (11), we obtain a degree-*K* polynomial in *δ* with real coefficients. Solving (11) therefore becomes equivalent to finding the roots of a real polynomial of degree *K*, which is an eigenvalue problem in matrix computations[26]. Express the degree-*K* polynomial as

p\left(\delta \right)={a}_{0}+{a}_{1}\delta +\cdots +{a}_{K-1}{\delta}^{K-1}+{\delta}^{K}=0.

(14)

Notice that we have normalized the polynomial so that the coefficient *a*_{K} is unity. This is easily done by dividing the original polynomial equation by its original nonzero leading coefficient, which we denote by the distinct symbol{\tilde{a}}_{K}. It can then be readily verified that the coefficients {*a*_{k}} in (14) are given by

\begin{array}{l}{a}_{k}=\frac{1}{{\tilde{a}}_{K}}\sum _{m=0}^{N-1}\sum _{n=0}^{N-1}\text{Im}\left\{\frac{{r}_{m}^{*}{r}_{n}{g}_{m,n}}{k!}{\left[\frac{j2\pi \left(m-n\right)}{N}\right]}^{k}\right\}\\ \phantom{\rule{1.25em}{0ex}}=\frac{1}{{\tilde{a}}_{K}\cdot k!}{\left(\frac{2\pi}{N}\right)}^{k}\text{Im}\left\{{j}^{k}\sum _{n=0}^{k}{\left(-1\right)}^{n}{C}_{n}^{k}{\mathbf{r}}^{H}{\mathbf{Q}}^{k-n}\mathbf{G}{\mathbf{Q}}^{n}\mathbf{r}\right\},\\ \phantom{\rule{3em}{0ex}}k=0,1,\dots ,K\end{array}

(15)

where{C}_{n}^{m}=\frac{m!}{\left(m-n\right)!n!}, *m* ≥ *n*.
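The coefficient formula (15) translates directly into code. The following sketch (the function and variable names are ours, and the random **r** and **G** are purely illustrative, not a simulated OFDM signal) evaluates the double sum in the first line of (15) and normalizes by the leading coefficient:

```python
import numpy as np
from math import factorial

def poly_coeffs(r, G, K):
    """Illustrative sketch: coefficients a_0..a_{K-1} of the monic
    degree-K polynomial (14), from the double sum in (15)."""
    N = len(r)
    m, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    base = np.outer(r.conj(), r) * G                    # r_m^* r_n g_{m,n}
    a = np.empty(K + 1)
    for k in range(K + 1):
        term = base * (1j * 2 * np.pi * (m - n) / N) ** k / factorial(k)
        a[k] = term.imag.sum()                          # Im{...} double sum
    return a[:K] / a[K]     # divide by the nonzero leading coefficient a~_K

# Random data purely to exercise the routine:
rng = np.random.default_rng(0)
N, K = 8, 3
r = rng.normal(size=N) + 1j * rng.normal(size=N)
G = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
a = poly_coeffs(r, G, K)    # a[0..K-1]; the delta^K coefficient is 1
```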

A word is in order here. We define our approximation order as *K* when the polynomial degree in (14) is *K*. However, note that (14) is the approximation of (11) which is a derivative of the log-likelihood function of (7). Thus, here the Taylor series truncation is performed after differentiation of the log-likelihood function, while in[12] and[14], the Taylor series truncation is performed directly on the log-likelihood function. According to our definition,[12] and[14] are actually using first-order approximations. Note that when *K* = 1, we also have a first-order approximation algorithm.

### 3.3 *QR* transformations

Now, from (14), we construct a *K* × *K* square matrix called the companion matrix as[27]

\mathit{A}=\left[\begin{array}{cccccc}0& 0& \cdots & \cdots & 0& -{a}_{0}\\ 1& 0& \ddots & \ddots & 0& -{a}_{1}\\ 0& 1& \ddots & \ddots & \vdots & \vdots \\ \vdots & 0& \ddots & \ddots & \vdots & \vdots \\ \vdots & \vdots & \ddots & \ddots & 0& \vdots \\ 0& \cdots & \cdots & 0& 1& -{a}_{K-1}\end{array}\right].

(16)
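As a sketch of why the companion matrix is useful here (the coefficients below are illustrative, not values from the paper): the eigenvalues of the matrix in (16) are exactly the roots of the monic polynomial (14).

```python
import numpy as np

# Companion matrix of the monic polynomial
# p(x) = a_0 + a_1 x + ... + a_{K-1} x^{K-1} + x^K, as in (16).
def companion(a):                  # a = [a_0, a_1, ..., a_{K-1}]
    K = len(a)
    A = np.zeros((K, K))
    A[1:, :K - 1] += np.eye(K - 1)  # ones on the subdiagonal
    A[:, -1] = -np.asarray(a)       # last column carries -a_k
    return A

# Illustrative example: p(x) = (x - 1)(x - 2)(x - 3) = -6 + 11x - 6x^2 + x^3
A = companion([-6.0, 11.0, -6.0])
roots = np.sort(np.linalg.eigvals(A).real)   # -> approximately [1, 2, 3]
```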

It is known that a square matrix can be triangularized by iterative *QR* transformations[26–28], where *Q* is an orthogonal matrix and *R* is an upper triangular matrix. When all the roots of the polynomial of (14) are real, these roots constitute the diagonal elements of the eventual triangularized *A* matrix and are also the eigenvalues of *A*[26–28]. In case the polynomial equation of (14) has complex roots in conjugate pairs, the iterative *QR* transformations will instead lead to a quasi-triangular Hessenberg matrix[26]: the subdiagonal immediately below the main diagonal will contain nonzero elements, and on the main diagonal each real root will appear as an element, while each complex root will not appear but is replaced by some unpredictable number, either real or complex[26]. There is no way of telling which diagonal elements are real roots and which are the numbers replacing complex roots. However, we can be certain that the desired global solution for the CFO estimate, which must be real, appears among them. We shall detail later how our algorithm design finds this global solution.

The process of iterative *QR* transformations, here called the Gram-Schmidt *QR* transformations, involves two operation phases performed alternately. The first phase is the Gram-Schmidt *QR* decomposition, carried out by the Gram-Schmidt orthogonalization process[26, 34]. The second phase is an iterative transformation (or triangularization) process. When the polynomial degree *K* gets too high, the triangularization process begins to produce less accurate results[34]. Fortunately, for the OFDM tracking problem at hand we do not need a very high order *K*, and only a crude result from the triangularization process is required, since a complementing correction process coupled with it will carry the remaining burden and eventually bring the final CFO estimate to great accuracy. As a result, we need not execute a great many iterations of *QR* transformations; it can be demonstrated that just a couple of iterations suffice. For fine frequency synchronization, the frequency drift is less than half the carrier spacing (|*δ*| < 0.5)[9, 10, 12, 14]. In this case, computer experiments show that our order-2 algorithm produces results with good accuracy. When a wider tracking range is desired (|*δ*| > 0.5), at least an order-4 algorithm is needed.

The iterative *QR* transformations are carried out as follows:

Let the square matrix *A* of (16) be denoted as

\mathit{A}=\left[\begin{array}{cccc}{\mathbf{a}}_{1}& {\mathbf{a}}_{2}& \cdots & {\mathbf{a}}_{K}\end{array}\right],

(17)

where {**a**_{k}, *k* = 1, 2, …, *K*} are the *K* × 1 column vectors of **A**. We then carry out the Gram-Schmidt orthogonalization process as follows:

Define the projection of a vector **a** on a unit vector **e** as

{\mathrm{proj}}_{\mathbf{e}}\mathbf{a}=\frac{{\mathbf{a}}^{T}\mathbf{e}}{{\|\mathbf{e}\|}^{2}}\mathbf{e},

(18)

where *T* denotes the transpose and\|\mathbf{e}\|=\sqrt{{\mathbf{e}}^{T}\mathbf{e}} is the Euclidean norm of **e**. Then, let

\begin{array}{l}\phantom{\rule{3em}{0ex}}{\mathbf{u}}_{1}={\mathbf{a}}_{1},\\ \phantom{\rule{3em}{0ex}}{\mathbf{u}}_{2}={\mathbf{a}}_{2}-{\text{proj}}_{{\mathbf{e}}_{1}}{\mathbf{a}}_{2},\\ \phantom{\rule{3em}{0ex}}{\mathbf{u}}_{3}={\mathbf{a}}_{3}-{\text{proj}}_{{\mathbf{e}}_{1}}{\mathbf{a}}_{3}-{\text{proj}}_{{\mathbf{e}}_{2}}{\mathbf{a}}_{3},\\ \phantom{\rule{6em}{0ex}}\vdots \\ \phantom{\rule{3em}{0ex}}{\mathbf{u}}_{K}={\mathbf{a}}_{K}-\sum _{k=1}^{K-1}{\text{proj}}_{{\mathbf{e}}_{k}}{\mathbf{a}}_{K},\\ \text{with}\phantom{\rule{1em}{0ex}}{\mathbf{e}}_{k}=\frac{{\mathbf{u}}_{k}}{\|{\mathbf{u}}_{k}\|}.\end{array}

(19)

The orthogonal matrix *Q* is now formed as

\mathit{Q}=\left[\begin{array}{cccc}{\mathbf{e}}_{1}& {\mathbf{e}}_{2}& \cdots & {\mathbf{e}}_{K}\end{array}\right].

(20)

The upper triangular matrix *R* is given by

\mathit{R}={\mathit{Q}}^{T}\mathit{A}.

(21)

This completes the Gram-Schmidt *QR* decomposition process. Next, we start the following iterations:

{\mathit{A}}_{1}=\mathit{\text{RQ}}.

Construct *Q*_{1} from *A*_{1} using the Gram-Schmidt orthogonalization process as done above.

Construct{\mathit{R}}_{1}={\mathit{Q}}_{1}^{T}{\mathit{A}}_{1}.

\begin{array}{l}{\mathit{A}}_{2}={\mathit{R}}_{1}{\mathit{Q}}_{1}\\ \phantom{\rule{1.25em}{0ex}}\vdots \\ {\mathit{A}}_{L}={\mathit{R}}_{L-1}{\mathit{Q}}_{L-1}.\end{array}

(22)

When *L* is sufficiently large, we will find *A*_{L} to be upper triangular with diagonal elements equal to the roots of the polynomial of (14). The Gram-Schmidt *QR* transformations are now completed. Note that, as mentioned earlier, when coupled with the complementing correction process in actual operation, we need not execute many iterations of the *QR* transformations (*L* need not be large). In our computer simulations, we have used *L* = 2 and up to *K* = 6.
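The decomposition (17)-(21) and the iteration (22) can be sketched numerically as follows. This is an illustrative sketch under our own naming, not the paper's implementation; the test matrix is a small companion matrix with known real roots, and we deliberately run many iterations rather than the paper's *L* = 2 so the diagonal converges on its own.

```python
import numpy as np

def gram_schmidt_qr(A):
    """Gram-Schmidt QR decomposition, as in (18)-(21)."""
    K = A.shape[1]
    Q = np.zeros_like(A, dtype=float)
    for k in range(K):
        u = A[:, k].astype(float)
        for j in range(k):                      # subtract projections, (19)
            u = u - (A[:, k] @ Q[:, j]) * Q[:, j]
        Q[:, k] = u / np.linalg.norm(u)         # e_k = u_k / ||u_k||
    return Q, Q.T @ A                           # R = Q^T A, (21)

def qr_iterate(A, L):
    """Iterative QR transformations A_{l+1} = R_l Q_l, as in (22)."""
    for _ in range(L):
        Q, R = gram_schmidt_qr(A)
        A = R @ Q
    return np.diag(A)                           # candidate roots

# Companion matrix of (x - 1)(x - 2)(x - 3); with enough iterations
# the diagonal converges to the roots {1, 2, 3}.
A = np.array([[0.0, 0.0, 6.0],
              [1.0, 0.0, -11.0],
              [0.0, 1.0, 6.0]])
diag = np.sort(qr_iterate(A, 200))
```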

Of the *K* roots obtained, only one is desired. Assume the *K* roots are *δ*_{1}, *δ*_{2}, …, *δ*_{K}. Substitute each root into (9) and then into (7), with the Chu sequence chosen for **X** earlier. The root that maximizes (7) is the sought-after solution{\widehat{\delta}}_{0}, i.e.,

{\widehat{\delta}}_{0}=\text{arg}\phantom{\rule{.3em}{0ex}}\underset{{\delta}_{k}}{\text{max}}\left[ln\Lambda \left({\delta}_{k}\right)\right],\phantom{\rule{1.5em}{0ex}}k=1,2,\dots ,K.

(23)

For the special case of the first-order approximation (*K* = 1), (14) directly leads to{\widehat{\delta}}_{0}=-{a}_{0}/{a}_{1}, and hence no *QR* transformation needs to be executed. Furthermore, in case (14) possesses complex roots, *A*_{L} will be in the Hessenberg (quasi-triangular) form as stated earlier. However, this does not matter: exactly as in the all-real-roots case, we simply substitute each diagonal element into the log-likelihood function of (7), and the one that yields the maximum is the global solution, just as given by (23).

### 3.4 The iterative correction process

We must now note that the global solution{\widehat{\delta}}_{0} obtained as described above came from an approximation method that truncates the Taylor series expansion beyond terms of order higher than *K*, i.e., the approximate expression of (13) was used. Therefore, the solution{\widehat{\delta}}_{0} is still an approximate one (with *L* = 2 as is to be used, it will be even cruder). We can refine it further by the following iterative correction process.

With the initial estimate{\widehat{\delta}}_{0} (this is why we have used the subscript 0 to begin with), we can correct the received signal vector as

{\mathbf{r}}_{1}={\mathbf{D}}_{{\widehat{\delta}}_{0}}^{H}{\mathbf{r}}_{0},

(24)

where **r**_{0} = **r** is just the original received signal vector. After this CFO correction, **r**_{1} is expected to be cleaner than **r**_{0} (= **r**). We thus replace **r** with **r**_{1} in (11), yielding a better likelihood equation from which a new polynomial of degree *K*, i.e., a new (14), is generated. Carrying out the iterative Gram-Schmidt *QR* transformations again as above, we get a new and better CFO estimate{\widehat{\delta}}_{1}. Continuing this way, at a certain *M*th iteration we should eventually have{\widehat{\delta}}_{M}\to 0. Then, our final or overall CFO estimate can be obtained as

\widehat{\delta}=\sum _{m=0}^{M}{\widehat{\delta}}_{m}\to \delta .

(25)

To avoid confusion, we shall, from here on, call\widehat{\delta} the final or overall CFO estimate, define the *i* th interim CFO estimate as{\widehat{\delta}}_{\left(i\right)}=\sum _{m=0}^{i}{\widehat{\delta}}_{m},*i* = 0, 1, …, *M*, (notice that{\widehat{\delta}}_{\left(M\right)}=\widehat{\delta}), and refer to *δ* as the true CFO (the very beginning CFO to be estimated).

Now, substituting the final CFO estimate of (25) for the *δ* in (9), we immediately obtain the CIR estimate\widehat{\mathbf{h}}.

We now summarize the iterative adaptive *QR* transformation algorithm, which consists of two phases performed alternately. To be clear, the matrices *Q*_{l}, *R*_{l}, and *A*_{l} of the first phase are given an extra subscript *m*, as *Q*_{l,m}, *R*_{l,m}, and *A*_{l,m}, respectively, corresponding to the *m*th run of the second phase. The proposed iterative adaptive algorithm is listed below:

Initial condition: **r**_{0} = **r** and *A*_{0,0} = *A*.

For *m* = 0, 1, …, *M*

For *l* = 0, 1, …, *L* - 1

Construct orthogonal matrix *Q*_{l,m} from companion matrix *A*_{l,m} using the Gram-Schmidt orthogonalization process.

Construct{\mathit{R}}_{l,m}={\mathit{Q}}_{l,m}^{T}{\mathit{A}}_{l,m}.

Construct *A*_{l+ 1,m} = *R*_{l,m}*Q*_{l,m}.

End

Choose the *K* diagonal elements of *A*_{L,m} to be *δ*_{k}, *k* = 1, 2, …, *K*.

Calculate{\widehat{\delta}}_{m}=\text{arg}\phantom{\rule{.3em}{0ex}}\underset{{\delta}_{k}}{\text{max}}\left[ln\Lambda \left({\delta}_{k}\right)\right].

Construct{\mathbf{r}}_{m+1}={\mathbf{D}}_{{\widehat{\delta}}_{m}}^{H}{\mathbf{r}}_{m}.

Construct companion matrix *A*_{0,m + 1} from the *K* degree polynomial that approximates\text{Im}\left\{{\mathbf{r}}_{m+1}^{H}{\mathbf{D}}_{\delta -{\widehat{\delta}}_{\left(m\right)}}\mathbf{G}{\mathbf{D}}_{\delta -{\widehat{\delta}}_{\left(m\right)}}^{H}{\mathbf{r}}_{m+1}\right\}=0.

End

The final or overall CFO estimate is\widehat{\delta}=\sum _{m=0}^{M}{\widehat{\delta}}_{m}\to \delta, and the CIR estimate is\widehat{\mathbf{h}}=\frac{1}{N\sqrt{N}}{\mathbf{F}}_{v}^{H}{\mathbf{X}}^{H}{\mathbf{F}}_{N}{\mathbf{D}}_{\widehat{\delta}}^{H}\mathbf{r}.
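A toy, self-contained sketch of the correction loop (24)-(25) follows. This is NOT the full algorithm: a pure tone *r*_{n} = *e*^{j2πδn/N} stands in for the received signal, the weight (*m* - *n*) stands in for the matrix **G**, and the *K* = 1 closed form replaces the *QR* phase, purely to illustrate how interim estimates are accumulated while the signal is re-corrected each round.

```python
import numpy as np

def estimate_cfo(r, M=20):
    """Toy version of the iterative correction loop: at each round,
    form the K = 1 polynomial, take its root as the interim estimate,
    accumulate it as in (25), and counter-rotate r as in (24)."""
    N = len(r)
    n = np.arange(N)
    mi, ni = np.meshgrid(n, n, indexing="ij")
    d = mi - ni                                      # (m - n) weights
    total = 0.0
    for _ in range(M):
        base = np.outer(r.conj(), r) * d             # toy stand-in for r_m^* r_n g_{m,n}
        a0 = base.imag.sum()                         # zeroth-order coefficient
        a1 = (2 * np.pi * d / N * base.real).sum()   # first-order coefficient
        step = -a0 / a1                              # K = 1 interim estimate
        total += step                                # accumulate, as in (25)
        r = r * np.exp(-1j * 2 * np.pi * step * n / N)  # correct, as in (24)
    return total

# Noise-free tone with true CFO delta; the accumulated estimate
# converges toward delta.
N, delta = 16, 0.1
r = np.exp(1j * 2 * np.pi * delta * np.arange(N) / N)
```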

Clearly, when a linear (first-order) approximation is used for a high-degree polynomial, the precision is lower than with a quadratic (second-order) approximation. The computational load of a high-order approximation algorithm is heavier than that of a first-order one, since more terms are involved. However, the reward is a tremendous improvement in CFO tracking range, estimation accuracy, and SER performance. These improvements will be demonstrated through performance comparisons by computer simulations in Section 4.