From Equation 4, we can see that the LMS-based channel estimation method never takes advantage of the sparse structure of the channel vector h. For a better understanding, the LMS-based channel estimation methods can be expressed as
\tilde{\mathbf{h}}\left(n+1\right)=\tilde{\mathbf{h}}\left(n\right)+\mathrm{Adaptive}\phantom{\rule{0.25em}{0ex}}\mathrm{error}\phantom{\rule{0.25em}{0ex}}\mathrm{update}.
(11)
Unlike the conventional LMS method in Equation 11, sparse LMS algorithms exploit channel sparsity by introducing ℓ_{p}-norm penalties with 0 ≤ p ≤ 1 into their cost functions. The LMS-based adaptive sparse channel estimation methods can be written as
\tilde{\mathbf{h}}\left(n+1\right)=\underbrace{\underbrace{\tilde{\mathbf{h}}\left(n\right)+\text{Adaptive error update}}_{\text{LMS}}+\text{Sparse penalty}}_{\text{sparse LMS}}.
(12)
Equation 12 motivates us to introduce different sparse penalties in order to take advantage of the sparse structure as prior information. If we draw an analogy between the update equation in Equation 12 and CS-based sparse channel estimation [10, 11], we find that the more accurate the adopted sparse channel approximation, the better the estimation accuracy, and vice versa. The conventional sparse penalties are the ℓ_{p}-norm (0 < p ≤ 1) and the ℓ_{0}-norm. Since ℓ_{0}-norm penalized sparse signal recovery is a well-known NP-hard problem, only ℓ_{p}-norm (0 < p ≤ 1)-based sparse LMS approaches had been proposed for adaptive sparse channel estimation. Compared to the two conventional sparse LMS algorithms (ZA-LMS and RZA-LMS), LP-LMS achieves better estimation performance. However, a performance gap still exists between the LP-LMS-based channel estimator and the optimal one. More recently, a more accurate sparse approximation algorithm, the ℓ_{0}-norm LMS (L0-LMS), was proposed in [22] and analyzed in [23]; however, these works never considered its application to sparse channel estimation. In this paper, the L0-LMS algorithm is applied to adaptive sparse channel estimation to improve the estimation performance.
It is easily seen that exploiting more accurate sparse structure information yields better estimation performance. In the following, we investigate LMS-based adaptive sparse channel estimation methods using different sparse penalties.
3.1 LMS-based adaptive sparse channel estimation
The following are the LMS-based adaptive sparse channel estimation methods:

ZA-LMS. To exploit the channel sparsity in the time domain, the cost function of ZA-LMS [18] is given by
{L}_{\mathrm{ZA}}\left(n\right)=\frac{1}{2}{e}^{2}\left(n\right)+{\lambda}_{\mathrm{ZA}}{\left\|\tilde{\mathbf{h}}\left(n\right)\right\|}_{1},
(13)
where λ_{ZA} is a regularization parameter which balances the estimation error and the sparse penalty of \tilde{\mathbf{h}}\left(n\right). The corresponding update equation of ZA-LMS is
\begin{array}{l}\tilde{\mathbf{h}}\left(n+1\right)=\tilde{\mathbf{h}}\left(n\right)-\mu \frac{\partial {L}_{\mathrm{ZA}}\left(n\right)}{\partial \tilde{\mathbf{h}}\left(n\right)}\\ \phantom{\rule{3.6em}{0ex}}=\tilde{\mathbf{h}}\left(n\right)+\mu e\left(n\right)\mathbf{x}\left(t\right)-{\rho}_{\mathrm{ZA}}\,\mathrm{sgn}\left(\tilde{\mathbf{h}}\left(n\right)\right),\end{array}
(14)
where ρ_{ZA} = μλ_{ZA} and sgn(·) is a component-wise function defined as
\mathrm{sgn}\left(h\right)=\left\{\begin{array}{ll}1,& h>0\\ 0,& h=0\\ -1,& h<0\end{array}\right..
From the update equation in Equation 14, the second term compresses small channel coefficients to zero with high probability. That is, most of the small channel coefficients can simply be replaced by zeros, which speeds up the convergence of the algorithm.
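The ZA-LMS recursion in Equation 14 can be sketched in a few lines of Python (a minimal simulation, not the paper's experimental setup: the step size, regularization parameter, channel length, and tap positions below are all assumed illustrative values):

```python
import numpy as np

def za_lms_update(h_est, x, y, mu=0.05, rho=1e-3):
    """One ZA-LMS iteration: LMS gradient step plus the zero-attracting
    l1-penalty term rho*sgn(h) that pulls small taps toward zero."""
    e = y - h_est @ x                       # instantaneous error e(n)
    return h_est + mu * e * x - rho * np.sign(h_est)

# Assumed toy channel: 2 nonzero taps out of 16
rng = np.random.default_rng(0)
h = np.zeros(16)
h[[3, 11]] = [0.9, -0.6]
h_est = np.zeros(16)
for _ in range(2000):
    x = rng.standard_normal(16)             # training input vector x(t)
    y = h @ x + 0.01 * rng.standard_normal()
    h_est = za_lms_update(h_est, x, y)
print(np.round(h_est, 2))
```

After convergence, the two nonzero taps are recovered (with a small bias of roughly ρ_{ZA}/μ) while the zero taps stay pinned near zero.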

RZA-LMS. ZA-LMS cannot distinguish between zero taps and nonzero taps, since it applies the same penalty to all taps and forces them toward zero with the same probability; its performance therefore degrades in less sparse systems. Motivated by the reweighted ℓ_{1}-norm minimization recovery algorithm [19], Chen et al. proposed a heuristic approach to reinforce the zero attractor, termed RZA-LMS [18]. The cost function of RZA-LMS is given by
{L}_{\mathrm{RZA}}\left(n\right)=\frac{1}{2}{e}^{2}\left(n\right)+{\lambda}_{\mathrm{RZA}}{\displaystyle \sum_{i=1}^{N}}\mathrm{log}\left(1+{\varepsilon}_{\mathrm{RZA}}\left|{\tilde{h}}_{i}\left(n\right)\right|\right),
(15)
where λ_{RZA} > 0 is the regularization parameter and ε_{RZA} > 0 is a positive threshold. In computer simulations, the threshold is set to ε_{RZA} = 20, as suggested in [18]. The ith channel coefficient {\tilde{h}}_{i}\left(n\right) is then updated as
\begin{array}{l}{\tilde{h}}_{i}\left(n+1\right)={\tilde{h}}_{i}\left(n\right)-\mu \frac{\partial {L}_{\mathrm{RZA}}\left(n\right)}{\partial {\tilde{h}}_{i}\left(n\right)}\\ \phantom{\rule{4em}{0ex}}={\tilde{h}}_{i}\left(n\right)+\mu e\left(n\right)x\left(t-i\right)-{\rho}_{\mathrm{RZA}}\frac{\mathrm{sgn}\left({\tilde{h}}_{i}\left(n\right)\right)}{1+{\varepsilon}_{\mathrm{RZA}}\left|{\tilde{h}}_{i}\left(n\right)\right|},\end{array}
(16)
where ρ_{RZA} = μλ_{RZA}ε_{RZA}. Equation 16 can be expressed in vector form as
\tilde{\mathbf{h}}\left(n+1\right)=\tilde{\mathbf{h}}\left(n\right)+\mu e\left(n\right)\mathbf{x}\left(t\right)-{\rho}_{\mathrm{RZA}}\frac{\mathrm{sgn}\left(\tilde{\mathbf{h}}\left(n\right)\right)}{1+{\varepsilon}_{\mathrm{RZA}}\left|\tilde{\mathbf{h}}\left(n\right)\right|}.
Please note that the second term in Equation 16 attracts the channel coefficients {\tilde{h}}_{i}\left(n\right), i = 1, 2, …, N, whose magnitudes are comparable to 1/ε_{RZA} toward zero.
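To see the reweighting effect, the zero attractor of Equation 16 can be evaluated on a few tap magnitudes (illustrative values; ε = 20 follows the threshold suggested above):

```python
import numpy as np

def rza_attractor(h, eps=20.0):
    """Reweighted zero attractor of RZA-LMS: taps much larger than
    1/eps = 0.05 feel little pull, small taps are driven toward zero."""
    return np.sign(h) / (1.0 + eps * np.abs(h))

taps = np.array([0.9, 0.05, 0.001])
pull = rza_attractor(taps)
print(np.round(pull, 3))
```

The pull is strongest on the near-zero tap and weakest on the large tap, which is exactly how RZA-LMS avoids penalizing genuine channel coefficients.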

LP-LMS. Following the above ideas in [18], an LP-LMS-based adaptive sparse channel estimation method was proposed in [20]. The cost function of LP-LMS is given by
{L}_{\mathrm{LP}}\left(n\right)=\frac{1}{2}{e}^{2}\left(n\right)+{\lambda}_{\mathrm{LP}}{\left\|\tilde{\mathbf{h}}\left(n\right)\right\|}_{p},
(17)
where λ_{LP} > 0 is a regularization parameter. The corresponding update equation of LP-LMS is given by
\begin{array}{l}\tilde{\mathbf{h}}\left(n+1\right)=\tilde{\mathbf{h}}\left(n\right)-\mu \frac{\partial {L}_{\mathrm{LP}}\left(n\right)}{\partial \tilde{\mathbf{h}}\left(n\right)}\\ \phantom{\rule{4em}{0ex}}=\tilde{\mathbf{h}}\left(n\right)+\mu e\left(n\right)\mathbf{x}\left(t\right)-{\rho}_{\mathrm{LP}}\frac{{\left\|\tilde{\mathbf{h}}\left(n\right)\right\|}_{p}^{1-p}\mathrm{sgn}\left(\tilde{\mathbf{h}}\left(n\right)\right)}{{\varepsilon}_{\mathrm{LP}}+{\left|\tilde{\mathbf{h}}\left(n\right)\right|}^{1-p}},\end{array}
(18)
where ρ_{LP} = μλ_{LP} and ε_{LP} > 0 is a small positive parameter.
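The LP-LMS penalty term in Equation 18 can be sketched as follows (p = 0.5 and ε = 0.05 are assumed illustrative values, not the parameters used in the paper's experiments):

```python
import numpy as np

def lp_penalty(h, p=0.5, eps=0.05):
    """Zero-attracting term of LP-LMS:
    ||h||_p^(1-p) * sgn(h) / (eps + |h|^(1-p)).
    eps keeps the denominator away from zero for vanishing taps."""
    norm_p = np.sum(np.abs(h) ** p) ** (1.0 / p)
    return norm_p ** (1 - p) * np.sign(h) / (eps + np.abs(h) ** (1 - p))

penalty = lp_penalty(np.array([1.0, 0.0]))
print(penalty)
```

Note that the attraction vanishes exactly at zero taps (sgn(0) = 0), so the penalty only acts on coefficients that have drifted away from zero.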
L0-LMS (proposed). Consider an ℓ_{0}-norm penalty on the LMS cost function to produce a sparse channel estimator, as this penalty term forces the channel tap values of \tilde{\mathbf{h}}\left(n\right) to approach zero. The cost function of L0-LMS is given by
{L}_{L0}\left(n\right)=\frac{1}{2}{e}^{2}\left(n\right)+{\lambda}_{L0}{\left\|\tilde{\mathbf{h}}\left(n\right)\right\|}_{0},
(19)
where λ_{L0} > 0 is a regularization parameter and {\left\|\tilde{\mathbf{h}}\left(n\right)\right\|}_{0} denotes the ℓ_{0}-norm sparse penalty function, which counts the number of nonzero channel taps of \tilde{\mathbf{h}}\left(n\right). Since solving the ℓ_{0}-norm minimization is an NP-hard problem [17], we replace it with an approximate continuous function to reduce the computational complexity:
{\left\|\tilde{\mathbf{h}}\left(n\right)\right\|}_{0}\approx {\displaystyle \sum_{i=0}^{N-1}}\left(1-{e}^{-\beta \left|{\tilde{h}}_{i}\left(n\right)\right|}\right).
(20)
The cost function in Equation 19 can then be rewritten as
{L}_{L0}\left(n\right)=\frac{1}{2}{e}^{2}\left(n\right)+{\lambda}_{L0}{\displaystyle \sum_{i=0}^{N-1}}\left(1-{e}^{-\beta \left|{\tilde{h}}_{i}\left(n\right)\right|}\right).
(21)
The first-order Taylor series expansion of the exponential function {e}^{-\beta \left|{\tilde{h}}_{i}\left(n\right)\right|} is given as
{e}^{-\beta \left|{\tilde{h}}_{i}\left(n\right)\right|}\approx \left\{\begin{array}{ll}1-\beta \left|{\tilde{h}}_{i}\left(n\right)\right|,& \mathrm{when}\ \left|{\tilde{h}}_{i}\left(n\right)\right|\le \frac{1}{\beta}\\ 0,& \mathrm{otherwise}\end{array}\right..
(22)
The update equation of L0-LMS-based adaptive sparse channel estimation can then be derived as
\begin{array}{l}\tilde{\mathbf{h}}\left(n+1\right)=\tilde{\mathbf{h}}\left(n\right)\\ \phantom{\rule{4.2em}{0ex}}+\mu e\left(n\right)\mathbf{x}\left(t\right)-{\rho}_{L0}\beta\,\mathrm{sgn}\left(\tilde{\mathbf{h}}\left(n\right)\right){e}^{-\beta \left|\tilde{\mathbf{h}}\left(n\right)\right|},\end{array}
(23)
where ρ_{L0} = μλ_{L0}. Unfortunately, the exponential function in Equation 23 still incurs high computational complexity. To further reduce the complexity, an approximation function J\left(\tilde{\mathbf{h}}\left(n\right)\right) is applied to the update Equation 23. Finally, the update equation of L0-LMS-based adaptive sparse channel estimation can be derived as
\tilde{\mathbf{h}}\left(n+1\right)=\tilde{\mathbf{h}}\left(n\right)+\mu e\left(n\right)\mathbf{x}\left(t\right)-{\rho}_{L0}J\left(\tilde{\mathbf{h}}\left(n\right)\right),
(24)
where ρ_{L0} = μλ_{L0} and J\left(\tilde{\mathbf{h}}\left(n\right)\right) is defined component-wise as
J\left({\tilde{h}}_{i}\left(n\right)\right)=\left\{\begin{array}{ll}2\beta\,\mathrm{sgn}\left({\tilde{h}}_{i}\left(n\right)\right)-2{\beta}^{2}{\tilde{h}}_{i}\left(n\right),& \mathrm{when}\ \left|{\tilde{h}}_{i}\left(n\right)\right|\le \frac{1}{\beta}\\ 0,& \mathrm{otherwise},\end{array}\right.
(25)
for all i ∈ {1, 2, …, N}.
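Putting the pieces together, a minimal sketch of the L0-LMS recursion follows (the attractor is the piecewise-linear approximation described above; β, ρ, the channel, and the training loop are assumed illustrative choices, not the paper's setup):

```python
import numpy as np

def l0_attractor(h, beta=10.0):
    """Piecewise-linear approximation of the l0-penalty gradient:
    active only inside the attraction band |h| <= 1/beta."""
    inside = np.abs(h) <= 1.0 / beta
    return np.where(inside, 2 * beta * np.sign(h) - 2 * beta**2 * h, 0.0)

def l0_lms_update(h_est, x, y, mu=0.05, rho=2e-4, beta=10.0):
    """One L0-LMS-style iteration: LMS step plus zero attraction."""
    e = y - h_est @ x
    return h_est + mu * e * x - rho * l0_attractor(h_est, beta)

rng = np.random.default_rng(1)
h = np.zeros(16)
h[[2, 9]] = [1.0, -0.5]                      # assumed sparse channel
h_est = np.zeros(16)
for _ in range(3000):
    x = rng.standard_normal(16)
    y = h @ x + 0.01 * rng.standard_normal()
    h_est = l0_lms_update(h_est, x, y)
print(np.round(h_est, 2))
```

Because the attractor is zero outside the band |h| ≤ 1/β, the large taps converge without penalty bias, while the zero taps are actively held near zero.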
3.2 Improved adaptive sparse channel estimation methods
The common drawback of the above sparse LMS-based algorithms is that they are vulnerable to random scaling of the training signal x(t). In other words, LMS-based algorithms are sensitive to signal scaling [21]. Hence, it is very hard to choose a proper step size μ that guarantees the stability of these sparse LMS-based algorithms, even if the step size satisfies the necessary condition in Equation 10.
Let us reconsider the update equation of LMS in Equation 4. Assuming that the (n + 1)th adaptive channel estimator \tilde{\mathbf{h}}\left(n+1\right) is the optimal solution, the relationship between \tilde{\mathbf{h}}\left(n+1\right) and the input signal x(t) is given as
{\tilde{\mathbf{h}}}^{T}\left(n+1\right)\mathbf{x}\left(t\right)=y\left(t\right),
(26)
where y(t) is assumed to be the ideal received signal at the receiver. To solve the convex optimization problem in Equation 26, the cost function can be constructed as [21]
\begin{array}{l}C\left(n\right)={\left(\tilde{\mathbf{h}}\left(n+1\right)-\tilde{\mathbf{h}}\left(n\right)\right)}^{T}\left(\tilde{\mathbf{h}}\left(n+1\right)-\tilde{\mathbf{h}}\left(n\right)\right)\\ \phantom{\rule{3.2em}{0ex}}+\xi \left(y\left(t\right)-{\tilde{\mathbf{h}}}^{T}\left(n+1\right)\mathbf{x}\left(t\right)\right),\end{array}
(27)
where ξ is the unknown real-valued Lagrange multiplier [21]. The optimal channel estimator at the (n + 1)th update can be found by setting the first derivative of C(n) with respect to \tilde{\mathbf{h}}\left(n+1\right) to zero. Hence, it can be derived as
\frac{\partial C\left(n\right)}{\partial \tilde{\mathbf{h}}\left(n+1\right)}=2\left(\tilde{\mathbf{h}}\left(n+1\right)-\tilde{\mathbf{h}}\left(n\right)\right)-\xi \mathbf{x}\left(t\right)=0.
(29)
The (n + 1)th optimal channel estimator is given from Equation 29 as
\tilde{\mathbf{h}}\left(n+1\right)=\tilde{\mathbf{h}}\left(n\right)+\frac{1}{2}\xi \mathbf{x}\left(t\right).
(30)
By substituting Equation 30 into Equation 26, we obtain
\xi {\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)=2\left(y\left(t\right)-{\tilde{\mathbf{h}}}^{T}\left(n\right)\mathbf{x}\left(t\right)\right)=2e\left(n\right),
(31)
where e\left(n\right)=y\left(t\right)-{\tilde{\mathbf{h}}}^{T}\left(n\right)\mathbf{x}\left(t\right) (see Equation 2), and the unknown parameter ξ is given by
\xi =\frac{2e\left(n\right)}{{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)}.
(32)
By substituting Equation 32 into Equation 30, the update equation of NLMS is written as
\tilde{\mathbf{h}}\left(n+1\right)=\tilde{\mathbf{h}}\left(n\right)+{\mu}_{1}\frac{e\left(n\right)\mathbf{x}\left(t\right)}{{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)},
(33)
where μ_{1} is the gradient step size which controls the convergence speed of the NLMS algorithm. Based on the update Equation 33, the NLMS-based sparse adaptive update equation can be generalized, for better understanding, as
\tilde{\mathbf{h}}\left(n+1\right)=\underbrace{\underbrace{\tilde{\mathbf{h}}\left(n\right)+\text{Normalized adaptive error update}}_{\text{NLMS}}+\text{Sparse penalty}}_{\text{sparse NLMS}},
(34)
where the normalized adaptive update term is μ_{1}e(n)x(t)/x^{T}(t)x(t), which replaces the adaptive update μe(n)x(t) in Equation 4. The advantage of NLMS-based adaptive sparse channel estimation is that it can mitigate the scaling interference of the training signal, since NLMS-based methods estimate the sparse channel by normalizing with the power of the training signal x(t). To ensure the stability of the NLMS-based algorithms, the necessary condition on the step size μ_{1} is derived briefly below. The detailed derivation can be found in [21].
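A short sketch illustrates the scaling robustness of Equation 33: the same step size μ_{1} recovers the channel whether the training signal is scaled by 1 or by 100 (the channel and parameter values are assumed for illustration):

```python
import numpy as np

def nlms_update(h_est, x, y, mu1=0.5, delta=1e-12):
    """One NLMS iteration: the error step is divided by the input
    power x^T x, so any input scaling cancels out of the recursion."""
    e = y - h_est @ x
    return h_est + mu1 * e * x / (x @ x + delta)

rng = np.random.default_rng(2)
h = np.zeros(8)
h[[1, 5]] = [0.7, -0.4]                       # assumed sparse channel
for scale in (1.0, 100.0):
    h_est = np.zeros(8)
    for _ in range(1500):
        x = scale * rng.standard_normal(8)    # same mu1 for both scales
        y = h @ x + 0.01 * rng.standard_normal()
        h_est = nlms_update(h_est, x, y)
    print(scale, np.round(h_est, 2))
```

A plain LMS loop would need its step size retuned by a factor of scale² to stay stable here; the normalization makes that retuning unnecessary.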
Theorem 2
The necessary condition of reliable NLMS adaptive channel estimation is
0<{\mu}_{1}<\frac{2E\left\{e\left(n\right){\left(\mathbf{h}-\tilde{\mathbf{h}}\left(n\right)\right)}^{T}\mathbf{x}\left(t\right)/{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)\right\}}{E\left\{{e}^{2}\left(n\right)/{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)\right\}}.
(35)
Proof Since the NLMS-based algorithms share the same gradient step size to ensure their stability, for simplicity, we study the NLMS for the general case. The update equation of NLMS is given by
\tilde{\mathbf{h}}\left(n+1\right)=\tilde{\mathbf{h}}\left(n\right)+{\mu}_{1}e\left(n\right)\frac{\mathbf{x}\left(t\right)}{{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)},
(36)
where μ_{1} denotes the step size of NLMS-type algorithms. Denoting the channel estimation error vector as \mathbf{u}\left(n\right)=\mathbf{h}-\tilde{\mathbf{h}}\left(n\right), the (n + 1)th update error u(n + 1) can be written as
\mathbf{u}\left(n+1\right)=\mathbf{u}\left(n\right)-{\mu}_{1}e\left(n\right)\frac{\mathbf{x}\left(t\right)}{{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)}.
(37)
Obviously, the (n + 1)th update MSE E{u^{2}(n + 1)} can also be given by
\begin{array}{l}E\left\{{\mathbf{u}}^{2}\left(n+1\right)\right\}=E\left\{{\mathbf{u}}^{2}\left(n\right)\right\}-2{\mu}_{1}E\left\{\frac{e\left(n\right){\mathbf{u}}^{T}\left(n\right)\mathbf{x}\left(t\right)}{{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)}\right\}\\ \phantom{\rule{7.6em}{0ex}}+{\mu}_{1}^{2}E\left\{\frac{{e}^{2}\left(n\right)}{{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)}\right\}.\end{array}
(38)
To ensure stable updating of the NLMS-type algorithms, the necessary condition is
\begin{array}{l}E\left\{{\mathbf{u}}^{2}\left(n+1\right)\right\}-E\left\{{\mathbf{u}}^{2}\left(n\right)\right\}=-2{\mu}_{1}E\left\{\frac{e\left(n\right){\left(\mathbf{h}-\tilde{\mathbf{h}}\left(n\right)\right)}^{T}\mathbf{x}\left(t\right)}{{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)}\right\}\\ \phantom{\rule{15em}{0ex}}+{\mu}_{1}^{2}E\left\{\frac{{e}^{2}\left(n\right)}{{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)}\right\}\le 0.\end{array}
(39)
Hence, the necessary condition for reliable adaptive sparse channel estimation is that μ_{1} satisfies Equation 35 in the theorem.
□
The following are the improved adaptive sparse channel estimation methods:

ZA-NLMS (proposed). According to Equation 14, the update equation of ZA-NLMS can be written as
\tilde{\mathbf{h}}\left(n+1\right)=\tilde{\mathbf{h}}\left(n\right)+{\mu}_{1}e\left(n\right)\frac{\mathbf{x}\left(t\right)}{{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)}-{\rho}_{\mathrm{ZAN}}\mathrm{sgn}\left(\tilde{\mathbf{h}}\left(n\right)\right),
(40)
where ρ_{ZAN} = μ_{1}λ_{ZAN} and λ_{ZAN} is a regularization parameter for ZA-NLMS.

RZA-NLMS (proposed). According to Equation 16, the update equation of RZA-NLMS can be written as
\begin{array}{l}\tilde{\mathbf{h}}\left(n+1\right)=\tilde{\mathbf{h}}\left(n\right)+{\mu}_{1}e\left(n\right)\frac{\mathbf{x}\left(t\right)}{{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)}\\ \phantom{\rule{5.7em}{0ex}}-{\rho}_{\mathrm{RZAN}}\frac{\mathrm{sgn}\left(\tilde{\mathbf{h}}\left(n\right)\right)}{1+{\varepsilon}_{\mathrm{RZAN}}\left|\tilde{\mathbf{h}}\left(n\right)\right|},
\end{array}
(41)
where ?_{RZAN}?=?µ_{1}?_{RZA}?_{RZAN} and ?_{RZAN} is a regularization parameter for RZANLMS. The threshold is set as ?_{
RZAN
}?=??_{
RZA
}?=?20 which is also consistent with our previous research in [24–27].

LP-NLMS (proposed). According to the LP-LMS in Equation 18, the update equation of LP-NLMS can be written as
\begin{array}{l}\tilde{\mathbf{h}}\left(n+1\right)=\tilde{\mathbf{h}}\left(n\right)+{\mu}_{1}e\left(n\right)\frac{\mathbf{x}\left(t\right)}{{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)}\\ \phantom{\rule{5.7em}{0ex}}-{\rho}_{\mathrm{LPN}}\frac{{\left\|\tilde{\mathbf{h}}\left(n\right)\right\|}_{p}^{1-p}\mathrm{sgn}\left(\tilde{\mathbf{h}}\left(n\right)\right)}{{\varepsilon}_{\mathrm{LPN}}+{\left|\tilde{\mathbf{h}}\left(n\right)\right|}^{1-p}},\end{array}
(42)
where ?_{LPN}?=?µ_{1}?_{LPN}/10, ?_{L 0N} is a regularization parameter, and ?_{LPN}?>?0 is a threshold parameter.

L0-NLMS (proposed). Based on the update equation of the L0-LMS algorithm in Equation 24, the update equation of the L0-NLMS algorithm can be directly written as
\tilde{\mathbf{h}}\left(n+1\right)=\tilde{\mathbf{h}}\left(n\right)+{\mu}_{1}e\left(n\right)\frac{\mathbf{x}\left(t\right)}{{\mathbf{x}}^{T}\left(t\right)\mathbf{x}\left(t\right)}-{\rho}_{L0N}J\left(\tilde{\mathbf{h}}\left(n\right)\right),
(43)
where ?_{L 0N}?=?µ_{1}?_{L 0N} and ?_{L 0N} is a regularization parameter. The sparse penalty function J\left(\tilde{\mathbf{h}}\left(n\right)\right) has been defined as in (25).
3.3 Cramer-Rao lower bound
To derive the CRLB of the proposed channel estimators, Theorems 3 and 4 are given as follows.
Theorem 3 For an N-length channel vector h, if μ satisfies 0 < μ < 2/λ_{max}, then the MSE lower bound of the LMS adaptive channel estimator is B=\mu {P}_{0}N/\left(2-\mu {\lambda}_{\mathrm{min}}\right)\sim \mathcal{O}\left(N\right), where P_{0} is a parameter which denotes the unit power of the gradient noise and λ_{min} denotes the minimum eigenvalue of R.
Proof Firstly, we define the estimation error at the (n + 1)th iteration v(n + 1) as
\begin{array}{l}\mathbf{v}\left(n+1\right)=\tilde{\mathbf{h}}\left(n+1\right)-\mathbf{h}\\ \phantom{\rule{4em}{0ex}}=\mathbf{v}\left(n\right)+\mu \mathbf{x}\left(t\right)e\left(n\right)\\ \phantom{\rule{4em}{0ex}}=\mathbf{v}\left(n\right)-\frac{1}{2}\mu \mathrm{\Gamma}\left(n\right),\end{array}
(44)
where \mathrm{\Gamma}\left(n\right)=\partial L\left(n\right)/\partial \tilde{\mathbf{h}}\left(n\right)=-2\mathbf{x}\left(t\right)e\left(n\right) is a joint gradient error function which includes the channel estimation error and the noise-plus-interference error. To derive the lower bound of the channel estimator, the two gradient errors should be separated. Hence, we assume Γ(n) can be split into two terms, \mathrm{\Gamma}\left(n\right)=\tilde{\Gamma}\left(n\right)+2\mathbf{w}\left(n\right), where \tilde{\Gamma}\left(n\right)=2\left(\mathbf{R}\tilde{\mathbf{h}}\left(n\right)-\mathbf{p}\right) denotes the gradient error and w(n) = [w_{0}(n), w_{1}(n), …, w_{N − 1}(n)]^{T} represents the gradient noise vector [21]. Obviously, E{w(n)} = 0 and
\begin{array}{l}\mathbf{x}\left(t\right)e\left(n\right)=-\frac{1}{2}\mathrm{\Gamma}\left(n\right)\\ \phantom{\rule{3.7em}{0ex}}=-\left(\mathbf{R}\tilde{\mathbf{h}}\left(n\right)-\mathbf{p}\right)-\mathbf{w}\left(n\right)\\ \phantom{\rule{3.7em}{0ex}}=-\mathbf{R}\left(\tilde{\mathbf{h}}\left(n\right)-\mathbf{h}\right)-\mathbf{w}\left(n\right)\\ \phantom{\rule{3.7em}{0ex}}=-\mathbf{R}\mathbf{v}\left(n\right)-\mathbf{w}\left(n\right),\end{array}
(45)
where p = Rh. Then, we rewrite v(n + 1) in Equation 44 as
\begin{array}{l}\mathbf{v}\left(n+1\right)=\mathbf{v}\left(n\right)+\mu \mathbf{x}\left(t\right)e\left(n\right)\\ \phantom{\rule{3.7em}{0ex}}=\mathbf{v}\left(n\right)-\mu \mathbf{R}\mathbf{v}\left(n\right)-\mu \mathbf{w}\left(n\right)\\ \phantom{\rule{3.7em}{0ex}}=\left(\mathbf{I}-\mu \mathbf{R}\right)\mathbf{v}\left(n\right)-\mu \mathbf{w}\left(n\right)\\ \phantom{\rule{3.7em}{0ex}}=\left(\mathbf{I}-\mu \mathbf{Q}\mathbf{D}{\mathbf{Q}}^{H}\right)\mathbf{v}\left(n\right)-\mu \mathbf{w}\left(n\right)\\ \phantom{\rule{3.7em}{0ex}}=\mathbf{Q}\left(\mathbf{I}-\mu \mathbf{D}\right){\mathbf{Q}}^{H}\mathbf{v}\left(n\right)-\mu \mathbf{w}\left(n\right),\end{array}
(46)
where the covariance matrix can be decomposed as R = QDQ^{H}. Here, Q is an N × N unitary matrix while D = diag{λ_{1}, λ_{2}, …, λ_{N}} is an N × N diagonal eigenvalue matrix. We denote \tilde{\mathbf{v}}\left(n\right)={\mathbf{Q}}^{H}\mathbf{v}\left(n\right) and \tilde{\mathbf{w}}\left(n\right)={\mathbf{Q}}^{H}\mathbf{w}\left(n\right) as the rotated vectors, and Equation 46 can be rewritten as
\tilde{\mathbf{v}}\left(n+1\right)=\left(\mathbf{I}-\mu \mathbf{D}\right)\tilde{\mathbf{v}}\left(n\right)-\mu \tilde{\mathbf{w}}\left(n\right).
(47)
According to Equation 47, the MSE lower bound of LMS can be derived as
\begin{array}{l}B=\underset{n\to \infty}{\lim}E\left\{{\left\|\tilde{\mathbf{v}}\left(n+1\right)\right\|}_{2}^{2}\right\}\\ \phantom{\rule{0.48em}{0ex}}=\underset{n\to \infty}{\lim}E\left\{{\left[\left(\mathbf{I}-\mu \mathbf{D}\right)\tilde{\mathbf{v}}\left(n\right)-\mu \tilde{\mathbf{w}}\left(n\right)\right]}^{T}\left[\left(\mathbf{I}-\mu \mathbf{D}\right)\tilde{\mathbf{v}}\left(n\right)-\mu \tilde{\mathbf{w}}\left(n\right)\right]\right\}\\ \phantom{\rule{0.48em}{0ex}}=\underset{n\to \infty}{\lim}\underbrace{{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{2}E\left\{{\left\|\tilde{\mathbf{v}}\left(n\right)\right\|}_{2}^{2}\right\}}_{a\left(n\right)}-\underbrace{2\mu \left(\mathbf{I}-\mu \mathbf{D}\right)E\left\{{\tilde{\mathbf{w}}}^{T}\left(n\right)\tilde{\mathbf{v}}\left(n\right)\right\}}_{b\left(n\right)}\\ \phantom{\rule{1.6em}{0ex}}+\underbrace{{\mu}^{2}E\left\{{\tilde{\mathbf{w}}}^{T}\left(n\right)\tilde{\mathbf{w}}\left(n\right)\right\}}_{c\left(n\right)}.\end{array}
(48)
Since the signal and noise are independent, E\left\{{\tilde{\mathbf{w}}}^{T}\left(n\right)\tilde{\mathbf{v}}\left(n\right)\right\}=0, so b(n) vanishes and Equation 48 can be simplified as
B=\underset{n\to \infty}{\lim}\underbrace{{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{2}E\left\{{\left\|\tilde{\mathbf{v}}\left(n\right)\right\|}_{2}^{2}\right\}}_{a\left(n\right)}+\underbrace{{\mu}^{2}E\left\{{\tilde{\mathbf{w}}}^{T}\left(n\right)\tilde{\mathbf{w}}\left(n\right)\right\}}_{c\left(n\right)}.
(49)
For a better understanding, the first term a(n) in Equation 49 can be expanded as
\left\{\begin{array}{l}\underbrace{{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{2}E\left\{{\left\|\tilde{\mathbf{v}}\left(n\right)\right\|}_{2}^{2}\right\}}_{a\left(n\right)}=\underbrace{{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{4}E\left\{{\left\|\tilde{\mathbf{v}}\left(n-1\right)\right\|}_{2}^{2}\right\}}_{a\left(n-1\right)}+\underbrace{{\mu}^{2}{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{2}E\left\{{\tilde{\mathbf{w}}}^{T}\left(n\right)\tilde{\mathbf{w}}\left(n\right)\right\}}_{c\left(n-1\right)}\\ \underbrace{{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{4}E\left\{{\left\|\tilde{\mathbf{v}}\left(n-1\right)\right\|}_{2}^{2}\right\}}_{a\left(n-1\right)}=\underbrace{{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{6}E\left\{{\left\|\tilde{\mathbf{v}}\left(n-2\right)\right\|}_{2}^{2}\right\}}_{a\left(n-2\right)}+\underbrace{{\mu}^{2}{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{4}E\left\{{\tilde{\mathbf{w}}}^{T}\left(n\right)\tilde{\mathbf{w}}\left(n\right)\right\}}_{c\left(n-2\right)}\\ \phantom{\rule{2em}{0ex}}\vdots \\ \underbrace{{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{2\left(n-1\right)}E\left\{{\left\|\tilde{\mathbf{v}}\left(1\right)\right\|}_{2}^{2}\right\}}_{a\left(1\right)}=\underbrace{{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{2n}E\left\{{\left\|\tilde{\mathbf{v}}\left(0\right)\right\|}_{2}^{2}\right\}}_{a\left(0\right)}+\underbrace{{\mu}^{2}{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{2\left(n-1\right)}E\left\{{\tilde{\mathbf{w}}}^{T}\left(n\right)\tilde{\mathbf{w}}\left(n\right)\right\}}_{c\left(0\right)}\end{array}\right..
(50)
According to Equation 50, Equation 49 can be further rewritten as
\begin{array}{l}B=\underset{n\to \infty}{\lim}\underbrace{{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{2n}E\left\{{\left\|\tilde{\mathbf{v}}\left(0\right)\right\|}_{2}^{2}\right\}}_{a\left(0\right)}+\underbrace{{\mu}^{2}{\displaystyle \sum_{i=0}^{n}}{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{2i}E\left\{{\tilde{\mathbf{w}}}^{T}\left(n\right)\tilde{\mathbf{w}}\left(n\right)\right\}}_{c\left(0\right)+c\left(1\right)+\dots +c\left(n-1\right)}\\ \phantom{\rule{2em}{0ex}}\ge \underset{n\to \infty}{\lim}{\mu}^{2}{\displaystyle \sum_{i=0}^{n}}{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{2i}E\left\{{\tilde{\mathbf{w}}}^{T}\left(n\right)\tilde{\mathbf{w}}\left(n\right)\right\},\end{array}
(51)
where the first term {\lim}_{n\to \infty}{\left(\mathbf{I}-\mu \mathbf{D}\right)}^{2n}E\left\{{\left\|\tilde{\mathbf{v}}\left(0\right)\right\|}_{2}^{2}\right\}\to 0 when |1 − μλ_{i}| < 1. Consider the MSE lower bound of the ith channel tap {b_{i}; i = 0, 1, …, N − 1}. We obtain
{b}_{i}=\underset{n\to \infty}{\lim}{\mu}^{2}{\displaystyle \sum_{j=0}^{n}}{\left(1-\mu {\lambda}_{i}\right)}^{2j}E\left\{{\left|{\tilde{w}}_{i}\left(n\right)\right|}^{2}\right\}=\frac{\mu {P}_{0}}{2-\mu {\lambda}_{i}},
(52)
where E\left\{{\left|{\tilde{w}}_{i}\left(n\right)\right|}^{2}\right\}={\lambda}_{i}{P}_{0} and P_{0} denotes the gradient noise power. Since the LMS adaptive channel estimation method does not use the channel sparse structure information, the MSE lower bound must be accumulated over all of the channel taps. Hence, the lower bound B of LMS is given by
B={\displaystyle \sum_{i=0}^{N-1}}{b}_{i}={\displaystyle \sum_{i=0}^{N-1}}\frac{\mu {P}_{0}}{2-\mu {\lambda}_{i}}\ge {\displaystyle \sum_{i=0}^{N-1}}\frac{\mu {P}_{0}}{2-\mu {\lambda}_{\mathrm{min}}}=\frac{\mu N{P}_{0}}{2-\mu {\lambda}_{\mathrm{min}}}\sim \mathcal{O}\left(N\right),
(53)
where N is the channel length of h, {λ_{i}; i = 0, 1, …, N − 1} are the eigenvalues of the covariance matrix R, and λ_{min} is its minimal eigenvalue.
□
Theorem 4 For an N-length sparse channel vector h which consists of K nonzero taps, if μ satisfies 0 < μ < 2/λ_{max}, then the MSE lower bound of the sparse LMS adaptive channel estimator is {B}_{S}=\mu {P}_{0}K/\left(2-\mu {\lambda}_{\mathrm{min}}\right)\sim \mathcal{O}\left(K\right).
Proof From Equation 53, we can easily find that the MSE lower bound of the adaptive sparse channel estimator depends directly on the number of nonzero channel coefficients, i.e., K. Let Ω denote the set of nonzero tap positions, that is, h_{i} ≠ 0 for i ∈ Ω and h_{i} = 0 otherwise. We can then obtain the lower bound of the sparse LMS as
\begin{array}{l}{B}_{S}={\displaystyle \sum_{i\in \mathrm{\Omega}}}{b}_{i}={\displaystyle \sum_{i\in \mathrm{\Omega}}}\frac{\mu {P}_{0}}{2-\mu {\lambda}_{i}}\ge {\displaystyle \sum_{i\in \mathrm{\Omega}}}\frac{\mu {P}_{0}}{2-\mu {\lambda}_{\mathrm{min}}}\\ \phantom{\rule{1.4em}{0ex}}=\frac{\mu K{P}_{0}}{2-\mu {\lambda}_{\mathrm{min}}}\sim \mathcal{O}\left(K\right).\end{array}
(54)
□
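As a quick numeric illustration of Theorems 3 and 4 (with assumed values P_{0} = 1, λ_{min} = 1 for white input, and μ = 0.05), the two bounds differ exactly by the factor N/K:

```python
# Assumed illustrative values: unit gradient-noise power P0, white
# input (lambda_min = 1), and step size mu = 0.05.
mu, P0, lam_min = 0.05, 1.0, 1.0
N, K = 64, 4                                  # channel length vs nonzero taps
B_lms = mu * P0 * N / (2 - mu * lam_min)      # Theorem 3: O(N)
B_sparse = mu * P0 * K / (2 - mu * lam_min)   # Theorem 4: O(K)
print(B_lms, B_sparse, B_lms / B_sparse)
```

For a channel 64 taps long with only 4 nonzero taps, exploiting sparsity lowers the achievable MSE floor by a factor of 16, which is the gain the proposed sparse estimators aim to realize.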