ParEst: joint estimation of the OFDM channel state information in MIMO systems

OFDM channel state information (CSI) is needed for determining key parameters in MIMO transmissions. In this paper, a novel CSI estimation method, ParEst, is proposed, which estimates the CSI from multiple transmitting antennas simultaneously. ParEst is based on a recent discovery that the CSI can be approximated very well by the linear combination of complex-based sinusoids on constant frequencies. ParEst finds the CSI of all antennas jointly by solving an optimization problem and achieves higher accuracy than existing heuristic methods. As the base sinusoids are on constant frequencies, ParEst pre-computes most key steps and reduces the run-time computation complexity to just a modest number of vector dot products equivalent to a few FFT calculations. ParEst can be applied to MIMO links in networks such as LTE, 5G, or Wi-Fi.


Introduction
OFDM channel state information (CSI) describes the wireless channel and is often needed for determining key parameters in wireless transmissions, such as the precoding matrix in MIMO or MU-MIMO. The CSI for any transmitting and receiving antenna pair is a complex vector, which can be measured directly when there is only one transmitting antenna. However, with more antennas in wireless communication systems, it is preferable, in order to reduce the overhead, to allow multiple antennas to transmit CSI training symbols simultaneously. For example, in LTE and 5G, a user equipment (UE) may transmit the De-Modulation Reference Signal (DMRS) on each antenna simultaneously, and the burden lies with the receiver to find the CSI of each individual antenna from the composite signal.
In this paper, a novel CSI estimation method, ParEst, is proposed, where ParEst stands for parallel estimation. To be more specific, with ParEst, multiple antennas may transmit CSI estimation symbols on the same set of OFDM subcarriers simultaneously. ParEst takes the received vector and solves an optimization problem to find the CSI. The optimization is based on a recent discovery that the CSI can be approximated very well by a set of base sinusoids on constant frequencies [1]. As a result, most steps can be precomputed, and the run-time computation complexity is just a modest number of vector dot products equivalent to a few FFT calculations. Evaluation shows that ParEst achieves much higher accuracy than existing methods.

Zhang, EURASIP Journal on Wireless Communications and Networking (2020) 2020:235

The following is a list of notations used in this paper:
• "$\odot$": the element-wise multiplication of two vectors.
• "$*$": the conjugate of a complex number, vector, or matrix.
• "{}": all elements belonging to a particular set.
The rest of the paper is organized as follows. Section 2 discusses the related work. Section 3 describes the system model. Section 4 explains ParEst. Section 5 discusses some mathematical properties of the orthogonal bases. Section 6 gives the evaluation results. Section 7 concludes the paper.

Related work
CSI estimation is a classical problem in wireless communications. CSI estimation for single antenna systems has been studied in [2,3]. ParEst focuses on MIMO systems, where one of the main challenges is to reduce the system overhead by allowing simultaneous CSI estimation of multiple transmitting antennas, referred to as a Code Division Multiplexing (CDM) group. CDM group has been supported by some earlier work, such as [4], and has been adopted in LTE [5] and 5G NR [6].
One of the existing solutions in the literature can be referred to as Cutoff [7][8][9]. This is because the antennas transmit orthogonal sequences, which, after processing, result in peaks at different locations. The signal around each peak can be carved out to approximate the complete signal from each antenna, which is then used in the reconstruction of the CSI. Clearly, the Cutoff method will suffer low accuracy, especially when the number of antennas is large, because the neighborhood of the peak is only part of the actual signal. ParEst is different from Cutoff, and achieves higher accuracy, because it uses all the observed data to estimate the CSI of any antenna.
Another family of solutions is built on the assumption that the channels of neighboring subcarriers are similar, aided by further optimizations such as smoothing [10,11] or windowing [12]. ParEst has been compared with one of the representative solutions, referred to as Smooth [10], and has shown better performance, especially for 8 by 8 MIMO systems. This is because Smooth depends on the high similarity of A neighboring subcarriers, where A is the number of antennas, which is less likely to hold when A is large, while ParEst does not depend on such similarities. CSI estimation and compression for massive MIMO in cellular systems have attracted much attention [13][14][15][16][17][18][19][20]. The key difference between ParEst and such work is that ParEst is based on the recent discovery of the constant frequency sinusoid approximation of the CSI, which enables a dramatic reduction of the run-time computation complexity. CSI-related issues in high-speed scenarios have been investigated in [21][22][23], while in this paper, the mobility is assumed to be at a low or modest level.
Recently, there has been increasing interest in reducing the CSI learning overhead by using the uplink CSI to estimate the downlink CSI based on channel reciprocity [24][25][26][27][28]. The main idea is to extract the information of the propagation path from the uplink CSI and then use it for calculating the downlink CSI. The fundamental difference between ParEst and such work is that ParEst is a method to estimate the CSI of multiple transmitting antennas from a single vector, and while doing so, ParEst does not attempt to estimate the path information, because it is not needed.
In OFDM, a propagation path eventually results in a sinusoid in the CSI [1,29,30]. The constant frequency sinusoid approximation, i.e., the observation that the CSI can be approximated as a linear combination of a small number of base sinusoids on constant frequencies, has been used for CSI compression [1,31,32]. ParEst is based on the same observation but is designed for CSI estimation.
Finally, this paper is a significantly improved version of an earlier, preliminary version [33].

System model
ParEst can be applied to both cellular and Wi-Fi networks. In this paper, the focus is to estimate the CSI of one node with multiple transmitting antennas. The node transmits CSI estimation symbols on all antennas to allow the receiver to estimate the CSI of each antenna. The number of antennas of the node is denoted as A. The number of subcarriers is denoted as N. The subcarriers are assumed to be consecutive. The node transmits a particular sequence of length N on the assigned subcarriers on each antenna, where the sequence is a complex vector. The sequence for antenna a is denoted as $S_a$. The sequences are orthogonal and have constant amplitude, such as in LTE and 5G, which improves the estimation accuracy.
On the receiver side, as the same method can be applied to each receiving antenna, the focus in this paper is on a single receiving antenna. The complex vector observed on the N subcarriers is denoted as R. Let $C_a$ be the CSI vector of transmitting antenna a. Note that

$$R_h = \sum_{a=0}^{A-1} S_{a,h} C_{a,h} + \epsilon_h, \qquad (1)$$

where element h of a vector is denoted as the name of the vector with an additional subscript h, and $\epsilon$ is the white Gaussian noise vector. The CSI is approximated as the linear combination of a set of base sinusoids. To be more specific, suppose there are K base sinusoids, where sinusoid k is denoted as $B_k$ and is on frequency $f_k$. As the signals from the antennas of the same node are supposed to go through similar propagation environments, the same set of base sinusoids is used for all antennas. According to the approximation proposed in [1]:

$$C_{a,h} \approx \sum_{k=0}^{K-1} \alpha_{a,k} B_{k,h}, \qquad (2)$$

where $\alpha_{a,k}$ is the coefficient of sinusoid k for the CSI of antenna a. Combining Eqs. 1 and 2, $R_h$ can be approximated as:

$$R_h \approx \sum_{a=0}^{A-1} \sum_{k=0}^{K-1} \alpha_{a,k} S_{a,h} B_{k,h} + \epsilon_h. \qquad (3)$$

Based on R and Eq. 3, the values of $\alpha_{a,k}$ for all a and k can be calculated, which can then be used to find the CSI based on Eq. 2.
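As a rough illustration of the signal model in Eqs. 1 to 3, the following NumPy sketch synthesizes the received vector R from per-antenna coefficients. The form $B_{k,h} = e^{i k \delta h}$ of the base sinusoids, the toy sequences (rows of a DFT matrix standing in for the DMRS sequences), and all function names are assumptions made only for this illustration.

```python
import numpy as np

def base_sinusoids(N, K, delta):
    """Assumed form B[k, h] = exp(1j * k * delta * h); returns shape (K, N)."""
    h = np.arange(N)
    k = np.arange(K)[:, None]
    return np.exp(1j * delta * k * h)

def toy_sequences(A, N):
    """Hypothetical constant-amplitude sequences, orthogonal when N is a multiple of A."""
    h = np.arange(N)
    a = np.arange(A)[:, None]
    return np.exp(2j * np.pi * a * h / A)

def received_vector(S, alpha, B, noise_std=0.0, rng=None):
    """Return (R, C): C = alpha @ B as in Eq. 2, R as in Eq. 1."""
    rng = np.random.default_rng() if rng is None else rng
    N = S.shape[1]
    C = alpha @ B                                   # (A, N) CSI of each antenna
    noise = noise_std / np.sqrt(2) * (
        rng.standard_normal(N) + 1j * rng.standard_normal(N))
    R = np.sum(S * C, axis=0) + noise               # composite signal on N subcarriers
    return R, C

A, N, K, delta = 2, 36, 3, 0.12
S = toy_sequences(A, N)
B = base_sinusoids(N, K, delta)
alpha = np.array([[1.0, 0.5j, -0.2], [0.3, -0.7, 0.1j]])  # arbitrary example coefficients
R, C = received_vector(S, alpha, B)                 # noise-free by default
```

With `noise_std=0`, R equals the double sum of Eq. 3 exactly, which makes the sketch convenient for checking an estimator before noise is added.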

ParEst
In this section, ParEst is explained in detail, starting with the overview.

Overview
ParEst is a least squares estimation (LSE) of the CSI according to Eq. 3. As the base sinusoids are on constant frequencies, many steps can be pre-computed, reducing the run-time computation complexity to a minimum. To be more specific, during the precomputation, the base sinusoids are converted into a set of orthogonal bases. During run-time, the coefficients of the orthogonal bases are found with simple vector dot product computations between R and the bases. To determine a good set of bases for the wireless channel, ParEst performs a simple linear search, because each additional base incrementally and independently improves the approximation. To be more specific, ParEst gradually adds more bases to approximate R, until the approximation is believed to be acceptable. As the bases are orthogonal, the computation in each iteration involves only the newly added base.

Mathematical foundation of ParEst
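Let $Z_{a,k} = S_a \odot B_k$, so that Eq. 3 reads $R \approx \sum_{a,k} \alpha_{a,k} Z_{a,k} + \epsilon$. Let $\{\Psi_{a,k}\}_{a,k}$ be the set of orthogonal bases with unit norm obtained from $\{Z_{a,k}\}_{a,k}$, and let the coefficient of $\Psi_{a,k}$ in the approximation of R be

$$\beta_{a,k} = R \cdot \Psi^*_{a,k}. \qquad (4)$$

As the base sinusoids are fixed, $\{\Psi_{a,k}\}_{a,k}$ can be pre-computed, and the run-time computation reduces to the dot products of vectors. To find the CSI vectors, $\{\alpha_{a,k}\}_{a,k}$, which are the coefficients of the base sinusoids, should be found based on $\{\beta_{a,k}\}_{a,k}$. Let $\alpha$ be $\{\alpha_{a,k}\}_{a,k}$ organized as a single column vector, where $\alpha_{a,k}$ is element $aK + k$ in $\alpha$. Let $\beta$ be $\{\beta_{a,k}\}_{a,k}$ organized as a single column vector in the same manner. Let $\mathbf{Z}$ be the matrix with $\{Z_{a,k}\}_{a,k}$ as the column vectors, where $Z_{a,k}$ is column $aK + k$ in $\mathbf{Z}$. Let $\mathbf{\Psi}$ be the matrix with $\{\Psi_{a,k}\}_{a,k}$ as the column vectors in the same manner. Clearly,

$$\mathbf{Z}\alpha = \mathbf{\Psi}\beta. \qquad (5)$$

Therefore,

$$\mathbf{\Psi}^{\dagger}\mathbf{Z}\alpha = \mathbf{\Psi}^{\dagger}\mathbf{\Psi}\beta = I\beta = \beta, \qquad (6)$$

where $\mathbf{\Psi}^{\dagger}$ denotes the conjugate transpose of $\mathbf{\Psi}$ and I denotes the identity matrix. Therefore,

$$\alpha = [\mathbf{\Psi}^{\dagger}\mathbf{Z}]^{-1}\beta. \qquad (7)$$

With $\alpha$, the CSI can be found according to Eq. 2. Note that $[\mathbf{\Psi}^{\dagger}\mathbf{Z}]^{-1}$ can be pre-computed because both $\mathbf{\Psi}$ and $\mathbf{Z}$ are constant matrices.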
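The split between pre-computation and run-time work described above can be sketched as follows. This is only an illustration under stated assumptions: the orthogonal bases are obtained here with a QR factorization (the text does not say which orthogonalization is used), the columns are ordered with index aK + k, and the names `precompute` and `estimate` are hypothetical.

```python
import numpy as np

def precompute(S, B):
    """Offline step. S: (A, N) sequences, B: (K, N) base sinusoids."""
    A, N = S.shape
    K = B.shape[0]
    # Column a*K + k holds the element-wise product of sequence a and sinusoid k.
    Z = (S[:, None, :] * B[None, :, :]).reshape(A * K, N).T   # (N, A*K)
    # Assumption: QR is used as the orthogonalization; Psi has orthonormal columns.
    Psi, _ = np.linalg.qr(Z)
    M = np.linalg.inv(Psi.conj().T @ Z)                        # constant matrix, stored
    return Z, Psi, M

def estimate(R, Psi, M, B, A):
    """Run-time step: dot products with the bases, then one small matrix product."""
    beta = Psi.conj().T @ R              # coefficients of the orthogonal bases
    alpha = (M @ beta).reshape(A, -1)    # coefficients of the base sinusoids, (A, K)
    return alpha, alpha @ B              # CSI of every antenna, (A, N)

# Small noise-free demonstration with hypothetical parameters.
N, A, K, delta = 36, 2, 3, 0.12
h = np.arange(N)
S = np.exp(2j * np.pi * np.arange(A)[:, None] * h / A)   # toy orthogonal sequences
B = np.exp(1j * delta * np.arange(K)[:, None] * h)        # assumed sinusoid form
alpha_true = np.array([[1.0, 0.5j, -0.2], [0.3, -0.7, 0.1j]])
R = np.sum(S * (alpha_true @ B), axis=0)
Z, Psi, M = precompute(S, B)
alpha_hat, C_hat = estimate(R, Psi, M, B, A)
```

In the noise-free case the recovered coefficients match the ones used to build R, which is a quick sanity check that the stored matrix maps the basis coefficients back to sinusoid coefficients.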

The ParEst algorithm-a linear search
ParEst is a simple linear search based on the mathematical foundation discussed in Section 4.2. Note that, if the set of base sinusoids is given, the computation steps are completely determined according to Section 4.2. In practice, however, a key problem is to determine the best set of base sinusoids to match any particular channel. Channels with larger delay spread need more base sinusoids in a larger frequency range than those with smaller delay spread. Too few base sinusoids will lead to poor approximation. Too many base sinusoids will lead to overfitting, i.e., matching not the signal but the noise. ParEst pre-computes the orthogonal bases $\{\Psi_{a,k}\}_{a,k}$ for a certain maximum number of base sinusoids, where the base sinusoids are on evenly spaced frequencies, starting with 0 with a step denoted as $\delta$. Given R, ParEst enters a simple loop as shown in Fig. 1. In iteration k, ParEst computes $\beta_{0,k}$ to $\beta_{A-1,k}$ according to Eq. 4. The fit residual, denoted as $\xi$, which is initially R, is then incrementally updated as

$$\xi \leftarrow \xi - \sum_{a=0}^{A-1} \beta_{a,k} \Psi_{a,k}. \qquad (8)$$

ParEst exits the loop when the power of $\xi$ is less than the estimated noise power. After ParEst exits the loop, it uses Eq. 7 to find $\alpha$ and then the CSI vectors. Note that this requires the inverse matrix in Eq. 7, $[\mathbf{\Psi}^{\dagger}\mathbf{Z}]^{-1}$, to be pre-computed and stored for every k, which is still a good tradeoff between run-time complexity and storage. Note that, in each iteration, the computation is mainly just AN complex multiplications to get $\beta_{0,k}$ to $\beta_{A-1,k}$, and AN complex multiplications to update $\xi$.
The search basically finds the smallest number of bases for which the fit residual power is close to the expected noise power. This is because a good fit should be very close to the actual signal, and the residual should largely be noise. Further increasing the number of bases will only lead to larger errors, because the additional bases will be forced into linear combinations to best match the residual, which is mostly noise.
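The loop described above can be sketched as follows. The sketch makes several assumptions that are not stated in the text: the bases are orthogonalized with QR in iteration order (sinusoid index outer, antenna index inner), "power" is taken as the mean squared magnitude per subcarrier, and the final coefficients are obtained with a linear solve at run time instead of a stored inverse. All function names are hypothetical.

```python
import numpy as np

def build_bases(S, B):
    """Return Z and Psi, both of shape (K, A, N), orthogonalized in iteration order."""
    K, N = B.shape
    A = S.shape[0]
    Z = B[:, None, :] * S[None, :, :]              # Z[k, a] = S_a * B_k
    Q, _ = np.linalg.qr(Z.reshape(K * A, N).T)     # columns ordered k*A + a
    return Z, Q.T.reshape(K, A, N)

def linear_search(R, Psi, noise_power):
    """Add the bases of one sinusoid per iteration until the residual looks like noise."""
    xi = R.astype(complex)                         # fit residual, initially R (copy)
    betas = []
    for k in range(Psi.shape[0]):
        beta_k = Psi[k].conj() @ R                 # A dot products with the new bases
        xi = xi - beta_k @ Psi[k]                  # only the new bases are involved
        betas.append(beta_k)
        if np.mean(np.abs(xi) ** 2) < noise_power:
            break
    return len(betas), np.array(betas), xi

def coefficients(betas, Z, Psi):
    """Map basis coefficients to sinusoid coefficients for the bases actually used."""
    k_used = betas.shape[0]
    A, N = Z.shape[1:]
    Zm = Z[:k_used].reshape(k_used * A, N).T
    Pm = Psi[:k_used].reshape(k_used * A, N).T
    # A stored inverse per k would replace this solve in a real-time setting.
    alpha = np.linalg.solve(Pm.conj().T @ Zm, betas.reshape(-1))
    return alpha.reshape(k_used, A).T              # (A, k_used)

# Noise-free demonstration: the signal uses 2 of the 6 available sinusoids.
N, A, K_max, delta = 36, 2, 6, 0.12
h = np.arange(N)
S = np.exp(2j * np.pi * np.arange(A)[:, None] * h / A)
B = np.exp(1j * delta * np.arange(K_max)[:, None] * h)
alpha_true = np.array([[1.0, 0.5j], [0.3, -0.7]])
R = np.sum(S * (alpha_true @ B[:2]), axis=0)
Z, Psi = build_bases(S, B)
n_iter, betas, xi = linear_search(R, Psi, noise_power=1e-12)
alpha_hat = coefficients(betas, Z, Psi)
```

Because the leading columns of a QR factorization span the leading columns of the input, truncating the bases after any iteration still gives a valid orthogonal set for the sinusoids used so far.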

The complexity of ParEst
ParEst has a very low computation complexity. Suppose the search takes K iterations. ParEst uses only $3ANK + A^2K^2$ complex multiplications. The following is the breakdown:
• $ANK$ are used in the loop for calculating $\{\beta_{a,k}\}_{a,k}$;
• $ANK$ are used in the loop for updating $\xi$;
• $A^2K^2$ are used in computing $\alpha$;
• $ANK$ are used in computing the CSI vectors from $\alpha$.
Note that computing the power of $\xi$ can be achieved by a table look-up on each element and therefore does not require multiplication. Also, note that $N \ge AK$, because the number of observations must be no less than the number of variables. Therefore, $A^2K^2 \le ANK$, and the total number of multiplications is bounded from above by $4ANK$. As K is typically much less than N, the complexity of ParEst is similar to a few FFT calculations on vectors of length N.
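The multiplication count and its upper bound can be checked with a few lines of arithmetic. The function name and the example values of A, N, and K below are hypothetical and chosen only to illustrate the breakdown.

```python
def parest_multiplications(A, N, K):
    """Complex multiplications for K search iterations, following the breakdown in the text."""
    if N < A * K:
        raise ValueError("need N >= A*K: observations must not be fewer than variables")
    breakdown = {
        "beta in the loop": A * N * K,
        "residual update in the loop": A * N * K,
        "alpha from beta": (A * K) ** 2,
        "CSI from alpha": A * N * K,
    }
    return breakdown, sum(breakdown.values())

# Example: 4 antennas, 144 subcarriers, 20 iterations.
breakdown, total = parest_multiplications(4, 144, 20)
bound = 4 * 4 * 144 * 20
```

For these example values the total is 40960 multiplications against a bound of 46080.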

Analysis of the approximation error
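The approximation error refers to the deviation of the estimation from the actual CSI, which, for antenna a, is approximately

$$\sum_{k=0}^{K-1} \beta_{a,k} \Psi_{a,k} - S_a \odot C_a,$$

recalling that $S_a$ has a constant amplitude and that $\{\Psi_{a,k}\}_{a,k}$ denotes the orthogonal bases. Note that

$$\beta_{a,k} = R \cdot \Psi^*_{a,k} = \Big(\sum_{a'=0}^{A-1} S_{a'} \odot C_{a'}\Big) \cdot \Psi^*_{a,k} + \epsilon \cdot \Psi^*_{a,k},$$

where $\epsilon$ is the noise vector. Let $\hat{\beta}_{a,k} = \big(\sum_{a'} S_{a'} \odot C_{a'}\big) \cdot \Psi^*_{a,k}$ and $\gamma_{a,k} = \epsilon \cdot \Psi^*_{a,k}$, which are the part of $\beta_{a,k}$ for the approximation of the actual CSI and the part for the approximation of the noise, respectively. Note that $\hat{\beta}_{a,k}$ is independent of the noise. Also, $\sum_{k=0}^{K-1} \hat{\beta}_{a,k} \Psi_{a,k} - S_a \odot C_a$ decays exponentially with the increase of the number of bases [1]. Therefore, the deviation is mainly $\sum_{k=0}^{K-1} \gamma_{a,k} \Psi_{a,k}$. As noise is random, the exact deviation is not known. However, certain statistical properties can still be obtained, under the assumption that the noise is white Gaussian with 0 mean and variance $\sigma^2$.
As $\{\Psi^*_{a,k}\}_{a,k}$ is a set of orthogonal bases, $\{\gamma_{a,k}\}_{a,k}$ is a set of independent Gaussian random variables with 0 mean and variance $\sigma^2$. Let $\gamma$ be $\{\gamma_{a,k}\}_{a,k}$ organized as a column vector, where $\gamma_{a,k}$ is element $aK + k$ in $\gamma$. Let

$$\zeta = [\mathbf{\Psi}^{\dagger}\mathbf{Z}]^{-1} \gamma,$$

where $[\mathbf{\Psi}^{\dagger}\mathbf{Z}]^{-1}$ is the pre-computed matrix of Eq. 7. Denote element (r, c) in $[\mathbf{\Psi}^{\dagger}\mathbf{Z}]^{-1}$ as $\chi_{r,c}$. Clearly, every element in $\zeta$ is also Gaussian with 0 mean. The covariance of $\zeta_{a,k}$ and $\zeta_{a,q}$ is

$$E[\zeta_{a,k} \zeta^*_{a,q}] = \sigma^2 \sum_{c=0}^{AK-1} \chi_{aK+k,c}\, \chi^*_{aK+q,c}.$$

The noise fit at subcarrier h for antenna a, denoted as $\eta_{a,h}$, is

$$\eta_{a,h} = \sum_{k=0}^{K-1} \zeta_{a,k}\, e^{i f_k h},$$

where $f_k$ represents the frequency of base sinusoid k. $\eta_{a,h}$ is also a Gaussian random variable with 0 mean. The variance is

$$E[|\eta_{a,h}|^2] = \sigma^2 \sum_{k=0}^{K-1} \sum_{q=0}^{K-1} \sum_{c=0}^{AK-1} \chi_{aK+k,c}\, \chi^*_{aK+q,c}\, e^{i (f_k - f_q) h},$$

which is the power of the noise fit at subcarrier h. Figure 2 shows the theoretical and simulated noise fit power at each subcarrier, when the noise power $\sigma^2 = 1$, the channel is ETU, and the number of antennas is 2. It can be seen that the theoretical result matches the simulation. Also, for most of the subcarriers, the noise fit power is much less than 1, because the base sinusoids are designed to approximate sinusoids only in a certain frequency range and cannot follow exactly the noise curve, which is white. Therefore, in effect, ParEst filters out most of the noise and matches better with the actual CSI. Lastly, the noise fit values at the beginning and the end of the CSI are much larger than the rest, which matches the observations in practice.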
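The noise fit power at each subcarrier can be evaluated numerically from the pre-computed matrix. The sketch below is an illustration under assumptions: the base sinusoids are taken as $e^{i k \delta h}$, the orthogonal bases come from a QR factorization, and the function name is hypothetical. It computes, for each antenna, the variance of the linear map from the white noise vector to the noise fit.

```python
import numpy as np

def noise_fit_power(S, B, sigma2=1.0):
    """Theoretical noise fit power, shape (A, N), for noise variance sigma2 per subcarrier."""
    A, N = S.shape
    K = B.shape[0]
    Z = (S[:, None, :] * B[None, :, :]).reshape(A * K, N).T   # column a*K + k
    Psi, _ = np.linalg.qr(Z)                                   # assumed orthogonalization
    M = np.linalg.inv(Psi.conj().T @ Z)                        # entries chi[r, c]
    power = np.empty((A, N))
    for a in range(A):
        # W[h, c] = sum_k B[k, h] * chi[a*K + k, c]; the noise fit at h is W[h, :] @ gamma.
        W = B.T @ M[a * K:(a + 1) * K, :]
        power[a] = sigma2 * np.sum(np.abs(W) ** 2, axis=1)
    return power

# Example with hypothetical parameters: one antenna, all-ones sequence.
N, K, delta = 36, 3, 0.12
h = np.arange(N)
B = np.exp(1j * delta * np.arange(K)[:, None] * h)
p_single = noise_fit_power(np.ones((1, N), dtype=complex), B, sigma2=1.0)
```

For a single antenna with an all-ones sequence, the per-subcarrier powers sum to K times the noise variance, since the fit is then the projection of the noise onto a K-dimensional subspace; this gives a simple consistency check on the formula.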

ParEst in practice
In practice, $\delta$, the base sinusoid frequency spacing, is determined empirically, because it depends on many factors, including the typical delay spread of the channels and the number of subcarriers. In the current design, for example, $\delta$ is 0.12, 0.07, and 0.039 for 36, 72, and 144 subcarriers, respectively. The values are chosen to achieve good approximations even for very challenging wireless channels, e.g., the LTE ETU channel [34,35]. Note that $\delta$ needs to be selected just once, not at run time.
As the number of assigned RBs may vary, one option is to make pre-computations for every possible number of RBs. Another option is to make pre-computations for up to a certain number of RBs, such as 12. In case the number of assigned RBs is more than 12, the RBs can be divided into segments with 12 or fewer RBs, which are then estimated separately.
As mentioned earlier, during the linear search, ParEst repeatedly adds more bases to approximate R. The maximum number of bases to attempt is a system parameter. In practice, during the pre-computation of the orthogonal bases $\{\Psi_{a,k}\}_{a,k}$ from $\{Z_{a,k}\}_{a,k}$, where $Z_{a,k}$ is the element-wise product of sequence a and base sinusoid k, the process stops at K if the linear combination of $\{\Psi_{a,k}\}_{a,k}$ for k = 0 to K − 1 can approximate $Z_{a,K}$ with small error, i.e., less than 1% of the power of $Z_{a,K}$, because the addition of $Z_{a,K}$ will make little contribution beyond this point.
Lastly, to better accommodate the diversity of the wireless channels, ParEst actually tries first δ/2 and then δ as the base sinusoid frequency spacing. This is because certain channels, like the LTE EPA channel [34,35], have very small delay spread, and δ/2 will lead to better approximations. This at most doubles the run-time computation complexity, which is still low. In practice, with δ/2, the maximum number of bases is much smaller; therefore, the actual increase of complexity is very small.
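The stopping rule used during pre-computation (stop once a new vector is already approximated to within 1% of its power by the existing bases) can be sketched as follows. Several details are assumptions made for illustration: the sinusoid form $e^{i k \delta h}$, the use of QR to maintain the orthogonal bases, and the choice to stop only when the rule holds for all antennas at once. The function name is hypothetical.

```python
import numpy as np

def max_useful_bases(S, delta, k_max, threshold=0.01):
    """Return the sinusoid index K at which the pre-computation would stop."""
    A, N = S.shape
    h = np.arange(N)
    Psi = np.zeros((N, 0), dtype=complex)          # orthonormal bases found so far
    for K in range(k_max):
        Z_K = (S * np.exp(1j * delta * K * h)).T   # (N, A): candidate vectors for index K
        resid = Z_K - Psi @ (Psi.conj().T @ Z_K)   # part not explained by existing bases
        frac = np.sum(np.abs(resid) ** 2, axis=0) / np.sum(np.abs(Z_K) ** 2, axis=0)
        if np.all(frac < threshold):               # assumption: require it for every antenna
            return K
        Psi, _ = np.linalg.qr(np.hstack([Psi, Z_K]))
    return k_max

# Example: one antenna, all-ones sequence, 36 subcarriers.
S_one = np.ones((1, 36), dtype=complex)
K_stop = max_useful_bases(S_one, delta=0.12, k_max=40)
```

With a very small frequency spacing, consecutive sinusoids are nearly identical and the rule triggers almost immediately, which is one way to see why the spacing has to be matched to the number of subcarriers.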

Mathematical properties of the bases
In this section, some mathematical properties of the bases are given, which reveal some interesting insights into the bases, such as a recursive relation.
As mentioned earlier, the base sinusoids are on evenly spaced frequencies starting with 0 at a step of $\delta$. Consider the special case with one transmitting antenna. Let base sinusoid k be $B_k$. Suppose $B_K$ is to be approximated as the linear combination of $B_0$ to $B_{K-1}$ to minimize the squared error. Let $\Upsilon_K$ be the residual, i.e., the difference between $B_K$ and the approximation.
Theorem 1 $\Upsilon_{K+1} = \Upsilon_K \odot B_1 - \alpha_K (\Upsilon^*_K \odot B_K)$, where $\alpha_K$ is the coefficient of the projection of $\Upsilon_K \odot B_1$ on $\Upsilon^*_K \odot B_K$.
Proof First, note that if $B_1$ to $B_K$ are used to approximate $B_{K+1}$, the residual would be $\Upsilon_K \odot B_1$, because by factoring out $B_1$, the approximation would have been exactly the same as using $B_0$ to $B_{K-1}$ to approximate $B_K$. Second, consider using $B_K$ to $B_1$ to approximate $B_0$. By factoring out $B_K$, it would be exactly the same as using $B^*_0$ to $B^*_{K-1}$ to approximate $B^*_K$, the residual of which is $\Upsilon^*_K$; the residual is therefore $\Upsilon^*_K \odot B_K$. Lastly, note that $\Upsilon_{K+1}$ should be $\Upsilon_K \odot B_1$ minus its projection on $\Upsilon^*_K \odot B_K$.
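Theorem 2 $\Upsilon_K$ is symmetrical:

$$|\Upsilon_{K,h}| = |\Upsilon_{K,N-1-h}|$$

and

$$\angle\Upsilon_{K,h} + \angle\Upsilon_{K,N-1-h} = K(N-1)\delta$$

for $h \in [0, N-1]$, where $|\cdot|$ and $\angle$ stand for the amplitude and phase of a complex number, respectively, and phases are taken modulo $2\pi$.
Proof Based on induction. It can be verified that the theorem is true for K = 1. Assume it is true up to a particular K. Let $\Theta_K = \Upsilon_K \odot B_1$ and $\Lambda_K = \Upsilon^*_K \odot B_K$, and let $\alpha_K$ be the coefficient of the projection of $\Theta_K$ on $\Lambda_K$, so that $\Upsilon_{K+1} = \Theta_K - \alpha_K \Lambda_K$.
The first claim is that

$$\angle(\alpha_K) = \frac{(K+1)(N-1)\delta}{2} \quad \text{or} \quad \frac{(K+1)(N-1)\delta}{2} + \pi.$$

To see this, note that, based on the induction hypothesis, the phases of $\Theta_K$ and $\Lambda_K$ are also symmetrical, i.e., for any h,

$$\angle\Theta_{K,h} + \angle\Theta_{K,N-1-h} = (K+1)(N-1)\delta, \qquad \angle\Lambda_{K,h} + \angle\Lambda_{K,N-1-h} = 0.$$

Therefore,

$$\angle(\Theta_{K,h}\Lambda^*_{K,h}) + \angle(\Theta_{K,N-1-h}\Lambda^*_{K,N-1-h}) = (K+1)(N-1)\delta.$$

As the two products also have the same amplitude, $|\Upsilon_{K,h}|^2$, the phase of their sum is one of the two values above. As a result, the phase of $\sum_{h=0}^{N-1} \Theta_{K,h} \Lambda^*_{K,h}$, which is the phase of $\alpha_K$, is one of the possible values of $\angle(\alpha_K)$. For either possible value of $\angle(\alpha_K)$,

$$\angle(\alpha_K\Lambda_{K,h}) + \angle(\alpha_K\Lambda_{K,N-1-h}) = (K+1)(N-1)\delta,$$

so $\alpha_K \Lambda_K$ has the same symmetry as $\Theta_K$, and so does their difference $\Upsilon_{K+1}$.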
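The theorems help explain some of the properties of the bases, along with certain observations made in practice. For example, it appears that for small $\delta$, $|\alpha_K|$ is very close to 1 and $\angle(\alpha_K) = \frac{(K+1)(N-1)\delta}{2} - K\pi$. It can be argued that $|\Upsilon_{K,0}|$ decays exponentially with K, and

$$\angle\Upsilon_{K,0} \approx K\left(\frac{(N-1)\delta}{4} - \frac{\pi}{2}\right).$$

Note that this is clearly true when K = 1. For larger K, as the first element of every base sinusoid is 1,

$$\Upsilon_{K+1,0} = \Upsilon_{K,0} - \alpha_K \Upsilon^*_{K,0}.$$

The phase difference of the two terms in the above equation is

$$\angle(\alpha_K) - 2\angle\Upsilon_{K,0} \approx \frac{(N-1)\delta}{2}.$$

Therefore,

$$|\Upsilon_{K+1,0}| \approx 2\sin\left(\frac{(N-1)\delta}{4}\right)|\Upsilon_{K,0}|$$

and

$$\angle\Upsilon_{K+1,0} \approx (K+1)\left(\frac{(N-1)\delta}{4} - \frac{\pi}{2}\right).$$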
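The recursive relation and the symmetry of the residual can be checked numerically. The sketch below assumes base sinusoids of the form $e^{i k \delta h}$ and computes the residual directly by least squares; it then compares it with the recursion described in the proof of the first theorem, namely the residual times the first sinusoid minus its projection on the conjugated residual times sinusoid K. Function names and parameter values are hypothetical.

```python
import numpy as np

def base(N, delta, k):
    """Assumed base sinusoid k on N subcarriers."""
    return np.exp(1j * delta * k * np.arange(N))

def residual(N, delta, K):
    """Least-squares residual of sinusoid K against sinusoids 0..K-1."""
    Bmat = np.stack([base(N, delta, k) for k in range(K)], axis=1)   # (N, K)
    target = base(N, delta, K)
    coef = np.linalg.lstsq(Bmat, target, rcond=None)[0]
    return target - Bmat @ coef

def next_residual(U_K, N, delta, K):
    """Residual K+1 from residual K via the recursion; also returns the projection coefficient."""
    theta = U_K * base(N, delta, 1)
    lam = U_K.conj() * base(N, delta, K)
    alpha_K = np.vdot(lam, theta) / np.vdot(lam, lam)   # vdot conjugates its first argument
    return theta - alpha_K * lam, alpha_K

N, delta, K = 36, 0.12, 2
U2 = residual(N, delta, K)
U3_recursive, alpha_2 = next_residual(U2, N, delta, K)
U3_direct = residual(N, delta, K + 1)
# Symmetry check: element h times element N-1-h, with the expected phase removed.
paired = U2 * U2[::-1] * np.exp(-1j * K * (N - 1) * delta)
```

If the symmetry holds, `paired` is real and non-negative, and `U3_recursive` agrees with `U3_direct` up to rounding.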

Evaluation
ParEst has been tested with real-world experiments on USRP, as well as with simulations.

Proof-of-concept experiments with USRP
ParEst has been tested with USRP B210 [36,37] in real-world experiments. The devices in the experiments are shown in Fig. 3. A total of 10 experiments were conducted in a university building; the locations of some of them are shown in Fig. 4.

Experiment setup
As shown in Fig. 3, the sender, which is on the left, has 2 antennas. The sender transmitted on each antenna the PUSCH DMRS signal according to the LTE specifications [5]. The baseband DMRS signal was generated with the OpenLTE implementation at [38].
The receiver, which is on the right, has one antenna and simply recorded the received baseband samples to be processed by ParEst. In the experiments, the carrier frequency was 915 MHz. The sample rate was 2 M samples per second. The sender used 36 resource blocks (RB) with a total of 432 subcarriers. As the FFT size for the baseband signal was 2048, the link occupied 0.42 MHz of bandwidth. The received time-domain samples in one typical experiment are shown in Fig. 5, which contains 3 OFDM symbols. The first symbol is the actual CSI estimation symbol, i.e., the sender transmitted DMRS signals on both antennas. The second and the third symbols were transmitted by antenna 1 and antenna 2 individually, from which the CSI can be measured directly for comparison.

Fig. 3 The USRP B210 used in the experiments

Experiment results
The result of a typical experiment is shown in Fig. 6. In the figure, curves in different colors represent the amplitude of the CSI from different antennas. For the curves in the same color, the dashed curve is the estimation by ParEst. It can be seen that the estimation is fairly accurate and closely follows the measurement. Figure 7 shows the normalized approximation error, which is defined as the ratio of the approximation error power over the noise power, of all 10 experiments. The normalized approximation error is used as the metric because the CSI measurement is noisy. If the CSI estimation is accurate, the difference between the estimated CSI and the measured noisy CSI should mostly be noise, and therefore should have similar power as the noise. It can be seen that the approximation error in most cases is close to the noise. Some of the larger differences, such as in experiment 6, were due to interference. The experiments confirm the practicality of ParEst, because the results suggest that ParEst can estimate the real-world CSI with CSI estimation symbols generated according to the existing standards.
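The normalized approximation error can be written in a couple of lines. In this sketch the noise power is passed in as a parameter, since how it was estimated in the experiments is not described here; the function name and the toy values are hypothetical.

```python
import numpy as np

def normalized_approximation_error(csi_estimated, csi_measured, noise_power):
    """Power of (estimated - measured CSI) divided by the noise power of the measurement."""
    error_power = np.mean(np.abs(csi_estimated - csi_measured) ** 2)
    return error_power / noise_power

# Toy example: the two vectors differ by a perturbation of power 0.01 per element.
csi_measured = np.ones(4, dtype=complex)
csi_estimated = csi_measured + 0.1 * np.array([1, -1, 1j, -1j])
ratio = normalized_approximation_error(csi_estimated, csi_measured, noise_power=0.01)
```

A ratio close to 1 means the difference between estimate and measurement has about the same power as the measurement noise, which is the situation described above for an accurate estimate.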

Discussions
Although the experimental evaluation is valuable, additional evaluations are needed, because of the limitations of the experiments. First, as the sample rate of the USRP is limited, the bandwidth of the experimental link was small. Second, even though the second and third symbols were transmitted to measure the actual CSI, the measurement was noisy, as shown in Fig. 6. Note that without the clean CSI, a quantitative comparison with other CSI estimation methods cannot be made, because the difference between the estimated CSI and the measured CSI, which is used as the ground truth, will be dominated by noise. Third, as the transmission range was not large, the wireless channels were typically flat, as those shown in Fig. 6. Therefore, ParEst is also evaluated by simulation, as discussed in the following.

Simulation evaluation
The simulation overcomes some of the limitations in the experiments and is discussed in the following.

Simulation setup
In the simulation, the same exact PUSCH DMRS signal in LTE used in the experiments was passed to the LTE channel model. The output of the model is the baseband signal to be processed by ParEst and the compared methods. The LTE EPA and ETU channel models were used, which represent channels with small and large delay spread, respectively [34,35]. White Gaussian noise was added to the signal. The SNR is defined as the signal power in the received vector, R, over the noise power in R. Note that the clean CSI is known in the simulation. The MIMO systems were 4 by 4 or 8 by 8. The number of RBs was 12.

Compared methods
One of the compared methods is Cutoff [7], with which R is first converted to another vector, referred to as the peak vector, in which signals from different antennas appear as peaks at evenly spaced locations. For each antenna, the points around the corresponding peak are taken and used to approximate the complete peak vector of this antenna, which is then used in the conversion back to the CSI. Another compared method is referred to as Smooth [10], which first assumes that the channel coefficients of adjacent subcarriers are the same, then further improves the performance by smoothing the transitions, i.e., taking a weighted average of the neighboring subcarriers.

Performance metrics
One performance metric is denoted as F10, which is the fraction of estimated CSI that deviates from the clean CSI by more than 10%. Note that a CSI estimation error of over 10% will likely prohibit the use of higher modulation orders like 256 QAM even when the signal is strong, because the constellation points are too close. Another metric is the bit error ratio (BER) of data transmissions according to the estimated CSI. To be more specific, based on the estimated CSI matrix of each subcarrier, closed-loop MIMO with singular-value decomposition (SVD) was simulated. The number of layers was half of the number of antennas. The power allocation was standard water-filling. The modulation order of each layer was selected according to the SNR with the same heuristic applied to all compared methods. The BER is the error ratio of the hard decisions.

Performance comparison
Figures 8, 9, 10, and 11 show the performance of ParEst, Cutoff, and Smooth. The main observations include the following:
• Measured by both F10 and BER, ParEst has very large gains over both Cutoff and Smooth for both EPA and ETU channels.
• The gain of ParEst for the ETU channel is smaller than for the EPA channel, because ETU has larger delay spread and is more difficult to approximate.
• The performance of ParEst consistently improves as the SNR increases. In contrast, the performance of Cutoff and Smooth seems to stagnate even with higher SNR, especially for 8 antennas. This is because the systematic errors of these methods cannot be reduced with higher SNR. Note that Cutoff approximates the entire peak vector with only the points around the peak, while Smooth assumes that neighboring subcarriers have the same channel.

Computation complexity of ParEst
Figure 12 shows the run-time computation complexity of ParEst as a function of SNR, measured by the number of iterations in the linear search. Note that the upper bound of the complexity is linear in the number of iterations according to the analysis in Section 4.4. The observations are as follows:
• The computation complexity of ParEst is indeed very low. For example, even for the more challenging case under the ETU channel, the number of iterations is at most slightly below 35. Such a number of iterations should put no burden on the implementation in practice, considering that the computation in each iteration is very simple.
• The computation complexity is lower for simpler channels. This is because simpler channels have smaller delay spread and need fewer sinusoids to approximate. It also shows that ParEst is able to automatically select parameters to match the channel condition.
• The number of iterations increases as the SNR increases. This is because as the SNR increases, more fine details in the CSI curves are revealed, which require more base sinusoids to match.

Conclusions
In this paper, a novel CSI estimation method, ParEst, is proposed. ParEst is designed for senders with multiple antennas and allows the sender to transmit CSI training symbols simultaneously on all antennas. The receiver solves an optimization problem based on the received composite signal and finds the CSI of each individual antenna. The run-time complexity of ParEst is very low, because most of the steps are pre-computed, based on the fact that the CSI can be approximated very well by sinusoids on constant frequencies. ParEst has been experimentally tested and has been shown to be capable of estimating the CSI in the real world. ParEst has also been compared with existing methods in simulations and has demonstrated improvements of over an order of magnitude in many cases. Therefore, ParEst can be a strong candidate as the CSI estimation method for networks such as LTE, 5G, and Wi-Fi.