First, we use sorted QR decomposition (SQRD) pre-processing, H = SQR, instead of the conventional QR decomposition (QRD), H = QR, to improve the BER performance. In [10], the authors showed that a low-complexity SQRD can be designed using the modified Gram-Schmidt algorithm with pipelining and resource sharing.
As in the full K-best algorithm, the main process of our algorithm runs through N stages. The following ideas are proposed to reduce its complexity.
3.1 Direct expansion
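First, $D_n$ in (3) is rewritten as (4) and (5):

$$D_n = \left| f_n - r_{nn} x_n \right|^2 \tag{4}$$

$$D_n = D_n^I + D_n^Q = \left( f_n^I - r_{nn} x_n^I \right)^2 + \left( f_n^Q - r_{nn} x_n^Q \right)^2 \tag{5}$$

where $f_n$ collects the terms of (3) that do not depend on $x_n$ (the stage-$n$ received sample after cancelling the already-detected symbols $x_{n+1}, \dots, x_N$), the superscripts I and Q denote the in-phase (IP) and quadrature (QP) parts, and $r_{nn}$ is the real-valued diagonal element of R.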
In the first quadrant of the constellation (where the IP and QP parts are both non-negative), we divide the IP space of $f_n^I$ into subdomains bounded by multiples of $r_{nn}$. Each subdomain is associated with an ordered set of the best values of $x_n^I$. For example, for 16-QAM and L = 9, the IP space is divided into the subdomains $[0, r_{nn})$, $[r_{nn}, 2r_{nn})$, and $[2r_{nn}, \infty)$, whose three best values of $x_n^I$ are (1, -1, 3), (1, 3, -1), and (3, 1, -1), respectively (refer to Figure 2a). The QP space, $f_n^Q$ and $x_n^Q$, is handled in the same way.
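The subdomain-to-values mapping can be precomputed offline. The following Python sketch is a minimal illustration, assuming square M-QAM with odd-integer PAM levels and a real, positive $r_{nn}$; the helper names `pam_levels`, `best_values`, and `subdomain_table` are hypothetical. It derives the ordering in each subdomain by brute force and reproduces the 16-QAM example above.

```python
import math


def pam_levels(m: int) -> list[int]:
    """PAM levels of one dimension of square m-QAM, e.g. m=16 -> [-3, -1, 1, 3]."""
    side = math.isqrt(m)
    return list(range(-(side - 1), side, 2))


def best_values(f: float, r: float, levels: list[int], count: int) -> tuple[int, ...]:
    """Brute-force reference: the `count` levels x with the smallest (f - r*x)^2,
    listed in ascending order of that distance."""
    return tuple(sorted(levels, key=lambda x: (f - r * x) ** 2)[:count])


def subdomain_table(m: int, count: int, r: float = 1.0) -> list[tuple[float, tuple[int, ...]]]:
    """Ordered best values for each subdomain [t*r, (t+1)*r) of a non-negative f.

    The ranking only changes where f crosses the midpoint of two odd levels,
    and those midpoints are integer multiples of r, so one sample point per
    subdomain suffices; the last subdomain extends to infinity."""
    levels = pam_levels(m)
    table = []
    for t in range(math.isqrt(m) - 1):
        lower = t * r
        table.append((lower, best_values(lower + 0.5 * r, r, levels, count)))
    return table


print(subdomain_table(16, 3))
# [(0.0, (1, -1, 3)), (1.0, (1, 3, -1)), (2.0, (3, 1, -1))]
```

The same function gives the tables for larger constellations, e.g. `subdomain_table(64, 3)` for 64-QAM, although the number of subdomains actually needed may depend on L.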
The L best child nodes per parent node in stage n (n = N,…,1) are directly specified as follows:
Step 1. Calculate $f_n$ as defined in (4).
Step 2. Determine the IP subdomain to which $|f_n^I|$ belongs by comparing $|f_n^I|$ with $r_{nn}, 2r_{nn}, \dots$. This gives the ordered best values of $x_n^I$. If $f_n^I < 0$, the signs of these values are reversed. Then, we calculate the corresponding $D_n^I$ in (5). The values of $x_n^Q$ and $D_n^Q$ are found similarly (refer to Figure 2a).
Step 3. From the best values of $x_n^I$ and $D_n^I$ and of $x_n^Q$ and $D_n^Q$, we compute the L best values of $x_n$ and $D_n$ using (5). Let $i_n$ and $q_n$ denote the indices of the best values of $D_n^I$ and $D_n^Q$, which are already in ascending order. The index pairs $(i_n, q_n)$ are combined in a fixed order chosen so that the sum $D_n = D_n^I + D_n^Q$ tends to increase. Consequently, the resulting values of $D_n$ are approximately in ascending order without any sorting (refer to Figure 2b).
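Steps 1 to 3 can be sketched in Python for 16-QAM and L = 9. This is a minimal sketch under stated assumptions: $f_n$ takes the interference-cancellation form assumed for (4), the subdomain table is the one listed above for 16-QAM, and `PAIR_ORDER`, the fixed enumeration of the index pairs $(i_n, q_n)$, is a hypothetical stand-in for the order in Figure 2b. All function names are hypothetical.

```python
# Ordered best PAM values for |f| in [0, r), [r, 2r), [2r, inf) (16-QAM, 3 per dimension).
TABLE_16QAM = ((1, -1, 3), (1, 3, -1), (3, 1, -1))

# Hypothetical fixed order of (i_n, q_n) index pairs; the actual order is given by Figure 2b.
PAIR_ORDER = ((0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (2, 0), (1, 2), (2, 1), (2, 2))


def compute_f(y_n: complex, r_row: list[complex], x_detected: list[complex]) -> complex:
    """Step 1 (assumed form): subtract the contribution of the already-detected
    symbols x_{n+1..N}; r_row holds r_{n,n+1}, ..., r_{n,N}."""
    return y_n - sum(r * x for r, x in zip(r_row, x_detected))


def best_1d(f: float, r: float) -> tuple[tuple[int, ...], tuple[float, ...]]:
    """Step 2 for one dimension: locate |f| by comparisons only (no division),
    mirror the values when f < 0, and return the values with their distances."""
    a = abs(f)
    if a < r:
        values = TABLE_16QAM[0]
    elif a < 2 * r:
        values = TABLE_16QAM[1]
    else:
        values = TABLE_16QAM[2]
    if f < 0:
        values = tuple(-v for v in values)
    return values, tuple((f - r * v) ** 2 for v in values)


def expand(f: complex, r: float, L: int = 9) -> list[tuple[complex, float]]:
    """Step 3: combine IP and QP candidates into L child nodes (x_n, D_n) with
    D_n = D_n^I + D_n^Q, emitted in a fixed, approximately ascending order."""
    vi, di = best_1d(f.real, r)
    vq, dq = best_1d(f.imag, r)
    return [(complex(vi[i], vq[q]), di[i] + dq[q]) for i, q in PAIR_ORDER[:L]]


for x, d in expand(0.4 + 2.7j, 1.0)[:3]:
    print(x, round(d, 2))
```

For $f_n = 0.4 + 2.7j$ and $r_{nn} = 1$, the first three children are 1+3j, 1+1j, and -1+3j with $D_n$ of about 0.45, 3.25, and 2.05, so the emitted order is only approximately ascending, as the text notes.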
To expand the L best child nodes of a parent node, previous works such as [5] first find the center node by rounding $x_c = f_n / r_{nn}$ and then search for the L nearest nodes around it, which requires a divider. By using the comparisons of Step 2 instead, the proposed algorithm eliminates the division $f_n / r_{nn}$. Furthermore, by using (5), the L values of $D_n$ are obtained from only $\lceil \sqrt{L} \rceil$ values each of $D_n^I$ and $D_n^Q$, so only $2\lceil \sqrt{L} \rceil$ one-dimensional squared distances are computed instead of L two-dimensional ones. This reduces the complexity of computing the Euclidean distance $D_n$.
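To illustrate why comparisons can replace the divider, the sketch below contrasts a division-based slicer (rounding $f / r_{nn}$ to the nearest constellation level, our reading of the approach described for [5]) with a divider-free slicer that only compares $|f|$ against multiples of $r_{nn}$. It is a hypothetical Python illustration for one real dimension of square QAM; both function names are our own.

```python
import math
import random


def center_by_division(f: float, r: float, side: int = 4) -> int:
    """Division-based slicer: nearest odd integer to f / r, clipped to the
    PAM range [-(side - 1), side - 1]."""
    x = 2 * math.floor(f / r / 2) + 1
    return max(-(side - 1), min(side - 1, x))


def center_by_comparison(f: float, r: float, side: int = 4) -> int:
    """Divider-free slicer: compare |f| with the decision thresholds 2r, 4r, ...
    (midpoints between adjacent levels) and restore the sign afterwards."""
    a, x = abs(f), 1
    while x < side - 1 and a >= (x + 1) * r:
        x += 2
    return x if f >= 0 else -x


rng = random.Random(1)
samples = [rng.uniform(-6.0, 6.0) for _ in range(10_000)]
print(all(center_by_division(f, 1.3) == center_by_comparison(f, 1.3) for f in samples))
```

Both slicers agree except, possibly, at inputs lying exactly on a decision threshold, where they break the tie differently.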
3.2 Parent node grouping
It is important to choose the number of child nodes per parent node, L, appropriately. A large L improves the BER performance but also increases the decoder's complexity, whereas a small L may degrade the BER performance below the system requirement.
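Notice that, once the L best child nodes are directly specified as described in Section 3.1, the case $L > K$ brings no benefit: any child node ranked beyond the K-th among the children of its own parent is outranked by at least K child nodes of that same parent, so it can never be among the K nodes finally selected. Hence, none of the last $L - K$ child nodes of any parent can be selected, and choosing $L \le K$ reduces the complexity without sacrificing performance.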
From another perspective, let k and c denote the indices of the K parent nodes ($\mathrm{PED}_{n+1}$) and of the L child nodes ($D_n$) per parent node in stage n, respectively. Because the values of $\mathrm{PED}_{n+1}$ are already sorted in stage n+1, a parent node with a high index k has a large $\mathrm{PED}_{n+1}$. Its child nodes are therefore less likely to be selected as one of the K smallest (best) nodes for the next stage. To verify this analysis, we ran simulations and computed the probability (in %) that a child node becomes one of the K best nodes. The result is shown in Figure 3, from which it can be seen that the larger the parent index k, the fewer of its child nodes are selected.
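Both observations can be reproduced qualitatively with a toy Monte Carlo model. The Python sketch below is an illustration only, not the channel simulation behind Figure 3: it assumes sorted i.i.d. exponential parent PEDs and, per parent, sorted i.i.d. exponential child increments $D_n$, selects the K smallest child PEDs, and reports how many children each parent index contributes on average together with the largest child index ever selected. The model and all names are hypothetical, and indices are 0-based (k = 0 is the best parent).

```python
import heapq
import random


def select_k_best(parent_peds, child_ds, K):
    """Return (k, c) pairs of the K smallest child PEDs PED_{n+1}[k] + D_n[k][c].

    Ties are broken by (k, c), so a child never precedes an equal-valued
    earlier sibling."""
    cands = [(p + d, k, c)
             for k, (p, ds) in enumerate(zip(parent_peds, child_ds))
             for c, d in enumerate(ds)]
    return [(k, c) for _, k, c in heapq.nsmallest(K, cands)]


def toy_stage(K, L, rng):
    """Toy model (assumption): sorted i.i.d. exponential parent PEDs and, per
    parent, L sorted i.i.d. exponential child increments D_n."""
    peds = sorted(rng.expovariate(1.0) for _ in range(K))
    ds = [sorted(rng.expovariate(1.0) for _ in range(L)) for _ in range(K)]
    return peds, ds


def selection_rate_by_parent(K, L, trials, seed=0):
    """Average number of selected children per parent index k, and the
    largest child index c that was ever selected."""
    rng = random.Random(seed)
    counts = [0] * K
    max_c = 0
    for _ in range(trials):
        peds, ds = toy_stage(K, L, rng)
        for k, c in select_k_best(peds, ds, K):
            counts[k] += 1
            max_c = max(max_c, c)
    return [n / trials for n in counts], max_c


rates, max_c = selection_rate_by_parent(K=8, L=12, trials=2000)
print([round(v, 2) for v in rates], max_c)
```

With L > K, `max_c` always stays below K, matching the argument for choosing $L \le K$, and in this model the rates should fall as k grows.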
Based on this observation, we propose the following parent node grouping method. The K parent nodes are divided into G groups of A = K/G parent nodes each, so K and G must be chosen such that K is divisible by G (i.e., mod(K, G) = 0). Group 1 contains the best parent nodes, while group G contains the worst. Each parent node of the g-th group ($g \in \{1, 2, \dots, G\}$) is expanded into $L_g$ child nodes, with $L_G < \cdots < L_1 \le K$.
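The grouping rule can be written as a small helper that maps each sorted parent to its number of child nodes. This is a hypothetical Python sketch (0-based indices) that also checks the constraints stated above; the example values of K, G, and $L_g$ are illustrative only.

```python
def children_per_parent(K: int, G: int, L_groups: list[int]) -> list[int]:
    """Number of child nodes expanded by each of the K sorted parents (index 0 = best).

    Group g (0-based here) holds parents g*A, ..., g*A + A - 1 with A = K // G."""
    if K % G != 0:
        raise ValueError("K must be divisible by G")
    if len(L_groups) != G:
        raise ValueError("need one L_g per group")
    if L_groups[0] > K or any(a <= b for a, b in zip(L_groups, L_groups[1:])):
        raise ValueError("need L_G < ... < L_1 <= K")
    A = K // G
    return [L_groups[k // A] for k in range(K)]


counts = children_per_parent(K=6, G=3, L_groups=[5, 3, 1])
print(counts, sum(counts))
# [5, 5, 3, 3, 1, 1] 18
```

With these hypothetical values, the parents expand 18 child nodes in total, compared with 30 if every parent used $L_1$.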
3.3 Two-dimensional sorter
Sorting is the major bottleneck of the K-best decoder because of its high complexity. A straightforward sorter that compares every pair of n elements requires $(n^2 - n)/2$ comparators.
In this subsection, we propose a two-dimensional (2D) sorter that has low complexity, is suitable for hardware resource sharing, and produces an approximate result. The 2D sorter for the child nodes works as follows. We place the C child nodes into an A×B matrix, in which $B = L_1 + L_2 + \cdots + L_G$ (so that C = A·B). The j-th row of the matrix contains the child nodes of the j-th parent of every group. The case G = 3 is illustrated in Figure 4. The matrix then undergoes two processes, one after the other: row sorting, which sorts the B elements of each row in ascending order, and column sorting, which sorts the A elements of each column in ascending order.
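The matrix construction and the row-then-column sorting can be sketched in Python as follows. This is a minimal sketch assuming the row layout described in the text (row j holds the children of parent $g \cdot A + j$ of every group g, 0-based); the function names and the example sizes are hypothetical.

```python
import random


def build_matrix(child_peds: list[list[float]], A: int, L_groups: list[int]) -> list[list[float]]:
    """Arrange the child PEDs into an A x B matrix with B = sum(L_groups).

    child_peds[k] lists the PEDs of the children of sorted parent k; row j holds
    the first L_g children of the j-th parent of every group g (parent g*A + j)."""
    G = len(L_groups)
    return [[d for g in range(G) for d in child_peds[g * A + j][:L_groups[g]]]
            for j in range(A)]


def sort_2d(matrix: list[list[float]]) -> list[list[float]]:
    """Row sorting followed by column sorting, both in ascending order."""
    rows = [sorted(row) for row in matrix]
    cols = [sorted(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


rng = random.Random(3)
A, L_groups = 3, [4, 2, 1]              # hypothetical example: G = 3, K = 9, B = 7
child_peds = [sorted(rng.random() for _ in range(L_groups[0])) for _ in range(A * len(L_groups))]
for row in sort_2d(build_matrix(child_peds, A, L_groups)):
    print([round(v, 2) for v in row])
```

After both passes every row and every column is in ascending order (column sorting preserves the row order), so the global minimum sits in the top-left corner, but the overall order is only partial.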
After the row and column sorting, the K top-left elements of the sorted matrix are expected to be the best (smallest) values and are selected. Because the result is approximate, the positions of the best candidates must be determined in advance by simulation. To verify the 2D sorter, we ran simulations and measured the probability (in %) that an element of the sorted matrix is one of the actual K = 7, K = 14, and K = 21 best nodes. The results are shown in Figure 5. From these results, the positions of the 1st to the 7th (yellow), 8th to the 14th (green), and 15th to the 21st (blue) best nodes are determined in turn. The figure also shows that the measured probabilities depend slightly on the channel type; however, this influence is too small to change the positions of the best nodes.
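The position-finding procedure can be sketched as follows. This Python sketch, with hypothetical names, uses i.i.d. uniform toy inputs rather than the channel models behind Figure 5, so its probabilities will differ from the published ones; it only illustrates the procedure of estimating, per matrix position, how often the sorted element belongs to the true K best and then fixing the K most likely positions.

```python
import random


def sort_2d(matrix):
    """Row sorting followed by column sorting (ascending), as in the 2D sorter."""
    rows = [sorted(row) for row in matrix]
    cols = [sorted(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def hit_rates(A, B, K, trials, seed=0):
    """Estimate, for each position (i, j) of the sorted matrix, the probability
    that it holds one of the true K smallest elements (toy i.i.d. inputs)."""
    rng = random.Random(seed)
    hits = [[0] * B for _ in range(A)]
    for _ in range(trials):
        m = [[rng.random() for _ in range(B)] for _ in range(A)]
        kth = sorted(v for row in m for v in row)[K - 1]  # K-th smallest value
        s = sort_2d(m)
        for i in range(A):
            for j in range(B):
                if s[i][j] <= kth:
                    hits[i][j] += 1
    return [[h / trials for h in row] for row in hits]


def best_positions(rates, K):
    """The K positions with the highest hit rate (ties broken toward the top-left)."""
    cells = sorted((-p, i, j) for i, row in enumerate(rates) for j, p in enumerate(row))
    return [(i, j) for _, i, j in cells[:K]]


rates = hit_rates(A=3, B=7, K=7, trials=2000)
print(best_positions(rates, 7))
```

In hardware, the resulting list of positions would be fixed at design time, so the selection itself requires no comparisons.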
The 2D sorter is suitable for hardware resource sharing because all the rows (and all the columns) perform the same task. A circuit that sorts the B elements of the 1st row in the 1st cycle can be reused to sort the 2nd, …, A-th rows in the 2nd, …, A-th cycles.
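The resource-sharing argument can be quantified with a rough comparator count. The sketch below assumes, as our own simplification, that every sorter is an all-pairs sorter with $(n^2 - n)/2$ comparators, and that a single B-input row sorter is reused for A cycles and a single A-input column sorter for B cycles; the sizes A = 7 and B = 14 (C = 98) are hypothetical.

```python
def all_pairs_comparators(n: int) -> int:
    """Comparators of an all-pairs sorter for n elements: (n^2 - n) / 2."""
    return n * (n - 1) // 2


def shared_2d_sorter_cost(A: int, B: int) -> tuple[int, int]:
    """(comparators, cycles) for one shared row sorter (B inputs, A cycles)
    plus one shared column sorter (A inputs, B cycles)."""
    comparators = all_pairs_comparators(B) + all_pairs_comparators(A)
    return comparators, A + B


A, B = 7, 14
print(all_pairs_comparators(A * B), shared_2d_sorter_cost(A, B))
# 4753 (112, 21)
```

Under these assumptions, the shared 2D sorter trades latency (A + B cycles) for a large reduction in comparators; the count ignores pipelining and the actual sorter circuits used in hardware.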