
# Low-complexity high-throughput decoding architecture for convolutional codes

Ran Xu^{1}, Kevin Morris^{1}, Graeme Woodward^{2} and Taskin Kocak^{3}

**2012**:151

https://doi.org/10.1186/1687-1499-2012-151

© Xu et al; licensee Springer. 2012

**Received:** 23 August 2011

**Accepted:** 23 April 2012

**Published:** 23 April 2012

## Abstract

Sequential decoding can achieve a very low computational complexity and short decoding delay when the signal-to-noise ratio (SNR) is relatively high. In this article, a low-complexity high-throughput decoding architecture based on a sequential decoding algorithm is proposed for convolutional codes. Parallel Fano decoders are scheduled to the codewords in parallel input buffers according to buffer occupancy, so that the processing capabilities of the Fano decoders can be fully utilized, resulting in high decoding throughput. A discrete time Markov chain (DTMC) model is proposed to analyze the decoding architecture. The relationship between the input data rate, the clock speed of the decoder and the input buffer size can be easily established via the DTMC model. Different scheduling schemes and decoding modes are proposed and compared. The novel high-throughput decoding architecture is shown to incur 3-10% of the computational complexity of Viterbi decoding at a relatively high SNR.

## Keywords

- architecture
- convolutional code
- Fano algorithm
- high-throughput decoding
- scheduling
- sequential decoding
- WirelessHD

## 1 Introduction

The unlicensed 57-64 GHz band around 60 GHz can accommodate multi-gigabit-per-second (multi-Gbps) wireless transmission over short ranges. There are several standards for 60 GHz systems, such as WirelessHD [1] and IEEE 802.15.3c [2, 3]. In both WirelessHD and the AV PHY mode of IEEE 802.15.3c, a concatenated forward error correction (FEC) scheme is used, with a Reed-Solomon (RS) code as the outer code and a convolutional code as the inner code. To achieve the target decoding throughput of multiple Gbps, both standards adopt parallel convolutional encoding in the transmitter baseband. It is straightforward to use parallel Viterbi decoding in the receiver baseband. However, it has been shown in [4, 5] that parallel Viterbi decoders in the receiver baseband result in massive hardware complexity and power consumption. The problem will become more severe if a higher decoding throughput (e.g., 10 Gbps) is targeted for battery-powered user terminals in the future [6]. Hence, it is desirable to find a low-complexity, high-throughput decoding method for convolutional codes in such systems.

The Viterbi algorithm (VA) achieves maximum likelihood decoding of convolutional codes [7]. The VA is a breadth-first, exhaustive search approach based on the trellis diagram. Sequential decoding is another method of convolutional decoding; it is a depth-first, non-exhaustive search approach based on the tree diagram. Because it explores only partial paths locally in the code tree, its decoding performance is sub-optimal and its computational complexity varies with SNR. The two main types of sequential decoding algorithms are the Stack algorithm [8] and the Fano algorithm [9, 10]. Because the Fano algorithm has low storage and sorting requirements, it can achieve a higher decoding throughput than the Stack algorithm, so only the Fano algorithm is considered in this article. Sequential decoding is not widely used in real systems because of its excessive computation and long decoding delay at low SNR. However, if a relatively high SNR can be achieved (e.g., over a very short range and/or via beamforming) or is required by the application (e.g., HD video streaming), sequential decoding will on average incur a very low computational complexity and a short decoding delay, which results in a high decoding throughput.

In this article, a novel low-complexity, high-throughput decoding architecture based on parallel Fano algorithm decoding with scheduling is proposed. Different scheduling schemes and decoding modes are investigated. A discrete time Markov chain (DTMC) is introduced to model the proposed architecture and to establish the relationship between the input data rate, the input buffer size, and the clock speed of the decoders. The trade-offs between error rate, computational complexity, scheduling schemes and decoding modes are studied. It will be shown that the high-throughput decoding architecture achieves a much lower computational complexity than Viterbi decoding with a similar error rate performance. The rest of the article is organized as follows. The unidirectional Fano algorithm (UFA) and the bidirectional Fano algorithm (BFA) are reviewed in Section 2. The novel parallel Fano decoding with scheduling architecture, together with different scheduling schemes and decoding modes, is proposed in Section 3. DTMC-based modeling is applied to the decoding architecture in Section 4. Simulation results are given in Section 5, and conclusions are drawn in Section 6.

## 2 Unidirectional Fano algorithm and bidirectional Fano algorithm

In the conventional unidirectional Fano algorithm, the decoder starts decoding from the initial state zero (or origin node). During each iteration of the algorithm, the decoder may move forward (increase depth within the tree), move backward (reduce depth), or stay at the current tree depth. The decision is based on a comparison between the threshold value and the path metric. When a forward move reaches a node for the first time, the threshold is tightened. If the decoder can move neither forward nor backward, the threshold is loosened. A detailed flowchart of the UFA can be found in [11].
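To make the forward, backward and threshold rules concrete, the following is a minimal Python sketch of a hard-decision UFA decoder. It is an illustration, not the authors' implementation: it follows the common textbook structure of the Fano algorithm (tighten only on the first visit to a node, back up when the next node fails the threshold, loosen by *δ* when neither move is possible), uses the *K* = 7, *R* = 1/3 code of Section 3.1 with an assumed tap ordering, and assumes a binary symmetric channel with crossover probability `p` for the per-bit Fano metric. The names `branch`, `encode` and `fano_decode` and the iteration cap `max_iter` are hypothetical.

```python
import math

K = 7
GENS = (0o133, 0o171, 0o165)  # generator taps; MSB assumed to multiply the newest input bit


def branch(state, bit):
    """Return (3 coded bits, next state) for one trellis branch."""
    reg = (bit << (K - 1)) | state          # 7-bit window, newest bit at position K-1
    out = tuple(bin(reg & g).count("1") & 1 for g in GENS)
    return out, reg >> 1                    # drop the oldest bit


def encode(bits):
    state, symbols = 0, []
    for b in list(bits) + [0] * (K - 1):    # K-1 zero tail bits return to state 0
        sym, state = branch(state, b)
        symbols.append(sym)
    return symbols


def fano_decode(rx, n_info, p=0.05, delta=2.0, max_iter=100_000):
    """Hard-decision Fano decoding; returns (info bits or None, iterations)."""
    rate = 1 / len(GENS)
    hit = math.log2(1 - p) + 1 - rate       # Fano metric for a matching bit
    miss = math.log2(p) + 1 - rate          # Fano metric for a mismatching bit
    depth = len(rx)                         # n_info + K - 1 branches
    gamma = [0.0] * (depth + 1)             # path metric at each depth
    state = [0] * (depth + 1)               # encoder state at each depth
    order = [[] for _ in range(depth)]      # candidate branches per depth, best first
    choice = [0] * depth                    # index of the branch currently followed

    def expand(d):
        bits = (0, 1) if d < n_info else (0,)   # tail branches are forced to 0
        cands = []
        for b in bits:
            sym, _ = branch(state[d], b)
            m = sum(hit if s == r else miss for s, r in zip(sym, rx[d]))
            cands.append((m, b))
        order[d] = sorted(cands, reverse=True)
        choice[d] = 0

    d, t = 0, 0.0
    expand(0)
    for it in range(1, max_iter + 1):
        m, b = order[d][choice[d]]
        if gamma[d] + m >= t:                   # forward move allowed
            new_gamma = gamma[d] + m
            if gamma[d] < t + delta:            # first visit: tighten threshold
                while new_gamma >= t + delta:
                    t += delta
            state[d + 1] = branch(state[d], b)[1]
            gamma[d + 1] = new_gamma
            d += 1
            if d == depth:                      # reached the end of the tree
                return [order[i][choice[i]][1] for i in range(n_info)], it
            expand(d)
            continue
        while True:                             # look backward
            if d == 0 or gamma[d - 1] < t:
                t -= delta                      # stuck: loosen threshold
                choice[d] = 0                   # restart from the best branch
                break
            d -= 1                              # move back one level
            if choice[d] + 1 < len(order[d]):
                choice[d] += 1                  # try the next-best branch
                break
    return None, max_iter                       # give up; the codeword would be erased
```

Each pass through the main loop corresponds roughly to one decoder iteration, so the returned count grows when channel errors force the decoder to back up or loosen its threshold; a codeword that reaches `max_iter` would be erased.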

## 3 Parallel Fano decoding with scheduling

### 3.1 Architecture

The convolutional code considered in this article has code rate *R* = 1/3 and constraint length *K* = 7. For each input bit, there are three coded output bits (*X*, *Y*, and *Z*). The generator polynomials are $g_0 = (133)_8$, $g_1 = (171)_8$, and $g_2 = (165)_8$. This convolutional code is used throughout the article to target the WirelessHD specification and the IEEE 802.15.3c AV PHY mode, although sequential decoding can also decode convolutional codes with very long constraint lengths, which may be infeasible for the Viterbi algorithm.
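As a sketch of the encoder described above, the following Python code implements a *K* = 7, *R* = 1/3 feed-forward encoder with generators (133, 171, 165) in octal. The bit-ordering convention (most significant tap applied to the newest input bit) and the function name `conv_encode` are assumptions; the standards define the exact tap ordering and output serialization.

```python
GENERATORS = (0o133, 0o171, 0o165)   # g0, g1, g2 in octal
K = 7                                # constraint length


def conv_encode(bits, terminate=True):
    """Return a list of (X, Y, Z) output triplets for the input bits."""
    if terminate:
        bits = list(bits) + [0] * (K - 1)   # zero tail drives the encoder back to state 0
    reg = 0                                 # last K input bits, newest at bit K-1
    out = []
    for b in bits:
        reg = (b << (K - 1)) | (reg >> 1)   # shift in the new bit, drop the oldest
        out.append(tuple(bin(reg & g).count("1") % 2 for g in GENERATORS))
    return out


# Impulse response: the X, Y, Z columns read back g0, g1, g2 in binary.
resp = conv_encode([1])
print("".join(str(s[0]) for s in resp))   # 1011011 == (133)_8
```

Under this convention, a single 1 followed by the zero tail reproduces each generator polynomial in its output column, which is a quick way to check a tap ordering against a specification.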

A receiver (Rx) baseband design^{a} for the WirelessHD and IEEE 802.15.3c standards is shown in Figure 4. Its building blocks operate in reverse relative to the corresponding building blocks at the transmitter (Tx). There are eight parallel convolutional decoders, each of which can implement the VA. However, a Viterbi decoder is one of the most power- and hardware-intensive blocks in the Rx baseband. The system operates in indoor, short-range environments, so a line-of-sight (LOS) path between the Tx and the Rx may exist, enabling a relatively high SNR at the Rx. Even if the LOS component is not available, adaptive antenna beamforming can still guarantee a relatively high SNR at the Rx. Additionally, the Tx and the Rx are quasi-static, which means the SNR is roughly constant. All these facts make sequential decoding an attractive approach for high-throughput convolutional decoding.

Consider *N* parallel Fano decoders, each with a finite input buffer accommodating up to *B* codewords. The supported input data rate of each buffer is assumed to be $R_d$ information bits per second, so the total supported data rate, or average decoding throughput, is $N \cdot R_d$. This parallel Fano decoding system can be treated as a parallel queuing system, in which the parallel input buffers are the queues and the parallel Fano decoders are the servers. Because the computational effort of a Fano decoder varies from codeword to codeword, the input buffer occupancies ($Q_1, \ldots, Q_N$) differ from each other, as shown in Figure 5. If the Fano decoders can be scheduled to decode the codewords in different input buffers, the utilization of the Fano decoders increases, resulting in a higher decoding throughput. For example, if a Fano decoder ${\mathcal{F}}_{m}$ finishes decoding one codeword and its input buffer occupancy is lower than that of another input buffer, e.g., ${\mathcal{B}}_{n}$, the decoder ${\mathcal{F}}_{m}$ can be scheduled to help decode another codeword in ${\mathcal{B}}_{n}$, thereby reducing $Q_n$ and avoiding potential buffer overflow or frame erasure. To realize this, a scheduler that allocates the Fano decoders to the input buffers dynamically is introduced, as shown in Figure 5. Each Fano decoder also needs to be connected to all the input and output buffers. The scheduler is invoked whenever a decoder finishes decoding a codeword, and it then allocates the decoder to an input buffer according to a scheduling scheme. The allocation is achieved by changing the connections between the input buffers and the decoders, and between the decoders and the output buffers.
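One way to picture the scheduler is as a crossbar map that is rewritten each time a decoder finishes a codeword. In this hypothetical Python sketch, `connections[m]` is the index of the input buffer (and matching output buffer) that decoder *m* is attached to, occupancies are counted in bits, and the selection rule is LQF from Section 3.2. The class and method names are illustrative only.

```python
class Scheduler:
    """Toy crossbar scheduler: decoder m -> (input buffer n, output buffer n)."""

    def __init__(self, n_buffers, n_decoders):
        self.occupancy = [0] * n_buffers               # Q_1..Q_N in bits
        # initially attach decoder m to buffer m mod N
        self.connections = [m % n_buffers for m in range(n_decoders)]

    def on_decoder_finished(self, m):
        """Called when decoder m finishes a codeword; returns its new buffer."""
        # longest queue first: pick the buffer with the highest occupancy
        n = max(range(len(self.occupancy)), key=lambda i: self.occupancy[i])
        self.connections[m] = n   # reroute the input and output links of decoder m
        return n


s = Scheduler(4, 4)
s.occupancy = [100, 900, 300, 50]
print(s.on_decoder_finished(0))   # 1
```

Only the finished decoder's links change; the other decoders keep working on their current codewords.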

*B* - 1. When a decoder ${\mathcal{F}}_{m}$ finishes decoding the codeword in its buffer, the buffer is cleared and refilled with a new codeword from a long input buffer selected according to a scheduling scheme. For example, as shown in Figure 6, when decoder ${\mathcal{F}}_{2}$ finishes decoding the codeword in its buffer, the scheduler selects the long input buffer ${\mathcal{B}}_{N}$ according to the scheduling scheme. If its occupancy is greater than or equal to one codeword length $L_f$, i.e., $Q_N \geq L_f$, the buffer of ${\mathcal{F}}_{2}$ is refilled with a new codeword from ${\mathcal{B}}_{N}$ and the occupancy of ${\mathcal{B}}_{N}$ is reduced to $Q_N - L_f$; otherwise, if $Q_N < L_f$, a "virtual link" is set up between ${\mathcal{B}}_{N}$ and ${\mathcal{F}}_{2}$ until $Q_N \geq L_f$. The difference between Figures 5 and 6 is that, in the equivalent architecture, the parallel long input buffers are not necessarily attached to particular Fano decoders, which makes the system much easier to understand.
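The refill rule of the equivalent architecture can be written as a single function. This is a minimal Python sketch under the assumption that occupancies are counted in information bits; the "virtual link" case is represented simply by returning `False`, so that the caller keeps the decoder attached to the same long buffer. `refill` is a hypothetical name.

```python
def refill(q, l_f):
    """Try to move one codeword from a long input buffer to a decoder buffer.

    q   -- occupancy Q_n of the selected long input buffer (bits)
    l_f -- codeword length L_f (bits)
    Returns (new occupancy, True if a full codeword was transferred).
    """
    if q >= l_f:
        return q - l_f, True   # Q_n <- Q_n - L_f; the decoder gets a whole codeword
    return q, False            # Q_n < L_f: keep a virtual link until enough bits arrive
```

For example, with $L_f$ = 206 bits, `refill(500, 206)` transfers one codeword and leaves 294 bits in the long buffer, while `refill(100, 206)` returns `(100, False)` and the virtual link stays in place.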

When an input buffer ${\mathcal{B}}_{n}$ is about to overflow, the scheduler compares the computational efforts of all the decoders and erases the codeword being decoded by the decoder ${\mathcal{F}}_{m}$ that has consumed the highest computational effort. After this codeword is erased, one codeword in the input buffer ${\mathcal{B}}_{n}$ is scheduled to the decoder ${\mathcal{F}}_{m}$, and the occupancy of ${\mathcal{B}}_{n}$ is reduced to $Q_n - L_f$.
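The overflow rule can be sketched as follows. In this hypothetical Python helper, `efforts[m]` counts the iterations decoder *m* has spent on its current codeword and `assignment[m]` is the input buffer it serves; the names and the in-place list updates are assumptions for illustration.

```python
def handle_overflow(n, occupancy, efforts, assignment, l_f):
    """Erase the costliest codeword and reassign its decoder to buffer n.

    n          -- index of the input buffer about to overflow
    occupancy  -- list of buffer occupancies Q (bits), updated in place
    efforts    -- iterations spent so far by each decoder on its codeword
    assignment -- input buffer currently served by each decoder, updated in place
    Returns the index of the decoder whose codeword was erased.
    """
    m = max(range(len(efforts)), key=lambda i: efforts[i])
    efforts[m] = 0          # the partially decoded codeword is discarded (erased)
    assignment[m] = n       # decoder m now takes a codeword from buffer n
    occupancy[n] -= l_f     # Q_n <- Q_n - L_f
    return m
```

Choosing the decoder with the largest effort targets the codeword that is most likely stuck in a long search, so the erasure frees the decoder that is least productive at that moment.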

The number of decoders *M* is assumed to be the same as the number of input buffers *N* in Figures 5 and 6 (i.e., *M* = *N*) for ease of illustration. However, it will be shown in Section 5 that a higher number of decoders may be required to achieve a target decoding throughput (i.e., *M* > *N*).

### 3.2 Scheduling schemes

When a decoder finishes decoding a codeword, the scheduler needs to decide which input buffer the decoder should serve next. It has been discussed in [18–20] that serving the longest queue first (LQF) helps keep the parallel queues (or input buffers) the most balanced or stable, thus maximising the supported input data rate $R_d$. Having the scheduled decoders serve the longest queue first is therefore considered one of the best scheduling schemes for the proposed architecture in terms of achieving a high decoding throughput.

The LQF scheme needs to compare the input buffer occupancy values. Simpler scheduling schemes can be employed to reduce the computational and hardware complexity of the scheduler. One possibility is to select an input buffer at random, which is named the RDM scheme. Another is to group the parallel input buffers and decoders such that each decoder can only be scheduled to the input buffers within its own group, with the decoders in each group scheduled according to the LQF scheme. This is known as the static scheduling scheme, or the STC scheme. In this article, each group is assumed to have two input buffers and two UFA decoders. Compared to the LQF scheme, the STC scheme reduces the need for multi-port memories and high fan-out multiplexers. It also simplifies the design of the scheduler and the connections between the input buffers and the decoders.
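The three selection rules can be compared side by side in a short Python sketch. Each function returns the index of the input buffer that a freed decoder should serve next; for STC, groups are assumed to consist of two consecutive buffers, with decoder *m* belonging to group `m // 2`. The function names and this grouping convention are assumptions.

```python
import random


def lqf(occupancy):
    """Longest queue first over all input buffers."""
    return max(range(len(occupancy)), key=lambda n: occupancy[n])


def rdm(occupancy, rng=random):
    """Random selection, ignoring the occupancy values."""
    return rng.randrange(len(occupancy))


def stc(occupancy, decoder, group_size=2):
    """Static grouping: LQF restricted to the decoder's own group."""
    g = decoder // group_size                       # group index of this decoder
    members = range(g * group_size, min((g + 1) * group_size, len(occupancy)))
    return max(members, key=lambda n: occupancy[n])


occ = [10, 80, 30, 60, 5, 0, 90, 20]
print(lqf(occ), stc(occ, 0), stc(occ, 5))   # 6 1 4
```

STC compares only two occupancies per decision, which reflects the reduced multiplexer fan-out mentioned above, while RDM needs no comparison at all.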

### 3.3 PUFAS mode and PBFAS mode

When a decoder ${\mathcal{F}}_{m}$ finishes decoding a codeword, it can either be scheduled to decode a new codeword from one of the input buffers, or be scheduled to help another decoder ${\mathcal{F}}_{{m}^{\prime}}$ that is already working on a whole codeword. In the latter case, ${\mathcal{F}}_{m}$ decodes from the zero end state of that codeword, so that ${\mathcal{F}}_{m}$ and ${\mathcal{F}}_{{m}^{\prime}}$ decode the same codeword in the BFA mode. These two modes are known as the parallel unidirectional Fano algorithm decoding with scheduling (PUFAS) mode and the parallel bidirectional Fano algorithm decoding with scheduling (PBFAS) mode, respectively. It has been shown in [12, 13] that the decoding throughput of a BFA decoder is at least twice that of a UFA decoder ($D_{BFA} \geq 2 D_{UFA}$), owing to the parallel processing of the forward decoder (FD) and the backward decoder (BD) and to the reduction in computational effort achieved by the BFA. As a result, if there are *M* UFA decoders, any two of which can decode in the BFA mode, the decoding throughput can be improved by forming ⌊*M*/2⌋ BFA decoders, which can then be scheduled in the architecture as ⌊*M*/2⌋ parallel BFA decoders.

## 4 DTMC based modeling

$N_{decoded}$ is the number of decoded codewords and $N_{erased}$ is the number of codewords erased due to buffer overflow. A metric called the blocking probability ($P_B$) is defined as:

$$P_B = \frac{N_{erased}}{N_{decoded} + N_{erased}}$$

$P_B$ is similar to the frame error rate ($P_F$) caused by undetected errors. In designing the system, the input data rate $R_d$ (in bps), the clock speed of each Fano decoder $f_{clk}$ (in Hz) and the input buffer size *B* (in codewords) need to be chosen properly to ensure that:

$$P_B \leq P_{target}$$

In this article, $P_B = 0.01 \times P_F$ is adopted as the target blocking probability ($P_{target}$). The relationship between $R_d$, $f_{clk}$ and *B* can be found via simulation. Alternatively, the architecture can be analyzed by modeling it with queuing theory.
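The design criterion can be checked with a couple of helper functions. This Python sketch assumes that $P_B$ is measured as the fraction of codewords erased out of all codewords (decoded plus erased) and uses the target $P_{target} = 0.01 \times P_F$ adopted above; the function names and the counts in the example are illustrative, not results from the article.

```python
def blocking_probability(n_decoded, n_erased):
    """Fraction of codewords erased because of input buffer overflow."""
    return n_erased / (n_decoded + n_erased)


def meets_target(n_decoded, n_erased, frame_error_rate):
    """Check P_B <= P_target with P_target = 0.01 * P_F."""
    p_target = 0.01 * frame_error_rate
    return blocking_probability(n_decoded, n_erased) <= p_target


# Illustrative numbers only: 1e6 codewords with P_F = 0.1, so P_target = 1e-3.
print(meets_target(999_500, 500, 0.1))   # True (P_B = 5e-4)
```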

### 4.1 DTMC based modeling on single UFA/BFA

A single UFA or BFA decoder with its input buffer can be modeled as a **D/G/1/B** queue [21], in which **D** means that the input data rate is deterministic, **G** means that the decoding time follows a general distribution, 1 means that there is one decoder, and **B** is the number of codewords the input buffer can hold. The state of the Fano decoder is represented by the input buffer occupancy, or queue length, at the instant a codeword finishes decoding, measured in terms of the branches or information bits stored in the buffer.

$Q(n)$ and $Q(n+1)$ have the following relationship:

$Q(n+1)$ is the input buffer occupancy when the $n^{th}$ codeword just finishes decoding, $T_s(n)$ is the decoding time of the $n^{th}$ codeword by the Fano decoder and $L_f$ is the length of a codeword in terms of branches or information bits. $[x]$ denotes rounding $x$ to the nearest integer. The speed factor of the Fano decoder is defined as the ratio between $f_{clk}$ and $R_d$:

$$\mu = \frac{f_{clk}}{R_d}$$

If $f_{clk}$ is normalized to 1, Equation (4) can be changed to:

For given $\mu$ and $L_f$, the state of the input buffer $Q(n+1)$ is determined uniquely by the state $Q(n)$ and the decoding time $T_s(n)$.
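A direct way to see how $\mu$, $L_f$ and *B* interact is to iterate the buffer recursion with sampled decoding times. The Python sketch below assumes the natural form of the recursion implied by the definitions above, $Q(n+1) = Q(n) + [T_s(n)/\mu] - L_f$ with $f_{clk}$ normalized to 1, clipped at an empty buffer, and counts a codeword as erased whenever the occupancy would exceed the capacity $\Omega = B \cdot L_f$. These boundary rules, the function name and the toy decoding-time values are assumptions, not the article's exact model. Note that Python's `round` rounds halves to even.

```python
import random


def simulate_buffer(ts_samples, mu, l_f, b, n_codewords, seed=0):
    """Monte Carlo estimate of the blocking probability of one D/G/1/B buffer.

    ts_samples -- empirical decoding times T_s in clock cycles per codeword
    mu         -- speed factor f_clk / R_d (f_clk normalized to 1)
    l_f        -- codeword length in bits
    b          -- buffer size in codewords
    """
    rng = random.Random(seed)
    omega = b * l_f                     # buffer capacity in bits
    q, erased = 0, 0
    for _ in range(n_codewords):
        t_s = rng.choice(ts_samples)
        q = q + round(t_s / mu) - l_f   # bits arrived while decoding, minus one codeword
        if q > omega:                   # overflow: assume one codeword is erased
            erased += 1
            q = omega
        q = max(q, 0)                   # the decoder idles on an empty buffer
    return erased / n_codewords
```

With measured $T_s$ samples, sweeping `mu` until the estimate falls below $P_{target}$ gives the working speed factor for a given *B*.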

The decoding times $T_s(n)$ and $T_s(n+1)$ are *i.i.d.* in the AWGN channel or in randomly interleaved fading channels. As a result, the state of the input buffer is a DTMC. It is assumed that the Fano decoder can execute one iteration per clock cycle, which is feasible according to [22], so $T_s(n)$ is measured in clock cycles per codeword. Since a closed-form expression for the distribution of $T_s$ is intractable, its simulated distribution is used in the following analysis. The difference between $Q(n+1)$ and $Q(n)$ is defined as:

$$\Delta = Q(n+1) - Q(n)$$

The state transition probability matrix for an input buffer of size *B* is:

$P_{ij}$ is the state transition probability from $S_i$ to $S_j$, which can be calculated as follows:

The calculation of $P_{ij}$ requires the distribution of $T_s$, which is shown in Figure 7 for the UFA with different speed factors at $E_b/N_0$ = 4 dB. It should be noted that a bad codeword may incur an unbounded decoding time for a Fano decoder, and it is common to erase such a codeword. This case corresponds to $j = \Omega$ in Equations (9) and (10). The initial state probability ($n = 0$) of the input buffer is:

$\pi_{\omega}(n)$ is the probability that the input buffer is in state $S_{\omega}$ at time *n*. The steady-state probability of the input buffer is then:

where $\Pi(i)$ is the steady-state probability that the input buffer is in state $S_i$ and ${p}_{{\Delta}_{\Omega -i}}^{+}=\text{Pr}\left(\Delta >\Omega -i\right)$.
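The same buffer can be analyzed through its transition matrix instead of by simulation. This Python sketch builds the transitions from an empirical distribution of $\Delta = [T_s/\mu] - L_f$, clips states to $0, \ldots, \Omega$ with $\Omega = B \cdot L_f$ (an assumed boundary rule), computes the stationary distribution $\Pi$ by power iteration, and evaluates the overflow probability $\sum_i \Pi(i)\,\Pr(\Delta > \Omega - i)$. It follows the structure described in this section but is not a transcription of the article's equations; the function name and boundary handling are assumptions.

```python
from collections import Counter


def dtmc_blocking(ts_samples, mu, l_f, b, iters=5000):
    """Semi-analytical overflow probability of one input buffer (toy DTMC)."""
    omega = b * l_f                                    # largest state, in bits
    counts = Counter(round(t / mu) - l_f for t in ts_samples)
    n = len(ts_samples)
    delta_pmf = {d: c / n for d, c in counts.items()}  # Pr(Delta = d)

    pi = [0.0] * (omega + 1)
    pi[0] = 1.0                                        # start with an empty buffer
    for _ in range(iters):                             # power iteration: pi <- pi P
        nxt = [0.0] * (omega + 1)
        for i, p_i in enumerate(pi):
            if p_i == 0.0:
                continue
            for d, p_d in delta_pmf.items():
                j = min(max(i + d, 0), omega)          # clip to states 0..Omega
                nxt[j] += p_i * p_d
        pi = nxt

    # probability that the next increment would push the buffer past Omega
    return sum(p_i * sum(p_d for d, p_d in delta_pmf.items() if d > omega - i)
               for i, p_i in enumerate(pi))
```

For a toy case with $\Delta = \pm 10$ bits equally likely, $L_f$ = 20 and *B* = 5, the reachable states form a symmetric walk with a uniform stationary distribution over 11 states, so the result is $0.5/11 \approx 0.045$.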

### 4.2 Extension to PUFAS/PBFAS-LQF

If there are *M* Fano decoders working in parallel, each running at $f_{clk}$, and the LQF scheduling scheme is used, the *M* Fano decoders can be fully utilized to decode the codewords in the *N* input buffers. Since the *M* Fano decoders are identical to each other, as are the *N* input buffers, the system is fully symmetric and can be treated as a single faster Fano decoder with clock speed ${f}_{clk}^{\prime}=M\cdot {f}_{clk}$ that works on each input buffer with probability $P_S = 1/N$. As a result, Equation (6) should be changed to:

where $i \in \{1, \ldots, N\}$, and Equation (7) should be changed to:

The state transition probability matrix $\mathbf{P}_{T,i}$ can be calculated from the distribution of $\Delta_i$, and Equations (8)-(13) can still be applied to the PUFAS/PBFAS-LQF. The proposed DTMC model is validated against the simulation results in the following section.

## 5 Simulation results

The performance of the proposed parallel Fano decoding with scheduling is examined via simulation in this section. The branch metric calculation is based on 1-bit hard decisions with the Fano metric [11]. Using 3-bit soft decisions for the branch metric calculation yields about 1.75 to 2 dB of additional coding gain. However, 1-bit hard decisions are favoured in very high throughput decoder design as a trade-off between decoder complexity and error rate performance. In this article, 1-bit hard decisions are adopted for the metric calculation in both the Viterbi and the Fano algorithms. The threshold adjustment value in the Fano algorithm is *δ* = 2. The modulation is BPSK and the channel is assumed to be AWGN, which is similar to the LOS multipath channel for 60 GHz discussed in [23]. Each frame has *L* = 200 information bits plus *K* - 1 = 6 zero tail bits, which results in a total frame (or codeword) length of $L_f = L + K - 1 = 206$ bits. The input buffer size is assumed to be *B* = 10.

### 5.1 Comparison between different scheduling schemes

$E_b/N_0$ = 4 dB, which corresponds to the target blocking probability of $P_{target} = 10^{-3}$. In both the PUFAS and the PBFAS, the LQF scheduling scheme has the best performance. In the PUFAS, the RDM scheme performs better than the STC scheme, while in the PBFAS the RDM scheme performs worst of all the schemes. This is because when the RDM scheme is employed in the PBFAS, a whole BFA decoder (two UFA decoders) may become idle if it randomly selects a low-occupancy input buffer, whereas a wrong selection by the RDM scheme in the PUFAS idles only one UFA decoder. As a result, the RDM scheme can be used in the PUFAS and the STC scheme in the PBFAS to reduce the complexity of the scheduler. However, since the complexity that the LQF scheduler adds to the parallel decoders is minimal, the LQF scheme is favoured for achieving a higher decoding throughput.

### 5.2 Validation of the DTMC model

The semi-analytical results^{b} obtained from the DTMC model are compared with the simulation results to validate the model. It can be seen from Figure 9 that the semi-analytical results are quite close to the simulation results, which indicates the accuracy of the proposed DTMC model. The working speed factor of parallel unidirectional Fano algorithm decoding without scheduling (PUFA) is about *μ* = 17, which can be reduced to *μ* = 7 and *μ* = 5.6 if the LQF scheduling scheme is used in the PUFAS and in the PBFAS, respectively. The corresponding decoding throughput improvements are approximately 140% and 200%, respectively.
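The quoted improvements follow directly from the speed factors. For a fixed clock, the supported input data rate per buffer is $R_d = f_{clk}/\mu$, so reducing the required speed factor from the PUFA value to a scheduled value raises the throughput by the ratio of the two. A short Python check using the speed factors above (the function name is illustrative):

```python
def throughput_gain(mu_baseline, mu_scheduled):
    """Relative increase in R_d = f_clk / mu when the required speed factor drops."""
    return mu_baseline / mu_scheduled - 1


for name, mu in (("PUFAS-LQF", 7.0), ("PBFAS-LQF", 5.6)):
    print(f"{name}: {throughput_gain(17.0, mu):.0%}")
# PUFAS-LQF: 143%
# PBFAS-LQF: 204%
```

These exact ratios are consistent with the rounded 140% and 200% figures given in the text.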

It has been found that the proposed DTMC-based modeling of the PUFAS-LQF and the PBFAS-LQF is accurate when the input buffer size *B* is large enough (i.e., *B* ≥ 5), and its accuracy degrades as *B* gets smaller. However, a very short input buffer would not be adopted anyway, given the trade-off between area and decoding throughput discussed in [21]. Additionally, the accuracy of the model does not depend on the relationship between *M* and *N* (i.e., *M* > *N*, *M* = *N* or *M* < *N*) as long as the input buffer size is large enough.

### 5.3 Number of parallel Fano decoders

Figure 10 shows the relationship between the number of decoders *M* and the working speed factor *μ* for both the PUFAS-LQF and the PBFAS-LQF at $E_b/N_0$ = 4 dB and 5 dB. This relationship can be easily established using the proposed DTMC model.

If the target decoding throughput is $D_{target}$ = 1 Gbps and the clock speed of the Fano decoder is $f_{clk}$ = 500 MHz, the supported input data rate will be $R_d = D_{target}/N$ = 125 Mbps for *N* = 8 input buffers, and the target speed factor will be $\mu_1 = f_{clk}/R_d = 4$. It can be seen from Figure 10 that the required number of decoders at $E_b/N_0$ = 4 dB is *M* = 14 for the PUFAS-LQF and *M* = 12 for the PBFAS-LQF. Thus, two decoders can be saved by adopting the PBFAS-LQF instead of the PUFAS-LQF for the same decoding throughput.

It can also be seen from Figure 10 that, for the same number of decoders, the decoding throughput improves as the SNR increases. As a result, some of the decoders can be dynamically turned off as the SNR increases while maintaining the same decoding throughput, although a large number of decoders may be required to support a low SNR. For example, if the target decoding throughput increases to $D_{target}$ = 2 Gbps and the clock speed of the Fano decoder is still $f_{clk}$ = 500 MHz, the target speed factor will be $\mu_2$ = 2. It can be seen from Figure 10 that the required number of decoders at $E_b/N_0$ = 4 dB is *M* = 28 for the PUFAS-LQF and *M* = 26 for the PBFAS-LQF, which can be reduced to only *M* = 12 if the SNR increases to 5 dB. In this case, more than half of the decoders can be turned off to reduce the power consumption of the decoding architecture.
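The sizing arithmetic in the two examples above, together with the codeword length from the simulation setup, can be checked in a few lines of Python. This covers only $L_f = L + K - 1$, $R_d = D_{target}/N$ and $\mu = f_{clk}/R_d$; the required number of decoders *M* still has to be read from the curves in Figure 10 (or from the DTMC model). `target_speed_factor` is a hypothetical helper.

```python
L, K = 200, 7
L_f = L + K - 1                      # codeword length in bits (200 + 6 tail bits)


def target_speed_factor(d_target_bps, f_clk_hz, n_buffers):
    r_d = d_target_bps / n_buffers   # input data rate per buffer, R_d = D_target / N
    return f_clk_hz / r_d            # mu = f_clk / R_d


mu_1 = target_speed_factor(1e9, 500e6, 8)   # 1 Gbps target
mu_2 = target_speed_factor(2e9, 500e6, 8)   # 2 Gbps target
print(L_f, mu_1, mu_2)               # 206 4.0 2.0
```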

### 5.4 Error rate performance and computational complexity

The *T*-algorithm [27] can also achieve a reduced computational complexity at high SNR with a minimal penalty in coding gain, so its performance is also included for comparison. It can be seen in Figure 11 that the parallel Viterbi algorithm (PVA) has the best BER performance. The PUFAS-LQF incurs a coding-gain penalty of about 0.1 dB at BER = $10^{-4}$. The PBFAS-LQF has the worst performance, with a coding-gain loss of about 0.25 dB compared to the PVA. The *T*-algorithm has been tuned to achieve a similar BER performance by setting the discarding threshold *T* = 5.

$C_{UFA}$ is the computational complexity of the UFA decoder and $C_S$ is the computational complexity of the LQF scheduler. It is known that $C_{UFA} \geq L_f = 206$ branch metric calculations (BMC), whereas $C_S$ is only $N - 1 = 7$ comparisons of input buffer occupancy values. As a result, the computational complexity of the PUFAS-LQF to decode one codeword is $C_{PUFAS} \approx C_{UFA}$. Similarly, the computational complexity of the PBFAS-LQF to decode one codeword is:

$$C_{PBFAS} \approx C_{FD} + C_{BD}$$

where $C_{FD}$ is the number of BMC to decode one codeword in the forward direction and $C_{BD}$ is the number of BMC in the backward direction. The computational complexity of the PVA to decode one codeword has a fixed value:

The distributions of $C_{UFA}$, $C_{BFA}$, and $C_{VA}$ at different SNRs can be found in Figure 2.

It can be seen that the proposed decoding architecture requires a much lower computational complexity than the PVA. For example, at $E_b/N_0$ = 4 dB, the computational complexity of the PUFAS-LQF is only 10% of that of the PVA, and it falls to 3% at 6 dB. Additionally, the computational complexity of the PBFAS-LQF is lower than that of the PUFAS-LQF at low SNR, but the two become very similar as the SNR increases, because at high SNR the reduction in computational complexity achieved by the BFA over the UFA becomes minimal. Since the PBFAS-LQF offers very limited improvements in decoding throughput and computational complexity over the PUFAS-LQF at high SNR, the PUFAS-LQF is favored due to its better BER performance. It can also be seen from Figures 11 and 12 that, at a BER performance similar to that of the PUFAS-LQF and the PBFAS-LQF, the *T*-algorithm cannot achieve the same low computational complexity.

## 6 Conclusions

This article considered the application of sequential decoding in high-throughput wireless communication systems. A novel architecture based on parallel Fano algorithm decoding with scheduling was proposed. Because the Fano decoders are scheduled according to the input buffer occupancy, the proposed architecture can achieve a high decoding throughput. Different scheduling schemes and decoding modes were proposed and compared, and it was shown that the PBFAS-LQF scheme achieves the highest decoding throughput. A DTMC model was proposed for the decoding architecture, through which the relationship between the input data rate, the clock speed of the decoder and the input buffer size can be easily established. The model was validated by simulation and used to determine the number of decoders required for a target decoding throughput. It was shown that the novel high-throughput decoding architecture requires only 3-10% of the computational complexity of Viterbi decoding with a similar error rate performance. This architecture can be employed in high-throughput systems, such as 60 GHz systems, to achieve energy-efficient, low-complexity decoding of convolutional codes.

## Endnotes

^{a}The standard does not specify the Rx design; only the Tx design is given. ^{b}Since the distribution of $T_s$ is obtained by simulation, the DTMC-based results are referred to as semi-analytical.

## Declarations

### Acknowledgements

The authors would like to thank the Telecommunications Research Laboratory (TRL) of Toshiba Research Europe Ltd and its directors for supporting this study.


## References

- WirelessHD Specification Version 1.1 Overview. [http://www.wirelesshd.org/pdfs/WirelessHD-Specification-Overview-v1.1May2010.pdf]
- IEEE Standard for Information technology-Telecommunications and information exchange between systems-Local and metropolitan area networks-Specific requirements. Part 15.3: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for High Rate Wireless Personal Area Networks (WPANs) Amendment 2: Millimeter-wave-based Alternative Physical Layer Extension. IEEE Std 802.15.3c-2009 (Amendment to IEEE Std 802.15.3-2003), 2009.
- IEEE 802.15 WPAN Task Group 3c (TG3c) Millimeter Wave Alternative PHY. [http://www.ieee802.org/15/pub/TG3c.html]
- Kato S, Harada H, Funada R, Baykas T, Sam C, Junyi W, Rahman M: Single carrier transmission for multi-gigabit 60-GHz WPAN systems. *IEEE J Sel Areas Commun* 2009, 27(8):1466-1478.
- Marinkovic M, Piz M, Choi C, Panic G, Ehrig M, Grass E: Performance evaluation of channel coding for Gbps 60-GHz OFDM-based wireless communications. In *IEEE 21st International Symposium on Personal Indoor and Mobile Radio Communications (PIMRC)*. Istanbul, Turkey; 2010:994-998.
- Fettweis G, Guderian F, Krone S: Entering the path towards terabit/s wireless links. In *Design, Automation and Test in Europe Conference and Exhibition (DATE)*. Grenoble, France; 2011:1-6.
- Viterbi A: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. *IEEE Trans Inf Theory* 1967, 13(2):260-269.
- Jelinek F: Fast sequential decoding algorithm using a stack. *IBM J Res Develop* 1969, 13(6):675-685.
- Fano R: A heuristic discussion of probabilistic decoding. *IEEE Trans Inf Theory* 1963, 9(2):64-74. doi:10.1109/TIT.1963.1057827
- Pan W, Ortega A: Adaptive computation control of variable complexity Fano decoders. *IEEE Trans Commun* 2009, 57(6):1556-1559.
- Lin S, Costello D: *Error Control Coding: Fundamentals and Applications*. Pearson Prentice-Hall, Upper Saddle River, NJ; 2004.
- Xu R, Kocak T, Woodward G, Morris K, Dolwin C: Bidirectional Fano algorithm for high throughput sequential decoding. In *IEEE 20th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC)*. Tokyo, Japan; 2009:1809-1813.
- Xu R, Kocak T, Woodward G, Morris K: Throughput improvement on bidirectional Fano algorithm. In *Proc of the 6th International Wireless Communications and Mobile Computing Conference (IWCMC)*. Caen, France; 2010:276-280.
- Habib I, Paker O, Sawitzki S: Design space exploration of hard-decision Viterbi decoding: algorithm and VLSI implementation. *IEEE Trans Very Large Scale Integr (VLSI) Syst* 2010, 18(5):794-807.
- Black P, Meng T: 1-Gb/s, four-state, sliding block Viterbi decoder. *IEEE J Solid-State Circ* 1997, 32(6):797-805. doi:10.1109/4.585246
- Fettweis G, Meyr H: Parallel Viterbi algorithm implementation: breaking the ACS-bottleneck. *IEEE Trans Commun* 1989, 37(8):785-790. doi:10.1109/26.31176
- Anders M, Mathew S, Hsu S, Krishnamurthy R, Borkar S: A 1.9 Gb/s 358 mW 16-256 state reconfigurable Viterbi accelerator in 90 nm CMOS. *IEEE J Solid-State Circ* 2008, 43(1):214-222.
- Tassiulas L, Ephremides A: Dynamic server allocation to parallel queues with randomly varying connectivity. *IEEE Trans Inf Theory* 1993, 39(2):466-478. doi:10.1109/18.212277
- Ganti A, Modiano E, Tsitsiklis J: Optimal transmission scheduling in symmetric communication models with intermittent connectivity. *IEEE Trans Inf Theory* 2007, 53(3):998-1008.
- Al-Zubaidy H, Talim J, Lambadaris I: Optimal scheduling policy determination for high speed downlink packet access. In *IEEE International Conference on Communications (ICC)*. Glasgow, Scotland; 2007:472-479.
- Xu R, Woodward G, Morris K, Kocak T: A discrete time Markov chain model for high throughput bidirectional Fano decoders. In *IEEE Global Telecommunications Conference (GLOBECOM)*. Miami, USA; 2010:1-5.
- Ozdag R, Beerel P: An asynchronous low-power high-performance sequential decoder implemented with QDI templates. *IEEE Trans Very Large Scale Integr (VLSI) Syst* 2006, 14(9):975-985.
- Sum C, Zhou L, Funada R, Wang J, Baykas T, Rahman M, Harada H: Virtual time-slot allocation scheme for throughput enhancement in a millimeter-wave multi-Gbps WPAN system. *IEEE J Sel Areas Commun* 2009, 27(8):1379-1389.
- Sun F, Zhang T: Low-power state-parallel relaxed adaptive Viterbi decoder. *IEEE Trans Circ Syst I Regular Papers* 2007, 54(5):1060-1068.
- Jin J, Tsui C: Low-power limited-search parallel state Viterbi decoder implementation based on scarce state transition. *IEEE Trans Very Large Scale Integr (VLSI) Syst* 2007, 15(10):1172-1176.
- He J, Liu H, Wang Z, Huang X, Zhang K: High-speed low-power Viterbi decoder design for TCM decoders. *IEEE Trans Very Large Scale Integr (VLSI) Syst* 2012, 20(4):755-759.
- Simmons S: Breadth-first trellis decoding with adaptive effort. *IEEE Trans Commun* 1990, 38(1):3-12. doi:10.1109/26.46522
- Shieh S, Chen P, Han Y: Reduction of computational complexity and sufficient stack size of the MLSDA by early elimination. In *IEEE International Symposium on Information Theory (ISIT)*. Nice, France; 2007:1671-1675.
- Han Y, Chen P, Wu H: A maximum-likelihood soft-decision sequential decoding algorithm for binary convolutional codes. *IEEE Trans Commun* 2002, 50(2):173-178. doi:10.1109/26.983310

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.