Delay tolerant (delto) distributed TAST codes for cooperative wireless networks

In a distributed cooperative communication system, the relay nodes are generally at different distances from the receiving node, so the performance of distributed space-time codes at the receiver may be badly degraded if timing synchronization is not assured. In this article, extending the work of Damen et al., we introduce the design of distributed threaded algebraic space-time (TAST) codes that are resistant to timing offsets. We present new techniques for constructing delay tolerant TAST codes for distributed cooperative networks which, like the codes they extend, are delay tolerant for any delay profile and achieve full diversity for an arbitrary number of relays, transmit/receive antennas, and input alphabet size. Our proposed minimum-length codes perform better than the existing codes while retaining full rate and full diversity, with or without the use of guard bands. Simulation results confirm these performance gains.


Introduction
Wireless communication systems with multiple antennas have recently attracted considerable interest [1][2][3]. The performance of a wireless system is often limited by fading and may be significantly improved by exploiting some form of diversity, for example spatial diversity. On the other hand, equipping pocket-size mobile handsets with additional radio-frequency (RF) hardware is not feasible, which has led many researchers to propose alternative solutions.
Sendonaris et al. [4] proposed the idea of cooperative diversity, which enables the source/destination to use nearby nodes as virtual antennas. In other words, the nearby relay nodes may act as auxiliary receivers/transmitters for the original transmitter/receiver. However, as indicated by Li and Xia [5,6], the main problem with cooperative terminals is the asynchronous nature of the transmission, which causes traditionally designed space-time codes to lose their diversity and coding gain when used over distributed cooperative networks.
In an unsynchronized cooperative network, the data from different relays reach the destination after different delays. In [7], it was shown that all the well-known codes lose their diversity at the receiver. Mei et al. [8] proposed inserting guard bands between blocks of symbols. The scheme in [8] can achieve full diversity, but its main drawbacks are the limit on the number of relays (only two are allowed) and the large rate loss caused by the insertion of guard bands.
The proposed delay tolerant codes for asynchronous cooperative network of Li and Xia [5,6] were further generalized and refined in [9] by including full-diversity delay tolerant space-time trellis codes of minimum constrained length. In [7], delay tolerant distributed space-time block codes based on threaded algebraic space-time (TAST) codes [10] are designed for unsynchronized cooperative network. The distributed TAST codes of [7] preserve the rank of the space-time codewords under arbitrary delays at the receiver. In a similar way, the authors in [7] further extend their study in [11] by introducing delay tolerant codes with minimum lengths. A lattice-based maximum likelihood detector is used for decoding, which is computationally more complex than the decoupled decoding of orthogonal space-time block codes.
Following the framework of [7,11], in this article we introduce new techniques for designing delay tolerant TAST codes. The proposed codes achieve full diversity and are flexible with respect to signalling constellation, transmission rate, and number of transmit/receive antennas. They outperform the existing codes, particularly at high SNR. For brevity, we hereafter use delto code as a short form of delay tolerant code.
The rest of the article is organized as follows. The background and system model are given in Section 2. Construction techniques for delay tolerant space-time codes are discussed in Section 3. The packing of multiple threads is described in Section 4. Section 5 elaborates the construction methods for delto codes with minimum lengths. Some construction examples are presented in Section 6. Simulation results are given in Section 7 and the conclusion in Section 8.

System model
In a cooperative communication system, the communication between source and destination is modelled in two phases.
In phase-I, the source sends information to the destination and at the same time this information is also received by the relays.
In phase-II, the relays help the source by forwarding or retransmitting the received information to the destination.
The relays may use different protocols, initially proposed in [12], for processing the signal received from the source and retransmitting it to the destination. In this article, we consider the decode-and-forward processing strategy at the relays.
Since the relay nodes use common time slots and frequency bandwidth for retransmitting their signals, their transmissions may overlap in both time and frequency, i.e. each node transmits a distinct coded bit stream, and the superposition of these streams forms a space-time code. In what follows, the design and performance analysis of such distributed space-time block codes (STBCs) are our main focus.
We assume the conventional MIMO system model with $N_t$ transmit antennas, corresponding to the $N$ relays, and $N_r$ receive antennas at the destination. At time instant $t$, the received signal is expressed in vector notation as

$$\bar{r}_t = H_t \bar{s}_t + \bar{n}_t, \tag{1}$$

where $\bar{r}_t \in \mathbb{C}^{N_r \times 1}$ is the received vector at time $t$, $\bar{s}_t \in \mathbb{C}^{N_t \times 1}$ is the modulated signal vector transmitted during the $t$th symbol interval, $H_t \in \mathbb{C}^{N_r \times N_t}$ is the channel matrix and $\bar{n}_t \in \mathbb{C}^{N_r \times 1}$ denotes additive white Gaussian noise. The $N_t \times T$ modulated space-time codeword matrix $s$ is transmitted over $T$ symbol intervals by taking $\bar{s}_t$ to be the $t$th column of $s$. The channel is assumed to be quasi-static, i.e. the channel transfer matrix $H_t$ is constant over a codeword interval but is generated randomly and independently from codeword to codeword. We further assume that no error occurs between the source and the relays. The nature of the processing strategy at the relays greatly impacts the code design and the decoding complexity, and it is beyond the scope of this article to discuss this issue in detail. In the simplest case, the timing offset between different relays at reception may be avoided by the use of guard bands or some form of timing advance protocol [13].
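As an illustration of the quasi-static model described above, the following Python sketch computes the received block column by column, assuming the usual form in which each received vector is the channel matrix times the transmitted column plus noise. The function names, the unit-variance complex Gaussian (Rayleigh) channel draw and the example codeword are assumptions made for this sketch only.

```python
import random

def random_channel(n_r, n_t, rng):
    """Draw i.i.d. complex Gaussian gains with unit variance (assumed fading law)."""
    s = 0.5 ** 0.5
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_t)]
            for _ in range(n_r)]

def received_block(H, S, noise_std=0.0, rng=None):
    """Return the N_r x T block whose t-th column is H * S[:, t] + noise.

    H is held fixed for all T columns, which is the quasi-static assumption.
    """
    rng = rng or random.Random(0)
    n_r, n_t, T = len(H), len(S), len(S[0])
    R = []
    for i in range(n_r):
        row = []
        for t in range(T):
            signal = sum(H[i][k] * S[k][t] for k in range(n_t))
            noise = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
            row.append(signal + noise)
        R.append(row)
    return R

rng = random.Random(1)
H = random_channel(2, 2, rng)   # N_r = 2 receive antennas, N_t = 2 relays
S = [[1, -1],                   # row 1: symbols sent by relay 1 over T = 2 intervals
     [1, 1]]                    # row 2: symbols sent by relay 2
R = received_block(H, S)        # noiseless case (noise_std = 0)
```

A new channel matrix would be drawn for each codeword, while all columns of one codeword see the same matrix.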
If the maximum possible timing offset among the transmissions of the different relays is L' symbol intervals and each relay inserts a pad of L' symbols between its coded transmissions, then the different composite space-time codewords never overlap in time, and each space-time codeword can be decoded individually. For short space-time block codes, the significant rate loss induced by the fill symbols or guard intervals can be mitigated by allowing the relays to transmit their coded streams one after another [13], but for long block sizes the code rate loss remains an open problem. In what follows, we present the construction of space-time codes that are robust to arbitrary delays without the insertion of guard bands.

Delay tolerance of space-time codes
For the sake of completeness, we review some notation from [7]. Let $\mathcal{S}$ be an STBC with codewords of size $N_t \times T$, and let $s_1$ and $s_2$ be two distinct codewords of $\mathcal{S}$. The diversity order of $\mathcal{S}$ is the minimum rank of the difference matrix $s_1 - s_2$ over all pairs of distinct codewords in $\mathcal{S}$. This condition is referred to as the rank criterion [14].
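The rank criterion can be checked numerically for a small codebook. The sketch below is only an illustration: it computes the rank by Gaussian elimination with a numerical tolerance and uses a BPSK Alamouti codebook as a stand-in example, which is an assumption of this sketch and not one of the codes constructed in this article.

```python
from itertools import combinations, product

def rank(M, tol=1e-9):
    """Rank of a complex matrix by Gaussian elimination with partial pivoting."""
    A = [list(map(complex, row)) for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        p = max(range(r, rows), key=lambda i: abs(A[i][c]))
        if abs(A[p][c]) < tol:
            continue
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
    return r

def diversity_order(codebook):
    """Minimum rank of s1 - s2 over all pairs of distinct codewords."""
    return min(
        rank([[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)])
        for a, b in combinations(codebook, 2)
    )

def alamouti(s1, s2):
    """2 x 2 Alamouti codeword: rows are antennas, columns are time slots."""
    return [[s1, -s2.conjugate()], [s2, s1.conjugate()]]

codebook = [alamouti(complex(a), complex(b)) for a, b in product((1, -1), repeat=2)]
print(diversity_order(codebook))  # 2
```

The exhaustive pairwise search is only practical for small codebooks; for linear codes it is enough to examine the non-zero difference matrices once.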
For our purposes, the transmitted symbols are generated from an underlying finite constellation using algebraic number field constructions. Let $\mathcal{A}$ denote the two-dimensional constellation chosen from $\mathbb{Z}[i]$ or $\mathbb{Z}[j]$, and let $F = \mathbb{Q}(i)$ or $\mathbb{Q}(j)$ denote the field of complex rational numbers and complex Eisenstein rational numbers, respectively. Let $F(\theta)$ be an extension field of degree $[F(\theta):F]$. Then, the fundamental alphabet for our constructions is given by

$$\Omega = \left\{ \sum_{k=0}^{P-1} a_k \theta^k : a_k \in \mathcal{A} \right\}, \tag{2}$$

where the integer $P \le [F(\theta):F]$.
Each transmitted symbol $s$ is from $\Omega$ or, more generally, is from its image $f(\Omega)$ under some specified one-to-one mapping $f: \Omega \to \mathbb{C}$.
A space-time code $\mathcal{S}$ is said to be τ-delto for the quasi-synchronous cooperative diversity scenario if the difference between every pair of distinct codewords in $\mathcal{S}$ retains full rank even though its rows are transmitted with arbitrary delays of duration at most τ symbols [15].
If all relays start transmitting their rows of a distributed STBC simultaneously, the different rows reach the destination with different delays $\delta_i \le \delta_{\max}$, $i \in \{1, 2, \ldots, N\}$. If the relays continuously transmit the rows of successive distributed STBC blocks, the data of two consecutively transmitted blocks can overlap because of the timing errors. Therefore, all relays start transmitting their assigned rows of the codeword simultaneously and, since the values of the relative delays are unknown, each relay waits for a time interval $\delta_{\max}$ after the transmission of the codeword is finished. Because of the delays at reception, an $N_t \times T$ transmitted STBC codeword $s$ is transformed into an $N_t \times (T + \delta_{\max})$ codeword at the receiver as follows:

$$s_{\Delta} = \begin{bmatrix} 0^{\delta_1} & s_1 & 0^{\delta_{\max} - \delta_1} \\ 0^{\delta_2} & s_2 & 0^{\delta_{\max} - \delta_2} \\ \vdots & \vdots & \vdots \\ 0^{\delta_{N_t}} & s_{N_t} & 0^{\delta_{\max} - \delta_{N_t}} \end{bmatrix}, \tag{3}$$

where $s_i$ denotes the $i$th row of $s$, $0^{k}$ denotes a run of $k$ zeros, a 0 represents no transmission and $\delta_{\max}$ denotes the maximum of the relative delays. Let $W$ symbols be encoded into the original STBC $\mathcal{S} \subset \mathbb{C}^{N_t \times T}$; then it can be seen from (3) that transmitting $s$ takes $T + \delta_{\max}$ time intervals. Hence, the effective data rate in the asynchronous cooperative network is $W/(T + \delta_{\max})$, which is less than the data rate $W/T$ of the synchronous system for which the STBC is traditionally designed. Now, a space-time code $\mathcal{S}$ is called τ-delto if, for all delay profiles $\Delta$ with $\delta_{\max}(\Delta) \le \tau$, the effective space-time code $\mathcal{S}_{\Delta}$ achieves the same spatial diversity as $\mathcal{S}$. A space-time code is fully delto if it is delto for any positive integer τ. For more detailed examples, the reader is referred to [7]. Furthermore, it has also been proved in [7] that the space-time codes constructed from cyclic division algebras [16], including the well-known Golden code, are not delay tolerant.
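The effect of a delay profile, the resulting rate loss and the τ-delto condition can be explored numerically. The following sketch is an illustration under stated assumptions: the helper names are hypothetical, the sweep enumerates every integer delay profile up to τ, and the two test matrices (a two-row thread used with a repetition code, and one difference matrix of a BPSK Alamouti code) are examples chosen for this sketch. Such a sweep is a numerical check over the enumerated profiles, not a proof.

```python
from itertools import product

def rank(M, tol=1e-9):
    """Rank of a complex matrix by Gaussian elimination with partial pivoting."""
    A = [list(map(complex, row)) for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        p = max(range(r, rows), key=lambda i: abs(A[i][c]))
        if abs(A[p][c]) < tol:
            continue
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
    return r

def apply_delays(S, delays):
    """Shift row i right by delays[i] and pad with zeros to T + max(delays) columns."""
    d_max = max(delays)
    return [[0] * d + list(row) + [0] * (d_max - d) for row, d in zip(S, delays)]

def is_tau_delto(difference_matrices, tau):
    """True if every difference matrix keeps full row rank for all delays <= tau."""
    n_t = len(difference_matrices[0])
    for delays in product(range(tau + 1), repeat=n_t):
        for D in difference_matrices:
            if rank(apply_delays(D, delays)) < n_t:
                return False
    return True

def effective_rate(W, T, d_max):
    """Symbols per channel use once the waiting interval d_max is included."""
    return W / (T + d_max)

thread = [[1, 0, 0], [0, 1, 1]]        # two-row thread; repetition-code differences are multiples of it
alamouti_diff = [[0, -2], [2, 0]]      # Alamouti difference when only the second BPSK symbol differs

print(apply_delays([[1, 2], [3, 4]], (1, 0)))  # [[0, 1, 2], [3, 4, 0]]
print(is_tau_delto([thread], 3))               # True
print(is_tau_delto([alamouti_diff], 1))        # False
```

With the delay profile (0, 1) the two rows of the Alamouti difference matrix become proportional, which is the loss of diversity described above; the same helper can be applied to any candidate thread and delay bound.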

Construction of delto codes
In this section, we develop two techniques for constructing thread-based STBCs that are delay tolerant. The constructed codes achieve maximal spatial diversity and are fully delto. They are also flexible with respect to signalling constellation, transmission rate, number of transmit/receive antennas and decoder complexity. In most cases, we use the fundamental signalling alphabet $\Omega$ derived from the constellation $\mathcal{A}$ in accordance with (2).
A layer is a mapping strategy that assigns a particular transmit antenna to be used at each individual time interval of a codeword [17]. A layer is called a thread when it spans the spatial and temporal dimensions in such a way that at each time instant $1 \le t \le T$ at most one antenna is used [18]. With a minor modification, we relax the condition on antenna usage at each time interval and allow signalling intervals to be empty, i.e. no symbol is transmitted from any antenna during certain signalling intervals.
We use a technique very similar to the construction of a Huffman (HM) binary tree. We develop an HM binary tree of $N_t + 1$ nodes and assume that nodes 2 to $N_t + 1$ (node 1 is discarded) represent rows 1 to $N_t$ of the codeword matrices, respectively. We further assume that the weights, or more precisely the Hamming weights, of the nodes $m_i$, $i = 1, 2, \ldots, N_t + 1$, are such that With these assumptions, the HM tree is constructed in a straightforward way, starting from the bottom node and moving up to the top node.
As an example, consider the HM thread $\Lambda$ defined for $N_t$ transmit antennas, where For $N_t = 3$ and 4, the HM threads are obtained from the two HM binary trees drawn in Figures 1 and 2, respectively. The numerical values obtained are placed in matrix form in a row end-to-start manner by non-zero elements: after discarding the first node of the tree, each row starts in the column immediately after the one in which the previous row has its last non-zero element. The process is repeated up to the row of node $m_{N_t+1}$, and the empty positions are filled with zeros. For $N_t = 3$ and $N_t = 4$, we get

$$\begin{bmatrix} 1&0&0&0&0&0 \\ 0&1&1&0&0&0 \\ 0&0&0&1&1&1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1&0&0&0&0&0&0&0&0&0 \\ 0&1&1&0&0&0&0&0&0&0 \\ 0&0&0&1&1&1&0&0&0&0 \\ 0&0&0&0&0&0&1&1&1&1 \end{bmatrix}.$$

As an alternative, such codeword matrices can be obtained from the following expression, where the thread $\Lambda_{HM}$ has $(i, j)$ entry defined as where the Kronecker delta function is defined as
Lemma 1: Let $\mathcal{S} = \mathcal{S}_{HM}$ denote the space-time code in which the repetition code over the alphabet $\Omega$ is used over the thread $\Lambda_{HM}$. Then $\mathcal{S}$ achieves full spatial diversity and is fully delto.
Proof: Since $\mathcal{S}$ consists of multiples of $\Lambda_{HM}$, all the differences between codewords in $\mathcal{S}$ are also multiples of $\Lambda_{HM}$; hence it suffices to show that $\Lambda_{HM}$ is of full rank for all delay profiles. Regardless of the delay profile $\Delta$, the $i$th row of $\Lambda_{HM}$ always contains as many ones as its index (i.e. $m_i = i$), whereas the total number of non-zero elements in all lower numbered rows is $i(i-1)/2$. Hence, for each $i$, there is a column in $\Lambda_{HM}$ for which the entry in the $i$th row is 1 and all the elements above it are zeros. The set of these columns for $i = 1$ to $N_t$ forms an $N_t \times N_t$ submatrix that is lower triangular with ones on the diagonal. Since this submatrix has determinant 1, $\Lambda_{HM}$ is of full rank.
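The row end-to-start placement can be sketched in a few lines of Python. The sketch assumes, as stated in the proof, that row $i$ carries $i$ consecutive ones and that each row starts in the column following the last non-zero element of the previous row; the function name `hm_thread` is hypothetical.

```python
def hm_thread(n_t):
    """Build the thread in which row i (1-based) holds i consecutive ones.

    Each row starts in the column right after the last one of the previous
    row, so the code length is n_t * (n_t + 1) / 2.
    """
    T = n_t * (n_t + 1) // 2
    thread, start = [], 0
    for i in range(1, n_t + 1):
        row = [0] * T
        row[start:start + i] = [1] * i
        thread.append(row)
        start += i
    return thread

for row in hm_thread(3):
    print(row)
# [1, 0, 0, 0, 0, 0]
# [0, 1, 1, 0, 0, 0]
# [0, 0, 0, 1, 1, 1]
```

Whether a given thread keeps full rank under a particular delay profile can be examined by shifting its rows accordingly and recomputing the rank.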
It can be seen from the code structure that any permutation of the rows or columns of $\Lambda_{HM}$ produces an equivalent thread that preserves the properties of its parent code; deleting rows of $\Lambda_{HM}$ does not affect the delto property either. We now generalize the obtained results along the lines of DAST codes [19]. For $t = 1, 2, \ldots, T^{HM}_{N_t}$, let $f_t: \Omega \to \mathbb{C}$ be a one-to-one function, and derive the corresponding thread function matrix $F_{\Lambda}(x)$ for the thread $\Lambda_{HM}$ by replacing the non-zero element in column $t$ of $\Lambda_{HM}$ by $f_t(x)$. For example, for $N_t = 4$, we have for some $a \in \Omega$. Then, $\mathcal{S}$ achieves full diversity and is delto.
Proof: If $a$ and $b$ are two distinct elements of $\Omega$, then the difference matrix $F_{\Lambda}(a) - F_{\Lambda}(b)$ has the same form as $\Lambda_{HM}$, with the 1 in column $t$, $t = 1, 2, \ldots, T^{HM}_{N_t}$, replaced by the difference $f_t(a) - f_t(b)$. Let an arbitrary delay profile $\Delta$ be applied to the difference matrix $F_{\Lambda}(a) - F_{\Lambda}(b)$ to produce the matrix $F_{\Delta}$. Then, as proved before, the columns $t_1, t_2, \ldots, t_{N_t}$ of $F_{\Delta}$ form a lower triangular matrix with diagonal entries equal to $f_{t_i}(a) - f_{t_i}(b)$ for $i = 1, 2, \ldots, N_t$, and this matrix has determinant

$$D = \prod_{i=1}^{N_t} \left( f_{t_i}(a) - f_{t_i}(b) \right).$$

Since all the functions $f_{t_i}$ are one-to-one, the determinant $D$ is zero only if $a = b$, in which case $F_{\Lambda}(a) = F_{\Lambda}(b)$. Therefore, the matrix $F_{\Delta}$ is of full rank.

Another method
The HM method for constructing codeword matrices may be inefficient for large values of $N_t$, since the code structure shows a large disparity in the usage of the antennas. Here, we develop another method of thread matrix construction in which each antenna is used the same number of times. We is the number of zeros between two non-zero elements in row $i$.
The first non-zero element in row i lies in columns j according to Table 1.
The UU design is fully delto and offers full diversity. Equivalently, we can construct such codeword matrices for UU threads as follows, where the $(i, j)$th entry is defined as where delmod is the ordinary modulo-$N_t$ function, which is not taken into account when the Kronecker delta function $\delta_j$ is active, and the Kronecker delta function is defined as where $P$ is a vector of the first $T$ safe prime numbers.
Lemma 3: Let $\mathcal{S} = \mathcal{S}_{UU}$ be the $N_t \times T^{UU}_{N_t}$ space-time code in which the repetition code is used over the thread $\Lambda_{UU}$. Then, $\mathcal{S}$ achieves full spatial diversity and is fully delto.
Proof: From the code structure, it can be shown that, for any delay profile $\Delta$, the $i$th row of the thread matrix cannot be expressed as a linear combination of rows 1 through $N_t - 1$.
The two non-zero elements in the $i$th row are separated by $u$ zero elements, where $u$ is given by (14) or, more precisely Furthermore, the leading non-zero elements in rows 1 and $N_t$ always lie in columns 2 and 1, respectively, whereas the second non-zero elements of the same rows lie in columns 3 and $2N_t - 1$, respectively. Likewise, for the rest of the rows, the second non-zero element lies in position $j_i + \varpi$, where $\varpi$ is the position of the leading non-zero element in that row.
We know that, in a linear combination of even-weight rows, if the leading non-zero element in row $N_t$ lies in column $j_i$, then there must be an odd number of rows having a non-zero element in column $j_i$ [7]. Therefore, the proposed code is fully delto.

Multiple thread delto codes
In the previous section, we discussed different techniques for constructing single-thread delto codes. To improve the rate of these codes, we combine multiple delto threads in a single codeword matrix. There is more than one way of packing such threads; here, we discuss two methods.

Cyclic shift
This method has a very simple structure: each column of the thread matrix $\Lambda_k$ ($k = 1, 2, \ldots, N_t$) is cyclically shifted by one element to obtain the thread matrix $\Lambda_{k+1}$. The process is repeated up to the last thread $\Lambda_M$.
Let $\Lambda_k$ be thread $k$ for $N_t$ transmit antennas and $T$ vector channel uses. Then, for $N_t = 4$ and For ease of understanding, we replace the non-zero elements of $\Lambda_1$ to $\Lambda_4$ in their respective locations by the letters a, b, c and d, respectively, so that the above codeword matrix is reproduced as After making a shift by one element in each column of the above codeword matrix, we get After making a further shift by one element in each column, we get Equivalently, such a threaded codeword matrix can be constructed from the following expression where Xmod is the ordinary modulo-$N_t$ function with the small difference that it replaces the output zero by $N_t$, and the Kronecker delta function is defined as From this packing of threads, we get an $N_t \times T$ space-time code $\mathcal{S}$ that transmits $N_t$ repetition codes simultaneously, one per thread, by selecting the codewords [7].
(Note that the letters a, b, c, d, ... used in the above codeword matrices stand for $a_1, a_2, \ldots, a_{N_t}$.)
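For the HM thread structure with $N_t = 4$, packing the threads HM$_1$ to HM$_4$ gives

$$\begin{bmatrix} a&d&d&c&c&c&b&b&b&b \\ b&a&a&d&d&d&c&c&c&c \\ c&b&b&a&a&a&d&d&d&d \\ d&c&c&b&b&b&a&a&a&a \end{bmatrix}.$$

Similarly, for $N_t = 3$, we have

$$\begin{bmatrix} a&c&c&b&b&b \\ b&a&a&c&c&c \\ c&b&b&a&a&a \end{bmatrix}$$

and when $N_t = 2$, we have

$$\begin{bmatrix} a&b&b \\ b&a&a \end{bmatrix}.$$

UU threads can be packed in the same way as the HM threads above. For $N_t = 4$ and $T = 2N_t$, we have

$$\begin{bmatrix} 0&1&1&0&0&0&0&0 \\ 0&0&0&1&0&1&0&0 \\ 0&0&0&0&1&0&0&1 \\ 1&0&0&0&0&0&1&0 \end{bmatrix}$$

and packing all four threads into a single codeword matrix, we get

$$\begin{bmatrix} b&a&a&d&c&d&b&c \\ c&b&b&a&d&a&c&d \\ d&c&c&b&a&b&d&a \\ a&d&d&c&b&c&a&b \end{bmatrix}.$$

For $N_t = 3$, we have

$$\begin{bmatrix} b&a&a&c&b&c \\ c&b&b&a&c&a \\ a&c&c&b&a&b \end{bmatrix}$$

and for $N_t = 2$, we get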
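The cyclic-shift packing of HM threads can be reproduced with a short Python sketch. It assumes that the base thread has $i$ consecutive non-zero entries in row $i$ and that each subsequent thread is obtained by moving every column down by one row with wrap-around; the helper names are hypothetical and the letters simply label the threads.

```python
def hm_thread(n_t):
    """Row i (1-based) holds i consecutive ones, rows placed end to start."""
    T = n_t * (n_t + 1) // 2
    thread, start = [], 0
    for i in range(1, n_t + 1):
        row = [0] * T
        row[start:start + i] = [1] * i
        thread.append(row)
        start += i
    return thread

def cyclic_shift(thread):
    """Shift every column down by one element; the last row wraps to the top."""
    return [thread[-1]] + thread[:-1]

def pack_threads(base, labels):
    """Overlay one cyclically shifted copy of the base thread per label."""
    n_t, T = len(base), len(base[0])
    packed = [["0"] * T for _ in range(n_t)]
    thread = base
    for label in labels:
        for i in range(n_t):
            for j in range(T):
                if thread[i][j]:
                    packed[i][j] = label
        thread = cyclic_shift(thread)
    return [" ".join(row) for row in packed]

for row in pack_threads(hm_thread(4), "abcd"):
    print(row)
# a d d c c c b b b b
# b a a d d d c c c c
# c b b a a a d d d d
# d c c b b b a a a a
```

Because each column of the base thread contains exactly one non-zero element, the $N_t$ shifted copies never collide and the packed matrix has no empty positions.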

Algebraically packed multiple-threads
The codes constructed in Section 3 are individually delto and fully diverse, but when they are packed together in a single codeword matrix as above, it is not guaranteed that the result is delto and fully diverse, because the threads may interact in a detrimental way [7]. The work of El Gamal and Damen [10] can be used to ensure that the packed codewords are delto and fully diverse. Let $\Lambda$ be the HM thread for $N_t$ transmit antennas and $T$ channel uses. Let $f_{i,j}: \Omega \to \mathbb{C}$ be a one-to-one function for each choice of $i = 1, 2, \ldots, N_t$ and $j = 1, 2, \ldots, T$. For each thread $\Lambda_k$ derived from $\Lambda$ in accordance with (6), form the threaded matrix function $F_k(x)$ whose $(i, j)$th entry is $f_{i,j}(x)$ where $\Lambda_k$ has a non-zero entry and zero elsewhere. Consider the $N_t \times T$ space-time code $\mathcal{S}$ with $L \le N_t$ active threads consisting of all modulated codewords of the form

$$s = \sum_{k=1}^{L} \varphi^{k-1} F_k(a_k)$$

for arbitrary $a_1, a_2, \ldots, a_L \in \Omega$. Then, $\mathcal{S}$ achieves full spatial diversity and is fully delto.
Proof: Assume $a$ and $b$ are two distinct codewords that are subject to the delay profile $\Delta$.
Then, $s_{\Delta}$ is given as Let $m$ denote the largest index for which $a_m \ne b_m$ but $a_i = b_i$ for $i > m$. Then, The non-zero elements in the main diagonal form a submatrix, and are given by This submatrix has determinant

$$D(\varphi) = \varphi^{N_t(m-1)} \prod_{i=1}^{N_t} \left( f_{i,j_i}(a_m) - f_{i,j_i}(b_m) \right) + G(\varphi), \tag{35}$$

where $G(\varphi)$ is a polynomial in $\varphi$ over $F(\theta)$ of degree less than $N_t(m-1)$. Since the functions $f_{i,j_i}$ are all one-to-one and $a_m \ne b_m$, Equation (35) is a non-trivial polynomial in $\varphi$ of degree $N_t(m-1)$ over $F(\theta)$.
By design choice, $\varphi$ is not the root of any non-trivial polynomial of degree $N_t(m-1)$ over $F(\theta)$. Hence, $D(\varphi) \ne 0$, so the matrix is of full rank. We conclude that $\mathcal{S}$ achieves full spatial diversity and is fully delto.

Code rate
In the multiple-thread code construction, the rate of the space-time code $\mathcal{S}$ is given in [7] as Thus, we can make $\mathcal{S}$ full rate by a proper selection of the parameters $L$ and $P$ for a given set of code parameters $N_t$, $N_r$ and $T$. In other words, we make the modulation parameters flexible to match the specified spatio-temporal structure. This selection of modulation parameters can be done in different ways; a natural choice is to take $L = \min(N_t, N_r)$ and $P = T$.
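As a quick numerical aid, the sketch below assumes that a code with $L$ active threads, each carrying $P$ constellation symbols over $T$ channel uses, has a rate of $L P / T$ symbols per channel use, multiplied by $\log_2 |\mathcal{A}|$ to obtain bits per channel use. This expression is an assumption of the sketch; it is consistent with the rates of 2 and 2.5 QAM symbols per channel use quoted for the examples in Section 6. The function names are hypothetical.

```python
from fractions import Fraction
from math import log2

def symbol_rate(L, P, T):
    """Constellation symbols per channel use for L threads of P symbols over T uses."""
    return Fraction(L * P, T)

def bit_rate(L, P, T, constellation_size):
    """Bits per channel use, assuming log2(constellation_size) bits per symbol."""
    return float(symbol_rate(L, P, T)) * log2(constellation_size)

print(symbol_rate(2, 3, 3))   # 2   (N_t = N_r = L = 2, P = T = 3)
print(symbol_rate(2, 5, 4))   # 5/2 (N_t = 3, L = 2, P = 5, T = 4)
print(bit_rate(2, 3, 3, 4))   # 4.0 with a 4-point constellation
```

With the natural choice $L = \min(N_t, N_r)$ and $P = T$, the symbol rate reduces to $\min(N_t, N_r)$.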

Packing of threads (when L <N t )
In the previous section, we developed a technique for packing single-thread codeword matrices into codeword matrices with $L = N_t$ threads. Selecting fewer than $N_t$ threads may increase the spectral efficiency of the code without increasing the constellation size, by reducing the code length, but for that we have to relax the condition on antenna usage per time unit within each thread. In this section, we pack the threads in such a way that more than one antenna may be used per time unit within each thread.

HM thread
We denote the smallest code length for the transmission of $L$ threads from $N_t$ transmit antennas by $T^{SHM}_{N_t,L}$. From Section 3, we know that for HM threads the total number of channel uses is $n_1 + n_2 + \cdots + n_{N_t} = N_t(N_t + 1)/2$, where $n_i = i$. Now let $h_1, h_2, \ldots, h_L$ denote the permutation assigning the values $n_1 + n_2 + \cdots + n_{N_t}$ to the transmit antennas $1, 2, \ldots, N_t$. Then, according to [7], we may write From (36), for $L = 1$, we have and for $L > 1$, we have For $N_t = 4$ and $L = 1$ and 2, we have for example a a b b b  a a a b b  a a a

UU thread
We denote the smallest code length for the transmission of $L$ threads from $N_t$ transmit antennas by $T^{SUU}_{N_t,L}$. From Section 3, we saw that for the UU thread the maximum expansion between two channel uses is $2N_t - 1$. So, we may deduce that  c c a b d b  c b d d b a c a  b c a a c d b d  a d For $N_t = 3$ and $L = 3$ we have for example Such codes work efficiently for larger values of $L$. For smaller values of $L$, we can delete the zero columns in (17) and (18); even after the deletion of these columns, the resulting codes retain their full-diversity and delto properties.

Minimum length delto codes
The delto codes discussed in the previous sections have code length $T > N_t$; therefore, their performance may degrade for large $N_t$. In this section, we extend our work and propose a technique for constructing delto codes with minimum delay length $T = N_t$. Our construction method is based on a tight packing of the HM threads developed in Section 3.
In fact, an $N_t \times T$ MIMO codeword matrix is a strand of algebraic SISO codes separated by Diophantine numbers $\varphi$, and the difference between distinct $N_t \times N_t$ submatrices is the diversity order of the codeword matrix [10, Theorem 4].
In [11], the authors show that a minimum-length delto STBC $s$ can be constructed by multiplying the designed thread codeword matrix with an $N_t \times T$ matrix $C$ whose entries are a rearrangement of $c \in \mathcal{C}$ ($\mathcal{C}$ being a full-diversity one-dimensional block code of length $N_t T$).
The main problem in designing such codes is the design of the thread codeword matrix. In [11], the authors proposed two such matrices, for two and three relays. In what follows, we discuss a new technique for constructing thread codeword matrices, and we claim that the proposed codes perform better than those of [11], particularly at high SNRs.

Construction of thread codeword matrix
Recalling the generalized HM thread construction of Section 3, we develop a simple construction method for $T^{ML}_{N_t} = N_t$ as follows:
▪ For row $i$ ($i = 2, \ldots, N_t$), define a complex number $\varphi$ whose power of 2 is the simple addition of the non-zero elements of row $i$ in the HM single-thread codeword matrix.
▪ In the first row of the HM single-thread codeword matrix, the $i$-tuples of zeros above the non-zero elements in the $i$th row ($i = 2, \ldots, N_t$) are replaced by $\varphi^{i-1}$.
▪ Fill the empty positions with 1.
For example, for $N_t = 3$, the HM$_3$ matrix from (3) can be represented as For ease of understanding, let $a_{i,j}$ represent the location of $\varphi$ in (46).
We show that, by an appropriate selection of the parameter $\varphi$ and the one-dimensional code $\mathcal{C}$, the resulting space-time code $\mathcal{S}$ is delto for every delay profile. Now let $\Xi$ denote those $N_t$-tuples of $a_{i,j}$ in $\Lambda_{ML}$ that are taken from different rows, and let $\varphi^{\alpha_{\max}}$ be the highest power used, where α max = 2 N t .
Lemma 4: Let $\mathcal{S} = \mathcal{S}_{ML}$ denote the space-time code in which the repetition code (with codewords of length $N_t^2$) over the alphabet $\Omega$ is used as the one-dimensional SISO code in conjunction with the thread $\Lambda_{ML}$. Then $\mathcal{S}$ achieves full spatial diversity and is fully delto if the following conditions are satisfied.
▪ $\varphi$ is chosen as an algebraic or transcendental number such that the numbers $1, \varphi, \ldots, \varphi^{\alpha_{\max}}$ are algebraically independent over the field $F(\theta)$ that contains $\Omega$ [20].
▪ The parameters $a_{i,j}$ are chosen such that the sum of the entries of every $N_t$-tuple in $\Xi$ is unique.
Proof: Since the one-dimensional code $\mathcal{C}$ is a repetition code, it is sufficient to show that $\Lambda_{ML}$ is of full rank for every delay profile $\Delta = (\delta_1, \delta_2, \ldots, \delta_{N_t})$. To verify the diversity order of the code, we need to find the largest square submatrix of $\Lambda_{ML}$ that is of full rank.
▪ The first column is chosen such that it contains a non-zero element in row $N_t$.
▪ The $j$th column is chosen such that it contains a power of $\varphi$ in the $i$th row ($i, j = 2, \ldots, N_t - 1$).
▪ As a last step, choose the $N_t$th column (for which there is only one choice).
As a result, the obtained $N_t \times N_t$ submatrix has at least one thread $L$ with all non-zero elements, containing $N_t$ elements that are powers of $\varphi$.
If the sum of the powers of $\varphi$ in the thread $L$ is $m$, then the determinant of the submatrix is given by where $g(\varphi)$ is a polynomial in $\varphi$ of degree less than or equal to $\alpha_{\max}$. Since $m$ is unique, $g(\varphi)$ does not contain any term in $\varphi^m$. Therefore, if the numbers $1, \varphi, \ldots, \varphi^{\alpha_{\max}}$ are algebraically independent over $F(\theta)$, $\det(D)$ is not zero and the code achieves full diversity for every delay profile. Owing to the structure of our codes, more than one method may be used to verify the determinant of the largest $N_t \times N_t$ submatrix; for example, we can use (35) or the same proof as used for Lemma 1.

Examples
In this section, we give some examples of delto distributed TAST codes. As for TAST codes [10], the construction of delto codes is carried out by an appropriate selection of the SISO codes and the numbers $\varphi$. Full-diversity SISO codes over fading channels can be constructed by applying full-diversity unitary transformations to input signals drawn from lattices or multidimensional constellations carved from a ring. Damen et al. [20] provided a systematic way of constructing $N_t \times N_t$ fully diverse unitary transformations over the field that contains the information symbols, as where $R = W^H \cdot \mathrm{diag}(D)$, $W$ being a discrete Fourier transform matrix built from the transmit QAM symbols vector of size $P$; we have where $\theta$ is a transcendental or an algebraic number of suitable degree to guarantee the full diversity of the rotation [20].
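A rotation of the form $R = W^H \cdot \mathrm{diag}(D)$ is unitary whenever $W$ is the normalized DFT matrix and the diagonal entries have unit modulus. The sketch below checks this numerically; it assumes, purely for illustration, that the diagonal is $(1, \theta, \ldots, \theta^{P-1})$ with $\theta$ on the unit circle, and the particular value of $\theta$ is an arbitrary choice. The check covers unitarity only and says nothing about the full-diversity property, which depends on the algebraic choice of $\theta$ in [20].

```python
import cmath

def dft_rotation(P, theta):
    """Return R = W^H * diag(1, theta, ..., theta^(P-1)) with W the unitary DFT matrix."""
    scale = P ** -0.5
    R = []
    for m in range(P):
        row = []
        for n in range(P):
            w_h = scale * cmath.exp(2j * cmath.pi * m * n / P)  # entry (m, n) of W^H
            row.append(w_h * theta ** n)
        R.append(row)
    return R

def is_unitary(R, tol=1e-9):
    """Check that R^H R equals the identity within a tolerance."""
    P = len(R)
    for a in range(P):
        for b in range(P):
            g = sum(R[k][a].conjugate() * R[k][b] for k in range(P))
            if abs(g - (1 if a == b else 0)) > tol:
                return False
    return True

theta = cmath.exp(2j * cmath.pi / 12)   # illustrative unit-modulus value (assumption)
R3 = dft_rotation(3, theta)
print(is_unitary(R3))  # True
```

A rotated symbol vector is then obtained as the matrix-vector product of `R3` with a vector of three constellation symbols.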
For $N_t = N_r = L = 2$ and $P = T = 3$, using the HM thread construction guideline from Section 3, we get a delto distributed TAST code as follows: where $X = (x_1, x_2, x_3)^T = R \cdot U$ and $Y = (y_1, y_2, y_3)^T = R \cdot V$, $U$ and $V$ are two $3 \times 1$ vectors of QAM symbols, and $R_3$ is the optimal $3 \times 3$ complex rotation according to (49). By setting $\varphi = \exp(2\pi i/15)$, this code provides a rate of 2 QAM symbols per channel use and achieves a transmit diversity of 2 regardless of the delay profile.
In (52), the number of active threads $L$ is less than the number of transmit antennas $N_t$. One can reconstruct (52) to get a delto distributed TAST code of smaller latency by reducing the number of zeros in the transmission.
Thus, for $N_t = 3$, $N_r = L = 2$, $P = 5$ and $T = 4$, one has the STBC with codeword matrix x 1 φy 2 φy 3 0 0 x 2 φy 4 x 5 In this case, by setting $\varphi = \exp(2\pi i/36)$, Equation (53) guarantees full diversity irrespective of the delay profile. This code provides a rate of 2.5 QAM symbols per channel use.
Although the above examples are independently derived from the thread construction techniques discussed in Sections 3 and 4, they resemble the codes designed by Damen et al. in [7], and the simulation results confirm that they have exactly the same performance as the codes of [7]. We nevertheless expect that the simplicity of our construction techniques may reduce hardware complexity.
In Section 5, we introduced a technique for constructing codeword matrices with $T = N_t$, where the information symbols are chosen from $\mathbb{Z}[i]$ for $N_t = 2$, with a required full-diversity rotation of size $4 \times 4$, and from $\mathbb{Z}[j]$ for $N_t = 3$, with a required full-diversity rotation of size $9 \times 9$.
For $N_t = N_r = T = 2$, we get a delto STBC codeword matrix of the form where $X = (x_1, x_2, x_3, x_4)^T = R \cdot U$, $U$ is a $4 \times 1$ vector of QAM symbols and $R_4$ is the optimal $4 \times 4$ complex rotation according to (48). By setting $\varphi = \exp(2\pi i/3)$, this code provides a rate of 2 QAM symbols per channel use and achieves a transmit diversity of 2 regardless of the delay profile among its rows.
The noiseless received signal of (54) can be written as We remark that Note that when the number of equations is less than the number of unknowns, it is necessary to use a decision feedback equalizer (DFE) to help the sphere decoder converge. For example, it is possible to proceed as follows.
At each time instant, the first $n_m$ transmitted symbols in a packet correspond to the last $n_m$ decoded symbols of the previous packet. The matrix $B$ can be partitioned in the following way and the transmitted symbol vector can be partitioned as $U = [C_{n_m \times 1}; D_{(4 - n_m) \times 1}]$, where $C_{n_m \times 1}$ contains the last $n_m$ decoded symbols of the previous packet. Thus we have and we can run the sphere decoding algorithm with the following transformation: $Z' = Z - B_1 \cdot C$ and $U' = D$.
The new system involves only the vector $D$ of smaller size, which can be computed with the classical sphere decoding algorithm.
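The decision-feedback step can be illustrated with a small sketch. It assumes that $B$ is partitioned column-wise as $B = [B_1 \; B_2]$, with $B_1$ holding the first $n_m$ columns that multiply the already decoded symbols $C$; the matrix and symbol values below are arbitrary numbers chosen for the illustration and the function names are hypothetical.

```python
def matvec(M, v):
    """Matrix-vector product for lists of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def remove_known_symbols(Z, B, C):
    """Return (Z', B2) with Z' = Z - B1 * C, where B = [B1 | B2] and len(C) = n_m.

    After this step the remaining unknowns D satisfy Z' = B2 * D (noiseless case),
    which is the reduced system handed to the sphere decoder.
    """
    n_m = len(C)
    B1 = [row[:n_m] for row in B]
    B2 = [row[n_m:] for row in B]
    Z_prime = [z - c for z, c in zip(Z, matvec(B1, C))]
    return Z_prime, B2

B = [[1, 2, 0, 1],
     [0, 1, 3, 1],
     [2, 0, 1, 1]]                # 3 equations, 4 unknowns (illustrative values)
U = [1, -1, 1, -1]                # transmitted symbols; the first one is already decoded
Z = matvec(B, U)                  # noiseless received vector
Z_prime, B2 = remove_known_symbols(Z, B, U[:1])
print(Z_prime)                    # [-3, 1, 0]
print(matvec(B2, U[1:]))          # [-3, 1, 0]
```

After the subtraction the reduced system has three equations and three unknowns, so it is no longer under-determined.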
In the case of a delay of one symbol period, we suppose that the first row is delayed by one symbol period. In this case, the new space-time code can be written as The noiseless received signal can then be written We remark that x 1 = R(1,:) 1 × 4 .U 4 × 1 , x 2 = R(2,:) 1 × 4 .   H(N, 2).φ 4 .R(4, :)1×4
For $N_t = T = 3$ and $N_r = 2$, we get a delto distributed TAST codeword of the form where $X = (x_1, x_2, \ldots, x_9)^T = R \cdot U$, $U$ is a $9 \times 1$ vector of information symbols belonging to a 4-ary constellation in $\mathbb{Z}[j]$ and $R_9$ is the optimal $9 \times 9$ complex rotation according to (48). By setting $\varphi = \exp(\pi i/12)$, this code provides a rate of 3 symbols per channel use and achieves a transmit diversity of 3 regardless of the delay profile.

Simulation
As for TAST codes, we use a sphere decoder for decoding our delto codes. Delay profiles for which the received signal provides fewer equations than unknowns are handled by minimum mean square error decision feedback equalization (MMSE-DFE) processing, as explained in the previous section and originally described in [21,22].
The simulation figures below show the bit error rate (BER) and the symbol error rate (SER) as functions of $E_b/N_0$ in decibels, which is adjusted as follows.
where $E_s$ is the average signal energy per receive antenna and $R$ is the code rate in bits per channel use (bpcu). Figure 3 shows the BER and SER of the delto distributed TAST code (50) with and without delay. Recall that for code (50) the code parameters are $N_t = N_r = L = 2$ and $P = T = 3$. In the case of delay, the first row is shifted by one symbol to the right relative to the second row.
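As a rough illustration of such an adjustment, the following sketch assumes the common convention $E_b/N_0 = E_s/(R\,N_0)$, i.e. the energy per symbol at a receive antenna is divided by the number of bits carried per channel use. This convention and the function name are assumptions of the sketch and may differ from the exact normalization used for the figures.

```python
from math import log10

def ebn0_db(esn0_db, rate_bpcu):
    """Convert Es/N0 (dB) to Eb/N0 (dB), assuming Eb/N0 = Es / (R * N0)."""
    return esn0_db - 10 * log10(rate_bpcu)

# Example: a code carrying 4 bits per channel use at Es/N0 = 10 dB.
print(round(ebn0_db(10.0, 4.0), 2))  # 3.98
```

Under this convention, codes with different rates are compared at equal energy per information bit rather than at equal energy per symbol.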
In Figure 4, we simulate the BER and SER performance of the delto distributed TAST code (52) with and without delay. The code parameters of (52) are $N_t = 3$, $N_r = L = 2$, and $P = T = 5$. For the delay profile, the first row is shifted one symbol to the right relative to the second row.
In Figure 5, we consider the BER and SER performance of the delto distributed TAST code (53) with and without delay. The code parameters of (53) are $N_t = 3$, $N_r = L = 2$, $P = 5$, and $T = 4$. For the delay profile, the first row is shifted one symbol to the right relative to the second row.
Figure 6 shows the BER and SER performance for codeword matrix (54) without delay. The results are compared with those of the well-known golden code [23] and the code proposed in [11]. The associated code parameters of (54) are $N_t = N_r = T = 2$. One can see that at high SNRs our proposed code (54) performs better.
Figure 7 shows the BER and SER performance for codeword matrix (54) with delay. In the delayed case, we shifted the first row by one symbol interval, as shown in (63). The results are compared with those of the well-known golden code [23] and the code given in [11].
Figure 8 shows the BER and SER performance for codeword matrix (68) without delay. The associated code parameters of (68) are $N_t = T = 3$ and $N_r = 2$. The results are compared with those of the code given in [11].
From this figure, one can observe that our proposed code performs better by 0.5 dB at a BER of $10^{-3}$. The proposed code performs better at high SNRs.
Figure 9 shows the BER and SER performance for codeword matrix (68) with delay. The code parameters for (68) are $N_t = T = 3$ and $N_r = 2$. In the delayed case, we shifted the first row by one symbol interval. The results are compared with those of the code given in [11].
Figure 10 shows the BER and SER performance for codeword matrix (32) with delay. The delay profile is obtained by shifting the first row in (68) one symbol to the right of the other rows. The associated parameters of code (68) are $N_t = N_r = T = 3$. The results are compared with those of the code in [11]. Our proposed delto distributed TAST code (68) performs better, particularly at high SNRs.
Figure 11 shows the BER and SER performance for codeword matrix (68) without delay. In this case, the associated parameters of code (68) are $N_t = N_r = T = 3$. The results are compared with those of the code in [11]. One can see that the error performance of our proposed delto distributed TAST code (68) is improved by about 2 dB at a BER of $10^{-5}$.

Conclusion
Within the same framework developed in [7,11], we introduced some simple and useful techniques for the construction of delay tolerant distributed STBCs having full diversity and full rate. Like their brethren codes, our proposed codes are flexible with respect to the constellation size and the number of receive/transmit antennas. We introduced two useful techniques for constructing threaded codeword matrices. Packing different threads into a single codeword matrix provides different code structures that can be used over cooperative networks with different setups of relays and antennas. In terms of error rates, the codes with $T > N_t$ developed in (50) to (53) do not outperform the codes introduced by Damen, but we hope that their simple structures may reduce the hardware complexity. The codes with $T = N_t$ developed in (54) and (68) outperform the existing codes without sacrificing decoding complexity or other desirable characteristics. For example, the error performance of the code proposed in (68) is improved by about 2 dB at a BER of $10^{-5}$ when $N_t = N_r = 3$, and by 0.5 dB at $10^{-3}$ when $N_t = 2$ and $N_r = 2$.

Endnotes
a For $N_t > 4$, the plus sign is replaced by a minus sign.
b A safe prime is a prime number of the form $2p + 1$, where $p$ is also a prime. For example, the first seven safe primes are 5, 7, 11, 23, 47, 59, and 83.
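The definition in endnote b can be checked with a short sketch. The function names `is_prime` and `safe_primes` are introduced here for illustration only, and trial division is used because the numbers involved are small.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small integers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def safe_primes(count):
    """Return the first `count` primes q such that (q - 1) / 2 is also prime."""
    found, q = [], 2
    while len(found) < count:
        if is_prime(q) and is_prime((q - 1) // 2):
            found.append(q)
        q += 1
    return found

first_seven = safe_primes(7)
```

Running it reproduces the list given in the endnote, since each of those values has the form $2p + 1$ with $p$ in 2, 3, 5, 11, 23, 29, 41.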