Diversity transform matrices for DST codes can be optimized by maximizing the mutual information *I*(*R*,*Y*) between *R* and *Y*. Maximization of *I*(*R*,*Y*), however, turns out to be a complex task [11]. Alternatively, we shall employ the cutoff rate as the measure with respect to which the diversity transform is optimized. The channel cutoff rate *R*_{0} is a lower bound on the Shannon channel capacity *C*, and its use in place of capacity often leads to tractable results. In this context, the detection is assumed to be maximum likelihood.

### 4.1 Cutoff rate analysis

Viewing an *N*-dimensional vector *Y* as a ‘super’ symbol of an infinite-length (random) code in which all super symbols are statistically independent, the cutoff rate *R*_{0}(*N*) for a single channel use is given by [12]:

$$2^{-R_0(N)} = E_H\left\{\int_R \left[\frac{1}{M^N}\sum_{Y}\sqrt{f_R(R\mid Y,H)}\right]^2 dR\right\},$$

(11)

where *E*_{H} denotes expectation over *H*.

When the path gains are known at the receiver side, the probability density function of *R* conditioned on *Y* and *H* can be written as

$$f_R(R\mid Y,H) = \frac{1}{\pi^N\det(\sigma^2 H)}\exp\left\{-(R-H\mathcal{A}Y)^{\dagger}(\sigma^2 H)^{-1}(R-H\mathcal{A}Y)\right\}.$$

(12)

The cutoff rate *R*_{0} for a single channel use of each of the *N* symbols constituting *Y* is *R*_{0}(*N*)/*N*; thus, combining Equations 11 and 12, one obtains

$$2^{-NR_0} = E_H\left\{\int_R \left[\frac{1}{M^N\pi^{N/2}(\sigma^2)^{N/2}\sqrt{\det(H)}}\sum_{Y}\exp\left\{-\tfrac{1}{2}(R-H\mathcal{A}Y)^{\dagger}(\sigma^2 H)^{-1}(R-H\mathcal{A}Y)\right\}\right]^2 dR\right\}.$$

(13)

For notational brevity, we introduce *Z* as a dummy variable (of the same nature as *Y* in Equation 13) and define $R_Y \equiv R-H\mathcal{A}Y$ and $R_Z \equiv R-H\mathcal{A}Z$; thus, we have

$$\begin{aligned}2^{-NR_0} &= E_H\left\{\int_R \frac{1}{M^{2N}\pi^{N}(\sigma^2)^{N}\det(H)}\sum_{Y}\sum_{Z}\exp\left[-\tfrac{1}{2}R_Y^{\dagger}(\sigma^2 H)^{-1}R_Y\right]\exp\left[-\tfrac{1}{2}R_Z^{\dagger}(\sigma^2 H)^{-1}R_Z\right]dR\right\}\\ &= E_H\left\{\frac{1}{M^{2N}}\sum_{Y}\sum_{Z}\prod_{j=1}^{N}I_j\right\},\end{aligned}$$

(14)

where *I*_{j} is defined as

$$I_j = \int_{r_j}\frac{1}{\pi\sigma^2\tilde{h}^j}\exp\left\{-\frac{\left|r_j-\tilde{h}^j\sum_{i=1}^{N}a_{ji}Y_i\right|^2+\left|r_j-\tilde{h}^j\sum_{i=1}^{N}a_{ji}Z_i\right|^2}{2\sigma^2\tilde{h}^j}\right\}dr_j.$$

Rearranging the algebraic terms in the last expression, we get

$$\begin{aligned}I_j &= \int_{r_j}\frac{1}{\pi\sigma^2\tilde{h}^j}\exp\left\{-\frac{\left|r_j-\frac{1}{2}\tilde{h}^j\sum_{i=1}^{N}a_{ji}(Y_i+Z_i)\right|^2}{\sigma^2\tilde{h}^j}\right\}dr_j\cdot\exp\left\{-\frac{(\tilde{h}^j)^2\left|\sum_{i=1}^{N}a_{ji}(Y_i-Z_i)\right|^2}{4\sigma^2\tilde{h}^j}\right\}\\ &= \exp\left\{-\frac{(\tilde{h}^j)^2\left|\sum_{i=1}^{N}a_{ji}(Y_i-Z_i)\right|^2}{4\sigma^2\tilde{h}^j}\right\} = \exp\left\{-\frac{\tilde{h}^j G_j}{\sigma^2}\right\},\end{aligned}$$

(15)

where

$$G_j = \frac{\left|\sum_{i=1}^{N}a_{ji}(Y_i-Z_i)\right|^2}{4}.$$

(16)

According to the channel model used, the $\tilde{h}^j$ are statistically independent among different blocks; Equation 14 then becomes

$$2^{-NR_0} = \frac{1}{M^{2N}}\sum_{Y}\sum_{Z}\prod_{j=1}^{N}E_{\tilde{h}^j}\left\{I_j\right\}.$$

(17)

The following proposition will be useful for the calculation of the expectation in Equation 17:

#### Proposition

**Proposition 4.** $\tilde{h}^j$ *is chi-square distributed with* 2*mn* *degrees of freedom, where* *n* *and* *m* *are the number of transmit and receive antennas, respectively.*

*Proof*.

$$\tilde{h}^t = \sum_{i=1}^{n}\sum_{j=1}^{m}\left|h_{i,j}^{t}\right|^2 = \sum_{i=1}^{n}\sum_{j=1}^{m}\Re\left(h_{i,j}^{t}\right)^2+\Im\left(h_{i,j}^{t}\right)^2,$$

which is the sum of the squares of 2*mn* independent, *N*(0,0.5)-distributed, random variables. The distribution of $\tilde{h}^t$ is therefore

$$f_{\tilde{h}^t}\left(\tilde{h}^t\right) = \frac{1}{(mn-1)!}\left(\tilde{h}^t\right)^{mn-1}e^{-\tilde{h}^t}\,U\left(\tilde{h}^t\right),$$

with *U*(.) being the unit step function.
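As a numerical sanity check of Proposition 4, the statistics of $\tilde{h}^t$ can be verified by Monte Carlo sampling. The sketch below (the values of *n*, *m*, and the number of trials are illustrative choices, not taken from the text) draws complex path gains with *N*(0,0.5)-distributed real and imaginary parts and compares the sample mean and variance against the Gamma(*mn*,1) distribution implied by the pdf above, whose mean and variance both equal *mn*:

```python
import numpy as np

# Monte Carlo check of Proposition 4 (a sketch; n, m, and the trial
# count below are illustrative assumptions).
rng = np.random.default_rng(0)
n, m, trials = 3, 2, 200_000

# Path gains h_{i,j} are CN(0,1): real and imaginary parts are N(0, 0.5).
h = (rng.normal(0.0, np.sqrt(0.5), (trials, n, m))
     + 1j * rng.normal(0.0, np.sqrt(0.5), (trials, n, m)))

# Sum of the squares of 2mn independent N(0, 0.5) variables.
h_tilde = np.sum(np.abs(h) ** 2, axis=(1, 2))

# Gamma(mn, 1) (a chi-square with 2mn degrees of freedom scaled by 1/2)
# has mean mn and variance mn.
print(h_tilde.mean())  # ≈ mn = 6
print(h_tilde.var())   # ≈ mn = 6
```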

Combining Proposition 4 with Equation 15 gives

$$E_{\tilde{h}^j}\left\{I_j\right\} = \int_{0}^{\infty}\frac{1}{(mn-1)!}\left(\tilde{h}^j\right)^{mn-1}\exp\left[-\tilde{h}^j\left(1+\frac{G_j}{\sigma^2}\right)\right]d\tilde{h}^j = \frac{1}{\left(1+\frac{G_j}{\sigma^2}\right)^{mn}}.$$

(18)
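The closed form in Equation 18 can be checked by averaging $\exp(-\tilde{h}^j G_j/\sigma^2)$ over Gamma(*mn*,1) samples. In this sketch, the values of *mn*, *G*_{j}, and *σ*² are arbitrary illustrations, not values from the text:

```python
import numpy as np

# Monte Carlo check of Equation 18 (a sketch; mn, G, and sigma2 are
# arbitrary illustrative values).
rng = np.random.default_rng(0)
mn, G, sigma2 = 2, 0.8, 1.0

# h_tilde ~ Gamma(mn, 1), the density of Proposition 4.
h = rng.gamma(shape=mn, scale=1.0, size=500_000)

empirical = np.mean(np.exp(-h * G / sigma2))      # E{I_j} by sampling
closed_form = 1.0 / (1.0 + G / sigma2) ** mn      # right side of Equation 18
print(empirical, closed_form)  # both ≈ 0.309
```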

From Equations 17 and 18, it follows that

$$R_0 = \log_2 M-\frac{1}{N}\log_2\left[\frac{1}{M^N}\sum_{Y}\sum_{Z}\prod_{j=1}^{N}\frac{1}{\left(1+\frac{G_j}{\sigma^2}\right)^{mn}}\right],$$

(19)

where *G*_{j} is defined in Equation 16. Note that the actual cutoff rate of a scheme which uses an orthogonal design with parameters *p*, *k* is multiplied by a factor of *k*/*p*, since *p* time slots are used to transmit *k* symbols; the actual cutoff rate is therefore

$$R_0(p,k) = \frac{k}{p}\left\{\log_2 M-\frac{1}{N}\log_2\left[\frac{1}{M^N}\sum_{Y}\sum_{Z}\prod_{j=1}^{N}\frac{1}{\left(1+\frac{G_j}{\sigma^2}\right)^{mn}}\right]\right\}.$$

(20)

It is evident from Equation 20 that in order to maximize *R*_{0}, one has to minimize the term

$$F(\mathcal{A}) = \sum_{Y}\sum_{Z}\prod_{j=1}^{N}\frac{1}{\left(1+\frac{G_j}{\sigma^2}\right)^{mn}}.$$

(21)

This, in turn, is achieved by identifying ‘good’ unitary matrices $\mathcal{A}$, using the relation (16) between *G*_{j} and $\mathcal{A}$, such that Equation 21 is minimized. This is the subject of the next subsections.
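For small constellations and transform orders, the objective of Equation 21 can be evaluated by brute-force enumeration over all pairs (*Y*,*Z*). The following sketch (the values of *σ*², *mn*, and the choice of BPSK with *N*=2 are illustrative assumptions, not taken from the text) computes $F(\mathcal{A})$ and the corresponding *R*_{0} of Equation 19 for the untransformed (identity) matrix:

```python
import itertools

import numpy as np

def cutoff_objective(A, sigma2, mn, constellation):
    """Brute-force evaluation of F(A) in Equation 21."""
    N = A.shape[0]
    F = 0.0
    for Y in itertools.product(constellation, repeat=N):
        for Z in itertools.product(constellation, repeat=N):
            d = np.asarray(Y) - np.asarray(Z)
            G = np.abs(A @ d) ** 2 / 4.0          # G_j from Equation 16
            F += np.prod(1.0 / (1.0 + G / sigma2) ** mn)
    return F

# Illustrative parameters: BPSK (M = 2), N = 2, one transmit/receive path.
M, N, sigma2, mn = 2, 2, 1.0, 1
bpsk = [1.0, -1.0]

F_identity = cutoff_objective(np.eye(N), sigma2, mn, bpsk)
R0 = np.log2(M) - np.log2(F_identity / M ** N) / N   # Equation 19
print(F_identity)  # 9.0 for the identity (untransformed) matrix
print(R0)
```

For the identity matrix the double sum factorizes per coordinate, giving $F = (1+1+\tfrac12+\tfrac12)^2 = 9$; a transform $\mathcal{A}$ that spreads the difference vector *Y*−*Z* across all coordinates makes every *G*_{j} nonzero and lowers *F*, i.e., raises *R*_{0}.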

### 4.2 Using gradient descent algorithm

Unfortunately, matrices that minimize Equation 21, and hence optimize the cutoff rate, do not admit a closed-form solution. In this subsection, we propose a gradient descent-based algorithm for finding a matrix $\mathcal{A}$ that minimizes Equation 21. $\mathcal{A}$ is typically required to obey the following constraint

$$\sum_{i,j=1}^{n}|a_{ij}|^2 = n,$$

(22)

so that the total average transmitted power remains constant. First, denote the complex derivative of the real function $F(\mathcal{A})$ with respect to *a*_{kl} (the entry in row *k* and column *l* of $\mathcal{A}$) as

$$\frac{\triangle F(\mathcal{A})}{\triangle a_{kl}} = \frac{\partial F(\mathcal{A})}{\partial\Re(a_{kl})}+i\,\frac{\partial F(\mathcal{A})}{\partial\Im(a_{kl})}.$$

(23)

A single iteration of gradient descent for minimizing Equation 21 is given by

$$a_{kl}^{(i+1)} = a_{kl}^{(i)}-\delta\cdot\left.\frac{\triangle F(\mathcal{A})}{\triangle a_{kl}}\right|_{\mathcal{A}=\mathcal{A}^{(i)}},$$

(24)

i.e., each iteration attempts to update $\mathcal{A}$ in a direction that makes $F(\mathcal{A})$ smaller. *δ* is a positive constant which determines the step size in each iteration. The derivative (23) can be calculated from Equation 21 as:

$$\frac{\triangle F(\mathcal{A})}{\triangle a_{kl}} = \sum_{Y}\sum_{Z}\prod_{j=1}^{N}\frac{1}{\left(1+\frac{G_j}{\sigma^2}\right)^{mn}}\cdot(-mn)\cdot\frac{1}{2\sigma^2\left(1+\frac{G_k}{\sigma^2}\right)}\cdot\left[(Y_l-Z_l)^{*}\cdot\sum_{i=1}^{N}a_{ki}(Y_i-Z_i)\right],$$

(25)

where each iteration of Equation 24 is followed by normalization of the matrix according to Equation 22; this normalization becomes less critical for the convergence of the algorithm as the step size *δ* gets smaller.
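The update of Equations 24 and 25 can be sketched as follows. This is an illustration only: BPSK, *N*=2, and the values of *σ*², *mn*, *δ*, the iteration count, and the random initialization are arbitrary assumptions, not parameters prescribed by the text:

```python
import itertools

import numpy as np

def objective_and_gradient(A, sigma2, mn, constellation):
    """F(A) of Equation 21 and its complex derivative, Equation 25."""
    N = A.shape[0]
    F = 0.0
    grad = np.zeros_like(A, dtype=complex)
    for Y in itertools.product(constellation, repeat=N):
        for Z in itertools.product(constellation, repeat=N):
            d = np.asarray(Y, dtype=complex) - np.asarray(Z, dtype=complex)
            u = A @ d                          # u_k = sum_i a_ki (Y_i - Z_i)
            G = np.abs(u) ** 2 / 4.0           # Equation 16
            term = np.prod(1.0 / (1.0 + G / sigma2) ** mn)
            F += term
            # Equation 25: product over j, times the per-entry factor for a_kl.
            coeff = term * (-mn) / (2.0 * sigma2 * (1.0 + G / sigma2))
            grad += np.outer(coeff * u, d.conj())
    return F, grad

rng = np.random.default_rng(1)
N, sigma2, mn, delta, bpsk = 2, 1.0, 1, 0.02, [1.0, -1.0]

A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A *= np.sqrt(N / np.sum(np.abs(A) ** 2))       # normalize per Equation 22

F_start, _ = objective_and_gradient(A, sigma2, mn, bpsk)
for _ in range(50):                            # Equation 24, with renormalization
    _, g = objective_and_gradient(A, sigma2, mn, bpsk)
    A = A - delta * g
    A *= np.sqrt(N / np.sum(np.abs(A) ** 2))
F_end, _ = objective_and_gradient(A, sigma2, mn, bpsk)
print(F_start, F_end)  # the objective F typically decreases over the run
```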

### 4.3 Construction of DRT matrices using elementary unitary matrices

An alternative approach for constructing DRT matrices is briefly described herein.

A unitary matrix with determinant equal to one can be constructed as the product of elementary unitary matrices with determinant equal to one [13]. An elementary unitary matrix is of the form

$$T_{ij} = \begin{bmatrix}
1 & 0 & 0 & \dots & \dots & \dots & 0\\
0 & 1 & 0 & \dots & \dots & \dots & 0\\
\vdots & & \ddots & & & & \vdots\\
0 & \dots & e^{j\phi_{ij}}\cos(\varphi_{ij}) & \dots & -e^{j\theta_{ij}}\sin(\varphi_{ij}) & \dots & 0\\
\vdots & & \vdots & \ddots & \vdots & & \vdots\\
0 & \dots & e^{-j\theta_{ij}}\sin(\varphi_{ij}) & \dots & e^{-j\phi_{ij}}\cos(\varphi_{ij}) & \dots & 0\\
0 & \dots & \dots & \dots & 0 & 1 & 0\\
0 & \dots & \dots & \dots & 0 & 0 & 1
\end{bmatrix}.$$

(26)

The elementary unitary matrix *T*_{ij} has three degrees of freedom and differs from the identity matrix in only four elements, located at the intersections of rows *i* and *j* with columns *i* and *j*, where *i*<*j*. We construct unitary matrices with determinant one as a product of $\frac{1}{2}N(N-1)$ elementary *N*-dimensional unitary matrices

$$\mathcal{A} = \prod_{i=1}^{N}\prod_{j=i+1}^{N}T_{ij}.$$

(27)
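The construction of Equations 26 and 27 can be sketched as follows; the angle values below are arbitrary illustrations (any choice yields a unitary matrix with determinant one):

```python
import numpy as np

def elementary_unitary(N, i, j, phi, theta, varphi):
    """T_ij of Equation 26: identity except a 2x2 unitary block at rows/columns i, j."""
    T = np.eye(N, dtype=complex)
    T[i, i] = np.exp(1j * phi) * np.cos(varphi)
    T[i, j] = -np.exp(1j * theta) * np.sin(varphi)
    T[j, i] = np.exp(-1j * theta) * np.sin(varphi)
    T[j, j] = np.exp(-1j * phi) * np.cos(varphi)
    return T

def build_transform(N, angles):
    """Equation 27: product of T_ij over all pairs i < j."""
    A = np.eye(N, dtype=complex)
    for (i, j), (phi, theta, varphi) in angles.items():
        A = A @ elementary_unitary(N, i, j, phi, theta, varphi)
    return A

# N(N-1)/2 = 3 elementary matrices for N = 3; angles chosen arbitrarily.
N = 3
angles = {(0, 1): (0.3, 1.1, 0.7), (0, 2): (0.9, 0.2, 1.3), (1, 2): (0.5, 0.8, 0.4)}
A = build_transform(N, angles)
print(np.allclose(A @ A.conj().T, np.eye(N)))  # True: A is unitary
print(np.isclose(np.linalg.det(A), 1.0))       # True: determinant one
```

The determinant of each 2×2 block is $e^{j\phi_{ij}}e^{-j\phi_{ij}}\cos^2(\varphi_{ij})+e^{j\theta_{ij}}e^{-j\theta_{ij}}\sin^2(\varphi_{ij})=1$, so the product in Equation 27 inherits both unitarity and unit determinant. Optimizing over the three angles of each factor searches the full unitary group with determinant one.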

### 4.4 Transform optimization - results

The diversity transform can be optimized with respect to the cutoff rate using either of the aforementioned methods. It follows from Equation 20 that the optimum diversity matrix, denoted $\mathcal{A}_{M,N}(m,n)$, is SNR dependent, with *n* and *m* being the number of transmit and receive antennas, respectively, and where *M*=2, 4, 8, and 16 correspond to BPSK, QPSK, 8-PSK, and 16-QAM constellations, respectively. Good matrices have been obtained by numerical optimization based either on the gradient descent algorithm in the form of Equation 25, or on manipulating the three degrees of freedom of each constituent matrix *T*_{ij} of Equation 27. The main advantage of the former approach over the latter is its simplicity, particularly for large transform orders *N*; its main disadvantage lies in its sensitivity to the initial value of the matrix $\mathcal{A}$ and to the step size *δ*. When implemented correctly, both methods provide very similar results in terms of maximizing the cutoff rate. The cutoff rates thus obtained are plotted in Figure 2 as a function of SNR for several values of *M*, *N*, *m*, and *n*. For all the cases shown, we assumed one receive antenna (i.e., *m*=1), and the maximum achievable rate is 1 bit/sec/Hz; for *n*=2, we used the orthogonal code *G*_{2} and BPSK modulation, and for *n*=3, we used the orthogonal code *G*_{3} with QPSK modulation, since the rate of *G*_{3} is one half. A noticeable improvement in cutoff rate is observed with a transformation order of *N*=3 compared to the uncoded scheme (corresponding to *N*=1). An example of an optimal matrix, calculated using the gradient descent algorithm described in Subsection 4.2, is given below. It was derived for a scenario with *n*=3 transmit antennas, a transformation order of *N*=3, a rate of 1 bit/sec/Hz, and an SNR of 4 dB.
The parameters of the gradient descent algorithm were chosen as follows: the initial step size in Equation 24 is *δ*=0.03, and the step size is decreased by a factor of 0.97 with each iteration. The initial value of the matrix $\mathcal{A}$ is a matrix of equal entries (normalized by a scalar to satisfy the constraint (22)). After 100 iterations, the following matrix was obtained:

$$\mathcal{A}_{4,3}(1,3) = \begin{bmatrix}
0.4943+0.2753i & 0.1235-0.5696i & 0.5631+0.1511i\\
0.0976-0.5724i & 0.3038+0.4693i & 0.5218+0.2797i\\
0.4785+0.3373i & 0.3685+0.4604i & -0.0484-0.5543i
\end{bmatrix}.$$

Employing this matrix in deriving $\{G_j\}_{j=1}^{3}$ via Equation 16, followed by substitution into Equation 20, a cutoff rate of 0.965 bit/sec/Hz is achieved, compared with an uncoded scheme (where $\mathcal{A}$ is taken to be the identity matrix of size 3) whose cutoff rate is 0.92 bit/sec/Hz.
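As a quick consistency check (a sketch, using only the matrix quoted above), the derived $\mathcal{A}_{4,3}(1,3)$ can be verified against the power constraint of Equation 22, i.e., the squared magnitudes of its entries should sum to 3:

```python
import numpy as np

# Power-constraint check for the derived matrix A_{4,3}(1,3).
A = np.array([
    [0.4943 + 0.2753j, 0.1235 - 0.5696j, 0.5631 + 0.1511j],
    [0.0976 - 0.5724j, 0.3038 + 0.4693j, 0.5218 + 0.2797j],
    [0.4785 + 0.3373j, 0.3685 + 0.4604j, -0.0484 - 0.5543j],
])
total_power = np.sum(np.abs(A) ** 2)  # Equation 22: should equal 3
print(total_power)  # ≈ 3.0 (up to the quoted four-digit rounding)
```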