EURASIP Journal on Wireless Communications and Networking Multiuser Cooperative Diversity for Wireless Networks

This is a special issue published in volume 2006 of "EURASIP Journal on Wireless Communications and Networking." All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Copyright © 2006 George K. Karagiannidis et al.

Multihop relaying technology is a promising solution for future cellular and ad hoc wireless communication systems. It can provide broader coverage and mitigate wireless channel impairments without requiring high transmit power. A new concept now being actively studied in multihop-augmented networks is multiuser cooperative diversity, in which several terminals form a kind of coalition to assist one another in transmitting their messages. In general, a cooperative relaying system has a source node that multicasts a message to a number of cooperative relays, which in turn resend a processed version to the intended destination node. The destination node combines the signals received from the relays, possibly also taking into account the source's original signal. Cooperative diversity exploits two fundamental features of the wireless medium: its broadcast nature and its ability to achieve diversity through independent channels. This yields three advantages.
(1) Diversity. Different paths are likely to fade independently. The impact of this is expected to appear in the physical layer, in the design of receivers that can exploit this diversity. (2) Beamforming gain. Directed beams should improve the capacity of the individual wireless links, and the gains may be particularly significant if space-time coding schemes are used. (3) Interference mitigation. A protocol that takes advantage of the wireless channel and of the available antennas and receivers could achieve a substantial gain in system throughput. It would do so by optimizing the processing done in the cooperative relays and the scheduling of retransmissions by the relays, so as to minimize mutual interference and facilitate cooperative information transmission.
In response to the demand for novel ideas and results, this special issue presents a sample of current activities and up-to-date efforts in the design, implementation, and performance analysis of cooperative diversity systems. Each paper is briefly summarized below.
In the first paper, by Z. Yang and A. Høst-Madsen, the authors investigate the cooperation efficiency of the multiple-relay channel when carrier-level synchronization is not available, assuming that all nodes use a decode-forward scheme. They show that with decode-forward relay signaling the transmission is interference-free, even when all communications share one common physical medium. Furthermore, for any channel realization there always exist a sequential path and a corresponding simple power-allocation policy that are optimal. To illustrate the efficiency of cooperation and to provide prototypes for practical implementation of relay-channel signaling, the authors propose two heuristic algorithms. Finally, numerical results show that the gain from cooperation is limited in the low-rate regime but considerable in the high-rate regime.
In the second paper, by D. Wang and U. Tureli, the authors address the inefficiencies of existing medium-access control (MAC) schemes when multiple-input multiple-output (MIMO) transmit/receive schemes and orthogonal frequency-division multiplexing (OFDM) are used in broadband multihop ad hoc networks. They propose a new transceiver architecture that combines MIMO-OFDM with a MAC scheme named multiple-antenna receiver-initiated busy-tone medium access (MARI-BTMA). MARI-BTMA is based on receiver-initiated busy-tone medium access (RI-BTMA) and uses multiple out-of-band busy tones to avoid collisions among nodes on the same channel. With the proposed MAC scheme, multiple users can transmit simultaneously in the same neighborhood. MARI-BTMA performs well at high traffic load. To improve performance at low traffic load, the authors also propose a 1-persistent variant of MARI-BTMA, so that users can choose between MAC schemes according to the statistical traffic load in the system. The paper presents both theoretical and numerical analyses of throughput and delay. Analytical and simulation results show that MARI-BTMA outperforms RI-BTMA and carrier-sense multiple access with collision avoidance (CSMA/CA).
In the third paper, by Y. Yuan et al., a cluster-based cooperative MIMO scheme is proposed to reduce the adverse impacts of radio irregularity and fading in multihop wireless sensor networks. The scheme extends the LEACH protocol to support multihop transmissions among clusters by incorporating cooperative MIMO into hop-by-hop transmissions. It achieves effective performance improvement through adaptive selection of cooperative nodes and coordination between multihop routing and cooperative MIMO transmissions. Moreover, the authors derive the optimal parameters that minimize the overall energy consumption, such as the number of clusters and the number of cooperative nodes. Simulation results show that the proposed scheme can effectively save energy and prolong the network lifetime.
In the fourth paper, by T. Abe et al., a MIMO relay scheme is proposed in which each of the multiple relay nodes performs QR decompositions of the backward and forward channel matrices in conjunction with phase control (QR-P-QR). A group nulling approach decomposes a multiple source-destination (SD) MIMO relay channel into parallel independent SD MIMO relay channels, and the QR-P-QR scheme is then applied to each of the decomposed MIMO relay links. Numerical examples show that the proposed relay scheme offers higher capacity than other existing relay schemes.
In the last paper, by T. A. Tsiftsis et al., the authors study the end-to-end performance of dual-hop cooperative selective diversity links equipped with nonregenerative relays and operating over nonidentical Nakagami-m fading channels. They present closed-form expressions for the cumulative distribution function and the probability density function of the end-to-end signal-to-noise ratio (SNR), and they derive analytical formulae for its moments and moment-generating function. Numerical examples complement the analysis, including the effects of SNR imbalance and fading severity on the overall performance.

ACKNOWLEDGMENTS
We would like to thank all the authors who submitted papers (whether accepted or not) for considering this issue as a venue for their work. We also thank the reviewers, who allowed us to make our editorial decisions in a timely manner, and the Editor-in-Chief, Phillip Regalia, who gave us the opportunity and support to realize this special issue. We hope that these five papers will contribute to the literature of this exciting research area and motivate further research.

INTRODUCTION
A wireless ad hoc network is an infrastructureless network in which communication between two nodes is typically maintained through the cooperation of other nodes. In traditional multihopping, each intermediate node receives information only from its immediate predecessor and then sends it to its immediate successor. Relay-channel signaling is a more advanced operation. The essential difference between the two is that with relay-channel signaling, a node uses the information from all its upstream nodes instead of only from the closest one.
The relay channel was first introduced by van der Meulen [1,2]. In the simplest case, a relay channel has a single relay that assists the transmission between the source and the destination. The relay channel can be denoted by $(\mathcal{X}_1 \times \mathcal{X}_2,\ p(y_2, y_3 \mid x_1, x_2),\ \mathcal{Y}_2 \times \mathcal{Y}_3)$. Here $\mathcal{X}_1$ and $\mathcal{X}_2$ are the transmitter alphabets of the source and the relay, respectively, and $\mathcal{Y}_2$ and $\mathcal{Y}_3$ are the receiver alphabets of the relay and the destination, respectively. The term $p(\cdot, \cdot \mid x_1, x_2)$ is a collection of probability distributions on $\mathcal{Y}_2 \times \mathcal{Y}_3$, one for each $(x_1, x_2) \in \mathcal{X}_1 \times \mathcal{X}_2$. The symbols $x_1$ and $x_2$ are the channel inputs of the source and the relay, and $y_2$ and $y_3$ are the outputs at the relay and the destination, respectively. The relay channel was studied extensively in [3], where two cooperation schemes were proposed: decode-forward and compress-forward. Motivated by renewed interest in ad hoc networks and network information theory, much recent research has addressed relay channels and cooperative diversity [4–14].
We assume that every node uses a decode-forward scheme. The other two relaying schemes, amplify-forward and compress-forward, can achieve higher rates for certain channel realizations [4,6,7], but they are difficult to scale to large networks. In an amplify-forward scheme, the relays essentially act as analog repeaters and therefore amplify the noise along with the signal. Amplify-forward also makes routing algorithms difficult to implement in large networks. Compress-forward requires complex Wyner-Ziv coding, which is difficult to implement in practice [15], especially in large networks. Decode-forward has its own drawback: it requires full decoding at each relay and may therefore cause error propagation. However, strong channel coding can mitigate this.
The achievable rate of a one-relay channel using a decode-forward scheme is [3]
$$R \le \max_{p(x_1, x_2)} \min\left\{ I(X_1; Y_2 \mid X_2),\ I(X_1, X_2; Y_3) \right\}. \quad (1)$$
The interpretation is as follows. The relay first fully decodes the message from the source's input, which gives the first term in the min{·} function. The destination then decodes the message from the inputs of both the source and the relay, which gives the second term. An adaptive transmission scheme allows the source to communicate directly with the destination if the relay has a poor link to the source, which is one form of routing. This gives the achievable rate
$$R \le \max\left\{ \max_{p(x_1, x_2)} \min\left\{ I(X_1; Y_2 \mid X_2),\ I(X_1, X_2; Y_3) \right\},\ \max_{p(x_1)} I(X_1; Y_3) \right\}. \quad (2)$$
For a physically degraded channel, that is, when $X_1 \to (X_2, Y_2) \to Y_3$ forms a Markov chain, (1) achieves the capacity. For a general relay channel, however, the capacity is unknown even in the one-relay case. Therefore, most research on multiple-relay channels has concentrated on two directions. One is achievable rates and capacity bounds [5,6,8,16]. The other is the capacity of special classes of multiple-relay channels, such as the degraded multiple-relay channel [17]. A multiple-relay channel generally has a multilevel structure, in which each level contains one or more nodes and the nodes in the same level decode a message at the same time.
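As a small numerical sketch of the two decode-forward expressions above, the snippet below evaluates them for a Gaussian one-relay channel with unit noise power. It uses independent Gaussian inputs, so $I(X_1; Y_2 \mid X_2) = \log_2(1 + |h_{12}|^2 P_1)$ and $I(X_1, X_2; Y_3) = \log_2(1 + |h_{13}|^2 P_1 + |h_{23}|^2 P_2)$. The adaptive rate is taken as the larger of that decode-forward rate and the direct-link rate $\log_2(1 + |h_{13}|^2 P_1)$. Independent inputs are an assumption here; they match the asynchronous setting adopted later in this paper. Correlated inputs, which would need carrier-level synchronization, are not modeled. The function names and the power-gain arguments `g12`, `g13`, `g23` are hypothetical.

```python
import math


def capacity(snr: float) -> float:
    """Gaussian capacity log2(1 + snr), in bits per channel use."""
    return math.log2(1.0 + snr)


def df_rate(g12: float, g13: float, g23: float, p1: float, p2: float) -> float:
    """Decode-forward rate for independent Gaussian inputs (no input correlation).

    g_ij = |h_ij|^2 is the power gain from node i to node j (unit noise power);
    node 1 is the source, node 2 the relay, node 3 the destination.
    """
    relay_decodes = capacity(g12 * p1)              # I(X1; Y2 | X2)
    dest_decodes = capacity(g13 * p1 + g23 * p2)    # I(X1, X2; Y3)
    return min(relay_decodes, dest_decodes)


def adaptive_rate(g12: float, g13: float, g23: float, p1: float, p2: float) -> float:
    """Adaptive scheme: fall back to the direct link when the relay does not help."""
    direct = capacity(g13 * p1)                     # I(X1; Y3) with the relay silent
    return max(df_rate(g12, g13, g23, p1, p2), direct)


if __name__ == "__main__":
    # Good source-relay link: relaying beats the direct link.
    print(adaptive_rate(10.0, 1.0, 10.0, 1.0, 1.0))   # log2(11), about 3.46
    # Poor source-relay link: the direct link (log2(2) = 1 bit) is used instead.
    print(adaptive_rate(0.5, 1.0, 10.0, 1.0, 1.0))    # 1.0
```

In the second call, the first term of the min{·} (relay decoding) is the bottleneck, which is the case in which the adaptive scheme routes around the relay.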
The broadcast property of wireless communication is referred to as the "wireless multicast advantage" (WMA) or "wireless broadcast advantage" (WBA). Routing algorithms for wireless networks may use it to reduce power consumption and improve reliability [18,19]. If different transmitters can be synchronized at the carrier level, and can thus coordinate to use beamforming techniques, cooperation has been shown to reduce total power consumption significantly [20]. Relay-channel signaling further exploits the broadcast-transmission and multiaccess-reception properties by allowing a node to accumulate the soft information of all its received signals; that is, a node's decoding may depend on multiple received signals. Relay-channel signaling can clearly improve performance further, but only at the cost of higher complexity. One fundamental question is therefore whether relay-channel signaling pays off. In this paper, we investigate cooperation efficiency in the multiple-relay-channel setting. Specifically, we consider the quasistatic Gaussian wireless multiple-relay channel.
Here, a quasistatic channel is one whose realization remains unchanged during the transmission of one message and changes to another, independent realization during the transmission of the next message. A useful performance measure in this scenario is the outage probability, which is the probability that the channel cannot support a given communication rate under certain constraints. The quasistatic model suits delay-sensitive services with strict delay requirements. For delay-insensitive services, the source and the relay may instead adapt their transmission rates to the channel conditions [7]. In many applications, such as sensor networks, nodes typically run on limited-energy batteries that are usually neither rechargeable nor replaceable, which results in severe energy constraints. A main concern is therefore optimizing energy consumption in the network.
Consider first a simple point-to-point channel in Rayleigh fading with channel gain $h$. To transmit at a constant rate $R$, the required power is proportional to $|h|^{-2}$, and the average power is proportional to $E[|h|^{-2}]$. This expectation is infinite: $|h|^2$ is exponentially distributed, and for unit-mean fading $E[|h|^{-2}] = \int_0^{\infty} x^{-1} e^{-x}\,dx$, which diverges at $x = 0$. It is therefore impossible to transmit in all channel conditions. A threshold $h_0$ has to be chosen so that if $|h| < h_0$, no transmission takes place and an outage is declared. Equivalently, a threshold power $P_0$ can be set so that if the power $P$ required for transmission at rate $R$ exceeds $P_0$, transmission is abandoned and an outage is declared. The average power consumption is an increasing function of $P_0$, while the outage probability is a decreasing function of $P_0$. $P_0$ should therefore be chosen as a compromise between power consumption and acceptable outage probability. Note that $P_0$ is not the physical power limit of the terminal's transmitter circuitry, although $P_0$ must of course be chosen below that limit.
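The trade-off above can be checked numerically. The sketch below assumes unit noise power and unit-mean Rayleigh fading, so $|h|^2$ is exponentially distributed and the power needed for rate $R$ is $(2^R - 1)/|h|^2$. Realizations whose required power exceeds $P_0$ count as outages and consume no power. The function name, sample count, and seed are illustrative choices. For this model, the outage probability also has the closed form $1 - \exp(-(2^R - 1)/P_0)$, which the Monte Carlo estimate should approach.

```python
import math
import random


def outage_and_average_power(rate: float, p0: float,
                             num_samples: int = 100_000, seed: int = 1):
    """Monte Carlo estimate of (outage probability, average power) for a
    point-to-point Rayleigh channel with power threshold p0.

    Assumptions: unit noise power and |h|^2 ~ Exp(1), so transmitting at
    `rate` bits per channel use requires power (2**rate - 1) / |h|^2.
    """
    rng = random.Random(seed)
    c = 2.0 ** rate - 1.0
    outages = 0
    energy = 0.0
    for _ in range(num_samples):
        gain = rng.expovariate(1.0)                  # |h|^2 for one realization
        required = c / gain if gain > 0.0 else math.inf
        if required > p0:
            outages += 1                             # transmission abandoned, no power spent
        else:
            energy += required                       # power actually spent
    return outages / num_samples, energy / num_samples


if __name__ == "__main__":
    for p0 in (1.0, 4.0, 16.0, 64.0):
        outage, power = outage_and_average_power(rate=1.0, p0=p0)
        exact = 1.0 - math.exp(-1.0 / p0)            # closed-form outage for R = 1
        print(f"P0={p0:5.1f}  outage={outage:.4f} (exact {exact:.4f})  avg power={power:.3f}")
```

Because every call reuses the same seed, raising $P_0$ only moves realizations from "outage" to "transmit". The estimated outage therefore falls and the average power rises monotonically, mirroring the compromise described in the text.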
Generalizing this to networks, we consider a total power constraint: at any time, the overall power consumption cannot exceed a given amount $P_0$. This seems the most reasonable point of view, because transmission is abandoned whenever the total power (energy) needed in the network exceeds a certain threshold. To state this precisely, assume that the source-destination pair in the multiple-relay channel wants to maintain a constant communication rate $R$. For a given transmission scheme $T$ and channel realization $H$, we define the outage event as
$$\mathcal{O}_T = \{ H : R_T(P_0, H) < R \}, \quad (3)$$
where $R_T(P_0, H)$ is the maximal rate that transmission scheme $T$ can achieve for channel realization $H$ with a total power consumption of at most $P_0$. For all reasonable transmission schemes $T$, $R_T(P, H)$ is a nondecreasing function of $P$. We define $P_T(R, H)$ as the minimum total power required by transmission scheme $T$ to achieve rate $R$ for channel realization $H$. The outage event is then equivalent to
$$\mathcal{O}_T = \{ H : P_T(R, H) > P_0 \}. \quad (4)$$
Therefore, we can minimize the outage probability by minimizing, for each channel realization, the total power needed to achieve the target rate $R$. This problem was investigated for parallel (two-hop) relay channels in [21,22]. Here we generalize it to multihop channels in which arbitrary interrelay communication is allowed. The problem then becomes more complicated, because it requires finding both an optimal arrangement of nodes and a corresponding power allocation policy. Apart from the above overall power constraint, individual node power constraints may also be relevant. Firstly, the power allocation can result in uneven power consumption among the nodes. However, channel variations average this out. Furthermore, if every node at some time acts as a source or destination, the power consumption can be expected to be fairly distributed.
Secondly, the power allocation could yield a solution in which an individual node's power consumption exceeds what the node is physically capable of. However, taking this into account would only complicate the solution without providing further insight.
The rest of the paper is organized as follows. In Section 2, we present the model of the Gaussian multiple-relay channel. In Section 3, we show that an optimal arrangement of nodes is a sequential path and give the corresponding optimal power allocation policy. To investigate the performance of relay-channel signaling and to provide prototype algorithms for its practical implementation, we present two heuristic algorithms for the cooperative relay-channel signaling problem in Section 4. In Section 5, we extend the discussion to nodes with only limited signal processing capability. Numerical results are provided in Section 6, and Section 7 briefly summarizes the paper.

CHANNEL MODEL
In this paper, we consider a quasistatic multiple-relay channel with $N$ nodes, numbered from 1 to $N$. Without loss of generality, we assume that nodes 1 and $N$ form the source-destination pair and that the other nodes act as relays. We assume that all nodes operate in full-duplex mode, so they can receive and transmit in the same frequency band at the same time. Full-duplex communication is generally regarded as difficult to achieve in practice, but there are techniques that make it possible [23].
Another important assumption concerns synchronization among nodes. There are three levels of synchronization: frame, symbol, and carrier. We assume that the receivers are completely synchronized at all levels. For transmitters, it is realistic to assume that frame- and symbol-level synchronization are available. The contentious point is carrier-level synchronization, which requires that the separate microwave oscillators at different nodes be synchronized. This seems highly unrealistic, because oscillator drift makes synchronization impossible if the oscillators are left to themselves. It might be possible to couple oscillators, and very closely spaced nodes could even autocouple. However, this requires nontrivial microwave innovation and in general seems quite improbable, especially for sensor networks with simple nodes. We therefore assume that there is no carrier synchronization. The link between any pair of nodes $(i, j)$ is parameterized by a complex channel gain $h_{ij}$, which is assumed to be symmetric, that is, $h_{ij} = h_{ji}$. The channel gains $h_{ij}$ are independent random variables as a result of the random movement of nodes and/or fading. They are assumed to be fixed during the transmission period of one message and to take another independent realization in the transmission period of the next message.
The source wants to send a message $w$ to the destination during each channel realization. Let $X_i(k)$, $i \in \{1, \dots, N-1\}$, be the channel input of node $i$ at time $k$, and let $Y_j(k)$, $j \in \{2, \dots, N\}$, be the channel output of node $j$ at time $k$. Then
$$Y_j(k) = \sum_{i \in \{1, \dots, N-1\},\ i \ne j} h_{ij} X_i(k) + Z_j(k), \quad (5)$$
where the $Z_j(k) \sim \mathcal{CN}(0, 1)$ are i.i.d. unit-power white Gaussian noises for all $j, k$. We assume that full channel state information is available noncausally to all nodes. This may not be realistic in fast-changing channels, but it is possible if the channel does not vary too quickly. Furthermore, it yields a performance bound for the case in which less channel knowledge is available.
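The channel model above can be simulated directly. The sketch below draws one quasistatic realization with symmetric gains $h_{ij} = h_{ji}$ and computes a received sample $Y_j(k) = \sum_{i \ne j} h_{ij} X_i(k) + Z_j(k)$ with $Z_j(k) \sim \mathcal{CN}(0, 1)$. It assumes Rayleigh fading, $h_{ij} \sim \mathcal{CN}(0, 1)$, which this section does not specify, and it uses 0-based node indices (index 0 is node 1). All function names are hypothetical.

```python
import math
import random


def complex_gaussian(rng: random.Random, variance: float = 1.0) -> complex:
    """One sample of a circularly symmetric complex Gaussian CN(0, variance)."""
    sigma = math.sqrt(variance / 2.0)               # standard deviation per real dimension
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def draw_realization(num_nodes: int, rng: random.Random) -> list[list[complex]]:
    """Draw one quasistatic realization of symmetric gains h[i][j] == h[j][i].

    Assumes Rayleigh fading, h_ij ~ CN(0, 1), independent across node pairs.
    The realization is held fixed for one message and redrawn for the next.
    """
    h = [[0j] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            h[i][j] = h[j][i] = complex_gaussian(rng)
    return h


def received_sample(h: list[list[complex]], x: list[complex], j: int,
                    rng: random.Random) -> complex:
    """Y_j(k) = sum_{i != j} h_ij X_i(k) + Z_j(k), with Z_j(k) ~ CN(0, 1)."""
    signal = sum(h[i][j] * x[i] for i in range(len(x)) if i != j)
    return signal + complex_gaussian(rng)


if __name__ == "__main__":
    rng = random.Random(0)
    h = draw_realization(5, rng)                               # nodes 1..5 -> indices 0..4
    x = [complex_gaussian(rng) for _ in range(4)] + [0j]       # destination does not transmit
    print(received_sample(h, x, 4, rng))                       # output of node 5
```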

ACHIEVABLE RATES OF THE GAUSSIAN MULTIPLE-RELAY CHANNEL
In [5], Gupta and Kumar demonstrated an achievable region for a multiple-relay channel, and later Xie and Kumar [16] established an explicit formula for the achievable rate, which, in general, exceeds the rate in [5]. Here we restate the theorem in [16] as follows.
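Theorem 1 (see [16, Theorem 3.4]). Consider a discrete memoryless multiple-relay channel with source node 1 and destination node $N$, whose other nodes are arranged into $L-1$ levels, level $k$ consisting of the set of nodes $\Gamma_k$, $k = 1, \dots, L-1$. For this channel, the following rate is achievable:
$$R < \min_{1 \le l \le L}\ \min_{i \in \Gamma_l} I\left(\mathbf{X}_{\Gamma_0}, \dots, \mathbf{X}_{\Gamma_{l-1}}; Y_i \mid \mathbf{X}_{\Gamma_l}, \dots, \mathbf{X}_{\Gamma_{L-1}}\right), \quad (6)$$
where boldface characters denote the vectors of inputs of the nodes in each group. Here $\Gamma_0 := \{1\}$ and $\Gamma_L := \{N\}$.
For an asynchronous Gaussian multiple-relay channel, we have the following corollary of Theorem 1.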

Proposition 1.
Assume that node $j$ uses transmission power $P_j$. For an asynchronous Gaussian multiple-relay channel with $L-1$ levels of relay nodes, the following rate is achievable:
$$R \le \min_{1 \le l \le L}\ \min_{i \in \Gamma_l} \log_2\left(1 + \sum_{k=0}^{l-1} \sum_{j \in \Gamma_k} |h_{ji}|^2 P_j\right). \quad (7)$$
Proof. The message $w$ is first split into $B$ blocks $w_1, \dots, w_B$ of $nR$ bits each. Each node $i$ generates a codebook of $2^{nR}$ i.i.d. $n$-sequences with i.i.d. Gaussian components and indexes them as $\mathbf{x}_i(w_j)$, $w_j \in \{1, \dots, 2^{nR}\}$. The whole transmission is performed in $B + L - 1$ time slots, so the overall rate is $R \cdot B/(B + L - 1)$ bits per channel use. By making $B$ large, we can make this rate arbitrarily close to $R$. In each of the first $B$ time slots, the source node 1 transmits the codeword $\mathbf{x}_1(w_i)$, $i \in \{1, \dots, B\}$; in the remaining time slots, it transmits the constant signal $\mathbf{x}_1(1)$. A node $i$ in level $k$, $1 \le k \le L$, starts decoding $w_1$ at the end of the $k$th time slot and sends out $\mathbf{x}_i(w_1)$ in time slot $k + 1$. It repeats the same decoding and encoding procedure in each subsequent time slot until it has decoded and sent out all the messages, and it transmits the constant signal $\mathbf{x}_i(1)$ in the remaining slots. To illustrate the encoding scheme, consider a relay channel with 5 nodes in which (1, 5) is the source-destination pair. Nodes 2 and 3 are assigned to level 1, and node 4 is in level 2. The message $w$ is split into 6 message blocks. The encoding scheme is shown in Figure 1.
Figure 1: Encoding scheme.
The relays and the destination decode each $w_i$, $i \in \{1, \dots, B\}$, using a sliding-window decoding technique similar to that of [6,24]. A node $i$ in level $l$ can decode $w_1$ at the end of the $l$th time slot using a window of the first $l$ received blocks. After decoding the first message, the node shifts the window by one block and subtracts from the received signals in the new window the part due to the transmission of the first message. It then decodes the second message.
It continues until all messages are decoded. For each message, node $i$ is actually receiving information from $l$ independent parallel channels [25]. Thus for node $i$ to successfully decode the message, we have where $P_j$ is the power assigned to node $j$. Since each node except the source needs to fully decode each message block, we have Here for simplification of notation, we assume that
Remark 1. Note that we do not introduce any correlation between the inputs of the nodes, because correlation produces no gain when carrier-level synchronization between transmitters is not available [6,7].
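The encoding schedule described in the proof (Figure 1) can be generated mechanically. The source sends $\mathbf{x}_1(w_m)$ in slot $m$, a node in level $k$ sends $\mathbf{x}_i(w_m)$ in slot $m + k$, and every node sends its constant codeword $\mathbf{x}_i(1)$ otherwise, over $B + L - 1$ slots in total. The sketch below reproduces the 5-node example (nodes 2 and 3 in level 1, node 4 in level 2, $B = 6$). The function and argument names are hypothetical. The destination is omitted because it never transmits.

```python
def encoding_schedule(levels: dict[int, int], num_blocks: int) -> list[dict[int, str]]:
    """Codeword sent by every transmitting node in each time slot.

    levels maps each transmitting node to its level (source = level 0, relays
    in levels 1..L-1); the destination is omitted since it never transmits.
    A node in level k sends x_i(w_m) in slot m + k and the constant x_i(1)
    otherwise, so the last message leaves level L-1 in slot B + L - 1.
    """
    num_slots = num_blocks + max(levels.values())        # B + L - 1
    schedule = []
    for slot in range(1, num_slots + 1):
        row = {}
        for node, level in sorted(levels.items()):
            m = slot - level                             # message index forwarded in this slot
            row[node] = f"x{node}(w{m})" if 1 <= m <= num_blocks else f"x{node}(1)"
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    # Example from the proof: source 1, relays 2 and 3 in level 1, relay 4 in level 2.
    for slot, row in enumerate(encoding_schedule({1: 0, 2: 1, 3: 1, 4: 2}, 6), start=1):
        print(f"Block {slot}: " + "  ".join(row.values()))
```

For this example the schedule has $6 + 3 - 1 = 8$ blocks. In block 7, for instance, the source sends its constant codeword, nodes 2 and 3 forward $w_6$, and node 4 forwards $w_5$.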

Remark 2.
To achieve the rate in (7), all $X_i$ are chosen to be Gaussian and mutually independent.
Remark 3. As can be seen from (9), the interference from other nodes is effectively cancelled once a node subtracts from its received signals the part contributed by the messages it already knows. From (9), we obtain an equivalent form of (7) for all $l$.
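For concreteness, the sketch below evaluates the Gaussian decode-forward rate of a given multilevel structure, under the assumptions of Remark 2 (independent Gaussian inputs) and unit noise power. It reads the rate as $\min_{l}\min_{i\in\Gamma_l}\log_2\big(1 + \sum_{j} |h_{ji}|^2 P_j\big)$, where $j$ ranges over the nodes in $\Gamma_0, \dots, \Gamma_{l-1}$ upstream of $i$. This reading is our specialization of Theorem 1, not a formula quoted from the proof. The function name, gains, and powers are hypothetical.

```python
import math


def multilevel_rate(levels: list[list[int]], gains: dict[tuple[int, int], float],
                    powers: dict[int, float]) -> float:
    """Achievable decode-forward rate of a multilevel structure (sketch).

    levels[0] == [source], levels[-1] == [destination]; gains[(j, i)] is the
    power gain |h_ji|^2, looked up in either order since h_ij = h_ji (a missing
    pair means no link). powers[j] is node j's transmit power; noise power is 1.
    """
    def gain(a: int, b: int) -> float:
        return gains.get((a, b), gains.get((b, a), 0.0))

    rate = math.inf
    for l in range(1, len(levels)):
        upstream = [j for level in levels[:l] for j in level]
        for i in levels[l]:
            snr = sum(gain(j, i) * powers[j] for j in upstream)
            rate = min(rate, math.log2(1.0 + snr))   # node i must decode every block
    return rate


if __name__ == "__main__":
    # Hypothetical 4-node structure: source 1, relays 2 and 3 in level 1, destination 4.
    gains = {(1, 2): 3.0, (1, 3): 2.0, (1, 4): 0.5, (2, 4): 1.0, (3, 4): 1.5}
    powers = {1: 1.0, 2: 1.0, 3: 1.0}
    print(multilevel_rate([[1], [2, 3], [4]], gains, powers))
```

In the usage example, relay 3 is the bottleneck. It hears only $|h_{13}|^2 P_1 = 2$, so the structure supports $\log_2 3$ bits per channel use, even though the destination alone could support 2 bits.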

The optimal multilevel structure and power allocation policy
As we have shown, to minimize the outage probability in a quasistatic channel, we need a multilevel structure $S$ and a corresponding power allocation policy $T(S)$ that minimize the total power needed to achieve the rate requirement $R$. Assuming that a multilevel structure $S$ has $L + 1$ levels, we denote the set of nodes in level $l$, $0 \le l \le L$, by $\Gamma_l$ and its size by $|\Gamma_l|$. We have $\Gamma_0 = \{1\}$ and $\Gamma_L = \{N\}$. Denote the level of node $i$ by $\ell(i)$. Note that $S$ need not include all the nodes; that is, some nodes may be chosen not to participate in the transmission. Denote by $P_i(T, S)$ the power that a power allocation policy $T(S)$ for $S$ assigns to node $i \in S$. We then need to solve the following optimization problem:
$$\min_{S,\ T(S)}\ \sum_{i \in S} P_i(T, S) \quad \text{subject to (7).}$$
A Gaussian multiple-relay channel is in general not a degraded channel like the one studied in [17], so it has no natural arrangement of nodes that is optimal. However, an optimal multilevel structure $S$ and its corresponding optimal power allocation policy $T(S)$ do have some special properties, stated in the next two theorems.
Theorem 2. For any channel realization $H$ and rate requirement $R$, the total power is minimized by a sequential-path multilevel structure $S$, that is, one with $|\Gamma_l| = 1$ for all $l$.
Proof. We need to show that an optimal sequential path $P$ exists. For any channel realization $H$, there always exist a multilevel structure $S$ and a corresponding power allocation policy $T(S)$ that are optimal. Assuming that $S$ has $L + 1$ levels, we prove by induction that $S$ can always be converted into an equivalent path $P$ without increasing the total power consumption. The conversion removes some nodes in $S$ and adjusts the transmission power of the remaining nodes.
First, for level $L$, $\Gamma_L = \{N\}$; thus $|\Gamma_L| = 1$. Suppose that $|\Gamma_l| = 1$ for decoding orders $l \ge T + 1$ ($T < L$). We now show that we can always make $|\Gamma_T| = 1$ without violating the constraints. For convenience of presentation, we denote the only node in $\Gamma_l$, $l \ge T + 1$, by $\zeta_l$.
If $|\Gamma_T| = 1$, we are done. Otherwise, assume that $|\Gamma_T| = M$ ($M \ge 2$) and $\Gamma_T = \{t_m : 1 \le m \le M\}$. Without risk of confusion, we simplify the notation $P_i(T, S)$ to $P_i$, and we have $P_i > 0$ for all $i \in S$. We consider two cases.
Case 1. We perform the following recursive power updating procedure.
(1) Fix the transmission power of all nodes that reside in level $T$ or higher, except for $t_1$ and $t_2$. Adjust the transmission power $P_{t_1}$ of $t_1$ to $P^{\text{new}}_{t_1} = P_{t_1} + \delta$, where $\delta$ is a small value.
(2) Adjust the transmission power $P_{t_2}$ of $t_2$ to a new value $P^{\text{new}}_{t_2}$ such that the left-hand side of constraint (12) for $\zeta_{T+1}$ remains unchanged.
(3) Adjust the transmission power of $\zeta_{T+1}$ such that the left-hand side of constraint (12) remains the same for $\zeta_{T+2}$.
(4) Recursively update the transmission power of node $i$, $i = \zeta_{T+2}, \dots, \zeta_{L-1}$, such that the left-hand side of constraint (12) remains the same for the node immediately after it.
This recursive updating procedure guarantees that constraint (12) is still satisfied at all relay nodes and at the destination. Since we vary the transmission power of only one node at each step, the total change in power is proportional to $\delta$. Denote the total transmission power for the multilevel structure $S$ and the corresponding power allocation policy $T(S)$ by $\xi(S, T(S))$, that is, $\xi(S, T(S)) = \sum_{i \in S} P_i(T, S)$. Then
$$\xi(S, T^{\text{new}}(S)) = \xi(S, T(S)) + f(S)\,\delta, \quad (15)$$
where $T^{\text{new}}$ is the new power allocation policy after the power updating procedure. If $|\delta|$ is small enough, $f(S)$ is a constant that does not depend on $\delta$ but only on the multilevel structure $S$. Since $\delta$ can be either positive or negative, we can either increase or decrease the transmission power of $t_1$. If $f(S) \ne 0$, we can therefore always choose the sign of $\delta$ such that the total power change $f(S)\,\delta < 0$, and hence $\xi(S, T^{\text{new}}(S)) < \xi(S, T(S))$. This contradicts the optimality of the original pair $(S, T(S))$. Therefore we must have $f(S) = 0$, and thus $(S, T^{\text{new}}(S))$ is also optimal. In this case, we can repeat the same updating procedure, decreasing the transmission power of $t_1$ (or $t_2$) and increasing that of $t_2$ (or $t_1$), until the transmission power of some node reaches zero. If $P^{\text{new}}_i = 0$ for some node $i \in \{\zeta_{T+1}, \dots, \zeta_{L-1}\}$, then node $i$ can be removed from the relaying structure, and we can continue the updating procedure above. If $P_{t_i} = 0$, $i = 1, 2$, then $t_i$ can be removed from the structure $S$.
If two or more nodes with decoding order $T$ still remain, we can take any two of them and repeat the same procedure. Each repetition removes one node, until only one node is left.
Case 2. In this case, only one node $t_1$ in level $T$ has a finite-length link to node $\zeta_{T+1}$. This case is essentially the same as Case 1. Pick a node $t_2 \in \Gamma_T$, $t_2 \ne t_1$, and a node $\zeta_{T+i} \in \{\zeta_{T+2}, \dots, \zeta_L\}$ such that $\ell(\zeta_{T+i}) < \ell(k)$ for all $k \in \{\zeta_{T+2}, \dots, \zeta_L\}$, $k \ne \zeta_{T+i}$, and $d_{t_2 \zeta_{T+i}} > 0$. Then we can perform the same recursive power updating algorithm as in Case 1. The only difference is that node $\zeta_{T+i-1}$ takes the place of $t_1$ in Case 1. Thus we can always reduce the power of $t_2$ to 0 and remove it from the multilevel relaying structure.
Combining Cases 1 and 2, we conclude that we can always keep only one node at decoding order $T$ without increasing the total power consumption.
By induction on $l$, we obtain $|\Gamma_l| = 1$ for all $1 \le l \le L$, which completes the proof.
Note that the new relaying path $P$ does not necessarily have the same number of levels as $S$.
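The key step of the proof is that total power is affine in the power shifted between two nodes of the same level, so an optimum sits at an endpoint where one node is switched off. A toy case shows this. The sketch below assumes that constraint (12) takes the linear form $\sum_j P_j / d_{ji} \ge 1$ over the upstream nodes $j$ of node $i$, which is the form the Example below uses (e.g., $P_1/42 + P_2/30 = 1$). The link labels are hypothetical. Relays 2 and 3 share level 1, and relay 2 supplies a fraction $\theta$ of what destination 4 still needs after the source has transmitted.

```python
def shared_level_power(d: dict[tuple[int, int], float], theta: float) -> float:
    """Total power with relays 2 and 3 both in level 1 (hypothetical network).

    Assumed linear decoding constraint: node i decodes once the sum of
    P_j / d[(j, i)] over its upstream nodes j reaches 1. Relay 2 supplies a
    fraction theta of the destination's remaining need, relay 3 the rest.
    """
    p1 = max(d[(1, 2)], d[(1, 3)])              # both relays must decode the source
    residual = 1.0 - p1 / d[(1, 4)]             # still needed at destination 4
    if residual <= 0.0:
        raise ValueError("destination would decode in level 1")
    p2 = theta * residual * d[(2, 4)]
    p3 = (1.0 - theta) * residual * d[(3, 4)]
    return p1 + p2 + p3                         # affine in theta: optimum at theta = 0 or 1


def sequential_path_power(d: dict[tuple[int, int], float]) -> float:
    """Power filling along 1 -> 2 -> 4 after node 3 has been removed."""
    p1 = d[(1, 2)]                              # node 2 decodes with equality
    p2 = d[(2, 4)] * (1.0 - p1 / d[(1, 4)])     # node 4 decodes with equality
    return p1 + p2


if __name__ == "__main__":
    D = {(1, 2): 10.0, (1, 3): 12.0, (1, 4): 50.0, (2, 4): 20.0, (3, 4): 25.0}  # hypothetical
    for theta in (0.0, 0.5, 1.0):
        print(theta, shared_level_power(D, theta))
    print("path 1 -> 2 -> 4:", sequential_path_power(D))
```

With these labels, the total falls linearly from 31 at $\theta = 0$ to 27.2 at $\theta = 1$, where relay 3 is switched off. Removing node 3 from the structure then also relaxes the source requirement, and the path 1 → 2 → 4 needs only 26.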
Theorem 2 implies that we can restrict our search to sequential paths without loss of optimality, which greatly reduces the search space. The following theorem shows how power is optimally allocated along a given sequential relaying path.
Theorem 3. For a sequential relaying path $P$, the optimal power allocation policy $T(P)$ can be implemented by a recursive power-filling procedure. Along path $P$, starting from the source, each node $i$ adjusts its transmission power such that constraint (12) is satisfied with equality at its immediate successor $j$, where $\ell(j) = \ell(i) + 1$.
Proof. Let the relaying path be $P = (\zeta_0, \zeta_1, \ldots, \zeta_L)$, where $\zeta_0 = 1$ and $\zeta_L = N$. Initially we set the power of all nodes to 0. Since node $\zeta_1$ only receives information from the source $\zeta_0$, the source must transmit at a power level such that constraint (12) is exactly satisfied at node $\zeta_1$. Now the message is known to $\zeta_0$ and $\zeta_1$, and only they are eligible to transmit. To save transmission power, at any time we let the node whose transmission is most efficient (i.e., results in less total transmission power) increase its transmission power. The transmission of node $\zeta_1$ is now the more efficient one. Otherwise, if the transmission of $\zeta_0$ were more efficient, $\zeta_0$ would increase its transmission power until a node other than $\zeta_1$ satisfies constraint (12). That node could then decode in the same decoding order as $\zeta_1$, contradicting the fact that there is only one node in each level. Thus the source stops increasing its transmission power as soon as node $\zeta_1$ satisfies (12). Node $\zeta_1$ then adjusts its power level such that $\zeta_2$ satisfies constraint (12) with equality. This procedure continues until the destination meets condition (12) exactly.
Here we do not need to know exactly how to determine the efficiency of a particular node's transmission. We only need to know that it depends on the structure of the relaying path and on its state, that is, on whether constraint (12) is satisfied at the nodes in the path. Therefore, until the state of the relaying path changes, the transmission efficiency of any node that has satisfied (12) remains unchanged. Theorem 3 implies that every node except the destination transmits with some positive power and that every node except the source receives exactly enough information from its upstream nodes.

Example
Now we give a simple example to illustrate the benefit of cooperative relay signaling. Figure 2 shows a multiple-relay-channel network with 4 nodes in which (1, 4) is the source-destination pair. The label attached to link (i, j) is the value $d_{ij}$ as defined before. All 4 possible sequential relaying paths and their total power consumption are listed in Table 1. The path 1 → 3 → 2 → 4 is not an eligible relaying path because, under the power allocation policy, node 2 cannot decode after node 3. The total power consumption is calculated using the recursive power-filling procedure. For example, for the path 1 → 2 → 4, node 2 can decode if $P_1 = 10$. Node 4 can then decode if $P_1/42 + P_2/30 = 1$, which gives $P_2 \approx 22.86$. The overall power consumption is then $P_1 + P_2 \approx 32.86$. A traditional multihop scheme based on a shortest path algorithm finds 1 → 2 → 4 as the optimal path, with overall power consumption 40. However, the transmission from node 1 to node 2 interferes with the communication between node 2 and node 4, so the actual power consumption will be larger than 40. Table 1 also shows that the best relaying path, 1 → 2 → 3 → 4, is the worst one from the point of view of traditional multihopping algorithms.
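The recursive power-filling procedure of Theorem 3 can be sketched in a few lines of Python. The sketch assumes that constraint (12) takes the normalized linear form $\sum_k P_k / d_{kj} \ge 1$ over the upstream nodes k of node j, which is the form used in this example ($P_1/42 + P_2/30 = 1$); this is an assumption read off the example. Links without a value are treated as $d = \infty$, and a path on which some node would already decode before its turn (the reason 1 → 3 → 2 → 4 is ineligible) is rejected. Function and variable names are illustrative, not from the paper.

```python
import math

def power_fill(path, d):
    """Recursive power-filling along a sequential relaying path (sketch of Theorem 3).

    Assumption: constraint (12) has the normalized form
        sum over upstream k of P_k / d[(k, j)] >= 1,
    as in the worked example. Keys of `d` are (transmitter, receiver);
    a missing key means no usable link (d = inf).
    Returns one transmit power per node on `path`; the destination's entry is 0.
    """
    powers = [0.0] * len(path)
    for pos in range(1, len(path)):
        j, pred = path[pos], path[pos - 1]
        # Normalized information j already collects from nodes before its predecessor.
        got = sum(powers[k] / d.get((path[k], j), math.inf) for k in range(pos - 1))
        residual = 1.0 - got
        link = d.get((pred, j), math.inf)
        if residual <= 0 or math.isinf(link):
            # j would decode too early (or cannot hear pred): the path is not eligible.
            raise ValueError(f"path {path} is not eligible at node {j}")
        # pred transmits just enough for (12) to hold with equality at j.
        powers[pos - 1] = link * residual
    return powers

# Link values of the Figure 2 example that appear in the text:
# d_12 = 10, d_24 = 30, d_14 = 42.
example_d = {(1, 2): 10.0, (2, 4): 30.0, (1, 4): 42.0}

cooperative = power_fill([1, 2, 4], example_d)                 # [10.0, 22.857..., 0.0]
cooperative_total = sum(cooperative)                           # about 32.86
noncooperative_total = example_d[(1, 2)] + example_d[(2, 4)]   # 40, interference ignored
```

With the three link values stated in the text, the sketch reproduces $P_2 \approx 22.86$ and a total of about 32.86, against 40 for the noncooperative sum $d_{12} + d_{24}$.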

HEURISTIC ALGORITHMS
From Theorems 2 and 3, we have shown that for any channel realization H, there exist an optimal relaying path P and a corresponding simple power allocation policy T(P). Limiting our search to sequential paths can thus greatly reduce the search space for optimal solutions. Elegant algorithms exist for finding a shortest path in a network [26]. However, the Bellman principle on which these traditional shortest path algorithms rely is not satisfied here. For example, consider a relay network with 4 nodes V = {1, 2, 3, 4} and costs $d_{21} = 3$, $d_{32} = 4$, $d_{31} = 6$, $d_{41} = 7$, $d_{42} = 12$, $d_{43} = 0.1$. We may verify that the optimal relaying path is 1 → 3 → 4. By the Bellman principle, the optimal cooperative relaying path from 1 to 3 should then be the direct link from 1 to 3, which requires a total power consumption of 34. However, the path 1 → 2 → 3 requires a smaller total power consumption of $10 + \frac{34 - 10}{34} \times 25 \approx 27.65$. This shows that the Bellman principle does not apply to the cooperative routing problem.
Another difference between the optimal relaying path problem in this paper and the traditional shortest path problem is that the former requires a node-based metric instead of a link-based metric, since we want to minimize the total power consumption of all nodes. Therefore, standard shortest path algorithms cannot be expected to find an optimal relaying path. An exhaustive search through all multilevel structures has a complexity of $O((N-2)^{N-2})$. Theorem 2 reduces this complexity to $O((N-1)!)$. We can improve on this by using the property of an optimal relaying path in Theorem 3 to remove many unqualified candidates: when selecting the node for a particular level, it is not necessary to consider nodes that have already satisfied condition (12), since they would otherwise receive more information than necessary. This reduces the worst-case complexity to $2^{N-2}$ candidate paths, which makes it possible to find the optimal solution for small networks (i.e., fewer than 20 nodes). For larger networks, however, the complexity is still too high. We therefore consider heuristic algorithms for finding relaying paths and the corresponding power allocation policies for general multiple-relay channels. These algorithms provide achievable rates that might not be optimal for the given coding scheme, but simulation results show that one of them is essentially equal to the optimum in small networks where the optimal solution can be found. Furthermore, the heuristic algorithms provide prototypes for a practical (central) implementation of relay-channel signaling.
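The pruned exhaustive search can be sketched as a depth-first search in Python, under the assumed normalized form $\sum_k P_k/d_{kj} \ge 1$ of constraint (12). When the path is extended, any node that already satisfies the constraint is skipped, which is the Theorem 3 pruning rule; the branch-and-bound cut on the running total power is an extra device of this sketch, not part of the text. The four-node network at the end is hypothetical.

```python
import math

def best_sequential_path(nodes, d, source, destination):
    """Exhaustive search over sequential relaying paths with the Theorem 3 pruning rule.

    Assumption: node j decodes once sum_k P_k / d[(k, j)] >= 1, with keys
    (transmitter, receiver) and missing keys meaning no usable link.
    A node that already satisfies this is never appended to the path.
    Returns (total_power, path), or (inf, None) if the destination is unreachable.
    """
    best = [math.inf, None]

    def collected(senders, powers, j):
        # Normalized information node j has received from the powered nodes.
        return sum(p / d.get((k, j), math.inf) for k, p in zip(senders, powers))

    def extend(path, powers):
        last = path[-1]
        for j in nodes:
            if j in path:
                continue
            residual = 1.0 - collected(path[:-1], powers, j)
            link = d.get((last, j), math.inf)
            if residual <= 0 or math.isinf(link):
                continue  # j already decodes (pruned) or cannot hear `last`
            new_powers = powers + [link * residual]
            total = sum(new_powers)
            if total >= best[0]:
                continue  # powers only add up along a path: bound and prune
            if j == destination:
                best[0], best[1] = total, path + [j]
            else:
                extend(path + [j], new_powers)

    extend([source], [])
    return best[0], best[1]

# Hypothetical symmetric 4-node network (illustrative values only).
example_links = {(1, 2): 10.0, (1, 3): 34.0, (1, 4): 42.0,
                 (2, 3): 25.0, (2, 4): 30.0, (3, 4): 5.0}
example_d = {}
for (a, b), v in example_links.items():
    example_d[(a, b)] = example_d[(b, a)] = v

total, path = best_sequential_path([1, 2, 3, 4], example_d, 1, 4)
```

On this made-up network the search returns 1 → 2 → 3 → 4 with total power about 28.52. A branch such as 1 → 3 → 2 → 4 would be discarded in any case, since node 2 already decodes once node 1 transmits at power 34.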
The following heuristic algorithms are based on Theorems 2 and 3. From Theorem 2, although it is still difficult to find an optimal path, we may try to search for a path that is close to optimum. We then enforce the optimal power allocation policy in Theorem 3 on the path selected.

Heuristic algorithm 1: CTNCR
A traditional noncooperative multihopping algorithm finds a shortest path assuming no interference from upstream nodes and, in general, generates a suboptimal path. It can, however, be a starting point for finding a good relay-signaling cooperative path. In this heuristic, we first find a shortest noncooperative path using the standard Dijkstra algorithm with the link-based metric, and then use the power allocation policy of Theorem 3 to determine the overall power consumption and possibly remove some nodes from the path. The algorithm works as follows.
Step 1 (initialization). Find a noncooperative path P using Dijkstra's algorithm. Set the transmission power of all nodes in P to 0. Set the source as the active node, which is the only one that can adjust transmission power.
Step 2. Among the active node's downstream nodes on P that have not satisfied (12), find the node K that requires the least transmission power from the active node to decode the message (i.e., to satisfy condition (12)). Remove the nodes between the active node and K from P. Set K as the active node.
Step 3 (stop criterion). If K is the destination, stop: the updated P is the final path, with the node transmission powers determined in Step 2. Otherwise, go to Step 2.
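A minimal Python sketch of CTNCR follows Steps 1 to 3 literally: Dijkstra's algorithm on the link metric $d_{ij}$, then the greedy choice of K with the power allocation of Theorem 3. It assumes the normalized form $\sum_k P_k/d_{kj} \ge 1$ of constraint (12), uses a binary heap (Python's `heapq`) rather than the $O(N^2)$ array version of Dijkstra's algorithm, and runs on a hypothetical four-node network.

```python
import heapq
import math

def dijkstra_path(nodes, d, source, destination):
    """Step 1: noncooperative shortest path under the link metric d[(i, j)]."""
    dist = {v: math.inf for v in nodes}
    prev = {}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        if u == destination:
            break
        for v in nodes:
            w = d.get((u, v))
            if w is not None and du + w < dist[v]:
                dist[v] = du + w
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    path = [destination]
    while path[-1] != source:
        path.append(prev[path[-1]])  # KeyError if the destination is unreachable
    return path[::-1]

def ctncr(nodes, d, source, destination):
    """Steps 1-3 of CTNCR under the assumed normalized form of constraint (12)."""
    path = dijkstra_path(nodes, d, source, destination)
    powers = {}            # transmit powers fixed so far
    kept = [source]        # nodes that remain on the cooperative path
    active = 0             # position of the active node on `path`
    while kept[-1] != destination:
        best = None
        for pos in range(active + 1, len(path)):
            j = path[pos]
            residual = 1.0 - sum(p / d.get((k, j), math.inf) for k, p in powers.items())
            if residual <= 0:
                continue   # j already satisfies (12)
            need = d.get((path[active], j), math.inf) * residual
            if best is None or need < best[0]:
                best = (need, pos)
        if best is None:   # every remaining node, destination included, already decodes
            kept.append(destination)
            break
        powers[path[active]] = best[0]
        active = best[1]   # nodes strictly between the old and new active node are dropped
        kept.append(path[active])
    return kept, sum(powers.values())

# Hypothetical symmetric 4-node network (illustrative values only).
example_links = {(1, 2): 10.0, (1, 3): 34.0, (1, 4): 42.0,
                 (2, 3): 25.0, (2, 4): 30.0, (3, 4): 5.0}
example_d = {}
for (a, b), v in example_links.items():
    example_d[(a, b)] = example_d[(b, a)] = v

route, total_power = ctncr([1, 2, 3, 4], example_d, 1, 4)
```

On this network, Dijkstra's algorithm picks 1 → 3 → 4 (link cost 39), and CTNCR keeps both hops with total power about 34.95; no node is removed because node 3 needs less of node 1's power than the destination does.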
The computational complexity of Dijkstra's algorithm is $O(N^2)$ [26]. In Step 2, there are |P| − 1 iterations, and the number of operations in each iteration is proportional to |P|, so in the worst case the computation in Step 2 is $O(|P|^2)$. Thus the computation of CTNCR is $O(N^2 + |P|^2)$. Since |P| ≤ N, the worst-case computation of CTNCR is $O(N^2)$.

Heuristic algorithm 2: SNER
This heuristic algorithm is essentially a greedy algorithm similar to the Prim-Dijkstra spanning-tree algorithm but it stops whenever the destination is included in the tree. The algorithm works as follows.
Step 4 (initialization). Form a decoded set $\Xi_d$ containing only the source node, and a nondecoded set $\Xi_n = V - \Xi_d$, where V is the set of all nodes.
Step 5. For each node $K \in \Xi_n$, find as its predecessor the node T in $\Xi_d$ that requires the least total power consumption for K to satisfy (12) under the recursive power-filling procedure. Record the path and the corresponding overall power allocation for K to satisfy (12). Among all $K \in \Xi_n$, find the node that requires the least overall power, denote it by $K_{\min}$, add it to $\Xi_d$, and remove it from $\Xi_n$.
Step 6 (stop criterion). If $K_{\min}$ is the destination, stop; otherwise, go to Step 5.
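SNER can be sketched in Python in the same style, again assuming the normalized form $\sum_k P_k/d_{kj} \ge 1$ of constraint (12). For clarity, the sketch recomputes the power-filling of every candidate path from scratch, so each comparison costs more than the constant-time comparison assumed in the operation count for SNER. The four-node network is hypothetical, and the function names are illustrative.

```python
import math

def power_fill_total(path, d):
    """Total power of recursive power-filling along `path`; inf if the path is ineligible."""
    powers = []                        # powers[k] belongs to path[k]
    for pos in range(1, len(path)):
        j = path[pos]
        got = sum(p / d.get((k, j), math.inf) for k, p in zip(path, powers))
        link = d.get((path[pos - 1], j), math.inf)
        if got >= 1.0 or math.isinf(link):
            return math.inf            # j decodes too early, or cannot hear its predecessor
        powers.append(link * (1.0 - got))
    return sum(powers)

def sner(nodes, d, source, destination):
    """Steps 4-6 of SNER (sketch): grow the decoded set greedily until the destination joins."""
    decoded = {source: [source]}       # Xi_d: node -> recorded path that lets it decode
    undecoded = set(nodes) - {source}  # Xi_n
    while undecoded:
        best_cost, k_min, k_path = math.inf, None, None
        for k in sorted(undecoded):
            for t_path in decoded.values():
                cost = power_fill_total(t_path + [k], d)
                if cost < best_cost:
                    best_cost, k_min, k_path = cost, k, t_path + [k]
        if k_min is None:
            raise ValueError("no remaining node can be reached")
        decoded[k_min] = k_path
        undecoded.remove(k_min)
        if k_min == destination:
            return k_path, best_cost
    raise ValueError("destination not in nodes")

# Hypothetical symmetric 4-node network (illustrative values only).
example_links = {(1, 2): 10.0, (1, 3): 34.0, (1, 4): 42.0,
                 (2, 3): 25.0, (2, 4): 30.0, (3, 4): 5.0}
example_d = {}
for (a, b), v in example_links.items():
    example_d[(a, b)] = example_d[(b, a)] = v

route, total = sner([1, 2, 3, 4], example_d, 1, 4)
```

On this network SNER first adds node 2 (power 10), then node 3 through node 2 (about 27.65), and finally the destination through 1 → 2 → 3 → 4 with total power about 28.52.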
To estimate the computation required by the SNER algorithm, note that in the worst case there are N − 1 iterations. In each iteration, for each node $K \in \Xi_n$, we need $N - |\Xi_n|$ comparisons, so each iteration requires $|\Xi_n|(N - |\Xi_n|)$ operations. In the worst case, the computation of SNER is $\sum_{i=1}^{N-1} i(N-i) = N \cdot \frac{N(N-1)}{2} - \frac{(N-1)N(2N-1)}{6} = \frac{N^3 - N}{6}$, so the computational complexity of SNER is $O(N^3)$. However, since we only need |P| − 1 iterations, the actual computation of SNER is

COMPLEXITY-CONSTRAINED NETWORKS
In our previous discussion, every node is assumed to be able to store and process all related received signals to decode a message. In some applications, the relays may have only limited memory and signal processing capability and thus cannot combine all these signals, especially if the path is long. Moreover, the signals received from remote upstream nodes contribute little information or interference to the decoding of the message, so it may not pay off to include them. We may therefore treat them as pure noise, possibly at the cost of only a slight increase in overall power consumption. We hence consider a variation of the decode-forward relaying path problem with the added constraint that the relays and the destination decode each message based only on the F most recent received signals. The encoding scheme is the same as in Section 2; the difference lies in the decoding at the relays and the destination, whose decoding windows have size at most F. Note that the relays in level i, i ≤ F, can use all the related received signals. We still assume that a node can subtract all interference from downstream nodes: since a node has already decoded the messages that downstream nodes are transmitting, it knows precisely what signals they transmit. This interference subtraction is much less complex than the joint decoding required to handle the signals transmitted by upstream nodes, so the algorithm is complexity constrained. In practice, the complexity could be reduced further by subtracting only the signals from the first few downstream nodes.
Again using the parallel channels argument [25], for node i with decoding order l ≥ 1 in a path P to decode a message at rate R, we need condition (17), where $(x)^+ = \max(0, x)$ and, again for notational simplicity, we use the convention $\sum_{k=0}^{-1} P_k |h_{ik}|^2 = 0$. Notice that (17) contains no interference from downstream nodes, in accordance with the assumption of interference subtraction for downstream nodes.
To reduce the complexity of signal processing at the relays and the destination, it is always desirable to keep F small. On the other hand, to be more power efficient, it is desirable to choose a larger F. Therefore, there is a tradeoff in properly selecting the value of F. Again, as in the unlimited signal processing case, any optimal multilevel relaying structure can be converted to a relaying path without increasing power consumption.
Theorem 4. For any channel realization H, rate requirement R, and signal processing length F, there always exists a sequential path P with a corresponding power allocation policy T(P) that minimizes overall power consumption.
Proof. The proof is essentially the same as for Theorem 2. The only difference is that f (S) is changed to f (S, T(S)), that is, it also depends on the original power allocation policy.
Similarly, the optimal power allocation policy T(P) for any path P with limited signal processing is still given by the recursive power-filling procedure as before.
Theorem 5. For a sequential relaying path P with limited signal processing capability, the optimal power allocation policy can be implemented by a recursive power-filling procedure as stated in Theorem 3.
The proof is similar to the proof of Theorem 3. The two heuristic algorithms CTNCR and SNER can be easily adapted to the limited signal processing capability case. Here we only consider the variation of SNER algorithm and we denote the SNER algorithm with signal processing length L as SNERvL.

NUMERICAL RESULTS
In this section, we illustrate the performance of relay-channel signaling by simulation. Since our results depend only on the amplitudes of the channel gains $h_{ij}$, we model only $|h_{ij}|$. In our simulations, $|h_{ij}|$ is determined by the distance between i and j, the path loss exponent n, and a factor $\alpha_{ij}$ that is either a constant or a random variable. We consider two cases.
(1) $\alpha_{ij} = 1$ for all i, j. In this case, a signal is attenuated only by path loss, and the randomness of the channel realization comes from the random movement of the nodes. (2) $\alpha_{ij}$ is a unit-variance Rayleigh distributed random variable. A signal is then attenuated not only by path loss but also by small-scale fading characterized by the parameters $\alpha_{ij}$. All $\alpha_{ij}$'s are assumed to be mutually independent.
A typical value of the path loss exponent n is between 2 and 5.
In our simulations, we consider the cases when n = 2, a low attenuation regime; and n = 4, a high attenuation regime.
To simulate the random movement of nodes, for each channel realization we randomly place all the nodes, in our simulations 20 or 50 nodes, in a 100 × 100 grid and randomly pick two of them as the source-destination pair. For n = 2, we consider a desired rate of either R = 0.5 or R = 1; and for n = 4 a desired rate of either R = 0.5 or R = 2. The results are based on 100 000 simulation runs for each case.
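One channel realization of this simulation setup can be sketched in Python with the standard library only. The sketch assumes the common path loss form $|h_{ij}| = \alpha_{ij} r_{ij}^{-n/2}$ with $r_{ij}$ the distance between i and j, reads "unit-variance Rayleigh" as $E[\alpha_{ij}^2] = 1$, and treats the 100 × 100 grid as a continuous square; all three are assumptions, and the function name is hypothetical.

```python
import math
import random

def channel_realization(num_nodes, n, fading, seed=None, side=100.0):
    """One random network drop and its channel amplitudes |h_ij| (illustrative sketch).

    Assumed model: |h_ij| = alpha_ij * r_ij ** (-n / 2), with r_ij the Euclidean
    distance. Without fading alpha_ij = 1; with fading each alpha_ij is an
    independent Rayleigh variable with E[alpha_ij ** 2] = 1.
    Nodes are placed uniformly in a continuous side x side square (an assumption).
    """
    rng = random.Random(seed)
    pos = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(num_nodes)]
    h = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(num_nodes):
            if i == j:
                continue
            r = math.dist(pos[i], pos[j])
            if fading:
                # Magnitude of a unit-power complex Gaussian: Rayleigh with E[alpha^2] = 1.
                alpha = math.hypot(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
            else:
                alpha = 1.0
            h[i][j] = alpha * r ** (-n / 2)
    source, destination = rng.sample(range(num_nodes), 2)
    return pos, h, source, destination

# One drop per case: 20 nodes with path loss only (n = 2), 50 nodes with fading (n = 4).
pos20, h20, s20, t20 = channel_realization(20, 2, fading=False, seed=1)
pos50, h50, s50, t50 = channel_realization(50, 4, fading=True, seed=2)
```

Repeating such drops (100 000 per case in the paper) and recording whether each algorithm meets the rate requirement within a total power budget gives outage curves of the kind shown in Figures 3 to 8.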
The noncooperative multihopping routes are found by the Bellman-Ford algorithm using the link-based metric. As in Section 5, we assume that nodes can subtract interference from all downstream nodes. Traditional multihopping systems most likely do not have this ability, so the noncooperative multihopping curves should be seen as a lower bound on the outage probability of practical multihopping. Multihopping is therefore identical to SNERv1, except that SNERv1 uses interference-sensitive routing. The optimal solution for network size 20 is found by exhaustive search over all paths according to Theorems 2 and 3. The simulation results are presented in Figures 3, 4, 5, 6, 7, and 8, which show the outage performance of the various algorithms under different total power constraints. The first thing to notice is that in all the 20-node cases, the heuristic SNER performs essentially identically to the optimum, while the less complex CTNCR performs slightly worse. We do not present the optimal solutions for network size 50 because of the overwhelming computational load, but based on the results for network size 20 we can expect SNER to be representative of the optimal solution for size 50 as well.
The second remarkable result is the qualitative difference between the low-rate case (R = 0.5) and the high-rate case (R = 1 or R = 2). In the low-rate case, the gain from cooperation is limited: at most 5 dB¹ for n = 2 and network size 50, and no gain at all in the high-attenuation case n = 4. For high rate, on the other hand, the gain from cooperation is very large, up to 18 dB in Figure 5. Recall that the noncooperation curve is actually a lower bound for practical multihopping, so the gain could well be even larger. This indicates that a main advantage of cooperation is interference avoidance: interference increases with rate for traditional multihopping, while relay-channel signaling avoids interference completely. The results for n = 4 confirm the findings in [16, 27, 28] that multihopping is a reasonable choice, but only in the high-attenuation/low-rate regime.
The results for SNERvL show that full relay-channel signaling is not necessary to obtain significant gains. In all cases considered, SNERv4 comes very close to optimal relay-channel signaling, so it suffices to decode the transmissions of the 4 "nearest" upstream neighbors.

CONCLUSIONS
In this paper, we show that the optimal operation of an asynchronous Gaussian multiple-relay channel with decode-forward signaling is given by a path with a corresponding simple power allocation policy. This reduces the complexity of finding the optimal solution, although the complexity is still exponential. We therefore propose heuristic polynomial-time algorithms for path finding, and numerical results show that these heuristic algorithms give solutions very close to the optimal solution.
¹ All dB gains discussed are for outage probability $10^{\dots}$.
Our numerical results show that in the low-attenuation regime, both with and without Rayleigh fading, cooperation through relay-channel signaling yields significant gains over traditional noncooperative operation. The gains increase as the rate increases because of the interference explosion in a noncooperative algorithm. In the high-attenuation regime, however, for low rate, more traditional multihopping operation that uses single-signal-based decoding can be a quite reasonable choice, as cooperation brings little gain. For high rate, the cooperative algorithms still show significant gain because of the poor performance of the traditional multihopping algorithm, which, however, may be greatly improved by carefully choosing paths to avoid heavy interference. The heuristic algorithms developed here for calculating rate can be used as a starting point for developing practical routing algorithms for relay channels. One challenge, however, is the assumption of full network information at each node. This requirement can be mitigated by further simplifying the proposed heuristic algorithms. For example, we may simplify SNERvL further by finding the path using rough channel state information, for example, the positions of nodes, and cancelling only the interference from the transmissions of the L most immediate downstream nodes. In this case, a node only needs to know the positions of the other nodes and to have perfect knowledge of the channel gains between itself and its 2L closest nodes in the selected path. The heuristic algorithms can also be adapted to distributed (distance-vector or link-state-based) versions.
Figure 6 (curves: Noncooperative, CTNCR, SNER, SNERvL): Outage probability versus total power consumption for path loss exponent 2 and network size 50 with path loss and Rayleigh fading.
Another basic assumption is that nodes operate in full-duplex mode. It will be interesting to extend the results to the half-duplex case, which, however, is not trivial, as it involves an additional complicated scheduling problem of time slots or frequency bands. Another interesting problem for future work is the optimization problem when nodes have individual power constraints in addition to a global power constraint.

ACKNOWLEDGMENT
This work was supported in part by NSF Grant CCR03-29908.

INTRODUCTION
In recent years, multihop relaying ad hoc networking has attracted a lot of interest for its flexibility to achieve broad coverage without any infrastructure. Many new techniques have been adopted in ad hoc networks to improve performance in the physical layer, for example, multiple-input multiple-output (MIMO) and orthogonal frequency division multiplexing (OFDM). MIMO systems take advantage of the spatial diversity obtained by spatially separated antennas in a multipath scattering environment. MIMO can provide either a diversity gain to combat signal fading or a capacity gain, for example, through space-time block coding (STBC) [1], vertical Bell Laboratories layered space-time (V-BLAST) [2], and singular value decomposition (SVD) diversity [3]. MIMO techniques have thus shown great potential to improve system capacity in the physical layer [4, 5]. OFDM, on the other hand, has become a popular technique for transmitting signals over broadband wireless channels, since it provides very high spectral efficiency, combats multipath fading, and can be implemented simply with the fast Fourier transform (FFT) at low receiver complexity. OFDM has been adopted in several wireless standards, such as the IEEE 802.11a wireless local area network (WLAN) standard [6] and IEEE 802.16a [7]. For high data-rate transmission, the multipath environment results in a frequency-selective MIMO channel. OFDM can transform such a frequency-selective MIMO channel into a set of parallel frequency-flat MIMO channels and therefore decrease receiver complexity. The combination of MIMO and OFDM can provide higher data rates, reduce the fading of a single link with space-time-frequency codes, mitigate interference by using extra spatial degrees of freedom, and allow simultaneous communication with different nodes using combinations of spatial multiplexing and interference cancellation [7–10].
In this paper, we propose a new transceiver architecture with the capabilities of signal separation and interference cancellation of MIMO-OFDM, in a virtual MIMO scheme combined with OFDM and space-time coding at the transmit nodes. At the receive nodes, multiple antennas are used to separate the independent data flows from different transmit nodes.

This new transceiver architecture allows multiple independent data flows to be transmitted on the same channel simultaneously, which gives the system multipacket reception (MPR) capability. MPR presents new challenges for medium access control (MAC) in wireless networks, since classical MAC schemes are designed to allow only one user in a neighborhood [6, 11–14]. Currently, most research on MAC schemes with MPR focuses on centrally controlled systems, for example, [15]. Multihop ad hoc networks, however, lack the aid of central controllers. In the literature, several MAC schemes have been proposed for distributed ad hoc networks. Stream controlled multiple access (SCMA), proposed in [16], can optimize the selection of streams at the transmit nodes, but SCMA requires a lot of information exchange between nodes. In [17], mitigating interference using multiple antennas MAC (MIMA-MAC) is proposed to mitigate interference from neighboring nodes. The simulations in [17] show improved fairness and throughput over traditional carrier sense multiple access with collision avoidance (CSMA/CA). However, MIMA-MAC inherits the exposed node and hidden node problems associated with CSMA/CA [13, 18, 19]. In [19], Tobagi and Kleinrock proposed busy-tone multiple access (BTMA) to alleviate the hidden node problem in a network with a base station. When the base station senses the transmission of a terminal, it broadcasts a busy-tone signal to all terminals, keeping them (except the current transmitter) from accessing the channel. Based on BTMA, Wu and Li proposed the receiver-initiated busy-tone multiple access scheme (RI-BTMA) [11] for ad hoc networks. The total spectrum resource is divided into two subbands: one is used to transmit busy-tone signals and the other to transmit data. The busy tone is used to acknowledge the channel access request and to prevent transmissions from other nodes.
RI-BTMA solves both the hidden node problem and the exposed node problem. In this paper, based on RI-BTMA, we propose a new MAC protocol, multiple-antennas receiver-initiated busy-tone multiple access (MARI-BTMA). MARI-BTMA uses multiple busy tones to notify other nodes of the number of transmissions currently ongoing in the system. To improve performance at low traffic load, we also propose 1-persistent MARI-BTMA, and we introduce an adaptive scheme based on the traffic load. Because the transceiver architecture uses OFDM, subbands of the OFDM signals can be used to transmit busy tones, so the overhead of the busy tones is proportionally small. Performance analysis and simulation results show that MARI-BTMA performs much better than RI-BTMA and CSMA/CA. The paper is organized as follows. Section 2 presents the new transceiver architecture. The proposed MAC scheme MARI-BTMA is presented in Section 3. Throughput and delay analysis of MARI-BTMA is given in Section 4, simulation results are given in Section 5, and conclusions are drawn in Section 6.

TRANSCEIVER ARCHITECTURE WITH MIMO-OFDM
In this section, the new transceiver architecture with MIMO-OFDM is presented. Suppose there are six nodes in the network shown in Figure 1, where node 3 is the relay node for nodes 1 and 2.
The destination of nodes 1 and 2 is node 6, and node 4 has data to send to node 5. In the physical layer, MIMO and OFDM are used to separate signals and cancel interference. Thus, nodes 1 and 2 can transmit to node 3 at the same time by signal separation, and the transmissions from node 3 to node 6 and from node 4 to node 5 can also take place simultaneously by interference cancellation. The physical-layer transceiver architecture with MIMO-OFDM is shown in Figure 2. Suppose each node has $n_a$ antennas. A single space-time (or space-frequency) encoder is employed on these $n_a$ antennas. The space-time encoder takes a single stream of binary input data and transforms it into $n_a$ parallel streams of baseband constellation symbols. Each stream is broken into OFDM blocks. Each OFDM block of constellation symbols is transformed using an inverse fast Fourier transform (IFFT) and transmitted by the antenna for its corresponding stream, so all $n_a$ transmit antennas transmit the transformed symbols simultaneously. At the receive nodes, the received signal at each antenna is similarly broken into blocks and processed using an FFT. Then, an interference cancellation scheme is implemented by a space-time processor. The interference cancellation scheme attempts to separate the received signal due to one space-time encoder from the received signal due to the other. After this cancellation, maximum-likelihood sequence estimation (MLSE) decoding is employed, followed by successive interference cancellation. The detailed algorithm can be found in [8, 9]. All these algorithms require perfect synchronization. To recover the data flows from independent nodes, the number of transmit nodes must be less than or equal to the number of receive antennas at the receiver node. In this paper, we assume without loss of generality that each node has two antennas; the scheme extends to more than two antennas.
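The OFDM part of this chain can be illustrated with a single-antenna Python sketch. It assumes a cyclic prefix at least as long as the channel memory (an assumption of this sketch), uses a naive DFT in place of the FFT, and omits the space-time encoder and the interference cancellation entirely. It shows why the IFFT/FFT pair turns a frequency-selective multipath channel into one flat gain per subcarrier, so that a one-tap equalizer recovers the symbols. The tap values and function names are hypothetical.

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT; stands in for the FFT block."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT; stands in for the IFFT block."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def ofdm_over_multipath(symbols, taps, cp_len):
    """Send one OFDM block through a multipath channel and equalize per subcarrier.

    Assumes len(taps) - 1 <= cp_len, so the cyclic prefix absorbs the channel memory.
    """
    n = len(symbols)
    x = idft(symbols)                               # IFFT at the transmit antenna
    tx = (x[-cp_len:] if cp_len else []) + x        # prepend the cyclic prefix
    rx = [sum(taps[l] * tx[t - l] for l in range(len(taps)) if t >= l)
          for t in range(len(tx))]                  # linear convolution with the channel
    Y = dft(rx[cp_len:cp_len + n])                  # drop the prefix, FFT at the receiver
    H = dft(list(taps) + [0.0] * (n - len(taps)))   # flat gain seen by each subcarrier
    return [Y[k] / H[k] for k in range(n)]          # one-tap equalizer per subcarrier

qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
recovered = ofdm_over_multipath(qpsk, taps=[1.0, 0.5, 0.25], cp_len=2)
```

Because the prefix makes the channel act as a circular convolution on each block, the receiver sees $Y_k = H_k X_k$ on subcarrier k, and dividing by $H_k$ returns the transmitted QPSK symbols up to rounding error.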

MARI-BTMA
In the previous section, multiple nodes were allowed to transmit at the same time thanks to the advanced transceiver architecture in the physical layer. In this section, our proposed MAC protocol, MARI-BTMA, is presented. MARI-BTMA is designed based on the conventional RI-BTMA [11].
In RI-BTMA, the available frequency band is divided into two parts: a control channel and a data channel. Busy-tone signals are transmitted on the control channel, while data is transmitted on the data channel. When a node has data to transmit, it first senses the busy-tone channel. If the busy-tone channel is free, a preamble packet containing the identification of the destination node is sent. Once the preamble is received correctly by the intended receiver, the receiver sets up an out-of-band busy tone and waits for the data packet. The transmitter, upon sensing the busy tone, sends the data packet to the destination. RI-BTMA is thus designed to accept only one user in a neighborhood. To accept more than one user, we design a multiple-busy-tone scheme, MARI-BTMA. In MARI-BTMA, the total spectrum resource is divided into several control channels and one data channel. The number of control channels equals the number of busy-tone signals and the number of nodes that can transmit simultaneously in the system. Since we assume two antennas at each node and two independent data flows to separate, two control channels are used in the following. Packets are transmitted in frames. The structure of MARI-BTMA is shown in Figure 3. One MARI-BTMA frame is divided into two subframes: one (the contention period) is used to transmit preambles to access the system, and the other (the contention-free period) is used to transmit data. In the contention period, similar to the 802.11 MAC protocol and MIMA-MAC in [17], a backoff scheme is used to avoid collisions of the preambles sent by more than two nodes in a highly loaded system. The contention period is divided into minislots, whose length depends on the transmission time and detection delay of the busy tones. A longer contention period reduces the probability of preamble collisions but increases the overhead.
Therefore, the optimal length of the contention period balances these two effects. In the following, we give a detailed description of MARI-BTMA. Since the throughput is not stable when the traffic load is very low, as shown later in the throughput analysis, we also propose 1-persistent MARI-BTMA and an adaptive MARI-BTMA.

Basic MARI-BTMA
In basic MARI-BTMA, only the nodes that have data to transmit at the beginning of a frame contend to access the system. A node that generates data in the middle of a frame has to wait a random interval until it is scheduled to transmit at the beginning of a frame. All the nodes with data to transmit at the beginning of a frame then select one minislot in the contention period uniformly at random and sense the control channels.
(i) If a node senses two busy tones, it does not transmit a preamble in this frame and waits a random interval until it is scheduled to transmit at the beginning of a frame. (ii) If a node senses one busy tone, it transmits a preamble. If the preamble is successfully received by the intended receiver, the receiver sets up a busy tone on the other, free control channel.

(iii) If a node senses no busy tone, it sends a preamble, and the receiver sets up a busy tone on either of the control channels once it receives the preamble correctly.
Only the nodes receiving the busy tone from their destination nodes can transmit data in the contention-free period. By sensing the number of busy tones, all the destination nodes learn how many independent data flows to recover.
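The contention rules (i) to (iii) can be sketched for one frame in Python. The sketch assumes, in line with the analysis assumptions stated later, that all nodes are in range of each other and that a preamble succeeds only when it is alone in its minislot (assumption (ix)); it additionally assumes that every contender has a distinct receiver and that busy tones are detected within one minislot. Deferring nodes are simply not admitted; their rescheduling is not modeled. Function names are hypothetical.

```python
import random

def admitted_flows(slot_choices, max_flows=2):
    """One contention period of basic MARI-BTMA (sketch).

    slot_choices[i] is the minislot picked by contending node i. Assumptions:
    all nodes hear each other, every contender has a distinct receiver, busy tones
    are detected before the next minislot, and a preamble succeeds only if it is
    alone in its minislot. Returns admitted node indices in order of admission.
    """
    by_slot = {}
    for node, slot in enumerate(slot_choices):
        by_slot.setdefault(slot, []).append(node)
    busy_tones = 0
    admitted = []
    for slot in sorted(by_slot):
        if busy_tones >= max_flows:
            break                  # later contenders sense max_flows busy tones and defer
        senders = by_slot[slot]
        if len(senders) == 1:      # lone preamble: its receiver raises a busy tone
            admitted.append(senders[0])
            busy_tones += 1
        # two or more preambles collide, so no busy tone is raised for that minislot
    return admitted

def simulate_frame(num_contenders, m1, rng):
    """Each contender picks one of the m1 minislots uniformly at random."""
    return admitted_flows([rng.randrange(m1) for _ in range(num_contenders)])

print(admitted_flows([3, 0, 0, 5, 1]))   # [4, 0]
```

For the slot choices [3, 0, 0, 5, 1], nodes 1 and 2 collide in minislot 0, nodes 4 and 0 are admitted in minislots 1 and 3, and node 3 then senses two busy tones and defers.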

1-persistent MARI-BTMA
When the traffic load is low, the basic protocol does not work well, since all the packets generated during a frame are ignored. According to [14], when the traffic load is very low, 1-persistent CSMA can greatly improve the throughput of the system. Similar to 1-persistent CSMA, we propose 1-persistent MARI-BTMA in this subsection. In 1-persistent MARI-BTMA, instead of waiting random intervals until the beginning of a frame, all the nodes that generate data in the middle of a frame contend to access the system at the beginning of the next frame. All the nodes with data to transmit at the beginning of a frame then select a backoff minislot in the contention period and sense the busy-tone channels, as in basic MARI-BTMA.

Adaptive MARI-BTMA
From the previous two subsections, 1-persistent MARI-BTMA is suited to low traffic loads, while basic MARI-BTMA is suited to high traffic loads. Therefore, in this subsection, we present an adaptive MARI-BTMA that combines the advantages of both: 1-persistent MARI-BTMA is used when the traffic load is low, and basic MARI-BTMA is adopted when the traffic load is high. The switch point between the 1-persistent scheme and the basic scheme depends on the frame length of MARI-BTMA and on the traffic load.

PERFORMANCE ANALYSIS
In this section, the throughput of MARI-BTMA is analyzed using the method developed by Tobagi and Kleinrock in their studies of CSMA and BTMA [18,19]. The network model consists of a large number of terminals communicating with each other over a single channel, all within range of each other. We make the following assumptions for the MARI-BTMA protocol and its analysis.
(i) There are $N$ nodes in the system. (ii) Each node has two antennas; if more than two nodes transmit simultaneously, the receiver cannot recover the original signals correctly. Correspondingly, in RI-BTMA only one node can access the system at a time, that is, there is no capture effect on the channel. (iii) Packet collisions are the only source of packet errors.
(iv) The busy-tone signals and the data signals have the same transmission range. (v) The interference between the busy-tone signals and the data signals is negligible. (vi) The bandwidth consumed by the busy tones is negligible compared with the bandwidth of the data channel. (vii) The number of minislots in a contention period is $m_1$, and the number of minislots in the contention-free period, which equals the packet length, is $m_2$; the frame length is therefore $m_1 + m_2$. (viii) Packet arrivals at each node, including newly generated and rescheduled packets, form a Bernoulli process with probability $p$ per minislot. A rescheduled packet is one that waits a random interval and then tries again. (ix) A preamble is received successfully only if it is the only preamble transmitted in its minislot.

Throughput of basic MARI-BTMA
Suppose there are currently $M$ nodes with packets to transmit at the beginning of a frame. These $M$ nodes first randomly select a minislot in the contention period. Let $E_i$ denote the event that exactly one node chooses the $i$th minislot, that is, there is no collision in the $i$th minislot; we call such a minislot "collision-free." Then the probability of at least one collision-free minislot among the $m_1$ minislots of the contention period is

$$P_1(m_1, M) = \Pr\Big(\bigcup_{i=1}^{m_1} E_i\Big) = \sum_{n=1}^{\min(m_1, M)} (-1)^{n+1} \sum_{i_1 < i_2 < \cdots < i_n} \Pr(E_{i_1} E_{i_2} \cdots E_{i_n}), \quad (1)$$

where $\Pr(E_{i_1} E_{i_2} \cdots E_{i_n})$ is the probability that a specific set of minislots $\{i_1, i_2, \ldots, i_n\}$ is collision-free, that is, each of these $n$ minislots is selected by exactly one node:

$$\Pr(E_{i_1} E_{i_2} \cdots E_{i_n}) = \frac{M!}{(M-n)!}\,\frac{(m_1-n)^{M-n}}{m_1^{M}}. \quad (2)$$

There are $\binom{m_1}{n}$ such sets. Therefore,

$$P_1(m_1, M) = \sum_{n=1}^{\min(m_1, M)} (-1)^{n+1} \binom{m_1}{n} \frac{M!}{(M-n)!}\,\frac{(m_1-n)^{M-n}}{m_1^{M}}. \quad (3)$$

When there are more than two collision-free minislots, only the nodes in the first two collision-free minislots can send preambles, since there are at most two busy tones in the system. Let $P_2(M)$ be the probability that exactly one preamble is successfully transmitted in the contention period, and let $P_3(M)$ be the probability that two preambles are successfully transmitted. Thus $P_2(M)$ is the probability of exactly one collision-free minislot in the contention period, and $P_3(M)$ is the probability of at least two. The probability that exactly one minislot is collision-free equals the probability that, given one collision-free minislot, none of the remaining $m_1 - 1$ minislots is collision-free. First, we fix attention on a particular collision-free minislot and a particular node selecting that minislot. For the remaining $M - 1$ nodes, $1 - P_1(m_1 - 1, M - 1)$ is the probability that none of the other $m_1 - 1$ minislots is collision-free, so there are $(m_1 - 1)^{M-1}\,[1 - P_1(m_1 - 1, M - 1)]$ ways in which only this minislot, selected by this node, is collision-free. With $m_1$ choices of minislot and $M$ choices of node,

$$P_2(M) = \frac{m_1 M (m_1-1)^{M-1}\,[1 - P_1(m_1-1, M-1)]}{m_1^{M}}, \qquad P_3(M) = P_1(m_1, M) - P_2(M). \quad (4)$$

Each successful preamble allows one packet of $m_2$ minislots to be sent in a frame of $m_1 + m_2$ minislots, so the throughput given $M$ nodes with packets to transmit at the beginning of a frame is

$$S(M) = \frac{m_2\,[P_2(M) + 2P_3(M)]}{m_1 + m_2}. \quad (5)$$

Let $X$ denote the number of packets generated and rescheduled at the beginning of a frame. $X$ is a binomial random variable with parameters $N$ and $p$.
Thus

$$\Pr(X = M) = \binom{N}{M} p^{M} (1-p)^{N-M}. \quad (6)$$

Therefore, the average throughput of the system is

$$S = \sum_{M=0}^{N} \Pr(X = M)\, S(M). \quad (7)$$

In the following analysis, we set $m_2 = 100$ and $N = 100$. Figure 4 gives the throughput of basic MARI-BTMA for different contention periods, with the throughput of RI-BTMA as a benchmark. In [11], Wu and Li calculate the throughput of RI-BTMA as $\eta = (1 + E(\text{length of data portion}))/(E(X) + E(\text{length of data portion}))$ with $E(X) = 1/P_s$, where $P_s$ is the probability of exactly one arrival in the system in a slot. In [11], the preamble is counted as useful information. In this paper, however, we treat the preamble as overhead, so the throughput of RI-BTMA is calculated as

$$S_{\text{RI}} = \frac{m_2}{1/q + m_2}, \quad (8)$$

where $q = Np(1-p)^{N-1}$. Different contention periods yield different performance, as shown in Figure 4. With an appropriately designed contention period, basic MARI-BTMA achieves a much higher throughput than RI-BTMA when the traffic load is high. If the contention period is very short, for example, $m_1 = 2$, the probability that no preamble is successfully transmitted in the contention period is very high; hence, with high probability no data are transmitted in the contention-free period, and the throughput is reduced. If the contention period is very long, for example, $m_1 = 50$, a successful preamble is likely, but the overhead is so high that the throughput is still low. Figure 4 also shows that the higher peak throughput of MARI-BTMA comes with instability of the system when the traffic load is very low or very high. One possible remedy for this instability is to use 1-persistent MARI-BTMA when the traffic load is very low and a better backoff scheme when the traffic load is very high.
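A minimal Python sketch of the combinatorics above, using exact rational arithmetic so that the alternating inclusion-exclusion sum does not lose precision. The function names are hypothetical; `s_given_m` assumes, as in the text, that each successful preamble delivers one packet of $m_2$ minislots in a frame of $m_1 + m_2$ minislots. A brute-force enumerator is included only to cross-check $P_1$ on tiny cases.

```python
from fractions import Fraction
from itertools import product
from math import comb, perm


def p1(m1: int, M: int) -> Fraction:
    """P_1(m1, M): probability that at least one of m1 minislots is chosen
    by exactly one of M nodes (inclusion-exclusion, exact arithmetic)."""
    if m1 == 0 or M == 0:
        return Fraction(0)
    total = 0
    for n in range(1, min(m1, M) + 1):
        # C(m1, n) sets of n minislots; M!/(M-n)! ways to place one distinct
        # node in each; the other M-n nodes avoid those n minislots.
        term = comb(m1, n) * perm(M, n) * (m1 - n) ** (M - n)
        total += term if n % 2 == 1 else -term
    return Fraction(total, m1 ** M)


def p2(m1: int, M: int) -> Fraction:
    """P_2(M): exactly one collision-free minislot (one preamble succeeds)."""
    if M == 0:
        return Fraction(0)
    ways = m1 * M * (m1 - 1) ** (M - 1) * (1 - p1(m1 - 1, M - 1))
    return ways / m1 ** M


def s_given_m(m1: int, m2: int, M: int) -> float:
    """S(M): normalized throughput when M nodes contend in a frame."""
    one = p2(m1, M)
    two = p1(m1, M) - one  # P_3(M): at least two collision-free minislots
    return float(m2 * (one + 2 * two) / (m1 + m2))


def s_average(m1: int, m2: int, N: int, p: float) -> float:
    """Average throughput with X ~ Binomial(N, p) contending nodes."""
    return sum(comb(N, M) * p**M * (1 - p) ** (N - M) * s_given_m(m1, m2, M)
               for M in range(N + 1))


def p1_brute_force(m1: int, M: int) -> float:
    """Cross-check P_1 by enumerating all m1**M choices (tiny cases only)."""
    hits = sum(any(c.count(s) == 1 for s in range(m1))
               for c in product(range(m1), repeat=M))
    return hits / m1 ** M
```

Setting $p = 1$ makes every node backlogged: with $m_1 = 32$, $m_2 = 100$ and $N = 50$, `s_average(32, 100, 50, 1.0)` evaluates to about 1.51, just below the ceiling $2 \cdot 100/132 \approx 1.515$ reached when two preambles always succeed.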

Throughput of 1-persistent MARI-BTMA
Let $Y$ denote the number of packets contending at the beginning of a frame. In 1-persistent MARI-BTMA, $Y$ is a binomial random variable with parameters $N(m_1 + m_2)$ and $p$:

$$\Pr(Y = M) = \binom{N(m_1+m_2)}{M} p^{M} (1-p)^{N(m_1+m_2)-M}. \quad (9)$$

Thus, the average throughput is

$$S = \sum_{M=0}^{N(m_1+m_2)} \Pr(Y = M)\, S(M), \quad (10)$$

where $S(M)$ is obtained from (5). The throughput of 1-persistent MARI-BTMA is shown in Figure 5.
Figure 5 shows that 1-persistent MARI-BTMA greatly improves the throughput when the traffic load is very low. However, when the traffic load is high, its throughput drops very quickly.

Throughput of adaptive MARI-BTMA
From Sections 4.1 and 4.2, 1-persistent MARI-BTMA works very well at low traffic loads, while basic MARI-BTMA works very well at high traffic loads. In this subsection, we investigate the performance of the system using adaptive MARI-BTMA, with $m_1 = 32$, $m_2 = 100$, and $N = 100$. From Figures 4 and 5, the throughput curves of basic MARI-BTMA and 1-persistent MARI-BTMA cross at $p = 0.01$. Therefore, we select $p = 0.01$ as the switch point between the 1-persistent scheme and the basic scheme: when the estimated packet generation probability is less than 0.01, the 1-persistent scheme is used, and when it is larger than 0.01, basic MARI-BTMA is used. Figure 6 shows the performance of adaptive MARI-BTMA.
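The adaptive rule can be sketched as a scheme selector plus the contention timing of each scheme. The switch point 0.01 is the crossover reported for $m_1 = 32$, $m_2 = 100$ and $N = 100$; the EWMA estimator of $p$, its weight, and the uniform backoff of 1 to 8 frames used for the basic scheme are hypothetical choices, since the text does not specify how $p$ is estimated or how the random interval is drawn. At exactly $p = 0.01$ the sketch picks the basic scheme, a tie-break the text leaves open.

```python
import random

SWITCH_POINT = 0.01  # crossover of the two throughput curves for m1=32, m2=100, N=100


def select_scheme(p_estimate: float, switch_point: float = SWITCH_POINT) -> str:
    """Adaptive MARI-BTMA: 1-persistent at low load, basic otherwise."""
    return "1-persistent" if p_estimate < switch_point else "basic"


def update_estimate(p_estimate: float, arrivals: int, minislots: int,
                    weight: float = 0.1) -> float:
    """Hypothetical EWMA estimate of the per-minislot arrival probability p."""
    return (1 - weight) * p_estimate + weight * arrivals / minislots


def contention_frame(scheme: str, current_frame: int, rng: random.Random,
                     max_backoff_frames: int = 8) -> int:
    """Frame at whose start a packet generated during current_frame contends."""
    if scheme == "1-persistent":
        return current_frame + 1  # contend at the very next frame boundary
    # basic: wait a random number of frames (uniform draw is an assumption)
    return current_frame + rng.randint(1, max_backoff_frames)
```

A node would call `update_estimate` once per frame with the number of arrivals observed over the $m_1 + m_2$ minislots, then pass the result to `select_scheme`.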

Saturation throughput analysis
In this subsection, the saturation throughput of basic MARI-BTMA is analyzed. The saturation throughput is obtained when all $N$ nodes in the system always have data to transmit, which corresponds to $p = 1$ in Section 4.1. Then $X = N$ with probability one, and the saturation throughput (ST) is

$$\text{ST} = S(N) = \frac{m_2\,[P_2(N) + 2P_3(N)]}{m_1 + m_2}. \quad (11)$$

Figure 7 shows the theoretical result from (11). Compared with the saturation throughput of 802.11 given in [13], which is around 0.9 for the same window length (32) and number of nodes (50), the saturation throughput of MARI-BTMA is much higher, around 1.5.

Delay performance of MARI-BTMA
In this subsection, the delay performance of basic MARI-BTMA is given. Following the delay analysis of slotted ALOHA in [14], the average delay is obtained as (12), where $p$ is the transmission probability, $N$ is the number of nodes in the system, and $S$ is the throughput from (7). The average delay obtained from (12) is the average number of frames a packet remains backlogged; it can also be expressed in minislots. Similarly, the average delay of basic RI-BTMA is obtained with $S$ calculated from (8). The relation between delay and the number of nodes is given in Figure 8, which shows that the delay increases with the number of nodes in the system and that the delay of MARI-BTMA is much lower than that of RI-BTMA.

SIMULATION RESULTS
In this section, two simulation scenarios are used to compare the performance of the new MAC scheme with that of traditional CSMA/CA and RI-BTMA. In the first scenario, the effect of the physical layer on the throughput is considered. In the second scenario, only the effect of the MAC on the throughput is considered, that is, the collision of more than two packets is the only source of packet errors.

Simulation scenario 1
In this subsection, simulation results are given for both the physical layer and the MAC layer. We use the same simulation scenario as in [17], shown in Figure 9. Node A always has data packets to transmit to node B, and node C always has data packets to transmit to node D. The distances between A and B and between C and D are fixed, while the distance between B and C is varied. All distances in the simulation are relative distances.

Physical layer performance
To simplify the simulation, we use Alamouti space-time coding at the transmitter and the signal separation algorithm described in [9] at the receiver. Improved space-time or space-frequency codes and space-time processors can be used directly in this scheme. Figure 10 shows the packet error rate (PER) versus the signal-to-noise ratio (SNR). In the simulation, all transmit antennas have the same power. A 4-tap frequency-selective channel model is used, with variance equal to the path loss $1/d^4$. The channel state information is assumed to be known at the receiver.
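Because the simulation relies on Alamouti coding, a minimal sketch of the 2×1 Alamouti encoder and linear combiner may help. It assumes flat fading with known channel gains `h1`, `h2` that stay constant over two symbol periods (as on a single OFDM subcarrier) and a single receive antenna; it does not implement the two-user signal separation algorithm of [9] or the 4-tap channel, and all names are illustrative.

```python
from __future__ import annotations


def alamouti_encode(s1: complex, s2: complex) -> list[tuple[complex, complex]]:
    """Two symbol periods; each tuple is (antenna 1 symbol, antenna 2 symbol)."""
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]


def channel(h1: complex, h2: complex, block, noise=(0j, 0j)) -> tuple:
    """Flat fading, one receive antenna: r_t = h1*x1 + h2*x2 + n_t."""
    return tuple(h1 * x1 + h2 * x2 + n for (x1, x2), n in zip(block, noise))


def alamouti_combine(r1: complex, r2: complex, h1: complex, h2: complex):
    """Linear combining with known CSI; outputs are normalized by |h1|^2 + |h2|^2."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat


def bpsk_decide(x: complex) -> int:
    """Hard BPSK decision on the real part."""
    return 1 if x.real >= 0 else -1
```

Without noise the combiner returns the transmitted symbols exactly, and each symbol benefits from the combined gain $|h_1|^2 + |h_2|^2$, which is the two-branch transmit diversity the scheme exploits.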

MAC layer performance evaluation
In this subsection, the normalized throughputs of CSMA/CA and MARI-BTMA are given. The input signal-to-noise ratio … In the simulation, $m_1 = 10$ and $m_2 = 100$. For CSMA/CA, we ignore the overhead of RTS/CTS. Table 1 shows that when node B and node C are close to each other, only one node gains access with CSMA/CA, whereas with MARI-BTMA both nodes can access the system. When node B and node C are far apart, for example, more than 1.5 apart, both CSMA/CA and MARI-BTMA allow both nodes to access the system; however, because of its fixed frame structure, the overhead of MARI-BTMA is slightly higher than that of CSMA/CA. Note that since the PER is on the order of $10^{-2}$, the throughput is dominated by the access-control probability.

Simulation scenario 2
The previous subsection shows that MARI-BTMA works well in a simple scenario. In this subsection, we show that MARI-BTMA also works well in a scalable system. In the simulation scenario given in Figure 11, there are constant data flows from node B to node A, node C to node A, node D to node E, node H to node E, and node G to node F. All the nodes are at the same distance $r$ from each other, and the carrier sensing range is $r$. Both contention periods are 32 minislots long.
Simulation results are given in Table 2, which shows that MARI-BTMA performs better than MIMA [17] and achieves much higher throughput than RI-BTMA and CSMA/CA. MIMA inherits the hidden-node and exposed-node problems of CSMA/CA, which give it a high overhead. Therefore, even though MIMA can serve two users simultaneously, its throughput is still lower than that of RI-BTMA.

CONCLUSION
In this paper, we propose a new transceiver architecture with MIMO-OFDM in the physical layer and MARI-BTMA in the MAC layer. MARI-BTMA uses multiple out-of-band busy-tone signals to indicate the number of users in the system, thereby avoiding collisions of nodes on the same channel. In MARI-BTMA, the packet slot is divided into two subframes: a contention subframe, used to give nodes access, and a contention-free subframe, used by the successfully accessed nodes to transmit data. Two MARI-BTMA protocols are proposed in this paper: basic MARI-BTMA, which is suitable for moderate traffic loads, and 1-persistent MARI-BTMA, which is used in systems with low traffic load. By combining basic MARI-BTMA and 1-persistent MARI-BTMA, an adaptive MARI-BTMA is proposed. The throughput of basic, 1-persistent, and adaptive MARI-BTMA, as well as the delay performance of basic MARI-BTMA, are analyzed. Both the theoretical analysis and the simulation results show that MARI-BTMA performs much better than CSMA/CA, RI-BTMA, and MIMA.

His research interests include signal processing with application to broadband wireless networks, estimation and detection for scalable, adaptive, and robust communications, and propagation studies. He has published numerous journal and conference articles on detection and estimation for scalable, adaptive, and robust broadband wireless communications.

INTRODUCTION
Because sensors have limited energy and a large number of them are difficult to recharge, energy efficiency and network lifetime maximization have been the most important design goals for wireless sensor networks (WSNs). However, channel fading, interference, and radio irregularity pose major challenges to the design of energy-efficient communication and routing protocols in multi-hop WSNs. Because MIMO technology can dramatically increase channel capacity and reduce transmission energy consumption in fading channels [1], cooperative MIMO schemes have been proposed for WSNs to improve communication performance [2][3][4][5]. In these schemes, multiple single-antenna nodes cooperate in transmitting and/or receiving information for energy-efficient communication. Cui et al. [2] analyzed a cooperative MIMO scheme with the Alamouti code for single-hop transmissions in WSNs. Li [3] proposed a transmission-delay and channel estimation scheme that allows such cooperative MIMO schemes to be decoded without transmission synchronization. Li et al. [4] also proposed an STBC-encoded cooperative transmission scheme for WSNs without perfect synchronization. Jayaweera [5] considered the training overhead of such schemes.
However, the above proposals do not take multi-hop routing and distributed operation in WSNs into consideration, which limits the practical use of cooperative MIMO schemes in WSNs. In this paper, we study the feasibility of a cooperative MIMO scheme in multi-hop WSNs, considering the radio irregularity of wireless communications and multi-hop routing together with the cooperative MIMO scheme. On the other hand, clustering is efficient in WSN design because it enables frequency reuse and efficient processing of highly correlated data. Therefore, we incorporate the cooperative MIMO scheme into the LEACH protocol, an efficient clustering protocol owing to its energy-efficient, randomized, adaptive, and self-configuring cluster formation. Because the original LEACH protocol considers only single-hop communication from cluster heads to the sink, we modify it to allow cluster heads to form a multi-hop backbone and incorporate the cooperative MIMO scheme into each single-hop transmission. Based on the proposed scheme, we investigate the energy consumption of each transmission and reception. Then the overall energy consumption model is developed, and the optimal parameters of the scheme, such as the number of clusters and the number of cooperative nodes, are found.

The remainder of the paper is organized as follows. Section 2 describes the design of the proposed cluster-based cooperative MIMO scheme (multi-hop MIMO-LEACH). The overall energy consumption of the proposed scheme is analyzed in Section 3. Section 4 presents simulation results and discussions. Section 5 concludes the paper.

THE MULTI-HOP MIMO-LEACH SCHEME
In this section, we will discuss the proposed multi-hop MIMO-LEACH scheme, which is illustrated in Figure 1. First, the strategy to find appropriate cooperative nodes in the single-hop communications between cluster heads is proposed in Section 2.1. Based on the strategy, the multi-hop MIMO-LEACH scheme is presented in Section 2.2.

Strategy to choose cooperative nodes
To maximize the performance of single-hop communication between cluster heads, an appropriate strategy is needed to choose the optimal cooperative nodes. Suppose that the current cluster head uses $J$ cooperative nodes to transmit data to its neighboring cluster head $t$ with the cooperative MIMO scheme. An AWGN channel with squared power path loss is assumed for intracluster communication. For intercluster communication, we assume that the transmission from each cooperative node experiences frequency-nonselective, slow Rayleigh fading. Furthermore, because the distance between any two nodes in the network is large relative to the wavelength, the fading coefficients of the cooperative nodes are independent. The rationale behind these channel assumptions is that the inter-cluster transmission distance is much larger than the intra-cluster transmission distance, and the propagation environment is more complex for inter-cluster communication.
Denote the distance between node $j$ and its current cluster head by $d_{j1}$, and denote the distance and path-loss exponent for node $j$ to communicate with $t$ by $d_{jt}$ and $k_{jt}$, respectively. In each single-hop transmission, the current cluster head broadcasts a data packet to the cooperative nodes. The cooperative nodes then encode the data with an orthogonal space-time block code (STBC) and transmit it to cluster head $t$, the next hop toward the sink node. The energy consumption of these two operations is modeled in the remainder of this section, and a novel strategy is then developed to find the set of cooperative nodes that minimizes the overall energy consumption. In developing the strategy, we assume that BPSK is the modulation scheme and that the bandwidth is $B$ Hz.

(1) The energy consumption for the intracluster transmission
Denote by $E_{bt}(1)$ the energy consumed by the current cluster head to broadcast one bit to the cooperative nodes. $E_{bt}(1)$ can be broken down into two main components: the transmit energy consumption $E_{btt}(1)$ and the circuit energy consumption $E_{btc}(1)$.
The BER performance of BPSK is

$$P_b = Q\big(\sqrt{2r}\big).$$

Here $r$ is the signal-to-noise ratio (SNR), defined as $r = P_r/(2B\sigma^2 N_f)$ [6] under the assumption of an AWGN channel, where $P_r$ is the received signal power, $\sigma^2$ is the power spectral density of the AWGN, and $N_f$ is the receiver noise figure.
In the high-SNR regime, the Chernoff bound gives the approximation $P_b = e^{-r}$ [6]. Hence, $P_r = -2BN_f\sigma^2\ln(P_b)$. Under the assumption of squared power path loss, $E_{bt}(1)$ can be modelled by (1), where $d_{max}$ is the maximum distance from the cooperative nodes to the cluster head, $\alpha$ is the efficiency of the RF power amplifier, $G_1$ is the gain factor at $d_{max} = 1$ m, $M_l$ is the link margin, $N_f$ is the receiver noise figure, and $P_{ct}$ and $P_{cr}$ are the circuit power consumption of the transmitter and receiver, respectively [2].
Then, (1) can be rewritten as (2), in terms of the channel attenuation $H(d_j)$ between the cluster head and node $j$. According to its definition, $H(d_j)$ can be measured as follows. Let the current cluster head transmit a signal with transmit power $P_{out}$. Then the power of the received signal at its cluster member, node $j$, is $P_{j1} = P_{out}/H(d_j)$. Therefore,

$$H(d_j) = \frac{P_{out}}{P_{j1}}. \quad (3)$$

From (2), the energy consumption of the intracluster transmission, $E_{bt}(1)$, can be reduced by choosing cooperative nodes closer to the cluster head.
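The BPSK error model and the Chernoff approximation used for the intracluster link can be checked numerically. This sketch uses the identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$, so that $Q(\sqrt{2r}) = \tfrac{1}{2}\,\mathrm{erfc}(\sqrt{r})$, and inverts $r = P_r/(2B\sigma^2 N_f)$ with $P_b = e^{-r}$ to obtain the required received power. The function names and the parameter values in the check are hypothetical, not taken from Table 1.

```python
import math


def bpsk_ber(r: float) -> float:
    """Exact BPSK BER in AWGN: Q(sqrt(2r)) = 0.5 * erfc(sqrt(r))."""
    return 0.5 * math.erfc(math.sqrt(r))


def chernoff_ber(r: float) -> float:
    """High-SNR Chernoff approximation used in the text: P_b ~ exp(-r)."""
    return math.exp(-r)


def required_rx_power(p_b: float, bandwidth_hz: float, noise_psd: float,
                      noise_figure: float) -> float:
    """P_r = -2 B N_f sigma^2 ln(P_b), from r = P_r/(2 B sigma^2 N_f) and P_b = e^{-r}."""
    return -2.0 * bandwidth_hz * noise_figure * noise_psd * math.log(p_b)
```

Since $e^{-r}$ upper-bounds $Q(\sqrt{2r})$, sizing the received power with the Chernoff form is conservative: the actual BER at that power is lower than the target.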

(2) The energy consumption for the intercluster transmission
To analyze the energy consumption for inter-cluster transmissions based on the cooperative scheme, denoted by $E_{bt}(2)$, we refine the results in [2]. In [2], an equal transmit power allocation scheme is used because channel state information (CSI) is not available at the transmitter. If the average attenuation of the channel for each cooperative node pair can be estimated, an equal signal-to-noise ratio (SNR) policy [7] can instead be used to allocate the transmit power, owing to its effectiveness and simplicity. The average energy consumption per bit for BPSK transmission in such a scheme can be approximated by (4), where $N_0$ is the single-sided noise power spectral density, $P_b$ is the desired BER, $G_t$ and $G_r$ are the transmitter and receiver antenna gains, respectively, and $\lambda$ is the carrier wavelength [2]. The training overhead and transmission rate are not included in (4); they are considered in Section 3. The average attenuation of the channel for node $j$ can be estimated as follows. Assume the channel is symmetric and $t$ transmits a signal with transmit power $P_{out}$; then the power of the received signal at node $j$ is

$$P_{jt} = \frac{P_{out}}{G(d_{jt}, k_{jt})}, \quad (5)$$

where

$$G(d_{jt}, k_{jt}) = \frac{P_{out}}{P_{jt}} = \frac{(4\pi)^2 d_{jt}^{k_{jt}}}{G_t G_r \lambda^2} M_l N_f.$$

Therefore, (4) can be reformulated as (6). According to (6), the transmit power $P_{out}^{jt}$ that node $j$ uses to communicate with cluster head $t$ can then be determined.

(3) The strategy to choose cooperative nodes
Based on (2) and (6), the overall energy consumption for the single-hop transmission can be written as

$$E_{bt} = E_{bt}(1) + E_{bt}(2). \quad (8)$$

From (8), the energy consumption of the intracluster transmission, $E_{bt}(1)$, and that of the intercluster transmission, $E_{bt}(2)$, must be traded off to minimize $E_{bt}$, which can be done by choosing an appropriate set of cooperative nodes. To simplify the distributed strategy design, the cooperative nodes should be chosen among the nodes whose individual contributions to $E_{bt}$ are small. In addition, to balance the energy consumption, the selection criterion $\beta_{jt}$ also takes into account $E_j$, the remaining energy of node $j$ in the current round. The rationale behind the definition of $\beta_{jt}$ is that a node that offers a good tradeoff between $E_{bt}(1)$ and $E_{bt}(2)$ and has more remaining energy should have a larger chance of being selected as a cooperative node. Therefore, the $J$ nodes with the largest $\beta_{jt}$ are chosen as the cooperative nodes for communicating with cluster head $t$.
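The per-node quantities and the top-$J$ selection can be sketched as follows. `g_model` evaluates the closed form of $G(d_{jt}, k_{jt})$ in linear units, `measured_ratio` is the measurement $P_{out}/P_{\text{received}}$ used for both $H(d_j)$ and $G(d_{jt}, k_{jt})$, and the selection returns `None` when fewer than $J$ candidates exist (the case in which the cluster head drops $t$ as a neighbor). Because the closed form of $\beta_{jt}$ is not restated in this section, the selection step takes precomputed $\beta_{jt}$ values as input; function names and example numbers are hypothetical.

```python
from __future__ import annotations

import math


def g_model(d: float, k: float, wavelength: float, g_t: float, g_r: float,
            link_margin: float, noise_figure: float) -> float:
    """G(d_jt, k_jt) = (4*pi)^2 d^k / (G_t G_r lambda^2) * M_l * N_f, linear units."""
    return ((4 * math.pi) ** 2 * d ** k / (g_t * g_r * wavelength ** 2)
            * link_margin * noise_figure)


def measured_ratio(p_out: float, p_received: float) -> float:
    """P_out / P_received: how H(d_j) and G(d_jt, k_jt) are measured in practice."""
    return p_out / p_received


def choose_cooperative_nodes(beta: dict[int, float], J: int) -> list[int] | None:
    """Return the J node IDs with the largest beta_jt, or None if fewer than J exist."""
    if len(beta) < J:
        return None
    return sorted(beta, key=beta.get, reverse=True)[:J]
```

Doubling $d_{jt}$ multiplies `g_model` by $2^{k_{jt}}$, which is why distant nodes receive a low $\beta_{jt}$ unless they hold much more remaining energy.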

Scheme design
In this section, we will discuss how to enable cluster heads to form a multi-hop backbone by incorporating the cooperative MIMO scheme into the LEACH protocol for each single-hop transmission. As assumed in the LEACH protocol, each node has a unique identifier (ID). The transmit power of each node can be adjusted, and the nodes are assumed to be always synchronized. Similarly, the operations of the proposed scheme are broken into rounds. Each round consists of three phases: (i) cluster formation phase, during which the clusters are organized and cooperative MIMO nodes are selected; (ii) routing phase, during which a routing table in each selected node is constructed; and (iii) transmission phase, during which data are transferred from the nodes to the cluster heads and forwarded to the sink according to the routing table.
(1) Cluster formation phase

In this phase, each node elects itself to be a cluster head with probability $p$, as specified in the original LEACH protocol. After the cluster heads are elected, each cluster head broadcasts an advertisement message (ADV) with transmit power $P_{out}$ using a nonpersistent CSMA MAC protocol. The message contains the head's ID. If a cluster head receives the advertisement message from another head $t$ and the received signal strength (RSS) exceeds a threshold $th$, it takes cluster head $t$ as a neighboring cluster head and records $t$'s ID. Each non-cluster-head node $j$ records the RSSs of all received advertisement messages and chooses the cluster head with the maximum RSS. It then calculates and saves $H(d_j)$, $G(d_{jt}, k_{jt})$, $\beta_{jt}$, and $P_{out}^{jt}$ by (3), (5), (7), and (9), and joins the cluster by sending a join-request message (Join-REQ) to the chosen cluster head. This message contains the node's ID, the chosen cluster head's ID, and the corresponding values of $\beta_{jt}$. After a cluster head has received all join-request messages, it sets up a TDMA schedule and transmits this schedule to its members, as in the original LEACH protocol. If the sink receives the advertisement messages, it finds the cluster head with the maximum RSS, sends a sink-position (Sink-POS) message to that cluster head, and marks it as the target cluster head (TCH). After the clusters are formed, each cluster head selects the $J$ optimal cooperative nodes for cooperative MIMO communication with each of its neighboring cluster heads. As stated in Section 2.1, the $J$ nodes with the largest $\beta_{jt}$ are chosen to communicate with a neighboring cluster head $t$. If no such $J$ nodes can be found for $t$, $t$ is removed from the neighbor list, since communicating with $t$ would consume too much energy.
After selecting the cooperative nodes, the cluster head derives by (4) the total energy per bit, $E_{bt}$, for communicating with $t$, and stores $E_{bt}$ together with the ID set of the cooperative nodes for each neighboring cluster head. At the end of this phase, the cluster head broadcasts a cooperate-request message (COOPERATE-REQ) to each cooperative node, containing the ID of the cluster head itself, the ID of the neighboring cluster head $t$, the IDs of the cooperative nodes, and the index of each cooperative node in the cooperative node set for each cluster head $t$. Each cooperative node that receives the COOPERATE-REQ message stores the ID of $t$, its index, and the transmit power $P_{out}^{jt}$, and sends a cooperate-ACK message (COOPERATE-ACK) back to the cluster head.
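The RSS-based decisions of the cluster formation phase can be sketched as three small selection functions and a container for the Join-REQ contents. RSS values are assumed to be directly comparable numbers (for example, in dBm), and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def choose_cluster_head(adv_rss: dict[int, float]) -> int:
    """Non-cluster-head node: join the head whose ADV arrived with the largest RSS."""
    return max(adv_rss, key=adv_rss.get)


def neighboring_heads(adv_rss: dict[int, float], th: float) -> list[int]:
    """Cluster head: keep every other head whose ADV RSS exceeds the threshold th."""
    return sorted(head for head, rss in adv_rss.items() if rss > th)


def target_cluster_head(adv_rss_at_sink: dict[int, float]) -> int:
    """Sink: the head with the strongest ADV becomes the target cluster head (TCH)."""
    return max(adv_rss_at_sink, key=adv_rss_at_sink.get)


@dataclass
class JoinRequest:
    """Join-REQ contents listed in the text: node ID, chosen head ID, beta_jt values."""
    node_id: int
    head_id: int
    beta: dict[int, float] = field(default_factory=dict)  # neighboring head t -> beta_jt
```

For instance, with ADV strengths of -70.0, -62.5 and -80.0 from heads 5, 9 and 12, a member joins head 9, and a threshold of -75.0 keeps heads 5 and 9 as neighbors.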

(2) Routing table construction
To construct the routing table, the basic ideas of distance-vector routing are used. Each cluster head maintains a routing table in which each entry contains the destination cluster ID, the next-hop cluster ID, the IDs of the cooperative nodes, and the mean energy per bit. Initially, only the neighboring cluster heads have entries in the routing table. Each cluster head then informs its neighboring cluster heads of its routing table. After receiving route advertisements from its neighboring cluster heads, a cluster head updates its routing table according to the route cost and advertises the modified routes to its neighboring cluster heads. After several rounds of route exchange and update, the routing table of each cluster head converges to the optimal one. Then the TCH floods a target announcement message (TARGET-ANNOUNCEMENT) containing its ID to every cluster head, enabling the creation of paths to the sink.
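The route exchange described above is a distance-vector (Bellman-Ford) computation with mean energy per bit as the link cost. This minimal sketch assumes that the route cost is additive over hops, runs the exchange in synchronous rounds, and tracks only the next-hop head and the cost per destination (the cooperative-node IDs kept in each table entry are omitted); the names are hypothetical.

```python
import math
from typing import Dict, Tuple

# destination head -> (next-hop head, total energy per bit along the route)
Table = Dict[int, Tuple[int, float]]


def distance_vector(links: Dict[int, Dict[int, float]],
                    max_rounds: int = 100) -> Dict[int, Table]:
    """Synchronous distance-vector routing among cluster heads.

    links[u][v] is the energy per bit E_bt of the cooperative-MIMO hop u -> v.
    In every round each head advertises the table it held at the start of the
    round; a neighbor adopts a route if it lowers the total energy per bit.
    """
    tables: Dict[int, Table] = {u: {v: (v, e) for v, e in nbrs.items()}
                                for u, nbrs in links.items()}
    for _ in range(max_rounds):
        advertised = {u: dict(t) for u, t in tables.items()}
        changed = False
        for u, nbrs in links.items():
            for v, e_uv in nbrs.items():
                for dest, (_, e_vd) in advertised.get(v, {}).items():
                    if dest == u:
                        continue
                    cost = e_uv + e_vd
                    if cost < tables[u].get(dest, (-1, math.inf))[1]:
                        tables[u][dest] = (v, cost)
                        changed = True
        if not changed:
            break  # no table changed in this round: converged
    return tables
```

In a four-head example where the direct hop 1 → 3 costs 5.0 but 1 → 2 → 3 costs 1.0 + 1.0, head 1 ends up routing to head 3 through head 2 at cost 2.0.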

(3) Data transmission
In this phase, cluster members first transmit their data to the cluster head over multiple frames, as in the traditional LEACH protocol. In each frame, each cluster member transmits its data during its allocated slot, specified by the TDMA schedule set up in the cluster formation phase, and sleeps during the other slots to save energy. The duration of a frame and the number of frames are the same for all clusters, so the duration of each slot depends on the number of members in the cluster. After a cluster head receives data frames from its cluster members, it performs data aggregation to remove redundancy in the data. After aggregating the received data frames, the cluster head forwards the data packet to the TCH via multi-hop routing. In each single-hop communication, if $J$ cooperative MIMO nodes exist, the cluster head adds a packet header containing the source cluster ID, the next-hop cluster ID, and the destination cluster ID, and then broadcasts the data packet. Once the corresponding cooperative nodes receive the data packet, they encode it with the orthogonal STBC and transmit it with transmit power $P_{out}^{jt}$, each acting as an individual antenna of the MIMO antenna array. In the cooperative MIMO scheme, the transmission delay and channel estimation scheme proposed in [3] can be used to deal with imperfect synchronization in decoding.
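A sketch of the per-cluster TDMA schedule and the per-hop header. The text states only that the frame duration is common to all clusters and that the slot duration depends on the number of members; splitting the frame into equal slots is an assumption, and the names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


def tdma_schedule(member_ids: list[int], frame_duration: float) -> dict[int, tuple[float, float]]:
    """Equal slots in a fixed-length frame: more members -> shorter slots.

    Returns member id -> (slot start, slot end), relative to the frame start.
    Members sleep outside their own slot.
    """
    slot = frame_duration / len(member_ids)
    return {m: (i * slot, (i + 1) * slot) for i, m in enumerate(member_ids)}


@dataclass(frozen=True)
class HopHeader:
    """Header a cluster head prepends before broadcasting to its cooperative nodes."""
    source_cluster: int
    next_hop_cluster: int
    destination_cluster: int
```

With three members and a 0.3 s frame, each member gets a 0.1 s slot; a cluster with six members in the same frame would get 0.05 s slots.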

THE ENERGY CONSUMPTION MODEL OF THE SCHEME
In this section, we analyze the energy consumption of the scheme. Based on the result, we develop an optimization model to find the optimal parameters of the scheme, namely the number of clusters $k_c$ and the number of cooperative nodes $J$. The analysis makes the following assumptions. (1) There are $N$ nodes distributed uniformly in an $M \times M$ region. (2) An AWGN channel with squared power path loss is assumed for intracluster communication. (3) A flat Rayleigh fading channel with $k$th-power path loss is assumed for intercluster communication. (4) BPSK is used as the modulation scheme and the bandwidth is $B$ Hz. (5) In each frame, every node sends a packet of size $s$ to its cluster head with probability $P$; the number of frames in each round is denoted by $F_n$. (6) To maintain the routing table in each round, each cluster head broadcasts its routing table, of size $R_{ts}$, $R_{bt}$ times. (7) The energy consumption for data processing is ignored. Now we model the overall energy consumption in each round, denoted by $E(k_c, J)$. There are four energy-consuming operations in each round: (1) the cluster members transmit data to the cluster head, with energy consumption denoted by $E_s(k_c)$; (2) the cluster heads construct the routing tables, with energy consumption denoted by $E_r(k_c)$; (3) the cluster heads transmit aggregated data to the cooperative nodes in each single-hop transmission, with energy consumption denoted by $E_{c0}(k_c, J)$; and (4) the cooperative nodes transmit the data to the next cluster head in each single-hop transmission, with energy consumption denoted by $E_{cs}(k_c, J)$.

Yong Yuan et al.

$E_s(k_c)$
In order to model $E_s(k_c)$, we first analyze the energy consumed by the source nodes to transmit one bit to the cluster head, denoted by $E_{bs}(k_c)$.
Under the assumption of BPSK modulation and an AWGN channel with squared power path loss, $E_{bs}(k_c)$ can be modelled in the same manner as $E_{bt}(1)$ in Section 2.1(1), as given in (10), where $d_{tc}$ is the distance from the node to the cluster head and $G_1$ is the gain factor at $d_{tc} = 1$ m. In (10), we use the result in [8] that $E[d_{tc}^2] = M^2/(2\pi k_c)$. On the other hand, when the number of clusters is $k_c$, the average number of members in each cluster is $N/k_c$. Hence, the total number of bits transmitted to the cluster head in each cluster per round is $S_1(k_c) = (N/k_c)\,F_n P s$. Therefore, $E_s(k_c) = k_c S_1(k_c) E_{bs}(k_c)$.
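The geometric result borrowed from [8] can be checked with a short Monte Carlo experiment: if each cluster is approximated as a disc of area $M^2/k_c$ with the head at the center and members distributed uniformly, $E[d_{tc}^2]$ equals half the squared radius. The sketch also evaluates $S_1(k_c)$ and $E_s(k_c)$; the per-bit energy $E_{bs}$ is passed in as a number, and the disc model, the function names, and all values in the check are assumptions for illustration.

```python
import math
import random


def expected_sq_distance(M: float, k_c: int) -> float:
    """E[d_tc^2] = M^2 / (2*pi*k_c) for a disc-shaped cluster of area M^2/k_c."""
    return M ** 2 / (2 * math.pi * k_c)


def expected_sq_distance_mc(M: float, k_c: int, samples: int = 200_000,
                            seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    radius = M / math.sqrt(math.pi * k_c)
    # uniform point in a disc: r = R*sqrt(u), hence r^2 = R^2 * u
    return sum((radius * math.sqrt(rng.random())) ** 2 for _ in range(samples)) / samples


def bits_per_cluster(N: int, k_c: int, F_n: int, P: float, s: int) -> float:
    """S_1(k_c) = (N / k_c) * F_n * P * s."""
    return N / k_c * F_n * P * s


def energy_members(N: int, k_c: int, F_n: int, P: float, s: int, e_bs: float) -> float:
    """E_s(k_c) = k_c * S_1(k_c) * E_bs(k_c), with E_bs supplied by the caller."""
    return k_c * bits_per_cluster(N, k_c, F_n, P, s) * e_bs
```

Note that $k_c S_1(k_c) = N F_n P s$ does not depend on $k_c$; the number of clusters affects $E_s(k_c)$ only through $E_{bs}(k_c)$, that is, through the shrinking member-to-head distance.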

$E_r(k_c)$
In this section, we model the energy consumed in constructing the routing table, denoted by $E_r(k_c)$. When the number of clusters is $k_c$, the radius of each cluster can be approximated as $\text{radius} = M/\sqrt{\pi k_c}$ [8]. Therefore, the distance between each pair of directly neighboring clusters can be approximated as $d_{ctoc} = 2\,\text{radius} = 2M/\sqrt{\pi k_c}$. We also assume that each cluster has four direct neighbors. Under the assumption of a flat Rayleigh fading channel, $E_r(k_c)$ can then be approximated following [2].

$E_{c0}(k_c, J)$
In this section, we analyze the energy consumed by the cluster head to transmit aggregated data to the cooperative nodes, denoted by $E_{c0}(k_c, J)$. When the cluster head broadcasts the data, $J$ cooperative nodes receive it. Similar to the analysis of $E_{bs}(k_c)$, the energy per bit for this operation, denoted by $E_{bc0}(k_c, J)$, can be modelled in the same way. We adopt the aggregation model in [9] to describe the aggregation operation; the amount of data after aggregation in each round, $S_2(k_c)$, is determined by the aggregation factor $agg$. Therefore, $E_{c0}(k_c, J) = k_c S_2(k_c) E_{bc0}(k_c, J)$.

$E_{cs}(k_c, J)$
According to Section 2.1, the $J$ cooperative nodes of the current cluster encode the transmission sequence with an orthogonal STBC and transmit it to the next cluster head. In modelling the energy consumption of this operation, we need to consider the impact of the training overhead and the transmission rate. Suppose that each STBC block contains $F$ symbols, of which $pJ$ are training symbols, and that the block is transmitted over $L$ symbol durations. $F/L$ is called the transmission rate, denoted by $R$. Then, the amount of data actually transmitted to deliver the $S_2(k_c)$ bits is $S_e(k_c, J) = F S_2(k_c)/[R(F - pJ)]$, from which $E_{cs}(k_c, J)$ follows. Based on the above analysis, the overall energy consumption in each round, $E(k_c, J)$, is given by (14), where $n_k$ is the average number of hops. To simplify the analysis, we assume $n_k = \sqrt{k_c}$, which is just the number of clusters along each edge of the sensed region. Based on (14), we formulate the optimization model (15) to choose the optimal $k_c$ and $J$. The first constraint reflects the fact that adding cooperative nodes beyond a certain number does not improve transmission energy efficiency but costs considerable circuit energy; the second reflects that clusters should not be so small that aggregation becomes inefficient. Since the search space is not large, (15) can be solved by exhaustive search.
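The two computational pieces of this subsection, the training-overhead correction $S_e(k_c, J)$ and the exhaustive search over $(k_c, J)$, can be sketched in Python as follows. The function `toy_energy` is a hypothetical stand-in for (14), and the search ranges are hypothetical stand-ins for the two constraints of (15), whose exact bounds are not reproduced here.

```python
import itertools

def effective_bits(S_2, F, p, J, R):
    """S_e(k_c, J) = F * S_2 / (R * (F - p*J)): bits actually sent once p*J training
    symbols per F-symbol STBC block and the transmission rate R are accounted for."""
    if p * J >= F:
        raise ValueError("training symbols would fill the whole STBC block")
    return F * S_2 / (R * (F - p * J))

def exhaustive_search(energy, k_values, J_values):
    """Return (k_c, J, E) minimising energy(k_c, J) over a finite grid."""
    best = None
    for k_c, J in itertools.product(k_values, J_values):
        e = energy(k_c, J)
        if best is None or e < best[2]:
            best = (k_c, J, e)
    return best

def toy_energy(k_c, J):
    """Hypothetical convex stand-in for the per-round energy model (14)."""
    return (k_c - 10) ** 2 + 5 * (J - 2) ** 2

# Hypothetical bounds: J <= 4 (first constraint) and k_c <= 40 (second constraint).
best = exhaustive_search(toy_energy, range(1, 41), range(1, 5))
print(best)  # (10, 2, 0)
print(effective_bits(S_2=1000, F=100, p=2, J=3, R=0.75))
```

Exhaustive search is adequate here because, as noted above, the grid of candidate $(k_c, J)$ pairs is small; the search simply evaluates the energy model at every admissible pair.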

SIMULATION RESULTS
In the simulations, 400 nodes are randomly deployed on a 200 × 200 field. The location of the sink is randomly chosen in each round. The system parameters are summarized in Table 1.
The entries in Table 1 are defined as follows: $\alpha$ is the efficiency of the RF power amplifier, $M_l$ is the link margin, $G_1$ is the gain factor at 1 m, $k$ is the path-loss exponent, $\sigma^2$ is the power spectral density of the AWGN in intracluster communication, $N_f$ is the receiver noise figure, $f_c$ is the carrier frequency, $B$ is the bandwidth, $P_b$ is the target BER, $P_{ct}$ and $P_{cr}$ are the circuit power consumption of the transmitter and receiver, respectively, $F_n$ is the number of frames per round, $G_t$ and $G_r$ are the transmitter and receiver antenna gains, $s$ is the packet size, $P$ is the transmit probability of each node, $R$ is the transmission rate, $F$ is the number of symbols in each block, $p$ is the number of training symbols required for each cooperative node, $R_{bt}$ is the number of routing-table exchanges per round, and $R_{ts}$ is the routing-table size.
To simulate radio irregularity, the path-loss exponent of the link between each pair of nodes is drawn randomly between 3 and 5.
Each node begins with 400 J of energy and an unlimited amount of data to send to the sink. Once a node has used up its energy, it can no longer transmit or receive data.
During the simulation, we tracked the total number of packets delivered to the sink, the energy and time required to deliver the data to the sink, and the percentage of nodes alive. We are interested in the transmission quality and energy-saving performance of the proposed scheme. The performance of the proposed multi-hop MIMO-LEACH scheme is compared with that of the original LEACH protocol and of the multihop LEACH scheme, in which cooperative MIMO communication is not implemented. The optimal value of $k_c$ for the original LEACH protocol is determined by the model in [8]. We also developed a similar model to find the optimal $k_c$ for the multihop LEACH scheme, which is not discussed here due to space limitations. In the investigated scenario, the optimal $k_c$ for the original LEACH protocol, the multihop LEACH scheme, and the proposed scheme is 3, 41, and 27, respectively, and the optimal $J$ for the proposed scheme is 3.
Because of the aggregation operation, the number of effective packets received by the sink [8] is a good application-independent indicator of transmission quality. Effective received packets are the "real" packets represented by the aggregated packets. If no aggregation is carried out, the number of effective received packets equals the number of packets actually received. If aggregation is information lossless, the number of effective received packets is simply the total number of packets sent by the source nodes. Figures 2 and 3 show the total number of effective packets received at the sink over time and for a given amount of energy, respectively. Figure 2 shows that during its lifetime the LEACH protocol achieves better latency than the multi-hop LEACH scheme and the proposed MIMO-LEACH scheme. The reason is that the multi-hop operation in the multi-hop LEACH and multi-hop MIMO-LEACH schemes increases latency, so fewer data packets reach the sink in a given period of time. However, the better latency of the LEACH protocol comes at the cost of higher energy consumption than the other two schemes. In particular, in a fading channel environment, the LEACH protocol consumes much more energy because of its single-hop transmission from the cluster heads to the sink, which results in a shorter network lifetime and fewer transmitted packets in total. Figure 3 shows that, for the same energy consumption, the multi-hop MIMO-LEACH scheme delivers many more data packets than the LEACH protocol and the multi-hop LEACH scheme. These results indicate that the multi-hop MIMO-LEACH scheme is better suited to applications that demand a long network lifetime but can tolerate latency. Figure 4 shows the percentage of nodes alive over time.
From Figure 4, we find that the proposed multi-hop MIMO-LEACH scheme greatly improves the network lifetime. If we define the network lifetime of a WSN as the time during which more than 70% of the nodes are alive, the network lifetime with the original LEACH protocol, the multi-hop LEACH scheme, and the proposed multi-hop MIMO-LEACH scheme is about $0.7 \times 10^4$, $8.2 \times 10^4$, and $11 \times 10^4$ s, respectively. The improvement in network lifetime obtained by the multi-hop MIMO-LEACH scheme is significant.
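The lifetime definition used above (the time during which more than 70% of the nodes remain alive) can be computed from a sampled alive-fraction trace as in the following Python sketch; the trace values are hypothetical and are not the data of Figure 4.

```python
def network_lifetime(samples, threshold=0.70):
    """First time at which the alive fraction is no longer above `threshold`.

    samples: time-ordered list of (time_s, fraction_alive) pairs.
    Returns None if the threshold is never crossed within the trace.
    """
    for t, alive in samples:
        if alive <= threshold:
            return t
    return None

# Hypothetical trace, for illustration only.
trace = [(0, 1.00), (2e4, 0.97), (5e4, 0.88), (8e4, 0.74), (9e4, 0.69), (1e5, 0.50)]
print(network_lifetime(trace))  # 90000.0
```

With a coarse trace, the returned time is the first sample at or below the threshold, so the sampling interval bounds the resolution of the lifetime estimate.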
However, the percentage of nodes alive over time is not always a good indicator of the energy-saving performance of a protocol. For example, if one protocol transmits fewer packets than the others over the same period, it may consume less energy even though its energy-saving performance is worse. To investigate energy-saving performance further, we also evaluate the percentage of nodes alive versus the number of effective data packets received at the sink, as shown in Figure 5.
From Figure 5, we find that the proposed multi-hop MIMO-LEACH scheme needs significantly less energy to transmit the same number of data packets.

Yong Yuan et al.

From the simulation results, including those shown in Figures 6 and 7, we find that the energy-saving performance of the proposed scheme depends on its parameters. As for the number of cluster heads, more cluster heads shorten each single-hop transmission, which reduces the transmit energy consumption. More cluster heads also provide a larger search space for routing-table construction, which reduces the transmit energy consumption further. However, more cluster heads also mean more hops to the sink, which consume more circuit energy for relaying the data packets. Therefore, the number of cluster heads should be chosen by trading off transmit energy against circuit energy. As for the number of cooperative nodes, a sufficient number of cooperative nodes forms effective independent multipath transmission that combats fading energy-efficiently. However, too many cooperative nodes lead to large circuit energy consumption and hence large overall energy consumption. Therefore, the number of cooperative nodes should also be chosen by trading off transmit energy against circuit energy.
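The hop-count trade-off described above can be made concrete with a toy model (an illustrative assumption, not the energy model (14)): a route of length $D$ is split into $n$ equal hops, each hop costs transmit energy proportional to its squared length with constant $a$ (squared power path loss), plus a fixed circuit energy $b$ per hop.

```latex
E(n) = n\left[a\left(\frac{D}{n}\right)^{2} + b\right] = \frac{aD^{2}}{n} + bn,
\qquad
\frac{dE}{dn} = -\frac{aD^{2}}{n^{2}} + b = 0
\;\Longrightarrow\;
n^{\ast} = D\sqrt{\frac{a}{b}},
\qquad
E(n^{\ast}) = 2D\sqrt{ab}.
```

Since $d^2E/dn^2 = 2aD^2/n^3 > 0$, $n^{\ast}$ is a minimum: adding hops lowers the transmit energy as $1/n$ but raises the circuit energy linearly, which is the balance the choice of the number of cluster heads has to strike.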

CONCLUSION
In this paper, we proposed a cluster-based cooperative MIMO scheme to reduce energy consumption and prolong network lifetime. Cooperative MIMO is adopted to mitigate the adverse impact of fading, while clustering is used to facilitate network control and coordination. In the proposed scheme, the original LEACH protocol is extended by incorporating cooperative MIMO communications and multi-hop routing. An adaptive cooperative-node selection strategy is also designed. Based on the scheme, we analyzed the energy consumption of each operation, developed an overall energy consumption model, and found the optimal scheme parameters, namely the number of clusters and the number of cooperative nodes. Simulation results show that the proposed scheme significantly reduces energy consumption and prolongs network lifetime compared with the original LEACH protocol and the multi-hop LEACH scheme.

INTRODUCTION
A wireless network comprises a number of nodes connected by wireless channels. Internode transmission (relaying) is an important technique for widening network coverage. Network information theory has shown that using multiple relay nodes in source-destination (S-D) communications increases the capacity of the S-D system logarithmically with the number of relay nodes [1]. Multiple antennas at each node provide additional degrees of freedom that further improve the capacity per S-D pair in the relay network. A significant capacity improvement achieved with multiple-input multiple-output (MIMO) transmission was revealed in [2][3][4][5] for a point-to-point wireless link, and in [6][7][8][9] for multiple-access and broadcast channels. Capacity bounds for the MIMO relay network have recently been derived in [10,11], where the capacity of the MIMO relay network was analyzed in terms of distributed array gain, which offers logarithmic capacity scaling, spatial multiplexing gain, and receive array gain. In [12], we proposed a MIMO relay scheme for a relay network comprising a single S-D pair and multiple relay nodes. The relay technique in [12], called QR-P-QR, performs the QR decomposition (QRD) in the backward and forward channels in conjunction with phase control at each relay node, and successive interference cancellation (SIC) at the destination node to detect multiple data streams. This architecture achieves both distributed array gain and receive array gain while maintaining the maximum spatial multiplexing gain, which leads to higher capacity than the existing zero-forcing (ZF) and amplify-and-forward (AF) relaying techniques [11].
In this paper, we consider a relay network with multiple S-D pairs and multiple relay nodes, and propose a new relaying technique. The proposed relay architecture employs (1) a group nulling (GN) technique, applied to the backward and forward MIMO relay channels to decompose the multiple-S-D MIMO relay channel into parallel independent S-D MIMO relay channels, and (2) the QR-P-QR scheme, applied to each of the decomposed S-D relay links. The group nulling technique separates multiple S-D pairs via unitary transforms that project both the received and the transmitted signal vectors at a relay node onto the null space of the signals of the nondesired S-D pairs. Thus, the group nulling technique retains more degrees of freedom than ZF-based stream-wise nulling in MIMO relay channels. Furthermore, the QR-P-QR scheme achieves both distributed array gain and receive array gain while maintaining the maximum spatial multiplexing gain on each of the decomposed MIMO relay links. We analyze the asymptotic capacity of the proposed relay technique and, through numerical examples, show that it achieves higher capacity than other existing relay schemes. The rest of this paper is organized as follows. Section 2 presents the system model and the upper bound on the capacity of the MIMO relay network. We describe the proposed and existing relay schemes in Section 3. Numerical examples are given in Section 4. Finally, Section 5 concludes this paper.

Figure 1: MIMO relay network (1st time slot, 2nd time slot).
Notation. $E\{\cdot\}$ and $\mathrm{tr}\{\cdot\}$ denote expectation and trace, respectively. $\|a\|$ stands for the norm of vector $a$, and the superscripts $T$, $H$, and $*$ denote transpose, conjugate transpose, and complex conjugate, respectively. $(A)_i$ and $(A)_{i,j}$ denote the $i$th row and the $(i,j)$th entry of matrix $A$, respectively. $I_i$ is the $i \times i$ identity matrix.

MIMO RELAY NETWORK
The MIMO relay network considered in this paper is illustrated in Figure 1. We assume a one-hop relay network comprising $L$ source and $L$ destination nodes, each with $M$ antennas, and $K$ relay nodes, each with $N$ antennas. In addition, we assume that the relay nodes do not transmit and receive simultaneously. Hence, two time slots are required to send a message from a source to its destination, as shown in Figure 1.
First, the $M \times 1$ vector $s_l$ ($l = 1, \ldots, L$), destined for the $l$th destination node, is sent from the $l$th source node to all relay nodes without using any channel state information (CSI). The $N \times 1$ vector received at the $k$th relay node is $y_k = \sum_{l=1}^{L} H_{k,l} s_l + n_k$, where $H_{k,l}$ ($k = 1, \ldots, K$) is the $N \times M$ MIMO channel matrix between the $l$th source node and the $k$th relay node (backward channel), and $n_k$ is the $N \times 1$ noise vector at the $k$th relay node with zero mean and covariance matrix $E\{n_k n_k^H\} = \sigma_r^2 I_N$. The transmitted signal power at each source node is constrained to $E\{s_l s_l^H\} = (P/M) I_M$, where $P$ is the total transmit power. A relay operation is performed at the $k$th relay node using the $N \times N$ relay matrix $W_k$ to obtain the $N \times 1$ transmitted signal vector $x_k = E_k W_k y_k$, where $E_k$ is a power coefficient resulting from the total power constraint $E\{x_k^H x_k\} = P$. This can be expressed as $x_k = E_k W_k (H_k s + n_k)$, where the $N \times LM$ matrix $H_k = [H_{k,1}, \ldots, H_{k,L}]$ and $s = [s_1^T, \ldots, s_L^T]^T$. Finally, the $M \times 1$ vector $\sum_{k=1}^{K} G_{k,l} x_k + z_l$ is received at the $l$th destination node, where $G_{k,l}$ is the $M \times N$ channel matrix between the $k$th relay node and the $l$th destination node (forward channel), and $z_l$ is the $M \times 1$ noise vector added at the $l$th destination node with zero mean and covariance matrix $E\{z_l z_l^H\} = \sigma_d^2 I_M$. Using the cut-set theorem [13], the upper bound on the capacity of the MIMO relay network is derived in [10].
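The power coefficient $E_k$ follows from the constraint $E\{x_k^H x_k\} = P$: for a fixed channel realization, $E\{x_k^H x_k\} = E_k^2\,\mathrm{tr}\{W_k[(P/M)H_k H_k^H + \sigma_r^2 I_N]W_k^H\}$, assuming that the $L$ source signals are mutually independent and independent of the relay noise (an assumption not stated explicitly above). The Python/NumPy sketch below computes $E_k$ and checks the constraint by Monte Carlo; the relay matrix is an arbitrary random matrix chosen only for the check, and Gaussian source symbols are also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def power_coefficient(W, H, P, M, sigma_r2):
    """E_k for x_k = E_k W_k y_k, y_k = H_k s + n_k, so that E{x_k^H x_k} = P,
    with E{s s^H} = (P/M) I and E{n_k n_k^H} = sigma_r2 I (fixed channel)."""
    N = W.shape[0]
    R_y = (P / M) * H @ H.conj().T + sigma_r2 * np.eye(N)   # covariance of y_k
    return np.sqrt(P / np.real(np.trace(W @ R_y @ W.conj().T)))

L, M, N, P, sigma_r2 = 2, 2, 4, 1.0, 0.1
H_k = crandn(N, L * M)        # H_k = [H_k1, ..., H_kL]
W_k = crandn(N, N)            # arbitrary relay matrix, only for the check
E_k = power_coefficient(W_k, H_k, P, M, sigma_r2)

# Monte Carlo check of the power constraint over T independent symbol vectors.
T = 200_000
s = np.sqrt(P / M) * crandn(L * M, T)
n = np.sqrt(sigma_r2) * crandn(N, T)
x = E_k * (W_k @ (H_k @ s + n))
avg_power = np.mean(np.sum(np.abs(x) ** 2, axis=0))
print(E_k, avg_power)         # avg_power should be close to P
```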

MIMO RELAY TECHNIQUES
In this paper, we assume that each relay node knows the CSI of its own backward and forward channels. However, we do not allow source nodes, relay nodes, and destination nodes to exchange their CSI with other nodes.

ZF relaying scheme [11]
The ZF relaying scheme computes backward and forward ZF matrices $H_k^{+}$ and $G_k^{+}$ that satisfy $H_k^{+} H_k = I_{LM}$ and $G_k G_k^{+} = I_{LM}$. The relay matrix for the ZF scheme is then $W_k = G_k^{+} H_k^{+}$.

Tetsushi Abe et al.

Note that the ZF scheme requires $N \geq LM$. In this case, the effective signal-to-noise ratio (SNR) $\lambda^{\mathrm{ZF}}_{l,m}$ of the $m$th data stream ($m = 1, \ldots, M$) at the $l$th destination node is given by (4). From (4), we find that, owing to the transmit and receive ZF operations, the signals from the $K$ relay nodes are coherently combined at the destination node, which yields distributed array gain [11].
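The ZF relay matrix can be sketched with NumPy's Moore-Penrose pseudo-inverse `np.linalg.pinv`, which is one concrete way (assumed here) to obtain matrices satisfying $H_k^{+}H_k = I_{LM}$ and $G_k G_k^{+} = I_{LM}$. $G_k$ is taken as the $LM \times N$ stack of the forward channels $G_{k,1}, \ldots, G_{k,L}$, which is an assumption about its layout; the channels are random i.i.d. Gaussian for illustration. The check shows that the noise-free end-to-end matrix $G_k W_k H_k$ is the identity, so every relay delivers the $LM$ streams without cross-talk and with the same phase, which is why contributions from different relays add coherently.

```python
import numpy as np

rng = np.random.default_rng(1)
L, M, N = 2, 2, 6                      # the ZF scheme needs N >= L*M

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H_k = crandn(N, L * M)                 # backward channel [H_k1, ..., H_kL], N x LM
G_k = crandn(L * M, N)                 # forward channels stacked over destinations, LM x N

H_plus = np.linalg.pinv(H_k)           # LM x N left inverse: H_plus @ H_k = I_LM
G_plus = np.linalg.pinv(G_k)           # N x LM right inverse: G_k @ G_plus = I_LM
W_k = G_plus @ H_plus                  # ZF relay matrix, N x N

end_to_end = G_k @ W_k @ H_k           # noise-free source-to-destination map via relay k
print(np.round(end_to_end, 6))         # identity: streams separated, no cross-talk
```

The sketch omits $E_k$ and the relay noise; the noise enhancement caused by the two inversions is what limits the ZF scheme in practice.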

GN/QR-P-QR relaying scheme
The first step of the GN/QR-P-QR scheme is to compute a pre-group nulling filter at a relay node that suppresses the signal components from all source nodes except the $l$th. To accomplish this, we define the $N \times M(L-1)$ matrix $H_k^{(l)} = [H_{k,1}, \ldots, H_{k,l-1}, H_{k,l+1}, \ldots, H_{k,L}]$. Note that the channel matrix between the $l$th source and the $k$th relay node, $H_{k,l}$, is removed. Next, we perform the singular value decomposition (SVD) of $H_k^{(l)}$; the $N \times (N - M(L-1))$ matrix $U_{k,L}^{(l)}$ collects the left singular vectors spanning the null space of $H_k^{(l)H}$. $U_{k,L}^{(l)H}$ is then multiplied by $y_k$ to obtain the $(N - M(L-1)) \times 1$ vector
$$\tilde{y}_{k,l} = U_{k,L}^{(l)H} y_k = U_{k,L}^{(l)H} H_{k,l} s_l + U_{k,L}^{(l)H} n_k. \qquad (6)$$
From (6), we see that $U_{k,L}^{(l)H}$ removes the signal contributions of all source nodes except the $l$th, because the received signal vector is projected onto the null space of the nondesired source nodes. A null-space-based method was also employed in [14] for precoding in MIMO downlink transmission.
The second step of the GN/QR-P-QR scheme transforms $U_{k,L}^{(l)H} y_k$ with the $(N - M(L-1)) \times (N - M(L-1))$ matrix $\Phi_{k,l}$ to obtain the vector $\bar{y}_{k,l} = \Phi_{k,l} U_{k,L}^{(l)H} y_k$. The computation of $\Phi_{k,l}$ is described later in this section.
The third step is to compute the post-group nulling filter that suppresses the signal transmitted toward all destination nodes except the $l$th. Toward this goal, we define the $M(L-1) \times N$ matrix $G_k^{(l)} = [G_{k,1}^T, \ldots, G_{k,l-1}^T, G_{k,l+1}^T, \ldots, G_{k,L}^T]^T$, from which the forward channel to the $l$th destination, $G_{k,l}$, is removed. Next, we perform the SVD of $G_k^{(l)}$; the $N \times (N - M(L-1))$ matrix $A_{k,L}^{(l)}$ collects the right singular vectors spanning the null space of $G_k^{(l)}$. It is then applied to obtain the $N \times 1$ vector $\hat{y}_{k,l} = A_{k,L}^{(l)} \Phi_{k,l} U_{k,L}^{(l)H} y_k$. Note that, as with the ZF scheme, the group nulling scheme requires $N \geq LM$ in order to obtain the null-space matrices $U_{k,L}^{(l)}$ and $A_{k,L}^{(l)}$. The above three-step procedure is performed for all $L$ source-destination pairs ($l = 1, \ldots, L$) at the $k$th relay node. Finally, the $N \times 1$ signal vector transmitted from the $k$th relay node is $x_k = E_k \sum_{l=1}^{L} \hat{y}_{k,l}$. In this case, the relay matrix is $W_k = \sum_{l=1}^{L} A_{k,L}^{(l)} \Phi_{k,l} U_{k,L}^{(l)H}$, and, from (2), the signal vector received at the $l$th destination is
$$\sum_{k=1}^{K} E_k G_{k,l} A_{k,L}^{(l)} \Phi_{k,l} U_{k,L}^{(l)H} H_{k,l} s_l + \sum_{k=1}^{K} E_k G_{k,l} A_{k,L}^{(l)} \Phi_{k,l} U_{k,L}^{(l)H} n_k + z_l. \qquad (8)$$
Equation (8) shows that, at the $l$th destination node, the signal contributions of all source nodes are removed except that of the $l$th source node. Namely, we can establish an independent MIMO relay link between the $l$th source and destination nodes, characterized by the decomposed backward channels $U_{k,L}^{(l)H} H_{k,l}$ and forward channels $G_{k,l} A_{k,L}^{(l)}$ ($k = 1, \ldots, K$). To compute the intermediate filter $\Phi_{k,l}$, we use the QR-P-QR scheme (see [12] for details). $\Phi_{k,l}$ consists of two orthogonal matrices, $Q_{1k,l}$ and $Q_{2k,l}$, obtained by the QRD of the backward and forward channels, with the phase control matrix $D_{k,l}$ in between (hence the name QR-P(Phase)-QR). Finally, by using the computed $\Phi_{k,l}$, (8) can be rewritten as (9). An important point here is that $E_k R_{2k,l}^H D_{k,l} R_{1k,l}$ is lower triangular with positive diagonal entries. The triangular structure provides receive array gain through SIC at the destination node, which detects each data stream in turn.
The positive diagonal entries produced by the phase control matrix enable the diagonal components relayed from the $K$ relay nodes to be combined coherently at the destination node, which yields distributed array gain.
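The detection step that the triangular structure enables can be sketched in Python/NumPy as follows. The compound channel is built from random lower-triangular matrices with positive diagonals as a stand-in for $\sum_k E_k R_{2k,l}^H D_{k,l} R_{1k,l}$ (the actual QR-P-QR factors are not computed), and QPSK symbols are an assumed constellation. Because row $m$ of a lower-triangular channel involves only streams $1, \ldots, m$, the streams can be decided in order, cancelling those already detected.

```python
import numpy as np

rng = np.random.default_rng(2)
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)   # assumed constellation

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def slice_qpsk(z):
    """Nearest QPSK point to z."""
    return QPSK[np.argmin(np.abs(QPSK - z))]

def sic_lower_triangular(T, r):
    """Detect s from r = T s + noise for lower-triangular T with nonzero diagonal."""
    M = T.shape[0]
    s_hat = np.zeros(M, dtype=complex)
    for m in range(M):
        residual = r[m] - T[m, :m] @ s_hat[:m]   # cancel streams already detected
        s_hat[m] = slice_qpsk(residual / T[m, m])
    return s_hat

M, K = 4, 8
T = np.zeros((M, M), dtype=complex)
for _ in range(K):
    T_k = np.tril(crandn(M, M))
    np.fill_diagonal(T_k, np.abs(np.diag(T_k)))  # positive diagonal, as after phase control
    T += T_k                                      # diagonals add coherently over K relays

s = rng.choice(QPSK, size=M)
r = T @ s + 1e-3 * crandn(M)
s_hat = sic_lower_triangular(T, r)
print(np.allclose(s_hat, s), np.real(np.diag(T)))
```

Because every diagonal term is positive, the diagonal of the sum grows roughly in proportion to $K$, while the randomly phased off-diagonal terms grow only as about $\sqrt{K}$; this is the coherent combining described above.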
The $l$th destination node then performs SIC, using the CSI of the compound triangular channel $\sum_{k=1}^{K} E_k R_{2k,l}^H D_{k,l} R_{1k,l}$, to detect each of the multiple streams. The effective signal-to-interference-plus-noise ratio (SINR) of the $m$th data stream at the $l$th destination node is given by (10), and the ergodic capacity of the relay network with $L$ S-D pairs is then given by (11).

Achievable gains in the relay schemes
To evaluate the achievable gains of the GN/QR-P-QR relay technique, we investigate its asymptotic capacity as $K$ approaches infinity. From (10) and (11), as $K \to \infty$ the capacity becomes (12), where we use the approximation $\log_2(1+x) \approx \log_2 x$ for $x \gg 1$. From (12), we see that the capacity of the GN/QR-P-QR scheme scales as $(LM/2)\log_2 K$ asymptotically in $K$. The term $\log_2 K$ indicates that the distributed array gain of the GN/QR-P-QR scheme is $K$. In addition, the pre-log factor $LM/2$ implies that the multiplexing gain is $LM/2$, where the factor $1/2$ represents the loss due to using two time slots for each transmission. Furthermore, it was shown in [11] that the capacity upper bound in (3) and the capacity of the ZF scheme also scale asymptotically as $(LM/2)\log_2 K$. Thus, both the GN/QR-P-QR scheme and the ZF scheme exhibit the optimum capacity scaling for large $K$.
The difference between the GN/QR-P-QR scheme and the ZF scheme lies in the degrees of freedom that remain after the interference among multiple S-D pairs has been suppressed. The ZF scheme performs complete stream-wise nulling in both the backward and forward channels. In each channel, the ZF scheme separates $LM$ streams, which requires $LM - 1$ degrees of freedom. Thus, $N - (LM - 1)$ degrees of freedom remain after ZF relaying. On the other hand, since the proposed scheme performs group-wise nulling, it preserves more degrees of freedom than the ZF scheme. To be more specific, from (6) we define the $(N - M(L-1)) \times M$ decomposed backward MIMO channel for the $l$th S-D pair as $\tilde{H}_{k,l} \equiv U_{k,L}^{(l)H} H_{k,l}$. Assuming that the entries $(H_{k,l})_{i,j}$ are i.i.d. complex random variables with zero mean and unit variance, the entries $(\tilde{H}_{k,l})_{i,j}$ have the statistical property given in (13). We can see from (6) and (13) that group nulling transforms the $N \times M$ i.i.d. matrix $H_{k,l}$ into the $(N - M(L-1)) \times M$ i.i.d. matrix $\tilde{H}_{k,l}$. This shows that, owing to group nulling, $M(L-1)$ degrees of freedom are lost for the $l$th S-D pair, but $\tilde{H}_{k,l}$ still holds $N - M(L-1)$ degrees of freedom. The same argument holds for the decomposed forward channel $G_{k,l} A_{k,L}^{(l)}$. Thus, after the group nulling operations, the proposed scheme retains $N - M(L-1)$ degrees of freedom, which exceeds that of the ZF scheme by $M - 1$. These additional degrees of freedom are converted into receive array gain through the channel triangulation in (9) by the QR-P-QR technique and the subsequent SIC at the destination node.
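A direct consequence of the $(LM/2)\log_2 K$ scaling is the capacity gained each time the number of relays doubles. Writing the large-$K$ capacity as $(LM/2)\log_2 K$ plus a term $c$ that does not depend on $K$ (an assumption about the form of (12)):

```latex
C(K) \approx \frac{LM}{2}\log_2 K + c
\quad\Longrightarrow\quad
C(2K) - C(K) \approx \frac{LM}{2}\left[\log_2(2K) - \log_2 K\right] = \frac{LM}{2}.
```

For $L = 2$ and $M = 4$, the configuration used in Figure 2, this amounts to about 4 bit/s/Hz per doubling of $K$.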
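The degree-of-freedom bookkeeping above can be checked for the three antenna configurations used in the numerical examples, $(L, M, N) = (2,4,8)$, $(2,2,4)$, and $(4,2,8)$, with a short Python sketch.

```python
def remaining_dof(L, M, N):
    """Degrees of freedom left after interference suppression.

    zf: stream-wise nulling of L*M streams costs L*M - 1 degrees of freedom.
    gn: group nulling of the other L - 1 pairs costs M*(L - 1).
    """
    zf = N - (L * M - 1)
    gn = N - M * (L - 1)
    return zf, gn, gn - zf          # the difference is always M - 1

for L, M, N in [(2, 4, 8), (2, 2, 4), (4, 2, 8)]:
    print((L, M, N), remaining_dof(L, M, N))
# (2, 4, 8) (1, 4, 3)
# (2, 2, 4) (1, 2, 1)
# (4, 2, 8) (1, 2, 1)
```

The extra $M - 1$ degrees of freedom are largest for $M = 4$, which is consistent with the larger GN/QR-P-QR versus ZF gap reported for the $(2,4,8)$ case than for the $(2,2,4)$ case.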

Other simple schemes
For GN-based relaying, we could simply employ an AF relay scheme instead of the QR-P-QR scheme, which gives the intermediate filter $\Phi_{k,l} = I_{N-M(L-1)}$. In this case, however, we cannot obtain distributed array gain because the signals from the $K$ relay nodes are combined randomly at the destination node. In addition, [10,15] describe another simple relaying scheme, matched filter (MF) relaying, in which each relay node performs receive and transmit MF operations. For MF relaying, the relay matrix is $W_k = G_k^H H_k^H$. Unlike the ZF and the proposed schemes, this scheme does not require $N \geq LM$, and its capacity still scales logarithmically with the number of relay nodes [10].
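The separation property behind (8) can be checked numerically for the GN/AF variant ($\Phi_{k,l} = I_{N-M(L-1)}$). The Python/NumPy sketch below builds $U_{k,L}^{(l)}$ and $A_{k,L}^{(l)}$ from SVD null spaces for a single relay, forms $W_k = \sum_l A_{k,L}^{(l)} \Phi_{k,l} U_{k,L}^{(l)H}$, and confirms that the cross-pair terms $G_{k,l} W_k H_{k,j}$, $l \neq j$, vanish. Random i.i.d. Gaussian channels and the helper names `crandn` and `null_basis` are assumptions for illustration; the QR-P-QR filter would replace the identity $\Phi_{k,l}$.

```python
import numpy as np

rng = np.random.default_rng(3)
L, M, N = 2, 2, 4                      # group nulling needs N >= L*M

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def null_basis(A):
    """Orthonormal columns spanning the null space of a full-row-rank matrix A."""
    _, _, Vh = np.linalg.svd(A, full_matrices=True)
    return Vh[A.shape[0]:].conj().T

H = [crandn(N, M) for _ in range(L)]   # backward channels H_{k,l} (N x M)
G = [crandn(M, N) for _ in range(L)]   # forward channels G_{k,l} (M x N)

W_k = np.zeros((N, N), dtype=complex)
for l in range(L):
    others = [j for j in range(L) if j != l]
    H_oth = np.hstack([H[j] for j in others])   # H_k^{(l)}, N x M(L-1)
    G_oth = np.vstack([G[j] for j in others])   # G_k^{(l)}, M(L-1) x N
    U_null = null_basis(H_oth.conj().T)         # U_null^H @ H_oth = 0
    A_null = null_basis(G_oth)                  # G_oth @ A_null = 0
    Phi = np.eye(N - M * (L - 1))               # GN/AF: identity intermediate filter
    W_k += A_null @ Phi @ U_null.conj().T

cross = max(np.abs(G[l] @ W_k @ H[j]).max()
            for l in range(L) for j in range(L) if l != j)
desired = min(np.abs(G[l] @ W_k @ H[l]).max() for l in range(L))
print(cross, desired)                  # cross is at round-off level, desired is not
```

With $\Phi_{k,l} = I$ each relay separates the S-D pairs, but nothing aligns the phases of the desired-link matrices across relays, which is why GN/AF loses the distributed array gain.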

NUMERICAL RESULTS
We evaluated the ergodic capacities of the relaying schemes presented in the previous section, obtaining capacity plots for the upper bound, ZF, GN/QR-P-QR, GN/AF, and MF. In addition, as a reference, we evaluated the capacity of QR-P-QR when all relay and destination nodes fully cooperate. Specifically, we calculated the capacity of the QR-P-QR scheme in a network comprising a source node with $LM$ transmit antennas, a relay node with $KN$ antennas, and a destination node with $LM$ antennas. In this case, the power constraints at the source and the relay are $LP$ and $KP$, respectively. We assumed a flat fading channel in which each entry of $H_k$ and $G_k$ is an i.i.d. complex random variable with zero mean and unit variance. We set $\sigma_r^2 = \sigma_d^2$ and used the same transmit power $P$ for all source and relay nodes. Path loss was not taken into account. Figure 2 shows the capacity versus the number of relay nodes $K$ for $L = 2$, $M = 4$, and $N = 8$. The total transmit power-to-noise ratio ($\mathrm{PNR} = P/\sigma_r^2$) was set to 20 dB. The graph shows that the capacity of the GN/AF scheme saturates as $K$ becomes large. This is because, although the multiple S-D pairs are separated by group nulling, the signals relayed from the multiple relay nodes are combined randomly at each destination node by the simple AF relay operation, and thus distributed array gain is not obtained. On the other hand, the GN/QR-P-QR, ZF, and MF schemes exhibit logarithmic capacity scaling, as does the capacity upper bound, because the signal components from the multiple relay nodes are combined coherently at the destination node. Furthermore, the GN/QR-P-QR scheme offers higher capacity than the ZF scheme because its additional degrees of freedom are converted into receive array gain at the destination node, as described in Section 3.3.
The capacity of the MF scheme is lower than that of the others because it cannot actively suppress the interference among S-D pairs. The capacity gap between GN/QR-P-QR and the upper bound is due to imperfect cooperation among nodes. As mentioned in [10], the capacity upper bound in (3) can be achieved if all the relay nodes perform joint decoding and encoding. To examine this, we obtained the capacity of QR-P-QR when all the relay nodes and all the destination nodes cooperate. Note that in this case there is no need for GN. The capacity of the QR-P-QR scheme with perfect node cooperation approaches the upper bound, and the gap between the two narrows as $K$ grows. This can be explained briefly as follows. The capacity upper bound in (3) depends only on the backward channel. On the other hand, the QR-P-QR capacity expressions (10) and (11) show that the noise power at the destination node, $\sigma_d^2$, becomes less significant as $K$ becomes large. Thus, the capacity depends increasingly on the backward channel and approaches the upper bound more closely. Therefore, allowing the relay nodes to perform a joint relay operation would bring the capacity closer to the bound. However, this requires all relay nodes and all destination nodes to exchange their CSI. In addition, the joint relay operation requires the QRD of a $KN \times LM$ matrix, which may be demanding in terms of complexity. Figure 3 shows the capacity for $L = 2$, $M = 2$, and $N = 4$. A similar tendency is observed, but the gap between GN/QR-P-QR and ZF decreases. This is because the number of antennas at each node is halved, and thus the receive array gain obtained by the GN/QR-P-QR scheme decreases. Figure 4 shows the capacity for $L = 4$, $M = 2$, and $N = 8$. In this case, the total number of antennas in the network is the same as in Figure 2, but the capacity obtained by each relay scheme except MF is higher than in Figure 2.
This is because the total transmit power in the network increases with the number of S-D pairs. The MF scheme outperforms the other schemes in the low-PNR region owing to the SNR gain of matched filtering. However, its capacity saturates in the high-PNR region because of the interference among S-D pairs. Figure 7 shows the capacity of the GN/QR-P-QR scheme for $L = 2$, $M = 4$, and $N = 8$ with $K = 2$ and $8$. Here, we measured the capacity for two cases: time-division multiplexing (TDM) and spatial-division multiplexing (SDM) of the two S-D pairs. Note that in the former case only one S-D pair is active at any instant, so group nulling is not needed. Figure 7 shows that TDM provides higher capacity in the low-PNR region, whereas SDM offers significantly higher capacity in higher-PNR regions, which agrees with conventional studies on the trade-off between spatial multiplexing and beamforming. Furthermore, as $K$ increases, the crossover point between SDM and TDM shifts to lower PNR, because the effective SNR at the destination node increases with $K$. Thus, it is more advantageous to spatially multiplex multiple S-D pairs when the PNR is relatively high or the number of relay nodes is relatively large.

QR and the ZF schemes becomes smaller. This is because as $N$ becomes larger, both the GN and the ZF operations retain enough degrees of freedom after interference suppression, as shown in Section 3.3.
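The difference between coherent and random combining that explains the GN/AF saturation can be illustrated with a simplified scalar model (an assumption: each of $K$ relays contributes a unit-amplitude component, and fading amplitudes and noise are ignored). Phase-aligned contributions add in amplitude, giving received power $K^2$, whereas independent uniform phases add only in power, giving $K$ on average; the extra factor of $K$ is the distributed array gain behind the $\log_2 K$ term.

```python
import numpy as np

rng = np.random.default_rng(4)
K, trials = 8, 20_000

# Coherent combining: every relay contribution arrives with the same (zero) phase.
coherent_power = np.abs(np.sum(np.ones(K))) ** 2          # = K**2

# Random combining: independent uniform phases, as without relay phase control.
phases = rng.uniform(0, 2 * np.pi, size=(trials, K))
random_power = np.mean(np.abs(np.exp(1j * phases).sum(axis=1)) ** 2)   # about K

print(coherent_power, random_power)
```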

Complexity
Finally, Table 1 shows the computational complexity of the relaying schemes, measured as the number of complex multiplications required at each relay node. We approximated the complexity by counting only the matrix inversion, multiplication, SVD, and QRD operations and keeping only the highest-order (cubic) terms in the matrix size. First, the complexity of the MF scheme is much lower than that of the others owing to its simple operations. The ZF scheme needs only one matrix inversion for each of the backward and forward channel matrices ($H_k$ and $G_k^T$), but its matrix size, $N \times LM$, is the largest. The GN/AF scheme requires SVD for every S-D pair of both equivalent  (2,4,8), the GN-based relay schemes offer lower complexity than ZF owing to the reduced matrix size. On the other hand, when the number of S-D pairs is larger, such as $(L, M, N) = (4, 2, 8)$, the ZF scheme has lower complexity because it requires fewer matrix operations. Therefore, when the number of S-D pairs is small, the GN/QR-P-QR scheme achieves higher capacity with lower complexity than the ZF scheme.

CONCLUDING REMARKS
In this paper, we proposed a relay technique for a MIMO relay network with multiple S-D pairs. The group nulling technique projects the received and transmitted signal vectors at each relay node onto the null space of the signals of the nondesired S-D pairs, so that the multiple-S-D MIMO relay channel is decomposed into parallel independent MIMO channels. The QR-P-QR technique is applied to each decomposed MIMO relay link. This relaying architecture preserves more degrees of freedom in the MIMO relay channel than the ZF scheme and enables coherent combining of the signals at the destination to achieve distributed array gain. We analyzed the asymptotic capacity of the proposed relay technique and clarified its achievable gains. Numerical examples confirmed that the proposed relay scheme achieves higher capacity than other existing relay schemes. It should be noted, however, that the antenna requirement $N \geq LM$, shared by the proposed scheme and the ZF relay scheme, could still be a limiting factor in some application scenarios. In addition, since the relay techniques described in this paper assume perfect CSI for both the backward and forward MIMO channels at each relay terminal, investigating their capacity with imperfect CSI is an important topic for future research.

INTRODUCTION
Relaying technology is a promising solution for providing high-throughput/high-data-rate coverage, as required in future cellular, ad hoc wireless, and satellite communications systems. This relaying technology has two main advantages: (1) very low transmit RF power requirements and (2) the use of spatial/multiuser diversity to combat fading.
Recently, the concept of cooperative diversity, in which mobile users relay signals for each other to emulate an antenna array and exploit the benefits of spatial diversity, has attracted great interest [1][2][3][4][5][6][7][8][9]. More specifically, Emamian et al. [1] have studied multiuser spatial diversity systems with channel-state-information-(CSI-) based relays, as first proposed in [10] (and later in [6]), and have derived closed-form expressions for the outage probability over Rayleigh fading channels. CSI-based relays use the instantaneous CSI of the incoming signal to control the output gain and, as a result, limit the power of the retransmitted signal. In another contribution, Sendonaris et al. [3,4] have proposed the user cooperation concept and considered practical issues related to its implementation. Moreover, Anghel and Kaveh have presented tight bounds for the outage and error probability of a distributed spatial diversity wireless system in the presence of Rayleigh fading [2]. Note that the relay considered in [2] is an ideal type of CSI-based relay, in which the noise figure is ignored in the relay gain. Also, efficient lower bounds for the average error probability in dual-hop cooperative diversity systems, especially in the low average signal-to-noise ratio (SNR) region, have been presented in [5]. Moreover, Laneman et al. have proposed a variety of low-complexity cooperative protocols for the three-terminal case [6]. These protocols have been applied to different relaying modes, namely amplify-and-forward (i.e., nonregenerative relays) and decode-and-forward (i.e., regenerative relays), and their outage probability has been analyzed using high-SNR approximations. The work of Laneman et al. has been extended by Nabar et al. [7], who presented a new cooperative protocol that realizes maximum degrees of broadcasting and exhibits no receive collision. Furthermore, Zimmermann et al.
[8] have presented an overview of cooperating relaying protocols and compared their performance with that of direct transmission and conventional relaying. Recently, an interesting work using selection diversity among cooperative users was presented by Bletsas et al. [9]. In this work, a method of opportunistic relaying as an efficient cooperative diversity scheme has been proposed. This scheme selects the "best" relay between source and destination based on instantaneous channel measurements and neither topology information nor communication among the relays is needed. Based upon the above, our paper presents for the first time a completely analytical approach in obtaining the endto-end performance of dual-hop cooperative links over Nakagami-m fading channels. In doing so, we first present closed-form expressions for the probability density function (PDF) and cumulative distribution function (CDF) of the end-to-end SNR. Moreover, analytical expressions for the moments and moment generating function (MGF) can be easily obtained. These new results are then applied to study the end-to-end performance of dual-hop cooperative links such as outage probability, average end-to-end SNR, amount of fading (AoF), and average bit error rate (BER) when a selection combining (SC) receiver is assumed at the destination terminal. It will be shown that our general expressions for Nakagami-m fading reduce to previously published results for m = 1, that is, Rayleigh fading.
The remainder of this paper is organized as follows. Section 2 introduces the system and channel model under consideration. In Section 3, the performance analysis of the system is presented. In particular, closed-form expressions for the outage probability are derived for both types of relays. The analysis is complemented by closed-form expressions for the moments and the average BER when a fixed-gain relay is considered. Numerical results are presented in Section 4, and concluding remarks are given in Section 5.

SYSTEM AND CHANNEL MODEL
A dual-hop relaying system operating over independent and nonidentical Nakagami-m fading channels is illustrated in Figure 1. The source terminal S communicates with the destination terminal D not only directly but also via the cooperative diversity link through terminal R, which acts as a nonregenerative relay. Each transmission period is divided into two signaling intervals: in the first, terminal S transmits to both the relay and the destination terminal, while in the second, only the relay transmits to terminal D. This transmission protocol was originally proposed in [10]. The destination terminal combines the received signals using an SC receiver. Assuming that S transmits a signal with average power normalized to unity, the instantaneous end-to-end SNR of the dual-hop path can be expressed as in [11,12]

$$\gamma_{end} = \frac{\dfrac{\alpha_1^2}{N_{0,1}} \, \dfrac{\alpha_2^2}{N_{0,2}}}{\dfrac{\alpha_2^2}{N_{0,2}} + \dfrac{1}{g^2 N_{0,1}}}. \qquad (1)$$

In (1), $\alpha_i$ is the fading amplitude of the $i$th path, $i = 0, 1, 2$ ($\alpha_0$ is the fading amplitude of the direct path), $N_{0,i}$ is the one-sided power spectral density of the additive white Gaussian noise ($N_{0,1}$ at the input of R and $N_{0,2}$ at the input of D), and $g$ is the gain of the relay. Since $\alpha_i$ is modeled as a Nakagami-m random variable (RV), the instantaneous SNR $\gamma_i = \alpha_i^2/N_{0,i}$ is a gamma-distributed RV with PDF

$$f_{\gamma_i}(\gamma) = \frac{m_i^{m_i} \, \gamma^{m_i - 1}}{\bar{\gamma}_i^{m_i} \, \Gamma(m_i)} \exp\left(-\frac{m_i \gamma}{\bar{\gamma}_i}\right), \quad \gamma \ge 0,$$

where $\Gamma(\cdot)$ is the gamma function [13, equation (8.310/1)], $m_i \ge 1/2$ is a parameter describing the fading severity, and $\bar{\gamma}_i$ is the average SNR of the $i$th path. Hence, its CDF can be written as

$$F_{\gamma_i}(\gamma) = 1 - \frac{\Gamma\left(m_i, \dfrac{m_i \gamma}{\bar{\gamma}_i}\right)}{\Gamma(m_i)},$$

where $\Gamma(\cdot, \cdot)$ is the incomplete gamma function defined in [13, equation (8.350.2)]. When R has CSI of the first hop available and its gain aims to limit the output power of the relay, one such gain, proposed by Laneman and Wornell [10], is

$$g_1^2 = \frac{1}{\alpha_1^2 + N_{0,1}}.$$

Therefore, the instantaneous end-to-end SNR of the dual-hop path can be expressed as in [1,2,11,12]

$$\gamma_{eq1} = \frac{\gamma_1 \gamma_2}{\gamma_1 + \gamma_2 + 1}.$$
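As a quick numerical check of the system model, the following Python sketch evaluates the end-to-end SNR of equation (1) for arbitrary fading amplitudes and noise densities, and confirms that substituting the CSI-based gain $g_1^2 = 1/(\alpha_1^2 + N_{0,1})$ collapses it to $\gamma_1\gamma_2/(\gamma_1 + \gamma_2 + 1)$. The function names and the randomly drawn test values are illustrative assumptions, not part of the original analysis.

```python
import math
import random


def gamma_end(alpha1: float, alpha2: float, n01: float, n02: float, g_sq: float) -> float:
    """End-to-end SNR of the dual-hop path, equation (1); g_sq is the squared relay gain g^2."""
    gamma1 = alpha1 ** 2 / n01  # first-hop SNR, S -> R
    gamma2 = alpha2 ** 2 / n02  # second-hop SNR, R -> D
    return gamma1 * gamma2 / (gamma2 + 1.0 / (g_sq * n01))


def csi_gain_squared(alpha1: float, n01: float) -> float:
    """CSI-based relay gain g_1^2 = 1 / (alpha1^2 + N_{0,1})."""
    return 1.0 / (alpha1 ** 2 + n01)


def gamma_eq1(gamma1: float, gamma2: float) -> float:
    """Closed-form end-to-end SNR obtained with the CSI-based gain."""
    return gamma1 * gamma2 / (gamma1 + gamma2 + 1.0)


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(5):
        # Arbitrary positive fading amplitudes and noise densities (illustrative only).
        a1, a2 = rng.uniform(0.1, 3.0), rng.uniform(0.1, 3.0)
        n01, n02 = rng.uniform(0.05, 1.0), rng.uniform(0.05, 1.0)
        via_eq1 = gamma_end(a1, a2, n01, n02, csi_gain_squared(a1, n01))
        closed = gamma_eq1(a1 ** 2 / n01, a2 ** 2 / n02)
        print(f"{via_eq1:.6f} {closed:.6f} match={math.isclose(via_eq1, closed)}")
```

Each printed pair should agree to floating-point precision, because with the CSI-based gain the term $1/(g_1^2 N_{0,1})$ in (1) equals $(\alpha_1^2 + N_{0,1})/N_{0,1} = \gamma_1 + 1$.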
When R applies a fixed gain to the received signal, given by [14]

$$g_2^2 = \frac{1}{C \, N_{0,1}},$$

where $C$ is a positive constant, the instantaneous end-to-end SNR of the dual-hop path can be expressed as in [14]

$$\gamma_{eq2} = \frac{\gamma_1 \gamma_2}{C + \gamma_2}.$$
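Equation (1) also shows what the constant $C$ means: matching the fixed-gain form requires $1/(g^2 N_{0,1}) = C$. For the "equivalent" fixed gain used in the numerical comparison, $g_2^2 = E[1/(\alpha_1^2 + N_{0,1})] = E[1/(1 + \gamma_1)]/N_{0,1}$, this gives $C = 1/E[1/(1 + \gamma_1)]$. The sketch below estimates that $C$ by Monte Carlo for a Nakagami-m first hop ($\gamma_1$ gamma distributed with shape $m_1$ and mean $\bar{\gamma}_1$) and evaluates $\gamma_{eq2}$. The sampling approach, the function names, and the default sample size are assumptions made for illustration.

```python
import random


def sample_gamma_snr(rng: random.Random, m: float, avg_snr: float) -> float:
    """One instantaneous SNR of a Nakagami-m link: gamma RV with shape m and mean avg_snr."""
    return rng.gammavariate(m, avg_snr / m)


def equivalent_fixed_gain_constant(m1: float, avg_snr1: float, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of C = 1 / E[1 / (1 + gamma1)].

    Follows from g_2^2 = E[1 / (alpha1^2 + N01)] and C = 1 / (g_2^2 N01).
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        acc += 1.0 / (1.0 + sample_gamma_snr(rng, m1, avg_snr1))
    return n / acc  # reciprocal of the sample mean


def gamma_eq2(gamma1: float, gamma2: float, c: float) -> float:
    """End-to-end SNR of the fixed-gain dual-hop path."""
    return gamma1 * gamma2 / (c + gamma2)


if __name__ == "__main__":
    c = equivalent_fixed_gain_constant(m1=2.0, avg_snr1=10.0)
    print(f"C = {c:.3f}")
    print(f"gamma_eq2(10, 10) = {gamma_eq2(10.0, 10.0, c):.3f}")
```

Since $1/(1 + \gamma_1) < 1$ for every positive sample, the estimate of $C$ always exceeds 1. For $m_1 = 1$ (Rayleigh) the expectation can be checked against the exponential-integral form $E[1/(1 + \gamma_1)] = e^{1/\bar{\gamma}_1} E_1(1/\bar{\gamma}_1)/\bar{\gamma}_1$.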

Outage probability
When an SC receiver is employed at terminal D, the end-to-end outage probability at D, $P_{out}$, is defined as the probability that the equivalent output SNR, $\gamma_{eq,csi} = \max(\gamma_{eq1}, \gamma_0)$, falls below a predetermined threshold $\gamma_{th}$.
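Because the direct link is independent of the two relay hops, the SC outage probability factorizes as $P_{out} = \Pr(\gamma_{eq1} < \gamma_{th}) \, F_{\gamma_0}(\gamma_{th})$. The Monte Carlo sketch below is a simulation cross-check, not the closed-form derivation of this section: it estimates $P_{out}$ for the CSI-based relay ($\gamma_{eq1}$) or the fixed-gain relay ($\gamma_{eq2}$ with a given $C$), and provides the direct-link CDF for integer $m$, where $\Gamma(m, t)/\Gamma(m) = e^{-t}\sum_{k=0}^{m-1} t^k/k!$. The function names, argument layout, and parameter values are assumptions.

```python
import math
import random


def nakagami_snr_cdf_int_m(x: float, m: int, avg_snr: float) -> float:
    """CDF of a gamma-distributed SNR with integer shape m and mean avg_snr.

    For integer m: 1 - Gamma(m, t)/Gamma(m) = 1 - exp(-t) * sum_{k<m} t^k / k!, t = m x / avg_snr.
    """
    t = m * x / avg_snr
    tail = sum(t ** k / math.factorial(k) for k in range(m))
    return 1.0 - math.exp(-t) * tail


def outage_sc_mc(gamma_th, m, avg, relay="csi", c=1.0, n=100_000, seed=0):
    """Monte Carlo outage probability of SC between the direct and dual-hop paths.

    m and avg are 3-tuples ordered as (direct link, first hop, second hop).
    relay="csi" uses gamma_eq1; any other value uses the fixed-gain gamma_eq2 with constant c.
    """
    rng = random.Random(seed)
    outages = 0
    for _ in range(n):
        # Independent gamma-distributed SNRs: shape m_i, scale avg_i / m_i (mean avg_i).
        g0, g1, g2 = (rng.gammavariate(mi, ai / mi) for mi, ai in zip(m, avg))
        if relay == "csi":
            g_relay = g1 * g2 / (g1 + g2 + 1.0)
        else:
            g_relay = g1 * g2 / (c + g2)
        # The SC output is below the threshold only if both branches are.
        if max(g_relay, g0) < gamma_th:
            outages += 1
    return outages / n


if __name__ == "__main__":
    m, avg, th = (2, 2, 2), (5.0, 5.0, 5.0), 2.0
    print("direct-link CDF:", nakagami_snr_cdf_int_m(th, m[0], avg[0]))
    print("CSI relay P_out:", outage_sc_mc(th, m, avg, relay="csi"))
    print("fixed relay P_out (C = 2):", outage_sc_mc(th, m, avg, relay="fixed", c=2.0))
```

Sweeping $\bar{\gamma}_0$ and $\gamma_{th}$ with this routine is one way to reproduce, qualitatively, the crossover between the two relay types discussed in the numerical examples; the simulated values are estimates and carry sampling error of order $1/\sqrt{n}$.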

NUMERICAL EXAMPLES AND SIMULATIONS
In this section, numerical examples are presented, which also verify the accuracy of our mathematical analysis. Figure 2 depicts the outage probability as a function of the average SNR of the direct link, $\bar{\gamma}_0$, for both types of relays and for different outage threshold values. Comparing the CSI-based relay with the equivalent (in terms of average introduced gain) fixed-gain relay, i.e., $g_2^2 = E[1/(\alpha_1^2 + N_{0,1})]$, it can be seen that for medium-to-large SNR values the CSI-based relay outperforms the fixed-gain relay. However, in the low-SNR region, dual-hop systems equipped with fixed-gain relays outperform those with variable-gain relays. This is because the CSI-based gain $g_1^2$ reaches its maximum value, $1/N_{0,1}$, as $\alpha_1 \to 0$, an event that is quite likely in the low average SNR regime. Considering the higher complexity of CSI-based relays, our results show that fixed-gain relays may serve as an efficient solution for relayed transmissions. Similar observations have been made for the Rayleigh fading channel in [14]. Furthermore, as $\gamma_{th}$ increases, the SNR range in which fixed-gain relays outperform CSI-based relays also widens.
In Figure 3, the AoF of a dual-hop cooperative link with a fixed-gain relay is depicted for integer values of the Nakagami-m fading parameter. It is clear from Figure 3 that the overall AoF is reduced when a relay is used. It is also interesting to note that the AoF deteriorates for low values of m, even for high amplification gains. This can be explained by the fact that, under severe fading conditions, the fixed-gain relay amplifies not only the received signal but also the noise of the end-to-end link. Moreover, as m increases, the difference between the AoF of a direct link and that of a dual-hop cooperative transmission decreases. Finally, Figure 4 shows the average BER for DPSK versus $\bar{\gamma}_0$. As expected, increasing the fading parameter improves the error performance of the dual-hop cooperative link.
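The AoF and DPSK BER trends discussed above can be explored by simulation. The sketch below is a Monte Carlo check under stated assumptions, not the closed-form moment and BER expressions derived in the paper: it draws samples of the SC output $\max(\gamma_{eq2}, \gamma_0)$ for a fixed-gain relay, estimates $\mathrm{AoF} = E[\gamma^2]/E[\gamma]^2 - 1$ and the average DPSK BER $\frac{1}{2}E[e^{-\gamma}]$, and provides the single-link Nakagami-m DPSK reference $\frac{1}{2}(1 + \bar{\gamma}/m)^{-m}$, which follows from the gamma MGF. All function names and parameter values are hypothetical.

```python
import math
import random


def sc_fixed_gain_samples(m, avg, c, n=100_000, seed=0):
    """Samples of the SC output max(gamma_eq2, gamma_0) with a fixed-gain relay.

    m and avg are 3-tuples ordered as (direct link, first hop, second hop).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        g0, g1, g2 = (rng.gammavariate(mi, ai / mi) for mi, ai in zip(m, avg))
        samples.append(max(g1 * g2 / (c + g2), g0))
    return samples


def amount_of_fading(samples):
    """Sample estimate of AoF = E[gamma^2] / E[gamma]^2 - 1."""
    mean = sum(samples) / len(samples)
    second = sum(s * s for s in samples) / len(samples)
    return second / mean ** 2 - 1.0


def dpsk_ber(samples):
    """Average DPSK BER estimate, 0.5 * E[exp(-gamma)]."""
    return 0.5 * sum(math.exp(-s) for s in samples) / len(samples)


def dpsk_ber_direct(m, avg_snr):
    """Exact DPSK BER of one Nakagami-m link, 0.5 * (1 + avg_snr / m)^(-m), via the gamma MGF."""
    return 0.5 * (1.0 + avg_snr / m) ** (-m)


if __name__ == "__main__":
    sc = sc_fixed_gain_samples((2, 2, 2), (10.0, 10.0, 10.0), c=5.0)
    print("AoF, SC output:", amount_of_fading(sc))
    print("AoF, direct link (1/m):", 1.0 / 2)
    print("DPSK BER, SC output:", dpsk_ber(sc))
    print("DPSK BER, direct link:", dpsk_ber_direct(2, 10.0))
```

For a single Nakagami-m link the AoF equals $1/m$, which gives a simple sanity check for the estimator; since the SC output is never smaller than $\gamma_0$, its estimated DPSK BER should fall below the direct-link value.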

CONCLUSION
In this paper, the performance of dual-hop cooperative diversity systems with nonregenerative relays and an SC receiver at the destination terminal over Nakagami-m fading channels was studied. Specifically, closed-form expressions for the outage probability were presented for both CSI-based and fixed-gain relays. The outage analysis showed that fixed-gain relays may serve as an efficient solution for relayed transmissions, given the higher complexity of CSI-based relays. Moreover, analytical expressions for the AoF and the average BER when