Joint iterative beamforming and power adaptation for MIMO ad hoc networks

In this paper, we present distributed cooperative and regret-matching-based learning schemes for joint transmit power and beamformer selection in multiple-antenna wireless ad hoc networks operating in a multi-user interference environment. Under the total network power minimization criterion, a joint iterative approach is proposed to reduce the mutual interference at each node while ensuring a constant received signal-to-interference-plus-noise ratio (SINR) at each receiver. In the cooperative and regret-matching-based power minimization algorithms, transmit beamformers are selected from a predefined codebook to minimize the total power. By selecting transmit beamformers judiciously and performing power adaptation, the cooperative algorithm is shown to converge to a pure strategy Nash equilibrium with high probability in the interference-impaired network. The proposed cooperative and regret-matching-based distributed algorithms are also compared with centralized solutions through simulations.


Introduction
Multiple-input multiple-output (MIMO) communication techniques have been shown to boost the capacity and spectral efficiency of wireless communication systems [1,2]. MIMO wireless systems can sustain more simultaneous transmissions in a given area through interference management [3]. MIMO systems gain an additional advantage when transmission parameters such as transmit power, beamformer, frequency, modulation, and transmission rate are adapted to the interference environment. Compared with a centralized system, adaptive wireless systems can achieve high efficiency with lower computational complexity and overhead. Transmit beamforming has been studied extensively in the literature [4-11], and designing optimum signaling at the transmitter can lead to significant improvements for systems operating under varying interference [4,6,12-16]. In spatial transmit beamforming, each communicating node's symbol stream is multiplied by a preselected transmit beamforming weight vector for transmission through multiple antennas such that the overall interference among multiple nodes is minimized. Adaptive optimization of the transmit beamformer improves efficiency by steering the beam toward the intended receiver while placing nulls toward unintended receivers to avoid causing them excessive interference. Transmitters may adapt their signals using low-rate feedback from the receiver [17]. A power control mechanism can also be combined with limited-rate feedback from the receiver to satisfy Quality-of-Service (QoS) requirements at the receiver [18-20].
In general, MIMO beamforming techniques have been studied in three different settings: point-to-point links, cellular networks, and ad hoc networks. The great potential of MIMO in point-to-point communication is shown in [1,4,6,21], and linear precoders (eigencoders) and beamformers have been designed for point-to-point MIMO links in [5,7]. In cellular networks, beamforming algorithms minimize the total power and enhance capacity for array-equipped base stations and single-antenna mobile transmitters [8-11]. In ad hoc networks, which lack a central controller, distributed beamforming techniques increase system throughput and lower energy consumption [12,22-24]. However, optimization solutions designed for ad hoc networks need careful study, because the environment is interference-limited and the performance of MIMO techniques depends significantly on the overhead introduced by the proposed algorithms.
Distributed spatial beamforming algorithms are proposed for multi-user ad hoc MIMO networks in [23,24] under channel reciprocity conditions. Channel reciprocity holds when the channel matrix at the receiver is the transpose of the channel matrix at the transmitter; this is usually assumed in time-division duplex (TDD) systems [24]. Bromberg [24] considers the capacity maximization problem and proposes a locally enabled global optimization (LEGO) algorithm for distributed beamformer updates under Gaussian other-user interference. Iltis et al. [23] formulate the problem as a noncooperative game for overall power minimization of the network under a constant QoS constraint (i.e., a target signal-to-interference-plus-noise ratio (SINR)). Their iterative minimum mean-square error (IMMSE) algorithm solves the optimization problem by computing transmit/receive beamformer pairs and transmit powers in a distributed manner [23]. In the IMMSE algorithm, the receive beamformer is the conjugate of the transmit beamformer, and the algorithm relies on channel reciprocity. Hence, the IMMSE algorithm does not require explicit feedback of channel state information (CSI) from the receiver. However, the updating procedure of the IMMSE algorithm incurs transmission overhead for training sequences and power control commands. This overhead grows with the number of iterations, since the algorithm updates the transmit/receive beamformers and powers iteratively. Moreover, if the transmitter and receiver use different channels or frequencies for transmission and reception, i.e., when channel reciprocity does not hold, CSI must be fed back to the transmitter, which incurs additional overhead.
In order to lower the communication overhead between transmitter and receiver when channel reciprocity does not hold, a limited feedback scheme that quantizes the transmit beamformer in single-user MIMO systems is proposed in [21]. The concept is based on selecting a codeword from a predetermined codebook that is known to both the transmitter and the receiver. Selecting the transmit beamformer from a predefined codebook reduces overhead in nonreciprocal channels. Moreover, it reduces latency in highly mobile and unstable communication networks and minimizes user participation. In this scenario, the receiver feeds back only the index of the selected transmit beamformer to the transmitter. When there is no channel reciprocity between transmitters and receivers, an iterative limited feedback beamforming algorithm using a predetermined codebook is proposed in [25]. The algorithm maximizes the transmission rate in multi-user MIMO ad hoc networks using sequential discrete transmit beamformer selection updates. In each iteration, each node formulates its best response strategy, which maximizes its received SINR. However, the convergence of the algorithm has not been investigated.
Game theory has provided efficiency and convergence proofs for several important problems in wireless communications, such as distributed power control algorithm design [26], joint code-division multiple access (CDMA) waveform and power control design [19,20,27], and optimum transmission signaling strategies [28,29]. However, the application of game theory to distributed beamforming is problematic [23]. Lacatus and Popescu [20] and Popescu et al. [19] study joint CDMA codeword (or sequence) and power adaptation as a noncooperative game. The problem is formulated as a separable game built from noncooperative convex games, with two corresponding subgames: a power control game and a codeword control game. However, in contrast to our joint optimization problem, the joint optimization of powers and CDMA codewords is investigated only over convex games (i.e., games whose action sets are nonempty, compact, and convex [26,30]); therefore, the decision variables (i.e., the powers and codeword sequences) in these games are continuous rather than discrete.
Optimum transmit signaling for rate maximization in MIMO interference systems has been studied using game theory [12-16]. In these papers, the system is modeled as a noncooperative game in which every MIMO link is a player that competes against the others by choosing its transmit covariance matrix to maximize its own rate. Liang and Dandekar [13] investigate rate maximization for MIMO ad hoc networks by performing power control. The existence of a Nash equilibrium (NE) is shown using concave game analysis, but the convergence of the proposed algorithms is not studied. Arslan et al. [14] show that individual mutual information maximization in MIMO interference channels is a concave game [31], which implies the existence of an NE for arbitrary channel matrices. The equilibrium is provably unique when multi-user interference (MUI) is sufficiently small. Decentralized algorithms using local information provide update strategies to determine the link parameters. As an extension of this work, the uniqueness of the NE under more general conditions is established in [15]. Scutari et al. [15] provide a unified framework for the noncooperative mutual information maximization problem in MIMO interference systems. A unified set of sufficient conditions guaranteeing the uniqueness of the NE and the convergence of the asynchronous water-filling algorithm is provided for square nonsingular channel matrices. The analysis is based on interpreting the MIMO water-filling operator as a matrix projection onto the convex and closed set of covariance matrices. In [16], the same authors extend their results to arbitrary channel matrices. However, these papers do not address the selection of discrete optimized signaling. The existence (or uniqueness) of the NE proven in [13,14] holds for convex or concave games, and in [15,16] it holds for positive semidefinite covariance matrices that form a convex and closed set.
Cooperative and noncooperative algorithms for joint channel and power allocation from a "discrete" strategy space are studied in [32] in the context of wireless mesh networks. However, the proposed noncooperative algorithm is suboptimal, and one of the adaptation parameters (i.e., the channel) is not updated after the first iteration.
Power minimization using distributed algorithms with transmit beamformer selection is challenging, especially in ad hoc networks. Unlike power control games, beamforming games have no natural ordering of the actions [23]. In MIMO ad hoc networks operating in MUI environments, the interference at each user depends on the transmission parameters of the other users. The beamforming decision of each user reshapes the interference it causes to other links in ways that may be difficult to predict. Changing a beamforming vector may reduce interference on some links while other links suffer from higher interference. The affected nodes will then change their own beamforming vectors, setting off a cascade of changes in the network. Moreover, if the node pairs belong to different regulatory entities, noncooperative node pairs may want to minimize only their own transmit power rather than the overall power.
Analysis of action selection from a "discrete" codebook set, together with the corresponding convergence analysis, is still missing in the literature for joint transmit beamforming and power adaptation. To the best of the authors' knowledge, the problem of joint discrete transmit beamforming and power adaptation has not been formalized for multi-user MIMO ad hoc networks. In this paper, we study a decentralized approach for optimizing transmit beamformers and power levels using local information and a reasonable computational burden. We consider total power minimization under a constant received target SINR constraint. Our contributions are twofold. First, we study an efficient cooperative beamforming algorithm for the global power minimization problem, with convergence analysis. For the cooperative algorithm, the amount of information exchanged between nodes grows with the number of iterations. Second, we study a noncooperative regret-matching learning algorithm that jointly selects the transmit beamformer and power to minimize the total power consumed by the network. The noncooperative solution reduces overhead by using only local information. We compare the performance of the proposed algorithms with the optimal global solution, which is found by exhaustively searching the entire feasible strategy space.
The rest of this paper is organized as follows. Section 2 outlines the system model used in the paper. The optimization problem and its game theoretical interpretation are presented in Section 3. The cooperative wireless ad hoc network and its noncooperative counterpart are investigated in Sections 4 and 5, respectively. The performance evaluation of the proposed algorithms is provided in Section 6. Finally, Section 7 concludes the paper.

System model and concepts
In this paper, we consider a wireless ad hoc network consisting of node pairs whose transmitters and receivers have multiple antennas, as shown in Figure 1. All nodes are assumed to use the same channel, so the interference comes from the other node pairs that operate simultaneously on that channel. In this ad hoc network model, there are $N$ node pairs, and each node pair $m \in \mathcal{N} = \{1, 2, \ldots, N\}$ consists of one transmitter node and one receiver node. Each transmitter and receiver node is equipped with $T$ antennas. Each node pair has a unit-norm receive/transmit beamformer pair $(w_m, t_m)$. The received signal vector $r_m \in \mathbb{C}^T$ at the $m$th receiving node is given by

$$r_m = \sqrt{P_m}\, H_{m,m} t_m b_m + \sum_{i \neq m} \sqrt{P_i}\, H_{m,i} t_i b_i + n_m, \quad (1)$$

where $b_m$ denotes the unit-energy information symbol of the $m$th transmitting node, $H_{m,i}$ denotes the $T \times T$ MIMO channel between the $i$th transmitting node and the $m$th receiving node and is assumed to be quasi-static, and $P_m$ is the power of the $m$th transmitting node. The additive white Gaussian noise terms $n_m \in \mathbb{C}^T$ have identical covariance matrices $\sigma^2 I_T$, where $\sigma^2$ is the noise power and $I_T$ is the $T \times T$ identity matrix. We note that different noise covariance matrices would not affect the performance of the proposed algorithms. The first term on the right-hand side of (1) is the desired signal, whereas the second term is the interference from the other transmitting nodes.
As we are interested in the minimum achievable power, we consider the worst case in which all node pairs always have packets to transmit and all nodes in the network can transmit simultaneously. The network is assumed to be synchronous. The set of available codebook beamformers for the $m$th transmitting and receiving node pair is denoted by $\Delta_m = \{t_m^1, t_m^2, \ldots, t_m^{\Upsilon}\}$, with cardinality $\Upsilon$. In a limited feedback beamforming system, the receiving node selects a transmit beamformer from the codebook and feeds back the index of the selected beamformer, so each node selects among $\Upsilon$ transmit beamformer vectors. Let $t_m \in \Delta_m$ be the selected transmit beamformer for the $m$th node pair. Denote by $\Theta = [t_1, t_2, \ldots, t_N]^T$ and $P = [P_1, P_2, \ldots, P_N]^T$ the transmit beamformer selection and transmit power vectors of the $N$ node pairs, respectively. The $T \times T$ interference-plus-noise covariance matrix at the $m$th receiving node is

$$R_m(\Theta_{-m}, P_{-m}) = \sum_{i \neq m} P_i\, H_{m,i} t_i t_i^H H_{m,i}^H + \sigma^2 I_T, \quad (2)$$

where $\Theta_{-m}$ and $P_{-m}$ are the transmit beamformers and powers of the nodes other than $m$.
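For illustration, the covariance in (2) can be assembled directly from the channels, beamformers, and powers. The following is a minimal plain-Python sketch, assuming the channel from the $i$th transmitting node to the $m$th receiving node is stored as a nested list `H[m][i]` of complex entries; the helper names `matvec` and `interference_covariance` are hypothetical and not taken from the paper.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def interference_covariance(m, H, thetas, powers, noise_var):
    """R_m in (2): sum over i != m of P_i (H_{m,i} t_i)(H_{m,i} t_i)^H, plus sigma^2 I_T.

    H[m][i] is the T x T channel from transmitter i to receiver m,
    thetas[i] is the unit-norm transmit beamformer t_i, powers[i] is P_i.
    """
    T = len(thetas[m])
    R = [[complex(noise_var) if a == b else 0j for b in range(T)] for a in range(T)]
    for i, (t_i, p_i) in enumerate(zip(thetas, powers)):
        if i == m:
            continue  # the desired link is not part of the interference
        g = matvec(H[m][i], t_i)  # effective interference signature H_{m,i} t_i
        for a in range(T):
            for b in range(T):
                R[a][b] += p_i * g[a] * g[b].conjugate()
    return R


# Two node pairs, T = 2: receiver 0 sees transmitter 1 through an identity channel.
I2 = [[1, 0], [0, 1]]
H = [[I2, I2], [I2, I2]]
R0 = interference_covariance(0, H, thetas=[[0, 1], [1, 0]], powers=[5.0, 2.0], noise_var=0.5)
print(R0)  # [[(2.5+0j), 0j], [0j, (0.5+0j)]]
```

Only the interferer's power (2.0) enters $R_0$; the desired pair's own power (5.0) does not, which is what makes (2) a function of $(\Theta_{-m}, P_{-m})$ only.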
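An antenna beam pattern that adjusts the antenna gains to form nulls toward the directions of the interferers, while keeping a constant gain toward the multipath components of the desired signal, can be designed using receive antenna arrays. The minimum variance distortionless response (MVDR) beamformer [23,33] adjusts the array weights such that the sum of interference and noise is minimized. The normalized receive beamformer at the $m$th receiving node is

$$w_m = \frac{R_m^{-1} H_{m,m} t_m}{\left\| R_m^{-1} H_{m,m} t_m \right\|}. \quad (3)$$

The resulting received SINR at the $m$th receiving node for the desired transmitter of the $m$th node pair is

$$\Gamma_m(\Theta, P) = \frac{P_m \left| w_m^H H_{m,m} t_m \right|^2}{w_m^H R_m w_m} = P_m\, t_m^H H_{m,m}^H R_m^{-1} H_{m,m} t_m, \quad (4)$$

where $\|w_m\|^2 = \|t_m\|^2 = 1$ for all $m$. The second equality follows by substituting (3) into the first expression.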
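The equivalence between the MVDR-based SINR and the closed-form "normalized SINR" $t_m^H H_{m,m}^H R_m^{-1} H_{m,m} t_m$ used throughout the paper can be checked numerically. This is a plain-Python sketch with a small hand-written Gaussian elimination solver; the matrices below are arbitrary illustrative values (any Hermitian positive definite $R_m$ would do), and the function names are hypothetical.

```python
import math


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b for a small complex matrix by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mvdr_beamformer(R, H_mm, t_m):
    """Normalized MVDR receive beamformer (3): R^{-1} H t / ||R^{-1} H t||."""
    x = solve(R, matvec(H_mm, t_m))
    norm = math.sqrt(sum(abs(v) ** 2 for v in x))
    return [v / norm for v in x]


def sinr_from_beamformer(P_m, R, H_mm, t_m, w_m):
    """First expression in (4): P |w^H H t|^2 / (w^H R w)."""
    h = matvec(H_mm, t_m)
    return P_m * abs(inner(w_m, h)) ** 2 / inner(w_m, matvec(R, w_m)).real


def normalized_sinr(R, H_mm, t_m):
    """Closed form in (4) without the factor P_m: t^H H^H R^{-1} H t."""
    h = matvec(H_mm, t_m)
    return inner(h, solve(R, h)).real


R = [[2.0, 0.5], [0.5, 1.0]]         # example Hermitian positive definite R_m
H_mm = [[1.0, 0.5j], [0.2, 1.0]]      # example direct channel H_{m,m}
t_m = [1 / math.sqrt(2), 1 / math.sqrt(2)]
w_m = mvdr_beamformer(R, H_mm, t_m)
print(sinr_from_beamformer(3.0, R, H_mm, t_m, w_m), 3.0 * normalized_sinr(R, H_mm, t_m))
```

Both printed values agree up to floating-point rounding, which is why the algorithms can work with the normalized SINR alone and never need to form $w_m$ explicitly when computing powers.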
The proposed distributed algorithms attempt to achieve a target SINR by adjusting transmit powers. To construct a distributed iterative limited feedback beamforming scheme, let us first consider the case in which there is only one node pair in the wireless network. The receiver selects the transmit beamformer from the codebook $\Delta_1$ as

$$t_1^\dagger = \arg\max_{t_1 \in \Delta_1} t_1^H H_{1,1}^H R_1^{-1} H_{1,1} t_1, \quad (5)$$

where $t_1^\dagger$ is the optimal transmit beamformer selection for one node pair. The receiver then returns, through the low-rate feedback channel, the index of $t_1^\dagger$ and the received "normalized" SINR, $t_1^{\dagger H} H_{1,1}^H R_1^{-1} H_{1,1} t_1^\dagger$, where $R_1^{-1} = \sigma^{-2} I_T$ since only noise is present in a single-user system. The transmitter uses the selected beamformer to minimize its own transmission power $P_1$, which is updated as

$$P_1 = \frac{\gamma_0}{t_1^{\dagger H} H_{1,1}^H R_1^{-1} H_{1,1} t_1^\dagger}, \quad (6)$$

where $\gamma_0$ is the target SINR value. Consider now the case in which $N$ node pairs coexist in the wireless network. For each node pair $m$, the received SINR $\Gamma_m$ is a function of $(\Theta, P)$. Therefore, the transmit power of one node pair depends not only on the transmit beamformer it selects but also on the transmit powers and beamformer selections of the other nodes in the network. Furthermore, in beamforming, if user $i \neq m$ changes its transmit beamformer $t_i$ to increase its own SINR $\Gamma_i$, it can either increase or decrease $\Gamma_m$, the SINR of link $m$, depending on the relative positions of the nodes. Therefore, designing an optimal distributed algorithm that converges to a set of beamformers minimizing the overall transmit power while meeting the target SINRs of all node pairs is not straightforward.
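In the single-pair case, $R_1 = \sigma^2 I_T$, so (5) reduces to choosing the codeword that maximizes $\|H_{1,1} t_1\|^2 / \sigma^2$, and (6) then gives the power. The sketch below assumes a hypothetical toy channel and a two-word toy codebook (not the Grassmannian codebook used later); `single_pair_selection` is an illustrative name. It also counts the feedback bits needed to send a codeword index, $\lceil \log_2 \Upsilon \rceil$.

```python
import math


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def single_pair_selection(H_11, codebook, noise_var, gamma0):
    """Receiver side of (5)-(6) for one node pair, where R_1 = noise_var * I.

    Returns the codeword index to feed back, the normalized SINR, and the
    transmit power P_1 = gamma0 / normalized SINR from (6).
    """
    best_index, best_value = None, -math.inf
    for index, t in enumerate(codebook):
        h = matvec(H_11, t)
        value = sum(abs(v) ** 2 for v in h) / noise_var  # t^H H^H (sigma^2 I)^{-1} H t
        if value > best_value:
            best_index, best_value = index, value
    return best_index, best_value, gamma0 / best_value


gamma0 = 10 ** (10 / 10)                  # 10 dB target SINR in linear scale
codebook = [[1, 0], [0, 1]]               # toy codebook (not the Grassmannian one)
index, norm_sinr, P1 = single_pair_selection([[2, 0], [0, 1]], codebook, 0.5, gamma0)
feedback_bits = math.ceil(math.log2(len(codebook)))
print(index, norm_sinr, P1, feedback_bits)  # 0 8.0 1.25 1
```

With the codebook size $\Upsilon = 16$ used in the simulations, the index feedback would take 4 bits.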

Optimization problem and game theoretical interpretation
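The goal is to minimize the total transmit power of all nodes $m \in \mathcal{N}$ under a constant target SINR $\gamma_0$. The optimization problem can be defined as

$$\min_{\Theta, P} \sum_{m=1}^{N} P_m \quad \text{subject to} \quad \Gamma_m \geq \gamma_0,\; \|w_m\| = \|t_m\| = 1,\; P_{\min} \leq P_m \leq P_{\max},\; \forall m \in \mathcal{N}, \quad (7)$$

where $P_{\min}$ and $P_{\max}$ are the minimum and maximum transmit powers, respectively. We consider the above problem as a normal form game $\Pi = \langle \mathcal{N}, \{\mathcal{C}_m\}_{m \in \mathcal{N}}, \{u_m\}_{m \in \mathcal{N}} \rangle$, where the node pairs in $\mathcal{N}$ are the players, $\mathcal{C}_m$ is the set of actions (joint transmit beamformer and power choices) available to node pair $m$, and $u_m$ is the utility function of node pair $m$. An action profile $(c_m, c_{-m})$ is a Nash equilibrium if no player can improve its utility by deviating unilaterally, i.e.,

$$u_m(c_m, c_{-m}) \geq u_m(c_m', c_{-m}), \quad \forall c_m' \in \mathcal{C}_m,\; \forall m \in \mathcal{N}, \quad (8)$$

where $(c_m', c_{-m})$ refers to the strategy profile in which the action of user $m$ is changed from $c_m$ to $c_m'$ while the actions of all the other players remain the same. In the following sections, we discuss the scenarios in which the node pairs are cooperative and noncooperative, respectively, in order to search for the best results and provide convergence guarantees.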

Optimal (centralized) solution
In a wireless ad hoc network with a centralized agent, the transmit beamformers and the corresponding transmit powers can be jointly selected to minimize the total transmit power of all transmitting nodes as

$$(\Theta^\dagger, P^\dagger) = \arg\min_{\Theta, P} \sum_{m=1}^{N} P_m \quad \text{subject to the constraints in (7)}, \quad (9)$$

where $\Theta^\dagger = [t_1^\dagger, \ldots, t_N^\dagger]^T$ and $P^\dagger = [P_1^\dagger, \ldots, P_N^\dagger]^T$ are the optimal transmit beamformer and power solutions, respectively. Setting $\Gamma_m = \gamma_0$ in (4), the transmit power $P_m$ of the $m$th node pair is

$$P_m = \frac{\gamma_0}{t_m^H H_{m,m}^H R_m^{-1} H_{m,m} t_m}, \quad (10)$$

where $R_m$ is a function of $(\Theta_{-m}, P_{-m})$ as shown in (2). A naive approach to solving the problem is to examine all strategy profiles $\Theta = [t_1, \ldots, t_m, \ldots, t_N]^T$ exhaustively (for a given fixed strategy profile $\Theta$, the corresponding power profile $P$ can be computed using (10) for each individual node pair $m$; the $N$ equations are coupled, since each $R_m$ depends on the other nodes' powers). To compute (9), the centralized agent evaluates the total network power for $\Upsilon^N$ possible beamforming vector combinations. For example, for a network with 10 node pairs in which each user selects from a codebook of $\Upsilon = 16$ beamformers, the search space contains $16^{10} \approx 1.1 \times 10^{12}$ strategy profiles. Consequently, finding the centralized transmit beamformer solution is cumbersome in large-scale wireless ad hoc networks. To alleviate this complexity while maintaining good performance, we propose two decentralized power minimization algorithms using cooperative and noncooperative techniques.
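A minimal sketch of the exhaustive search in (9) is shown below for a tiny network. For each beamformer profile, the coupled equations (10) are solved by a fixed-point power iteration started from zero power; this iteration is our illustrative choice and not necessarily the authors' procedure. Powers are kept continuous, $P_{\min}$ and power quantization are ignored, and the channels, the 4-word codebook, and all function names are hypothetical.

```python
import itertools
import math
import random


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def inner(u, v):
    return sum(a.conjugate() * b for a, b in zip(u, v))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small complex system."""
    n = len(A)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def normalized_sinr(m, H, thetas, powers, noise_var):
    """t_m^H H_mm^H R_m^{-1} H_mm t_m with R_m built as in (2)."""
    T = len(thetas[m])
    R = [[complex(noise_var) if a == b else 0j for b in range(T)] for a in range(T)]
    for i, (t_i, p_i) in enumerate(zip(thetas, powers)):
        if i != m:
            g = matvec(H[m][i], t_i)
            for a in range(T):
                for b in range(T):
                    R[a][b] += p_i * g[a] * g[b].conjugate()
    h = matvec(H[m][m], thetas[m])
    return inner(h, solve(R, h)).real


def powers_for_profile(H, thetas, gamma0, noise_var, p_max, iters=200):
    """Solve (10) jointly for a fixed profile by fixed-point iteration from P = 0.

    Starting from zero, the iterates increase monotonically, so exceeding
    p_max means the profile is infeasible (returns None).
    """
    P = [0.0] * len(thetas)
    for _ in range(iters):
        P = [gamma0 / normalized_sinr(m, H, thetas, P, noise_var) for m in range(len(thetas))]
        if max(P) > p_max:
            return None
    return P


def centralized_search(H, codebooks, gamma0, noise_var, p_max):
    """Exhaustive search of (9) over all Upsilon^N beamformer profiles."""
    best = None
    for idx in itertools.product(*(range(len(cb)) for cb in codebooks)):
        thetas = [codebooks[m][k] for m, k in enumerate(idx)]
        P = powers_for_profile(H, thetas, gamma0, noise_var, p_max)
        if P is not None and (best is None or sum(P) < best[0]):
            best = (sum(P), idx, P)
    return best


rng = random.Random(1)
N, T = 2, 2


def cn(std):
    """One circularly symmetric complex Gaussian sample with E|x|^2 = std^2."""
    return complex(rng.gauss(0, std), rng.gauss(0, std)) / math.sqrt(2)


# Hypothetical toy network: strong direct links, cross links 20 dB weaker.
H = [[[[cn(1.0 if m == i else 0.1) for _ in range(T)] for _ in range(T)]
      for i in range(N)] for m in range(N)]
s = 1 / math.sqrt(2)
codebook = [[1, 0], [0, 1], [s, s], [s, -s]]  # toy 4-word codebook
best = centralized_search(H, [codebook] * N, gamma0=10.0, noise_var=0.01, p_max=100.0)
print(best[0], best[1])
print(16 ** 10)  # 1099511627776 profiles for N = 10, Upsilon = 16
```

Even this toy case evaluates $4^2 = 16$ profiles, each requiring its own power iteration, which illustrates why the $\Upsilon^N$ growth quickly becomes prohibitive.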

Cooperative power minimization using beamforming
In this section, we consider scenarios in which all node pairs in the wireless network cooperate. In a cooperative game, nodes in the network are able to coordinate and select their transmit beamformers accordingly. We want to find the optimal transmit beamformer and power assignments such that the total power consumed by all nodes in the network is minimized. The objective function can be written as

$$U_{\text{network}}(\Theta, P) = -\sum_{m=1}^{N} P_m. \quad (11)$$

We assume that each user's utility function is (11), that is,

$$u_m(\Theta, P) = U_{\text{network}}(\Theta, P), \quad \forall m \in \mathcal{N}. \quad (12)$$

In other words, we model the game as an identical interest game, which is a special case of potential games [34]. It is easy to verify that every identical interest game has at least one pure NE, since any action profile that maximizes $U_{\text{network}}(\Theta, P)$ is a pure NE [14,32]. We analyze a cooperative power minimization algorithm (COPMA) that converges to the optimal NE with arbitrarily high probability. This method is analogous to the decentralized negotiation method called adaptive play [14]. The key characteristic of COPMA is the randomness deliberately introduced into the decision-making process to avoid becoming trapped in a locally optimal solution. In COPMA, the choices of the players (in our case, transmit beamformer selections) lead the system to the optimal NE with arbitrarily high probability.
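The randomized single-pair update at the heart of COPMA can be sketched as follows. This is a minimal sketch under explicit assumptions: the keep/discard rule is assumed to be the logit (Boltzmann) form $1/(1 + e^{(P^{\text{updated}} - P^{\text{current}})/\tau})$, which behaves as COPMA is described (improvements are kept with high probability, and a smaller $\tau$ makes the choice more deterministic), but it is not a reproduction of the paper's exact selection and decision probabilities. The new codeword is assumed to be proposed uniformly, `total_power` is a hypothetical callback returning the network power of a profile of codeword indices (for example via (10)), and the message exchange between node pairs is abstracted away. The schedule $\tau = 0.1/n^2$ is the one used in the simulations.

```python
import math
import random


def accept_probability(p_current, p_updated, tau):
    """Assumed logit rule: probability of keeping the updated beamformer.

    Equals 1 / (1 + exp((p_updated - p_current) / tau)), evaluated in a
    numerically stable way so that very small tau does not overflow.
    """
    z = (p_updated - p_current) / tau
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def copma(total_power, num_pairs, codebook_size, iterations, rng, tau0=0.1):
    """COPMA-style randomized search over codeword-index profiles (sketch)."""
    profile = [0] * num_pairs              # every pair starts from the first codeword
    p_current = total_power(profile)
    for n in range(1, iterations + 1):
        tau = tau0 / n ** 2                # smoothing factor tau = 0.1 / n^2
        m = rng.randrange(num_pairs)       # updating pair, chosen with probability 1/N
        candidate = list(profile)
        candidate[m] = rng.randrange(codebook_size)  # uniform proposal (assumption)
        p_updated = total_power(candidate)
        if rng.random() < accept_probability(p_current, p_updated, tau):
            profile, p_current = candidate, p_updated
    return profile, p_current


# Hypothetical separable power table: costs[m][k] is the power pair m needs with codeword k.
costs = [[5, 3, 8, 4], [2, 6, 1, 7], [9, 4, 6, 3]]


def toy_total_power(profile):
    return sum(costs[m][k] for m, k in enumerate(profile))


profile, total = copma(toy_total_power, num_pairs=3, codebook_size=4,
                       iterations=2000, rng=random.Random(7))
print(profile, total)
```

The toy table is separable, so a single-pair search can always reach its optimum; in a real network the powers are coupled through (2), which is exactly where the randomized acceptance helps escape local optima.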
Motivated by Song et al. [32], COPMA can be implemented distributively as follows. Assume that each node pair $m$ in the network has a unique ID $m$ and maintains two variables, $P_m^{\text{current}}$ and $P_m^{\text{updated}}$ […] to the updating node pair $m$. Finally, the $m$th node pair decides whether the new transmit beamformer should be kept or discarded with a probability that depends on $P^{\text{current}}$ and $P^{\text{updated}}$, the total transmit power in the network before and after the random change of the transmit beamformer, respectively. Since the current and updated powers are calculated by each node pair independently, the unique ID of each node provides a checklist for accurately adding up the transmit powers. In this paper, we assume that unique node IDs are built into each node and that in-network timing synchronization is perfect, so that power updates are always received in the correct round. COPMA is described in detail as follows.

Initialization: For each transmitting and receiving pair $m$, the initial transmit beamformer index is set to one and the initial transmit power is set to $P_m = P_{\max}$, $\forall m \in \mathcal{N}$.
Repeat:
1. Randomly choose a node pair $m$ as the updating pair with probability $1/N$. Denote by $t_m(n) \in \Delta_m$ the current transmit beamformer of the $m$th node pair at iteration $n$.
2. To update node pair $m$, randomly choose a new transmit beamformer $t_m^{\text{updated}} \in \Delta_m$ for the $m$th node pair, i.e., the updating node pair $m$ selects $t_m^{\text{updated}}$ with probability […] (13).
6. The $m$th node pair broadcasts a notifying signal that contains the decision about whether the new transmit beamformer is kept. If it is not kept, every other node pair $i \in \mathcal{N} \setminus \{m\}$ keeps $P_i$ […]

If the updated transmit beamformer yields better performance, i.e., $P^{\text{updated}} - P^{\text{current}} < 0$, the $m$th node pair changes to the updated beamformer $t_m^{\text{updated}}$ with high probability; otherwise, it keeps the current transmit beamformer with high probability. The tradeoff between COPMA's performance and convergence speed is controlled by the smoothing factor $\tau$. A large $\tau$ yields an extensive search of the strategy space with slow convergence, whereas a small $\tau$ yields a restrained search with fast convergence. The smoothing factor is chosen as a function of the iteration number $n$ such that $\tau \to 0$ as $n$ increases; for example, we chose $\tau$ inversely proportional to $n^2$ in our simulations. The long-term behavior of COPMA is characterized in the following theorem.
Theorem 1: Assume that the objective of each node pair is the sum power minimization in the network, as defined in (9). Let $\Theta(k) = [t_1(k), t_2(k), \ldots, t_N(k)]$ denote the profile of choices at step (iteration) $k$ of COPMA, and let $\Theta^\dagger = [t_1^\dagger, t_2^\dagger, \ldots, t_N^\dagger]$ be the optimal profile. COPMA converges to the optimal NE with arbitrarily high probability. In other words,

$$\lim_{\tau \to 0}\, \lim_{k \to \infty} \Pr\{\Theta(k) = \Theta^\dagger\} = 1.$$

Proof: The proof of Theorem 1 follows arguments similar to those presented in [14,32,35].
Notice that transmit beamformer selection with $N$ players, each with codebook size $\Upsilon$, generates an $N$-dimensional Markov chain on a finite state space with $\Upsilon^N$ states (i.e., distinct profiles). We present the analysis for two-player games, i.e., the $N = 2$ dimensional case shown in Figure 2. The analysis can be readily extended to multi-player games, i.e., to an $N$-dimensional Markov chain.
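Let $t_m \in \Delta_m$ and $t_k \in \Delta_k$ be the choices of two players, say $m$ and $k$, and let $\Theta_{ij}$ denote the state in which player $m$ selects its $i$th codeword and player $k$ selects its $j$th codeword. Without loss of generality, assume that the one-step transition probability $\mathbb{P}_\tau(\Theta_{lp} \mid \Theta_{ij})$ is given by […] (15), where $\Theta_{ij}$ and $\Theta_{lp}$ differ in exactly one transmit beamformer selection, i.e., either $i = l$ and $j \neq p$, or $i \neq l$ and $j = p$; $\tau$ is the smoothing factor of COPMA; and $P(\Theta_{ij})$ is the minimum total network power required to reach the target SINR $\gamma_0$ for both users in state $\Theta_{ij}$, calculated using (10) for each user. If $\Theta_{ij}$ and $\Theta_{lp}$ differ in more than one position, then $\mathbb{P}_\tau(\Theta_{lp} \mid \Theta_{ij}) = 0$. In addition, $\mathbb{P}_\tau(\Theta_{ij} \mid \Theta_{ij}) > 0$ always holds. Therefore, for any fixed $\tau > 0$, the induced Markov chain is irreducible, because any profile can be reached from any other through a sequence of single-player changes, each with positive probability, and aperiodic, because every state has a self-transition with positive probability.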
The stationary distribution $P_\tau^\dagger$ of each state can be obtained from the following balance equations (using the arrows in Figure 2): […] (16), for all $(i, j) \in \{1, \ldots, \Upsilon\}^2$ and $(l, p) \in \{1, \ldots, \Upsilon\}^2$. Substituting (15) into (16) gives […]. Then, the stationary distribution of the induced Markov chain at step $k$ is obtained as […], for an arbitrary state $\Theta(k)$. Hence, from the irreducibility and aperiodicity of the Markov chain, we have […], where $\Theta^\dagger$ is the optimal two-player profile. This result confirms that COPMA converges to the optimal state with arbitrarily high probability in the two-player ($N = 2$) case, and the analysis can be extended to general multi-player ($N > 2$) cases as well. ■ With the above theorem, the transmit beamformer and power level selections are shown to reach the optimal NE with arbitrarily high probability.
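The structure of this argument can be illustrated numerically for $N = 2$. The sketch below assumes, as an illustration rather than the paper's exact (15), that the updating player is chosen with probability $1/2$, proposes one of its $\Upsilon$ codewords uniformly, and keeps it with the logit probability $e^{-P(\Theta_{lp})/\tau} / (e^{-P(\Theta_{ij})/\tau} + e^{-P(\Theta_{lp})/\tau})$. Under that assumption, detailed balance gives a stationary distribution proportional to $e^{-P(\Theta)/\tau}$, which concentrates on the minimum-power profile as $\tau \to 0$. The total-power table is hypothetical, and states are indexed from 0, so the optimum $\Theta_{11}$ is state 4.

```python
import math


def accept(p_from, p_to, tau):
    """Assumed logit acceptance: exp(-p_to/tau) / (exp(-p_from/tau) + exp(-p_to/tau))."""
    z = (p_to - p_from) / tau
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def transition_matrix(P, tau):
    """One-step kernel for two players; P[i][j] is the total power of state Theta_ij."""
    U = len(P)
    states = [(i, j) for i in range(U) for j in range(U)]
    index = {s: n for n, s in enumerate(states)}
    K = [[0.0] * len(states) for _ in states]
    for i, j in states:
        row = index[(i, j)]
        for l in range(U):                 # player m proposes codeword l (prob 1/2 * 1/U)
            if l != i:
                K[row][index[(l, j)]] += 0.5 / U * accept(P[i][j], P[l][j], tau)
        for p in range(U):                 # player k proposes codeword p (prob 1/2 * 1/U)
            if p != j:
                K[row][index[(i, p)]] += 0.5 / U * accept(P[i][j], P[i][p], tau)
        K[row][row] = 1.0 - sum(K[row])    # rejected or unchanged proposals
    return states, K


def stationary(K, iters=5000):
    """Stationary distribution by power iteration pi <- pi K."""
    n = len(K)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[r] * K[r][c] for r in range(n)) for c in range(n)]
    return pi


def boltzmann(P, tau):
    """Candidate closed form: pi(Theta) proportional to exp(-P(Theta) / tau)."""
    w = [math.exp(-P[i][j] / tau) for i in range(len(P)) for j in range(len(P))]
    Z = sum(w)
    return [x / Z for x in w]


# Hypothetical total-power table for Upsilon = 3; the optimum is Theta_11 (state index 4).
P_table = [[3.0, 2.0, 4.0], [2.5, 1.0, 3.5], [4.0, 3.0, 2.0]]
states, K = transition_matrix(P_table, tau=2.0)
pi = stationary(K)
target = boltzmann(P_table, tau=2.0)
print(max(abs(a - b) for a, b in zip(pi, target)))
print(boltzmann(P_table, tau=0.05)[4])
```

The first printed value is close to zero, confirming the stationary distribution for this kernel, and the second is close to one, showing the concentration on the optimal profile for small $\tau$.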
One disadvantage of cooperative algorithms is that the communication overhead incurred to calculate the total network power increases with the number of iterations. In the next section, we study a noncooperative learning algorithm that uses only local information and requires less computation.

Regret-matching-based joint transmit beamformer and power selection game (RMSG)
In this section, our goal is to obtain a distributed learning algorithm for joint transmit beamformer and power selection in MIMO ad hoc networks that requires only local information for updates. To this end, we use a utility function suited to noncooperative users. The interaction among $N$ "selfish" node pairs can be modeled as a noncooperative power minimization game in which each node pair attempts to find its own transmit beamformer to minimize its own transmit power. In noncooperative joint iterative beamforming and power adaptation, the $N$ node pairs care exclusively about minimizing their own powers rather than the overall network power. Each player's utility depends on its own transmit beamformer and power, as well as on the other users' transmit powers and beamformers through the perceived interference. Note that noncooperative distributed beamforming algorithms for multi-user MIMO ad hoc networks lack the property of "strategic complementarities" [36] found in power-control-only games [26]. It is thus not clear how to design an ordered set of actions for noncooperative beamforming games. Instead, we study a noncooperative learning algorithm called the regret-matching adaptive algorithm [37], in which players choose their actions based on their regret for not having chosen particular actions in the past. The regret-matching-based learning algorithm exhibits "no regret" in steady state, and the probability of choosing a strategy is proportional to the player's "regret" for not having chosen that strategy.
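In the regret-matching-based joint transmit beamformer and power selection game (RMSG), each user $m$ computes the average regret $\bar{R}_m^{\tilde{t}_m}$ for every action $\tilde{t}_m \in \Delta_m$, i.e., the average utility gain that user $m$ would have obtained over all past steps by playing $\tilde{t}_m$ instead of its actual choices, with the actions of all other players unchanged. Each player $m$ updates its regret $\bar{R}_m^{\tilde{t}_m}(k)$ for every action $\tilde{t}_m$ using the recursion

$$\bar{R}_m^{\tilde{t}_m}(k) = \frac{k-1}{k}\, \bar{R}_m^{\tilde{t}_m}(k-1) + \frac{1}{k}\Big[u_m\big(\tilde{t}_m, \Theta_{-m}(k), P_{-m}(k)\big) - u_m\big(t_m(k), \Theta_{-m}(k), P_{-m}(k)\big)\Big]. \quad (21)$$

At every step $k > 1$, each user $m$ updates its own average regret $\bar{R}_m^{\tilde{t}_m}(k)$ for every strategy $\tilde{t}_m \in \Delta_m$. In regret matching, after computing the average regret vector, each user $m$ chooses an action $t_m(k)$, $k > 1$, according to the probability distribution

$$\phi_m^{\tilde{t}_m}(k) = \frac{\big[\bar{R}_m^{\tilde{t}_m}(k)\big]^+}{\sum_{t_m' \in \Delta_m} \big[\bar{R}_m^{t_m'}(k)\big]^+}, \quad (22)$$

where $[x]^+$ equals $x$ when $x$ is positive and zero otherwise. In other words, in the regret-matching game, each user $m$ chooses a strategy $\tilde{t}_m \in \Delta_m$ at any step with probability proportional to its average regret for not having chosen $\tilde{t}_m$ in the past steps. The detailed summary of RMSG using a Gauss-Seidel updating scheme [15] is given in Table 1, where the algorithm runs for a predefined number of iterations.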
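The two regret-matching steps can be sketched in a few lines: a running-average regret update in the spirit of (21) and the sampling distribution (22). As a sanity check, the example freezes the other players so that each action's utility is a fixed hypothetical number; with that simplification, the distribution should settle on the best action. Falling back to uniform play when no action has positive regret is our assumption (the text only states that play starts uniformly), and the function names are hypothetical.

```python
import random


def update_average_regret(avg_regret, k, utilities, played):
    """Running-average regret in the spirit of (21).

    utilities[a] is u_m(a, others' actions at step k); played is t_m(k).
    """
    u_played = utilities[played]
    return [((k - 1) * r + (u - u_played)) / k for r, u in zip(avg_regret, utilities)]


def regret_matching_pmf(avg_regret):
    """Probabilities in (22): proportional to the positive part of each average regret."""
    positive = [max(r, 0.0) for r in avg_regret]
    total = sum(positive)
    if total == 0.0:  # assumption: play uniformly when no action has positive regret
        return [1.0 / len(avg_regret)] * len(avg_regret)
    return [p / total for p in positive]


def run_regret_matching(utilities, steps, rng):
    """Single-player check with the other players frozen, so utilities stay fixed."""
    avg_regret = [0.0] * len(utilities)
    pmf = regret_matching_pmf(avg_regret)  # uniform at the first step
    for k in range(1, steps + 1):
        played = rng.choices(range(len(utilities)), weights=pmf)[0]
        avg_regret = update_average_regret(avg_regret, k, utilities, played)
        pmf = regret_matching_pmf(avg_regret)
    return pmf


print(regret_matching_pmf([2.0, -1.0, 1.0, 0.0]))
print(run_regret_matching([1.0, 3.0, 2.0], steps=1000, rng=random.Random(0)))
```

In the full game, the utilities fed to the update change from step to step as the other node pairs adapt, which is why convergence is stated for time averages rather than for individual plays.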
Every finite strategic-form game has a mixed strategy Nash equilibrium [30]. Learning algorithms can be used to drive play in such games toward equilibrium. Regret-matching-based selection is distributed and requires limited information exchange between users if the utility function is properly selected. The time-averaged behavior of the regret-matching game converges almost surely (with probability one) to the set of coarse correlated equilibria [34,38], a set that contains every mixed strategy Nash equilibrium. Therefore, the joint transmit beamformer and power selections converge to this equilibrium set. In fact, in our joint transmit beamformer and power selection game, the average regret of a user employing regret matching becomes asymptotically zero, which is confirmed by our simulations.
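The utility function of noncooperative ("selfish") users for the transmit beamformer and power selection game at iteration $k$ is

$$u_m\big(t_m, \Theta_{-m}(k), P_{-m}(k)\big) = -P_m(k) = -\frac{\gamma_0}{t_m^H H_{m,m}^H R_m^{-1} H_{m,m} t_m}, \quad (23)$$

where $R_m = R_m(\Theta_{-m}(k), P_{-m}(k))$ is given by (2). Note that with this utility function, each user selects the transmit beamformer $t_m \in \Delta_m$ that maximizes its own "normalized" SINR, $t_m^H H_{m,m}^H R_m^{-1} H_{m,m} t_m$. Because evaluating (23) for each candidate beamformer requires only the direct channel $H_{m,m}$ and the covariance $R_m$ observed at the $m$th receiver, the average regret in the recursion (21) can be updated locally while the best transmit beamformer is being selected.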
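To illustrate why only local information is needed, the sketch below evaluates a utility of the form $-\gamma_0 / (t_m^H H_{m,m}^H R_m^{-1} H_{m,m} t_m)$ for every codeword, using only $H_{m,m}$ and the locally observed $R_m$. The channel, covariances, and two-word codebook are hypothetical values chosen so that the results are easy to verify by hand, and `local_utilities` is an illustrative name.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def inner(u, v):
    return sum(a.conjugate() * b for a, b in zip(u, v))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small complex system."""
    n = len(A)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def local_utilities(H_mm, R_m, codebook, gamma0):
    """Utility -gamma0 / normalized SINR for every codeword, from H_{m,m} and R_m only."""
    utilities = []
    for t in codebook:
        h = matvec(H_mm, t)
        norm_sinr = inner(h, solve(R_m, h)).real
        utilities.append(-gamma0 / norm_sinr)
    return utilities


H_mm = [[2, 0], [0, 1]]
codebook = [[1, 0], [0, 1]]
u = local_utilities(H_mm, [[0.5, 0], [0, 0.5]], codebook, 10.0)
print(u)  # [-1.25, -5.0]
# A strong interferer on the first receive dimension makes the second codeword preferable.
u_interfered = local_utilities(H_mm, [[10.5, 0], [0, 0.5]], codebook, 10.0)
print(u_interfered.index(max(u_interfered)))
```

This per-codeword utility vector is exactly the kind of input a regret update such as (21) needs at each step.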

Simulation results
In this section, we compare the performance of centralized optimization, COPMA, and noncooperative regret matching (RMSG). We assume that the wireless ad hoc network has $N$ homogeneous pairs, where each pair has one transmitting node and one receiving node. The entries of each channel matrix $H_{m,k}$, $m, k \in \mathcal{N}$, are assumed to be independent and identically distributed (i.i.d.) zero-mean, unit-variance complex Gaussian random variables. We consider a radio propagation channel with path-loss exponent $\nu = 4$, so the received power over each link is attenuated by $d^{-4}$, where $d$ is the distance between the corresponding transmitting and receiving nodes (for the desired link of the $m$th node pair, $d = d_m$). The target SINR $\gamma_0$ is set to 10 dB. We assume that the channels do not vary during the iterations. If channel conditions vary during an iteration, the optimization problem changes and the performance of the proposed algorithms degrades. However, depending on the network configuration and the algorithm parameters, such as the smoothing factor $\tau$ of COPMA, the network optimizer can keep the number of convergence steps as small as possible, trading this off against performance degradation in time-varying channels. The Grassmannian codebook of [21] is used for the simulation results. The codebook size is $\Upsilon = 16$ with $T = 3$ antennas for all users. We set $P_{\max} = 100$ mW (20 dBm) and $P_{\min} = 1$ mW (0 dBm). We assume six transmit power levels, 1, 5, 20, 30, 50, and 100 mW, motivated by the IEEE 802.11b standard [39]. The transmit powers are selected from this discrete set, which corresponds to applying a ceiling function to (10), i.e., rounding each required power up to the nearest available level. The selected network topologies are assumed to be feasible for the given power levels [40]. The noise power is $\sigma^2 = 3.16 \times 10^{-13}$ W (-95 dBm), which corresponds approximately to the thermal noise power for a bandwidth of 20 MHz.
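The power quantization and unit conversions in this setup can be sketched as follows. `quantize_power` rounds a power obtained from (10) up to the nearest of the six levels and returns `None` when even $P_{\max}$ is insufficient; `pathloss_channel` draws a $T \times T$ channel with i.i.d. $\mathcal{CN}(0,1)$ entries scaled so that the received power decays as $d^{-4}$. Both helpers are hypothetical illustrations of the stated assumptions, not the authors' simulator.

```python
import math
import random

# Discrete transmit power levels in mW used in the simulations.
POWER_LEVELS_MW = [1, 5, 20, 30, 50, 100]


def dbm_to_watt(dbm):
    return 10 ** (dbm / 10) / 1000


def quantize_power(p_mw, levels=POWER_LEVELS_MW):
    """Ceiling of (10) onto the discrete set; None if even P_max is insufficient."""
    for level in sorted(levels):
        if level >= p_mw:
            return level
    return None


def pathloss_channel(T, distance, rng, exponent=4.0):
    """T x T channel with i.i.d. CN(0, 1) entries whose power is scaled by distance**-exponent."""
    amplitude = distance ** (-exponent / 2)
    return [[amplitude * complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
             for _ in range(T)] for _ in range(T)]


print(quantize_power(12.3), quantize_power(0.4), quantize_power(150))  # 20 1 None
print(dbm_to_watt(-95))  # about 3.16e-13 W
H = pathloss_channel(T=3, distance=10.0, rng=random.Random(0))
```

Because rounding up can only raise powers above the values from (10), the quantized profile still meets the target SINR whenever it is feasible, at the cost of some extra total power.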

Comparison of centralized optimization, COPMA, and RMSG for N = 4 node pairs
We first consider a small wireless ad hoc network with four users, i.e., $N = 4$. All transmitting and receiving nodes are randomly located in a 30 m × 30 m square area. We choose $\tau = 0.1/n^2$ in our simulations, where $n$ denotes the iteration step. The global optimum obtained by enumerating all feasible strategies, i.e., $16^4 = 65{,}536$ profiles, is the performance benchmark. The maximum number of iterations is 120 for both COPMA and RMSG. The total power consumed by the network is shown in Figure 3. The global minimum power solution obtained by centralized optimization is a lower bound on the overall power consumed by the network. We observe that COPMA's performance improves over time and settles at the globally optimal combination after 92 iterations. Note that 68% and 76% of the gain from COPMA are realized within the first 59 and 83 iterations, respectively.
The RMSG algorithm discussed in Section 5 minimizes the total transmit power in the network defined by (9) in a noncooperative manner, using the utility function (23). Figure 3 also shows how the total network power varies over 120 iterations under RMSG. RMSG yields a higher overall power than COPMA; however, its updating procedure is noncooperative and requires less overhead as the iterations proceed. The total network power of RMSG converges to 135 mW at the 68th iteration, whereas the centralized solution requires 65 mW. Steady state is reached when every user selects a transmit beamformer index with probability one. Figures 4 and 5 depict, for each user in this network topology, the trajectories of the transmit beamformer indices and of the transmit powers under COPMA. At initialization, each user starts at the maximum power level with the first transmit beamformer index. Each user then updates iteratively following the COPMA algorithm until the optimum Nash equilibrium is reached. Once the transmit beamformers $\Theta$ and the power vector $\mathbf{P}$ converge in Figures 4 and 5, the corresponding overall transmit power obtained by COPMA is the one shown in Figure 3.

The existence of NE and probability mass function (p.m.f.): In this subsection, we examine the p.m.f. $\bar{\phi}_m^{t_m}$ of the RMSG algorithm computed in (22). Figure 6 shows the p.m.f. of one user after 1, 12, 50, and 100 iterations. Initially, users choose their strategies, i.e., transmit beamformers, with equal probability. The strategies are indexed from 1 to $\Upsilon = 16$ on the x-axis, and the probabilities of selecting these indices are given on the y-axis.
After 12 iterations, the probability of choosing transmit beamformer index 9 is higher than that of any other index, although the probabilities of indices 3, 4, and 12 are not completely eliminated. After 25 iterations, all probabilities except those of indices 4 and 9 are eliminated. A stationary point is reached when user 1 chooses transmit beamformer index 9 at the 100th iteration.
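The behavior in Figure 6 (a uniform start, with probability mass gradually concentrating on one index) is characteristic of regret matching. The sketch below implements the generic unconditional regret-matching rule, in which each index is chosen with probability proportional to its positive cumulative regret. The paper's exact update (22) and utility (23) may differ. `utility_of` is a hypothetical callable that gives the utility a user would have obtained with index `j` while the other users keep their current choices.

```python
import random


def regret_matching_pmf(regrets):
    """p.m.f. proportional to positive cumulative regrets; uniform if none is positive."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)   # e.g. at initialization
    return [r / total for r in positive]


def regret_matching_step(regrets, utility_of, rng):
    """One update for a single user; `regrets` is modified in place.

    utility_of(j): utility with beamformer index j, others fixed (hypothetical).
    Returns the index played and the p.m.f. for the next iteration.
    """
    pmf = regret_matching_pmf(regrets)
    played = rng.choices(range(len(pmf)), weights=pmf, k=1)[0]
    u_played = utility_of(played)
    for j in range(len(regrets)):
        # Regret of not having played j instead of the sampled index.
        regrets[j] += utility_of(j) - u_played
    return played, regret_matching_pmf(regrets)
```

When the other users' choices stop changing, repeated calls drive the p.m.f. toward probability one on the index with the highest utility, which matches the steady-state condition described for RMSG.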

Comparison of COPMA and RMSG for N = 10 node pairs
We now consider a larger wireless ad hoc network with $N = 10$ node pairs randomly located in a 100 m × 100 m area. The smoothing factor for COPMA is set to $\tau = 200/n^2$ to search this large strategy space more efficiently. All other simulation parameters remain the same. Optimization problems whose number of feasible points exceeds $2^N$ with $N > 30$ are very difficult to solve [41]. The centralized approach is no longer feasible in this scenario because of the enormous strategy space of $16^{10} = 2^{40} \approx 1.1 \times 10^{12}$ profiles. Figure 7 shows the network topology and the transmit beampatterns of COPMA with $N = 10$ users. We again compare the cooperative and regret-matching learning algorithms, represented by the COPMA and RMSG curves, with the maximum number of iterations set to 400 for COPMA and 1,500 for RMSG. Since COPMA reached the optimal solution for the 4-link network, it provides a good benchmark for the performance of RMSG. Figure 8 shows the total network power versus the number of iterations for COPMA and RMSG. At the end of the iterations, RMSG's performance is within 75.56% of the COPMA value. Furthermore, RMSG needs more iterations than COPMA to converge. However, RMSG performs noncooperative updates of the transmit beamformers and powers at each iteration, so its overhead is minimal.
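This section does not spell out how the smoothing factor $\tau$ enters COPMA. Assuming it acts as the temperature of a Boltzmann (log-linear) choice rule, one randomly selected user picks its beamformer index with probability proportional to $\exp(-P_{\text{total}}/\tau)$. A larger $\tau$ flattens this distribution (more exploration), and $\tau \to 0$ approaches a greedy best response. That would explain why the larger scale $200/n^2$ helps in the bigger strategy space. This is a sketch of that assumed rule, not the paper's COPMA update. The function names and the callable `total_cost` are hypothetical.

```python
import math
import random


def boltzmann_pmf(costs, tau):
    """Probabilities proportional to exp(-cost / tau); an infinite cost means infeasible."""
    c_min = min(costs)
    if math.isinf(c_min):
        return [1.0 / len(costs)] * len(costs)   # no feasible candidate: uniform
    # Shifting by c_min leaves the pmf unchanged and keeps the best weight at 1.
    weights = [math.exp(-(c - c_min) / tau) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def smoothing_factor(n, scale):
    """Assumed schedule tau_n = scale / n**2 for iteration n = 1, 2, ..."""
    return scale / n ** 2


def annealed_update(profile, total_cost, codebook_size, n, scale, rng):
    """One iteration: a randomly chosen user re-draws its beamformer index.

    profile is a tuple of indices; total_cost(profile) returns the total
    network power of a profile (hypothetical callable).
    """
    m = rng.randrange(len(profile))   # user allowed to revise this iteration
    candidates = [profile[:m] + (j,) + profile[m + 1:] for j in range(codebook_size)]
    pmf = boltzmann_pmf([total_cost(c) for c in candidates], smoothing_factor(n, scale))
    return rng.choices(candidates, weights=pmf, k=1)[0]
```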
For the RMSG algorithm, the total network power converges to 0.2250 W from the 1,296th iteration onward. The joint selection of transmit beamformer indices and transmit powers reaches steady state when no user in the network deviates from its chosen strategy. The majority of users reach steady state within 115 iterations; however, one user takes more than 1,000 iterations.
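Convergence figures such as "converges from the 1,296th iteration" or "68% of the gain within 59 iterations" can be read off a recorded trajectory. A minimal sketch, assuming `trajectory[0]` is the initial state and `trajectory[i]` the state after iteration `i`, and assuming "gain" means the reduction from the initial total power to the final one. Both helpers are hypothetical.

```python
def convergence_iteration(trajectory):
    """Smallest i such that trajectory[i:] never changes.

    Assumed convention: trajectory[0] is the initial state and trajectory[i]
    the state after iteration i. Entries may be total powers or tuples of
    beamformer indices; exact comparison is meaningful because powers come
    from a discrete set.
    """
    if not trajectory:
        raise ValueError("empty trajectory")
    i = len(trajectory) - 1
    while i > 0 and trajectory[i - 1] == trajectory[-1]:
        i -= 1
    return i


def gain_fraction(powers, n):
    """Share of the total power reduction achieved after n iterations.

    Assumes gain = powers[0] - powers[-1], i.e. measured from the initial
    total power to the final one.
    """
    total_gain = powers[0] - powers[-1]
    if total_gain == 0:
        return 1.0
    return (powers[0] - powers[n]) / total_gain
```

Applied to each user's beamformer-index trajectory, `convergence_iteration` identifies the slow user described above; applied to the total-power trajectory, it gives the network-level convergence point.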
Probability mass function (p.m.f.): Figure 9 shows the p.m.f. of RMSG for the user that takes longer to converge than the others in the large network with $N = 10$. As Figure 9 shows, the probability of choosing index 16 is higher than that of the other indices at iteration 500, but the probabilities of choosing indices 5 and 12 are not completely eliminated even after 1,000 iterations. Because the network is large, the learning process converges more slowly to steady-state transmit beamformer indices (around 1,332 iterations) than in the smaller network with $N = 4$ node pairs.

Conclusion
In this paper, we have considered both cooperative and noncooperative joint power control and beamforming in MIMO ad hoc networks using a game theoretic approach. Under constant SINR requirements, the joint transmit beamformer and power selection algorithms were studied in the context of total network power minimization. We first considered a cooperative case in which all users collaborate to minimize the overall power of the network. The game was formulated as an identical interest game, and a decentralized algorithm, COPMA, with a high probability of convergence was proposed and analyzed. To reduce the overhead incurred by the cooperative algorithm, we also proposed a noncooperative solution that requires only local information. In our proposed noncooperative algorithm, users update their probabilities of choosing a transmit beamformer and power based on the "regret" of not choosing the other strategies. Numerical results illustrate the convergence properties of the proposed algorithms and their performance in terms of overall power minimization in the network.

Note
This paper was presented in part at the IEEE Global Communications Conference 2010 (Globecom'10), Miami, FL, December 6-10, 2010.