Resource Allocation in a MAC with and without security via Game Theoretic Learning

In this paper a $K$-user fading multiple access channel (F-MAC) with and without security constraints is studied. First we consider an F-MAC without security constraints. Under the assumption that each user knows only its individual CSI, we formulate the problem of power allocation as a stochastic game in which the receiver sends an ACK or a NACK depending on whether it was able to decode the message or not. We use the multiplicative weight no-regret algorithm to obtain a coarse correlated equilibrium (CCE). Then we consider the case where the users can decode each other's ACK/NACK. In this scenario we provide an algorithm to maximize the weighted sum-utility of all the users and obtain a Pareto optimal point (PP). A PP is socially optimal but may be unfair to individual users. Next we consider the case where the users can cooperate with each other so as to disagree with a policy that is unfair to an individual user. We then obtain a Nash bargaining solution (NBS), which in addition to being Pareto optimal is also fair to each user. Next we study a $K$-user fading multiple access wiretap channel with the CSI of Eve available to the users. We use the previous algorithms to obtain a CCE, a PP and an NBS. Next we consider the case where each user does not know the CSI of Eve but only its distribution. In that case we use secrecy outage as the criterion for the receiver to send an ACK or a NACK. Here also we use the previous algorithms to obtain a CCE, a PP or an NBS. Finally we show that our algorithms can be extended to the case where a user can transmit at different rates. At the end we provide a few examples to compute the different solutions and compare them under different CSI scenarios.


Index Terms
Physical layer security, Coarse correlated equilibrium, multiple access channel, resource allocation, algorithmic game theory.

INTRODUCTION
A multiple access channel (MAC) is a basic building block in wireless networks [1]. It also models the uplink in a wireless cellular system. Therefore it has been studied extensively over the years ([2], [3], [4]). More recently, it has also received attention from the information theoretic security point of view. In this paper, we study a MAC with and without an eavesdropper using game theoretic techniques. This allows operating at points in the capacity region which are fair to the users and also provides distributed algorithms that need only local information at the users. First we provide a literature survey on this problem.
A general $M$-user fading MAC is considered in [5], where the receiver has perfect channel state knowledge and broadcasts the channel state information of all the users to all the transmitters.
The authors prove that the capacity region of an $M$-user MAC has a polymatroid structure, and they exploit this structural property to find the optimal power and rate control policy. A time varying additive white Gaussian noise (AWGN) MAC is studied in [6], where it is assumed that only the receiver can track the channel and not the transmitters. In that case the transmitters allocate fixed powers (which satisfy the average power constraint) and transmit data over the channel.
In [7] the authors propose a distributed power allocation scheme using Game Theory. The authors assume that each user knows the channel gain of other users also, in addition to knowing his own channel gain. The authors prove that the sum-rate point on the capacity region is a Nash equilibrium when the decoding strategy of the receiver is not known to the transmitters. The authors also prove the existence of a Stackelberg equilibrium in which the receiver acts as a leader and the transmitters play a low level game. Using repeated games, the authors prove that each point on the capacity region of a fading MAC is achieved by some power control policy.
In [8] the authors prove stronger results by assuming that each user knows only its own channel gain, but knows the distribution of channel gains of the other users. Under these conditions, the authors prove the existence and uniqueness of a Bayesian equilibrium. In an orthogonal multiple access channel the authors in [9] have used evolutionary game theory to obtain a power allocation scheme, while assuming that each user knows the channel gain of all users via feedback.
With security constraints, a multiple access wiretap channel (MAC-WT) has been well studied in the literature. One of the early works is reported in [10], where only one user has confidential messages to be transmitted. The authors have obtained upper bounds on the secrecy-rate regions.
In [11] the authors consider a more general setup: a discrete memoryless multiple access channel where the transmitting users receive a noisy version of each other's conversation and do not trust each other. In this scenario the authors have obtained an achievable secrecy rate region and some outer bounds. In some special cases this provides the secrecy capacity region. A multiple access wiretap channel with feedback has been studied in [12]. An achievable region of a Gaussian multiple access wiretap channel (G-MAC-WT) was obtained in [13] (the secrecy capacity region is still an open problem).
In the above work, the weak secrecy criterion is used. A strong secrecy based achievable rate region for a MAC-WT is reported in [14]. In [15] the authors find the secure degrees of freedom for a MAC-WT. More recently, in [16] the authors have studied a compound MAC-WT and have characterized inner and outer bounds on the secrecy capacity region. In [17] the authors have studied a fading MAC-WT with full CSI of Eve, and also the case where each user knows the channel states of all the users to the receiver but is ignorant of the instantaneous value of the channel state to the eavesdropper (only its distribution is known). But knowing the other users' channel gains to the legitimate receiver may also not be practical: it needs a lot of signalling overhead and feedback information. Hence in this paper we present a game-theoretic solution to the resource allocation problem under the hypothesis that each user knows only its own channel gain and is completely ignorant of the other users' channels (not even their distributions).
July 6, 2016 DRAFT
For an interference channel model, the authors of [18] use learning algorithms to study a stochastic game and learn optimal power allocation policies. They use a no-regret algorithm to prove the existence of a correlated equilibrium. It is assumed that each user knows the power allocation policies of the other users, which is not always realistic. The same authors extend this work to the case where each user knows only its own channel gain and does not know the power levels used by the other users. They prove the existence of a coarse correlated equilibrium using the multiplicative weights no-regret algorithm ([19]).
In this paper we first consider a fading MAC (F-MAC) without security constraints. We assume each user knows only its individual channel gain (unlike [8], we do not assume that it knows the distributions of the channel gains of the others). Since the receiver is receiving data from all the users, it is quite practical to assume that the receiver has channel state information of all the transmitting users. Once a user sends a codeword corresponding to a particular message, the receiver sends an ACK if it decodes it successfully; otherwise it sends a NACK. Each user defines a utility based on the ACK/NACK. We use the multiplicative weight no-regret algorithm to obtain an equilibrium.
In the later part of the paper we also assume that each user can decode the ACK/NACK of the other users and hence knows their utilities. Then we aim to maximize the sum-utility and propose an algorithm to obtain a Pareto point. We also find a Nash bargaining solution, which provides a Pareto point and ensures fairness among users. We also study the case where users can transmit at multiple rates rather than at fixed rates.
Next we consider a fading MAC-WT where we first assume that each user knows its channel gains to the receiver and to Eve. In this case we repeat all the algorithms which we used for an F-MAC (without security), i.e., MW, PP and NBS, and also consider the multiple rates case.
Since it is not practical to assume that the instantaneous channel gain of the eavesdropper is known at the transmitter and the receiver, we next consider the case where the receiver knows only the distribution of Eve's channel gains. The receiver calculates the secrecy outage and sends an ACK/NACK based on that. We again obtain a CCE, a PP and an NBS. To the best of our knowledge this is the first paper which uses game theory on a MAC-WT. Finally we compare the sum-rates obtained via all these algorithms to the global CSI case and also with the sum-rate obtained in [17].
The rest of the paper is organized as follows. In Section 5.2 we describe the channel model and formulate the problem. In Section 5.3 we use the Multiplicative Weight Algorithm to obtain a CCE. In Section 5.4 we obtain Pareto optimal points. In Section 5.5 we consider a fading MAC-WT when the CSI of Eve is not available at the transmitters (only its distribution is known) and obtain a CCE, an NBS, and a PP. In Section 5.6 we compare the various schemes on an example.
Finally, in Section 5.7 we conclude the paper.
at time $t$, where $\eta_b(t)$ is white Gaussian noise with mean zero and variance 1, denoted by $\mathcal{N}(0, 1)$. The fading gains are assumed discrete valued, with the gain of user $i$ taking values in a finite set $\mathcal{H}_i$. To transmit any codeword, user $i$ can choose any power level from a finite set of power levels. Also, user $i$ has an average power constraint $\bar{P}_i$.
User $i$ transmits at a fixed rate $r_i$ (to be generalized later) via a usual point-to-point channel encoder. If the receiver successfully decodes a message, it sends an ACK to that particular user. Otherwise, it sends a NACK. We assume that the ACK and NACK are transmitted at low rates so that they can be received with negligible error at the intended transmitter. The goal of each user is to maximize its probability of successful transmission.
Each user $i$ is assumed to know its own channel gain $H_i(t)$ at time $t$. Since the receiver can estimate the channel gains of all the users (either by receiving known pilots or by using initial data received), the receiver can use a successive cancellation decoding strategy to decode all the users.
Let $\pi(i)$ be the user which has the $i$th highest channel gain (in case of a tie we order them arbitrarily). The decoder first decodes user $\pi(1)$, which has the best channel gain, treating the transmissions from the other users as noise. Then it removes the decoded signal from the received signal $Y(t)$ and decodes the next best user, again treating the remaining users as noise, and so on. The receiver then sends an ACK to the transmitting user $\pi(i)$ if
$$ r_{\pi(i)} \le \log\left(1 + \frac{h_{\pi(i)} P_{\pi(i)}(h_{\pi(i)})}{1 + \sum_{j=i+1}^{K} h_{\pi(j)} P_{\pi(j)}(h_{\pi(j)})}\right), $$
where $h_k$ is the current channel gain of user $k$ and $P_k(h_k)$ is the power it uses in that state. The above constraint follows from the successive cancellation decoding scheme chosen. Each user $i$ takes an action (allocating power) to transmit at its rate.
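The decoding rule above can be illustrated with a minimal sketch. The function name `ack_decisions` and the use of base-2 logarithms (rates in bits per channel use) are assumptions made for illustration and are not taken from the paper; noise variance is 1 as in the channel model.

```python
import math

def ack_decisions(gains, powers, rates):
    """Return one boolean per user: True (ACK) if the user's rate is supported
    under successive cancellation, decoding in decreasing order of gain.
    Assumption: unit noise variance and rates measured with log base 2."""
    K = len(gains)
    # pi(1), pi(2), ...: users sorted from best to worst channel gain.
    order = sorted(range(K), key=lambda k: gains[k], reverse=True)
    acks = [False] * K
    for pos, k in enumerate(order):
        # Users decoded later (weaker gains) are treated as noise.
        interference = sum(gains[j] * powers[j] for j in order[pos + 1:])
        sinr = gains[k] * powers[k] / (1.0 + interference)
        acks[k] = rates[k] <= math.log2(1.0 + sinr)
    return acks
```

For example, with gains `[1.0, 0.5]`, powers `[3.0, 2.0]` and rates `[1.0, 1.5]`, the stronger user sees an SINR of 1.5 and is acknowledged, while the weaker user sees an SINR of 1.0 and cannot support 1.5 bits.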
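We define the feasible action space $\mathcal{P}_i$ of user $i$ as the set of power policies that satisfy its average power constraint. We define $|\mathcal{P}_i| \triangleq M_i$ (where $|A|$ denotes the cardinality of set $A$) and index the elements of $\mathcal{P}_i$ from $1$ to $M_i$. Let $a_i$ denote a feasible power policy of user $i$, i.e., $a_i$ takes a value from $\mathcal{P}_i$, and $a_i(h)$ is the power level used by user $i$ when its channel gain is $h \in \mathcal{H}_i$ under policy $a_i$.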
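As a hedged illustration of this action space, the sketch below enumerates all power policies of one user that meet an average power constraint. The function name `feasible_policies`, the representation of a policy as a dictionary from channel state to power level, and the example numbers are assumptions for illustration only.

```python
from itertools import product

def feasible_policies(states, probs, power_levels, avg_power):
    """Enumerate policies a: state -> power level whose expected power
    sum_h Pr(h) * a(h) does not exceed avg_power."""
    policies = []
    for levels in product(power_levels, repeat=len(states)):
        mean_power = sum(p * q for p, q in zip(levels, probs))
        if mean_power <= avg_power + 1e-12:  # small tolerance for rounding
            policies.append(dict(zip(states, levels)))
    return policies
```

With two equiprobable states, power levels `[0, 1, 2]` and an average power budget of 1, six of the nine candidate policies are feasible, so $M_i = 6$ in this toy case.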
The action space of the $K$ users is denoted as $\mathcal{P} = \prod_{i=1}^{K} \mathcal{P}_i$ and the action space of the users other than user $i$ as $\mathcal{P}_{-i} = \prod_{j \neq i} \mathcal{P}_j$. The action profile of all the users is denoted as $a = (a_1, \ldots, a_K)$. A probability distribution $\psi_i$ on $\mathcal{P}_i$ is called a strategy of user $i$. When a certain action is chosen with probability one, it is called a pure strategy. The objective of each transmitter is to maximize its probability of successful transmission. Since the actions chosen by one user may influence the outcome for the other users in terms of probability of successful transmission, this can be formulated as a stochastic game.
For user $i$, if the channel gain in time slot $t$ is $H_i(t)$ and the action profile chosen is $(a_i, a_{-i})$, we define its reward $\omega_i(a_i, H_i(t))$ to be 1 if it receives an ACK in slot $t$ and 0 otherwise. We are interested in the time average of the reward process. We will restrict ourselves to Markov stationary policies, i.e., the action of user $i$ depends only on its current state $H_i(t)$. Then $\{\omega_i(a_i, H_i(t))\}$ are i.i.d. across time $t$. Hence, by the strong law of large numbers, the average reward of user $i$ is the same as its probability of successful transmission; we denote it by $\nu_i$. In terms of a mixed strategy $(\psi_i, \psi_{-i})$, the average reward is the expectation of $\nu_i$ under $(\psi_i, \psi_{-i})$; this is the utility (8) of user $i$. Hence this stochastic game can be modelled as a one-shot game in which player $i$ maximizes its utility (8). In the rest of the paper we develop algorithms to compute equilibrium points for this game.

MULTIPLICATIVE WEIGHT ALGORITHM FOR LEARNING CCE
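In this section we use the multiplicative weight algorithm ([20]) to compute an equilibrium point of the system. This is a distributed algorithm. Let $c_i(a)$ denote the cost of user $i$ under action profile $a$. Now we have the following definition.
Definition: If a distribution $\psi$ on $\mathcal{P}$ satisfies
$$ \mathbb{E}_{a \sim \psi}[c_i(a)] \le \mathbb{E}_{a \sim \psi}[c_i(\hat{a}_i, a_{-i})] + \epsilon $$
for each $i$ and all actions $\hat{a}_i$, then it is called an $\epsilon$-coarse correlated equilibrium, where on the right side $a_{-i}$ has the marginal distribution under $\psi$.
A mixed Nash equilibrium is a CCE. Hence for our finite game a CCE exists ([20]).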

Definition 3 ([20]): For user $i$, the external regret of a given pure strategy sequence $a^1, \ldots, a^T$ with respect to an action $a_i$ is defined as
$$ \frac{1}{T} \sum_{t=1}^{T} \left[ c_i(a^t) - c_i(a_i, a_{-i}^t) \right]. $$
In a no-regret algorithm such as the multiplicative weight algorithm, users update their strategies based on the cost received, such that the external regret converges to zero. This algorithm is presented in Algorithm 1. It converges to a CCE according to the following theorem ([20], [21]).
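Theorem: Let $\psi^t = \prod_{i=1}^{K} \psi_i^t$ denote the outcome distribution at time $t$. Suppose there exists an integer $T > 0$ such that the regret of each user $i$ is less than $\epsilon$ after $T$ iterations. Then $\psi = \frac{1}{T} \sum_{t=1}^{T} \psi^t$ is an $\epsilon$-coarse correlated equilibrium.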
Algorithm 1 Multiplicative weight algorithm (excerpt)
    User $i$ receives the average utility $\nu_i(a_i)$ for choosing $a_i$
    Time $t + 1$
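Since only part of Algorithm 1 is reproduced here, the following is a minimal sketch of a standard multiplicative weight update for a single user, not the paper's exact listing. It assumes the per-action costs lie in $[0, 1]$ (for instance, one minus the average utility, which is an assumption) and that the learning rate `eta` and the helper names are hypothetical.

```python
def mw_update(weights, costs, eta=0.1):
    """One multiplicative-weight step: w(a) <- w(a) * (1 - eta) ** cost(a)."""
    return [w * (1.0 - eta) ** c for w, c in zip(weights, costs)]

def strategy(weights):
    """Mixed strategy proportional to the current weights."""
    total = sum(weights)
    return [w / total for w in weights]

def run_mw(cost_fn, num_actions, rounds, eta=0.1):
    """Run the update for `rounds` steps.

    cost_fn(t) returns the cost of every action at round t (values in [0, 1]).
    Returns the time-averaged strategy and the final strategy."""
    weights = [1.0] * num_actions
    avg = [0.0] * num_actions
    for t in range(rounds):
        psi = strategy(weights)
        avg = [a + p / rounds for a, p in zip(avg, psi)]
        weights = mw_update(weights, cost_fn(t), eta)
    return avg, strategy(weights)
```

The time-averaged strategy returned by `run_mw` plays the role of the averaged outcome distribution in the theorem above; in a multi-user setting each user would run its own copy with costs derived from its ACK/NACK feedback.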

PARETO OPTIMAL POINTS
In a wireless environment it is realistic to assume that the ACK/NACK bits sent to a particular user can be successfully decoded by all the other users also (because these are sent at a low rate using robust codes). In that case all users can learn about the utility of each other at time t.
We show in this section that this information can be used to get a socially optimal Pareto point which generally provides a better performance than a CCE.
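Definition: An action profile $a \in \mathcal{P}$ is a Pareto point if there does not exist another profile $\tilde{a}$ such that $\nu_i(\tilde{a}) \ge \nu_i(a)$ for all $i \in \{1, \ldots, K\}$ and $\nu_j(\tilde{a}) > \nu_j(a)$ for some $j$. Define
$$ \Omega(a) = \sum_{i=1}^{K} \gamma_i \nu_i(a) $$
for fixed $\gamma_i \ge 0$, $i = 1, \ldots, K$. Then a solution to the optimization problem
$$ \max_{a} \Omega(a), \quad \text{subject to } a \in \mathcal{P}, $$
is a Pareto point when all $\gamma_i > 0$.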
In Algorithm 2 below we provide a distributed algorithm in which the users update their strategies in a sequential fashion so as to improve $\Omega(a)$. This distributed algorithm is a variation of a heuristic stochastic local search algorithm. In this algorithm each user chooses a random action and uses it for a fixed number of time slots (say $T$). Then each user finds the weighted sum of the utilities (since each user receives the ACK/NACK of the other users, it can calculate this quantity). After $T$ slots a user experiments randomly (with probability, say, $\rho$) and then with some probability updates the action profile according to its channel state. Now this user uses the new action for the next $T$ slots while the other users use their previous actions. Based on the weighted sum of utilities, the particular user defines a benchmark. Details of the algorithm in the setting of an interference channel can be found in [18]. The algorithm is presented below as Algorithm 2.
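The experiment-and-benchmark idea described above can be sketched in a simplified, centralized form. This is not Algorithm 2 itself: the utility oracle `utility(i, profile)`, the uniform choice of the trial action and the acceptance rule (keep a trial only if it improves the weighted sum) are assumptions chosen to keep the sketch short.

```python
import random

def local_search(action_sets, utility, gammas, rounds=200, rho=0.5, seed=0):
    """Stochastic local search on Omega(a) = sum_i gamma_i * utility(i, a).

    action_sets[i] lists the actions of user i; utility(i, profile) is a
    hypothetical oracle standing in for the utility learned from ACK/NACKs."""
    rng = random.Random(seed)
    profile = [rng.choice(actions) for actions in action_sets]

    def omega(p):
        return sum(g * utility(i, p) for i, g in enumerate(gammas))

    best = omega(profile)  # current benchmark
    for _ in range(rounds):
        for i in range(len(action_sets)):
            if rng.random() >= rho:
                continue  # user i does not experiment in this round
            trial = list(profile)
            trial[i] = rng.choice(action_sets[i])
            value = omega(trial)
            if value > best:  # keep the new action only if it beats the benchmark
                profile, best = trial, value
    return profile, best
```

In the distributed version each user would estimate the weighted sum from the overheard ACK/NACKs over $T$ slots instead of calling an oracle.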

NASH BARGAINING SOLUTION
The Pareto points obtained in Section 4 are socially optimal, but may not be fair to all users: some users may get much higher rates than others. To obtain fair Pareto points we use the concept of the Nash Bargaining Solution (NBS) [23].
In NBS we need to specify a disagreement strategy ∆ and the corresponding outcome δ = (δ 1 , . . . , δ K ) that specifies the utility of each user that it receives by playing the disagreement strategy whenever there is no improvement over this utility in playing the bargaining outcome.
This bargaining problem is denoted by $(V, \delta)$.

Algorithm 2
    Update weight of each user $i$
    After $T$ slots: w.p. $\rho_i$ user $i$ experiments
    procedure ACTION UPDATE
        w.p. $\epsilon$ choose $a'_i \neq a_i$, $a'_i \in \mathcal{P}_i$
        w.p. $1 - \epsilon$ choose $a'_i \neq a_i$ s.t. $h_i$ with high $\alpha_i$ gets a higher power level
        If $\alpha_i$ is the same for all $h_i$, then a higher value of channel state gets a higher power level.
    end procedure
    Call the new action $\hat{a}_i$
    User $i$: use $\hat{a}_i$ for $T$ time slots.
    $\hat{a}_j = a_j$ if user $j$ is not experimenting.
    ...
        Randomly select another action
        end if
    end procedure

The aim of the bargaining problem is to find a bargaining solution which is Pareto optimal and satisfies the axioms of symmetry, invariance and independence of irrelevant alternatives ([24]).

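Theorem 5.1 ([23]). There exists a unique bargaining solution (provided the feasible region is non-empty) and it is given by the solution of the optimization problem
$$ \max_{v \in V} \prod_{i=1}^{K} (v_i - \delta_i) \quad \text{subject to } v_i \ge \delta_i, \ i = 1, \ldots, K. \qquad (13) $$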
We obtain the disagreement outcome for our problem by the following procedure:
⋆ Each user chooses an action that gives a higher power level to the channel state that has a higher probability of occurrence. In other words, among the set of feasible actions, choose the subset of pure strategies that give the highest power level to the channel state with the highest probability of occurrence. We shrink the subset by considering the actions that give a higher power level to the second most frequently occurring channel state, and we repeat this process until we get a single strategy.
⋆ If all the channel states occur with equal probability, we follow the above procedure by considering the values of the channel gains instead of their probabilities of occurrence.
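The selection procedure above amounts to a lexicographic choice over the feasible pure strategies, which can be sketched as follows. The function name `disagreement_policy` and the dictionary representation of policies are assumptions; the sketch also uses the channel gain as a tie-breaker whenever two states are equally likely, which generalizes the second rule slightly.

```python
def disagreement_policy(policies, probs):
    """Pick the disagreement action of one user.

    policies: list of dicts mapping channel state -> power level.
    probs: dict mapping channel state -> probability of occurrence."""
    # Most probable state first; ties are broken by the larger channel gain.
    order = sorted(probs, key=lambda h: (probs[h], h), reverse=True)
    # Lexicographic maximum: highest power in the first state, then in the
    # second state among the remaining candidates, and so on.
    return max(policies, key=lambda a: tuple(a[h] for h in order))
```

For instance, with states 0.5 and 1.0 occurring with probabilities 0.7 and 0.3, the policy that puts the most power on state 0.5 is selected; if both states are equally likely, the policy that favors the stronger state 1.0 is selected.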
Let $a_i$ denote the pure strategy chosen by the $i$th user and let $T_\delta$ be the number of time slots over which this strategy is used. Then the disagreement value $\delta_i$ for user $i$ is its average reward over these $T_\delta$ slots. We use Algorithm 2 to obtain a distributed solution of (13), with the objective function defined as $\Omega(a) = \prod_{i=1}^{K} (\nu_i(a) - \delta_i)$. From [23], if the set of utilities $V$ is convex then a Nash bargaining solution is also proportionally fair. In our problem $V$ is convex and hence the solution is also proportionally fair.

FADING MAC WITH SECURITY CONSTRAINTS
In this section we consider a time slotted fading MAC-WT with $K$ users who have messages to transmit confidentially to a legitimate receiver (Bob), while a passive eavesdropper (Eve) is listening to the conversation and trying to decode. The notation corresponding to Bob is the same as in the previous sections. Here we define the notation for the channel to Eve. Let $\{G_i(t)\}$ be the channel gain process from user $i$ to Eve. At time $t$ Eve receives the users' transmissions through these gains, corrupted by additive noise $\eta_e(t)$, where $\eta_e(t)$ is white Gaussian noise with distribution $\mathcal{N}(0, 1)$, independent of $\{\eta_b(t)\}$ and of the channel gain processes. Each user $i$ is assumed to know its own channel gains $H_i(t)$ and $G_i(t)$ at time $t$. Since the receiver can estimate the channel gains of all the users (either by receiving known pilots or by using initial data received), the receiver can use a successive decoding strategy to decode all the users.
Now we can use all the algorithms of Section II to obtain a CCE, a PP and an NBS.

A. Fading MAC-WT with Individual Main Channel CSI Only
We now consider the case where the users as well as the receiver do not know Eve's channel gains; the receiver knows only their distribution, and the transmitters do not know even the distribution. In this scenario, the natural metric for the receiver to decide whether to send an ACK or a NACK is outage based. First we define the secrecy outage, when $h_1, \ldots, h_K$ are given, as
$$ P_{SO}(\pi(i)) \triangleq \Pr\left\{ r_{\pi(i)} > \log\left(1 + \frac{h_{\pi(i)} P_{\pi(i)}(h_{\pi(i)})}{1 + \sum_{j=i+1}^{K} h_{\pi(j)} P_{\pi(j)}(h_{\pi(j)})}\right) - \log\left(1 + \frac{G_{\pi(i)} P_{\pi(i)}(h_{\pi(i)})}{1 + \sum_{j \neq i} G_{\pi(j)} P_{\pi(j)}(h_{\pi(j)})}\right) \right\}. $$
The receiver sends an ACK to user $\pi(i)$ if $P_{SO}(\pi(i)) < \epsilon$; otherwise it sends a NACK. Hence we define the utility of user $i$ through the indicator $\mathbb{1}_{\{P_{SO}(i) < \epsilon\}}$, where $\mathbb{1}_{\{C\}}$ is the indicator function of the event $C$. With these utility functions, we can use the algorithms provided in Sections III-V.
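One way the receiver could evaluate this outage probability is by Monte Carlo sampling of Eve's gains, sketched below. The unit-mean exponential distribution for Eve's gains, the base-2 logarithm, the sample size and the function names `secrecy_outage` and `ack` are all assumptions for illustration; the paper only requires that the distribution of Eve's gains be known at the receiver.

```python
import math
import random

def secrecy_outage(rate, main_sinr, own_power, other_powers, samples=20000, seed=0):
    """Estimate Pr{ rate > log2(1 + main_sinr) - log2(1 + eve_sinr) }.

    main_sinr is the user's SINR at Bob (known to the receiver); Eve's gains
    are drawn i.i.d. from a unit-mean exponential (assumed distribution)."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(samples):
        g_own = rng.expovariate(1.0)
        # At Eve, all other users act as interference.
        interference = sum(rng.expovariate(1.0) * p for p in other_powers)
        eve_sinr = g_own * own_power / (1.0 + interference)
        secrecy_rate = math.log2(1.0 + main_sinr) - math.log2(1.0 + eve_sinr)
        outages += rate > secrecy_rate
    return outages / samples

def ack(rate, main_sinr, own_power, other_powers, eps=0.1):
    """ACK if the estimated secrecy outage is below the threshold eps."""
    return secrecy_outage(rate, main_sinr, own_power, other_powers) < eps
```

A closed-form expression may be preferable when the distribution of Eve's gains admits one; the sampling approach is shown only because it works for any distribution that can be simulated.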

B. Avoiding Security Breach
In the previous sections we assumed that when the legitimate receiver cannot securely decode the message it sends a NACK. This is useful for the transmitters to learn the equilibrium point. But the messages transmitted during those slots may be decoded by Eve (with probability $> \epsilon$ in Section 6A). We now modify the system slightly so as to use the above coding scheme while also mitigating this secrecy loss.
We assume that each slot is comprised of two subslots. The fading process does not change during the whole slot. In the first part of the slot we transmit a dummy (random) message. If Bob sends an ACK to user i then the actual confidential message can be transmitted by user i in the second subslot at the same power. If Bob sends a NACK then user i should not use the second subslot. We can make the second subslot much larger than the first subslot so that the rate loss due to the dummy messages is minimal.

TRANSMISSION AT MULTIPLE RATES
So far we have considered the case where the users transmit at fixed rates. Now we consider the more realistic scenario where the users can transmit at different rates, depending on their channel gains. We assume that user $i$ can choose any rate from a finite rate set. We now define a new strategy set such that choosing the rate of transmission becomes part of the action taken, along with the power chosen. Hence the modified strategy set of user $i$ consists of joint choices of rate and power. We can now use all the existing algorithms to compute a CCE, a PP and an NBS.

NUMERICAL RESULTS
In this section we provide several examples using the algorithms developed in this paper.
We first consider the case where the users transmit at a fixed rate of 1 bit/sec. In this scenario we compare the sum-rates obtained by our three algorithms, i.e., CCE, PP and NBS (see Fig. 1). We note that the NBS and the PP are better than the CCE. Also, regarding the fairness among the users, the NBS is the fairest of the three.
To obtain the PP we use Algorithm 2 with the weights $\gamma_i = 1$. As expected, we observe that the PP and the NBS give much better rates than the CCE (Fig. 3). From Fig. 4 we observe that here also the NBS is the fairest among the three algorithms. We also compare our schemes with that of [8], where each user knows its own channel gain and the distribution of the other users' channel gains. We observe that the PP and the NBS give a better sum-rate than this scheme (Fig. 5).
Next we consider the channel with security constraints, starting with a fixed rate scenario. Each user knows its channel gains to Bob and to Eve. We observe that the PP and the NBS obtain a much higher sum-rate than the CCE (Fig. 6). Also we observe that the NBS is fairer than the PP and the CCE (Fig. 7).
Next we consider the case where the users do not have the CSI of Eve available but only its distribution is known. As in the previous example, here also we observe the same trend (Fig. 8, Fig. ??).

CONCLUSIONS
In this paper a $K$-user fading multiple access channel with and without security constraints is studied. First we consider an F-MAC without security constraints. Under the assumption that each user knows only its individual CSI, we formulate the problem of power allocation as a stochastic game in which the receiver sends an ACK or a NACK depending on whether it was able to decode the message or not. We use the multiplicative weight no-regret algorithm to obtain a coarse correlated equilibrium (CCE). Then we consider the case where the users can decode each other's ACK/NACK. In this scenario we provide an algorithm to maximize the weighted sum-utility of all the users and obtain a Pareto optimal point (PP). A PP is socially optimal but may be unfair to individual users. Next we consider the case where the users can cooperate with each other so as to disagree with a policy that is unfair to an individual user. We then obtain a Nash bargaining solution (NBS), which in addition to being Pareto optimal is also fair to each user.
Next we study a $K$-user fading multiple access wiretap channel with the CSI of Eve available to the users. We use the previous algorithms to obtain a CCE, a PP and an NBS. Next we consider the case where each user does not know the CSI of Eve but only its distribution. In that case we use secrecy outage as the criterion for the receiver to send an ACK or a NACK. Here also we use the previous algorithms to obtain a CCE, a PP or an NBS. Finally we show that our algorithms can be extended to the case where a user can transmit at different rates. At the end we provide a few examples to compute the different solutions and compare them under different CSI scenarios.