Distributed ranking-based resource allocation for sporadic M2M communication

This work proposes a novel scheme for distributed ranking-based and contention-free resource allocation in large-scale machine-to-machine (M2M) communication networks. We partition a network of N devices into disjoint clusters based on service type, and assign to each cluster a cluster-specific signature for active cluster members to indicate their active status. The devices in each cluster are totally ordered in some a priori-known manner, which gives rise to an active ranking of active cluster members. In order to tackle complexity issues in large-scale M2M networks with a massive number of devices, we propose a distributed resource allocation scheme using the framework of compressed sensing (CS), which mainly consists of three phases: (i) In a full-duplex acquisition phase, the devices transmit their cluster-specific signatures simultaneously and the network activation pattern is collected in a distributed manner. (ii) The base station detects the active clusters and the number of active devices per cluster using block sketching, and allocates resources to each active cluster accordingly. (iii) Each active device determines its active ranking in the cluster and accesses a specific resource according to the ranking position. 
By exploiting the sparsity in the activation pattern of the M2M devices, the proposed scheme is formulated as a CS support recovery problem for a particular binary block-sparse signal $x\in\mathbb{B}^N$ with block sparsity $K_B$ and in-block sparsity $K_I$ over block size $d$. Our analysis shows that the proposed scheme efficiently reduces the signature length to $\mathcal{O}(\max\{K_B\log N, K_B K_I\log d\})$ and achieves a lower computational complexity of $\mathcal{O}(dK_I^{2}+\frac{N}{d}\log N)$ compared with standard CS algorithms. Moreover, numerical results suggest strong robustness of the proposed scheme under noisy conditions.

requests from the M2M devices, thereby posing significant challenges in the design and operation of the wireless networks. For large-scale networks such as M2M communication networks, distributed resource allocation schemes are highly desired in order to alleviate the scaling issues due to massive connectivity [6]. In distributed schemes, each device decides semi-autonomously on which resources to access the channel, which results in a significant reduction of signaling overhead for coordination and information exchange between the devices and the BS. Moreover, considering the limited computing capability of the M2M devices [7], there is a strong need for low-complexity solutions that require relatively low computing power.
Apart from the massive number of devices, a key feature of M2M traffic is that the device activity patterns are typically sporadic [8], so that at any given time instant each device has a low probability of being active. This results in a certain level of sparsity in the device activity. Compressed sensing (CS) [9][10][11][12] is therefore a well-suited framework for such scenarios, since it provides tools for efficient reconstruction of high-dimensional signals with a sparse representation. Moreover, it is often observed that the messages sent by the M2M devices are strongly correlated, e.g., due to proximity or a common service type [7]. Therefore, it is particularly beneficial to partition the set of all devices into a number of clusters such that similar requests can be handled jointly. Given this cluster-like behavior and the sparse nature of the M2M traffic, the activation pattern of the devices can be modeled as a block-sparse signal with an additional in-block structure [13] in CS-based applications.
The main objective of this study is to find an efficient resource allocation scheme that mitigates the serious scaling problems resulting from massive connectivity by exploiting the specific sparsity of the device activity. Special attention is paid to low-overhead communication with enhanced spectral efficiency and reduced computational complexity. To this end, we propose a three-phase resource allocation scheme, in which a distributed approach reduces the communication overhead and a sketching algorithm lowers the computing load.

Related works
Conventional cellular networks are designed based on the scheduling of active users to orthogonal time or frequency resources. The excessive control overhead incurred by the massive number of sporadically active M2M devices, however, renders this kind of resource allocation scheme impractical. As an alternative, contention-based schemes, such as slotted ALOHA [14], have been proposed to deal with this issue. However, the main drawback of these schemes is the lack of deterministic performance guarantees, and in particular the performance deteriorates significantly under massive connectivity settings. In addition, the authors of [15] raised concerns about energy efficiency in large-scale M2M networks, where systems with lower computational complexity and lower power consumption are particularly desired for low-cost M2M devices with limited operational capability.
Taking sparsity in the activity pattern into account, access schemes using CS have also received a great deal of attention in recent years. To the best of our knowledge, the authors of [16] were the first to propose the idea of CS-based multi-user detection. They introduced an adaptive algorithm that switches between a CS-based reconstruction algorithm and a classical detection method depending on the sparsity level of the signals being detected. Reference [17] proposed greedy CS algorithms to facilitate a joint detection of device activity and transmitted data. The idea was further extended in [18] to include multicarrier access schemes and to provide higher spectral efficiency and more flexibility. Furthermore, schemes for distributed compressed sensing have been widely studied (e.g., in [19]) to take advantage of both inter- and intra-signal correlations by jointly reconstructing signals that have been compressed independently. The concept of distributed compressed sensing has been applied by the authors of [20] to facilitate device detection in M2M communications, which shows significant performance gains in terms of robustness. However, none of the works mentioned above has exploited the particular cluster-like behavior of M2M devices, thereby ignoring the block sparsity structure in the activation pattern. In addition, greedy algorithms such as orthogonal matching pursuit (OMP) [21] are widely used in detection schemes owing to their low computational complexity. However, in many wireless applications, including the one considered in this study, even this reduced complexity is a bottleneck due to strictly limited computing resources on the M2M devices. Therefore, we employ a sketching algorithm in our proposed approach to further reduce the computational burden incurred by massive M2M communications.

Main contribution
In this work, we present and analyze a novel distributed scheme for device detection and contention-free resource allocation in large-scale M2M communication networks. We partition the M2M devices into disjoint clusters in advance based on service type, and assign to each device a cluster-specific signature that active devices use for their initial access. In this paper, we use the following definition of an active cluster.
Definition 1 (Active cluster) We say that a cluster is active if there is at least one active device in this cluster.
The devices in each cluster are totally ordered according to some given criterion such as service priority. In other words, the set of devices in each cluster is a totally ordered set $S$ so that if $a, b \in S$, $a \neq b$ and $a \le b$, then $a < b$. This order is known a priori to all cluster members and gives rise to what we call the active ranking, which determines the order within each cluster in which the active devices access the set of assigned resource blocks, i.e., an active device that is the $i$-th element in the active ranking accesses the $i$-th resource block.
Definition 2 (Active ranking) The active ranking associated with each active cluster is the totally ordered subset induced by the active devices in a cluster.
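To make Definition 2 concrete, the following minimal sketch derives the active ranking of one cluster from its a priori order and maps a device to its resource block. The function names `active_ranking` and `resource_index` and the device labels are hypothetical and chosen only for illustration.

```python
def active_ranking(cluster_order, active):
    """Return the active devices of one cluster, kept in the a priori order."""
    active = set(active)
    return [dev for dev in cluster_order if dev in active]


def resource_index(cluster_order, active, device):
    """1-based position of `device` in the active ranking, i.e. the index
    of the resource block it would access."""
    return active_ranking(cluster_order, active).index(device) + 1


# Hypothetical cluster: the list fixes the a priori total order.
cluster_order = ["d3", "d7", "d1", "d9", "d4"]
active = {"d9", "d3", "d4"}

ranking = active_ranking(cluster_order, active)
block_of_d9 = resource_index(cluster_order, active, "d9")
```

Every device that knows the same active set computes the same ranking, which is what allows contention-free access without further signaling.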
Motivated by the CS principle, the proposed scheme mainly consists of three phases:
• Phase (i) Signal acquisition: The active devices simultaneously transmit pre-equalized individual signatures, each of which indicates the membership to a particular cluster. Exploiting full-duplex transceivers, all the devices and the BS obtain their individual measurements, which are superpositions of the signatures transmitted by the active devices.
• Phase (ii) Detection at BS: The BS detects the active clusters, the number of active devices in each cluster, and the collision pattern in the received measurements. Then it broadcasts the detected information to the devices and assigns a sufficient amount of resources to each active cluster accordingly.
• Phase (iii) Detection at devices: Each active device detects the active ranking of its cluster, and then accesses the corresponding resource assigned by the BS based on its ranking position.
We study a particular signature design for the devices in each cluster to facilitate the detection process of phase (ii) at the BS side and that of phase (iii) at the device side. Moreover, based on the Count-Sketch procedure [23,24], we develop a novel block sketching algorithm to perform phase (ii) and to reduce the computational complexity induced by massive connectivity. Phase (iii) is performed using a conventional greedy algorithm such as OMP [21], except that we use feedback information from the BS to enhance robustness and to further reduce complexity. Furthermore, with the distributed ranking-based resource allocation scheme, each active device autonomously accesses a pool of resource blocks assigned by the BS in a contention-free manner without further control signaling, so that the communication overhead is significantly reduced. We show via theoretical analysis that the proposed scheme scales better with increasing network size than classical CS-based approaches, both in terms of communication overhead and computational complexity. Moreover, the simulation results reveal a significantly enhanced robustness of the proposed scheme in the presence of Gaussian noise and inaccurate channel estimates.

Organization of the paper
The remainder of this paper is organized as follows. Section 2 introduces a mathematical model for the proposed scheme. In Sect. 3, we present the detection algorithms in detail. Section 4 is devoted to theoretical analysis of the proposed scheme, while Sect. 5 evaluates the performance with numerical simulation results. Finally, Sect. 6 concludes the paper with some final remarks.

Notational remarks
Throughout this work, matrices and vectors are denoted by uppercase and lowercase letters, respectively. The superscript $(\cdot)^T$ represents the transpose of a matrix or a vector, and $(\cdot)^H$ indicates the Hermitian transpose. $A \circ B$ denotes the Hadamard product of matrices $A$ and $B$. The sets of binary, real and complex numbers are denoted by $\mathbb{B}$, $\mathbb{R}$ and $\mathbb{C}$, respectively. The cardinality of a set is given by $|\cdot|$, and the $\ell_2$-norm is given by $\|\cdot\|$. Furthermore, $\mathcal{O}$ denotes "big-O" according to Knuth's notation. Unless otherwise stated, all logarithms are assumed to be to base 2.

Methods
In this section, we introduce the underlying system model and formulate the problem which is addressed in this paper.

Transmitter side
We consider an M2M network consisting of $N$ devices, which are partitioned into $L$ clusters of equal size $d$ according to the service type. The members of each cluster are known both at the BS and at all devices, which can be achieved via device registration to the network. We assume that the devices have sporadic transmissions, which implies that at a given time instant, only a relatively small number of devices from a few clusters are activated to access the channel. Therefore, we define a twofold sparsity pattern to model the active status of the devices:
• Block sparsity $K_B$: The maximum number of active clusters at any time.
• In-block sparsity $K_I$: The maximum number of active devices in an active cluster.
Therefore, the total number of active devices in the network is $K \le K_B K_I$. Moreover, since the devices are sparsely activated, we have $K \le K_B K_I \ll N = Ld$.
We use a $K$-sparse binary sequence (or vector) $x \in \mathbb{B}^N$ to model the activation pattern of the devices in the network, where entry "1" indicates that the corresponding device is active while an inactive device results in "0". Furthermore, we use $x_\ell \in \mathbb{B}^d$, $\ell \in \{1,\cdots,L\}$, to denote the subsequence (or subvector) of $x$ corresponding to cluster $\ell$. In addition, the block support, denoted as $S_B$, is defined to be the set of indices of the active clusters: $S_B = \{\ell \in \{1,\cdots,L\} : \|x_\ell\|_0 \neq 0\}$. Similarly, the in-block support, denoted as $S_{I,\ell}$, indicates the set of indices of the active devices in cluster $\ell$: $S_{I,\ell} = \{j \in \{1,\cdots,d\} : x_{\ell,j} = 1\}$. By definition, we have $|S_B| \le K_B$ and $|S_{I,\ell}| \le K_I$ for all $\ell \in \{1,\cdots,L\}$. Thus, the activation pattern $x$ of the devices is modeled as a sparse signal with block sparsity $K_B$ and in-block sparsity $K_I$, and we call signals with such a sparsity pattern $(K_B, K_I)$ block-sparse.
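The supports defined above can be read off a small activation vector as follows. This is an illustrative sketch only: the helper name `block_supports` and the toy values of $L$, $d$ and $x$ are assumptions, and indices are reported 1-based to match the notation of the text.

```python
import numpy as np


def block_supports(x, d):
    """Return (S_B, S_I) for a binary activation vector x with block size d.

    S_B is the list of active cluster indices and S_I maps each active
    cluster to the indices of its active devices (all indices 1-based).
    """
    blocks = np.asarray(x).reshape(-1, d)  # row l-1 holds the subvector x_l
    S_B = [l + 1 for l in range(blocks.shape[0]) if blocks[l].any()]
    S_I = {l: [int(j) + 1 for j in np.flatnonzero(blocks[l - 1])] for l in S_B}
    return S_B, S_I


# Toy network: L = 4 clusters of d = 3 devices, three active devices.
d = 3
x = np.array([0, 0, 0,  1, 0, 1,  0, 0, 0,  0, 1, 0])
S_B, S_I = block_supports(x, d)

block_sparsity = len(S_B)                               # |S_B|
in_block_sparsity = max(len(v) for v in S_I.values())   # max |S_{I,l}|
```

Here clusters 2 and 4 are active, so this particular $x$ is $(2, 2)$ block-sparse.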
Due to the sparsity in $x$, we use the CS theory [9][10][11] to reconstruct $x$ based on measurements performed by the BS and the devices in the network. We use $A \in \mathbb{R}^{M\times N}$ to denote the measurement matrix, whose exact structure is defined later in Sect. 3.1. Each column of $A$, say column $i$ denoted by $a_i$, $i \in \{1,\cdots,N\}$, corresponds to the unique signature sent by device $i$ if it is active, whereas $A_{-,\ell} \in \mathbb{R}^{M\times d}$ denotes the submatrix of $A$ corresponding to the signatures sent by the devices from the $\ell$-th cluster.

Receiver side
Due to the superposition property of the wireless channel, each receiver observes a noisy superposition of signatures transmitted by the active devices. We assume that all frames are received synchronously at the aggregation node. In practical systems this can be ensured, e.g., by a synchronization signal from the BS. Alternatively, the requirement for precise synchronization can be mitigated by using schemes such as that considered in [25], which is robust to the lack of synchronization and requires only a coarse synchronization. The received signal $y \in \mathbb{C}^M$ at the BS is then a noisy linear combination of the transmitted signatures given by

$$y = (H_B \circ A)\,x + \epsilon_B, \qquad (1)$$

where $H_B \in \mathbb{C}^{M\times N}$ is the channel matrix, and $\epsilon_B \in \mathbb{C}^M$ denotes the additive noise, which is assumed to be zero-mean with independent components of variance $\sigma_B^2$. The Hadamard product is used here to model the effective channel when advanced technologies such as frequency hopping [26] are applied, which is introduced in detail in Sect. 3.2.
In addition, the active devices also perform their own local measurements during the acquisition phase. The received signal $y_D \in \mathbb{C}^M$ observed by a particular device at some given time instant is obtained as

$$y_D = (H_D \circ A)\,x + \epsilon_D, \qquad (2)$$

where $H_D$ is an $M \times N$ matrix representing the wireless channels between the devices, and $\epsilon_D \in \mathbb{C}^M$ is the vector of independent noise components with zero mean and variance $\sigma_D^2$. Furthermore, we make the following assumption on channel knowledge:
Assumption 1 Each device has the channel state information to the BS.
In current systems [5], the channel information can be obtained from pilot signals of the network at no extra cost. Alternatively, it may be obtained via statistical channel knowledge [27], location-based estimation [28], channel reciprocity [29] or long-term observation [30]. We point out that the entries of $H_B$ and $H_D$ are in general not physical channels but rather effective channels that depend on the underlying transmission scheme. For instance, the effective channels of the energy-detection based scheme in [25] are related to channel power gains, which are much easier to acquire than complex-valued channel coefficients.
The received signal at the BS is given by (1), and that at the devices is as in (2). Hereafter, we apply CS-related techniques to reconstruct the $(K_B, K_I)$ block-sparse signal $x$ with $y$ at the BS and $y_D$ at the devices. To be specific, the BS performs block support recovery to obtain an accurate estimate of the block support $S_B$ and of the cardinality of the in-block support $|S_{I,\ell}|$ for each $\ell \in \{1,\cdots,L\}$. Subsequently, each active device from cluster $\ell$ performs in-block support recovery to estimate the in-block support $S_{I,\ell}$ using the received signal $y_D$ in (2) and side information broadcast by the BS. In a typical massive connectivity scenario such as M2M communications, we need to find solutions of high efficiency and low computational complexity to the following problems.
• Problem 2 In-block support recovery at the active devices in cluster $\ell$ with $P(\hat{S}_{I,\ell} = S_{I,\ell}) \ge 1 - \delta$.
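As a minimal numerical sketch of the measurement model in this section, the snippet below superimposes the signatures of the active devices through an entrywise (Hadamard) effective channel and adds noise. It assumes the reading $y = (H \circ A)x + \epsilon$ of the received signals; the function name `receive`, the toy dimensions, the placeholder signatures and the real-valued Gaussian noise are illustrative assumptions, not part of the scheme.

```python
import numpy as np


def receive(H, A, x, sigma, rng):
    """Noisy superposition y = (H o A) x + eps, where 'o' is the Hadamard product.

    Real-valued Gaussian noise is used here purely for simplicity.
    """
    noise = sigma * rng.standard_normal(A.shape[0])
    return (H * A) @ x + noise


rng = np.random.default_rng(1)
M, N = 6, 8
A = rng.choice([-1.0, 0.0, 1.0], size=(M, N))  # placeholder signatures
H = np.ones((M, N))                            # ideal (flat) effective channel
x = np.zeros(N)
x[[2, 5]] = 1                                  # devices 3 and 6 are active

y = receive(H, A, x, sigma=0.0, rng=rng)       # noise-free, for easy checking
```

With a flat channel and no noise, the received vector is simply the sum of the two active signatures (columns 3 and 6 of `A`), which is the superposition property that the detection algorithms rely on.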

Distributed ranking-based resource allocation
In this section, we present our algorithm design to tackle the target problems. In particular, this includes the structured signature model and the decoding procedures at the BS as well as at the devices, respectively.

Structured random signature model
The measurement matrix $A \in \mathbb{R}^{M\times N}$ we design here is a structured random matrix, which extends those utilized by the Count-Sketch procedure proposed in [23,24]. Such matrices are desirable as they generally facilitate low computational complexity. We denote by $\mathcal{A}(R,T,L,d,\alpha)$ a particular distribution over matrices with $RT$ rows and $Ld$ columns, which is specified below, and we assume that the measurement matrix $A$ is drawn from this distribution, i.e., $A \sim \mathcal{A}(R,T,L,d,\alpha)$ with $M = RT$ and $N = Ld$.
As illustrated in Fig. 2, the measurement matrix $A$ is composed of the vertical concatenation of $T$ individual random matrices that we denote by $A_{t,-} \in \mathbb{R}^{R\times N}$ for $t \in \{1,\cdots,T\}$, where $A_{t,-}$ consists of the horizontal concatenation of $L$ submatrices $A_{t,\ell} \in \mathbb{R}^{R\times d}$ for $\ell \in \{1,\cdots,L\}$. Each $A_{t,\ell}$ is a sparse matrix containing exactly $d$ non-zero components, all located on the same row and with the same value. The index of the row with non-zero elements is chosen uniformly at random from the set $\{1,2,\cdots,R\}$, and the non-zero components take either the value $+\alpha$ or $-\alpha$ with probability 1/2. For a given realization of $A_{t,\ell}$, let $q_{t,\ell} \in \{1,\cdots,R\}$ denote the index of the row of $A_{t,\ell}$ with non-zero entries, and let $s_{t,\ell} \in \{-\alpha,+\alpha\}$ be the corresponding value of the non-zero components in $A_{t,\ell}$. As each signature transmitted by the devices is the corresponding column of $A$, it is a sparse sequence of length $M$ with sparsity level $T$.
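The construction above can be sketched in a few lines. The helper name `sample_signatures`, the 0-based row indices and the toy parameter values are assumptions made for illustration; the sketch only follows the verbal description of the distribution over $RT \times Ld$ matrices given in this section.

```python
import numpy as np


def sample_signatures(R, T, L, d, alpha, rng):
    """Draw a structured random matrix A of shape (R*T, L*d).

    Returns A together with the row indices q[t, l] (0-based) and the
    signed values s[t, l] of the single non-zero row of each A_{t,l}.
    """
    q = rng.integers(0, R, size=(T, L))               # uniform row index
    s = alpha * rng.choice([-1.0, 1.0], size=(T, L))  # +alpha or -alpha, prob. 1/2
    A = np.zeros((R * T, L * d))
    for t in range(T):
        for l in range(L):
            # One row of A_{t,l} carries d equal non-zero entries.
            A[t * R + q[t, l], l * d:(l + 1) * d] = s[t, l]
    return A, q, s


rng = np.random.default_rng(0)
R, T, L, d, alpha = 8, 5, 6, 4, 1.0
A, q, s = sample_signatures(R, T, L, d, alpha, rng)
```

Each column of `A` (one device signature) then has exactly $T$ non-zero entries, and all devices of a cluster share the same signature pattern, which is why the signature is cluster-specific.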

Block support recovery at BS
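The objective of the decoding procedure at the BS is to obtain an accurate estimate of the block support $S_B$ and the cardinality of the in-block support $|S_{I,\ell}|$ for each active cluster. The signal $y$ received by the BS at some given time instant is given by (1). Since the channel state information $H_B$ is assumed to be available at the devices, the active devices can perform a channel inversion before transmitting the signatures to indicate their active state. Particular variants of (generalized) inverses of the channel matrix may be taken at the transmitter side. For example, $H_B$ can be inverted by simply taking the reciprocal of the non-zero elements whose magnitude is above a certain threshold $\theta$, which is given by

$$[H_B^{+}]_{m,n} = \begin{cases} 1/[H_B]_{m,n}, & |[H_B]_{m,n}| > \theta, \\ 0, & \text{otherwise}. \end{cases} \qquad (3)$$

In this case, devices with weak links due to the near-far behavior may stay offline and avoid excessively large transmit power and strong interference to other nodes. Then the obtained measurements at the BS are given as

$$y = (H_B \circ H_B^{+} \circ A)\,x + \epsilon_B = A x + \epsilon_B, \qquad (4)$$

where the second equality holds when the links of all active devices exceed the threshold. In this paper, for the sake of simplicity, we develop a fast block sketching algorithm based on the Count-Sketch procedure proposed in [23,24] to realize the decoding process at the BS. To be more precise, we use $y_t \in \mathbb{R}^R$ to denote the subvector of $y$ corresponding to the observations obtained via the submatrix $A_{t,-}$. So, we have

$$y_t = A_{t,-}\,x + \epsilon_t, \qquad (5)$$

where $\epsilon_t$ denotes the corresponding subvector of the noise. Given $y_t$ for each $t \in \{1,\cdots,T\}$, we form the signal estimate $\hat{x}_t \in \mathbb{R}^N$ by indexing and scaling the entries of the corresponding observations $y_t$ such that

$$\hat{x}_t = A_{t,-}^T\, y_t = [A_{t,1}, \cdots, A_{t,L}]^T y_t, \qquad (6)$$

where each $A_{t,-}$ consists of the horizontal concatenation of $L$ submatrices $A_{t,\ell}$ for $\ell \in \{1,\cdots,L\}$. Further recall that each matrix $A_{t,\ell}$ is a sparse matrix containing $d$ non-zero components located on the same row $q_{t,\ell} \in \{1,\cdots,R\}$, and all the non-zero components have the same value $s_{t,\ell} \in \{-\alpha,+\alpha\}$. As a result, the $i$-th entry $\hat{x}_{t,i}$ of $\hat{x}_t$ for the $\ell$-th block can be written as

$$\hat{x}_{t,i} = s_{t,\ell}\, y_{t,\ell}, \quad \text{for } i \in \{d\ell-d+1,\cdots,d\ell\},\ \ell \in \{1,\cdots,L\}, \qquad (7)$$

where $y_{t,\ell} = y_t[q_{t,\ell}]$ denotes the $q_{t,\ell}$-th entry of $y_t$.
We use $a_{q_{t,\ell},i}$ to denote the entry in the $q_{t,\ell}$-th row and $i$-th column of $A_{t,-}$; then we have

$$y_{t,\ell} = \sum_{i=1}^{N} a_{q_{t,\ell},i}\, x_i + \epsilon_{t,\ell}, \qquad (8)$$

where $\epsilon_{t,\ell}$ is the $q_{t,\ell}$-th entry of $\epsilon_t$. Since $y_{t,\ell}$ is dominated by the non-zero entries on the $q_{t,\ell}$-th row of $A_{t,-}$, we denote by $S_{t,\ell}$ the set of indices of the clusters which have non-zero components on the same row as block $\ell$, i.e., $S_{t,\ell} = \{j \in \{1,\cdots,L\} : q_{t,j} = q_{t,\ell}\}$. Then $y_{t,\ell}$ is given by

$$\begin{aligned} y_{t,\ell} &= \sum_{k \in S_{t,\ell}} \sum_{i=dk-d+1}^{dk} a_{q_{t,\ell},i}\, x_i + \epsilon_{t,\ell} \\ &\overset{(a)}{=} \sum_{k \in S_{t,\ell}} s_{t,k} \sum_{i=dk-d+1}^{dk} x_i + \epsilon_{t,\ell} \\ &\overset{(b)}{=} \sum_{k \in S_{t,\ell}} s_{t,k}\, |S_{I,k}| + \epsilon_{t,\ell} \\ &\overset{(c)}{=} \sum_{k \in S_B \cap S_{t,\ell}} s_{t,k}\, |S_{I,k}| + \epsilon_{t,\ell}, \end{aligned} \qquad (9)$$

where (a) follows from the structure of $A_{t,\ell}$ with equal non-zero elements on the same row, (b) is due to $x_i \in \{0,1\}$, and (c) holds since $|S_{I,k}| = 0$ if $k \notin S_B$. Putting $y_{t,\ell}$ in (9) into (7) and using $s_{t,\ell}^2 = \alpha^2$ yields

$$\hat{x}_{t,i} = \alpha^2 |S_{I,\ell}| + \Delta_{t,\ell} + \tilde{\epsilon}_{t,\ell}, \qquad (10)$$

where $\Delta_{t,\ell} = s_{t,\ell} \sum_{k \in S_B \cap S_{t,\ell} \setminus \{\ell\}} s_{t,k}\, |S_{I,k}|$ is the interference term from other active blocks, and $\tilde{\epsilon}_{t,\ell} = s_{t,\ell}\,\epsilon_{t,\ell}$ is the noise term with zero mean and variance $\gamma^2$.
To mitigate the interference $\Delta_{t,\ell}$ from other blocks, we consider a block-wise estimate $\hat{x}_\ell$ for each $\ell \in \{1,2,\cdots,L\}$ given by

$$\hat{x}_\ell = \operatorname{median}\big\{\hat{x}_{t,i}\big\}_{t=1,\, i=d\ell-d+1}^{T,\, d\ell}, \quad \text{for } \ell \in \{1,2,\cdots,L\}. \qquad (11)$$

Notice that, instead of the mean, the estimate $\hat{x}_\ell$ for block $\ell$ is equal to the median of $\hat{x}_{t,i}$ over $\mathcal{O}(Td)$ samples. The rationale behind this approach is to make the estimates more robust to the interference from other active blocks. With (11), each estimate $\hat{x}_\ell$ for the $\ell$-th block corresponds to $|S_{I,\ell}|$ with high probability (w.h.p.), and the cardinality of the in-block support set $|S_{I,\ell}|$ is obtained as

$$|S_{I,\ell}| = \frac{1}{\alpha^2}\cdot \hat{x}_\ell, \quad \text{for } \ell \in \{1,2,\cdots,L\}. \qquad (12)$$

Therefore, since $|S_{I,\ell}|$ indicates the number of active devices in cluster $\ell \in \{1,2,\cdots,L\}$, those clusters with $|S_{I,\ell}| > 0$ are marked as "active" by the BS. For simplicity, we assume that each device needs one resource block for transmission. Therefore, the number of resource blocks assigned by the BS to cluster $\ell \in \{1,2,\cdots,L\}$ equals $|S_{I,\ell}|$.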
In addition, for a given $x_i$ from an active block $\ell \in S_B$, if an individual estimate $\hat{x}_{t,i}$ in (7) is much larger than the block-wise estimate $\hat{x}_\ell$ in (11), i.e., $\hat{x}_{t,i} \gg \hat{x}_\ell$, then we can conclude that the corresponding measurement might suffer from strong interference from the other active clusters. That is, for a given $x_i$ from block $\ell$ and a particular $t \in \{1,\cdots,T\}$, the interference term $\Delta_{t,\ell}$ in (10) is non-zero. In such a case, we mark the measurement as "collided" for cluster $\ell$ and keep its index $q_{t,\ell}$ in the collision pattern vector $Q_\ell$ for the corresponding cluster.
The above approach provides the BS with an accurate estimate of $S_B$ and $|S_{I,\ell}|$, and therefore it solves Problem 1. The detailed proof of the performance guarantee will be given in Sect. 4.1. In addition, it provides the collision patterns $Q_\ell$ in the measurements for $\ell \in \{1,2,\cdots,L\}$. The BS broadcasts this information to the devices and assigns to each cluster, say cluster $\ell \in \{1,2,\cdots,L\}$, $|S_{I,\ell}|$ resource blocks to accommodate all active devices in this cluster.
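The decoding steps at the BS can be sketched as follows: scale the indexed measurements, take the median across the $T$ sketches, divide by $\alpha^2$, and flag outliers as collisions. This is a simplified illustration rather than the authors' implementation. It uses one sample per sketch (all $d$ entries of a block share the same estimate), assumes perfectly equalized and noise-free measurements, and replaces the "much larger" test by an absolute-deviation threshold `tol`, which is a hypothetical parameter. The toy instance is hand-built so that the outcome can be traced by hand.

```python
import numpy as np


def block_sketch(y, q, s, R, alpha, tol):
    """Estimate the number of active devices per cluster and flag collisions.

    y    : measurement vector of length R*T (assumed channel-equalized)
    q, s : T x L arrays with the 0-based row index and signed value of A_{t,l}
    tol  : hypothetical deviation threshold for declaring a collision
    """
    T, _ = q.shape
    rows = np.arange(T)[:, None] * R + q        # global row index of each (t, l)
    est = s * y[rows]                           # per-sketch estimates s_{t,l} * y_{t,l}
    block_est = np.median(est, axis=0)          # median over the T sketches
    counts = np.rint(block_est / alpha**2).astype(int)
    collided = np.abs(est - block_est) > tol    # T x L collision flags
    return counts, collided


# Hand-built toy instance: T = 3 sketches, L = 3 clusters, R = 4 rows per sketch.
R, alpha = 4, 1.0
q = np.array([[0, 1, 2], [1, 1, 3], [2, 0, 2]])
s = alpha * np.array([[1, -1, 1], [-1, 1, 1], [1, 1, -1]], dtype=float)
true_counts = np.array([2, 0, 1])               # active devices per cluster

rows = np.arange(q.shape[0])[:, None] * R + q
y = np.zeros(R * q.shape[0])
np.add.at(y, rows, s * true_counts)             # noise-free superposition

counts, collided = block_sketch(y, q, s, R, alpha, tol=0.5)
```

In this instance clusters 1 and 3 share a row in the third sketch, so their third estimates are corrupted, yet the median over the three sketches still returns the correct counts. The inactive cluster 2 is also flagged once because it shares a row with the active cluster 1 in the second sketch.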

In-block support recovery at devices
During the signal acquisition phase, each active device also collects its own measurements, which are linear combinations of the transmitted signatures from other active devices. In this section, we address Problem 2 in Sect. 2.3. The objective is to develop a scheme that enables each active device in cluster $\ell \in \{1,\cdots,L\}$ to reliably estimate the in-block support $S_{I,\ell}$ with low computational complexity, based on its local measurements and the feedback from the BS as side information.
Given the measurement matrix $A$ under the random structured model in Sect. 3.1 and the pre-channel correction in (4), the measurement $y_D$ collected at an active device is given by

$$y_D = (H_D \circ H_B^{+} \circ A)\,x + \epsilon_D = \tilde{A}\,x + \epsilon_D, \qquad (13)$$

where $H_B^{+}$ denotes the thresholded entrywise inverse of $H_B$ applied at the transmitters and $\tilde{A} := H_D \circ H_B^{+} \circ A$ is the effective measurement matrix. According to the specific structure of $A$, the corresponding submatrix $A_{-,\ell}$ for cluster $\ell \in \{1,\cdots,L\}$ has only $T$ rows with non-zero components. The indices of these rows are collected in the set $D_\ell$. Furthermore, with the feedback information from the BS on the collision pattern $Q_\ell$ for cluster $\ell$, indicating those collided measurements to be discarded, we form an index set $U_\ell = D_\ell \setminus Q_\ell$. Therefore, in order to perform the in-block support recovery of $x_\ell$ at any device from cluster $\ell$, we simply need to focus on $y_{D,\ell}$, a vector composed of the entries of $y_D$ corresponding to $U_\ell$. We denote by $\tilde{A}_{D,\ell}$ the $|U_\ell| \times d$ submatrix of $\tilde{A}$ formed by the rows corresponding to $U_\ell$ and the columns of block $\ell$. Therefore, we have

$$y_{D,\ell} = \tilde{A}_{D,\ell}\, x_\ell + \epsilon_D, \qquad (14)$$

with $\epsilon_D$ restricted to the rows in $U_\ell$. As introduced in Sect. 2.2, we use technologies such as frequency hopping [26] for the transmission, where symbols are transmitted hopping over multiple subcarriers. Since the channels between the devices over different subcarriers are assumed to be i.i.d., we can conclude that $\tilde{A}_{D,\ell}$ has independent columns and row-blocks (e.g., i.i.d. Subgaussian). Therefore, classic CS decoding algorithms can be applied to perform the in-block support recovery. We argue in favor of greedy algorithms such as OMP [21] due to their low complexity, which is particularly attractive for M2M-based applications where limited computational capability and energy consumption at the devices are important design criteria. An example of the modification of OMP for in-block support recovery is summarized in Algorithm 1, where the modified steps are marked in boldface.
By exploiting the broadcast information on the cardinality of the in-block support $|S_{I,\ell}|$ for cluster $\ell$, which the BS obtains from (12) as $\hat{x}_\ell/\alpha^2$, the stopping criterion of the greedy algorithm can be set by limiting the number of iterations to $|S_{I,\ell}|$, thereby leading to further reduced computational complexity. This is merely an illustrative example assuming that channel knowledge is available at the devices. However, this assumption can be further relaxed by using approximate message passing (AMP) algorithms as in [31].
To this end, Problem 2 is explicitly resolved. In Sect. 4.2, we prove conditions under which the in-block support $S_{I,\ell}$ can be accurately reconstructed at the devices in cluster $\ell$. Thus, the activation pattern $x_\ell$ of the $\ell$-th cluster can be precisely reconstructed and detected by the active devices. Thereafter, each active device is able to learn the active ranking in its cluster and accesses the corresponding resource blocks assigned by the BS.
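The device-side procedure described above (discard the collided rows, then run a greedy recovery for exactly $|S_{I,\ell}|$ iterations) can be sketched as follows. This is not the paper's Algorithm 1 but a generic OMP with a fixed iteration count; the function names, the toy measurement matrix built from a Hadamard matrix, and the injected interference value are assumptions made so that the example is fully deterministic.

```python
import numpy as np


def omp_fixed_k(Phi, y, k):
    """Run exactly k OMP iterations and return the selected column indices."""
    norms = np.linalg.norm(Phi, axis=0)
    support, residual = [], y.astype(float)
    for _ in range(k):
        corr = np.abs(Phi.T @ residual) / norms
        if support:
            corr[support] = -np.inf             # never pick a column twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef   # orthogonalized residual
    return sorted(support)


def in_block_support(A_block, y_D, D, Q, k):
    """Recover the in-block support from the rows in D that are not in Q."""
    U = sorted(set(D) - set(Q))                 # U = D \ Q, collided rows discarded
    return omp_fixed_k(A_block[U, :], y_D[U], k)


# Deterministic toy instance: columns of an 8 x 8 Hadamard matrix, d = 6 devices.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
A_block = np.kron(np.kron(H2, H2), H2)[:, :6]
x_l = np.array([0, 1, 0, 0, 1, 0])              # devices 2 and 5 (1-based) are active
y_D = A_block @ x_l
y_D[7] += 5.0                                   # interference on a collided row

support = in_block_support(A_block, y_D, D=range(8), Q={7}, k=2)
```

The returned indices are 0-based. Since the iteration count is fixed to the broadcast cardinality, no residual-based stopping rule is needed, which is the complexity saving mentioned in the text.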

Theoretical performance analysis
This section provides a sufficient condition for the performance guarantees of our proposed scheme. In particular, we arrive at the following theorem. A rigorous proof of Theorem 1 is presented in the following, considering the block support recovery at the BS and the in-block support recovery at the devices in turn. We also show that the scheme achieves a better scaling than classical CS-based approaches, both in terms of communication overhead and computational complexity.

Block support recovery at BS
First, we analyze the recovery guarantee for the individual estimate $\hat{x}_{t,i}$ in (7).

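Lemma 1 Suppose that $x \in \mathbb{B}^N$ is a $(K_B, K_I)$ block-sparse signal over block size $d$, and $A \in \mathbb{R}^{M\times N}$ is randomly drawn from $\mathcal{A}(R,T,L,d,\alpha)$. Given $y \in \mathbb{R}^M$ in (4) and the estimate $\hat{x}_{t,i}$ in (7) for a particular entry $x_i$ from block $\ell \in \{1,\cdots,L\}$ and a given $t \in \{1,\cdots,T\}$, let $\Gamma(\hat{x}_{t,i}) := \{|\hat{x}_{t,i} - \alpha^2 |S_{I,\ell}|| \le 3\gamma\}$, where $\gamma^2$ is the variance of the noise term $\tilde{\epsilon}_{t,\ell}$ in (10). The probability of $\Gamma(\hat{x}_{t,i})$ is lower bounded by

$$P\big(\Gamma(\hat{x}_{t,i})\big) \ge 1 - \frac{K_B - 1}{R}.$$

Proof
According to (10), for a particular estimate $\hat{x}_{t,i}$ of $x_i$ in block $\ell$ with $t \in \{1,\cdots,T\}$, $\Gamma(\hat{x}_{t,i})$ holds w.h.p. if the corresponding interference term $\Delta_{t,\ell} = 0$, since the noise term $\tilde{\epsilon}_{t,\ell}$ is randomly drawn from a Gaussian ensemble with zero mean and variance $\gamma^2$ [32]. A sufficient (but not necessary) condition for $\Delta_{t,\ell} = 0$ to hold is that $S_B \cap S_{t,\ell} \setminus \{\ell\} = \emptyset$, where $S_{t,\ell} = \{j \in \{1,\cdots,L\} : q_{t,j} = q_{t,\ell}\}$. This implies that $q_{t,\ell}$ is distinct from $q_{t,l}$ for all $l \in S_B \setminus \{\ell\}$. Therefore, we have

$$P(\Delta_{t,\ell} = 0) \ge P\big(S_B \cap S_{t,\ell} \setminus \{\ell\} = \emptyset\big) = \prod_{l \in S_B \setminus \{\ell\}} P(q_{t,\ell} \neq q_{t,l}) \overset{(a)}{=} \Big(1 - \frac{1}{R}\Big)^{K_B - 1} \overset{(b)}{\ge} 1 - \frac{K_B - 1}{R},$$

where (a) follows since the row indices $q_{t,\ell}$ of the non-zero entries are drawn i.i.d. uniformly at random for each $\ell \in \{1,\cdots,L\}$ and $|S_B| = K_B$, and the inequality in (b) follows from Bernoulli's inequality [33].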
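The collision argument in this proof can be checked numerically. The sketch below compares the exact probability that a given active block shares its row with none of the other $K_B - 1$ active blocks, its Bernoulli lower bound, and a Monte Carlo estimate. The function names and the toy values $R = 16$, $K_B = 4$ are illustrative assumptions.

```python
import numpy as np


def collision_free_exact(R, K_B):
    """P(no other active block picks the same row) = (1 - 1/R)^(K_B - 1)."""
    return (1 - 1 / R) ** (K_B - 1)


def collision_free_bound(R, K_B):
    """Lower bound obtained from Bernoulli's inequality."""
    return 1 - (K_B - 1) / R


def collision_free_simulated(R, K_B, trials, rng):
    """Monte Carlo estimate; column 0 plays the role of the block of interest."""
    q = rng.integers(0, R, size=(trials, K_B))
    return float(np.mean((q[:, 1:] != q[:, :1]).all(axis=1)))


rng = np.random.default_rng(0)
R, K_B = 16, 4
exact = collision_free_exact(R, K_B)      # (15/16)^3
bound = collision_free_bound(R, K_B)      # 1 - 3/16
simulated = collision_free_simulated(R, K_B, trials=20000, rng=rng)
```

For these values the exact probability is about 0.824 and the bound is 0.8125, so the bound is fairly tight whenever $R$ is a moderate multiple of $K_B$.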
Then we look into the recovery guarantee for the block estimate $\hat{x}_\ell$ in (11) obtained via the median operator.

Lemma 2
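Suppose that $x \in \mathbb{B}^N$ is a $(K_B, K_I)$ block-sparse signal over block size $d$, and $A \in \mathbb{R}^{M\times N}$ is randomly drawn from $\mathcal{A}(R,T,L,d,\alpha)$. Given $y \in \mathbb{R}^M$ in (4) and the block estimate $\hat{x}_\ell$ in (11) for a particular block $\ell \in \{1,\cdots,L\}$, let $\Gamma(\hat{x}_\ell) := \{|\hat{x}_\ell - \alpha^2 |S_{I,\ell}|| \le 3\gamma\}$, where $\gamma^2$ is the variance of the noise term $\tilde{\epsilon}_{t,\ell}$ in (10). For suitably chosen $R$ and $T$, the probability of $\Gamma(\hat{x}_\ell)$ is lower bounded by

$$P\big(\Gamma(\hat{x}_\ell)\big) \ge 1 - \frac{\delta}{L},$$

where $0 < \delta < 1$ is an arbitrary target error bound.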

Proof
As in (11), the block estimate $\hat{x}_\ell$ is obtained by taking the median of the individual estimates $\hat{x}_{t,i}$ over $\mathcal{O}(Td)$ samples for a given $\ell \in \{1,\cdots,L\}$. Suppose that at least $Td/2$ of the estimates $\hat{x}_{t,i}$ fulfill the $\Gamma$ condition of Lemma 1; then $\Gamma(\hat{x}_\ell)$ in Lemma 2 follows. In the following, we analyze the probability that $\Gamma(\hat{x}_{t,i})$ holds for at least $Td/2$ individual estimates.
Let $X_1, \cdots, X_{Td}$ be independent (0,1) Bernoulli random variables, where $X_t$, $1 \le t \le Td$, indicates whether the corresponding estimate $\hat{x}_{t,i}$ of $x_i$ satisfies the $\Gamma$ condition of Lemma 1. As proved in Lemma 1, the probability of each $X_t$ being equal to 1 is $p \ge 1 - \frac{K_B-1}{R}$. Then the probability that the number of simultaneous occurrences of the events $\{X_t = 1\}$ exceeds $Td/2$ is given by [32]

$$P\Big(\sum_{t=1}^{Td} X_t > \frac{Td}{2}\Big) = \sum_{k=\lfloor Td/2 \rfloor + 1}^{Td} \binom{Td}{k}\, p^k (1-p)^{Td-k}.$$

A lower bound on this probability can be calculated using Chernoff's inequality [34] to obtain

$$P\Big(\sum_{t=1}^{Td} X_t > \frac{Td}{2}\Big) \ge 1 - \exp\Big(-\frac{Td}{2p}\Big(p - \frac{1}{2}\Big)^2\Big).$$

The bound attains its minimum at $p = 1/2$. By setting a lower threshold $\theta \in (\frac{1}{2}, 1)$ on the probability $p$, we have $p \ge 1 - \frac{K_B-1}{R} \ge \theta$, from which it follows that

$$R \ge \frac{K_B - 1}{1 - \theta}.$$

Given the proof of Lemma 2, the overall performance guarantee for the block support recovery at the BS follows directly.
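The majority argument above can be illustrated numerically. The sketch below evaluates the exact binomial tail and compares it with a Chernoff-type lower bound, assuming the standard multiplicative form $1 - \exp(-\frac{n}{2p}(p-\frac12)^2)$ for $n$ independent trials with success probability $p > 1/2$. The function names and the helper `min_samples`, which inverts the bound for a target failure probability, are illustrative assumptions.

```python
from math import ceil, comb, exp, log


def majority_probability(n, p):
    """Exact P(more than n/2 of n independent Bernoulli(p) trials succeed)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


def chernoff_lower_bound(n, p):
    """Chernoff-type lower bound on the majority probability (valid for p > 1/2)."""
    return 1 - exp(-n * (p - 0.5) ** 2 / (2 * p))


def min_samples(p, eps):
    """Smallest n for which the Chernoff bound guarantees failure probability <= eps."""
    rate = (p - 0.5) ** 2 / (2 * p)
    return ceil(log(1 / eps) / rate)


p = 0.8
exact_15 = majority_probability(15, p)
bound_15 = chernoff_lower_bound(15, p)
n_needed = min_samples(p, 1e-3)
```

The required number of samples grows only logarithmically in $1/\epsilon$, which is the reason a number of sketches $T$ that is logarithmic in the network size suffices for the median estimate.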

Lemma 3
Suppose that $x \in \mathbb{B}^N$ is a $(K_B, K_I)$ block-sparse signal over block size $d$, and $A \in \mathbb{R}^{M\times N}$ is randomly drawn from $\mathcal{A}(R,T,L,d,\alpha)$. Given $y \in \mathbb{R}^M$ in (4) and the block estimate $\hat{x}_\ell$ in (11) for block $\ell \in \{1,\cdots,L\}$, the probability that $\Gamma(\hat{x}_\ell)$ in Lemma 2 holds for all blocks $\ell \in \{1,\cdots,L\}$ is lower bounded by

$$P\Big(\bigcap_{\ell=1}^{L} \Gamma(\hat{x}_\ell)\Big) \ge 1 - \delta,$$

where $0 < \delta < 1$ is an arbitrary target error bound.

Proof
Since the block estimates $x_\ell$ are i.i.d. across blocks, the probability that $\hat{W}(x_\ell)$ in Lemma 2 holds for all $\ell \in \{1, \dots, L\}$ is obtained as where the second inequality follows from Bernoulli's inequality [33] since $\delta/L \ll 1$.
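The role of Bernoulli's inequality can be made explicit with a short worked derivation. As an illustration only, assume that each block satisfies a per-block guarantee of the form $P(\hat{W}(x_\ell)) \ge 1 - \delta/L$. This specific form is an assumption inferred from the appeal to Bernoulli's inequality, not a quotation of Lemma 2. Independence across blocks and $(1 + u)^L \ge 1 + Lu$ for $u \ge -1$ then give:

```latex
\begin{align*}
P\Big(\bigcap_{\ell=1}^{L} \hat{W}(x_\ell)\Big)
  &= \prod_{\ell=1}^{L} P\big(\hat{W}(x_\ell)\big)
   \;\ge\; \Big(1 - \frac{\delta}{L}\Big)^{L} \\
  &\ge 1 - L \cdot \frac{\delta}{L} \;=\; 1 - \delta .
\end{align*}
```

For instance, with $L = 100$ blocks and $\delta = 0.1$, the middle term is $0.999^{100} \approx 0.905$, which indeed exceeds $1 - \delta = 0.9$.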

Proof
According to (7), for a particular entry $x_i$ of $x$ from block $\ell \in \{1, \dots, L\}$ and a given $t \in \{1, \dots, T\}$, the calculation of its corresponding estimate $x_{t,i}$ involves only a single multiplication. Thereafter, the block-wise estimate $x_\ell$ in (11) is obtained as the median of $x_{t,i}$ over $O(Td)$ samples. Finding the median of an unsorted array with $n$ elements takes $O(n)$ operations with the median-of-medians algorithm [35]. Moreover, since the submatrix $A_{t,\ell}$ in (6) has identical non-zero entries within each row, the cost of computing the block-wise estimate $x_\ell$ can be further reduced to $T$ multiplications and $O(T)$ operations to find the median. Over all $L = N/d$ blocks and with $T = O(\log N)$, this results in a computational complexity of $O(\frac{N}{d} \log N)$.
Footnote 4: An alternative way of proving this bound is to consider that $p \ge (1 - \frac{1}{R})^{K_B - 1} \ge \theta$, which follows from the derivation in (19). In this case, we obtain again R ≥
Remark If the cluster size $d$ scales linearly with the network size $N$, i.e., $d = O(N)$, the term $\frac{N}{d}$ reduces to a constant. Thus, the algorithm achieves a sublinear complexity of $O(\log N)$, which scales significantly better than conventional approaches.
In short, the above analysis shows that by choosing $R$ and $T$ large enough, i.e., $R = O(K_B)$ and $T = O(\log N)$ (for a total of $M = O(K_B \log N)$ measurements), reliable block support recovery at the BS can be guaranteed with high probability, and the computational complexity is $O(\frac{N}{d} \log N)$.
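The interplay of $R$ (to keep collisions rare) and $T$ (to let the median outvote them) can be illustrated with a generic hash-and-median sketch. This is a simplified toy, not the matrix ensemble $A(R, T, L, d, \alpha)$ of the paper: it assumes $\alpha = 1$, a noise-free channel, and a decoder that knows which bucket each block maps to; the function name and parameters are hypothetical.

```python
import random
from statistics import median

def sketch_block_counts(counts, R, T, seed=0):
    """Estimate per-block active counts from T hashed repetitions of R buckets."""
    rng = random.Random(seed)
    L = len(counts)
    # hashes[t][l] is the bucket that block l falls into in repetition t.
    hashes = [[rng.randrange(R) for _ in range(L)] for _ in range(T)]
    # Each measurement is the total active count of all blocks sharing a bucket.
    y = [[0] * R for _ in range(T)]
    for t in range(T):
        for l, c in enumerate(counts):
            y[t][hashes[t][l]] += c
    # The estimate for block l is the median of its T bucket readings;
    # a reading is off only if another active block collided with it.
    return [median(y[t][hashes[t][l]] for t in range(T)) for l in range(L)]

if __name__ == "__main__":
    L, K_B, K_I = 100, 4, 5
    counts = [0] * L
    for l in random.Random(1).sample(range(L), K_B):
        counts[l] = K_I  # K_I active devices in each active block
    est = sketch_block_counts(counts, R=16 * K_B, T=21)
    print("detected active blocks:", [l for l, e in enumerate(est) if e > 0])
```

Each block costs $T$ bucket look-ups plus one median over $T$ values, which mirrors the $O(\frac{N}{d} \log N)$ total stated above when $T = O(\log N)$.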

Proof
As shown in (14), the effective measurements that can be used for the in-block support recovery of $x_\ell$ for a given $\ell \in \{1, \dots, L\}$ comprise only the entries of $y_D$ indexed by $D_\ell \setminus Q_\ell$, where $D_\ell$ is the index set of rows with non-zero components in $A_{-,\ell}$, and $Q_\ell$ is the set of collided measurements fed back by the BS. By Lemma 1 and the fact that the individual estimates are independent, the overall number of effective measurements $T_I$ for the in-block support recovery of an active cluster $\ell \in S_B$ can be estimated as
To elaborate on the in-block support recovery at a device using Algorithm 1, we assume for simplicity that $\tilde{A}_{D,\ell}$ in (14) is real-valued and the system is noise-free. The scheme can, however, be extended to complex-valued and noisy settings as in [36]. Since $\tilde{A}_{D,\ell}$ has independent columns and row-blocks, it follows the column-independent model [37, p. 49] for sub-Gaussian matrices. Herein $\tilde{A}_{D,\ell}$ can be decomposed into the product of its column-normalized matrix and a diagonal matrix $Q$ whose diagonal entries are the original norms of the corresponding columns, so that $y_{D,\ell}$ equals the column-normalized matrix applied to the rescaled signal $Q x_\ell$. Furthermore, [38] provides a sufficient condition on the measurement matrix for uniform and robust sparse signal recovery, namely the well-known restricted isometry property (RIP). According to [37], random matrices with i.i.d. sub-Gaussian entries and normalized columns have optimal RIP; the column-normalized matrix therefore satisfies the RIP condition and is admissible for reliable sparse signal recovery. It has been further investigated in (26)
It is further verified in [40] that the computational complexity of sparse signal recovery via OMP is $O(NK^2)$. Thus, since the signal to be reconstructed for in-block support recovery at a device has dimension $d$ and sparsity level $K_I$, $O(dK_I^2)$ operations are sufficient for decoding with the modified OMP algorithm.
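To make the decoding step concrete, the following is a minimal pure-Python sketch of plain OMP that stops after a known number of iterations, under the assumption that the in-block support cardinality $K_I$ is available as the stopping criterion. It is not a transcription of Algorithm 1; the helper names `solve` and `omp_support` are illustrative, and correlations are normalized by the column norms to mimic the column-normalized matrix discussed above.

```python
import math

def solve(G, b):
    """Solve G z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def omp_support(A, y, k):
    """Return a size-k support estimate for y = A x using OMP."""
    m, d = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(d)]
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    norms = [math.sqrt(dot(c, c)) or 1.0 for c in cols]
    support, residual = [], list(y)
    for _ in range(k):
        # Pick the unused column most correlated with the current residual.
        j = max((j for j in range(d) if j not in support),
                key=lambda j: abs(dot(cols[j], residual)) / norms[j])
        support.append(j)
        # Least-squares fit on the selected columns via the normal equations.
        G = [[dot(cols[a], cols[b]) for b in support] for a in support]
        z = solve(G, [dot(cols[a], y) for a in support])
        residual = [y[i] - sum(z[s] * cols[a][i] for s, a in enumerate(support))
                    for i in range(m)]
    return sorted(support)

if __name__ == "__main__":
    # Toy example with orthogonal +/-1 columns (8x8 Sylvester-Hadamard matrix).
    H = [[1]]
    for _ in range(3):
        H = [r + r for r in H] + [r + [-v for v in r] for r in H]
    x = [0, 1, 0, 0, 1, 0, 1, 0]  # binary block with K_I = 3 active devices
    y = [sum(H[i][j] * x[j] for j in range(8)) for i in range(8)]
    print(omp_support(H, y, 3))
```

Each of the $K_I$ iterations scans $d$ columns and solves a system of size at most $K_I$, so for short signatures the cost is dominated by terms of the order $dK_I^2$ quoted above.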
Remark If the signal is treated as a conventional $K$-sparse vector (where $K = K_B K_I$) as in [11], without exploiting knowledge of the block-sparse structure, a sufficient condition for reliable signal recovery using OMP would be $M = O(K \log N) = O(K_B K_I \log N)$, with a computational complexity of $O(NK^2)$. As $M$ is the length of the unique signature transmitted by an active device, it also indicates the signal acquisition time or the communication overhead. Since $d \ll N$ and $K_I \ll K$, both the communication cost and the computational complexity are significantly reduced by the proposed scheme from a scaling point of view.
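The scaling comparison in this remark can be evaluated for concrete parameters. The sketch below plugs numbers into the order expressions with all hidden constants set to 1 and natural logarithms, both of which are assumptions made purely for illustration, so the outputs indicate trends rather than actual signature lengths or operation counts.

```python
import math

def proposed_signature_length(N, d, K_B, K_I):
    """Order of max{K_B log N, K_B K_I log d}, constants set to 1."""
    return max(K_B * math.log(N), K_B * K_I * math.log(d))

def baseline_signature_length(N, K_B, K_I):
    """Order of K_B K_I log N for OMP on an unstructured K-sparse vector."""
    return K_B * K_I * math.log(N)

def proposed_operations(N, d, K_I):
    """Order of d K_I^2 + (N/d) log N, constants set to 1."""
    return d * K_I**2 + (N / d) * math.log(N)

def baseline_operations(N, K_B, K_I):
    """Order of N K^2 with K = K_B K_I."""
    return N * (K_B * K_I) ** 2

if __name__ == "__main__":
    d, K_B, K_I = 100, 4, 5
    for N in (10**3, 10**4, 10**5, 10**6):
        print(N,
              round(proposed_signature_length(N, d, K_B, K_I), 1),
              round(baseline_signature_length(N, K_B, K_I), 1),
              round(proposed_operations(N, d, K_I)),
              baseline_operations(N, K_B, K_I))
```

With $d$ fixed, the proposed signature length is governed by the $K_B K_I \log d$ term over this range of $N$, whereas the baseline keeps growing with $\log N$.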

Results and discussion
We conduct numerical experiments to verify the performance of the proposed distributed device detection and resource allocation scheme. Mapped into our mathematical model, the target problem is to reconstruct a $K$-sparse binary vector of length $N$ from $M$ distributed measurements obtained via the measurement matrix $A \in \mathbb{R}^{M \times N}$, which is randomly drawn from $\mathcal{A}(R, T, L, d, \alpha)$ introduced in Sect. 3.1. Recall from (28) that the proposed scheme requires $M = O(\max\{K_B \log N, K_B K_I \log d\})$ measurements. We compare the performance of the proposed scheme to conventional CS-based approaches, among which we take the classic greedy algorithm OMP with random Gaussian measurements as the baseline. For the baseline, we assume that the signal to be reconstructed is treated as a conventional $K$-sparse vector and that centralized decoding is performed without exploiting knowledge of the block-sparse structure.
In our simulations, the number of devices $N$ in the network lies within the range $[10^3, 10^6]$, and the devices are partitioned into clusters of equal size $d = 100$. The sparsity level of the signal is set to $K = 20$, with block sparsity $K_B = 4$ and in-block sparsity $K_I = 5$. For each plot we average over 1000 pairs of realizations of the measurement matrix and the block-sparse signal. Figure 3a and b compare the proposed scheme with standard OMP with Gaussian measurements for reliable signal recovery, in terms of both the length of the signatures transmitted by the active devices and the number of operations conducted by the algorithms. The proposed scheme requires a significantly shorter signature and lower computational complexity, especially when the number of devices in the network becomes very large. As the signature length also reflects the signal acquisition time or communication overhead in distributed systems, the proposed scheme leads to a drastically reduced communication cost. Figure 4a depicts the detection success probability as a function of the signature length for the proposed scheme and the baseline, for a network size of $N = 10^4$. The performance is evaluated for the noise-free case. The proposed scheme significantly outperforms standard OMP, requiring fewer measurements to achieve the same detection success probability as the baseline. Figure 4b further extends the evaluation to noisy settings and imperfect channel knowledge. The performance for the noisy case is evaluated by setting the signal-to-noise ratio (SNR) to 5 dB in the simulations.
In addition, since the channel estimation error can be treated as an additional source of distortion that is independent of the noise [41], it is modeled as an additive noise term in the measurement matrix with the same variance as the white Gaussian noise. The proposed scheme achieves a significantly higher detection success probability than the baseline under noisy conditions. The performance gain mainly comes from the reliable detection of the in-block support cardinality of the active clusters by the sketching algorithm, which sets an appropriate stopping criterion for Algorithm 1 and minimizes the occurrence of false alarms in the detection. The proposed scheme is therefore robust in the presence of noise and imperfect channel knowledge.
We also compare the performance of the proposed scheme with two classical random access schemes, namely the LTE RA procedure [5] and the conventional cluster-based approach [42], where a cluster head aggregates messages/requests for the rest of the devices in the cluster and initiates the RA procedure on behalf of the cluster members. We set the number of measurements $M$ to 839 bits in the simulation, the same as the length of the Zadoff-Chu sequence [5] used for the LTE RA procedure, so that the three schemes run with the same signature length. The sparsity level $K = K_B K_I$ is set within the range between 10 and 100. Figure 5a depicts the detection probability of the three schemes as a function of the number of active devices in the network (i.e., the sparsity level), and Fig. 5b plots the average access delay of the three schemes. The proposed scheme significantly outperforms the LTE RA procedure in terms of both higher detection success probability and reduced access delay, thus achieving much better scalability with increasing network size and a more robust detection process. Moreover, when compared with the cluster-based approach, the proposed scheme also achieves better detection performance if the sparsity level is sufficiently large ( ≤ 70). Meanwhile, since the proposed distributed scheme avoids the excessive communication and coordination among the devices and with the infrastructure that the cluster-based approach requires, the signaling overhead is substantially reduced, which leads to significantly decreased access latency.
However, the proposed approach still has some limitations, in particular the requirement of perfect synchronization during the acquisition phase and of a priori knowledge of the channel state at the M2M devices. Extensions that relax these limitations will be investigated in future work.

Conclusion
This work utilizes the framework of CS for detection of the network activation pattern to facilitate distributed resource allocation in large-scale M2M communication networks. The particular block sparsity in the activation pattern of the M2M devices is exploited, which maps the objective into a support recovery problem for a block-sparse signal with additional in-block structure. The detection techniques are mainly based on sketching and greedy algorithms, which have low computational complexity. Furthermore, by applying the distributed ranking-based resource allocation scheme, each active device decides autonomously which resource to use for accessing the channel in a contention-free manner without further coordination, so that excessive control overhead is avoided. It has been verified via theoretical analysis that a $(K_B, K_I)$ block-sparse binary signal $x \in \mathbb{B}^N$ over block size $d$ can be reliably reconstructed using the proposed scheme with $O(\max\{K_B \log N, K_B K_I \log d\})$ measurements and a computational complexity of $O(dK_I^2 + \frac{N}{d} \log N)$, which scales better than conventional CS-based approaches. Furthermore, the simulation results also reveal the strong robustness of the proposed scheme under noisy conditions and with imperfect channel knowledge.

M2M: Machine-to-machine
CS: Compressed sensing
IoT: Internet of Things
BS: Base station
OMP: Orthogonal matching pursuit
AMP: Approximate message passing
RIP: Restricted isometry property
SNR: Signal-to-noise ratio