Problem formulation
Given M time instances and N devices in the network, the original data in the network can be represented by an M×N matrix X, where each row contains the readings of all devices at one time instant and each column contains the readings of one IoT device across the time instants. X_{ij} (1≤i≤M, 1≤j≤N) denotes the reading of device j at time instant t_{i}.
Let A denote a linear map from R^{M×N} to R^{p} and vec denote the linear map that transforms a matrix into a vector by stacking one column on another; we have
$$A(\boldsymbol{X})=\boldsymbol{\Phi} \cdot vec(\boldsymbol{X}) $$
where Φ is a p×MN matrix. Let Φ be a random matrix satisfying the RIP condition [21]. Before deployment, each device is equipped with a pseudorandom number generator. Once the device produces a reading at some time instance, the pseudorandom number generator generates a random vector of length p, using the combination of the current time instance and the device’s ID as the random seed. The elements of this random vector are i.i.d. samples from a Gaussian distribution with mean 0 and variance 1/p. Note that this pseudorandom number generation at each device can be reproduced by the base station using the same generator.
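This seeding scheme can be sketched as follows. It is a minimal illustration only: the exact way of combining the time instance and the device ID into a seed is an assumption, not specified by the paper.

```python
import numpy as np

def random_vector(time_instant, device_id, p):
    """Generate the length-p measurement vector for one reading.

    The seed combines the time instant and the device ID (illustrative
    seeding scheme), so the base station can regenerate the exact same
    vector offline.  Entries are i.i.d. Gaussian with mean 0 and
    variance 1/p, as in the text.
    """
    rng = np.random.default_rng(seed=(time_instant, device_id))
    return rng.normal(loc=0.0, scale=np.sqrt(1.0 / p), size=p)

# The device and the base station produce identical vectors from the
# same (time, ID) pair without ever transmitting the vector itself.
phi_device = random_vector(time_instant=3, device_id=17, p=8)
phi_station = random_vector(time_instant=3, device_id=17, p=8)
assert np.array_equal(phi_device, phi_station)
```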
The dimension p of A is the number of measurements (namely, the random combinations) needed to recover X. Typically, p should be no less than cr(3M+3N−5r) [13], where r is the rank of X and c is a positive constant. Therefore, the problem can be formulated as the following optimization problem:
$$ \underset{\boldsymbol{X} \in R^{M\times N}}{\text{min}}\quad \frac{1}{2} \|A(\boldsymbol{X})-\boldsymbol{b}\|^{2}_{F}+ \mu \|\boldsymbol{X}\|_{*} $$
(3)
where the first term accounts for measurement noise and the second term promotes a low-rank solution.
Remark 1
In IoT networks, devices may produce erroneous readings due to noisy environments or error-prone hardware. Erroneous readings usually occur at sporadic times and locations and thus have little impact on the data sparsity of the network. Thus, outlier/abnormal reading recovery/detection could still work in compressive sensing-based data gathering. However, device measurements of the same event usually have strong inter-correlations and are geographically concentrated in a group of devices in close proximity. Such events may spread over diverse time and space scales and result in dynamic sparsity of the data, which violates the assumption of constant sparsity in compressive sensing and thus leads to poor recovery.
Remark 2
Given N M-dimensional signal vectors generated from N devices within M time instances, a good basis that makes these vectors sparse may not be easy to find. Interestingly, [22] has analyzed different sets of data from two independent device network testbeds. The results indicate that the N×M data matrix may be approximately low rank under the various scenarios under investigation. Therefore, such an N×M temporal-spatial signal gathering problem with diverse-scale event data, which cannot be well addressed by the CS method, could be tackled under the low-rank framework^{Footnote 1}.
Path along compressive collection
In this paper, we provide a generalization of current data gathering methods to temporal-spatial signals with diverse-scale events, during which device readings are compressively collected along the relay paths, e.g., in a chain-type or mesh topology, to the sink.
At each device s_{j}, given the reading produced from s_{j} at time instance t_{1}, s_{j} generates a random vector Φ_{1j} of length p, with time instance t_{1} and its ID s_{j} as the seed, and computes the vector X_{1j}Φ_{1j}. At the next time instance t_{2}, s_{j} generates a random vector Φ_{2j}, computes X_{2j}Φ_{2j}, and adds it to the previous vector X_{1j}Φ_{1j}. At time instance t_{M}, s_{j} computes X_{Mj}Φ_{Mj} and would have the summation \(S_{j}=\sum \limits ^{M}_{i=1} \boldsymbol {X}_{ij}\Phi _{ij}\).
In the network, each device s_{j} continuously updates its vector sum S_{j} till time instance t_{M}. After that, device s_{j} relays the vector S_{j} to the next device s_{i}. Then, s_{i} adds S_{j} to its own vector sum S_{i} and forwards S_{i}+S_{j} to the next device. After the collection along the relay paths, the sink receives \(\sum \limits ^{N}_{j=1}\sum \limits ^{M}_{i=1} \boldsymbol {X}_{ij}\Phi _{ij}=\Phi \cdot vec(\boldsymbol {X})\).
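The collection process above can be sketched end-to-end in NumPy. This is a toy illustration: the sizes, the readings, and the seeding scheme are assumptions for demonstration.

```python
import numpy as np

def measurement_vector(t, device_id, p):
    # Same seeded generator on device and base station (illustrative seeding).
    rng = np.random.default_rng(seed=(t, device_id))
    return rng.normal(0.0, np.sqrt(1.0 / p), size=p)

M, N, p = 5, 4, 12
X = np.arange(M * N, dtype=float).reshape(M, N)   # toy readings X[i, j]

# Each device j accumulates S_j = sum_i X[i, j] * Phi_ij locally ...
S = [sum(X[i, j] * measurement_vector(i, j, p) for i in range(M))
     for j in range(N)]
# ... and the partial sums are added along the relay path, so the sink
# receives a single length-p vector regardless of the path length.
b = sum(S)

# The sink reproduces Phi column-by-column and verifies b = Phi * vec(X),
# where vec stacks the columns of X (Fortran order).
Phi = np.column_stack([measurement_vector(i, j, p)
                       for j in range(N) for i in range(M)])
assert np.allclose(b, Phi @ X.flatten(order="F"))
```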
Remark 3
During data gathering, each node sends out only one vector of fixed length along the collection path, regardless of the distance to the sink (the property of the fixed-length vector will be discussed in Section 4).
Considering event data, recall that each row of the data matrix X (the signal in the network) represents the data acquired at some time instance from all devices, and each column of X represents the data collected from one device at different time instances.
Outlier readings may come from internal errors at error-prone devices, for example, noise or systematic errors, or may be caused by external events due to environmental changes. The former (internal errors) are often sparse in the spatial domain, while the latter (event readings) are usually low rank in the time domain. Each stays sparse in its corresponding domain, but together they may lead to dynamic changes in data sparsity.
Let matrix X be decomposed into two parts, the normal one and the abnormal one: X=X_{n}+X_{s}. We could have:
$$ \begin{aligned} A(\boldsymbol{X})&=A\left(\boldsymbol{X}_{n}+\boldsymbol{X}_{s}\right)\\ &=A\cdot[I,I]\left[\boldsymbol{X}_{n},\boldsymbol{X}_{s}\right]^{T}\\ &=[A,A]\left[\boldsymbol{X}_{n},\boldsymbol{X}_{s}\right]^{T}\\ \end{aligned} $$
(4)
Based on Eq. 1, [A,A] is a new linear map. The formulated problem could be solved in the framework of matrix recovery. That is, given the observation vector b∈R^{p}, the stacked matrix [X_{n},X_{s}]^{T}∈R^{2M×N} could be recovered, from which the original data matrix X^{∗}=X_{n}+X_{s} is obtained.
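The augmented map can be checked numerically. The sketch below uses toy sizes and illustrative data; it only verifies the identity A(X_n + X_s) = [A, A]·[vec(X_n); vec(X_s)], not the recovery itself.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, p = 4, 3, 10
Phi = rng.normal(0.0, np.sqrt(1.0 / p), size=(p, M * N))
vec = lambda Z: Z.flatten(order="F")   # stack columns

Xn = rng.normal(size=(M, N))           # "normal" component
Xs = np.zeros((M, N))
Xs[2, 1] = 5.0                         # sparse outlier component

# A(Xn + Xs) equals the augmented map [A, A] applied to the stacked
# unknown [vec(Xn); vec(Xs)], so recovery can run over the doubled space.
lhs = Phi @ vec(Xn + Xs)
rhs = np.hstack([Phi, Phi]) @ np.concatenate([vec(Xn), vec(Xs)])
assert np.allclose(lhs, rhs)
```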
A basic design of data recovery
This section generalizes the data recovery method from compressive sensing to the realm of matrix recovery. The advantages of such an extension are twofold: (1) it exploits the data correlation in both time and space domains, and (2) diverse-scale event data, which would undermine the CS method due to sparsity changes, can be tackled with the proposed method.
According to Eqs. 3 and 4, the general form of the problem could be expressed with the following minimization problem:
$$ \underset{\boldsymbol{X} \in R^{M\times N}}{\text{min}}\quad \frac{1}{2} \|A(\boldsymbol{X})-\boldsymbol{b}\|^{2}_{F}+ \mu \|\boldsymbol{X}\|_{*} $$
(5)
where A(X)=Φ·T(X), with T(·) denoting the transformation of a matrix into a vector by stacking one column on another (i.e., T(·)=vec(·)), and Φ is a p×MN random matrix.
Note that Eq. 3 is the Lasso form of Eq. 2. Under relaxed conditions, its solution is the solution of Eq. 2 [23]. Therefore, we consider Eq. 3 (Eqs. 3 and 5 are essentially the same) instead of the original problem in Eq. 2.
This problem could be further transformed into the following form:
$$ \underset{\boldsymbol{X} \in R^{M\times N}}{\text{min}}\quad F(\boldsymbol{X})\triangleq f(\boldsymbol{X}) +P(\boldsymbol{X}) $$
(6)
where \(f(\boldsymbol{X})=\frac {1}{2} \|A(\boldsymbol {X})-\boldsymbol{b}\|^{2}_{F}\) and \(P(\boldsymbol{X})=\mu \|\boldsymbol{X}\|_{*}\).
Note that both parts are convex, but only the first part is differentiable, while the second part may not be. Then, we have
$$\nabla f(\boldsymbol{X})= A^{*}(A(\boldsymbol{X})-\boldsymbol{b}) $$
where A^{∗} is the dual operator of A.
Since A^{∗}(X)=Φ^{T}X, we have
$$\nabla f(\boldsymbol{X})= A^{*}(A(\boldsymbol{X})-\boldsymbol{b})=\Phi^{T}(\Phi\cdot T(\boldsymbol{X})-\boldsymbol{b}) $$
Because ∇f is an affine function of X, it is Lipschitz continuous. Then, there exists a positive constant L_{f} satisfying the following inequality:
$$\|\nabla f(\boldsymbol{X})-\nabla f(\boldsymbol{Y})\|_{F}\leq L_{f} \|\boldsymbol{X}-\boldsymbol{Y}\|_{F}\quad\forall \boldsymbol{X},\boldsymbol{Y}\in R^{M\times N} $$
Lemma 1
A rough estimate of L_{f} is
$$\sqrt{MN\cdot \underset{i}{\text{max}}\: \left\{\left\|\left(\Phi^{T}\Phi\right)_{i}\right\|^{2}_{2}\right\}}, $$
where \(\left (\Phi ^{T}\Phi \right)_{i}\) is the ith column of the matrix Φ^{T}Φ.
Proof
\(\|\nabla f(\boldsymbol {X})-\nabla f(\boldsymbol {Y})\|_{F}^{2}=\|\Phi ^{T}\Phi\, T(\boldsymbol {X}-\boldsymbol {Y})\|^{2}_{2}\)
Set \(\Phi ^{T}\Phi = \left (\begin {array}{ccc} a_{11} & \cdots & a_{1,MN} \\ \vdots & & \vdots \\ a_{MN,1} &\cdots &a_{MN,MN} \end {array}\right)\) with rows \(a_{j}\),
\(T(\boldsymbol{X}-\boldsymbol{Y})=\left (\begin {array}{c} x_{1} \\ \vdots \\ x_{MN} \end {array}\right)\),
and \(h= \underset {i}{\text {max}}\; \left \{\left\|\left (\Phi ^{T}\Phi \right)_{i}\right\|^{2}_{2}\right\}\); then, by the Cauchy–Schwarz inequality,
$$\begin{aligned} \|\Phi^{T} \Phi\, T(\boldsymbol{X}-\boldsymbol{Y})\|^{2}_{2}&=\sum\limits^{MN}_{j=1}\left(\sum\limits^{MN}_{i=1} a_{ji}x_{i}\right)^{2}\\ &\leq \sum\limits^{MN}_{j=1}\|a_{j}\|^{2}_{2}\,\|T(\boldsymbol{X}-\boldsymbol{Y})\|^{2}_{2}\\ &=\|\Phi^{T}\Phi\|^{2}_{F}\,\|\boldsymbol{X}-\boldsymbol{Y}\|^{2}_{F}\\ &\leq MN\cdot h\cdot \|\boldsymbol{X}-\boldsymbol{Y}\|^{2}_{F} \end{aligned} $$
Thus, \(L_{f}\leq \sqrt {MNh}\). □
Remark 4
A much smaller L_{f} can often be found in real scenarios, which helps the algorithm converge quickly. The experimental results of this paper show that L_{f} can be much smaller than the rough estimate above when the matrix is sampled from a Gaussian distribution.
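This remark can be checked numerically: for a Gaussian Φ, the spectral norm ‖Φ^{T}Φ‖₂ (the smallest valid Lipschitz constant of ∇f) is typically well below the rough estimate of Lemma 1. The sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, p = 10, 8, 60
Phi = rng.normal(0.0, np.sqrt(1.0 / p), size=(p, M * N))
G = Phi.T @ Phi

# The smallest valid Lipschitz constant of grad f is the spectral norm.
Lf_true = np.linalg.norm(G, ord=2)

# Lemma 1's rough estimate: sqrt(MN * max_i ||(Phi^T Phi)_i||_2^2).
h = np.max(np.sum(G ** 2, axis=0))
Lf_rough = np.sqrt(M * N * h)

assert Lf_true <= Lf_rough   # valid upper bound, typically quite loose
```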
Consider the following quadratic approximation of F(·) of Eq. 6 at Y:
$$ \begin{aligned} Q_{\tau}(X,Y)&\triangleq f(Y)+\langle\nabla f(Y),X-Y\rangle\\ &\quad+\frac{\tau}{2}\|X-Y\|^{2}_{F} +P(X)\\ &= \frac{\tau}{2}\|X-G\|^{2}_{F}+P(X)+f(Y)\\ &\quad-\frac{1}{2\tau}\|\nabla f(Y)\|^{2}_{F}\\ \end{aligned} $$
(7)
where τ>0 is a given parameter and G=Y−τ^{−1}∇f(Y).
Since the above function of X is strongly convex, it has a unique global minimizer.
Consider the minimization problem
$$ \underset{X\in R^{M\times N}}{\text{min}}\quad \frac{\tau}{2}\|X-G\|^{2}_{F}+\mu\|X\|_{*} $$
(8)
where G∈R^{M×N}. Note that if G=Y−τ^{−1}A^{∗}(A(Y)−b), then the above minimization problem is a special case of Eq. 7 with \(f(X)=\frac {1}{2}\|A(X)-b\|^{2}_{2}\) and P(X)=μ‖X‖_{∗} when we ignore the constant terms.
Let S_{τ}(G) denote the minimizer of Eq. 8. According to [24], we further have
$$S_{\tau}(G)=U\cdot diag((\delta-\mu/\tau)_{+})\cdot V^{T} $$
given the SVD decomposition of G=Y−τ^{−1}A^{∗}(A(Y)−b)=U·diag(δ)·V^{T}. Here, for a given vector x∈R^{p}, we let x_{+}=max{x,0} where the maximum is taken componentwise.
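The singular value shrinkage operator S_τ(G) above can be sketched in NumPy. This is a minimal illustration; the matrix sizes and parameter values are arbitrary.

```python
import numpy as np

def svt(G, mu, tau):
    """Minimizer S_tau(G) of (tau/2)||X - G||_F^2 + mu ||X||_*:
    shrink the singular values of G by mu/tau and rebuild the matrix."""
    U, delta, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ np.diag(np.maximum(delta - mu / tau, 0.0)) @ Vt

# Quick check: the singular values of the result are (delta - mu/tau)_+.
rng = np.random.default_rng(2)
G = rng.normal(size=(5, 4))
X = svt(G, mu=1.0, tau=2.0)
expect = np.maximum(np.linalg.svd(G, compute_uv=False) - 0.5, 0.0)
assert np.allclose(np.linalg.svd(X, compute_uv=False), expect)
```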
Based on the accelerated proximal gradient (APG) design given in [13, 24], we set t_{0}=t_{1}=1 and τ_{k}=L_{f}, and denote by {X_{k}},{Y_{k}},{t_{k}} the sequences generated by APG. For k=1,2,3,⋯, we have

Step 1: Set \(Y_{k}=X_{k}+\frac {t_{k-1}-1}{t_{k}}\left (X_{k}-X_{k-1}\right)\)

Step 2: Set G_{k}=Y_{k}−(τ_{k})^{−1}A^{∗}(A(Y_{k})−b). Compute \(S_{\tau _{k}}(G_{k})\) from the SVD of G_{k}

Step 3: Set \(X_{k+1}=S_{\tau _{k}}(G_{k})\)

Step 4: Set \(t_{k+1}=\frac {1+\sqrt {1+4(t_{k})^{2}}}{2}\)
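The four steps above can be sketched as a minimal NumPy routine. The problem sizes, μ, and iteration count below are illustrative assumptions, and τ_k is fixed to L_f as in the text.

```python
import numpy as np

def apg(Phi, b, M, N, mu, iters=300):
    """Accelerated proximal gradient for
        min 0.5 * ||Phi vec(X) - b||_2^2 + mu * ||X||_*
    following Steps 1-4 above, with tau_k fixed to Lf = ||Phi^T Phi||_2."""
    vec = lambda Z: Z.flatten(order="F")          # stack columns
    mat = lambda z: z.reshape(M, N, order="F")    # inverse of vec
    Lf = np.linalg.norm(Phi.T @ Phi, ord=2)       # Lipschitz constant of grad f

    X_prev = X = np.zeros((M, N))
    t_prev = t = 1.0
    for _ in range(iters):
        # Step 1: extrapolation point Y_k
        Y = X + ((t_prev - 1.0) / t) * (X - X_prev)
        # Step 2: gradient step G_k = Y_k - tau^{-1} A*(A(Y_k) - b)
        G = Y - mat(Phi.T @ (Phi @ vec(Y) - b)) / Lf
        # Step 3: singular value shrinkage X_{k+1} = S_tau(G_k)
        U, d, Vt = np.linalg.svd(G, full_matrices=False)
        X_prev, X = X, U @ np.diag(np.maximum(d - mu / Lf, 0.0)) @ Vt
        # Step 4: momentum update t_{k+1}
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return X

# Toy run: p = 15 random combinations of a rank-1 5x4 matrix
# (sizes, mu, and iteration count are illustrative choices).
rng = np.random.default_rng(3)
M, N, p = 5, 4, 15
X_true = np.outer(rng.normal(size=M), rng.normal(size=N))
Phi = rng.normal(0.0, np.sqrt(1.0 / p), size=(p, M * N))
b = Phi @ X_true.flatten(order="F")
X_hat = apg(Phi, b, M, N, mu=1e-4, iters=500)
```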
Lemma 2
For any μ>0, the optimal solution X^{∗} of Eq. 3 is bounded according to [13, 24], and ‖X^{∗}‖_{F}<χ, where
$$ \chi = \left\{ \begin{array}{ll} \text{min}\left\{\|b\|^{2}_{2}/(2\mu), \|X_{LS}\|_{*}\right\} & \text{if}~A~\text{is surjective}\\ \|b\|^{2}_{2}/(2\mu) & \text{otherwise} \end{array} \right. $$
(9)
with X_{LS}=A^{∗}(AA^{∗})^{−1}b.
Based on this lemma, we can reach a deterministic estimate of the convergence speed of data recovery.
Let {X_{k}},{Y_{k}},{t_{k}} be the sequences generated by APG. Then, for any k≥1, we have
$$F(X_{k})-F(X^{*})\leq \frac{2L_{f}\|X^{*}-X_{0}\|^{2}_{F}}{(k+1)^{2}} $$
Thus,
$$F(X_{k})-F(X^{*})\leq \varepsilon \quad \text{if}\quad k\geq \sqrt{\frac{2L_{f}}{\varepsilon}}(\|X_{0}\|_{F}+\chi)-1. $$
Let δ(X) denote dist(0,∂(f(X)+μ‖X‖_{∗})), which measures how close X is to an optimal solution and thus reflects the convergence of data recovery. It is easy to see that the process naturally stops when δ(X) is small enough.
Since ‖X‖_{∗} is not differentiable, it may not be easy to compute δ(X) directly. However, a good upper bound for δ(X) is provided by APG designs [24].
Given
$$\begin{aligned} \tau_{k}(G_{k}-X_{k+1})&=\tau_{k}(Y_{k}-X_{k+1})-\nabla f(Y_{k})\\ &=\tau_{k} (Y_{k}-X_{k+1})\\ &\quad-\Phi^{T}(\Phi\cdot vec(Y_{k})-b) \end{aligned} $$
Note that, by the optimality of X_{k+1} in Eq. 8,
$$\tau_{k}(G_{k}-X_{k+1}) \in \partial (\mu \|X_{k+1}\|_{*}) $$
let
$$\begin{aligned} S_{k+1}&\triangleq \tau_{k}(Y_{k}-X_{k+1})+\nabla f(X_{k+1})-\nabla f(Y_{k})\\ &= \tau_{k}(Y_{k}-X_{k+1}) +A^{*}(A(X_{k+1})-A(Y_{k}))\\ &=\tau_{k}(Y_{k}-X_{k+1})+\Phi^{T}(\Phi \cdot T(X_{k+1}-Y_{k})) \end{aligned} $$
we could have
$$S_{k+1} \in \partial (f(X_{k+1})+\mu \|X_{k+1}\|_{*}) $$
Therefore, we have δ(X_{k+1})≤‖S_{k+1}‖_{F}.
According to the derivation above, the stopping condition can be given as follows:
$$ \frac{\|S_{k+1}\|_{F}}{\tau_{k}\, \text{max}\{1,\|X_{k+1}\|_{F}\}}\leq Tol $$
where Tol is a user-defined tolerance, usually a moderately small threshold.
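The stopping test can be sketched as follows. The helper name `stop_check` and its signature are illustrative assumptions, not part of the paper.

```python
import numpy as np

def stop_check(Phi, Y, X_next, tau, tol):
    """Evaluate ||S_{k+1}||_F / (tau * max{1, ||X_{k+1}||_F}) <= Tol
    for one APG step (illustrative helper, tol chosen by the user)."""
    vec = lambda Z: Z.flatten(order="F")
    # S_{k+1} = tau*(Y_k - X_{k+1}) + Phi^T Phi vec(X_{k+1} - Y_k),
    # reshaped back to matrix form.
    S = tau * (Y - X_next) + (Phi.T @ (Phi @ vec(X_next - Y))).reshape(
        Y.shape, order="F")
    ratio = np.linalg.norm(S, "fro") / (tau * max(1.0, np.linalg.norm(X_next, "fro")))
    return ratio <= tol

# At a fixed point (X_{k+1} = Y_k), S vanishes and the criterion is met.
rng = np.random.default_rng(4)
Phi = rng.normal(size=(6, 12))
Y = rng.normal(size=(4, 3))
assert stop_check(Phi, Y, Y, tau=1.0, tol=1e-9)
```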