Quantized Network Coding for correlated sources

In this paper, we present a data gathering technique for sensor networks that exploits correlation between sensor data at different locations in the network. Contrary to distributed source coding, our method does not rely on knowledge of the source correlation model in each node although this knowledge is required at the decoder node. Similar to network coding, our proposed method (which we call Quantized Network Coding) propagates mixtures of packets through the network. The main conceptual difference between our technique and other existing methods is that Quantized Network Coding operates on the field of real numbers and not on a finite field. By exploiting principles borrowed from compressed sensing, we show that the proposed technique can achieve a good approximation of the network data at the sink node with only a few packets received and that this approximation gets progressively better as the number of received packets increases. We explain in the paper the theoretical foundation for the algorithm based on an analysis of the restricted isometry property of the corresponding measurement matrices. Extensive simulations comparing the proposed Quantized Network Coding to classic network coding and packet forwarding scenarios demonstrate its delay/distortion advantage.

including medicine, transportation, and military [1]. As a relatively new technology, the trends and challenges are felt more in the networking aspects of communication than in the classic physical layer [2]. One of these challenges is the gathering of sensed data at a central node of the network, where delivery delay, precision, and robustness to network changes are emerging issues.
As the conventional way of transmission in networks, packet forwarding via routing is widely used in different implementations of sensor networks. While it achieves capacity rates in the case of lossless networks [3], packet forwarding requires an appropriate routing protocol [4] to be run. In the case of correlated sources, distributed source coding [5], [6] on top of packet forwarding has been proved to be optimal in terms of achieved conditional information rates [7]. However, packet forwarding can lead to difficulties because of its need for queuing and its slow adaptation to network changes caused by the deployment of new node(s) or by link failure(s).
These issues, among others, have motivated the invention of network coding [3] as an alternative to packet forwarding in sensor networks [8], [9]. Specifically, in network coding the intermediate nodes send a function of their incoming packets, as opposed to forwarding the original content. Furthermore, the usage of random linear functions, also known as random linear network coding, has been proved to be sufficient in lossless networks [10], [11]. Moreover, theoretical analysis shows that when network coding is used for transmission, no queuing is required to achieve the optimal information rates [3]. Network coding in lossy networks can also result in improved achievable rate regions compared to packet forwarding [12], [13].
Similar to packet forwarding, network coding can be separately applied on top of distributed source coding for correlated sources [14], [15]. On the other hand, one has to perform joint source network decoding in order to achieve optimal performance limits, which may not be feasible because of its computational complexity [15]. Sub-optimal solutions have been proposed to tackle this practicality issue [16]-[18], by using low density codes and the sum-product algorithm [19] for decoding. Similar to distributed source coding, which requires knowledge of appropriate marginal rates at each encoder node, these approaches need some knowledge of the correlation model of the sources at the encoders' side. This knowledge of appropriate rates may be a luxury in some cases, especially when it is changing over time and needs to be updated frequently. Hence, it is essential to study the possibility of developing non-adaptive joint source network coding for such cases. In this paper, we aim to develop a non-adaptive random linear network coding scheme for efficient joint distributed source network coding of correlated sources in sensor networks.
Recently, the idea of using compressed sensing [20], [21] and sparse recovery concepts in sensor networks has drawn attention [22]-[25]. For instance, in [26], [27], a theoretical discussion on sparse recovery of graph constrained measurements, with an interest in network monitoring applications, is presented. Joint source, channel and network coding was also proposed in [28], where random linear mixing was used for compression of temporally and spatially correlated sources. In [29], the practical possibility of finite field network coding of highly correlated sources was investigated, with the aid of low density codes and belief propagation based decoding. However, a solid theoretical investigation on the feasibility of adopting sparse recovery in random linear network coding has not been done previously.
In our earlier work [30], we proposed non-adaptive joint source network coding of exactly sparse sources, with the aid of results from the compressed sensing literature. In this paper, we extend our work to the general case of correlated sources and discuss theoretical and practical aspects of having robust distributed compression.
A detailed description of the data gathering scenario with our notation is presented in section II. In section III, we introduce and formulate our proposed quantized network coding, which is followed by a discussion of its theoretical feasibility using the restricted isometry property in section IV. In section V, we present the decoding algorithm used to recover quantized network coded packets, and derive a performance bound on its recovery error. Our simulation setup and results are presented in section VI. Finally, in section VII, we conclude the paper with a discussion of our proposed method and ongoing work on this topic.

II. PROBLEM DESCRIPTION AND NOTATION
In this paper, we consider a lossless model of networks, for which the links have limited capacities.
Although it may not be a perfect model in practical cases where the links have mutual interference, it still reflects the effect of such imperfections when calculating the single-input single-output capacity of the links. Future work may study the case of noisy networks by accounting for the effect of interference between the links.
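As shown in Fig. 1, we represent the network by a directed graph, $G = (V, E)$, where $V$ and $E$ are the sets of nodes (vertices) and directed edges (links). Each node, $v$, is from the finite sorted set $V = \{1, \ldots, n\}$ and each edge, $e$, is from the finite sorted set $E = \{1, \ldots, |E|\}$. Further, each edge (link) can maintain a lossless transmission from $tail(e)$ to $head(e)$, at a maximum finite rate of $C_e$ bits per use. We define the sets of incoming and outgoing edges of node $v$, denoted by $In(v)$ and $Out(v)$, respectively, as follows:
$$In(v) = \{e : head(e) = v,\ e \in E\}, \tag{1}$$
$$Out(v) = \{e : tail(e) = v,\ e \in E\}. \tag{2}$$
The input and output contents of edge $e$ at time instant $t$ are represented by $Y_e(t)$ and $Y'_e(t)$, where $t$ represents the discrete (integer) time index, during which a block of length $L$ is transmitted. Since the edges are lossless, $Y_e(t)$ and $Y'_e(t)$ are the same and from a finite alphabet of size $2^{\lfloor L C_e \rfloor}$, where $\lfloor \cdot \rfloor$ denotes truncation to the lower integer. In the rest of the paper, the realizations of all capital letter random variables are denoted by lower case letters.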
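The graph notation of this section can be illustrated with a short sketch. The toy network, the dictionary layout, and the helper names `incidence_sets` and `alphabet_size` are assumptions made here for illustration; they do not come from the paper.

```python
import math
from collections import defaultdict


def incidence_sets(edges):
    """Return (In, Out): for each node, the sorted edge ids entering / leaving it.

    `edges` maps an edge id e to the pair (tail(e), head(e)).
    """
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for e in sorted(edges):
        tail, head = edges[e]
        outgoing[tail].append(e)   # e leaves tail(e)
        incoming[head].append(e)   # e enters head(e)
    return dict(incoming), dict(outgoing)


def alphabet_size(block_length, capacity):
    """Number of symbols an edge can carry per block: 2 ** floor(L * C_e)."""
    return 2 ** math.floor(block_length * capacity)


# Hypothetical 3-node, 4-edge network: edge id -> (tail, head)
toy_edges = {1: (1, 2), 2: (2, 3), 3: (1, 3), 4: (3, 1)}
in_sets, out_sets = incidence_sets(toy_edges)
print(in_sets[3], out_sets[1], alphabet_size(20, 1))
```

Keeping the edge ids sorted mirrors the "finite sorted set" convention used for $E$, so that vectors indexed by edges have a fixed ordering.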
The nodes of the network are equipped with sensors; specifically, each node $v$ has an information source, $X_v$, where $X_v \in \mathbb{R}$. The sensed data are assumed to be correlated, which is a valid assumption in many applications. We model the correlation between these sensed data by the near-sparseness property, since it can be considered as a generalization of compressibility and sparseness. Specifically, by defining the sorted vector of $X_v$'s:
$$X = [X_v : v \in V], \tag{3}$$
we assume that $X$ is near-sparse in some orthonormal transform domain $\phi_{n \times n}$. Explicitly, for $S = \phi^T \cdot X$ and a small positive $\epsilon_k$, the deviation of $S$ from a vector $S_k$ is bounded in norm by $\epsilon_k$ (4), where $S_k$ has only $k$ non-zero elements (5) and is called $k$-sparse. An example of the sparsifying transform matrix, $\phi$, is the Karhunen-Loeve transform of the messages.
Having these correlated information sources and the information network characterized, we study the transmission of X v 's to a single gateway node.The gateway or decoder node, denoted by v 0 , v 0 ∈ V, has high computational resources and is usually in charge of forwarding the information to a next level network; e.g. a wired backbone network.The described (single session) incast of sources to the unique decoder node is referred to as data gathering.The purpose of this paper is to discuss the theoretical and practical feasibility of non-adaptive joint source network coding in the described data gathering scenario.
More specifically, we take a compressed sensing approach in order to handle the transmission of sensed data.

III. QUANTIZED NETWORK CODING
Random linear network coding for multicast of independent sources has been proposed and studied in [11], where the algebraic operations are in a finite field. Since our work is motivated by the concepts of compressed sensing, in which the results are valid in the infinite field of real numbers, we have to use a real field alternative to conventional finite field network coding. On the other hand, the finite capacity of the edges has to be reconciled with the infinite number of symbols in the adopted real field network coding. As a result, we propose Quantized Network Coding (QNC), which uses quantization to bridge the limited capacity of the links and the infinite alphabet of real field network coded packets.
In [30], for all $v \in V$ and all $e \in Out(v)$, we defined QNC at node $v$ according to:
$$Y_e(t) = Q_e\Big[\sum_{e' \in In(v)} \beta_{e,e'}(t)\, Y_{e'}(t-1) + \alpha_{e,v}(t)\, X_v\Big], \tag{6}$$
where $Y_e(1) = 0$, $\forall e \in E$, ensures the initial rest condition in the network. The messages, $X_v$'s, are assumed to be constant until the transmission is complete. The local network coding coefficients, $\beta_{e,e'}(t)$'s and $\alpha_{e,v}(t)$'s, are real valued and are usually picked semi-randomly. The quantizer operator, $Q_e(\cdot)$, corresponding to outgoing edge $e$, is designed based on the values of $C_e$ and $L$, and the distribution of its input (i.e. random linear combinations). A simple diagram of QNC at node $v$ is shown in Fig. 2.
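As a minimal sketch of the per-edge operation described above, the following code mixes the previous incoming edge contents with the local message and quantizes the result. The mid-point uniform quantizer, the function names, and the numeric values are assumptions for illustration; the paper only states that the quantizer is designed from $C_e$, $L$ and the input distribution.

```python
import math


def uniform_quantize(value, q_max, levels):
    """Mid-point uniform quantizer on [-q_max, q_max] with `levels` cells (assumed design)."""
    step = 2.0 * q_max / levels
    clipped = min(max(value, -q_max), q_max)
    index = min(int((clipped + q_max) / step), levels - 1)
    return -q_max + (index + 0.5) * step


def qnc_step(x_v, y_in_prev, alpha, beta, q_max, block_length, capacity):
    """One outgoing-edge update: quantize sum(beta * Y_e'(t-1)) + alpha * X_v."""
    levels = 2 ** math.floor(block_length * capacity)
    mix = sum(b * y for b, y in zip(beta, y_in_prev)) + alpha * x_v
    return uniform_quantize(mix, q_max, levels)


# Hypothetical node with two incoming edges that are still at rest (Y_e(1) = 0).
y_out = qnc_step(x_v=0.5, y_in_prev=[0.0, 0.0], alpha=0.5, beta=[0.25, 0.25],
                 q_max=1.0, block_length=3, capacity=1)
print(y_out)  # the mix 0.25 falls in the cell [0.25, 0.5), reproduced at 0.375
```

With 8 levels on $[-1, 1]$ the step is 0.25, so the quantization error in this example equals half a step.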
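Denoting the quantization noise of $Q_e(\cdot)$ at time $t$ by $N_e(t)$, we can reformulate (6) as follows:
$$Y_e(t) = \sum_{e' \in In(v)} \beta_{e,e'}(t)\, Y_{e'}(t-1) + \alpha_{e,v}(t)\, X_v + N_e(t). \tag{7}$$
We define the adjacency matrix, $[F(t)]_{|E| \times |E|}$, and the matrix $[A(t)]_{|E| \times n}$, such that:
$$\{F(t)\}_{e,e'} = \begin{cases} \beta_{e,e'}(t), & tail(e) = head(e'), \\ 0, & \text{otherwise}, \end{cases} \tag{8}$$
$$\{A(t)\}_{e,v} = \begin{cases} \alpha_{e,v}(t), & tail(e) = v, \\ 0, & \text{otherwise}. \end{cases} \tag{9}$$
We also define the vectors of edge contents, $Y(t)$, and quantization noises, $N(t)$, according to:
$$Y(t) = [Y_e(t) : e \in E], \tag{10}$$
$$N(t) = [N_e(t) : e \in E]. \tag{11}$$
As a result, Eq. 7 can be re-written in the following form:
$$Y(t) = F(t)\, Y(t-1) + A(t)\, X + N(t). \tag{12}$$
Depending on the network deployment, the matrix $[B]_{|In(v_0)| \times |E|}$ defines the relation between the content of edges, $Y(t)$, and the received packets at the decoder node. Explicitly, we define the vector of marginal measurements (received packets) at time $t$ at the decoder:
$$Z(t) = [Y_e(t) : e \in In(v_0)] = B\, Y(t), \tag{13}$$
where:
$$\{B\}_{i,e} = \begin{cases} 1, & e \text{ is the } i\text{-th edge of } In(v_0), \\ 0, & \text{otherwise}. \end{cases} \tag{14}$$
By considering (12) as the difference equation characterizing a linear system with $X$ and $N(t)$'s as its inputs and $Z(t)$ as its output, and using the results in [31], $\{Z(t)\}_i$'s are given by:
$$Z(t) = \Psi(t)\, X + N_{eff}(t), \tag{15}$$
where the marginal measurement matrix, $\Psi(t)$, and the marginal effective noise vector, $N_{eff}(t)$, are calculated as follows:
$$\Psi(t) = B \sum_{t'=2}^{t} \Big[\prod_{t''=t'+1}^{t} F(t'')\Big] A(t'), \tag{16}$$
$$N_{eff}(t) = B \sum_{t'=2}^{t} \Big[\prod_{t''=t'+1}^{t} F(t'')\Big] N(t'). \tag{17}$$
In Eqs. 16, 17, the matrix multiplication is defined as:
$$\prod_{t''=t'+1}^{t} F(t'') = F(t)\, F(t-1) \cdots F(t'+1), \tag{18}$$
with the empty product ($t' = t$) taken as the identity matrix. By storing $Z(t)$'s at the decoder, we build up the total measurements vector, $Z_{tot}(t)$, as follows:
$$Z_{tot}(t) = \begin{bmatrix} Z(2) \\ \vdots \\ Z(t) \end{bmatrix}_{m \times 1}, \tag{19}$$
where $m = (t-1)|In(v_0)|$. Therefore, the following can be established:
$$Z_{tot}(t) = \Psi_{tot}(t)\, X + N_{eff,tot}(t), \tag{20}$$
where the $m \times n$ total measurement matrix, $\Psi_{tot}(t)$, and the total effective noise vector, $N_{eff,tot}(t)$, are the concatenation of the marginal measurement matrices, $\Psi(t)$'s, and the marginal effective noise vectors, $N_{eff}(t)$'s. Because of our assumption to start transmission from $t = 1$, $\{Z(1)\}_i$'s are not useful for decoding, and therefore:
$$\Psi_{tot}(t) = \begin{bmatrix} \Psi(2) \\ \vdots \\ \Psi(t) \end{bmatrix}, \tag{21}$$
$$N_{eff,tot}(t) = \begin{bmatrix} N_{eff}(2) \\ \vdots \\ N_{eff}(t) \end{bmatrix}. \tag{22}$$
In conventional linear network coding, the number of total measurements, $m$, is at least equal to the number of data, $n$. More precisely, the total measurement matrix is of full column rank, which allows a unique solution to be found. In this paper, we are interested in investigating the feasibility of robust recovery of $X$ when fewer measurements are received at the decoder than the number of messages; i.e. $m < n$.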
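The unrolling of the linear recursion $Y(t) = F(t)Y(t-1) + A(t)X + N(t)$ into a measurement equation $Z(t) = \Psi(t)X + N_{eff}(t)$ can be checked numerically. The sketch below is an illustration under assumptions: the 3-node network, the uniformly drawn coefficients, their normalization, and the mid-point quantizer are all hypothetical choices, and the matrices $\Psi(t)$ and $N_{eff}(t)$ are accumulated recursively rather than through the explicit sums.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def quantize(value, q_max, levels):
    step = 2.0 * q_max / levels
    clipped = min(max(value, -q_max), q_max)
    idx = min(int((clipped + q_max) / step), levels - 1)
    return -q_max + (idx + 0.5) * step


rng = random.Random(0)
q_max, levels, horizon = 1.0, 256, 6
edges = [(0, 1), (1, 2), (0, 2), (2, 0)]   # hypothetical: edge index -> (tail, head)
n, num_edges, decoder = 3, len(edges), 2
in_decoder = [e for e, (_, head) in enumerate(edges) if head == decoder]
B = [[1.0 if e == e_in else 0.0 for e in range(num_edges)] for e_in in in_decoder]
x = [rng.uniform(-q_max, q_max) for _ in range(n)]


def draw_coefficients():
    """Random F(t), A(t) with each row scaled so that sum|beta| + |alpha| = 1."""
    F = [[0.0] * num_edges for _ in range(num_edges)]
    A = [[0.0] * n for _ in range(num_edges)]
    for e, (tail, _) in enumerate(edges):
        for e_prev, (_, head_prev) in enumerate(edges):
            if head_prev == tail:          # e_prev enters the node that e leaves
                F[e][e_prev] = rng.uniform(-1.0, 1.0)
        A[e][tail] = rng.uniform(-1.0, 1.0)
        scale = sum(abs(c) for c in F[e]) + abs(A[e][tail])
        F[e] = [c / scale for c in F[e]]
        A[e][tail] /= scale
    return F, A


y = [0.0] * num_edges                       # Y(1) = 0
P = [[0.0] * n for _ in range(num_edges)]   # noiseless map, so that Y(t) = P(t) X + M(t)
M = [0.0] * num_edges                       # accumulated quantization noise
max_residual, max_noise = 0.0, 0.0
for t in range(2, horizon + 1):
    F, A = draw_coefficients()
    mix = [f + a for f, a in zip(matvec(F, y), matvec(A, x))]
    y_next = [quantize(v, q_max, levels) for v in mix]
    noise = [yq - v for yq, v in zip(y_next, mix)]          # N(t)
    P = [[fp + a for fp, a in zip(r1, r2)] for r1, r2 in zip(matmul(F, P), A)]
    M = [fm + nz for fm, nz in zip(matvec(F, M), noise)]
    y = y_next
    z = matvec(B, y)                                         # Z(t) = B Y(t)
    psi, n_eff = matmul(B, P), matvec(B, M)                  # Psi(t), N_eff(t)
    predicted = [p + ne for p, ne in zip(matvec(psi, x), n_eff)]
    max_residual = max(max_residual, max(abs(a - b) for a, b in zip(z, predicted)))
    max_noise = max(max_noise, max(abs(v) for v in noise))

print(max_residual, max_noise)
```

Stacking the rows of `psi` and the entries of `n_eff` for $t = 2, \ldots$ would give the total measurement matrix and total effective noise vector; the residual is zero up to floating point rounding because the noise terms are defined as the exact quantization errors.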
Considering the characteristic equation (20), describing the QNC scenario, we can treat it as a compressed sensing measurement equation. This gives us an opportunity to apply the results in the literature of compressed sensing and sparse recovery [20], [32] to our QNC scenario with near-sparse messages. However, one needs to examine the required conditions which guarantee sparse recovery in the proposed QNC scenario. In the following, we discuss the theoretical and practical feasibility of robust recovery from a compressed sensing perspective.

IV. RESTRICTED ISOMETRY PROPERTY
One of the advantages of compressed sensing is that it allows a non-adaptive design for sensing of sparse signals, where the support (location of non-zero elements) is not known at the encoding side. As the price of this non-adaptive characteristic, we may need more measurements than the exact number of non-zero elements. Fortunately, if appropriate types of linear measurements are chosen, we can keep the required number of measurements much less than the number of messages; that is: $m \ll n$.
One of the properties that is widely used to characterize appropriate measurement matrices in the compressed sensing literature is the Restricted Isometry Property (RIP) [33]. Roughly speaking, it provides a measure of norm conservation while the dimensionality is reduced [34]. An $m \times n$ matrix $\Theta_{tot}(t)$ is said to satisfy RIP of order $k$ with constant $\delta_k$, if for all $k$-sparse vectors $s_k \in \mathbb{R}^n$, we have:
$$(1 - \delta_k)\, \|s_k\|_{\ell_2}^2 \le \|\Theta_{tot}(t)\, s_k\|_{\ell_2}^2 \le (1 + \delta_k)\, \|s_k\|_{\ell_2}^2. \tag{23}$$
Remark 4.1: Random matrices with independently and identically distributed (i.i.d.) zero mean Gaussian entries are appropriate measurement matrices for compressed sensing. Explicitly, an $m \times n$ i.i.d. Gaussian random matrix, denoted $G$, with entries of variance $\frac{1}{m}$, satisfies RIP of order $k$ and constant $\delta_k$ with a probability exceeding the bound of (24) (called overwhelming probability) if $m$ satisfies the condition of (25). In (24), (25), $\kappa_1$ and $\kappa_2$ only depend on the value of $\delta_k$ (theorem 5.2 in [35]).
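The norm conservation behind RIP can be probed empirically for an i.i.d. Gaussian matrix with entries of variance $1/m$. The sketch below is only a sanity check on a handful of random sparse vectors, whereas RIP is a statement over all $k$-sparse vectors; the dimensions, the seed, and the helper names are assumptions for illustration.

```python
import math
import random


def gaussian_matrix(m, n, rng):
    """m x n matrix with i.i.d. zero mean Gaussian entries of variance 1/m."""
    sigma = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]


def random_sparse_vector(n, k, rng):
    """k-sparse vector with a random support and uniform non-zero entries."""
    s = [0.0] * n
    for i in rng.sample(range(n), k):
        s[i] = rng.uniform(-0.5, 0.5)
    return s


def energy_ratio(G, s):
    """||G s||^2 / ||s||^2; an RIP constant delta bounds its deviation from 1."""
    support = [i for i, v in enumerate(s) if v != 0.0]
    projected = [sum(row[i] * s[i] for i in support) for row in G]
    return sum(p * p for p in projected) / sum(v * v for v in s)


rng = random.Random(1)
m, n, k = 400, 800, 5          # hypothetical sizes with m < n
G = gaussian_matrix(m, n, rng)
ratios = [energy_ratio(G, random_sparse_vector(n, k, rng)) for _ in range(20)]
delta_hat = max(abs(r - 1.0) for r in ratios)
print(round(min(ratios), 3), round(max(ratios), 3), round(delta_hat, 3))
```

The observed `delta_hat` is only a lower estimate of the true RIP constant, since the maximum is taken over 20 sampled vectors instead of all supports and amplitudes.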
In [30], [36], we proposed a design for the local network coding coefficients, $\beta_{e,e'}(t)$'s and $\alpha_{e,v}(t)$'s, which results in an appropriate total measurement matrix, $\Psi_{tot}(t)$, in the compressed sensing framework.
For such a scenario, the entries of the resulting $\Psi_{tot}(t)$ are zero mean Gaussian random variables.
It is also numerically shown in [36] that a locally orthonormal set of $\beta_{e,e'}(t)$'s is a better choice than non-orthonormal sets; that is, for all $e, e' \in Out(v)$, the coefficient vectors $[\beta_{e,e''}(t) : e'' \in In(v)]$ and $[\beta_{e',e''}(t) : e'' \in In(v)]$ are orthonormal. In [36], we established the relation between the satisfaction of RIP and the tail probability by proving the following theorem.
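Theorem 4.3: (Theorem 4.1 in [36]) Consider $\Psi_{tot}(t)$ with the tail probability, as defined in (28), and an orthonormal transform matrix $\phi$. Then, $\Theta_{tot}(t) = \Psi_{tot}(t) \cdot \phi$ satisfies RIP of order $k$ and constant $\delta_k$ with a probability exceeding the bound of (29).
By using theorem 4.3, we can analyze the behavior of $\Psi_{tot}(t)$, resulting from the proposed local network coding coefficients in theorem 4.2. Specifically, we try to see if we can obtain the same tail probability as a Gaussian ensemble, with the same order of measurements. Unfortunately, the complicated relation of the local network coding coefficients and network parameters with the resulting $\Psi_{tot}(t)$ (see Eqs. 8, 9, …) makes it difficult to derive a simple mathematical form for the tail probability, $p_{tail}(\Psi_{tot}(t), \epsilon)$, and have a nice mathematical conclusion about the required number of measurements.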
In Fig. 3, we present the numerical values of tail probabilities (defined in Eq. 28) for the resulting $\Psi_{tot}(t)$, $p_{tail}(\Psi_{tot}(t), \epsilon)$, using the proposed local network coding coefficients in theorem 4.2. These tail probabilities are compared with those of i.i.d. Gaussian matrices, $G$, versus the number of measurements, $m$, in each case.
Remark 4.4: Our numerical evaluations in Fig. 3 show that for the same value of tail probability, there is a QNC resulting measurement matrix, $\Psi_{tot}(t)$, and an i.i.d. Gaussian matrix, $G$, which have the same order of measurements, $m$. Furthermore, using theorem 4.3, we can say that the resulting $\Psi_{tot}(t)$ has a similar behavior to Gaussian matrices in terms of RIP satisfaction.
In the following section, we use the aforementioned conclusion for the resulting Ψ tot (t) to derive a performance bound for QNC scenario.

V. DECODING USING SPARSE RECOVERY
Sparse recovery for exactly sparse data can be done by using linear programming [37], where the NP-hard $\ell_0$ minimization is replaced by $\ell_1$ minimization. Fortunately, this alteration of the cost function does not affect the optimality in recovery of exactly sparse vectors from noiseless measurements [32], [37].
However, when dealing with noisy measurements, $\ell_1$-min recovery does not necessarily offer an optimal solution. There is still a lot of work being done to develop practical and near-optimal recovery algorithms for noisy cases. In the following, we discuss $\ell_1$-min recovery for the QNC scenario and establish theoretical bounds on its recovery error.
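Motivated by the work in [20], [33], the compressed sensing based decoder for the QNC scenario solves the following convex optimization:
$$\hat{X}_{QNC}(t) = \phi \cdot \arg\min_{S'} \|S'\|_{\ell_1} \quad \text{s.t.} \quad \|Z_{tot}(t) - \Psi_{tot}(t)\, \phi\, S'\|_{\ell_2}^2 \le \epsilon_{rec}^2(t), \tag{30}$$
which can be solved by using linear programming [37]. In the following, we present our results on the recovery error using the $\ell_1$-min decoding of Eq. 30.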
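For the noiseless, exactly sparse case mentioned at the start of this section, the $\ell_1$ minimization can be written as a linear program by introducing auxiliary variables. The following is a standard reformulation given as a sketch; the auxiliary vector $u$ and the equality-constrained form are assumptions made here for illustration and are not the noisy decoder used in the paper.

```latex
\begin{aligned}
\min_{s,\,u \in \mathbb{R}^n} \quad & \sum_{i=1}^{n} u_i \\
\text{s.t.} \quad & -u_i \le s_i \le u_i, \quad i = 1, \ldots, n, \\
& \Theta_{tot}(t)\, s = z_{tot}(t).
\end{aligned}
```

At the optimum $u_i = |s_i|$, so the objective equals $\|s\|_{\ell_1}$, and the messages would be recovered as $\phi \cdot s$.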
Theorem 5.1: Consider a QNC scenario where, for all $v \in V$, the network coding coefficients satisfy
$$\sum_{e' \in In(v)} |\beta_{e,e'}(t)| + |\alpha_{e,v}(t)| \le 1, \quad \forall e \in Out(v). \tag{31}$$
In such a scenario, we assume that the resulting $\Theta_{tot}(t) = \Psi_{tot}(t)\, \phi$ satisfies RIP of order $2k$ with constant $\delta_{2k}$, where $\delta_{2k} < \sqrt{2} - 1$. Moreover, the messages, $X_v$'s, are supposed to be bounded between $-q_{max}$ and $+q_{max}$, and the edge quantizers, $Q_e(\cdot)$'s, are uniform with the step size
$$\Delta_e = \frac{2 q_{max}}{2^{\lfloor L C_e \rfloor}}. \tag{32}$$
Now, for the $\ell_1$-min decoding of (30), where $\epsilon_{rec}^2(t)$ is as defined in Eq. 33 (of the form $\epsilon_{rec}^2(t) = \frac{1}{4} \cdots$) and $\Delta_Q = [\Delta_e : e \in E]$, the $\ell_2$ norm of the recovery error satisfies the bound of (34). In the inequality of (34), $c_1$ and $c_2$ are constants, defined in (35).
Proof: Since the network is lossless, the network coding coefficients satisfy the condition of Eq. 31, and $|X_v| \le q_{max}$, $\forall v$, the only associated measurement noise results from the quantization noise at the edges. Moreover, for each $e \in E$, we have:
$$|N_e(t)| \le \frac{\Delta_e}{2},$$
since the quantizers are uniform. Equivalently, the absolute value vector of $N(t)$, represented by $|N(t)|$, is such that, element by element:
$$|N(t)| \le \frac{1}{2} \Delta_Q.$$
Since $B$ is a one-to-one mapping matrix with positive entries, the effective noise vector, $N_{eff}(t)$, can be upper bounded as in (38). Therefore, using (38), we obtain the inequality of (43) on the $N_{eff}(t')$'s. Finally, by using the inequality of (43), we can show that:
$$\|N_{eff,tot}(t)\|_{\ell_2}^2 \le \epsilon_{rec}^2(t),$$
such that $\epsilon_{rec}(t)$ is as defined in Eq. 33. Now, by applying theorem 4.2 in [21], and using the definition of $\epsilon_k$ in (4), we can finish the proof of our theorem.
According to the preceding theorem, the upper bound term, $c_1 \epsilon_{rec}$, decreases when the quantization steps, $\Delta_e$'s, are decreased. Since $\Delta_e = 2 q_{max} / 2^{\lfloor L C_e \rfloor}$, a smaller upper bound on the $\ell_2$ norm of the recovery error can be enforced by increasing the block length, $L$. Although this can be done practically, it will simultaneously increase the point-to-point transmission delays in the network, which may not be desirable. This introduces a trade-off on the choice of block length: one has to find its appropriate value for a specific quality of service (i.e. recovery error).
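The trade-off described above can be made concrete with a small calculation. The sketch uses the step size $2 q_{max} / 2^{\lfloor L C_e \rfloor}$ from the text and the delivery delay $(t-1)L$ used in the simulation section; the function names and the sample values of $L$ and $t$ are assumptions for illustration.

```python
import math


def step_size(q_max, block_length, capacity):
    """Uniform quantizer step: 2 * q_max / 2 ** floor(L * C_e)."""
    return 2.0 * q_max / 2 ** math.floor(block_length * capacity)


def delivery_delay(t, block_length):
    """Delay, in channel uses, of a transmission that terminates at time index t."""
    return (t - 1) * block_length


# q_max = 10 and C_e = 1 as in the simulation setup; L and t are sample values.
for L in (5, 10, 20):
    print(L, step_size(10.0, L, 1.0), delivery_delay(4, L))
```

Doubling $L$ squares the number of quantization cells, so the step shrinks exponentially in $L$ while the delay only grows linearly, which is why an intermediate block length can be the best choice for a target recovery error.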
Based on theorem 5.1, if the resulting $\Psi_{tot}(t)$ satisfies RIP of appropriate order with a high probability, then robust recovery can be guaranteed. On the other hand, using remarks 4.1 and 4.4, we can say that the resulting $\Psi_{tot}(t)$ satisfies RIP with a high probability, while the number of measurements, $m$, has a smaller order than the number of messages, $n$. Therefore, putting these numerical and theoretical results together, QNC can result in bounded error recovery (34) with a smaller order of measurements (received packets at the decoder) than that of messages. This saving in the required number of received packets can be interpreted as an embedded distributed compression, achieved by quantized network coding at the nodes.

VI. SIMULATION RESULTS
In this section, we evaluate the performance of quantized network coding by using different numerical simulations. We are interested in finding the compression gains resulting from QNC, by obtaining the delay-distortion curves in different scenarios.
Although we were able to derive mathematical performance bounds for the QNC scenario, they are not comprehensive and do not offer any guarantee on statistical performance measures, e.g. mean squared error. Deriving such statistical performance bounds requires considerably more theoretical work on sparse recovery; in the meantime, we can only rely on numerical evaluations.
We begin our numerical evaluations by comparing the delay-quality performance of QNC and conventional routing based packet forwarding for lossy transmission of a set of correlated sources (messages). To set up the simulations, we generate random deployments of directed networks with uniformly distributed edges (making sure that no pair of nodes is assigned two edges).
The edges can maintain a lossless communication of one bit per use, meaning C e = 1, for all e ∈ E.
One of the nodes is randomly picked to be the gateway node, $v_0$, in which the messages are decoded. To generate a realization of messages, $x$, we first generate a $k$-sparse random vector, $s_k$, whose non-zero components are uniformly distributed between $-\frac{1}{2}$ and $+\frac{1}{2}$. A near-sparse vector, $s$, is obtained by adding a zero mean uniform noise, such that $\|s - s_k\|_{\ell_2}$ is bounded by $\epsilon_k$. This is followed by generation of an orthonormal random matrix, $\phi$, calculating the random messages, $x = \phi \cdot s$, and normalizing the range of $x_v$'s between $-q_{max}$ and $+q_{max}$, where $q_{max} = 10$. Different values of the sparsity factor, $\frac{k}{n}$, and of $\epsilon_k$ are used in our simulations. A summary of the simulation parameters is presented in Table I.
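The message generation steps above can be sketched as follows. Several details are assumptions made for illustration and are not specified in the text: the orthonormal matrix is built by Gram-Schmidt on Gaussian vectors, the noise is scaled so that its norm equals $\epsilon_k$, and the range normalization divides by the peak absolute value.

```python
import math
import random


def random_orthonormal(n, rng):
    """Columns of a random orthonormal matrix (modified Gram-Schmidt, assumed construction)."""
    cols = []
    while len(cols) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for c in cols:
            dot = sum(a * b for a, b in zip(v, c))
            v = [a - dot * b for a, b in zip(v, c)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:
            cols.append([a / norm for a in v])
    return cols


def generate_messages(n, k, eps_k, q_max, rng):
    """Return (x, s, s_k, cols): near-sparse s, messages x = phi s scaled to [-q_max, q_max]."""
    s_k = [0.0] * n
    for i in rng.sample(range(n), k):
        s_k[i] = rng.uniform(-0.5, 0.5)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]       # zero mean uniform noise
    scale = eps_k / math.sqrt(sum(v * v for v in noise))      # so that ||s - s_k|| = eps_k
    s = [a + scale * b for a, b in zip(s_k, noise)]
    cols = random_orthonormal(n, rng)                         # cols[j] is column j of phi
    x = [sum(cols[j][i] * s[j] for j in range(n)) for i in range(n)]
    peak = max(abs(v) for v in x)
    x = [q_max * v / peak for v in x]                         # assumed range normalization
    return x, s, s_k, cols


rng = random.Random(7)
x, s, s_k, cols = generate_messages(n=20, k=3, eps_k=0.01, q_max=10.0, rng=rng)
deviation = math.sqrt(sum((a - b) ** 2 for a, b in zip(s, s_k)))
gram_error = max(abs(sum(a * b for a, b in zip(cols[i], cols[j])) - (1.0 if i == j else 0.0))
                 for i in range(20) for j in range(20))
print(round(deviation, 6), max(abs(v) for v in x), gram_error < 1e-9)
```

Because the final scaling is applied to $x$ only, the sparse representation of the scaled messages is the same multiple of $s$, so the relative near-sparsity is unchanged.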
For each generated random network deployment, we perform QNC with $\ell_1$-min decoding. Local network coding coefficients, $\alpha_{e,v}(t)$'s and $\beta_{e,e'}(t)$'s, are generated according to the conditions of theorem 4.2.
The degrees of freedom are limited by picking $\beta_{e,e'}(t)$'s such that they are locally orthonormal. The resulting coefficients are then normalized to satisfy the normalization condition of Eq. 31 and prevent overflow in the linear combination of QNC. Edge quantizers, $Q_e(\cdot)$'s, are uniform with a range of $[-q_{max}, +q_{max}]$ and $2^L$ intervals (since $C_e = 1$, $\forall e$). This completes all the required parameters and vectors to simulate quantized network coding and obtain the received packets at the decoder node, $z(t)$'s (lower case notations are used for realizations of random variables). Random $\alpha_{e,v}(2)$'s can be generated in a pseudo-random way and therefore only the generator seed needs to be transmitted to the decoder as a header.
At the decoder, the received measurements up to t, z tot (t), are used to recover the original messages.
Specifically, for a realization of messages, $x$, we define $\hat{x}_{QNC}(t)$ to be the recovered messages, using $\ell_1$-min decoding, according to (30). Moreover, the convex optimization involved in (30) is solved by using the open source implementation of disciplined convex programming [38], [39].
For each deployment, we also simulate routing based packet forwarding and compare it with the results for QNC. To find the routes from each node to the gateway node, we find the shortest path from each node to the gateway node, using the Dijkstra algorithm [40]. Further, the real valued messages, $x_v$'s, are quantized at their corresponding source nodes, by using uniform quantizers similar to those used in QNC transmission. The aim is to deliver all $x_v$'s to the decoder node and keep track of the delivered messages over time, $t$, in the recovered vector of messages, $\hat{x}_{PF}(t)$. Moreover, if a message, $x_v$, is not delivered by time index $t$, zero is used as its recovered value:
$$\{\hat{x}_{PF}(t)\}_v = \begin{cases} Q(x_v), & x_v \text{ is delivered by time } t, \\ 0, & \text{otherwise}, \end{cases}$$
where $Q(\cdot)$ denotes the quantizer at the source node. The $\ell_2$ norm of recovery error, $\|x - \hat{x}(t)\|_{\ell_2}$, is used as the quality measure in our numerical comparisons. The cost measure in our comparisons is the delivery delay required to achieve a minimum quality of service. Explicitly, the delivery delay for a transmission which has terminated at $t$ is equal to $(t-1)L$ in both cases of QNC and packet forwarding. In the QNC scenario, for each value of $\frac{k}{n}$ and $\epsilon_k$, we calculate the average of $\|x - \hat{x}_{QNC}(t)\|_{\ell_2}$'s over different realizations of network deployments.
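The packet forwarding baseline can be sketched with a shortest-path computation followed by zero-filling of undelivered messages. This is a simplified illustration under loudly stated assumptions: every edge has unit cost, a message that is $d$ hops from the gateway is taken to arrive at time index $d + 1$, and queuing or link contention is ignored, which the actual simulations would have to account for. The toy network and function names are hypothetical.

```python
import heapq
import math


def hops_to_gateway(edges, gateway):
    """Dijkstra with unit edge weights on the reversed graph: node -> hops to gateway."""
    reverse = {}
    for tail, head in edges:
        reverse.setdefault(head, []).append(tail)
    dist = {gateway: 0}
    heap = [(0, gateway)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue                      # stale heap entry
        for u in reverse.get(v, []):
            if d + 1 < dist.get(u, float("inf")):
                dist[u] = d + 1
                heapq.heappush(heap, (d + 1, u))
    return dist


def pf_estimate(x_quantized, dist, t):
    """Recovered vector at time t: delivered quantized values, zero otherwise (no contention)."""
    return [xq if v in dist and dist[v] + 1 <= t else 0.0
            for v, xq in enumerate(x_quantized)]


def l2_error(x, x_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))


# Hypothetical 4-node network given as (tail, head) pairs, with node 2 as gateway.
toy_edges = [(0, 1), (1, 2), (3, 2), (0, 3)]
dist = hops_to_gateway(toy_edges, gateway=2)
x_q = [0.5, -1.0, 2.0, 0.25]              # already-quantized messages (sample values)
print(dist, pf_estimate(x_q, dist, 2), l2_error(x_q, pf_estimate(x_q, dist, 2)))
```

In this toy case only the message of node 0, two hops away, is still missing at $t = 2$, so the error equals its magnitude and drops to zero at $t = 3$.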
Since the sparsity of messages does not affect the performance of packet forwarding, we only need to present its results for different network parameters (i.e.number of edges).
For a fixed block length, $L = 20$, the average $\ell_2$ norm of recovery error versus the average delivery delay is shown in Fig. 4. As shown in Figs. 4(a), 4(b), when using the same block length, QNC achieves significant improvement, compared to PF, for low values of delivery delay. These low delays correspond to the initial $t$'s in the transmission, at which a small number of packets are received at the decoder, as expected.
After enough packets are received at the decoder, QNC reaches its best performance (at around −20 [dB]), which is limited by the associated quantization noises. The best performance for packet forwarding is reached after a longer period of time than for QNC. On the other hand, the best performance of PF (around −80 [dB]) is better than that of QNC (a lower recovery error), which can be explained by noise propagation in the network during QNC. However, as shown in the following, QNC outperforms PF over a wide range of delay values when the appropriate block length is adopted in each case.
After simulating QNC and PF scenarios for different block lengths and calculating the corresponding delay and recovery error norms, we find the best values of block length for each specific average $\ell_2$ norm of recovery error (as a measure of quality of service). The resulting $L$-optimized curve for each QNC and PF scenario is depicted in Fig. 5. In Figs. 5(a)-5(f), QNC performance is compared with that of PF, for different numbers of edges, different sparsity factors, and near-sparsity parameters. Generally speaking, these figures show a promising improvement over conventional packet forwarding when QNC is adopted for transmission of near-sparse messages. The achieved improvement increases as the sparsity factor, $\frac{k}{n}$, decreases, meaning a higher level of correlation between the messages.
As a drawback, QNC seems to fail when the sparsity model is not good for describing the correlation model. Specifically, if the near-sparsity parameter, $\epsilon_k$, is too high, then the resulting performance of QNC cannot even match that of PF for a wide range of delivery delays (see Fig. 5(b) for instance).
In Figs. 6(a), 6(b), 6(c), the effect of $\epsilon_k$ on the resulting QNC performance is illustrated. As shown, as long as $\epsilon_k$ is small (relative to the $\ell_1$ norm of the message), there is no difference in the QNC performance. But if it is so high that the sparsity model does not characterize the messages fairly, then QNC fails to work properly (since the $\ell_1$-min decoding criterion is not a good cost function anymore).
In the routing based packet forwarding scenarios, the intermediate (sensor) nodes have to go through route training and storage of packets. As one of the main advantages of network coding, in the QNC scenario the intermediate nodes only have to carry out simple linear combination and quantization, which keeps their computational complexity low. On the other hand, at the decoder side, QNC requires an $\ell_1$-min decoder, which is potentially more complex than the receiver required for packet forwarding. However, this may not be an issue in practical cases, as the gateway node is usually capable of handling computationally demanding operations.

VII. CONCLUSIONS AND FUTURE WORKS
Joint source network coding of correlated sources was studied from a sparse recovery perspective. In order to achieve non-adaptive encoding, we proposed quantized network coding, which incorporates real field network coding and quantization to take advantage of decoding using linear programming. Thanks to the work in the literature of compressed sensing, we discussed theoretical guarantees to ensure efficient encoding and robust decoding of messages. Moreover, we were able to make conclusive statements about the robust recovery of messages when fewer received packets than the number of sources (messages) were available at the decoder. Finally, our computer simulations verified the reduction in the average delivery delay achieved by using quantized network coding.
Although the proposed sparse recovery algorithm works well for correlated messages with near-sparse characterization, it does not offer optimal recovery for other cases of correlated sources. Currently, we are studying the feasibility of near-optimal decoding, when other forms of prior information are

Fig. 3. Logarithmic tail probability versus logarithmic ratio of minimum required number of measurements in our QNC scenario and i.i.d. Gaussian measurement matrices, for $n = 100$ nodes, different RIP constants, and different numbers of edges.
The input and output contents of edge $e$ at time instant $t$ are represented by $Y_e(t)$ and $Y'_e(t)$, where $t$ represents the discrete (integer) time index, during which a block of length $L$ is transmitted. Since the edges are lossless, $Y_e(t)$ and $Y'_e(t)$ are the same.
By using theorem 4.3, we can analyze the behavior of $\Psi_{tot}(t)$, resulting from the proposed local network coding coefficients in theorem 4.2. Specifically, we try to see if we can obtain the same tail probability as a Gaussian ensemble, with the same order of measurements. Unfortunately, the complicated relation of the local network coding coefficients and network parameters with the resulting $\Psi_{tot}(t)$ (see Eqs. 8, 9, …) makes it difficult to derive a simple mathematical form for the tail probability, $p_{tail}(\Psi_{tot}(t), \epsilon)$.

TABLE I: THE PARAMETERS OF MESSAGES AND THE NETWORKS, USED IN OUR SIMULATIONS.