Network coding for reliable video distribution in multi-hop device-to-device communications

Sharing videos among multiple users is becoming increasingly popular. However, sharing video over traditional cellular networks incurs high costs. Device-to-device (D2D) communication is one of the crucial technologies in the fifth-generation network, and it enables devices to transmit data directly without relaying through base stations. This paper proposes a network-coding-based video distribution scheme for the D2D communication environment. The proposed scheme applies network coding to H.264 video transmission, which protects the crucial information of the video. The scheme enables the receivers to decode the original video with high probability, especially in networks with interference. Both the simulation results and the experimental results show that using network coding in video transmission improves the quality of the received video. Compared with the traditional scheme, the successful decoding rate of the proposed scheme is increased by 46% in our experimental settings.

However, D2D communication is susceptible to interference from base stations and other devices working in the same frequency band, which reduces transmission reliability, especially during video transmission [8]. Therefore, providing high-quality and stable video streams to mobile users has become an essential goal in the D2D communication environment [9]. As a high-resolution video is usually large, it is necessary to compress the video to improve the efficiency of transmission and storage. H.264 is a typical video compression standard with a high data compression ratio. In H.264 video streams, video frames are of different importance, which means that the loss of crucial information will significantly reduce the video quality, especially in environments with limited bandwidth and poor signal quality. To minimize the negative effect of interference caused by the instability of the network environment, it is critical to design a robust and stable video transmission scheme for the D2D communication environment [10]. In recent years, related research has shown that network coding has great potential for improving transmission reliability [11,12] and throughput [13,14] in D2D communication.
Ahlswede et al. initially proposed network coding [15], and they proved that network coding could increase reliability and bandwidth efficiency. The principle of network coding is to re-encode the data at the intermediate nodes of the network. Through the re-encoding operation at intermediate nodes, the overall network can obtain additional performance gain [16][17][18]. Figures 1 and 2 show the operations at intermediate nodes in the traditional store-and-forward scheme and the network-coding-based scheme, respectively. In these figures, node A needs to send a packet a to both node B and node C. Similarly, node B needs to send a packet b to both node A and node C. In the traditional communication scheme, the intermediate node C needs to store and forward the packets a and b in sequence, so four transmissions are required in total. In the network coding scheme, node C generates a new packet a ⊕ b by performing the bitwise XOR operation after receiving packets a and b. Then, the new packet a ⊕ b is sent to both A and B in one transmission through the wireless channel. Therefore, three transmissions are enough. In this example, the network coding scheme requires 25% fewer transmissions than the traditional scheme. Network coding includes linear network coding [19] and nonlinear network coding. Linear network coding includes deterministic linear network coding (DLNC) and random linear network coding (RLNC). DLNC requires knowledge of the global topology, which is difficult to obtain in a wireless network, so it is seldom used in practical applications. RLNC is a significant breakthrough in the study of network coding [20]. In a scheme based on RLNC, the nodes in the network are not required to know the global topology at runtime, which implies that RLNC is more suitable for networks in which the topology changes frequently. Therefore, using RLNC is feasible in multi-hop D2D communication networks [21,22].
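The three-transmission exchange through node C can be illustrated with a few lines of Python. This is only a minimal sketch: the packet contents and the helper name `xor_bytes` are hypothetical, and equal-length packets are assumed.

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets."""
    if len(x) != len(y):
        raise ValueError("packets must have equal length")
    return bytes(p ^ q for p, q in zip(x, y))

# Packets held by the two end nodes (contents are illustrative only).
a = b"packet-from-A"
b = b"packet-from-B"

# Transmissions 1 and 2: A -> C and B -> C.
# Transmission 3: C broadcasts the single coded packet a XOR b.
coded = xor_bytes(a, b)

# Each end node cancels its own packet to recover the other one.
recovered_at_A = xor_bytes(coded, a)  # A obtains b
recovered_at_B = xor_bytes(coded, b)  # B obtains a

# Transmission counts from the example: 4 for store-and-forward, 3 with coding.
store_and_forward, network_coding = 4, 3
saving = (store_and_forward - network_coding) / store_and_forward
```

Because XOR is its own inverse, a node that already holds one of the two packets needs only the coded packet to obtain the other, which is where the saved transmission comes from.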
In traditional wired networks, re-encoding operations at intermediate nodes (such as routers) are not supported, which prevents the advantages of network coding from being fully exploited. In a multi-hop D2D communication environment, mobile devices can be used as intermediate nodes to perform complicated re-encoding operations and provide direct service to other devices. Therefore, the obstacle in traditional networks no longer exists in D2D communication networks. D2D communication provides an ideal application scenario for network coding and can take full advantage of it [23]. The combination of the two techniques can improve the overall network performance, and users can obtain more stable video streams when network coding is applied in video services [6,24].
The rest of this paper is organized as follows: In Sect. 2, some closely related studies are introduced. In Sect. 3, a network-coding-based video distribution scheme is proposed. In Sect. 4, some results of mathematical analysis are provided for both the proposed scheme and existing schemes. In Sect. 5, the performance of the proposed scheme is evaluated in both the simulated network and the actual experimental testbed. Finally, the conclusion is drawn in Sect. 6.

Related works
At present, researchers have carried out many studies on video transmission in different network environments [25][26][27][28][29]. Nguyen et al. [25] proposed a network coding framework for efficient video streaming transmission in peer-to-peer (P2P) networks. Their framework introduced multiple servers as peers for video transmission. The technology of layered network coding is applied to scalable video streams to deal with bandwidth fluctuations on the Internet. The simulation results showed that network coding technology could save significant bandwidth overhead compared with the traditional schemes. Although D2D communication is essentially a kind of P2P communication, it has distinctive characteristics. Therefore, we cannot directly use the technology designed for P2P networks in our work, and we need to design optimal schemes in multi-hop D2D networks.
To meet consumers' demand for high-quality video in wireless networks in crowded spaces and to reduce the transmission overhead, Ferreira et al. [26] proposed a real-time streaming media solution based on wireless multicast. It uses partial-feedback real-time network coding to generate a repair packet that is maximally useful for all receivers based on feedback messages. The scheme can achieve a balance between the timeliness of the packets and the coding overhead. In their scheme, all the data in the video stream have equal coding importance, whereas in an H.264 video stream the frames are of different importance. If this property is taken into account during the design of the encoding scheme, the users will be able to obtain better service.
Rhaiem et al. [27] proposed a transmission scheme for the hierarchical transmission of data packets in H.264, which can improve the quality of video streams in mobile ad hoc networks. An Extended Multicast Scalable Video Transmission using Classification Scheduling Algorithms and Network Coding (EMSCNC) over MANET is proposed based on Multicast Scalable Video Transmission (MSVT) [30]. In EMSCNC, the source nodes group the packets and then perform the encoding operation. The intermediate nodes decode the encoded packets generated by the source nodes and then re-encode them according to the hierarchical design of H.264 before forwarding. In this paper, the scheme is proposed for D2D communications to reduce the negative impact caused by interferences, so we addressed the network-coding-based video transmission from a probabilistic analysis perspective. Moreover, we evaluated the feasibility and performance of the proposed scheme in an actual experimental testbed.
Wang et al. [28,29] studied the application of network coding in wireless sensor networks and WiFi networks. In [28], the authors applied network coding to improve transmission reliability in wireless sensor networks. Although this paper is about video transmission in the D2D communication environment, network coding can likewise be used to enhance the reliability of the transmission process. The authors also designed and implemented a reliable network-coding-based video conferencing system (NCVCS) [29] to improve the user experience. In NCVCS, an encoding server is introduced as the intermediate node to perform the re-encoding operation, which can improve the utilization of network bandwidth. However, NCVCS adopts a unified coding scheme during data frame transmission without providing additional protection to keyframes. In this paper, mobile devices are used as intermediate nodes, and we provide a network coding scheme based on the priority of different frames. Moreover, NCVCS cannot work without the access point, while all the devices in the testbed in this research can directly communicate with each other without relaying through the access point. Although both NCVCS and the proposed scheme focus on the implementation of network-coding-based transmission, the network architectures of the two schemes are different.
Our contributions can be summarized as follows: we propose a network-coding-based video distribution scheme for the D2D communication environment, which can protect critical information; we establish a probability-based mathematical analysis model for transmission in the D2D environment; and we implement the proposed scheme and evaluate its performance with actual experiments.

Network model
In the traditional unicast environment, when users working in the same cell need to obtain video information from other devices, they need to establish connections with all the other devices. In a D2D environment that introduces intermediate nodes, the users only need to establish a D2D connection with a particular device. As shown in Fig. 3, the mobile devices $S_1$, $S_2$, and $S_3$ act as the source nodes, and the mobile device $S_4$ acts as an intermediate node. In our network model, $S_1$, $S_2$, and $S_3$ each send a video stream to $S_4$. After receiving the video from the other devices, $S_4$ performs the re-encoding operation. The re-encoding operation protects the critical information in the video streams, and the generated re-encoded data are then sent back to the network. Each device can decode the video stream as long as sufficient encoded data are received.

Coding scheme
In conventional data transmission, the intermediate node only forwards the packets after receiving them from the source node. The network coding technique allows intermediate nodes to perform additional coding operations on the received packets before forwarding. Encoding, re-encoding, and decoding operations of network coding are linear operations performed over a Galois field (GF) of size $2^q$, where $q$ is a natural number. According to previous studies [31,32], the Galois field GF(256) can provide a good balance between computational efficiency and resource overhead. Therefore, all the coding operations in this paper are based on GF(256).
Equation (1) shows a typical linear coding operation [28,29], which produces linearly independent combinations of the original data blocks.
$$y_i = \sum_{j=1}^{k} c_{ij}\, b_j, \quad i \in [1, n] \qquad (1)$$
In Eq. (1), $k$ stands for the total number of original blocks, $n$ stands for the number of encoded blocks, $c_{ij}$ refers to a coefficient randomly selected from GF(256), and $b_j$ refers to the $j$th original block. The symbol $y_i$ ($i \in [1, n]$) stands for an encoded block.
In most RLNC-based schemes [33,34], the Gauss-Jordan elimination method [35], a classical decoding method for network coding, is adopted. Compared with other decoding methods such as matrix inversion, Gauss-Jordan elimination has higher decoding efficiency and therefore reduces the decoding delay. The method has two advantages. First, a packet that is linearly dependent on the packets already received carries no new information; it produces an all-zero row after the matrix transformation during decoding and is then removed. Second, the receiving node can start decoding after receiving only part of the data packets, instead of waiting for k linearly independent encoded packets to arrive, which improves the decoding efficiency and reduces the waiting time of the receiving devices.
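The encoding of Eq. (1) and Gauss-Jordan decoding can be sketched together as follows. This is an illustrative sketch rather than the implementation used in the paper: the function names are hypothetical, and the reduction polynomial 0x11B for GF(256) is an assumed choice, since the text does not specify one.

```python
import random

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(256) modulo x^8 + x^4 + x^3 + x + 1 (0x11B, assumed)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """Multiplicative inverse computed as a^254, since a^255 = 1 for a != 0."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(256)")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def encode(blocks, n, rng):
    """Return n coded packets (coefficients, payload) built from k blocks."""
    k, size = len(blocks), len(blocks[0])
    packets = []
    for _ in range(n):
        coeffs = [rng.randrange(256) for _ in range(k)]
        payload = [0] * size
        for c, block in zip(coeffs, blocks):
            for t in range(size):
                payload[t] ^= gf_mul(c, block[t])  # addition in GF(256) is XOR
        packets.append((coeffs, payload))
    return packets

def decode(packets, k):
    """Gauss-Jordan elimination on [coeffs | payload]; None if rank < k."""
    rows = [list(c) + list(p) for c, p in packets]
    rank = 0
    for col in range(k):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None  # too few linearly independent packets so far
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = gf_inv(rows[rank][col])
        rows[rank] = [gf_mul(inv, v) for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [v ^ gf_mul(f, w) for v, w in zip(rows[r], rows[rank])]
        rank += 1
    return [bytes(rows[i][k:]) for i in range(k)]

rng = random.Random(2020)
blocks = [b"slice-0!", b"slice-1!", b"slice-2!", b"slice-3!"]
coded = encode(blocks, n=6, rng=rng)   # k = 4, n = 6
survivors = coded[2:]                  # pretend the first two packets were lost
recovered = decode(survivors, k=4)     # None only if survivors are dependent
```

A linearly dependent packet reduces to an all-zero row during elimination and is simply never chosen as a pivot, which matches the first advantage described above.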

Reliability analysis
In the H.264/AVC (Advanced Video Coding) video codec system, video frames have different types, such as I-frame, P-frame, and B-frame. The I-frame is intra-coded and carries crucial information that the other frames depend on [36]. In a video sequence, the first data frame is always an I-frame, followed by a series of P-frames or B-frames [37]. P-frames and B-frames are generated based on temporal and spatial correlations. However, this structure also makes the video sequence more susceptible to error propagation caused by inter-frame dependency, and corruption of earlier frames may propagate errors to the subsequent frames in the Group of Pictures (GOP) [38]. If there is an error in the I-frame, only a small amount of related video information can be recovered in the GOP, while a broad range of frames is lost. Moreover, since the I-frame is significantly larger than the other frames, it is more likely to be lost during transmission. Therefore, it is necessary to protect the I-frame during the transmission of the video stream. Assume that there are multiple GOPs in a video, and the $x$th GOP is represented by $G(x)$; the I-frame belonging to $G(x)$ is represented by $I(x)$; $P(x)$ represents a P-frame of the $x$th GOP; $G_s(x)$ refers to the successful delivery rate (SDR) of all frames in the $x$th group; $I_s(x)$ and $P_s(x)$ are the SDRs of the I-frame and the P-frame, respectively.
Some studies have analyzed the decoding rate of RLNC based on SDR [39,40]. Therefore, we use the SDR as an indicator to analyze the performance of the proposed scheme. In the transmission process, $I_l$ refers to the size of the I-frame, $P_l$ is the size of the P-frame, and the bit error ratio (BER) is $p_b$. The successful transmission probabilities of the I-frame and the P-frame are $I_t$ and $P_t$, respectively. Although the transmission processes of the I-frame and the P-frame are independent of each other, their decoding processes are correlated. The SDR of the I-frame at the decoding device equals the probability of successful transmission of the I-frame, so $I_s$ equals $I_t$. The decoding operation of the P-frame requires the information of the I-frame, so the SDR of the P-frame at the decoding device can be expressed by Eq. (2).
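As a worked illustration of the uncoded case, the relations below give one form consistent with the definitions above. They rest on assumptions that the text does not state explicitly: bit errors are independent, $I_l$ and $P_l$ are measured in bits, and a frame is delivered only when every bit arrives intact. The exact expression in Eq. (2) may differ.

```latex
I_t = (1 - p_b)^{I_l}, \qquad
P_t = (1 - p_b)^{P_l}, \qquad
I_s = I_t, \qquad
P_s = I_s \cdot P_t = (1 - p_b)^{I_l + P_l}
```

With the purely illustrative values $p_b = 10^{-6}$, $I_l = 80{,}000$ bits, and $P_l = 20{,}000$ bits, this gives $I_s \approx 0.92$ and $P_s \approx 0.90$, so under these assumptions a P-frame is never more reliable than the I-frame it depends on.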
In the network-coding-based scheme, $I'_l$ stands for the size of the I-frame after the re-encoding operation, and $P'_l$ is the size of the P-frame after the re-encoding operation. The successful transmission probabilities of the encoded I-frame and the encoded P-frame are $I'_t$ and $P'_t$, respectively. The encoded I-frame is divided into $n$ packets, the size of each packet is $I'_l / n$, and the successful receiving rate of each packet is $p$.
The intermediate node performs RLNC with $n_1$ different data packets before transmission. The receiving devices need to receive $k_1$ linearly independent data packets ($k_1 < n_1$) to decode and obtain the original data. The probability of successfully decoding the I-frame at the receiving devices after encoding is given by Eq. (4).
RLNC is also adopted when packet coding is performed on a P-frame. The P-frame is divided into $n_2$ packets, the size of each packet is $P'_l / n$, and the successful receiving rate of each packet is $p'$. The original data can be successfully decoded as long as $k_2$ packets are received. After encoding, the successful receiving rate $P'_t$ of P-frame transmission is given by Eq. (5).
The successful decoding rate $P'_s$ of the encoded P-frame is obtained with Eq. (6).
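The coded-case probabilities can be explored numerically with a short sketch. It is not a reproduction of Eqs. (4) to (6): it assumes that packets are lost independently, that any $k$ received packets are linearly independent (ignoring the small chance of dependence over GF(256)), and that the P-frame SDR is the product of the two decoding probabilities, by analogy with the uncoded case. The function names are hypothetical.

```python
from math import comb

def decode_probability(n: int, k: int, p: float) -> float:
    """P(at least k of n coded packets arrive), each arriving with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def coded_sdr(n1: int, k1: int, p_i: float, n2: int, k2: int, p_p: float):
    """Return (I-frame SDR, P-frame SDR) under the assumptions stated above."""
    i_t = decode_probability(n1, k1, p_i)  # plays the role of I'_t
    p_t = decode_probability(n2, k2, p_p)  # plays the role of P'_t
    i_s = i_t                              # the I-frame decodes on its own
    p_s = i_t * p_t                        # the P-frame also needs its I-frame
    return i_s, p_s

# Illustrative comparison: 4 packets without redundancy vs. 6 coded packets.
without_redundancy = decode_probability(4, 4, 0.9)
with_redundancy = decode_probability(6, 4, 0.9)
```

Setting $n_1 - k_1$ larger than $n_2 - k_2$ in `coded_sdr` is one simple way to give the I-frame more protection than the P-frame in this model.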
The constraints in Eq. (7) are set to provide priority protection to the I-frame during the encoding process. The probability of successfully decoding the GOP at the decoding node after the network coding operation is then obtained, as shown in Eq. (8).

Algorithms in the proposed scheme
In the process of network transmission, to deal with video frame loss and transmission errors, video frames need to be encoded redundantly. Part of the original data can be recovered by processing the redundant data. However, data redundancy is limited. Repeated transmissions will reduce the utilization efficiency of the transmission bandwidth. Therefore, how to obtain optimal transmission with limited redundancy is a problem that we need to address.
Since the length of each frame in the video stream is different, it is necessary to design an appropriate transmission method to maximize the utilization of transmission resources. We designed a dynamic adaptive algorithm to transmit video and improve data transmission efficiency.
In this algorithm, we choose an appropriate value as the packet size for one transmission. Therefore, the number of slices changes dynamically with the size of the video frames. We add the slice information and encoding coefficients to the header of each encoded frame, which are used to assist the decoding operation at the receiving devices. RLNC is essentially a linear coding scheme, so the encoding and decoding operations amount to computing the product of two matrices. The time complexity of multiplying two $n \times n$ matrices with the standard algorithm is $O(n^3)$, which is polynomial and feasible to implement in practical applications.
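The slicing step can be sketched as follows. This is a hedged illustration only: the payload size of 1024 bytes, the header layout (frame id, slice count, original frame length, then one coefficient byte per slice), and all function names are assumptions, not details given in the paper.

```python
import struct
from math import ceil

PACKET_PAYLOAD = 1024  # bytes per slice; hypothetical value

# Hypothetical fixed header: frame id (4 B), slice count k (2 B), frame length (4 B).
HEADER = struct.Struct("!IHI")

def slice_frame(frame: bytes, payload: int = PACKET_PAYLOAD):
    """Split a frame into k equal-size slices, zero-padding the last one."""
    k = max(1, ceil(len(frame) / payload))
    padded = frame.ljust(k * payload, b"\x00")
    return [padded[i * payload:(i + 1) * payload] for i in range(k)]

def build_header(frame_id: int, k: int, frame_len: int, coeffs: bytes) -> bytes:
    """Header of one coded packet: fixed fields followed by k coefficients."""
    if len(coeffs) != k:
        raise ValueError("need exactly one coefficient per slice")
    return HEADER.pack(frame_id, k, frame_len) + coeffs

def parse_header(packet: bytes):
    """Inverse of build_header; also returns the coded payload that follows."""
    frame_id, k, frame_len = HEADER.unpack_from(packet)
    coeffs = packet[HEADER.size:HEADER.size + k]
    body = packet[HEADER.size + k:]
    return frame_id, k, frame_len, coeffs, body

# A 2500-byte frame yields three slices; a larger frame yields more.
slices = slice_frame(b"x" * 2500)
```

Carrying the original frame length in the header lets the receiver strip the zero padding after decoding, and carrying $k$ tells it how many linearly independent packets to wait for.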
The advantage of network coding comes from the re-encoding operation, which increases the independence among data packets during transmission. However, the re-encoding operation affects the recovery of video frames at the receiving device. In the conventional replication-based transmission scheme, a device can recover the original video frame immediately after it receives the frame information.
In the proposed scheme, the packets generated by RLNC are linearly mixed. In the theory of linear network coding, the original data are encoded into $n$ ($n \ge k$) packets such that any $k$ linearly independent packets out of the $n$ encoded packets are sufficient to recover the original data. Therefore, the receivers can decode the original data only after at least $k$ linearly independent packets have been successfully received, and some waiting time is necessary for the decoding operation. This waiting time cannot be avoided in any linear coding scheme. In our scheme, during the transmission of a video frame, the device can fully recover the original data through the decoding operation after receiving $k$ linearly independent packets and does not have to wait for the subsequent data packets of the current video frame. Besides, to ensure the real-time performance of the video, the device does not keep waiting for the old video frame when a new frame arrives.
When a mobile device works as an intermediate node, it undertakes the re-encoding operation, which causes extra battery consumption. Therefore, the intermediate node will run out of energy first and then drop out of the network. To increase the network lifetime for all the mobile devices, we need a strategy that periodically selects the devices acting as intermediate nodes so that the energy overhead is balanced. The strategy needs to take account of at least two factors, namely location and remaining power, and it needs to assign weights to these factors.
In this algorithm, each node $D_i$ broadcasts its status packet $PS_i$ every ten minutes, which contains its remaining power and location coordinates. When any other node $D_j$ receives $PS_i$, it extracts the location and remaining power of node $D_i$ and then calculates the distance from node $D_i$ to itself. The priority value (PV) is then obtained as a weighted combination of the distance and the remaining power.
In our scheme, the distance and the remaining power are assigned different weights, namely $w_d$ and $w_b$. The distance metric $DM$ is divided into five levels, and the remaining power is quantified as $E$.
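The selection step can be sketched as below. Several details are assumptions made only for illustration, because the text does not give them: the PV is taken to be the weighted sum $w_d \cdot DM + w_b \cdot E$, nearer nodes receive a higher distance level, the five levels split the transmission radius evenly, $E$ is a fraction in [0, 1], and both weights default to 0.5. All names are hypothetical.

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class Status:
    """Content of a status packet PS_i (field names are hypothetical)."""
    node_id: str
    x: float
    y: float
    energy: float  # remaining power E as a fraction in [0, 1]

def distance_level(d: float, radius: float) -> int:
    """Quantize a distance into five levels: 5 = nearest, 1 = farthest (assumed)."""
    if d >= radius:
        return 1
    return 5 - int(5 * d / radius)

def priority_value(me: Status, other: Status, radius: float,
                   w_d: float = 0.5, w_b: float = 0.5) -> float:
    """Assumed weighted-sum form of the PV computed by node `me` for `other`."""
    d = hypot(me.x - other.x, me.y - other.y)
    dm = distance_level(d, radius) / 5  # normalise DM to (0, 1]
    return w_d * dm + w_b * other.energy

def select_intermediate(me: Status, statuses, radius: float) -> Status:
    """Pick the neighbour with the highest PV from the received status packets."""
    return max(statuses, key=lambda s: priority_value(me, s, radius))

me = Status("D0", 0.0, 0.0, 1.0)
near = Status("D1", 10.0, 0.0, 0.9)
far = Status("D2", 700.0, 0.0, 0.9)
chosen = select_intermediate(me, [near, far], radius=800.0)
```

Re-running the selection after each round of status packets lets a node whose battery has drained lose its role to a neighbour with more remaining power, which is the balancing effect the strategy aims for.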

Performance analysis
In this section, we verify the reliability of the proposed solution through mathematical computation. We analyze the influence of video frame length and packet loss rate on the SDR of the I-frame, the P-frame, and the GOP, and we compare the results of the proposed scheme with the replication-based transmission scheme [29] and the transmission scheme based on instantly decodable network coding (IDNC) [41,42].
As Fig. 4 shows, the decoding rates of all schemes are high when the packet loss rate is low. The SDR gradually decreases as the packet loss rate increases, which is consistent with the analysis of Eq. (4). Compared with the IDNC-based and replication-based solutions, the SDR of the RLNC-based scheme declines more steadily, and it remains more stable even in an extreme network environment with a high packet loss rate.
The impact of frame length on the decoding rate of the I-frame is shown in Fig. 5. As the length of keyframes increases, the SDR of I-frames decreases gradually, which is the reason why I-frames are more likely to be lost. According to Eq. (4), when the video frame to be transmitted is large, the RLNC-based scheme can provide a higher frame recovery rate than the other schemes. The impact of the packet loss rate on the SDR of the P-frame is shown in Fig. 6: the SDR decreases as the packet loss rate increases. According to our previous analysis of Eqs. (5) and (6), the information contained in keyframes has a significant impact on the decoding of the P-frame. Figure 7 shows the impact of keyframe length on the SDR of the P-frame. As the length of a keyframe increases, the keyframe is more likely to be lost in the transmission process, and the decoding rate of P-frames is affected accordingly. Figures 8 and 9 show the impact of packet loss rate and keyframe length on the SDR of the whole GOP, respectively, according to Eq. (8).

Performance evaluation
In this section, we implement the proposed scheme in the simulation environment. In the simulation, there are 20 nodes. Moreover, to evaluate the feasibility of the proposed scheme, we implemented it in a real-world testbed consisting of 6 mobile devices.

Setting of simulation
The simulation tool we used is OMNeT++ 5.4. In our simulation scenario, nodes were randomly deployed in a 1000 m × 800 m area. The transmission radius of nodes was 800 m. The loss rates of different links were independent of each other. The packet loss rate can be calculated from the BER: we assume that the BER is $p'_b$ and denote by $p_k$ the loss rate of a packet with transmission data size $l_s$. Each simulation experiment lasted for 120 seconds. Unless otherwise stated, we used the parameters shown in Table 1.
Fig. 7 The impact of keyframe on the SDR of P-frame (vertical axis: SDR of P-frame; curves: replication-based, RLNC-based, IDNC-based)
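The relation between the BER and the packet loss rate used in the simulation can be illustrated with the common independent-bit-error model. This is an assumption made for the sketch, with $l_s$ taken in bits; the function name is hypothetical.

```python
def packet_loss_rate(ber: float, size_bits: int) -> float:
    """Probability that at least one of size_bits independent bits is corrupted."""
    return 1.0 - (1.0 - ber) ** size_bits

# Illustrative values: a 1024-byte packet (8192 bits) at a BER of 1e-5.
example_loss = packet_loss_rate(1e-5, 8 * 1024)
```

Under this model the loss rate grows with the packet size, which is consistent with the earlier observation that larger frames are more likely to be lost.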
In the simulation, we used the default physical layer and transport layer protocols provided by OMNeT++, and we modified the application layer protocol. We designed a new application module to realize the encoding and decoding functions of RLNC. First, each device sends its data to the device acting as the intermediate node. After all data from the participating devices are received, the intermediate device encodes the received data and then broadcasts the encoded data back to the other devices. Figure 10 shows the experimental scene.

Illustrative results
In the simulation environment, we simulated the data transmission process and analyzed the impact of the BER, the number of devices participating in the transmission, and the data redundancy on the SDR. We also compared the performance of our scheme with that of the traditional replication-based scheme and the IDNC-based scheme. Each reported result is the average of ten runs, which reduces the influence of unforeseen factors. Figure 11 shows the relation between BER and SDR. In this experiment, the generation size k was set at 4, and n was set at 6. According to Fig. 11, as the BER gradually increases, the SDR of all three schemes decreases. However, the SDR of the RLNC-based scheme remains higher than that of the other two schemes, because some packets are allowed to be lost after the re-encoding operation. RLNC is not sensitive to changes in the packet loss rate, and it has higher stability than the other two schemes. Therefore, it is more suitable for transmission in networks with extreme channel conditions.
The relation between the number of devices involved in transmission and SDR is shown in Fig. 12. In this simulation, k was set at 4, n was set at 6, and p was set at 0.1. According to Fig. 12, the probability of successfully recovering the original data at the receiving devices is the highest in the RLNC-based scheme and the lowest in the traditional replication scheme. When there are few devices, the results fluctuate greatly. However, as the number of devices increases, the three curves gradually stabilize. The decoding rate of the RLNC-based scheme is above 0.95, the successful decoding rate of IDNC is above 0.85, and the traditional transmission scheme only has a decoding rate of around 0.65. The experimental results are consistent with the previous theoretical analysis. Therefore, the performance of the scheme based on RLNC is significantly better than that of the other two schemes.
Table 1 (partial) Transmission radius: 800 m. Transmission rate: 10 Mbps.
We also analyzed the impact of the relation between n and k on the performance of RLNC. The experiment was carried out in a network with 41 devices, and the packet loss rate was set at 0.15. The experiments were carried out with simulation times of 25 s, 50 s, 75 s, 100 s, and 125 s, and the reported results are the averages over these simulation times. The redundancy rate was set at 50%. As Fig. 13 shows, as k increases, only the RLNC-based coding scheme maintains a high decoding rate, while the decoding rate of the other two schemes gradually declines. The reason is that RLNC increases the independence among packets, so a receiver can decode successfully as long as an adequate number of packets are received. In the other two schemes, the probability of receiving duplicated packets or linearly dependent packets is higher than that in the RLNC scheme.

Testbed settings
The mobile device model we used for testing is the MI 4LTE. The operating system is Android 6.0.1. The CPU frequency is 2.45 GHz, and the running memory is 2 GB. The media access control (MAC) layer protocol we used is IEEE 802.11g. Because the 5G network has not been widely deployed and no commercial device supports D2D communications, we used Wi-Fi Direct to implement device-to-device direct connections. The links between different devices are independent of each other. Figures 14 and 15 show the real experiment scenes.

Analysis of experimental results
We first evaluate the performance of our scheme and compare it with the traditional schemes based on replication and IDNC, using the successful recovery rate of the video frames at the devices as the evaluation criterion. We also evaluate the coding latency of the proposed scheme on different mobile devices. According to Fig. 16, as the transmission distance increases, the SDR of the video frames decreases in all three schemes. The video frame recovery rate in the RLNC-based scheme is higher than that in the other two schemes, which is consistent with our theoretical analysis and simulation results. The relation between the number of redundant packets and the recovery rate of frames is shown in Fig. 17. Compared with the other two schemes, the SDR of the RLNC-based scheme is higher. With the increase in redundant data packets, the decoding efficiency is gradually improved.
The peak signal-to-noise ratio (PSNR) is often used to measure the quality of pictures in a video. To evaluate the impact of the proposed scheme on video quality, we randomly selected an H.264 video stream for transmission and compared the video frames received under the three schemes. Figure 18 shows that the video obtained with the RLNC-based scheme has the best quality. In Fig. 18, a PSNR value of zero means that the corresponding frame is missing, and a PSNR value of 100 means that the device has completely recovered the corresponding frame.
A scheme can rarely improve performance in one respect without sacrificing another, and network coding inevitably introduces coding delay. Figures 16 and 17 show that network coding yields better performance, so we next discuss whether the coding delay has a negative impact on the user experience.
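For reference, the PSNR convention described for Fig. 18 can be sketched as follows. The standard definition $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$ is used; capping the value at 100 for an identical frame and returning 0 for a missing frame follow the description above, but the exact handling in the experiments is an assumption, as are the function name and the flat-list frame representation.

```python
from math import log10

MISSING, PERFECT = 0.0, 100.0  # values described for Fig. 18

def psnr(reference, received, peak: int = 255) -> float:
    """PSNR in dB between two frames given as flat sequences of 8-bit samples."""
    if received is None:
        return MISSING  # the frame was not recovered at all
    if len(reference) != len(received):
        raise ValueError("frames must have the same number of samples")
    mse = sum((r - s) ** 2 for r, s in zip(reference, received)) / len(reference)
    if mse == 0:
        return PERFECT  # identical frames: PSNR is unbounded, capped at 100
    return min(PERFECT, 10 * log10(peak ** 2 / mse))

# One sample off by one in a four-sample frame (illustrative data).
slightly_damaged = psnr([10, 10, 10, 10], [11, 10, 10, 10])
```

Plotting this value per frame gives a curve that sits at 100 for fully recovered frames and drops to 0 wherever a frame is lost, which is how the three schemes can be compared frame by frame.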
According to the previous section, the time complexity of the decoding operation in the network-coding-based scheme is $O(n^3)$, which is obtained from a theoretical perspective. In this section, we study the computational overhead from a practical perspective. We analyzed the impact of the generation size k on coding latency and evaluated the coding efficiency of the proposed scheme on different mobile devices, which is shown in Fig. 19. In Fig. 19, each value is obtained by averaging ten experimental results. The size of the video we used was 1 MB. As Fig. 19 shows, the encoding throughput of the device gradually decreases as k increases, because the coding complexity grows with the coding dimension. Fig. 19 also shows that the hardware performance of a mobile device determines its coding latency. For example, receiving a video with a duration of one minute and a resolution of 1920 × 1080 requires about 10 MB of data. According to Fig. 19, the decoding rate after using network coding is far greater than the required receiving rate. Therefore, the coding latency of network coding does not have an apparent adverse effect on the user experience.
In Algorithm 3, we designed a selection strategy for intermediate nodes. The strategy needs to be evaluated in a network with many nodes. However, our testbed has only six nodes. Therefore, we evaluated the performance of the proposed strategy in the simulated network, which is shown in Fig. 10.

Conclusion
To improve the quality of video transmission in a multi-hop D2D communication environment, we propose a network-coding-based video distribution scheme. The scheme provides additional protection to the critical information of the video, which improves reliability during transmission. According to the experimental results, our scheme has higher stability and better video quality than the two traditional schemes. Moreover, through the practical evaluation in our testbed, we observed that the coding delay introduced by network coding does not negatively affect the user experience during video playback at the receiving nodes.