Design and implementation of wireless multimedia sensor network node based on FPGA and binocular vision
EURASIP Journal on Wireless Communications and Networking volume 2018, Article number: 163 (2018)
Abstract
Based on the practical demand for multimedia information acquisition in wireless sensor networks, a new wireless multimedia sensor node is designed and implemented with an FPGA as its core and Zigbee for wireless communication. The node is equipped with sensors such as a binocular camera and an accelerometer to perceive and acquire environmental information and image data in real time, and it gains short-range wireless communication and detection capability through the Zigbee protocol. When the test distance does not exceed 60 m, the measured data fluctuate around the theoretical curve and communication is reliable; when the node distance is within 14 m, data transmission is error-free. Because the traditional Census transform has a high mismatch rate in the presence of noise, which limits its accuracy, an improved Census transform stereo matching algorithm is proposed and applied to the nodes of the multimedia sensor network. The experimental results show that this scheme realizes data transmission and processing on wireless sensor network nodes, with strong computing capability and high visual matching precision while reducing power consumption.
1 Introduction
WSN (wireless sensor network) is a new technology that combines sensing technology with network communication technology. It makes it easy to monitor, collect, and obtain information quickly, meeting the needs of military, medical, industrial, agricultural, and environmental monitoring applications. A wireless multimedia sensor network introduces image, audio, video, and other media information into the wireless sensor network to enable more precise and detailed monitoring [1].
The hardware design of a wireless multimedia sensor node adds visual perception to a conventional wireless sensor node. Typical multimedia sensor nodes include the MeshEye node, the Panoptes node [2], the Cyclops node, the CMUcam node, and so on [3, 4].
In [5], Feng et al. defined the Panoptes video-based sensor networking architecture, a video sensor platform that can deliver high-quality video over 802.11 networks with a power requirement of approximately 5 W. The authors also describe the streaming and prioritization mechanisms that allow it to survive long periods of disconnected operation. In [6], Du et al. note that virtual backbone construction based on a connected dominating set (CDS) is a competitive approach among existing methods for establishing a virtual backbone in WSNs. Under the unit disk graph (UDG) model, they propose an innovative polynomial-time constant-approximation algorithm, GOC-MCDS-C, that produces a CDS D whose size |D| is within a constant factor of that of the minimum CDS.
Each node is the core of a wireless sensor network, and node processors currently use DSP or ARM devices; across a wide range of application domains, however, their data processing capability and flexibility are limited. An FPGA can process data in parallel and handle many different tasks at the same time. Building the wireless node around an FPGA therefore offers advantages in parallel processing, high-speed fixed-point data processing, and complex computation.
Stereo matching is one of the active research topics in computer vision. Two cameras at different positions, or a single camera that is moved and rotated, capture a binocular image pair; the disparity between the image points corresponding to each space point is then calculated. After a series of inverse projection transformations, the 3D information of the space point is obtained. Stereo matching has been widely used in robot navigation, virtual reality, 3D reconstruction, and other fields.
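To make the "inverse projection" step concrete, the sketch below assumes the simplest case: a rectified, parallel binocular rig with pinhole cameras, where depth follows Z = f·B/d (f focal length in pixels, B baseline, d disparity). The function names and the rig parameters are hypothetical illustrations, not the node's calibration.

```python
from typing import Tuple


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified, parallel binocular rig (assumed model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive to give a finite depth")
    return focal_px * baseline_m / disparity_px


def back_project(u: float, v: float, disparity_px: float, focal_px: float,
                 baseline_m: float, cu: float, cv: float) -> Tuple[float, float, float]:
    """Recover camera-frame (X, Y, Z) of a scene point from its left-image pixel (u, v).

    (cu, cv) is the principal point; all intrinsics here are hypothetical inputs.
    """
    z = depth_from_disparity(disparity_px, focal_px, baseline_m)
    x = (u - cu) * z / focal_px
    y = (v - cv) * z / focal_px
    return x, y, z


if __name__ == "__main__":
    # Hypothetical rig: f = 700 px, B = 0.12 m, principal point (320, 240).
    print(back_project(390.0, 240.0, 42.0, 700.0, 0.12, 320.0, 240.0))
```

With these illustrative numbers, a 42-pixel disparity corresponds to a depth of 2 m; halving the disparity doubles the depth, which is why disparity errors matter most for distant points.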
Existing stereo matching algorithms can be roughly divided into global and local algorithms. Local algorithms mainly use gray-level correlation measures; typical measures include normalized cross-correlation [7], segment support [8], and the nonparametric Census transform [9].
The nonparametric Census transform replaces each pixel's gray value with a code derived from the relationship between the window elements and the central element. It can produce an accurate disparity map even when the image is noisy or has amplitude distortion, so it is highly robust.
In [10], the gradient map of the image is introduced into the Census transform for matching, but the corresponding coefficients must be tuned to achieve good results. Although the original dense matrix is transformed into a sparse matrix in [11], the mismatch rate in edge regions is still high. The Census stereo matching algorithm proposed in [12] improves robustness somewhat but weakens the matching of pixels in depth-discontinuity regions because of the information contained in the pixels of the window. In [13, 14], an adaptive weight stereo matching algorithm is proposed; the matching of image edge pixels is clearly improved, but the algorithm is highly complex, which hinders hardware implementation. Reference [15] proposes a stereo matching algorithm based on the Mini-Census transform, which performs the Census transform on six fixed points in the window. References [16, 17] use the mean value of the neighborhood window as the reference value for the Census transform, which improves the robustness of the algorithm to noise. In [18], an adaptive weight matching algorithm is proposed, but its real-time performance is reduced. In [19], an anti-noise stereo matching algorithm with outlier elimination is proposed, but its matching accuracy for line-like targets in the image is low, and matching errors remain in depth-discontinuity regions.
This paper builds on a high-performance FPGA, combined with the Zigbee wireless transmission protocol and binocular vision theory, to construct a wireless multimedia sensor node model that supports image processing and stereo vision. Beyond basic image acquisition and processing, the node aims to perceive scene depth and realize stereoscopic perception of the scene, giving it stronger image perception and understanding capabilities.
2 Multimedia node system architecture and functions
2.1 Node block diagram
The wireless multimedia sensor node consists of four modules: the sensor module, the data processing module, the Zigbee wireless communication module, and the video module. The sensor module includes infrared, temperature, humidity, acceleration, and magnetic compass sensors and is mainly responsible for perceiving external information. The data processing module consists of a high-performance Cyclone IV series FPGA chip (EP4CE15E17C8N) and external auxiliary components, and it processes and recognizes images, audio, and other large-volume data signals. The video module consists of the binocular camera and its decoder chip. The Zigbee module handles wireless transmission for the nodes of the wireless multimedia sensor network, including dynamic node data. Figure 1 shows the structure of the wireless multimedia sensor network node.
2.2 Interface design
The output of the binocular camera is converted into a digital signal by a dedicated decoder chip and passed to the main processor for reception and processing, and the processing results are sent to the Zigbee module over the SPI bus. Conversely, the Zigbee module can send control signals to the main processor over the SPI bus to control the working process of the FPGA. The acceleration sensor, electromagnetic compass, and video decoder chip exchange information with the main controller via the I2C bus. The temperature and humidity sensors are read through the 8-bit ADC interface. The pyroelectric infrared sensor detects the human body by triggering an external interrupt of the main control module.
3 Binocular stereo matching technique
3.1 Census transformation improvement
3.1.1 An overview of the traditional Census transform stereo matching algorithm
The basic principle of the traditional Census transform is to traverse the image with a rectangular window, usually choosing the gray value of the center pixel as the reference value, comparing the gray value of each pixel in the window with the reference value, and encoding each comparison result as 0 or 1. In essence, the Census transform encodes the gray values of the image into a binary stream that represents the relationship between the neighborhood pixels and the central pixel. The transformation process can be expressed as
In this expression, I(u, v) is the gray value of the pixel at coordinates (u, v), which is the center point, and q is any pixel in the window N_{(u, v)} centered on (u, v). The corresponding binary code stream is obtained from this mapping and is defined as follows:
In the formula, I_{Census}(u, v) is the Census transform code of the central pixel, and ⊗ denotes bitwise concatenation.
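The classical transform and its usual matching cost can be sketched in a few lines. This is a minimal illustration, assuming a grayscale image stored as nested lists, a square window that lies fully inside the image, and the convention that a neighbor darker than the center yields bit 1 (the equation defining the comparison is not reproduced here, so the direction is an assumption). The Hamming distance between two codes is the standard Census matching cost; the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image indexed as img[row][col] = img[v][u]


def census_code(img: Image, u: int, v: int, radius: int) -> int:
    """Census code of pixel (u, v) over a (2*radius+1)^2 window.

    Each neighbour q contributes one bit: 1 if I(q) < I(u, v), else 0 (this
    comparison direction is an assumption; the opposite convention also works).
    Bits are concatenated in raster order and the centre pixel is skipped.
    The window must lie fully inside the image; no border handling is done.
    """
    center = img[v][u]
    code = 0
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            if du == 0 and dv == 0:
                continue
            bit = 1 if img[v + dv][u + du] < center else 0
            code = (code << 1) | bit  # bitwise concatenation
    return code


def hamming_cost(code_a: int, code_b: int) -> int:
    """Matching cost between two Census codes: the number of differing bits."""
    return bin(code_a ^ code_b).count("1")


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    # Neighbours 1..4 are darker than the centre 5, neighbours 6..9 are not.
    print(bin(census_code(img, 1, 1, 1)))
```

Because only the ordering of gray values enters the code, adding a constant offset to the whole window leaves the code unchanged, which is the source of the robustness to amplitude distortion mentioned above.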
The traditional Census transform greatly improves stereo matching performance under noise interference. However, because the transformation uses only the gray value of the center pixel and the gray values of the points in its neighborhood, it has inherent limitations. First, when the gray value of the center pixel is affected by noise or distortion, it is difficult to find the matching points with the Census transform. Second, the Census transform is simply a nonparametric transformation of the original gray values and does not exploit the correlation information between pixels.
3.1.2 Improved Census transform
First, compute the mean gray value I_{m}(u, v) of all pixels in the transform window w centered on the target pixel I(u, v). Then take the absolute difference between this mean and the gray value of the center pixel. Finally, compare this absolute difference with a threshold to determine the gray value of the central pixel, which is defined as follows:
In the formula, T_{a} is a preset threshold. If T_{a} is small and a center pixel that has not actually mutated is misjudged as mutated, the effect on the Census transform code is small: the difference between the mean and the central pixel barely exceeds the small threshold, so substituting one for the other changes few bits. If T_{a} is large and a center pixel that has actually mutated is misjudged as unmutated, the Census transform code is strongly affected, which degrades the matching accuracy. A smaller threshold should therefore be selected. By analyzing adjacent points in multiple images and simulating the Census transform under different thresholds, we find that a suitable threshold interval is [15, 24]; the threshold used in this algorithm is 20.
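A sketch of the improved reference value follows. The defining formula is not reproduced in this text, so the rule below is an assumption inferred from the surrounding description: when |I(u, v) − I_m(u, v)| exceeds T_a the center pixel is treated as a noise "mutation" and replaced by the window mean; otherwise the center value is kept. The threshold default of 20 comes from the text; everything else (names, nested-list images, in-bounds windows) is illustrative.

```python
from typing import List

Image = List[List[int]]  # grayscale image indexed as img[row][col] = img[v][u]

T_A = 20.0  # threshold used in the text, chosen from the interval [15, 24]


def window_mean(img: Image, u: int, v: int, radius: int) -> float:
    """Mean gray value I_m(u, v) over the full window, centre included."""
    vals = [img[v + dv][u + du]
            for dv in range(-radius, radius + 1)
            for du in range(-radius, radius + 1)]
    return sum(vals) / len(vals)


def reference_value(img: Image, u: int, v: int, radius: int, t_a: float = T_A) -> float:
    """Reference value for the improved Census transform (assumed rule):
    if |I(u, v) - I_m(u, v)| > t_a the centre is treated as a noise 'mutation'
    and replaced by the window mean; otherwise the centre value is kept."""
    i_m = window_mean(img, u, v, radius)
    center = float(img[v][u])
    return i_m if abs(center - i_m) > t_a else center


def improved_census_code(img: Image, u: int, v: int, radius: int, t_a: float = T_A) -> int:
    """Census code computed against the (possibly replaced) reference value."""
    ref = reference_value(img, u, v, radius, t_a)
    code = 0
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            if du == 0 and dv == 0:
                continue
            code = (code << 1) | (1 if img[v + dv][u + du] < ref else 0)
    return code
```

For a clean 3 × 3 window whose center equals the window mean, the code is identical to the classical one; for a window whose center is a bright spike (for example 250 among values near 100), the spike is swapped for the mean before the comparisons are made.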
3.2 Stereo matching process
3.2.1 Initial matching cost calculation
The adaptive weight algorithm treats the pixels within the aggregation window differently during the cost aggregation stage. Following the similarity and proximity principles of Gestalt theory, it uses a fixed-size support window and assigns each pixel in the window a different weight according to its color difference and spatial distance from the pixel being matched, so that the aggregation window is reliable. The implementation is as follows:
Take a Census transform window of size W_{T} × W_{T} in the right image. I_{(ur, v)} is the central pixel after replacement, and I_{qr} is a neighborhood pixel other than the central pixel; there are \( W_T^2 - 1 \) such pixels in total. Following Gestalt theory [20], a different weight W(R, P) is assigned to each neighborhood pixel: the more similar a neighborhood pixel is to the central pixel, the greater its weight, and the less similar, the smaller. The k-th element of the weighted Census code is given by the piecewise function (4):
In the formula, c(R, P) = exp(−Δc/γ_{c}), where Δc is the Euclidean distance between the neighborhood pixel and the central pixel, and γ_{c} is a constant that controls how quickly c decays. Applying the same method to the left image yields WC_{(ul, v)}(k), and the matching cost between the two transform windows is as follows:
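The weight c = exp(−Δc/γ_c) can be computed directly; how the weighted code elements are combined into a cost is given by equations not reproduced here, so the cost below is only one plausible form, stated as an assumption: a normalized weighted Hamming distance in which each differing bit k contributes the product of its right and left weights. Treating Δc as a Euclidean distance between color vectors is also an assumption about what "pixel information" means; names are hypothetical.

```python
import math
from typing import List, Sequence

Color = Sequence[float]


def support_weights(center: Color, neighbours: Sequence[Color], gamma_c: float) -> List[float]:
    """c = exp(-dc / gamma_c) per neighbour, where dc is the Euclidean distance
    between the neighbour's and the centre's colour vectors (assumed meaning)."""
    return [math.exp(-math.dist(center, q) / gamma_c) for q in neighbours]


def weighted_census_cost(bits_r: Sequence[int], bits_l: Sequence[int],
                         w_r: Sequence[float], w_l: Sequence[float]) -> float:
    """Normalised weighted Hamming distance (an assumed form of the cost):
    each differing bit k contributes w_r[k] * w_l[k]."""
    num = 0.0
    den = 0.0
    for br, bl, wr, wl in zip(bits_r, bits_l, w_r, w_l):
        w = wr * wl
        den += w
        if br != bl:
            num += w
    return num / den if den > 0 else 0.0
```

The design intent is visible in the numbers: a mismatched bit from a neighbor that looks very different from the center (small weight) barely moves the cost, whereas the same mismatch from a similar neighbor counts fully.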
3.2.2 Matching cost aggregation
Select a matching window of size W and use the method of [21] to assign a different weight to each pixel in the window; the final aggregated cost is
Here, c_{A}(R, P) is the weight of a neighborhood pixel in the right-image matching window, c_{A}(L, Q) is the corresponding weight in the left image, W = 2s + 1, and s is the radius of the matching window. After the matching cost D of all pixels has been calculated, the local winner-take-all (WTA) optimization strategy [20] selects, for each pixel, the disparity with the lowest aggregated cost as the initial disparity, \( d_0 = \arg\min_{d} D_{(ur,\, ul,\, v)} \), where d = ul − ur ranges over the candidate disparities.
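Aggregation and WTA selection for a single pixel can be sketched as follows. The aggregation formula itself is not reproduced in this text, so the normalized weighted sum below (in the style of the adaptive support-weight method) is an assumption; `wta_disparity` simply returns the candidate disparity with the lowest aggregated cost, breaking ties toward the smaller disparity. Names and the `d_min` offset are hypothetical.

```python
from typing import Sequence


def aggregate_cost(pixel_costs: Sequence[float], w_right: Sequence[float],
                   w_left: Sequence[float]) -> float:
    """Adaptive-weight aggregation over one matching window (assumed normalised
    form): sum_k cA(R,P_k) * cA(L,Q_k) * e_k / sum_k cA(R,P_k) * cA(L,Q_k)."""
    num = sum(wr * wl * e for e, wr, wl in zip(pixel_costs, w_right, w_left))
    den = sum(wr * wl for wr, wl in zip(w_right, w_left))
    return num / den if den > 0 else float("inf")


def wta_disparity(costs_by_disparity: Sequence[float], d_min: int = 0) -> int:
    """Winner-take-all: the candidate disparity with the lowest aggregated cost
    (ties go to the smallest disparity, since min() keeps the first minimum)."""
    if not costs_by_disparity:
        raise ValueError("need at least one candidate disparity")
    best = min(range(len(costs_by_disparity)), key=lambda i: costs_by_disparity[i])
    return d_min + best
```

In practice `aggregate_cost` would be evaluated once per candidate disparity d, and the resulting list passed to `wta_disparity`; WTA is purely local, which is what keeps it cheap enough for hardware but also why outliers survive and refinement is needed.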
3.2.3 Disparity refinement
Compared with the traditional algorithm, the improved Census algorithm described above greatly improves matching accuracy, but the disparity map still contains many outliers that reduce it. To further improve the accuracy of the disparity map, a left-right consistency check based on Formula (7) is first used to classify each pixel as normal or abnormal:
In the formula, d_{l}(p) is the disparity of point p in the left view, p − [d_{l}(p), 0] is the corresponding point in the right view, and δ_{0} is the tolerance threshold. If the left and right disparities of the corresponding points differ by more than 1 pixel, the point is an outlier; otherwise it is a normal point. The outliers are then divided into mismatched points and occluded points according to the principle of epipolar geometry. Each mismatched point is filled with the disparity of the surrounding point whose gray value is closest, and each occluded point is filled with the minimum disparity value in its neighborhood. Finally, subpixel enhancement is used to reduce the error caused by discrete disparity values, and a 3 × 3 sliding-window median filter is applied to produce the final disparity map.
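The consistency check and post-processing can be sketched on integer disparity maps stored as nested lists. This is a simplified sketch under stated assumptions: the check marks p invalid when |d_l(p) − d_r(p − [d_l(p), 0])| > δ_0 (with δ_0 = 1 as in the text); for brevity every invalid pixel is filled with the smaller of the nearest valid disparities on its row, which is the occlusion rule only, whereas the text fills mismatched points by gray-value similarity; subpixel enhancement is omitted. `median_low` keeps results as actual disparity values at image borders.

```python
from statistics import median_low
from typing import List

Disp = List[List[int]]
INVALID = -1  # hypothetical marker for outliers


def lr_check(disp_l: Disp, disp_r: Disp, delta0: int = 1) -> Disp:
    """Mark p invalid when |d_l(p) - d_r(p - [d_l(p), 0])| > delta0."""
    out = [row[:] for row in disp_l]
    for v, row in enumerate(disp_l):
        for u, d in enumerate(row):
            ur = u - d  # corresponding column in the right view
            if ur < 0 or abs(d - disp_r[v][ur]) > delta0:
                out[v][u] = INVALID
    return out


def fill_invalid(disp: Disp) -> Disp:
    """Fill each invalid pixel with the smaller of the nearest valid disparities
    to its left and right on the same row (occlusion-style rule for all outliers)."""
    out = [row[:] for row in disp]
    for v, row in enumerate(disp):
        for u, d in enumerate(row):
            if d != INVALID:
                continue
            left = next((row[k] for k in range(u - 1, -1, -1) if row[k] != INVALID), None)
            right = next((row[k] for k in range(u + 1, len(row)) if row[k] != INVALID), None)
            candidates = [x for x in (left, right) if x is not None]
            out[v][u] = min(candidates) if candidates else 0
    return out


def median3x3(disp: Disp) -> Disp:
    """3 x 3 median filter; border pixels use the in-bounds part of the window."""
    h, w = len(disp), len(disp[0])
    out = [[0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            window = [disp[y][x]
                      for y in range(max(0, v - 1), min(h, v + 2))
                      for x in range(max(0, u - 1), min(w, u + 2))]
            out[v][u] = median_low(window)
    return out
```

Filling with the minimum neighboring disparity reflects the usual reasoning that occluded pixels belong to the background, which is farther away and therefore has the smaller disparity.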
4 Simulation results and analysis
4.1 Communication simulation
4.1.1 Communications distance test
To obtain the relationship between the distance between nodes and the received signal strength, a distance measurement model based on the RSSI (received signal strength indicator) is established; the empirical model of wireless signal strength propagation is shown in Formula (8).
In the formula, p_{0} (dBm) is the signal strength received by the node at the reference distance r_{0} = 1 m, and p_{i} (dBm) is the signal strength received at distance r_{i} (m). n is the path loss exponent, whose value depends on the environment and the type of building; the larger n is, the faster the signal strength decays as it propagates through the channel. In the actual measurements, the following model is used.
In the formula, the radio frequency parameter A is the RSSI value at a node distance of 1 m, d is the distance from the transmitting node, and n is the signal attenuation factor. To estimate the distance between nodes, the parameters n and A in Formula (9) must be determined; both depend on the specific environment. In this paper, the experiment was carried out in an open area of the campus. The node separation was set to a series of fixed values between 0 and 60 m, and the corresponding received signal strength was obtained through Linux programming. At a distance of 1 m, the received signal strength value is A = 36 dB, and the signal attenuation factor n = 3.051 was obtained by parameter fitting in MATLAB. Two nodes were used in the data transmission test, one at a fixed position and one moving; one node sent data continuously to the other, and successful delivery indicated normal communication. The theoretical curve and the measured data are shown in Fig. 2. When the test distance is no more than 60 m, the measured data fluctuate around the theoretical curve and communication is reliable.
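Formula (9) itself is not reproduced here, so the sketch below assumes the common log-distance form RSSI(d) = −(A + 10·n·lg d), which matches the description of A as the RSSI magnitude at 1 m and n as the attenuation factor. It plugs in the reported A = 36 and n = 3.051, inverts the model to estimate distance, and fits n by ordinary least squares with A fixed; this fitting procedure is a swapped-in illustration, not the MATLAB fit the paper used.

```python
import math
from typing import Sequence

A_DB = 36.0    # |RSSI| at 1 m reported for the campus test
N_FIT = 3.051  # attenuation factor reported for the campus test


def rssi_at(d_m: float, a: float = A_DB, n: float = N_FIT) -> float:
    """Predicted RSSI (dBm) at distance d_m, assuming RSSI = -(A + 10 n lg d)."""
    return -(a + 10.0 * n * math.log10(d_m))


def distance_from_rssi(rssi: float, a: float = A_DB, n: float = N_FIT) -> float:
    """Invert the assumed model: d = 10 ** ((|RSSI| - A) / (10 n))."""
    return 10.0 ** ((-rssi - a) / (10.0 * n))


def fit_n(distances: Sequence[float], rssis: Sequence[float], a: float = A_DB) -> float:
    """Least-squares n with A fixed: minimise sum (y_i - n x_i)^2 where
    x_i = 10 lg d_i and y_i = -RSSI_i - A, giving n = sum(x y) / sum(x^2)."""
    xs = [10.0 * math.log10(d) for d in distances]
    ys = [-r - a for r in rssis]
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("need at least one distance other than 1 m")
    return sum(x * y for x, y in zip(xs, ys)) / sxx
```

Under this assumed model, a node 10 m away would report about −66.5 dBm. Because RSSI depends on log distance, a fixed RSSI error of a few dB translates into a distance error that grows with range, one reason the measured points scatter more widely at larger separations.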
4.1.2 Data transfer testing
The bit error rate of data communication was obtained by transmitting 1 MB of test data between two nodes. The results are shown in Fig. 3. When the node distance is within 14 m, no errors occur in data transmission. As the distance increases, bit errors appear, and at 18 m the bit error rate increases greatly. Because the actual deployment distance between nodes is generally no more than 14 m, the node can achieve reliable wireless data communication.
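The bit error rate in this test is the number of flipped bits divided by the number of bits sent. A minimal sketch, assuming the sent and received payloads are available as equal-length byte strings (the function name is hypothetical):

```python
def bit_error_rate(sent: bytes, received: bytes) -> float:
    """Fraction of bits that differ between the sent and received payloads."""
    if len(sent) != len(received):
        raise ValueError("payloads must have the same length")
    if not sent:
        raise ValueError("empty payload")
    # XOR each byte pair and count the set bits, i.e. the flipped bits.
    errors = sum(bin(a ^ b).count("1") for a, b in zip(sent, received))
    return errors / (8 * len(sent))
```

For a 1 MB payload the denominator is eight times the payload length in bytes, so a single flipped bit already gives a nonzero rate; "no error within 14 m" means the count stayed at zero.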
4.2 Binocular visual video simulation
Figure 4 shows the flow chart of the algorithm in this article. First, the improved Census transform is applied to the left and right images, and the initial matching cost is obtained by the adaptive weight method. The costs are then aggregated to obtain the initial disparities of the left and right views. Finally, the final disparity map is generated through disparity refinement.
To verify the effectiveness of the algorithm, four benchmark color stereo pairs, Tsukuba, Venus, Teddy, and Cones, were tested on the standard stereo matching evaluation platform of the Middlebury website. The evaluation index is the ratio of mismatched pixels in the disparity map, defined as follows:
In the formula, N is the total number of pixels in the image, d_{c}(u, v) is the computed disparity at (u, v), d_{s}(u, v) is the ground-truth disparity at (u, v), and δ_{d} is the threshold; the mismatched pixel ratio reflects the proportion of mismatched pixels in the disparity map. To analyze the performance of the algorithm more objectively, we calculate the mismatched pixel ratio in different regions at threshold δ_{d} = 1 and compare it with the traditional Census [23], RT-Census [11], RIN-Census [24], and SAD-IGMCT [10] algorithms, as shown in Table 1. In Table 1, the Nocc, All, and Disc columns give the mismatched pixel ratio in non-occluded regions, over the whole image, and in depth-discontinuity regions, respectively.
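The metric counts pixels whose computed disparity differs from the ground truth by more than δ_d. The sketch below assumes the usual bad-pixel definition (1/N)·Σ[|d_c − d_s| > δ_d], since the formula is not reproduced here; the optional boolean mask is a hypothetical way to restrict the count to one region, as the Nocc and Disc columns do.

```python
from typing import List, Optional


def bad_pixel_ratio(d_c: List[List[float]], d_s: List[List[float]],
                    delta_d: float = 1.0,
                    mask: Optional[List[List[bool]]] = None) -> float:
    """Fraction of evaluated pixels with |d_c - d_s| > delta_d.

    If mask is given, only pixels where mask[v][u] is True are evaluated
    (e.g. non-occluded or depth-discontinuity regions)."""
    total = 0
    bad = 0
    for v, (row_c, row_s) in enumerate(zip(d_c, d_s)):
        for u, (dc, ds) in enumerate(zip(row_c, row_s)):
            if mask is not None and not mask[v][u]:
                continue
            total += 1
            if abs(dc - ds) > delta_d:
                bad += 1
    return bad / total if total else 0.0
```

Note that with δ_d = 1 an error of exactly one pixel is still counted as correct, so the metric tolerates the quantization error of integer disparities.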
The data in Table 1 show that the improved Census transform proposed in this paper achieves better matching than the traditional Census transform, and its results in depth-discontinuity regions, non-occluded regions, and all regions are better than those of the traditional methods. This is because the algorithm introduces adaptive weights: during the Census transform and cost aggregation, it treats the points in the window differently, assigning larger weights to points similar to the center and smaller weights to dissimilar points.
Because this algorithm uses the WTA strategy to select the initial disparity, the size of the Census window strongly influences the similarity calculation of the matching algorithm. Figure 5 shows the matching accuracy of this algorithm and the traditional Census algorithm in non-occluded regions for different Census window sizes when the cost aggregation window is 5 × 5. The traditional Census algorithm already outperforms other similarity measures when images are distorted by illumination and other factors. In addition to retaining this advantage, the proposed algorithm improves the matching precision of noise-corrupted image pairs to a certain extent. Figure 6 compares the matching results of the proposed algorithm and the traditional Census algorithm after random noise is added to the images. Figures 5 and 6 show that:

(1) The smaller the Census window size M, the higher the mismatch rate R; as M grows, R decreases. The main reason is that a small window has poor anti-noise ability, which makes mismatches likely.

(2) Once M increases beyond a certain value, the mismatch rate R rises again. This is because, in disparity-discontinuity regions, a larger window includes more pixels from across the discontinuity, which alters the Census transform result and causes mismatches.
5 Conclusions
A new WMSN node is designed and implemented using an FPGA and a binocular camera, combined with wireless communication and multiple sensors. The node can carry out image acquisition, compression, transmission, and other tasks and can obtain the depth information of the scene through the binocular stereo camera. To address the shortcomings of the traditional Census transform, an improved Census transform stereo matching algorithm is proposed. The difference between the mean gray value of the Census transform window and the gray value of the center pixel is compared with a threshold, and the new central pixel gray value is determined from the comparison result. This overcomes the traditional Census transform's excessive reliance on the central pixel. The adaptive weight method is used to obtain the initial matching cost, which improves matching accuracy, and adaptive weights are also used at the cost aggregation stage. Finally, a dense disparity map is obtained by disparity refinement. The algorithm is simple in structure and low in complexity; it improves the matching accuracy of Census-based stereo matching, is suitable for hardware implementation, and is applied to the nodes of the multimedia sensor network. Through the Zigbee wireless communication protocol of the WMSN node, the scheme realizes dynamic networking of multiple nodes and rapid transmission of data and information, offering strong computing power and visual matching accuracy while reducing power consumption.
References
[1] CK Liang, YC Cheng, CF Li, A virtual force based movement scheme for area coverage in directional sensor networks (Proc. 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Kitakyushu, 2014), pp. 718–722
[2] W Feng, B Code, E Kaiser, et al., Panoptes: a scalable architecture for video sensor networking applications (Proc. ACM International Conference on Multimedia, New York, 2003), pp. 151–167
[3] H Du, W Wu, Q Ye, D Li, W Lee, X Xu, CDS-based virtual backbone construction with guaranteed routing cost in wireless sensor networks. IEEE Trans. Parallel Distrib. Syst. 24(4), 652–661 (2013)
[4] B Tavli, K Bicakci, R Zilan, et al., A survey of visual sensor network platforms. Multimedia Tools Appl. 60, 689–726 (2012)
[5] S Cheng, Z Cai, J Li, H Gao, Extracting kernel dataset from big sensory data in wireless sensor networks. IEEE Trans. Knowl. Data Eng. 29(4), 813–827 (2017)
[6] Z He, Z Cai, J Yu, X Wang, Y Sun, Y Li, Cost-efficient strategies for restraining rumor spreading in mobile social networks. IEEE Trans. Veh. Technol. 66(3), 2789–2900 (2017)
[7] N Einecke, J Eggert, A two-stage correlation method for stereoscopic depth estimation (Proc. International Conference on Digital Image Computing: Techniques and Applications, Sydney, 2010), pp. 227–234
[8] F Tombari, S Mattoccia, L Di Stefano, Segmentation-based adaptive support for accurate stereo correspondence (Proc. IEEE Pacific-Rim Symposium on Image and Video Technology, Chile, 2007), pp. 427–438
[9] H Hirschmuller, D Scharstein, Evaluation of stereo matching costs on images with radiometric differences. IEEE Trans. Pattern Anal. Mach. Intell. 31(9), 1582–1599 (2009)
[10] K Ambrosch, W Kubinger, Accurate hardware-based stereo vision. Comput. Vis. Image Underst. 114(11), 1303–1316 (2010)
[11] M Humenberger, C Zinner, A fast stereo matching algorithm suitable for embedded real-time systems. Comput. Vis. Image Underst. 114(11), 1180–1202 (2010)
[12] R Zabih, J Woodfill, Non-parametric local transforms for computing visual correspondence (Proc. Third European Conference on Computer Vision, Stockholm, 1994), pp. 151–158
[13] KJ Yoon, IS Kweon, Adaptive support-weight approach for correspondence search. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 650–656 (2006)
[14] S Perri, P Corsonello, G Cocorullo, Adaptive census transform: a novel hardware-oriented stereo vision algorithm. Comput. Vis. Image Underst. 117(1), 29–41 (2013)
[15] YC Chang, TH Tsai, BH Hsu, et al., Algorithm and architecture of disparity estimation with mini-census adaptive support weight. IEEE Trans. Circuits Syst. Video Technol. 20(6), 792–805 (2010)
[16] JZ Wang, HJ Zhu, J Li, A census transform based stereo matching algorithm using variable support weight. Transactions of Beijing Institute of Technology 33(7), 704–710 (2013)
[17] SP Zhu, LN Yan, Z Li, Stereo matching algorithm based on improved census transform and dynamic programming. Acta Opt. Sin. 36(4), 0415001-1–0415001-9 (2016)
[18] WW Zhou, WG Jin, Novel stereo matching algorithm for adaptive weight census transform. Comput. Eng. Appl. 52(16), 192–197 (2016)
[19] J XJ Peng, YT Han, et al., Anti-noise stereo matching algorithm based on improved census transform and outlier elimination. Acta Opt. Sin. 37(11), 1115004-1–1115004-9 (2017)
[20] SE Palmer, Modern theories of Gestalt perception. Mind Lang. 5(4), 289–293 (1990)
[21] KJ Yoon, IS Kweon, Adaptive support-weight approach for correspondence search. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 650–656 (2006)
[22] X Chang, Z Zhou, L Wang, et al., Real-time accurate stereo matching using modified two-pass aggregation and winner-take-all guided dynamic programming (Proc. International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission, IEEE, 2011), pp. 73–79
[23] PJ Besl, ND McKay, A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239–256 (1992)
[24] L Ma, JJ Li, J Ma, Modified census transform with related information of neighborhood for stereo matching algorithm. Comput. Eng. Appl. 50(24), 16–20 (2014)
Funding
This work was supported by the Research Funds of Wuxi Institute of Technology, 2015, and the Talent Funds of University in 2016, No. gxfxZD2016175.
Author information
Contributions
SJ is the main author of the current paper. WYZ has helped revise the manuscript. SYN has given critical revision of the article. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Jin, S., Yuanzhi, W. & Yining, S. Design and implementation of wireless multimedia sensor network node based on FPGA and binocular vision. J Wireless Com Network 2018, 163 (2018). https://doi.org/10.1186/s13638-018-1172-8
DOI: https://doi.org/10.1186/s13638-018-1172-8