
# Design and implementation of wireless multimedia sensor network node based on FPGA and binocular vision

Shang Jin^{1}, Wang Yuanzhi^{2, 3} (corresponding author) and Sun Yining^{2}

**2018**:163

https://doi.org/10.1186/s13638-018-1172-8

© The Author(s). 2018

**Received:** 22 March 2018 **Accepted:** 1 June 2018 **Published:** 26 June 2018

## Abstract

To meet the practical demand of wireless sensor networks for multimedia information acquisition, a new multimedia wireless sensor node is designed and implemented with an FPGA as its core and Zigbee for wireless communication. The node is equipped with sensors such as a binocular camera and an accelerometer to perceive and acquire environmental information and image data in real time, and it provides short-range wireless communication and detection through the Zigbee protocol. When the test distance does not exceed 60 m, the measured data fluctuate around the theoretical curve and communication is reliable; when the nodes are within 14 m of each other, data are transmitted without error. Because the traditional Census transform has a high mismatch rate when the images are affected by noise, an improved Census transform stereo matching algorithm is proposed and applied to the multimedia sensor network node. The experimental results show that the scheme realizes data transmission and processing on wireless sensor network nodes and offers strong computing capability and visual matching precision while reducing power consumption.

## Keywords

- Sensor network
- Multimedia node
- FPGA
- Zigbee communication
- Binocular vision

## 1 Introduction

A wireless sensor network (WSN) is a technology that combines sensing with network communication. It makes it easy to monitor, collect, and obtain information quickly, and it meets the needs of military, medical, industrial, agricultural, and environmental monitoring applications. Wireless multimedia sensor networks introduce image, audio, video, and other media into the WSN to enable more precise and detailed monitoring [1].

In hardware terms, a wireless multimedia sensor node adds visual perception to a conventional wireless sensor node. Typical multimedia sensor nodes include MeshEye, Panoptes [2], Cyclops, and CMUcam [3, 4].

In [5], Feng et al. defined the Panoptes video-based sensor networking architecture. Their paper describes a video sensor platform that can deliver high-quality video over 802.11 networks with a power requirement of approximately 5 W, together with the streaming and prioritization mechanisms that allow it to survive long periods of disconnected operation. In [6], Du et al. note that constructing a virtual backbone based on a Connected Dominating Set (CDS) is a competitive approach among the existing methods for establishing a virtual backbone in WSNs. Under the Unit Disk Graph (UDG) model, they propose GOC-MCDS-C, a polynomial-time constant-approximation algorithm that produces a CDS \( D \) whose size \( |D| \) is within a constant factor of that of the minimum CDS.

Each node is the core of a wireless sensor network, and node processors are currently DSPs or ARM processors; across a wide range of application domains, their data-processing capability and flexibility are limited. An FPGA can process data in parallel and handle many different tasks simultaneously. Building the wireless node around an FPGA therefore offers advantages in parallel processing, high-speed fixed-point arithmetic, and complex computation.

Stereo matching is one of the active research topics in computer vision. A binocular image pair is captured either by two cameras at different positions or by one camera that is moved and rotated; the disparity between the two image points that correspond to the same spatial point is then computed. After a series of inverse projection transformations, the 3D information of the spatial point is obtained. Stereo matching is widely used in robot navigation, virtual reality, 3D reconstruction, and other applications.
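For a rectified binocular pair, the inverse projection mentioned above reduces to a well-known relation: depth \( Z = fB/d \), where \( f \) is the focal length in pixels, \( B \) the baseline, and \( d \) the disparity. The Python sketch below assumes an ideal rectified pinhole stereo pair with principal point \( (c_u, c_v) \); the function names and the numbers in the usage line are hypothetical and are not the parameters of the node's camera.

```python
from typing import Tuple


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for an ideal rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


def back_project(u: float, v: float, disparity_px: float, focal_px: float,
                 baseline_m: float, cu: float, cv: float) -> Tuple[float, float, float]:
    """Recover (X, Y, Z) in the left-camera frame from a left-image pixel and its disparity."""
    z = depth_from_disparity(disparity_px, focal_px, baseline_m)
    x = (u - cu) * z / focal_px
    y = (v - cv) * z / focal_px
    return x, y, z


# Hypothetical camera: f = 700 px, B = 0.12 m; a 42-pixel disparity gives Z of about 2 m.
print(back_project(390.0, 240.0, 42.0, 700.0, 0.12, 320.0, 240.0))
```

Because \( Z \) is inversely proportional to \( d \), a one-pixel disparity error shifts the depth of distant points far more than that of near ones, which is why matching precision matters so much for depth perception.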

Existing stereo matching algorithms can be roughly divided into global and local algorithms. Local algorithms mainly use gray-level correlation measures; typical measures include normalized cross-correlation [7], segmentation-based support [8], and the non-parametric Census transform [9].

The non-parametric Census transform replaces each pixel's gray value with a code that describes the relationship between the central element of its window and the other window elements. Because it depends only on the relative ordering of gray values, it yields accurate disparity maps even when the images are noisy or suffer amplitude distortion, which makes it highly robust.

In [10], the gradient map of the image is introduced into the Census transform for matching, but the corresponding coefficients must be tuned to achieve good results. In [11], the original dense matrix is transformed into a sparse one, yet the mismatch rate in edge regions is still high. The Census stereo matching algorithm proposed in [12] improves robustness somewhat, but because of the information contained in the pixels of the window, it reduces the matching quality of pixels in depth-discontinuous regions. Adaptive-weight stereo matching algorithms are proposed in [13, 14]; they clearly improve the matching of image edge pixels, but their complexity is high, which hinders hardware implementation. Reference [15] proposes a stereo matching algorithm based on the Mini-Census transform, which applies the Census transform to six fixed points in the window. References [16, 17] use the mean value of the neighborhood window as the reference value for the Census transform, which improves the algorithm's robustness to noise. In [18], an adaptive-weight matching algorithm is proposed, but its real-time performance is reduced. In [19], an anti-noise stereo matching algorithm with outlier elimination is proposed, but its matching accuracy for line targets in the image is not high, and matching errors remain in depth-discontinuous areas.

This paper builds on a high-performance FPGA, the Zigbee wireless transmission protocol, and binocular vision theory to construct a wireless multimedia sensor node that supports image processing and stereo vision. Beyond basic image acquisition and processing, the node aims to obtain scene depth and thus achieve stereoscopic perception of the scene, giving it stronger image perception and understanding capabilities.

## 2 Multimedia node system architecture and functions

### 2.1 Node block diagram

### 2.2 Interface design

The binocular camera signal is converted into a digital signal by a dedicated video decoder chip and passed to the processor for reception and processing, and the processing results are sent to the Zigbee module over the SPI bus. Conversely, the Zigbee module can send control signals over the SPI bus to the main processor to control the FPGA's working process. The accelerometer, electronic compass, and video decoder chip exchange information with the main controller over the I2C bus. The temperature and humidity sensors are read through an 8-bit ADC interface. The pyroelectric infrared sensor detects the human body by triggering an external interrupt of the main control module.

## 3 Binocular stereo matching technique

### 3.1 Census transformation improvement

#### 3.1.1 An overview of the traditional Census transform stereo matching algorithm

Let \( I(u, v) \) denote the gray value of the pixel at coordinates \( (u, v) \), taken as the center point, and let \( q \) denote any pixel in the window \( N_{(u,v)} \) centered on \( (u, v) \). The corresponding binary code stream is obtained from the following mapping:

\[ \xi\big(I(u,v), I(q)\big) = \begin{cases} 1, & I(q) < I(u,v) \\ 0, & \text{otherwise} \end{cases}, \qquad I_{\text{Census}}(u,v) = \bigotimes_{q \in N_{(u,v)}} \xi\big(I(u,v), I(q)\big) \]

In the formula, \( I_{\text{Census}}(u, v) \) is the Census transform code of the central pixel, and \( \otimes \) denotes bitwise concatenation.

The traditional Census transform greatly improves the robustness of stereo matching to noise. However, because the transform uses only the gray value of the center pixel and the gray values of the points in its neighborhood, it has inherent limitations. First, when the gray value of the center pixel is corrupted by noise or distortion, the Census transform has difficulty finding the correct matching points. Second, the Census transform is merely a non-parametric transformation of the original gray values and does not exploit the correlation between pixels.
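To make the center-pixel sensitivity concrete, the Python sketch below implements a plain Census transform and the Hamming distance commonly used as its matching cost. Assumptions: grayscale images stored as nested lists indexed `img[v][u]`, a bit set to 1 when \( I(q) < I(u,v) \), and pixels outside the image clamped to the border; the function names are illustrative.

```python
from typing import List

Image = List[List[int]]


def census_code(img: Image, u: int, v: int, radius: int) -> int:
    """Census code of pixel (u, v): one bit per neighbour q in the
    (2*radius+1)^2 window, set to 1 when I(q) < I(u, v)."""
    h, w = len(img), len(img[0])
    center = img[v][u]
    code = 0
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            if du == 0 and dv == 0:
                continue  # the centre is not compared with itself
            qv = min(max(v + dv, 0), h - 1)  # clamp to the image border
            qu = min(max(u + du, 0), w - 1)
            code = (code << 1) | (1 if img[qv][qu] < center else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two Census codes (the matching cost)."""
    return bin(a ^ b).count("1")


patch = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
brighter = [[x + 25 for x in row] for row in patch]  # uniform brightness offset
noisy = [row[:] for row in patch]
noisy[1][1] = 75                                     # noise on the centre pixel only
base = census_code(patch, 1, 1, 1)
print(bin(base), hamming(base, census_code(brighter, 1, 1, 1)),
      hamming(base, census_code(noisy, 1, 1, 1)))   # prints: 0b11110000 0 2
```

The brightness offset leaves the code unchanged, which illustrates the robustness to amplitude distortion, whereas changing only the center value from 50 to 75 flips two bits, which is the center-pixel weakness described above.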

#### 3.1.2 Improved Census transform

The improved Census transform first computes the mean gray value \( I_m(u, v) \) of all the pixels in the transformation window \( w \) centered on the target pixel \( I(u, v) \). The gray value of the center point is then compared with this mean, and the absolute value of the difference is taken. Finally, the gray value of the central pixel is determined by comparing this absolute difference with a threshold. The gray value of the center pixel is defined as follows:

\[ I'(u,v) = \begin{cases} I_m(u,v), & |I(u,v) - I_m(u,v)| > T_a \\ I(u,v), & |I(u,v) - I_m(u,v)| \le T_a \end{cases} \]

In the formula, \( T_a \) is a preset threshold. If \( T_a \) is small and a center pixel that has not actually changed abruptly is misjudged as an abrupt change (a mutation), the effect on the Census transform code is small, because the difference between the mean and the central pixel is then not pronounced. If \( T_a \) is large and a center pixel that has actually changed abruptly is misjudged as unchanged, the Census transform code is strongly affected, which degrades matching accuracy. A smaller threshold should therefore be chosen. By analyzing adjacent points in multiple images and simulating the Census transform under different thresholds, we found the suitable threshold interval to be [15, 24]; the threshold used in this algorithm is 20.
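The Python sketch below applies the improved transform under our reading of the description: the comparison reference is the window mean \( I_m(u,v) \) when \( |I(u,v) - I_m(u,v)| > T_a \), and the original center value otherwise, with \( T_a = 20 \). The mean is taken over all window pixels including the center; the exact replacement rule, the border clamping, and the helper names are assumptions.

```python
from typing import List

Image = List[List[int]]


def window_values(img: Image, u: int, v: int, radius: int) -> List[int]:
    """Row-major values of the (2r+1) x (2r+1) window centred on (u, v), border-clamped."""
    h, w = len(img), len(img[0])
    return [img[min(max(v + dv, 0), h - 1)][min(max(u + du, 0), w - 1)]
            for dv in range(-radius, radius + 1)
            for du in range(-radius, radius + 1)]


def reference_value(img: Image, u: int, v: int, radius: int, t_a: float = 20.0) -> float:
    """Window mean if the centre deviates from it by more than T_a, else the centre itself."""
    vals = window_values(img, u, v, radius)
    mean = sum(vals) / len(vals)
    center = img[v][u]
    return mean if abs(center - mean) > t_a else float(center)


def improved_census_code(img: Image, u: int, v: int, radius: int, t_a: float = 20.0) -> int:
    """Census code that compares each neighbour with the (possibly replaced) reference value."""
    ref = reference_value(img, u, v, radius, t_a)
    vals = window_values(img, u, v, radius)
    center_index = len(vals) // 2  # the centre sits in the middle of the row-major list
    code = 0
    for i, q in enumerate(vals):
        if i == center_index:
            continue
        code = (code << 1) | (1 if q < ref else 0)
    return code


patch = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
noisy = [row[:] for row in patch]
noisy[1][1] = 75  # simulated impulse noise on the centre pixel
print(bin(improved_census_code(patch, 1, 1, 1)),
      bin(improved_census_code(noisy, 1, 1, 1)))  # both print 0b11110000
```

With the plain transform, this noisy center flips two bits; here the code is unchanged, because 75 differs from the window mean (about 52.8) by more than \( T_a \) and is therefore replaced by that mean.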

### 3.2 Stereo matching process

#### 3.2.1 Initial matching cost calculation

The adaptive-weight algorithm treats the pixels in the aggregation window differently during cost aggregation. It follows the Gestalt principles of similarity and proximity: within a fixed-size support window, each pixel receives a weight according to its color difference and spatial distance from the pixel being matched, which makes the aggregation window more reliable. The concrete implementation process is as follows:

A transform window of size \( W_T \times W_T \) is taken in the right image. \( I_{(ur, v)} \) is the central pixel after replacement, and \( I_{qr} \) is a neighborhood pixel other than the central pixel; there are \( W_T^2 - 1 \) such pixels in total. According to Gestalt theory [20], each pixel in the neighborhood is assigned a weight \( W(R, P) \): the closer the pixel information of a neighborhood pixel is to that of the central pixel, the greater its weight, and vice versa. The \( k \)th element after weighting is expressed by the piecewise function (4).

Here, \( c(R, P) = \exp(-\Delta c / \gamma_c) \), where \( \Delta c \) is the Euclidean distance between the neighborhood pixel and the central pixel, and \( \gamma_c \) is a constant that adjusts the magnitude of \( c \). The same method is applied to the left transform window to obtain \( WC_{(ul, v)}(k) \), and the matching cost between the two transform windows is obtained as follows:
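The similarity weight \( c(R,P) = \exp(-\Delta c/\gamma_c) \) is stated explicitly, but the piecewise function (4) and the window-to-window cost are not reproduced in this text. The Python sketch below therefore substitutes the normalized weighted form of Yoon and Kweon's adaptive support weights [13] applied to Census bits: each bit disagreement is weighted by the product of the right- and left-window weights. It also assumes grayscale images, so \( \Delta c \) becomes an absolute gray-level difference, and uses a hypothetical \( \gamma_c = 10 \).

```python
import math
from typing import List, Sequence


def support_weights(center: float, neighbours: Sequence[float],
                    gamma_c: float = 10.0) -> List[float]:
    """Weight c = exp(-dc / gamma_c) for each neighbour, with dc = |I(q) - I(centre)|."""
    return [math.exp(-abs(q - center) / gamma_c) for q in neighbours]


def weighted_census_cost(bits_r: Sequence[int], bits_l: Sequence[int],
                         w_r: Sequence[float], w_l: Sequence[float]) -> float:
    """Normalised weighted Hamming distance between two Census bit vectors."""
    num = 0.0
    den = 0.0
    for br, bl, wr, wl in zip(bits_r, bits_l, w_r, w_l):
        w = wr * wl                        # combined right/left support weight
        num += w * (1 if br != bl else 0)  # only disagreeing bits add cost
        den += w
    return num / den if den > 0 else 0.0


center = 50.0
neighbours = [50.0, 52.0, 120.0, 48.0]  # the third pixel likely lies on another surface
w = support_weights(center, neighbours)
bits_r = [1, 0, 1, 0]
bits_l = [1, 0, 0, 0]                   # only the dissimilar pixel's bit disagrees
print(weighted_census_cost(bits_r, bits_l, w, w))  # far below the unweighted 1/4
```

The design choice is the one described in the text: a disagreement at a neighbor that resembles the center counts almost fully, while one at a dissimilar neighbor (likely across a depth edge) is nearly ignored.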

#### 3.2.2 Matching cost aggregation

A matching window of size \( W \times W \) is selected, and the method of [21] is used to assign a different weight to each pixel in the window; the final aggregated cost is

Here, \( c_A(R, P) \) is the weight of a neighborhood pixel in the matching window of the right image, \( c_A(L, Q) \) is the corresponding weight in the left image, \( W = 2s + 1 \), and \( s \) is the radius of the matching window. After the matching cost \( D \) of all pixels has been calculated, the local Winner-Take-All (WTA) optimization strategy [20] selects, for each pixel, the disparity with the lowest aggregated cost as the initial disparity, \( d_0 = \underset{d}{\arg\min}\, D_{(ur,\, ul,\, v)} \), where \( ul = ur + d \).
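A minimal Python sketch of the WTA step, assuming the aggregated costs are stored as a cost volume `cost_volume[v][u][i]` holding the cost for disparity `d_min + i`; ties resolve to the smallest disparity. The layout and the names are illustrative assumptions.

```python
from typing import List, Sequence

CostVolume = List[List[List[float]]]


def wta_disparity(costs: Sequence[float], d_min: int = 0) -> int:
    """Return the disparity whose aggregated cost is lowest (first one on ties)."""
    if len(costs) == 0:
        raise ValueError("empty cost vector")
    best_index = min(range(len(costs)), key=lambda i: costs[i])
    return d_min + best_index


def wta_map(cost_volume: CostVolume, d_min: int = 0) -> List[List[int]]:
    """Apply WTA independently at every pixel to obtain the initial disparity map."""
    return [[wta_disparity(pixel_costs, d_min) for pixel_costs in row]
            for row in cost_volume]


volume = [[[3.0, 1.0, 2.0], [0.5, 4.0, 4.0]]]  # one row, two pixels, three disparities
print(wta_map(volume))                          # [[1, 0]]
```

WTA decides each pixel independently, with no smoothness term, so its output still contains mismatches and occlusion errors; this is why the disparity refinement step of Section 3.2.3 follows.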

#### 3.2.3 Disparity refinement

Outliers are detected with a left-right consistency check:

\[ \big| d_l(p) - d_r\big(p - [d_l(p), 0]\big) \big| \le \delta_0 \]

In the formula, \( d_l(p) \) is the disparity of point \( p \) in the left view, and \( d_r(p - [d_l(p), 0]) \) is the disparity of the corresponding point \( p - [d_l(p), 0] \) in the right view. \( \delta_0 \) is the tolerance threshold, here 1: if the disparities of the left and right points differ by more than 1, the point is an outlier; otherwise, it is a normal point. The outliers are then divided into mismatched points and occluded points according to the principle of epipolar geometry, and each type is processed separately: a mismatched point is filled with the disparity of the surrounding point whose gray value is closest to its own, and an occluded point is filled with the minimum disparity value around it. Finally, subpixel enhancement is used to reduce the error caused by discrete disparities, and a 3 × 3 sliding-window median filter is applied to the disparity map to generate the final disparity map.
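The Python sketch below covers three of these refinement steps: the left-right check with \( \delta_0 = 1 \), parabola-based subpixel enhancement (a common choice; the text does not say which subpixel method is used), and a 3 × 3 median filter. Assumptions: disparity maps are nested lists indexed `[v][u]`, points that map outside the right image count as outliers, and border pixels are clamped in the median filter. Outlier classification and filling are omitted.

```python
from statistics import median
from typing import List

DispMap = List[List[float]]


def lr_consistent(disp_left: DispMap, disp_right: DispMap,
                  delta0: float = 1.0) -> List[List[bool]]:
    """True where |d_l(p) - d_r(p - [d_l(p), 0])| <= delta0; False marks an outlier."""
    h, w = len(disp_left), len(disp_left[0])
    ok = [[False] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            d = disp_left[v][u]
            ur = int(round(u - d))  # matching column in the right view
            if 0 <= ur < w:
                ok[v][u] = abs(d - disp_right[v][ur]) <= delta0
    return ok


def subpixel(d: int, c_prev: float, c0: float, c_next: float) -> float:
    """Vertex of the parabola through the costs at d - 1, d and d + 1."""
    denom = c_prev - 2.0 * c0 + c_next
    if denom <= 0:  # flat or non-convex cost: keep the integer disparity
        return float(d)
    return d + (c_prev - c_next) / (2.0 * denom)


def median3x3(disp: DispMap) -> DispMap:
    """3 x 3 sliding-window median filter with border clamping."""
    h, w = len(disp), len(disp[0])
    out = [row[:] for row in disp]
    for v in range(h):
        for u in range(w):
            vals = [disp[min(max(v + dv, 0), h - 1)][min(max(u + du, 0), w - 1)]
                    for dv in (-1, 0, 1) for du in (-1, 0, 1)]
            out[v][u] = median(vals)
    return out


left = [[2, 2, 2, 2, 2]]
right = [[2, 5, 2, 2, 2]]
print(lr_consistent(left, right))  # [[False, False, True, False, True]]
```

In the example, the first two pixels map outside the right image, and the pixel at column 3 maps to a right-view disparity of 5, so all three are flagged as outliers for the subsequent filling step.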

## 4 Simulation results and analysis

### 4.1 Communication simulation

#### 4.1.1 Communications distance test

The received signal strength is described by the log-distance path-loss model

\[ p_i = p_0 - 10\,n \lg\left(\frac{r_i}{r_0}\right) \]

where \( p_0 \) (dBm) is the signal strength received by the node at distance \( r_0 = 1 \) m, \( p_i \) (dBm) is the signal strength received at distance \( r_i \) (m), and \( n \) is the path-loss exponent, whose value depends on the environment and the type of building. The larger \( n \) is, the faster the signal strength decays as the signal propagates through the channel. In the actual measurements, the following model is used.

Here, \( A \) is the RSSI value at a node distance of 1 m, \( d \) is the distance from the transmitting node, and \( n \) is the signal attenuation factor. To estimate the distance between nodes, the parameters \( n \) and \( A \) in Formula (9) must be determined; both depend on the specific environment. In this paper, the experiment was carried out in an open area of the campus. The nodes were placed at a series of known distances between 0 and 60 m, and the corresponding received signal strength was recorded by a program running under Linux. At a distance of 1 m, the received signal strength value was \( A = 36 \) dB, and parameter fitting in MATLAB gave a signal attenuation factor of \( n = 3.051 \). Two nodes were used for the data-sending test, one at a fixed position and one moving; one node continuously sent data to the other, and successful delivery indicated normal communication. The theoretical curve and measured data are shown in Fig. 2. When the test distance does not exceed 60 m, the measured data fluctuate around the theoretical curve, and the communication is reliable.
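Formula (9) itself is not reproduced in this text. Assuming it has the common form \( \mathrm{RSSI}(d) = -(A + 10\,n \lg d) \), with \( A \) the magnitude of the RSSI at 1 m (consistent with \( A = 36 \) dB being positive), \( n \) can be fitted by least squares with \( A \) fixed, and a measured RSSI can be inverted into a distance estimate. The Python sketch below does both; the model form is an assumption, and the readings are synthetic values generated from the model, not the campus measurements.

```python
import math
from typing import Sequence


def rssi_model(d_m: float, a_db: float = 36.0, n: float = 3.051) -> float:
    """Assumed form of Formula (9): RSSI(d) = -(A + 10 n lg d), in dBm."""
    return -(a_db + 10.0 * n * math.log10(d_m))


def fit_n(distances: Sequence[float], rssi_dbm: Sequence[float],
          a_db: float = 36.0) -> float:
    """Least-squares n with A fixed: minimise sum (y_i - n x_i)^2,
    where x_i = 10 lg d_i and y_i = -RSSI_i - A, giving n = sum(x y) / sum(x^2)."""
    xs = [10.0 * math.log10(d) for d in distances]
    ys = [-r - a_db for r in rssi_dbm]
    sxx = sum(x * x for x in xs)
    if sxx == 0.0:
        raise ValueError("need at least one distance other than 1 m")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


def estimate_distance(rssi_dbm: float, a_db: float = 36.0, n: float = 3.051) -> float:
    """Invert the model: d = 10 ** ((-RSSI - A) / (10 n))."""
    return 10.0 ** ((-rssi_dbm - a_db) / (10.0 * n))


distances = [1.0, 2.0, 5.0, 10.0, 20.0, 40.0, 60.0]  # metres, within the 0-60 m range
synthetic = [rssi_model(d) for d in distances]        # noise-free synthetic readings
print(round(fit_n(distances, synthetic), 3))          # recovers n = 3.051
print(round(estimate_distance(rssi_model(14.0)), 2))  # about 14 m
```

With real readings, the fitted \( n \) absorbs multipath and antenna effects, so, as the text notes, it is only valid for the environment in which it was measured.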

#### 4.1.2 Data transfer testing

### 4.2 Binocular visual video simulation

The false matching pixel ratio is computed as

\[ R = \frac{1}{N} \sum_{(u,v)} \big( |d_c(u,v) - d_s(u,v)| > \delta_d \big) \]

where \( N \) is the total number of pixels in the image, \( d_c(u, v) \) is the computed disparity at \( (u, v) \), \( d_s(u, v) \) is the ground-truth disparity at \( (u, v) \), \( \delta_d \) is the threshold, and the term in parentheses equals 1 when the condition holds and 0 otherwise. The false matching pixel ratio thus gives the proportion of mismatched pixels in the disparity map. To analyze the performance of the algorithm more objectively, we calculate the mismatched pixel ratio in different regions with threshold \( \delta_d = 1 \) and compare it with the traditional Census [23], RTCensus [11], RINCensus [24], and SAD-IGMCT [10] algorithms, as shown in Table 1. The columns Nocc, All, and Disc give the mismatched pixel ratio in non-occluded regions, over all pixels, and in depth-discontinuity regions, respectively.
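The metric can be computed as below. This Python sketch follows the usual Middlebury-style bad-pixel definition; the optional `mask` argument (a hypothetical parameter) restricts the count to a region such as the non-occluded or depth-discontinuity pixels behind the Nocc and Disc columns. Region masks and ground-truth maps come with the benchmark and are not included here.

```python
from typing import List, Optional

Grid = List[List[float]]


def bad_pixel_ratio(d_c: Grid, d_s: Grid, delta_d: float = 1.0,
                    mask: Optional[List[List[bool]]] = None) -> float:
    """Percentage of evaluated pixels whose |d_c - d_s| exceeds delta_d."""
    total = 0
    bad = 0
    for v, (row_c, row_s) in enumerate(zip(d_c, d_s)):
        for u, (dc, ds) in enumerate(zip(row_c, row_s)):
            if mask is not None and not mask[v][u]:
                continue  # pixel outside the evaluated region
            total += 1
            if abs(dc - ds) > delta_d:
                bad += 1
    if total == 0:
        raise ValueError("no pixels to evaluate")
    return 100.0 * bad / total


computed = [[1, 2, 3], [4, 5, 6]]
truth = [[1, 2, 5], [4, 7, 6]]
print(bad_pixel_ratio(computed, truth))  # two of six pixels are off by 2
```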

Table 1 Performance comparison between the proposed algorithm and some common local algorithms (mismatched pixel ratio, %)

| Algorithm | Tsukuba Nocc | Tsukuba All | Tsukuba Disc | Venus Nocc | Venus All | Venus Disc | Teddy Nocc | Teddy All | Teddy Disc | Cones Nocc | Cones All | Cones Disc | Avg (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Traditional Census | 10.61 | 12.20 | 21.82 | 3.42 | 4.83 | 10.80 | 14.82 | 22.10 | 28.60 | 11.40 | 18.81 | 17.10 | 16.20 |
| RTCensus | 5.08 | 6.25 | 19.20 | 1.58 | 2.42 | 14.20 | 7.96 | 13.80 | 20.3 | 4.10 | 9.54 | 12.20 | 9.73 |
| RINCensus | 4.78 | 6.00 | 14.40 | 1.11 | 1.76 | 7.91 | 9.76 | 17.30 | 26.1 | 8.09 | 16.20 | 14.90 | 10.90 |
| SAD-IGMCT | 5.81 | 7.14 | 22.60 | 2.61 | 3.33 | 25.30 | 9.79 | 15.50 | 25.7 | 5.08 | 11.50 | 15.00 | 12.50 |
| Proposed algorithm | 4.65 | 5.87 | 13.51 | 1.02 | 1.68 | 7.83 | 8.58 | 15.46 | 21.80 | 4.69 | 9.58 | 12.42 | 8.92 |

The data in Table 1 show that the improved Census transform proposed in this paper matches better than the traditional Census transform: on all four image pairs, its mismatched pixel ratios in depth-discontinuity regions, in non-occluded regions, and over all pixels are lower than those of the traditional method, and its average mismatch ratio (8.92%) is the lowest among the compared algorithms. This is because the algorithm introduces adaptive weights: during the Census transform and cost aggregation, it treats the points in the window differently, assigning larger weights to points similar to the center and smaller weights to dissimilar ones.

- (1) The smaller the Census window size \( M \), the higher the false matching rate \( R \); as \( M \) increases, \( R \) decreases. The main reason is that when \( M \) is small, the window has poor noise resistance, which easily causes mismatches.
- (2) As \( M \) keeps increasing, \( R \) rises again once \( M \) exceeds a certain value. This is because, in disparity-discontinuity areas, the larger the window, the more the Census transform result is disturbed, which leads to mismatches.

## 5 Conclusions

The new WMSN node is designed and implemented with an FPGA and a binocular camera, combined with wireless communication and multiple sensors. The node can perform image acquisition, compression, transmission, and other tasks, and it obtains depth information about the scene through the binocular stereo camera. To address the shortcomings of the traditional Census transform, an improved Census transform stereo matching algorithm is proposed. The difference between the mean gray value of the Census transform window and the gray value of the center pixel is compared with a threshold, and the new gray value of the central pixel is determined from the comparison result. This overcomes the excessive dependence of the traditional Census transform on the central pixel. The adaptive-weight method is used to compute the initial matching cost, which improves matching accuracy, and adaptive weights are also used in the cost aggregation stage. Finally, a dense disparity map is obtained through disparity refinement. The algorithm has a simple structure and low complexity; it improves the accuracy of Census-transform-based stereo matching, is suitable for hardware implementation, and is applied to the multimedia sensor network node. Using the Zigbee wireless communication protocol of the WMSN node, the scheme enables dynamic networking of multiple nodes and rapid transmission of data and information, providing strong computing power and visual matching accuracy while reducing power consumption.

## Declarations

### Funding

This work was supported by the Research Funds of Wuxi Institute of Technology (2015) and the Talent Funds of University in 2016, No. gxfxZD2016175.

### Authors’ contributions

SJ is the main author of the current paper. WYZ has helped revise the manuscript. SYN has given critical revision of the article. All authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## Authors’ Affiliations

## References

- CK Liang, YC Cheng, CF Li, A virtual force based movement scheme for area coverage in directional sensor networks, in Proc. 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing (Kitakyushu, 2014), pp. 718–722
- W Feng, B Code, E Kaiser, et al., Panoptes: a scalable architecture for video sensor networking applications, in Proc. ACM International Conference on Multimedia (New York, 2003), pp. 151–167
- H Du, W Wu, Q Ye, D Li, W Lee, X Xu, CDS-based virtual backbone construction with guaranteed routing cost in wireless sensor networks. IEEE Trans. Parallel Distrib. Syst. 24(4), 652–661 (2013)
- B Tavli, K Bicakci, R Zilan, A survey of visual sensor network platforms. Multimedia Tools Appl. 60, 689–726 (2012)
- S Cheng, Z Cai, J Li, H Gao, Extracting kernel dataset from big sensory data in wireless sensor networks. IEEE Trans. Knowl. Data Eng. 29(4), 813–827 (2017)
- Z He, Z Cai, J Yu, X Wang, Y Sun, Y Li, Cost-efficient strategies for restraining rumor spreading in mobile social networks. IEEE Trans. Veh. Technol. 66(3), 2789–2900 (2017)
- N Einecke, J Eggert, A two-stage correlation method for stereoscopic depth estimation, in Proc. International Conference on Digital Image Computing: Techniques and Applications (Sydney, 2010), pp. 227–234
- F Tombari, S Mattoccia, L Di Stefano, Segmentation-based adaptive support for accurate stereo correspondence, in Proc. IEEE Pacific-Rim Symposium on Image and Video Technology (Chile, 2007), pp. 427–438
- H Hirschmuller, D Scharstein, Evaluation of stereo matching costs on images with radiometric differences. IEEE Trans. Pattern Anal. Mach. Intell. 31(9), 1582–1599 (2009)
- K Ambrosch, W Kubinger, Accurate hardware-based stereo vision. Comput. Vis. Image Underst. 114(11), 1303–1316 (2010)
- M Humenberger, C Zinner, A fast stereo matching algorithm suitable for embedded real-time systems. Comput. Vis. Image Underst. 114(11), 1180–1202 (2010)
- R Zabih, J Woodfill, Non-parametric local transforms for computing visual correspondence, in Proc. Third European Conference on Computer Vision (Stockholm, 1994), pp. 151–158
- KJ Yoon, I Kweon, Adaptive support-weight approach for correspondence search. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 650–656 (2006)
- S Perri, P Corsonello, G Cocorullo, Adaptive census transform: a novel hardware-oriented stereo vision algorithm. Comput. Vis. Image Underst. 117(1), 29–41 (2013)
- YC Chang, TH Tsai, BH Hsu, et al., Algorithm and architecture of disparity estimation with mini-census adaptive support weight. IEEE Trans. Circuits Syst. Video Technol. 20(6), 792–805 (2010)
- JZ Wang, HJ Zhu, J Li, A census transform based stereo matching algorithm using variable support weight. Transactions of Beijing Institute of Technology 33(7), 704–710 (2013)
- SP Zhu, LN Yan, Z Li, Stereo matching algorithm based on improved census transform and dynamic programming. Acta Opt. Sin. 36(4), 0415001-1–0415001-9 (2016)
- WW Zhou, WG Jin, Novel stereo matching algorithm for adaptive weight census transform. Comput. Eng. Appl. 52(16), 192–197 (2016)
- J XJ Peng, YT Han, et al., Anti-noise stereo matching algorithm based on improved census transform and outlier elimination. Acta Opt. Sin. 37(11), 1115004-1–1115004-9 (2017)
- SE Palmer, Modern theories of gestalt perception. Mind Lang. 5(4), 289–293 (1990)
- KJ Yoon, IS Kweon, Adaptive support-weight approach for correspondence search. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 650–656 (2006)
- X Chang, Z Zhou, L Wang, et al., Real-time accurate stereo matching using modified two-pass aggregation and winner-take-all guided dynamic programming, in Proc. International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (IEEE, 2011), pp. 73–79
- PJ Besl, ND McKay, A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239–256 (1992)
- L Ma, JJ Li, J Ma, Modified census transform with related information of neighborhood for stereo matching algorithm. Comput. Eng. Appl. 50(24), 16–20 (2014)