A biologically-inspired embedded monitoring network system for moving target detection in panoramic view

An embedded monitoring network system based on the visual principle of the compound eye is presented. It meets the requirements of a panoramic monitoring network in field angle, detection efficiency, and structural complexity. Three fixed wide-angle cameras are adopted as sub-eyes, and a main camera is installed on a high-speed platform. The system ensures continuous tracking with high sensitivity and accuracy in a field of view (FOV) of 360° × 180°. In the non-overlapping FOV of the sub-eyes, we adopt a Gaussian background difference model and a morphological algorithm to detect moving targets. In the overlapping FOV, we instead use a lateral inhibition network strategy, which improves the continuity of detection and the speed of response. The experimental results show that our system locates a target within 0.15 s after it starts moving in the non-overlapping field; when a target moves in the overlapping field, it takes 0.23 s to locate it. The system reduces the cost and complexity of traditional panoramic monitoring networks and lessens the labor intensity of monitoring.


Introduction
Conventional surveillance cameras have a limited field of view (FOV) and cannot continuously monitor a panoramic FOV of 360° × 180°. To solve this problem, a parallel network of multiple cameras is commonly used to cover the panoramic monitoring area [1]. However, such a network is expensive and complex. Worse still, multi-channel parallel video processing may degrade the real-time performance of the system and increase the misjudgment rate. In recent years, fish-eye lenses [2] have gradually become popular in panoramic monitoring. However, distortion is large at the edge of the field, where no effective information can be obtained.
Biologically-inspired design methods are developing rapidly [3][4][5][6][7][8][9][10]. The compound eye vision system of insects has a large FOV and high sensitivity. Such systems have advantages over conventional vision systems in applications such as community monitoring, robot vision, and intelligent vehicles. A system built on this principle can have small volume, light weight, a large field of view, and high sensitivity to moving targets. A compound eye vision system obtains original image information from different directions at the same time. Its unique structure significantly enlarges the field range and provides a paraxial optical path for each view angle, which decreases distortion [11][12][13]. In addition, the concept of lateral inhibition [14] among sub-eyes can be used in an artificial bionic network to improve sensitivity. A bionic compound eye network can therefore robustly realize continuous tracking and locating of moving targets in a panoramic view. It provides a new mode for the development of detectors and sensors.
Starting from the insects' compound eye system, this paper describes an embedded network system for continuously tracking moving targets in a panoramic view. Panoramic detection with low distortion is realized by multiple cameras. Meanwhile, global low-speed acquisition and local high-speed image acquisition are combined to shorten the time used by the detection algorithm and to improve sensitivity. In addition, a high-resolution automatic tracking mode and lateral inhibition are used to overcome the limitations of current systems in field angle, detection efficiency, and structural complexity.

System components and setup
Three wide-angle cameras are fixed in a ring, and we call them 'sub-eye cameras.' Each sub-eye camera covers about 120° in the horizontal field and 180° in the meridian plane; thus, the total field of view is 360° in the sagittal plane. The panoramic video information of a 360° × 180° space angle is then obtained. Each sub-eye camera has a 1/3-inch charge-coupled device (CCD) with a resolution of 704 × 576.
A high-resolution camera is used as the main camera. The camera uses a 1/2-inch CCD and a ×18 optical lens with automatic zooming. Its FOV is 45°, and its resolution is 1,280 × 1,024. The main camera is installed on a rotary platform, which has a maximum rotary speed of 400°/s, 128 preset positions, and a baud rate of 9,600 bps. The platform uses a motor system with pitch and horizontal rotation axes and a processing module. The system architecture is shown in Figure 1.

Detection process
After a setup process, the three sub-eye cameras start global sub-sampling, that is, sampling pixel values in alternate lines. When a sub-eye camera detects a moving target, it switches to full-resolution sampling mode, that is, sampling every pixel value. It then extracts the centroid of the target and calculates the distance between the target and its optical axis. According to the calibrated position of the main camera, the visual information obtained by the sub-eye cameras is delivered to the main camera through serial communication using the PELCO-D protocol [15]. The main camera immediately turns toward the target and tracks it at high resolution. The main camera zooms automatically and thus accurately locates and images targets at various distances. The images are saved in real time by a flip-flop register until the target leaves the FOV.
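To illustrate the serial link, the sketch below builds 7-byte PELCO-D frames (sync byte 0xFF, address, two command bytes, two data bytes, and a modulo-256 checksum over bytes 2 to 6). The device address 1, the preset number, and the helper names `pelco_d_frame`, `goto_preset`, and `pan` are assumptions made for illustration; the paper does not state how a centroid is mapped to a platform command.

```python
def pelco_d_frame(address: int, cmd1: int, cmd2: int, data1: int, data2: int) -> bytes:
    """Build one 7-byte PELCO-D frame: sync, address, cmd1, cmd2, data1, data2, checksum."""
    body = [address, cmd1, cmd2, data1, data2]
    for value in body:
        if not 0 <= value <= 0xFF:
            raise ValueError("every PELCO-D field must fit in one byte")
    checksum = sum(body) % 256  # modulo-256 sum of bytes 2..6 (sync byte excluded)
    return bytes([0xFF, *body, checksum])


def goto_preset(address: int, preset: int) -> bytes:
    """Frame that recalls a stored preset position (command byte 2 = 0x07)."""
    return pelco_d_frame(address, 0x00, 0x07, 0x00, preset)


def pan(address: int, speed: int, right: bool = True) -> bytes:
    """Frame that pans right (0x02) or left (0x04) at the given pan speed."""
    return pelco_d_frame(address, 0x00, 0x02 if right else 0x04, speed, 0x00)


if __name__ == "__main__":
    # Hypothetical usage: camera address 1, recall preset 5.
    print(goto_preset(address=1, preset=5).hex(" "))
```

With 128 preset positions on the platform, one plausible design is to calibrate presets against sub-eye image regions and recall the nearest preset; this is a guess at a workable scheme, not a description of the authors' implementation.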
When the target moves out of the FOV of the main camera, the sub-eye cameras restart the detection mode. When a sub-eye camera spots a moving target, it immediately sends a signal to the main camera, and the main camera locates the target again. Continuous panoramic detection is thus achieved.
When multiple targets are spotted at the same time, the system adopts a default detection mode (size-priority mode, speed-priority mode, etc.). The entire process does not need complex manual operation. Figure 2 shows the overall flowchart of this panoramic detection. On the one hand, to avoid the loss of information caused by dead zones, every two adjacent sub-eye cameras share a certain overlapping FOV. On the other hand, to prevent information aliasing and to position targets more accurately, different tracking strategies are used in the overlapping and non-overlapping FOV. In the non-overlapping FOV, the background difference method under a Gaussian background model is used, while in the overlapping FOV, the lateral inhibition algorithm is used.

Self-adaptive Gaussian background difference method
After the sub-eye cameras obtain images by global sub-sampling, primary detection is done by the background difference method. Commonly used target extraction algorithms include the background difference method [16] and the frame difference method [17]. We adopt the adaptive Gaussian background model to obtain the foreground target and update the background model synchronously.
We assume that the background changes are consistent with a random probability distribution. To cope with slow illumination changes, we build a self-adaptive Gaussian background model for each sub-eye, which adapts to different environmental changes and gives a better background estimate.
In the Gaussian background model, we assume that each pixel value, say, f(x, y), follows a Gaussian distribution in the time domain [18]. We establish a Gaussian model for each pixel in view. By fitting the new frame to the Gaussian model, we extract the background image. The background is updated synchronously to make the algorithm adaptive.
The probability density of the Gaussian background model is

$$\eta(x;\mu,\sigma^{2})=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right),$$

where μ is the mean value and σ is the standard deviation. In this model, a Gaussian probability distribution η(μ, σ²) is established for each pixel. Let f_k(x, y) be the pixel value of the image of the kth frame.
(1) Background image initialization. The initial background is described by μ_0 and σ_0², the estimates of the mean and variance of each point computed over the first N frames. The value of N should be appropriate, not too large. Here, we let N = 5.
(2) Background image update. After the background model is built, we subtract the background model from the current frame and get a difference image, which is compared with a threshold. The section on the adaptive threshold describes in detail how the threshold is selected. If the pixel value in the difference image is larger than the threshold, the pixel is taken as part of a moving target; if it is less than the threshold, the pixel is considered background. The background model should be updated continuously. The update rules use I_k(x, y), a pixel value in the kth frame, and α, the background updating rate, which ranges from 0 to 1. The larger α is, the faster the system updates. If the system updates too fast, noise occurs; if it updates too slowly, it takes a long time to adapt to the background. α should therefore be given an appropriate value. Here, we initialize α = 0.5. α then follows a probability distribution governed by a modulation factor A, which makes the background update automatically according to a certain statistical regularity.
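The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact update rules: the mean and variance are initialized as the sample mean and variance over the first N frames, the update is taken to be the common exponential running average with rate α, only pixels classified as background are updated, and the adaptation of α through the modulation factor A is left out. Plain Python lists keep the sketch free of dependencies; the function names are hypothetical.

```python
def init_background(frames):
    """Per-pixel sample mean and variance over the first N frames (lists of rows)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    mean = [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]
    var = [[sum((f[r][c] - mean[r][c]) ** 2 for f in frames) / n for c in range(cols)]
           for r in range(rows)]
    return mean, var


def detect_and_update(frame, mean, var, threshold, alpha=0.5):
    """Return a binary foreground mask; update background pixels in place.

    Assumed update (running average, not confirmed by the paper):
        mean <- (1 - alpha) * mean + alpha * I
        var  <- (1 - alpha) * var  + alpha * (I - mean)^2
    """
    mask = []
    for r, row in enumerate(frame):
        mask_row = []
        for c, value in enumerate(row):
            if abs(value - mean[r][c]) > threshold:
                mask_row.append(1)  # difference exceeds Th: moving target
            else:
                mask_row.append(0)  # background: fold the pixel into the model
                mean[r][c] = (1 - alpha) * mean[r][c] + alpha * value
                var[r][c] = (1 - alpha) * var[r][c] + alpha * (value - mean[r][c]) ** 2
        mask.append(mask_row)
    return mask


if __name__ == "__main__":
    first_frames = [[[10, 10], [10, 10]] for _ in range(5)]  # N = 5 as in the text
    mean, var = init_background(first_frames)
    print(detect_and_update([[10, 200], [12, 10]], mean, var, threshold=30))
```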

The adaptive threshold
After the background model is built, the difference image must be binarized to show the target. It is important to select an appropriate threshold, Th. If the threshold is too large, a target point may be mistaken for a background point; if it is too small, a background point may be mistaken for a target point. The threshold is conventionally set by hand, which lacks adaptability and requires manual intervention. Here, we select the threshold by self-adaptive iteration [19] and obtain the global optimal threshold. We thus achieve satisfactory adaptability, as Figure 3 shows. The detailed iteration process is as follows:
(1) Calculate the maximum and minimum gray values, t_1 and t_k, and initialize the threshold as T_0 = (t_1 + t_k)/2.
(2) Segment the image into two parts, the target and the background, using the current threshold T_k. Count the pixels in each part, N_1^k and N_2^k, and then calculate the average gray levels of the two parts, t_0 and t_A, as weighted means in which t(i, j) is the gray level of point (i, j) and N(i, j) is the weight of point (i, j). We set N(i, j) = 1.
(3) Calculate the new threshold T_{k+1} = (t_A + t_0)/2.
(4) If T_k = T_{k+1} or k > M, then T_k is the suitable global threshold, and the iteration is over. Otherwise, go to step (2) and continue the iteration. M is the maximum number of iterations.
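The iteration in steps (1) to (4) can be sketched as below, with unit weights N(i, j) = 1 as in the text. The function name `iterative_threshold`, the flat list of gray levels as input, and the default M = 100 are assumptions made for this illustration.

```python
def iterative_threshold(pixels, max_iter=100):
    """Iteratively split gray levels into two classes and return the threshold.

    pixels   -- flat list of gray levels from the difference image
    max_iter -- M, the maximum number of iterations
    """
    threshold = (max(pixels) + min(pixels)) / 2  # step (1): T_0
    for _ in range(max_iter):
        low = [p for p in pixels if p <= threshold]   # one part of the image
        high = [p for p in pixels if p > threshold]   # the other part
        if not low or not high:
            break  # flat image: nothing to separate
        # steps (2) and (3): mean gray level of each part, then their midpoint
        new_threshold = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if new_threshold == threshold:  # step (4): converged
            break
        threshold = new_threshold
    return threshold


if __name__ == "__main__":
    print(iterative_threshold([10, 12, 11, 200, 210, 205]))
```

For the six sample values, the first split at 110 gives class means of 11 and 205, so the threshold moves to 108 and then stays there.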
Moreover, the target image in the resulting binary image contains shadows and discontinuities. We use the morphological opening operation [20] to enhance the target image.
After the target is extracted from the sub-eye images, we extract its centroid to determine whether the target has moved into the overlapping FOV. Once the centroid moves outside a predefined rectangular boundary region, the system automatically switches to the algorithm for the overlapping FOV. Figure 4 shows an experiment using a human body as the moving target. The three pictures on the left are images obtained by the three sub-eye cameras. Black means that no moving target is detected, while white regions are images of detected targets after the morphological closing operation [20]. The picture on the right is the current field image from the main camera.
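As an illustration of the clean-up and centroid steps, the sketch below applies a morphological opening with a 3 × 3 structuring element to a binary mask and then computes the centroid of the remaining pixels. The structuring element size, the treatment of pixels outside the image as background, and the function names are assumptions; closing would use the same two primitives in the reverse order.

```python
NEIGHBORHOOD = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def _inside(img, r, c):
    """Value at (r, c), treating everything outside the image as 0."""
    return img[r][c] if 0 <= r < len(img) and 0 <= c < len(img[0]) else 0


def erode(img):
    """A pixel stays 1 only if its whole 3x3 neighborhood is 1."""
    return [[int(all(_inside(img, r + dr, c + dc) for dr, dc in NEIGHBORHOOD))
             for c in range(len(img[0]))] for r in range(len(img))]


def dilate(img):
    """A pixel becomes 1 if any pixel in its 3x3 neighborhood is 1."""
    return [[int(any(_inside(img, r + dr, c + dc) for dr, dc in NEIGHBORHOOD))
             for c in range(len(img[0]))] for r in range(len(img))]


def opening(img):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(img))


def centroid(img):
    """Mean (row, column) of all foreground pixels, or None for an empty mask."""
    points = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


if __name__ == "__main__":
    mask = [[0] * 6 for _ in range(6)]
    for r in range(1, 4):
        for c in range(1, 4):
            mask[r][c] = 1      # a 3x3 target
    mask[5][5] = 1              # an isolated noise pixel
    print(centroid(opening(mask)))
```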

Experiment of tracking in non-overlapping FOV
In Figure 4, the no. 1 sub-eye camera detects a moving target in its 96th frame. The target is in the non-overlapping FOV and is not in the view of the main camera. The main camera then rotates toward the target. By the time the no. 1 sub-eye camera captures its 100th frame, the target appears in the FOV of the main camera. The detection takes 0.15 s.
In comparison, if the sub-eye cameras sample every pixel from the beginning, the detection time is about 0.4 s. The sub-sampling method thus clearly improves detection sensitivity.
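The alternate-line sub-sampling compared above can be sketched as follows. It only shows the reduction in pixel count for a 704 × 576 sub-eye frame and the mapping of a row index back to full resolution; it makes no claim about the timings reported in the experiment. The function names and the choice of keeping the even-numbered lines are assumptions.

```python
def acquire(frame, target_detected):
    """Return every line once a target is detected, otherwise every other line."""
    return frame if target_detected else frame[::2]


def to_full_resolution_row(subsampled_row):
    """Map a row index in the sub-sampled image back to the full-resolution image."""
    return 2 * subsampled_row


if __name__ == "__main__":
    width, height = 704, 576                      # sub-eye resolution from the text
    frame = [[0] * width for _ in range(height)]  # placeholder image data
    coarse = acquire(frame, target_detected=False)
    print(len(coarse) * width, "of", height * width, "pixels are processed")
```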

Tracking algorithm and experiments in overlapping FOV
The detection strategy applied in the overlapping FOV is different from that used in the non-overlapping fields. In this case, two sub-eye cameras are involved. If we simply used the background difference method applied in the non-overlapping FOV, the extracted edge of the target would be blurred, incomplete, and shadowed. The stability and efficiency of the algorithm would also be relatively low, and it would be difficult to determine the centroid position of the target. We therefore adopt the lateral inhibition algorithm that is conventionally used in bionic compound eye systems and extract the edges of moving targets stably.
The phenomenon of lateral inhibition widely exists in the compound eye systems of insects. It refers to the fact that a receptor is inhibited by the receptors around it, and this inhibitory effect is spatially additive. In addition, a receptor is inhibited more strongly by receptors near it than by those far away from it. For the overlapping FOV, we first extract the edge quickly with the lateral inhibition algorithm and then extract the target image by the background difference method. This algorithm is stable, accurate, and resistant to gray scale change, and it improves the detection accuracy and sensitivity.
We use centroid position tracking to determine whether the target has moved into an overlapping field. The target is considered to be in the overlapping field when one sub-eye camera detects it with its centroid at 4/5 of the image width from the farthest vertical edge while the neighboring sub-eye camera also detects it with its centroid at 1/5 of the image width from the nearest edge. The system then automatically switches to the lateral inhibition algorithm. After the target has been detected, the main camera turns to the bisector of the angle between the optical axes of the two sub-eye cameras.
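One possible reading of the switching rule is sketched below. It assumes that the neighboring camera lies to the right, that horizontal centroid positions are measured in pixels from the left image edge, and that the three optical axes are spaced 120° apart (0°, 120°, and 240°); none of these conventions is stated in the paper, and the function names are hypothetical.

```python
def target_in_overlap(x_left_cam, x_right_cam, width):
    """True when both neighboring cameras see the target near their shared edge.

    x_left_cam  -- centroid column in the left camera's image, or None if unseen
    x_right_cam -- centroid column in the right camera's image, or None if unseen
    """
    if x_left_cam is None or x_right_cam is None:
        return False
    return x_left_cam >= 0.8 * width and x_right_cam <= 0.2 * width


def bisector_deg(axis_a, axis_b):
    """Direction halfway between two optical axes, in degrees within [0, 360)."""
    diff = (axis_b - axis_a + 180.0) % 360.0 - 180.0  # signed shortest difference
    return (axis_a + diff / 2.0) % 360.0


if __name__ == "__main__":
    if target_in_overlap(600, 100, width=704):
        print("turn main camera to", bisector_deg(0.0, 120.0), "degrees")
```

The modular arithmetic in `bisector_deg` handles the wrap-around between the last and the first camera, for example between axes at 240° and 0°.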
We take each pixel as a sub-eye receptor. The spatial contrast is large at the edge of the target. According to the bionic lateral inhibition principle, the receptor that detects the edge is strongly inhibited by its nearest receptors, and the nearer a receptor is, the stronger its inhibition. We enhance the edge according to the inhibition coefficient [21]. We analyze this method in the time domain below.
Take a simple two-unit inhibition network as an example. Let y_1 and y_2 be the gray values of the two input units, related through a parameter k with 0 < k < 1 (Equation 10). The outputs of the network, X_1 and X_2, are given by Equation 11 in terms of a coefficient β, where we set 0 < β < k < 1 so that X_1/X_2 is non-negative. The ratio y_1/y_2 measures the input contrast, while X_1/X_2 describes the output contrast. From (10) and (11), it follows (Equation 13) that the output contrast is larger than the input contrast, which shows that the inhibition network enhances the target edge.
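The two-unit argument can be worked through under one reading that is consistent with the stated constraints. The forms below are assumptions chosen for illustration, namely that y_2 is the dimmer input with y_2 = k y_1 and that each output is its own input minus β times the other input; the paper's original equations may differ in form or numbering.

```latex
\begin{aligned}
y_2 &= k\,y_1, \qquad 0 < k < 1, \\
X_1 &= y_1 - \beta y_2 = (1 - \beta k)\,y_1, \qquad
X_2 = y_2 - \beta y_1 = (k - \beta)\,y_1, \qquad 0 < \beta < k < 1, \\
\frac{X_1}{X_2} &= \frac{1 - \beta k}{k - \beta} > \frac{1}{k} = \frac{y_1}{y_2}
\quad\Longleftrightarrow\quad k - \beta k^{2} > k - \beta
\quad\Longleftrightarrow\quad k^{2} < 1 .
\end{aligned}
```

The condition β < k keeps X_2 positive, and the final inequality always holds for 0 < k < 1. For example, with k = 0.5 and β = 0.2, the input contrast is 2 and the output contrast is 0.9/0.3 = 3.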
The inhibition model of the overlapping fields gives the pixel gray value after inhibition as a function of the input gray values weighted by the lateral inhibition coefficients; for a 3 × 3 network, the weighted sum covers a pixel of the image and its eight neighbors. Here, I(m, n) is the pixel gray value after the inhibition process; α_{i,j} is the lateral inhibition coefficient for the position (i, j) in the network; f is a function indicating the inhibition competing relationship between input and output; R_0(m, n) is the input gray value at position (m, n) before inhibition. According to the mechanism of the compound eye vision system, the lateral relationship between a certain nerve cell on the compound eye and those surrounding it is relatively stable and consistent. Because there is no direction constraint for edges, the weights are symmetric about the center. Suppose the center weight is α_00 and the eight surrounding weights all equal α_1. Then, the lateral inhibition coefficient matrix is as follows:

$$\begin{pmatrix} \alpha_1 & \alpha_1 & \alpha_1 \\ \alpha_1 & \alpha_{00} & \alpha_1 \\ \alpha_1 & \alpha_1 & \alpha_1 \end{pmatrix}$$

As the optic cells are on a plane of the same inhibition, the sum of the lateral inhibition coefficients is approximately zero, so α_00 + 8α_1 = 0. Here, we let α_00 = 1 and α_1 = −0.125, and the template for inhibition is as follows:

$$\begin{pmatrix} -0.125 & -0.125 & -0.125 \\ -0.125 & 1 & -0.125 \\ -0.125 & -0.125 & -0.125 \end{pmatrix} \qquad (17)$$

Figure 5 compares the edge detection results with and without the lateral inhibition method. In Figure 5, panels a and b are the background difference images from the no. 1 and no. 2 sub-eye cameras in the overlapping field using the lateral inhibition method, while panels c and d are the corresponding images processed with the Roberts operator [22]. The pictures show that the method proposed in this paper gives a more accurate and clearer target edge in the overlapping FOV. The total detection time is 0.23 s.
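The effect of the template with α_00 = 1 and α_1 = −0.125 can be sketched as below. The sketch assumes that f is the identity, so each output pixel is simply the weighted sum over its 3 × 3 neighborhood, and it leaves border pixels at zero; both choices, and the function name, are assumptions made for illustration.

```python
TEMPLATE = [[-0.125, -0.125, -0.125],
            [-0.125,  1.0,   -0.125],
            [-0.125, -0.125, -0.125]]  # coefficients sum to zero


def lateral_inhibition(image):
    """Weighted 3x3 sum around each interior pixel; borders are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            out[m][n] = sum(TEMPLATE[i + 1][j + 1] * image[m + i][n + j]
                            for i in (-1, 0, 1) for j in (-1, 0, 1))
    return out


if __name__ == "__main__":
    # A vertical step edge: gray level 10 on the left, 50 on the right.
    image = [[10, 10, 10, 50, 50, 50] for _ in range(5)]
    print(lateral_inhibition(image)[2])
```

Because the coefficients sum to zero, uniform regions map to zero and only the two columns next to the step respond, with opposite signs on the dark and bright sides.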

Experiments of multi-target panoramic detection
We detect multiple targets continuously with this embedded monitoring network system, using the speed-priority mode. The experimental results are shown in Figure 6. Target 1 first appears in the FOV of the no. 2 camera, and the main camera tracks it immediately. Later, target 2, which moves faster than target 1, appears in the FOV of the no. 3 camera. The main camera then turns to track target 2 immediately. Meanwhile, target 1 is still detected by the sub-eye cameras. Continuous tracking is thus realized.
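The priority modes can be sketched as a simple selection over the currently detected targets. The `Track` record, the use of centroid displacement per frame interval as the speed measure, and the blob area as the size measure are all assumptions; the paper does not specify how speed or size is computed.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Track:
    target_id: int
    camera: int            # sub-eye camera that currently sees the target
    prev_centroid: tuple   # (x, y) in the previous frame, in pixels
    curr_centroid: tuple   # (x, y) in the current frame, in pixels
    area: int              # number of foreground pixels

    def speed(self, dt=1.0):
        """Centroid displacement per unit time, in pixels."""
        dx = self.curr_centroid[0] - self.prev_centroid[0]
        dy = self.curr_centroid[1] - self.prev_centroid[1]
        return hypot(dx, dy) / dt


def select_target(tracks, mode="speed"):
    """Pick the target the main camera should follow, or None if there is none."""
    if not tracks:
        return None
    if mode == "speed":
        return max(tracks, key=lambda t: t.speed())
    if mode == "size":
        return max(tracks, key=lambda t: t.area)
    raise ValueError(f"unknown priority mode: {mode}")


if __name__ == "__main__":
    tracks = [Track(1, 2, (100, 100), (102, 100), 900),
              Track(2, 3, (50, 50), (60, 50), 400)]
    print(select_target(tracks, mode="speed").target_id)
```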

Conclusions
This paper proposes a bionic compound eye sensing network and a continuous tracking strategy for panoramic target tracking. We introduce the system structure and the related algorithms. The experimental results show that the system has a panoramic view, high sensitivity, and good continuity. It extracts moving targets clearly, stably, and accurately. The system can be widely used in the security surveillance industry.