Background perception for correlation filter tracker
EURASIP Journal on Wireless Communications and Networking volume 2020, Article number: 20 (2020)
Abstract
Visual object tracking is one of the most fundamental tasks in computer vision, with applications in public surveillance, human-computer interaction, robotics, and other areas. Recently, discriminative correlation filter (DCF)-based trackers have achieved promising results in short-term tracking problems. Most of them focus on extracting reliable features from the foreground of input images to construct a robust and informative description of the target. However, it is often overlooked that the image background, which contains the surrounding context of the target, is usually similar across consecutive frames and can therefore help locate the target. In this paper, we propose a background perception regulation term to additionally exploit useful background information of the target. Specifically, by assigning similar importance to background and foreground, an invalid description of the target can be avoided when either of them becomes unreliable. Moreover, a novel model update strategy is proposed. Instead of updating the model at every frame, we introduce an output evaluation score, which supervises the tracking process and selects high-confidence results for model update, thus providing a new way to avoid model corruption. Extensive experiments on the OTB100 dataset demonstrate the effectiveness of the proposed method, BPCF, which achieves an AUC score of 0.689 and outperforms most state-of-the-art trackers.
Introduction
Discriminative correlation filter (DCF) trackers [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17] have shown remarkable progress in recent years. The first CF-based method is Minimum Output Sum of Squared Error (MOSSE) [10], which runs at more than 600 frames per second (FPS). Since then, many improvements have been made to raise its performance. The circulant structure of sequences is exploited to augment training samples [1]. The kernelized correlation filter is proposed as a multichannel extension of linear correlation filters [9]. To integrate multi-resolution feature maps, continuous convolution operators for visual tracking are also proposed [4] and used by many state-of-the-art trackers, such as ECO [3] and CFWCR [12]. Of these, CFWCR exploits the power of deep convolutional neural network (CNN) features without using any hand-crafted features such as HOG [18] or color names [19], and achieves strong performance in both accuracy and robustness. Later trackers also focus on foreground feature selection [14] and reliability learning [20].
However, most of these methods only focus on foreground information, while they do not take good advantage of the background information which is also beneficial for tracking. Moreover, most trackers update the model after each frame, or after every N frames by using a sparse update scheme to avoid the tracker being dominated by recent samples. Nevertheless, such trackers still suffer from model corruption since they update the model indiscriminately regardless of whether the tracking result is accurate or not.
In this paper, to address the above issues, we propose a novel tracker, the background perception correlation filter tracker (BPCF). It is based on Correlation Filters with Weighted Convolution Responses (CFWCR) [12], an improved version of ECO [3] that achieves remarkable results on the VOT challenges [21]. To make better use of the background information of the input, we propose a background perception regulation term. In particular, we first divide the search area of the input images into several small pieces. By introducing a regulation term that minimizes the sum of the L2 distances between the convolutional outputs of all possible pairs of pieces, we assign similar importance to all pieces regardless of whether a region belongs to the background or the foreground of the input samples. In addition, to address indiscriminate model update, we introduce a novel model update strategy: we compute a confidence score for the tracking result after every N frames and only update the model when the score exceeds a preset threshold, i.e., a particular proportion of the average of all previous confidence scores. Figure 1 shows qualitative results of BPCF compared with several state-of-the-art trackers on sample sequences of OTB2015, in which our method outperforms all the other trackers. Moreover, quantitative results on the OTB2015 dataset show that BPCF achieves state-of-the-art performance.
To sum up, our work makes the following contributions:
We propose a new DCF-based tracking model that integrates, in a unified tracking framework, a background perception regulation term, which stresses the equal contribution of foreground and background information, and a novel model update strategy, which supervises the tracking results.
A background perception regulation term is introduced into the existing CFWCR tracker to exploit the background information of the input samples and to emphasize equal contributions of background and foreground information, so that the tracker is not dominated by unreliable parts.
A novel model update strategy is proposed to avoid model corruption. Instead of updating the model at every frame, we compute a confidence score for the tracking result after every N frames and select high-confidence results for model update.
Related work
Correlation filter-based methods for visual object tracking have shown dominant results in recent years. The first CF-based tracker, MOSSE [10], which uses only grayscale images and a single-channel feature, produces stable tracking results when initialized from a single frame and runs at 669 frames per second. Since then, CF-based trackers have become increasingly popular and have achieved impressive results on the OTB2015 object tracking benchmark [22]. To address the shortage of training samples when initializing the tracker, Henriques et al. [1] introduced the circulant structure of input images to augment training samples. Later, Henriques et al. [9] used kernel regression, which has exactly the same complexity as its linear counterpart, to combine different features and obtained better results. Nevertheless, using the circulant structure of image sequences can cause a boundary effect. To solve this problem, Danelljan et al. [8] introduced a spatial regularization term. The proposed regularization weights penalize the correlation filter coefficients by assigning higher values at the edge of the filter and lower values at its center. A spatial-temporal regularization term was later introduced by Li et al. [11]. The resulting filter can be solved efficiently via the alternating direction method of multipliers (ADMM) and provides a 5× speedup. Danelljan et al. [4] proposed continuous convolution operators to enable the integration of multi-resolution deep feature maps, and many subsequent trackers built on this work achieved good results. Danelljan et al. [3] revised C-COT [4] by introducing a factorized convolution operator and a compact generative model, which significantly reduced the computational complexity. He et al. [12] exploited the power of deep CNN features without using any hand-crafted features and achieved strong performance in both accuracy and robustness. Gundogdu et al. [14] emphasized the importance of feature selection. Sun et al. [20] introduced a joint discrimination and reliability learning method, which highlighted the importance of the foreground and its varying reliability.
Most of the previous methods focus only on the foreground information and rely on hand-crafted features such as HOG and CN. In contrast, we take only deep features as our input and propose a background perception regulation term to ensure that the background and foreground of the input samples make similar contributions during tracking. With this regulation term, the tracker is unlikely to be dominated by unreliable parts of the target object, which helps to prevent overfitting. Moreover, a novel model update strategy is proposed. Instead of updating the model regardless of whether the tracking result is precise, we update only when the tracking result is reliable and stop updating when the tracking result is incorrect or the target is undergoing severe occlusion.
Methods
The approach is twofold. First, we observe that the background information in a given image sequence is usually similar in consecutive frames, which can help to recognize a given target efficiently. We therefore introduce a background perception regulation term, which additionally exploits the background information and learns a more robust correlation filter. Second, most existing trackers update the model indiscriminately, regardless of whether the tracking result is precise. With such an update strategy, the model becomes corrupted when the target undergoes severe occlusion or the tracking result is imprecise. To solve this problem, a self-adaptive model update strategy is proposed. We introduce an output evaluation score, which is lower when the object is occluded or the tracking result is incorrect. We can then select the reliable samples to update the model.
Base framework
Our framework, like many other DCF trackers, is based on C-COT, a theoretical framework for learning continuous convolution operators. This kind of tracker adopts an implicit interpolation model for the training samples. Assume that each sample x_{j} contains D feature channels \( {x}_j^1 \), \( {x}_j^2 \), …, \( {x}_j^D \), and let N_{d} denote the number of spatial samples in \( {x}_j^d \), where d ∈ {1, 2, …, D}. Unlike traditional DCF trackers, where each feature channel \( {x}_j^d\in {\mathbb{R}}^{N_d} \) consists of N_{d} discrete spatial variables, C-COT transfers the feature maps into the continuous spatial domain by introducing an interpolation kernel b_{d} to obtain an interpolation operator J_{d}:
\( {J}_d\left\{{x}^d\right\}(t)=\sum_{n=0}^{N_d-1}{x}^d\left[n\right]\,{b}_d\left(t-\frac{T}{N_d}n\right) \)  (1)
where the interpolation kernel b_{d} has a period of T (T > 0). Thus, the result J_{d}{x^{d}} is a continuous feature map with a period of T that is used for further computation. The CFWCR framework discards hand-crafted features such as HOG [18] and CN [19], and adopts CNN features to extract feature maps. Specifically, it uses the VGG-M [23] network pretrained on the ILSVRC [24] dataset to extract multi-resolution continuous feature maps, and employs the first and the fifth convolutional layers as two deep feature channels. The filter f is trained by minimizing the following function:
where α_{j} represents the importance of each training sample, and ω is the spatial regularization term used to avoid the boundary effect.
The convolution responses of the two channels are combined in a weighted sum to obtain the final confidence response:
where the feature maps extracted from the first and fifth convolutional layers are first interpolated using Eq. 1 and then convolved with the filters f^{a} and f^{b} trained by Eq. 2. The assigned weights W_{1} and W_{2} denote the significance of each layer.
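As a rough illustration of the weighted fusion described above, the following Python sketch combines two per-layer response maps into a single confidence response and locates its peak. The function names `fuse_responses` and `argmax_2d`, the toy response values, and the weights 1 and 2 are assumptions made for illustration only; the sketch also assumes that both maps have already been interpolated to the same resolution.

```python
def fuse_responses(resp_a, resp_b, w1, w2):
    """Weighted sum of two equally sized 2-D response maps (lists of rows)."""
    same_shape = len(resp_a) == len(resp_b) and all(
        len(ra) == len(rb) for ra, rb in zip(resp_a, resp_b)
    )
    if not same_shape:
        raise ValueError("response maps must have the same shape")
    return [
        [w1 * a + w2 * b for a, b in zip(ra, rb)]
        for ra, rb in zip(resp_a, resp_b)
    ]


def argmax_2d(resp):
    """Return (row, col) of the largest entry of a 2-D response map."""
    _, row, col = max(
        (value, r, c) for r, line in enumerate(resp) for c, value in enumerate(line)
    )
    return row, col


# Toy responses: the shallow layer peaks at the center, the deep layer
# peaks one column to the right (hypothetical values).
shallow = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.3], [0.1, 0.3, 0.2]]
deep = [[0.0, 0.1, 0.0], [0.1, 0.6, 0.7], [0.0, 0.2, 0.1]]

fused = fuse_responses(shallow, deep, 1.0, 2.0)  # W_2 / W_1 = 2
peak = argmax_2d(fused)
```

In this toy case the fused peak stays at the center even though the deep layer alone would place it one column to the right, which is the kind of trade-off the layer weights control.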
Background perception
We propose a background perception regulation term R(h, X), by which we regulate the filter to assign larger importance to regions where the extracted feature map X has smaller values. In this way, we can assign similar importance to different regions of the training samples and prevent the tracker from being dominated by unreliable parts. The regulation term can be formulated as:
where x_{k, d} ∈ ℝ^{K × 1} is the kth cyclic shift of the input vector x_{d} ∈ ℝ^{K × 1} for the dth channel, \( {P}_d^m=\operatorname{diag}\left({p}_d^m(1),...,{p}_d^m(K)\right)\in {\mathbb{R}}^{K\times K} \) is the mth binary mask (Fig. 2), which crops the samples to the mth subregion, and h_{d} ∈ ℝ^{K × 1} is the target filter of the dth channel. To simplify Eq. 4, the formula is rewritten as follows:
where \( {P}^m=\operatorname{diag}\left({P}_1^m,...,{P}_D^m\right)\in {\mathbb{R}}^{DK\times DK} \) is a block diagonal matrix whose dth diagonal block is \( {P}_d^m \). X_{d} = [x_{1, d}, x_{2, d}, ..., x_{K, d}] ∈ ℝ^{K × K} collects all cyclic shifts of the input vector x_{d}. \( X={\left[{X}_1^T,{X}_2^T,...,{X}_D^T\right]}^T\in {\mathbb{R}}^{DK\times K} \) is a matrix that stacks the circulant matrices of the different channels. h ∈ ℝ^{DK × 1} is the final filter.
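To make the roles of the cyclic shifts and the binary masks more concrete, the following Python sketch evaluates a pairwise penalty for a single channel (D = 1). It is only a sketch under stated assumptions: it assumes the penalty is the sum of squared L2 distances between the filter outputs restricted to each pair of subregions, as the Introduction describes, and it does not claim to reproduce the exact formula of the paper. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def cyclic_shifts(x):
    """All K cyclic shifts of x; shift k moves every entry k places to the right."""
    k_len = len(x)
    return [[x[(i - k) % k_len] for i in range(k_len)] for k in range(k_len)]


def binary_mask(k_len, start, stop):
    """Diagonal of a binary mask P^m that keeps only indices start..stop-1."""
    return [1.0 if start <= i < stop else 0.0 for i in range(k_len)]


def region_output(shifts, mask, h):
    """Filter output for every shift, using only the masked subregion."""
    return [sum(hi * pi * xi for hi, pi, xi in zip(h, mask, xk)) for xk in shifts]


def pairwise_penalty(x, h, masks):
    """Sum of squared L2 distances between outputs of all pairs of subregions
    (assumed form of the regulation term)."""
    shifts = cyclic_shifts(x)
    outputs = [region_output(shifts, m, h) for m in masks]
    return sum(
        sum((a - b) ** 2 for a, b in zip(out_a, out_b))
        for out_a, out_b in combinations(outputs, 2)
    )


# Toy example: a length-4 signal split into two halves.
masks = [binary_mask(4, 0, 2), binary_mask(4, 2, 4)]
h = [0.5, 0.5, 0.5, 0.5]
penalty_uneven = pairwise_penalty([1.0, 2.0, 3.0, 4.0], h, masks)
penalty_flat = pairwise_penalty([1.0, 1.0, 1.0, 1.0], h, masks)
```

With this assumed form, the penalty vanishes when both subregions respond identically and grows when one subregion dominates the output, which is the behavior the regulation term is meant to discourage.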
By introducing this background perception term, the filters are learned by minimizing the following objective:
where y is the predefined Gaussian window objective function. The binary mask P^{m}, the input sample X, and the filter h are as defined above.
To obtain the optimal filter h, we solve the minimization problem with the conjugate gradient method. We first compute the derivative of Eq. 7 and then set it to zero to get the following equation:
where A is defined as:
To solve the normal equation with the conjugate gradient method, we employ the following iterative procedure:
where we set h_{0} = 0, r_{0} = Xy, P_{0} = r_{0}. The number of conjugate gradient iterations is set to 5.
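The iteration can be sketched with the textbook conjugate gradient method for a symmetric positive definite system A h = b. The initialization below follows the text (h_0 = 0, r_0 = b, p_0 = r_0, at most 5 iterations), with b playing the role of Xy. The small matrix `a`, the vector `b`, and the early-exit tolerance are assumptions for illustration; the matrix A of the paper is not reproduced here.

```python
def mat_vec(a, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def conjugate_gradient(a, b, num_iters=5, tol=1e-12):
    """Approximately solve a @ h = b for symmetric positive definite a."""
    h = [0.0] * len(b)   # h_0 = 0
    r = list(b)          # r_0 = b - a @ h_0 = b
    p = list(r)          # p_0 = r_0
    rs_old = dot(r, r)
    for _ in range(num_iters):
        if rs_old < tol:  # squared residual norm is already negligible
            break
        ap = mat_vec(a, p)
        alpha = rs_old / dot(p, ap)                       # step length
        h = [h_i + alpha * p_i for h_i, p_i in zip(h, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                            # direction update
        p = [r_i + beta * p_i for r_i, p_i in zip(r, p)]
        rs_old = rs_new
    return h


# Toy 2x2 system (hypothetical values); the exact solution is [1/11, 7/11].
a = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
h = conjugate_gradient(a, b)
```

A fixed, small iteration count keeps the per-update cost bounded; in exact arithmetic the method would need at most as many iterations as there are unknowns.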
Target localization
In the target localization step for the Nth frame, we first extract the feature map x of the search region, where x^{d} denotes the feature map of the dth channel. The convolutional outputs of the two channels are computed and then combined in a weighted sum to obtain the final confidence response S_{h}(x). The convolution is computed in the Fourier domain to reduce the computational burden.
where d = 2 and the two feature channels are the first and the fifth convolutional layers of the VGG16 network pretrained on the ILSVRC dataset. α_{d} is the weight of each layer and denotes the importance of each feature channel.
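The reason the Fourier domain helps is that circular convolution becomes an element-wise product of spectra. The following one-dimensional Python sketch checks this identity against a direct computation. It uses a naive O(n²) discrete Fourier transform purely to stay self-contained; a practical tracker would use an FFT on two-dimensional feature maps. The function names and test signal are illustrative assumptions.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def circular_conv_direct(x, h):
    """Circular convolution computed from its definition."""
    n = len(x)
    return [sum(h[m] * x[(t - m) % n] for m in range(n)) for t in range(n)]


def circular_conv_fourier(x, h):
    """Circular convolution as an element-wise product in the Fourier domain."""
    spectrum = [a * b for a, b in zip(dft(x), dft(h))]
    return [v.real for v in dft(spectrum, inverse=True)]


x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.0, -1.0, 0.0]
direct = circular_conv_direct(x, h)
fourier = circular_conv_fourier(x, h)
```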
Model update strategy
Most existing state-of-the-art DCF-based trackers update the tracking model after each frame [8, 10]. These methods suffer from model corruption since they update the model indiscriminately, whether or not the tracking result is accurate. Moreover, the model is easily dominated by recent frames, so if the tracking results of recent frames are imprecise, tracking is prone to fail. Thus, when the target object undergoes severe background clutter, deformation, or occlusion, the model becomes highly unreliable. Other trackers such as ECO [3] and CFWCR [12] adopt a sparser updating scheme, in which the filter is updated only by starting the optimization process every N frames. Specifically, the traditional update method usually sets N = 1, while the sparser updating method mostly sets N = 5. With this scheme, the tracker can avoid being dominated by recent samples. Nevertheless, the problem of indiscriminate model updating remains.
Our proposed model update strategy also adopts the sparse updating scheme, but unlike the traditional methods where the input samples extracted from each frame are used to update the model, we compute a confidence score for the tracking result and discard those with the confidence less than a threshold, i.e., a certain proportion of the average of all the previous scores. The confidence score for the tracking result is defined as:
where F_{max} and F_{min} denote the maximum and minimum values of the confidence response S_{h}(x), and F_{x, y} denotes the value at the xth row and the yth column of S_{h}(x). From this equation, we can see that the confidence score increases with the peak value of the response and decreases as the standard deviation of the response grows.
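The description above (peak value, minimum value, and fluctuation of the response) resembles the average peak-to-correlation energy (APCE) measure used in LMCF [7]. The following Python sketch assumes that APCE-like form, namely the squared peak-to-minimum gap divided by the mean squared deviation from the minimum. This is an assumption for illustration, not a reproduction of the exact score defined in this paper, and the toy response maps are hypothetical.

```python
def apce_score(resp):
    """APCE-style confidence of a 2-D response map (assumed form)."""
    values = [v for row in resp for v in row]
    f_max, f_min = max(values), min(values)
    # Mean squared deviation of every response value from the minimum.
    energy = sum((v - f_min) ** 2 for v in values) / len(values)
    if energy == 0.0:
        return 0.0  # completely flat response: no distinguishable peak
    return (f_max - f_min) ** 2 / energy


# A single sharp peak versus a fluctuating, low-peak response.
sharp = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
noisy = [[0.5, 0.4, 0.6], [0.4, 0.7, 0.5], [0.6, 0.5, 0.4]]

score_sharp = apce_score(sharp)
score_noisy = apce_score(noisy)
```

Under this assumed form, the sharp single-peak map scores higher than the fluctuating one, matching the qualitative behavior described in the text.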
Figure 3 illustrates the significance of the proposed method. When the target object undergoes severe occlusion, the confidence score drops drastically because the peak value of the confidence response S_{h}(x) decreases and the standard deviation of the response increases. When the output localization of the object is inaccurate, the confidence response fluctuates far more intensely and the confidence score also drops significantly. Moreover, the confidence responses of different sequences can vary considerably even when all of their tracking results are good, so a fixed threshold cannot be used to decide whether a result is reliable. Instead, we use the historical average of the computed scores as the criterion: if the score of the current frame is greater than a certain ratio of the historical average, which is set to 0.6 in our method, the output is considered reliable. In sum, with this model update strategy we discard unreliable training samples during tracking and preserve only the reliable ones for model update.
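The decision rule can be sketched as a small gate that compares each new score with 0.6 times the running average of all previous scores. The class name `UpdateGate` and the example scores are hypothetical. Two further assumptions are made because the text does not settle them: the first scored frame is always treated as reliable, and rejected scores still enter the historical average.

```python
class UpdateGate:
    """Accept a tracking result for model update only if its confidence
    score exceeds ratio * (average of all previous scores)."""

    def __init__(self, ratio=0.6):
        self.ratio = ratio
        self.history = []

    def should_update(self, score):
        if not self.history:
            reliable = True  # assumption: no history yet, trust the first result
        else:
            average = sum(self.history) / len(self.history)
            reliable = score > self.ratio * average
        # Assumption: every score, accepted or not, joins the history.
        self.history.append(score)
        return reliable


gate = UpdateGate(ratio=0.6)
scores = [10.0, 9.0, 11.0, 3.0, 10.0]  # the fourth score mimics an occlusion
decisions = [gate.should_update(s) for s in scores]
```

Because the threshold is relative to each sequence's own history, the same ratio can be used for sequences whose absolute score levels differ widely.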
Tracking with background perception and self-adaptive template update
The algorithm flow of our proposed method is described in Table 1. In the target localization step, we compute the confidence response S_{h}(x) with Eq. (11) and then optimize S_{h}(x) with the standard Newton's method, which is run for 5 iterations. In the update step, we first calculate the confidence score S of the tracking result for the current frame and skip the model update when the result has low confidence. The filter h is optimized with Eq. (8) using the conjugate gradient method.
Results and discussion
In this section, we conduct comprehensive experiments to evaluate our proposed method BPCF and compare it with state-of-the-art DCF-based trackers. We first present the implementation details. We then compare our tracker with its baseline, CFWCR [12], and with other state-of-the-art trackers on the OTB2015 dataset [22], both quantitatively and qualitatively.
Implementation details
Our proposed method is implemented in MATLAB using the MatConvNet toolbox, and the basic settings are the same as in CFWCR. The features are extracted from the conv1 layer and the conv5 layer. The relative weight σ = W_{2}/W_{1} is set to 2. The search image is 4 times the size of the target object. The maximum number of stored training samples is set to 50, and the learning rate is 0.012 with a training gap of 5. The parameter of the proposed regulation term is set to 1.5.
Quantitative evaluation
In this section, we evaluate our algorithm mainly on the OTB2015 [22] benchmark. We test our tracker on all 100 sequences, which cover tracking difficulties including illumination variation, out-of-plane rotation, scale variation, occlusion, deformation, motion blur, fast motion, in-plane rotation, out of view, background clutter, and low resolution. One commonly used evaluation metric is the overlap score, defined as |R_{t} ∩ R_{g}| / |R_{t} ∪ R_{g}|, where R_{t} denotes the bounding box of the tracking result and R_{g} denotes the ground-truth bounding box. Given a threshold between 0 and 1, a success rate is obtained as the fraction of frames whose overlap score exceeds that threshold. The main criterion of OTB2015 for judging a result is the area under the curve (AUC) of the success plot, which is the average of the success rates over the different overlap thresholds. In our experiments, we report the AUC of the success plot under one-pass evaluation (OPE).
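The overlap score and the AUC of the success plot can be sketched as follows. Boxes are assumed to be given as (x, y, width, height) tuples, and the 21 evenly spaced thresholds between 0 and 1 are an assumption about the sampling of the success plot rather than a detail stated in this paper. The function names are illustrative.

```python
def overlap_score(box_t, box_g):
    """Intersection over union of two (x, y, width, height) boxes."""
    xt, yt, wt, ht = box_t
    xg, yg, wg, hg = box_g
    inter_w = max(0.0, min(xt + wt, xg + wg) - max(xt, xg))
    inter_h = max(0.0, min(yt + ht, yg + hg) - max(yt, yg))
    inter = inter_w * inter_h
    union = wt * ht + wg * hg - inter
    return inter / union if union > 0 else 0.0


def success_auc(overlaps, num_thresholds=21):
    """Average success rate over evenly spaced overlap thresholds in [0, 1]."""
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [
        sum(1 for o in overlaps if o > t) / len(overlaps) for t in thresholds
    ]
    return sum(rates) / len(rates)


# Two 10x10 boxes shifted by half a width share 50 of 150 units of area.
half_shift = overlap_score((0, 0, 10, 10), (5, 0, 10, 10))
disjoint = overlap_score((0, 0, 10, 10), (20, 20, 10, 10))
perfect_auc = success_auc([1.0, 1.0])
```

Note that with a strict comparison even a perfect tracker fails the threshold of exactly 1, so its AUC under this sampling is 20/21 rather than 1.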
Baseline comparison
We first compare BPCF with its baseline, CFWCR. Both are tested on the OTB2015 benchmark. Figure 4 shows that BPCF improves performance in all situations, including illumination variation, out-of-plane rotation, scale variation, occlusion, deformation, motion blur, fast motion, in-plane rotation, out of view, background clutter, and low resolution.
As shown in Fig. 4, our tracker performs significantly better than CFWCR. By implementing the background perception regulation term and the self-adaptive model update strategy jointly, the performance of our tracker improves by about 4%.
For a more detailed evaluation, we extend our experiment on the OTB2015 benchmark by first implementing the proposed background perception regulation term and the model update strategy separately and then integrating them. Table 2 shows this analysis of our contributions, from which we can see that BPCF performs significantly better than the baseline tracker. With the background perception regulation term alone, our tracker improves by about 3.6%, and with the self-adaptive model update strategy added, it improves by about 4%.
State-of-the-art comparison
Here, we use both success plots and precision plots [22] over all 100 videos of the OTB2015 dataset to analyze our approach BPCF. In the evaluation of success plots, the area under the curve (AUC) is used to rank the trackers. The precision plot reports the average distance precision score at 20 pixels for each method. We conduct extensive experiments and compare our tracker with 10 state-of-the-art methods: MEEM [25], C-COT [4], CFWCR [12], ECO [3], DeepSRDCF [26], SRDCF [8], Staple [27], SiameseFC [28], DSST [29], and KCF [9].
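The distance precision score at 20 pixels can be sketched as the fraction of frames whose predicted box center lies within 20 pixels of the ground-truth center. The (x, y, width, height) box format, the function names, and the toy boxes below are assumptions for illustration.

```python
import math


def center(box):
    """Center point of an (x, y, width, height) box."""
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0


def precision_at(tracked, ground_truth, threshold=20.0):
    """Fraction of frames whose center location error is within threshold."""
    hits = 0
    for box_t, box_g in zip(tracked, ground_truth):
        (cx_t, cy_t), (cx_g, cy_g) = center(box_t), center(box_g)
        if math.hypot(cx_t - cx_g, cy_t - cy_g) <= threshold:
            hits += 1
    return hits / len(ground_truth)


# Center errors of 5, 0, and about 70.7 pixels: two of three frames count.
tracked = [(0, 0, 10, 10), (30, 0, 10, 10), (100, 100, 10, 10)]
truth = [(3, 4, 10, 10), (30, 0, 10, 10), (50, 50, 10, 10)]
precision = precision_at(tracked, truth)
```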
The evaluation results of our proposed tracker BPCF and the 10 competing trackers are shown in Figs. 5 and 6. Among the existing trackers, which include both DCF-based and CNN-based trackers, C-COT [4] and our baseline CFWCR [12] provide the best results on both success plots and precision plots. Our proposed method BPCF outperforms both of them and provides the best result, with an AUC score of 0.689 and a precision score of 0.905. In addition, ECO [3] with hand-crafted features obtains an AUC score of 0.641 and a precision score of 0.854, and DeepSRDCF [26] obtains an AUC score of 0.634 and a precision score of 0.851. The other trackers have an AUC score below 0.6 and a precision score below 0.8.
Qualitative evaluation
To evaluate our tracker qualitatively against other state-of-the-art trackers, the tracking results of BPCF along with ECO, C-COT, CFWCR, etc. are presented in Fig. 7.
Figure 7a shows the results for the sequence bird2, in which the target undergoes out-of-plane rotation and occlusion. The results show that, among all the trackers, BPCF works best and remains the most robust. All the other trackers drift to some degree during tracking.
Figure 7b shows the results of the sequence carscale, where the tracking target is experiencing scale variation and occlusion. We can see from the figure that BPCF tracker stays stable while the target object experiences severe occlusion. The other trackers experience overfitting while the object’s scale varies.
In Fig. 7c, the results for the sequences matrix and skating1 are presented, in which the target objects undergo illumination variation. In the sequence matrix, the other trackers show some degree of imprecision due to the illumination variation, while BPCF remains the most robust. In the sequence skating1, CFWCR and C-COT lose the target halfway, while BPCF and ECO remain robust to the end of the sequence.
In Fig. 7d, we present the tracking result of sequences tiger1 and tiger2, in which motion blur and occlusion occur frequently. It is shown in the figure that all of the trackers managed to track the target successfully, although CFWCR has some occasional drift.
In Fig. 7e, the tracking results for the sequences football and football1 are given. Background clutter can be seen in these two video sequences. We can see from Fig. 7e that when the target is occluded by other similar objects, all trackers except BPCF drift to some degree. In the sequence football, one of the trackers even fails to locate the correct object after the occlusion.
Conclusions
In this paper, we discard the traditional use of hand-crafted features and adopt only deep features for visual tracking. Moreover, we propose a background perception regulation term to reduce the excessive emphasis on the foreground of the input training samples. By making full use of the background information, our tracker works more robustly. The self-adaptive model update strategy is also implemented to avoid model corruption by selecting high-confidence tracking results as training samples. We evaluate our method on the OTB2015 benchmark, and the experimental results show that our tracker achieves state-of-the-art performance.
Availability of data and materials
All data are fully available without restriction.
Abbreviations
CN: Color names
CNN: Convolutional neural network
DCF: Discriminative correlation filter
HOG: Histogram of oriented gradients
VGG: Visual geometry group
References
1. J Henriques, R Caseiro, P Martins, J Batista, in European Conference on Computer Vision. Exploiting the circulant structure of tracking-by-detection with kernels (2012), pp. 702–715.
2. M Danelljan, F. S. Khan, M Felsberg, J. V. D. Weijer, in IEEE Conference on Computer Vision and Pattern Recognition. Adaptive color attributes for real-time visual tracking (2014), pp. 1090–1097.
3. M Danelljan, G Bhat, F. S. Khan, M Felsberg, in IEEE Conference on Computer Vision and Pattern Recognition. ECO: Efficient Convolution Operators for Tracking (2017), pp. 21–26.
4. M Danelljan, A Robinson, F. S. Khan, M Felsberg, in European Conference on Computer Vision. Beyond correlation filters: learning continuous convolution operators for visual tracking (2016), pp. 472–488.
5. R. Yao, S. Xia, F. Shen, Y. Zhou, Q. Niu, Exploiting spatial structure from parts for adaptive kernelized correlation filter tracker. IEEE Signal Process Lett 23, 658–662 (2016)
6. M. Danelljan, G. Häger, F.S. Khan, M. Felsberg, Discriminative scale space tracking. IEEE Trans Pattern Anal Machine Intell 39, 1561–1575 (2016)
7. M Wang, Y Liu, Z Huang, Large margin object tracking with circulant feature maps. IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 4800–4808.
8. M Danelljan, G Häger, F. S. Khan, M Felsberg, in IEEE International Conference on Computer Vision. Learning spatially regularized correlation filters for visual tracking (2015), pp. 4310–4318.
9. J.F. Henriques, R Caseiro, P Martins, J Batista, High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis & Machine Intelligence 37, 583–596 (2014).
10. D.S. Bolme, J.R. Beveridge, B.A. Draper, Y.M. Lui, in IEEE Conference on Computer Vision and Pattern Recognition. Visual object tracking using adaptive correlation filters (2010), pp. 2544–2550.
11. F Li, C Tian, W Zuo, L Zhang, M. H. Yang, in IEEE Conference on Computer Vision and Pattern Recognition. Learning spatial-temporal regularized correlation filters for visual tracking (2018), pp. 4904–4913.
12. Z He, Y Fan, J Zhuang, Y Dong, H. L. Bai, in IEEE International Conference on Computer Vision Workshop. Correlation filters with weighted convolution responses (2017), pp. 1992–2000.
13. C Sun, D Wang, H Lu, M.H. Yang, in IEEE Conference on Computer Vision and Pattern Recognition. Learning spatial-aware regressions for visual tracking (2018), pp. 8962–8970.
14. E. Gundogdu, A.A. Alatan, Good features to correlate for visual tracking. IEEE Trans Image Process 27, 2526–2540 (2018)
15. T. Xu, Z.H. Feng, X.J. Wu, J. Kittler, Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking. IEEE Trans Image Process 28, 5596–5609 (2019)
16. B Huang, T Xu, B Liu, B Yuan, Context constraint and pattern memory for long-term correlation tracking. Neurocomputing. In Press, https://doi.org/10.1016/j.neucom.2019.10.021
17. B. Huang, T. Xu, S. Jiang, Y. Bai, Y. Chen, SVTN: Siamese Visual Tracking Networks with Spatially Constrained Correlation Filter and Saliency Prior Context Model. IEEE Access 7, 144339–144353 (2019)
18. N Dalal, B Triggs, in IEEE Conference on Computer Vision and Pattern Recognition. Histograms of oriented gradients for human detection (2005), pp. 886–893.
19. M Danelljan, G Häger, F.S. Khan, M Felsberg, in Scandinavian Conference on Image Analysis. Coloring channel representations for visual tracking (2015), pp. 117–129.
20. C Sun, D Wang, H Lu, M. H. Yang, in IEEE Conference on Computer Vision and Pattern Recognition. Correlation tracking via joint discrimination and reliability learning (2018), pp. 489–497.
21. M Kristan, A Leonardis, J Matas, M Felsberg, Z He, et al, in IEEE International Conference on Computer Vision Workshop. The visual object tracking vot2017 challenge results (2017), pp. 1949–1972.
22. Y. Wu, J. Lim, M.H. Yang, Object tracking benchmark. IEEE Trans Pattern Anal Machine Intell 37, 1834–1848 (2015)
23. K Simonyan, A Zisserman, Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
24. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., Imagenet large scale visual recognition challenge. Int J Comput Vision 115, 211–252 (2015)
25. J Zhang, S Ma, S Sclaroff, in European Conference on Computer Vision. MEEM: robust tracking via multiple experts using entropy minimization (2014), pp. 188–203.
26. M Danelljan, G Häger, F. S. Khan, M Felsberg, in IEEE International Conference on Computer Vision (ICCV) Workshops. Convolutional features for correlation filter based visual tracking (2015), pp. 59–66.
27. L Bertinetto, J Valmadre, S Golodetz, O Miksik, P. H. S. Torr, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Staple: complementary learners for real-time tracking (2016), pp. 1401–1409.
28. L Bertinetto, J Valmadre, J. F. Henriques, A Vedaldi, P. H. Torr, in European Conference on Computer Vision (ECCV) Workshops. Fully-Convolutional Siamese Networks for Object Tracking (2016), pp. 850–865.
29. M Danelljan, G Häger, F. S. Khan, M Felsberg, Accurate scale estimation for robust visual tracking. In BMVC (2014)
Acknowledgements
The authors would like to thank Tingfa Xu for the support.
Funding
This work was supported by the Major Science Instrument Program of the National Natural Science Foundation of China under Grant 61527802.
This work was also supported by the General Program of the National Natural Science Foundation of China under Grants 61371132 and 61471043.
Author information
Contributions
JL and TX conceived of the tracking method. YZ was responsible for the programming. FW and LW verified the analytical methods. YZ wrote the manuscript, and all authors revised the final manuscript. In addition, TX and JL are the corresponding authors. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Tingfa Xu.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Zhang, Y., Li, J., Wu, F. et al. Background perception for correlation filter tracker. J Wireless Com Network 2020, 20 (2020). https://doi.org/10.1186/s136380191630y
Keywords
 Correlation filter
 Background perception
 Model update
 Visual tracking