
# Online adaptive complementation tracker

Guokai Shi^{1}, Tingfa Xu^{1, 2} (Email author), Jiqiang Luo^{1}, Jie Guo^{1} and Zhitao Rao^{1}

**2018**:191

https://doi.org/10.1186/s13638-018-1198-y

© The Author(s). 2018

**Received:** 10 April 2018 **Accepted:** 3 July 2018 **Published:** 6 August 2018

## Abstract

Correlation filter-based trackers have recently shown excellent performance under motion blur and illumination changes, but they are notoriously sensitive to deformation. It has been demonstrated that combining a correlation filter-based tracker with a color histogram-based tracker can alleviate the effect of deformation while keeping the advantages of the correlation filter-based tracker. However, most existing complementary tracking algorithms use fixed complementary weights, which limits the performance of each sub tracker. This paper introduces an adaptive complementary tracker that learns dynamic complementary weights online. The strategy down-weights an unsuitable sub tracker while increasing the impact of a suitable one. We jointly learn the sub trackers and their reliability weights by regression analysis of the corresponding historical tracking results. The robustness of the model is also improved by training each sub tracker with historical tracking results. Finally, both qualitative and quantitative evaluations demonstrate that our tracker achieves state-of-the-art results in a wide range of tracking scenarios.

## Keywords

- Correlation filter
- Online learning
- Visual tracking
- Complementary tracker

## 1 Background

Visual tracking plays an important role in computer vision and has received rapidly growing attention in recent years due to its wide range of practical applications. In generic tracking, the task is to track an unknown target (only a bounding box defining the object of interest in a single frame is given) in an unknown video stream. This problem is very challenging due to the limited set of training samples and the numerous appearance changes, e.g., rotations, scale changes, occlusions, and deformations.

To solve these problems, many effective tracking models have been proposed [1] in recent years. These models can be categorized into methods based on subspace algorithms [2, 3], sparse representation [4–8], online classifiers [9–12], and so on. Recently, correlation filter-based tracking algorithms [13–17] have drawn increasing attention because of their dense sampling property and their fast computation in the Fourier domain.

Bolme et al. [13] propose the MOSSE tracker, which finds a filter by minimizing the sum of squared error between the actual convolution outputs and the desired convolution outputs. The MOSSE tracker can process several hundred frames per second because of the fast element-wise multiplication and division in the Fourier domain. Henriques et al. [18] extend correlation filters to a kernel space, leading to the Circulant Structure of Tracking-by-Detection with Kernels (CSK) tracker, which achieves competitive performance and efficiency. To further improve the performance, the Kernelized Correlation Filter (KCF) method [19] integrates multiple features into the CSK tracking algorithm. The Discriminative Scale Space Tracker (DSST) [20] uses one-dimensional correlation filters for online estimation of the target scale, so that it can cope with a wide range of scale changes. Furthermore, Montero et al. [21] propose a scalable kernel correlation filter which introduces an adjustable Gaussian window function and a key point-based model for scale estimation. This strategy removes the fixed-size limitation of the KCF.

Correlation filters are inherently confined to the problem of learning a rigid template. Because they learn their model from circular shifts of positive examples, correlation filters fail to learn a component that is invariant to permutations. This makes them inherently sensitive to shape deformation, which is a concern when the target deforms in the course of a sequence. Yao et al. [22] decompose the target object into several parts to exploit the spatial information of the object appearance. Their strategy improves the tracker’s performance in situations such as occlusions and deformations. In addition, color histograms were used in many early approaches to object tracking [23, 24], and they have been demonstrated to be competitive in the Distractor-Aware Online Tracking (DAT) [25]. More recently, Bertinetto et al. [26] propose a simple linear combination of template and color histogram scores with a fixed weight to alleviate shape deformation. However, the fixed weight limits the performance of the combination model.

To solve the problem discussed above, this paper presents an online adaptive complementary tracker. The contributions of this work are twofold. First, we propose a novel formulation for jointly learning two sub trackers and their reliability weights. These sub trackers have better generalization ability because they use more samples during training. Second, we analyze the reliability of each sub tracking model in the current scene online and adjust its importance in the complementary model in real time. With this adaptive weight adjustment, our combination model shows better performance.

## 2 Methods

### 2.1 Problem formulation

We jointly learn the sub trackers *h*_{1} and *h*_{2} and their weights *α* and *β* by minimizing a single loss function. To the best of our knowledge, we are the first to introduce this joint optimization framework through the joint loss (Eq. 1).

Here, *l*_{1} and *l*_{2} denote two regression problems; they are used to learn the two sub models that make up our complementary appearance model. *h*_{1} is a correlation filter-based tracker, and its complementary tracker *h*_{2} is based on color histogram scores. The weight *ε* ensures that *l*_{1} and *l*_{2} have a consistent value range. ℛ_{1}, ℛ_{2}, and ℛ_{3} are three independent regularization terms.

### 2.2 Constructing and solving the sub tracker *h*_{1}

We use the Background-aware Correlation Filters (BACF) model [27] as the baseline model of *h*_{1} because it effectively handles the boundary effect, which is a fundamental drawback of traditional correlation filter-based trackers. However, like many other correlation filter-based trackers, the BACF model depends strongly on the spatial layout of the tracked object and is therefore sensitive to deformation. In addition, the BACF model ignores the problem of corrupted samples; when such samples are included in the training set, model drift may result.

*h*_{1}:

*P* denotes the transform matrix, which is an *E* × *J* binary matrix for cropping the middle *E* elements of sample \( {\varphi}_i^d \), with *E* ≪ *J*. The cropped patch corresponding to the peak of the correlation output displays the target (positive example), and those corresponding to the zero values of the correlation output display the background content (negative examples). The transform matrix *P* can be pre-computed, and \( P{\varphi}_i^d \) can be efficiently performed via a lookup table. *φ* is expressed by Histogram of Oriented Gradients (HOG) features. Compared with [28], our method adopts the sampling strategy of the BACF, which introduces the transform matrix *P* into cyclic sampling to improve the distinction between samples. More importantly, the proposed optimization model of the parameter *h*_{1} not only purifies the training sample set, but also verifies the reliability of the model for the current sequence.
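The role of the cropping matrix *P* can be illustrated with a small NumPy sketch. It is only a toy, one-channel, one-dimensional example: the helper name `crop_matrix` and the sizes *J* = 8 and *E* = 4 are assumptions made for illustration and are not taken from the paper. It shows that multiplying every circular shift of a sample by *P* gives the same result as a plain index lookup, which is why the product never needs to be formed explicitly.

```python
import numpy as np

def crop_matrix(J, E):
    """Build the E x J binary matrix that selects the middle E entries
    of a length-J vector (hypothetical helper, for illustration only)."""
    start = (J - E) // 2
    P = np.zeros((E, J), dtype=int)
    P[np.arange(E), start + np.arange(E)] = 1
    return P

J, E = 8, 4                       # toy sizes; in the text E << J
P = crop_matrix(J, E)

phi = np.arange(J, dtype=float)   # toy one-channel sample
# All J circular shifts of the sample, one shift per row.
shifts = np.stack([np.roll(phi, -k) for k in range(J)])

# Cropping by explicit matrix product ...
patches_matrix = shifts @ P.T
# ... equals cropping by index lookup, so P never has to be multiplied out.
start = (J - E) // 2
patches_lookup = shifts[:, start:start + E]
```

Each row of `patches_matrix` is one cropped patch; only one of them is centered on the target, and the others contain shifted (background) content.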

*I*_{D} is a *D* × *D* identity matrix, and the operator ⊗ is the Kronecker product. The accent ˆ denotes the discrete Fourier transform (DFT). The orthonormal matrix *F* can map any *J*-dimensional vectorized signal to the Fourier domain. *T* indicates the transpose operator, which computes the conjugate transpose on a complex vector or matrix.
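As a sanity check on this notation, the following sketch builds an orthonormal DFT matrix *F* and applies \( I_D\otimes F \) to *D* stacked channels. The sizes, the random test signal, and the function name `dft_matrix` are assumptions for illustration; the sketch only demonstrates that the Kronecker form is equivalent to a per-channel orthonormal FFT, and it does not reproduce any scaling convention the paper may use.

```python
import numpy as np

def dft_matrix(J):
    """Orthonormal J x J DFT matrix (unitary: F^H F = I)."""
    n = np.arange(J)
    return np.exp(-2j * np.pi * np.outer(n, n) / J) / np.sqrt(J)

J, D = 6, 3                          # toy signal length and channel count
F = dft_matrix(J)

rng = np.random.default_rng(0)
x = rng.standard_normal((D, J))      # D channels, each of length J

# (I_D kron F) acts on the channels stacked into one long vector.
big_F = np.kron(np.eye(D), F)
x_hat_kron = big_F @ x.reshape(-1)

# The same result from a per-channel FFT with orthonormal scaling.
x_hat_fft = np.fft.fft(x, axis=1, norm="ortho").reshape(-1)
```

In practice the block-diagonal matrix is never formed; the per-channel FFT is used instead.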

- A. *Solving sub problem h*_{1}^{∗}

- B. *Solving sub problem* \( {\widehat{g}}^{\ast } \)

*D* values of \( {\widehat{\varphi}}_i(j)={\left[{\widehat{\varphi}}_i^1(j),\dots, {\widehat{\varphi}}_i^D(j)\right]}^T \) and \( \widehat{g}(j)={\left[ \mathrm{conj}\left({\widehat{g}}^1(j)\right),\dots, \mathrm{conj}\left({\widehat{g}}^D(j)\right)\right]}^T \). Here, the operator conj(·) indicates the complex conjugate. Similar to [27], solving Eq. 5 for \( {\widehat{g}}^{\ast } \) can be identically expressed as

*J* smaller, independent objectives, solving for \( \widehat{g}{(j)}^{\ast } \):

*A* into a lower triangular component *L* and a strictly upper triangular component *U*, such that *A* = *U* + *L*. The filter \( \widehat{g}{(j)}^{\ast } \) is then iteratively calculated by the equation:

*m* denotes the *m*th iteration of the Gauss-Seidel method.
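The splitting *A* = *L* + *U* described above can be sketched in a few lines of NumPy. This is a generic Gauss-Seidel solver for a square system *A*x = b, not the authors' implementation: the function name `gauss_seidel`, the toy 2 × 2 system, and the zero initial guess are assumptions, while the default of five sweeps mirrors the setting reported in Section 3.1.

```python
import numpy as np

def gauss_seidel(A, b, num_iters=5, x0=None):
    """Solve A x = b with Gauss-Seidel sweeps.

    A is split as A = L + U, with L lower triangular (diagonal included)
    and U strictly upper triangular; each sweep solves
    L x_new = b - U x_old by forward substitution.
    """
    A = np.asarray(A, dtype=complex)
    b = np.asarray(b, dtype=complex)
    L = np.tril(A)        # lower triangular part, including the diagonal
    U = np.triu(A, k=1)   # strictly upper triangular part
    x = np.zeros_like(b) if x0 is None else np.asarray(x0, dtype=complex).copy()
    for _ in range(num_iters):
        rhs = b - U @ x                   # uses the previous iterate
        for r in range(len(b)):           # forward substitution with L
            x[r] = (rhs[r] - L[r, :r] @ x[:r]) / L[r, r]
    return x

# Toy diagonally dominant system (assumed for illustration).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])
x5 = gauss_seidel(A, b)                   # five sweeps, as in Section 3.1
residual = np.linalg.norm(A @ x5 - b)
```

Gauss-Seidel is guaranteed to converge when *A* is strictly diagonally dominant or Hermitian positive definite; for other matrices convergence should be checked, for example through the residual computed above.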

- C. *Lagrangian vector update*

*n* indicates the *n*th iteration in the ADMM operation.
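For orientation, the sketch below shows the standard ADMM dual update [29] as it is used in BACF-style formulations [27], where the constraint ties the auxiliary variable \( \widehat{g} \) to the Fourier-domain filter. It is an assumption that the update here takes this form; the names `xi_hat`, `g_hat`, `h_hat`, and `update_lagrangian` are hypothetical, and only the penalty value *μ* = 0.25 comes from Section 3.1.

```python
import numpy as np

def update_lagrangian(xi_hat, g_hat, h_hat, mu=0.25):
    """One dual-ascent step of ADMM for the constraint g_hat = h_hat.

    xi_hat : current Lagrangian vector (Fourier domain)
    g_hat  : auxiliary variable after its sub problem is solved
    h_hat  : Fourier-domain filter after its sub problem is solved
    mu     : penalty factor
    """
    return xi_hat + mu * (g_hat - h_hat)

# Toy vectors (assumed): the multiplier only changes where g and h disagree.
xi = np.zeros(4, dtype=complex)
g = np.array([1 + 1j, 2, 3, 4], dtype=complex)
h = np.array([1 + 1j, 1, 3, 2], dtype=complex)
xi_next = update_lagrangian(xi, g, h)
```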

### 2.3 Constructing and solving the sub tracker *h*_{2}

*h*_{1}. The difference is that we extend the training set of the model in the time dimension. The expanded model, which makes use of more diverse training samples and has better robustness, can be learned by optimizing Eq. 10:

*d* is non-zero. \( {N}^d\left(\mathcal{A}\right)=\left|\left\{u\in \mathcal{A}:k\left[u\right]=d\right\}\right| \) is the number of pixels in the region \( \mathcal{A} \) of the feature *ψ* whose bin index *k*[*u*] equals *d*.
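The pixel count \( {N}^d\left(\mathcal{A}\right) \) defined above is a per-bin histogram over a region. A minimal sketch, assuming 8-bit RGB input quantized into 32 bins per channel (the bin setting of Section 3.1) and a boolean mask for the region; the helper names `quantize_rgb` and `bin_counts` and the toy image are hypothetical:

```python
import numpy as np

def quantize_rgb(image, bins_per_channel=32):
    """Map each 8-bit RGB pixel u to a single bin index k[u]."""
    step = 256 // bins_per_channel
    q = image.astype(np.int64) // step
    return (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]

def bin_counts(bin_index, region_mask, num_bins):
    """N^d(A): number of pixels u in region A with k[u] == d, for every d."""
    return np.bincount(bin_index[region_mask], minlength=num_bins)

# Toy 4 x 4 image: top two rows pure red, bottom two rows black.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:2] = (255, 0, 0)
k = quantize_rgb(image)

# Region A covers the first three rows.
mask = np.zeros((4, 4), dtype=bool)
mask[:3] = True
counts = bin_counts(k, mask, 32 ** 3)
```

Calling `bin_counts` once with an object mask and once with a background mask gives the two sets of counts that a color histogram-based tracker such as [26] compares bin by bin.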

### 2.4 Solving the reliability weights *α* and *β*

Once the sub trackers *h*_{1} and *h*_{2} are obtained, Eq. 1 can be expressed as an optimization problem for *α* and *β*. The optimization problem for *α* and *β* can be rewritten in the following form:

*w* = [*w*_{1}, *w*_{2}, …, *w*_{i}, …, *w*_{t}], *L* = [*L*_{1}, …, *L*_{i}, …, *L*_{t}]^{T}, *w*_{i} = (*α*_{i}, *ε*^{∗}*β*_{i}), *α*_{i} > 0, *β*_{i} > 0, and *L*_{i} = (*l*_{1}, *l*_{2})_{i}^{T}. *p*_{i} is the prior weight, whose definition is similar to the one in [28], with *p*_{i} > 0 and \( {\sum}_{i=1}^t{p}_i=1 \). Through the parameter *p*_{i}, recent frames are given larger weight to account for fast appearance changes.

We can use the convex quadratic programming methods in Matlab’s Optimization Toolbox to solve the above optimization problem.
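The prior weights *p*_{i} described above must be positive, sum to one, and favor recent frames. One common way to satisfy these three properties is an exponentially decaying prior controlled by a learning rate, which is the convention of [28]. The sketch below assumes that form; the paper only states that its definition is similar to [28], so the exact formula, the function name `prior_weights`, and the reuse of *η* = 0.02 from Section 3.1 are assumptions.

```python
import numpy as np

def prior_weights(t, eta=0.02):
    """Exponentially decaying prior over frames 1..t (assumed form).

    Frame t (the most recent) has age 0 and receives the largest weight;
    the weights are normalized so that they sum to one.
    """
    ages = np.arange(t - 1, -1, -1)   # frame 1 is the oldest
    raw = (1.0 - eta) ** ages
    return raw / raw.sum()

p = prior_weights(10)
```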

### 2.5 Solving the complementary weights

*g* and spatial layout features *φ*.

\( {f}_{h_2} \) denotes the histogram score, which is computed from an M-channel feature image.

*γ* is an adaptive variable rather than a pre-assigned constant as in [26]. The reliability weights obtained by Eq. 13 reflect the reliability of the corresponding sub model. To alleviate the problem of rapid appearance changes, we use the reliability weights of the most recent *n* (*n* = 10) frames to dynamically adjust our weight *γ*. In addition, considering that the correlation filter-based tracker is better suited to accurate localization than the color histogram-based tracker, we choose the template scores as the main tracking scores and the color histogram scores as auxiliary ones. Based on the discussion above, we construct the following complementary weight model:

Here, *σ*_{1} and *σ*_{2} are the variances of the data sets {*α*_{i} − *p*_{i}| *i* = *t* − *n*, …, *t*} and {*β*_{i} − *p*_{i}| *i* = *t* − *n*, …, *t*}, respectively. \( {\mathcal{T}}_{\mathrm{low}} \) and \( {\mathcal{T}}_{\mathrm{up}} \) are the lower threshold and the upper threshold, respectively.
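Two ingredients of this section can be sketched without reproducing the weight model itself: the windowed variance of the reliability weights, and a Staple-style linear merge of the two response maps [26]. The sketch makes several assumptions that are not confirmed by the text: that the merge has the form (1 − *γ*) · template + *γ* · histogram, that the thresholds act as a clamp on *γ*, and that the variance is the population variance. The mapping from *σ*_{1} and *σ*_{2} to *γ* is deliberately left out, and all function names are hypothetical.

```python
import numpy as np

def window_variance(weights, priors, n=10):
    """Variance of {w_i - p_i} over the last n + 1 frames (i = t-n, ..., t).
    Population variance is assumed here."""
    diff = np.asarray(weights[-(n + 1):], dtype=float) \
         - np.asarray(priors[-(n + 1):], dtype=float)
    return float(np.var(diff))

def merge_responses(f_template, f_hist, gamma, t_low=0.15, t_up=0.8):
    """Linear merge of template and histogram responses.
    Clamping gamma to [t_low, t_up] is an assumption about how the
    thresholds of Section 3.1 are applied."""
    gamma = float(np.clip(gamma, t_low, t_up))
    return (1.0 - gamma) * f_template + gamma * f_hist

# Toy 2 x 2 response maps with different peaks (assumed values).
f_t = np.array([[0.0, 1.0], [0.0, 0.0]])
f_h = np.array([[1.0, 0.0], [0.0, 0.0]])
merged = merge_responses(f_t, f_h, gamma=0.3)
peak = np.unravel_index(np.argmax(merged), merged.shape)
```

With a small *γ* the merged peak follows the template response; raising *γ* shifts it toward the histogram response, which is the behavior the adaptive weight is meant to control.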

## 3 Experimental results and discussion

The proposed method is implemented in MATLAB 2016a. We perform the experiments on a PC with an Intel i7-4790 CPU (3.6 GHz) and 16 GB of RAM, and the tracker runs at 15 fps. We extensively evaluate the performance of the proposed tracker on all 100 challenging sequences from [30] and compare our tracker with 10 state-of-the-art trackers: DSST [20], Structured Output Tracking with Kernels (Struck) [31], BACF [27], Tracking-Learning-Detection (TLD) [32], CSK [18], Spatially Regularized Discriminative Correlation Filters (SRDCF) [33], Tracking via Multiple Experts Using Entropy Minimization (MEEM) [34], Adaptive Color Tracker (ACorT) [35], Staple [26], and DAT [25]. The DSST, the CSK, the SRDCF, and the ACorT are correlation filter-based trackers, which pay more attention to the structural attributes of the target. The Struck, the TLD, and the MEEM are tracking algorithms based on classifier learning, which divide the scene into target and background. The DAT is an unstructured tracker, which is suitable for tracking targets with severe deformation. The Staple is a combined tracker and is the one most closely related to our algorithm.

### 3.1 Parameter setting

In the proposed tracking model, the HOG feature and the color histogram feature are two important factors affecting the computation. In the experiments, we use 31-channel HOG features with a cell size of 4 × 4 pixels in order to be consistent with other algorithms, e.g., BACF, Staple, DSST, and CSK. The size of the search region is set to 4^{2} times the size of the target area in all the algorithms. We then normalize the samples to a fixed size of 50 × 50, so that the fps is unaffected by the height and width of the video sequence. In the Alternating Direction Method of Multipliers (ADMM) optimization, we set the number of iterations to 4, which is enough to ensure algorithm robustness. Although more iterations would improve performance, they would reduce computational efficiency. The penalty factor *μ* is set to 0.25. The regularization factors *λ*_{1} and *λ*_{2} are set to 0.001 and 0.01, respectively. The number of Gauss-Seidel iterations is set to 5, and the learning rate is *η* = 0.02. The lower threshold is \( {\mathcal{T}}_{\mathrm{low}}=0.15 \) and the upper threshold is \( {\mathcal{T}}_{\mathrm{up}}=0.8 \). Moreover, the color histogram uses 32 × 32 × 32 bins, similar to the setting of the Staple tracker. All parameters remain fixed in all experiments. Our approach runs at approximately 15 frames per second.

In practice, the penalty factor, the regularization factors, the ADMM iteration number, and the Gauss-Seidel iteration number are independent of the specific scenario and depend only on the proposed model itself. The other three parameters, namely the cell size of the HOG features, the size of the search region, and the number of color histogram bins, are affected by the target size and image resolution.

### 3.2 Qualitative evaluation

Figure 2a shows the tracking results on the *Box* and *Coke* sequences. In this figure, the targets undergo partial or short-term total occlusions. Our tracker can effectively track these objects, which benefits from two factors. First, our approach is capable of reducing or removing the impact of corrupted samples by readjusting the reliability weights of the sub models in training, thereby lowering the risk of drift and tracking failure. Second, the uncorrupted part of the targets and the background context can still form some uncorrupted samples for training and detection, which is similar to BACF. Trackers such as DAT, CSK, ACorT, and DSST may drift after occlusions or fail to track the targets because their search areas are limited and corrupted samples are used during training. BACF and SRDCF can handle some slight occlusions, but under severe occlusions they both drift away.

Figure 2b shows the tracking results on the *KiteSurf* and *Couple* sequences, two representative sequences in which the target moves fast. The figure shows that the proposed tracker handles these cases well. This can be mainly attributed to the correlation filters and the discriminative ability of the proposed method. On the one hand, the color histogram information can be used effectively to handle deformations by dynamically increasing the impact of the color histogram-based tracker. On the other hand, when the target is in fast motion, the impact of the correlation filter-based tracker is dynamically increased. This sub tracker can effectively handle fast motion because it allows a larger search region. Other trackers (e.g., Staple and MEEM) also have some ability to cope with target deformations, but most of them fail to track the targets when severe deformations occur.

Figure 2c shows the tracking results on two representative sequences, *Rubik* and *DragonBaby*, where the targets are partially or fully occluded. Our tracker performs well under in-plane rotation and out-of-plane rotation. These properties mainly benefit from up-weighting the color histogram-based tracker.

*Box* and *DragonBaby*), our approach achieved the best results. In two of the remaining four sequences (the *Rubik* and *Couple* sequences), our approach obtained the second best results, which were very close to the best ones. In the remaining two sequences (*Coke* and *KiteSurf*), our tracker also performs well. Generally, our method performed well against existing trackers.

### 3.3 Quantitative evaluation

We use the precision plot and success plot metrics [30] to compare all trackers on OTB-100. The precision plot reports the average distance precision score at 20 pixels for each method. In the evaluation of success plots, the area under the curve (AUC) of each success plot is used to rank the trackers.
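Both metrics can be computed from per-frame bounding boxes. The sketch below follows the usual benchmark convention [30]: distance precision is the fraction of frames whose center location error is at most 20 pixels, and the success AUC is the mean success rate over overlap thresholds from 0 to 1. The (x, y, w, h) box layout, the 21 evenly spaced thresholds, and the function names are assumptions for illustration.

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between box centers; boxes are rows of (x, y, w, h)."""
    pc = pred[:, :2] + pred[:, 2:] / 2.0
    gc = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pc - gc, axis=1)

def iou(pred, gt):
    """Intersection over union for each pair of boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

def precision_at(pred, gt, threshold=20.0):
    """Fraction of frames with center error within `threshold` pixels."""
    return float(np.mean(center_error(pred, gt) <= threshold))

def success_auc(pred, gt, thresholds=np.linspace(0, 1, 21)):
    """Mean success rate over the overlap thresholds (area under the curve)."""
    overlaps = iou(pred, gt)
    rates = [np.mean(overlaps > t) for t in thresholds]
    return float(np.mean(rates))

# Toy two-frame sequence: one perfect prediction, one complete miss.
gt = np.array([[0.0, 0.0, 10.0, 10.0], [0.0, 0.0, 10.0, 10.0]])
pred = np.array([[0.0, 0.0, 10.0, 10.0], [100.0, 100.0, 10.0, 10.0]])
precision = precision_at(pred, gt)
auc = success_auc(pred, gt)
```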

## 4 Conclusions

To alleviate the effect of deformation, this paper presents an adaptive complementary tracking algorithm that is learned online. The proposed strategy dynamically adapts the importance of each sub tracker according to the real-time status of the scene, making the tracker more robust to various target appearance changes. In addition, jointly learning the reliability weights and the trackers effectively controls the reliability of the training samples and improves the discriminability of the tracker. Experimental results show that our approach outperforms state-of-the-art tracking algorithms on many challenging videos from the OTB.

## Declarations

### Acknowledgements

The authors would like to thank the Image Engineering & Video Technology Lab for the support.

### Funding

This work was supported by the Major Science Instrument Program of the National Natural Science Foundation of China under Grant 61527802 and the General Program of the National Natural Science Foundation of China under Grants 61371132 and 61471043.

### Availability of data and materials

All data are fully available without restriction.

### Authors’ contributions

GS and TX conceived and designed the algorithm and the experiments. GS was responsible for most of the implementation of the algorithms. JL, JG, ZR, and GS analyzed the data. GS provided suggestions for the proposed method and its evaluation, and assisted in the preparation of the manuscript. All authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


## References

1. AW Smeulders, DM Chu, R Cucchiara, S Calderara, A Dehghan, M Shah, Visual tracking: an experimental survey. IEEE Transactions on Pattern Analysis & Machine Intelligence 36, 1442–1468 (2014).
2. DA Ross, J Lim, R-S Lin, M-H Yang, Incremental learning for robust visual tracking. Int. J. Comput. Vis. 77, 125–141 (2008).
3. D Wang, H Lu, MH Yang, Online object tracking with sparse prototypes. IEEE Trans. Image Process. 22, 314–325 (2013).
4. X Mei, H Ling, Robust visual tracking using ℓ1 minimization, in *IEEE International Conference on Computer Vision* (2009), pp. 1436–1443.
5. H Lu, X Jia, MH Yang, Visual tracking via adaptive structural local sparse appearance model, in *IEEE Conference on Computer Vision and Pattern Recognition* (2012), pp. 1822–1829.
6. T Zhang, B Ghanem, S Liu, N Ahuja, Robust visual tracking via structured multi-task sparse learning. Int. J. Comput. Vis. 101, 367–383 (2013).
7. T Zhang, B Ghanem, C Xu, N Ahuja, Object tracking by occlusion detection via structured sparse learning, in *IEEE Conference on Computer Vision and Pattern Recognition Workshops* (2013), pp. 1033–1040.
8. D Wang, H Lu, C Bo, Online visual tracking via two view sparse representation. IEEE Signal Processing Letters 21, 1031–1034 (2014).
9. S Avidan, Support vector tracking. IEEE Transactions on Pattern Analysis & Machine Intelligence 26, 1064–1072 (2004).
10. S Avidan, Ensemble tracking. IEEE Transactions on Pattern Analysis & Machine Intelligence 29, 261–271 (2007).
11. B Babenko, MH Yang, S Belongie, Visual tracking with online multiple instance learning, in *IEEE Conference on Computer Vision and Pattern Recognition* (2009), pp. 983–990.
12. J Ning, J Yang, S Jiang, L Zhang, M-H Yang, Object tracking via dual linear structured SVM and explicit feature map, in *IEEE Conference on Computer Vision and Pattern Recognition* (2016), pp. 4266–4274.
13. DS Bolme, JR Beveridge, BA Draper, YM Lui, Visual object tracking using adaptive correlation filters, in *IEEE Conference on Computer Vision and Pattern Recognition* (2010), pp. 2544–2550.
14. M Danelljan, G Häger, F Khan, M Felsberg, Accurate scale estimation for robust visual tracking, in *British Machine Vision Conference, Nottingham, September* (2014), pp. 1–5.
15. JF Henriques, R Caseiro, P Martins, J Batista, High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis & Machine Intelligence 37, 583 (2014).
16. C Ma, X Yang, C Zhang, MH Yang, Long-term correlation tracking, in *IEEE Conference on Computer Vision and Pattern Recognition* (2015), pp. 5388–5396.
17. M Wang, Y Liu, Z Huang, Large margin object tracking with circulant feature maps, in *IEEE Conference on Computer Vision and Pattern Recognition* (2017), pp. 4800–4808.
18. R Yao, S Xia, F Shen, Y Zhou, Q Niu, Exploiting spatial structure from parts for adaptive kernelized correlation filter tracker. IEEE Signal Processing Letters 23, 658–662 (2016).
19. JF Henriques, R Caseiro, P Martins, J Batista, High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 37, 583–596 (2015).
20. M Danelljan, G Häger, FS Khan, Accurate scale estimation for robust visual tracking, in *British Machine Vision Conference* (2014), pp. 65.1–65.11.
21. AS Montero, J Lang, R Laganière, Scalable kernel correlation filter with sparse feature integration, in *IEEE International Conference on Computer Vision Workshop* (2016), pp. 587–594.
22. T Liu, G Wang, Q Yang, Real-time part-based visual tracking via adaptive correlation filters, in *IEEE Conference on Computer Vision and Pattern Recognition* (2015), pp. 4902–4912.
23. P Pérez, C Hue, J Vermaak, M Gangnet, Color-based probabilistic tracking. ECCV, 2002 **I**, 661–675 (2002).
24. D Comaniciu, V Ramesh, P Meer, Kernel-based object tracking. IEEE Transactions on Pattern Analysis & Machine Intelligence **25**, 564–575 (2003).
25. H Possegger, T Mauthner, H Bischof, In defense of color-based model-free tracking, in *IEEE Conference on Computer Vision and Pattern Recognition* (2015), pp. 2113–2120.
26. L Bertinetto, J Valmadre, S Golodetz, O Miksik, PHS Torr, Staple: complementary learners for real-time tracking, in *Computer Vision and Pattern Recognition* (2016), pp. 1401–1409.
27. HK Galoogahi, A Fagg, S Lucey, Learning background-aware correlation filters for visual tracking, in *IEEE Conference on Computer Vision and Pattern Recognition* (2017), pp. 1135–1143.
28. M Danelljan, G Häger, FS Khan, M Felsberg, Adaptive decontamination of the training set: a unified formulation for discriminative visual tracking, in *IEEE Conference on Computer Vision and Pattern Recognition* (2016), pp. 1430–1438.
29. S Boyd, N Parikh, E Chu, B Peleato, J Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations & Trends in Machine Learning 3, 1–122 (2010).
30. Y Wu, J Lim, M-H Yang, Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1834–1848 (2015).
31. S Hare, S Golodetz, A Saffari, V Vineet, M-M Cheng, SL Hicks, et al., Struck: structured output tracking with kernels. IEEE Trans. Pattern Anal. Mach. Intell. 38, 2096–2109 (2016).
32. Z Kalal, K Mikolajczyk, J Matas, Tracking-learning-detection. IEEE Transactions on Pattern Analysis & Machine Intelligence 34, 1409–1422 (2012).
33. M Danelljan, G Hager, F Shahbaz Khan, M Felsberg, Learning spatially regularized correlation filters for visual tracking, in *IEEE International Conference on Computer Vision* (2015), pp. 4310–4318.
34. J Zhang, S Ma, S Sclaroff, MEEM: robust tracking via multiple experts using entropy minimization, in *European Conference on Computer Vision* (2014), pp. 188–203.
35. M Danelljan, FS Khan, M Felsberg, JVD Weijer, Adaptive color attributes for real-time visual tracking, in *IEEE Conference on Computer Vision and Pattern Recognition* (2014), pp. 1090–1097.