Citation: SANG Haifeng, LI Gongming, ZHAO Ziyu, "Multi-scale Global Retrieval and Temporal-Spatial Consistency Matching based long-term Tracking Network," Chinese Journal of Electronics, in press, doi: 10.23919/cje.2021.00.195, 2022.
Abstract: Compared with the traditional short-term object tracking task based on temporal-spatial consistency, the long-term object tracking task faces the challenges of object disappearance and dramatic changes in object scale and appearance. To address these challenges, in this paper we propose a Multi-scale Global Retrieval and Temporal-Spatial Consistency Matching based long-term Tracking Network (MTTNet). MTTNet treats long-term tracking as a single-sample object detection task and takes full advantage of the temporal-spatial consistency between adjacent video frames to improve tracking accuracy. Guided by the information of a single sample, MTTNet performs full-image multi-scale retrieval of any instance and requires neither online learning nor trajectory refinement. Errors generated during detection on one frame therefore do not affect performance on subsequent frames, which avoids the error accumulation typical of traditional tracking networks. We introduce Atrous Spatial Pyramid Pooling to handle dramatic changes in the scale and appearance of the object. Experimental results show that MTTNet achieves better performance than composite processing methods on two large datasets.
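Since the abstract only names Atrous Spatial Pyramid Pooling (ASPP) without giving its configuration, the following is a minimal sketch of the general ASPP idea in PyTorch; the dilation rates, channel widths, and the module's exact placement inside MTTNet are illustrative assumptions, not values taken from the paper.

```python
# Minimal ASPP sketch (PyTorch). Dilation rates and channel sizes are
# assumptions for illustration, not the configuration used in MTTNet.
import torch
import torch.nn as nn


class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel atrous convolutions with
    different dilation rates sample the same feature map at several
    effective receptive-field sizes, which helps cope with large changes
    in the scale of the tracked object."""

    def __init__(self, in_channels: int, out_channels: int,
                 rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # A 1x1 convolution fuses the multi-scale responses into one map.
        self.project = nn.Conv2d(out_channels * len(rates), out_channels,
                                 kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi_scale = [branch(x) for branch in self.branches]
        return self.project(torch.cat(multi_scale, dim=1))


if __name__ == "__main__":
    # Example: a backbone feature map of a search image.
    feats = torch.randn(1, 256, 32, 32)
    aspp = ASPP(in_channels=256, out_channels=256)
    print(aspp(feats).shape)  # torch.Size([1, 256, 32, 32])
```

Because padding equals the dilation rate for each 3x3 branch, every branch preserves the spatial resolution of the input feature map, so the multi-scale responses can be concatenated and fused directly.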