Citation: SANG Haifeng, LI Gongming, ZHAO Ziyu, "Multi-scale Global Retrieval and Temporal-Spatial Consistency Matching based long-term Tracking Network," Chinese Journal of Electronics, in press, doi: 10.23919/cje.2021.00.195, 2022.
Abstract: Compared with the traditional short-term object tracking task based on temporal-spatial consistency, the long-term object tracking task faces the challenges of object disappearance and dramatic changes in object scale and appearance. To address these challenges, in this paper we propose a Multi-scale Global Retrieval and Temporal-Spatial Consistency Matching based long-term Tracking Network (MTTNet). MTTNet treats long-term tracking as a single-sample object detection task and takes full advantage of the temporal-spatial consistency between adjacent video frames to improve tracking accuracy. Guided by the information of a single sample, MTTNet performs full-image multi-scale retrieval of any instance and requires neither online learning nor trajectory refinement. Errors generated during detection on one frame therefore do not affect performance on subsequent frames, which avoids the error accumulation typical of traditional tracking networks. We introduce Atrous Spatial Pyramid Pooling to handle dramatic changes in the scale and appearance of the object. Experimental results show that MTTNet achieves better performance than composite processing methods on two large datasets.
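Since the abstract only names Atrous Spatial Pyramid Pooling (ASPP) without giving its configuration, the following is a minimal sketch of the general ASPP idea in PyTorch; the dilation rates, channel widths, and the module's exact placement inside MTTNet are illustrative assumptions, not values taken from the paper.

```python
# Minimal ASPP sketch (PyTorch). Dilation rates and channel sizes are
# assumptions for illustration, not the configuration used in MTTNet.
import torch
import torch.nn as nn


class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel atrous convolutions with
    different dilation rates sample the same feature map at several
    effective receptive-field sizes, which helps cope with large changes
    in the scale of the tracked object."""

    def __init__(self, in_channels: int, out_channels: int,
                 rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # A 1x1 convolution fuses the multi-scale responses into one map.
        self.project = nn.Conv2d(out_channels * len(rates), out_channels,
                                 kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi_scale = [branch(x) for branch in self.branches]
        return self.project(torch.cat(multi_scale, dim=1))


if __name__ == "__main__":
    # Example: a backbone feature map of a search image.
    feats = torch.randn(1, 256, 32, 32)
    aspp = ASPP(in_channels=256, out_channels=256)
    print(aspp(feats).shape)  # torch.Size([1, 256, 32, 32])
```

Because padding equals the dilation rate for each 3x3 branch, every branch preserves the spatial resolution of the input feature map, so the multi-scale responses can be concatenated and fused directly.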