Citation: Jian SU, Fang WANG, and Wei ZHUANG, “An Improved YOLOv7-tiny Algorithm for Vehicle and Pedestrian Detection with Occlusion in Autonomous Driving,” Chinese Journal of Electronics, vol. 34, no. 1, pp. 1–13, 2025. doi: 10.23919/cje.2023.00.256