Jian SU, Fang WANG, and Wei ZHUANG, “An Improved YOLOv7-tiny Algorithm for Vehicle and Pedestrian Detection with Occlusion in Autonomous Driving,” Chinese Journal of Electronics, vol. 34, no. 1, pp. 1–13, 2025. doi: 10.23919/cje.2023.00.256

An Improved YOLOv7-tiny Algorithm for Vehicle and Pedestrian Detection with Occlusion in Autonomous Driving

doi: 10.23919/cje.2023.00.256
  • Author Bio:

    Jian SU received the B.S. degree in electronic and information engineering from Hankou University, Hankou, China, and the M.S. degree in electronic circuit and system from Central China Normal University, Wuhan, China. He received the Ph.D. degree in communication and information systems from University of Electronic Science and Technology of China, Chengdu, China, in 2016. He has been an Associate Professor at the School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China, since 2017. He is a Member of IEEE and a Member of ACM. His current research interests include Internet of Things, RFID, and wireless sensor networking. (Email: sj890718@gmail.com)

    Fang WANG received the B.E. degree from Nanjing University of Information Science and Technology, Nanjing, China, in 2022, where she is currently pursuing the M.S. degree at the School of Software. Her research interest is in object detection. (Email: 202212490256@nuist.edu.cn)

    Wei ZHUANG was born in Jiangsu Province, China, in 1980. He received the B.S. and Ph.D. degrees from Southeast University, Nanjing, China, in 2002 and 2009, respectively. In 2008 and 2019, he was a joint Ph.D. candidate at Michigan Technological University, Houghton, MI, USA, and a Visiting Scholar at the University of Washington, Seattle, WA, USA, respectively. Since 2009, he has been working as an Associate Professor with the School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, China. He received the Best Paper Award at the IEEE International Conference on Wireless Communications and Signal Processing in 2009. His current research interests include modeling and analysis for integrated energy systems and deep learning for wearable sensor networks. (Email: zw@nuist.edu.cn)

  • Corresponding author: Email: zw@nuist.edu.cn
  • Received Date: 2023-07-01
  • Accepted Date: 2023-12-04
  • Available Online: 2024-02-21
  • Abstract: Future transportation is moving toward intelligent transportation systems, in which vehicle and pedestrian detection is an essential component. In complex urban traffic environments, vehicles and pedestrians captured by road monitoring are occluded in various ways, which leads to missed detections. We design an improved YOLOv7-tiny algorithm for vehicle and pedestrian detection under occlusion, with four main improvements. First, to locate objects more accurately, a 1 × 1 convolution and an identity connection are added to the 3 × 3 convolution, and convolution reparameterization is used to improve the inference speed of the network model. Second, because road backgrounds are complex and contain considerable interference, coordinate attention is added where the backbone connects to the neck, strengthening the network’s ability to detect objects and reducing interference from other targets. Third, a global attention mechanism is added before the detection head to improve detection accuracy by capturing three-dimensional features. Fourth, to address the imbalance among training samples, we propose a focal CIoU loss to replace the CIoU loss as the bounding box regression loss, so that the regression process focuses on high-quality anchor boxes. Experiments show that the improved YOLOv7-tiny algorithm achieves 82.2% mAP@0.5 on the PASCAL VOC dataset, which is 2.8% higher than before the improvement, and that mAP@0.5:0.95 is 5.2% higher than before the improvement. The improved algorithm can effectively detect partially occluded objects.
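
    The first improvement relies on the fact that a 3 × 3 convolution, a parallel 1 × 1 convolution, and an identity connection can be folded into a single 3 × 3 kernel once training is done. The following is a minimal pure-Python sketch of that folding, not the authors' implementation. It assumes stride 1, equal input and output channel counts, and no batch normalization (which a full RepVGG-style fusion would also fold in). The function names conv2d, fuse_branches, and train_time_forward are hypothetical.

```python
import random

# Illustrative sketch only: names and shapes are assumptions, BatchNorm is omitted.

def conv2d(x, w, pad):
    """Stride-1 cross-correlation. x: [C_in][H][W], w: [C_out][C_in][k][k]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    out = []
    for wo in w:
        plane = [[0.0] * wd for _ in range(h)]
        for i in range(h):
            for j in range(wd):
                acc = 0.0
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            yi, xj = i + di - pad, j + dj - pad
                            if 0 <= yi < h and 0 <= xj < wd:  # zero padding
                                acc += wo[c][di][dj] * x[c][yi][xj]
                plane[i][j] = acc
        out.append(plane)
    return out

def fuse_branches(w3, w1):
    """Fold the 1x1 branch and the identity branch into one 3x3 kernel."""
    c_out, c_in = len(w3), len(w3[0])
    assert c_out == c_in, "the identity branch needs equal channel counts"
    fused = [[[row[:] for row in w3[o][c]] for c in range(c_in)]
             for o in range(c_out)]
    for o in range(c_out):
        for c in range(c_in):
            fused[o][c][1][1] += w1[o][c][0][0]  # 1x1 weight sits at the 3x3 centre
            if o == c:
                fused[o][c][1][1] += 1.0         # identity = 1x1 conv with identity matrix
    return fused

def train_time_forward(x, w3, w1):
    """Three-branch form: 3x3 conv + 1x1 conv + identity."""
    y3 = conv2d(x, w3, pad=1)
    y1 = conv2d(x, w1, pad=0)
    return [[[y3[o][i][j] + y1[o][i][j] + x[o][i][j]
              for j in range(len(x[0][0]))]
             for i in range(len(x[0]))]
            for o in range(len(w3))]

random.seed(0)
C, H, W = 2, 4, 5
x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
     for _ in range(C)]
w3 = [[[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
       for _ in range(C)] for _ in range(C)]
w1 = [[[[random.uniform(-1, 1)]] for _ in range(C)] for _ in range(C)]

multi = train_time_forward(x, w3, w1)              # three branches
single = conv2d(x, fuse_branches(w3, w1), pad=1)   # one fused 3x3 conv

max_diff = max(abs(a - b)
               for pm, ps in zip(multi, single)
               for rm, rs in zip(pm, ps)
               for a, b in zip(rm, rs))
print("branches match fused kernel:", max_diff < 1e-9)
```

    Because the fused kernel reproduces the three-branch output up to floating-point rounding, the extra branches can be used during training and removed for deployment, leaving a single convolution per block at inference.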