A Fine-Grained Object Detection Model for Aerial Images Based on YOLOv5 Deep Neural Network

ZHANG Rui; XIE Cong; DENG Liwei

doi:10.23919/cje.2022.00.044

Volume 32 Issue 1

Jan. 2023

ZHANG Rui, XIE Cong, DENG Liwei, “A Fine-Grained Object Detection Model for Aerial Images Based on YOLOv5 Deep Neural Network,” Chinese Journal of Electronics, vol. 32, no. 1, pp. 51-63, 2023, doi: 10.23919/cje.2022.00.044

1. Heilongjiang Provincial Key Laboratory of Complex Intelligent System and Integration, School of Automation, Harbin University of Science and Technology, Harbin 150080, China

Funds: This work was supported by the National Science Foundation of Heilongjiang Province (LH2019F024) and the Key R&D Program Guidance Projects of Heilongjiang Province (GZ20210065)

Author Bio:
Rui ZHANG was born in Harbin, China, in 1970. She received the Ph.D. degree in measurement technology and instruments from Harbin University of Science and Technology in 2006, and completed postdoctoral research at Harbin Institute of Technology in 2011. She has long been engaged in research on power quality monitoring, signal processing, target detection, and machine learning. (Email: zr_gh@sina.com)

Cong XIE was born in Sichuan Province, China, in 1996. He is an M.S. degree candidate in electronic information engineering at Harbin University of Science and Technology. His research interests include digital image processing and deep learning. (Email: mx60610@gmail.com)

Liwei DENG (corresponding author) was born in 1983. He received the M.S. degree from Harbin University of Science and Technology, Harbin, China, in 2010, and the Ph.D. degree from Harbin Institute of Technology, Harbin, China, in 2014. He is currently an Associate Professor with Harbin University of Science and Technology, Harbin, China. His research interests include control science and engineering, fractional-order systems, digital image processing, and deep learning algorithms. (Email: dengliwei666@hrbust.edu.cn)
Received Date: 2022-03-15
Accepted Date: 2022-06-21

Available Online: 2022-07-18

Publish Date: 2023-01-05

Abstract

Many advanced object detection algorithms are designed mainly for objects in natural scenes and are rarely dedicated to fine-grained objects. This seriously limits their application to remote sensing object detection. How to apply horizontal detection in remote sensing images is therefore of important research significance. Mainstream remote sensing object detection algorithms handle object orientation by angle regression, but the periodicity of the angle causes very large losses under this regression formulation, which makes the model harder to train. The circular smooth label (CSL) method solves this problem well by transforming angle regression into a classification task. YOLOv5 combines many excellent modules and methods from recent years, which greatly improves the detection accuracy for small objects. We use YOLOv5 as the baseline and combine it with the CSL method to learn the angle of arbitrarily oriented targets. We also add an attention mechanism module to distinguish fine-grained differences between instance classes, and thereby accomplish the fine-grained target detection task for remote sensing images. Our improved model achieves an average category accuracy of 39.2% on the FAIR1M dataset. Although this result is not yet satisfactory, the approach is simple and efficient, and it reduces the hardware requirements of the model.
- Fine-grained object detection,
- High-resolution aerial images,
- Oriented object detection,
- YOLOv5
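
The abstract's point about angle periodicity and the CSL classification form can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `circular_distance` and `csl_encode`, the choice of 180 one-degree bins, the window radius of 6, and the use of that radius as the Gaussian standard deviation are all assumptions made here for illustration.

```python
import math
from typing import List


def circular_distance(a: int, b: int, num_bins: int = 180) -> int:
    """Shortest distance between two angle bins on a circle of num_bins bins."""
    d = abs(a - b) % num_bins
    return min(d, num_bins - d)


def csl_encode(angle_deg: int, num_bins: int = 180, radius: int = 6) -> List[float]:
    """Encode an angle as a smooth classification target (hypothetical helper).

    Bins within `radius` of the true angle, measured around the circle,
    receive a Gaussian weight; every other bin is 0. Using `radius` as the
    Gaussian standard deviation is an assumption of this sketch.
    """
    target = angle_deg % num_bins
    label = []
    for i in range(num_bins):
        d = circular_distance(i, target, num_bins)
        weight = math.exp(-d * d / (2 * radius * radius)) if d <= radius else 0.0
        label.append(weight)
    return label


# Regression view: 179 and 0 degrees are almost the same orientation,
# yet a plain absolute difference treats them as far apart.
naive_gap = abs(179 - 0)                  # 179
circular_gap = circular_distance(179, 0)  # 1

# Classification view: the label for 179 degrees wraps around, so bin 0
# receives the same weight as bin 178.
label = csl_encode(179)
```

Because the label wraps around the boundary, a prediction of 0 degrees for a true angle of 179 degrees is penalized as a near miss rather than as a maximal error, which is the property the abstract attributes to CSL.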
