Citation: ZHANG Rui, XIE Cong, DENG Liwei. A Fine-Grained Object Detection Model for Aerial Images Based on YOLOv5 Deep Neural Network[J]. Chinese Journal of Electronics, 2023, 32(1): 51-63. doi: 10.23919/cje.2022.00.044
[1] G. Cheng, P. Zhou, and J. Han, “Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol.54, no.12, pp.7405–7415, 2016. doi: 10.1109/TGRS.2016.2601622
[2] K. Li, G. Wan, G. Cheng, L. Meng, et al., “Object detection in optical remote sensing images: A survey and a new benchmark,” ISPRS Journal of Photogrammetry and Remote Sensing, vol.159, pp.296–307, 2020. doi: 10.1016/j.isprsjprs.2019.11.023
[3] T. Y. Lin, M. Maire, S. Belongie, et al., “Microsoft COCO: Common objects in context,” in Proceedings of the European Conference on Computer Vision, Springer, Cham, pp.740–755, 2014.
[4] M. Everingham, L. Van Gool, C. K. I. Williams, et al., “The PASCAL visual object classes (VOC) challenge,” International Journal of Computer Vision, vol.88, no.2, pp.303–338, 2010. doi: 10.1007/s11263-009-0275-4
[5] A. Bochkovskiy, C. Y. Wang, and H. Y. M. Liao, “YOLOv4: Optimal speed and accuracy of object detection,” arXiv preprint, arXiv: 2004.10934, 2020.
[6] K. Duan, S. Bai, L. Xie, et al., “CenterNet: Keypoint triplets for object detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, pp.6568–6577, 2019.
[7] R. Girshick, J. Donahue, T. Darrell, et al., “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, pp.580–587, 2014.
[8] T. Y. Lin, P. Goyal, R. Girshick, et al., “Focal loss for dense object detection,” in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp.2980–2988, 2017.
[9] W. Liu, D. Anguelov, D. Erhan, et al., “SSD: Single shot multibox detector,” in Proceedings of the European Conference on Computer Vision, Springer, Cham, pp.21–37, 2016.
[10] J. Redmon, S. Divvala, R. Girshick, et al., “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, pp.779–788, 2016.
[11] S. M. Azimi, E. Vig, R. Bahmanyar, et al., “Towards multi-class object detection in unconstrained remote sensing imagery,” in Proceedings of the Asian Conference on Computer Vision, Springer, Cham, pp.150–165, 2019.
[12] G. Zhang, S. Lu, and W. Zhang, “CAD-Net: A context-aware detection network for objects in remote sensing imagery,” IEEE Transactions on Geoscience and Remote Sensing, vol.57, no.12, pp.10015–10024, 2019. doi: 10.1109/TGRS.2019.2930982
[13] J. Han, J. Ding, N. Xue, et al., “ReDet: A rotation-equivariant detector for aerial object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, pp.2786–2795, 2021.
[14] X. Yang, J. Yang, J. Yan, et al., “SCRDet: Towards more robust detection for small, cluttered and rotated objects,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, pp.8231–8240, 2019.
[15] J. Ding, N. Xue, Y. Long, et al., “Learning RoI transformer for oriented object detection in aerial images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, pp.2844–2853, 2019.
[16] J. Han, J. Ding, J. Li, et al., “Align deep features for oriented object detection,” IEEE Transactions on Geoscience and Remote Sensing, vol.60, pp.1–11, 2021. doi: 10.1109/TGRS.2021.3062048
[17] X. Yang, J. Yan, Z. Feng, et al., “R3Det: Refined single-stage detector with feature refinement for rotating object,” in Proceedings of the 35th AAAI Conference on Artificial Intelligence, Virtual Event, pp.3163–3171, 2021.
[18] X. Yang and J. Yan, “Arbitrary-oriented object detection with circular smooth label,” in Proceedings of the European Conference on Computer Vision 2020, LNCS, vol.12353, Springer, Cham, pp.677–694, 2020.
[19] G. S. Xia, X. Bai, J. Ding, et al., “DOTA: A large-scale dataset for object detection in aerial images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, pp.3974–3983, 2018.
[20] X. Sun, P. Wang, Z. Yan, et al., “FAIR1M: A benchmark dataset for fine-grained object recognition in high-resolution remote sensing imagery,” ISPRS Journal of Photogrammetry and Remote Sensing, vol.184, pp.116–130, 2022. doi: 10.1016/j.isprsjprs.2021.12.004
[21] X. Yang, H. Sun, K. Fu, et al., “Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks,” Remote Sensing, vol.10, no.1, article no.132, 2018. doi: 10.3390/rs10010132
[22] K. Fu, Z. Chang, Y. Zhang, et al., “Rotation-aware and multi-scale convolutional neural network for object detection in remote sensing images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol.161, pp.294–308, 2020. doi: 10.1016/j.isprsjprs.2020.01.025
[23] Z. Liu, H. Wang, L. Weng, et al., “Ship rotated bounding box space for ship extraction from high-resolution optical satellite images with complex backgrounds,” IEEE Geoscience and Remote Sensing Letters, vol.13, no.8, pp.1074–1078, 2016. doi: 10.1109/LGRS.2016.2565705
[24] S. Ren, K. He, R. Girshick, et al., “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, no.6, pp.1137–1149, 2017. doi: 10.1109/TPAMI.2016.2577031
[25] L. Zhou, H. Wei, H. Li, et al., “Objects detection for remote sensing images based on polar coordinates,” arXiv preprint, arXiv: 2001.02988, 2020.
[26] J. Yi, P. Wu, B. Liu, et al., “Oriented object detection in aerial images with box boundary-aware vectors,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, pp.2149–2158, 2021.
[27] W. Li, Y. Chen, K. Hu, et al., “Oriented RepPoints for aerial object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, pp.1829–1838, 2022.
[28] X. Yang, X. Yang, J. Yang, et al., “Learning high-precision bounding box for rotated object detection via Kullback-Leibler divergence,” Advances in Neural Information Processing Systems, vol.34, pp.18381–18394, 2021.
[29] X. Yang, J. Yan, Q. Ming, et al., “Rethinking rotated object detection with Gaussian Wasserstein distance loss,” in Proceedings of the International Conference on Machine Learning, Virtual Event, pp.11830–11841, 2021.
[30] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, pp.7132–7141, 2018.
[31] S. Woo, J. Park, J. Y. Lee, et al., “CBAM: Convolutional block attention module,” in Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, pp.3–19, 2018.
[32] H. Zhao, J. Jia, and V. Koltun, “Exploring self-attention for image recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, pp.10073–10082, 2020.
[33] A. Srinivas, T. Y. Lin, N. Parmar, et al., “Bottleneck transformers for visual recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, pp.16514–16524, 2021.
[34] A. F. Agarap, “Deep learning using rectified linear units (ReLU),” arXiv preprint, arXiv: 1803.08375, 2018.
[35] B. Xu, N. Wang, T. Chen, et al., “Empirical evaluation of rectified activations in convolutional network,” arXiv preprint, arXiv: 1505.00853, 2015.
[36] D. Misra, “Mish: A self regularized non-monotonic activation function,” arXiv preprint, arXiv: 1908.08681, 2019.
[37] J. Deng, W. Dong, R. Socher, et al., “ImageNet: A large-scale hierarchical image database,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, pp.248–255, 2009.
[38] M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, pp.6105–6114, 2019.
[39] N. Ma, X. Zhang, M. Liu, et al., “Activate or not: Learning customized activation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, pp.8028–8038, 2021.