Citation: | Yutong LI, Miao MA, Shichang LIU, et al., “YOLO-Drone: A Scale-Aware Detector for Drone Vision,” Chinese Journal of Electronics, vol. 33, no. 4, pp. 1034–1045, 2024. doi: 10.23919/cje.2023.00.254 |
[1] |
X. Y. Tian, J. Shao, D. W. Ouyang, et al., “UAV-satellite view synthesis for cross-view geo-localization,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 7, pp. 4804–4815, 2022. doi: 10.1109/TCSVT.2021.3121987
|
[2] |
M. Dai, J. H. Hu, J. D. Zhuang, et al., “A transformer-based feature segmentation and region alignment method for UAV-view geo-localization,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 7, pp. 4376–4389, 2022. doi: 10.1109/TCSVT.2021.3135013
|
[3] |
J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint, arXiv: 1804.02767, 2018.
|
[4] |
T. Y. Lin, P. Goyal, R. Girshick, et al., “Focal loss for dense object detection,” in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 2999–3007, 2017.
|
[5] |
S. Q. Ren, K. M. He, R. Girshick, et al., “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017. doi: 10.1109/TPAMI.2016.2577031
|
[6] |
Z. W. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into high quality object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, pp. 6154–6162, 2018.
|
[7] |
Z. Tian, C. H. Shen, H. Chen, et al., “FCOS: Fully convolutional one-stage object detection,” in Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), pp. 9626–9635, 2019.
|
[8] |
J. Redmon, S. Divvala, R. Girshick, et al., “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, 2016.
|
[9] |
H. Law and J. Deng, “CornerNet: Detecting objects as paired keypoints,” in Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, pp. 734–750, 2018.
|
[10] |
K. W. Duan, S. Bai, L. Xie, et al., “CenterNet: Keypoint triplets for object detection,” in Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), pp. 6568–6577, 2019.
|
[11] |
T. Y. Lin, M. Maire, S. Belongie, et al., “Microsoft COCO: Common objects in context,” in Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland, pp. 740–755, 2014.
|
[12] |
M. Everingham, L. V. Gool, C. K. I. Williams, et al., “The PASCAL visual object classes (VOC) challenge,” International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010. doi: 10.1007/s11263-009-0275-4
|
[13] |
P. F. Zhu, L. Y. Wen, D. W. Du, et al., “Detection and tracking meet drones challenge,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 11, pp. 7380–7399, 2022. doi: 10.1109/TPAMI.2021.3119563
|
[14] |
S. Hong, S. Kang, and D. Cho, “Patch-level augmentation for object detection in aerial images,” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop, Seoul, Korea (South), pp. 127–134, 2019.
|
[15] |
D. W. Du, P. F. Zhu, L. Y. Wen, et al., “VisDrone-SOT2019: The vision meets drone single object tracking challenge results,” in Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop, Seoul, Korea (South), pp. 199–212, 2019.
|
[16] |
X. Liang, J. Zhang, L. Zhuo, et al., “Small object detection in unmanned aerial vehicle images using feature fusion and scaling-based single shot detector with spatial context analysis,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 6, pp. 1758–1770, 2020. doi: 10.1109/TCSVT.2019.2905881
|
[17] |
J. X. Leng, M. J. C. Mo, Y. H. Zhou, et al., “Pareto refocusing for drone view object detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 3, pp. 1320–1334, 2023. doi: 10.1109/TCSVT.2022.3210207
|
[18] |
J. F. Wan, B. Y. Zhang, Y. Y. Zhao, et al., “VistrongerDet: Stronger visual information for object detection in VisDrone images,” in Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops, Montreal, BC, Canada, pp. 2820–2829, 2021.
|
[19] |
X. K. Zhu, S. Lyu, X. Wang, et al., “TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios,” in Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops, Montreal, BC, Canada, pp. 2778–2788, 2021.
|
[20] |
S. Woo, J. Park, J. Y. Lee, et al., “CBAM: Convolutional block attention module,” in Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, pp. 3–9, 2018.
|
[21] |
Z. Liu, H. Z. Mao, C. Y. Wu, et al., “A ConvNet for the 2020s,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, pp. 11966–11976, 2022.
|
[22] |
Z. Ge, S. T. Liu, F. Wang, et al., “YOLOX: Exceeding YOLO series in 2021,” arXiv preprint, arXiv: 2107.08430, 2021.
|
[23] |
J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, pp. 6517–6525, 2017.
|
[24] |
T. Y. Lin, P. Dollár, R. Girshick, et al., “Feature pyramid networks for object detection,” in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944, 2017.
|
[25] |
A. Bochkovskiy, C. Y. Wang, and H. Y. M. Liao, “YOLOv4: Optimal speed and accuracy of object detection,” arXiv preprint, arXiv: 2004.10934, 2020.
|
[26] |
S. Liu, L. Qi, H. Qin, et al., “Path aggregation network for instance segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759–8768, 2018.
|
[27] |
H. R. Wang, Z. X. Wang, M. X. Jia, et al., “Spatial attention for multi-scale feature refinement for object detection,” in Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop, Seoul, Korea (South), pp. 64–72, 2019.
|
[28] |
F. Ö. Ünel, B. O. Özkalayci, and C. Çiğla, “The power of tiling for small object detection,” in Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, pp. 582–591, 2019.
|
[29] |
C. H. Y. Yang, Z. H. Huang, and N. Y. Wang, “QueryDet: Cascaded sparse query for accelerating high-resolution small object detection,” in Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, pp. 13658–13667, 2022.
|
[30] |
Y. Liu, Z. Y. Lu, J. Li, et al., “Deep image-to-video adaptation and fusion networks for action recognition,” IEEE Transactions on Image Processing, vol. 29, pp. 3168–3182, 2020. doi: 10.1109/TIP.2019.2957930
|
[31] |
N. C. Huang, Q. Jiao, Q. Zhang, et al., “Middle-level feature fusion for lightweight RGB-D salient object detection,” IEEE Transactions on Image Processing, vol. 31, pp. 6621–6634, 2022. doi: 10.1109/TIP.2022.3214092
|
[32] |
X. Y. Dai, Y. P. Chen, B. Xiao, et al., “Dynamic head: Unifying object detection heads with attentions,” in Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, pp. 7369–7378, 2021.
|
[33] |
J. W. Wang, C. Xu, W. Yang, et al., “A normalized Gaussian Wasserstein distance for tiny object detection,” arXiv preprint, arXiv: 2110.13389, 2021.
|
[34] |
G. Ghiasi, Y. Cui, A. Srinivas, et al., “Simple copy-paste is a strong data augmentation method for instance segmentation,” in Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, pp. 2917–2927, 2021.
|
[35] |
A. Buslaev, V. I. Iglovikov, E. Khvedchenya, et al., “Albumentations: Fast and flexible image augmentations,” Information, vol. 11, no. 2, article no. 125, 2020. doi: 10.3390/info11020125
|
[36] |
I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in 7th International Conference on Learning Representations, New Orleans, LA, USA, 2019.
|
[37] |
N. Bodla, B. Singh, R. Chellappa, et al., “Soft-NMS: Improving object detection with one line of code,” in Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, pp. 5562–5570, 2017.
|
[38] |
R. Solovyev, W. M. Wang, and T. Gabruseva, “Weighted boxes fusion: Ensembling boxes from different object detection models,” Image and Vision Computing, vol. 107, no. 3, article no. 104117, 2021. doi: 10.1016/j.imavis.2021.104117
|
[39] |
J. R. Zhu, X. D. Wang, Y. Liu, et al., “UavTinyDet: Tiny object detection in UAV scenes,” in Proceedings of the 2022 7th International Conference on Image, Vision and Computing, Xi’an, China, pp. 195–200, 2022.
|