Volume 32, Issue 1, January 2023
Citation: GUAN Qi, SHENG Zihao, XUE Shibei, “HRPose: Real-Time High-Resolution 6D Pose Estimation Network Using Knowledge Distillation,” Chinese Journal of Electronics, vol. 32, no. 1, pp. 189–198, 2023, doi: 10.23919/cje.2021.00.211.

HRPose: Real-Time High-Resolution 6D Pose Estimation Network Using Knowledge Distillation

doi: 10.23919/cje.2021.00.211
Funds: This work was supported by the National Natural Science Foundation of China (61873162, 61973317) and the Open Research Project of the State Key Laboratory of Industrial Control Technology, Zhejiang University, China (ICT2022B47).
More Information
  • Author Bios:

    Qi GUAN received the B.S. degree in measurement and control from Southeast University, Nanjing, China, in 2019. She is currently pursuing the M.S. degree in control engineering with Shanghai Jiao Tong University, Shanghai, China. Her research interests include 6D pose estimation and real-time applications of deep learning. (Email: qiguan@sjtu.edu.cn)

    Zihao SHENG received the B.S. degree in automation from Xi’an Jiaotong University, Xi’an, China, in 2019. He is currently pursuing the M.S. degree in control engineering with Shanghai Jiao Tong University, Shanghai, China. His research interests include intelligent transportation systems, autonomous driving, and intelligent control. (Email: zihaosheng@sjtu.edu.cn)

    Shibei XUE (corresponding author) received the Ph.D. degree in control science and engineering from Tsinghua University, Beijing, China, in 2013. From 2014 to 2016, he was a Postdoctoral Researcher with the University of New South Wales, Canberra, ACT, Australia, and then worked as a Postdoctoral Researcher with the Department of Physics, Taiwan Cheng Kung University, Tainan, China. In July 2017, he joined Shanghai Jiao Tong University, Shanghai, China, where he is currently an Associate Professor with the Department of Automation. He was selected for the Shanghai Pujiang Program funded by the Shanghai Science and Technology Committee in 2018. His research interests include quantum control, optimization, and intelligent control of complex systems. (Email: shbxue@sjtu.edu.cn)

  • Received Date: 2021-06-16
  • Accepted Date: 2022-03-13
  • Available Online: 2022-07-19
  • Publish Date: 2023-01-05
  • Abstract: Real-time six degrees-of-freedom (6D) object pose estimation is essential for many real-world applications, such as robotic grasping and augmented reality. To achieve accurate object pose estimation from RGB images in real time, we propose an effective and lightweight model, namely the high-resolution 6D pose estimation network (HRPose). We adopt the efficient and small HRNetV2-W18 as a feature extractor to reduce the computational burden while generating accurate 6D poses. With only 33% of the model size and lower computational costs, our HRPose achieves performance comparable to that of state-of-the-art models. Moreover, by transferring knowledge from a large model to our proposed HRPose through output and feature-similarity distillations, the performance of our HRPose is improved in effectiveness and efficiency (see the illustrative sketch below). Numerical experiments on the widely used LINEMOD benchmark demonstrate the superiority of our proposed HRPose over state-of-the-art methods.
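Note: The code below is a minimal, illustrative PyTorch-style sketch of how an output-distillation term and a feature-similarity distillation term, as described in the abstract, could be combined into a single training loss. It is not the authors' implementation; all function and variable names (distillation_loss, student_out, teacher_feat, alpha, beta, etc.) are assumptions introduced here for illustration.

import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, student_feat, teacher_feat,
                      alpha=1.0, beta=1.0):
    # Output distillation: push the student's predictions toward the
    # teacher's predictions, which are treated as fixed soft targets.
    loss_out = F.mse_loss(student_out, teacher_out.detach())

    # Feature-similarity distillation: compare pairwise similarity structures
    # of intermediate features within a batch, so the student mimics the
    # teacher's feature relationships rather than its raw feature values.
    # (If channel sizes differ, a 1x1 convolution adapter would be needed;
    # it is omitted here for brevity.)
    s = F.normalize(student_feat.flatten(1), dim=1)
    t = F.normalize(teacher_feat.flatten(1), dim=1)
    loss_feat = F.mse_loss(s @ s.t(), (t @ t.t()).detach())

    # alpha and beta are assumed weighting hyperparameters.
    return alpha * loss_out + beta * loss_feat

In such a setup, the total training objective would typically add this distillation term to the student's ordinary pose-estimation loss, with the teacher's parameters kept frozen.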
