Volume 32 Issue 4
Jul.  2023
ZHANG Zhe, WANG Bilin, YU Zhezhou, et al., “Attention Guided Enhancement Network for Weakly Supervised Semantic Segmentation,” Chinese Journal of Electronics, vol. 32, no. 4, pp. 896-907, 2023, doi: 10.23919/cje.2021.00.230

Attention Guided Enhancement Network for Weakly Supervised Semantic Segmentation

doi: 10.23919/cje.2021.00.230
Funds:  This work was supported by the Development Project of Jilin Province of China (20200801033GH, 2020122328JC), the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (20180520017JH), and the Fundamental Research Funds for the Central Universities, JLU
  • Author Bio:

    Zhe ZHANG received the M.S. degree from the College of Computer Science and Technology, Jilin University, Jilin, China, in 2018, where he is currently pursuing the Ph.D. degree. He is also a member of the Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, China. His main research interests include computer vision, image processing, and deep learning. (Email: zhangzhe18@mails.jlu.edu.cn)

    Bilin WANG received the B.S. degree in computer science and technology from Jilin University, Jilin, China, in 2017, where he is currently pursuing the Ph.D. degree. He is also a member of the Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, China. Research interests include applications of optimal transport and domain adaptation. (Email: blwang19@mails.jlu.edu.cn)

    Zhezhou YU (corresponding author) received the Ph.D. degree from Jilin University, in 2007. He is currently a Professor with Jilin University. He is also a member of the Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, China. His research interests include computational intelligence, computer vision, image processing and embedded system application. He is a Committee Member of the Undergraduate Electronic Design Competition Organization of Jilin Province, China. (Email: yuzz@jlu.edu.cn)

    Fengzhi ZHAO received the B.E. degree from Jilin University in 2017. He is currently a doctoral candidate in the computational intelligence group of Jilin University. His advisor is Dr. YU Zhezhou, a Professor with Jilin University. His research interests include computational intelligence, computer vision, and image segmentation. (Email: zhaofz19@mails.jlu.edu.cn)

  • Received Date: 2021-07-10
  • Accepted Date: 2021-10-21
  • Available Online: 2022-03-05
  • Publish Date: 2023-07-05
  • Abstract: Weakly supervised semantic segmentation using only image-level labels is important because it removes the need for expensive pixel-level annotation. Most cutting-edge methods adopt two-step solutions: they first learn to produce pseudo ground truth from image-level labels, and then train an off-the-shelf fully supervised semantic segmentation network on these pseudo labels. Although these methods have made significant progress, they also increase the complexity of the model and of training. In this paper, we propose a one-step approach for weakly supervised image semantic segmentation, the attention guided enhancement network (AGEN), which produces pseudo pixel-level labels under the supervision of image-level labels and trains the network to generate segmentation masks in an end-to-end manner. In particular, we employ class activation maps (CAMs) produced by different layers of the classification branch to guide the segmentation branch in learning spatial and semantic information. The CAM produced by a lower layer can cover the complete object region, but it also contains considerable noise. We therefore propose a self-attention module that adaptively enhances object regions and suppresses irrelevant regions, further boosting segmentation performance. Experiments on the Pascal VOC 2012 dataset demonstrate that AGEN outperforms alternative state-of-the-art weakly supervised semantic segmentation methods that rely exclusively on image-level labels.
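
To make the role of class activation maps more concrete, the following is a minimal sketch of the standard CAM computation (a weighted sum of feature channels using the classifier weights of one class) followed by a simple thresholding step that turns CAMs into pseudo pixel-level labels. It is not the AGEN implementation: the function names `class_activation_map` and `pseudo_labels`, the 0.5 threshold, the use of label 0 for background, and the toy 2×2 feature maps are all assumptions made for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
def class_activation_map(features, class_weights):
    """Standard CAM for one class.

    features: list of K channels, each an H x W list of floats.
    class_weights: K classifier weights for the class (hypothetical values).
    Returns an H x W map, ReLU-ed and scaled to [0, 1].
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    # Weighted sum of the feature channels.
    for channel, weight in zip(features, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * channel[i][j]
    # Keep positive evidence only and normalise by the peak response.
    peak = max(max(max(row) for row in cam), 0.0)
    if peak == 0.0:
        return [[0.0] * width for _ in range(height)]
    return [[max(value, 0.0) / peak for value in row] for row in cam]


def pseudo_labels(cams, threshold=0.5):
    """Turn per-class CAMs into a pseudo label map.

    cams: dict mapping a non-zero class id (classes present in the image)
          to its normalised CAM.
    A pixel takes the class with the highest score if that score reaches
    the threshold (an assumed value); otherwise it is background (0).
    """
    first = next(iter(cams.values()))
    height, width = len(first), len(first[0])
    labels = [[0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            best_class, best_score = max(
                ((cls, cam[i][j]) for cls, cam in cams.items()),
                key=lambda pair: pair[1],
            )
            if best_score >= threshold:
                labels[i][j] = best_class
    return labels


# Toy example: two feature channels on a 2 x 2 grid, two image-level classes.
features = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0 fires at the top-left pixel
    [[0.0, 0.0], [0.0, 2.0]],  # channel 1 fires at the bottom-right pixel
]
cams = {
    1: class_activation_map(features, [1.0, 0.0]),
    2: class_activation_map(features, [0.0, 1.0]),
}
print(pseudo_labels(cams))  # [[1, 0], [0, 2]]
```

In a one-step setting such as the one described in the abstract, maps of this kind would be produced during training and used directly as supervision for the segmentation branch; the noise-suppression role of the self-attention module is not modelled in this sketch.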