The Novel Instance Segmentation Method Based on Multi-Level Features and Joint Attention

XU Bowen; LU Yinan; WU Tieru; GUO Xiaoxin

doi:10.23919/cje.2021.00.226

Volume 32 Issue 5

Sep. 2023

XU Bowen, LU Yinan, WU Tieru, et al., “The Novel Instance Segmentation Method Based on Multi-Level Features and Joint Attention,” Chinese Journal of Electronics, vol. 32, no. 5, pp. 1160-1168, 2023, doi: 10.23919/cje.2021.00.226

XU Bowen^1, LU Yinan^1, WU Tieru^2, GUO Xiaoxin^1

1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
2. College of Mathematics, Jilin University, Changchun 130012, China

Funds: This work was supported by the National Natural Science Foundation of China (61872162, 82071995), the Key Research and Development Program of Jilin Province (20210301001GX, 20220201141GX), and the Natural Science Foundation of Jilin Province (20200201292JC)

Author Bio:
Bowen XU was born in Shanxi Province, China, in 1994. He received the M.S. degree in computer science from Jilin University. His current research interests include computer vision and image processing. (Email: xubw19@mails.jlu.edu.cn)

Yinan LU (corresponding author) was born in Jilin Province, China, in 1969. She is currently a Professor of Jilin University. Her research interests include image processing, data mining, and bioinformatics. (Email: luyn@jlu.edu.cn)

Tieru WU was born in Jilin Province, China, in 1971. He is currently a Professor of Jilin University. His research interests include computer graphics and machine learning. (Email: wutr@jlu.edu.cn)

Xiaoxin GUO was born in Jilin Province, China, in 1974. He is currently a Professor of Jilin University. His current research interests include computer vision, medical image processing, and deep learning. (Email: guoxx@jlu.edu.cn)

Received Date: 2021-07-08
Accepted Date: 2022-08-16
Available Online: 2022-11-15
Publish Date: 2023-09-05

Abstract

Instance segmentation is an important task in computer vision. To enhance the ability of segmentation networks to express multi-level features, this paper proposes a novel module. First, we design a weighted bi-directional feature fusion scheme by computing the weight distribution function of a bi-directional feature pyramid network. Second, we propose a joint attention mechanism that effectively filters feature information at different levels by combining channel attention and spatial attention modules in both serial and parallel arrangements. In addition, the module uses dynamic convolution to keep the computation speed stable while improving the mean average precision of segmentation by 6.7%. Experiments on the COCO dataset demonstrate that the module can effectively improve the performance of existing instance segmentation networks.
Keywords: Instance segmentation, Feature fusion, Attention mechanism, Dynamic convolution, Deep neural network
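
The weighted bi-directional feature fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's weight distribution function, so the code below substitutes the fast normalized fusion rule used in EfficientDet's bi-directional feature pyramid network: each input feature receives a learnable weight, the weights are made non-negative with ReLU, and they are normalized before a weighted sum. The function name `fast_normalized_fusion`, the flat-list feature representation, and the toy values are assumptions made for illustration only and are not taken from the paper.

```python
from typing import List, Sequence


def fast_normalized_fusion(features: Sequence[Sequence[float]],
                           raw_weights: Sequence[float],
                           eps: float = 1e-4) -> List[float]:
    """Fuse same-length feature vectors with non-negative, normalized weights.

    Hypothetical helper: a stand-in for the paper's weighted fusion,
    following the EfficientDet-style rule O = sum_i (w_i / (eps + sum_j w_j)) * I_i.
    """
    if len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature")
    # ReLU keeps every learnable weight non-negative.
    weights = [max(w, 0.0) for w in raw_weights]
    # Normalize the weights; eps guards against division by zero.
    total = sum(weights) + eps
    norm = [w / total for w in weights]
    # Weighted element-wise sum over the input features.
    return [sum(n * f[i] for n, f in zip(norm, features))
            for i in range(len(features[0]))]


# Toy example (assumed values): fuse a top-down feature with a lateral
# feature of the same resolution, flattened to short vectors.
top_down = [1.0, 2.0, 3.0]
lateral = [3.0, 2.0, 1.0]
fused = fast_normalized_fusion([top_down, lateral], raw_weights=[1.0, 1.0])
print(fused)
```

With equal weights, every fused entry is close to the plain average of the two inputs. In a real network the raw weights would be trainable parameters and the features would be multi-channel tensors resized to a common resolution.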
