LYU Fan, LI Linyan, Victor S. Sheng, et al., “Multi-label Image Classification via Coarse-to-Fine Attention,” Chinese Journal of Electronics, vol. 28, no. 6, pp. 1118-1126, 2019, doi: 10.1049/cje.2019.07.015

Multi-label Image Classification via Coarse-to-Fine Attention

Funds: This work is supported by the National Natural Science Foundation of China (No.61876121, No.61472267, No.61728205, No.61502329, No.61672371), Primary Research & Development Plan of Jiangsu Province (No.BE2017663), Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (No.19KJB520054), and Foundation of Key Laboratory in Science and Technology Development Project of Suzhou (No.SZS201609, No.SZS201813).
  • Corresponding author: HU Fuyuan was a postdoctoral researcher at Vrije Universiteit Brussel, Belgium, a Ph.D. student at Northwestern Polytechnical University, and a visiting Ph.D. student at the City University of Hong Kong. He is a professor at Suzhou University of Science and Technology. His research interests include graphical models, structured learning, and tracking. (Email: fuyuanhu@mail.usts.edu.cn)
  • Received Date: 2018-09-05
  • Rev Recd Date: 2019-07-23
  • Publish Date: 2019-11-10
  • Great efforts have been made to recognize multi-label images with deep neural networks. Because multi-label image classification is complicated, many studies use the attention mechanism as a form of guidance. Conventional attention-based methods typically analyze images directly and aggressively, which makes it difficult to understand complicated scenes well. We propose a global/local attention method that recognizes a multi-label image from coarse to fine by mimicking how human beings observe images: it first concentrates on the whole image and then focuses on the specific local objects in it. We also propose a joint max-margin objective function, which enforces, both horizontally and vertically, that the minimum score of the positive labels be larger than the maximum score of the negative labels. This function further improves our multi-label image classification method. We evaluate the effectiveness of our method on two popular multi-label image datasets, Pascal VOC and MS-COCO. Our experimental results show that our method outperforms state-of-the-art methods.
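One way to read the joint max-margin objective is sketched here in Python. The abstract does not give the exact formula, so everything in this sketch is an assumption, not the authors' implementation: scores form a matrix with one row per image and one column per label; "horizontally" is taken to mean that within each image (row) the lowest positive-label score should beat the highest negative-label score, and "vertically" that within each label (column) the lowest score among images containing that label should beat the highest score among images that do not. The hinge with margin 1, the weighting factor alpha, the rule that rows or columns lacking positives or negatives contribute zero, and the function names row_margin_loss and joint_max_margin_loss are all hypothetical choices for illustration.

```python
from typing import List

Matrix = List[List[float]]
LabelMatrix = List[List[int]]


def row_margin_loss(scores: Matrix, labels: LabelMatrix, margin: float = 1.0) -> float:
    """Hinge loss per row: the minimum positive score should exceed the
    maximum negative score by `margin` (an assumed value).

    Rows with no positives or no negatives contribute zero (assumption).
    The result is averaged over all rows.
    """
    total = 0.0
    for score_row, label_row in zip(scores, labels):
        pos = [s for s, y in zip(score_row, label_row) if y == 1]
        neg = [s for s, y in zip(score_row, label_row) if y == 0]
        if pos and neg:
            # Only the hardest pair matters: weakest positive vs. strongest negative.
            total += max(0.0, margin - (min(pos) - max(neg)))
    return total / len(scores)


def transpose(matrix: list) -> list:
    """Swap rows and columns so per-label comparisons can reuse row_margin_loss."""
    return [list(col) for col in zip(*matrix)]


def joint_max_margin_loss(scores: Matrix, labels: LabelMatrix,
                          margin: float = 1.0, alpha: float = 1.0) -> float:
    # Horizontal term: compare labels within each image (rows).
    horizontal = row_margin_loss(scores, labels, margin)
    # Vertical term: compare images within each label (columns).
    vertical = row_margin_loss(transpose(scores), transpose(labels), margin)
    # alpha is a hypothetical weight balancing the two terms.
    return horizontal + alpha * vertical


if __name__ == "__main__":
    scores = [[2.0, -1.0, 0.5],
              [0.2, 1.5, -0.3]]
    labels = [[1, 0, 1],
              [0, 1, 0]]
    print(joint_max_margin_loss(scores, labels))
```

In this toy batch every image already separates its positive and negative labels by more than the margin, so the horizontal term is zero; only the third label fails vertically (0.5 versus -0.3 leaves a gap of 0.8), giving 0.2 averaged over three labels, about 0.067. Using min and max instead of summing over all positive/negative pairs means each row or column is penalized only through its hardest pair, which matches the "minimum positive score versus maximum negative score" wording of the abstract.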