Volume 29, Issue 6, December 2020
Citation: CAI Guoyong, LYU Guangrui, LIN Yuming, et al., "Multi-level Deep Correlative Networks for Multi-modal Sentiment Analysis," Chinese Journal of Electronics, vol. 29, no. 6, pp. 1025-1038, 2020, doi: 10.1049/cje.2020.09.003

Multi-level Deep Correlative Networks for Multi-modal Sentiment Analysis

doi: 10.1049/cje.2020.09.003
Funds: This work is supported by the National Natural Science Foundation of China (No. 61763007), the Science and Technology Major Project of Guangxi (Nos. AA19046004 and AA18118039), and the Natural Science Foundation of Guangxi Province (No. 2017JJD160017).
  • Received Date: 2019-07-24
  • Publish Date: 2020-12-25
  • Abstract: Multi-modal sentiment analysis (MSA) is increasingly becoming a research hotspot because it extends conventional text-based sentiment analysis (SA) to multi-modal content, which provides richer affective information. However, compared with text-based sentiment analysis, multi-modal sentiment analysis poses far greater challenges, because the joint learning process on multi-modal data requires both fine-grained semantic matching and effective heterogeneous feature fusion. Existing approaches generally infer sentiment by splicing together features extracted from different modalities, but they neglect the strong semantic correlation among co-occurring data of different modalities. To address these challenges, a multi-level deep correlative network for multi-modal sentiment analysis is proposed, which reduces the semantic gap by simultaneously analyzing the middle-level semantic features of images and the hierarchical deep correlations. First, the most relevant cross-modal feature representations are generated with Multi-modal deep and discriminative correlation analysis (Multi-DDCA), while the respective modal feature representations are kept discriminative. Second, the high-level semantic outputs of Multi-DDCA are encoded into an attention-correlation cross-modal feature representation by a co-attention-based multi-modal correlation submodel, and then further merged by a multi-layer neural network to train a sentiment classifier that predicts sentiment categories. Extensive experimental results on five datasets demonstrate the effectiveness of the proposed approach, which outperforms several state-of-the-art fusion strategies for sentiment analysis. A minimal code sketch of the two-stage idea follows the abstract.
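    The abstract outlines a two-stage pipeline: cross-modal features are first aligned by Multi-DDCA, then fused through co-attention and a multi-layer classifier. The following PyTorch sketch illustrates that general idea only; it is not the authors' implementation, and the module names, feature dimensions, and the simplified correlation loss (a plain Pearson-correlation surrogate for the CCA-style Multi-DDCA objective) are illustrative assumptions.

    # Minimal sketch (assumed, not the paper's released code) of the two-stage idea:
    # (1) project paired text/image features so that they co-vary, (2) fuse them with
    # co-attention and classify sentiment with a small multi-layer network.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def correlation_loss(x, y, eps=1e-6):
        """Negative mean per-dimension correlation between paired batches.

        A simplified surrogate for the CCA-style Multi-DDCA objective:
        minimising it encourages paired text/image projections to co-vary.
        """
        x = x - x.mean(dim=0, keepdim=True)
        y = y - y.mean(dim=0, keepdim=True)
        corr = (x * y).sum(dim=0) / (x.norm(dim=0) * y.norm(dim=0) + eps)
        return -corr.mean()

    class CoAttentionFusion(nn.Module):
        """Co-attention over text tokens and image regions, then an MLP sentiment classifier."""

        def __init__(self, dim=256, num_classes=2):
            super().__init__()
            self.affinity = nn.Linear(dim, dim, bias=False)  # bilinear affinity weight
            self.classifier = nn.Sequential(
                nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_classes)
            )

        def forward(self, text, image):
            # text: (B, T, dim) token features; image: (B, R, dim) region features
            affinity = torch.tanh(self.affinity(text) @ image.transpose(1, 2))  # (B, T, R)
            text_attn = F.softmax(affinity.max(dim=2).values, dim=1)            # (B, T)
            image_attn = F.softmax(affinity.max(dim=1).values, dim=1)           # (B, R)
            text_vec = (text_attn.unsqueeze(-1) * text).sum(dim=1)              # (B, dim)
            image_vec = (image_attn.unsqueeze(-1) * image).sum(dim=1)           # (B, dim)
            return self.classifier(torch.cat([text_vec, image_vec], dim=-1))

    if __name__ == "__main__":
        text = torch.randn(8, 20, 256)   # e.g. word embeddings from a text encoder
        image = torch.randn(8, 49, 256)  # e.g. 7x7 CNN feature-map regions
        model = CoAttentionFusion()
        logits = model(text, image)
        corr = correlation_loss(text.mean(dim=1), image.mean(dim=1))
        print(logits.shape, corr.item())  # torch.Size([8, 2]) and a scalar loss

    In training, the correlation term would be added to the cross-entropy loss on the classifier output, so the shared projections stay both correlated across modalities and discriminative for sentiment.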