Volume 32 Issue 3
May 2023
Citation: TANG Huanling, ZHU Hui, WEI Hongmin, et al., “Representation of Semantic Word Embeddings Based on SLDA and Word2vec Model,” Chinese Journal of Electronics, vol. 32, no. 3, pp. 647-654, 2023, doi: 10.23919/cje.2021.00.113

Representation of Semantic Word Embeddings Based on SLDA and Word2vec Model

doi: 10.23919/cje.2021.00.113
Funds:  This work was supported by the National Natural Science Foundation of China (61976124, 61976125, 61773244, 61772319, 61772250)
More Information
  • Author Bio:

    Huanling TANG received the B.S. degree from Yantai University in 1993, the M.S. degree in computer science and technology from Tsinghua University in 2004, and the Ph.D. degree from Dalian Maritime University in 2009. Her research interests include machine learning, artificial intelligence, and data mining. (Email: thl01@163.com)

    Hui ZHU received the B.S. degree from Tongda College of Nanjing University of Posts and Telecommunications. He is a postgraduate student at Shandong Technology and Business University. His research interests include machine learning, artificial intelligence, and data mining. (Email: 1501573182@qq.com)

    Mingyu LU received the M.S. and Ph.D. degrees in computer science and technology from Tsinghua University in 1988 and 2002, respectively. Now he is a Professor at Dalian Maritime University. His research interests include machine learning, artificial intelligence, and data mining. (Email: lumingyu@dlmu.edu.cn)

    Jin GUO (corresponding author) received the M.S. degree from Dongbei University of Finance & Economics in 2003. Now she is an Associate Professor in the School of Computer and Information Technology at Liaoning Normal University. Her major field is intelligent information processing. (Email: guojinsky@163.com)

  • Received Date: 2021-03-31
  • Accepted Date: 2021-06-07
  • Available Online: 2021-09-23
  • Publish Date: 2023-05-05
  • To address the problem of semantic loss in text representation, this paper proposes wt2svec, a new method for embedding word representations in semantic space based on supervised latent Dirichlet allocation (SLDA) and Word2vec. It generates a global topic embedding word vector with SLDA, which discovers global semantic information through the latent topics over the whole document set, and obtains a local semantic embedding word vector from Word2vec. The new semantic word vector is obtained by combining the global semantic information with the local semantic information; a document semantic vector, named doc2svec, is generated in the same way. Experimental results on different datasets show that the wt2svec model clearly improves the accuracy of word semantic similarity and the performance of text categorization compared with Word2vec.
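
    The snippet below is a minimal Python sketch of the idea described in the abstract, not the authors' implementation: gensim's standard LdaModel stands in for SLDA (which gensim does not provide), the global and local vectors are simply concatenated, documents are mean-pooled, and the names wt2svec/doc2svec only mirror the paper's terminology.

        # Illustrative sketch: combine a "global" topic-based word vector with a
        # "local" Word2vec vector. Standard LDA is used as a stand-in for SLDA,
        # and the concatenation/mean-pooling scheme is an assumption.
        import numpy as np
        from gensim.corpora import Dictionary
        from gensim.models import LdaModel, Word2Vec

        docs = [["machine", "learning", "text", "categorization"],
                ["topic", "model", "text", "semantic"],
                ["word", "embedding", "semantic", "space"]]

        dictionary = Dictionary(docs)
        bow_corpus = [dictionary.doc2bow(d) for d in docs]

        # Global semantics: topic-word weights learned over the whole document set.
        lda = LdaModel(bow_corpus, id2word=dictionary, num_topics=4, passes=20)
        topic_word = lda.get_topics()                  # shape: (num_topics, vocab_size)

        # Local semantics: skip-gram Word2vec trained on the same corpus.
        w2v = Word2Vec(docs, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

        def wt2svec(word):
            """Concatenate the word's topic distribution with its Word2vec vector."""
            col = topic_word[:, dictionary.token2id[word]]
            col = col / (col.sum() + 1e-12)            # normalize to a distribution
            return np.concatenate([col, w2v.wv[word]])

        def doc2svec(tokens):
            """Represent a document as the mean of its combined word vectors."""
            vecs = [wt2svec(t) for t in tokens if t in dictionary.token2id]
            return np.mean(vecs, axis=0)

        print(wt2svec("text").shape)    # (4 + 50,) -> (54,)
        print(doc2svec(docs[0]).shape)

    Under these assumptions, the topic part captures corpus-level co-occurrence structure while the Word2vec part captures local context windows; the paper's SLDA additionally uses document labels when learning the topics.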
  • [1]
    W. Y. Dai, G. R. Xue, Q. Yang, et al., “Transferring naive Bayes classifiers for text classification,” in Proceedings of the 22nd Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence, Vancouver, BC, Canada, pp.540–545, 2007.
    [2]
    C. Xing, D. Wang, X. W. Zhang, et al., “Document classification with distributions of word vectors,” 2014 Asia-pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Siem Reap, Cambodia, pp.1–5, 2014.
    [3]
    C. L. Li, H. R. Wang, Z. Q. Zhang, et al., “Topic modeling for short texts with auxiliary word embeddings,” in Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy, pp.165–174, 2016.
    [4]
    X. F. He, L. Chen, G. C. Chen, et al., “A LDA topic model based collection selection method for distributed information retrieval,” Journal of Chinese Information Processing, vol.31, no.3, pp.125–133, 2017. (in Chinese)
    [5]
    G. B. Yang, “A novel contextual topic model for query-focused multi-document summarization,” in Proceedings of the 26th International Conference on Tools with Artificial Intelligence, Limassol, Cyprus, pp.576–583, 2014.
    [6]
    M. Tang, L. Zhu, and X. C. Zou, “Document vector representation based on Word2vec,” Computer Science, vol.43, no.6, pp.214–217, 269, 2016. (in Chinese) doi: 10.11896/j.issn.1002-137X.2016.06.043
    [7]
    Y. F. He and M. H. Jiang, “Information bottleneck based feature selection in web text categorization,” Journal of Tsinghua University (Science and Technology), vol.50, no.1, pp.45–48, 53, 2010. (in Chinese) doi: 10.16511/j.cnki.qhdxxb.2010.01.027
    [8]
    D. Q. Nguyen, R. Billingsley, L. Du, et al., “Improving topic models with latent feature word representations,” Transactions of the Association for Computational Linguistics, vol.3, pp.299–313, 2015. doi: 10.1162/tacl_a_00140
    [9]
    H. L. Tang, H. Zheng, Y. H. Liu, et al., “Tr-SLDA: a transfer topic model for cross-domains,” Acta Electronica Sinica, vol.49, no.3, pp.605–613, 2021. (in Chinese) doi: 10.12263/DZXB.20200210
    [10]
    Z. S. Harris, “Distributional structure,” WORD, vol.10, no.2-3, pp.146–162, 1954. doi: 10.1080/00437956.1954.11659520
    [11]
    G. Salton, A. Wong, and C. S. Yang, “A vector space model for automatic indexing,” Communications of the ACM, vol.18, no.11, pp.613–620, 1975. doi: 10.1145/361219.361220
    [12]
    D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” The Journal of Machine Learning Research, vol.3, pp.993–1022, 2003.
    [13]
    H. L. Tang, Q. S. Dou, L. P. Yu, et al., “SLDA-TC: A novel text categorization approach based on supervised topic model,” Acta Electronica Sinica, vol.47, no.6, pp.1300–1308, 2019. (in Chinese) doi: 10.3969/j.issn.0372-2112.2019.06.017
    [14]
    P. D. Turney and P. Pantel, “From frequency to meaning: vector space models of semantics,” Journal of Artificial Intelligence Research, vol.37, pp.141–188, 2010. doi: 10.1613/jair.2934
    [15]
    Y. Bengio, R. Ducharme, P. Vincent, et al., “A neural probabilistic language model,” The Journal of Machine Learning Research, vol.3, pp.1137–1155, 2003.
    [16]
    T. Mikolov, I. Sutskever, K. Chen, et al., “Distributed representations of words and phrases and their compositionality,” in Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, Lake Tahoe, NV, USA, pp.3111–3119, 2013.
    [17]
    Q. V. Le and T. Mikolov, “Distributed representations of sentences and documents,” in Proceedings of the 31st International Conference on Machine Learning, Beijing, China, pp.1188–1196, 2014
    [18]
    Y. Liu, Z. Y. Liu, T. S. Chua, et al., “Topical word embeddings,” in Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA, pp.2418–2424, 2015.
    [19]
    L. Q. Niu, X. Y. Dai, J. B. Zhang, et al., “Topic2Vec: Learning distributed representations of topics,” 2015 International Conference on Asian Language Processing (IALP), Suzhou, China, pp.193–196, 2015.
  • 加载中
