Citation: TANG Huanling, ZHU Hui, WEI Hongmin, ZHENG Han, MAO Xueli, LU Mingyu, GUO Jin. Representation of Semantic Word Embeddings Based on SLDA and Word2vec Model[J]. Chinese Journal of Electronics. doi: 10.1049/cje.2021.00.113

Representation of Semantic Word Embeddings Based on SLDA and Word2vec Model

doi: 10.1049/cje.2021.00.113
Funds:  This work is supported by the National Natural Science Foundation of China (Nos. 61976124, 61976125, 61773244, 61772319, 61772250)
More Information
  • Author Bio:

    TANG Huanling received the B.S. degree from Yantai University in 1993, the M.S. degree in computer science and technology from Tsinghua University in 2004, and the Ph.D. degree from Dalian Maritime University in 2009. Her research interests include machine learning, artificial intelligence, and data mining. (E-mail: thl01@163.com)

    ZHU Hui was born in Jiangsu. He received the B.S. degree from Tongda College of Nanjing University of Posts & Telecommunications and is a postgraduate student at Shandong Technology and Business University. His research interests include machine learning, artificial intelligence, and data mining. (E-mail: 1501573182@qq.com)

    LU Mingyu received the M.S. and Ph.D. degrees in computer science and technology from Tsinghua University in 1988 and 2002, respectively. He is now a professor at Dalian Maritime University. His research interests include machine learning, artificial intelligence, and data mining. (E-mail: lumingyu@dlmu.edu.cn)

    GUO Jin (corresponding author) received the M.S. degree from Dongbei University of Finance & Economics in 2003. She is now an associate professor in the School of Computer and Information Technology at Liaoning Normal University. Her major field is intelligent information processing. (E-mail: guojinsky@163.com)

  • Received Date: 2021-03-31
    Available Online: 2021-09-23
  • To address the problem of semantic loss in text representation, this paper proposes wt2svec, a new word-embedding method in semantic space based on SLDA (supervised LDA) and Word2vec. It generates a global topic embedding for each word with SLDA, which discovers global semantic information through the latent topics of the whole document set, and it obtains a local semantic embedding with Word2vec. The new semantic word vector is formed by combining the global semantic information with the local semantic information; a document semantic vector, named doc2svec, is generated from it in turn. Experimental results on different datasets show that wt2svec noticeably improves the accuracy of word semantic similarity and the performance of text categorization compared with Word2vec.
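    The abstract describes combining a global, topic-level view of each word (from SLDA) with its local context embedding (from Word2vec). The sketch below is a minimal illustration of that idea, not the authors' exact formulation: it assumes the global part is the word's normalized weight across the K SLDA topics and that the two parts are joined by concatenation, with doc2svec taken as the mean of a document's word vectors. The function names, signatures, normalization, and concatenation choice are all assumptions for illustration.

    ```python
    import numpy as np

    def wt2svec(word, w2v, topic_word, vocab):
        """Hypothetical sketch of a wt2svec-style semantic word vector.

        w2v        -- dict: word -> np.ndarray, trained Word2vec embedding (local semantics)
        topic_word -- (K, V) array of per-topic word probabilities from SLDA (global semantics)
        vocab      -- dict: word -> column index into topic_word
        """
        local = w2v[word]                      # local context semantics from Word2vec
        glob = topic_word[:, vocab[word]]      # the word's weight in each of the K topics
        glob = glob / (glob.sum() + 1e-12)     # normalize to a distribution over topics
        return np.concatenate([glob, local])   # assumed combination: concatenation

    def doc2svec(doc_words, w2v, topic_word, vocab):
        """Document vector as the mean of its words' wt2svec vectors (assumed)."""
        vecs = [wt2svec(w, w2v, topic_word, vocab)
                for w in doc_words if w in vocab and w in w2v]
        return np.mean(vecs, axis=0)

    # Toy usage with random stand-ins for trained Word2vec / SLDA outputs.
    rng = np.random.default_rng(0)
    vocab = {"topic": 0, "model": 1, "vector": 2}
    w2v = {w: rng.normal(size=100) for w in vocab}            # pretend 100-dim Word2vec vectors
    topic_word = rng.dirichlet(np.ones(len(vocab)), size=20)  # 20 topics over a 3-word vocabulary

    v = wt2svec("model", w2v, topic_word, vocab)              # 20 + 100 = 120-dim semantic vector
    d = doc2svec(["topic", "model"], w2v, topic_word, vocab)
    ```

    Concatenation keeps the global and local information separable for a downstream classifier; a weighted sum into a shared space would be the natural alternative if the paper's combination matched the two dimensionalities instead.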
