Volume 30 Issue 1
Jan.  2021
ZHOU Yajian, DENG Dingpeng, CHI Junhui. A Short Text Classification Algorithm Based on Semantic Extension[J]. Chinese Journal of Electronics, 2021, 30(1): 153-159. doi: 10.1049/cje.2020.11.014

A Short Text Classification Algorithm Based on Semantic Extension

doi: 10.1049/cje.2020.11.014

Funding: the National Key Research and Development Project (No. 2017YFB0802803)

  • Author Bio:

    ZHOU Yajian obtained the Ph.D. degree in communications engineering from Xidian University, Xi'an, in 2003. He received the M.S. degree in 1996 and the B.S. degree in 1993, both from Beihang University. He is currently an associate professor in the School of Computer Science of Beijing University of Posts and Telecommunications (BUPT). His main research interests include mobile communications (e.g., LTE, D2D, and WLAN), security of wireless networks, security of databases, and cryptography theory and its application. He is now in charge of two state-level projects: one from the National Natural Science Foundation of China and another from the National Science and Technology Major Project of China. He previously led projects under the National 863 High-Tech Research and Development Plan of China and the Beijing Municipal Natural Science Foundation. He has published approximately 50 technical papers and 6 books. (Email: yajian@bupt.edu.cn)

    CHI Junhui  received an M.S. degree in big data and intelligent information processing from Beijing University of Posts and Telecommunications, China in 2018. From 2018 to 2019, he was a teaching assistant of discrete mathematics. His research interests include natural language processing, statistical learning theory and brain-like computing. Mr. Chi is a student member of the China Computer Federation (CCF). (Email: cjhwilliam714@edu.cn)

  • Corresponding author: DENG Dingpeng received the B.S. degree in network engineering from Beijing Information Science and Technology University, China, in 2013, and the M.S. degree in big data from Beijing University of Posts and Telecommunications, China, in 2017. His current research interests include information processing, natural language processing, cloud computing, and network security. (Email: dingpeng@bupt.edu.cn)
  • Received Date: 2020-04-27
  • Accepted Date: 2020-06-03
  • Publish Date: 2021-01-01
  • Abstract: A semantic-extension-based algorithm for short texts is proposed that combines Word2vec and the LDA model to improve classification performance, which often deteriorates because of semantic dependencies and the scarcity of features. For every keyword in a short text, weighted synonyms and related words are generated by Word2vec and the LDA model, respectively, and then inserted to extend the short text to a reasonable length. We establish a criterion, based on similarity estimation, to determine whether a sentence should be extended, and we design a scheme to choose the number of extended words. The extended text is then classified. Experimental results show that the precision of the proposed algorithm is approximately 5% higher than that of the TF-IDF model and approximately 10% higher than that of the VSM method.
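The extension step described in the abstract can be illustrated with a minimal sketch. It covers only the embedding-based (Word2vec-style) half of the idea and is not the authors' implementation: the toy vectors, the function names (`nearest_words`, `extend_short_text`), the similarity threshold `min_similarity`, and the length cap `max_length` are all assumptions made for illustration. The abstract does not state the actual extension criterion or how the number of extended words is chosen, and the LDA-based related-word step is omitted here.

```python
import math

# Hypothetical 3-dimensional vectors standing in for trained Word2vec embeddings.
TOY_VECTORS = {
    "football": [0.9, 0.1, 0.0],
    "soccer": [0.85, 0.15, 0.05],
    "match": [0.7, 0.3, 0.1],
    "stock": [0.0, 0.9, 0.2],
    "market": [0.1, 0.85, 0.3],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_words(keyword, vectors, top_k):
    """Return up to top_k (word, similarity) pairs closest to keyword."""
    if keyword not in vectors:
        return []
    scored = [
        (word, cosine(vectors[keyword], vec))
        for word, vec in vectors.items()
        if word != keyword
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def extend_short_text(tokens, vectors, top_k=2, min_similarity=0.9, max_length=8):
    """Append weighted near-synonyms of each token until max_length is reached.

    Original tokens get weight 1.0; an added word is weighted by its cosine
    similarity to the keyword that produced it. The threshold and length cap
    are illustrative assumptions, not values taken from the paper.
    """
    extended = [(token, 1.0) for token in tokens]
    seen = set(tokens)
    for token in tokens:
        for word, similarity in nearest_words(token, vectors, top_k):
            if len(extended) >= max_length:
                return extended
            if similarity >= min_similarity and word not in seen:
                extended.append((word, similarity))
                seen.add(word)
    return extended


result = extend_short_text(["football"], TOY_VECTORS)
print([word for word, _ in result])
```

In a fuller pipeline, the weighted word list would presumably be passed to a vectorizer and a classifier; the similarity weights make it possible to let inserted words count for less than the original tokens.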
    Figures(6)  / Tables(5)
