WANG Haitao, HE Jie, ZHANG Xiaohong, LIU Shufen. A Short Text Classification Method Based on N-Gram and CNN[J]. Chinese Journal of Electronics, 2020, 29(2): 248-254. doi: 10.1049/cje.2020.01.001

A Short Text Classification Method Based on N-Gram and CNN

doi: 10.1049/cje.2020.01.001
Funds:  This work is supported by the National Natural Science Foundation of China (No.61503124, No.61572379).
  • Corresponding author: ZHANG Xiaohong currently works at Henan Polytechnic University. She received the Ph.D. degree in computer architecture from the University of Chinese Academy of Sciences and did one year of postdoctoral research on cloud computing at Wayne State University. Her main research interests include cloud computing and big data analysis. (Email: 1760778431@qq.com)
  • Received Date: 2019-05-21
  • Rev Recd Date: 2019-07-22
  • Publish Date: 2020-03-10
  • Abstract: Text classification is a fundamental task in Natural language processing (NLP) applications. Most existing work relies on either explicit or implicit text representations to address this problem. These techniques work well for sentences, but they cannot simply be applied to short texts, which are both brief and sparse. In addition, the traditional multi-size-filter Convolutional neural network (CNN) captures only simple word-vector features and ignores important features during text classification. To address this, we propose a CNN-based short text classification model that obtains abundant text features by adopting a non-linear sliding method and the N-gram language model, picks out the key features by using an attention mechanism, and employs a pooling operation to preserve the text features as far as possible. Experiments show that, compared with traditional machine learning algorithms and the conventional convolutional neural network, the proposed method markedly improves short text classification results.
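The abstract names three ingredients: N-gram features, a mechanism that weights the key features, and pooling. The following is only a minimal illustrative sketch of those ideas in plain Python, not the authors' model. The function names, the example sentence, and the relevance scores are all hypothetical, and a softmax is assumed as the weighting function; in a trained network the scores would be learned rather than written by hand.

```python
import math
from typing import List, Sequence, Tuple


def word_ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous word n-grams of length n (empty if the text is too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def attention_weights(scores: Sequence[float]) -> List[float]:
    """Softmax over relevance scores, so the weights are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_pool(values: Sequence[float]) -> float:
    """Keep only the strongest response, as max-over-time pooling does."""
    return max(values)


# Hypothetical short text, split on whitespace.
tokens = "cheap flights to new york".split()
bigrams = word_ngrams(tokens, 2)

# Hypothetical relevance scores, one per bigram (assumed, not from the paper).
scores = [0.1, 0.3, 0.2, 1.5]
weights = attention_weights(scores)

# The bigram with the largest weight is the one the weighting step emphasizes.
key_bigram = bigrams[weights.index(max(weights))]
strongest = max_pool(scores)
```

Short texts yield few n-grams (four bigrams for the five-word example), which illustrates the sparseness the abstract refers to and why weighting the few informative features matters.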