WANG Haitao, HE Jie, ZHANG Xiaohong, LIU Shufen. A Short Text Classification Method Based on N-Gram and CNN[J]. Chinese Journal of Electronics, 2020, 29(2): 248-254. doi: 10.1049/cje.2020.01.001
A Short Text Classification Method Based on N-Gram and CNN

doi: 10.1049/cje.2020.01.001
Funds:  This work is supported by the National Natural Science Foundation of China (No.61503124, No.61572379).
  • Corresponding author: ZHANG Xiaohong (corresponding author) currently works at Henan Polytechnic University. She received the Ph.D. degree in computer architecture from the University of Chinese Academy of Sciences and did one year of postdoctoral research on cloud computing at Wayne State University. Her main research interests include cloud computing and big data analysis.
  • Received Date: 2019-05-21
  • Rev Recd Date: 2019-07-22
  • Publish Date: 2020-03-10
  • Text classification is a fundamental task in natural language processing (NLP) applications. Most existing work relies on either explicit or implicit text representations to address this problem. These techniques work well for sentences but cannot be applied directly to short text because of its shortness and sparseness. A traditional multi-size-filter convolutional neural network (CNN) obtains only simple word-vector features and ignores important features during text classification. To address this, we propose a CNN-based short text classification model that obtains abundant text features by adopting a non-linear sliding method and an N-gram language model, and picks out the key features by using a concentration mechanism. In addition, a pooling operation preserves the text features as fully as possible. Experiments show that, compared with traditional machine learning algorithms and a conventional convolutional neural network, the proposed method markedly improves short text classification results.
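
As a rough illustration of the N-gram idea mentioned in the abstract, the sketch below extracts word N-grams of several sizes from a short text with a plain contiguous sliding window. It is not the authors' model: the abstract does not specify the non-linear sliding method, the concentration mechanism, or the network itself, so none of them is reproduced here. The function names `word_ngrams` and `multi_size_ngrams`, the whitespace tokenization, and the sample sentence are assumptions made only for this example.

```python
from typing import Dict, List, Sequence


def word_ngrams(tokens: List[str], n: int) -> List[str]:
    """Return contiguous word n-grams using a plain sliding window.

    Hypothetical helper: a simple contiguous window, not the paper's
    non-linear sliding method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # A text shorter than n yields no n-grams (empty list).
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def multi_size_ngrams(text: str, sizes: Sequence[int] = (1, 2, 3)) -> Dict[int, List[str]]:
    """Collect n-grams for several window sizes.

    Assumption: lowercase whitespace tokenization is good enough for
    the illustration; a real system would use a proper tokenizer.
    """
    tokens = text.lower().split()
    return {n: word_ngrams(tokens, n) for n in sizes}


# Sample short text (made up for this example).
features = multi_size_ngrams("cheap flights to new york")
for size, grams in features.items():
    print(size, grams)
```

Collecting several window sizes side by side loosely mirrors what filters of different widths see in a multi-size-filter CNN, although here the windows operate on raw tokens rather than word vectors.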
