YUAN Lichi, “A Part-of-speech Tagging Model Employing Word Clustering and Syntactic Parsing,” Chinese Journal of Electronics, vol. 23, no. 1, pp. 109-114, 2014.

A Part-of-speech Tagging Model Employing Word Clustering and Syntactic Parsing

Funds: This work is supported by the National Natural Science Foundation of China (No. 61262035), the Science and Technology Foundation of the Education Department of Jiangxi Province, China (No. GJJ12271, No. GJJ12742), and the Natural Science Foundation of Jiangxi Province, China (No. 20122BAB201033).
  • Received Date: 2012-04-01
  • Revision Received Date: 2013-05-01
  • Publish Date: 2014-01-05
  • Part-of-speech (POS) tagging is a basic task in natural language processing. This paper builds a POS tagger based on an improved hidden Markov model (HMM) that employs word clustering and a syntactic parsing model. First, to overcome the defects of the classical HMM, a new statistical model, the Markov family model (MFM), is introduced. Second, to address data sparseness, we propose a bottom-up hierarchical word clustering algorithm. Third, we combine syntactic parsing with POS tagging. Experiments show that the improved POS tagging model outperforms HMMs under the same testing conditions: precision rises from 94.642% to 97.235%.
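For orientation, the classical HMM baseline that the abstract compares against can be sketched as first-order Viterbi decoding. This is a minimal illustration only, not the paper's Markov family model, word clustering, or parsing component. The function name `viterbi`, the probability floor used for unseen events, and the toy tables `TAGS`, `START_P`, `TRANS_P`, and `EMIT_P` are all assumptions made up for this example.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most probable tag sequence for `words` under a first-order HMM.

    Probabilities missing from the tables are replaced by `floor`
    (an assumed, very crude stand-in for real smoothing).
    """
    if not words:
        return []

    def lp(p):
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best path that ends in tag t at position i
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # choose the predecessor tag that maximizes the path score
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0.0))
            back[i][t] = prev

    # follow the back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy model (hypothetical numbers, not estimated from any corpus)
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.6},
}

print(viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P))
# -> ['DET', 'NOUN', 'VERB']
```

In this sketch an unseen word receives the same floor probability under every tag, which is one simple way to see the data-sparseness problem that the abstract's word clustering is meant to address.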