LI Dongchen, ZHANG Xiantao, WU Xihong. Integrated Chinese Segmentation, Parsing and Named Entity Recognition[J]. Chinese Journal of Electronics, 2018, 27(4): 756-760. doi: 10.1049/cje.2018.05.014

Integrated Chinese Segmentation, Parsing and Named Entity Recognition

doi: 10.1049/cje.2018.05.014
Funds:  This work is supported by the National Basic Research Program of China (973 Program) (No.2013CB329304), the Research Special Fund for Public Welfare Industry of Health (No.201202001), the Key National Social Science Foundation of China (No.12&ZD119), and the National Natural Science Foundation of China (No.91120001).
More Information
  • Corresponding author: WU Xihong received the Ph.D. degree from the Department of Radio Electronics, Peking University, China, in 1995. He is currently a full professor at Peking University. His research areas include computational auditory models and auditory scene analysis, auditory psychophysics, speech signal processing, and natural language processing. (Email: wxh@cis.pku.edu.cn)
  • Received Date: 2014-06-08
  • Rev Recd Date: 2014-08-02
  • Publish Date: 2018-07-10
  • Segmentation, named entity recognition and parsing are standalone techniques in the natural language processing community, and their annotations are inconsistent. However, some practical applications need their joint output, and the tasks rely on one another's results to produce more concise output. A unified model is learned to resolve these three tasks simultaneously. At the training stage, the joint annotation of the three tasks is employed to learn the unified model. At the decoding stage, the three tasks are carried out together on a given text to provide a consistent output. Experimental results demonstrate higher performance on each task and verify the benefits of the unified framework.