XI Ning, DAI Xinyu, HUANG Shujian, CHEN Jiajun. Discriminative Word Alignment over Multiple Word Segmentations[J]. Chinese Journal of Electronics, 2014, 23(2): 263-270.

Discriminative Word Alignment over Multiple Word Segmentations

Funds:  This work is supported by the National Natural Science Foundation of China (No.61003112), National Social Science Foundation (No.11AZD121), and Research Fund for the Doctoral Program of Higher Education of China (No.20110091110003).
More Information
  • Corresponding author: DAI Xinyu
  • Received Date: 2013-01-01
  • Rev Recd Date: 2013-03-01
  • Publish Date: 2014-04-05
  • For languages such as Chinese, conventional bilingual word alignment is conducted on sentence pairs with a single word segmentation, viz. Single-segmentation-based word alignment (SSWA). However, SSWA risks missing the optimal word segmentation granularity or causing data sparseness in word alignment. This paper proposes Multiple-segmentation-based word alignment (MSWA). In MSWA, the diverse and complementary knowledge in multiple word segmentations is employed to lower these risks. Given k word segmentations of a Chinese sentence, a skeleton segmentation is first constructed. The alignment between the skeleton segmentation and the parallel English sentence is modeled log-linearly, incorporating various features defined over the multiple word segmentations. The Viterbi alignment, i.e. the alignment with the highest score, is then mapped back to k word alignments, one for each of the k segmentations. Experimentally, MSWA outperformed SSWA on all k segmentations in both alignment quality and translation performance.
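The pipeline summarized in the abstract (build a skeleton segmentation, score alignments log-linearly, map the best alignment back to each segmentation) can be illustrated with a minimal sketch. The abstract does not say how the skeleton is constructed, so the code below assumes it is obtained by cutting the sentence at every word boundary used by any of the k segmentations. The function names (`boundaries`, `skeleton`, `unit_to_word`, `map_back`, `score`) and the example sentence are hypothetical and are not taken from the paper; the search for the Viterbi alignment is omitted.

```python
from typing import Callable, List, Sequence, Set, Tuple

Link = Tuple[int, int]  # (Chinese unit index, English word index)


def boundaries(seg: List[str]) -> Set[int]:
    """Character offsets at which a word ends in one segmentation."""
    ends, pos = set(), 0
    for word in seg:
        pos += len(word)
        ends.add(pos)
    return ends


def skeleton(segs: List[List[str]]) -> List[str]:
    """ASSUMPTION: cut at every boundary used by any of the k segmentations."""
    sentence = "".join(segs[0])
    cuts = sorted(set().union(*(boundaries(s) for s in segs)))
    units, start = [], 0
    for end in cuts:
        units.append(sentence[start:end])
        start = end
    return units


def unit_to_word(skel: List[str], seg: List[str]) -> List[int]:
    """For each skeleton unit, the index of the word in `seg` containing it."""
    word_ends = sorted(boundaries(seg))
    mapping, pos, w = [], 0, 0
    for unit in skel:
        pos += len(unit)
        mapping.append(w)
        if pos == word_ends[w]:  # this unit closes word w
            w += 1
    return mapping


def map_back(links: Set[Link], skel: List[str], seg: List[str]) -> Set[Link]:
    """Project skeleton-level links onto one of the k segmentations."""
    m = unit_to_word(skel, seg)
    return {(m[i], j) for i, j in links}


def score(links: Set[Link],
          features: Sequence[Callable[[Set[Link]], float]],
          weights: Sequence[float]) -> float:
    """Log-linear (unnormalized) score: weighted sum of feature values."""
    return sum(w * h(links) for w, h in zip(weights, features))


# Hypothetical example: two segmentations of the same six-character string.
seg_a = ["南京市", "长江", "大桥"]
seg_b = ["南京", "市", "长江大桥"]
skel = skeleton([seg_a, seg_b])
# Skeleton-level links to the English words "Nanjing City Yangtze Bridge".
links = {(0, 0), (1, 1), (2, 2), (3, 3)}
print(skel)
print(sorted(map_back(links, skel, seg_a)))
print(sorted(map_back(links, skel, seg_b)))
```

Under this assumption every word in every segmentation is a concatenation of skeleton units, which is what makes projecting a single skeleton-level alignment back to k segmentation-specific alignments straightforward.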
