HUANG Shujian, DAI Xinyu, CHEN Jiajun. Hypothesis Pruning in Learning Word Alignment[J]. Chinese Journal of Electronics, 2013, 22(1): 93-98.

Hypothesis Pruning in Learning Word Alignment

Funds: This work is supported by the National Natural Science Foundation of China (No. 61003112, No. 61170181) and the National Fundamental Research Program of China (No. 2010CB327903).
  • Received Date: 2011-11-01
  • Rev Recd Date: 2012-02-01
  • Publish Date: 2013-01-05
  • Recent studies show that discriminative learning methods can significantly improve word alignment quality. One difficulty of these methods is searching efficiently for the best word alignment. Although Inversion Transduction Grammar (ITG) provides a polynomial-time algorithm based on synchronous parsing, very harsh pruning is still needed to make the algorithm computationally feasible. We observe that previous pruning techniques mostly focus on pruning bilingual spans, after which low-quality alignments still exist. To address this problem, we propose an approach that prunes low-quality hypotheses on the fly during parsing. Whereas previous pruning methods use only high-precision alignment links as constraints, our method can also make use of “high-recall” alignment links. To demonstrate our approach, we also propose a constrained learning framework that generates high-precision and high-recall constraints from existing alignment results. Experiments show significant improvements in both alignment and translation quality over standard IBM Model 4 alignments on Chinese-English test data.
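The idea of checking each hypothesis against high-precision and high-recall constraints can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names `build_constraints` and `keep_hypothesis` are hypothetical, the intersection and union of two existing alignments stand in for the high-precision and high-recall link sets, and spans are half-open index ranges.

```python
from typing import Set, Tuple

Link = Tuple[int, int]   # (source word index, target word index)
Span = Tuple[int, int]   # half-open range [start, end)


def build_constraints(a: Set[Link], b: Set[Link]) -> Tuple[Set[Link], Set[Link]]:
    """Assumption: links found by both alignments are treated as high precision,
    links found by either alignment are treated as high recall."""
    return a & b, a | b


def keep_hypothesis(links: Set[Link], src_span: Span, tgt_span: Span,
                    high_precision: Set[Link], high_recall: Set[Link]) -> bool:
    """Return False if the hypothesis should be pruned."""
    # A hypothesis that uses a link outside the high-recall set is pruned.
    if any(link not in high_recall for link in links):
        return False
    # A hypothesis that omits a high-precision link lying inside its
    # span pair is pruned.
    (s0, s1), (t0, t1) = src_span, tgt_span
    for i, j in high_precision:
        if s0 <= i < s1 and t0 <= j < t1 and (i, j) not in links:
            return False
    return True


# Two hypothetical existing alignments of the same sentence pair.
forward = {(0, 0), (1, 2), (2, 1)}
backward = {(0, 0), (1, 1), (2, 1)}
hp, hr = build_constraints(forward, backward)

print(keep_hypothesis({(0, 0)}, (0, 1), (0, 1), hp, hr))   # consistent with both sets
print(keep_hypothesis({(0, 1)}, (0, 2), (0, 2), hp, hr))   # uses a link outside high recall
print(keep_hypothesis(set(), (0, 1), (0, 1), hp, hr))      # omits a high-precision link
```

In a chart parser, a check of this kind would run each time a hypothesis is built for a span pair, so that low-quality hypotheses are discarded before they are combined into larger ones.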