SU Jinsong, WANG Zhihao, WU Qingqiang, YAO Junfeng, LONG Fei, ZHANG Haiying. A Topic-Triggered Translation Model for Statistical Machine Translation[J]. Chinese Journal of Electronics, 2017, 26(1): 65-72. doi: 10.1049/cje.2016.10.007

A Topic-Triggered Translation Model for Statistical Machine Translation

doi: 10.1049/cje.2016.10.007
Funds: This work is supported by the National Natural Science Foundation of China (No. 61303080, No. 61303082, No. 71173211), the Open Funding Project of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No. BUAA-VR-14KF-01), and the Research Fund of the Provincial Key Laboratory for Computer Information Processing Technology at Soochow University (No. KJS1520).
  • Corresponding author: ZHANG Haiying was born in 1974. She received the Ph.D. degree from Harbin Institute of Technology and is now an associate professor in the Software School of Xiamen University. Her research interests include machine learning and data mining. (Email: zhang2002@xmu.edu.cn)
  • Received Date: 2014-01-06
  • Revision Received Date: 2015-07-11
  • Publish Date: 2017-01-10
  • Abstract: A translation model, which contains translation rules with associated probabilities, plays a crucial role in statistical machine translation. Conventional methods estimate translation probabilities only from the co-occurrence frequencies of bilingual translation units and ignore document-level context information. In this paper, we extend the conventional translation model to a topic-triggered one. Specifically, we estimate topic-specific translation probabilities for translation rules by leveraging topical context information, and we score the selected translation rules online according to the topic posterior distributions of the translated sentences. Compared with the conventional model, our model allows more fine-grained distinctions among different translations. Experimental results on a large data set demonstrate the effectiveness of our model.
