Discriminative Word Alignment over Multiple Word Segmentations
Abstract
Conventional bilingual word alignment is conducted on sentence pairs with a single word segmentation for languages such as Chinese, i.e., single-segmentation-based word alignment (SSWA). However, SSWA risks missing the optimal word segmentation granularity or causing data sparseness in word alignment. This paper proposes multiple-segmentation-based word alignment (MSWA), in which diverse and complementary knowledge from multiple word segmentations is used to lower these risks. Given k word segmentations of a Chinese sentence, a skeleton segmentation is first constructed. The alignment between the skeleton segmentation and the parallel English sentence is modeled log-linearly, incorporating various features defined over the multiple word segmentations. The Viterbi alignment, i.e., the alignment with the highest score, is then mapped back to k word alignments, one for each of the k segmentations. In experiments, MSWA outperformed SSWA on all k segmentations in both alignment quality and translation performance.
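The abstract does not specify how the skeleton segmentation is built or how the Viterbi alignment is mapped back. The sketch below assumes one plausible scheme, not necessarily the paper's: the skeleton cuts the sentence at the union of all word boundaries across the k segmentations, and a word in segmentation i is linked to an English word whenever any skeleton unit inside it is. The names `build_skeleton`, `map_back`, and `word_spans`, and the example sentence, are hypothetical. The log-linear model and Viterbi search are not shown; the link set passed to `map_back` merely stands in for their output.

```python
from typing import List, Set, Tuple

# A segmentation is a list of words whose concatenation is the Chinese sentence.
Segmentation = List[str]
# An alignment is a set of (source_word_index, english_word_index) links.
Alignment = Set[Tuple[int, int]]


def word_spans(seg: Segmentation) -> List[Tuple[int, int]]:
    """Return the (start, end) character offsets of every word in seg."""
    spans, pos = [], 0
    for word in seg:
        spans.append((pos, pos + len(word)))
        pos += len(word)
    return spans


def build_skeleton(segs: List[Segmentation]) -> Segmentation:
    """Hypothetical skeleton: cut the sentence at the union of all word
    boundaries, so every skeleton unit falls inside one word of each input."""
    sentence = "".join(segs[0])
    if any("".join(s) != sentence for s in segs):
        raise ValueError("all segmentations must cover the same sentence")
    cuts = sorted({end for s in segs for _, end in word_spans(s)})
    units, start = [], 0
    for end in cuts:
        units.append(sentence[start:end])
        start = end
    return units


def map_back(skeleton: Segmentation, links: Alignment, seg: Segmentation) -> Alignment:
    """Project links over skeleton units onto the words of seg: word j is
    linked to English word e if any skeleton unit inside word j is."""
    seg_spans = word_spans(seg)
    projected: Alignment = set()
    for unit_idx, (u_start, u_end) in enumerate(word_spans(skeleton)):
        # Find the word of seg that contains this skeleton unit.
        word_idx = next(j for j, (w_start, w_end) in enumerate(seg_spans)
                        if w_start <= u_start and u_end <= w_end)
        for src, tgt in links:
            if src == unit_idx:
                projected.add((word_idx, tgt))
    return projected


if __name__ == "__main__":
    seg_a = ["中国", "经济发展"]
    seg_b = ["中国经济", "发展"]
    skel = build_skeleton([seg_a, seg_b])
    print(skel)  # ['中国', '经济', '发展']
    # English side: ["China", "economic", "development"].
    # Stand-in for the Viterbi alignment of the skeleton (not computed here).
    viterbi = {(0, 0), (1, 1), (2, 2)}
    print(sorted(map_back(skel, viterbi, seg_a)))  # [(0, 0), (1, 1), (1, 2)]
    print(sorted(map_back(skel, viterbi, seg_b)))  # [(0, 0), (0, 1), (1, 2)]
```

Cutting at the union of boundaries is what makes the projection unambiguous in this sketch: each skeleton unit lies inside exactly one word of every input segmentation, so one alignment over the skeleton yields one alignment per segmentation.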