Hypothesis Pruning in Learning Word Alignment
Abstract
Recent studies show that discriminative learning methods can significantly improve word alignment quality. One difficulty with these methods is performing an efficient search over word alignments. Although inversion transduction grammar (ITG) provides a polynomial-time algorithm based on synchronous parsing techniques, very harsh pruning is still needed to make the algorithm computationally feasible. We observe that previous pruning techniques mostly focus on pruning bilingual spans, after which low-quality alignments still remain. To address this problem, we propose an approach that prunes low-quality hypotheses on the fly during parsing. Compared with previous pruning methods, which only use high-precision alignment links as constraints, our method can also make use of “high recall” alignment links. To demonstrate our approach, we further propose a constrained learning framework that generates high-precision and high-recall constraints from existing alignment results. Experiments show significant improvements in both alignment and translation quality over standard IBM Model 4 alignments on Chinese-English test data.
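
As a rough illustration of the hypothesis-pruning idea described above, the sketch below checks a partial ITG hypothesis against two kinds of constraint links. The function name, the (source index, target index) link representation, and the one-link-per-word simplification are our own assumptions for illustration, not the paper's actual implementation.

# Hypothetical sketch of constraint-based hypothesis pruning.
#   precision_links: links assumed correct (high precision); a hypothesis that
#                    links either of their endpoints elsewhere is pruned.
#   recall_links:    a superset of plausible links (high recall); a hypothesis
#                    containing a link outside this set is pruned.

def violates_constraints(hyp_links, span, precision_links, recall_links):
    """Return True if the partial hypothesis over `span` should be pruned.

    hyp_links       : set of (i, j) links proposed by the hypothesis
    span            : ((src_lo, src_hi), (tgt_lo, tgt_hi)) bilingual span covered
    precision_links : set of (i, j) links treated as sure
    recall_links    : set of (i, j) links treated as possible
    """
    (src_lo, src_hi), (tgt_lo, tgt_hi) = span

    # 1. Every proposed link must be among the high-recall candidates.
    if any(link not in recall_links for link in hyp_links):
        return True

    # 2. No proposed link may take exactly one endpoint of a sure link,
    #    which would contradict that sure link (one-to-one simplification).
    for (i, j) in precision_links:
        for (hi, hj) in hyp_links:
            if (hi == i) != (hj == j):
                return True

    # 3. Sure links lying entirely inside the covered span must be present.
    for (i, j) in precision_links:
        if src_lo <= i <= src_hi and tgt_lo <= j <= tgt_hi and (i, j) not in hyp_links:
            return True

    return False

Under these assumptions, hypotheses failing any of the three checks would be discarded on the fly during parsing rather than after the bilingual spans have been pruned.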