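Added a Perl script to pre-process GIZA++ alignment files. It converts a three-line sentence-pair entry (header, target sentence, and source sentence annotated with the target positions each word aligns to) such as: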
# Sentence pair (1) source length 23 target length 27 alignment score : 0.449695
it 's just down the hall . i 'll bring you some now . if there is anything else you need , just let me know .
NULL ({ }) 在 ({ 1 }) 门厅 ({ 2 3 4 6 13 }) 下面 ({ 5 }) 。 ({ 7 }) 我 ({ 8 }) 这 ({ }) 就 ({ 9 }) 给 ({ 10 }) ...
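into a single compact line giving the sentence-pair number, the alignment score, and one target-source position pair (1-based) per alignment link: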
1 0.449695 1-1 2-2 3-2 4-2 6-2 13-2 5-3 7-4 8-5 9-7 10-8 11-9 12-11 14-12 15-13 20-14 19-15 16-16 17-16 ...
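A rough Python equivalent of the conversion, to spell out the mapping. This is not the Perl script itself: the names `compact_line` and `convert` are hypothetical, the target-source order of each pair is read off the example above (English position first, Chinese position second), and skipping links to the NULL token is an assumption.

```python
import re
from typing import Iterable, Iterator

# Header line of a block, e.g.
# "# Sentence pair (1) source length 23 target length 27 alignment score : 0.449695"
HEADER_RE = re.compile(r"^# Sentence pair \((\d+)\).*alignment score : (\S+)")
# A source token followed by the target positions aligned to it, e.g. "门厅 ({ 2 3 4 6 13 })"
TOKEN_RE = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def compact_line(header: str, alignment: str) -> str:
    """Turn a block's header and alignment lines into
    "<pair number> <score> t-s t-s ...", using 1-based positions."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"not a GIZA++ sentence-pair header: {header!r}")
    links = []
    # Token 0 is NULL; real source words are numbered from 1.
    for src_pos, (_word, targets) in enumerate(TOKEN_RE.findall(alignment)):
        if src_pos == 0:
            continue  # assumption: links to NULL are not written out
        links.extend(f"{t}-{src_pos}" for t in targets.split())
    return " ".join([m.group(1), m.group(2), *links])


def convert(lines: Iterable[str]) -> Iterator[str]:
    """Yield one compact line per three-line block (header, target sentence, alignment)."""
    it = iter(lines)
    # zip(it, it, it) groups consecutive lines into blocks of three;
    # an incomplete trailing block is silently ignored.
    for header, _target, alignment in zip(it, it, it):
        yield compact_line(header.rstrip("\n"), alignment.rstrip("\n"))
```

Run on the first eight source words of the example, `compact_line` reproduces the start of the compact line shown above, from `1 0.449695 1-1` through `10-8`.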
The script also generates an index file that points to the start of each sentence's alignments. For an n-best GIZA++ alignment file, the index points only to the first alignment for each sentence; the remaining n-best alignments follow on the immediately subsequent lines.
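A lookup through such an index might work roughly as follows. This Python sketch assumes the index stores byte offsets into the compact file and that every compact line, including each n-best alternative, starts with its sentence-pair number; the real index format is not described here, and `build_index` and `nbest_alignments` are hypothetical names.

```python
from typing import BinaryIO


def build_index(f: BinaryIO) -> dict[int, int]:
    """Map each sentence-pair number to the byte offset of its first line.
    Assumes n-best alternatives for a sentence sit on consecutive lines."""
    index: dict[int, int] = {}
    offset = 0
    for line in f:
        fields = line.split(maxsplit=1)
        if fields:
            # setdefault keeps the first offset seen, i.e. the top alignment.
            index.setdefault(int(fields[0]), offset)
        offset += len(line)
    return index


def nbest_alignments(f: BinaryIO, index: dict[int, int], sent_no: int) -> list[str]:
    """Seek to the first alignment of sent_no and read forward while the
    sentence-pair number stays the same."""
    f.seek(index[sent_no])
    result = []
    for line in f:
        fields = line.split(maxsplit=1)
        if not fields or int(fields[0]) != sent_no:
            break
        result.append(line.decode("utf-8").rstrip("\r\n"))
    return result
```

In this sketch, storing byte offsets rather than line numbers lets a reader reach any sentence with a single seek instead of scanning the file from the top.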