Alignment Scores

There's a big gap between this and my last post. I'll try and fill it in later. In the meantime, today I'm going to look at the alignment scores from GIZA++ and how they're affecting my translations.

Firstly, the scores seem very low (i.e. they're all on the order of 10 raised to a big negative power).
Q: What do these scores represent?
A: Not sure yet, but the only sentences that get high (>0.001) scores are short ones.

They're not probabilities in the sense that I thought they were. These numbers come from the Viterbi alignment process and are essentially the product of the probabilities of each of the HMM nodes on the best path. That would explain why only short sentences score highly: every extra word multiplies in another factor below 1. I think the reason they still work is that they're fine for comparing the alignment hypotheses for a single sentence pair, but they don't make sense when compared across different sentences. Possibilities: multiply the score by the number of hypotheses, or take the nth root of the score, where n is the number of words in the sentence.
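
To get a feel for the nth-root idea, here's a rough sketch in Python. It's only my own illustration, not anything GIZA++ does: the function name `length_normalised` and the example numbers are made up, and it assumes the raw score really is a product of one probability per word. Under that assumption the nth root is just the geometric mean of the per-word probabilities, which is why it ought to be comparable across sentences of different lengths.

```python
import math


def length_normalised(score: float, num_words: int) -> float:
    """Return the num_words-th root of a raw alignment score.

    Hypothetical helper: assumes `score` is a product of roughly one
    probability per word, so the root is a per-word geometric mean.
    """
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    if score <= 0.0:
        # log(0) is undefined; a zero score stays zero after any root.
        return 0.0
    # Work in log space so very small scores don't lose precision.
    return math.exp(math.log(score) / num_words)


# Made-up numbers: a short sentence with a "high" raw score and a long
# sentence with a tiny one end up on a similar scale once normalised.
short_sentence = length_normalised(2e-3, 3)
long_sentence = length_normalised(1e-30, 30)
print(short_sentence, long_sentence)
```

Going through logs rather than calling `score ** (1 / n)` directly is a matter of taste here, but it carries over if the scores are ever stored as log-probabilities, in which case the normalisation is just a division by n.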