While implementing the phrase translation output modules, I have run into an important issue concerning the structure of the output.
I've been implementing the parts of the code required to produce tagged output at the phrase level, rather than just the word level. Previously, if we matched a phrase, each matched word had its translation specified individually, rather than the whole phrase receiving a single translation. This required fairly significant rewrites, as the old code worked very specifically at the word level and had no support for larger structures.
I have finished writing the parts required to produce the tagged output, but new issues are now arising. I'm currently trying to finish the remaining code needed to handle multiple alignments. I had originally expected this to be trivial, as I thought it would simply be a case of giving multiple possibilities for each source phrase: no additional tags, just multiple options within each tag. In fact, different alignments can produce contiguous phrases at different points in the output, so the number of tags required can vary from one alignment to another.
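A minimal sketch of why the tag count varies, assuming an alignment for a source phrase can be represented as the list of target word positions it covers (the positions and the helper contiguous_runs are hypothetical, chosen for illustration). Each maximal run of consecutive positions becomes one target phrase, and therefore one tag:

```python
from typing import List


def contiguous_runs(positions: List[int]) -> List[List[int]]:
    """Group target word positions into maximal runs of consecutive indices."""
    runs: List[List[int]] = []
    for pos in sorted(set(positions)):
        if runs and pos == runs[-1][-1] + 1:
            runs[-1].append(pos)  # extends the current contiguous phrase
        else:
            runs.append([pos])    # a gap starts a new target phrase
    return runs


# Hypothetical target sentence "je ne mange pas" (positions 0..3) and two
# different alignments of the same source phrase.
alignment_a = [1, 3]     # "ne" ... "pas": two separate target phrases
alignment_b = [1, 2, 3]  # "ne mange pas": one contiguous target phrase

print(len(contiguous_runs(alignment_a)))  # 2
print(len(contiguous_runs(alignment_b)))  # 1
```

Under the first alignment the source phrase needs two tags; under the second it needs only one, so the alternatives cannot simply be listed as options inside a fixed set of tags.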
This brings us back to an issue we discussed briefly in the past, but for which we never found a solution I'm completely comfortable with. If the phrase we are translating produces two separate phrases in the output, how do we represent it? The solution we came up with before was to provide one of the target phrases as the translation of the source phrase, and the other target phrase as the translation of the empty string. For example:
I <ebmt english="ne">don't</ebmt><ebmt english="pas"></ebmt> eat
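The markup above can be generated mechanically. The sketch below is a hypothetical helper (ebmt_markup is not part of the existing code) that assumes the target phrases for one source phrase arrive as a list: the first is attached to the source text and every remaining one to an empty string. It uses the standard library's xml.sax.saxutils to escape text and quote attribute values:

```python
from typing import List
from xml.sax.saxutils import escape, quoteattr


def ebmt_markup(source_text: str, target_phrases: List[str]) -> str:
    """Hypothetical helper: first target phrase on the source text,
    each remaining target phrase on an empty source string."""
    first, *rest = target_phrases
    tags = [f"<ebmt english={quoteattr(first)}>{escape(source_text)}</ebmt>"]
    tags += [f"<ebmt english={quoteattr(phrase)}></ebmt>" for phrase in rest]
    return "".join(tags)


print("I " + ebmt_markup("don't", ["ne", "pas"]) + " eat")
# I <ebmt english="ne">don't</ebmt><ebmt english="pas"></ebmt> eat
```

Note that nothing in the generated string records that the two tags belong together; they are simply adjacent elements.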
In this simple example, this might seem an acceptable solution. If, however, we have several phrase translations, each with one or more empty-string translations hanging off it, Moses has no way of knowing that it should only use certain combinations of tags. There is no way of linking the tags together to tell Moses that they all come from the same origin and should be used either all together or not at all.
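To get a feel for the size of the problem, here is a rough count under a simplifying assumption: each tag is treated as an independent yes/no choice, which is not an exact model of the decoder's search. The two phrase translations ("don't" and "never", each split into "ne" plus a second word) are hypothetical. Only selections in which every group is used entirely or not at all are consistent:

```python
from itertools import product

# Hypothetical phrase translations; each produces one tag on the source
# phrase plus one tag on the empty string.
groups = {
    "don't": ["ne", "pas"],
    "never": ["ne", "jamais"],
}
tags = [(group, target) for group, targets in groups.items() for target in targets]


def is_consistent(selection) -> bool:
    """Valid only if every group's tags are all used or all unused."""
    for group in groups:
        used = [chosen for (g, _), chosen in zip(tags, selection) if g == group]
        if any(used) and not all(used):
            return False
    return True


selections = list(product([False, True], repeat=len(tags)))
valid = [s for s in selections if is_consistent(s)]
print(len(selections), len(valid))  # 16 4
```

Even with two groups of two tags, most independent selections (12 of 16) break a phrase translation apart, and the gap widens as more empty-string tags are added.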
Perhaps the only solution to this is to modify Moses.