Extracting parallel sub-sentential fragments from non-parallel corpora

Publication Type: Conference Paper
Year of Publication: 2006
Authors: Munteanu, D.S.; Marcu, D.
Conference Name: ACL '06: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL
Pagination: 81-88
Publisher: Association for Computational Linguistics
Conference Location: Morristown, NJ, USA
Abstract: We present a novel method for extracting parallel sub-sentential fragments from comparable, non-parallel bilingual corpora. By analyzing potentially similar sentence pairs using a signal processing-inspired approach, we detect which segments of the source sentence are translated into segments in the target sentence, and which are not. This method enables us to extract useful machine translation training data even from very non-parallel corpora, which contain no parallel sentence pairs. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system.
DOI: 10.3115/1220175.1220186