| Publication Type | Conference Paper |
| Year of Publication | 2006 |
| Authors | Munteanu, D.S.; Marcu, D. |
| Conference Name | ACL '06: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL |
| Pagination | 81--88 |
| Publisher | Association for Computational Linguistics |
| Conference Location | Morristown, NJ, USA |
| Abstract | We present a novel method for extracting parallel sub-sentential fragments from comparable, non-parallel bilingual corpora. By analyzing potentially similar sentence pairs using a signal processing-inspired approach, we detect which segments of the source sentence are translated into segments in the target sentence, and which are not. This method enables us to extract useful machine translation training data even from very non-parallel corpora, which contain no parallel sentence pairs. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system. |
| DOI | 10.3115/1220175.1220186 |
Extracting parallel sub-sentential fragments from non-parallel corpora