Comparable Corpora and Computer-assisted Translation
by Estelle Maryline Delpech
Computer-assisted translation (CAT) has always used translation memories, which require the translator to have a corpus of previous translations that the CAT software can use to generate bilingual lexicons. This can be problematic when the translator does not have such a corpus, for instance, when the text belongs to an emerging field. To solve this issue, CAT research has looked into leveraging comparable corpora, i.e. sets of texts, in two or more languages, which deal with the same topic but are not translations of one another.
This work has two primary objectives. The first is to assess the contribution of lexicons extracted from comparable corpora in the context of a specialized human translation task. The second is to identify the bilingual-lexicon-extraction methods that best match translators' needs, determining the current limits of these techniques and suggesting improvements. The author focuses, in particular, on the identification of fertile translations, the management of multiple morphological structures, and the ranking of candidate translations.
The experiments are carried out on two language pairs (English-French and English-German) and on specialized texts dealing with breast cancer. This research puts significant emphasis on applicability: methodological choices are guided by the needs of the final users. This book is organized in two parts: the first part presents the applicative and scientific context of the research, and the second part is given over to efforts to improve compositional translation.
The research work presented in this book received the 2014 PhD Thesis award from the French association for natural language processing (ATALA).







