ExPECT. Extraction of parallel elements from comparable texts

Start - End 
2013 - 2016 (stopped)
Department of Translation, Interpreting and Communication



The ExPECT project aims to extract bilingual terminology from comparable corpora. Both translators and interpreters are assumed to possess knowledge of a very broad range of topics, in different domains and in different languages. Moreover, each of those topics has its very own specific terminology, which translators and interpreters are also assumed to master. Efficient methods for terminology acquisition are therefore increasingly important.

Even with the massive availability of text in numerous languages on the Internet, it is difficult to find documents suitable for automatic term extraction (ATE). Parallel texts are scarce, especially for very specific domains, and even more so for Dutch. This lack of parallel corpora, combined with the increasing availability of text on the web, has inspired researchers to explore the usability of comparable texts for ATE (Daille & Morin, 2005; Déjean et al., 2005; Delpech et al., 2012; Fung & Yee, 1998).

In the ExPECT project, we will study the extent to which the output of ATE from comparable texts, both raw and manually corrected, can be useful for translators and interpreters. Previously developed methods will be used to build a system for corpus compilation and term extraction for Dutch, in combination with English, French and German.

Once the corpus has been compiled for the four focus languages, monolingual terms will be extracted using the TExSIS tool. Subsequently, these terms will be linked to candidate translations by comparing their contexts, following the distributional hypothesis that words occurring in similar contexts tend to have similar meanings.
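The context comparison can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a common approach in the comparable-corpora literature, in which the context words of a source term are mapped into the target language through a seed dictionary and then compared with the context vectors of candidate terms using cosine similarity. All function names, the toy sentences and the seed dictionary are hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vector(term, sentences, window=3):
    """Count the words that occur within `window` tokens of `term`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == term:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
                )
    return counts


def translate_vector(vector, seed_dictionary):
    """Map source context words to the target language; unknown words are dropped."""
    translated = Counter()
    for word, count in vector.items():
        if word in seed_dictionary:
            translated[seed_dictionary[word]] += count
    return translated


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(source_term, source_sentences, target_terms,
                    target_sentences, seed_dictionary):
    """Rank target terms by the similarity of their contexts to the source term's."""
    source_vec = translate_vector(
        context_vector(source_term, source_sentences), seed_dictionary
    )
    scores = {
        t: cosine(source_vec, context_vector(t, target_sentences))
        for t in target_terms
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy comparable (not parallel) data, invented for illustration only.
nl_sentences = ["de patiënt kreeg een hartinfarct na de operatie"]
en_sentences = [
    "the patient had an infarction after the surgery",
    "the nurse cleaned the wound with water",
]
seed = {"patiënt": "patient", "operatie": "surgery"}  # hypothetical seed dictionary

ranked = rank_candidates("hartinfarct", nl_sentences,
                         ["infarction", "wound"], en_sentences, seed)
print(ranked)
```

In this toy setting "infarction" ranks above "wound" because it shares the translated context words "patient" and "surgery". A realistic system would use association measures instead of raw counts and would need to handle multiword terms.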

In addition to the traditional evaluation using precision and recall, the output of the tool will be assessed in terms of the gain in time and quality for translation and interpreting assignments carried out with and without terminological support.
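As a reminder of what the traditional evaluation measures, the sketch below computes precision and recall for a set of extracted term pairs against a gold standard. The term pairs and the function name are hypothetical and serve only to illustrate the two metrics.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted term pairs against a gold standard."""
    extracted, gold = set(extracted), set(gold)
    correct = extracted & gold
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical Dutch-English term pairs, invented for illustration only.
gold_pairs = {
    ("hartinfarct", "infarction"),
    ("hartspier", "heart muscle"),
    ("bloeddruk", "blood pressure"),
    ("slagader", "artery"),
}
extracted_pairs = {
    ("hartinfarct", "infarction"),
    ("bloeddruk", "blood pressure"),
    ("slagader", "vein"),  # incorrect candidate translation
}

precision, recall = precision_recall(extracted_pairs, gold_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of extracted pairs that are correct, and recall is the share of gold-standard pairs that were found. Neither metric says anything about time saved or translation quality, which is why the task-based assessment is added.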