The general objective of the project is to enlarge the interpreting corpus that is being compiled at Ghent University and to prepare the data so that they can be made available and searchable through a web interface. The project will serve as a pilot to establish procedures that can be applied on a far larger scale in the framework of more substantial research grants, in particular through the Hercules Programme.
The project has five specific objectives: (1) the conversion of existing audio files and corpus data to the EXMARaLDA format, developed at the University of Hamburg; (2) the collection of new data, both audio files to be downloaded from the European Parliament's website and transcriptions to be made by trained transcribers; (3) the alignment of source and target texts, of the speech signals of source and target text and, finally, of source and target texts with their respective speech signals; (4) the annotation of the corpus for time and part of speech, and its partial syntactic parsing; (5) the storage of the corpus data and the corpus management programme on a dedicated platform for the dissemination of corpus data.