DPC: Dutch Parallel Corpus

Start - End
2006 - 2009 (completed)
Department(s)
Department of Translation, Interpreting and Communication (Vakgroep Vertalen, Tolken en Communicatie)

Abstract

The Dutch Parallel Corpus (DPC) is a high-quality, sentence-aligned parallel corpus of 10 million words for the language pairs Dutch-English and Dutch-French, with Dutch as the central language. The corpus covers five text types and is balanced with respect to both text type and translation direction. All texts included in the corpus have been cleared for copyright. The entire corpus is aligned at the sentence level and enriched with linguistic annotation (lemmas and part-of-speech tags). A small subset of the Dutch-English part has also been manually aligned at the sub-sentential level.

The corpus is released as full texts in XML format and is distributed by the Dutch Human Language Technology Agency (TST-centrale).
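To illustrate what "sentence-aligned and enriched with lemmas and PoS tags" means in practice, here is a minimal Python sketch that reads aligned sentence pairs from an XML string. It assumes a hypothetical, simplified layout (elements `s`, `w`, `link` and attributes `id`, `lemma`, `pos`, `nl`, `en`) invented for this example; it does not reproduce the actual DPC XML schema, and the helper names `load_sentences` and `aligned_pairs` are likewise hypothetical. The sample shows a 1:2 link, since one source sentence may correspond to several target sentences.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified layout for illustration only; the real DPC XML schema differs.
SAMPLE = """<corpus>
  <text lang="nl">
    <s id="nl.1">
      <w lemma="de" pos="LID">De</w><w lemma="kat" pos="N">kat</w>
      <w lemma="slapen" pos="WW">slaapt</w><w lemma="en" pos="VG">en</w>
      <w lemma="de" pos="LID">de</w><w lemma="hond" pos="N">hond</w>
      <w lemma="blaffen" pos="WW">blaft</w>
    </s>
  </text>
  <text lang="en">
    <s id="en.1">
      <w lemma="the" pos="DT">The</w><w lemma="cat" pos="NN">cat</w>
      <w lemma="sleep" pos="VBZ">sleeps</w>
    </s>
    <s id="en.2">
      <w lemma="the" pos="DT">The</w><w lemma="dog" pos="NN">dog</w>
      <w lemma="bark" pos="VBZ">barks</w>
    </s>
  </text>
  <!-- One Dutch sentence aligned to two English sentences (a 1:2 link). -->
  <link nl="nl.1" en="en.1 en.2"/>
</corpus>"""


def load_sentences(root):
    """Map each sentence id to its list of (word form, lemma, PoS tag) tokens."""
    sentences = {}
    for s in root.iter("s"):
        sentences[s.get("id")] = [
            (w.text, w.get("lemma"), w.get("pos")) for w in s.iter("w")
        ]
    return sentences


def aligned_pairs(xml_text, src="nl", tgt="en"):
    """Return (source text, target text) for each link; a link may join several sentences per side."""
    root = ET.fromstring(xml_text)
    sentences = load_sentences(root)
    pairs = []
    for link in root.iter("link"):
        src_ids = link.get(src).split()
        tgt_ids = link.get(tgt).split()
        # Concatenate the word forms of all sentences on each side of the link.
        src_text = " ".join(form for sid in src_ids for form, _, _ in sentences[sid])
        tgt_text = " ".join(form for sid in tgt_ids for form, _, _ in sentences[sid])
        pairs.append((src_text, tgt_text))
    return pairs


if __name__ == "__main__":
    for nl, en in aligned_pairs(SAMPLE):
        print(nl, "|||", en)
```

Keeping the annotated tokens keyed by sentence id, separate from the alignment links, lets the same sentences take part in n:m alignments without duplicating their lemma and PoS information.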

Researchers

Supervisor(s)

Researcher(s)

External collaborators

Maribel Montero Perez

Hans Paulussen

Piet Desmet

Publications