SoNaR. Stevin Nederlandstalig referentiecorpus (STEVIN Dutch-language reference corpus)

Start - End

2008 - 2011 (completed)

Type

Research project

URL

http://tst-centrale.org/nl/producten/corpora/sonar-corpus/6-85?cf_product_name=S…

Department(s)

Department of Translation, Interpreting and Communication

Research group(s)

LT3 - Language and Translation Technology Team

Research area

Language technology

Abstract

The STEVIN project SoNaR aims to build a balanced reference corpus of 500 million words for contemporary (1954-present) written Dutch. Besides comprising no fewer than 38 text types, the corpus will also be balanced according to the number of speakers in the Dutch-speaking regions, with one-third of the texts coming from Flanders and two-thirds from the Netherlands. Texts will be gathered not only from the more conventional text types, such as newspapers and reports, but also from new media such as chat, SMS, internet forums and email. A very important aspect of the SoNaR project is that Intellectual Property Rights (IPR) are settled for all text material included, so as to guarantee widespread availability.

Researchers

Supervisor(s)

Veronique Hoste

Department of Translation, Interpreting and Communication

Researcher(s)

Orphée De Clercq

Department of Translation, Interpreting and Communication