The TExSIS project aims at the automatic extraction of mono‐ and multilingual company-specific terminology on the basis of a company’s document streams. These term lists are crucial in every ...
During the last 20 years, writing research has focused explicitly on the analysis of writing processes. More recently, logging programs (like Inputlog) have enabled researchers to record process data (e.g. keystrokes ...
The SubTLe project investigates two different methodologies for sentiment mining of online text types (blogs, chats, tweets, etc.). We contrast the traditional lexicon-based approach with a corpus-based approach based on ...
The goal of the Stylene project is to implement a robust, modular system for stylometry and readability research using existing techniques for automatic text analysis and machine learning. In order ...
In a society that constantly communicates through writing and conversation, clarity of documents is of crucial importance. Technical, governmental, medical and other documents must be understandable. But how clear are ...
The STEVIN project SoNaR aims to build a 500-million-word balanced reference corpus for contemporary (1954-present) written Dutch. Besides comprising no less than 38 text types, the corpus will also ...
Ambiguity remains one of the major problems for current machine translation systems. The example sentence "Apple has doubled its profits in 2005" will get translated by Babelfish (Babelfish.altavista.com) as "De ...
The Dutch Parallel Corpus (DPC) is a 10-million-word, high-quality, sentence-aligned parallel corpus for the language pairs Dutch-English and Dutch-French, with Dutch as the central language. The corpus contains five different ...