The ExPECT project aims to extract bilingual terminology from comparable corpora. Both translators and interpreters are assumed to possess knowledge of a very broad range of topics, in different domains ...
Historiography begins with the accounts by the ancient Greek historians Herodotos and Thucydides of the Persian and Peloponnesian Wars, and subsequent historians tend to share their preoccupation with great actors of ...
The main objective of this project is a comparative study of two ways of translating general (non-technical) texts, namely the traditional way of translating and translation on the basis of machine translations, for ...
User-generated content, such as that available through social networking sites on the Web, offers a wealth of information. The aim of PARIS is to study adequate natural language, image and video ...
The TExSIS project aims at the automatic extraction of mono‐ and multilingual company-specific terminology on the basis of a company’s document streams. These term lists are crucial in every ...
During the last 20 years, writing research has focused explicitly on the analysis of writing processes. More recently, logging programs (like Inputlog) have enabled researchers to record process data (e.g. keystrokes ...
The SubTLe project investigates two different methodologies for sentiment mining of online text types (blogs, chats, tweets, etc.). We contrast the traditional lexicon-based approach with a corpus-based approach based on ...
The goal of the Stylene project is to implement a robust, modular system for stylometry and readability research using existing techniques for automatic text analysis and machine learning. In order ...
In a society that constantly communicates through writing and conversation, clarity of documents is of crucial importance. Technical, governmental, medical and other documents must be understandable. But how clear are ...
The STEVIN project SoNaR aims to build a 500-million-word balanced reference corpus for contemporary (1954-present) written Dutch. Besides comprising no fewer than 38 text types, the corpus will also ...