Computers can reliably process text documents in terms of their factual content (e.g. to classify them by topic, search them or extract information from them), but they have difficulty interpreting opinions and emotions. Yet people, companies and governments are interested in knowing what other people think. Sentiment analysis is a subdiscipline of natural language processing concerned with the automatic analysis of such subjective text. It typically relies on sentiment lexicons, e.g. a list of adjectives with their polarity: positive ('excellent'), negative ('ugly') or neutral ('yellow'). Such lexicons are expensive to build manually, and they do not do justice to the complexity of how sentiment can be expressed: the polarity of a word can change depending on the context or the domain, and factual statements can also carry sentiment (e.g. 'prices are expected to rise').
In this project, we investigate a method to derive sentiment lexicons automatically from large amounts of web data. Such lexicons would have the advantage that they can be built cheaply, tailored to a specific domain (e.g. restaurant reviews), and kept up to date. Furthermore, instead of storing only the polarity of single words, we want to allow complex structures to be stored as well (e.g. 'price' + 'rise'), so that context can also be taken into account.
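To make the idea of storing complex structures concrete, the following is a minimal sketch of one possible lexicon layout, not the project's actual design. It assumes that entries are keyed by sets of lemmatized words, that polarity is encoded as +1, 0 or -1, and that a matching word pair overrides the polarities of its individual words. The names `LEXICON` and `polarity`, and the polarity assigned to 'rise' on its own, are invented for illustration.

```python
from itertools import combinations

# Hypothetical toy lexicon: each key is a set of lemmas, each value a polarity
# (+1 positive, 0 neutral, -1 negative). All values are illustrative assumptions.
LEXICON = {
    frozenset({"excellent"}): 1,
    frozenset({"ugly"}): -1,
    frozenset({"yellow"}): 0,
    frozenset({"rise"}): 1,             # assumed polarity of the word in isolation
    frozenset({"price", "rise"}): -1,   # the combination carries a different polarity
}


def polarity(tokens, lexicon=LEXICON):
    """Sum the polarities found in a list of lemmas.

    Word pairs are looked up first; the words of a matching pair are then
    excluded from the single-word lookup. Repeated tokens count once.
    """
    words = set(tokens)
    covered = set()
    total = 0

    # First pass: two-word entries (this sketch does not handle larger structures).
    for pair in combinations(sorted(words), 2):
        key = frozenset(pair)
        if key in lexicon:
            total += lexicon[key]
            covered |= key

    # Second pass: single words that no pair entry has already accounted for.
    for word in words - covered:
        total += lexicon.get(frozenset({word}), 0)

    return total
```

With these toy entries, `polarity(["rise"])` and `polarity(["price", "rise"])` give opposite signs, which is the kind of context dependence a single-word lexicon cannot represent.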