A respeaking and collaborative game-based approach to building a parsed corpus of European Spanish dialects

Start - End 
2018 - 2022 (ongoing)
Department(s) 
Department of Translation, Interpreting and Communication
Department of Linguistics

Abstract

The study of dialectal microvariation in the Spanish spoken in Spain has until recently focused mainly on lexical and phonetic features. The morphosyntax of these dialects, by contrast, remains largely unexplored, despite the recent surge of interest in dialect grammars. This is due to the lack of large annotated dialectal corpora. This project aims to fill this lacuna by creating the first morphosyntactically annotated and parsed corpus of the European Spanish dialects. The dialect corpus will be designed in a geographically balanced way and will draw its material from the COSER corpus (Corpus Oral y Sonoro del Español Rural, 'Audible Corpus of Spoken Rural Spanish'), which is the largest collection of oral data in the Spanish-speaking world but which remains largely untranscribed. As transcribing and annotating are expensive and labor-intensive, this project takes a respeaking and collaborative game-based approach to building the parsed corpus of European Spanish dialects. In other words, we intend to obtain automatic transcriptions using a speech recognizer. These transcriptions will then be processed with Natural Language Processing tools and used to create a crowdsourced game in which members of the public contribute to the co-creation of the parsed corpus by providing annotations as they play.

People

Supervisor(s)

Co-supervisor(s)