IVESS: Intelligent Vocabulary and Example Selection for Spanish vocabulary learning

Start - End 
2019 - 2024 (ongoing)
Department of Translation, Interpreting and Communication



Vocabulary learning has become a key topic in research on foreign language learning (see, among others, Schmitt, 2010) and has received increasing attention in Intelligent Computer-Assisted Language Learning (ICALL) research as well (see, among others, Pilán, 2018). Using methodologies driven by Natural Language Processing (NLP), such as part-of-speech tagging, lemmatisation, dependency parsing and word sense disambiguation, ICALL studies seek to facilitate and/or automate the creation of language learning materials for use in a CALL environment. End users of an ICALL system can be teachers preparing a class, students in a teacher-guided class, or autonomous learners who want to take their data-driven learning (Johns, 1991) activities to a more advanced level. For ICALL applied to vocabulary learning specifically (ICAVL), the open research issues can be grouped into four challenges at two levels: vocabulary retrieval and vocabulary selection (at the level of the vocabulary item), and example selection and example simplification (at the level of the sentence or paragraph). None of these issues can be considered definitively solved (see, among others, Pilán, 2018; Saggion, 2017).

As ICAVL is situated at the interface of linguistic description, foreign language learning and computational linguistics, the ideal scenario would be for the field of NLP to provide methodologies (see the examples above), frameworks (e.g. Universal Dependencies [UD]) and annotated data (e.g. UD treebanks) that facilitate the creation of didactically usable vocabulary learning materials reflecting state-of-the-art knowledge in linguistic description. However, the current situation falls short of this ideal synergy between the three domains. The aim of this PhD project is to make a substantial contribution to turning the interplay between linguistic description, foreign language learning and computational linguistics into a virtuous circle (for a case in point, see Degraeuwe and Goethals [2020], which elaborates a reannotation proposal for Spanish reflexive pronouns, including se).

In concrete terms, fundamental research will be conducted on NLP-based methodologies for vocabulary retrieval, vocabulary selection, example selection and example simplification, applied to the particular case of Spanish as a foreign language. The choice of Spanish is motivated not only by its status as one of the languages most frequently chosen by foreign language learners (Instituto Cervantes, 2018), but also by the presence of some morphosyntactic features that raise specific challenges for NLP, such as the absence of lexical compounding (which results in many multiword units instead). Moreover, to strengthen the link between NLP and language learning, the integration of the output of this research (i.e. automatically generated learning materials) into a real learning environment will be studied as well. More specifically, the attitudes of teachers and students towards engaging in ICAVL as end users will be assessed (Afshari et al., 2013; Pérez-Paredes et al., 2018).
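To make the first three tasks more tangible, the following is a minimal toy sketch and not the project's actual methodology. It assumes a tiny hand-lemmatised corpus (in practice the lemmas would come from an NLP lemmatiser), and all function names, the frequency threshold and the ranking criterion are hypothetical choices made for illustration only. Example simplification is left out.

```python
from collections import Counter

# Toy corpus: each sentence is a list of (token, lemma) pairs.
# Lemmas are written by hand here (assumption); a real system would use a lemmatiser.
CORPUS = [
    [("Los", "el"), ("niños", "niño"), ("comen", "comer"), ("manzanas", "manzana")],
    [("El", "el"), ("niño", "niño"), ("come", "comer")],
    [("Las", "el"), ("manzanas", "manzana"), ("maduran", "madurar"),
     ("en", "en"), ("otoño", "otoño")],
]


def retrieve_vocabulary(corpus):
    """Vocabulary retrieval: count how often each lemma occurs in the corpus."""
    return Counter(lemma for sentence in corpus for _, lemma in sentence)


def select_vocabulary(counts, known_lemmas, min_freq=2):
    """Vocabulary selection: keep frequent lemmas the learner does not know yet."""
    return sorted(
        lemma for lemma, count in counts.items()
        if count >= min_freq and lemma not in known_lemmas
    )


def select_example(corpus, target, known_lemmas):
    """Example selection: prefer the sentence with the fewest unknown lemmas
    (other than the target), breaking ties by sentence length."""
    candidates = [s for s in corpus if any(lemma == target for _, lemma in s)]
    if not candidates:
        return None

    def cost(sentence):
        unknown = sum(
            1 for _, lemma in sentence
            if lemma != target and lemma not in known_lemmas
        )
        return (unknown, len(sentence))

    best = min(candidates, key=cost)
    return " ".join(token for token, _ in best)


known = {"el", "en", "niño"}  # hypothetical learner profile
counts = retrieve_vocabulary(CORPUS)
targets = select_vocabulary(counts, known)
examples = {t: select_example(CORPUS, t, known) for t in targets}
```

With this toy learner profile, the sketch proposes the lemmas comer and manzana as learning targets and pairs each with the candidate sentence that contains the fewest other unknown words. A realistic system would replace each of these heuristics with a considerably richer model.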


Afshari, M., Ghavifekr, S., Siraj, S., & Jing, D. (2013). Students’ attitudes towards computer-assisted language learning. Procedia Social and Behavioral Sciences, 103, 852-859.

Degraeuwe, J., & Goethals, P. (2020). Reflexive pronouns in Spanish Universal Dependencies. Procesamiento del Lenguaje Natural, 64(1), 77-84.

Instituto Cervantes. (2018). El español: Una lengua viva; Informe 2018. Centro Virtual Cervantes. https://cvc.cervantes.es/lengua/anuario/anuario_18/informes_ic/p02.htm

Johns, T. (1991). Should you be persuaded: two examples of data-driven learning. In T. Johns & P. King (Eds.), Classroom Concordancing (pp. 1-13). ELR.

Pérez-Paredes, P., Ordoñana Guillamón, C., & Aguado Jiménez, P. (2018). Language teachers' perceptions on the use of OER language processing technologies in MALL. Computer Assisted Language Learning, 31(5), 522-545.

Pilán, I. (2018). Automatic proficiency level prediction for Intelligent Computer-Assisted Language Learning (PhD thesis), University of Gothenburg.

Saggion, H. (2017). Automatic text simplification. In G. Hirst (Ed.), Synthesis Lectures on Human Language Technologies. Morgan & Claypool.

Schmitt, N. (2010). Key issues in teaching and learning vocabulary. In R. Chacón-Beltrán et al. (Eds.), Insights into non-native vocabulary teaching and learning (pp. 28-40). Bristol, Tonawanda & North York: Multilingual Matters.



