Fine-tuning GPT models for lexicography

Start - End 
2024 - 2027 (ongoing)
Department(s) 
Department of Languages and Cultures
Department of Translation, Interpreting and Communication
Research Focus 
Sustainable development goal(s) 

Abstract

Soon after the release of ChatGPT, the state of the art of generative AI in lexicography was surveyed (cf. de Schryver 2023). If one is to believe that survey, as well as many subsequent studies (esp. Lew and colleagues 2024), generative AI has now made lexicographers, as well as dictionaries themselves, redundant. However, these studies conveniently assume that because generative AI works for English, it will work for any other language. It is time to reveal the truth. Pairing any other language with English only produces look-alikes: the lexicographic material appears to be sound, until one scratches the surface and realises that what was generated is ‘translated English’. When it comes to dictionaries for languages of limited diffusion, the use of existing models mostly produces gibberish. In this research project, various comparisons will be made between out-of-the-box, customised and fine-tuned GPT models for lexicography, with a focus on monolingual dictionaries for undocumented Bantu languages.

People

Researcher(s)

Publications
SDG
To educate all of humanity, quality language technology takes over from humans.