From orthography to semantics : large-scale unsupervised textual similarity detection in historical Greek