efnilex-vect

The EFNILEX project has explored the use of machine translation tools for lexicon generation in "medium" European languages (less-resourced official languages of the EU). In 2012, dictionaries were generated from parallel corpora. Since 2014, we have followed the method of Mikolov et al. (2013), in which dictionaries are generated from monolingual corpora plus a seed dictionary. Since 2015, we have targeted disambiguation in two ways: scoring translations inferred in a pivot-based fashion from Wiktionary with wikt2dict, and translating from multi-sense embeddings.
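As a rough illustration of the seed-dictionary approach mentioned above, the sketch below fits a linear map from source-language vectors to target-language vectors and translates by nearest neighbour under cosine similarity. It is a minimal sketch, not the project's code: the function names `fit_translation_matrix` and `translate` are hypothetical, and the map is fitted with closed-form least squares (`numpy.linalg.lstsq`), which stands in for whatever optimiser the actual experiments used.

```python
import numpy as np


def fit_translation_matrix(src_vecs, tgt_vecs):
    """Fit W minimising ||src_vecs @ W - tgt_vecs||^2.

    src_vecs, tgt_vecs: arrays of shape (n_seed_pairs, dim_src) and
    (n_seed_pairs, dim_tgt); row i holds the vectors of the i-th seed pair.
    Closed-form least squares is an assumption made for this sketch.
    """
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W


def translate(word_vec, W, tgt_matrix, tgt_words, k=1):
    """Return the k target words closest (by cosine) to the mapped vector."""
    mapped = word_vec @ W
    mapped = mapped / np.linalg.norm(mapped)
    tgt_unit = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    sims = tgt_unit @ mapped
    best = np.argsort(-sims)[:k]
    return [(tgt_words[i], float(sims[i])) for i in best]
```

In use, the rows of `src_vecs` and `tgt_vecs` would come from looking up each seed dictionary pair in the two monolingual embeddings; words outside the seed dictionary are then translated with `translate`.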

Any kind of comment is welcome.

Download

Dictionaries

Dictionary
- English to Hungarian
- Hungarian to English
Scored triangulated translations from Wiktionary
- German to Hungarian (LREC 2016)
- Hungarian to English
Disambiguated dictionary (experimental)
- Hungarian to English
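
To make the idea of triangulated (pivot-based) translations concrete, here is a small sketch: a source word and a target word become a candidate pair whenever both are linked to the same word in a third language. The function name `triangulate`, the input layout, and the score (the number of distinct pivot words supporting a pair) are all assumptions made for illustration; they are not the scoring used by wikt2dict or in the LREC 2016 work.

```python
from collections import defaultdict


def triangulate(pairs, src_lang, tgt_lang):
    """Infer src->tgt candidates through pivot words and count their support.

    pairs: iterable of ((lang, word), (lang, word)) translation pairs,
    treated as symmetric. Returns {(src_word, tgt_word): number_of_pivots}.
    Counting pivots is an illustrative score, not the project's method.
    """
    neighbours = defaultdict(set)
    for left, right in pairs:
        neighbours[left].add(right)
        neighbours[right].add(left)

    support = defaultdict(set)
    for pivot, linked in neighbours.items():
        if pivot[0] in (src_lang, tgt_lang):
            continue  # a pivot must belong to a third language
        sources = [w for w in linked if w[0] == src_lang]
        targets = [w for w in linked if w[0] == tgt_lang]
        for s in sources:
            for t in targets:
                support[(s[1], t[1])].add(pivot)
    return {pair: len(pivots) for pair, pivots in support.items()}
```

A polysemous pivot such as English "bank" links unrelated senses, so pairs supported by a single pivot are the ones most in need of further scoring.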

Hungarian analogical questions

questions-words-hu.txt, following Mikolov et al. (2013a, b).
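
The sketch below shows how such analogical questions are typically used to test an embedding. It assumes that questions-words-hu.txt uses the same layout as the English questions-words.txt of Mikolov et al. (lines starting with ":" name a section, every other line holds four words a b c d); this has not been checked against the file itself. The helper names `read_questions` and `answer` are hypothetical, and `vectors` is assumed to be a plain dict from word to NumPy array.

```python
import numpy as np


def read_questions(lines):
    """Yield (section, (a, b, c, d)) from questions-words style lines."""
    section = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            section = line[1:].strip()
            continue
        parts = line.split()
        if len(parts) == 4:  # silently skip malformed lines
            yield section, tuple(parts)


def _unit(v):
    return v / np.linalg.norm(v)


def answer(a, b, c, vectors):
    """Return the word closest to b - a + c, excluding a, b and c."""
    query = _unit(_unit(vectors[b]) - _unit(vectors[a]) + _unit(vectors[c]))
    best_word, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue
        sim = float(_unit(vec) @ query)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word
```

A question counts as correct when `answer(a, b, c, vectors)` equals `d`; accuracy per section is then the share of correct questions among those whose four words are all in the vocabulary.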

Word embedding, hunembed0.0

Hungarian Webcorpus

Hungarian National Corpus

Publications

dictionaries from parallel corpora (Héja and Takács 2012)
Hungarian analogical questions (Makrai 2015a pdf, bib)
scoring pivot-based translations (triangles) and experiments with multi-prototype VSMs
- LREC 2016 abstract, presentation
- CogInfoCom15 abstract, presentation
- Hungarian Science Festival 2015 presentation in Hungarian

Acknowledgement

Tiedemann (2012)

OpenSubtitles