project has explored the use of machine translation tools for lexicon
generation in "medium" European languages (less-resourced official
languages of the EU).
In 2012, dictionaries were generated from parallel corpora (Héja and Takács 2012).
Since 2014, we have followed the method of Mikolov et al. (2013),
in which dictionaries are generated from monolingual corpora plus a
seed dictionary. Since 2015, we have targeted disambiguation in two ways:
scoring translations inferred in pivot-based fashion from
Wiktionary with wikt2dict,
and translating from multi-sense embeddings.
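
As a rough illustration of the seed-dictionary method of Mikolov et al. (2013) mentioned above, the sketch below fits a linear map W from source-language vectors to target-language vectors on a few seed pairs, then translates an unseen word by taking the nearest target vector by cosine similarity. The two-dimensional vectors, the word lists, and the function names are made up for illustration; this is not the project's code, and real embeddings have hundreds of dimensions.

```python
import math

def learn_translation_matrix(src_vecs, tgt_vecs, epochs=500, lr=0.1):
    """Fit W minimising the mean of ||W x - z||^2 by plain gradient descent."""
    d_src, d_tgt = len(src_vecs[0]), len(tgt_vecs[0])
    n = len(src_vecs)
    W = [[0.0] * d_src for _ in range(d_tgt)]
    for _ in range(epochs):
        grad = [[0.0] * d_src for _ in range(d_tgt)]
        for x, z in zip(src_vecs, tgt_vecs):
            pred = [sum(W[i][j] * x[j] for j in range(d_src)) for i in range(d_tgt)]
            for i in range(d_tgt):
                err = pred[i] - z[i]
                for j in range(d_src):
                    grad[i][j] += 2 * err * x[j] / n
        for i in range(d_tgt):
            for j in range(d_src):
                W[i][j] -= lr * grad[i][j]
    return W

def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

def translate(W, x, tgt_embedding):
    """Map x into the target space and return the nearest target word."""
    mapped = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    return max(tgt_embedding, key=lambda word: cosine(mapped, tgt_embedding[word]))

# Toy embeddings (hypothetical): the English space is the Hungarian one rotated by 90 degrees.
hu = {"kutya": [1.0, 0.0], "macska": [0.0, 1.0], "ház": [1.0, 1.0], "fa": [2.0, 1.0]}
en = {"dog": [0.0, 1.0], "cat": [-1.0, 0.0], "house": [-1.0, 1.0],
      "tree": [-1.0, 2.0], "water": [1.0, 0.5]}

# Seed dictionary: "fa" is held out and translated with the learned map.
seed = [("kutya", "dog"), ("macska", "cat"), ("ház", "house")]
W = learn_translation_matrix([hu[s] for s, _ in seed], [en[t] for _, t in seed])
print(translate(W, hu["fa"], en))  # tree
```

With real data, the seed dictionary would contain thousands of pairs, and the nearest-neighbour search would return a ranked list of candidates rather than a single word.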
Comments of any kind are welcome.
- Triangulated translations from Wiktionary scored
- Disambiguated dictionary (experimental)
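
Pivot-based triangulation, as used for the first resource above, can be sketched as follows: a source word and a target word become a candidate pair whenever they share a translation in a pivot language. Scoring each candidate by the number of distinct pivot words that support it is an assumption made here for illustration; it is not necessarily the scoring used by wikt2dict or in the project, and the toy entries and the function name are hypothetical.

```python
from collections import defaultdict

def triangulate(src_to_pivot, pivot_to_tgt):
    """Infer source-target pairs through shared pivot words.

    The score of a pair is the number of distinct pivot words linking it.
    """
    scores = defaultdict(int)
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            for tgt in pivot_to_tgt.get(pivot, ()):
                scores[(src, tgt)] += 1
    return dict(scores)

# Toy Hungarian -> English -> German data (hypothetical entries).
hu_en = {"toll": {"pen", "feather"}}
en_de = {"pen": {"Stift", "Feder"}, "feather": {"Feder"}}

scores = triangulate(hu_en, en_de)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, score)
```

A pair supported by several pivots ("toll", "Feder") is less likely to be an artefact of polysemy in the pivot language than a pair supported by only one.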
Hungarian analogical questions
questions-words-hu.txt following Mikolov et al. (2013)
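
A minimal sketch of how such a file can be used, assuming it keeps the layout of the original English questions-words.txt (section headers starting with a colon, then four words per line). Each question "a b c d" is answered by the word whose vector is closest to b - a + c, with the three question words excluded. The three-dimensional toy vectors, the sample lines, and the function names are invented for illustration and are not taken from the actual file.

```python
import math

def read_questions(lines):
    """Yield (section, a, b, c, d) tuples from questions-words-style lines."""
    section = None
    for line in lines:
        line = line.strip()
        if line.startswith(":"):
            section = line[1:].strip()
        elif len(line.split()) == 4:
            yield (section, *line.split())

def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

def answer(embedding, a, b, c):
    """Return the word nearest to b - a + c, excluding the question words."""
    query = [vb - va + vc for va, vb, vc in zip(embedding[a], embedding[b], embedding[c])]
    candidates = [w for w in embedding if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(query, embedding[w]))

def accuracy(embedding, questions):
    """Share of correctly answered questions; out-of-vocabulary questions are skipped."""
    answered = correct = 0
    for _, a, b, c, d in questions:
        if any(w not in embedding for w in (a, b, c, d)):
            continue
        answered += 1
        correct += answer(embedding, a, b, c) == d
    return correct / answered if answered else 0.0

# Toy data (hypothetical): dimensions are roughly [royalty, male, female].
embedding = {
    "férfi": [0.0, 1.0, 0.0], "nő": [0.0, 0.0, 1.0],
    "király": [1.0, 1.0, 0.0], "királynő": [1.0, 0.0, 1.0],
    "alma": [0.1, 0.1, 0.1],
}
sample_lines = [": family", "férfi nő király királynő", "király királynő férfi nő"]

questions = list(read_questions(sample_lines))
print(accuracy(embedding, questions))  # 1.0
```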
A word2vec word embedding trained on the concatenation of the
Hungarian National Corpus
in 600 dimensions with a cut-off of 10 words.
- from parallel corpora
(Héja and Takács 2012)
- Hungarian analogical questions (Makrai 2015a)
- scoring pivot-based translations (triangles) and
experiments with multi-prototype VSMs
We made great use of dictionaries computed by Tiedemann
(2012) from the OpenSubtitles corpus.
Links on embeddings