Supervised Project
Generation of simplified texts |
Loria - Synalp
| Claire Gardent
Category
URL
Don't put volumes
Summary
NLG Task:
This article describes a multilingual neural machine translation system that uses a single shared encoder for the input and a language-specific decoder for each target language.
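The one-encoder, many-decoders layout can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the layer shapes (300-dim embeddings, 600-dim hidden states) come from the summary above, but the toy `Encoder`/`Decoder` classes, the single-matrix projections, and the small demo vocabulary are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, HID = 300, 600  # embedding and hidden sizes reported in the summary

class Encoder:
    """Single shared encoder: projects source embeddings to hidden states."""
    def __init__(self):
        self.W = rng.normal(scale=0.01, size=(HID, EMB))

    def __call__(self, x):             # x: (seq_len, EMB)
        return np.tanh(x @ self.W.T)   # -> (seq_len, HID)

class Decoder:
    """One decoder per target language: maps shared hidden states to that
    language's vocabulary logits (the paper uses a 50K vocabulary; a small
    toy vocabulary is used here to keep the sketch light)."""
    def __init__(self, vocab_size=1_000):
        self.W = rng.normal(scale=0.01, size=(vocab_size, HID))

    def __call__(self, h):             # h: (seq_len, HID)
        return h @ self.W.T            # -> (seq_len, vocab_size)

# One shared encoder, one decoder per language in the study.
encoder = Encoder()
decoders = {lang: Decoder() for lang in ("en", "de", "fr", "es")}

def translate_logits(src_embeddings, tgt_lang):
    """Route the shared encoding to the requested language's decoder."""
    return decoders[tgt_lang](encoder(src_embeddings))
```

The point of the routing function is that every translation direction reuses the same encoder parameters; only the decoder is swapped per target language.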
Training Data:
They use monolingual news corpora from WMT 2014 for English, German, and French, and from WMT 2013 for Spanish, amounting to more than 1.3 billion tokens across the four languages. They do not use any parallel data.
Model Description:
They use fastText to train a skip-gram model with vector dimension 300, which yields monolingual embeddings. They then use MUSE, a cross-lingual embedding mapping toolkit, to map every language's embedding space to English. The model uses an embedding dimension of 300, a hidden dimension of 600, and a vocabulary size of 50K.
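The cross-lingual mapping step can be illustrated with the orthogonal Procrustes solution that MUSE-style alignment relies on: given source-language vectors X and English vectors Y for matched anchor words, the best orthogonal map W minimizing ||XW - Y|| is U Vᵀ, where U S Vᵀ is the SVD of XᵀY. A hedged numpy sketch, with random toy vectors standing in for real fastText embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 300, 1_000  # embedding dim (as in the summary) and number of anchor pairs

# Toy stand-ins for monolingual embeddings: X = source-language vectors,
# Y = English vectors for the same anchor words, related by a hidden rotation.
X = rng.normal(size=(n, d))
R_true = np.linalg.qr(rng.normal(size=(d, d)))[0]  # hidden ground-truth rotation
Y = X @ R_true

# Orthogonal Procrustes: W* = argmin_{W orthogonal} ||X W - Y||_F = U V^T,
# where U S V^T = svd(X^T Y).
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# W recovers the hidden rotation, so mapped source vectors match English ones.
print(np.allclose(X @ W, Y, atol=1e-6))
```

In MUSE this Procrustes step is applied iteratively (with adversarial initialization when no seed dictionary is available) to refine the mapping; the sketch shows only the core closed-form alignment.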
Key Contribution:
The multilingual model outperforms separately trained bilingual models in all translation directions.
Result:
Update