Supervised Project
Generation of simplified texts | Loria - Synalp | Claire Gardent
Summary
NLG Task:
They base their approach on the mBART model, which is pre-trained for multilingual denoising. This allows them to use a simple, identical, end-to-end setup for both English and Russian.
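For concreteness, here is a minimal sketch of loading pre-trained mBART through the HuggingFace transformers library; the checkpoint name facebook/mbart-large-cc25 and the language codes are assumptions based on the standard public release, not details confirmed by the paper.

```python
# Minimal sketch (not the authors' code): load pre-trained mBART from the
# HuggingFace hub. "facebook/mbart-large-cc25" is the standard multilingual
# denoising checkpoint; the paper's exact checkpoint is an assumption here.
from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

# The same code path serves both languages; only the fine-tuning data and the
# language code differ ("en_XX" for English, "ru_RU" for Russian).
```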
Training Data:
WebNLG 2020 dataset. The data contains sets of RDF triples extracted from DBpedia, accompanied by verbalizations crowdsourced from human annotators. Git link: https://gitlab.com/shimorina/webnlg-dataset/-/tree/master/release_v3.0
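For intuition, the shape of a single dataset entry might be represented as follows in Python; the triples and verbalization below are an illustrative example, not values copied from the release (which is distributed as XML).

```python
# Illustrative shape of one WebNLG 2020 entry: a set of DBpedia RDF triples
# plus crowdsourced verbalizations. Values here are a made-up example.
entry = {
    "triples": [
        ("Alan_Bean", "occupation", "Test_pilot"),
        ("Alan_Bean", "birthPlace", "Wheeler,_Texas"),
    ],
    "lexicalizations": [
        "Alan Bean, who was born in Wheeler, Texas, worked as a test pilot.",
    ],
}
```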
Model Description:
They fine-tune the pre-trained mBART model (Liu et al., 2020) on the provided training data separately for each language (English and Russian). They then feed tokenized, trivially linearized input RDF triples into the model and train it to output the ground-truth references.
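A minimal sketch of what the trivial linearization and a single fine-tuning step could look like with HuggingFace transformers; the separator token, learning rate, and example strings are assumptions, not the authors' exact choices.

```python
# Sketch of trivial triple linearization plus one fine-tuning step.
# The "|" separator, lr=3e-5, and the example pair are assumptions.
import torch
from transformers import MBartForConditionalGeneration, MBartTokenizer

def linearize(triples):
    # Flatten (subject, predicate, object) triples into one plain string.
    return " ".join(f"{s} | {p} | {o}" for s, p, o in triples)

tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-cc25", src_lang="en_XX", tgt_lang="en_XX"
)
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

source = linearize([("Alan_Bean", "occupation", "Test_pilot")])
target = "Alan Bean worked as a test pilot."

batch = tokenizer(source, text_target=target, return_tensors="pt")
loss = model(**batch).loss  # cross-entropy against the ground-truth reference
loss.backward()
optimizer.step()
```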
Key Contribution:
The authors argue that their approach, given its extreme simplicity, can serve as a benchmark for the trade-off between output quality and setup complexity.
Results:
In automatic metrics, their solution placed in the top third of the field (out of 35 submissions) for English and first or second (out of 12 submissions) for Russian. In human evaluation, it scored in the best or second-best system cluster.