Supervised Project
Generation of simplified texts
Loria - Synalp
Claire Gardent
Summary
NLG Task:
Previous work studies reply suggestion only in English. This paper therefore presents MRS, a multilingual reply suggestion dataset covering ten languages.
The task is to build a generation model and a retrieval model as baselines for MRS.
MRS is publicly available at https://github.com/zhangmozhi/mrs
Training Data:
The data used to investigate reply suggestion comes from the MRS dataset, built from publicly available Reddit threads. Message-reply pairs, response sets, and machine-translated examples are extracted for ten languages. For each language, 80% of the examples are used for training, 10% for validation, and 10% for testing. Response sets are then created for the retrieval models.
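A minimal sketch of such a per-language split, assuming the message-reply pairs are already loaded in memory (the function and variable names are illustrative, not taken from the MRS release):

import random

def split_pairs(pairs, train=0.8, valid=0.1, seed=13):
    """Shuffle message-reply pairs and split them 80/10/10 into train/valid/test."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_train = int(len(pairs) * train)
    n_valid = int(len(pairs) * valid)
    return (pairs[:n_train],                   # 80% training examples
            pairs[n_train:n_train + n_valid],  # 10% validation examples
            pairs[n_train + n_valid:])         # 10% test examples

# Hypothetical (message, reply) pairs extracted from Reddit threads.
pairs = [("Any tips for learning French?", "Try reading short news articles every day."),
         ("Just finished the assignment!", "Nice, congratulations!")]
train_set, valid_set, test_set = split_pairs(pairs)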
Model Description
Retrieval Model -> selects the reply from a predetermined response set. It is easier to train, runs faster, and a curated response set guarantees the coherence and safety of the model output.
Generation Model -> produces replies from scratch. It is more powerful because it is not constrained by the response set. (A toy sketch contrasting the two is given after this list.)
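To make the retrieval-versus-generation distinction concrete, here is a toy lexical retrieval model built with scikit-learn; the response set, message, and function name are invented for the example, and the paper's actual baselines are transformer-based models rather than TF-IDF scorers:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical curated response set (MRS builds its real response sets from the data).
RESPONSE_SET = [
    "Sounds good to me!",
    "Thanks for sharing.",
    "Could you explain a bit more?",
    "Congratulations!",
]

def retrieve_reply(message, responses=RESPONSE_SET):
    """Toy retrieval model: score every canned response against the incoming message
    with TF-IDF cosine similarity and return the highest-scoring one. A generation
    model would instead decode a reply token by token with a sequence-to-sequence
    network, unconstrained by any response set."""
    vectorizer = TfidfVectorizer().fit(list(responses) + [message])
    scores = cosine_similarity(vectorizer.transform([message]),
                               vectorizer.transform(responses))[0]
    return responses[scores.argmax()]

print(retrieve_reply("Could you share a few more details?"))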
Key Contributions:
This paper demonstrates that the generation model beats the retrieval model in nearly every language, achieving higher relevance scores and producing more diverse replies. However, unlike the retrieval model, the generation model fails to generalize across languages in the zero-shot setting, despite being initialized with Unicoder-XDAE: it "forgets" the multilingual knowledge acquired during pre-training. This suggests that reply suggestion poses unique challenges for cross-lingual transfer learning.
Results: