Supervised Project
Generation of simplified texts | Loria - Synalp | Claire Gardent
Category
URL
Summary
NLP Task:
The task is to summarise an article directly in a target language. This replaces the traditional pipeline approaches (summarise then translate, or translate then summarise) with direct cross-lingual summarisation.
Training Data:
The experiments are based on the NCLS dataset, which contains paired English and Chinese data: English articles with Chinese summaries and Chinese articles with English summaries.
The cross-lingual data was generated using a machine translation model.
For pre-training, they used Wikipedia dumps, with 83 million sentences for English and 20 million sentences for Chinese.
Model Description:
The model has 6 layers and 8 attention heads. The input and output dimensions of all transformer blocks are 512, and the inner (feed-forward) dimension is 2048. The vocabulary has 33,000 entries, built from a balanced mix of the monolingual Wikipedia corpora. The model has approximately 61M parameters.
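As a sanity check on the figures above, the parameter count can be estimated by hand. The sketch below is not taken from the paper: it assumes a standard encoder-decoder transformer with 6 layers on each side, one embedding matrix shared by the encoder, the decoder, and the output projection, and fixed positional encodings with no learned parameters. The function name `count_parameters` is hypothetical. Under these assumptions the estimate lands close to the reported 61M.

```python
def count_parameters(d_model=512, d_ff=2048, n_layers=6, vocab_size=33_000):
    """Rough parameter count for a standard encoder-decoder transformer.

    Assumptions (not confirmed by the notes): n_layers applies to both the
    encoder and the decoder, the embedding matrix is shared and tied to the
    output projection, and positional encodings have no learned weights.
    """
    # Multi-head attention: Q, K, V and output projections, each with a bias.
    # The number of heads does not change the count, only how d_model is split.
    attention = 4 * (d_model * d_model + d_model)

    # Position-wise feed-forward network: two linear layers with biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    # Layer normalisation: one gain and one bias vector.
    layer_norm = 2 * d_model

    # Encoder layer: self-attention + feed-forward, each followed by a norm.
    encoder_layer = attention + feed_forward + 2 * layer_norm

    # Decoder layer: self-attention + cross-attention + feed-forward.
    decoder_layer = 2 * attention + feed_forward + 3 * layer_norm

    # Shared token embedding (assumed tied with the output projection).
    embedding = vocab_size * d_model

    return n_layers * (encoder_layer + decoder_layer) + embedding


if __name__ == "__main__":
    total = count_parameters()
    head_dim = 512 // 8  # per-head dimension with 8 heads
    print(f"per-head dimension: {head_dim}")
    print(f"estimated parameters: {total / 1e6:.1f}M")
```

If the embeddings were not shared, each extra 33,000 x 512 matrix would add roughly 17M parameters, so the reported total is most consistent with a single shared vocabulary matrix.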
Key Contributions:
They provide a pre-trained model for English and Chinese that outperforms traditional methods.
Results:
Update