TL;DR:

In this paper, we introduce Varta, a large-scale headline-generation dataset for 14 Indic languages and English. With this dataset, we try to answer important questions about Indic and multilingual models and research. We first benchmark headline generation on Varta, and our results indicate that Indic headline generation remains challenging even for state-of-the-art text generation models. Owing to its size, Varta can also serve as a pretraining corpus, and we show that models pretrained on it perform strongly on Indic NLU and NLG tasks. In short, we introduce a high-quality resource for Indic languages.

Poster:

ACL 2023 Spotlight:

Recorded Presentation: