Itihasa

Itihasa is a corpus of Sanskrit-English translation pairs extracted from Manmatha Nath Dutt's translations of The Ramayana and The Mahabharata. The original digitized volumes are available here. Because the text was extracted from these volumes with OCR, you may occasionally find syntactic errors in the shlokas or their translations. If you would like to help correct them, contact me. More details about the dataset and its curation process are in this paper.
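If you want to look for likely OCR damage yourself, one rough heuristic is to flag pairs whose Sanskrit side contains characters outside the Devanagari Unicode block, or whose English side is empty. The sketch below is only an illustration: the function names, the allowed-punctuation set, and the sample sentences are assumptions made up for this example, not part of the dataset or its tooling, and the check will miss errors that stay within Devanagari.

```python
# Punctuation, digits, and joiners tolerated on the Sanskrit side (an assumption).
ALLOWED_EXTRA = set(" \n\t.,;:!?'\"()-|0123456789\u200c\u200d")


def is_devanagari(ch):
    """True for code points in the Devanagari block (U+0900 to U+097F)."""
    return "\u0900" <= ch <= "\u097f"


def suspicious_chars(shloka):
    """Return the sorted, de-duplicated characters that look out of place."""
    return sorted(
        {ch for ch in shloka if not is_devanagari(ch) and ch not in ALLOWED_EXTRA}
    )


def flag_pairs(pairs):
    """Return (index, suspicious characters) for pairs worth a manual look.

    `pairs` is any iterable of (sanskrit, english) strings. A pair is flagged
    when the Sanskrit side has suspicious characters or the English side is blank.
    """
    flagged = []
    for idx, (sanskrit, english) in enumerate(pairs):
        bad = suspicious_chars(sanskrit)
        if bad or not english.strip():
            flagged.append((idx, bad))
    return flagged


if __name__ == "__main__":
    # Made-up sample sentences for illustration; not taken from the corpus.
    sample = [
        ("रामः वनं गच्छति।", "Rama goes to the forest."),
        ("रामः vanam गच्छति।", "Rama goes to the forest."),
        ("रामः वनं गच्छति।", "  "),
    ]
    for idx, bad in flag_pairs(sample):
        print(idx, bad)
```

The check is deliberately conservative: it only surfaces candidates for manual review and does not attempt to correct anything. Text that legitimately uses Vedic or extended Devanagari code points would need a wider allowed range.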

If you find this work useful, please consider citing it as:
@inproceedings{aralikatte-etal-2021-itihasa,
    title = "Itihasa: A large-scale corpus for {S}anskrit to {E}nglish translation",
    author = "Aralikatte, Rahul  and
        de Lhoneux, Miryam  and
        Kunchukuttan, Anoop  and
        S{\o}gaard, Anders",
    booktitle = "Proceedings of the 8th Workshop on Asian Translation (WAT2021)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.wat-1.22",
    pages = "191--197"
}