Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering
The Mintaka QA dataset contains 20K English question-answer pairs linked to Wikidata, plus 160K additional translated questions in eight other languages. Mintaka contains only question-answer pairs, so the authors evaluated models that can be trained end-to-end. The table below shows the evaluation results for three language models (XL-T5 fine-tuned, T5, and T5 for CBQA zero-shot), three knowledge graph-based models (KVMemNet, EmbedKGQA, and Rigel), and two retriever-reader models (DPR zero-shot and DPR trained). Rows are sorted by hits@1 within each language setting. The train, dev, and test sets can be found here.
Results of baseline models on Mintaka
Model | Year | hits@1 | Language | Reported By |
---|---|---|---|---|
XL-T5 (fine-tuned) | 2022 | 0.38 | English | Sen et al., 2022 |
DPR (trained) | 2022 | 0.31 | English | Sen et al., 2022 |
T5 | 2022 | 0.28 | English | Sen et al., 2022 |
T5 for CBQA (zero-shot) | 2022 | 0.20 | English | Sen et al., 2022 |
Rigel | 2022 | 0.20 | English | Sen et al., 2022 |
EmbedKGQA | 2022 | 0.18 | English | Sen et al., 2022 |
DPR (zero-shot) | 2022 | 0.15 | English | Sen et al., 2022 |
KVMemNet | 2022 | 0.12 | English | Sen et al., 2022 |
T5 for CBQA (translated) | 2022 | 0.31 | Multilingual | Sen et al., 2022 |
Rigel | 2022 | 0.19 | Multilingual | Sen et al., 2022 |
MT5 | 2022 | 0.16 | Multilingual | Sen et al., 2022 |
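
The hits@1 column counts a question as correct when a model's top-ranked answer is one of the gold answers. The sketch below illustrates that metric under simple assumptions: predictions are ranked lists of answer strings, gold answers are sets, and matching is exact. The function name `hits_at_1` and the example identifiers are hypothetical, and the answer normalization used by Sen et al. (2022) is not reproduced here.

```python
from typing import Sequence, Set


def hits_at_1(predictions: Sequence[Sequence[str]],
              gold_answers: Sequence[Set[str]]) -> float:
    """Fraction of questions whose top-ranked prediction is a gold answer.

    Assumption: exact string match between prediction and gold answer.
    """
    if len(predictions) != len(gold_answers):
        raise ValueError("predictions and gold_answers must have the same length")
    if not predictions:
        return 0.0
    correct = sum(
        1
        for ranked, gold in zip(predictions, gold_answers)
        # An empty prediction list counts as a miss.
        if ranked and ranked[0] in gold
    )
    return correct / len(predictions)


# Illustrative placeholder answers (not taken from Mintaka).
predictions = [["Q76", "Q11696"], ["Q30"], []]
gold_answers = [{"Q76"}, {"Q16"}, {"Q1"}]

score = hits_at_1(predictions, gold_answers)
print(f"hits@1 = {score:.2f}")  # prints "hits@1 = 0.33"
```

Only the first prediction of each list is inspected, so a gold answer ranked second (or missing entirely) does not contribute to the score.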
Reference
Sen, P., Aji, A. F., & Saffari, A. (2022). Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering. Proceedings of the 29th International Conference on Computational Linguistics, 1604–1619. https://aclanthology.org/2022.coling-1.138