
Textbook Question Answering


The TQA dataset encourages work on the Multi-Modal Machine Comprehension (M3C) task. M3C builds on the popular Visual Question Answering (VQA) and Machine Comprehension (MC) paradigms by framing question answering as a machine comprehension task in which the context needed to answer each question is provided and consists of both text and images. The dataset constructed to showcase this task has been built from a middle school science curriculum that pairs each question with the limited span of knowledge needed to answer it.


The training, validation and test sets can be downloaded via the download button on the navigation bar.


This dataset was constructed in large part from material freely available as part of's open-source science curriculum. It is distributed under a Creative Commons Attribution – Non-Commercial 3.0 license.


If you find TQA helpful in your work, please cite:

title={Are You Smarter Than A Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension},
author={Aniruddha Kembhavi and Minjoon Seo and Dustin Schwenk and Jonghyun Choi and Ali Farhadi and Hannaneh Hajishirzi},
booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},