AI2 Science Questions v2.1 (October 2017)


The AI2 Science Questions dataset consists of questions used in student assessments in the United States across elementary and middle school grade levels. Each question is in 4-way multiple-choice format and may or may not include a diagram.

  • Elementary School Without Diagrams v2: 1,288 questions
  • Elementary School With Diagrams v2: 1,183 questions
  • Middle School Without Diagrams v2: 1,409 questions
  • Middle School With Diagrams v2: 1,179 questions

The question train/dev/test sets within these categories are named using the following acronyms for brevity:

  • NDMC: Non-Diagram Multiple Choice
  • DMC: Diagram Multiple Choice


Download the data here.


To help you evaluate your models, we have also built Aristo MINI, a lightweight question answering system that can quickly evaluate science questions. It provides an evaluation web server and baseline solvers, which you can extend with your own implementation to try out new approaches and compare results.
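
As a rough illustration of what evaluating a solver on 4-way multiple choice questions involves, here is a minimal Python sketch. It does not use Aristo MINI or its API. The Question class, its field names, the first_choice_solver baseline, and the placeholder questions are all hypothetical and are not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Question:
    """Hypothetical in-memory form of one question (not the dataset's schema)."""
    stem: str
    choices: List[str]   # the four answer options
    answer_index: int    # position of the correct option in `choices`


# A solver takes a question and returns the index of the option it picks.
Solver = Callable[[Question], int]


def first_choice_solver(question: Question) -> int:
    """Trivial baseline: always pick the first option."""
    return 0


def accuracy(solver: Solver, questions: List[Question]) -> float:
    """Fraction of questions for which the solver picks the correct option."""
    if not questions:
        raise ValueError("need at least one question to score")
    correct = sum(1 for q in questions if solver(q) == q.answer_index)
    return correct / len(questions)


if __name__ == "__main__":
    # Made-up placeholder items, only to exercise the scoring function.
    sample = [
        Question("Placeholder question 1", ["A", "B", "C", "D"], 0),
        Question("Placeholder question 2", ["A", "B", "C", "D"], 2),
    ]
    print(f"first-choice baseline accuracy: {accuracy(first_choice_solver, sample):.2f}")
```

With four options per question, a solver that guesses uniformly at random scores 25% in expectation, which is a useful floor when comparing your own solvers against the baselines.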

Release Notes

v2.1 was released in October of 2017. It corrects typos in two questions: one elementary NDMC Train question and one middle school NDMC Test question.

v2 was released in May of 2017. It expands our openly available multiple choice science question set to 5,060 questions by adding many newly sourced, genuine exam questions from a variety of state assessments. It also removes some erroneous and duplicate questions detected in v1. It can be downloaded here: AI2 Science Questions v2 (May 2017).

v1 of the dataset was released in February of 2016. It can be downloaded here: AI2 Science Questions v1 (February 2016).


If you have any other questions or feedback for us about this data, please contact us at