AI2 Science Questions Mercury

Data format

This download contains elementary-school and middle-school science questions in multiple-choice format, both with and without associated diagrams. The questions come pre-split into Train, Development, and Test sets, each provided in two formats: CSV and JSONL. In the CSV files, the full text of each question and its answer options appear together in a single cell. In the JSONL files, the question text has been separated programmatically from the answer options.
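The exact splitting procedure is not documented here. As a rough illustration only, the sketch below separates a combined question string into a stem and labeled choices, assuming (hypothetically) that each option in the CSV cell is introduced by a parenthesized capital letter such as "(A)". The names OPTION_MARKER and split_question are illustrative and are not part of the dataset.

```python
import re
from typing import Dict, List, Tuple

# Assumption: each answer option in the combined cell starts with a marker
# like "(A)". This pattern is hypothetical and may not match every row.
OPTION_MARKER = re.compile(r"\(([A-Z])\)\s*")


def split_question(full_text: str) -> Tuple[str, List[Dict[str, str]]]:
    """Split a combined question string into its stem and labeled choices."""
    # With one capturing group, re.split returns
    # [stem, label1, text1, label2, text2, ...]
    parts = OPTION_MARKER.split(full_text)
    stem = parts[0].strip()
    choices = [
        {"label": label, "text": text.strip()}
        for label, text in zip(parts[1::2], parts[2::2])
    ]
    return stem, choices
```

A stem that itself contains a parenthesized capital letter would be split incorrectly by this heuristic, so the JSONL files remain the safer source when separated choices are needed.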

JSONL Structure

Each line of the Train, Development, and Test JSONL files contains one question with the structure shown below. Because the dataset is provided under license, the example is illustrative only and does not appear in the actual dataset:

{"id":"89629","question":{"stem":"Which of the following groups of materials would most likely be used to build an electromagnet?","choices":[{"label":"A","text":"bare wire, plastic rod, battery"},{"label":"B","text":"bare wire, iron rod, light bulb"},{"label":"C","text":"insulated wire, iron rod, battery"},{"label":"D","text":"insulated wire, plastic rod, light bulb"}]},"answerKey":"C"}
  • id - a unique identifier for the question (our own numbering)
  • question
    • stem - the question text
    • choices - the answer choices
      • label - the answer label ("A", "B", "C", "D")
      • text - the text associated with the answer label
  • answerKey - the correct answer option
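A minimal Python sketch for reading one of the JSONL files with only the standard library. It parses each line as a separate JSON object and flattens it into a simplified record; the keys stem, choices, and answer in that record, and the names read_questions and EXAMPLE_LINE, are illustrative choices rather than dataset fields. The demo reads the example record above from memory instead of a real file.

```python
import io
import json
from typing import Dict, Iterator, TextIO

# The illustrative record from this README, split across lines for width.
EXAMPLE_LINE = (
    '{"id":"89629","question":{"stem":"Which of the following groups of '
    'materials would most likely be used to build an electromagnet?",'
    '"choices":[{"label":"A","text":"bare wire, plastic rod, battery"},'
    '{"label":"B","text":"bare wire, iron rod, light bulb"},'
    '{"label":"C","text":"insulated wire, iron rod, battery"},'
    '{"label":"D","text":"insulated wire, plastic rod, light bulb"}]},'
    '"answerKey":"C"}'
)


def read_questions(lines: TextIO) -> Iterator[Dict]:
    """Yield one simplified record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)
        question = record["question"]
        yield {
            "id": record["id"],
            "stem": question["stem"],
            # map each label ("A", "B", ...) to its answer text
            "choices": {c["label"]: c["text"] for c in question["choices"]},
            "answer": record["answerKey"],
        }


if __name__ == "__main__":
    for q in read_questions(io.StringIO(EXAMPLE_LINE + "\n")):
        # prints: 89629 C insulated wire, iron rod, battery
        print(q["id"], q["answer"], q["choices"][q["answer"]])
```

To read a real split, pass an open file object (for example one returned by open(path, encoding="utf-8")) to read_questions instead of the in-memory buffer.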

CSV Structure

Comma-delimited (CSV) columns:

  • questionID - a unique identifier for the question (our own numbering)
  • originalQuestionID - the question number on the test
  • totalPossiblePoint - how many points the question is worth
  • AnswerKey - the correct answer option
  • isMultipleChoiceQuestion - 1 = multiple choice, 0 = other
  • includesDiagram - 1 = includes diagram, 0 = no diagram
  • examName - the source of the exam
  • schoolGrade - grade level
  • year - the year of the exam
  • question - the full question text, including its answer options
  • subject - Science
  • category - the split the question belongs to: Train, Dev, or Test
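A comparable sketch for the CSV files, assuming (not confirmed here) that each file begins with a header row naming the columns listed above. Python's csv.DictReader handles quoted cells, which matters because the question column contains commas. The function load_rows and its filter parameters are hypothetical; all values are read as strings, so the flag columns are compared against "0" and "1".

```python
import csv
from typing import Dict, Iterable, List


def load_rows(
    lines: Iterable[str], split: str = "", diagram_free: bool = False
) -> List[Dict[str, str]]:
    """Parse CSV rows into dicts keyed by column name, with optional filters.

    Assumes the first line is a header row (questionID, ..., category).
    split: if non-empty, keep only rows whose "category" equals it,
           e.g. "Train", "Dev", or "Test".
    diagram_free: if True, keep only rows whose includesDiagram is "0".
    """
    rows = []
    # DictReader accepts any iterable of lines, including an open file object,
    # and correctly handles quoted cells that contain commas.
    for row in csv.DictReader(lines):
        if split and row["category"] != split:
            continue
        if diagram_free and row["includesDiagram"] != "0":
            continue
        rows.append(row)
    return rows
```

For example, calling load_rows(f, split="Train", diagram_free=True) on a file opened with newline="" (as the csv module documentation recommends) would keep only the diagram-free training questions.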