QuaRel Dataset

Aristo • 2018
QuaRel is a crowdsourced dataset of 2771 multiple-choice story questions, including their logical forms.
License: CC BY

Supplementary material can be found here.

A related slide deck can be found here.

Code for the QuaSP, QuaSP+ and QuaSP+Zero models can be found in the AllenNLP library.

The dataset is split into train (1941 questions), dev (278) and test (552), for 2771 questions in total.

Each line of a dataset file is one question, encoded as a JSON object. For example (with extra whitespace added for readability):

{
    "id": "QuaRel_V1_Fr_0872",
    "question": "The lawnmower went faster over the moss, versus the tall grass because the moss has (A) more resistance (B) less resistance",
    "answer_index": 1,
    "logical_form_pretty": "qrel(speed, higher, world1) -> qrel(friction, higher, world1) ; qrel(friction, lower, world1)",
    "logical_forms": [
        "(infer (speed higher world1) (friction higher world1) (friction lower world1))",
        "(infer (speed higher world2) (friction higher world2) (friction lower world2))"
    ],
    "world_literals": {
        "world1": "moss",
        "world2": "tall grass"
    }
}
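Because every line is a self-contained JSON object (the JSON Lines layout), a dataset file can be read one line at a time with Python's standard json module. This is a minimal sketch assuming that one-question-per-line layout; the function name read_quarel is hypothetical, and the page does not state the actual dataset file names.

```python
import json
from typing import Iterator


def read_quarel(path: str) -> Iterator[dict]:
    """Yield one question dict per non-empty line of a QuaRel file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at end of file
                yield json.loads(line)
```

For example, `questions = list(read_quarel("train.jsonl"))` would load a whole split into memory, where "train.jsonl" is a placeholder for whatever the training file is actually called. Yielding records lazily keeps memory flat if you only need to stream over a split once.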

Explanation of each JSON field:

  • id: Unique id of the question. The subset of questions about friction (QuaRelF) can be identified by having _Fr_ in the id.
  • question: The raw text of the question, always in the form “… (A) … (B) …”.
  • answer_index: The index of the correct answer (0 for “A” and 1 for “B”).
  • logical_form_pretty: Logical form for the question in the format used in the paper.
  • logical_forms: Logical forms in the format used by the QuaSP code, including a version with world1 and world2 interchanged.
  • world_literals: Annotated substrings associated with the two “worlds” in the question (aligned to the first LF in logical_forms).
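The field descriptions above map onto a few small helpers. This is a minimal sketch, not part of the QuaSP code: all helper names are hypothetical, split_question assumes the “… (A) … (B) …” question form stated above, and parse_lf assumes the parenthesized S-expression syntax visible in the example's logical_forms. The usage lines check, for the sample record only, that the second logical form is the first with world1 and world2 interchanged.

```python
import re
from typing import List, Tuple, Union

# Hypothetical helpers for QuaRel records (not taken from the QuaSP code).
# A parsed logical form is either a bare token or a nested list of them.
SExpr = Union[str, List["SExpr"]]


def is_friction_question(q: dict) -> bool:
    """QuaRelF (friction) questions have '_Fr_' in their id."""
    return "_Fr_" in q["id"]


def split_question(text: str) -> Tuple[str, str, str]:
    """Split '... (A) ... (B) ...' into (stem, option_a, option_b)."""
    m = re.fullmatch(r"(.*?)\s*\(A\)\s*(.*?)\s*\(B\)\s*(.*)", text.strip(), flags=re.DOTALL)
    if m is None:
        raise ValueError(f"question is not in '(A) ... (B)' form: {text!r}")
    stem, option_a, option_b = m.groups()
    return stem, option_a, option_b


def answer_text(q: dict) -> str:
    """Return the correct option's text (answer_index 0 -> A, 1 -> B)."""
    _, option_a, option_b = split_question(q["question"])
    return (option_a, option_b)[q["answer_index"]]


def parse_lf(lf: str) -> SExpr:
    """Parse an S-expression such as '(infer (speed higher world1) ...)' into nested lists."""
    tokens = lf.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos: int) -> Tuple[SExpr, int]:
        # Return the expression starting at tokens[pos] and the index just past it.
        if tokens[pos] == "(":
            items: List[SExpr] = []
            pos += 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return items, pos + 1  # skip the closing ")"
        return tokens[pos], pos + 1

    expr, end = read(0)
    if end != len(tokens):
        raise ValueError(f"trailing tokens after logical form: {lf!r}")
    return expr


def swap_worlds(expr: SExpr) -> SExpr:
    """Interchange world1 and world2 everywhere in a parsed logical form."""
    if isinstance(expr, list):
        return [swap_worlds(e) for e in expr]
    return {"world1": "world2", "world2": "world1"}.get(expr, expr)


# Usage on the sample record shown earlier (fields not needed here are omitted).
q = {
    "id": "QuaRel_V1_Fr_0872",
    "question": "The lawnmower went faster over the moss, versus the tall grass "
                "because the moss has (A) more resistance (B) less resistance",
    "answer_index": 1,
    "logical_forms": [
        "(infer (speed higher world1) (friction higher world1) (friction lower world1))",
        "(infer (speed higher world2) (friction higher world2) (friction lower world2))",
    ],
    "world_literals": {"world1": "moss", "world2": "tall grass"},
}
print(is_friction_question(q))            # True
print(answer_text(q))                     # less resistance
first, second = (parse_lf(lf) for lf in q["logical_forms"])
print(second == swap_worlds(first))       # True
print(q["world_literals"][first[1][2]])   # moss
```

Swapping worlds on the parsed form, rather than with string replacement, only touches whole tokens and exchanges both names in one pass, so no temporary placeholder is needed.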