
QuaRTz Dataset (V1, August 2019)

The QuaRTz dataset V1 contains 3864 questions about open-domain qualitative relationships. Each question is paired with one of 405 different background sentences (sometimes short paragraphs).

Download the dataset here.

See the associated paper, "QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions" (EMNLP 2019), for details.

The dataset is split into train (2696), dev (384) and test (784). A background sentence will only appear in a single split.
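The single-split property can be checked with a short script. The following is a minimal sketch, assuming each split is stored as a file with one JSON object per line and that every object carries a `para_id` field as described on this page. The helper names and any file paths passed to `check_files` are hypothetical and not part of the dataset release.

```python
import json
from itertools import combinations


def para_ids(lines):
    """Collect the set of para_id values from an iterable of JSON lines."""
    return {json.loads(line)["para_id"] for line in lines if line.strip()}


def shared_para_ids(splits):
    """Map each pair of split names to the para_ids they have in common.

    `splits` maps a split name to its set of para_ids. An empty result
    means no background sentence appears in more than one split.
    """
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(splits.items(), 2):
        common = ids_a & ids_b
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps


def check_files(paths):
    """Hypothetical helper: `paths` maps a split name to a file path.

    The file names are an assumption; use whatever names the download contains.
    """
    splits = {}
    for name, path in paths.items():
        with open(path, encoding="utf-8") as handle:
            splits[name] = para_ids(handle)
    return shared_para_ids(splits)
```

Calling `check_files` with the three split files should return an empty dictionary if the property stated above holds.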

Each line in a dataset file is a question specified as a JSON object, for example (with extra whitespace added for readability):

  "para":"A sunscreen with a higher SPF protects the skin longer.",
    "stem":"John was looking at sunscreen at the retail store. He noticed that sunscreens that had lower SPF would offer protection that is", 

Explanation for each json field:

  • id: Unique question id, ends with "-flip" if it's a "flipped" version of an original question
  • para_id: Unique background sentence id
  • para: The text of the associated background sentence (paragraph)
  • question: Contains the question stem and answer choices
  • answerKey: The label corresponding to the correct answer
  • para_anno: Annotations related to the background sentence:
    • cause_dir_sign: MORE or LESS indicating the direction of change for the "cause"
    • cause_prop: Surface form associated with the cause
    • cause_dir: Surface form associated with the change direction
    • effect_*: Same as the cause_* fields above, but for the effect property
  • question_anno: Annotations related to the question
    • more_cause_dir: Surface form (if any) associated with the direction of change for the cause property, in the direction of "MORE"
    • less_cause_dir: Same, but for direction "LESS"
    • more_cause_prop: Same, but for associated property rather than direction
    • *_effect_*: Same as the *_cause_* fields above, but for the effect property
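
The fields above can be read with the standard `json` module. The following is a minimal sketch, assuming the stem is stored under the `question` object as `stem` (suggested by the example, but not spelled out in the field list). The `summarize` helper is hypothetical, and the sample record uses invented placeholder values rather than real dataset content.

```python
import json


def summarize(line):
    """Parse one dataset line and pull out the top-level fields."""
    record = json.loads(line)
    question = record.get("question", {})
    return {
        "id": record["id"],
        # Flipped questions are marked by an id ending in "-flip".
        "is_flipped": record["id"].endswith("-flip"),
        "para_id": record["para_id"],
        "para": record["para"],
        # Assumption: the stem is nested under "question".
        "stem": question.get("stem"),
        "answer": record.get("answerKey"),
    }


# Placeholder record for illustration only; every value here is invented.
sample_line = json.dumps({
    "id": "example-1-flip",
    "para_id": "example-para-1",
    "para": "Placeholder background sentence.",
    "question": {"stem": "Placeholder question stem"},
    "answerKey": "A",
})

summary = summarize(sample_line)
print(summary["id"], summary["is_flipped"], summary["answer"])
```

The annotation objects (`para_anno`, `question_anno`) are left out of the summary; they can be read from the parsed record in the same way.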