SciTail Dataset

The SciTail dataset is an entailment dataset created from multiple-choice science exams and web sentences. Each question and its correct answer choice are converted into an assertive statement to form the hypothesis. We use information retrieval to obtain relevant text from a large corpus of web sentences, and use these sentences as the premise P. We then crowdsource the annotation of each premise-hypothesis pair as supports (entails) or not (neutral) to create the SciTail dataset. The dataset contains 27,026 examples: 10,101 with the entails label and 16,925 with the neutral label.
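The label counts above imply a skewed class distribution. Across all 27,026 examples, a classifier that always predicts neutral would already be right about 62.6% of the time. The sketch below computes such statistics from tab-separated lines. It assumes a hypothetical layout of one `premise<TAB>hypothesis<TAB>label` record per line with labels `entails` or `neutral`, so check the actual files in the download before relying on it.

```python
from collections import Counter
from typing import Iterable

# Assumed (hypothetical) record layout: premise <TAB> hypothesis <TAB> label,
# where label is "entails" or "neutral". Verify against the downloaded files.
LABELS = ("entails", "neutral")


def label_counts(lines: Iterable[str]) -> Counter:
    """Count labels in tab-separated premise/hypothesis/label lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}")
        label = fields[2]
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts


def majority_baseline(counts: Counter) -> float:
    """Accuracy of always predicting the most frequent label."""
    total = sum(counts.values())
    return max(counts.values()) / total


# Reproduce the overall statistics reported above.
reported = Counter(entails=10_101, neutral=16_925)
assert sum(reported.values()) == 27_026
print(f"{majority_baseline(reported):.3f}")  # 0.626
```

Reporting accuracy alongside this majority-class figure makes it easier to judge how much a model actually learns beyond the label imbalance.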

A limitation of mainstream entailment datasets is that they have been constructed in isolation from any end task. Moreover, in several cases, either the hypothesis or the premise was synthesized specifically to create the entailment dataset. In SciTail, by contrast, the premise and the hypothesis were authored independently of each other and independently of the entailment task. As a result, linguistic variations in the dataset are not limited by the coverage of rules or the creativity of crowd-workers. Further, the unfiltered web sentences used to create the premises tend to be highly diverse (in length, complexity, how well-formed they are for a parser, etc.), adding to the linguistic challenge. Refer to our paper, "SciTail: A Textual Entailment Dataset from Science Question Answering", for additional information.


Example question:

Which of the following best explains how stems transport water to other parts of the plant?

  • (A) through a chemical called chlorophyll.
  • (B) by using photosynthesis.
  • (C) through a system of tubes.
  • (D) by converting water to food.
Hypothesis from question + answer candidate (C):

Stems transport water to other parts of the plant through a system of tubes.

Supporting Premise (entails):

Water and other materials necessary for biological activity in trees are transported throughout the stem and branches in thin, hollow tubes in the xylem, or wood tissue.

Non-supporting Premise (neutral):

Cut plant stems and insert stem into tubing while stem is submerged in a pan of water.
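To see why retrieved sentences still need human labels, the sketch below scores both example premises against the hypothesis by content-word overlap (Jaccard similarity). This is only an illustrative stand-in, not the retrieval system used to build SciTail; the tokenizer and the small stopword list are assumptions made for this example.

```python
import re

# Illustrative stopword list (an assumption, not from the SciTail pipeline).
STOPWORDS = {
    "a", "an", "and", "are", "for", "in", "into", "is",
    "of", "or", "the", "through", "to", "while",
}


def content_words(text: str) -> set[str]:
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


hypothesis = "Stems transport water to other parts of the plant through a system of tubes."
premises = {
    "entails": "Water and other materials necessary for biological activity in trees "
               "are transported throughout the stem and branches in thin, hollow tubes "
               "in the xylem, or wood tissue.",
    "neutral": "Cut plant stems and insert stem into tubing while stem is submerged "
               "in a pan of water.",
}

h_words = content_words(hypothesis)
for label, premise in premises.items():
    score = jaccard(h_words, content_words(premise))
    print(f"{label}: {score:.3f}")
# entails: 0.136
# neutral: 0.214
```

Under this crude scoring the non-supporting premise ranks higher: it shares "plant", "stems", and "water" with the hypothesis and is shorter, while the supporting premise uses "stem" and "transported" rather than "stems" and "transport". Surface overlap alone therefore cannot decide entailment for this pair.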

Examples, Leaderboard, and Download

For examples and the leaderboard, use the menu on the left-hand side. To be added to the leaderboard, please email scitail@allenai.org with a link to your published or arXiv paper that uses this dataset. To download the dataset, click the Download button in the upper-right corner.


If you find this data helpful in your work, please cite this paper:

@inproceedings{scitail,
     Author = {Tushar Khot and Ashish Sabharwal and Peter Clark},
     Booktitle = {AAAI},
     Title = {{SciTail}: A Textual Entailment Dataset from Science Question Answering},
     Year = {2018}
}