ProPara Dataset

ProPara aims to promote research in natural language understanding in the context of procedural text. This requires identifying the actions described in the paragraph and tracking the state changes that happen to the entities involved. We treat comprehension as the task of predicting, tracking, and answering questions about how entities change during the procedure. The dataset contains 488 paragraphs and 3,300 sentences. Each paragraph is richly annotated with the existence and location of all the main entities (the "participants") at every time step (sentence) throughout the procedure (~81,000 annotations).

ProPara paragraphs are natural (written by crowdworkers) rather than synthetic (e.g., as in bAbI). Workers were given a prompt (e.g., "What happens during photosynthesis?") and asked to write a series of sentences describing the sequence of events in the procedure. Participant entities, along with their existence and locations, were then identified from these sentences. The goal of the challenge is to predict the existence and location of each participant based on the sentences in the paragraph.
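
One way to picture these annotations is as a grid with one column per participant and one row per time step. The Python sketch below is an illustration under assumed conventions, not the official ProPara file format. It assumes that a column holds the participant's state before the first sentence followed by its state after each sentence, that "-" marks a participant that does not exist, and that "?" marks one whose location is unknown. The function `state_changes` and the example grid are hypothetical. The function labels each sentence as CREATE, DESTROY, MOVE, or NONE for a single participant, which is one way to make "tracking state changes" concrete.

```python
from typing import Dict, List

# Assumed encoding (illustrative, not the official file format):
# participant -> states, where index 0 is the state before sentence 1
# and index i is the state after sentence i.
# "-" means the participant does not exist; "?" means it exists but its
# location is unknown; any other string is a location.
Grid = Dict[str, List[str]]
NONEXISTENT = "-"


def state_changes(column: List[str]) -> List[str]:
    """Label each sentence with the change it causes to one participant.

    Returns one label per sentence: CREATE, DESTROY, MOVE, or NONE.
    In this sketch, any change between two existing states (including
    to or from "?") is labeled MOVE.
    """
    labels = []
    # Compare each state with the next one; each pair spans one sentence.
    for before, after in zip(column, column[1:]):
        if before == NONEXISTENT and after != NONEXISTENT:
            labels.append("CREATE")
        elif before != NONEXISTENT and after == NONEXISTENT:
            labels.append("DESTROY")
        elif before != after:
            labels.append("MOVE")
        else:
            labels.append("NONE")
    return labels


# Hypothetical grid for a five-sentence photosynthesis paragraph.
grid: Grid = {
    "water": ["soil", "roots", "leaf", "leaf", "-", "-"],
    "mixture": ["-", "-", "-", "-", "leaf", "-"],
    "sugar": ["-", "-", "-", "-", "-", "leaf"],
}

for participant, column in grid.items():
    print(participant, state_changes(column))
```

Because each column has one more entry than there are sentences, the labels line up one-to-one with the sentences of the paragraph.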


The main task is: given a paragraph and a list of participants, predict the contents of the grid (i.e., the existence and location of every participant after each step of the process). However, because many participants are irrelevant to any given sentence, we also use a more targeted end task that is a deterministic computation over the grid, as described below. For each paragraph, answer the following four questions (sample answers are given for an example paragraph about photosynthesis):

  1. What are the Inputs? That is, which participants existed before the procedure began but no longer exist after it ended? In other words, which participants were consumed?
    Answer: The inputs are water, light, and CO2.
  2. What are the Outputs? That is, which participants exist after the procedure ended but did not exist before it began? In other words, which participants were produced?
    Answer: The output is sugar.
  3. What are the Conversions? That is, which participants were converted to which other participants?
    Answer: Light, water and CO2 are converted into mixture at leaf in sentence 4. Mixture is converted into sugar at leaf in sentence 5.
  4. What are the Moves? That is, which participants moved from one location to another?
    Answer: Water moves from soil to roots in sentence 1. Water moves from roots to leaf in sentence 2, and so on.
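
Because all four answers are a deterministic computation over the grid, they can be derived mechanically once the grid is known. The Python sketch below illustrates this under stated assumptions. It assumes a hypothetical encoding in which each participant maps to a list of states: index 0 is the state before sentence 1, index i is the state after sentence i, "-" means the participant does not exist, and "?" means its location is unknown. All function names are hypothetical. The example grid was invented to reproduce the sample photosynthesis answers above and is not taken from the dataset. Inputs and Outputs compare the first and last states. Moves are location changes while a participant exists. For Conversions, the sketch uses a simple heuristic that may differ from the official evaluator: participants destroyed in a sentence are treated as converted into the participants created in that same sentence.

```python
from typing import Dict, List, Tuple

# Assumed encoding (illustrative): participant -> states, where index 0 is
# the state before sentence 1 and index i is the state after sentence i.
# "-" = does not exist, "?" = exists at an unknown location.
Grid = Dict[str, List[str]]
NONEXISTENT = "-"


def exists(state: str) -> bool:
    return state != NONEXISTENT


def inputs(grid: Grid) -> List[str]:
    """Participants that exist at the start but not at the end (consumed)."""
    return [p for p, col in grid.items()
            if exists(col[0]) and not exists(col[-1])]


def outputs(grid: Grid) -> List[str]:
    """Participants that exist at the end but not at the start (produced)."""
    return [p for p, col in grid.items()
            if not exists(col[0]) and exists(col[-1])]


def moves(grid: Grid) -> List[Tuple[str, str, str, int]]:
    """(participant, from, to, sentence) for each location change while the
    participant exists both before and after the sentence."""
    result = []
    for p, col in grid.items():
        for step in range(1, len(col)):
            before, after = col[step - 1], col[step]
            if exists(before) and exists(after) and before != after:
                result.append((p, before, after, step))
    return result


def conversions(grid: Grid) -> List[Tuple[List[str], List[str], str, int]]:
    """Heuristic (assumption): participants destroyed in a sentence are
    treated as converted into the participants created in that same
    sentence, at the location of the first created participant."""
    result = []
    num_states = len(next(iter(grid.values())))
    for step in range(1, num_states):
        destroyed = [p for p, col in grid.items()
                     if exists(col[step - 1]) and not exists(col[step])]
        created = [p for p, col in grid.items()
                   if not exists(col[step - 1]) and exists(col[step])]
        if destroyed and created:
            result.append((destroyed, created, grid[created[0]][step], step))
    return result


# Hypothetical grid, invented to match the sample photosynthesis answers.
grid: Grid = {
    "water": ["soil", "roots", "leaf", "leaf", "-", "-"],
    "light": ["?", "?", "?", "?", "-", "-"],
    "CO2": ["?", "?", "?", "?", "-", "-"],
    "mixture": ["-", "-", "-", "-", "leaf", "-"],
    "sugar": ["-", "-", "-", "-", "-", "leaf"],
}

print("Inputs:", inputs(grid))
print("Outputs:", outputs(grid))
print("Conversions:", conversions(grid))
print("Moves:", moves(grid))
```

Inputs and Outputs look only at the first and last states. This is why an intermediate participant such as mixture, which is created and destroyed within the process, appears in neither answer but still shows up in the Conversions.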
More information can be found on the leaderboard and evaluator codebase webpages.

Dataset Download

The dataset is available as a Google Spreadsheet, which is the most convenient way to browse it. To view or download the dataset, please click here.

ProPara Repository

The ProPara repository on GitHub can be accessed here.


Further details and experimental results are described in the following papers:

B. Dalvi Mishra, L. Huang, N. Tandon, W. Yih, P. Clark. Tracking State Changes in Procedural Text: A Challenge Dataset and Models for Process Paragraph Comprehension. In Proc. NAACL, 2018.

N. Tandon, B. Dalvi Mishra, J. Grus, W. Yih, A. Bosselut, P. Clark. Reasoning about Actions and State Changes by Injecting Commonsense Knowledge. In Proc. EMNLP, 2018.


The ProPara leaderboard is located here.


If you have questions, please do not hesitate to contact the authors at: {bhavanad,nikett,scottyih,peterc}