AI2 Reasoning Challenge (ARC)

Think you have solved question answering? Try the AI2 Reasoning Challenge (ARC)! The ARC dataset contains 7,787 genuine grade-school-level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is divided into a Challenge Set and an Easy Set; the Challenge Set contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. Each set is further split into train, dev, and test partitions with the following question counts:

  • Challenge Train: 1,119
  • Challenge Dev: 299
  • Challenge Test: 1,172
  • Easy Train: 2,251
  • Easy Dev: 570
  • Easy Test: 2,376
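The Challenge/Easy split described above can be sketched as a simple filter: a question lands in the Challenge Set only if every baseline solver gets it wrong. This is a minimal illustration, not the procedure actually used to build ARC. The question dictionaries, the `split_questions` helper, and the two toy solvers are hypothetical stand-ins for the retrieval-based and word co-occurrence algorithms.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: each question is a dict with an "id" and the
# label of its correct answer option under "answer".
Question = Dict[str, str]
Solver = Callable[[Question], str]


def split_questions(
    questions: Sequence[Question], solvers: Sequence[Solver]
) -> Tuple[List[Question], List[Question]]:
    """Return (challenge, easy): challenge holds questions that all solvers miss."""
    challenge: List[Question] = []
    easy: List[Question] = []
    for question in questions:
        all_wrong = all(solver(question) != question["answer"] for solver in solvers)
        (challenge if all_wrong else easy).append(question)
    return challenge, easy


# Toy stand-ins for the two baseline algorithms (assumptions for illustration only).
def toy_retrieval_solver(question: Question) -> str:
    return "A"  # always picks option A


def toy_cooccurrence_solver(question: Question) -> str:
    return "B"  # always picks option B


toy_questions = [
    {"id": "q1", "answer": "A"},  # retrieval solver is right -> Easy
    {"id": "q2", "answer": "B"},  # co-occurrence solver is right -> Easy
    {"id": "q3", "answer": "C"},  # both solvers are wrong -> Challenge
]
challenge_set, easy_set = split_questions(
    toy_questions, [toy_retrieval_solver, toy_cooccurrence_solver]
)
```

In this toy run only `q3` ends up in `challenge_set`, because it is the only question that both stand-in solvers answer incorrectly.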

For more information about this dataset, please read the paper: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge.

To join in on the discussion about this dataset, please visit the ARC Discussion Group.

Leaderboard

Here are the scores (% correct) on the test partition of the ARC Question Sets. Can you do better? (Scoring note: if your model predicts a k-way tie that includes the correct answer for a given question, it scores 1/k points for that question.)
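The tie-aware scoring rule can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not an official ARC scorer: the function names and the representation of predictions (a set of tied answer labels per question ID) are hypothetical, and an unanswered question is assumed to score 0.

```python
from typing import Dict, Set


def question_score(predicted: Set[str], correct: str) -> float:
    """Score 1/k if the correct label is among the k tied predictions, else 0."""
    if not predicted or correct not in predicted:
        return 0.0
    return 1.0 / len(predicted)


def percent_correct(
    predictions: Dict[str, Set[str]], answer_key: Dict[str, str]
) -> float:
    """Mean per-question score over the answer key, expressed as a percentage."""
    if not answer_key:
        raise ValueError("answer_key must contain at least one question")
    total = sum(
        # Assumption: a question with no prediction scores 0.
        question_score(predictions.get(question_id, set()), label)
        for question_id, label in answer_key.items()
    )
    return 100.0 * total / len(answer_key)


answer_key = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
predictions = {
    "q1": {"A"},                 # single correct answer: 1 point
    "q2": {"B", "C"},            # 2-way tie including the answer: 1/2 point
    "q3": {"A"},                 # wrong answer: 0 points
    "q4": {"A", "B", "C", "D"},  # 4-way tie including the answer: 1/4 point
}
score = percent_correct(predictions, answer_key)
```

Here the four questions earn 1 + 1/2 + 0 + 1/4 = 1.75 points, so `score` is 43.75. Under this rule, a model that selects every option of a four-choice question earns 1/4 point for it, which is the same as the expected score of guessing uniformly at random.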

Rank | Model | Affiliation | Description | Challenge Set | Easy Set
-----|-------|-------------|-------------|---------------|---------
1 | DGEM [Khot et al. 2018] | Allen Institute for AI | Entailment model, modified for QA [Clark et al. 2018] | 27.11 | 58.97
2 | TableILP [Khashabi et al. 2016] | UIUC & Allen Institute for AI | ARC score published in [Clark et al. 2018] | 26.97 | 36.15
3 | BiDAF [Seo et al. 2017] | Univ Washington & Allen Institute for AI | Span prediction QA model, modified for multiple-choice QA [Clark et al. 2018] | 26.54 | 50.11
4 | DGEM-OpenIE [Khot et al. 2018] | Allen Institute for AI | Entailment model, modified for QA [Clark et al. 2018] | 26.41 | 57.45
5 | Guess All ("random") | - | - | 25.02 | 25.02
6 | Decomposable Attention [Parikh et al. 2016] | Google | Entailment model, modified for QA [Clark et al. 2018] | 24.34 | 58.27
7 | TupleInference [Khot et al. 2017] | Allen Institute for AI | ARC score published in [Clark et al. 2018] | 23.83 | 60.81
8 | Information Retrieval Solver [Clark et al. 2016] | Allen Institute for AI | ARC score published in [Clark et al. 2018] | 20.26 | 62.55

To be added to the ARC leaderboard, please email arc@allenai.org with a link to your published or arXiv paper on this dataset.