# AI2 Reasoning Challenge (ARC)

Think you have solved question answering? Try the AI2 Reasoning Challenge (ARC)! The ARC dataset contains 7,787 genuine, grade-school-level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. The questions are partitioned as follows:

- Challenge Train: 1,119
- Challenge Dev: 299
- Challenge Test: 1,172
- Easy Train: 2,251
- Easy Dev: 570
- Easy Test: 2,376
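As a quick sanity check, the six partition sizes above sum to the stated dataset size of 7,787 questions:

```python
# Partition sizes as listed above
partitions = {
    "Challenge Train": 1119,
    "Challenge Dev": 299,
    "Challenge Test": 1172,
    "Easy Train": 2251,
    "Easy Dev": 570,
    "Easy Test": 2376,
}

total = sum(partitions.values())
print(total)  # 7787, matching the dataset size quoted above
```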

Here are the scores (% correct) on the test partition of the ARC Question Sets. Can you do better? (Scoring note: If your model predicts a k-way tie that includes the correct answer for a given question, score 1/k points for that question).
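The tie-scoring rule above can be sketched as follows (a minimal illustration, not an official scorer; the function name and set-based prediction format are assumptions for this example):

```python
def score_question(predicted_answers, correct_answer):
    """Score one question under the tie-aware rule: if the model
    predicts a k-way tie (a set of k answers) that includes the
    correct answer, award 1/k points; otherwise award 0."""
    if correct_answer in predicted_answers:
        return 1.0 / len(predicted_answers)
    return 0.0

# A single correct prediction earns full credit.
assert score_question({"B"}, "B") == 1.0
# A two-way tie that includes the correct answer earns 1/2 point.
assert score_question({"A", "B"}, "B") == 0.5
# A prediction that misses the correct answer earns nothing.
assert score_question({"A", "C"}, "B") == 0.0
```

Under this rule, guessing all four options on every question ("Guess All") earns 1/4 point per question, which is why that baseline scores 25.02 rather than 0.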

| Rank | Model | Submitted by | Challenge Set | Easy Set |
|------|-------|--------------|---------------|----------|
| 1 | DGEM [Khot et al., 2018] (ARC score published in [Clark et al. 2018]) | Allen Institute for AI | 27.11 | 58.97 |
| 2 | TableILP [Khashabi et al., 2016] (ARC score published in [Clark et al. 2018]) | Allen Institute for AI | 26.97 | 36.15 |
| 3 | BiDAF [Seo et al., 2017], reimplemented and modified for multiple choice QA (ARC score published in [Clark et al. 2018]) | Univ. of Washington & Allen Institute for AI | 26.54 | 50.11 |
| 4 | DGEM-OpenIE [Parikh et al., 2016], reimplemented and modified for multiple choice QA (ARC score published in [Clark et al. 2018]) | | 26.41 | 57.45 |
| 5 | Guess All ("random") | | 25.02 | 25.02 |
| 6 | DecompAttn [Clark et al. 2018] | Allen Institute for AI | 24.34 | 58.27 |
| 7 | TupleInference [Khot et al., 2017] (ARC score published in [Clark et al. 2018]) | Allen Institute for AI | 23.83 | 60.81 |
| 8 | Information Retrieval Solver [Clark et al. 2018] | Allen Institute for AI | 20.26 | 62.55 |

```
@article{clark2018think,