The TQA dataset encourages work on the task of Multi-Modal Machine Comprehension (M3C). The M3C task builds on the popular Visual Question Answering (VQA) and Machine Comprehension (MC) paradigms by framing question answering as a machine comprehension task, where the context needed to answer questions is provided and composed of both text and images. The dataset constructed to showcase this task has been built from a middle school science curriculum that pairs each question to a limited span of knowledge needed to answer it.
This dataset was constructed in large part from material freely available as part of ck12.org’s open-source science curriculum. It is distributed under a Creative Commons Attribution – Non-Commercial 3.0 license.
Each lesson in the dataset contains a limited amount of multimodal material that provides the information needed to answer that lesson’s text and diagram questions. Textual material is broken down by subtopic within the lesson. Figures and instructional diagrams are linked from the topic in which they appear.
The dataset is split into training, validation, and test sets at the lesson level. The training set consists of 666 lessons and 15,154 questions, the validation set consists of 200 lessons and 5,309 questions, and the test set consists of 210 lessons and 5,797 questions. Occasionally, multiple lessons overlap in the concepts they teach. Care has been taken to group these lessons before splitting the data, so as to minimize the concept overlap between splits.
Questions are separated by type in the dataset (nested under diagramQuestions and nonDiagramQuestions), and are also distinguished by global ID prefixes: DQ_ for questions with an associated diagram and NDQ_ for questions without one.
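A loader can route questions by that ID prefix. The helper below is a minimal sketch; the function name `question_kind` is our own, not part of the dataset:

```python
def question_kind(global_id):
    """Classify a TQA question by its global ID prefix.

    DQ_ marks diagram questions and NDQ_ marks non-diagram questions,
    as described above. Raises on any other prefix.
    """
    if global_id.startswith("DQ_"):
        return "diagram"
    if global_id.startswith("NDQ_"):
        return "nonDiagram"
    raise ValueError(f"unrecognized question id: {global_id}")
```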
Images in the dataset are divided into four types:
question_images (diagrams referred to in diagram questions)
abc_question_images (question diagrams generated with letter labels)
teaching_images (diagrams which have detailed descriptions)
textbook_images (images from the instructional material)
Images are referenced in the dataset as relative paths to one of these directories.
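A quick sanity check on those relative paths might look like the sketch below; the `image_category` helper is hypothetical and assumes the four directory names above appear as the first path component:

```python
from pathlib import Path

# The four image directories described above.
IMAGE_DIRS = {
    "question_images",
    "abc_question_images",
    "teaching_images",
    "textbook_images",
}

def image_category(relative_path):
    """Return which of the four image directories a relative path
    from the dataset points into, or raise if it is none of them."""
    top = Path(relative_path).parts[0]
    if top not in IMAGE_DIRS:
        raise ValueError(f"unexpected image directory: {top}")
    return top
```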
Here is the structure of the dataset json files for the train, test, and validation sets:
[ ### at the top level is a list of "lessons" that are the atomic unit of the dataset
  { ### within a lesson are several fields, including questions and their related instructional material
    "globalID": "global id string",
    "lessonName": "name of lesson",
    "questions": {
      "nonDiagramQuestions": {
        "question global ID": {
          "answerChoices": {
            "a": {
              "idStructural": "a.",
              "processedText": "cleaned answer option string",
              "rawText": "unprocessed answer option given in question"
            },
            ... additional answer options
          },
          "beingAsked": {
            "processedText": "cleaned question string",
            "rawText": "unprocessed question being asked"
          },
          "correctAnswer": {
            "processedText": "cleaned correct answer string",
            "rawText": "correct answer to question"
          },
          "globalID": "global id number",
          "idStructural": "number identifier from source material",
          "questionType": "question type"
        },
        ... additional questions
      },
      "diagramQuestions": {
        ### the same fields as non-diagram questions, plus:
        "imagePath": "relative local path to image file",
        "imageName": "image name"
      }
    },
    "instructionalDiagrams": {
      "global_id": {
        "globalID": "global id number",
        "imagePath": "relative local path to image file",
        "imageName": "image name",
        "rawText": "description of diagram",
        "processedText": "processed description of diagram"
      },
      ...
    },
    "diagramAnnotations": {
      "diagram_name": [
        {
          "text": "the ground truth content of the text box",
          "rectangle": "the location of the text box"
        },
        ...
      ]
    },
    "topics": {
      "Global Topic ID": {
        "globalID": "global id string",
        "content": {
          "figures": [
            {
              "caption": "figure caption text",
              "imagePath": "relative local path to image file"
            }
          ],
          "text": "paragraph-sized text explaining topic"
        },
        "orderID": "order of appearance within lesson",
        "mediaLinks": [
          list of YouTube / other video links extracted from the text
        ]
      }
    },
    "adjunctTopics": {
      "Vocabulary": {
        dictionary of lesson vocabulary words and their definitions
      },
      "section name": {
        "content": {
          "figures": [
            {
              "caption": "figure caption text",
              "imagePath": "relative local path to image file"
            }
          ],
          "text": "paragraph-sized text explaining topic"
        },
        "orderID": "order of appearance within lesson",
        "mediaLinks": [
          list of YouTube / other video links extracted from the text
        ]
      }
    }
  },
  ... repeated for additional lessons
]
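The structure above can be walked with ordinary JSON tooling. The sketch below counts questions of each type across a split; the file name in the usage comment is an assumption about how the release names its JSON files, not something specified here:

```python
import json

def count_questions(lessons):
    """Count diagram and non-diagram questions across a list of
    lessons parsed from one of the dataset JSON files."""
    totals = {"diagramQuestions": 0, "nonDiagramQuestions": 0}
    for lesson in lessons:
        for kind in totals:
            # Each question type is a dict keyed by question global ID.
            totals[kind] += len(lesson.get("questions", {}).get(kind, {}))
    return totals

# Typical usage (file name assumed):
# with open("tqa_train.json") as f:
#     lessons = json.load(f)
# print(count_questions(lessons))
```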
Top-Level Lesson: This is the atomic unit of the dataset, and the level at which questions are paired to instructional material, e.g. “History of Life on Earth”. The important components of a lesson are described below.
questions: questions that should be answerable given the contents of the lesson.
topics: the lesson’s textual instructional material, broken down by subtopic; each topic contains explanatory text and any figures that appear in it.
adjunctTopics: supplementary lesson material that is not a subtopic proper, such as the lesson’s vocabulary words and their definitions.
diagramAnnotations: ground-truth annotations for each diagram, giving the content and location of its text boxes.
instructionalDiagrams: teaching diagrams from the lesson, each paired with a detailed textual description.
metaLessonID: an identifier that links lessons that are similar in their concepts and subject matter. This prevents similar material from being placed in both the training and test sets.
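To reproduce that kind of grouping before splitting, lessons can be bucketed by metaLessonID. A minimal sketch (the helper name is our own):

```python
from collections import defaultdict

def group_by_meta_id(lessons):
    """Group lesson globalIDs by their metaLessonID so that
    conceptually similar lessons land in the same split."""
    groups = defaultdict(list)
    for lesson in lessons:
        groups[lesson["metaLessonID"]].append(lesson["globalID"])
    return dict(groups)
```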