# GeoSQA

> A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level

To support different tasks, we provide two JSON files: `dataset_release.json`, which includes the binary image data, and `dataset_release_no_image.json`, which omits it.
Each file has the following structure:
```json
[
    {
        "free-form_annotation": "Annotation diagram without template",
        "templated_annotation": "Annotation diagram with template",
        "category": "The category of the diagram, using symbols | to split multiple categories",
        "optionA": "The content of option A",
        "optionB": "The content of option B",
        "optionC": "The content of option C",
        "optionD": "The content of option D",
        "scenario_diagram": "Binary data of the original diagram encoded by base64",
        "question": "The content of question",
        "scenario_id": "The id of scenario, ranging from 0 to 1980",
        "question_id": "The id of question, ranging from 0 to 4109"
    },
    {
        ...
    }
]
```
You can load the data with the following Python code:
``` python
import pandas as pd
# Use "dataset_release_no_image.json" instead if the diagrams are not needed.
df = pd.read_json("dataset_release.json", orient="records", encoding="utf-8")
```
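Two fields usually need light post-processing after loading: `category` packs several categories into one string separated by `|`, and `scenario_diagram` stores the image as base64 text. The sketch below shows one way to handle both. The helper names `split_categories` and `decode_diagram` are hypothetical, and the sample record is made up for illustration rather than taken from the dataset.

```python
import base64
from typing import List


def split_categories(category: str) -> List[str]:
    """Split a `category` string on '|' into individual category names."""
    return [part.strip() for part in category.split("|") if part.strip()]


def decode_diagram(scenario_diagram: str) -> bytes:
    """Decode the base64 `scenario_diagram` field into raw image bytes."""
    return base64.b64decode(scenario_diagram)


# Hypothetical record: real records come from dataset_release.json.
record = {
    "category": "map|chart",
    "scenario_diagram": base64.b64encode(b"fake image bytes").decode("ascii"),
}

categories = split_categories(record["category"])
image_bytes = decode_diagram(record["scenario_diagram"])
print(categories, len(image_bytes))
```

The decoded bytes can be written to a file opened in binary mode or passed to an image library. The image format is not stated here, so inspect the bytes before choosing a file extension.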

We also provide `templates.json`, which contains all the categories used for labeling and the templates for each category.
The structure of `templates.json` is as follows:
``` json
[
    {
        "id": "The ID of first layer category",
        "label": "The label of first layer category",
        "children": [
            {
                "id": "The ID of second layer category",
                "label": "The label of second layer category",
                "children": [
                    ...
                ],
                "templates": [
                    "The templates of second layer category",
                    "..."
                ]
            }
        ],
        "templates": [
            "The templates of first layer category",
            "..."
        ]
    },
    {
        ...
    }
]
```
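Because the categories are nested through `children`, a recursive walk is a convenient way to flatten the tree. The following is a minimal sketch, assuming every node has a `label` and may have `children` and `templates` as in the structure above. The function name `collect_templates` and the sample tree are hypothetical; to run it on the real file, load it first with `json.load(open("templates.json", encoding="utf-8"))`.

```python
def collect_templates(nodes, path=()):
    """Yield (label_path, templates) for every category node, depth first."""
    for node in nodes:
        label_path = path + (node["label"],)
        yield label_path, node.get("templates", [])
        # Recurse into deeper category layers, if any.
        yield from collect_templates(node.get("children", []), label_path)


# Hypothetical two-layer tree shaped like templates.json.
sample = [
    {
        "id": "1",
        "label": "Map",
        "children": [
            {"id": "1-1", "label": "Contour map", "children": [], "templates": ["t2"]},
        ],
        "templates": ["t1"],
    }
]

# Map each category path (joined with " / ") to its list of templates.
by_category = {" / ".join(p): t for p, t in collect_templates(sample)}
print(by_category)
```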
In each template, the symbol `$` marks a slot, and the text in brackets explains what the slot stands for.
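The slot explanations can be pulled out of a template with a regular expression. The sketch below assumes that a slot is written as `$` immediately followed by a bracketed explanation, for example `$(place)`. The exact bracket style is an assumption, so the pattern accepts several common bracket characters. Check it against the real templates before relying on it. The function name `extract_slots` and the sample template are hypothetical.

```python
import re

# Assumption: a slot is "$" followed by a bracketed explanation.
# Round, square, curly and full-width brackets are all accepted.
SLOT_PATTERN = re.compile(r"\$\s*[(\[{（【]([^)\]}）】]*)[)\]}）】]")


def extract_slots(template: str):
    """Return the explanation text of every slot found in a template."""
    return SLOT_PATTERN.findall(template)


# Hypothetical template string, not taken from templates.json.
example = "The $(place) is located in $(region)."
print(extract_slots(example))
```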