# Datasets

{% hint style="info" %}
We prepared a [boilerplate primer](https://www.kaggle.com/code/danilz/datathon-1-boilerplate) to help you jump into action. Feel free to use it as a starting point and tinker with it to get better results!
{% endhint %}

### Train Dataset

The train dataset contains financial data points for 13,104 publicly traded companies, drawn from their quarterly and annual financial reports. It is compiled from each company's 5 latest quarterly reports and 4 latest annual reports, and reflects financial components extracted from the corresponding balance sheets and income statements.

### Test Dataset

The test dataset contains 3,227 companies, with features in the same format as the train dataset.

### Column Ordering

Columns starting with Q\_($$n$$) (where $$n$$ is the number of the quarter) contain the companies' quarterly reported financial components.

Columns starting with Y\_($$n$$) (where $$n$$ is the number of the annual report) contain the companies' annually reported financial components.

All other columns hold metadata for each company.

Columns starting with:

* `Q_0` are present only in **targets\_train.csv** and contain financial components for the latest quarter (the one closest to today)
* `Q_1` are part of **X\_train.csv** and **X\_test.csv** and contain financial components for the quarter immediately before `Q_0`
* `Q_4` are part of **X\_train.csv** and **X\_test.csv** and contain financial components for the earliest reported quarter, 4 quarters before `Q_0`
* `Y_0` are part of **X\_train.csv** and **X\_test.csv** and contain financial components from the latest annual report
* `Y_3` are part of **X\_train.csv** and **X\_test.csv** and contain financial components from the earliest annual report, 3 years before `Y_0`
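The prefix convention above lends itself to grouping columns by reporting period. The following is a minimal sketch of one way to do that, not an official utility. It assumes only that period columns begin with `Q_<n>` or `Y_<n>`; the function name `group_columns_by_period` and the example column names (such as `Q_1_Revenue`) are hypothetical, so check **data\_dictionary.txt** for the real names.

```python
import re
from collections import defaultdict

# Matches a leading period prefix such as "Q_1" or "Y_3".
PERIOD_PREFIX = re.compile(r"^([QY])_(\d+)")


def group_columns_by_period(columns):
    """Map each period prefix (e.g. 'Q_1') to its columns.

    Columns without a Q_/Y_ prefix are collected under the key 'meta'.
    """
    groups = defaultdict(list)
    for name in columns:
        match = PERIOD_PREFIX.match(name)
        key = f"{match.group(1)}_{match.group(2)}" if match else "meta"
        groups[key].append(name)
    return dict(groups)


# Hypothetical column names, for illustration only.
example_columns = ["Q_1_Revenue", "Q_4_Revenue", "Y_0_Revenue", "Y_3_Revenue", "industry"]
groups = group_columns_by_period(example_columns)
```

Grouping by prefix keeps the quarterly and annual blocks separate, which makes it easier to build per-period features or to inspect one reporting period at a time.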

Each quarter and each year (except for `Q_0`) contains 146 financial components. Please refer to **data\_dictionary.txt** for details.

There are 16 targets (**targets\_train.csv**), which represent the latest financial data points for each company. Participants need to train one or more models that map the companies' historical financial performance (**X\_train.csv**) to their latest financial indicators.
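As a point of comparison for any trained model, a naive persistence baseline can be useful: predict each `Q_0` target with the value the same component had in `Q_1`. The sketch below is not the organizers' method. It assumes that a target column named `Q_0_<component>` has a matching feature column `Q_1_<component>`, which this page does not confirm; the function name and the column names are hypothetical.

```python
def persistence_baseline(row, target_columns):
    """Predict each Q_0 target with the same component's Q_1 value.

    `row` is a dict of feature name -> value for one company.
    A prediction is None when the matching Q_1 feature is missing.
    """
    predictions = {}
    for target in target_columns:
        # Assumed naming: 'Q_0_<component>' corresponds to 'Q_1_<component>'.
        source = "Q_1" + target[len("Q_0"):]
        predictions[target] = row.get(source)
    return predictions


# Hypothetical feature row and target names, for illustration only.
row = {"Q_1_Revenue": 120.0, "Q_2_Revenue": 110.0}
preds = persistence_baseline(row, ["Q_0_Revenue", "Q_0_NetIncome"])
```

A model that cannot beat this baseline on held-out training companies is probably not yet extracting signal from the older quarters and annual reports.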

### Files

* [**X\_train.csv**](https://www.dropbox.com/scl/fi/8rajpkxgb3fwo9eketss9/X_train.csv.zip?rlkey=rimqpqqsizn3qh2sziit7izji\&st=pgro20v5\&dl=0) - training features
* [**targets\_train.csv**](https://www.dropbox.com/scl/fi/i4lr3tq4mvybnw6lmcz6w/targets_train.csv.zip?rlkey=hd0jns628vv60c7rup2v26567\&st=y2yogzt8\&dl=0) - training targets
* [**X\_test.csv**](https://www.dropbox.com/scl/fi/vz4mtb37nsvre3gm02z3g/X_test.csv.zip?rlkey=fxbsjb34vakjugsnv1g2knj9l\&st=deup9f7l\&dl=0) - testing features
* [**sample\_submission.csv**](https://www.dropbox.com/scl/fi/hzdze2fk8v8k361jsobb4/sample_submission.csv.zip?rlkey=0466hwqkzjoq4q8tsdn9778j7\&st=m1x1ylnz\&dl=0) - a sample submission file in the correct format
* [**data\_dictionary.txt**](https://www.dropbox.com/scl/fi/82ukcy3q3qb4olnlayqv7/data_dictionary.txt?rlkey=g3aes0u1qm8yis30hrbrla8tv\&st=l2zop6vw\&dl=0) - detailed data points description

Please refer to **data\_dictionary.txt** for detailed column descriptions.
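The CSV files above are distributed as `.zip` archives. The sketch below shows one way to read such an archive with the Python standard library only. It assumes each archive contains a single UTF-8 encoded `.csv` member, which is not confirmed here; the helper name `read_csv_from_zip` and the demo data are hypothetical.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_source):
    """Return (header, rows) from the first .csv member of a zip archive.

    `zip_source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        csv_names = [
            name for name in archive.namelist()
            if name.endswith(".csv") and not name.startswith("__MACOSX/")
        ]
        if not csv_names:
            raise ValueError("no .csv file found in archive")
        with archive.open(csv_names[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text)
            header = next(reader)
            rows = list(reader)
    return header, rows


# Demo with a tiny in-memory archive (made-up data, for illustration only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("X_train.csv", "Q_1_Revenue,Q_2_Revenue\n120.0,110.0\n")
buffer.seek(0)
header, rows = read_csv_from_zip(buffer)
```

With the real downloads, the same helper could be called with a path such as `read_csv_from_zip("X_train.csv.zip")`. All values come back as strings, so numeric columns still need converting.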


