Datasets

We prepared a boilerplate primer to make it easier for you to jump into action! Feel free to use it as a starting point and tinker with it to get better results!

Train Dataset

The train dataset contains financial data points for 13,104 publicly traded companies, based on their quarterly and annual financial reports. It is compiled from each company's 5 latest quarterly reports and 4 latest annual reports, and reflects financial components extracted from the corresponding balance sheets and income statements.

Test Dataset

The test dataset contains 3,227 companies with their corresponding features, in the same format as the train dataset.

The columns are organized as follows:

Columns starting with Q_n (where n is the number of the quarter) contain the companies' quarterly reported financial components.

Columns starting with Y_n (where n is the number of the annual report) contain the companies' annually reported financial components.

All other columns contain metadata for each company.

Columns starting with:

Q_0 are present only in targets_train.csv and contain financial components for the latest quarter (the one closest to today)
Q_1 are part of X_train.csv and X_test.csv and contain financial components for the quarter immediately before Q_0
Q_4 are part of X_train.csv and X_test.csv and contain financial components for the earliest reported quarter, 4 quarters before Q_0
Y_0 are part of X_train.csv and X_test.csv and contain financial components from the latest annual report
Y_3 are part of X_train.csv and X_test.csv and contain financial components from the earliest annual report, 3 years before Y_0
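
The prefix convention above can be used to split a feature table into per-period blocks. The following is a minimal sketch, not part of the official primer. The column names (Q_1_revenue, industry, and so on) are hypothetical placeholders, and the helper group_columns_by_period is an illustrative name; the real column names are listed in data_dictionary.txt.

```python
import re
from collections import defaultdict

# Matches a leading period prefix such as "Q_1" or "Y_3".
PERIOD_PREFIX = re.compile(r"^([QY]_\d+)")


def group_columns_by_period(columns):
    """Group column names by their Q_n / Y_n prefix.

    Columns without such a prefix are collected under "metadata".
    """
    groups = defaultdict(list)
    for name in columns:
        match = PERIOD_PREFIX.match(name)
        key = match.group(1) if match else "metadata"
        groups[key].append(name)
    return dict(groups)


# Hypothetical column names, used only to illustrate the grouping.
example_columns = [
    "Q_1_revenue",
    "Q_4_revenue",
    "Y_0_revenue",
    "Y_3_assets",
    "industry",
]
groups = group_columns_by_period(example_columns)
```

With the real data, the list of columns would come from the header row of X_train.csv or X_test.csv.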

Each quarter and each year (except for Q_0) contains 146 financial components. Please refer to data_dictionary.txt for details.

There are 16 targets (targets_train.csv), which represent the latest financial data points for each company. Participants need to train one or more models that map the companies' historical financial performance (X_train.csv) to their latest financial indicators.
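
Before training a model, it can help to have a trivial reference point. The sketch below is a naive carry-forward (persistence) baseline: it predicts each Q_0 target with the value of the same component in Q_1. This is not the organizers' method. It assumes, hypothetically, that a target named Q_0_<component> has a matching feature named Q_1_<component>; check data_dictionary.txt for the actual names before relying on this pairing.

```python
def persistence_baseline(feature_row, target_names):
    """Predict each Q_0 target by carrying forward the matching Q_1 value.

    feature_row: dict mapping feature column names to values for one company.
    target_names: list of target column names, assumed to start with "Q_0_".
    """
    predictions = {}
    for target in target_names:
        # Hypothetical naming: "Q_0_revenue" pairs with "Q_1_revenue".
        source = target.replace("Q_0_", "Q_1_", 1)
        # None signals that no matching Q_1 feature exists for this target.
        predictions[target] = feature_row.get(source)
    return predictions


# Made-up values for a single company, for illustration only.
row = {"Q_1_revenue": 120.0, "Q_1_net_income": 15.0, "Q_2_revenue": 110.0}
preds = persistence_baseline(row, ["Q_0_revenue", "Q_0_net_income"])
```

Any trained model should be expected to do at least as well as this baseline on held-out training rows.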

Files

X_train.csv - training features
targets_train.csv - training targets
X_test.csv - testing features
sample_submission.csv - a sample submission file in the correct format
data_dictionary.txt - detailed data points description

Please refer to data_dictionary.txt for detailed column descriptions.
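
As a quick sanity check after downloading the files, the features and targets can be read and compared row by row. This is a minimal sketch using only the Python standard library. It assumes that rows in X_train.csv and targets_train.csv are aligned by position, which this page does not state explicitly, and it uses small in-memory stand-ins with hypothetical column names instead of the real files.

```python
import csv
import io


def read_table(handle):
    """Read an open CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def check_row_alignment(features, targets):
    """Return the shared row count, or raise if the two tables differ in length."""
    if len(features) != len(targets):
        raise ValueError(
            f"row count mismatch: {len(features)} features vs {len(targets)} targets"
        )
    return len(features)


# In-memory stand-ins with hypothetical columns. With the real files, pass
# open("X_train.csv", newline="") and open("targets_train.csv", newline="").
features = read_table(io.StringIO("Q_1_revenue,Y_0_revenue\n120.0,450.0\n95.5,380.0\n"))
targets = read_table(io.StringIO("Q_0_revenue\n125.0\n97.0\n"))
n_rows = check_row_alignment(features, targets)
```

Note that csv.DictReader returns every value as a string, so numeric columns need to be converted before modelling.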


Last updated 11 months ago