CSV Download

Download here

This .csv file is generated automatically by parsing all existing dataset entries. It can be used to sort and filter the data in a spreadsheet program, or to generate statistics and plots. The file is semicolon-delimited and contains the following fields for each dataset:

Field Name	Description
Name	Name of the dataset
Network Data	Does this dataset feature network-based data (Yes/No)
Host Data	Does this dataset feature host-based data (Yes/No)
Start Year	Year in which data collection started
End Year	Year in which data collection ended (usually the same as `Start Year`, but not always)
Setting	Setting of the underlying scenario (Single OS/Enterprise IT/Military IT/Subsystem/Miscellaneous/Undisclosed)
OS Type	OS families that were part of the underlying scenario (Windows/Linux/Unix/MacOS/Undisclosed)
Network Data Source	Source of network data (e.g., pcaps or NetFlows)
Network Data Labeled	If and how labels for network data are available
Host Data Source	Source of host data (e.g., Windows events or ssh auth logs)
Host Data Labeled	If and how labels for host data are available
Attack Categories	Types of attacks in the underlying scenario
Benign activity	How benign activity (aka “normal behavior”) was generated in the underlying scenario
Packed Size in MB	Size of the entire dataset when packed, in MB
Unpacked Size in MB	Size of the entire dataset when unpacked, in MB
Times Recently Cited	Number of times the underlying publication of the dataset was cited in the last five years, sourced from the S2 API

Note: Missing values are indicated by a single hyphen (-).
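
As a rough illustration of the format described above, the following Python sketch reads a semicolon-delimited file with the standard `csv` module and maps the single-hyphen placeholder to `None`. The sample rows, the helper names `read_datasets` and `to_int`, and the assumption that the header row uses exactly the field names from the table are all hypothetical. The real file contains more columns, and its values may need different handling.

```python
import csv
import io

MISSING = "-"  # missing values are indicated by a single hyphen


def read_datasets(handle):
    """Parse semicolon-delimited rows and replace the '-' placeholder with None."""
    reader = csv.DictReader(handle, delimiter=";")
    rows = []
    for row in reader:
        rows.append({key: (None if value == MISSING else value)
                     for key, value in row.items()})
    return rows


def to_int(value):
    """Convert a numeric field to int, keeping None for missing values.

    Assumes the field holds whole numbers; this is not verified against the real file.
    """
    return None if value is None else int(value)


# Hypothetical sample rows, NOT taken from the real file; only a few columns are shown.
SAMPLE = (
    "Name;Network Data;Host Data;Start Year;End Year;Packed Size in MB\n"
    "Example-A;Yes;No;2017;2017;1200\n"
    "Example-B;Yes;Yes;2019;2020;-\n"
    "Example-C;No;Yes;-;-;350\n"
)

datasets = read_datasets(io.StringIO(SAMPLE))

# Filter: datasets with network data whose collection started in 2018 or later.
recent_network = [
    d["Name"]
    for d in datasets
    if d["Network Data"] == "Yes"
    and to_int(d["Start Year"]) is not None
    and to_int(d["Start Year"]) >= 2018
]
print(recent_network)
```

To try this on the downloaded file, the `io.StringIO(SAMPLE)` argument could be replaced with a file handle such as `open("datasets.csv", newline="", encoding="utf-8")`, where the filename and encoding are placeholders. Converting the hyphen to `None` before any numeric conversion keeps missing years and sizes from raising errors or being counted as zero in statistics.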