Download here
This .csv
file is automatically created by parsing all currently existing dataset entries.
It can be used to sort and filter data in a spreadsheet program or generate statistics and plots.
The following fields are present for each dataset (semicolon-delimited):
Field Name | Description |
---|---|
Name | Name of the dataset |
Network Data | Does this dataset feature network-based data (Yes/No) |
Host Data | Does this dataset feature host-based data (Yes/No) |
Start Year | Year in which data collection started |
End Year | Year in which data collection ended (usually the same as Start Year , but not always) |
Setting | Setting of the underlying scenario (Single OS/Enterprise IT/Military IT/Subsystem/Miscellaneous/Undisclosed) |
OS Type | OS families that were part of the underlying scenario (Windows/Linux/Unix/MacOS/Undisclosed) |
Network Data Source | Source of network data (e.g., pcaps or NetFlows) |
Network Data Labeled | If and how labels for network data are available |
Host Data Source | Source of host data (e.g., Windows events or ssh auth logs) |
Host Data Labeled | If and how labels for host data are available |
Attack Categories | Types of attacks in the underlying scenario |
Benign activity | How benign activity (aka “normal behavior”) was generated in the underlying scenario |
Packed Size in MB | Size of the entire dataset when packed, in MB |
Unpacked Size in MB | Size of the entire dataset when unpacked, in MB |
Times Recently Cited | Number of times the underlying publication of the dataset was cited in the last five years, sourced from the S2 API |
Note: Missing values are indicated by a single hyphen (-
).