Name | Network/Host Data | Year | Times Recently Cited¹ | TL;DR | Setting | OS Type | Labeled?² | Data Type/Source | Packed Size | Unpacked Size |
---|---|---|---|---|---|---|---|---|---|---|
AIT Alert Dataset | Both | 2023 | 2 | Alerts generated from the AIT log dataset, including labels. Only caveat is the lack of Windows machines | Enterprise IT | Linux | 🟩 | Wazuh, Suricata and AMiner alerts | 96 MB | 2,9 GB |
OTFR Security Datasets - LSASS Campaign | Both | 2023 | - | Very small simulation focusing on exploiting Windows’ LSASS.exe. Lacking documentation, no labels and no user behavior | Single OS | Windows | 🟥 | pcaps, Windows events, Zeek logs | 423 MB | 1 GB |
AIT Log Dataset | Both | 2022 | 16 | Huge variety of labeled logs collected from multiple simulation runs of an enterprise network under attack. With user emulation, but only Linux machines | Enterprise IT | Linux | 🟩 | pcaps, Suricata alerts, misc. logs (Apache, auth, dns, vpn, audit, suricata, syslog) | 130 GB | 206 GB |
CLUE-LDS | Host | 2022 | 3 | Database of real user behavior without known attacks, for evaluation of methods detecting shifts in user behavior | Subsystem | Undisclosed | 🟥 | Custom event logs | 640 MB | 14,9 GB |
EVTX to MITRE ATT&CK | Host | 2022 | - | Small dataset providing various events corresponding to certain MITRE ATT&CK tactics/techniques | Single OS | Windows | 🟩 | Windows events | <1 GB | <1 GB |
OD-IDS2022 | Network | 2022 | 8 | 30 days of traffic from two servers under attack. Large variety of attacks, but extremely lacking documentation and access has to be requested manually | Enterprise IT | Windows, Linux | 🟩 | NetFlows | - | - |
OTFR Security Datasets - Atomic | Both | 2019-2022 | - | Various small datasets, each corresponding to a specific MITRE ATT&CK tactic/technique. Lacks user simulation / underlying scenario and does not provide explicit labels | Single OS | Windows, Linux, Cloud | 🟨 | pcaps, Windows events, auditd logs, AWS CloudTrail logs | 125 MB | - |
PWNJUTSU | Both | 2022 | 8 | Rich collection of complex attacks executed by various red team participants each acting in a small network, but not labeled | Miscellaneous | Windows, Linux | 🟥 | pcaps, Windows events, Sysmon, auditd, various logs (Apache, auth, dns, ssh, etc.) | 82 GB | - |
UWF-ZeekData22 | Network | 2022 | 18 | Traffic collected from a university’s wargaming course. Covers all MITRE ATT&CK tactics, though the overwhelming majority is simple recon and attacks are poorly documented | Enterprise IT | Windows, Linux | 🟩 | pcaps, Zeek logs | - | 209 GB |
I-Sec-IDS | Network | 2021 | 0 | Small collection of NetFlows containing trivial DoS and scan attacks targeting a single host, does not feature user behavior | Single OS | Windows | 🟩 | NetFlows | 66 MB | - |
NF-UQ-NIDS | Network | 2021 | 179 | Combination of four distinct network datasets using a newly proposed set of standardized features | Miscellaneous | Windows, Linux, MacOS | 🟩 | Custom NetFlows | 2 GB | 14,8 GB |
OTFR Security Datasets - Log4Shell | Both | 2021 | - | Very small simulation focusing on the Log4j vulnerability. Lacking documentation, no explicit labels and no user behavior | Single OS | Linux | 🟨 | pcaps, Ubuntu events | <1 MB | 1 MB |
OTFR Security Datasets - SimuLand Golden SAML | Host | 2021 | - | Barely a dataset, only contains very few traces for some specific events. At most usable to test specific Windows detection rules. | Enterprise IT | Windows | 🟩 | Windows Events | - | <1 MB |
SOCBED Example Dataset | Both | 2021 | 17 | Generated using the SOCBED framework, demonstrating reproducible dataset creation, though current attacks are on the basic side | Enterprise IT | Windows, Linux | 🟥 | Windows events, Linux events, packetbeat | 78 MB | 1,3 GB |
Unraveled | Both | 2021 | 22 | Large dataset with intricate labeling, though the focus seems to be on network flows. Mapping will be annoying. | Enterprise IT | Windows, Linux | 🟩 | pcaps, misc. logs (syslog, audit, auth, Snort) | - | 22 GB |
DAPT 2020 | Both | 2020 | 45 | Focuses on attacks mimicking those of an APT group, executed in a rather small environment | Enterprise IT | Undisclosed | 🟩 | NetFlows, misc. logs (DNS, syslog, auditd, apache, auth, various services) | 460 MB | - |
OpTC | Both | 2020 | - | Huge amount of data and interesting attacks, but possibly hard to use due to uncommon event format and requiring semi-manual labeling | Enterprise IT | Windows | 🟨 | Custom event logs, Zeek events | - | 1 TB |
OTFR Security Datasets - APT 29 | Both | 2020 | - | Replication of APT29 evaluation developed by MITRE. Well made and documented, but without labels or user behavior | Enterprise IT | Windows, Linux | 🟥 | pcaps, Windows events, Zeek events | 126 MB | 2 GB |
SR-BH 2020 | Network | 2020 | 18 | Multi-label dataset assigning a variety of MITRE CAPEC classifications to requests collected from a small honeypot | Single OS | Undisclosed | 🟩 | Custom Network Features | - | 436 MB |
CICDDoS2019 | Network | 2019 | 624 | Dataset focusing on various DDoS attacks, covering a broad range of categories. Includes benign behavior, but only for Pcaps, not NetFlows | Enterprise IT | Windows, Linux | 🟩 | Pcaps, NetFlows, Windows events, Ubuntu events | 24,4 GB | - |
DARPA TC5 | Host | 2019 | - | Custom event logs from network under attack from APT groups, designed to facilitate provenance tracking | Undisclosed | Undisclosed | 🟨 | Custom event logs | - | - |
IDEA Dataset | Network | 2019 | - | One week of anonymized IDS alerts collected from three large organizations, in a normalized format (an extension of IDMEF) | Enterprise IT | Undisclosed | 🟥 | NEMEA, Suricata, TippingPoint, and other alerts (normalized & anonymized) | 1 GB | 7 GB |
LID-DS 2019 | Host | 2019 | 16 | Contains system calls + associated data/metadata for a variety of Linux exploits, includes normal behavior | Single OS | Linux | 🟨 | Sequences of syscalls with extended information | 13 GB | - |
OTFR Security Datasets - APT 3 | Host | 2019 | - | Replication of APT3 evaluation developed by MITRE. Lacking documentation, no labels and no user behavior | Enterprise IT | Windows, Linux | 🟥 | Windows events | 30 MB | 855 MB |
ASNM Datasets | Network | 2009-2018 | 6 | Specialized features extracted from instances of remote buffer overflow attacks for the purpose of anomaly-based detection | Miscellaneous | Windows, Linux | 🟩 | Custom NetFlows | 21 MB | 95 GB |
AWSCTD | Host | 2018 | 19 | Syscalls collected from ~10k malware samples running on Windows 7, no user emulation | Single OS | Windows | 🟩 | Sequences of syscall numbers | 10 MB | 558 MB |
CSE-CIC-IDS2018 | Both | 2018 | 2601 | Simulation of large enterprise IT (450 machines) with user emulation and various attacks, includes host and network logs, but only the latter are labeled | Enterprise IT | Windows, Linux, MacOS | 🟩 | pcaps, NetFlows, Windows events, Ubuntu events | 220 GB | - |
DARPA TC3 | Host | 2018 | - | Custom event logs from network under attack, designed to facilitate provenance tracking | Undisclosed | Undisclosed | 🟨 | Custom event logs | 115 GB | - |
NGIDS-DS | Both | 2018 | 2 | Enterprise network undergoing a variety of attacks using IXIA PerfectStorm hardware. Seems to lack host user behavior, does not provide raw host logs | Enterprise IT | Linux | 🟩 | pcaps, custom host features | 941 MB | 13,4 GB |
Biblio-US17 | Network | 2017 | 0 | Large number of web requests collected over 6.5 months from a production server, but heavily anonymized and only select features available | Enterprise IT | Undisclosed | 🟩 | HTTP requests (select features) | 1,1 GB | 6 GB |
CIC DoS | Network | 2017 | 145 | Dataset focusing on different DoS attacks targeting the application layer (instead of network layer), but no longer available | Enterprise IT | Linux | 🟩 | Network traffic (unknown format) | - | 4,6 GB |
CIC-IDS2017 | Network | 2017 | 2601 | Simulation of medium-sized company network under attack, focuses solely on network traffic | Enterprise IT | Windows, Linux | 🟩 | pcaps, NetFlows, custom network features | 48,4 GB | 50 GB |
Unified Host and Network Data Set | Both | 2017 | 76 | Selection of network and host events collected from operational environment, but without any attacks | Enterprise IT | Windows, Linux | 🟥 | NetFlows, Windows events | - | - |
UGR’16 | Network | 2016 | 148 | Network flows collected from real network over a long period of time, with some attack traffic injected | Enterprise IT | Undisclosed | 🟩 | NetFlows | 236 GB | - |
AWID | Network | 2015 | 293 | Traffic features collected from a home Wi-Fi network using WEP, targeted by an attacker exploiting various weaknesses of this security mechanism | Home IT | Windows, Linux, iOS | 🟩 | Custom network features | 11,7 GB | - |
Comprehensive, Multi-Source Cyber-Security Events | Both | 2015 | 84 | Various events from production network with red team activity, but extremely limited information per event | Enterprise IT | Windows, Linux | 🟩 | Custom event logs (auth, proc, network flows, dns, redteam) | 12 GB | - |
Kyoto Honeypot | Network | 2006-2015 | 153 | Collection of features derived from attack traffic targeting honeypots over the span of 9 years | Miscellaneous | Windows, Unix, MacOS | 🟩 | Custom network features | 20 GB | - |
UNSW-NB15 | Network | 2015 | 1934 | Custom network undergoing a variety of attacks using IXIA PerfectStorm hardware. Mostly geared towards anomaly-based NIDS | Undisclosed | Undisclosed | 🟩 | pcaps, custom network features | >100 GB | - |
ADFA-WD | Host | 2014 | 43 | Mostly intended for anomaly-based detection leveraging library calls, explores interesting concept of stealthy shellcode | Single OS | Windows | 🟨 | Sequences of dll calls, Windows events (dll calls only) | 403 MB | 13,6 GB |
ISCX Botnet 2014 | Network | 2004-2014 | 131 | A combination of several network traffic datasets with the goal of creating a diverse and realistic botnet dataset | Enterprise IT | Undisclosed | 🟩 | pcaps | 13,8 GB | - |
Skopik 2014 | Host | 2014 | 27 | Focus on realistically emulating user behavior, does not include attacks | Enterprise IT | Linux | 🟥 | misc. logs (Apache, database, mail server, bug tracker app) | - | - |
Twente 2014 | Both | 2014 | 25 | Anonymized network flows and host logs from real network, but only those related to ssh authentication, focusing on detecting related brute force attacks | Enterprise IT | Undisclosed | 🟩 | NetFlows | 2,42 GB | 5,8 GB |
User-Computer Associations in Time | Host | 2014 | 5 | Large number of authentication events over a period of 9 months, but with very little detail and without any attacks | Enterprise IT | Undisclosed | 🟥 | Custom auth event logs | 2,3 GB | - |
ADFA-LD | Host | 2013 | 159 | Purely intended for anomaly-based approaches, provides only syscall numbers | Single OS | Linux | 🟩 | Sequences of syscall numbers | 2 MB | 17 MB |
CIDD | Network | 2012 | 22 | Spin on the DARPA’98 dataset, correlating user behavior over different systems/environments for behavior-based IDSs | Military IT | Unix | 🟩 | Sequences of user “audits” | - | 22 GB |
ISCX IDS 2012 | Network | 2012 | 632 | Focus on realistic traffic generation in a company network, combined with some basic attacks | Enterprise IT | Windows, Linux | 🟩 | pcaps | 84 GB | 87 GB |
TUIDS | Network | 2012 | 60 | Dataset focusing on DoS attacks, but very poorly documented | Enterprise IT | Undisclosed | 🟩 | pcaps, NetFlows | - | - |
VAST Challenge 2012 | Network | 2012 | 10 | Originated from a challenge about data analytics, focus on a large network being the victim of a botnet | Enterprise IT | Undisclosed | 🟨 | Snort alerts, firewall logs | 186 MB | 2,9 GB |
CTU 13 | Network | 2011 | 462 | Collection of various botnet behavior combined with loads of background traffic, but very limited feature space | Enterprise IT | Windows, Undisclosed | 🟩 | pcaps, NetFlows, Bro logs | - | 697 GB |
VAST Challenge 2011 | Both | 2011 | - | Originated from a challenge about data analytics, focus on network but also contains host logs. Labeling is a bit lacking | Enterprise IT | Windows | 🟨 | pcaps, Windows events, misc. logs (firewall, Snort, Nessus) | 940 MB | 9,3 GB |
ISOT Botnet | Network | 2004-2010 | 98 | An amalgamation of several individual datasets, two containing malicious botnet traffic, and five datasets consisting of benign traffic | Enterprise IT | Undisclosed | 🟩 | pcaps | 3 GB | 10,6 GB |
CDX CTF 2009 | Both | 2009 | 37 | Dataset captured from a CTF event, generally intended to provide methods for reliably generating labeled datasets from such events | Enterprise IT | Windows, Linux | 🟨 | pcaps, Snort IDS alerts, Apache logs, Splunk logs | 12 GB | 15,3 GB |
NSL-KDD | Network | 2009 | 2125 | An improvement of the original KDD’99 dataset, but still outdated at its core | Military IT | Unix | 🟩 | Connection records | 6 MB | 19 MB |
Twente 2009 | Network | 2009 | 53 | Intricately labeled network flows + alerts collected from a single honeypot over the span of 6 days | Single OS | Linux | 🟩 | NetFlows | 303 MB | 1,9 GB |
gureKDDCup | Network | 2008 | 14 | An extension of the KDDCup 1999 dataset, adding additional information about payloads to each connection record | Military IT | Unix | 🟩 | Connection records with payload information | 10 GB | - |
KDD Cup 1999 | Network | 1999 | - | Network connection events derived from simulated U.S. Air Force network under attack. No longer appropriate to use for multiple reasons | Military IT | Unix | 🟩 | Connection records | 18 MB | 743 MB |
DARPA’98 Intrusion Detection Program | Both | 1998 | 149 | Simulation of a small U.S. Air Force network under attack. No longer appropriate to use for multiple reasons | Military IT | Unix | 🟨 | tcpdumps, host audit logs, file system dumps | 5 GB | - |
Legend
¹ “Times Recently Cited” counts any time the underlying publication of a given dataset has been referenced by other publications in the last five years. This data is sourced from the Semantic Scholar API and automatically updated whenever the website is re-deployed. Some datasets are not backed by a publication and thus do not show a number here. Last updated: 2025-02-07 09:49:50 UTC
² Labeling:
- 🟩: Direct; provides explicit labels on at least a portion of the contained data
- 🟨: Indirect; provides some form of ground truth that allows for manual or automatic labeling (e.g., periods of attack)
- 🟥: No labels; does not provide any form of explicit labels or information that would allow for their creation
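
The “Times Recently Cited” rule from footnote ¹ can be sketched in a few lines of Python. This is only an illustration, not the site's actual update script: the function names are hypothetical, the input is assumed to be a list of publication years of the citing papers (however they were obtained from Semantic Scholar), and treating “the last five years” as a window of calendar years is an assumption.

```python
from datetime import date


def count_recent_citations(citation_years, today=None, window_years=5):
    """Count citing publications from the last `window_years` calendar years.

    `citation_years` holds the publication year of each citing paper;
    entries may be None when the year is unknown, and those are skipped.
    The calendar-year window is an assumption made for this sketch.
    """
    current_year = (today or date.today()).year
    earliest = current_year - window_years
    return sum(
        1
        for year in citation_years
        if year is not None and earliest <= year <= current_year
    )


def format_citation_cell(citation_years, today=None):
    """Render the table cell: '-' for datasets without a backing publication."""
    if citation_years is None:
        return "-"
    return str(count_recent_citations(citation_years, today=today))


# Hypothetical years of citing papers, evaluated as of the table's last update.
example_years = [2019, 2020, 2021, 2025, None, 2010]
print(format_citation_cell(example_years, today=date(2025, 2, 7)))
print(format_citation_cell(None))
```

Passing `None` instead of a list models a dataset that is not backed by a publication, which the table shows as `-`, while an empty list yields `0`.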