All Datasets


Overview of all datasets examined so far
| Name | Network/Host | Data Year | Times Recently Cited¹ | TL;DR | Setting | OS Type | Labeled?² | Data Type/Source | Packed Size | Unpacked Size |
|---|---|---|---|---|---|---|---|---|---|---|
| AIT Alert Dataset | Both | 2023 | 2 | Alerts generated from the AIT log dataset, including labels. The only caveat is the lack of Windows machines | Enterprise IT | Linux | 🟩 | Wazuh, Suricata and AMiner alerts | 96 MB | 2.9 GB |
| OTFR Security Datasets - LSASS Campaign | Both | 2023 | - | Very small simulation focusing on exploiting Windows’ LSASS.exe. Lacking documentation, no labels and no user behavior | Single OS | Windows | 🟥 | pcaps, Windows events, Zeek logs | 423 MB | 1 GB |
| AIT Log Dataset | Both | 2022 | 16 | Huge variety of labeled logs collected from multiple simulation runs of an enterprise network under attack. Includes user emulation, but only Linux machines | Enterprise IT | Linux | 🟩 | pcaps, Suricata alerts, misc. logs (Apache, auth, dns, vpn, audit, suricata, syslog) | 130 GB | 206 GB |
| CLUE-LDS | Host | 2022 | 3 | Database of real user behavior without known attacks, for evaluating methods that detect shifts in user behavior | Subsystem | Undisclosed | 🟥 | Custom event logs | 640 MB | 14.9 GB |
| EVTX to MITRE ATT&CK | Host | 2022 | - | Small dataset providing various events corresponding to certain MITRE ATT&CK tactics/techniques | Single OS | Windows | 🟩 | Windows events | <1 GB | <1 GB |
| OD-IDS2022 | Network | 2022 | 8 | 30 days of traffic from two servers under attack. Large variety of attacks, but documentation is extremely sparse and access has to be requested manually | Enterprise IT | Windows, Linux | 🟩 | NetFlows | - | - |
| OTFR Security Datasets - Atomic | Both | 2019-2022 | - | Various small datasets, each corresponding to a specific MITRE ATT&CK tactic/technique. Lacks user simulation / underlying scenario and does not provide explicit labels | Single OS | Windows, Linux, Cloud | 🟨 | pcaps, Windows events, auditd logs, AWS CloudTrail logs | 125 MB | - |
| PWNJUTSU | Both | 2022 | 8 | Rich collection of complex attacks executed by various red team participants, each acting in a small network, but not labeled | Miscellaneous | Windows, Linux | 🟥 | pcaps, Windows events, Sysmon, auditd, various logs (Apache, auth, dns, ssh, etc.) | 82 GB | - |
| UWF-ZeekData22 | Network | 2022 | 18 | Traffic collected from a university’s wargaming course. Covers all MITRE ATT&CK tactics, though the overwhelming majority is simple recon and attacks are poorly documented | Enterprise IT | Windows, Linux | 🟩 | pcaps, Zeek logs | - | 209 GB |
| I-Sec-IDS | Network | 2021 | 0 | Small collection of NetFlows containing trivial DoS and scan attacks targeting a single host; does not feature user behavior | Single OS | Windows | 🟩 | NetFlows | 66 MB | - |
| NF-UQ-NIDS | Network | 2021 | 179 | Combination of four distinct network datasets using a newly proposed set of standardized features | Miscellaneous | Windows, Linux, MacOS | 🟩 | Custom NetFlows | 2 GB | 14.8 GB |
| OTFR Security Datasets - Log4Shell | Both | 2021 | - | Very small simulation focusing on the Log4j vulnerability. Lacking documentation, no explicit labels and no user behavior | Single OS | Linux | 🟨 | pcaps, Ubuntu events | <1 MB | 1 MB |
| OTFR Security Datasets - SimuLand Golden SAML | Host | 2021 | - | Barely a dataset; only contains very few traces for some specific events. At most usable to test specific Windows detection rules | Enterprise IT | Windows | 🟩 | Windows events | - | <1 MB |
| SOCBED Example Dataset | Both | 2021 | 17 | Generated using the SOCBED framework, demonstrating reproducible dataset creation, though current attacks are on the basic side | Enterprise IT | Windows, Linux | 🟥 | Windows events, Linux events, Packetbeat | 78 MB | 1.3 GB |
| Unraveled | Both | 2021 | 22 | Large dataset with intricate labeling, though the focus seems to be on network flows. Mapping will be tedious | Enterprise IT | Windows, Linux | 🟩 | pcaps, misc. logs (syslog, audit, auth, Snort) | - | 22 GB |
| DAPT 2020 | Both | 2020 | 45 | Focuses on attacks mimicking those of an APT group, executed in a rather small environment | Enterprise IT | Undisclosed | 🟩 | NetFlows, misc. logs (DNS, syslog, auditd, apache, auth, various services) | 460 MB | - |
| OpTC | Both | 2020 | - | Huge amount of data and interesting attacks, but possibly hard to use due to an uncommon event format and the need for semi-manual labeling | Enterprise IT | Windows | 🟨 | Custom event logs, Zeek events | - | 1 TB |
| OTFR Security Datasets - APT 29 | Both | 2020 | - | Replication of the APT29 evaluation developed by MITRE. Well made and documented, but without labels or user behavior | Enterprise IT | Windows, Linux | 🟥 | pcaps, Windows events, Zeek events | 126 MB | 2 GB |
| SR-BH 2020 | Network | 2020 | 18 | Multi-label dataset assigning a variety of MITRE CAPEC classifications to requests collected from a small honeypot | Single OS | Undisclosed | 🟩 | Custom network features | - | 436 MB |
| CICDDoS2019 | Network | 2019 | 624 | Dataset focusing on various DDoS attacks, covering a broad range of categories. Includes benign behavior, but only for pcaps, not NetFlows | Enterprise IT | Windows, Linux | 🟩 | pcaps, NetFlows, Windows events, Ubuntu events | 24.4 GB | - |
| DARPA TC5 | Host | 2019 | - | Custom event logs from a network under attack by APT groups, designed to facilitate provenance tracking | Undisclosed | Undisclosed | 🟨 | Custom event logs | - | - |
| IDEA Dataset | Network | 2019 | - | One week of anonymized IDS alerts collected from three large organizations, in a normalized format (an extension of IDMEF) | Enterprise IT | Undisclosed | 🟥 | NEMEA, Suricata, TippingPoint, and other alerts (normalized & anonymized) | 1 GB | 7 GB |
| LID-DS 2019 | Host | 2019 | 16 | Contains system calls + associated data/metadata for a variety of Linux exploits; includes normal behavior | Single OS | Linux | 🟨 | Sequences of syscalls with extended information | 13 GB | - |
| OTFR Security Datasets - APT 3 | Host | 2019 | - | Replication of the APT3 evaluation developed by MITRE. Lacking documentation, no labels and no user behavior | Enterprise IT | Windows, Linux | 🟥 | Windows events | 30 MB | 855 MB |
| ASNM Datasets | Network | 2009-2018 | 6 | Specialized features extracted from instances of remote buffer overflow attacks for the purpose of anomaly-based detection | Miscellaneous | Windows, Linux | 🟩 | Custom NetFlows | 21 MB | 95 GB |
| AWSCTD | Host | 2018 | 19 | Syscalls collected from ~10k malware samples running on Windows 7, no user emulation | Single OS | Windows | 🟩 | Sequences of syscall numbers | 10 MB | 558 MB |
| CSE-CIC-IDS2018 | Both | 2018 | 2601 | Simulation of a large enterprise IT network (450 machines) with user emulation and various attacks; includes host and network logs, but only the latter are labeled | Enterprise IT | Windows, Linux, MacOS | 🟩 | pcaps, NetFlows, Windows events, Ubuntu events | 220 GB | - |
| DARPA TC3 | Host | 2018 | - | Custom event logs from a network under attack, designed to facilitate provenance tracking | Undisclosed | Undisclosed | 🟨 | Custom event logs | 115 GB | - |
| NGIDS-DS | Both | 2018 | 2 | Enterprise network undergoing a variety of attacks using IXIA PerfectStorm hardware. Seems to lack host user behavior; does not provide raw host logs | Enterprise IT | Linux | 🟩 | pcaps, custom host features | 941 MB | 13.4 GB |
| Biblio-US17 | Network | 2017 | 0 | Large number of web requests collected over 6.5 months from a production server, but heavily anonymized and only select features available | Enterprise IT | Undisclosed | 🟩 | HTTP requests (select features) | 1.1 GB | 6 GB |
| CIC DoS | Network | 2017 | 145 | Dataset focusing on different DoS attacks targeting the application layer (instead of the network layer), but no longer available | Enterprise IT | Linux | 🟩 | Network traffic (unknown format) | - | 4.6 GB |
| CIC-IDS2017 | Network | 2017 | 2601 | Simulation of a medium-sized company network under attack; focuses solely on network traffic | Enterprise IT | Windows, Linux | 🟩 | pcaps, NetFlows, custom network features | 48.4 GB | 50 GB |
| Unified Host and Network Data Set | Both | 2017 | 76 | Selection of network and host events collected from an operational environment, but without any attacks | Enterprise IT | Windows, Linux | 🟥 | NetFlows, Windows events | - | - |
| UGR’16 | Network | 2016 | 148 | Network flows collected from a real network over a long period of time, with some attack traffic injected | Enterprise IT | Undisclosed | 🟩 | NetFlows | 236 GB | - |
| AWID | Network | 2015 | 293 | Traffic features collected from a home Wi-Fi network using WEP, targeted by an attacker exploiting various weaknesses of this security mechanism | Home IT | Windows, Linux, iOS | 🟩 | Custom network features | 11.7 GB | - |
| Comprehensive, Multi-Source Cyber-Security Events | Both | 2015 | 84 | Various events from a production network with red team activity, but extremely limited information per event | Enterprise IT | Windows, Linux | 🟩 | Custom event logs (auth, proc, network flows, dns, redteam) | 12 GB | - |
| Kyoto Honeypot | Network | 2006-2015 | 153 | Collection of features derived from attack traffic targeting honeypots over the span of 9 years | Miscellaneous | Windows, Unix, MacOS | 🟩 | Custom network features | 20 GB | - |
| UNSW-NB15 | Network | 2015 | 1934 | Custom network undergoing a variety of attacks using IXIA PerfectStorm hardware. Mostly geared towards anomaly-based NIDS | Undisclosed | Undisclosed | 🟩 | pcaps, custom network features | >100 GB | - |
| ADFA-WD | Host | 2014 | 43 | Mostly intended for anomaly-based approaches leveraging library calls; explores the interesting concept of stealthy shellcode | Single OS | Windows | 🟨 | Sequences of DLL calls, Windows events (DLL calls only) | 403 MB | 13.6 GB |
| ISCX Botnet 2014 | Network | 2004-2014 | 131 | A combination of several network traffic datasets with the goal of creating a diverse and realistic botnet dataset | Enterprise IT | Undisclosed | 🟩 | pcaps | 13.8 GB | - |
| Skopik 2014 | Host | 2014 | 27 | Focus on realistically emulating user behavior; does not include attacks | Enterprise IT | Linux | 🟥 | misc. logs (Apache, database, mail server, bug tracker app) | - | - |
| Twente 2014 | Both | 2014 | 25 | Anonymized network flows and host logs from a real network, but only those related to SSH authentication, focusing on detecting related brute-force attacks | Enterprise IT | Undisclosed | 🟩 | NetFlows | 2.42 GB | 5.8 GB |
| User-Computer Associations in Time | Host | 2014 | 5 | Large number of authentication events over a period of 9 months, but with very little detail and without any attacks | Enterprise IT | Undisclosed | 🟥 | Custom auth event logs | 2.3 GB | - |
| ADFA-LD | Host | 2013 | 159 | Purely intended for anomaly-based approaches; provides only syscall numbers | Single OS | Linux | 🟩 | Sequences of syscall numbers | 2 MB | 17 MB |
| CIDD | Network | 2012 | 22 | Spin on the DARPA’98 dataset, correlating user behavior over different systems/environments for behavior-based IDSs | Military IT | Unix | 🟩 | Sequences of user “audits” | - | 22 GB |
| ISCX IDS 2012 | Network | 2012 | 632 | Focus on realistic traffic generation in a company network, combined with some basic attacks | Enterprise IT | Windows, Linux | 🟩 | pcaps | 84 GB | 87 GB |
| TUIDS | Network | 2012 | 60 | Dataset focusing on DoS attacks, but very poorly documented | Enterprise IT | Undisclosed | 🟩 | pcaps, NetFlows | - | - |
| VAST Challenge 2012 | Network | 2012 | 10 | Originated from a data analytics challenge; focuses on a large network that falls victim to a botnet | Enterprise IT | Undisclosed | 🟨 | Snort alerts, firewall logs | 186 MB | 2.9 GB |
| CTU 13 | Network | 2011 | 462 | Collection of various botnet behavior combined with loads of background traffic, but very limited feature space | Enterprise IT | Windows, Undisclosed | 🟩 | pcaps, NetFlows, Bro logs | - | 697 GB |
| VAST Challenge 2011 | Both | 2011 | - | Originated from a data analytics challenge; focuses on the network but also contains host logs. Labeling is a bit lacking | Enterprise IT | Windows | 🟨 | pcaps, Windows events, misc. logs (firewall, Snort, Nessus) | 940 MB | 9.3 GB |
| ISOT Botnet | Network | 2004-2010 | 98 | An amalgamation of several individual datasets, two containing malicious botnet traffic and five consisting of benign traffic | Enterprise IT | Undisclosed | 🟩 | pcaps | 3 GB | 10.6 GB |
| CDX CTF 2009 | Both | 2009 | 37 | Dataset captured from a CTF event, generally intended to provide methods for reliably generating labeled datasets from such events | Enterprise IT | Windows, Linux | 🟨 | pcaps, Snort IDS alerts, Apache logs, Splunk logs | 12 GB | 15.3 GB |
| NSL-KDD | Network | 2009 | 2125 | An improvement of the original KDD’99 dataset, but still outdated at its core | Military IT | Unix | 🟩 | Connection records | 6 MB | 19 MB |
| Twente 2009 | Network | 2009 | 53 | Intricately labeled network flows + alerts collected from a single honeypot over the span of 6 days | Single OS | Linux | 🟩 | NetFlows | 303 MB | 1.9 GB |
| gureKDDCup | Network | 2008 | 14 | An extension of the KDD Cup 1999 dataset, adding information about payloads to each connection record | Military IT | Unix | 🟩 | Connection records with payload information | 10 GB | - |
| KDD Cup 1999 | Network | 1999 | - | Network connection events derived from a simulated U.S. Air Force network under attack. No longer appropriate to use for multiple reasons | Military IT | Unix | 🟩 | Connection records | 18 MB | 743 MB |
| DARPA’98 Intrusion Detection Program | Both | 1998 | 149 | Simulation of a small U.S. Air Force network under attack. No longer appropriate to use for multiple reasons | Military IT | Unix | 🟨 | tcpdumps, host audit logs, file system dumps | 5 GB | - |

Legend

¹ “Times Recently Cited” is the number of times the underlying publication of a given dataset has been cited by other publications in the last five years. This data is sourced from the Semantic Scholar API and updated automatically whenever the website is re-deployed. Some datasets are not backed by a publication and therefore show “-” instead of a number. Last updated: 2025-02-07 09:49:50 UTC
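A count like the one in footnote ¹ could be reproduced roughly as follows. This is a minimal sketch, not the site's actual tooling: it assumes the public Semantic Scholar Graph API citations endpoint (`paper/{paper_id}/citations` with `fields=year`, paginated via `offset`, `limit` and a `next` field), and it assumes “the last five years” means the current calendar year plus the four preceding ones, since the exact cutoff is not documented here. `fetch_citing_years` and `count_recent_citations` are hypothetical helper names; rate limits, API keys and retries are not handled.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, List, Optional

# Assumed public Graph API endpoint; the site's own deployment script may differ.
CITATIONS_URL = "https://api.semanticscholar.org/graph/v1/paper/{paper_id}/citations"


def fetch_citing_years(paper_id: str, page_size: int = 1000) -> List[Optional[int]]:
    """Return the publication year of every citing paper (None if unknown).

    Hypothetical helper: follows offset-based pagination until the response
    no longer contains a `next` offset. No rate limiting or retries.
    """
    years: List[Optional[int]] = []
    offset = 0
    while True:
        query = urllib.parse.urlencode({"fields": "year", "offset": offset, "limit": page_size})
        url = CITATIONS_URL.format(paper_id=paper_id) + "?" + query
        with urllib.request.urlopen(url) as response:
            page = json.load(response)
        for item in page.get("data", []):
            years.append(item.get("citingPaper", {}).get("year"))
        if "next" not in page:
            return years
        offset = page["next"]


def count_recent_citations(years: Iterable[Optional[int]], current_year: int, window: int = 5) -> int:
    """Count citations from the `window` most recent calendar years, including current_year.

    Assumed cutoff; citing papers without a known year are ignored.
    """
    earliest = current_year - window + 1
    return sum(1 for year in years if year is not None and earliest <= year <= current_year)
```

For example, `count_recent_citations(fetch_citing_years(paper_id), 2025)` would count citations from 2021 through 2025, where `paper_id` is a Semantic Scholar paper ID or a prefixed identifier such as `DOI:<doi>`. Keeping the network call separate from the counting logic makes the cutoff rule easy to test and change.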

² Labeling:

  • 🟩: Direct; provides explicit labels on at least a portion of the contained data
  • 🟨: Indirect; provides some form of ground truth that allows for manual or automatic labeling (e.g., periods of attack)
  • 🟥: No labels; does not provide any form of explicit labels or information that would allow for their creation
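For datasets marked 🟨, the ground truth (for example, periods of attack) still has to be turned into per-event labels. One common approach is to label each event by the attack window its timestamp falls into. The sketch below assumes the ground truth is available as (start, end, label) windows with inclusive bounds and that events carry comparable timestamps; `AttackWindow` and `label_events` are hypothetical names, not part of any listed dataset's tooling.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Sequence


@dataclass(frozen=True)
class AttackWindow:
    """Hypothetical ground-truth record: one attack step and when it ran."""
    start: datetime
    end: datetime
    label: str


def label_events(
    timestamps: Sequence[datetime],
    windows: Sequence[AttackWindow],
    benign: str = "benign",
) -> List[str]:
    """Give each event the label of the first window containing it, else `benign`.

    Bounds are inclusive. A linear scan per event keeps the sketch simple;
    overlapping windows resolve to whichever appears first in `windows`.
    """
    labels: List[str] = []
    for ts in timestamps:
        match = next((w.label for w in windows if w.start <= ts <= w.end), benign)
        labels.append(match)
    return labels
```

Purely time-based labeling marks any concurrent benign activity inside an attack window as malicious, so in practice it is often combined with host or IP address matching where the ground truth provides them.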