Network Data Source | Suricata, Wazuh and AMiner alerts |
Network Data Labeled | Yes |
Host Data Source | Wazuh and AMiner alerts |
Host Data Labeled | Yes |
Overall Setting | Enterprise Network |
OS Types | Ubuntu 20.04 |
Number of Machines | 9-27 |
Total Runtime | 4-6 days per simulation, 8 simulations total |
Year of Collection | 2023 |
Attack Categories | Reconnaissance, Privilege Escalation, Data Exfiltration, Web-based Attacks, Remote Command Execution |
Benign Activity | Synthetic, models complex behavior |
Packed Size | 96 MB |
Unpacked Size | 2.9 GB |
Download Link | goto |
Overview
The AIT Alert Data Set (AIT-ADS) is a collection of synthetic alerts designed for evaluating various alert-related tasks like aggregation, correlation, filtering, and generating attack graphs. These alerts come from the AIT Log Data Set V2 (AIT-LDSv2) and are sourced from three intrusion detection systems: Suricata, Wazuh, and AMiner. The dataset includes eight scenarios, each targeted by a multistep attack that involves actions such as web exploits, password cracking, and privilege escalation. Each attack scenario has variations, allowing the dataset to be used to assess the similarity and merging of attack chains.
Environment
For an overview of the original environment, as well as differences between scenarios, refer to the AIT Log Dataset entry (see Related Entries).
Activity
Attacks and benign behavior correspond to those in the AIT Log Dataset (see Related Entries):
- Scans (nmap, WPScan, dirb)
- Webshell upload (CVE-2020-24186)
- Password Cracking (John the Ripper)
- Privilege Escalation
- Remote Command Execution
- Data Exfiltration
Contained Data
The dataset contains alerts generated by applying two different frameworks, Wazuh and AMiner, to the original AIT-LDSv2. Wazuh is rule-based, while AMiner is anomaly-based and requires training before it can be applied. The dataset also contains Suricata alerts indirectly: they are part of AIT-LDSv2 and are picked up by Wazuh, which generates a separate new alert for each of them.
For each of the eight scenarios (fox, harrison, russellmitchell, santos, shaw, wardbeck, wheeler, wilson), two separate files exist, one containing the AMiner alerts and one containing the Wazuh alerts, e.g. fox_aminer.json and fox_wazuh.json.
Labels are provided via a separate .csv file, which looks like this:
time,name,ip,host,short,time_label,event_label
[...]
1642996679,AMiner: New request method in Apache Access log.,10.143.2.4,intranet_server,A-Acc-Val1,wpscan,wpscan
[...]
The epoch timestamps (here, 1642996679) have to be matched against those found in the alerts, which are then labeled with time_label and/or event_label.
The latter label is more precise and should be preferred, as it is based on the labeling of the original logs, while the former simply uses the time intervals of the attack executions to label alerts.
Note that Wazuh alerts contain both an epoch and an ISO timestamp, and the two differ. You MUST use the ISO timestamp (converted to epoch) for matching.
The epoch timestamp in a Wazuh alert records when that alert was generated, which was roughly a year after the original log dataset was created, so it will not match the timestamps in the ground truth.
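The matching step for Wazuh alerts can be sketched as follows. This is a minimal illustration, not the dataset authors' tooling: the helper names `wazuh_epoch` and `index_labels` are hypothetical, the ISO timestamp is assumed to be UTC in the format shown in the Data Examples, and the lookup keys on the timestamp only. Several label rows can share a timestamp, so the returned candidates may need further narrowing, for example by the name or ip column.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timezone


def wazuh_epoch(alert):
    """Epoch seconds from the ISO `@timestamp` of a Wazuh alert (assumed UTC)."""
    dt = datetime.strptime(alert["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return int(dt.replace(tzinfo=timezone.utc).timestamp())


def index_labels(csv_file):
    """Group label rows by their epoch `time` column."""
    index = defaultdict(list)
    for row in csv.DictReader(csv_file):
        index[int(float(row["time"]))].append(row)
    return index


# Label row copied from the excerpt above.
labels_csv = io.StringIO(
    "time,name,ip,host,short,time_label,event_label\n"
    "1642996679,AMiner: New request method in Apache Access log.,"
    "10.143.2.4,intranet_server,A-Acc-Val1,wpscan,wpscan\n"
)
labels = index_labels(labels_csv)

# Hypothetical alert reduced to the one field used here.
alert = {"@timestamp": "2022-01-24T03:57:59.000000Z"}
candidates = labels.get(wazuh_epoch(alert), [])
for row in candidates:
    # Prefer event_label over time_label, as recommended above.
    print(row["name"], row["event_label"])
```

An alert whose timestamp has no entry in the index yields an empty candidate list. How unlabeled alerts should be treated is left to the user.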
Papers
- Introducing a New Alert Data Set for Multi-Step Attack Analysis (2023)
- Maintainable Log Datasets for Evaluation of Intrusion Detection Systems (2023)
Links
Related entries
Data Examples
Example of a Wazuh alert, stored in .jsonl files.
{
"predecoder": {
"hostname": "hayes-mail",
"program_name": "dovecot",
"timestamp": "Jan 15 19:23:33"
},
"agent": {
"ip": "10.229.2.25",
"name": "wazuh-client",
"id": "21"
},
"manager": {
"name": "wazuh.manager"
},
"rule": {
"mail": false,
"level": 3,
"pci_dss": ["10.2.5"],
"hipaa": ["164.312.b"],
"tsc": ["CC6.8", "CC7.2", "CC7.3"],
"description": "Dovecot Authentication Success.",
"groups": ["dovecot", "authentication_success"],
"nist_800_53": ["AU.14", "AC.7"],
"gdpr": ["IV_32.2"],
"firedtimes": 104,
"mitre": {
"technique": ["Valid Accounts"],
"id": ["T1078"],
"tactic": ["Defense Evasion", "Persistence", "Privilege Escalation", "Initial Access"]
},
"id": "9701",
"gpg13": ["7.1", "7.2"]
},
"decoder": {
"parent": "dovecot",
"name": "dovecot"
},
"full_log": "Jan 15 19:23:33 hayes-mail dovecot: imap-login: Login: user=<katy.martin>, method=PLAIN, rip=10.229.2.25, lip=10.229.2.25, mpid=21902, TLS, session=<AjDz2qPV6s4K5QIZ>",
"input": {
"type": "log"
},
"@timestamp": "2022-01-15T19:23:33.000000Z",
"location": "/var/log/syslog",
"id": "1687331600.16465482"
}
Example of an AMiner alert, stored in .jsonl files.
{
"AnalysisComponent": {
"AnalysisComponentIdentifier": 3,
"AnalysisComponentType": "NewMatchPathDetector",
"AnalysisComponentName": "AMiner: New event type.",
"Message": "New path(es) detected",
"PersistenceFileName": "nmpd",
"TrainingMode": false,
"AffectedLogAtomPaths": [
"/model/service/horde/horde/imp/imp/auth_failed",
"/model/service/horde/horde/imp/imp/auth_failed/bracket",
"/model/service/horde/horde/imp/imp/auth_failed/type",
"/model/service/horde/horde/imp/imp/auth_failed/auth_failed_str"
]
},
"LogData": {
"RawLogData": [
"Jan 17 13:56:16 hayes-mail HORDE: [imp] [listMailboxes] Authentication failed. [pid 6735 on line 730 of \"/usr/share/horde/imp/lib/Imap.php\"]"
],
"Timestamps": [
1642427776
],
"DetectionTimestamp": [
1642427776
],
"LogLinesCount": 1,
"LogResources": [
"/var/log/syslog"
]
},
"AMiner": {
"ID": "10.229.2.25"
}
}
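A short sketch of how such alerts might be read and reduced to the fields that appear in the label file. It assumes that each alert occupies a single line in the file (the examples here are pretty-printed for readability) and that the name and ip columns of the labels correspond to AnalysisComponentName and AMiner.ID. Both are assumptions, and the helper names `read_jsonl` and `aminer_key` are hypothetical.

```python
import io
import json


def read_jsonl(lines):
    """Yield one parsed alert per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def aminer_key(alert):
    """Return (epoch, detector name, host IP) for an AMiner alert."""
    return (
        int(alert["LogData"]["Timestamps"][0]),
        alert["AnalysisComponent"]["AnalysisComponentName"],
        alert["AMiner"]["ID"],
    )


# The example alert above, reduced to the fields used here, on one line.
sample = io.StringIO(
    '{"AnalysisComponent": {"AnalysisComponentName": "AMiner: New event type."}, '
    '"LogData": {"Timestamps": [1642427776]}, '
    '"AMiner": {"ID": "10.229.2.25"}}\n'
)
keys = [aminer_key(a) for a in read_jsonl(sample)]
print(keys)
```

With a real file, `open("fox_aminer.json", encoding="utf-8")` would take the place of the in-memory sample.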
Time intervals provided as a .csv file.
scenario,attack,start,end
russellmitchell,network_scans,1642993260.0,1642996606.0
russellmitchell,service_scans,1642996606.0,1642996645.0
russellmitchell,dirb,1642996645.0,1642996668.0
russellmitchell,wpscan,1642996668.0,1642996699.0
russellmitchell,webshell,1642996699.0,1642996762.0
russellmitchell,cracking,1642996762.0,1642999016.0
russellmitchell,reverse_shell,1642999016.0,1642999059.0
russellmitchell,privilege_escalation,1642999059.0,1642999093.0
russellmitchell,service_stop,1643032238.0,1643032240.0
russellmitchell,dnsteal,1643032240.0,1643035840.0
fox,network_scans,1642507140.0,1642508220.0
fox,service_scans,1642508220.0,1642508267.0
fox,wpscan,1642508267.0,1642508310.0
[...]
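The interval-based labeling that underlies time_label can be approximated as below. This is a sketch under assumptions: intervals are treated as half-open (start inclusive, end exclusive) because consecutive attack steps share a boundary, and alerts outside every interval get `None`. Neither choice is confirmed by the dataset description, and the function names are hypothetical.

```python
import csv
import io


def load_intervals(csv_file):
    """Map each scenario to a list of (start, end, attack) tuples."""
    intervals = {}
    for row in csv.DictReader(csv_file):
        intervals.setdefault(row["scenario"], []).append(
            (float(row["start"]), float(row["end"]), row["attack"])
        )
    return intervals


def time_label(intervals, scenario, epoch):
    """Return the attack step whose interval contains `epoch`, else None."""
    for start, end, attack in intervals.get(scenario, []):
        if start <= epoch < end:  # half-open interval (assumption)
            return attack
    return None


# Rows copied from the excerpt above.
intervals_csv = io.StringIO(
    "scenario,attack,start,end\n"
    "russellmitchell,dirb,1642996645.0,1642996668.0\n"
    "russellmitchell,wpscan,1642996668.0,1642996699.0\n"
    "russellmitchell,webshell,1642996699.0,1642996762.0\n"
)
intervals = load_intervals(intervals_csv)
print(time_label(intervals, "russellmitchell", 1642996679))
```

For the timestamp 1642996679 used in the label excerpt, this lookup yields wpscan, which agrees with the time_label given there.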