AIT Alert Data Set

Network Data Source: Suricata, Wazuh and AMiner alerts
Network Data Labeled: Yes
Host Data Source: Wazuh and AMiner alerts
Host Data Labeled: Yes

Overall Setting: Enterprise Network
OS Types: Ubuntu 20.04
Number of Machines: 9-27
Total Runtime: 4-6 days per simulation, 8 simulations in total
Year of Collection: 2023
Attack Categories: Reconnaissance, Privilege Escalation, Data Exfiltration, Web-based Attacks, Remote Command Execution
Benign Activity: Synthetic, models complex behavior

Packed Size: 96 MB
Unpacked Size: 2.9 GB
Download Link: goto

Overview

The AIT Alert Data Set (AIT-ADS) is a collection of synthetic alerts designed for evaluating various alert-related tasks like aggregation, correlation, filtering, and generating attack graphs. These alerts come from the AIT Log Data Set V2 (AIT-LDSv2) and are sourced from three intrusion detection systems: Suricata, Wazuh, and AMiner. The dataset includes eight scenarios, each targeted by a multistep attack that involves actions such as web exploits, password cracking, and privilege escalation. Each attack scenario has variations, allowing the dataset to be used to assess the similarity and merging of attack chains.

Environment

For an overview of the original environment, as well as differences between scenarios, refer to the AIT Log Dataset entry (see Related Entries).

Activity

Attacks and benign behavior correspond to those in the AIT Log Dataset (see Related Entries):

  • Scans (nmap, WPScan, dirb)
  • Webshell upload (CVE-2020-24186)
  • Password Cracking (John the Ripper)
  • Privilege Escalation
  • Remote Command Execution
  • Data Exfiltration

Contained Data

The dataset contains alerts generated by applying two different frameworks, Wazuh and AMiner, to the original AIT-LDSv2. Wazuh is rule-based, while AMiner is anomaly-based and requires training before it can be applied. Notably, the dataset also contains Suricata alerts indirectly: they are part of AIT-LDSv2 and are picked up by Wazuh, which then generates a separate new alert.

For each of the eight scenarios (fox, harrison, russellmitchell, santos, shaw, wardbeck, wheeler, wilson), two separate files exist, one containing the Wazuh alerts and one containing the AMiner alerts, e.g. fox_aminer.json and fox_wazuh.json. Labels are provided via a separate .csv file, which looks like this:

time,name,ip,host,short,time_label,event_label
[...]
1642996679,AMiner: New request method in Apache Access log.,10.143.2.4,intranet_server,A-Acc-Val1,wpscan,wpscan
[...]

The epoch timestamps (here, 1642996679) have to be matched against those found in the alerts, which are then labeled with time_label and/or event_label. The latter label is more precise and should be preferred, as it is based on the labeling of the original logs, while the former simply uses the time intervals of the attack executions to label alerts. Note that Wazuh alerts contain both an epoch and an ISO timestamp, which differ from one another: you MUST use the ISO timestamp (and convert it to epoch). The epoch timestamp of a Wazuh alert reflects the time the alert was generated, which was roughly a year after the creation of the original log dataset, and therefore won't match the timestamps found in the ground truth.
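The matching step described above could be sketched roughly as follows. This is a minimal illustration, not the dataset authors' tooling: the function names (wazuh_epoch, index_labels, labels_for) are hypothetical, the ISO timestamp is assumed to be UTC as its trailing "Z" suggests, and matching on the timestamp alone is an assumption (the name and ip columns could be used to narrow matches further).

```python
import csv
import io
from datetime import datetime, timezone

# One label row copied from the excerpt above; a real run would open the .csv file.
SAMPLE_LABELS = """time,name,ip,host,short,time_label,event_label
1642996679,AMiner: New request method in Apache Access log.,10.143.2.4,intranet_server,A-Acc-Val1,wpscan,wpscan
"""


def wazuh_epoch(alert):
    """Convert the ISO "@timestamp" of a Wazuh alert to epoch seconds.

    Assumption: the timestamp is UTC and always carries microseconds.
    """
    parsed = datetime.strptime(alert["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def index_labels(handle):
    """Group the label rows by their epoch timestamp."""
    by_time = {}
    for row in csv.DictReader(handle):
        by_time.setdefault(int(float(row["time"])), []).append(row)
    return by_time


def labels_for(by_time, epoch):
    """Return the distinct event_label values recorded for this second."""
    return sorted({row["event_label"] for row in by_time.get(epoch, [])})


labels = index_labels(io.StringIO(SAMPLE_LABELS))

# Only the two fields relevant here; the epoch embedded in "id" is ignored on purpose.
wazuh_alert = {"@timestamp": "2022-01-15T19:23:33.000000Z", "id": "1687331600.16465482"}
print(wazuh_epoch(wazuh_alert), labels_for(labels, wazuh_epoch(wazuh_alert)))
print(labels_for(labels, 1642996679))
```

An alert whose timestamp has no entry in the label file yields an empty list here; how such alerts should be treated is left to the user.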

Papers

Data Examples

Example of a Wazuh alert, stored in .jsonl files.

{
  "predecoder": {
    "hostname": "hayes-mail",
    "program_name": "dovecot",
    "timestamp": "Jan 15 19:23:33"
  },
  "agent": {
    "ip": "10.229.2.25",
    "name": "wazuh-client",
    "id": "21"
  },
  "manager": {
    "name": "wazuh.manager"
  },
  "rule": {
    "mail": false,
    "level": 3,
    "pci_dss": ["10.2.5"],
    "hipaa": ["164.312.b"],
    "tsc": ["CC6.8", "CC7.2", "CC7.3"],
    "description": "Dovecot Authentication Success.",
    "groups": ["dovecot", "authentication_success"],
    "nist_800_53": ["AU.14", "AC.7"],
    "gdpr": ["IV_32.2"],
    "firedtimes": 104,
    "mitre": {
      "technique": ["Valid Accounts"],
      "id": ["T1078"],
      "tactic": ["Defense Evasion", "Persistence", "Privilege Escalation", "Initial Access"]
    },
    "id": "9701",
    "gpg13": ["7.1", "7.2"]
  },
  "decoder": {
    "parent": "dovecot",
    "name": "dovecot"
  },
  "full_log": "Jan 15 19:23:33 hayes-mail dovecot: imap-login: Login: user=<katy.martin>, method=PLAIN, rip=10.229.2.25, lip=10.229.2.25, mpid=21902, TLS, session=<AjDz2qPV6s4K5QIZ>",
  "input": {
    "type": "log"
  },
  "@timestamp": "2022-01-15T19:23:33.000000Z",
  "location": "/var/log/syslog",
  "id": "1687331600.16465482"
}

Example of an AMiner alert, stored in .jsonl files.

{
  "AnalysisComponent": {
    "AnalysisComponentIdentifier": 3,
    "AnalysisComponentType": "NewMatchPathDetector",
    "AnalysisComponentName": "AMiner: New event type.",
    "Message": "New path(es) detected",
    "PersistenceFileName": "nmpd",
    "TrainingMode": false,
    "AffectedLogAtomPaths": [
      "/model/service/horde/horde/imp/imp/auth_failed",
      "/model/service/horde/horde/imp/imp/auth_failed/bracket",
      "/model/service/horde/horde/imp/imp/auth_failed/type",
      "/model/service/horde/horde/imp/imp/auth_failed/auth_failed_str"
    ]
  },
  "LogData": {
    "RawLogData": [
      "Jan 17 13:56:16 hayes-mail HORDE: [imp] [listMailboxes] Authentication failed. [pid 6735 on line 730 of \"/usr/share/horde/imp/lib/Imap.php\"]"
    ],
    "Timestamps": [
      1642427776
    ],
    "DetectionTimestamp": [
      1642427776
    ],
    "LogLinesCount": 1,
    "LogResources": [
      "/var/log/syslog"
    ]
  },
  "AMiner": {
    "ID": "10.229.2.25"
  }
}
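To compare an AMiner alert against the label file, the relevant fields first have to be pulled out of the nested structure. The sketch below is one possible way to do that. The mapping is an assumption inferred from the examples in this entry (AnalysisComponentName to the name column, AMiner.ID to the ip column, the first entry of Timestamps to the time column), and aminer_key is a hypothetical helper name.

```python
import json

# Trimmed copy of the AMiner example above, keeping only the fields used here.
SAMPLE_ALERT = """
{
  "AnalysisComponent": {"AnalysisComponentName": "AMiner: New event type."},
  "LogData": {"Timestamps": [1642427776], "LogLinesCount": 1},
  "AMiner": {"ID": "10.229.2.25"}
}
"""


def aminer_key(alert):
    """Reduce an AMiner alert to the fields that also appear in the label file.

    Assumption: the first entry of "Timestamps" is the one to match on.
    """
    return {
        "time": int(alert["LogData"]["Timestamps"][0]),
        "name": alert["AnalysisComponent"]["AnalysisComponentName"],
        "ip": alert["AMiner"]["ID"],
    }


key = aminer_key(json.loads(SAMPLE_ALERT))
print(key)
```

Unlike Wazuh alerts, the AMiner example already carries an epoch timestamp derived from the log line itself, so no conversion is needed in this sketch.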

Time intervals provided as a .csv file.

scenario,attack,start,end
russellmitchell,network_scans,1642993260.0,1642996606.0
russellmitchell,service_scans,1642996606.0,1642996645.0
russellmitchell,dirb,1642996645.0,1642996668.0
russellmitchell,wpscan,1642996668.0,1642996699.0
russellmitchell,webshell,1642996699.0,1642996762.0
russellmitchell,cracking,1642996762.0,1642999016.0
russellmitchell,reverse_shell,1642999016.0,1642999059.0
russellmitchell,privilege_escalation,1642999059.0,1642999093.0
russellmitchell,service_stop,1643032238.0,1643032240.0
russellmitchell,dnsteal,1643032240.0,1643035840.0
fox,network_scans,1642507140.0,1642508220.0
fox,service_scans,1642508220.0,1642508267.0
fox,wpscan,1642508267.0,1642508310.0
[...]
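A lookup against these intervals, roughly how a time_label can be derived, might look like the sketch below. It assumes half-open intervals (start inclusive, end exclusive), since consecutive attack steps share a boundary value; the source does not state which side is inclusive. The names load_intervals and time_label_for are hypothetical.

```python
import csv
import io

# A few rows copied from the excerpt above; a real run would open the .csv file.
INTERVALS_CSV = """scenario,attack,start,end
russellmitchell,dirb,1642996645.0,1642996668.0
russellmitchell,wpscan,1642996668.0,1642996699.0
russellmitchell,webshell,1642996699.0,1642996762.0
fox,wpscan,1642508267.0,1642508310.0
"""


def load_intervals(handle):
    """Map each scenario to its list of (start, end, attack) tuples."""
    intervals = {}
    for row in csv.DictReader(handle):
        intervals.setdefault(row["scenario"], []).append(
            (float(row["start"]), float(row["end"]), row["attack"])
        )
    return intervals


def time_label_for(intervals, scenario, epoch):
    """Return the attack step whose interval contains epoch, or None.

    Assumption: intervals are half-open, i.e. start <= epoch < end.
    """
    for start, end, attack in intervals.get(scenario, []):
        if start <= epoch < end:
            return attack
    return None


intervals = load_intervals(io.StringIO(INTERVALS_CSV))
print(time_label_for(intervals, "russellmitchell", 1642996679))
```

Because this labeling depends only on time, any benign alert raised during an attack interval receives the attack label as well, which is why event_label is the preferred ground truth.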