Network Data Source | - |
Network Data Labeled | - |
Host Data Source | Syslogs |
Host Data Labeled | No |
Overall Setting | Enterprise IT |
OS Types | Ubuntu 13.04 |
Number of Machines | 2 + N users |
Total Runtime | ~15 minutes |
Year of Collection | 2014 |
Attack Categories | n/a |
Benign Activity | Synthetic, following a complex model |
Packed Size | n/a |
Unpacked Size | n/a |
Download Link | n/a |
Overview
Not a dataset per se; rather, this work outlines concepts for generating semi-synthetic log data for the evaluation of IDSs. Most importantly, it introduces a three-layer model that the data-generating architecture should follow (simplified here): a virtual user layer (modeling the interactions performed by users), an infrastructure layer (whatever systems the scenario requires) and a logging layer (containing typical corporate logging infrastructure). Notably, this model does not include an attacker. The main contribution lies in the method of user emulation, which is based on studying real humans interacting with the scenario in question. The authors argue that this kind of data, when coupled with separate attack emulation, offers a good baseline for evaluation.
The authors implement a concrete instance of this model, with the Mantis Bug Tracker as the chosen application. However, the actual dataset from this instance does not seem to be available.
Environment
All machines run Ubuntu 13.04, with the following setup:
- Infrastructure layer consists of one virtual machine running the Mantis Bug Tracker, the underlying Apache web server, a backend database and several support services (mail server, firewall, etc.).
- Logging layer consists of one virtual machine that collects logs from everything running on the infrastructure layer using rsyslog; the logs are managed with Graylog2.
- User layer consists of N virtual machines, each running a browser and Selenium to interact with the infrastructure layer.
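
The layout above can be summarized as a small machine inventory. The following is a minimal sketch, not the authors' tooling: the `Machine` class, the `build_inventory` function and all host names are hypothetical, and the sketch only illustrates why the setup amounts to 2 + N machines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str        # hypothetical host name, not taken from the source
    layer: str       # "infrastructure", "logging" or "user"
    services: tuple  # services described for that layer


def build_inventory(n_users: int) -> list:
    """Return one infrastructure VM, one logging VM and n_users user VMs."""
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    machines = [
        Machine("infra-1", "infrastructure",
                ("Mantis Bug Tracker", "Apache", "database",
                 "mail server", "firewall")),
        Machine("log-1", "logging", ("rsyslog", "Graylog2")),
    ]
    # Each virtual user gets its own VM with a browser driven by Selenium.
    machines += [
        Machine(f"user-{i + 1}", "user", ("browser", "Selenium"))
        for i in range(n_users)
    ]
    return machines


inventory = build_inventory(3)
print(len(inventory))  # 2 fixed machines plus 3 user machines
```

Keeping the user layer as a simple list of identical machines reflects that only N varies between runs; the infrastructure and logging layers stay fixed.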
Activity
As mentioned, each user’s behavior is modeled on a real-life study, specifically by analyzing logs from real users' interactions with a Mantis instance.
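
To make the idea of log-derived user behavior concrete, here is a minimal sketch that fits a first-order Markov chain to recorded action sequences and samples a synthetic session from it. This is an assumption for illustration only: the entry above does not specify the authors' actual model, and the action names, `fit_transitions` and `sample_session` are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count action-to-action transitions observed in recorded sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts


def sample_session(counts, start, max_steps, rng):
    """Sample a synthetic session by walking the fitted transition counts."""
    session = [start]
    # Stop at max_steps or when the last action has no observed successor.
    while len(session) < max_steps and counts.get(session[-1]):
        followers = counts[session[-1]]
        actions = list(followers)
        weights = [followers[a] for a in actions]
        session.append(rng.choices(actions, weights=weights, k=1)[0])
    return session


# Made-up example sessions; real input would come from the observed logs.
observed = [
    ["login", "view_issues", "open_issue", "add_note", "logout"],
    ["login", "view_issues", "report_issue", "logout"],
    ["login", "view_issues", "open_issue", "logout"],
]
counts = fit_transitions(observed)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(sample_session(counts, "login", max_steps=10, rng=rng))
```

In a setup like the one described, each sampled action would then be carried out by the browser automation on a user machine, so that the resulting log lines are produced by the real application stack rather than written directly.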
Contained Data
The dataset obtained from the simulation is compared with the one generated by actual users, and the authors contend that the results are satisfactory. However, I was unable to find a download source.