Network Data Source | NetFlows |
Network Data Labeled | Yes |
Host Data Source | - |
Host Data Labeled | - |
Overall Setting | Tier 3 ISP network |
OS Types | n/a |
Number of Machines | 27 |
Total Runtime | ~130 days |
Year of Collection | 2016 |
Attack Categories | DoS, Port Scanning, Botnet |
Benign Activity | Real users |
Packed Size | 236 GB |
Unpacked Size | n/a |
Download Link | goto |
Overview
The UGR’16 dataset (named after the acronym of the University of Granada) is a network dataset collected from a real cloud network accessed by a variety of clients, with some additional victim machines added. It consists of anonymized network flows collected over a span of ~130 days, with attack traffic injected during the last 30 days. All traffic has been labeled, though the authors note that this is a complex task for “real” data, as the number and kind of attacks contained are unknown. Notable features of this dataset include its scope, which enables analysis of long-running patterns, and its use of representative network traffic.
Environment
The environment is an unspecified ISP cloud production network, with two victim subnets and one attacker subnet added to facilitate red team activity.
Activity
The total collection period lasted ~130 days, with attacks performed during the last 30. Attack batches include:
- Low-rate DoS with some variations
- Port scanning
- Botnet traffic
Contained Data
Flows are collected using Netflow, divided into a “calibration set” containing 100 days of “normal” behavior (though this does not imply complete absence of attacks), and a “test set” containing one month of normal behavior plus execution of attack batches.
All IP addresses have been anonymized using nfanon.
The authors discuss three different labels:
- Normal: Flows synthetically generated with normal patterns
- Attack: Flows the authors positively know to correspond to an attack
- Background: Flows where “no one really knows”
Since no normal traffic was synthetically generated, collected flows received either the “attack” or the “background” label. For injected attacks, signature detection was used to label flows originating from these attacks. For the remaining (real) flows, which are very likely to contain actual attacks, the authors leveraged anomaly detection to label unusual occurrences, such as increased requests to certain ports or email campaigns, though they note that not all of these necessarily correspond to real attacks.
Netflow data is grouped into one file per week, available as nfcapd (which contains all features) and as preprocessed csv (containing only some selected features, see section 3.D in the paper); only the latter is labeled.
Additionally, there is a csv file denoting the start and stop times of each attack within that week, and separate csv files containing only those flows corresponding to a specific attack.
Papers
Links
Data Examples
Snippet of all preprocessed features taken from july_week5_csv
[...]
2016-07-27 13:43:30,0.000,42.219.154.107,143.72.8.137,59212,53,UDP,.A....,0,0,1,72,background
2016-07-27 13:43:30,0.000,42.219.154.107,143.72.8.137,59372,53,UDP,.A....,0,0,1,55,background
2016-07-27 13:43:30,0.000,42.219.154.107,143.72.8.137,59576,53,UDP,.A....,0,0,1,67,background
2016-07-27 13:43:30,0.000,42.219.154.107,66.98.48.193,80,53367,TCP,.A....,0,72,1,52,background
2016-07-27 13:43:30,0.000,42.219.154.108,143.72.8.137,38817,53,UDP,.A....,0,0,1,76,background
2016-07-27 13:43:30,0.000,42.219.154.108,143.72.8.137,48279,53,UDP,.A....,0,0,1,76,background
2016-07-27 13:43:30,0.000,42.219.154.108,143.72.8.137,50098,53,UDP,.A....,0,0,1,74,background
2016-07-27 13:43:30,0.000,42.219.154.109,143.72.8.137,43109,53,UDP,.A....,0,0,1,75,background
2016-07-27 13:43:30,0.000,42.219.154.109,143.72.8.137,51872,53,UDP,.A....,0,0,1,69,background
2016-07-27 13:43:30,0.000,42.219.154.109,204.97.72.135,53,41040,UDP,.A....,0,0,1,176,background
2016-07-27 13:43:30,0.000,42.219.154.109,85.194.84.240,53,38814,UDP,.A....,0,0,1,160,background
[...]
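The rows above have 13 comma-separated fields and no header line. As a minimal sketch of how such rows could be read, the following parses a few of them into typed records. The field names (`Flow`, `src_ip`, `forwarding_status`, `tos`, `n_bytes`, and so on) are assumptions inferred from the sample values rather than names taken from the dataset, so they should be checked against section 3.D of the paper before use.

```python
import csv
import io
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class Flow(NamedTuple):
    # Hypothetical field names; the column order follows the sample rows.
    timestamp: datetime
    duration: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    flags: str
    forwarding_status: int  # assumed meaning of column 9
    tos: int                # assumed meaning of column 10
    packets: int
    n_bytes: int
    label: str


def parse_flows(text):
    """Yield one Flow per well-formed 13-field line of the given text."""
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 13:
            continue  # skip blank or malformed lines such as "[...]"
        yield Flow(
            datetime.strptime(row[0], "%Y-%m-%d %H:%M:%S"),
            float(row[1]), row[2], row[3], int(row[4]), int(row[5]),
            row[6], row[7], int(row[8]), int(row[9]),
            int(row[10]), int(row[11]), row[12],
        )


# Three rows copied from the snippet above.
SAMPLE = """\
2016-07-27 13:43:30,0.000,42.219.154.107,143.72.8.137,59212,53,UDP,.A....,0,0,1,72,background
2016-07-27 13:43:30,0.000,42.219.154.107,143.72.8.137,59372,53,UDP,.A....,0,0,1,55,background
2016-07-27 13:43:30,0.000,42.219.154.107,66.98.48.193,80,53367,TCP,.A....,0,72,1,52,background
"""

flows = list(parse_flows(SAMPLE))
label_counts = Counter(f.label for f in flows)
protocol_counts = Counter(f.protocol for f in flows)
print(label_counts, protocol_counts)
```

For a full weekly file, the same generator could be fed line by line from an open file handle instead of an in-memory string, which avoids loading the whole file at once.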
Attack timestamps taken from attack_ts_july_week5.csv
,counter(mins),Dos,scan44,scan11,nerisbotnet,blacklist,anomaly-udpscan,anomaly-sshscan,anomaly-spam
2016-07-27 13:38:00,0,0,0,0,0,0,0,0,0
2016-07-27 13:39:00,1,0,0,0,0,0,0,0,0
2016-07-27 13:40:00,2,0,0,0,0,0,0,0,0
2016-07-27 13:41:00,3,0,0,0,0,0,0,0,0
2016-07-27 13:42:00,4,0,0,0,0,0,0,0,0
2016-07-27 13:43:00,5,0,0,0,0,1,0,0,0
2016-07-27 13:44:00,6,0,0,0,0,1,0,0,0
[...]
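The per-minute table above can be collapsed into start and stop times per attack column. The sketch below does this for a few rows copied from the snippet. It assumes that any nonzero value marks the column as active during that minute, which matches the sample but is not confirmed here, and the function name `active_intervals` is hypothetical.

```python
import csv
import io
from datetime import datetime

# Header and rows copied from the snippet above.
SAMPLE = """\
,counter(mins),Dos,scan44,scan11,nerisbotnet,blacklist,anomaly-udpscan,anomaly-sshscan,anomaly-spam
2016-07-27 13:41:00,3,0,0,0,0,0,0,0,0
2016-07-27 13:42:00,4,0,0,0,0,0,0,0,0
2016-07-27 13:43:00,5,0,0,0,0,1,0,0,0
2016-07-27 13:44:00,6,0,0,0,0,1,0,0,0
"""


def active_intervals(text, column):
    """Return (first_minute, last_minute) pairs of consecutive rows
    in which the given column is nonzero."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    idx = header.index(column)
    intervals = []
    start = last = None
    for row in reader:
        ts = datetime.strptime(row[0], "%Y-%m-%d %H:%M:%S")
        if int(row[idx]) > 0:  # assumption: nonzero means active
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            intervals.append((start, last))
            start = None
    if start is not None:  # interval still open at the end of the input
        intervals.append((start, last))
    return intervals


for name in ("Dos", "blacklist"):
    print(name, active_intervals(SAMPLE, name))
```

On these sample rows, the `blacklist` column yields a single interval covering 13:43 and 13:44, and the `Dos` column yields none.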
Preprocessed features corresponding to a DoS attack taken from dos_july_week5_csv
[...]
2016-07-28 13:14:17,0.000,42.219.150.242,42.219.152.20,6487,80,TCP,...RS.,0,0,2,200,dos
2016-07-28 13:14:17,0.000,42.219.152.20,42.219.150.242,80,6487,TCP,.A..S.,0,0,1,40,dos
2016-07-28 13:14:17,0.000,42.219.158.16,42.219.150.247,80,6649,TCP,.A..S.,0,0,1,40,dos
2016-07-28 13:14:17,0.000,42.219.150.241,42.219.154.69,6446,80,TCP,...RS.,0,0,2,200,dos
2016-07-28 13:14:17,0.000,42.219.154.69,42.219.150.241,80,6446,TCP,.A..S.,0,0,1,40,dos
2016-07-28 13:14:17,0.000,42.219.150.247,42.219.158.16,6650,80,TCP,...RS.,0,0,2,200,dos
2016-07-28 13:14:17,0.000,42.219.158.16,42.219.150.247,80,6650,TCP,.A..S.,0,0,1,40,dos
2016-07-28 13:14:17,0.004,42.219.150.242,42.219.152.20,6488,80,TCP,...RS.,0,0,2,200,dos
2016-07-28 13:14:17,0.000,42.219.152.20,42.219.150.242,80,6488,TCP,.A..S.,0,0,1,40,dos
[...]
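In the DoS snippet above, many flows appear in pairs with source and destination swapped. One way to explore this is to group flows by an unordered pair of endpoints, as in the sketch below. The grouping key and the function name `group_by_endpoints` are illustrative choices and not part of the dataset or the authors' tooling, and the column positions are inferred from the sample rows.

```python
import csv
import io
from collections import defaultdict

# Five rows copied from the snippet above.
SAMPLE = """\
2016-07-28 13:14:17,0.000,42.219.150.242,42.219.152.20,6487,80,TCP,...RS.,0,0,2,200,dos
2016-07-28 13:14:17,0.000,42.219.152.20,42.219.150.242,80,6487,TCP,.A..S.,0,0,1,40,dos
2016-07-28 13:14:17,0.000,42.219.158.16,42.219.150.247,80,6649,TCP,.A..S.,0,0,1,40,dos
2016-07-28 13:14:17,0.000,42.219.150.241,42.219.154.69,6446,80,TCP,...RS.,0,0,2,200,dos
2016-07-28 13:14:17,0.000,42.219.154.69,42.219.150.241,80,6446,TCP,.A..S.,0,0,1,40,dos
"""


def group_by_endpoints(text):
    """Group rows by protocol and the unordered pair of (ip, port) endpoints,
    so that both directions of one exchange share a key."""
    sessions = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 13:
            continue  # skip blank or malformed lines
        src = (row[2], row[4])
        dst = (row[3], row[5])
        key = (frozenset((src, dst)), row[6])
        sessions[key].append(row)
    return sessions


sessions = group_by_endpoints(SAMPLE)
# A group is bidirectional if its rows have more than one distinct source IP.
bidirectional = [rows for rows in sessions.values()
                 if len({r[2] for r in rows}) == 2]
print(len(sessions), "endpoint pairs,", len(bidirectional), "seen in both directions")
```

The third sample row has no counterpart among the copied rows (its matching flow uses a different source port), so it forms a group of its own.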