NSL-KDD

Network Data Source Connection records
Network Data Labeled Yes
Host Data Source -
Host Data Labeled -
   
Overall Setting Military IT
OS Types Linux 2.0.27
SunOS 4.1.4
Sun Solaris 2.5.1
Windows NT
Number of Machines 1000’s
Total Runtime Nine weeks
Year of Collection 1998, updated 2009
Attack Categories DoS
Remote to Local
User to Root
Surveillance/Probing
Benign Activity Scripts for synthetic traffic generation, real humans for performing complex tasks
   
Packed Size 6 MB
Unpacked Size 19 MB
Download Link goto

Overview

An improvement of the original KDD’99 dataset, aiming to fix some of its statistical problems. KDD’99 is itself based on data captured in DARPAs ‘98 IDS evaluation program, which contains data collected from a total of nine weeks of synthetic network activity, including a small variety of attacks. Like the dataset it is based on, due to its age and other potential flaws, it should be used with reservations, if at all.

Environment

Refer to the underlying DARPA’98 Intrusion Detection Program.

Activity

Refer to the underlying DARPA’98 Intrusion Detection Program.

Contained Data

The original version - the KDD Cup 1999 dataset - contained a large number of redundant/duplicate records, which was problematic for two reasons:

  • In the training set, it caused classifiers to be biased towards those more frequent records
  • In the test set, it caused evaluation to be biased towards learners having better detection rates on artificially frequent records

Thus, this new version attempts to rectify these issues by removing duplicates. In addition to this, the authors define a “difficulty” for each event; by leveraging 21 classifiers (7 different models, each trained 3 times), an events difficulty was determined by the number of classifiers which did (not) label a given event correctly. This difficulty feature was then used to resample the dataset, such that the number of records from each difficulty group is inversely proportional to the number of records in the original dataset, which, the authors argue, facilitates evaluation. The difficulty is available as a new feature of each event (the last one).

Note that the original download source is now longer accessible, however, an unofficial copy is available via an individuals GitHub repository.

Papers

Data Examples

Labeled features, including the new difficulty feature, taken from KDDTrain+.txt

[...]
0,tcp,http,SF,281,1951,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,30,31,0.00,0.00,0.00,0.00,1.00,0.00,0.06,77,255,1.00,0.00,0.01,0.03,0.00,0.01,0.00,0.00,normal,21
0,tcp,bgp,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,138,25,1.00,1.00,0.00,0.00,0.18,0.05,0.00,255,25,0.10,0.05,0.00,0.00,1.00,1.00,0.00,0.00,neptune,18
0,udp,domain_u,SF,42,42,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,6,0.00,0.00,0.00,0.00,1.00,0.00,0.50,255,245,0.96,0.01,0.01,0.00,0.00,0.00,0.00,0.00,normal,21
0,tcp,ftp,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,300,7,1.00,1.00,0.00,0.00,0.02,0.06,0.00,255,7,0.03,0.07,0.00,0.00,1.00,1.00,0.00,0.00,neptune,18
0,tcp,time,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,199,9,1.00,1.00,0.00,0.00,0.05,0.07,0.00,255,9,0.04,0.07,0.00,0.00,1.00,1.00,0.00,0.00,neptune,18
0,tcp,http,SF,263,453,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,15,15,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,normal,21
36613,tcp,private,RSTR,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,1.00,1.00,1.00,0.00,0.00,255,2,0.01,0.50,1.00,0.00,0.00,0.00,1.00,1.00,portsweep,15
0,tcp,ftp,RSTO,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,1.00,1.00,1.00,0.00,0.00,212,43,0.20,0.02,0.00,0.00,0.00,0.00,0.06,0.12,normal,13
0,tcp,private,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,144,15,1.00,1.00,0.00,0.00,0.10,0.06,0.00,255,30,0.12,0.06,0.00,0.00,1.00,1.00,0.00,0.00,neptune,18
0,tcp,sql_net,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,104,4,1.00,1.00,0.00,0.00,0.04,0.07,0.00,255,18,0.07,0.07,0.00,0.00,1.00,1.00,0.00,0.00,neptune,19
0,tcp,http,SF,283,1980,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,18,18,0.00,0.00,0.00,0.00,1.00,0.00,0.00,18,255,1.00,0.00,0.06,0.03,0.00,0.00,0.00,0.00,normal,21
0,tcp,http,SF,161,5530,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,2,255,1.00,0.00,0.50,0.05,0.00,0.00,0.00,0.00,normal,21
0,tcp,http,SF,211,321,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,7,11,0.00,0.00,0.00,0.00,1.00,0.00,0.27,211,255,1.00,0.00,0.00,0.01,0.00,0.00,0.00,0.00,normal,21
0,tcp,http,SF,201,1701,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,7,7,0.00,0.00,0.00,0.00,1.00,0.00,0.00,32,255,1.00,0.00,0.03,0.04,0.00,0.00,0.00,0.00,normal,21
[...]