ISCX Botnet Dataset 2014

Network Data Source pcaps
Network Data Labeled Yes
Host Data Source -
Host Data Labeled -
Overall Setting Enterprise IT
OS Types Undisclosed
Number of Machines 2000+
Total Runtime n/a
Year of Collection 2004-2014
Attack Categories Botnets
User Emulation Real Users
Packed Size -
Unpacked Size 13.8 GB
Download Link goto


The ISCX Botnet Dataset originated from the need for a modern botnet dataset suitable for evaluating botnet detection methods, specifically for measuring how much individual NetFlow features contribute to detection. To this end, the authors lay out three overarching shortcomings of the datasets available at the time:

  • Generality (few botnets included)
  • Realism (environments are not realistic / long-lasting enough for a botnet to perform all malicious functionality)
  • Representativeness (benign behavior does not resemble that of actual networks)

In order to alleviate these flaws, the authors combine traffic from three sources (ISOT Botnet, ISCX IDS 2012, and the Malware Capture Facility Project) with the goal of creating one novel dataset.


For details regarding the respective underlying datasets, refer to the related entries and links below.


The final dataset, divided into a training and a test set, contains a total of 7 and 16 types of botnets, respectively (the test set contains more in order to evaluate novelty detection). For the training dataset, these are (43.92% of all flows are malicious):

| Botnet Name | Type | Total Flows |
| --- | --- | --- |
| Neris | IRC | 21159 (12%) |
| Rbot | IRC | 39316 (22%) |
| Virut | HTTP | 1638 (0.94%) |
| NSIS | P2P | 4336 (2.48%) |
| SMTP Spam | P2P | 11296 (6.48%) |
| Zeus | P2P | 31 (0.01%) |
| Zeus Control (C2) | P2P | 20 (0.01%) |

For the test dataset, these are (44.97% of all flows are malicious):

| Botnet Name | Type | Total Flows |
| --- | --- | --- |
| Neris | IRC | 25967 (5.67%) |
| Rbot | IRC | 83 (0.018%) |
| Menti | IRC | 2878 (0.62%) |
| Sogou | HTTP | 89 (0.019%) |
| Murlo | IRC | 4881 (1.06%) |
| Virut | HTTP | 58576 (12.8%) |
| NSIS | P2P | 757 (0.165%) |
| Zeus | P2P | 502 (0.109%) |
| SMTP Spam | P2P | 21633 (4.72%) |
| UDP Storm | P2P | 44062 (9.63%) |
| Tbot | IRC | 1296 (0.283%) |
| Weasel | P2P | 42313 (9.25%) |
| Zero Access | P2P | 1011 (0.221%) |
| Smoke Bot | P2P | 78 (0.017%) |
| Zeus Control (C2) | P2P | 31 (0.006%) |
| ISCX IRC Bot | P2P | 1816 (0.387%) |

The remaining flows are all benign, though their exact nature is not detailed.
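As a quick sanity check, the stated malicious share of the training set can be recomputed from the per-botnet rows listed above. The following is a minimal sketch: the values are copied from the training table, while the dictionary and function names are illustrative and not part of the dataset's tooling. Because the listed percentages are rounded, the result is only expected to match the stated total approximately.

```python
# Per-botnet (flow count, percentage share), copied from the training table.
TRAINING = {
    "Neris": (21159, 12.0),
    "Rbot": (39316, 22.0),
    "Virut": (1638, 0.94),
    "NSIS": (4336, 2.48),
    "SMTP Spam": (11296, 6.48),
    "Zeus": (31, 0.01),
    "Zeus Control (C2)": (20, 0.01),
}


def malicious_share(table):
    """Sum the per-botnet percentage shares of a table."""
    return sum(share for _, share in table.values())


def malicious_flows(table):
    """Sum the per-botnet flow counts of a table."""
    return sum(flows for flows, _ in table.values())


share = malicious_share(TRAINING)
print(f"malicious share: {share:.2f}%")        # compare with the stated 43.92%
print(f"benign share:    {100 - share:.2f}%")  # remainder attributed to benign flows
print(f"malicious flows: {malicious_flows(TRAINING)}")
```

The same check can be applied to the test set by filling a second dictionary from its table.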

Contained Data

The three sources were combined using the overlay methodology [2], whereby malicious traffic was mapped to existing IPs outside the main network using the Bit-Twist packet generator. This combination of malicious and benign traffic was then replayed via Tcpreplay and captured with tcpdump, resulting in a single dataset which was then divided into a training and a test set with the properties outlined above. Labels are provided as lists of IPs associated with each botnet and can be found on the homepage linked below.
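Since labels are provided as IPs per botnet rather than per packet or per flow, users have to derive flow labels themselves. The following is a minimal sketch of one way to do so, under two assumptions that are not taken from the dataset documentation: the addresses in `BOTNET_IPS` are placeholders from the RFC 5737 documentation ranges (not the real botnet IPs), and a flow is treated as malicious if either endpoint matches a listed IP.

```python
import ipaddress

# Hypothetical label list: placeholder addresses from the RFC 5737
# documentation ranges, NOT the dataset's actual botnet IPs.
BOTNET_IPS = {
    "192.0.2.10": "Neris",
    "192.0.2.11": "Rbot",
    "198.51.100.7": "Virut",
}


def label_flow(src_ip, dst_ip, botnet_ips=BOTNET_IPS):
    """Return the botnet name if either endpoint is a listed IP, else 'benign'.

    Matching on either endpoint is an assumption of this sketch.
    """
    for ip in (src_ip, dst_ip):
        # Parsing validates the address and normalises its textual form.
        key = str(ipaddress.ip_address(ip))
        if key in botnet_ips:
            return botnet_ips[key]
    return "benign"


# Example flows given as (source IP, destination IP) pairs.
flows = [
    ("192.0.2.10", "203.0.113.5"),
    ("203.0.113.8", "198.51.100.7"),
    ("203.0.113.8", "203.0.113.9"),
]
labels = [label_flow(src, dst) for src, dst in flows]
print(labels)  # ['Neris', 'Virut', 'benign']
```

In practice, the flows would first be extracted from the pcaps with a flow exporter of choice, and the placeholder dictionary would be replaced by the IP lists published with the dataset.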
