Network Data Source | Various IDS alerts (NEMEA, Suricata, TippingPoint, others) |
Network Data Labeled | No |
Host Data Source | - |
Host Data Labeled | - |
Overall Setting | Enterprise IT |
OS Types | Undisclosed |
Number of Machines | > 100,000 |
Total Runtime | 7 days |
Year of Collection | 2019 |
Attack Categories | Unknown |
User Emulation | Real users |
Packed Size | 1 GB |
Unpacked Size | 7 GB |
Download Link | goto |
Overview
The IDEA dataset is a collection of anonymized and normalized alerts collected from three organizations using the alert sharing platform “SABU”. Collection happened over a period of one week and spanned a large number of intrusion detection systems and other data sources, including several honeypots. The IDEA format in which the alerts are made available is an extension of the widely used IDMEF (Intrusion Detection Message Exchange Format). The alerts are accompanied by auxiliary data such as (geo)spatial data and statistics about the underlying domains.
The authors claim that the heterogeneity of involved systems can aid in obtaining a deeper understanding of current attack patterns and adversary behavior from multiple perspectives; similarly, it could be used for alert correlation, attack scenario reconstruction, and similar disciplines.
Environment
The three organizations that sent in alerts cover three different networks, though the first two partially overlap:
- The Czech National Research and Education Network (NREN)
  - Deployed the most IDSs and honeypots
  - Alerts mostly generated by NEMEA; other solutions include Suricata and TippingPoint, among others
- An unnamed large campus network
  - Alerts generated by NEMEA and Flowmon ADS
- A Czech commercial ISP
  - Alerts generated by NEMEA
Detailed information about the three environments is not made available. In total, alerts were collected from 34 intrusion detection systems, honeypots, and other data sources.
Activity
Alerts were collected for one week, from March 11 to March 17, 2019. No information regarding performed activity is available.
Contained Data
Alerts are available in the IDEA format, which is an extension of the Intrusion Detection Message Exchange Format (IDMEF).
It adds a taxonomy of alerts (based on eCSIRT.net) and uses JSON instead of XML, though it can be converted back if need be.
Alerts can be associated with their source using the Name field - for example, all alerts with cz.casablanca.nemea.* originate from the commercial ISP, with the * standing for the responsible intrusion detection module.
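The association by Name can be sketched in a few lines of Python. This is only an illustration, not tooling shipped with the dataset: the helper names `node_names` and `matches_source` are hypothetical, the sample line is abbreviated from the alert examples in this document, and treating the `*` as a shell-style wildcard via `fnmatch` is an assumption.

```python
import json
from fnmatch import fnmatchcase

def node_names(alert):
    """Return the Name of every Node entry in a parsed IDEA alert."""
    return [node["Name"] for node in alert.get("Node", []) if "Name" in node]

def matches_source(alert, pattern):
    """True if any Node name matches a shell-style pattern (hypothetical helper)."""
    return any(fnmatchcase(name, pattern) for name in node_names(alert))

# Abbreviated alert, one JSON object per line as in dataset.idea.
line = ('{"Node":[{"SW":["Dionaea"],"Name":"cz.cesnet.hugo.haas_dionaea"}],'
        '"Category":["Recon.Scanning"],"Format":"IDEA0"}')
alert = json.loads(line)

print(node_names(alert))
print(matches_source(alert, "cz.casablanca.nemea.*"))
```

Because the Name sits inside the Node list rather than at the top level of the alert, the sketch checks every Node entry instead of assuming there is exactly one.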
Roughly 12 million alerts were collected in total from the aforementioned 34 sources, with identifiers like IP addresses, hostnames or URLs being anonymized using CryptoPAn. These identifiers are replaced with unique identifiers allowing for correlation across alerts and auxiliary data; not all IP addresses are affected by this. Alerts stemming from link-local or private address ranges, and those classified as “testing alerts” are removed, which affects ~1% of alerts. For a detailed description of this process, refer to the linked paper.
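To give a rough idea of what removing alerts from link-local or private ranges could look like, here is a minimal Python sketch. It is not the authors' procedure: the list of ranges (RFC 1918 plus IPv4/IPv6 link-local), the decision to inspect only the Source entries, and the function names are all assumptions made for illustration.

```python
import ipaddress

# Assumed ranges: RFC 1918 private space plus IPv4 and IPv6 link-local.
LOCAL_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
                "169.254.0.0/16", "fe80::/10")
]

def is_local_address(ip_text):
    """True if ip_text is a single address inside one of the assumed ranges."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # not a single address; this sketch leaves it alone
    return any(ip in net for net in LOCAL_NETWORKS)

def has_local_source(alert):
    """True if any Source entry of the alert carries a local address."""
    for source in alert.get("Source", []):
        for ip_text in source.get("IP4", []) + source.get("IP6", []):
            if is_local_address(ip_text):
                return True
    return False

# Hypothetical minimal alerts; only the fields used by the filter are present.
alerts = [
    {"ID": "a", "Source": [{"IP4": ["22.198.228.92"]}]},
    {"ID": "b", "Source": [{"IP4": ["192.168.1.10"]}]},
    {"ID": "c", "Source": [{"IP4": ["169.254.7.7"]}]},
]
kept = [a["ID"] for a in alerts if not has_local_source(a)]
print(kept)
```

The ranges are listed explicitly rather than relying on `ipaddress`'s `is_private` attribute, which covers additional special-purpose blocks and would filter more than the two categories named above.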
Alert data is accompanied by three sets of auxiliary data:
- Spatial data: Approximate geographical information related to IP addresses, associated ISPs, etc.
- Passive DNS data enrichment: Provides various statistics of domains related to IP addresses
- Other enrichment: Miscellaneous information such as if a VPN was involved or if an associated IP is part of some denylist
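As an illustration of how the auxiliary data might be combined with alerts, the following Python sketch indexes geolocation rows by IP address and looks up the source addresses of an alert. It assumes that the ANIP column holds the same anonymized addresses that appear in the alerts; the embedded rows are copied from the geolocation example in this document, while the alert and the function names are hypothetical.

```python
import csv
import io

# Two rows copied from the Aux_1A_Geolocation.csv example (semicolon-separated).
GEO_SAMPLE = """;ANIP;Lat;Lon;Country;Region;City;TZ;ASN;ISP
1;107.224.110.203;42.2245;-70.8911;US;Massachusetts;Hingham;America/New_York;701;MCI Communications Services, Inc. d/b/a Verizon Business
10;107.155.231.253;NA;NA;NA;NA;NA;NA;NA;NA
"""

def load_geolocation(handle):
    """Index geolocation rows by ANIP; cells containing 'NA' become None."""
    index = {}
    for row in csv.DictReader(handle, delimiter=";"):
        row.pop("", None)  # drop the unnamed row-number column
        index[row["ANIP"]] = {k: (None if v == "NA" else v) for k, v in row.items()}
    return index

def source_countries(alert, geo_index):
    """Return (ip, country) pairs for source IPv4 addresses found in the index."""
    pairs = []
    for source in alert.get("Source", []):
        for ip in source.get("IP4", []):
            if ip in geo_index:
                pairs.append((ip, geo_index[ip]["Country"]))
    return pairs

geo = load_geolocation(io.StringIO(GEO_SAMPLE))

# Hypothetical alert whose source address happens to be in the sample rows.
alert = {"Source": [{"IP4": ["107.224.110.203"], "Proto": ["tcp"]}]}
print(source_countries(alert, geo))
```

For the real file, `io.StringIO(GEO_SAMPLE)` would be replaced by an opened file handle; given the size of the dataset, loading only the columns of interest may be preferable to keeping every field in memory.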
There is no mention of any labeling process.
Papers
Links
Data Examples
Alerts from dataset.idea
{"DetectTime":"2019-03-11T00:04:43.312146+02:00","Node":[{"AggrWin":"00:05:00","SW":["Dionaea"],"Type":["Connection","Protocol","Honeypot"],"Name":"cz.cesnet.hugo.haas_dionaea"}],"Target":[{"Anonymised":true,"Port":[445],"IP4":["192.0.0.0"],"Proto":["tcp"]}],"ConnCount":2,"Format":"IDEA0","Category":["Recon.Scanning"],"Source":[{"Port":[3508],"IP4":["22.198.228.92"],"Proto":["tcp"]}],"WinStartTime":"2019-03-11T00:00:01.222836+02:00","WinEndTime":"2019-03-11T00:05:01.222721+02:00","ID":"f62537c2-77b8-49c7-a0a2-24c4b81b20f8"}
{"DetectTime":"2019-03-11T00:04:30.399404+02:00","Node":[{"AggrWin":"00:05:00","SW":["Dionaea"],"Type":["Connection","Protocol","Honeypot"],"Name":"cz.cesnet.hugo.haas_dionaea"}],"Target":[{"Anonymised":true,"Port":[445],"IP4":["192.0.0.0"],"Proto":["tcp"]}],"ConnCount":2,"Format":"IDEA0","Category":["Recon.Scanning"],"Source":[{"Port":[3714],"IP4":["63.130.102.28"],"Proto":["tcp"]}],"WinStartTime":"2019-03-11T00:00:01.222836+02:00","WinEndTime":"2019-03-11T00:05:01.222721+02:00","ID":"7f3e0acf-6812-442c-a339-f069a5d83524"}
{"DetectTime":"2019-03-11T00:01:39.556554+02:00","Node":[{"AggrWin":"00:05:00","SW":["Dionaea"],"Type":["Connection","Protocol","Honeypot"],"Name":"cz.cesnet.hugo.haas_dionaea"}],"Target":[{"Anonymised":true,"Port":[445],"IP4":["192.0.0.0"],"Proto":["tcp"]}],"ConnCount":2,"Format":"IDEA0","Category":["Recon.Scanning"],"Source":[{"Port":[54637],"IP4":["14.205.135.200"],"Proto":["tcp"]}],"WinStartTime":"2019-03-11T00:00:01.222836+02:00","WinEndTime":"2019-03-11T00:05:01.222721+02:00","ID":"7b2a01b2-6d01-47d8-9bf1-4a8eedf41c52"}
{"DetectTime":"2019-03-11T00:04:23.744908+02:00","Node":[{"AggrWin":"00:05:00","SW":["Dionaea"],"Type":["Connection","Protocol","Honeypot"],"Name":"cz.cesnet.hugo.haas_dionaea"}],"Target":[{"Anonymised":true,"Port":[5060],"IP4":["192.0.0.0"],"Proto":["udp"]}],"ConnCount":8,"Format":"IDEA0","Category":["Recon.Scanning"],"Source":[{"Port":[49167],"IP4":["227.253.185.165"],"Proto":["udp"]}],"WinStartTime":"2019-03-11T00:00:01.222836+02:00","WinEndTime":"2019-03-11T00:05:01.222721+02:00","ID":"78957bda-6808-490d-b7cb-b5b3009a7231"}
{"DetectTime":"2019-03-11T00:09:58.763908+02:00","Node":[{"AggrWin":"00:05:00","SW":["Dionaea"],"Type":["Connection","Protocol","Honeypot"],"Name":"cz.cesnet.hugo.haas_dionaea"}],"Target":[{"Anonymised":true,"Port":[5060],"IP4":["192.0.0.0"],"Proto":["udp"]}],"ConnCount":8,"Format":"IDEA0","Category":["Recon.Scanning"],"Source":[{"Port":[65026],"IP4":["227.253.185.165"],"Proto":["udp"]}],"WinStartTime":"2019-03-11T00:05:01.409005+02:00","WinEndTime":"2019-03-11T00:10:01.408919+02:00","ID":"77e71118-4572-4d0a-984c-ef33991d547a"}
Auxiliary geolocation data from Aux_1A_Geolocation.csv
;ANIP;Lat;Lon;Country;Region;City;TZ;ASN;ISP
1;107.224.110.203;42.2245;-70.8911;US;Massachusetts;Hingham;America/New_York;701;MCI Communications Services, Inc. d/b/a Verizon Business
2;107.224.1.174;42.3007;-71.4255;US;Massachusetts;Framingham;America/New_York;701;MCI Communications Services, Inc. d/b/a Verizon Business
3;107.224.1.173;42.3007;-71.4255;US;Massachusetts;Framingham;America/New_York;701;MCI Communications Services, Inc. d/b/a Verizon Business
4;107.224.1.172;42.3007;-71.4255;US;Massachusetts;Framingham;America/New_York;701;MCI Communications Services, Inc. d/b/a Verizon Business
5;107.224.1.60;42.3007;-71.4255;US;Massachusetts;Framingham;America/New_York;701;MCI Communications Services, Inc. d/b/a Verizon Business
6;107.224.1.153;42.3007;-71.4255;US;Massachusetts;Framingham;America/New_York;701;MCI Communications Services, Inc. d/b/a Verizon Business
7;107.224.216.233;42.4188;-71.1557;US;Massachusetts;Arlington;America/New_York;701;MCI Communications Services, Inc. d/b/a Verizon Business
8;107.224.194.187;41.905;-71.1026;US;Massachusetts;Taunton;America/New_York;701;MCI Communications Services, Inc. d/b/a Verizon Business
9;107.224.75.176;42.3007;-71.4255;US;Massachusetts;Framingham;America/New_York;701;MCI Communications Services, Inc. d/b/a Verizon Business
10;107.155.231.253;NA;NA;NA;NA;NA;NA;NA;NA
Auxiliary passive DNS data from Aux_2_PassiveDNS.csv
ip,numrecords,avgdomainsinrecords,stddevdomainsinrecords,maxdomainsinrecords,mediandomainsinrecords,avglenrecords,stddevlenrecords,maxlenrecords,medianlenrecords,avgsimilarityrecords,stddevsimilarityrecords,maxsimilarityrecords,mediansimilarityrecords,avgentropyrecords,stddeventropyrecords,maxentropyrecords,medianentropyrecords,avgmaxconsecutivecharsrecords,stddevmaxconsecutivecharsrecords,maxmaxconsecutivecharsrecords,medianmaxconsecutivecharsrecords,avglenlowleveldomains,stddevlenlowleveldomains,maxlenlowleveldomains,medianlenlowleveldomains,avgsimilaritylowleveldomains,stddevsimilaritylowleveldomains,maxsimilaritylowleveldomains,mediansimilaritylowleveldomains,avgentropylowleveldomains,stddeventropylowleveldomains,maxentropylowleveldomains,medianentropylowleveldomains,avgmaxconsecutiecharslowleveldomains,stddevmaxconsecutiecharslowleveldomains,maxmaxconsecutiecharslowleveldomains,medianmaxconsecutiecharslowleveldomains
107.224.110.203,1,5.0,0.0,5,5.0,44.0,0.0,44,44.0,0,0,0,0,4.0897801980352435,0.0,4.0897801980352435,4.0897801980352435,3.0,0.0,3,3.0,20.0,0.0,20,20.0,0,0,0,0,3.0219280948873624,0.0,3.0219280948873624,3.0219280948873624,3.0,0.0,3,3.0,10.0,0.0,10,10.0,4.0,0.0,4,4.0,
107.224.1.174,1,5.0,0.0,5,5.0,39.0,0.0,39,39.0,0,0,0,0,4.099128821259931,0.0,4.099128821259931,4.099128821259931,2.0,0.0,2,2.0,15.0,0.0,15,15.0,0,0,0,0,2.606238928653389,0.0,2.606238928653389,2.606238928653389,2.0,0.0,2,2.0,7.0,0.0,7,7.0,4.0,0.0,4,4.0,
107.224.1.173,1,5.0,0.0,5,5.0,39.0,0.0,39,39.0,0,0,0,0,4.099128821259931,0.0,4.099128821259931,4.099128821259931,2.0,0.0,2,2.0,15.0,0.0,15,15.0,0,0,0,0,2.606238928653389,0.0,2.606238928653389,2.606238928653389,2.0,0.0,2,2.0,7.0,0.0,7,7.0,4.0,0.0,4,4.0,
107.224.1.172,1,5.0,0.0,5,5.0,39.0,0.0,39,39.0,0,0,0,0,4.099128821259931,0.0,4.099128821259931,4.099128821259931,2.0,0.0,2,2.0,15.0,0.0,15,15.0,0,0,0,0,2.606238928653389,0.0,2.606238928653389,2.606238928653389,2.0,0.0,2,2.0,7.0,0.0,7,7.0,4.0,0.0,4,4.0,
107.224.1.60,1,5.0,0.0,5,5.0,40.0,0.0,40,40.0,0,0,0,0,4.165311532225101,0.0,4.165311532225101,4.165311532225101,2.0,0.0,2,2.0,16.0,0.0,16,16.0,0,0,0,0,2.7806390622295662,0.0,2.7806390622295662,2.7806390622295662,2.0,0.0,2,2.0,8.0,0.0,8,8.0,4.0,0.0,4,4.0,
107.224.1.153,1,5.0,0.0,5,5.0,39.0,0.0,39,39.0,0,0,0,0,4.169766962341045,0.0,4.169766962341045,4.169766962341045,2.0,0.0,2,2.0,15.0,0.0,15,15.0,0,0,0,0,2.789898095464287,0.0,2.789898095464287,2.789898095464287,2.0,0.0,2,2.0,7.0,0.0,7,7.0,4.0,0.0,4,4.0,
107.224.194.187,1,5.0,0.0,5,5.0,40.0,0.0,40,40.0,0,0,0,0,4.184183719779188,0.0,4.184183719779188,4.184183719779188,2.0,0.0,2,2.0,16.0,0.0,16,16.0,0,0,0,0,2.827819531114783,0.0,2.827819531114783,2.827819531114783,2.0,0.0,2,2.0,8.0,0.0,8,8.0,4.0,0.0,4,4.0,
107.155.231.253,0,0,0,0,0,0,0,0,0,0,0,0,0,2.75,0.0,2.75,2.75,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
Other auxiliary data taken from Aux_3_Enrichment/enrichment/censys_20190311.json
{"tags":["http","https","imap","imaps","pop3","pop3s","smtp","ssh"],"protocols":["110/pop3","143/imap","22/ssh","25/smtp","443/https","80/http","993/imaps","995/pop3s"],"ip":"71.99.143.29","ports":["80","993","995","22","25","443","110","143"],"metadata":{"os":"Debian,Debian,Debian"}}
{"tags":["database","http","mysql","rdp","remote_display"],"protocols":["3306/mysql","3389/rdp","80/http"],"ip":"104.63.137.103","ports":["80","3306","3389"],"metadata":{"os":"Win64"}}
{"tags":["http","https"],"protocols":["443/https","80/http"],"ip":"74.248.18.141","ports":["80","443"],"metadata":{"os":"CentOS,CentOS"}}
{"tags":["database","ftp","http","mysql","rdp","remote_display"],"protocols":["21/ftp","3306/mysql","3389/rdp","80/http"],"ip":"61.219.26.92","ports":["80","21","3306","3389"],"metadata":{"os":"Win64"}}
{"tags":["embedded","http","https","rdp","remote_display"],"protocols":["3389/rdp","443/https","80/http"],"ip":"237.104.26.88","ports":["80","443","3389"],"metadata":{"device_type":"DSL/cable modem","os":"Win32,Win32","manufacturer":"Entrolink","description":"Entrolink"}}
{"tags":["http"],"protocols":["80/http"],"ip":"142.252.54.184","ports":["80"],"metadata":{"os":"Raspbian"}}
{"tags":["http","https","rdp","remote_display","ssh"],"protocols":["22/ssh","3389/rdp","443/https","80/http"],"ip":"123.172.134.150","ports":["80","22","443","3389"],"metadata":{"os":"Ubuntu,Windows"}}
{"tags":["http","https","imap","imaps","pop3","pop3s","smtp","ssh"],"protocols":["110/pop3","143/imap","22/ssh","25/smtp","443/https","465/smtp","80/http","993/imaps","995/pop3s"],"ip":"78.255.121.232","ports":["80","465","993","995","22","25","443","110","143"],"metadata":{"os":"Debian,Debian,Debian"}}
{"tags":["dns","embedded","ftp","http","rdp","remote_display","ssh"],"protocols":["21/ftp","22/ssh","3389/rdp","53/dns","80/http"],"ip":"43.129.221.251","ports":["80","21","53","22","3389"],"metadata":{"device_type":"network","os":"MikroTik RouterOS,Win32","manufacturer":"MikroTik","description":"MikroTik"}}
{"tags":["dns","ftp","http","imap","imaps","pop3","pop3s","smtp","ssh"],"protocols":["110/pop3","143/imap","21/ftp","22/ssh","25/smtp","443/https","465/smtp","53/dns","587/smtp","80/http","993/imaps","995/pop3s"],"ip":"95.113.20.15","ports":["80","465","993","995","21","53","22","25","443","587","110","143"],"metadata":{"os":"Unix"}}