| Property | Value |
|---|---|
| Network Data Source | - |
| Network Data Labeled | - |
| Host Data Source | Sequences of Syscall Numbers |
| Host Data Labeled | Yes |
| Overall Setting | Single OS |
| OS Types | Windows 7 |
| Number of Machines | 1 per run |
| Total Runtime | ~30 minutes per run |
| Year of Collection | 2018 |
| Attack Categories | Over 10k different malware samples; categories not documented (labels seen include AdWare, Downloader, Trojan, WebToolbar) |
| Benign Activity | None |
| Packed Size | 9.8 MB |
| Unpacked Size | 558 MB |
| Download Link | goto |
Overview
The Attack-Caused Windows System Calls Traces Dataset (AWSCTD) contains system call logs generated by executing malware on a small number of Windows hosts. It is intended for training anomaly-based HIDS solutions and is significantly larger than previous datasets in terms of the number of distinct malware samples executed (12110). However, it does not include any user emulation or other generated benign activity.
Environment
Six guest machines running Windows 7 were simulated in parallel using QEMU, each running “Dr. Memory”, “OSSEC” and “WinDump” for data collection. The paper does not explicitly state whether or how these machines are connected; I assume each one is used separately to test individual samples.
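If the six guests really do work through the samples independently, a rough lower bound on the collection time follows from the numbers given here (12110 samples, ~30 minutes per run). The sketch below assumes samples are split evenly and ignores boot, transfer and shutdown overhead, so the real campaign would have taken longer.

```python
import math


def campaign_duration_hours(num_samples: int, num_guests: int, minutes_per_run: float) -> float:
    """Lower-bound wall-clock time when samples are split evenly across guests.

    Assumes each guest runs one sample at a time and ignores boot,
    file transfer and shutdown overhead.
    """
    runs_per_guest = math.ceil(num_samples / num_guests)
    return runs_per_guest * minutes_per_run / 60


hours = campaign_duration_hours(12110, 6, 30)
print(f"{hours:.1f} h = {hours / 24:.1f} days")  # 2019 runs per guest
```

This prints `1009.5 h = 42.1 days`, i.e. roughly six weeks of continuous execution under these assumptions.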
Activity
The sequence for each of the 12110 malware samples is:
- Transfer the file(s) to the guest machine's virtual disk
- Power on the guest machine
- Start the malware via a bash script
- Wait for a predefined time (here, 30 minutes)
- Stop the machine and collect the syscall data
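The five steps above, combined with six guests running in parallel, can be sketched as a simple work queue. This is a dry run only: `DryRunGuest` and its methods are hypothetical placeholders that log each step instead of driving QEMU, Dr. Memory or the bash script, and collecting after shutdown is my assumption about the order.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DryRunGuest:
    """Hypothetical stand-in for one QEMU guest that only records each step."""
    name: str
    log: List[str] = field(default_factory=list)

    def copy_to_disk(self, sample: str) -> None:
        self.log.append(f"copy {sample}")

    def power_on(self) -> None:
        self.log.append("power on")

    def start_sample(self, sample: str) -> None:
        self.log.append(f"start {sample}")

    def power_off(self) -> None:
        self.log.append("power off")

    def collect_syscalls(self, sample: str) -> str:
        self.log.append(f"collect {sample}")
        return f"{self.name}/{sample}.log"


def run_sample(guest: DryRunGuest, sample: str, wait_seconds: float,
               sleep: Callable[[float], None]) -> str:
    """Push one sample through the five steps, in the order listed above."""
    guest.copy_to_disk(sample)
    guest.power_on()
    guest.start_sample(sample)
    sleep(wait_seconds)  # 30 minutes (1800 s) in AWSCTD
    guest.power_off()
    return guest.collect_syscalls(sample)


def run_campaign(samples: List[str], guests: List[DryRunGuest],
                 wait_seconds: float, sleep: Callable[[float], None]) -> List[str]:
    """Let each guest pull samples from a shared queue until none are left."""
    todo: "queue.Queue[str]" = queue.Queue()
    for sample in samples:
        todo.put(sample)
    results: List[str] = []
    lock = threading.Lock()

    def worker(guest: DryRunGuest) -> None:
        while True:
            try:
                sample = todo.get_nowait()
            except queue.Empty:
                return
            path = run_sample(guest, sample, wait_seconds, sleep)
            with lock:
                results.append(path)

    with ThreadPoolExecutor(max_workers=len(guests)) as pool:
        list(pool.map(worker, guests))  # re-raises any worker exception
    return sorted(results)


if __name__ == "__main__":
    guests = [DryRunGuest(f"guest{i}") for i in range(6)]
    print(run_campaign([f"sample{i:05d}" for i in range(12)], guests,
                       wait_seconds=0, sleep=time.sleep))
```

Passing `sleep` in as a parameter keeps the sketch testable with a zero wait; a real setup would also need a fresh disk snapshot per run, which is not modelled here.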
Contained Data
The final dataset contains ~110 million system calls (from both malware and OS activity), each annotated with the following fields:
- ID
- fkMalwareFile (foreign key referencing the malware sample this syscall originated from, if any)
- SystemCall
- Arguments
- RetArguments
- Return
- Success
- CallNumber
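A possible in-memory representation of one row of the full table is sketched below. Only the column names come from the paper; the Python types, the snake_case names and the reading of `ID` as chronological order are my assumptions, and the meaning of CallNumber is not documented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SyscallRecord:
    """One syscall row as described in the paper (field types are assumed)."""
    id: int                         # ID
    fk_malware_file: Optional[int]  # fkMalwareFile, None for OS activity
    system_call: str                # SystemCall, e.g. "NtQueryValueKey"
    arguments: str                  # Arguments, raw text of the input args
    ret_arguments: str              # RetArguments, raw text of the output args
    return_value: str               # Return, e.g. "0xc0000034"
    success: bool                   # Success
    call_number: int                # CallNumber (meaning not documented)


def malware_calls(records: List[SyscallRecord]) -> List[SyscallRecord]:
    """Keep only syscalls attributed to some malware sample."""
    return [r for r in records if r.fk_malware_file is not None]


def sequence_for_sample(records: List[SyscallRecord], sample_id: int) -> List[str]:
    """Syscall names of one sample, ordered by ID (assumed to be chronological)."""
    ordered = sorted(records, key=lambda r: r.id)
    return [r.system_call for r in ordered if r.fk_malware_file == sample_id]
```

A per-sample sequence like this is presumably what the published .csv windows were derived from, although the paper does not confirm it.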
Notably, I couldn’t find any information on the actual labeling process, but I assume that known parameters such as paths and PIDs were used for it.
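To make the PID assumption concrete: one plausible scheme is to label every syscall issued by the malware process or any of its descendants. The sketch below is purely hypothetical (the authors do not describe their method), and it ignores PID reuse, which a real labeler would have to handle with process start times.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def process_tree(root_pid: int, parent_of: Dict[int, int]) -> Set[int]:
    """Return root_pid plus every process that descends from it."""
    children: Dict[int, List[int]] = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)
    tree, stack = {root_pid}, [root_pid]
    while stack:
        for child in children[stack.pop()]:
            if child not in tree:
                tree.add(child)
                stack.append(child)
    return tree


def label_calls(calls: List[Tuple[int, str]], root_pid: int,
                parent_of: Dict[int, int], sample_id: int
                ) -> List[Tuple[int, str, Optional[int]]]:
    """Attach sample_id to calls made inside the malware's process tree, else None."""
    tree = process_tree(root_pid, parent_of)
    return [(pid, name, sample_id if pid in tree else None) for pid, name in calls]
```

Here `parent_of` maps each PID to its parent PID; calls from unrelated processes get `None`, mirroring the "if any" in the fkMalwareFile description.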
However, the only download link I was able to find (GitHub link below) contains only .csv files with sequences of what I assume to be syscall numbers.
There is a webpage listing numerical values for Windows system calls, which the authors also cite and which matches the values found, but the mapping is sadly not explained further in the paper.
These sequences are annotated with a malware category (e.g. AdWare, Trojan), presumably that of the sample the sequence originated from; some sequences are instead labeled Clean.
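A minimal sketch for reading such a file, assuming each row holds comma-separated integers followed by a single label column, as in the excerpt under Data Examples. The helper names are my own; the parser accepts any iterable of lines, so an open file works directly.

```python
import csv
from collections import Counter
from typing import Iterable, List, Set, Tuple

Row = Tuple[List[int], str]


def parse_sequences(lines: Iterable[str]) -> List[Row]:
    """Split each CSV row into its integer sequence and its trailing label."""
    rows: List[Row] = []
    for record in csv.reader(lines):
        if not record:  # skip blank lines
            continue
        *numbers, label = record
        rows.append(([int(x) for x in numbers], label.strip()))
    return rows


def label_counts(rows: List[Row]) -> Counter:
    """Number of sequences per label (e.g. AdWare, Trojan, Clean)."""
    return Counter(label for _, label in rows)


def window_lengths(rows: List[Row]) -> Set[int]:
    """Distinct sequence lengths; a single value means fixed-size windows."""
    return {len(seq) for seq, _ in rows}


# Usage, with the file name from the data examples:
# with open("AllMalwarePlusClean/060_6.csv", newline="") as f:
#     rows = parse_sequences(f)
#     print(label_counts(rows), window_lengths(rows))
```

Checking `window_lengths` first is cheap and tells you whether the file really consists of fixed-size windows before feeding it to a sequence model.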
Papers
Links
Data Examples
Raw system call, taken from an example in the paper
NtQueryValueKey
arg 0: 0x35a (type=HANDLE, size=0x4)
arg 1: 26/28 "DescriptionID" (type=UNICODE_STRING*, size=0x4)
arg 2: 0x2 (type=int, size=0x4)
arg 3: 0x02c5daa0 (type=<struct>*, size=0x4)
arg 4: 0x90 (type=unsigned int, size=0x4)
arg 5: 0x02c5da7c (type=unsigned int*, size=0x4)
failed (error=0xc0000034) =>
arg 3: <NYI> (type=<struct>*, size=0x4)
arg 5: 0x02c5da7c => 0x0 (type=unsigned int*, size=0x4)
retval: 0xc0000034 (type=NTSTATUS, size=0x4)
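The raw trace format above (presumably produced by Dr. Memory's drstrace tool) is regular enough to parse with two regular expressions. This sketch is based only on the line shapes visible in this example, not on a format specification: argument lines before the status line are treated as inputs, those after it as outputs.

```python
import re
from typing import Any, Dict, List

# Line shapes taken from the example above (assumed, not a spec).
ARG_RE = re.compile(r"^arg (\d+): (.*) \(type=([^,]+), size=(0x[0-9a-fA-F]+)\)$")
RET_RE = re.compile(r"^retval: (\S+) \(type=([^,]+), size=(0x[0-9a-fA-F]+)\)$")


def parse_syscall_block(text: str) -> Dict[str, Any]:
    """Parse one traced syscall into its name, input/output args and return value."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    call: Dict[str, Any] = {"name": lines[0], "args": [], "out_args": [],
                            "success": None, "retval": None}
    target: List[Dict[str, Any]] = call["args"]
    for line in lines[1:]:
        if line.endswith("=>"):
            # Status line, e.g. "failed (error=0xc0000034) =>"; output args follow.
            call["success"] = not line.startswith("failed")
            target = call["out_args"]
        elif (m := ARG_RE.match(line)):
            target.append({"index": int(m.group(1)), "value": m.group(2),
                           "type": m.group(3), "size": int(m.group(4), 16)})
        elif (m := RET_RE.match(line)):
            call["retval"] = m.group(1)
    return call
```

Values are kept as raw strings (e.g. `0x02c5da7c => 0x0` for an output pointer), since their interpretation depends on the argument type.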
Sequences of (presumably) syscall numbers, taken from AllMalwarePlusClean/060_6.csv
[...]
15,18,18,17,17,17,13,13,13,17,17,27,10,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,AdWare
15,18,18,17,17,17,13,13,13,17,17,27,10,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,17,17,17,17,17,17,17,17,17,17,60,63,62,63,79,11,68,62,63,21,68,68,68,68,68,68,79,AdWare
15,17,34,17,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,17,13,18,13,18,13,18,17,17,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,AdWare
15,17,34,17,18,18,18,18,18,18,18,18,17,13,13,13,17,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,AdWare
18,21,51,17,17,17,34,68,68,99,162,52,11,164,91,52,11,68,68,34,15,18,18,17,13,13,13,17,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,Downloader
15,18,18,17,13,13,13,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,62,17,63,79,11,68,62,63,79,11,68,68,68,15,15,2,2,2,2,2,17,2,2,62,17,88,88,55,60,63,79,11,AdWare
17,17,17,17,18,91,88,17,55,60,63,21,68,68,68,68,68,68,79,11,11,68,68,15,2,2,17,34,17,18,91,18,91,88,55,60,63,21,68,68,68,68,68,68,68,68,79,11,11,68,62,88,88,55,60,63,21,68,68,68,AdWare
15,18,18,17,13,13,13,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,62,17,63,79,11,68,62,63,79,11,68,68,68,15,15,2,2,2,2,2,17,2,2,62,17,88,88,55,60,63,79,11,AdWare
17,56,81,85,17,17,88,17,56,87,85,87,87,87,87,6,11,55,6,11,34,11,88,55,60,63,79,11,11,68,27,88,88,55,88,55,60,63,21,68,68,68,68,68,68,79,11,11,68,62,63,79,11,68,68,68,62,63,79,11,Trojan
15,17,34,17,18,18,18,18,18,18,18,18,17,13,13,13,17,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,AdWare
98,98,17,17,99,61,11,27,70,70,70,27,10,11,27,10,11,78,61,61,61,11,100,88,88,88,88,27,10,11,18,17,17,17,18,91,62,63,79,11,68,68,15,2,2,2,2,17,2,18,91,101,27,10,11,56,60,63,11,21,WebToolbar
98,98,99,61,11,27,70,70,70,27,10,11,27,10,11,78,61,61,61,11,100,88,88,88,88,27,10,11,18,17,17,17,17,74,74,74,56,60,63,11,11,18,91,62,88,88,55,60,63,79,11,11,18,91,62,88,88,55,60,63,AdWare
17,56,81,85,17,17,88,17,56,87,85,87,87,87,87,6,11,55,6,11,34,11,88,55,60,63,21,68,68,68,68,68,68,68,68,79,11,11,68,27,88,88,55,88,55,60,63,21,68,68,68,68,68,68,79,11,11,68,62,63,Trojan
68,68,15,17,34,17,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,17,13,18,13,18,13,18,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,18,18,Clean
68,68,15,62,88,88,88,88,88,88,88,88,88,62,88,88,88,88,88,88,88,88,88,2,62,88,88,88,88,88,88,88,88,88,17,62,88,88,88,88,88,88,88,88,88,17,13,13,13,62,88,88,88,88,88,88,88,88,88,17,Clean
15,17,34,17,18,18,18,18,18,18,18,18,18,17,13,13,13,17,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,Clean
[...]
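Some rows in this excerpt repeat verbatim (for example, the sixth and eighth rows, both AdWare). For anomaly-based HIDS training this matters: repeated windows that land in both the training and test split inflate evaluation results, and identical windows with different labels cannot be separated by any classifier. A small sketch to find both cases, assuming rows are already parsed into `(sequence, label)` pairs:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Tuple[List[int], str]
Seq = Tuple[int, ...]


def find_duplicates(rows: List[Row]) -> Tuple[Dict[Seq, List[str]], Dict[Seq, Set[str]]]:
    """Group identical sequences; return repeated ones and those with conflicting labels."""
    labels_by_seq: Dict[Seq, List[str]] = defaultdict(list)
    for seq, label in rows:
        labels_by_seq[tuple(seq)].append(label)
    duplicates = {s: ls for s, ls in labels_by_seq.items() if len(ls) > 1}
    conflicts = {s: set(ls) for s, ls in duplicates.items() if len(set(ls)) > 1}
    return duplicates, conflicts
```

Deduplicating before splitting, or splitting by malware sample rather than by window, are the usual ways to avoid the leakage; which one fits depends on whether sample IDs are recoverable from the published files.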