AWSCTD

Network Data Source: -
Network Data Labeled: -
Host Data Source: Sequences of Syscall Numbers
Host Data Labeled: Yes

Overall Setting: Single OS
OS Types: Windows 7
Number of Machines: 1 per run
Total Runtime: ~30 minutes per run
Year of Collection: 2018
Attack Categories: Over 10k different malware samples; categories unclear
Benign Activity: None

Packed Size: 9.8 MB
Unpacked Size: 558 MB
Download Link: goto

Overview

The Attack-Caused Windows System Calls Traces Dataset (AWSCTD) contains system call logs generated by executing malware on a small number of Windows hosts. It is intended for training anomaly-based host intrusion detection systems (HIDS) and is significantly larger than previous datasets in terms of the number of distinct malware samples executed (12110). However, it does not include user emulation or comparable benign activity.

Environment

Six guest machines running Windows 7 were virtualized in parallel using QEMU, each running “Dr. Memory”, “OSSEC” and “WinDump” for data collection. It is not explicitly stated whether or how these machines are connected; I assume they are used separately, each testing individual samples.

Activity

The sequence for each of the 12110 malware samples is:

  • Transfer file(s) to virtual disk of guest machine
  • Power guest machine on
  • Malware is started via bash script
  • Wait for a predefined time (here, 30 minutes)
  • Stop machine and collect syscall data

Contained Data

The final dataset then contains ~110 million system calls (both from malware and OS activity), each annotated with a couple of standard and additional features:

  • ID
  • fkMalwareFile (related to the malware sample this syscall originated from, if any)
  • SystemCall
  • Arguments
  • RetArguments
  • Return
  • Success
  • CallNumber

Notably, I couldn’t find any information regarding the actual labeling process, but I assume that known parameters such as paths and PIDs were used for it. However, the only download link I was able to find (GitHub link below) contains only .csv files with sequences of what I assume to be syscall numbers. The authors cite a webpage that lists numerical values for Windows system calls, and those values match the ones found in the files, but sadly the paper does not detail the mapping further. Each sequence is annotated with a malware category (such as AdWare or Trojan), presumably that of the sample the sequence originated from.

Papers

Data Examples

Raw system call taken from example in paper

NtQueryValueKey
    arg 0: 0x35a (type=HANDLE, size=0x4)
    arg 1: 26/28 "DescriptionID" (type=UNICODE_STRING*, size=0x4)
    arg 2: 0x2 (type=int, size=0x4)
    arg 3: 0x02c5daa0 (type=<struct>*, size=0x4)
    arg 4: 0x90 (type=unsigned int, size=0x4)
    arg 5: 0x02c5da7c (type=unsigned int*, size=0x4)
    failed (error=0xc0000034) =>
    arg 3: <NYI> (type=<struct>*, size=0x4)
    arg 5: 0x02c5da7c => 0x0 (type=unsigned int*, size=0x4)
    retval: 0xc0000034 (type=NTSTATUS, size=0x4)
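
A trace entry like the one above could be turned into a structured record roughly as follows. This is only a sketch: the line format is inferred from this single example, the names `SyscallRecord` and `parse_record` are hypothetical, and treating the "failed"/"succeeded" line as the boundary between input and output arguments is my reading of the example, not something the paper states.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Matches lines such as: arg 0: 0x35a (type=HANDLE, size=0x4)
ARG_RE = re.compile(r"^arg (\d+): (.*) \(type=(.+), size=(0x[0-9a-fA-F]+)\)$")
# Matches lines such as: retval: 0xc0000034 (type=NTSTATUS, size=0x4)
RET_RE = re.compile(r"^retval: (\S+) \(type=(.+), size=(0x[0-9a-fA-F]+)\)$")


@dataclass
class SyscallRecord:
    name: str
    in_args: dict = field(default_factory=dict)   # index -> (value, type)
    out_args: dict = field(default_factory=dict)  # index -> (value, type)
    retval: Optional[str] = None
    succeeded: Optional[bool] = None


def parse_record(text: str) -> SyscallRecord:
    """Parse one trace entry; the first non-empty line is the syscall name."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    record = SyscallRecord(name=lines[0])
    after_call = False  # assumption: args after the status line are outputs
    for line in lines[1:]:
        if line.startswith(("failed", "succeeded")):
            record.succeeded = line.startswith("succeeded")
            after_call = True
            continue
        m = ARG_RE.match(line)
        if m:
            target = record.out_args if after_call else record.in_args
            target[int(m.group(1))] = (m.group(2), m.group(3))
            continue
        m = RET_RE.match(line)
        if m:
            record.retval = m.group(1)
    return record


# Shortened version of the example entry, for illustration only.
SAMPLE = """
NtQueryValueKey
    arg 0: 0x35a (type=HANDLE, size=0x4)
    arg 1: 26/28 "DescriptionID" (type=UNICODE_STRING*, size=0x4)
    arg 5: 0x02c5da7c (type=unsigned int*, size=0x4)
    failed (error=0xc0000034) =>
    arg 5: 0x02c5da7c => 0x0 (type=unsigned int*, size=0x4)
    retval: 0xc0000034 (type=NTSTATUS, size=0x4)
"""

record = parse_record(SAMPLE)
```

Lines that match none of the patterns are skipped silently, so malformed argument lines do not abort the parse.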

Sequence of syscall numbers (?) taken from AllMalwarePlusClean/060_6.csv

[...]
15,18,18,17,17,17,13,13,13,17,17,27,10,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,AdWare
15,18,18,17,17,17,13,13,13,17,17,27,10,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,17,17,17,17,17,17,17,17,17,17,60,63,62,63,79,11,68,62,63,21,68,68,68,68,68,68,79,AdWare
15,17,34,17,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,17,13,18,13,18,13,18,17,17,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,AdWare
15,17,34,17,18,18,18,18,18,18,18,18,17,13,13,13,17,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,AdWare
18,21,51,17,17,17,34,68,68,99,162,52,11,164,91,52,11,68,68,34,15,18,18,17,13,13,13,17,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,Downloader
15,18,18,17,13,13,13,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,62,17,63,79,11,68,62,63,79,11,68,68,68,15,15,2,2,2,2,2,17,2,2,62,17,88,88,55,60,63,79,11,AdWare
17,17,17,17,18,91,88,17,55,60,63,21,68,68,68,68,68,68,79,11,11,68,68,15,2,2,17,34,17,18,91,18,91,88,55,60,63,21,68,68,68,68,68,68,68,68,79,11,11,68,62,88,88,55,60,63,21,68,68,68,AdWare
15,18,18,17,13,13,13,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,62,17,63,79,11,68,62,63,79,11,68,68,68,15,15,2,2,2,2,2,17,2,2,62,17,88,88,55,60,63,79,11,AdWare
17,56,81,85,17,17,88,17,56,87,85,87,87,87,87,6,11,55,6,11,34,11,88,55,60,63,79,11,11,68,27,88,88,55,88,55,60,63,21,68,68,68,68,68,68,79,11,11,68,62,63,79,11,68,68,68,62,63,79,11,Trojan
15,17,34,17,18,18,18,18,18,18,18,18,17,13,13,13,17,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,AdWare
98,98,17,17,99,61,11,27,70,70,70,27,10,11,27,10,11,78,61,61,61,11,100,88,88,88,88,27,10,11,18,17,17,17,18,91,62,63,79,11,68,68,15,2,2,2,2,17,2,18,91,101,27,10,11,56,60,63,11,21,WebToolbar
98,98,99,61,11,27,70,70,70,27,10,11,27,10,11,78,61,61,61,11,100,88,88,88,88,27,10,11,18,17,17,17,17,74,74,74,56,60,63,11,11,18,91,62,88,88,55,60,63,79,11,11,18,91,62,88,88,55,60,63,AdWare
17,56,81,85,17,17,88,17,56,87,85,87,87,87,87,6,11,55,6,11,34,11,88,55,60,63,21,68,68,68,68,68,68,68,68,79,11,11,68,27,88,88,55,88,55,60,63,21,68,68,68,68,68,68,79,11,11,68,62,63,Trojan
68,68,15,17,34,17,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,17,13,18,13,18,13,18,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,18,18,Clean
68,68,15,62,88,88,88,88,88,88,88,88,88,62,88,88,88,88,88,88,88,88,88,2,62,88,88,88,88,88,88,88,88,88,17,62,88,88,88,88,88,88,88,88,88,17,13,13,13,62,88,88,88,88,88,88,88,88,88,17,Clean
15,17,34,17,18,18,18,18,18,18,18,18,18,17,13,13,13,17,17,18,18,18,18,18,18,18,18,18,18,18,21,21,21,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,Clean
[...]
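
Rows like the ones above can be read with Python's standard `csv` module. The sketch below assumes that every row consists of integer fields followed by a single label in the last field, which is what the excerpt suggests but is not documented by the authors. The helper name `parse_rows` is hypothetical.

```python
import csv
import io
from collections import Counter


def parse_rows(handle):
    """Yield (syscall_numbers, label) pairs from an open CSV file."""
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        *numbers, label = row  # assumption: the last field is the label
        yield [int(n) for n in numbers], label.strip()


# Two rows shortened from the excerpt above, for illustration only.
SAMPLE = "15,18,18,17,AdWare\n68,68,15,17,Clean\n"

rows = list(parse_rows(io.StringIO(SAMPLE)))
label_counts = Counter(label for _, label in rows)
```

For the real file, the same function should work with `open("AllMalwarePlusClean/060_6.csv", newline="")` in place of the `io.StringIO` object, provided the assumption about the row layout holds.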