objects.file module

class objects.file.FileObject(binary=None, file_name=None, file_path=None, scheduled_analysis=None)

Bases: object

FileObject is the primary data structure in FACT. It holds all meta information of a file along with analysis results and some internal values for scheduling.

Parameters:
  • binary (bytes | None) – The file in binary representation. Either this or file_path has to be present.

  • file_name (str | None) – The file’s name.

  • file_path (str | None) – The file’s path. Either this or binary has to be present.

  • scheduled_analysis (Optional[list[str]]) – A list of analysis plugins that should be run on this file.

add_included_file(file_object)

This function adds a file to this object’s list of included files. It also takes care of a number of fields on the child object:

  • parents: Adds the uid of this file to the parents field of the child.

  • root_uid: Sets the root uid of the child to this file’s uid.

  • depth: The child inherits the unpacking depth from this file, incremented by one.

  • scheduled_analysis: The child inherits this file’s scheduled analysis.

  • virtual_file_path: Sets a new virtual_file_path for the child, being <this_files_current_vfp|child_path>.

Parameters:

file_object (FileObject) – File that was extracted from the current file

Return type:

None
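
The bookkeeping listed above can be sketched with a small stand-in class. This is not FACT's implementation: MiniFileObject, its fields and the plugin name "example_plugin" are hypothetical and only mirror the documented effects on parents, depth, scheduled_analysis and files_included. The root_uid and virtual_file_path handling is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MiniFileObject:
    """Hypothetical stand-in for FileObject; not the FACT class."""

    uid: str
    depth: int = 0
    parents: list[str] = field(default_factory=list)
    scheduled_analysis: Optional[list[str]] = None
    files_included: set[str] = field(default_factory=set)

    def add_included_file(self, child: "MiniFileObject") -> None:
        # Record this file as a direct parent of the child.
        child.parents.append(self.uid)
        # The child sits one extraction level below this file.
        child.depth = self.depth + 1
        # The child inherits the scheduled analysis of this file.
        child.scheduled_analysis = self.scheduled_analysis
        # Only the uid of the direct child is stored on this object.
        self.files_included.add(child.uid)


firmware = MiniFileObject(uid="fw_uid", scheduled_analysis=["example_plugin"])
archive = MiniFileObject(uid="tar_uid")
firmware.add_included_file(archive)
```

After the call, the stand-in child lists the firmware as its parent and has depth 1, while the firmware's files_included contains only the uid of the direct child.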

analysis_exception

If an exception occurred during analysis, this field stores a tuple (<plugin name>, <error message>) for debugging purposes and as a placeholder in the UI.

analysis_tags

Analysis tags for this file, stored as a dict with one key per plugin. The value for each plugin has the structure {tag_name: {'value': value, 'color': color, 'propagate': propagate}, 'root_uid': root_uid}. In total, the field has the form {plugin: <tags of plugin>, ...}.
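
As an illustration of this nesting, the following sketch builds one such dict by hand and reads the tag names of a plugin back out. All names and values are placeholders, and tag_names is a hypothetical helper, not part of FACT.

```python
# All plugin names, tag names and values below are placeholders.
analysis_tags = {
    "example_plugin": {
        "example_tag": {"value": "Example", "color": "warning", "propagate": False},
        "root_uid": "root_uid_placeholder",
    },
}


def tag_names(tags: dict, plugin: str) -> list[str]:
    """Return the tag names of one plugin, skipping the 'root_uid' entry."""
    return [name for name in tags.get(plugin, {}) if name != "root_uid"]


print(tag_names(analysis_tags, "example_plugin"))
```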

binary

Binary representation of this file in bytes.

comments

List of comments that have been made on this file. Comments are dicts with the keys time (float), author (str) and comment (str).
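
A short sketch of what such a list might look like, using the three documented keys. The authors, texts and timestamps are made up for illustration.

```python
# Placeholder comments with the documented keys time, author and comment.
comments = [
    {"time": 1700000000.0, "author": "alice", "comment": "Header looks truncated."},
    {"time": 1700003600.0, "author": "bob", "comment": "Confirmed, re-upload needed."},
]

# Sort by the float timestamp to show the most recent comment first.
newest_first = sorted(comments, key=lambda entry: entry["time"], reverse=True)
```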

create_binary_from_path()

Return type:

None

depth

Extraction depth of this object. For the outer firmware file, this is 0, and every extraction increments it by one. For a file inside a squashfs that is itself contained in a tar archive, this would be 1 (tar) + 1 (fs) = 2.

file_name

Name of this file. As with file_path, this is probably a generated name for carved objects.

file_path

The path of this file. Has to be a local path if binary is not set. For carved objects, this will likely only be a (generated) name.

files_included

The set of files included in this file, which usually applies to archives. Only the next layer is listed, not files included recursively on lower extraction layers.

classmethod from_json(json_dict, root_uid=None)

Parameters:
  • json_dict (dict) –

  • root_uid (str | None) –

Return type:

FileObject

get_hid()

Get a human-readable identifier for the given file. This is usually the file name for extracted files.

Returns:

String representing a human-readable identifier for this file.

Return type:

str

get_virtual_paths_for_all_uids()

Get all virtual file paths (VFPs) of the file in all firmware containers.

Returns:

List of virtual paths.

Return type:

list[str]
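
A minimal sketch of the idea, assuming the virtual file paths are stored as a dict that maps each firmware uid to a list of VFP strings. That layout is an assumption made for this example, and all_virtual_paths is a hypothetical helper, not the FACT method.

```python
def all_virtual_paths(virtual_file_path: dict[str, list[str]]) -> list[str]:
    """Flatten the VFP lists of all firmware containers into one list."""
    return [vfp for vfp_list in virtual_file_path.values() for vfp in vfp_list]


# Placeholder uids and paths for a file that appears in two firmware images.
example_vfps = {
    "fw_uid_a": ["fw_uid_a|fs_uid|/etc/hosts"],
    "fw_uid_b": ["fw_uid_b|/etc/hosts", "fw_uid_b|/tmp/hosts"],
}
paths = all_virtual_paths(example_vfps)
```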

list_of_all_included_files

The list of all recursively included files in this file. That means files are included that are themselves included in files contained in this file, and so on. This value is not set by default as it’s expensive to aggregate and takes up a lot of memory.

parent_firmware_uids

Set of parent firmware uids. These are the uids of the root objects this file belongs to, not of its direct predecessors. As a file can be part of multiple firmware images, this field is a set. It should be closely related to the keys in the virtual file path field.

parents

List of parent uids. A parent in this context is the direct predecessor in a firmware tree, not necessarily its root.

processed_analysis

Analysis results for this file.

Structure of results: The first level of this dict consists of 'plugin_name': <result_dict> pairs. The result dict can have any content, but always has at least the fields:

  • analysis_date - float representing the time of analysis in Unix time.

  • plugin_version - str defining the version of each plugin at time of analysis.

  • summary - list holding a summary of each file’s result, which can be aggregated.
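
The structure above might look as follows for a single plugin. The plugin name, version string and extra field are placeholders, and missing_fields is a hypothetical helper that checks the three documented fields.

```python
import time

# Placeholder result of one plugin; only the three required keys are documented.
processed_analysis = {
    "example_plugin": {
        "analysis_date": time.time(),
        "plugin_version": "0.1",
        "summary": ["example finding"],
        "extra_field": 42,  # plugin-specific content is allowed
    },
}

REQUIRED_FIELDS = {"analysis_date", "plugin_version", "summary"}


def missing_fields(result: dict) -> set[str]:
    """Return the required fields that are absent from a result dict."""
    return REQUIRED_FIELDS - set(result)
```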

root_uid

UID of the root (i.e. firmware) object for the given file. Useful to associate results of children with their firmware. It is only set during unpacking / analysis in the backend, not when the object is loaded from the DB!

scheduled_analysis

List of plugins that are scheduled to be run on this file.

set_binary(binary)

Store the binary representation of the file as a byte string. Additionally, set binary-related metadata (size, hash) and compute the uid after that.

Parameters:

binary (bytes) – file in binary representation

Return type:

None

sha256

SHA256 hash of this file.

size

Size of this file in bytes.

temporary_data

This field can be used for arbitrary temporary storage. It will not be persisted to the database, so it dies after the analysis cycle.

to_json(vfp_parent_filter=None)

Get a FileObject as JSON. vfp_parent_filter can be used to filter the entries with a UID whitelist.

Parameters:

vfp_parent_filter (set[str] | None) –

Return type:

dict
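
One way to picture the whitelist is sketched below. It assumes that the filtered entries are virtual file paths keyed by parent uid; that reading, and the helper filter_vfp_entries, are assumptions for illustration and not taken from the FACT source.

```python
from typing import Optional


def filter_vfp_entries(
    virtual_file_path: dict[str, list[str]],
    vfp_parent_filter: Optional[set[str]] = None,
) -> dict[str, list[str]]:
    """Keep only entries whose uid key is in the whitelist (all if None)."""
    if vfp_parent_filter is None:
        return dict(virtual_file_path)
    return {
        uid: paths
        for uid, paths in virtual_file_path.items()
        if uid in vfp_parent_filter
    }


# Placeholder uids and paths.
entries = {"uid_a": ["uid_a|/bin/sh"], "uid_b": ["uid_b|/bin/sh"]}
only_a = filter_vfp_entries(entries, {"uid_a"})
```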

property uid: str

Unique identifier of this file. It consists of the file’s SHA256 hash and its length in the form hash_length.

Returns:

uid of this file.
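
The hash_length form can be reproduced with the standard library. This sketch assumes the hash is the hexadecimal SHA256 digest and the length is the size in bytes; make_uid is a hypothetical helper, not the FACT property.

```python
import hashlib


def make_uid(binary: bytes) -> str:
    """Build an identifier of the form <sha256 hex digest>_<length in bytes>."""
    sha256 = hashlib.sha256(binary).hexdigest()
    return f"{sha256}_{len(binary)}"


uid = make_uid(b"hello")
```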

virtual_file_path

The virtual file path (vfp) is not a path on the analysis machine but the full path inside a firmware object. For a file inside a filesystem that was itself packed inside an archive, this might look like firmware_uid|fs_uid|/etc/hosts, with the pipe sign ( | ) separating extraction levels. For files such as symlinks, there can be multiple paths inside a single firmware for one unique file.
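
A VFP string of this shape can be taken apart by splitting on the pipe sign. The helper split_vfp below is hypothetical and assumes that the path inside the innermost container does not itself contain a pipe sign.

```python
def split_vfp(vfp: str) -> tuple[list[str], str]:
    """Split a VFP into the uids of the containers and the innermost path."""
    # Every element except the last names one extraction level.
    *container_uids, inner_path = vfp.split("|")
    return container_uids, inner_path


containers, path = split_vfp("firmware_uid|fs_uid|/etc/hosts")
```

For the example string from the text, containers holds the two uids in extraction order and path holds /etc/hosts.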