objects.file module
- class objects.file.FileObject(binary=None, file_name=None, file_path=None, scheduled_analysis=None)
Bases:
object
FileObject is the primary data structure in FACT. It holds all meta information of a file along with analysis results and some internal values for scheduling.
- Parameters:
binary (bytes | None) – The file in binary representation. Either this or file_path has to be present.
file_name (str | None) – The file’s name.
file_path (str | None) – The file’s path. Either this or binary has to be present.
scheduled_analysis (Optional[list[str]]) – A list of analysis plugins that should be run on this file.
- add_included_file(file_object)
This function adds a file to this object’s list of included files. It also sets a number of fields on the child object:
parents: Adds the uid of this file to the parents field of the child.
root_uid: Sets the root uid of the child to this file’s uid.
depth: The child inherits the unpacking depth from this file, incremented by one.
scheduled_analysis: The child inherits this file’s scheduled analysis.
virtual_file_path: Sets a new virtual_file_path for the child, being <this_files_current_vfp|child_path>.
- Parameters:
file_object (FileObject) – File that was extracted from the current file
- Return type:
None
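The bookkeeping listed above can be sketched with a small stand-in class. SketchFile, its uids and its paths are hypothetical and not part of FACT. The sketch assumes that virtual_file_path is a single string and that the child path is taken from the child's file_path, and it follows the field descriptions above literally rather than the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchFile:
    """Hypothetical stand-in for FileObject; not the FACT implementation."""

    uid: str
    file_path: str
    virtual_file_path: str = ""
    depth: int = 0
    root_uid: Optional[str] = None
    parents: list = field(default_factory=list)
    scheduled_analysis: Optional[list] = None
    files_included: set = field(default_factory=set)

    def add_included_file(self, child: "SketchFile") -> None:
        # Register the child in this file's set of included files.
        self.files_included.add(child.uid)
        # parents: this file becomes a direct predecessor of the child.
        child.parents.append(self.uid)
        # root_uid: set to this file's uid, as described above.
        child.root_uid = self.uid
        # depth: inherited from this file, incremented by one.
        child.depth = self.depth + 1
        # scheduled_analysis: inherited unchanged.
        child.scheduled_analysis = self.scheduled_analysis
        # virtual_file_path: <this_files_current_vfp|child_path>
        # (assumption: child_path is the child's file_path).
        child.virtual_file_path = f"{self.virtual_file_path}|{child.file_path}"


firmware = SketchFile(
    uid="fw_uid",
    file_path="/tmp/fw.tar",
    virtual_file_path="fw_uid",
    scheduled_analysis=["example_plugin"],
)
child = SketchFile(uid="child_uid", file_path="/etc/hosts")
firmware.add_included_file(child)
```

After the call, the child carries the parent's uid in parents, has depth 1, and its virtual_file_path is "fw_uid|/etc/hosts".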
- analysis_exception
If an exception occurred during analysis, this field stores a tuple (<plugin name>, <error message>) for debugging purposes and as a placeholder in the UI.
- analysis_tags
Analysis tags for this file. The first layer of this dict has one key per plugin. The entry of each plugin has the structure
{tag_name: {'value': value, 'color': color, 'propagate': propagate}, 'root_uid': root_uid}
so in total this is a dict of the form {plugin: <tags of plugin>, ...}.
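As an illustration of the nesting described above, the following sketch builds one tag entry and flattens it. The plugin name, tag name, color and values are made up for the example, and iter_tags is a hypothetical helper, not a FACT function.

```python
# Hypothetical example data following the documented structure.
analysis_tags = {
    "example_plugin": {              # first layer: one key per plugin
        "example_tag": {             # second layer: one key per tag
            "value": "example value",
            "color": "blue",         # assumed color name
            "propagate": False,
        },
        "root_uid": "fw_uid",        # sits next to the tags of the plugin
    },
}


def iter_tags(tags: dict):
    """Yield (plugin, tag_name, value) triples, skipping the root_uid entry."""
    for plugin, plugin_tags in tags.items():
        for tag_name, tag in plugin_tags.items():
            if tag_name == "root_uid":
                continue
            yield plugin, tag_name, tag["value"]


flat_tags = list(iter_tags(analysis_tags))
```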
- binary
Binary representation of this file in bytes.
- comments
List of comments that have been made on this file. Comments are dicts with the keys time (float), author (str) and comment (str).
- create_binary_from_path()
- Return type:
None
- depth
Extraction depth of this object. For the outer firmware file, this is 0. Every extraction increments it by one. For a file inside a squashfs that is itself contained in a tar archive, this would be 1 (tar) + 1 (fs) = 2.
- file_name
Name of this file. Similar to file_path, this is probably generated for carved objects.
- file_path
The path of this file. Has to be a local path if binary is not set. For carved objects, this will likely only be a (generated) name.
- files_included
The set of files included in this file. It is usually non-empty for archives. Only the next layer is listed, not files included recursively on lower extraction layers.
- classmethod from_json(json_dict, root_uid=None)
- Parameters:
json_dict (dict) –
root_uid (str | None) –
- Return type:
- get_hid()
Get a human-readable identifier for the given file. This is usually the file name for extracted files.
- Returns:
String representing a human-readable identifier for this file.
- Return type:
str
- get_virtual_paths_for_all_uids()
Get all virtual file paths (VFPs) of the file in all firmware containers.
- Returns:
List of virtual paths.
- Return type:
list[str]
- list_of_all_included_files
The list of all files recursively included in this file, i.e. the files contained in this file, the files contained in those, and so on. This value is not set by default because it is expensive to aggregate and takes up a lot of memory.
- parent_firmware_uids
Set of parent firmware uids. These are the uids of the root objects this file belongs to, not of its direct predecessors. As a file can be part of multiple firmware images, this field is a set. It should correspond closely to the keys of the virtual file path field.
- parents
List of parent uids. A parent in this context is the direct predecessor in a firmware tree, not necessarily its root.
- processed_analysis
Analysis results for this file.
Structure of results: The first level of this dict consists of 'plugin_name': <result_dict> pairs. The result dict can have any content, but always has at least the fields:
analysis_date - float representing the time of analysis in unix time.
plugin_version - str defining the version of the plugin at the time of analysis.
summary - list holding a summary of this file’s result that can be aggregated.
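A minimal sketch of such a result dict, together with a check for the three fields that are always present. The plugin name, version string, summary entry and the extra key are invented for the example, and missing_fields is a hypothetical helper, not part of FACT.

```python
import time

# The three fields every result dict is documented to contain.
REQUIRED_FIELDS = {"analysis_date", "plugin_version", "summary"}

# Hypothetical example data following the documented structure.
processed_analysis = {
    "example_plugin": {
        "analysis_date": time.time(),    # unix time as float
        "plugin_version": "0.1",         # assumed version string
        "summary": ["example finding"],  # list that can be aggregated
        "extra": {"anything": "else"},   # plugin-specific content
    },
}


def missing_fields(results: dict) -> dict:
    """Map each plugin to the sorted list of required fields it lacks."""
    report = {}
    for plugin, result in results.items():
        missing = REQUIRED_FIELDS - set(result)
        if missing:
            report[plugin] = sorted(missing)
    return report
```

Calling missing_fields(processed_analysis) returns an empty dict for the example above, since all three fields are present.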
- root_uid
UID of the root (i.e. firmware) object for the given file. Useful to associate results of children with the firmware. It is only set during unpacking and analysis in the backend, not when the object is loaded from the DB.
- scheduled_analysis
List of plugins that are scheduled to be run on this file.
- set_binary(binary)
Store the binary representation of the file as a byte string. Additionally, set the binary-related metadata (size, hash) and compute the uid afterwards.
- Parameters:
binary (bytes) – file in binary representation
- Return type:
None
- sha256
SHA256 hash of this file.
- size
Size of this file in bytes.
- temporary_data
This field can be used for arbitrary temporary storage. It will not be persisted to the database, so it dies after the analysis cycle.
- to_json(vfp_parent_filter=None)
Get a FileObject as JSON. vfp_parent_filter can be used to filter the entries with a UID whitelist.
- Parameters:
vfp_parent_filter (set[str] | None) –
- Return type:
dict
- property uid: str
Unique identifier of this file, consisting of the file’s sha256 hash and its length in the form hash_length.
- Returns:
uid of this file.
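The hash_length form can be illustrated with the standard library. This is a sketch under the assumption that the hash is the hex digest and the length is the size in bytes. sketch_uid is a hypothetical helper, not the FACT implementation.

```python
import hashlib


def sketch_uid(binary: bytes) -> str:
    """Build an identifier of the form <sha256 hex digest>_<length in bytes>."""
    digest = hashlib.sha256(binary).hexdigest()
    return f"{digest}_{len(binary)}"


example_uid = sketch_uid(b"hello")
hash_part, length_part = example_uid.rsplit("_", 1)
```

For b"hello", hash_part is the 64-character hex digest and length_part is "5".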
- virtual_file_path
The virtual file path (vfp) is not a path on the analysis machine but the full path inside a firmware object. For a file inside a filesystem that was itself packed inside an archive, this might look like firmware_uid|fs_uid|/etc/hosts, with the pipe sign ( | ) separating extraction levels. For files such as symlinks, there can be multiple paths inside a single firmware for one unique file.
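A virtual file path of this shape can be taken apart by splitting on the pipe sign. The sketch below assumes that the pipe sign does not occur inside the path components themselves. split_vfp is a hypothetical helper, not a FACT function.

```python
def split_vfp(vfp: str) -> tuple:
    """Split a VFP string into (container uids, path inside the innermost container)."""
    *containers, path = vfp.split("|")
    return containers, path


containers, inner_path = split_vfp("firmware_uid|fs_uid|/etc/hosts")
```

For the example string from the description, containers is ["firmware_uid", "fs_uid"] and inner_path is "/etc/hosts".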