scheduler.analysis.scheduler

class scheduler.analysis.scheduler.AnalysisScheduler(post_analysis=None, db_interface=None, unpacking_locks=None)

Bases: object

The analysis scheduler is responsible for

initializing analysis plugins
scheduling tasks based on the user’s decision and built-in dependencies
deciding if tasks should run or may be skipped
running the tasks
and storing the new results of analysis tasks in the database

Plugins handle initialization mostly themselves. The scheduler only provides an attachment point and offers a single point of reference for introspection and runtime information.

The scheduler offers three entry points:

1. Start the analysis of a file object (start_analysis_of_object)
2. Start the analysis of a file object without context (update_analysis_of_single_object)
3. Start an update of a firmware file and all its children (update_analysis_of_object_and_children)

Entry point 1 is used by the unpacking scheduler: each file object is sent there after being unpacked. Entry points 2 and 3 are independent of the unpacking process and can be triggered by the user through the Web-UI or REST-API. Entry point 2 updates the analyses of a single file; entry point 3 updates the analyses of all files contained in a given firmware. The difference between 1 and 2 is that a single file update (2) is not considered in the “current analyses” introspection.

Scheduling of tasks is made with the following considerations:

New objects need a set of mandatory plugins (e.g. file type and hashes), as these results are used in further processing stages
Plugins can have dependencies; the results of these dependencies have to be present before the depending plugin can run
The order of execution is shuffled (dependency preserving) to balance execution of the plugins
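
The three considerations above can be illustrated with a minimal sketch. It is not the scheduler's actual code: the function name, the plugin names and the shape of the dependency mapping are assumptions made for illustration. It adds mandatory plugins and transitive dependencies to the user's selection, then repeatedly picks a random plugin among those whose dependencies are already scheduled.

```python
import random


def shuffled_schedule(selected, dependencies, mandatory=("file_type", "file_hashes"), rng=None):
    """Return a randomized execution order that never runs a plugin before its dependencies.

    All names here are hypothetical; `dependencies` maps a plugin name to the
    names of the plugins it depends on.
    """
    rng = rng or random.Random()

    # Collect the user's selection, the mandatory plugins and all transitive dependencies.
    needed = set()
    stack = list(selected) + list(mandatory)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(dependencies.get(name, ()))

    # Repeatedly choose a random plugin whose dependencies are already scheduled.
    order, done = [], set()
    while len(order) < len(needed):
        ready = sorted(p for p in needed - done if set(dependencies.get(p, ())) <= done)
        if not ready:
            raise ValueError("cyclic plugin dependencies")
        choice = rng.choice(ready)
        order.append(choice)
        done.add(choice)
    return order


deps = {"cve_lookup": ["software_components"], "software_components": ["file_type"]}
order = shuffled_schedule(["cve_lookup"], deps, rng=random.Random(0))
```

Whatever the random seed, `file_type` precedes `software_components`, which precedes `cve_lookup`; only the position of independent plugins such as `file_hashes` varies.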

After scheduling, a set of checks is run for each task to decide whether the task can be skipped:

┌─┬──────────────┐ No                                   ┌────────┐
│0│Plugin exists?├──────────────────────────────────────►        │
└─┴───┬──────────┘                                      │  Skip  │
      │ Yes                                     ┌───────►        ◄───┐
┌─┬───▼─────────────┐ Yes                       │       └────────┘   │
│1│Is forced update?├───────────────────────────┼─────┐              │
└─┴───┬─────────────┘                           │     │              │
      │ No                                      │     │              │
┌─┬───▼────────────────────────────────┐ Yes    │     │              │
│2│Analysis present, version unchanged?├────────┘     │              │
└─┴───┬────────────────────────────────┘              │ ┌─────────┐  │
      │ No                                            └─►         │  │
┌─┬───▼────────────────────────────┐ No                 │  Start  │  │
│3│Analysis is black / whitelisted?├────────────────────►         │  │
└─┴───┬────────────────────────────┘                    └─────────┘  │
      │ Yes                                                          │
      └──────────────────────────────────────────────────────────────┘
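
The decision chart can be read as a short chain of early returns. The sketch below follows the four numbered checks; the `Plugin` dataclass, the `plugin_version` key and the exact-match MIME comparison are assumptions for illustration and may differ from the real implementation. In particular, "black / whitelisted" is interpreted here as "the MIME type is on the blacklist, or a non-empty whitelist does not contain it".

```python
from dataclasses import dataclass, field


@dataclass
class Plugin:
    # Hypothetical stand-in for a loaded analysis plugin.
    version: str
    mime_blacklist: list = field(default_factory=list)
    mime_whitelist: list = field(default_factory=list)


def should_skip(plugin_name, plugins, stored_results, mime, force_update=False):
    """Return True if the task should be skipped, False if it should start."""
    plugin = plugins.get(plugin_name)
    if plugin is None:  # check 0: plugin does not exist -> skip
        return True
    if force_update:  # check 1: forced update -> start
        return False
    stored = stored_results.get(plugin_name)
    if stored is not None and stored.get("plugin_version") == plugin.version:
        return True  # check 2: analysis present, version unchanged -> skip
    # check 3 (assumed semantics): blocked by blacklist or excluded by whitelist -> skip
    if mime in plugin.mime_blacklist:
        return True
    if plugin.mime_whitelist and mime not in plugin.mime_whitelist:
        return True
    return False


plugins = {"example_plugin": Plugin("1.0", mime_blacklist=["image/png"])}
up_to_date = {"example_plugin": {"plugin_version": "1.0"}}
```

Note that a forced update bypasses both the version check and the MIME filter, as the chart routes it straight to "Start".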

Passing objects between processes is done using instances of multiprocessing.Queue. Each plugin has an in-queue, which is filled by the scheduler using the add_job() function, and an out-queue that is processed by the result collector. The actual analysis process is out of scope. The results are stored in the database using post_analysis() after each analysis is completed.
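
The queue layout described above might look roughly like the following sketch. `PluginStub`, `work_once` and `collect_result` are hypothetical names, file objects are reduced to plain dictionaries, and the meaning of the three `post_analysis` arguments (UID, plugin name, result) is an assumption based on the documented `Callable[[str, str, dict], None]` signature. Everything runs in one process here to keep the example short; in the scheduler the worker and the collector are separate processes.

```python
import multiprocessing as mp


class PluginStub:
    """Hypothetical plugin with one in-queue and one out-queue."""

    def __init__(self, name):
        self.name = name
        self.in_queue = mp.Queue()
        self.out_queue = mp.Queue()

    def add_job(self, fo):
        # Called by the scheduler to hand a file object to the plugin.
        self.in_queue.put(fo)

    def work_once(self, timeout=5):
        # Stand-in for the plugin's analysis process, which is out of scope.
        fo = dict(self.in_queue.get(timeout=timeout))
        fo["processed_analysis"] = {self.name: {"result": "dummy"}}
        self.out_queue.put(fo)


def collect_result(plugin, post_analysis, timeout=5):
    # The result collector drains the out-queue and stores each result.
    fo = plugin.out_queue.get(timeout=timeout)
    post_analysis(fo["uid"], plugin.name, fo["processed_analysis"][plugin.name])


stored = []
plugin = PluginStub("dummy_plugin")
plugin.add_job({"uid": "abc"})
plugin.work_once()
collect_result(plugin, lambda uid, name, result: stored.append((uid, name, result)))
```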

Parameters:

post_analysis (Optional[Callable[[str, str, dict], None]]) – A database function to call after running an analysis task.
db_interface – An instance of BackEndDbInterface.
unpacking_locks (UnpackingLockManager | None) – An instance of UnpackingLockManager.

cancel_analysis(root_uid)

Parameters: root_uid (str)

check_exceptions()

Iterate over all attached processes and check whether an exception occurred in any of them. Depending on the configuration, plugin exceptions are not registered, because plugin processes are restarted after an exception occurs.

Returns: Boolean value stating if any attached process ran into an exception
Return type: bool

get_combined_analysis_workload()

get_plugin_dict()

Get information regarding all loaded plugins in the form of a dictionary with the following form:

{
    NAME: (
        str: DESCRIPTION,
        bool: mandatory,
        dict: plugin_sets,
        str: VERSION,
        list: DEPENDENCIES,
        list: MIME_BLACKLIST,
        list: MIME_WHITELIST,
        str: config.threads
    )
}

Mandatory plugins are not shown in the analysis selection but always executed. Default plugins are pre-selected in the analysis selection.

Returns: dict with information regarding all loaded plugins
Return type: dict
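
As an illustration of how the returned tuples might be consumed, the sketch below reproduces the distinction between mandatory and default plugins. The sample data and both helper functions are hypothetical, and the assumption that `plugin_sets` maps a preset name such as `"default"` to a boolean is not confirmed by this page.

```python
def selectable_plugins(plugin_dict):
    # Mandatory plugins (tuple index 1) are hidden from the analysis selection.
    return sorted(name for name, info in plugin_dict.items() if not info[1])


def preselected_plugins(plugin_dict, preset="default"):
    # Assumed: plugin_sets (tuple index 2) maps preset names to booleans.
    return sorted(
        name
        for name, info in plugin_dict.items()
        if not info[1] and info[2].get(preset, False)
    )


# Hypothetical sample in the documented tuple order:
# (DESCRIPTION, mandatory, plugin_sets, VERSION, DEPENDENCIES,
#  MIME_BLACKLIST, MIME_WHITELIST, config.threads)
sample = {
    "file_type": ("detect the file type", True, {}, "1.0", [], [], [], "2"),
    "cpu_architecture": ("guess the CPU", False, {"default": True}, "0.4", ["file_type"], [], [], "1"),
    "strings": ("extract strings", False, {"default": False}, "0.3", [], [], [], "1"),
}
```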

get_scheduled_workload()

Get the current workload of this scheduler. The workload is represented by

the general in-queue,
the currently running analyses in each plugin and the plugin in-queues,
the progress for each currently analyzed firmware and
recently finished analyses.

The result has the form:

{
    'analysis_main_scheduler': int(),
    'plugins': dict(),
    'current_analyses': dict(),
    'recently_finished_analyses': dict(),
}

Returns: Dictionary containing current workload statistics
Return type: dict
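
A caller could use the top-level keys to decide whether the scheduler is currently idle. The helper below is a hypothetical example that relies only on the four documented keys; the inner structure of the nested dictionaries is not specified here and is therefore not inspected.

```python
def is_idle(workload):
    # Idle means: nothing waiting in the general in-queue and no firmware in progress.
    return workload["analysis_main_scheduler"] == 0 and not workload["current_analyses"]


# Hypothetical sample values in the documented result shape.
idle_workload = {
    "analysis_main_scheduler": 0,
    "plugins": {},
    "current_analyses": {},
    "recently_finished_analyses": {},
}
busy_workload = dict(idle_workload, analysis_main_scheduler=3)
```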

shutdown(): Shutdown the runner process, the result collector and all plugin processes. A multiprocessing.Value is set to notify all attached processes of the impending shutdown. Afterward, queues are closed once it’s safe.

start()

start_analysis_of_object(fo)

This function is used to start analysis of a firmware object. The function registers the firmware with the status module such that the progress of the firmware and its included files is tracked.

Parameters: fo (FileObject) – The firmware that is to be analyzed

update_analysis_of_object_and_children(fo)

This function is used to analyze an object and all its recursively included objects without repeating the extraction process. Scheduled analyses are propagated to the included objects.

Parameters: fo (FileObject) – The root file that is to be analyzed

update_analysis_of_single_object(fo)

This function is used to add analysis tasks for a single file. This function has no side effects, so the object is simply iterated until all scheduled analyses are processed or skipped.

Parameters: fo (FileObject) – The file that is to be analyzed