scheduler.analysis.scheduler

class scheduler.analysis.scheduler.AnalysisScheduler(post_analysis=None, db_interface=None, unpacking_locks=None)

Bases: object

The analysis scheduler is responsible for

  • initializing analysis plugins

  • scheduling tasks based on user decision and built-in dependencies

  • deciding if tasks should run or may be skipped

  • running the tasks

  • and storing the new results of analysis tasks in the database

Plugin initialization is mostly handled by the plugins, the scheduler only provides an attachment point and offers a single point of reference for introspection and runtime information.

The scheduler offers three entry points:

  1. Start the analysis of a file object (start_analysis_of_object)

  2. Start the analysis of a file object without context (update_analysis_of_single_object)

  3. Start an update of a firmware file and all its children (update_analysis_of_object_and_children)

Entry point 1 is used by the unpacking scheduler and is triggered for each file object after the unpacking has been processed. Entry points 2 and 3 are independent of the unpacking process and can be triggered by the user through the Web-UI or REST-API: 2 updates the analyses of a single file, and 3 updates the analyses of all files contained in a given firmware. The difference between 1 and 2 is that the single file update (2) is not considered in the current analysis introspection.

Scheduling of tasks is made with the following considerations:

  • New objects need a set of mandatory plugins (e.g. file type and hashes), as these results are used in further processing stages

  • Plugins can have dependencies; the results of these have to be present before the depending plugin can be run

  • The order of execution is shuffled (dependency preserving) to balance execution of the plugins
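
The considerations above can be illustrated with a minimal sketch. It is not the scheduler's actual implementation: the function name shuffled_schedule, its parameters and the plugin names in the usage note are hypothetical, and the dependency map is assumed to be a plain dict from plugin name to a list of plugin names.

```python
import random


def shuffled_schedule(requested, dependencies, mandatory=(), rng=None):
    """Return plugin names in a random order that still places every
    plugin after all of its dependencies (hypothetical helper)."""
    if rng is None:
        rng = random.Random()

    # Collect requested and mandatory plugins plus transitive dependencies.
    pending = set()
    stack = list(requested) + list(mandatory)
    while stack:
        name = stack.pop()
        if name not in pending:
            pending.add(name)
            stack.extend(dependencies.get(name, ()))

    order = []
    done = set()
    while pending:
        # A plugin is ready once all of its dependencies are scheduled.
        ready = sorted(p for p in pending if set(dependencies.get(p, ())) <= done)
        if not ready:
            raise ValueError("cyclic plugin dependencies")
        rng.shuffle(ready)  # balance execution among independent plugins
        order.extend(ready)
        done.update(ready)
        pending.difference_update(ready)
    return order
```

Calling shuffled_schedule(["b"], {"b": ["a"]}, mandatory=["file_type"]) yields all three names with "a" always before "b", while the position of "file_type" relative to "a" varies between calls.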

After scheduling, a set of checks is run for each task to decide whether the task may be skipped:

┌─┬──────────────┐ No                                   ┌────────┐
│0│Plugin exists?├──────────────────────────────────────►        │
└─┴───┬──────────┘                                      │  Skip  │
      │ Yes                                     ┌───────►        ◄───┐
┌─┬───▼─────────────┐ Yes                       │       └────────┘   │
│1│Is forced update?├───────────────────────────┼─────┐              │
└─┴───┬─────────────┘                           │     │              │
      │ No                                      │     │              │
┌─┬───▼────────────────────────────────┐ Yes    │     │              │
│2│Analysis present, version unchanged?├────────┘     │              │
└─┴───┬────────────────────────────────┘              │ ┌─────────┐  │
      │ No                                            └─►         │  │
┌─┬───▼────────────────────────────┐ No                 │  Start  │  │
│3│Analysis is black / whitelisted?├────────────────────►         │  │
└─┴───┬────────────────────────────┘                    └─────────┘  │
      │ Yes                                                          │
      └──────────────────────────────────────────────────────────────┘
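
The decision chain in the diagram can be written as a short function. This is a sketch under assumptions, not the scheduler's code: PluginInfo, should_skip and is_mime_filtered are hypothetical names, and the exact blacklist / whitelist semantics (exact MIME match, a non-empty whitelist taking precedence) are assumed for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PluginInfo:
    """Hypothetical stand-in for the plugin attributes used by the checks."""
    version: str
    mime_blacklist: list = field(default_factory=list)
    mime_whitelist: list = field(default_factory=list)


def is_mime_filtered(plugin, mime):
    # Assumption: a non-empty whitelist excludes every MIME type not on it;
    # otherwise the blacklist excludes the MIME types it contains.
    if plugin.mime_whitelist:
        return mime not in plugin.mime_whitelist
    return mime in plugin.mime_blacklist


def should_skip(plugin_name, plugins, mime, stored_version=None, force_update=False):
    """Return True if the task is skipped, False if it is started."""
    plugin = plugins.get(plugin_name)
    if plugin is None:  # step 0: plugin does not exist -> skip
        return True
    if force_update:  # step 1: forced update -> start
        return False
    if stored_version is not None and stored_version == plugin.version:
        return True  # step 2: analysis present, version unchanged -> skip
    return is_mime_filtered(plugin, mime)  # step 3: filtered -> skip
```

Note that a forced update bypasses both the version check and the MIME filter, exactly as the "Yes" branch of step 1 leads directly to "Start" in the diagram.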

Running the analysis tasks is achieved through multiprocessing.Queue objects. Each plugin has an in-queue, which the scheduler fills using the add_job function, and an out-queue that is processed by the result collector. The actual analysis process is out of scope. Database interaction happens before (pre_analysis) and after (post_analysis) the running of a task, to store intermediate results for live updates and the final results.
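
The in-queue / out-queue pattern can be sketched as follows. The helper names enqueue_task, plugin_worker and collect_results are hypothetical and do not reflect the actual plugin or collector code; the None sentinel and the (uid, payload) task shape are assumptions made for the example.

```python
import multiprocessing


def enqueue_task(in_queue, uid, payload):
    """Scheduler side: hand one task to a plugin's in-queue."""
    in_queue.put((uid, payload))


def plugin_worker(in_queue, out_queue, analyze):
    """Plugin side: process tasks until a None sentinel arrives."""
    while True:
        task = in_queue.get()
        if task is None:
            break
        uid, payload = task
        out_queue.put((uid, analyze(payload)))


def collect_results(out_queue, expected, timeout=5.0):
    """Collector side: read `expected` results from a plugin's out-queue.

    Raises queue.Empty if a result does not arrive within `timeout` seconds.
    """
    results = {}
    for _ in range(expected):
        uid, result = out_queue.get(timeout=timeout)
        results[uid] = result
    return results
```

In a real deployment plugin_worker would typically run as the target of a multiprocessing.Process, with both queues created via multiprocessing.Queue() and passed as arguments; the functions themselves do not depend on where they run.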

Parameters:
  • pre_analysis – A database callback to execute before running an analysis task.

  • post_analysis (Optional[Callable[[str, str, dict], None]]) – A database callback to execute after running an analysis task.

  • db_interface – An object reference to an instance of BackEndDbInterface.

  • unpacking_locks (UnpackingLockManager | None) – An instance of UnpackingLockManager.

check_exceptions()

Iterate over all attached processes and check whether an exception occurred in any of them. Depending on the configuration, plugin exceptions are not registered, because plugins are restarted after an exception occurs.

Returns:

Boolean value stating if any attached process ran into an exception

Return type:

bool

get_combined_analysis_workload()

get_plugin_dict()

Get information regarding all loaded plugins as a dictionary of the following form:

{
    NAME: (
        str: DESCRIPTION,
        bool: mandatory,
        dict: plugin_sets,
        str: VERSION,
        list: DEPENDENCIES,
        list: MIME_BLACKLIST,
        list: MIME_WHITELIST,
        str: config.threads
    )
}

Mandatory plugins are not shown in the analysis selection but always executed. Default plugins are pre-selected in the analysis selection.

Returns:

dict with information regarding all loaded plugins

Return type:

dict
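
A consumer of the get_plugin_dict() result might unpack the tuples as sketched below. PluginEntry and selectable_plugins are hypothetical names, the field order follows the form shown above, and the assumption that plugin_sets maps a set name such as "default" to a boolean is made only for this example.

```python
from typing import NamedTuple


class PluginEntry(NamedTuple):
    """Names for the tuple positions documented for get_plugin_dict()."""
    description: str
    mandatory: bool
    plugin_sets: dict
    version: str
    dependencies: list
    mime_blacklist: list
    mime_whitelist: list
    threads: str


def selectable_plugins(plugin_dict, plugin_set="default"):
    """Split non-mandatory plugins into (preselected, other) name lists."""
    preselected, other = [], []
    for name, raw in sorted(plugin_dict.items()):
        entry = PluginEntry(*raw)
        if entry.mandatory:
            continue  # mandatory plugins are not shown in the selection
        if entry.plugin_sets.get(plugin_set):
            preselected.append(name)
        else:
            other.append(name)
    return preselected, other
```

Naming the tuple positions avoids index arithmetic in callers and fails loudly with a TypeError if the tuple length ever differs from the documented form.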

get_scheduled_workload()

Get the current workload of this scheduler. The workload is represented through

  • the general in-queue,

  • the currently running analyses in each plugin and the plugin in-queues,

  • the progress for each currently analyzed firmware and

  • recently finished analyses.

The result has the form:

{
    'analysis_main_scheduler': int(),
    'plugins': dict(),
    'current_analyses': dict(),
    'recently_finished_analyses': dict(),
}

Returns:

Dictionary containing current workload statistics

Return type:

dict
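
The workload dictionary of get_scheduled_workload() could be condensed for a status display as sketched below. Both helpers are hypothetical and rely only on the four top-level keys shown above; the inner layout of the nested dictionaries is not assumed, apart from 'current_analyses' holding one entry per firmware that is currently being analyzed.

```python
def summarize_workload(workload):
    """Reduce the workload dict to a few counters (hypothetical helper)."""
    return {
        "queued_in_main_scheduler": workload["analysis_main_scheduler"],
        "plugin_count": len(workload["plugins"]),
        "analyses_in_progress": len(workload["current_analyses"]),
        "analyses_recently_finished": len(workload["recently_finished_analyses"]),
    }


def is_idle(workload):
    """True if nothing is queued in the main scheduler and no analysis is in progress."""
    return workload["analysis_main_scheduler"] == 0 and not workload["current_analyses"]
```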

shutdown()

Shut down the runner process, the result collector and all plugin processes. A multiprocessing.Value is set to notify all attached processes of the impending shutdown. Afterwards, the queues are closed once it is safe to do so.
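
The notification mechanism can be sketched with a shared flag that every worker loop polls. The names worker_loop and request_shutdown are hypothetical, as are the polling interval and the integer encoding of the flag (0 = keep running, 1 = stop); the scheduler's actual shutdown sequence may differ.

```python
import multiprocessing
import queue


def worker_loop(stop_flag, in_queue, handle, poll_interval=0.1):
    """Process tasks until the shared stop flag is set."""
    while stop_flag.value == 0:
        try:
            # Use a timeout so the flag is re-checked regularly.
            task = in_queue.get(timeout=poll_interval)
        except queue.Empty:
            continue
        handle(task)


def request_shutdown(stop_flag, workers, queues):
    """Set the flag, wait for all workers, then close the queues."""
    stop_flag.value = 1
    for worker in workers:
        worker.join()  # works for Process and Thread objects alike
    for q in queues:
        q.close()  # safe now: no worker reads from the queue anymore
        q.join_thread()
```

The flag would be created with multiprocessing.Value('i', 0) and passed to each worker at start. Closing the queues only after join() returns is what makes the last step safe in this sketch: no worker can be blocked on a queue that is being torn down.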

start()

start_analysis_of_object(fo)

This function is used to start analysis of a firmware object. The function registers the firmware with the status module such that the progress of the firmware and its included files is tracked.

Parameters:

fo (FileObject) – The firmware that is to be analyzed

update_analysis_of_object_and_children(fo)

This function is used to analyze an object and all its recursively included objects without repeating the extraction process. Scheduled analyses are propagated to the included objects.

Parameters:

fo (FileObject) – The root file that is to be analyzed

update_analysis_of_single_object(fo)

This function is used to add analysis tasks for a single file. It has no side effects, so the object is simply iterated until all scheduled analyses are processed or skipped.

Parameters:

fo (FileObject) – The file that is to be analyzed