Pipeline Runner

To perform a particular life cycle action of a given project, ForML compiles the workflow topology into a portable set of instructions and delegates their execution to the selected runner provider.

The runner is the most elementary component of the runtime platform: it carries out the compute function on top of the entire IO layer (represented by the feed, sink, and registry providers).

The pluggable provider model of the runner concept makes it possible to mix and match different processing technologies for different workloads, as these typically come with varying performance criteria depending on the use case (e.g. low latency for online serving vs. high throughput for offline training).

There are three different execution mechanisms, each engaging the pipeline runners under the hood.

Runner API

class forml.runtime.Runner(instance: asset.Instance | None = None, feed: io.Feed | None = None, sink: io.Sink | None = None, **kwargs)

Base class for implementing ForML runner providers.

The public API allows performing all the standard actions of the ForML lifecycles.

All that needs to be supplied by the provider is the abstract run() method.

Parameters:
instance: asset.Instance | None = None

A particular instance of the persistent artifacts to be executed.

feed: io.Feed | None = None

Optional input feed instance to retrieve the data from (falls back to the default configured feed).

sink: io.Sink | None = None

Output sink instance (no output is produced if omitted).

**kwargs

Additional keyword arguments for the run() method.

abstract classmethod run(symbols: Collection[flow.Symbol], **kwargs) -> None

Actual run action implementation using the specific provider execution technology.

Parameters:
symbols: Collection[flow.Symbol]

Collection of portable symbols representing the workflow task graph to be executed, as produced by the flow.compile() function.

**kwargs

Custom keyword arguments provided via the constructor.
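To illustrate the contract described above, the following is a minimal, self-contained sketch of what a provider might look like. It does not import ForML: the Runner and Symbol classes below are simplified stand-ins that only mirror the documented signatures, and the Symbol fields (name, action, inputs), the SequentialRunner class, and the log keyword argument are hypothetical names chosen for this example. The real flow.Symbol structure may differ.

```python
import abc
from typing import Any, Callable, Collection, NamedTuple, Sequence


class Symbol(NamedTuple):
    """Hypothetical stand-in for flow.Symbol: a named task and the tasks it depends on."""

    name: str
    action: Callable[..., Any]
    inputs: Sequence[str] = ()


class Runner(abc.ABC):
    """Stand-in mirroring only the documented shape of forml.runtime.Runner."""

    def __init__(self, instance=None, feed=None, sink=None, **kwargs):
        self._instance = instance
        self._feed = feed
        self._sink = sink
        self._kwargs = kwargs  # extra keyword arguments intended for run()

    @classmethod
    @abc.abstractmethod
    def run(cls, symbols: Collection[Symbol], **kwargs) -> None:
        """Execute the task graph using the provider's technology."""


class SequentialRunner(Runner):
    """Toy provider executing the task graph in a single thread."""

    @classmethod
    def run(cls, symbols: Collection[Symbol], **kwargs) -> None:
        log = kwargs.get("log")  # hypothetical option: list collecting (name, result)
        pending = list(symbols)
        results = {}
        while pending:
            # A task is ready once all of its inputs have been computed.
            ready = [s for s in pending if all(i in results for i in s.inputs)]
            if not ready:
                raise ValueError("Cyclic or unsatisfied dependencies")
            for symbol in ready:
                args = (results[i] for i in symbol.inputs)
                results[symbol.name] = symbol.action(*args)
                if log is not None:
                    log.append((symbol.name, results[symbol.name]))
            pending = [s for s in pending if s.name not in results]


# Usage: the symbols are deliberately listed out of dependency order.
symbols = [
    Symbol("double", lambda x: x * 2, ("load",)),
    Symbol("load", lambda: 21),
]
log = []
SequentialRunner.run(symbols, log=log)
print(log)  # [('load', 21), ('double', 42)]
```

The only method the subclass supplies is run(); everything else is inherited from the base class, which matches the division of responsibilities described for the Runner API.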

Runner Providers

Runner providers can be configured within the runtime platform setup using the [RUNNER.*] sections.

The available implementations are:

Dask

ForML runner implementation using the Dask computing library as the execution platform.

Graphviz

(Pseudo)runner using the Graphviz drawing software for rendering a graphical visualization of the workflow task graph.

Pyfunc

Non-distributed low-latency runner turning the task graph into a single synchronous Python function.

Spark

ForML runner utilizing Apache Spark as a distributed executor.
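As a rough illustration of the [RUNNER.*] sections mentioned above, a platform configuration might look like the fragment below. This is a sketch under assumptions: the section aliases (compute, visual) are arbitrary, and the option names (default, provider, scheduler, format) as well as their values are assumed rather than taken from this page, so they should be checked against the platform configuration reference before use.

```toml
# Assumed layout: one named section per configured runner.
[RUNNER]
default = "compute"        # assumed key selecting the section used by default

[RUNNER.compute]
provider = "dask"          # assumed key naming the provider implementation
scheduler = "threaded"     # assumed provider-specific option

[RUNNER.visual]
provider = "graphviz"
format = "png"             # assumed provider-specific option
```

Keeping several named sections side by side is what would allow switching between, for example, a distributed runner for training and a visualization pseudo-runner for inspecting the task graph.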