Serving Engine

In addition to the basic CLI-driven, project-level batch-mode execution mechanism, ForML allows operating the encompassing applications within an interactive loop performing the apply action of the production life cycle - essentially providing online predictions (a.k.a. ML inference) based on the underlying models.

Process Control

The core component driving the serving loop is the Engine. To facilitate the end-to-end prediction serving, it interacts with all the different platform sub-systems as shown in the following sequence diagram:

    actor Client
    participant Engine as Engine/Gateway
    Client ->> Engine: query(Application, Request)
    opt if not in cache
        Engine ->> Inventory: get_descriptor(Application)
        Inventory --) Engine: Descriptor
    end
    Engine ->> Engine: Entry, Scope = Descriptor.receive(Request)
    Engine ->> Engine: ModelHandle = Descriptor.select(Scope)
    opt if needed for model selection
        Engine ->> Registry: inspect()
        Registry --) Engine: Metadata
    end
    Engine ->> Engine: Runner = get_or_spawn()
    Engine ->> Runner: apply(ModelHandle, Entry)
    opt if not loaded
        Runner ->> Registry: load(ModelHandle)
        Registry --) Runner: Model
    end
    opt if needs augmenting
        Runner ->> Feed: get_features(Entry)
        Feed --) Runner: Features
    end
    Runner ->> Runner: Outcome = Model.predict(Features)
    Runner --) Engine: Outcome
    Engine ->> Engine: Response = Descriptor.respond(Outcome)
    Engine --) Client: Response

This diagram illustrates the following steps:

  1. Receiving a request containing the query payload and the target application reference.

  2. Upon the very first request for any given application, the engine fetches the particular application descriptor from the configured inventory. The descriptor remains cached for all follow-up requests to that application.

  3. The engine uses the descriptor of the selected application to dispatch the request by:

    1. Interpreting the query payload.

    2. Selecting a particular model generation to serve the given request (depending on the model-selection strategy used by that application, this step might involve interaction with the model registry).

  4. Unless already running, the engine spawns a dedicated runner which loads the selected model artifacts, providing an isolated environment that does not collide with (the dependencies of) other models served by the same engine.

  5. The runner may engage the configured feed system to augment the provided data points using a feature store.

  6. With the complete feature set matching the project-defined schema, the runner executes the pipeline in apply-mode, obtaining the prediction outcomes.

  7. Finally, the engine again uses the application descriptor to produce the response which is then returned to the original caller.
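The dispatch flow described by these steps can be condensed into a toy sketch. Every name in it (ToyDescriptor, ToyEngine, the in-memory inventory/registry dictionaries) is an illustrative assumption, not the actual ForML API:

```python
# Toy model of the serving dispatch - illustrative only, NOT the ForML API.

class ToyDescriptor:
    """Application descriptor interpreting requests and producing responses."""

    def receive(self, request: bytes) -> list[float]:
        # Step 3.1: interpret the query payload (here: comma-separated floats).
        return [float(value) for value in request.decode().split(',')]

    def select(self, registry: dict) -> int:
        # Step 3.2: pick a model generation (here simply the latest one).
        return max(registry)

    def respond(self, outcome: float) -> bytes:
        # Step 7: turn the raw prediction into the response payload.
        return f'{outcome:.2f}'.encode()


class ToyEngine:
    """Minimal engine multiplexing applications from an inventory."""

    def __init__(self, inventory: dict, registry: dict):
        self._inventory = inventory  # application name -> descriptor
        self._registry = registry    # model generation -> model (a callable)
        self._cache = {}             # step 2: descriptor cache

    def query(self, application: str, request: bytes) -> bytes:
        if application not in self._cache:  # fetched upon the first request only
            self._cache[application] = self._inventory[application]
        descriptor = self._cache[application]
        entry = descriptor.receive(request)             # step 3.1
        generation = descriptor.select(self._registry)  # step 3.2
        model = self._registry[generation]              # step 4 (no isolation here)
        outcome = model(entry)                          # step 6: apply-mode execution
        return descriptor.respond(outcome)              # step 7


engine = ToyEngine(
    inventory={'demo': ToyDescriptor()},
    registry={1: sum, 2: lambda features: sum(features) / len(features)},
)
print(engine.query('demo', b'1,2,3'))  # b'2.00' (mean produced by generation 2)
```

The sketch deliberately skips the runner spawning, model isolation, and feature augmentation steps, which the real engine delegates to the runner and feed sub-systems.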


An engine can serve any application available in its linked inventory in a multiplexed fashion. Since the released project packages contain all the declared dependencies, the engine itself remains generic. To avoid collisions between the dependencies of different models, the engine isolates each of them in a separate context.

Frontend Gateway

While the engine is full-featured in terms of the end-to-end application serving, it can only be engaged using its raw Python API. That’s suitable for products natively embedding the engine as an integrated component, but a truly decoupled client-server architecture needs an extra layer providing some sort of transport protocol.

For this purpose, ForML comes with the concept of serving frontend gateways. They also follow the provider pattern, allowing a number of different interchangeable implementations to be delivered and plugged in at launch time.
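As a minimal sketch of that provider pattern (the decorator, registry dictionary, and gateway classes below are hypothetical stand-ins, not ForML's actual plugin machinery):

```python
# Hypothetical illustration of the provider pattern - not the ForML internals.
GATEWAYS: dict[str, type] = {}  # provider key -> implementation


def provider(name: str):
    """Register a gateway implementation under the given provider key."""
    def register(cls: type) -> type:
        GATEWAYS[name] = cls
        return cls
    return register


@provider('rest')
class RestGateway:
    def serve(self) -> str:
        return 'serving REST'


@provider('grpc')
class GrpcGateway:
    def serve(self) -> str:
        return 'serving gRPC'


# The concrete implementation is only resolved at launch time from the config:
config = {'gateway': 'rest'}
print(GATEWAYS[config['gateway']]().serve())  # serving REST
```

The point of the pattern is that the serving code depends only on the abstract contract, while the concrete transport is selected by configuration.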

Frontend gateways represent the outermost layer in the logical hierarchy of the ForML architecture:

    Layer                                     Problem question     Deliverable
    ----------------------------------------  -------------------  -------------------------------------------
    ML solution                               How to solve?        Prediction outcomes (e.g. probabilities)
    Domain interpretation, model selection    How to utilize?      Domain response (e.g. recommended products)
    Serving control                           How to operate?      Interactive processing loop
    Client-server transport                   How to integrate?    ML service API

class forml.runtime.Gateway(inventory: asset.Inventory | None = None, registry: asset.Registry | None = None, feeds: io.Importer | None = None, processes: int | None = None, loop: asyncio.AbstractEventLoop | None = None, **kwargs)

Top-level serving gateway abstraction.

inventory: asset.Inventory | None = None

Inventory of applications to be served (default as per the platform configuration).

registry: asset.Registry | None = None

Model registry of project artifacts to be served (default as per the platform configuration).

feeds: io.Importer | None = None

Feeds to be used for potential feature augmentation (default as per the platform configuration).

processes: int | None = None

Process pool size for each model sandbox.

loop: asyncio.AbstractEventLoop | None = None

Explicit event loop instance.


**kwargs

Additional serving loop keyword arguments passed to the run() method.

abstract classmethod run(apply: Callable[[str, layout.Request], Awaitable[layout.Response]], stats: Callable[[], Awaitable[runtime.Stats]], **kwargs) -> None

Serving loop implementation.

apply: Callable[[str, layout.Request], Awaitable[layout.Response]]

Prediction request handler provided by the engine. The handler expects two parameters - the target application name and the prediction request.

stats: Callable[[], Awaitable[runtime.Stats]]

Stats producer callback provided by the engine.


**kwargs

Additional keyword arguments provided via the constructor.
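A concrete provider might implement this contract along the following lines. This is a hedged sketch: the layout/runtime types are replaced by simple stand-ins, the stats callback is omitted for brevity, and the transport (an asyncio queue instead of a real network protocol) is purely illustrative:

```python
# Sketch of a gateway provider honoring the documented run() contract.
# Request/Response and the queue transport are assumptions standing in for
# the real forml.io.layout types and an actual network protocol.
import asyncio
import typing

Request = bytes
Response = bytes


async def echo_apply(application: str, request: Request) -> Response:
    # Stand-in for the engine-provided prediction handler.
    return b'%s:%s' % (application.encode(), request)


class QueueGateway:
    """Toy frontend transporting requests over an asyncio queue."""

    @classmethod
    async def run(
        cls,
        apply: typing.Callable[[str, Request], typing.Awaitable[Response]],
        queue: asyncio.Queue,
    ) -> None:
        # Serving loop: pull (application, request, future) items until None.
        while (item := await queue.get()) is not None:
            application, request, future = item
            future.set_result(await apply(application, request))


async def main() -> Response:
    queue: asyncio.Queue = asyncio.Queue()
    server = asyncio.create_task(QueueGateway.run(echo_apply, queue))
    reply: asyncio.Future = asyncio.get_running_loop().create_future()
    await queue.put(('demo', b'payload', reply))
    response = await reply
    await queue.put(None)  # shut the serving loop down
    await server
    return response


print(asyncio.run(main()))  # b'demo:payload'
```

A real provider would instead bind a network listener inside run() and translate the wire protocol to/from the apply handler, while the stats callback would back a monitoring endpoint.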

Service Management

The gateway service can be managed using the CLI as follows (see the integrated help for full synopsis):

Use case                      Command
----------------------------  ---------------------------
Launch the gateway service    $ forml application serve

Gateway Providers

Gateway providers can be configured within the runtime platform setup using the [GATEWAY.*] sections.
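For illustration, such a section might look like the following fragment, assuming a TOML-based platform configuration; the section name and keys shown (provider, port) are assumptions to be checked against the particular provider's documented options:

```toml
# Hypothetical [GATEWAY.*] section - keys shown are assumptions, not verified.
[GATEWAY.rest]
provider = "rest"   # gateway provider implementation reference
port = 8080         # transport-specific option
```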

The available implementations are:


Serving gateway implemented as a RESTful API.