Serving Engine

In addition to the basic CLI-driven, project-level batch-mode execution mechanism, ForML allows operating the encompassing applications within an interactive loop performing the apply action of the production life cycle - essentially providing online predictions (a.k.a. ML inference) based on the underlying models.

Process Control

The core component driving the serving loop is the Engine. To facilitate the end-to-end prediction serving, it interacts with all the different platform sub-systems as shown in the following sequence diagram:

    actor Client
    participant Engine as Engine/Gateway
    Client ->> Engine: query(Application, Request)
    opt if not in cache
        Engine ->> Inventory: get_descriptor(Application)
        Inventory --) Engine: Descriptor
    end
    Engine ->> Engine: Entry, Scope = Descriptor.receive(Request)
    Engine ->> Engine: ModelHandle = Descriptor.select(Scope)
    opt if needed for model selection
        Engine ->> Registry: inspect()
        Registry --) Engine: Metadata
    end
    Engine ->> Engine: Runner = get_or_spawn()
    Engine ->> Runner: apply(ModelHandle, Entry)
    opt if not loaded
        Runner ->> Registry: load(ModelHandle)
        Registry --) Runner: Model
    end
    opt if needs augmenting
        Runner ->> Feed: get_features(Entry)
        Feed --) Runner: Features
    end
    Runner ->> Runner: Outcome = Model.predict(Features)
    Runner --) Engine: Outcome
    Engine ->> Engine: Response = Descriptor.respond(Outcome)
    Engine --) Client: Response

This diagram illustrates the following steps:

  1. Receiving a request containing the query payload and the target application reference.

  2. Upon the very first request for any given application, the engine fetches the particular application descriptor from the configured inventory. The descriptor remains cached for all follow-up requests to that application.

  3. The engine uses the descriptor of the selected application to dispatch the request by:

    1. Interpreting the query payload.

    2. Selecting a particular model generation to serve the given request (depending on the model-selection strategy used by that application, this step might involve interaction with the model registry).

  4. Unless already running, the engine spawns a dedicated runner which loads the selected model artifacts, providing an isolated environment that does not collide with (the dependencies of) other models served by the same engine.

  5. The runner may engage the configured feed system to augment the provided data points using a feature store.

  6. With the complete feature set matching the project-defined schema, the runner executes the pipeline in apply-mode, obtaining the prediction outcomes.

  7. Finally, the engine again uses the application descriptor to produce the response which is then returned to the original caller.
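The dispatch flow described by these steps can be condensed into a toy sketch. Every name in it (ToyDescriptor, ToyEngine, the in-memory inventory/registry dictionaries) is an illustrative assumption, not the actual ForML API:

```python
# Toy model of the serving dispatch - illustrative only, NOT the ForML API.

class ToyDescriptor:
    """Application descriptor interpreting requests and producing responses."""

    def receive(self, request: bytes) -> list[float]:
        # Step 3.1: interpret the query payload (here: comma-separated floats).
        return [float(value) for value in request.decode().split(',')]

    def select(self, registry: dict) -> int:
        # Step 3.2: pick a model generation (here simply the latest one).
        return max(registry)

    def respond(self, outcome: float) -> bytes:
        # Step 7: turn the raw prediction into the response payload.
        return f'{outcome:.2f}'.encode()


class ToyEngine:
    """Minimal engine multiplexing applications from an inventory."""

    def __init__(self, inventory: dict, registry: dict):
        self._inventory = inventory  # application name -> descriptor
        self._registry = registry    # model generation -> model (a callable)
        self._cache = {}             # step 2: descriptor cache

    def query(self, application: str, request: bytes) -> bytes:
        if application not in self._cache:  # fetched upon the first request only
            self._cache[application] = self._inventory[application]
        descriptor = self._cache[application]
        entry = descriptor.receive(request)             # step 3.1
        generation = descriptor.select(self._registry)  # step 3.2
        model = self._registry[generation]              # step 4 (no isolation here)
        outcome = model(entry)                          # step 6: apply-mode execution
        return descriptor.respond(outcome)              # step 7


engine = ToyEngine(
    inventory={'demo': ToyDescriptor()},
    registry={1: sum, 2: lambda features: sum(features) / len(features)},
)
print(engine.query('demo', b'1,2,3'))  # b'2.00' (mean produced by generation 2)
```

The sketch deliberately skips the runner spawning, model isolation, and feature augmentation steps, which the real engine delegates to the runner and feed sub-systems.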


An engine can serve any application available in its linked inventory in a multiplexed fashion. Since the released project packages contain all the declared dependencies, the engine itself remains generic. To avoid collisions between the dependencies of different models, the engine isolates each of them in a separate context.

Frontend Gateway

While the engine is full-featured in terms of the end-to-end application serving, it can only be engaged using its raw Python API. That’s suitable for products natively embedding the engine as an integrated component, but a truly decoupled client-server architecture needs an extra layer providing some sort of transport protocol.

For this purpose, ForML comes with the concept of serving frontend gateways. They also follow the provider pattern, allowing a number of different interchangeable implementations to be delivered and plugged in at launch time.
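As a minimal sketch of that provider pattern (the decorator, registry dictionary, and gateway classes below are hypothetical stand-ins, not ForML's actual plugin machinery):

```python
# Hypothetical illustration of the provider pattern - not the ForML internals.
GATEWAYS: dict[str, type] = {}  # provider key -> implementation


def provider(name: str):
    """Register a gateway implementation under the given provider key."""
    def register(cls: type) -> type:
        GATEWAYS[name] = cls
        return cls
    return register


@provider('rest')
class RestGateway:
    def serve(self) -> str:
        return 'serving REST'


@provider('grpc')
class GrpcGateway:
    def serve(self) -> str:
        return 'serving gRPC'


# The concrete implementation is only resolved at launch time from the config:
config = {'gateway': 'rest'}
print(GATEWAYS[config['gateway']]().serve())  # serving REST
```

The point of the pattern is that the serving code depends only on the abstract contract, while the concrete transport is selected by configuration.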

Frontend gateways represent the outermost layer in the logical hierarchy of the ForML architecture:

    Layer                                     Problem question     Deliverable
    ----------------------------------------  -------------------  -------------------------------------------
    ML solution                               How to solve?        Prediction outcomes (e.g. probabilities)
    Domain interpretation, model selection    How to utilize?      Domain response (e.g. recommended products)
    Serving control                           How to operate?      Interactive processing loop
    Client-server transport                   How to integrate?    ML service API

class forml.runtime.Gateway(inventory: asset.Inventory | None = None, registry: asset.Registry | None = None, feeds: io.Importer | None = None, processes: int | None = None, loop: asyncio.AbstractEventLoop | None = None, **kwargs)

Top-level serving gateway abstraction.

inventory: asset.Inventory | None = None

Inventory of applications to be served (default as per the platform configuration).

registry: asset.Registry | None = None

Model registry of project artifacts to be served (default as per the platform configuration).

feeds: io.Importer | None = None

Feeds to be used for potential feature augmentation (default as per the platform configuration).

processes: int | None = None

Process pool size for each model sandbox.

loop: asyncio.AbstractEventLoop | None = None

Explicit event loop instance.


**kwargs

Additional serving loop keyword arguments passed to the run() method.

abstract classmethod run(apply: Callable[[str, layout.Request], Awaitable[layout.Response]], stats: Callable[[], Awaitable[runtime.Stats]], **kwargs) -> None

Serving loop implementation.

apply: Callable[[str, layout.Request], Awaitable[layout.Response]]

Prediction request handler provided by the engine. The handler expects two parameters - the target application name and the prediction request.

stats: Callable[[], Awaitable[runtime.Stats]]

Stats producer callback provided by the engine.


**kwargs

Additional keyword arguments provided via the constructor.
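A concrete provider might implement this contract along the following lines. This is a hedged sketch: the layout/runtime types are replaced by simple stand-ins, the stats callback is omitted for brevity, and the transport (an asyncio queue instead of a real network protocol) is purely illustrative:

```python
# Sketch of a gateway provider honoring the documented run() contract.
# Request/Response and the queue transport are assumptions standing in for
# the real forml.io.layout types and an actual network protocol.
import asyncio
import typing

Request = bytes
Response = bytes


async def echo_apply(application: str, request: Request) -> Response:
    # Stand-in for the engine-provided prediction handler.
    return b'%s:%s' % (application.encode(), request)


class QueueGateway:
    """Toy frontend transporting requests over an asyncio queue."""

    @classmethod
    async def run(
        cls,
        apply: typing.Callable[[str, Request], typing.Awaitable[Response]],
        queue: asyncio.Queue,
    ) -> None:
        # Serving loop: pull (application, request, future) items until None.
        while (item := await queue.get()) is not None:
            application, request, future = item
            future.set_result(await apply(application, request))


async def main() -> Response:
    queue: asyncio.Queue = asyncio.Queue()
    server = asyncio.create_task(QueueGateway.run(echo_apply, queue))
    reply: asyncio.Future = asyncio.get_running_loop().create_future()
    await queue.put(('demo', b'payload', reply))
    response = await reply
    await queue.put(None)  # shut the serving loop down
    await server
    return response


print(asyncio.run(main()))  # b'demo:payload'
```

A real provider would instead bind a network listener inside run() and translate the wire protocol to/from the apply handler, while the stats callback would back a monitoring endpoint.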

Service Management

The gateway service can be managed using the CLI as follows (see the integrated help for full synopsis):

Use case                      Command
----------------------------  ---------------------------
Launch the gateway service    $ forml application serve

Gateway Providers

Gateway providers can be configured within the runtime platform setup using the [GATEWAY.*] sections.
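For illustration, such a section might look like the following fragment, assuming a TOML-based platform configuration; the section name and keys shown (provider, port) are assumptions to be checked against the particular provider's documented options:

```toml
# Hypothetical [GATEWAY.*] section - keys shown are assumptions, not verified.
[GATEWAY.rest]
provider = "rest"   # gateway provider implementation reference
port = 8080         # transport-specific option
```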

The available implementations are:


Serving gateway implemented as a RESTful API.