In addition to the basic CLI-driven isolated batch mode, ML projects implemented on ForML can be embedded into a dynamic serving layer and operated in an autonomous full-cycle fashion. This layer continuously serves the following functions:
ongoing performance reporting
dynamic rollout strategies
To provide the full-cycle serving capabilities for a supervised ML project autonomously, ForML needs a programmatically reachable event-outcome feedback loop: an external reconciliation path that supplies the true outcome for every event the system predicts.
Implementing this feedback loop (the reconciliation logic) is the responsibility of the particular business application and its data architecture, which ForML simply plugs into using its feed system.
The key attribute of this feedback loop is its latency, which determines the turnaround time for all the serving functionality such as performance monitoring and incremental training.
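The reconciliation idea can be illustrated with a minimal sketch. It is not ForML code: the `Prediction` record, the `reconcile` helper and the choice of mean squared error as the loss are all assumptions made for illustration. It shows why the loop latency matters: a prediction only contributes to the performance figure once its true outcome has arrived.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record of a served prediction."""

    event_id: str
    generation: str
    value: float


def reconcile(
    predictions: Iterable[Prediction], outcomes: Dict[str, float]
) -> Iterator[Tuple[Prediction, float]]:
    """Pair each prediction with its true outcome, skipping events not yet resolved."""
    for prediction in predictions:
        if prediction.event_id in outcomes:
            yield prediction, outcomes[prediction.event_id]


def mean_squared_error(pairs: Iterable[Tuple[Prediction, float]]) -> Optional[float]:
    """Loss over the reconciled pairs only (None when nothing is reconciled yet)."""
    errors = [(prediction.value - truth) ** 2 for prediction, truth in pairs]
    return sum(errors) / len(errors) if errors else None
```

Events whose outcome is still unknown are simply left out, so the longer the feedback loop takes, the older the data behind any reported metric.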
The serving capabilities are provided through a number of additional platform components as explained in the following sections.
The online agent is the most visible serving component, responsible for answering event queries with actual predictions. To do so, it goes through a set of essential steps (some are part of agent bootstrapping or periodic cache refreshing, while others are synchronous with each query):
Fetching the serving manifest from the project roster.
Selecting a particular model generation using the dynamic rollout strategy as defined in the serving manifest.
Loading the selected model generation from the model registry.
Fetching all missing input features for augmenting the particular request according to the project input DSL.
Running the prediction pipeline and responding with the result.
Submitting query metadata to the query logbus.
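The steps above can be sketched with in-memory stand-ins for the surrounding components. Everything here is hypothetical and not the ForML API: the `OnlineAgent` class, the manifest keys `generations` and `rollout`, and the split that puts the first three steps into `refresh` and the last three into `query` are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Features = Dict[str, Any]
Model = Callable[[Features], Any]


class OnlineAgent:
    """Hypothetical agent for a single project, wired to in-memory stand-ins."""

    def __init__(
        self,
        project: str,
        roster: Dict[str, Dict[str, Any]],
        registry: Dict[Tuple[str, str], Model],
        feature_source: Callable[[Features], Features],
        logbus: List[Dict[str, Any]],
    ) -> None:
        self._project = project
        self._roster = roster
        self._registry = registry
        self._feature_source = feature_source
        self._logbus = logbus
        self._generation: Optional[str] = None
        self._model: Optional[Model] = None

    def refresh(self) -> None:
        """Bootstrap or periodic refresh (assumed to cover steps 1 to 3)."""
        manifest = self._roster[self._project]  # 1. fetch the serving manifest
        self._generation = manifest["rollout"](manifest["generations"])  # 2. select
        self._model = self._registry[(self._project, self._generation)]  # 3. load

    def query(self, request: Features) -> Any:
        """Per-query processing (assumed to cover steps 4 to 6)."""
        if self._model is None:
            self.refresh()
        fetched = self._feature_source(request)
        # 4. augment the request with the features it does not carry itself
        augmentation = {k: v for k, v in fetched.items() if k not in request}
        result = self._model({**request, **augmentation})  # 5. predict
        self._logbus.append(  # 6. submit the query metadata
            {
                "project": self._project,
                "generation": self._generation,
                "features": augmentation,
            }
        )
        return result
```

A real agent would talk to external services instead of dictionaries and lists, but the order of operations stays the same.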
The rollout workflow employed by the agent is a powerful concept that allows a particular model generation to be selected dynamically based on a project-defined function of any available parameters (mainly the performance metrics). This makes it possible to implement strategies such as canary deployment, multi-armed bandits, A/B testing, and cold-start or fallback models.
The serving agent is expected to be embedded into a particular application layer (e.g. a web/REST service) that provides the actual frontend facade.
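As an illustration of such a project-defined function, here is a minimal sketch of an epsilon-greedy bandit strategy. The function name, its signature and the shape of the `losses` mapping are assumptions for this example and not a ForML interface. It mostly picks the generation with the lowest known loss, occasionally explores another one, and falls back to a random pick when no metrics exist yet (cold start).

```python
import random
from typing import Dict, Optional, Sequence


def epsilon_greedy(
    generations: Sequence[str],
    losses: Dict[str, float],
    epsilon: float = 0.1,
    rng: Optional[random.Random] = None,
) -> str:
    """Select a model generation from its loss metrics (lower is better)."""
    rng = rng or random.Random()
    # Only generations that already have a reported loss can be exploited.
    scored = [g for g in generations if g in losses]
    # Explore with probability epsilon, or when there are no metrics at all.
    if not scored or rng.random() < epsilon:
        return rng.choice(list(generations))
    return min(scored, key=lambda g: losses[g])
```

Passing a seeded `random.Random` instance keeps the selection reproducible in tests, while production code would typically rely on the default.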
The project roster is a tiny storage service used by the serving layer to pick up the list of active projects and their serving manifests. It is updated as part of project deployment promotion and continuously watched by the online/offline agents to determine things like the model generation selection.
The query logbus is a standard publisher-subscriber software bus for distributing the serving query metadata to allow further (offline) processing such as performance reporting or general debugging. The typical attributes sent to the query logbus for each event are:
project + version
obtained augmentation features
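The publisher-subscriber pattern behind the logbus can be sketched in a few lines. This is an in-process toy and not the actual ForML component: the `QueryRecord` fields mirror only the attributes listed above, and the `LogBus` class with its topic names is a hypothetical stand-in for a real message bus.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable, DefaultDict, Dict, List


@dataclass(frozen=True)
class QueryRecord:
    """Hypothetical metadata record published for each served query."""

    project: str
    version: str
    augmentation: Dict[str, Any] = field(default_factory=dict)


Handler = Callable[[QueryRecord], None]


class LogBus:
    """Minimal in-process publisher-subscriber bus keyed by topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to receive every record published to the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: QueryRecord) -> None:
        """Deliver the record to all handlers currently subscribed to the topic."""
        for handler in self._subscribers.get(topic, []):
            handler(record)
```

The publisher (the online agent) does not need to know who consumes the records, which is what lets reporting and debugging consumers be added independently.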
The PerfDB is another storage service that aggregates performance metrics as time series derived both from the metadata pushed via the query logbus and from the main feedback loop, as produced by the offline agent processing.
It is a crucial source of information not only for all sorts of operational monitoring and reporting but especially for the dynamic model generation selection performed by the online agent according to the rollout strategy when serving the actual event queries.
The typical available metrics are:
per model generation:
serving latency (gauge)
number of requests (counter)
loss function value
auxiliary project-defined metrics
loss function value
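The difference between the gauge and counter metric types can be shown with a small in-memory sketch. The `PerfStore` class and its method names are hypothetical and do not describe the real PerfDB interface: a counter only accumulates, whereas a gauge is kept as a series of timestamped samples.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

Key = Tuple[str, str]  # (model generation, metric name)


class PerfStore:
    """Hypothetical in-memory store of per-generation metrics."""

    def __init__(self) -> None:
        self._counters: DefaultDict[Key, int] = defaultdict(int)
        self._gauges: DefaultDict[Key, List[Tuple[float, float]]] = defaultdict(list)

    def count(self, generation: str, metric: str, increment: int = 1) -> None:
        """Counters accumulate, e.g. the number of requests."""
        self._counters[(generation, metric)] += increment

    def gauge(self, generation: str, metric: str, timestamp: float, value: float) -> None:
        """Gauges record timestamped samples, e.g. the serving latency."""
        self._gauges[(generation, metric)].append((timestamp, value))

    def counter(self, generation: str, metric: str) -> int:
        """Current counter total (0 if nothing was counted)."""
        return self._counters.get((generation, metric), 0)

    def latest(self, generation: str, metric: str) -> Optional[float]:
        """Most recent gauge sample by timestamp (None if there is no sample)."""
        series = self._gauges.get((generation, metric))
        return max(series, key=lambda sample: sample[0])[1] if series else None
```

A rollout strategy could read such values per generation to decide which model to serve next.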
The offline agent is the backend service responsible for all the heavy processing of: