Providers Library
ForML providers are plugins implementing particular functionality that the framework defines using abstract interfaces, decoupling it from specific technologies and allowing for greater operational flexibility.
Providers become available for runtime operations once properly configured within the given platform.
See also
This page is merely a summary of all the official providers shipped with ForML. API documentation, as well as a comprehensive description of their logical concepts, is covered in the individual chapters dedicated to each of the provider types (linked in the subsections below).
Custom Provider Setup
In addition to the existing providers, users might need to implement their own bespoke instances. To avoid having to release and deploy these providers as true code artifacts, ForML alternatively allows treating them in a more configuration-like manner.
For this purpose, the standard config directories are valid locations for hosting Python modules with bespoke provider implementations, making them available to the particular runtime platform.
Following is an example of a custom Feed setup (even though this one could well be solved using the existing generic Alchemy Feed or - given the particular dataset - even more easily using the openlake.Lite feed):
[Python listing of the custom foobar module defining the Baz feed class]
This custom foobar:Baz feed provider can now be added to the platform config:
[FEED.foobar]
provider = "foobar:Baz"
connection = "sqlite:////tmp/foobar.db"
Model Registries
ForML delegates responsibility for model persistence to model registry providers implementing the abstract forml.io.asset.Registry base class.
- (Pseudo)registry implementation provided as temporary non-distributed storage persistent only during its lifetime.
- File-based registry backed by a locally-accessible POSIX file system.
- ForML model registry implementation using the MLflow Tracking Server as the artifact storage.
Runners
The actual execution of the ForML workflows is performed by the pipeline runner providers implementing the forml.runtime.Runner base class.
- ForML runner implementation using the Dask computing library as the execution platform.
- (Pseudo)runner using the Graphviz drawing software for rendering graphical visualization of the workflow task graph.
- Non-distributed low-latency runner turning the task graph into a single synchronous Python function.
- ForML runner utilizing Apache Spark as a distributed executor.
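The idea behind the synchronous runner can be sketched generically: order the task graph topologically and collapse it into one plain Python function. This is a toy illustration of the concept, not ForML's actual compiler - the compile_sync helper and the graph encoding are invented for the example:

```python
from graphlib import TopologicalSorter

def compile_sync(tasks, deps):
    """Collapse a task graph into a single synchronous function.

    tasks: mapping of node name -> callable(*upstream_results)
    deps:  mapping of node name -> tuple of upstream node names
    Toy sketch of a synchronous runner - not ForML's implementation.
    """
    order = list(TopologicalSorter(deps).static_order())

    def run():
        results = {}
        for node in order:  # execute nodes in dependency order
            args = [results[d] for d in deps.get(node, ())]
            results[node] = tasks[node](*args)
        return results[order[-1]]  # result of the terminal node

    return run

# Example graph: load -> transform -> score
pipeline = compile_sync(
    {'load': lambda: [1, 2, 3],
     'transform': lambda data: [x * 2 for x in data],
     'score': lambda data: sum(data)},
    {'transform': ('load',), 'score': ('transform',)},
)
print(pipeline())  # → 12
```

Because the whole graph becomes one ordinary function call, this style of runner avoids any scheduling overhead, which is what makes it attractive for low-latency, non-distributed execution.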
Feeds
To decouple projects from any physical data sources, ForML uses a generic query DSL working with logical schemas that get resolved only at runtime to the actual data provided by the platform-configured set of feeds implementing the forml.io.Feed base class.
- Generic SQL feed based on SQLAlchemy.
- Lightweight feed for pulling data from multiple simple origins.
External Providers
- ForML feed providing access to a number of public datasets.
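The runtime resolution of logical schemas can be sketched with a toy example. Everything here is invented for illustration - the Feed class, the resolve method, and the query encoding are not ForML's DSL - it only shows how a project-side logical query stays independent of the platform's physical storage:

```python
# A project references only the logical schema and columns:
LOGICAL_QUERY = ('titanic', ['name', 'age'])

class Feed:
    """Toy feed mapping logical schemas to the physical tables it serves."""

    def __init__(self, mapping: dict[str, str]) -> None:
        self._mapping = mapping

    def resolve(self, schema: str, columns: list[str]) -> str:
        table = self._mapping[schema]  # logical name -> physical table
        return f"SELECT {', '.join(columns)} FROM {table}"

# The platform decides where the logical schema actually lives:
feed = Feed({'titanic': 'warehouse.titanic_v2'})
print(feed.resolve(*LOGICAL_QUERY))
# → SELECT name, age FROM warehouse.titanic_v2
```

Pointing the same project at a different platform only means configuring a feed with a different mapping; the project code and its logical query remain untouched.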
Sinks
Reciprocally to the Feeds system, ForML uses sink providers for submitting the workflow results according to the particular implementation of the forml.io.Sink base class.
- Null sink with no real write action.
- Sink implementation committing the pipeline result to the standard output of the execution process.
Application Inventories
For managing the high-level application descriptors driving the serving layer, ForML defers to the inventory providers implementing the forml.io.asset.Inventory base class.
- POSIX inventory implementation.
Gateways
The serving layer, representing one of the possible execution mechanisms, uses the gateway providers implementing the forml.runtime.Gateway base class.
- Serving gateway implemented as a RESTful API.