During their life cycles, ForML projects produce specific artifacts as their runtime deliverables. To store these artifacts, ForML uses model registry providers as the persistence layer managing the models at rest.
We use the term model loosely here: it covers not just the estimators involved but essentially any stateful actor in the entire pipeline.
The following diagram illustrates the logical hierarchy of the persistence layer based on a particular instance of the posix registry provider holding a single project forml-titanic-example with two releases, 0.1.dev0 and 1.3.dev12. The first release has two generations with two model assets each, while the latter has just one generation with three model assets:
flowchart LR
    subgraph reg1 ["Registry [posix]"]
        subgraph prj1 ["Project [forml-titanic-example]"]
            subgraph rel1 ["Release [0.1.dev0]"]
                direction LR
                pkg111[(package.4ml)]
                subgraph gen1111 ["Generation "]
                    direction TB
                    act11111[(asset1)]
                    act11112[(asset2)]
                    tag1111>tag.toml]
                end
                subgraph gen1112 ["Generation "]
                    direction TB
                    act11121[(asset1)]
                    act11122[(asset2)]
                    tag1112>tag.toml]
                end
            end
            subgraph rel2 ["Release [1.3.dev12]"]
                direction LR
                pkg112[(package.4ml)]
                subgraph gen1121 ["Generation "]
                    direction TB
                    act11211[(asset1)]
                    act11212[(asset2)]
                    act11213[(asset3)]
                    tag1121>tag.toml]
                end
            end
        end
    end
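The hierarchy in the diagram can also be read as a nested mapping. The following is only an illustrative sketch in plain Python, not the ForML API: the `registry` dictionary and the `count_assets` helper are hypothetical, and the generation numbers are assumed to start from 1 within each release.

```python
# Hypothetical in-memory mirror of the diagram; this is not the ForML API.
# Assumption: generations are numbered from 1 within each release.
registry = {
    "forml-titanic-example": {
        "0.1.dev0": {
            1: ["asset1", "asset2"],
            2: ["asset1", "asset2"],
        },
        "1.3.dev12": {
            1: ["asset1", "asset2", "asset3"],
        },
    },
}


def count_assets(tree: dict) -> dict:
    """Return the number of model assets per (release, generation) pair."""
    return {
        (release, generation): len(assets)
        for releases in tree.values()
        for release, generations in releases.items()
        for generation, assets in generations.items()
    }


for (release, generation), total in count_assets(registry).items():
    print(f"release={release} generation={generation} assets={total}")
```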
The deployable project code arrangement produced upon release from within the development life cycle is the binary package. It is a zipfile object (typically a file with the .4ml suffix) containing all the project principal components bundled together with all of its runtime code dependencies (as declared in the project setup) plus some additional metadata.
Each ForML package is published with an explicit version as specified in the project setup at the time of release. All registry providers require packages of the same project to have unique monotonically increasing version numbers.
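The uniqueness and ordering requirement could be checked along the following lines. This is a minimal sketch under stated assumptions: `version_key` and `can_publish` are hypothetical helpers, and the parser handles only the `N.N[.devN]` subset of PEP 440 seen in this document, so it is no substitute for a full PEP 440 implementation.

```python
import re

# Simplified, hypothetical parser covering only "N.N[.devN]" style versions.
_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)(?:\.dev(\d+))?$")


def version_key(version: str) -> tuple:
    """Build a sortable key in which dev releases precede the final release."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"Unsupported version: {version}")
    release = [int(part) for part in match.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()  # treat trailing zeros as insignificant (1.0 equals 1)
    dev = match.group(2)
    # (0, n) sorts a dev release before (1, 0) of the matching final release.
    stage = (0, int(dev)) if dev is not None else (1, 0)
    return tuple(release), stage


def can_publish(existing: list, candidate: str) -> bool:
    """Accept the candidate only if it is greater than every published version."""
    return all(version_key(candidate) > version_key(v) for v in existing)


published = ["0.1.dev0", "1.3.dev12"]
print(can_publish(published, "1.3"))        # greater than both published versions
print(can_publish(published, "1.3.dev12"))  # duplicate of a published version
```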
Registry providers might internally persist packages in an arbitrary format. To launch the package code using a runner, however, the packages need to be mounted and exposed through a POSIX file system path known as the staging path, which must be reachable from all runner nodes (for a distributed deployment, this implies a shared network POSIX file system).
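To illustrate the idea of mounting, the sketch below extracts a zip-based package into a staging directory using only the Python standard library. The `mount_package` helper, the directory naming, and the `pipeline.py` archive member are all assumptions made for this example; actual providers may stage packages differently.

```python
import pathlib
import tempfile
import zipfile


def mount_package(package: pathlib.Path, staging: pathlib.Path) -> pathlib.Path:
    """Extract a zip-based package under the staging path and return its root.

    Hypothetical helper, not the ForML API; the directory naming is an assumption.
    """
    target = staging / package.stem
    if not target.exists():  # reuse a previously mounted package
        with zipfile.ZipFile(package) as archive:
            archive.extractall(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    package = root / "example.4ml"
    with zipfile.ZipFile(package, "w") as archive:
        # Made-up archive member standing in for the project code.
        archive.writestr("pipeline.py", "# project code placeholder\n")
    mounted = mount_package(package, root / "staging")
    members = sorted(path.name for path in mounted.iterdir())
    print(members)
```

In a distributed deployment, the staging directory in this sketch would have to live on a file system shared by all runner nodes.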
Model Generation Assets
All stateful actors involved in a project life cycle require their internal state acquired during training to be persisted using the model registry. States produced by the same training process represent the model generation assets, and every subsequent training leads to a new generation advancement.
Generations are implicitly versioned using an integer sequence number starting from 1 (relative to the same release), incremented upon every generation advancement.
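The per-release numbering can be sketched as follows. The `generations` mapping and the `advance` function are hypothetical names used only for illustration; they are not part of ForML.

```python
from collections import defaultdict

# Hypothetical bookkeeping: release version -> generation numbers issued so far.
generations = defaultdict(list)


def advance(release: str) -> int:
    """Issue the next generation number for the given release, starting from 1."""
    number = max(generations[release], default=0) + 1
    generations[release].append(number)
    return number


first = advance("0.1.dev0")
second = advance("0.1.dev0")
other = advance("1.3.dev12")  # numbering restarts for a different release
print(first, second, other)  # 1 2 1
```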
Since each actor can implement an arbitrary way of
representing its own state, the model assets are persisted as monolithic binary blobs with
a transparent structure.
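As a rough illustration of persisting states as binary blobs, the sketch below lets a toy actor serialize its own state to bytes, which a store then keeps without inspecting them. Everything here is an assumption made for the example: the `BlobStore` and `MeanImputer` classes are hypothetical, and `pickle` is just one possible serialization choice.

```python
import pickle
import uuid


class BlobStore:
    """Hypothetical in-memory store keeping actor states as raw bytes."""

    def __init__(self) -> None:
        self._blobs = {}

    def dump(self, state: bytes) -> uuid.UUID:
        """Store the blob as-is and return a generated identifier."""
        sid = uuid.uuid4()
        self._blobs[sid] = state
        return sid

    def load(self, sid: uuid.UUID) -> bytes:
        """Return the blob exactly as it was stored."""
        return self._blobs[sid]


class MeanImputer:
    """Toy stateful actor; it alone decides how its state is represented."""

    def __init__(self) -> None:
        self.mean = None

    def train(self, values: list) -> None:
        self.mean = sum(values) / len(values)

    def get_state(self) -> bytes:
        return pickle.dumps(self.mean)  # assumed serialization choice

    def set_state(self, state: bytes) -> None:
        self.mean = pickle.loads(state)


store = BlobStore()
trained = MeanImputer()
trained.train([1.0, 2.0, 3.0])
sid = store.dump(trained.get_state())

restored = MeanImputer()
restored.set_state(store.load(sid))
print(restored.mean)  # 2.0
```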
The metadata associated with each generation is provided in form of an
The content of the registry can be managed using the CLI as follows (see the integrated help for the full synopsis):
New release publishing
New generation publishing
Registry content listing
The low-level persistence interface is used mainly by the registry providers.
- class forml.project.Package(path: str | Path)
ForML artifact representing a complete project code together with all of its dependencies packaged for distribution.
- class forml.project.Manifest(name: str | Key, version: str | Key, package: str, **modules: str)
ForML distribution package metadata manifest.
class forml.io.asset.Tag(training: Training | None = None, tuning: Tuning | None = None, states: Sequence[UUID] | None =
class forml.io.asset.Registry(staging: str | Path | None =
Abstract base class of the ForML model registry concept.
- mount(project: asset.Project.Key, release: asset.Release.Key) -> project.Artifact
Pull and install the given project/release package using the staging file system location available to all runner nodes.
The following is the high-level persistence interface as used by the runners.
class forml.io.asset.State(generation: asset.Generation, nodes: Sequence[UUID], tag: asset.Tag | None =
A high-level actor state persistence accessor.
It allows the runner to load and dump the states of individual stateful actors within the given generation.
class forml.io.asset.Instance(project: str | asset.Project.Key = '__main__', release: str | asset.Release.Key | None = None, generation: str | int | asset.Generation.Key | None = None, registry: asset.Directory | None =
The top-level instance of a particular project/release/generation used by a Runner to access the runtime artifacts (both the release package and the model generation assets).
This is just a lazy reference not physically containing the actual assets - only fetching them upon the eventual access.
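The lazy-reference behavior can be pictured with a small stand-alone sketch. The `LazyInstance` class and the `fake_fetch` function are hypothetical and do not reflect how `forml.io.asset.Instance` is actually implemented; they only show a fetch being deferred until first access and then cached.

```python
import functools


class LazyInstance:
    """Hypothetical lazy handle; `fetch` stands in for a registry lookup."""

    def __init__(self, project: str, release: str, fetch) -> None:
        self.project = project
        self.release = release
        self._fetch = fetch

    @functools.cached_property
    def package(self) -> str:
        # Runs on first access only; the result is cached afterwards.
        return self._fetch(self.project, self.release)


calls = []


def fake_fetch(project: str, release: str) -> str:
    """Record the call and return a made-up package file name."""
    calls.append((project, release))
    return f"{project}-{release}.4ml"


instance = LazyInstance("forml-titanic-example", "0.1.dev0", fake_fetch)
before = len(calls)      # nothing has been fetched at construction time
name = instance.package  # first access triggers the fetch
instance.package         # second access is served from the cache
after = len(calls)
print(before, after, name)
```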
- class forml.io.asset.Directory(registry: asset.Registry)
Logical representation of a hierarchy of projects, their releases and generations.
- class forml.io.asset.Project.Key
Project level key - i.e. the project name.
This can be any identifier valid as a Python package distribution name (i.e. lowercase with hyphens).
class forml.io.asset.Release.Key(key: str | Key =
Project release level key - i.e. the release version.
This needs to be a valid PEP 440 version.
class forml.io.asset.Generation.Key(key: str | int | Key | None =
Project model generation key - i.e. generation sequence number.
This must be a natural integer starting from 1.
ForML comes with a number of providers implementing the
io.asset.Registry interface. To make them available
for the ForML runtime, selected providers need to be configured within the common platform
setup using the
The official registry providers are:
(Pseudo)registry implementation provided as temporary non-distributed storage that persists only during its lifetime.
File-based registry backed by a locally-accessible posix file system.
ForML model registry implementation using the MLflow Tracking Server as the artifact storage.