Model Persistence¶
During their life cycles, ForML projects produce specific artifacts as their runtime deliverables. To store these artifacts, ForML uses model registry providers as the persistence layer managing the models at rest.
Note
We use the term model loosely, covering not just the involved estimators but essentially any stateful actor in the entire pipeline.
Project Artifacts¶
The two types of artifacts requiring persistence are code, in the form of the release package, and states, stored as model generation assets.
The following diagram illustrates the logical hierarchy of the persistence layer based on a particular instance of the posix registry provider holding a single project forml-titanic-example with two releases, 0.1.dev0 and 1.3.dev12. The first release has two generations with two model assets each, while the latter has just one generation with three model assets:
flowchart LR
subgraph reg1 ["Registry [posix]"]
subgraph prj1 ["Project [forml-titanic-example]"]
subgraph rel1 ["Release [0.1.dev0]"]
direction LR
pkg111[(package.4ml)]
subgraph gen1111 ["Generation [1]"]
direction TB
act11111[(asset1)]
act11112[(asset2)]
tag1111>tag.toml]
end
subgraph gen1112 ["Generation [2]"]
direction TB
act11121[(asset1)]
act11122[(asset2)]
tag1112>tag.toml]
end
end
subgraph rel2 ["Release [1.3.dev12]"]
direction LR
pkg112[(package.4ml)]
subgraph gen1121 ["Generation [1]"]
direction TB
act11211[(asset1)]
act11212[(asset2)]
act11213[(asset3)]
tag1121>tag.toml]
end
end
end
end
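To make the hierarchy above more tangible, the following sketch materializes the same structure as placeholder files in a temporary directory. The on-disk layout (one directory per project, release and generation) is an assumption made for illustration only and is not necessarily how the posix provider actually organizes its storage.

```python
import pathlib
import tempfile

# Hypothetical layout mirroring the diagram: project -> release -> generation -> assets.
LAYOUT = {
    "forml-titanic-example": {
        "0.1.dev0": {1: ["asset1", "asset2"], 2: ["asset1", "asset2"]},
        "1.3.dev12": {1: ["asset1", "asset2", "asset3"]},
    }
}


def materialize(root, layout):
    """Create empty placeholder files following the illustrated hierarchy."""
    for project, releases in layout.items():
        for release, generations in releases.items():
            release_dir = root / project / release
            release_dir.mkdir(parents=True)
            (release_dir / "package.4ml").touch()
            for generation, assets in generations.items():
                generation_dir = release_dir / str(generation)
                generation_dir.mkdir()
                (generation_dir / "tag.toml").touch()
                for asset in assets:
                    (generation_dir / asset).touch()


def list_tree(root):
    """Return all file paths relative to the root in sorted order."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    registry_root = pathlib.Path(tmp)
    materialize(registry_root, LAYOUT)
    TREE = list_tree(registry_root)
    for entry in TREE:
        print(entry)
```

Each release carries exactly one package, while every generation beneath it carries its own tag plus the model assets.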
Release Package¶
The deployable project code arrangement produced upon release from within the development life cycle is the binary ForML package. It is a zipfile object (typically a file with the .4ml suffix) containing all the project principal components bundled together with all of its runtime code dependencies (as declared in the project setup) plus some additional metadata (ForML package manifest).
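Since the package is a plain zip archive, its general shape can be sketched with the standard library alone. The member names used here (`manifest.json`, `titanic/pipeline.py`) and the JSON encoding of the metadata are hypothetical; the real package layout and manifest format are defined by ForML itself.

```python
import io
import json
import zipfile

# Hypothetical member name; the actual manifest file is defined by ForML.
MANIFEST_NAME = "manifest.json"


def build_package(name, version, sources):
    """Bundle source files and a small metadata record into one zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MANIFEST_NAME, json.dumps({"name": name, "version": version}))
        for path, content in sources.items():
            archive.writestr(path, content)
    return buffer.getvalue()


def read_manifest(package):
    """Read the metadata record back without extracting the other members."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return json.loads(archive.read(MANIFEST_NAME))


PACKAGE = build_package(
    "forml-titanic-example", "0.1.dev0", {"titanic/pipeline.py": "# pipeline code\n"}
)
MANIFEST = read_manifest(PACKAGE)
print(MANIFEST)
```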
Each ForML package is published with an explicit version as specified in the project setup at the time of release. All registry providers require packages of the same project to have unique monotonically increasing version numbers.
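The uniqueness and ordering requirement can be illustrated with a small check. This is a simplified sketch that understands only dotted release numbers with an optional `.devN` suffix (such as the versions in the diagram above); it is not a full PEP 440 implementation, and the function names are made up for this example.

```python
import re

# Simplified subset of PEP 440: dotted release numbers with an optional ".devN" suffix.
_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)(?:\.dev(\d+))?$")


def version_key(version):
    """Return a sortable key; development releases sort before the final release."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"Unsupported version: {version}")
    release = [int(part) for part in match.group(1).split(".")]
    # Drop trailing zeros so that "1.0" and "1" compare as equal.
    while len(release) > 1 and release[-1] == 0:
        release.pop()
    dev = match.group(2)
    # (0, N) for a dev release, (1, 0) for a final one, so 1.3.dev12 < 1.3.
    return tuple(release), ((0, int(dev)) if dev is not None else (1, 0))


def can_publish(existing, candidate):
    """Accept the candidate only if it is strictly greater than every existing version."""
    return all(version_key(candidate) > version_key(version) for version in existing)


print(can_publish(["0.1.dev0"], "1.3.dev12"))
print(can_publish(["0.1.dev0", "1.3.dev12"], "1.3.dev12"))
```

Under these assumptions the first call accepts the new release, while the second rejects it because the same version already exists.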
Package Staging¶
Registry providers might internally persist packages in an arbitrary format. In order to launch their code using a runner, however, the packages need to be mounted and exposed using a posix file system path known as the staging path, which must be reachable from all runner nodes (for distributed deployments this implies a shared network posix file system).
Model Generation Assets¶
All stateful actors involved in a project life cycle require their internal state acquired during training to be persisted using the model registry. States produced by the same training process constitute the model generation assets, and every follow-up training leads to a new generation advancement.
Generations are implicitly versioned using an integer sequence number starting from 1 (relative to the same release), incremented upon every generation advancement.
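The numbering scheme can be sketched as a per-release counter. The class below is a hypothetical illustration of the rule just described, not part of the ForML API.

```python
from collections import defaultdict


class GenerationCounter:
    """Tracks the latest generation number separately for each release."""

    def __init__(self):
        self._latest = defaultdict(int)  # release -> last issued generation

    def advance(self, release):
        """Register a new training under the release and return its generation number."""
        self._latest[release] += 1
        return self._latest[release]


counter = GenerationCounter()
SEQUENCE = [
    counter.advance("0.1.dev0"),
    counter.advance("0.1.dev0"),
    counter.advance("1.3.dev12"),
]
print(SEQUENCE)  # [1, 2, 1]
```

The third value restarts at 1 because the sequence is relative to each release.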
Since each actor can implement an arbitrary way of representing its own state, the model assets are persisted as monolithic binary blobs with a transparent structure.
The metadata associated with each generation is provided in form of an asset.Tag.
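The relationship between the binary state blobs and the generation metadata can be sketched as follows. `SketchTag` and `BlobStore` are hypothetical, heavily reduced stand-ins (the tag keeps only the list of state identifiers), and `pickle` is used purely for illustration, since each actor is free to choose its own serialization.

```python
import pickle
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchTag:
    """Reduced generation metadata: only the identifiers of the persisted states."""

    states: tuple = ()


class BlobStore:
    """Stores opaque byte blobs under generated identifiers."""

    def __init__(self):
        self._blobs = {}

    def dump(self, state):
        sid = uuid.uuid4()
        self._blobs[sid] = state
        return sid

    def load(self, sid):
        return self._blobs[sid]


# Two actors with differently shaped states; the store sees only bytes.
scaler_state = pickle.dumps({"mean": 29.7, "std": 14.5})
model_state = pickle.dumps([0.12, -0.4, 1.7])

store = BlobStore()
TAG = SketchTag(states=tuple(store.dump(blob) for blob in (scaler_state, model_state)))
RESTORED = [pickle.loads(store.load(sid)) for sid in TAG.states]
print(RESTORED)
```

The store never inspects the blobs; only the owning actor knows how to interpret its own state.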
Content Management¶
Content of the registry can be managed using the CLI as follows (see the integrated help for full synopsis):
Use case | Command |
---|---|
New release publishing | |
New generation publishing | |
Registry content listing | |
Persistence API¶
Low-level¶
The low-level persistence interface is used mainly by the registry providers.
- class forml.project.Package(path: str | Path)[source]¶
ForML artifact representing a complete project code together with all of its dependencies packaged for distribution.
- class forml.project.Manifest(name: str | Key, version: str | Key, package: str, **modules: str)[source]¶
ForML distribution package metadata manifest.
- class forml.io.asset.Tag(training: Training | None = None, tuning: Tuning | None = None, states: Sequence[UUID] | None = None)[source]¶
Generation metadata.
- class forml.io.asset.Registry(staging: str | Path | None = None)[source]¶
Abstract base class of the ForML model registry concept.
- Parameters:
- staging: str | Path | None = None¶
File system location reachable from all runner nodes to be used for package staging (defaults to a local temporary directory, which is invalid for distributed runners).
- mount(project: asset.Project.Key, release: asset.Release.Key) → project.Artifact [source]¶
Pull and install the given project/release package using the staging file system location available to all runner nodes.
- Parameters:
- project: asset.Project.Key¶
Name of the project to work with.
- release: asset.Release.Key¶
Version of the release to be loaded.
- Returns:
Product artifact.
- Raises:
forml.MissingError – The given artifact could not be found.
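The division of labor between a provider's internal storage and the staged, mounted form of a package can be sketched with a small abstract base class. Everything here is a hypothetical simplification: `SketchRegistry`, `MemoryRegistry`, `pull()` and `MissingPackageError` are illustrative names and do not reproduce the actual forml.io.asset.Registry interface beyond the general idea of `mount()`.

```python
import abc
import io
import pathlib
import tempfile
import zipfile


class MissingPackageError(LookupError):
    """Hypothetical stand-in for the error raised when an artifact is not found."""


class SketchRegistry(abc.ABC):
    """Reduced registry model: providers implement pull(), mount() handles staging."""

    def __init__(self, staging=None):
        # Without an explicit path fall back to a local temporary directory,
        # which only works when all runners share this machine.
        self._staging = pathlib.Path(staging or tempfile.mkdtemp())

    @abc.abstractmethod
    def pull(self, project, release):
        """Return the raw package bytes from the provider-specific storage."""

    def mount(self, project, release):
        """Extract the package under the staging path and return its location."""
        target = self._staging / project / release
        if not target.exists():
            with zipfile.ZipFile(io.BytesIO(self.pull(project, release))) as archive:
                archive.extractall(target)
        return target


class MemoryRegistry(SketchRegistry):
    """Keeps packages in a dictionary keyed by (project, release)."""

    def __init__(self, packages, staging=None):
        super().__init__(staging)
        self._packages = packages

    def pull(self, project, release):
        try:
            return self._packages[project, release]
        except KeyError:
            raise MissingPackageError(f"{project}-{release} not found") from None


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("titanic/pipeline.py", "# pipeline code\n")

with tempfile.TemporaryDirectory() as tmp:
    registry = MemoryRegistry(
        {("forml-titanic-example", "0.1.dev0"): buffer.getvalue()}, staging=tmp
    )
    mounted = registry.mount("forml-titanic-example", "0.1.dev0")
    MOUNTED_FILES = sorted(
        p.relative_to(mounted).as_posix() for p in mounted.rglob("*") if p.is_file()
    )
    try:
        registry.mount("forml-titanic-example", "9.9")
        MISSING_RAISED = False
    except MissingPackageError:
        MISSING_RAISED = True
```

Keeping the storage format behind `pull()` is what lets a provider persist packages in any form while still exposing them through a file system path.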
High-level¶
The following is the high-level persistence interface as used by the runners.
- class forml.io.asset.State(generation: asset.Generation, nodes: Sequence[UUID], tag: asset.Tag | None = None)[source]¶
A high-level actor state persistence accessor.
It allows the runner to load and dump the states of individual stateful actors within the given generation.
- class forml.io.asset.Instance(project: str | asset.Project.Key = '__main__', release: str | asset.Release.Key | None = None, generation: str | int | asset.Generation.Key | None = None, registry: asset.Directory | None = None)[source]¶
The top-level instance of a particular project/release/generation used by a Runner to access the runtime artifacts (both the release package and the model generation assets).
This is just a lazy reference that does not physically contain the actual assets; it fetches them only upon eventual access.
- class forml.io.asset.Directory(registry: asset.Registry)[source]¶
Logical representation of a hierarchy of projects, their releases and generations.
- class forml.io.asset.Project.Key¶
Project level key - i.e. the project name.
This can be any identifier valid as a Python package distribution name (i.e. lowercase with hyphens).
- class forml.io.asset.Release.Key(key: str | Key = '0')¶
Project release level key - i.e. the release version.
This needs to be a valid PEP 440 version.
- class forml.io.asset.Generation.Key(key: str | int | Key | None = 1)¶
Project model generation key - i.e. generation sequence number.
This must be a natural integer starting from 1.
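The lazy-reference behavior described for the high-level instance accessor can be sketched with a cached property. `LazyInstance` and the `fetch` callable are hypothetical and only demonstrate the general pattern of deferring retrieval until first access; they are not the actual ForML implementation.

```python
import functools


class LazyInstance:
    """Holds only the project/release/generation keys until the assets are requested."""

    def __init__(self, project, release, generation, fetch):
        self.project = project
        self.release = release
        self.generation = generation
        self._fetch = fetch  # callable performing the actual (expensive) retrieval

    @functools.cached_property
    def states(self):
        """Fetch the assets on first access and reuse the result afterwards."""
        return self._fetch(self.project, self.release, self.generation)


CALLS = []


def fetch(project, release, generation):
    """Record each retrieval so the laziness can be observed."""
    CALLS.append((project, release, generation))
    return {"asset1": b"\x00", "asset2": b"\x01"}


instance = LazyInstance("forml-titanic-example", "0.1.dev0", 2, fetch)
CALLS_BEFORE = len(CALLS)  # nothing has been fetched yet
first = instance.states
second = instance.states
print(CALLS_BEFORE, len(CALLS))
```

Creating the reference costs nothing, and repeated access reuses the result of the single fetch.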
Registry Providers¶
ForML comes with a number of providers implementing the io.asset.Registry interface. To make them available for the ForML runtime, selected providers need to be configured within the common platform setup using the [REGISTRY.*] sections.
The official registry providers are:
- (Pseudo)registry implementation provided as temporary non-distributed storage that persists only during its lifetime.
- File-based registry backed by a locally accessible posix file system.
- ForML model registry implementation using the MLflow Tracking Server as the artifact storage.