Model Persistence

During their life cycles, ForML projects produce specific artifacts as their runtime deliverables. To store these artifacts, ForML uses model registry providers as the persistence layer managing the models at rest.

Note

We use the term model loosely here: it covers not just the estimators involved but essentially any stateful actor in the entire pipeline.

Project Artifacts

The two types of artifacts requiring persistence are the code, in the form of the release package, and the states, stored as model generation assets.

The following diagram illustrates the logical hierarchy of the persistence layer based on a particular instance of the posix registry provider holding a single project, forml-titanic-example, with two releases, 0.1.dev0 and 1.3.dev12. The first release has two generations with two model assets each, and the latter has just one generation with three model assets:

flowchart LR
    subgraph reg1 ["Registry [posix]"]
        subgraph prj1 ["Project [forml-titanic-example]"]
            subgraph rel1 ["Release [0.1.dev0]"]
                direction LR
                pkg111[(package.4ml)]
                subgraph gen1111 ["Generation [1]"]
                    direction TB
                    act11111[(asset1)]
                    act11112[(asset2)]
                    tag1111>tag.toml]
                end
                subgraph gen1112 ["Generation [2]"]
                    direction TB
                    act11121[(asset1)]
                    act11122[(asset2)]
                    tag1112>tag.toml]
                end
            end
            subgraph rel2 ["Release [1.3.dev12]"]
                direction LR
                pkg112[(package.4ml)]
                subgraph gen1121 ["Generation [1]"]
                    direction TB
                    act11211[(asset1)]
                    act11212[(asset2)]
                    act11213[(asset3)]
                    tag1121>tag.toml]
                end
            end
        end
    end
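To make the hierarchy concrete, the following minimal sketch recreates the diagram as a directory tree in a temporary location and lists the generations of each release. It only mirrors the logical structure shown above: the actual on-disk layout used by the posix provider is an assumption here, and the helper names `build_example` and `list_generations` are hypothetical.

```python
import pathlib
import tempfile

# Logical content of the diagram: release -> generation -> number of assets.
EXAMPLE = {
    "0.1.dev0": {1: 2, 2: 2},
    "1.3.dev12": {1: 3},
}


def build_example(root: pathlib.Path, project: str = "forml-titanic-example") -> pathlib.Path:
    """Create a skeleton of empty files mirroring the diagram (illustrative layout only)."""
    for release, generations in EXAMPLE.items():
        release_dir = root / project / release
        release_dir.mkdir(parents=True)
        (release_dir / "package.4ml").touch()
        for generation, assets in generations.items():
            generation_dir = release_dir / str(generation)
            generation_dir.mkdir()
            (generation_dir / "tag.toml").touch()
            for index in range(1, assets + 1):
                (generation_dir / f"asset{index}").touch()
    return root / project


def list_generations(project_dir: pathlib.Path) -> dict[str, list[int]]:
    """Map each release directory to its sorted generation numbers."""
    return {
        release.name: sorted(int(g.name) for g in release.iterdir() if g.is_dir())
        for release in sorted(project_dir.iterdir())
    }


with tempfile.TemporaryDirectory() as tmp:
    listing = list_generations(build_example(pathlib.Path(tmp)))
    print(listing)
```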

Release Package

The deployable project code arrangement produced upon release from within the development life cycle is the binary ForML package. It is a zipfile object (typically a file with the .4ml suffix) containing all of the project's principal components bundled together with all of their runtime code dependencies (as declared in the project descriptor) plus some additional metadata (the ForML package manifest).
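The idea of a zip archive carrying both code and a metadata manifest can be sketched with the standard library alone. This is not the ForML packaging code: the manifest file name `manifest.json`, its JSON encoding, and the helpers `build_package` and `read_manifest` are assumptions made for illustration. Only the manifest fields (name, version, package) follow the `Manifest` parameters documented on this page.

```python
import io
import json
import zipfile


def build_package(modules: dict[str, str], manifest: dict[str, str]) -> bytes:
    """Bundle source files and a manifest into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, source in modules.items():
            archive.writestr(name, source)
        # Hypothetical manifest location and format.
        archive.writestr("manifest.json", json.dumps(manifest))
    return buffer.getvalue()


def read_manifest(package: bytes) -> dict[str, str]:
    """Read the manifest back without extracting the rest of the archive."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return json.loads(archive.read("manifest.json"))


blob = build_package(
    {"titanic/__init__.py": "", "titanic/pipeline.py": "# pipeline definition\n"},
    {"name": "forml-titanic-example", "version": "0.1.dev0", "package": "titanic"},
)
manifest = read_manifest(blob)
print(manifest["name"], manifest["version"])
```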

Each ForML package is published with an explicit version as specified in the project descriptor at the time of release. All registry providers require packages of the same project to have unique monotonically increasing version numbers.
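The monotonicity requirement can be illustrated with a small check that a candidate version is strictly greater than every version already published. The sketch below is an assumption-laden simplification: it understands only plain release numbers with an optional `.devN` suffix (a small subset of PEP 440), and the helpers `sort_key` and `can_publish` are hypothetical. A real implementation would rely on a full PEP 440 parser.

```python
import re

# Simplified subset of PEP 440: N(.N)* with an optional .devN suffix.
_VERSION = re.compile(r"^(\d+(?:\.\d+)*)(?:\.dev(\d+))?$")


def sort_key(version: str) -> tuple:
    """Turn a version string into a tuple that orders like the version itself."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"Unsupported version: {version!r}")
    release = [int(part) for part in match.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()  # treat 1.0 and 1.0.0 as equal
    dev = match.group(2)
    # Development releases sort before the corresponding final release.
    stage = (0, int(dev)) if dev is not None else (1, 0)
    return tuple(release), stage


def can_publish(existing: list[str], candidate: str) -> bool:
    """Accept the candidate only if it is newer than everything already published."""
    key = sort_key(candidate)
    return all(sort_key(version) < key for version in existing)


print(can_publish(["0.1.dev0"], "1.3.dev12"))
print(can_publish(["0.1.dev0", "1.3.dev12"], "1.3.dev12"))
```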

Package Staging

Registry providers might internally persist packages in an arbitrary format. To launch the package code using a runner, however, the packages need to be mounted and exposed through a POSIX file system path, known as the staging path, that is reachable from all runner nodes (for distributed deployments this implies a shared network POSIX file system).
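As a rough illustration of staging, the sketch below takes a package held as raw bytes (standing in for whatever internal format a provider uses) and unpacks it under a staging directory keyed by project and release. The function `stage` and the `staging/project/release` layout are hypothetical and not taken from the ForML implementation.

```python
import io
import pathlib
import tempfile
import zipfile


def stage(package: bytes, staging: pathlib.Path, project: str, release: str) -> pathlib.Path:
    """Expose the package content under a file system path the runner can load from."""
    target = staging / project / release
    if not target.exists():  # reuse a previously staged copy
        target.mkdir(parents=True)
        with zipfile.ZipFile(io.BytesIO(package)) as archive:
            archive.extractall(target)
    return target


# Build a tiny in-memory package to stage.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("titanic/__init__.py", "")

with tempfile.TemporaryDirectory() as tmp:
    path = stage(buffer.getvalue(), pathlib.Path(tmp), "forml-titanic-example", "0.1.dev0")
    staged_files = sorted(p.relative_to(path).as_posix() for p in path.rglob("*") if p.is_file())
    print(staged_files)
```

With a distributed runner, the staging directory in this sketch would have to live on a file system shared by all nodes rather than in a local temporary directory.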

Model Generation Assets

All stateful actors involved in a project life cycle require the internal state they acquire during training to be persisted using the model registry. The states produced by the same training process represent the model generation assets, and every follow-up training leads to a new generation advancement.

Generations are implicitly versioned using an integer sequence number starting from 1 (relative to the same release) that is incremented upon every generation advancement.

Since each actor can implement an arbitrary way of representing its own state, the model assets are persisted as monolithic binary blobs whose internal structure is opaque to the registry.
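The generation numbering and blob storage described above can be sketched with a toy in-memory store for a single release. The class `MemoryStore` is hypothetical, and `pickle` is used only as an example serializer; as noted, each actor is free to represent its state in its own way, and the store never looks inside the bytes.

```python
import pickle
import uuid


class MemoryStore:
    """Toy stand-in for a registry holding the generations of a single release."""

    def __init__(self) -> None:
        self._generations: dict[int, dict[uuid.UUID, bytes]] = {}

    def advance(self, states: dict[uuid.UUID, object]) -> int:
        """Persist the states of one training run as a new generation and return its number."""
        generation = max(self._generations, default=0) + 1  # sequence starts from 1
        self._generations[generation] = {
            sid: pickle.dumps(state) for sid, state in states.items()
        }
        return generation

    def load(self, generation: int, sid: uuid.UUID) -> object:
        """Restore a single actor state from its binary blob."""
        return pickle.loads(self._generations[generation][sid])


store = MemoryStore()
scaler, classifier = uuid.uuid4(), uuid.uuid4()
first = store.advance({scaler: {"mean": 29.7}, classifier: [0.1, 0.2]})
second = store.advance({scaler: {"mean": 30.1}, classifier: [0.3, 0.4]})
print(first, second, store.load(second, scaler))
```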

The metadata associated with each generation is provided in the form of an asset.Tag.

Content Management

Content of the registry can be managed using the CLI as follows (see the integrated help for full synopsis):

Use case                     Command
---------------------------  -----------------------
New release publishing       $ forml project release
New generation publishing    $ forml model train
Registry content listing     $ forml model list

Persistence API

Low-level

The low-level persistence interface is used mainly by the registry providers.

class forml.project.Package(path: str | Path)

ForML artifact representing a complete project code together with all of its dependencies packaged for distribution.

Parameters:
path: str | Path

File system path pointing to the package file.

class forml.project.Manifest(name: str | Key, version: str | Key, package: str, **modules: str)

ForML distribution package metadata manifest.

Parameters:
name: str | Key

Project name.

version: str | Key

Project release version.

package: str

Full python package name containing the project principal components.

**modules: str

Individual project components mapping (if non-conventional).

class forml.io.asset.Tag(training: Training | None = None, tuning: Tuning | None = None, states: Sequence[UUID] | None = None)

Generation metadata.

Parameters:
training: Training | None = None

Generation training information.

tuning: Tuning | None = None

Generation tuning information.

states: Sequence[UUID] | None = None

Sequence of state asset IDs.

class forml.io.asset.Registry(staging: str | Path | None = None)

Abstract base class of the ForML model registry concept.

Parameters:
staging: str | Path | None = None

File system location reachable from all runner nodes to be used for package staging (defaults to a local temporary directory, which is invalid for distributed runners).

mount(project: asset.Project.Key, release: asset.Release.Key) → project.Artifact

Pull and install the given project/release package using the staging file system location available to all runner nodes.

Parameters:
project: asset.Project.Key

Name of the project to work with.

release: asset.Release.Key

Version of the release to be loaded.

Returns:

The mounted project artifact.

Raises:

forml.MissingError – The given artifact could not be found.

High-level

The following is the high-level persistence interface as used by the runners.

class forml.io.asset.State(generation: asset.Generation, nodes: Sequence[UUID], tag: asset.Tag | None = None)

A high-level actor state persistence accessor.

It allows the runner to load and dump the states of individual stateful actors within the given generation.

class forml.io.asset.Instance(project: str | asset.Project.Key = '__main__', release: str | asset.Release.Key | None = None, generation: str | int | asset.Generation.Key | None = None, registry: asset.Directory | None = None)

The top-level instance of a particular project/release/generation used by a Runner to access the runtime artifacts (both the release package and the model generation assets).

This is just a lazy reference that does not physically contain the actual assets; it only fetches them upon eventual access.
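The lazy-reference behaviour can be sketched in a few lines. This is not the `forml.io.asset.Instance` implementation: the class `LazyInstance`, its `assets` property, and the `fetch` callback are hypothetical names used only to show that nothing is loaded until the first access and that the result is then reused.

```python
import functools
from typing import Callable


class LazyInstance:
    """Hypothetical reference to a project/release/generation that loads assets on demand."""

    def __init__(
        self,
        project: str,
        release: str,
        generation: int,
        fetch: Callable[[str, str, int], list],
    ) -> None:
        self.project = project
        self.release = release
        self.generation = generation
        self._fetch = fetch

    @functools.cached_property
    def assets(self) -> list:
        # Runs only on first access; the result is cached on the instance afterwards.
        return self._fetch(self.project, self.release, self.generation)


calls = []


def fake_fetch(project: str, release: str, generation: int) -> list:
    """Stand-in for a registry lookup that records how often it is invoked."""
    calls.append((project, release, generation))
    return [b"asset1", b"asset2"]


instance = LazyInstance("forml-titanic-example", "0.1.dev0", 2, fake_fetch)
calls_before = len(calls)  # nothing fetched at construction time
assets = instance.assets
instance.assets  # the second access hits the cache
calls_after = len(calls)
print(calls_before, calls_after)
```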

class forml.io.asset.Directory(registry: asset.Registry)

Logical representation of a hierarchy of projects, their releases and generations.

class forml.io.asset.Project.Key

Project level key - i.e. the project name.

This can be any identifier valid as a Python package distribution name (i.e. lowercase with hyphens).

class forml.io.asset.Release.Key(key: str | Key = '0')

Project release level key - i.e. the release version.

This needs to be a valid PEP 440 version.

class forml.io.asset.Generation.Key(key: str | int | Key | None = 1)

Project model generation key - i.e. generation sequence number.

This must be a positive integer (starting from 1).
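The constraint on generation keys amounts to a simple validation step. The helper `generation_key` below is hypothetical and only illustrates the rule for string and integer inputs; it does not reproduce the behaviour of `asset.Generation.Key` itself.

```python
def generation_key(key: "str | int" = 1) -> int:
    """Normalize a generation key to an integer of at least 1 (hypothetical helper)."""
    value = int(key)  # raises ValueError for non-numeric strings
    if value < 1:
        raise ValueError(f"Invalid generation key: {key!r}")
    return value


accepted = [generation_key(key) for key in (1, "2")]

rejected = []
for key in (0, -1, "zero"):
    try:
        generation_key(key)
    except ValueError:
        rejected.append(key)

print(accepted, rejected)
```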

Registry Providers

ForML comes with a number of providers implementing the io.asset.Registry interface. To make them available for the ForML runtime, selected providers need to be configured within the common platform setup using the [REGISTRY.*] sections.

The official registry providers are:

Volatile

(Pseudo)registry implementation providing temporary non-distributed storage that persists only for the lifetime of the registry instance.

Posix

File-based registry backed by a locally accessible POSIX file system.

Mlflow

ForML model registry implementation using the MLflow Tracking Server as the artifact storage.