Projects built on ForML are in principle software source-code collections consisting of a set of defined components organized as a Python package. Their ultimate purpose is to enable effective development leading to the delivery (i.e. the release of a version) of a solution in the form of a deployable artifact.
During development, ForML allows executing the working copy of the project source code either by triggering its development life cycle actions or by visiting it in the interactive mode.
Although not in the scope of this documentation, all the general source-code management best practices (version control, continuous integration/delivery, etc.) are applicable to ForML projects and should be integrated into the development process.
To discover the structure of some real ForML projects, it is worth exploring the available tutorials.
Starting a New Project
A ForML project can be initialized either manually, by implementing the component structure from scratch, or simply via the init subcommand of the forml command-line interface:
$ forml project init myproject
ForML projects are organized as regular Python projects accompanied by a PEP 621 compliant pyproject.toml. They are structured in a way that allows ForML to identify their principal components and to operate their life cycle.
The framework adopts the Convention over Configuration approach to organizing the internal project structure so that it can automatically discover the relevant components (it is still possible to ignore the convention and organize the project arbitrarily, but the author is then responsible for explicitly configuring all the otherwise automatic steps).
The typical project structure matching the ForML convention might look like the following tree:
    <project_name>
    ├── pyproject.toml
    ├── <optional_project_namespace_package>
    │   └── <project_root_package>
    │       ├── __init__.py
    │       ├── pipeline              # principal component as a package
    │       │   ├── __init__.py
    │       │   └── <moduleX>.py      # arbitrary module not part of the convention
    │       ├── source.py             # principal component as a module
    │       ├── evaluation.py
    │       ├── <moduleY>.py          # another module not part of the convention
    │       └── tuning.py
    ├── tests
    │   ├── __init__.py
    │   ├── test_<pipeline>.py        # actual name not part of the convention
    │   └── ...
    ├── README.md                     # not part of the convention
    ├── notebooks                     # not part of the convention
    │   └── ...
    └── ...
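As a rough illustration of convention-based discovery, the following sketch scans a root package directory for the conventional source, pipeline and evaluation names from the tree above and accepts each either as a module or as a package, ignoring all other modules. It is not ForML's actual discovery code; the discover() helper and the titanic demo package are hypothetical.

```python
import pathlib
import tempfile

# Conventional principal component names (hypothetical list for this sketch).
PRINCIPAL = ("source", "pipeline", "evaluation")


def discover(package_dir: pathlib.Path) -> dict[str, pathlib.Path]:
    """Hypothetical scan returning the location of each conventional component found."""
    found = {}
    for name in PRINCIPAL:
        module = package_dir / f"{name}.py"  # component as a module
        package = package_dir / name / "__init__.py"  # component as a package
        if module.is_file():
            found[name] = module
        elif package.is_file():
            found[name] = package.parent
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp) / "titanic"  # hypothetical project root package
    (root / "pipeline").mkdir(parents=True)
    for path in ("__init__.py", "pipeline/__init__.py", "source.py", "evaluation.py", "utils.py"):
        (root / path).touch()
    components = discover(root)
    print(sorted(components))  # utils.py is not part of the convention, so it is ignored
```

Modules outside the convention (like utils.py here) are simply not picked up, which mirrors the "not part of the convention" annotations in the tree.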
Clearly, the overall structure does not look special: it is a pretty usual Python project layout (plus some additional content). What makes it a ForML project are the particular modules and/or packages within that structure and the specific metadata provided in the pyproject.toml. Let's focus on each of these components in the following sections.
The pyproject.toml is a standard metadata descriptor extended with a ForML-specific section that helps integrate the principal component structure. It is placed directly in the project root directory.
The minimal content looks as follows:
    [project]
    name = "forml-tutorial-titanic"
    version = "0.1.dev1"
    dependencies = [
        "openschema",
        "scikit-learn",
        "pandas",
        "numpy",
    ]

    [tool.forml]
    package = "titanic"
The [project] section can contain any additional metadata supported by the PEP 621 specification.
Upon publishing (in the scope of the development life cycle), the
[project.version] value will become the release identifier and thus needs to
be a valid PEP 440 version.
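Since the version string becomes the release identifier, it may be worth checking it before publishing. The following minimal sketch (not part of ForML) validates a version against the canonical public version regular expression from PEP 440, Appendix B. Note that this is stricter than full PEP 440 parsing, which also accepts non-normalized spellings such as 1.0-dev1; the is_canonical_version() helper is hypothetical.

```python
import re

# Canonical public version pattern from PEP 440, Appendix B.
# It accepts only normalized forms (e.g. "0.1.dev1"), so it is stricter
# than a full PEP 440 parser.
_CANONICAL = re.compile(
    r"([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?"
    r"(\.dev(0|[1-9][0-9]*))?"
)


def is_canonical_version(version: str) -> bool:
    """Hypothetical pre-publish check: True if the version is canonical PEP 440."""
    # fullmatch() ensures the whole string (including no trailing newline) matches.
    return _CANONICAL.fullmatch(version) is not None


for candidate in ("0.1.dev1", "1.0rc2", "2!1.0.post3", "1.0-dev1", "latest"):
    print(candidate, is_canonical_version(candidate))
```

For a check that accepts every valid (not just canonical) PEP 440 string, a full parser such as the one in the third-party packaging library would be the more appropriate choice.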
The project should carefully specify all of its dependencies using the [project.dependencies] list, as these will be included in the released .4ml package artifact.
The [tool.forml] section supports the following options:
- package (string): refers to the Python package containing the principal components.
- components (map): allows overriding the conventional modules representing the individual principal components, given as submodules relative to the package:
    [tool.forml.components]
    evaluation = "relative.path.to.my.custom.evaluation.module"
    pipeline = "relative.path.to.my.custom.pipeline.module"
    source = "relative.path.to.my.custom.source.module"
Principal components are the actual high-level blocks of the particular ForML solution, provided as Python modules (or packages) within the project package root.
ForML does not care whether a principal component is defined as a module (a file with the .py suffix) or a package (a subdirectory with an __init__.py file in it), since both have the same import syntax.
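The module/package equivalence can be demonstrated with plain importlib. The sketch below builds a throwaway root package (the name demo_project is hypothetical) containing source as a module file and pipeline as a subpackage, and imports both with the same dotted syntax.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp) / "demo_project"  # hypothetical project root package
    (root / "pipeline").mkdir(parents=True)
    (root / "__init__.py").write_text("")
    # "source" as a module: a file with the .py suffix
    (root / "source.py").write_text("KIND = 'module'\n")
    # "pipeline" as a package: a subdirectory with an __init__.py file
    (root / "pipeline" / "__init__.py").write_text("KIND = 'package'\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the freshly created files are visible
    try:
        # Identical import syntax regardless of the physical layout.
        source = importlib.import_module("demo_project.source")
        pipeline = importlib.import_module("demo_project.pipeline")
        print(source.KIND, pipeline.KIND)  # prints: module package
    finally:
        sys.path.remove(tmp)
```

The only observable difference is that the package gets a __path__ attribute (so it can hold further submodules, like <moduleX>.py in the tree), which does not affect how the component itself is imported.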
To load each of the principal components, ForML relies on the
project.setup() function as the
expected component registration interface:
- forml.project.setup(source: project.Source) -> None
- forml.project.setup(pipeline: flow.Composable, schema: dsl.Source.Schema | None = None) -> None
- forml.project.setup(evaluation: project.Evaluation) -> None
Interface for registering principal component instances.
This function is expected to be called exactly once from within every component module, passing it the component instance.
The true implementation of this function is only provided when imported within the component loader context (outside of that context, it is effectively a no-op).
Parameters:
- source: project.Source
- pipeline: flow.Composable
- schema: dsl.Source.Schema | None = None
  Optional schema of the pipeline output.
- evaluation: project.Evaluation
The pipeline definition is the heart of the entire solution. It is provided in the form of a workflow expression.
ForML expects this component to be provided as a pipeline.py module or a pipeline package under the project package root.
The source component provides the project with a definite, yet still portable, dataset description. It is specified using the project.Source descriptor as a DSL expression against some particular schema catalog.
The evaluation component defines the model evaluation strategy for both the development and production life cycles, provided as a project.Evaluation instance.
ForML has a rich operator unit testing facility that can be integrated into the usual tests/ project structure. This topic is extensively covered in the separate Unit Testing chapter.