Project Organization
Projects built on ForML are essentially source-code collections consisting of a set of defined components organized as a python package. Their ultimate purpose is to enable effective development leading to the delivery (i.e. the release of a version) of a solution in the form of a deployable artifact.
During development, ForML can execute the working copy of the project source code, either by triggering its development life cycle actions or from the interactive mode.
Attention
Although not in the scope of this documentation, all the general source-code management best practices (version control, continuous integration/delivery, etc.) are applicable to ForML projects and should be integrated into the development process.
To discover the structure of some real ForML projects, it is worth exploring the available tutorials.
Starting a New Project
A ForML project can be initialized either manually, by implementing the component structure from scratch, or simply via the init subcommand of the forml command-line interface:
$ forml project init myproject
Component Structure
ForML projects are organized as usual python projects accompanied by a PEP 621 compliant pyproject.toml. They are structured in a way that allows ForML to identify their principal components and to operate their life cycle.
The framework adopts the Convention over Configuration approach for organizing the internal project structure, which lets it discover the relevant components automatically. It is still possible to ignore the convention and organize the project in an arbitrary way, but the author is then responsible for explicitly configuring all the otherwise automatic steps.
The typical project structure matching the ForML convention might look like the following tree:
<project_name>
├── pyproject.toml
├── <optional_project_namespace_package>
│   └── <project_root_package>
│       ├── __init__.py
│       ├── pipeline  # principal component as a package
│       │   ├── __init__.py
│       │   └── <moduleX>.py  # arbitrary module not part of the convention
│       ├── source.py  # principal component as a module
│       ├── evaluation.py
│       ├── <moduleY>.py  # another module not part of the convention
│       └── tuning.py
├── tests
│   ├── __init__.py
│   ├── test_<pipeline>.py  # actual name not part of the convention
│   └── ...
├── README.md  # not part of the convention
├── notebooks  # not part of the convention
│   └── ...
└── ...
Clearly, the overall structure is nothing special: it is a fairly usual python project layout (plus some additional content). What makes it a ForML project are the particular modules and/or packages within that structure and the specific metadata provided in the pyproject.toml. Let's focus on each of these components in the following sections.
Project Descriptor
This is a standard pyproject.toml metadata descriptor with a specific ForML tool
section helping to integrate the ForML principal component structure. It’s placed directly in the
project root directory.
The minimal content looks as follows:
[project]
name = "forml-tutorial-titanic"
version = "0.1.dev1"
dependencies = [
    "openschema",
    "scikit-learn",
    "pandas",
    "numpy",
]
[tool.forml]
package = "titanic"
The [project] section can contain any additional metadata supported by the PEP 621 specification.
Note
Upon publishing (in the scope of the development life cycle), the specified [project.version] value will become the release identifier and thus needs to be a valid PEP 440 version.
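A quick pre-publishing sanity check of the version string might look like the sketch below. It uses the canonical-form regular expression given in Appendix B of PEP 440, which is stricter than the full specification (the specification also accepts alternative spellings that normalize to the canonical form). The helper name `is_canonical_version` is made up for this illustration and is not part of ForML.

```python
import re

# Canonical-form pattern from PEP 440, Appendix B. It is stricter than the
# full specification, which also accepts spellings that normalize to this form.
CANONICAL_PEP440 = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?"
    r"(\.dev(0|[1-9][0-9]*))?$"
)


def is_canonical_version(version: str) -> bool:
    """Return True if the string is a PEP 440 version in canonical form."""
    return CANONICAL_PEP440.match(version) is not None


# The version from the descriptor above passes; a free-form label does not.
checks = {
    candidate: is_canonical_version(candidate)
    for candidate in ("0.1.dev1", "1.2.3", "2.0rc1", "release-1")
}
```

For authoritative validation, including normalization of non-canonical spellings, a dedicated version-parsing library is the safer choice.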
The project should carefully specify all of its dependencies using the [project.dependencies] list as these will be included in the released .4ml package artifact.
The custom [tool.forml] section supports the following options:
- the package string referring to the python package containing the principal components
- the optional components map allowing to override the conventional modules representing the individual principal components as submodules relative to the package:
  [tool.forml.components]
  evaluation = "relative.path.to.my.custom.evaluation.module"
  pipeline = "relative.path.to.my.custom.pipeline.module"
  source = "relative.path.to.my.custom.source.module"
Principal Components
These are the actual high-level blocks of the particular ForML solution provided as python modules (or packages) within the project package root.
Hint
ForML does not care whether the principal component is defined as a module (a file with the .py suffix) or a package (a subdirectory with an __init__.py file in it) since both have the same import syntax.
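The equivalence of the two forms can be checked with plain Python, independently of ForML. The sketch below builds a throwaway package in a temporary directory (the `demo_project` name and the `KIND` attribute are made up for the illustration), defines one component as a module and another as a package, and imports both with the same call.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "demo_project"  # hypothetical project root package
    (root / "pipeline").mkdir(parents=True)
    (root / "__init__.py").write_text("")
    # The "source" component defined as a module.
    (root / "source.py").write_text("KIND = 'module'\n")
    # The "pipeline" component defined as a package.
    (root / "pipeline" / "__init__.py").write_text("KIND = 'package'\n")

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        # Identical import syntax regardless of the on-disk form.
        source = importlib.import_module("demo_project.source")
        pipeline = importlib.import_module("demo_project.pipeline")
    finally:
        sys.path.remove(tmp)

kinds = (source.KIND, pipeline.KIND)
```

The only visible difference after the import is that the package object carries a `__path__` attribute while the plain module does not.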
To load each of the principal components, ForML relies on the project.setup() function as the expected component registration interface:
forml.project.setup(source: project.Source) -> None
forml.project.setup(pipeline: flow.Composable, schema: dsl.Source.Schema | None = None) -> None
forml.project.setup(evaluation: project.Evaluation) -> None
Interface for registering principal component instances.
This function is expected to be called exactly once from within every component module passing the component instance.
The true implementation of this function is only provided when imported within the component loader context (outside the context this is effectively a no-op).
Parameters:
- source: project.Source
  Source descriptor.
- pipeline: flow.Composable
  Workflow expression.
- schema: dsl.Source.Schema | None = None
  Optional schema of the pipeline output.
- evaluation: project.Evaluation
  Evaluation descriptor.
Pipeline Expression
The pipeline definition is the heart of the entire solution. It is provided in the form of a workflow expression.
ForML expects this component to be provided as a pipeline.py module or a pipeline package under the project package root.
Dataset Definition
The source component provides the project with a definite yet portable dataset description. It is specified using project.Source.query as a DSL expression against a particular schema catalog.
Evaluation Strategy
This component defines the model evaluation strategy for both the development and production life cycles. It is provided as a project.Evaluation descriptor.
Tests
ForML has a rich operator unit testing facility that can be integrated into the usual tests/ project structure. This topic is extensively covered in the separate Unit Testing chapter.