Machine learning projects are operated in a number of typical modes that are followed in a particular order. This pattern is what we call a lifecycle. ForML supports two specific lifecycles depending on the project stage: the research lifecycle and the production lifecycle.
The research lifecycle is typically followed during project development. All work is done within the scope of the project source
code working copy, and no persistent models are produced upon execution. It is typically managed using the
python setup.py <mode> interface or, in special cases, using the Interactive Mode. This lifecycle is meant to aid
the development process by allowing the effect of project changes to be seen quickly.
The expected behaviour of each mode depends on the project being set up correctly as described in the Project sections.
The modes of a research lifecycle are:
The test mode simply runs through the unit tests defined using the Operator Unit Testing framework.
$ python3 setup.py test
The eval mode performs an evaluation based on the specs defined in evaluation.py and returns the metrics. The evaluation can be defined either as cross-validation or as hold-out training. One potential use case is a CI integration that continuously monitors (evaluates) the changes during project development.
$ python3 setup.py eval
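The difference between the two evaluation methods mentioned above can be illustrated with a minimal, library-agnostic sketch. It is not the ForML evaluation API (the real specs live in evaluation.py); the helpers holdout_split, kfold_splits and evaluate, as well as the toy mean-predicting model and the MAE metric, are hypothetical. The point is that a hold-out split scores the model once on a held-back portion, while cross-validation averages the metric over folds in which every sample is tested exactly once.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Split = Tuple[List[float], List[float]]  # (train, test)


def holdout_split(data: Sequence[float], test_ratio: float = 0.2) -> Split:
    """Hold-out: train on the leading part, evaluate once on the held-out tail."""
    cut = len(data) - round(len(data) * test_ratio)
    return list(data[:cut]), list(data[cut:])


def kfold_splits(data: Sequence[float], folds: int = 5) -> Iterator[Split]:
    """Cross-validation: every sample lands in the test set of exactly one fold."""
    size = len(data)
    for k in range(folds):
        start, stop = k * size // folds, (k + 1) * size // folds
        yield list(data[:start]) + list(data[stop:]), list(data[start:stop])


def evaluate(splits: Iterable[Split],
             fit: Callable[[List[float]], float],
             score: Callable[[float, List[float]], float]) -> float:
    """Fit on each train part, score on the matching test part, average the metric."""
    results = [score(fit(train), test) for train, test in splits]
    return sum(results) / len(results)


# Toy "model" (predicts the training mean) and metric (mean absolute error).
def fit_mean(train: List[float]) -> float:
    return sum(train) / len(train)


def mae(model: float, test: List[float]) -> float:
    return sum(abs(x - model) for x in test) / len(test)


if __name__ == "__main__":
    data = [float(x) for x in range(10)]
    print("hold-out:", evaluate([holdout_split(data)], fit_mean, mae))  # hold-out: 5.0
    print("5-fold CV:", evaluate(kfold_splits(data), fit_mean, mae))
```

Hold-out is cheaper (one fit), whereas cross-validation costs one fit per fold but uses every sample for testing, which gives a less split-dependent estimate.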
The tune mode runs hyper-parameter tuning and reports the results (not implemented yet).
$ python3 setup.py tune
The train mode runs the pipeline through the standard training. It produces all the defined models, but since it does not persist them, this mode is useful merely for testing the training (or for displaying the task graph, see Visualize task graphs).
$ python3 setup.py train
The bdist_4ml mode creates the distributable project artifact containing all of its dependencies (produced into the
dist directory under the project root directory).
$ python3 setup.py bdist_4ml
The upload mode builds and wraps the project into a runnable Artifact, producing a new Lineage (which can then be used within the Production Lifecycle), and uploads it to a persistent registry.
Each registry accepts only distinct, monotonically increasing lineages for any given project; hence, executing this stage twice against the same registry without incrementing the project version will fail.
$ python3 setup.py bdist_4ml upload
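The rule that a registry accepts only distinct, monotonically increasing lineages can be sketched as follows. This is not ForML's registry implementation: InMemoryRegistry, parse_version and the example versions are hypothetical, and the sketch assumes purely numeric dotted versions. It shows why versions are compared as integer tuples rather than strings (as strings, "1.10" would sort before "1.9") and why repeating the upload without bumping the version is rejected.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse a purely numeric dotted version, e.g. '1.10' -> (1, 10)."""
    return tuple(int(part) for part in text.split("."))


class InMemoryRegistry:
    """Hypothetical registry accepting only distinct, monotonically increasing lineages."""

    def __init__(self) -> None:
        self._lineages: Dict[str, List[Version]] = {}

    def upload(self, project: str, version: str) -> None:
        new = parse_version(version)
        published = self._lineages.setdefault(project, [])
        # Reject duplicates as well as anything not strictly above the newest lineage.
        if published and new <= published[-1]:
            raise ValueError(f"{project} lineage {version} is not higher than the latest one")
        published.append(new)

    def lineages(self, project: str) -> List[Version]:
        return list(self._lineages.get(project, []))


if __name__ == "__main__":
    registry = InMemoryRegistry()
    registry.upload("titanic", "1.9")
    registry.upload("titanic", "1.10")  # accepted: (1, 10) > (1, 9)
    try:
        registry.upload("titanic", "1.10")  # same version uploaded twice
    except ValueError as error:
        print("rejected:", error)
```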
After a project lineage is published to a registry using the
upload mode of the research lifecycle, the project
becomes available for the production lifecycle. Unlike the research lifecycle, the production lifecycle no longer needs
the project source code working copy, as it operates solely on the published artifact plus any previously
persisted model generations.
The production lifecycle is operated using the CLI (see runtime for the full synopsis) and offers the following modes:
The train mode fits (incrementally) the stateful parts of the pipeline using new labelled data, producing a new Generation of the given lineage (unless specified explicitly, the default lineage is the one with the highest version).
$ forml train titanic
The tune mode runs hyper-parameter tuning of the selected pipeline and produces a new generation (not implemented yet).
$ forml tune titanic
The apply mode runs unlabelled data through a project generation (unless specified explicitly, the default generation is the one with the highest version), producing transformed output (i.e. predictions).
$ forml apply titanic
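The default resolution described for the production train and apply modes (take the lineage with the highest version, then its highest generation) can be sketched in plain Python. The nested dictionary standing in for the registry content, the integer generation numbers and the helper names resolve and train_generation are hypothetical assumptions for illustration only; they are not the ForML runtime API, and train_generation merely records the next generation number instead of actually fitting anything.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, ...]
# Hypothetical registry content: project -> lineage version -> persisted generation numbers.
Registry = Dict[str, Dict[str, List[int]]]


def parse_version(text: str) -> Version:
    """Parse a purely numeric dotted version, e.g. '1.10' -> (1, 10)."""
    return tuple(int(part) for part in text.split("."))


def resolve(registry: Registry, project: str, lineage: Optional[str] = None,
            generation: Optional[int] = None) -> Tuple[str, Optional[int]]:
    """Fill in the defaults: highest lineage version, then its highest generation."""
    lineages = registry[project]
    if lineage is None:
        lineage = max(lineages, key=parse_version)
    if generation is None:
        generations = lineages[lineage]
        generation = max(generations) if generations else None  # None: nothing trained yet
    return lineage, generation


def train_generation(registry: Registry, project: str,
                     lineage: Optional[str] = None) -> Tuple[str, int]:
    """Record the next generation of the selected (or default) lineage."""
    lineage, last = resolve(registry, project, lineage)
    new = 1 if last is None else last + 1
    registry[project][lineage].append(new)
    return lineage, new


if __name__ == "__main__":
    registry: Registry = {"titanic": {"1.9": [1, 2], "1.10": [1]}}
    print(resolve(registry, "titanic"))           # ('1.10', 1)
    print(train_generation(registry, "titanic"))  # ('1.10', 2)
    print(resolve(registry, "titanic"))           # ('1.10', 2)
```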
The eval mode measures the actual performance of the model based on the definitions in
evaluation.py (not implemented yet).
$ forml eval titanic