Task Actor

An Actor is the lowest-level entity of the task graph - a node representing an atomic black-box transformation of the data passing through it.

Important

ForML cares neither about the internal processing logic of any actor nor about the actual types and formats of the data passed between them. All that ForML deals with are the actor interconnections within the overall flow topology - responsibility for their logical and functional compatibility is solely in the hands of the implementer.

Actor Types

The two main actor types are:

  1. Plain stateless actors, which define their output as a function applied just to their input.

  2. More complex stateful actors, which produce output based not just on the input but also on an inner state that they acquire during a separate phase called train. We therefore distinguish between the train and apply modes in which the stateful actors operate. The framework transparently manages the actor state using its .get_state()/.set_state() methods and the model registry.
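The difference between the two actor types, and the way a trained state can travel from one actor instance to another, can be pictured with a minimal sketch. It deliberately does not import ForML: `double` and `MeanFill` are hypothetical stand-ins rather than ForML classes, and for brevity the sketch pickles only the instance attributes, whereas the default ForML implementation is documented to pickle the entire actor object.

```python
import pickle
from typing import List, Optional, Sequence


def double(features: Sequence[float]) -> List[float]:
    """Stateless behaviour: the output depends on the input only."""
    return [2 * v for v in features]


class MeanFill:
    """Hypothetical stateful stand-in: replaces None with the trained mean."""

    def __init__(self) -> None:
        self._mean: Optional[float] = None

    def train(self, features: Sequence[Optional[float]], labels: Sequence[int]) -> None:
        # Train mode: acquire the inner state (labels are unused here).
        known = [v for v in features if v is not None]
        self._mean = sum(known) / len(known)

    def apply(self, features: Sequence[Optional[float]]) -> List[float]:
        # Apply mode: the output depends on the input and the trained state.
        if self._mean is None:
            raise RuntimeError('Not trained')
        return [self._mean if v is None else v for v in features]

    def get_state(self) -> bytes:
        return pickle.dumps(self.__dict__)

    def set_state(self, state: bytes) -> None:
        self.__dict__.update(pickle.loads(state))


# Train one instance and export its state ...
trained = MeanFill()
trained.train([1.0, None, 3.0], labels=[0, 1, 0])
state = trained.get_state()

# ... then restore the state into a fresh instance used only in apply mode.
restored = MeanFill()
restored.set_state(state)
result = restored.apply([None, 5.0])
print(result)  # [2.0, 5.0]
```

In ForML itself, this hand-over is performed by the framework and the model registry; the implementer never has to call the state methods manually.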

Ports

ForML actors have a number of input and output ports for their mutual interconnection within the task graph. The following diagram shows which ports may get engaged in each of the actor modes:

flowchart LR
    subgraph Actor
        A([Apply Mode])
        T[Train Mode]
    end
    subgraph Input Ports
        AI1[/Apply 1\] --> A
        AI2[/Apply ...\] --> A
        AIM[/Apply M\] --> A
        TI[/Train/] --> T
        LI[/Labels/] --> T
        SI[(State)] -. set .-> A & T
        PI>Parameters] -. set .-> A & T
    end
    subgraph Output Ports
        A --> AO1[\Apply 1/]
        A --> AO2[\Apply .../]
        A --> AON[\Apply N/]
        T -. get .-> SO[(State)]
        A & T -. get .-> PO>Parameters]
    end

There are a couple of different ways the ports can be logically grouped together:

Level - how are the ports configured:
  • user-level ports (full lines in the diagram) are explicitly connected by the implementer

  • system-level ports (dotted lines in the diagram) are internally managed exclusively by ForML

Mode - when do the ports get engaged:
  • train-mode ports are engaged only during the train-mode

  • apply-mode ports are engaged only during the apply-mode

  • both-mode ports can be engaged in any mode

Direction - which way the data flows through the ports:
  • input ports pass data into the Actor

  • output ports emit data out of the Actor

With this perspective, we can now describe each of the different ports as follows:

Name        Level   Mode   # Inputs  # Outputs  Description
----------  ------  -----  --------  ---------  -----------------------------------------------------
Apply       user    apply  M         N          The feature port(s) to/from the apply-transformation.
Train       user    train  1         0          Features port to be trained on.
Label       user    train  1         0          Labels port to be trained on.
State       system  both   1         1          State getter/setter ports.
Parameters  system  both   1         1          Hyper-parameter getter/setter ports.
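The three groupings (level, mode, direction) can also be read as a small data model. The following sketch merely transcribes the port table into records and filters them by mode; the `Port` record and the `engaged` helper are hypothetical illustrations and do not correspond to any ForML class.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Port:
    """Hypothetical record describing one row of the port table."""

    name: str
    level: str    # 'user' or 'system'
    mode: str     # 'apply', 'train' or 'both'
    inputs: str   # number of input ports ('M' = many)
    outputs: str  # number of output ports ('N' = many)


PORTS = (
    Port('Apply', 'user', 'apply', 'M', 'N'),
    Port('Train', 'user', 'train', '1', '0'),
    Port('Label', 'user', 'train', '1', '0'),
    Port('State', 'system', 'both', '1', '1'),
    Port('Parameters', 'system', 'both', '1', '1'),
)


def engaged(mode: str) -> List[str]:
    """Names of the ports that can be engaged in the given actor mode."""
    return [p.name for p in PORTS if p.mode in (mode, 'both')]


# Ports the implementer connects explicitly (full lines in the diagram).
user_ports = [p.name for p in PORTS if p.level == 'user']

print(engaged('train'))  # ['Train', 'Label', 'State', 'Parameters']
print(engaged('apply'))  # ['Apply', 'State', 'Parameters']
```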

See also

The actual port management is discussed in great detail in the Flow Topology chapter; here we stay focused on the Actor itself.

Interface

The actor API is defined by the abstract class flow.Actor. The generic way of implementing user-defined actors is to extend this class and provide the relevant methods with the desired functionality. The main parts of the API look as follows:

class forml.flow.Actor

Abstract actor base class.

This is a generic class with parametric input types flow.Features, flow.Labels and output type flow.Result.

abstract apply(*features: flow.Features) -> flow.Result

The apply mode entry-point.

Mandatory method engaging the M:N input-output Apply ports.

Parameters:
*features: flow.Features

Input feature-set(s).

Returns:

Transformation result (i.e. predictions).

train(features: flow.Features, labels: flow.Labels, /) -> None

The train mode entry point.

Optional method engaging the Train (features) and Label (labels) ports of stateful actors.

Unlike with the multiple apply-mode feature ports, there can only be a single train-mode feature port.

Parameters:
features: flow.Features

Train feature-set.

labels: flow.Labels

Train labels.

get_state() -> bytes

Return the internal state of the actor.

The State output port representation.

The particular bytes-encoding of the returned value can be arbitrary as long as it is accepted by the companion set_state() method.

The default implementation uses Python Pickle to serialize the entire actor object.

Returns:

State as bytes.

set_state(state: bytes) -> None

Set the new internal state of the actor.

The State input port representation.

The default implementation interprets the state as the entire actor object serialized by Python Pickle.

Parameters:
state: bytes

Bytes to be used as internal state.
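Because the bytes-encoding is arbitrary as long as the two methods agree on it, an actor may replace the pickle-based default with a more compact format. The sketch below illustrates the idea with a hypothetical `MeanState` stand-in (it does not extend flow.Actor) whose whole state is one float; the convention of using empty bytes for the untrained state is an assumption made for this example only.

```python
import struct
from typing import Optional


class MeanState:
    """Hypothetical stand-in holding a single float as its trained state."""

    def __init__(self) -> None:
        self._mean: Optional[float] = None

    def get_state(self) -> bytes:
        # Empty bytes stand for "not trained"; otherwise one big-endian double.
        if self._mean is None:
            return b''
        return struct.pack('>d', self._mean)

    def set_state(self, state: bytes) -> None:
        # Must accept exactly what get_state() produces.
        self._mean = struct.unpack('>d', state)[0] if state else None


source = MeanState()
source._mean = 2.5  # pretend this value was acquired during training
encoded = source.get_state()

target = MeanState()
target.set_state(encoded)
print(len(encoded), target._mean)  # 8 2.5
```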

get_params() -> Mapping[str, Any]

Get the current hyper-parameters of the actor.

The Params output port representation.

All the values returned by this method must be accepted by the companion set_params().

The default implementation returns an empty mapping.

Returns:

Mapping of the hyper-parameter names to their values.

set_params(**params: Any) -> None

Set new hyper-parameters of the actor (typically by a hyper-parameter tuner).

The Params input port representation.

The implementation of this method can choose to accept only a subset of the constructor arguments if some of them are not expected to be changed during the lifetime.

Parameters:
**params: Any

New hyper-parameters as keyword arguments.
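The contract between the two methods implies that feeding the output of get_params() back into set_params() must always work. A minimal sketch of that round trip follows, using a hypothetical `Impute` stand-in that exposes only its `value` as tunable while keeping `column` fixed for its lifetime; none of these names come from ForML.

```python
from typing import Any, Mapping, Optional


class Impute:
    """Hypothetical stand-in mirroring the get_params()/set_params() contract."""

    def __init__(self, column: str, value: float) -> None:
        self._column = column  # fixed for the lifetime of the actor
        self._value = value    # tunable hyper-parameter

    def get_params(self) -> Mapping[str, Any]:
        # Expose only what set_params() is able to accept.
        return {'value': self._value}

    def set_params(self, value: Optional[float] = None) -> None:
        if value is not None:
            self._value = value


actor = Impute('age', 0.0)
before = dict(actor.get_params())

# Round trip: everything returned by the getter is accepted by the setter.
actor.set_params(**before)
after_roundtrip = dict(actor.get_params())

# A tuner would typically push new candidate values like this.
actor.set_params(value=1.5)
after_update = dict(actor.get_params())
print(before, after_update)  # {'value': 0.0} {'value': 1.5}
```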

classmethod builder(*args, **kwargs: Any) -> flow.Builder[_Actor]

Creating a builder instance for this actor.

Parameters:
*args

Positional arguments.

**kwargs: Any

Keyword arguments.

Returns:

Actor builder instance.

final class forml.flow.Builder(actor: type[_Actor], *args: Any, **kwargs: Any)

Actor builder holding all the required initialization configuration for instantiating the particular actor.

Parameters:
actor: type[_Actor]

Target actor class.

*args: Any

Actor positional arguments.

**kwargs: Any

Actor keyword arguments.
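The point of a builder is to capture the actor class together with its constructor arguments so that fresh instances can be created later and repeatedly. The following sketch shows the pattern in isolation; `SketchBuilder` and `Scale` are hypothetical, and making the builder callable is an assumption of this example rather than a statement about the flow.Builder interface.

```python
from typing import Any, Generic, List, Sequence, Type, TypeVar

A = TypeVar('A')


class SketchBuilder(Generic[A]):
    """Hypothetical stand-in: remembers how to construct an actor."""

    def __init__(self, actor: Type[A], *args: Any, **kwargs: Any) -> None:
        self.actor = actor
        self.args = args
        self.kwargs = kwargs

    def __call__(self) -> A:
        # Each call produces a brand-new, independent actor instance.
        return self.actor(*self.args, **self.kwargs)


class Scale:
    """Toy stateless actor-like class multiplying its input by a factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def apply(self, features: Sequence[float]) -> List[float]:
        return [self.factor * v for v in features]


builder = SketchBuilder(Scale, factor=3.0)
first, second = builder(), builder()
print(first is second)           # False
print(first.apply([1.0, 2.0]))   # [3.0, 6.0]
```

Deferring the instantiation this way keeps the configuration separate from any particular actor instance.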

Implementation

The following sections explain the different ways an Actor can be implemented.

Native Actors

The basic mechanism for declaring custom actors is simply extending the flow.Actor base class.

Example of a user-defined native actor:

import typing
import pandas
from forml import flow

class StaticImpute(flow.Actor[pandas.DataFrame, None, pandas.DataFrame]):
    """Simple stateless imputation actor using the provided value to fill the NaNs."""

    def __init__(self, column: str, value: float):
        self._column: str = column
        self._value: float = value

    def apply(self, df: pandas.DataFrame) -> pandas.DataFrame:
        # Fill the NaNs in the selected column and return the whole frame.
        df[self._column] = df[self._column].fillna(self._value)
        return df

    def get_params(self) -> typing.Mapping[str, typing.Any]:
        return {'column': self._column, 'value': self._value}

    def set_params(
        self,
        column: typing.Optional[str] = None,
        value: typing.Optional[float] = None,
    ) -> None:
        if column is not None:
            self._column = column
        if value is not None:
            self._value = value

Example of a user-defined stateful native actor:

27
28
import typing
import pandas
from forml import flow

class MeanImpute(flow.Actor[pandas.DataFrame, pandas.Series, pandas.DataFrame]):
    """Simple stateful imputation actor using the trained mean value to fill the NaNs.

    Using the default implementations of ``.get_state()`` and ``.set_state()`` methods.
    """

    def __init__(self, column: str):
        self._column: str = column
        self._value: typing.Optional[float] = None

    def train(self, df: pandas.DataFrame, labels: pandas.Series) -> None:
        self._value = df[self._column].mean()

    def apply(self, df: pandas.DataFrame) -> pandas.DataFrame:
        if self._value is None:
            raise RuntimeError('Not trained')
        df[self._column] = df[self._column].fillna(self._value)
        return df

    def get_params(self) -> typing.Mapping[str, typing.Any]:
        return {'column': self._column}

    def set_params(self, column: str) -> None:
        self._column = column

Decorated Function Actors

A less verbose option for defining actors is based on wrapping user-defined functions using the @wrap.Actor.train and/or @wrap.Actor.apply decorators from the Pipeline Library (the following examples exactly match the functionality of the native implementations above):

import typing
import pandas
from forml.pipeline import wrap

@wrap.Actor.apply
def StaticImpute(
    df: pandas.DataFrame,
    *,
    column: str,
    value: float,
) -> pandas.DataFrame:
    """Simple stateless imputation actor using the provided value to fill the NaNs."""
    df[column] = df[column].fillna(value)
    return df

The stateful equivalent combines both decorators:

import typing
import pandas
from forml.pipeline import wrap

@wrap.Actor.train
def MeanImpute(
    state: typing.Optional[float],
    df: pandas.DataFrame,
    labels: pandas.Series,
    *,
    column: str,
) -> float:
    """Train part of a stateful imputation actor using the trained mean value to fill
    the NaNs.
    """
    return df[column].mean()

@MeanImpute.apply
def MeanImpute(state: float, df: pandas.DataFrame, *, column: str) -> pandas.DataFrame:
    """Apply part of a stateful imputation actor using the trained mean value to fill
    the NaNs.
    """
    df[column] = df[column].fillna(state)
    return df

Important

To keep the naming convention consistent across all actors regardless of their implementation (whether native classes or decorated functions), decorated functions should also follow the class naming convention, i.e. CapitalizedWords.

Mapped Actors

Third-party implementations that are logically compatible with the ForML actor concept can be easily mapped into valid ForML actors using the @wrap.Actor.type wrapper from the Pipeline Library:

from sklearn import ensemble
from forml.pipeline import wrap

RfcActor = wrap.Actor.type(
    ensemble.RandomForestClassifier,
    # mapping using target method reference
    train='fit',
    # mapping using a callable wrapper
    apply=lambda c, *a, **kw: c.predict_proba(*a, **kw).transpose()[-1],
)
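The two mapping styles used above (a method name given as a string, or a callable receiving the wrapped instance as its first argument) can be illustrated with a self-contained sketch. It is a conceptual illustration only and not the way wrap.Actor.type is implemented: `resolve`, `map_type` and the toy `PriorClassifier` are all hypothetical, and plain lists replace the arrays a real estimator would return. The apply callable mirrors the example above by keeping only the probability of the last class.

```python
from typing import Any, Callable, Union

Target = Union[str, Callable[..., Any]]


def resolve(target: Target) -> Callable[..., Any]:
    """Turn a method name or a callable into a callable(instance, *args)."""
    if callable(target):
        return target
    return lambda instance, *args, **kwargs: getattr(instance, target)(*args, **kwargs)


def map_type(cls: type, train: Target, apply: Target) -> type:
    """Hypothetical helper building an actor-like class delegating to `cls`."""
    do_train, do_apply = resolve(train), resolve(apply)

    class Mapped:
        def __init__(self, *args: Any, **kwargs: Any) -> None:
            self._inner = cls(*args, **kwargs)

        def train(self, features: Any, labels: Any) -> None:
            do_train(self._inner, features, labels)

        def apply(self, *features: Any) -> Any:
            return do_apply(self._inner, *features)

    Mapped.__name__ = f'Mapped{cls.__name__}'
    return Mapped


class PriorClassifier:
    """Toy third-party-style estimator predicting the positive-class prior."""

    def fit(self, X, y):
        self.prior_ = sum(y) / len(y)
        return self

    def predict_proba(self, X):
        return [[1 - self.prior_, self.prior_] for _ in X]


PriorActor = map_type(
    PriorClassifier,
    train='fit',  # mapping using a method name
    apply=lambda c, X: [row[-1] for row in c.predict_proba(X)],  # callable wrapper
)

actor = PriorActor()
actor.train([[0], [1], [2], [3]], [0, 1, 1, 1])
scores = actor.apply([[4], [5]])
print(scores)  # [0.75, 0.75]
```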

Attention

Rather than just actors, third-party implementations usually need to be converted all the way to ForML operators to be eventually composable within the pipeline expressions. For this purpose, there is an even easier method of turning those implementations into operators using the @wrap.importer context manager - see the operator auto-wrapping section for more details.