Spark Runner

class forml.provider.runner.spark.Runner(instance: asset.Instance | None = None, feed: io.Feed | None = None, sink: io.Sink | None = None, **kwargs)

Bases: Runner

ForML runner utilizing Apache Spark as a distributed executor.

Parameters:
**kwargs

Any Spark Configuration options (for example, spark.executor.memory), passed as keyword arguments.
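
Spark option names contain dots, so they are not valid Python identifiers and cannot be written as literal keyword arguments. The sketch below shows the usual workaround of unpacking a dict with `**`. It does not import ForML: `fake_runner` is a hypothetical stand-in that only mirrors the signature shown above and returns whatever lands in `**kwargs`.

```python
def fake_runner(instance=None, feed=None, sink=None, **kwargs):
    """Hypothetical stand-in mirroring the Runner signature (not ForML code)."""
    # Anything that is not instance/feed/sink ends up in kwargs.
    return kwargs


# Dotted names must be dict keys; `spark.executor.cores=2` would be a syntax error.
spark_options = {
    "spark.executor.cores": 2,
    "spark.executor.memory": "1g",
}

# Unpacking with ** passes each key as a keyword argument.
received = fake_runner(**spark_options)
```

With the real class, the same unpacking would presumably be applied to `forml.provider.runner.spark.Runner`; that call is not exercised here.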

The provider can be enabled using the following platform configuration:

config.toml
 [RUNNER.compute]
 provider = "spark"
 "spark.driver.cores" = 1
 "spark.driver.memory" = "1g"
 "spark.executor.cores" = 2
 "spark.executor.memory" = "1g"
 "spark.executor.pyspark.memory" = "1g"

Important

Install ForML with the spark extras to get Spark support.

Note

ForML uses Spark purely as an executor without any deeper integration with its robust data management API.