Monolite Feed

class forml.provider.feed.monolite.Feed(inline: Mapping[dsl.Source | str, layout.RowMajor] | None = None, csv: Mapping[dsl.Source | str, Path | str | Mapping[str, Any]] | None = None, parquet: Mapping[dsl.Source | str, Path | str | Mapping[str, Any]] | None = None)

Bases: Feed

Lightweight feed for pulling data from multiple simple origins.
The feed can resolve queries across all of its combined data sources.
All origins must be declared using a content resolver mapping whose keys are the fully qualified schema names formatted as <full.module.path>:<qualified.Class.Name> and whose values are the origin-specific configuration options.

Attention
All the referenced schema catalogs must be installed.
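
The key format described above can be illustrated with a small sketch. The helpers `split_schema_ref` and `resolve_schema_ref` are hypothetical names introduced here for illustration only; they are not part of ForML, and the feed's actual resolution logic may differ. The sketch assumes the part before the colon is an importable module and the part after it is a dotted attribute path within that module.

```python
import importlib
from typing import Any


def split_schema_ref(ref: str) -> tuple[str, str]:
    """Split '<full.module.path>:<qualified.Class.Name>' into its two parts."""
    module, sep, qualname = ref.partition(":")
    if not sep or not module or not qualname:
        raise ValueError(f"Expected '<module>:<Class.Name>', got {ref!r}")
    return module, qualname


def resolve_schema_ref(ref: str) -> Any:
    """Import the module and walk the dotted qualified name (illustrative only)."""
    module, qualname = split_schema_ref(ref)
    target: Any = importlib.import_module(module)
    for part in qualname.split("."):
        target = getattr(target, part)
    return target


# Splitting works on any well-formed reference, installed or not.
module_name, class_name = split_schema_ref("foobar.schemas:Foo.Baz")
```

Resolving a reference this way fails with an ImportError when the module is missing, which is consistent with the requirement that the referenced schema catalogs be installed.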
Supported origins:
- Inline data provided as a row-oriented array.
- CSV files parsed using pandas.read_csv().
- Parquet files parsed using pandas.read_parquet().

- Parameters:
  - inline: Mapping[dsl.Source | str, layout.RowMajor] | None = None
    Schema mapping of datasets provided inline as native row-oriented arrays.
  - csv: Mapping[dsl.Source | str, Path | str | Mapping[str, Any]] | None = None
    Schema mapping of datasets accessible using a CSV reader. Values can either be direct file system paths or mappings with two keys:
    - path: pointing to the CSV file
    - kwargs: containing additional options to be passed to the underlying pandas.read_csv()
  - parquet: Mapping[dsl.Source | str, Path | str | Mapping[str, Any]] | None = None
    Schema mapping of datasets accessible using a Parquet reader. Values can either be direct file system paths or mappings with two keys:
    - path: pointing to the Parquet file
    - kwargs: containing additional options to be passed to the underlying pandas.read_parquet()
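
The two accepted value shapes for the csv and parquet parameters (a bare path, or a mapping with path and kwargs) can be sketched as follows. The `normalize_origin` helper is a hypothetical name used only for this illustration and is not ForML's implementation; it also assumes that kwargs may be omitted from the mapping form, which the documentation does not state.

```python
from pathlib import Path
from typing import Any, Mapping, Union

OriginValue = Union[Path, str, Mapping[str, Any]]


def normalize_origin(value: OriginValue) -> tuple[Path, dict[str, Any]]:
    """Reduce either accepted value shape to a (path, reader options) pair."""
    if isinstance(value, Mapping):
        # Mapping form: 'path' is required, 'kwargs' is assumed optional here.
        return Path(value["path"]), dict(value.get("kwargs", {}))
    # Direct form: a plain file system path with no extra reader options.
    return Path(value), {}


plain_path, plain_opts = normalize_origin("/tmp/titanic.csv")
iris_path, iris_opts = normalize_origin(
    {"path": "/tmp/iris.csv", "kwargs": {"sep": ";", "engine": "pyarrow"}}
)
# A reader could then be invoked as pandas.read_csv(iris_path, **iris_opts).
```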
The provider can be enabled using the following platform configuration:
config.toml

    [FEED.mono]
    provider = "monolite"

    [FEED.mono.inline]
    "foobar.schemas:Foo.Baz" = [
        ["alpha", 27, 0.314, 2021-05-11T17:12:24],
        ["beta", 11, -1.12, 2020-11-03T01:24:56],
    ]

    [FEED.mono.csv]
    "openschema.kaggle:Titanic" = "/tmp/titanic.csv"

    [FEED.mono.csv."openschema.sklearn:Iris"]
    path = "/tmp/iris.csv"
    kwargs = {sep = ";", engine = "pyarrow"}

    [FEED.mono.parquet]
    "openschema.kaggle:Avazu" = "/tmp/avazu.parquet"

Important
Select the sql extras to install ForML together with the SQLAlchemy support.

Todo
- More file types (json)
- Multi-file data sources (partitions)