Internal Design

The DSL implements its API through a class hierarchy. Although this internal design is irrelevant to everyday use of the DSL, it is essential for implementing additional parsers.

Model

The following class diagram outlines the API model:

classDiagram
    Source <|-- Queryable
    Source <|-- Statement
    Statement <|-- Set
    Statement <|-- Query
    Queryable <|-- Query
    Queryable <|-- Origin
    Origin <|-- Join
    Origin <|-- Reference
    Origin <|-- Table

    class Source {
        <<abstract>>
        +Schema schema
        +list[Feature] features
        +reference() Reference
        +union() Set
        +intersection() Set
        +difference() Set
    }

    class Queryable {
        <<abstract>>
        +select() Query
        +where() Query
        +having() Query
        +groupby() Query
        +orderby() Query
        +limit() Query
    }

    class Statement {
        <<abstract>>
    }

    class Origin {
        <<abstract>>
        +inner_join() Join
        +left_join() Join
        +right_join() Join
        +full_join() Join
        +cross_join() Join
    }

    class Join {
        +Origin left
        +Origin right
        +Expression condition
        +Kind kind
    }

    class Set {
        +Statement left
        +Statement right
        +Kind kind
    }

    class Reference {
        +Source instance
        +str name
    }

    class Query {
        +Source source
        +list[Feature] selection
        +Expression prefilter
        +list[Operable] grouping
        +Expression postfilter
        +list[Operable] ordering
        +Rows rows
    }

    Feature <|-- Operable
    Feature <|-- Aliased
    Operable <|-- Literal
    Operable <|-- Element
    Element <|-- Column
    Operable <|-- Expression
    Expression <|-- Predicate
    Expression <|-- Window

    class Feature {
        <<abstract>>
        +Any kind
        +alias() Aliased
    }

    class Operable {
        <<abstract>>
        +eq() Expression
        +ne() Expression
        +lt() Expression
        +le() Expression
        +gt() Expression
        +ge() Expression
        +and() Expression
        +or() Expression
        +not() Expression
        +add() Expression
        +sub() Expression
        +mul() Expression
        +div() Expression
        +mod() Expression
    }

    class Expression {
        <<abstract>>
    }

    class Predicate {
        <<abstract>>
    }

    class Aliased {
        +str name
        +Operable operable
    }

    class Element {
        +str name
        +Origin origin
    }

    class Column {
        +Table origin
    }

    class Literal {
        +Any value
    }

    class Window {
        +Operable partition
    }

Base Abstractions

The hierarchy starts with the following two abstractions:

class forml.io.dsl.Source(*args)[source]

Base class of the tabular data frame sources.

A Source is anything that can be used to obtain tabular data FROM. It is a logical collection of dsl.Feature instances represented by its schema.

class Schema(name: str, bases: tuple[type], namespace: dict[str, Any])[source]

Meta-class for schema types construction.

It guarantees consistent hashing and comparability for equality of the produced schema classes.

Attention

This meta-class is used internally; for the schema frontend API, see dsl.Schema.
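
The hashing and equality guarantee can be pictured with a minimal, self-contained sketch. The names `SchemaMeta`, `fields` and `make_student` are hypothetical and not part of ForML; the actual dsl.Source.Schema may work differently. The sketch only shows how a meta-class can make two separately created classes compare and hash as equal.

```python
class SchemaMeta(type):
    """Hypothetical meta-class: produced classes compare and hash by name and fields."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Treat the public, non-callable attributes as the schema "fields".
        cls.fields = tuple(
            sorted(key for key, value in namespace.items() if not key.startswith('_') and not callable(value))
        )
        return cls

    def __eq__(cls, other):
        return isinstance(other, SchemaMeta) and (cls.__name__, cls.fields) == (other.__name__, other.fields)

    def __hash__(cls):
        return hash((cls.__name__, cls.fields))


def make_student():
    """Create a fresh class object on every call."""

    class Student(metaclass=SchemaMeta):
        surname = 'str'
        level = 'int'

    return Student


first, second = make_student(), make_student()
# Two distinct class objects that nevertheless compare and hash as equal.
same_identity = first is second
same_value = first == second and hash(first) == hash(second)
```

With plain `type` as the meta-class, the two classes would compare by identity only, so they could not be used interchangeably as mapping keys.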

property schema : dsl.Source.Schema

Schema type representing this source.

Returns:

Schema type.

property features : Sequence[dsl.Feature]

List of features logically contained in or potentially produced by this Source.

Returns:

Sequence of contained features.

reference(name: str | None = None) dsl.Reference[source]

Get an independent reference to this Source (e.g. for self-join conditions).

Parameters:
name: str | None = None

Optional alias to be used for this reference (random by default).

Returns:

New reference to this Source.

Examples

>>> manager = staff.Employee.reference('manager')
>>> subs = (
...     manager.inner_join(staff.Employee, staff.Employee.manager == manager.id)
...     .select(manager.name, function.Count(staff.Employee.id).alias('subs'))
...     .groupby(manager.id)
... )

union(other: dsl.Source) dsl.Set[source]

Create a new Source as a set union of this and the other Source.

Parameters:
other: dsl.Source

Source to union with.

Returns:

Set instance.

Examples

>>> barbaz = (
...     foo.Bar.select(foo.Bar.X, foo.Bar.Y)
...     .union(foo.Baz.select(foo.Baz.X, foo.Baz.Y))
... )

intersection(other: dsl.Source) dsl.Set[source]

Create a new Source as a set intersection of this and the other Source.

Parameters:
other: dsl.Source

Source to intersect with.

Returns:

Set instance.

Examples

>>> barbaz = (
...     foo.Bar.select(foo.Bar.X, foo.Bar.Y)
...     .intersection(foo.Baz.select(foo.Baz.X, foo.Baz.Y))
... )

difference(other: dsl.Source) dsl.Set[source]

Create a new Source as a set difference of this and the other Source.

Parameters:
other: dsl.Source

Source to subtract from this one.

Returns:

Set instance.

Examples

>>> barbaz = (
...     foo.Bar.select(foo.Bar.X, foo.Bar.Y)
...     .difference(foo.Baz.select(foo.Baz.X, foo.Baz.Y))
... )

class forml.io.dsl.Feature(*args)[source]

Base class of the individual columnar data series features.

A Feature is anything that can be used as a handle for independent columnar data. It is a homogeneous series of data of the same kind.

abstract property kind : dsl.Any

Feature type.

Returns:

Type.

alias(alias: str) dsl.Aliased[source]

Use an alias for this feature.

Parameters:
alias: str

Aliased feature name.

Returns:

New feature instance with the given alias.

Notable Interfaces

class forml.io.dsl.Queryable(*args)[source]

Bases: Source

Base class for any Source that can be queried directly.

select(*features: dsl.Feature) dsl.Query[source]

Specify the output features to be provided (projection).

Repeated calls to .select replace the earlier selection.

Parameters:
*features: dsl.Feature

Sequence of features.

Returns:

Query instance.

Examples

>>> barxy = foo.Bar.select(foo.Bar.X, foo.Bar.Y)

where(condition: dsl.Predicate) dsl.Query[source]

Add a row-filtering condition that’s evaluated before any aggregations.

Repeated calls to .where combine all the conditions (logical AND).

Parameters:
condition: dsl.Predicate

Boolean feature expression.

Returns:

Query instance.

Examples

>>> barx10 = foo.Bar.where(foo.Bar.X == 10)

having(condition: dsl.Predicate) dsl.Query[source]

Add a row-filtering condition that’s applied to the evaluated aggregations.

Repeated calls to .having combine all the conditions (logical AND).

Parameters:
condition: dsl.Predicate

Boolean feature expression.

Returns:

Query instance.

Examples

>>> bargy10 = foo.Bar.groupby(foo.Bar.X).having(function.Count(foo.Bar.Y) == 10)

groupby(*features: dsl.Operable) dsl.Query[source]

Aggregation grouping specifiers.

Repeated calls to .groupby replace the earlier grouping.

Parameters:
*features: dsl.Operable

Sequence of aggregation features.

Returns:

Query instance.

Examples

>>> bargbx = foo.Bar.groupby(foo.Bar.X).select(foo.Bar.X, function.Count(foo.Bar.Y))

orderby(*terms: dsl.Ordering.Term) dsl.Query[source]

Ordering specifiers.

Default direction is ascending.

Repeated calls to .orderby replace the earlier ordering.

Parameters:
*terms: dsl.Ordering.Term

Sequence of feature and direction tuples.

Returns:

Query instance.

Examples

>>> barbyx = foo.Bar.orderby(foo.Bar.X)
>>> barbyxd = foo.Bar.orderby(foo.Bar.X, 'desc')
>>> barbxy = foo.Bar.orderby(foo.Bar.X, foo.Bar.Y)
>>> barbxdy = foo.Bar.orderby(
...     foo.Bar.X, dsl.Ordering.Direction.DESCENDING, foo.Bar.Y, 'asc'
... )
>>> barbydxd = foo.Bar.orderby(
...     (foo.Bar.X, 'desc'),
...     (foo.Bar.Y, dsl.Ordering.Direction.DESCENDING),
... )

limit(count: int, offset: int = 0) dsl.Query[source]

Restrict the number of result rows to the given maximum count, optionally skipping an initial offset.

Repeated calls to .limit replace the earlier restriction.

Parameters:
count: int

Number of rows to return.

offset: int = 0

Skip the given number of rows.

Returns:

Query instance.

Examples

>>> bar10 = foo.Bar.limit(10)
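
The replace-versus-combine rules described for the methods above can be illustrated with a small, self-contained sketch. `MiniQuery` is a hypothetical stand-in that stores conditions as plain strings; it is not the ForML implementation and only mirrors the documented behaviour of repeated `.select`, `.where` and `.limit` calls.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MiniQuery:
    """Hypothetical immutable query container; every method returns a new instance."""

    source: str
    selection: tuple = ()
    prefilter: Optional[str] = None
    rows: Optional[tuple] = None

    def select(self, *features: str) -> 'MiniQuery':
        # A repeated call replaces the earlier selection.
        return replace(self, selection=features)

    def where(self, condition: str) -> 'MiniQuery':
        # A repeated call is combined with the existing condition using logical AND.
        combined = condition if self.prefilter is None else f'({self.prefilter}) AND ({condition})'
        return replace(self, prefilter=combined)

    def limit(self, count: int, offset: int = 0) -> 'MiniQuery':
        # A repeated call replaces the earlier restriction.
        return replace(self, rows=(count, offset))


query = (
    MiniQuery('bar')
    .select('x')
    .select('y')  # replaces ('x',)
    .where('x > 1')
    .where('y < 5')  # AND-combined with 'x > 1'
    .limit(10)
    .limit(3, 1)  # replaces (10, 0)
)
```

After the chain, `query.selection` holds only `'y'`, `query.prefilter` holds both conditions joined by AND, and `query.rows` holds the last limit.
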
class forml.io.dsl.Origin(*args)[source]

Bases: Queryable

Origin is a queryable Source with some handle.

Its features are represented using dsl.Element.

inner_join(other: dsl.Origin, condition: dsl.Predicate) dsl.Join[source]

Construct an inner join with the other origin using the provided condition.

Parameters:
other: dsl.Origin

Source to join with.

condition: dsl.Predicate

Feature expression as the join condition.

Returns:

Join instance.

Examples

>>> barbaz = foo.Bar.inner_join(foo.Baz, foo.Bar.baz == foo.Baz.id)

left_join(other: dsl.Origin, condition: dsl.Predicate) dsl.Join[source]

Construct a left join with the other origin using the provided condition.

Parameters:
other: dsl.Origin

Source to join with.

condition: dsl.Predicate

Feature expression as the join condition.

Returns:

Join instance.

Examples

>>> barbaz = foo.Bar.left_join(foo.Baz, foo.Bar.baz == foo.Baz.id)

right_join(other: dsl.Origin, condition: dsl.Predicate) dsl.Join[source]

Construct a right join with the other origin using the provided condition.

Parameters:
other: dsl.Origin

Source to join with.

condition: dsl.Predicate

Feature expression as the join condition.

Returns:

Join instance.

Examples

>>> barbaz = foo.Bar.right_join(foo.Baz, foo.Bar.baz == foo.Baz.id)

full_join(other: dsl.Origin, condition: dsl.Predicate) dsl.Join[source]

Construct a full join with the other origin using the provided condition.

Parameters:
other: dsl.Origin

Source to join with.

condition: dsl.Predicate

Feature expression as the join condition.

Returns:

Join instance.

Examples

>>> barbaz = foo.Bar.full_join(foo.Baz, foo.Bar.baz == foo.Baz.id)

cross_join(other: dsl.Origin) dsl.Join[source]

Construct a cross join with the other origin.

Parameters:
other: dsl.Origin

Source to join with.

Returns:

Join instance.

Examples

>>> barbaz = foo.Bar.cross_join(foo.Baz)

class forml.io.dsl.Statement(*args)[source]

Bases: Source

Base class for complete statements.

Complete statements are dsl.Query and dsl.Set.

class forml.io.dsl.Operable(*args)[source]

Bases: Feature

Base class for features that can be used in expressions, conditions, grouping, and/or ordering definitions.

In principle, any non-aliased feature is Operable.

Operable feature instances overload the following native Python operators to provide convenient DSL semantics:

Type          Syntax
Comparison    ==, !=, <, <=, >, >=
Logical       &, |, ~
Arithmetical  +, -, *, /, %

See also

Additional details are available under the DSL functions and operators.
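
As an illustration of the operator overloading idea, the following self-contained sketch builds an expression tree from native Python operators instead of computing a value. All names (`Operand`, `Name`, `Value`, `Node`, `wrap`, `render`) are hypothetical and only a subset of the operators is covered; the ForML classes are organised differently.

```python
from dataclasses import dataclass
from typing import Any


class Operand:
    """Hypothetical base class: operators return expression nodes, not computed values."""

    def _binary(self, symbol: str, other: Any) -> 'Node':
        return Node(symbol, (self, wrap(other)))

    def __eq__(self, other):  # comparison
        return self._binary('==', other)

    def __gt__(self, other):  # comparison
        return self._binary('>', other)

    def __and__(self, other):  # logical
        return self._binary('&', other)

    def __invert__(self):  # logical
        return Node('~', (self,))

    def __add__(self, other):  # arithmetical
        return self._binary('+', other)


# eq=False keeps the overloaded __eq__ instead of the dataclass-generated one.
@dataclass(frozen=True, eq=False)
class Name(Operand):
    name: str


@dataclass(frozen=True, eq=False)
class Value(Operand):
    value: Any


@dataclass(frozen=True, eq=False)
class Node(Operand):
    symbol: str
    operands: tuple


def wrap(value: Any) -> Operand:
    """Turn plain Python values into literal operands."""
    return value if isinstance(value, Operand) else Value(value)


def render(item: Operand) -> str:
    """Render the expression tree in a fully parenthesized infix form."""
    if isinstance(item, Name):
        return item.name
    if isinstance(item, Value):
        return repr(item.value)
    if len(item.operands) == 1:
        return f'{item.symbol}({render(item.operands[0])})'
    left, right = item.operands
    return f'({render(left)} {item.symbol} {render(right)})'


x, y = Name('x'), Name('y')
condition = ((x + 1) > y) & ~(y == 10)
text = render(condition)
```

Because Python gives `&` and `|` a higher precedence than the comparison operators, each comparison has to be parenthesized explicitly when combined this way.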

class forml.io.dsl.Element(source: dsl.Origin, name: str)[source]

Bases: Operable

Name-referenced feature of a particular origin (dsl.Table or dsl.Reference).

class forml.io.dsl.Predicate[source]

Mixin for features representing logical or comparison operations.

Specific predicate instances are produced from the native dsl.Operable operators.

Notable Final Types

class forml.io.dsl.Table(name: str, bases: tuple[type], namespace: dict[str, Any])[source]
class forml.io.dsl.Table(schema: dsl.Source.Schema)

Bases: Origin

Table based Source with an explicit schema.

Attention

The primary way of creating Table instances is by inheriting from dsl.Schema, which uses this type as a meta-class.

class forml.io.dsl.Query(source: dsl.Source, selection: Iterable[dsl.Feature] | None = None, prefilter: dsl.Predicate | None = None, grouping: Iterable[dsl.Operable] | None = None, postfilter: dsl.Predicate | None = None, ordering: Sequence[dsl.Ordering.Term] | None = None, rows: dsl.Rows | None = None)[source]

Bases: Queryable, Statement

Query based Source.

Container for holding all the parameters supplied via the dsl.Queryable interface.

Attention

Instances are expected to be created internally via the dsl.Queryable interface methods.

class forml.io.dsl.Set(left: dsl.Source, right: dsl.Source, kind: dsl.Set.Kind)[source]

Bases: Statement

Source made of two set-combined sub-statements with the same schema.

Attention

Instances are expected to be created internally via dsl.Source.union, dsl.Source.intersection, or dsl.Source.difference.

class Kind(value)[source]

Bases: Enum

Set type enum.

UNION = 'union'

Union set operation type.

INTERSECTION = 'intersection'

Intersection set operation type.

DIFFERENCE = 'difference'

Difference set operation type.

class forml.io.dsl.Join(left: dsl.Origin, right: dsl.Origin, kind: dsl.Join.Kind | str, condition: dsl.Predicate | None = None)[source]

Bases: Origin

Source made of two join-combined sub-sources.

class Kind(value)[source]

Bases: Enum

Join type enum.

INNER = 'inner'

Inner join type (default if condition is provided).

LEFT = 'left'

Left outer join type.

RIGHT = 'right'

Right outer join type.

FULL = 'full'

Full join type.

CROSS = 'cross'

Cross join type (default if condition is not provided).
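
The two defaults noted for INNER and CROSS can be sketched as follows. The `resolve_kind` helper is hypothetical, and it assumes that a missing kind is passed as `None`; it is not taken from the ForML code base.

```python
import enum
from typing import Optional, Union


class Kind(enum.Enum):
    """Mirror of the join kinds listed above."""

    INNER = 'inner'
    LEFT = 'left'
    RIGHT = 'right'
    FULL = 'full'
    CROSS = 'cross'


def resolve_kind(kind: Union[Kind, str, None], condition: Optional[object]) -> Kind:
    """Hypothetical helper: normalize the kind and apply the documented defaults."""
    if kind is None:
        # No explicit kind: inner join with a condition, cross join without one.
        return Kind.INNER if condition is not None else Kind.CROSS
    # The enum constructor accepts both a member and its string value.
    return Kind(kind)


with_condition = resolve_kind(None, 'bar.baz == baz.id')
without_condition = resolve_kind(None, None)
from_string = resolve_kind('left', 'bar.baz == baz.id')
```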

class forml.io.dsl.Rows(count: int, offset: int = 0)[source]

Row limit spec container.

Attention

Instances are expected to be created internally via dsl.Queryable.limit.

class forml.io.dsl.Reference(instance: dsl.Source, name: str | None = None)[source]

Bases: Origin

Wrapper around any Source associating it with a (possibly random) name.

Attention

Instances are expected to be created internally via dsl.Source.reference.

class forml.io.dsl.Column(table: dsl.Table, name: str)[source]

Bases: Element

Column is a special type of element that belongs to a table.

class forml.io.dsl.Aliased(feature: dsl.Feature, alias: str)[source]

Bases: Feature

Representation of a feature with an explicit name alias.

Attention

Instances are expected to be created internally via dsl.Feature.alias.

class forml.io.dsl.Ordering(feature: dsl.Operable, direction: dsl.Ordering.Direction | str | None = None)[source]

Container for holding the ordering specification.

Attention

Instances are expected to be created internally by dsl.Queryable.orderby.

Term

Type alias for accepted ordering specifiers.

alias of Union[dsl.Operable, dsl.Ordering.Direction, str, tuple[dsl.Operable, typing.Union[dsl.Ordering.Direction, str]]]

class Direction(value)[source]

Ordering direction enum.

ASCENDING = 'ascending'

Ascending direction.

DESCENDING = 'descending'

Descending direction.

classmethod make(*terms: dsl.Ordering.Term) Iterable[dsl.Ordering][source]

Helper to generate orderings from the given terms.

Parameters:
*terms: dsl.Ordering.Term

One or many features or actual ordering instances.

Returns:

Iterator of ordering instances.
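
A minimal sketch of how such terms might be normalized into (feature, direction) pairs is shown below. The `Col` class, the `ALIASES` mapping and the `make_orderings` function are hypothetical illustrations of the accepted term forms; they are not the actual dsl.Ordering.make implementation.

```python
import enum
from dataclasses import dataclass
from typing import Iterator, Tuple


class Direction(enum.Enum):
    ASCENDING = 'ascending'
    DESCENDING = 'descending'


# Assumed string shortcuts for the directions.
ALIASES = {
    'asc': Direction.ASCENDING,
    'ascending': Direction.ASCENDING,
    'desc': Direction.DESCENDING,
    'descending': Direction.DESCENDING,
}


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for an operable feature."""

    name: str


def parse_direction(value) -> Direction:
    return value if isinstance(value, Direction) else ALIASES[value.lower()]


def make_orderings(*terms) -> Iterator[Tuple[Col, Direction]]:
    """Yield (feature, direction) pairs from flat or tuple-style terms."""
    pending = None  # feature still waiting for its optional direction
    for term in terms:
        if isinstance(term, (Col, tuple)) and pending is not None:
            # The previous feature had no explicit direction: ascending by default.
            yield pending, Direction.ASCENDING
            pending = None
        if isinstance(term, tuple):
            feature, direction = term
            yield feature, parse_direction(direction)
        elif isinstance(term, Col):
            pending = term
        elif pending is None:
            raise ValueError('Direction without a preceding feature')
        else:
            yield pending, parse_direction(term)
            pending = None
    if pending is not None:
        yield pending, Direction.ASCENDING


x, y = Col('x'), Col('y')
flat = list(make_orderings(x, Direction.DESCENDING, y, 'asc'))
pairs = list(make_orderings((x, 'desc'), (y, Direction.DESCENDING)))
```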

class forml.io.dsl.Window(function: dsl.Window.Function, partition: Sequence[dsl.Feature], ordering: Sequence[dsl.Ordering.Term] | None = None, frame: Optional = None)[source]

Window function wrapper feature representation.

Attention

Instances are expected to be created internally via dsl.Window.Function.over().

See also

Supported window functions are available in the window module.

Todo

Support for window expressions is experimental and unlikely to be supported by the existing parsers.

class Function[source]

Window function representation mixin.

abstract property kind : dsl.Any

Function return type.

Returns:

Type.

over(partition: Sequence[dsl.Operable], ordering: Sequence[dsl.Ordering.Term] | None = None, frame: Optional = None) dsl.Window[source]

Create a window using this function.

Parameters:
partition: Sequence[dsl.Operable]

Window partitioning specifying the rows of query results.

ordering: Sequence[dsl.Ordering.Term] | None = None

Order in which input rows should be processed.

frame: Optional = None

Sliding window specification.

Returns:

Windowed feature instance.

Exceptions

exception forml.io.dsl.GrammarError[source]

Bases: InvalidError

Indicates a syntactical error in the given DSL query statement.

Parser

Since the constructed DSL query statement is a generic descriptor with no means of direct execution, the ETL process depends on a particular io.Feed.Reader implementation to parse that query into a set of instructions corresponding to the selected feed and its target storage layer.

Generic Interface

forml.io.dsl.parser.Source = ~Source

Generic storage-native representation of dsl.Source.

forml.io.dsl.parser.Feature = ~Feature

Generic storage-native representation of dsl.Feature.

class forml.io.dsl.parser.Visitor(sources: Mapping[dsl.Source, parser.Source], features: Mapping[dsl.Feature, parser.Feature])[source]

Abstract base class for DSL query statement parser implementations.

In this context, parsing essentially means conversion between the generic DSL-based instance of the particular query and its native representation matching a selected target storage layer.

Conceptually, the parser is implemented as a combination of a visitor traversing the query statement structure and a push-down automaton assembling the generated instructions in their storage-native representation.

The parser assumes that the native representation of all leaves of the query statement tree (the dsl.Table and dsl.Column instances), or possibly of entire branches, can be resolved via the provided initial mappings, from which it then builds up the complete query.

If any particular source or feature cannot be resolved using the initial mappings, the parser raises dsl.UnprovisionedError to indicate that the given data source is unavailable.

Parameters:
sources: Mapping[dsl.Source, parser.Source]

Explicit mapping of generic DSL sources (typically dsl.Table) to their native representations.

features: Mapping[dsl.Feature, parser.Feature]

Explicit mapping of generic DSL features (typically dsl.Column) to their native representations.
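
The combination of a visitor, a stack, and the initial mappings can be pictured with the following self-contained sketch. It is not the parser.Visitor API: the node classes, `SqlVisitor`, and the SQL string output are hypothetical simplifications, and the local `UnprovisionedError` merely stands in for the ForML exception of the same name.

```python
from dataclasses import dataclass


class UnprovisionedError(LookupError):
    """Stand-in for the error raised when a leaf cannot be resolved."""


@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class Column:
    table: Table
    name: str


@dataclass(frozen=True)
class Greater:
    left: object
    right: object


@dataclass(frozen=True)
class Query:
    source: Table
    selection: tuple
    prefilter: Greater


class SqlVisitor:
    """Hypothetical visitor assembling SQL fragments on a stack."""

    def __init__(self, sources: dict, features: dict):
        self._sources = sources
        self._features = features
        self._stack = []

    def visit(self, node) -> None:
        if isinstance(node, Table):  # leaf resolved via the initial mapping
            self._stack.append(self._resolve(self._sources, node))
        elif isinstance(node, Column):  # leaf resolved via the initial mapping
            self._stack.append(self._resolve(self._features, node))
        elif isinstance(node, Greater):
            self.visit(node.left)
            self.visit(node.right)
            right, left = self._stack.pop(), self._stack.pop()
            self._stack.append(f'{left} > {right}')
        elif isinstance(node, Query):
            for column in node.selection:
                self.visit(column)
            columns = [self._stack.pop() for _ in node.selection][::-1]
            self.visit(node.source)
            self.visit(node.prefilter)
            condition, source = self._stack.pop(), self._stack.pop()
            self._stack.append(f"SELECT {', '.join(columns)} FROM {source} WHERE {condition}")
        else:  # plain Python literal
            self._stack.append(repr(node))

    def fetch(self) -> str:
        """Pop the assembled native representation."""
        return self._stack.pop()

    @staticmethod
    def _resolve(mapping: dict, key):
        try:
            return mapping[key]
        except KeyError as err:
            raise UnprovisionedError(f'Unable to resolve {key}') from err


student = Table('student')
name, score = Column(student, 'name'), Column(student, 'score')
visitor = SqlVisitor({student: 'edu.student'}, {name: 'surname', score: 'score'})
visitor.visit(Query(student, (name, score), Greater(score, 80)))
sql = visitor.fetch()
```

Here `sql` holds the statement built from the mapped names `edu.student`, `surname` and `score`; visiting a table or column that is missing from the mappings raises the stand-in `UnprovisionedError`.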

Exceptions

exception forml.io.dsl.UnprovisionedError[source]

Bases: MissingError

Source or Feature resolving exception.

Raised by DSL parsers when the given source or feature (typically dsl.Table or dsl.Column) can’t be resolved using the available data sources.

exception forml.io.dsl.UnsupportedError[source]

Bases: MissingError

Indicates a DSL operation unsupported by the given parser.

References

For reference, several existing Parser implementations can be found under the forml.provider.feed.reader package:

Alchemy

Frame DSL parser producing SQLAlchemy select expression.