Internal Design¶
The DSL uses a comprehensive class hierarchy to implement the desired API. Although the internal design is irrelevant from the practical usability standpoint, it is essential for implementing additional parsers.
Model¶
The following class diagram outlines the API model:
classDiagram
Source <|-- Queryable
Source <|-- Statement
Statement <|-- Set
Statement <|-- Query
Queryable <|-- Query
Queryable <|-- Origin
Origin <|-- Join
Origin <|-- Reference
Origin <|-- Table
class Source {
<<abstract>>
+Schema schema
+list[Feature] features
+reference() Reference
+union() Set
+intersection() Set
+difference() Set
}
class Queryable {
<<abstract>>
+select() Query
+where() Query
+having() Query
+groupby() Query
+orderby() Query
+limit() Query
}
class Statement {
<<abstract>>
}
class Origin {
<<abstract>>
+inner_join() Join
+left_join() Join
+right_join() Join
+full_join() Join
+cross_join() Join
}
class Join {
+Origin left
+Origin right
+Expression condition
+Kind kind
}
class Set {
+Statement left
+Statement right
+Kind kind
}
class Reference {
+Source instance
+str name
}
class Query {
+Source source
+list[Feature] selection
+Expression prefilter
+list[Operable] grouping
+Expression postfilter
+list[Operable] ordering
+Rows rows
}
Feature <|-- Operable
Feature <|-- Aliased
Operable <|-- Literal
Operable <|-- Element
Element <|-- Column
Operable <|-- Expression
Expression <|-- Predicate
Expression <|-- Window
class Feature {
<<abstract>>
+Any kind
+alias() Aliased
}
class Operable {
<<abstract>>
+eq() Expression
+ne() Expression
+lt() Expression
+le() Expression
+gt() Expression
+ge() Expression
+and() Expression
+or() Expression
+not() Expression
+add() Expression
+sub() Expression
+mul() Expression
+div() Expression
+mod() Expression
}
class Expression {
<<abstract>>
}
class Predicate {
<<abstract>>
}
class Aliased {
+str name
+Operable operable
}
class Element {
+str name
+Origin origin
}
class Column {
+Table origin
}
class Literal {
+Any value
}
class Window {
+Operable partition
}
Base Abstractions¶
The hierarchy starts with the following two abstractions:
- class forml.io.dsl.Source(*args)[source]¶
Base class of the tabular data frame sources.
A Source is anything that can be used to obtain tabular data FROM. It is a logical collection of dsl.Feature instances represented by its schema.
- class Schema(name: str, bases: tuple[type], namespace: dict[str, Any])[source]¶
Meta-class for schema types construction.
It guarantees consistent hashing and comparability for equality of the produced schema classes.
Attention
This meta-class is used internally; for the schema frontend API, see dsl.Schema.
- property schema : dsl.Source.Schema¶
Schema type representing this source.
- Returns:
Schema type.
- property features : Sequence[dsl.Feature]¶
List of features logically contained in or potentially produced by this Source.
- Returns:
Sequence of contained features.
- reference(name: str | None = None) dsl.Reference[source]¶
Get an independent reference to this Source (e.g. for self-join conditions).
- Parameters:
- Returns:
New reference to this Source.
Examples
>>> manager = staff.Employee.reference('manager')
>>> subs = (
...     manager.inner_join(staff.Employee, staff.Employee.manager == manager.id)
...     .select(manager.name, function.Count(staff.Employee.id).alias('subs'))
...     .groupby(manager.id)
... )
- union(other: dsl.Source) dsl.Set[source]¶
Create a new Source as a set union of this and the other Source.
- Parameters:
- other: dsl.Source¶
Source to union with.
- Returns:
Set instance.
Examples
>>> barbaz = (
...     foo.Bar.select(foo.Bar.X, foo.Bar.Y)
...     .union(foo.Baz.select(foo.Baz.X, foo.Baz.Y))
... )
- intersection(other: dsl.Source) dsl.Set[source]¶
Create a new Source as a set intersection of this and the other Source.
- Parameters:
- other: dsl.Source¶
Source to intersect with.
- Returns:
Set instance.
Examples
>>> barbaz = (
...     foo.Bar.select(foo.Bar.X, foo.Bar.Y)
...     .intersection(foo.Baz.select(foo.Baz.X, foo.Baz.Y))
... )
- difference(other: dsl.Source) dsl.Set[source]¶
Create a new Source as a set difference of this and the other Source.
- Parameters:
- other: dsl.Source¶
Source to difference with.
- Returns:
Set instance.
Examples
>>> barbaz = (
...     foo.Bar.select(foo.Bar.X, foo.Bar.Y)
...     .difference(foo.Baz.select(foo.Baz.X, foo.Baz.Y))
... )
- class forml.io.dsl.Feature(*args)[source]¶
Base class of the individual columnar data series features.
A Feature is anything that can be used as a handle for independent columnar data. It is a homogeneous series of data of the same kind.
Notable Interfaces¶
- class forml.io.dsl.Queryable(*args)[source]¶
Bases: Source
Base class for any Source that can be queried directly.
- select(*features: dsl.Feature) dsl.Query[source]¶
Specify the output features to be provided (projection).
Repeated calls to .select replace the earlier selection.
- Parameters:
- *features: dsl.Feature¶
Sequence of features.
- Returns:
Query instance.
Examples
>>> barxy = foo.Bar.select(foo.Bar.X, foo.Bar.Y)
- where(condition: dsl.Predicate) dsl.Query[source]¶
Add a row-filtering condition that’s evaluated before any aggregations.
Repeated calls to .where combine all the conditions (logical AND).
- Parameters:
- condition: dsl.Predicate¶
Boolean feature expression.
- Returns:
Query instance.
Examples
>>> barx10 = foo.Bar.where(foo.Bar.X == 10)
- having(condition: dsl.Predicate) dsl.Query[source]¶
Add a row-filtering condition that’s applied to the evaluated aggregations.
Repeated calls to .having combine all the conditions (logical AND).
- Parameters:
- condition: dsl.Predicate¶
Boolean feature expression.
- Returns:
Query instance.
Examples
>>> bargy10 = foo.Bar.groupby(foo.Bar.X).having(function.Count(foo.Bar.Y) == 10)
- groupby(*features: dsl.Operable) dsl.Query[source]¶
Aggregation grouping specifiers.
Repeated calls to .groupby replace the earlier grouping.
- Parameters:
- *features: dsl.Operable¶
Sequence of aggregation features.
- Returns:
Query instance.
Examples
>>> bargbx = foo.Bar.groupby(foo.Bar.X).select(foo.Bar.X, function.Count(foo.Bar.Y))
- orderby(*terms: dsl.Ordering.Term) dsl.Query[source]¶
Ordering specifiers.
Default direction is ascending.
Repeated calls to .orderby replace the earlier ordering.
- Parameters:
- *terms: dsl.Ordering.Term¶
Sequence of feature and direction tuples.
- Returns:
Query instance.
Examples
>>> barbyx = foo.Bar.orderby(foo.Bar.X)
>>> barbyxd = foo.Bar.orderby(foo.Bar.X, 'desc')
>>> barbxy = foo.Bar.orderby(foo.Bar.X, foo.Bar.Y)
>>> barbxdy = foo.Bar.orderby(
...     foo.Bar.X, dsl.Ordering.Direction.DESCENDING, foo.Bar.Y, 'asc'
... )
>>> barbydxd = foo.Bar.orderby(
...     (foo.Bar.X, 'desc'),
...     (foo.Bar.Y, dsl.Ordering.Direction.DESCENDING),
... )
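The replace-versus-combine behaviour of repeated calls documented for the Queryable methods can be illustrated with a minimal, self-contained sketch. It does not use ForML: the SketchQuery class, its fields, and the string-based conditions are hypothetical stand-ins chosen only to show how an immutable builder might replace a selection while AND-combining filters.

```python
from __future__ import annotations

import dataclasses


@dataclasses.dataclass(frozen=True)
class SketchQuery:
    """Hypothetical immutable query container (not the ForML implementation)."""

    selection: tuple[str, ...] = ()
    prefilter: str | None = None
    grouping: tuple[str, ...] = ()

    def select(self, *features: str) -> SketchQuery:
        # Repeated calls replace the earlier selection.
        return dataclasses.replace(self, selection=features)

    def where(self, condition: str) -> SketchQuery:
        # Repeated calls are combined using logical AND.
        if self.prefilter is None:
            combined = condition
        else:
            combined = f'({self.prefilter}) AND ({condition})'
        return dataclasses.replace(self, prefilter=combined)

    def groupby(self, *features: str) -> SketchQuery:
        # Repeated calls replace the earlier grouping.
        return dataclasses.replace(self, grouping=features)


query = SketchQuery().select('X').where('X > 1').select('X', 'Y').where('Y < 5')
print(query.selection)  # ('X', 'Y')
print(query.prefilter)  # (X > 1) AND (Y < 5)
```

Every method returns a new instance, so intermediate queries stay reusable. Whether the real dsl.Query behaves the same way internally is an assumption of this sketch.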
- class forml.io.dsl.Origin(*args)[source]¶
Bases: Queryable
Origin is a queryable Source with some handle.
Its features are represented using dsl.Element.
- inner_join(other: dsl.Origin, condition: dsl.Predicate) dsl.Join[source]¶
Construct an inner join with the other origin using the provided condition.
- Parameters:
- other: dsl.Origin¶
Source to join with.
- condition: dsl.Predicate¶
Feature expression as the join condition.
- Returns:
Join instance.
Examples
>>> barbaz = foo.Bar.inner_join(foo.Baz, foo.Bar.baz == foo.Baz.id)
- left_join(other: dsl.Origin, condition: dsl.Predicate) dsl.Join[source]¶
Construct a left join with the other origin using the provided condition.
- Parameters:
- other: dsl.Origin¶
Source to join with.
- condition: dsl.Predicate¶
Feature expression as the join condition.
- Returns:
Join instance.
Examples
>>> barbaz = foo.Bar.left_join(foo.Baz, foo.Bar.baz == foo.Baz.id)
- right_join(other: dsl.Origin, condition: dsl.Predicate) dsl.Join[source]¶
Construct a right join with the other origin using the provided condition.
- Parameters:
- other: dsl.Origin¶
Source to join with.
- condition: dsl.Predicate¶
Feature expression as the join condition.
- Returns:
Join instance.
Examples
>>> barbaz = foo.Bar.right_join(foo.Baz, foo.Bar.baz == foo.Baz.id)
- full_join(other: dsl.Origin, condition: dsl.Predicate) dsl.Join[source]¶
Construct a full join with the other origin using the provided condition.
- Parameters:
- other: dsl.Origin¶
Source to join with.
- condition: dsl.Predicate¶
Feature expression as the join condition.
- Returns:
Join instance.
Examples
>>> barbaz = foo.Bar.full_join(foo.Baz, foo.Bar.baz == foo.Baz.id)
- cross_join(other: dsl.Origin) dsl.Join[source]¶
Construct a cross join with the other origin.
- Parameters:
- other: dsl.Origin¶
Source to join with.
- Returns:
Join instance.
Examples
>>> barbaz = foo.Bar.cross_join(foo.Baz)
- class forml.io.dsl.Statement(*args)[source]¶
Bases: Source
Base class for complete statements.
Complete statements are:
- dsl.Query
- dsl.Set
- class forml.io.dsl.Operable(*args)[source]¶
Bases: Feature
Base class for features that can be used in expressions, conditions, grouping, and/or ordering definitions.
In principle, any non-aliased feature is Operable.
Operable feature instances overload the following native Python operators to provide convenient DSL semantics:
Type            Syntax
Comparison      ==, !=, <, <=, >, >=
Logical         &, |, ~
Arithmetical    +, -, *, /, %
See also
Additional details are available under the DSL functions and operators.
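As a rough illustration of the operator overloading described above, the following self-contained sketch builds a small expression tree from native Python operators. All names (SketchOperable, SketchColumn, SketchLiteral, SketchExpression, wrap, render) are hypothetical and only a subset of the operators is covered; it is not the ForML implementation.

```python
from __future__ import annotations

import dataclasses
import typing


class SketchOperable:
    """Hypothetical operand type; each operator returns a new expression node."""

    def _binary(self, symbol: str, other: typing.Any) -> SketchExpression:
        return SketchExpression(symbol, (self, wrap(other)))

    # Comparison operators.
    def __eq__(self, other):  # type: ignore[override]
        return self._binary('==', other)

    def __gt__(self, other):
        return self._binary('>', other)

    # Logical operators use the bitwise syntax because and/or/not cannot be overloaded.
    def __and__(self, other):
        return self._binary('&', other)

    def __invert__(self):
        return SketchExpression('~', (self,))

    # Arithmetic operators.
    def __add__(self, other):
        return self._binary('+', other)


# eq=False keeps the overloaded __eq__ instead of the dataclass-generated one.
@dataclasses.dataclass(frozen=True, eq=False)
class SketchColumn(SketchOperable):
    name: str


@dataclasses.dataclass(frozen=True, eq=False)
class SketchLiteral(SketchOperable):
    value: typing.Any


@dataclasses.dataclass(frozen=True, eq=False)
class SketchExpression(SketchOperable):
    symbol: str
    operands: tuple[SketchOperable, ...]


def wrap(value: typing.Any) -> SketchOperable:
    """Turn plain Python values into literal nodes."""
    return value if isinstance(value, SketchOperable) else SketchLiteral(value)


def render(node: SketchOperable) -> str:
    """Render the expression tree as an infix string for inspection."""
    if isinstance(node, SketchColumn):
        return node.name
    if isinstance(node, SketchLiteral):
        return repr(node.value)
    assert isinstance(node, SketchExpression)
    if len(node.operands) == 1:
        return f'{node.symbol}{render(node.operands[0])}'
    left, right = node.operands
    return f'({render(left)} {node.symbol} {render(right)})'


x, y = SketchColumn('X'), SketchColumn('Y')
condition = ((x + 1) > y) & ~(y == 10)
print(render(condition))  # (((X + 1) > Y) & ~(Y == 10))
```

Note that overloading `==` this way means the comparison yields an expression node rather than a boolean, which is why the sketch avoids comparing nodes for equality in its own logic.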
- class forml.io.dsl.Element(source: dsl.Origin, name: str)[source]¶
Bases: Operable
Name-referenced feature of a particular origin (dsl.Table or dsl.Reference).
- class forml.io.dsl.Predicate[source]¶
Mixin for features representing logical or comparison operations.
Specific predicate instances are produced from the native dsl.Operable operators.
Notable Final Types¶
- class forml.io.dsl.Table(name: str, bases: tuple[type], namespace: dict[str, Any])[source]¶
- class forml.io.dsl.Table(schema: dsl.Source.Schema)
Bases: Origin
Table based Source with an explicit schema.
Attention
The primary way of creating Table instances is by inheriting from dsl.Schema, which uses this type as a meta-class.
- class forml.io.dsl.Query(source: dsl.Source, selection: Iterable[dsl.Feature] | None = None, prefilter: dsl.Predicate | None = None, grouping: Iterable[dsl.Operable] | None = None, postfilter: dsl.Predicate | None = None, ordering: Sequence[dsl.Ordering.Term] | None = None, rows: dsl.Rows | None = None)[source]¶
Query based Source.
Container for holding all the parameters supplied via the dsl.Queryable interface.
Attention
Instances are expected to be created internally via the dsl.Queryable interface methods.
- class forml.io.dsl.Set(left: dsl.Source, right: dsl.Source, kind: dsl.Set.Kind)[source]¶
Bases: Statement
Source made of two set-combined sub-statements with the same schema.
Attention
Instances are expected to be created internally via:
- dsl.Source.union
- dsl.Source.intersection
- dsl.Source.difference
- class forml.io.dsl.Join(left: dsl.Origin, right: dsl.Origin, kind: dsl.Join.Kind | str, condition: dsl.Predicate | None = None)[source]¶
Bases: Origin
Source made of two join-combined sub-sources.
Attention
Instances are expected to be created internally via:
- dsl.Origin.inner_join
- dsl.Origin.left_join
- dsl.Origin.right_join
- dsl.Origin.full_join
- dsl.Origin.cross_join
- class Kind(value)[source]¶
Bases: Enum
Join type enum.
- INNER = 'inner'¶
Inner join type (default if condition is provided).
- LEFT = 'left'¶
Left outer join type.
- RIGHT = 'right'¶
Right outer join type.
- FULL = 'full'¶
Full join type.
- CROSS = 'cross'¶
Cross join type (default if condition is not provided).
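The defaulting rule noted for INNER and CROSS can be sketched as follows. This is a self-contained illustration under stated assumptions: SketchKind mirrors the listed values, while resolve_kind and the idea that an omitted kind is passed as None are hypothetical and not taken from the ForML source.

```python
from __future__ import annotations

import enum


class SketchKind(enum.Enum):
    """Hypothetical mirror of the join kinds listed above."""

    INNER = 'inner'
    LEFT = 'left'
    RIGHT = 'right'
    FULL = 'full'
    CROSS = 'cross'


def resolve_kind(kind: SketchKind | str | None, condition: object | None) -> SketchKind:
    """Pick the join kind, falling back to the documented defaults."""
    if kind is None:
        # No explicit kind: inner join with a condition, cross join without one.
        return SketchKind.CROSS if condition is None else SketchKind.INNER
    # Enum lookup by value accepts both members and their string values.
    return SketchKind(kind)


print(resolve_kind(None, None))          # SketchKind.CROSS
print(resolve_kind(None, 'a.id = b.id'))  # SketchKind.INNER
print(resolve_kind('left', 'a.id = b.id'))  # SketchKind.LEFT
```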
- class forml.io.dsl.Rows(count: int, offset: int = 0)[source]¶
Row limit spec container.
Attention
Instances are expected to be created internally via dsl.Queryable.limit.
- class forml.io.dsl.Reference(instance: dsl.Source, name: str | None = None)[source]¶
Bases: Origin
Wrapper around any Source associating it with a (possibly random) name.
Attention
Instances are expected to be created internally via dsl.Source.reference.
- class forml.io.dsl.Column(table: dsl.Table, name: str)[source]¶
Bases: Element
Special type of element representing a table column.
- class forml.io.dsl.Aliased(feature: dsl.Feature, alias: str)[source]¶
Bases: Feature
Representation of a feature with an explicit name alias.
Attention
Instances are expected to be created internally via dsl.Feature.alias.
- class forml.io.dsl.Ordering(feature: dsl.Operable, direction: dsl.Ordering.Direction | str | None = None)[source]¶
Container for holding the ordering specification.
Attention
Instances are expected to be created internally by dsl.Queryable.orderby.
- Term¶
Type alias for accepted ordering specifiers.
alias of Union[dsl.Operable, dsl.Ordering.Direction, str, tuple[dsl.Operable, typing.Union[dsl.Ordering.Direction, str]]]
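Because the Term alias admits bare features, bare directions, and (feature, direction) tuples, an implementation has to normalize the mixed arguments into explicit pairs. The sketch below shows one possible way to do that. It is self-contained and hypothetical: SketchDirection, its 'asc'/'desc' string values, SketchColumn, and normalize are assumptions made for illustration, not the ForML code.

```python
from __future__ import annotations

import dataclasses
import enum


class SketchDirection(enum.Enum):
    """Hypothetical direction enum; the string values are an assumption."""

    ASCENDING = 'asc'
    DESCENDING = 'desc'


@dataclasses.dataclass(frozen=True)
class SketchColumn:
    """Hypothetical stand-in for an operable feature."""

    name: str


def normalize(*terms) -> list[tuple[SketchColumn, SketchDirection]]:
    """Convert flat or tuple-based terms into (feature, direction) pairs."""
    result: list[tuple[SketchColumn, SketchDirection]] = []
    for term in terms:
        if isinstance(term, tuple):
            # Explicit (feature, direction) pair.
            feature, direction = term
            result.append((feature, SketchDirection(direction)))
        elif isinstance(term, SketchColumn):
            # A bare feature defaults to the ascending direction.
            result.append((term, SketchDirection.ASCENDING))
        else:
            # A bare direction modifies the immediately preceding feature.
            if not result:
                raise ValueError('Direction without a preceding feature')
            feature, _ = result[-1]
            result[-1] = (feature, SketchDirection(term))
    return result


x, y = SketchColumn('X'), SketchColumn('Y')
pairs = normalize(x, SketchDirection.DESCENDING, y, 'asc')
print([(feature.name, direction.value) for feature, direction in pairs])
# [('X', 'desc'), ('Y', 'asc')]
```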
- class forml.io.dsl.Window(function: dsl.Window.Function, partition: Sequence[dsl.Feature], ordering: Sequence[dsl.Ordering.Term] | None = None, frame: Optional = None)[source]¶
Window function wrapper feature representation.
Attention
Instances are expected to be created internally via dsl.Window.Function.over().
See also
Supported window functions are available in the window module.
Todo
Support for window expressions is experimental and unlikely to be supported by the existing parsers.
Exceptions¶
- exception forml.io.dsl.GrammarError[source]¶
Bases: InvalidError
Indicating syntactical error in the given DSL query statement.
Parser¶
Since the constructed DSL query statement is a generic descriptor with no means
of direct execution, the ETL process depends on a particular io.Feed.Reader implementation to parse that query into a set of instructions
corresponding to the selected feed and its target storage layer.
Generic Interface¶
- forml.io.dsl.parser.Source = ~Source¶
Generic storage-native representation of dsl.Source.
- forml.io.dsl.parser.Feature = ~Feature¶
Generic storage-native representation of dsl.Feature.
- class forml.io.dsl.parser.Visitor(sources: Mapping[dsl.Source, parser.Source], features: Mapping[dsl.Feature, parser.Feature])[source]¶
Abstract base class for DSL query statement parser implementations.
In this context, parsing essentially means conversion between the generic DSL-based instance of the particular query and its native representation matching a selected target storage layer.
Conceptually, the parser is implemented as a combination of a visitor traversing the query statement structure and a push-down automaton assembling the generated instructions in their storage-native representation.
The parser assumes that the native representation of all the leaves of the query statement tree (the dsl.Table and dsl.Column instances), or possibly of entire branches, can be resolved via the provided initial mappings, from which the parser builds up the complete query.
Upon failing to resolve any particular source/feature using the initial mappings, the parser raises the dsl.UnprovisionedError indicating unavailability of the given data source.
- Parameters:
- sources: Mapping[dsl.Source, parser.Source]¶
Explicit mapping of generic DSL sources (typically dsl.Table) to their native representations.
- features: Mapping[dsl.Feature, parser.Feature]¶
Explicit mapping of generic DSL features (typically dsl.Column) to their native representations.
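The combination of a visitor and a push-down automaton described above can be approximated with a short, self-contained sketch. It does not use the real parser.Visitor interface: every name here (SketchTable, SketchColumn, SketchQuery, SketchSqlParser, SketchUnprovisionedError, and the visit_* and fetch methods) is hypothetical, and a plain SQL string stands in for the storage-native representation.

```python
from __future__ import annotations

import dataclasses
from collections.abc import Mapping


class SketchUnprovisionedError(LookupError):
    """Hypothetical error raised when a leaf has no native mapping."""


@dataclasses.dataclass(frozen=True)
class SketchTable:
    name: str


@dataclasses.dataclass(frozen=True)
class SketchColumn:
    table: SketchTable
    name: str


@dataclasses.dataclass(frozen=True)
class SketchQuery:
    source: SketchTable
    selection: tuple[SketchColumn, ...]


class SketchSqlParser:
    """Hypothetical parser producing a SQL string as its native representation."""

    def __init__(self, sources: Mapping[SketchTable, str], features: Mapping[SketchColumn, str]):
        self._sources = sources
        self._features = features
        self._stack: list[str] = []  # the push-down part: generated fragments

    def visit_table(self, table: SketchTable) -> None:
        try:
            self._stack.append(self._sources[table])
        except KeyError as err:
            raise SketchUnprovisionedError(f'Unknown source: {table.name}') from err

    def visit_column(self, column: SketchColumn) -> None:
        try:
            self._stack.append(self._features[column])
        except KeyError as err:
            raise SketchUnprovisionedError(f'Unknown feature: {column.name}') from err

    def visit_query(self, query: SketchQuery) -> None:
        # Visit the leaves first so their native forms end up on the stack.
        self.visit_table(query.source)
        for column in query.selection:
            self.visit_column(column)
        # Pop them (in reverse order) and push the assembled composite fragment.
        columns = [self._stack.pop() for _ in query.selection][::-1]
        source = self._stack.pop()
        self._stack.append(f'SELECT {", ".join(columns)} FROM {source}')

    def fetch(self) -> str:
        """Return the single remaining fragment, i.e. the complete statement."""
        (result,) = self._stack
        self._stack.clear()
        return result


bar = SketchTable('Bar')
x, y = SketchColumn(bar, 'X'), SketchColumn(bar, 'Y')
parser = SketchSqlParser({bar: 'foo.bar'}, {x: 'bar.x', y: 'bar.y'})
parser.visit_query(SketchQuery(bar, (x, y)))
statement = parser.fetch()
print(statement)  # SELECT bar.x, bar.y FROM foo.bar
```

The initial mappings resolve only the leaves; everything above them is assembled from the fragments left on the stack, and a leaf missing from the mappings surfaces as the unprovisioned error.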
Exceptions¶
- exception forml.io.dsl.UnprovisionedError[source]¶
Bases: MissingError
Source or Feature resolving exception.
Raised by DSL parsers when the given source or feature (typically dsl.Table or dsl.Column) can’t be resolved using the available data sources.
- exception forml.io.dsl.UnsupportedError[source]¶
Bases: MissingError
Indicating DSL operation unsupported by the given parser.
References¶
For reference, several existing Parser implementations can be found under the
forml.provider.feed.reader package:
|
Frame DSL parser producing SQLAlchemy select expression. |