Internal Design¶
The DSL uses a comprehensive class hierarchy to implement the desired API. Although the internal design is irrelevant from the practical usability standpoint, it is essential for implementing additional parsers.
Model¶
The following class diagram outlines the API model:
classDiagram
Source <|-- Queryable
Source <|-- Statement
Statement <|-- Set
Statement <|-- Query
Queryable <|-- Query
Queryable <|-- Origin
Origin <|-- Join
Origin <|-- Reference
Origin <|-- Table
class Source {
<<abstract>>
+Schema schema
+list[Feature] features
+reference() Reference
+union() Set
+intersection() Set
+difference() Set
}
class Queryable {
<<abstract>>
+select() Query
+where() Query
+having() Query
+groupby() Query
+orderby() Query
+limit() Query
}
class Statement {
<<abstract>>
}
class Origin {
<<abstract>>
+inner_join() Join
+left_join() Join
+right_join() Join
+full_join() Join
+cross_join() Join
}
class Join {
+Origin left
+Origin right
+Expression condition
+Kind kind
}
class Set {
+Statement left
+Statement right
+Kind kind
}
class Reference {
+Source instance
+str name
}
class Query {
+Source source
+list[Feature] selection
+Expression prefilter
+list[Operable] grouping
+Expression postfilter
+list[Operable] ordering
+Rows rows
}
Feature <|-- Operable
Feature <|-- Aliased
Operable <|-- Literal
Operable <|-- Element
Element <|-- Column
Operable <|-- Expression
Expression <|-- Predicate
Expression <|-- Window
class Feature {
<<abstract>>
+Any kind
+alias() Aliased
}
class Operable {
<<abstract>>
+eq() Expression
+ne() Expression
+lt() Expression
+le() Expression
+gt() Expression
+ge() Expression
+and() Expression
+or() Expression
+not() Expression
+add() Expression
+sub() Expression
+mul() Expression
+div() Expression
+mod() Expression
}
class Expression {
<<abstract>>
}
class Predicate {
<<abstract>>
}
class Aliased {
+str name
+Operable operable
}
class Element {
+str name
+Origin origin
}
class Column {
+Table origin
}
class Literal {
+Any value
}
class Window {
+Operable partition
}
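The source side of the diagram can be approximated in a few lines of plain Python. The sketch below is a toy model only, not the actual ForML implementation: the constructor signatures, the string-based features, and the `kind` strings are assumptions made for illustration. It mainly shows that a `Query` is both a `Queryable` and a `Statement`, while a `Set` is a `Statement` only.

```python
import abc


class Source(abc.ABC):
    """Anything tabular data can be obtained from."""

    def union(self, other: 'Source') -> 'Set':
        # Hypothetical kind marker; the real DSL uses an enum.
        return Set(self, other, 'union')


class Statement(Source):
    """Marker base for complete statements."""


class Queryable(Source):
    """Source that can be queried directly."""

    def select(self, *features: str) -> 'Query':
        return Query(self, tuple(features))


class Origin(Queryable):
    """Queryable source with some handle."""


class Table(Origin):
    def __init__(self, name: str):
        self.name = name


class Query(Queryable, Statement):
    def __init__(self, source: Source, selection: tuple = ()):
        self.source = source
        self.selection = selection

    def select(self, *features: str) -> 'Query':
        # Re-selecting on a query replaces the earlier selection.
        return Query(self.source, tuple(features))


class Set(Statement):
    def __init__(self, left: Source, right: Source, kind: str):
        self.left, self.right, self.kind = left, right, kind


bar, baz = Table('bar'), Table('baz')
query = bar.select('x', 'y')
combined = query.union(baz.select('x', 'y'))
```

Here `query` can be refined further with additional `Queryable` calls, whereas `combined` would first have to be wrapped in an origin (such as a reference) before it could be queried again.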
Base Abstractions¶
The hierarchy starts with the following two abstractions:
- class forml.io.dsl.Source(*args)[source]¶
Base class of the tabular data frame sources.
A Source is anything that can be used to obtain tabular data FROM. It is a logical collection of dsl.Feature instances represented by its schema.
- class Schema(name: str, bases: tuple[type], namespace: dict[str, Any])[source]¶
Meta-class for schema types construction.
It guarantees consistent hashing and comparability for equality of the produced schema classes.
Attention
This meta-class is used internally; for the schema frontend API, see dsl.Schema.
- property schema : dsl.Source.Schema¶
Schema type representing this source.
- Returns:
Schema type.
- property features : Sequence[dsl.Feature]¶
List of features logically contained in or potentially produced by this Source.
- Returns:
Sequence of contained features.
- reference(name: str | None = None) dsl.Reference [source]¶
Get an independent reference to this Source (e.g. for self-join conditions).
- Parameters:
- Returns:
New reference to this Source.
Examples
>>> manager = staff.Employee.reference('manager')
>>> subs = (
...     manager.join(staff.Employee, staff.Employee.manager == manager.id)
...     .select(manager.name, function.Count(staff.Employee.id).alias('subs'))
...     .groupby(manager.id)
... )
- union(other: dsl.Source) dsl.Set [source]¶
Create a new Source as a set union of this and the other Source.
- Parameters:
- other: dsl.Source¶
Source to union with.
- Returns:
Set instance.
Examples
>>> barbaz = (
...     foo.Bar.select(foo.Bar.X, foo.Bar.Y)
...     .union(foo.Baz.select(foo.Baz.X, foo.Baz.Y))
... )
- intersection(other: dsl.Source) dsl.Set [source]¶
Create a new Source as a set intersection of this and the other Source.
- Parameters:
- other: dsl.Source¶
Source to intersect with.
- Returns:
Set instance.
Examples
>>> barbaz = (
...     foo.Bar.select(foo.Bar.X, foo.Bar.Y)
...     .intersection(foo.Baz.select(foo.Baz.X, foo.Baz.Y))
... )
- difference(other: dsl.Source) dsl.Set [source]¶
Create a new Source as a set difference of this and the other Source.
- Parameters:
- other: dsl.Source¶
Source to subtract from this one.
- Returns:
Set instance.
Examples
>>> barbaz = (
...     foo.Bar.select(foo.Bar.X, foo.Bar.Y)
...     .difference(foo.Baz.select(foo.Baz.X, foo.Baz.Y))
... )
- class forml.io.dsl.Feature(*args)[source]¶
Base class of the individual columnar data series features.
A Feature is anything that can be used as a handle for independent columnar data. It is a homogeneous series of data of the same kind.
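The hashing and equality guarantee of the Source.Schema meta-class described earlier in this section can be illustrated with a small standalone meta-class. This is only a sketch of the general technique, assuming that two schemas count as equal when they declare the same field names and values; the name `SchemaMeta`, the `_fields` attribute, and the string-typed fields are hypothetical and do not reflect how ForML implements it.

```python
class SchemaMeta(type):
    """Toy meta-class making the produced classes comparable by content."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Capture the public fields in a canonical (sorted) order.
        cls._fields = tuple(
            sorted((key, value) for key, value in namespace.items() if not key.startswith('_'))
        )
        return cls

    def __eq__(cls, other):
        return isinstance(other, SchemaMeta) and cls._fields == other._fields

    def __hash__(cls):
        return hash(cls._fields)


class Person(metaclass=SchemaMeta):
    name = 'str'
    age = 'int'


def make_duplicate():
    """Build a distinct class object declaring the same fields in a different order."""

    class Person(metaclass=SchemaMeta):
        age = 'int'
        name = 'str'

    return Person


Duplicate = make_duplicate()
```

With such a meta-class, `Person` and `Duplicate` are different objects that still compare equal and collapse into a single entry when used as dictionary keys or set members.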
Notable Interfaces¶
- class forml.io.dsl.Queryable(*args)[source]¶
Bases:
Source
Base class for any Source that can be queried directly.
- select(*features: dsl.Feature) dsl.Query [source]¶
Specify the output features to be provided (projection).
Repeated calls to .select replace the earlier selection.
- Parameters:
- *features: dsl.Feature¶
Sequence of features.
- Returns:
Query instance.
Examples
>>> barxy = foo.Bar.select(foo.Bar.X, foo.Bar.Y)
- where(condition: dsl.Predicate) dsl.Query [source]¶
Add a row-filtering condition that’s evaluated before any aggregations.
Repeated calls to .where combine all the conditions (logical AND).
- Parameters:
- condition: dsl.Predicate¶
Boolean feature expression.
- Returns:
Query instance.
Examples
>>> barx10 = foo.Bar.where(foo.Bar.X == 10)
- having(condition: dsl.Predicate) dsl.Query [source]¶
Add a row-filtering condition that’s applied to the evaluated aggregations.
Repeated calls to .having combine all the conditions (logical AND).
- Parameters:
- condition: dsl.Predicate¶
Boolean feature expression.
- Returns:
Query instance.
Examples
>>> bargy10 = foo.Bar.groupby(foo.Bar.X).having(function.Count(foo.Bar.Y) == 10)
- groupby(*features: dsl.Operable) dsl.Query [source]¶
Aggregation grouping specifiers.
Repeated calls to .groupby replace the earlier grouping.
- Parameters:
- *features: dsl.Operable¶
Sequence of aggregation features.
- Returns:
Query instance.
Examples
>>> bargbx = foo.Bar.groupby(foo.Bar.X).select(foo.Bar.X, function.Count(foo.Bar.Y))
- orderby(*terms: dsl.Ordering.Term) dsl.Query [source]¶
Ordering specifiers.
Default direction is ascending.
Repeated calls to .orderby replace the earlier ordering.
- Parameters:
- *terms: dsl.Ordering.Term¶
Sequence of feature and direction tuples.
- Returns:
Query instance.
Examples
>>> barbyx = foo.Bar.orderby(foo.Bar.X)
>>> barbyxd = foo.Bar.orderby(foo.Bar.X, 'desc')
>>> barbxy = foo.Bar.orderby(foo.Bar.X, foo.Bar.Y)
>>> barbxdy = foo.Bar.orderby(
...     foo.Bar.X, dsl.Ordering.Direction.DESCENDING, foo.Bar.Y, 'asc'
... )
>>> barbydxd = foo.Bar.orderby(
...     (foo.Bar.X, 'desc'),
...     (foo.Bar.Y, dsl.Ordering.Direction.DESCENDING),
... )
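The variety of ordering specifiers accepted by .orderby (a bare feature, a feature followed by a direction, or explicit tuples) suggests a normalization step that turns them into uniform feature/direction pairs. The following is a minimal sketch of one way to do that, not the ForML code: the `normalize` helper, the toy `Column` class, the enum values, and the prefix matching of strings such as 'asc' and 'desc' are all assumptions.

```python
import enum
from dataclasses import dataclass


class Direction(enum.Enum):
    ASCENDING = 'ascending'
    DESCENDING = 'descending'

    @classmethod
    def parse(cls, value):
        """Accept a member or a case-insensitive prefix such as 'asc' or 'desc'."""
        if isinstance(value, cls):
            return value
        text = str(value).lower()
        for member in cls:
            if text and member.value.startswith(text):
                return member
        raise ValueError(f'Invalid direction: {value!r}')


@dataclass(frozen=True)
class Column:
    """Toy stand-in for an operable feature."""

    name: str


def normalize(*terms):
    """Turn mixed ordering specifiers into (feature, Direction) pairs."""
    result = []
    pending = False  # True while the last feature may still receive a direction
    for term in terms:
        if isinstance(term, tuple):
            feature, direction = term
            result.append((feature, Direction.parse(direction)))
            pending = False
        elif isinstance(term, (Direction, str)):
            if not pending:
                raise ValueError(f'Direction {term!r} has no preceding feature')
            result[-1] = (result[-1][0], Direction.parse(term))
            pending = False
        else:
            # A bare feature defaults to the ascending direction.
            result.append((term, Direction.ASCENDING))
            pending = True
    return result


X, Y = Column('X'), Column('Y')
flat = normalize(X, 'desc', Y)
pairs = normalize((X, 'desc'), (Y, Direction.DESCENDING))
```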
- class forml.io.dsl.Origin(*args)[source]¶
Bases:
Queryable
Origin is a queryable Source with some handle.
Its features are represented using dsl.Element.
- inner_join(other: dsl.Origin, condition: dsl.Predicate) dsl.Join [source]¶
Construct an inner join with the other origin using the provided condition.
- Parameters:
- other: dsl.Origin¶
Source to join with.
- condition: dsl.Predicate¶
Feature expression as the join condition.
- Returns:
Join instance.
Examples
>>> barbaz = foo.Bar.inner_join(foo.Baz, foo.Bar.baz == foo.Baz.id)
- left_join(other: dsl.Origin, condition: dsl.Predicate) dsl.Join [source]¶
Construct a left join with the other origin using the provided condition.
- Parameters:
- other: dsl.Origin¶
Source to join with.
- condition: dsl.Predicate¶
Feature expression as the join condition.
- Returns:
Join instance.
Examples
>>> barbaz = foo.Bar.left_join(foo.Baz, foo.Bar.baz == foo.Baz.id)
- right_join(other: dsl.Origin, condition: dsl.Predicate) dsl.Join [source]¶
Construct a right join with the other origin using the provided condition.
- Parameters:
- other: dsl.Origin¶
Source to join with.
- condition: dsl.Predicate¶
Feature expression as the join condition.
- Returns:
Join instance.
Examples
>>> barbaz = foo.Bar.right_join(foo.Baz, foo.Bar.baz == foo.Baz.id)
- full_join(other: dsl.Origin, condition: dsl.Predicate) dsl.Join [source]¶
Construct a full join with the other origin using the provided condition.
- Parameters:
- other: dsl.Origin¶
Source to join with.
- condition: dsl.Predicate¶
Feature expression as the join condition.
- Returns:
Join instance.
Examples
>>> barbaz = foo.Bar.full_join(foo.Baz, foo.Bar.baz == foo.Baz.id)
- cross_join(other: dsl.Origin) dsl.Join [source]¶
Construct a cross join with the other origin.
- Parameters:
- other: dsl.Origin¶
Source to join with.
- Returns:
Join instance.
Examples
>>> barbaz = foo.Bar.cross_join(foo.Baz)
- class forml.io.dsl.Statement(*args)[source]¶
Bases:
Source
Base class for complete statements.
Complete statements are dsl.Query and dsl.Set.
- class forml.io.dsl.Operable(*args)[source]¶
Bases:
Feature
Base class for features that can be used in expressions, conditions, grouping, and/or ordering definitions.
In principle, any non-aliased feature is Operable.
Operable feature instances overload the following native Python operators to provide convenient DSL semantics:
Type          Syntax
Comparison    ==, !=, <, <=, >, >=
Logical       &, |, ~
Arithmetical  +, -, *, /, %
See also
Additional details are available under the DSL functions and operators.
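How operator overloading can build an expression tree instead of evaluating a result is sketched below. This is an illustrative toy, not the ForML classes: the `Column`, `Literal`, and `Expression` stand-ins, the `_wrap` helper, and the string rendering are assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Any


def _wrap(value):
    """Promote plain Python values to literals so both operands are Operable."""
    return value if isinstance(value, Operable) else Literal(value)


class Operable:
    """Base whose operators return new expressions rather than computing values."""

    def _binary(self, symbol, other):
        return Expression(symbol, (self, _wrap(other)))

    def __eq__(self, other): return self._binary('==', other)
    def __ne__(self, other): return self._binary('!=', other)
    def __lt__(self, other): return self._binary('<', other)
    def __le__(self, other): return self._binary('<=', other)
    def __gt__(self, other): return self._binary('>', other)
    def __ge__(self, other): return self._binary('>=', other)
    def __and__(self, other): return self._binary('AND', other)
    def __or__(self, other): return self._binary('OR', other)
    def __invert__(self): return Expression('NOT', (self,))
    def __add__(self, other): return self._binary('+', other)
    def __sub__(self, other): return self._binary('-', other)
    def __mul__(self, other): return self._binary('*', other)
    def __truediv__(self, other): return self._binary('/', other)
    def __mod__(self, other): return self._binary('%', other)


# eq=False keeps the overloaded __eq__ instead of a generated value comparison.
@dataclass(frozen=True, eq=False)
class Column(Operable):
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True, eq=False)
class Literal(Operable):
    value: Any

    def __str__(self):
        return repr(self.value)


@dataclass(frozen=True, eq=False)
class Expression(Operable):
    symbol: str
    operands: tuple

    def __str__(self):
        if len(self.operands) == 1:
            return f'({self.symbol} {self.operands[0]})'
        left, right = self.operands
        return f'({left} {self.symbol} {right})'


condition = (Column('X') + 1 > 10) & ~(Column('Y') == 'a')
```

Note that Python gives `&` and `|` higher precedence than the comparison operators, which is why comparisons combined this way need explicit parentheses.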
- class forml.io.dsl.Element(source: dsl.Origin, name: str)[source]¶
Bases:
Operable
Name-referenced feature of a particular origin (dsl.Table or dsl.Reference).
- class forml.io.dsl.Predicate[source]¶
Mixin for features representing logical or comparison operations.
Specific predicate instances are produced from the native dsl.Operable operators.
Notable Final Types¶
- class forml.io.dsl.Table(name: str, bases: tuple[type], namespace: dict[str, Any])[source]¶
- class forml.io.dsl.Table(schema: dsl.Source.Schema)
Bases:
Origin
Table based Source with an explicit schema.
Attention
The primary way of creating Table instances is by inheriting from dsl.Schema, which uses this type as a meta-class.
- class forml.io.dsl.Query(source: dsl.Source, selection: Iterable[dsl.Feature] | None = None, prefilter: dsl.Predicate | None = None, grouping: Iterable[dsl.Operable] | None = None, postfilter: dsl.Predicate | None = None, ordering: Sequence[dsl.Ordering.Term] | None = None, rows: dsl.Rows | None = None)[source]¶
Query based Source.
Container for holding all the parameters supplied via the dsl.Queryable interface.
Attention
Instances are expected to be created internally via the dsl.Queryable interface methods.
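The documented behavior of repeated calls (.select and .groupby replace the earlier value, .where and .having combine conditions with a logical AND) fits naturally onto an immutable container. The sketch below shows the idea with a frozen dataclass; it is a simplified assumption, with plain strings standing in for sources, features, and predicates, and the `_both` helper being hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


def _both(existing: Optional[str], condition: str) -> str:
    """Combine a new condition with an existing one using a logical AND."""
    return condition if existing is None else f'({existing}) AND ({condition})'


@dataclass(frozen=True)
class Query:
    """Toy immutable container for the query parameters."""

    source: str
    selection: tuple = ()
    prefilter: Optional[str] = None
    grouping: tuple = ()
    postfilter: Optional[str] = None

    def select(self, *features: str) -> 'Query':
        return replace(self, selection=features)  # replaces the earlier selection

    def where(self, condition: str) -> 'Query':
        return replace(self, prefilter=_both(self.prefilter, condition))

    def groupby(self, *features: str) -> 'Query':
        return replace(self, grouping=features)  # replaces the earlier grouping

    def having(self, condition: str) -> 'Query':
        return replace(self, postfilter=_both(self.postfilter, condition))


base = Query('bar').select('x').where('x > 1')
refined = base.where('y < 2').select('y')
```

Because every call returns a new instance, `base` stays untouched and can be reused as a starting point for other queries.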
- class forml.io.dsl.Set(left: dsl.Source, right: dsl.Source, kind: dsl.Set.Kind)[source]¶
Bases:
Statement
Source made of two set-combined sub-statements with the same schema.
Attention
Instances are expected to be created internally via dsl.Source.union, dsl.Source.intersection, or dsl.Source.difference.
- class forml.io.dsl.Join(left: dsl.Origin, right: dsl.Origin, kind: dsl.Join.Kind | str, condition: dsl.Predicate | None = None)[source]¶
Bases:
Origin
Source made of two join-combined sub-sources.
Attention
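Instances are expected to be created internally via dsl.Origin.inner_join, dsl.Origin.left_join, dsl.Origin.right_join, dsl.Origin.full_join, or dsl.Origin.cross_join.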
- class Kind(value)[source]¶
Bases:
Enum
Join type enum.
- INNER = 'inner'¶
Inner join type (default if condition is provided).
- LEFT = 'left'¶
Left outer join type.
- RIGHT = 'right'¶
Right outer join type.
- FULL = 'full'¶
Full join type.
- CROSS = 'cross'¶
Cross join type (default if condition is not provided).
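The defaults noted for the INNER and CROSS members can be expressed as a small resolution helper. This is a sketch under assumptions: the `resolve_kind` function is hypothetical, and treating a missing kind as `None` is an illustrative choice rather than the documented constructor contract.

```python
import enum
from typing import Optional, Union


class Kind(enum.Enum):
    INNER = 'inner'
    LEFT = 'left'
    RIGHT = 'right'
    FULL = 'full'
    CROSS = 'cross'


def resolve_kind(kind: Union[Kind, str, None], condition: Optional[object]) -> Kind:
    """Pick the join kind, falling back to a default derived from the condition."""
    if kind is None:
        # Inner join if a condition is provided, cross join otherwise.
        return Kind.INNER if condition is not None else Kind.CROSS
    # Enum lookup accepts both an existing member and its string value.
    return Kind(kind)


default_with_condition = resolve_kind(None, 'bar.baz == baz.id')
default_without_condition = resolve_kind(None, None)
```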
- class forml.io.dsl.Rows(count: int, offset: int = 0)[source]¶
Row limit spec container.
Attention
Instances are expected to be created internally via dsl.Queryable.limit.
- class forml.io.dsl.Reference(instance: dsl.Source, name: str | None = None)[source]¶
Bases:
Origin
Wrapper around any Source associating it with a (possibly random) name.
Attention
Instances are expected to be created internally via dsl.Source.reference.
- class forml.io.dsl.Column(table: dsl.Table, name: str)[source]¶
Bases:
Element
A special type of element representing a table column.
- class forml.io.dsl.Aliased(feature: dsl.Feature, alias: str)[source]¶
Bases:
Feature
Representation of a feature with an explicit name alias.
Attention
Instances are expected to be created internally via dsl.Feature.alias.
- class forml.io.dsl.Ordering(feature: dsl.Operable, direction: dsl.Ordering.Direction | str | None = None)[source]¶
Container for holding the ordering specification.
Attention
Instances are expected to be created internally by dsl.Queryable.orderby.
- Term¶
Type alias for accepted ordering specifiers.
alias of Union[dsl.Operable, dsl.Ordering.Direction, str, tuple[dsl.Operable, typing.Union[dsl.Ordering.Direction, str]]]
- class forml.io.dsl.Window(function: dsl.Window.Function, partition: Sequence[dsl.Feature], ordering: Sequence[dsl.Ordering.Term] | None = None, frame: Optional = None)[source]¶
Window function wrapper feature representation.
Attention
Instances are expected to be created internally via dsl.Window.Function.over().
See also
Supported window functions are available in the window module.
Todo
Support for window expressions is experimental and unlikely to be supported by the existing parsers.
Exceptions¶
- exception forml.io.dsl.GrammarError[source]¶
Bases:
InvalidError
Indicates a syntactical error in the given DSL query statement.
Parser¶
Since the constructed DSL query statement is a generic descriptor with no means
of direct execution, the ETL process depends on a particular io.Feed.Reader
implementation to parse that query into a set of instructions
corresponding to the selected feed and its target storage layer.
Generic Interface¶
- forml.io.dsl.parser.Source = ~Source¶
Generic storage-native representation of dsl.Source.
- forml.io.dsl.parser.Feature = ~Feature¶
Generic storage-native representation of dsl.Feature.
- class forml.io.dsl.parser.Visitor(sources: Mapping[dsl.Source, parser.Source], features: Mapping[dsl.Feature, parser.Feature])[source]¶
Abstract base class for DSL query statement parser implementations.
In this context, parsing essentially means conversion between the generic DSL-based instance of the particular query and its native representation matching a selected target storage layer.
Conceptually, the parser is implemented as a combination of a visitor traversing the query statement structure and a push-down automaton assembling the generated instructions in their storage-native representation.
The parser assumes that the native representation of all the leaves of the query statement tree (the dsl.Table and dsl.Column instances), or possibly of entire branches, can be resolved via the provided initial mappings, from which the parser builds up the complete query.
Upon failing to resolve any particular source/feature using the initial mappings, the parser raises dsl.UnprovisionedError, indicating unavailability of the given data source.
- Parameters:
- sources: Mapping[dsl.Source, parser.Source]¶
Explicit mapping of generic DSL sources (typically dsl.Table) to their native representations.
- features: Mapping[dsl.Feature, parser.Feature]¶
Explicit mapping of generic DSL features (typically dsl.Column) to their native representations.
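The combination of a visitor with a push-down (stack-based) assembly and initial leaf mappings can be sketched as follows. This is a deliberately tiny model and not the parser.Visitor API: the node classes, the `accept`/`visit_*` method names, the `fetch` method, the SQL string output, and the local `UnprovisionedError` stand-in are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


class UnprovisionedError(LookupError):
    """Toy stand-in for the source/feature resolving exception."""


@dataclass(frozen=True)
class Table:
    name: str

    def accept(self, visitor): visitor.visit_table(self)


@dataclass(frozen=True)
class Column:
    table: Table
    name: str

    def accept(self, visitor): visitor.visit_column(self)


@dataclass(frozen=True)
class Equals:
    left: Column
    right: object

    def accept(self, visitor): visitor.visit_equals(self)


@dataclass(frozen=True)
class Query:
    source: Table
    selection: tuple
    prefilter: Optional[Equals] = None

    def accept(self, visitor): visitor.visit_query(self)


class SqlVisitor:
    """Traverses the statement and assembles native fragments on a stack."""

    def __init__(self, sources: Mapping[Table, str], features: Mapping[Column, str]):
        self._sources, self._features, self._stack = sources, features, []

    @staticmethod
    def _resolve(mapping, key):
        try:
            return mapping[key]
        except KeyError as err:
            raise UnprovisionedError(f'{key} not provisioned') from err

    def visit_table(self, table): self._stack.append(self._resolve(self._sources, table))

    def visit_column(self, column): self._stack.append(self._resolve(self._features, column))

    def visit_equals(self, node):
        node.left.accept(self)
        self._stack.append(f'{self._stack.pop()} = {node.right!r}')

    def visit_query(self, query):
        query.source.accept(self)
        for feature in query.selection:
            feature.accept(self)
        if query.prefilter is not None:
            query.prefilter.accept(self)
        # Pop the fragments in the reverse order of pushing them.
        where = self._stack.pop() if query.prefilter is not None else None
        columns = [self._stack.pop() for _ in query.selection][::-1]
        sql = f'SELECT {", ".join(columns)} FROM {self._stack.pop()}'
        self._stack.append(sql if where is None else f'{sql} WHERE {where}')

    def fetch(self) -> str:
        if len(self._stack) != 1:
            raise RuntimeError('Unbalanced parser stack')
        return self._stack.pop()


bar = Table('bar')
x, y = Column(bar, 'x'), Column(bar, 'y')
visitor = SqlVisitor({bar: 'public.bar'}, {x: 'bar.x', y: 'bar.y'})
Query(bar, (x, y), Equals(x, 10)).accept(visitor)
native = visitor.fetch()
```

Only the leaves are looked up in the mappings; every inner node is assembled from fragments already on the stack, so supporting a new storage layer in this model means changing the fragment templates rather than the traversal.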
Exceptions¶
- exception forml.io.dsl.UnprovisionedError[source]¶
Bases:
MissingError
Source or Feature resolving exception.
Raised by DSL parsers when the given source or feature (typically dsl.Table or dsl.Column) can’t be resolved using the available data sources.
- exception forml.io.dsl.UnsupportedError[source]¶
Bases:
MissingError
Indicates a DSL operation unsupported by the given parser.
References¶
For reference, several existing Parser implementations can be found under the forml.provider.feed.reader package:
- Frame DSL parser producing SQLAlchemy select expression.