Syntax Reference¶
Being implemented as a python-internal DSL, its high-level syntax is consequently a subset of the native Python syntax embedding the actual DSL API.
Even though it is not coupled via any inherent dependency, the DSL is designed to loosely resemble
the SQL SELECT
statement syntax for its proven modeling effectiveness and its universal
recognition as the ETL standard. The abstract descriptive role of the DSL with the separate
runtime-specific parsing stage responsible for converting the generic query
into arbitrary native representation still allows the integration of any data access
mechanisms including non-sql-based sources.
Grammar Notation¶
Below is the DSL syntax described using the BNF notation. For readability, it is not strictly
formal - leaving some of the terminal symbols out with just their conceptual descriptions (e.g.
<count>
, <identifier>
, <literal>
, etc.) or references to their API class representations (e.g. <table>
, function.*
).
A substantial part of the DSL is the syntax for expression notation which is based on an extensive collection of supported functions and operators.
The central component of any query is an existing dsl.Table
instance defined using the schema API within its schema catalog.
The full DSL syntax is:
<source> ::= <origin> | <set> | <query>
<origin> ::= <table> | <reference> | <join>
<set> ::=
<source>.union(<source>)
| <source>.intersection(<source>)
| <source>.difference(<source>)
<query> ::=
<queryable>.select(<feature_list>)
| <queryable>.where(<predicate>)
| <queryable>.having(<predicate>)
| <queryable>.groupby(<operable_list>)
| <queryable>.orderby(<ordering_list>)
| <queryable>.limit(<count> [, <count>])
<table> ::= schema instances defined using dsl.Schema
<reference> ::= <source>.reference([<identifier>])
<join> ::=
<origin>.inner_join(<origin>, <predicate>)
| <origin>.left_join(<origin>, <predicate>)
| <origin>.right_join(<origin>, <predicate>)
| <origin>.full_join(<origin>, <predicate>)
| <origin>.cross_join(<origin>)
<queryable> ::= <query> | <origin>
<feature_list> ::= <feature> [, <feature_list> ]
<predicate> ::= <comparison> | <logical>
<operable_list> ::= <operable> [, <operable_list> ]
<ordering_list> ::= <ordering> [, <ordering_list> ]
<count> ::= natural number
<identifier> ::= string of letters, digits and underscores starting with a letter
<feature> ::= <operable> | <aliased>
<operable> ::= <element> | <literal> | <expression>
<ordering> ::= <operable> [, <direction>]
<aliased> ::= <feature>.alias(<identifier>)
<element> ::= <origin>.<identifier>
<literal> ::= any Python literal value
<expression> ::=
<aggregate>
| <comparison>
| <conversion>
| <datetime>
| <logical>
| <math>
| <window_spec>
<direction> ::= "asc" | "ascending" | "desc" | "descending"
<comparison> ::=
<operable> == <operable>
| <operable> != <operable>
| <operable> < <operable>
| <operable> <= <operable>
| <operable> > <operable>
| <operable> >= <operable>
<logical> ::=
<operable> & <operable>
| <operable> | <operable>
| ~ <operable>
<conversion> ::= function.Cast | ...
<datetime> ::= function.Year | ...
<math> :: =
<arithmetic>
| function.Abs
| function.Ceil
| function.Floor
| ...
<arithmetic> ::=
<operable> + <operable>
| <operable> - <operable>
| <operable> * <operable>
| <operable> / <operable>
| <operable> % <operable>
<window_spec> ::= <window>.over(<operable_list> [, <ordering_list>])
<aggregate> ::=
function.Count
| function.Avg
| function.Min
| function.Max
| function.Sum
| ...
<window> ::= <aggregate> | <ranking>
<ranking> ::= function.RowNumber | ...
Examples¶
from foobar.edu import schema # our schema catalog
school_ref = schema.School.reference('bar')
QUERY = (
schema.Student
.inner_join(schema.Person, schema.Student.surname == schema.Person.surname)
.inner_join(school_ref, schema.Student.school == school_ref.sid)
.select(
schema.Student.surname,
school_ref['name'].alias('school'),
function.Cast(schema.Student.score, dsl.Integer()).alias('score'),
)
.where(schema.Student.score > 0)
.orderby(schema.Student.updated, schema.Student['surname'])
.limit(10)
)