# autosql Task [SUNSETTED]
## About
The `autosql` task lets you write a `SELECT` statement and SAYN then automates the data processing (i.e. table or view creation, incremental load, etc.) for you.
## Defining `autosql` Tasks
An `autosql` task group is defined as follows:
!!! example "project.yaml"
    ```yaml
    ...
    groups:
      core:
        type: autosql
        file_name: "core/*.sql"
        materialisation: table
        destination:
          table: "{{ task.name }}"
    ...
    ```
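With a group defined this way, each SQL file matching `core/*.sql` in the project's `sql` folder becomes its own `autosql` task, and `{{ task.name }}` resolves to the task's name so every task writes to a table named after itself.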
An `autosql` task is defined by the following attributes:

- `type`: `autosql`.
- `file_name`: the path to a file within the `sql` folder of the project's root. When defining `autosql` groups in `project.yaml`, this property needs to be a glob expression, for example `group/*.sql`.
- `materialisation`: this should be either `table`, `view` or `incremental`. `table` will create a table, `view` will create a view. `incremental` will create a table and will load the data incrementally based on a delete key (see more detail on `incremental` below).
- `destination`: this sets the details of the data processing.
    - `tmp_schema`: the (optional) schema which will be used to store any necessary temporary object created in the process. The final compiled value is affected by `schema_prefix`, `schema_suffix` and `schema_override` as specified in database objects.
    - `schema`: the (optional) destination schema where the object will be created. The final compiled value is affected by `schema_prefix`, `schema_suffix` and `schema_override` as specified in database objects.
    - `table`: the name of the object that will be created. The final compiled value is affected by `table_prefix`, `table_suffix` and `table_override` as specified in database objects.
    - `db`: the (optional) destination database.
- `delete_key`: specifies the incremental process delete key. This is for `incremental` materialisation only.
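Putting these attributes together, a single task entry using all of them might look like the following sketch. The task, file, schema and connection names are illustrative, and `db` must point to a configured connection as described in the note below:

```yaml
# illustrative task entry; names are made up
task_sales_daily:
  type: autosql
  file_name: core/sales_daily.sql
  materialisation: incremental
  destination:
    tmp_schema: analytics_staging
    schema: analytics_models
    table: sales_daily
    db: warehouse    # optional: overrides default_db (see note below)
  delete_key: dt
```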
!!! info
    By default the task is executed in the database defined by `default_db` in `project.yaml`. `db` can be specified to change this, in which case the connection specified needs to:

    - Be a credential from the `required_credentials` list in `project.yaml`.
    - Be defined in your `settings.yaml`.
    - Be one of the supported databases.
## Setting Dependencies With `autosql`
With `autosql` tasks, you should use the `src` macro in your `SELECT` statements to implicitly create task dependencies.
!!! example "autosql query"
    ```sql
    SELECT field1
         , field2
      FROM {{ src('my_table') }} l
    ```
By using `{{ src('my_table') }}` in your `FROM` clause, you are effectively telling SAYN that your task depends on the `my_table` table (or view). As a result, SAYN will look for the task that produces `my_table` and set it as a parent of this `autosql` task automatically.
!!! tip
    When using the `src` macro, you can pass a structure formatted as `schema.table`, such as `{{ src('my_schema.my_table') }}`. In this case, SAYN interprets the first element as the schema and the second element as the table or view. If you use `schema_prefix` and / or `table_prefix` in your project settings, SAYN will then prepend the `schema_prefix` to the `schema` value and `table_prefix` to the `table` value. For example, if your `schema_prefix` is set to `analytics` and `table_prefix` to `up`, then `{{ src('my_schema.my_table') }}` will compile to `analytics_my_schema.up_my_table`.
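For instance, with the prefixes from the tip above, a query referencing `{{ src('my_schema.my_table') }}` compiles as follows:

```sql
-- with schema_prefix: analytics and table_prefix: up
SELECT field1
  FROM {{ src('my_schema.my_table') }}

-- compiles to:
SELECT field1
  FROM analytics_my_schema.up_my_table
```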
## Advanced Configuration
If you need to amend the configuration (e.g. materialisation) of a specific `autosql` task within a group, you can overload the values specified in the YAML group definition. To do this, we simply call `config` from a Jinja tag within the SQL file of the task:
!!! example "autosql with config"
    ```sql
    {{ config(materialisation='view') }}

    SELECT ...
    ```
The above code will override the value of the `materialisation` setting defined in YAML and make this model a view. All other parameters described on this page can also be overloaded with `config`, except `db`, `file_name` and `name`.
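For instance, since everything except `db`, `file_name` and `name` can be overloaded this way, a task could be switched to incremental from within its SQL file. The column and table names in the sketch below are illustrative:

```sql
{# sketch: overload both materialisation and delete_key for this task #}
{{ config(materialisation='incremental', delete_key='dt') }}

SELECT dt
     , field2
  FROM {{ src('my_table') }}
```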
## Using `autosql` In `incremental` Mode
`autosql` tasks support incremental loads, which are extremely useful for large data volumes where a full refresh (`materialisation: table`) would be infeasible.
We set an `autosql` task as incremental by:

1. Setting `materialisation` to `incremental`.
2. Defining a `delete_key`.
!!! example "autosql in incremental mode"
    ```yaml
    ...
    task_autosql_incremental:
      type: autosql
      file_name: task_autosql_incremental.sql
      materialisation: incremental
      destination:
        tmp_schema: analytics_staging
        schema: analytics_models
        table: task_autosql
      delete_key: dt
    ...
    ```
When using `incremental`, SAYN will do the following in the background:

- Create a temporary table based on the incremental logic from the SAYN query.
- Delete from the final table those records for which the `delete_key` value is in the temporary table.
- Insert the contents of the temporary table into the final table.
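For the example above, this process corresponds roughly to the statements below. This is a simplified sketch: the temporary table name is made up and the exact SQL SAYN generates depends on the target database.

```sql
-- 1. build a temporary table from the task's SELECT (temporary table name is illustrative)
CREATE TABLE analytics_staging.task_autosql_tmp AS
SELECT ...;

-- 2. delete rows from the final table whose delete_key (dt) appears in the temporary table
DELETE FROM analytics_models.task_autosql
 WHERE dt IN (SELECT DISTINCT dt FROM analytics_staging.task_autosql_tmp);

-- 3. insert the contents of the temporary table into the final table
INSERT INTO analytics_models.task_autosql
SELECT * FROM analytics_staging.task_autosql_tmp;
```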
In order to make the `SELECT` statement incremental, SAYN provides the following arguments:

- `full_load`: a flag defaulting to `False` and controlled by the `-f` flag in the SAYN command. If `-f` is passed to the `sayn` command, the final table will be replaced with the temporary one in step 2 above, rather than performing a merge of the data.
- `start_dt`: a date defaulting to "yesterday" and controlled by the `-s` flag in the SAYN command.
- `end_dt`: a date defaulting to "yesterday" and controlled by the `-e` flag in the SAYN command.
!!! example "SQL using incremental arguments"
    ```sql
    SELECT dt
         , field2
         , COUNT(1) AS c
      FROM table
     WHERE dt BETWEEN {{ start_dt }} AND {{ end_dt }}
     GROUP BY 1, 2
    ```
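These arguments are available in the query's Jinja context, so you can also branch on `full_load` to drop the date filter when a full refresh is requested with `-f`. The sketch below adapts the example above; adjust it to your own query:

```sql
SELECT dt
     , field2
     , COUNT(1) AS c
  FROM table
 {% if not full_load %}
 WHERE dt BETWEEN {{ start_dt }} AND {{ end_dt }}
 {% endif %}
 GROUP BY 1, 2
```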
## Defining columns
`autosql` tasks accept a `columns` field in the task definition that affects the table creation by enforcing types and column order.
!!! attention
    Each supported database might have specific DDL related to it. Below are the DDLs that SAYN supports across all databases. For DDLs related to specific databases, see the database-specific pages.
## CREATE TABLE DDLs
SAYN also lets you control the `CREATE TABLE` statement if you need a more specific setup. This is done with:

- `columns`: the list of columns including their definitions.
- `table_properties`: database-specific properties that affect table creation (indexes, cluster, sorting, etc.).
- `post_hook`: SQL statements executed right after the table/view creation.
`columns` can define the following attributes:

- `name`: the column name.
- `type`: the column type.
- `tests`: list of keywords that constrain a specific column:
    - `unique`: enforces a unique constraint on the column.
    - `not_null`: enforces a non-null constraint on the column.
    - `allowed_values`: list of allowed values for the column.
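As an illustration, a `columns` entry using these attributes might look like the sketch below. Column names and types are made up, and the exact YAML shape of `allowed_values` is not shown here:

```yaml
# illustrative columns definition with tests
columns:
  - name: dt
    type: date
    tests:
      - not_null
  - name: customer_id
    type: varchar
    tests:
      - unique
      - not_null
```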
`table_properties` can define the following attributes (database specific):

- `indexes`: specify the indexes to create on the table.
- `sorting`: specify the sorting for the table.
- `distribution_key`: specify the type of distribution.
- `partitioning`: specify the partitioning model for the table.
- `clustering`: specify the clustering for the table.
!!! attention
    Each supported database might have specific `table_properties` related to it; see the database-specific pages for further details and examples.
!!! attention
    If a primary key is defined in both the `columns` and `indexes` DDL entries, the primary key will be set as part of the `CREATE TABLE` statement only.
!!! example "autosql with columns"
    ```yaml
    ...
    task_autosql:
      type: autosql
      file_name: task_autosql.sql
      materialisation: table
      destination:
        tmp_schema: analytics_staging
        schema: analytics_models
        table: task_autosql
      ddl:
        columns:
          - name: x
            type: int
            primary: True
          - name: y
            type: varchar
            unique: True
        permissions:
          role_name: SELECT
    ...
    ```