autosql Task [SUNSETTED]¶
About¶
The autosql task lets you write a SELECT statement and SAYN then automates the data processing (i.e. table or view creation, incremental load, etc.) for you.
Defining autosql Tasks¶
An autosql task group is defined as follows:
project.yaml
...
groups:
core:
type: autosql
file_name: "core/*.sql"
materialisation: table
destination:
table: "{{ task.name }}"
...
An autosql task is defined by the following attributes:
type:autosql.file_name: the path to a file within the sql folder of the project's root. When definingautosqlgroups inproject.yamlthis property needs to be a glob expression, for examplegroup/*.sql.materialisation: this should be eithertable,vieworincremental.tablewill create a table,viewwill create a view.incrementalwill create a table and will load the data incrementally based on a delete key (see more detail onincrementalbelow).destination: this sets the details of the data processing.tmp_schema: the (optional) schema which will be used to store any necessary temporary object created in the process. The final compiled value is affected byschema_prefix,schema_suffixandschema_overrideas specified in database objects.schema: the (optional) destination schema where the object will be created. The final compiled value is affected byschema_prefix,schema_suffixandschema_overrideas specified in database objects.table: is the name of the object that will be created. The final compiled value is affected bytable_prefix,table_suffixandtable_overrideas specified in database objects.db: the (optional) destination database.
delete_key: specifies the incremental process delete key. This is forincrementalmaterialisationonly.
Info
By default the task is executed in the database defined by default_db in project.yaml. db can be specified to change this, in which case the connection specified needs to:
- Be a credential from the
required_credentialslist inproject.yaml. - Be defined in your
settings.yaml. - Be one of the supported databases.
Setting Dependencies With autosql¶
With autosql tasks, you should use the src macro in your SELECT statements to implicitly create task dependencies.
!!! example autosql query
SELECT field1
, field2
FROM {{ src('my_table') }} l
By using the {{ src('my_table') }} in your FROM clause, you are effectively telling SAYN that your task depends on the my_table table (or view). As a result, SAYN will look for the task that produces my_table and set it as a parent of this autosql task automatically.
Tip
When using the src macro, you can pass a structure formatted as schema.table such as {{ src('my_schema.my_table') }}. In this case, SAYN interprets the first element as the schema, the second element as the table or view. If you use schema_prefix and / or table_prefix in your project settings, SAYN will then prepend the schema_prefix to the schema value and table_prefix to the table value. For example, if your schema_prefix is set to analytics and table_prefix to up then {{ src('my_schema.my_table') }} will compile analytics_my_schema.up_my_table.
Advanced Configuration¶
If you need to amend the configuration (e.g. materialisation) of a specific autosql task within a group, you can overload the values specified in the YAML group definition. To do this, we simply call config from a Jinja tag within the sql file of the task:
autosql with config
{{ config(materialisation='view') }}
SELECT ...
The above code will override the value of materialisation setting defined in YAML to make this model a view. All other parameters
described above in this page are also available to overload with config except db, file_name and name.
Using autosql In incremental Mode¶
autosql tasks support loads incrementally, which is extremely useful for large data volumes when full
refresh (materialisation: table) would be infeasible.
We set an autosql task as incremental by:
1. Setting materialisation to incremental
2. Defining a delete_key
autosql in incremental mode
...
task_autosql_incremental:
type: autosql
file_name: task_autosql_incremental.sql
materialisation: incremental
destination:
tmp_schema: analytics_staging
schema: analytics_models
table: task_autosql
delete_key: dt
...
When using incremental, SAYN will do the following in the background:
- Create a temporary table based on the incremental logic from the SAYN query.
- Delete from the final table those records for which the
delete_keyvalue is in the temporary table. - Insert the contents of the temporary table into the final table.
In order to make the SELECT statement incremental, SAYN provides the following arguments:
full_load: a flag defaulting toFalseand controlled by the-fflag in the SAYN command. If-fis passed to the sayn command, the final table will be replaced with the temporary one in step 2 above, rather than performing a merge of the data.start_dt: a date defaulting to "yesterday" and controlled by the-sflag in the SAYN command.end_dt: a date defaulting to "yesterday" and controlled by the-eflag in the SAYN command.
SQL using incremental arguments
SELECT dt
, field2
, COUNT(1) AS c
FROM table
WHERE dt BETWEEN {{ start_dt }} AND {{ end_dt }}
GROUP BY 1,2
Defining columns¶
Autosql tasks accept a columns field in the task definition that affects the table creation by enforcing types and column order.
Attention
Each supported database might have specific DDL related to it. Below are the DDLs that SAYN supports across all databases. For DDLs related to specific databases see the database-specific pages.
CREATE TABLE DDLs¶
SAYN also lets you control the CREATE TABLE statement if you need more specification. This is done with:
- columns: the list of columns including their definitions.
- table_properties: database specific properties that affect table creation (indexes, cluster, sorting, etc.).
- post_hook: SQL statments executed right after the table/view creation.
columns can define the following attributes:
- name: the column name.
- type: the column type.
- tests: list of keywords that constraint a specific column
- unique: enforces a unique constraint on the column.
- not_null: enforces a non null constraint on the column.
- allowed_values: list allowed values for the column.
table_properties can define the following attributes (database specific):
* indexes:
* sorting: specify the sorting for the table
* distribution_key: specify the type of distribution.
* partitioning: specify the partitioning model for the table.
* clustering: specify the clustering for the table.
Attention
Each supported database might have specific table_properties related to it; see the database-specific pages for further details and examples.
Attention
If the a primary key is defined in both the columns and indexes DDL entries, the primary key will be set as part of the CREATE TABLE statement only.
autosql with columns
...
task_autosql:
type: autosql
file_name: task_autosql.sql
materialisation: table
destination:
tmp_schema: analytics_staging
schema: analytics_models
table: task_autosql
ddl:
columns:
- name: x
type: int
primary: True
- name: y
type: varchar
unique: True
permissions:
role_name: SELECT
...