Tutorial Part 2: Data Modelling With SAYN¶
In the first part of the tutorial, we executed our first SAYN run and went through the core components of a SAYN project. We will now see how to use SAYN for data modelling, a major process of analytics warehousing.
SQL Tasks With SAYN¶
SAYN can execute two main types of SQL tasks:
* autosql
: these tasks take a SELECT
statement and create a table or view using it. All the processes are automated by SAYN in the background.
* sql
: these tasks take your SQL statement as is and execute it. These are not covered in this tutorial.
Defining The Task Group¶
In order to execute your tasks, you will need to define a task group. This is done in the project.yaml
file, under the groups
entry. This is the models
group we have defined in our project:
project.yaml
...
groups:
models:
type: autosql
file_name: "{{ task.group }}/*.sql"
materialisation: table
destination:
table: "{{ task.name }}"
This group effectively does the following:
- It creates a task group called
models
. - This task group is defined to be
autosql
tasks. - Each file with a
.sql
extension in thesql/models
folder will be turned into anautosql
task. Do not worry about the{{ task.group }}
notation for now, this is simple Jinja templating to dynamically pick themodels
name of the group. - All tasks from the group will model the output as tables.
- The tables will be named as per the
task
name. Thistask
name is automatically generated from your file name, excluding the.sql
extension.
Writing Your Models¶
A Simple Model¶
As explained in the previous section, each file with a .sql
extension in the sql/models
folder will be turned into an autosql
task following our models
group definition.
For example, the dim_arenas.sql
file in the sql/models
folder will be turned into a dim_arenas
task. This is the SQL code of this file:
sql/dim_arenas.sql
SELECT l.arena_id
, l.arena_name
FROM logs_arenas l
When executed, this task will create a table called dim_arenas
using this SQL code. This is the dim_arenas
you can find in the dev.db
SQLite database at the root level of the project folder. You can execute this task only by running the command sayn run -t dim_arenas
, where dim_arenas
is our task
name.
A Model With Dependency¶
Now, let's have a look at a model which depends on another model. The file f_battles.sql
is a good example and actually depends on multiple other models. This is how the SQL query is written:
sql/f_battles.sql
SELECT t.tournament_name
, t.tournament_name || '-' || CAST(b.battle_id AS VARCHAR) AS battle_id
, a.arena_name
, f1.fighter_name AS fighter1_name
, f2.fighter_name AS fighter2_name
, w.fighter_name AS winner_name
FROM logs_battles b
LEFT JOIN {{ src('dim_tournaments') }} t
ON b.tournament_id = t.tournament_id
LEFT JOIN {{ src('dim_arenas') }} a
ON b.arena_id = a.arena_id
LEFT JOIN {{ src('dim_fighters') }} f1
ON b.fighter1_id = f1.fighter_id
LEFT JOIN {{ src('dim_fighters') }} f2
ON b.fighter2_id = f2.fighter_id
LEFT JOIN {{ src('dim_fighters') }} w
ON b.winner_id = w.fighter_id
As you can see, this query uses another Jinja templating notation, the src
function which is core to using SAYN efficiently. You pass this function the name of a table, and SAYN will automatically build the dependencies between your tasks! For example, this f_battles
task sources the table dim_tournaments
(amongst others) with the src
function. As a result, SAYN will look for the task that produces this dim_tournaments
table (which is the dim_tournaments
task) and set this task as a parent of the f_battles
task. Therefore, dim_tournaments
will always execute before f_battles
. From the above code, you can see that many tasks will be set as parents of the f_battles
task.
Changing Your Model Materialisation¶
You can easily amend the configuration of a single task when necessary with SAYN. For example, the task f_rankings
, generated by f_rankings.sql
, uses the following query:
sql/f_rankings.sql
{{ config(
materialisation='view'
)
}}
SELECT fr.fighter_name
, CAST(SUM(fr.is_winner) AS FLOAT) / COUNT(DISTINCT fr.battle_id) AS win_rate
FROM {{ src('f_fighter_results') }} fr
GROUP BY 1
ORDER BY 2 DESC
As you can see, this task uses the config
function, which will in this case overwrite the materialisation of the task's output to a view instead of a table. This config
function can be really useful when you want to overwrite some attributes of specific tasks within your group.
What Next?¶
You now know how to build data models with SAYN. The next section of this tutorial will now go through how to use SAYN for Python processes. This will enable you to leverage SAYN for end-to-end ELT processes or data science tasks!