BigQuery
The BigQuery driver depends on sqlalchemy-bigquery and can be installed with:

```bash
pip install "sayn[bigquery]"
```
Warning
SAYN 0.6 switched from pybigquery to sqlalchemy-bigquery. When upgrading, uninstall pybigquery with `pip uninstall pybigquery`.
The BigQuery connector accepts the following parameters:
| Parameter | Description | Default |
|---|---|---|
| project | GCP project to use | Required |
| credentials_path | Path, relative to the SAYN project root, to the JSON key file of the service account to use | Required |
| location | Default location for created tables | Dataset default |
| dataset | Dataset to use when running queries. Can alternatively be specified in the SQL | |
For advanced configurations, SAYN passes any other parameters to SQLAlchemy's `create_engine`, so check the sqlalchemy-bigquery dialect documentation for the extra parameters it accepts.
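As a minimal sketch, a BigQuery credential entry in `settings.yaml` could look like the following. The credential name `warehouse`, the project ID, the key file path and the dataset name are hypothetical, and placing the entry under a top-level `credentials` section is an assumption about the settings file layout; check the SAYN settings documentation for the exact structure.

```yaml
# settings.yaml (sketch): assumed to sit under the top-level credentials section
credentials:
  warehouse:                              # hypothetical credential name
    type: bigquery                        # assumed type identifier for this driver
    project: my-gcp-project               # hypothetical GCP project ID
    credentials_path: keys/sayn-sa.json   # hypothetical key file, relative to the project root
    location: EU                          # optional: default location for created tables
    dataset: analytics                    # optional: hypothetical default dataset
```

Only `project` and `credentials_path` are required; `location` and `dataset` can be left out.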
BigQuery-specific DDL
Partitioning
SAYN supports specifying the partitioning of tables created by autosql and copy tasks. To do this, set `partition` under `table_properties`. The value is a string containing a BigQuery partition expression, that is, the expression that follows `PARTITION BY` in BigQuery DDL.
tasks/base.yaml

```yaml
tasks:
  f_battles:
    type: autosql
    file_name: f_battles.sql
    materialisation: table
    destination:
      table: f_battles
    table_properties:
      partition: DATE(_PARTITIONTIME)
```
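The example above partitions by ingestion time. To partition on a column of the query output instead, the same property takes a column-based expression. This is a sketch: `battle_ts` is a hypothetical TIMESTAMP column assumed to be returned by `f_battles.sql`.

```yaml
tasks:
  f_battles:
    type: autosql
    file_name: f_battles.sql
    materialisation: table
    destination:
      table: f_battles
    table_properties:
      # daily partitions on the date part of the hypothetical TIMESTAMP column battle_ts;
      # TIMESTAMP_TRUNC(battle_ts, MONTH) would give monthly partitions instead
      partition: DATE(battle_ts)
```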
Clustering
We can also specify the clustering of the table with the `cluster` property in autosql and copy tasks. In this case the value is a list of columns. If the task also defines its list of columns, every column listed in `cluster` must be present in that list.
tasks/base.yaml

```yaml
tasks:
  f_battles:
    type: autosql
    file_name: f_battles.sql
    materialisation: table
    destination:
      table: f_battles
    table_properties:
      cluster:
        - arena_name
```
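Partitioning and clustering can be combined on the same task. The sketch below assumes a hypothetical TIMESTAMP column `battle_ts` and a hypothetical `tournament_id` column in the output of `f_battles.sql`. Note that BigQuery accepts at most four clustering columns, and their order matters because data is sorted by them in the order listed.

```yaml
tasks:
  f_battles:
    type: autosql
    file_name: f_battles.sql
    materialisation: table
    destination:
      table: f_battles
    table_properties:
      partition: DATE(battle_ts)   # battle_ts is a hypothetical TIMESTAMP column
      cluster:                     # at most four columns; sort order follows list order
        - arena_name
        - tournament_id            # hypothetical column
```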