BigQuery

The BigQuery driver depends on sqlalchemy-bigquery and can be installed with:

pip install "sayn[bigquery]"

Warning

SAYN 0.6 switched from pybigquery to sqlalchemy-bigquery. When upgrading from an earlier version, uninstall pybigquery with pip uninstall pybigquery.

The BigQuery connector looks for the following parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| project | GCP project where the cluster is | Required |
| credentials_path | Path relative to the project to the JSON for the service account to use | Required |
| location | Default location for tables created | Dataset default |
| dataset | Dataset to use when running queries. Can be specified in SQL | |
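
As an illustration, a credentials entry using these parameters might look like the sketch below. It assumes that credentials are declared in settings.yaml under a credentials key and that the connector is selected with type: bigquery. The entry name warehouse and every value shown are hypothetical placeholders.

```yaml
# settings.yaml (sketch; entry name and all values are hypothetical)
credentials:
  warehouse:
    type: bigquery                                   # assumed connector selector
    project: my-gcp-project                          # GCP project (required)
    credentials_path: secrets/service_account.json   # relative to the SAYN project (required)
    location: EU                                     # optional; falls back to the dataset default
    dataset: analytics                               # dataset used when running queries
```

Keeping the service account file outside version control is usually a good idea, since it grants access to the GCP project.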

For advanced configurations, SAYN passes any other parameters to SQLAlchemy's create_engine, so check the sqlalchemy-bigquery dialect documentation for the extra parameters it accepts.

BigQuery Specific DDL

Partitioning

SAYN supports specifying the partitioning model for tables created with autosql and copy tasks. To do this, specify partition in the table_properties field of the task. The value is a string matching a BigQuery partition expression.

tasks/base.yaml

tasks:
  f_battles:
    type: autosql
    file_name: f_battles.sql
    materialisation: table
    destination:
      table: f_battles
    table_properties:
      partition: DATE(_PARTITIONTIME)

Clustering

We can also specify the clustering for the table with the cluster property in autosql and copy tasks. The value in this case is a list of columns. If the task definition also includes a list of columns, every column named in cluster should be present in that list.

tasks/base.yaml

tasks:
  f_battles:
    type: autosql
    file_name: f_battles.sql
    materialisation: table
    destination:
      table: f_battles
    table_properties:
      cluster:
        - arena_name
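
The two properties can be sketched together in a single task. This assumes SAYN accepts partition and cluster side by side under table_properties, which the examples above do not show explicitly. The column battle_ts is a hypothetical TIMESTAMP column used only to illustrate a column-based partition expression.

```yaml
# tasks/base.yaml (sketch; battle_ts is a hypothetical column)
tasks:
  f_battles:
    type: autosql
    file_name: f_battles.sql
    materialisation: table
    destination:
      table: f_battles
    table_properties:
      # Any valid BigQuery partition expression; here, daily partitions on a timestamp column
      partition: DATE(battle_ts)
      # Columns used to cluster rows within each partition
      cluster:
        - arena_name
```

In BigQuery, clustering sorts data within each partition, so queries that filter on both the partition expression and arena_name tend to scan less data.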