Skip to content

Databases

About

SAYN uses sqlalchemy in order to manage database connections. It currently supports the following databases:

Usage

Database connections are defined as credentials in settings.yaml by specifying the database type and other connection parameters.

settings.yaml

credentials:
  db_name:
    type: redshift

    host: ...
    # other connection parameters ...

You can check the list of connection parameters in the database specific pages of this section. If a parameter that is not listed on the database page is included in settings.yaml, that parameter will be passed to sqlalchemy.create_engine. Refer to sqlalchemy's documentation if you need to fine tune the connection.

For example, to specify the default timezone for a Snowflake connection, this can be specified in the connect_args parameter:

settings.yaml

credentials:
  db_name:
    type: snowflake

    account: ...
    # other connection parameters

    connect_args:
      timezone: UTC

All databases support a parameter max_batch_rows that controls the default size of a batch when using load_data or in copy tasks. If you get an error when running SAYN indicating the amount of data is too large, adjust this value.

settings.yaml

credentials:
  dev_db:
    type: sqlite
    database: dev.db
    max_batch_rows: 200

Using Databases In python Tasks

Databases and other credentials defined in the SAYN project are available to Python tasks via self.connections. For convenience though, all Python tasks have a default_db property that gives you access to the default database declared in project.yaml.

The database python class provides several methods and properties to make it easier to work with python tasks. For example you can easily read or load data with self.default_db (see example below) or use self.default_db.engine to call DataFrame.read_sql from pandas.

Example PythonTask

from sayn import PythonTask

class TaskPython(PythonTask):
    def run(self):
        data = self.default_db.read_data("SELECT * FROM test_table")

        # do something with that data