Databases¶
About¶
SAYN uses sqlalchemy in order to manage database connections. It currently supports the following databases:
Usage¶
Database connections are defined as credentials in settings.yaml by specifying the database
type and other connection parameters.
settings.yaml
credentials:
db_name:
type: redshift
host: ...
# other connection parameters ...
You can check the list of connection parameters in the database specific pages of this section.
If a parameter that is not listed on the database page is included in settings.yaml, that parameter
will be passed to sqlalchemy.create_engine.
Refer to sqlalchemy's documentation if you need to fine tune the connection.
For example, to specify the default timezone for a Snowflake connection, this can be
specified in the connect_args parameter:
settings.yaml
credentials:
db_name:
type: snowflake
account: ...
# other connection parameters
connect_args:
timezone: UTC
All databases support a parameter max_batch_rows that controls the default size of a batch
when using load_data or in copy tasks. If you get an error when running SAYN indicating the
amount of data is too large, adjust this value.
settings.yaml
credentials:
dev_db:
type: sqlite
database: dev.db
max_batch_rows: 200
Using Databases In python Tasks¶
Databases and other credentials defined in the SAYN project are available to Python tasks via
self.connections. For convenience though, all Python tasks have a default_db property that
gives you access to the default database declared in project.yaml.
The database python class provides several methods and properties to make it
easier to work with python tasks. For example you can easily read or load data with self.default_db (see example below) or use self.default_db.engine to call DataFrame.read_sql from pandas.
Example PythonTask
from sayn import PythonTask
class TaskPython(PythonTask):
def run(self):
data = self.default_db.read_data("SELECT * FROM test_table")
# do something with that data