Cheat sheet for DBT¶
Estimated time to read: 4 minutes
dbt is a powerful and flexible tool for data transformations, and there's much to learn about it in order to utilise its full potention. Here you can find some of the most often commonds for a quick reference.
Command | Description |
---|---|
dbt init | Initializes a new dbt project with a basic directory structure and a sample project configuration. |
dbt debug | Validates your installation, checks your environment variables, and verifies the connection to your database. |
dbt compile | Compiles your dbt models into executable SQL statements without running them against your database. |
dbt run | Executes your dbt models against your database. |
dbt test | Runs data tests against your models to ensure that your data transformations are working as expected. |
dbt deps | Downloads and manages the dependencies of your dbt project. |
dbt docs generate | Generates a web-based documentation site for your dbt project. |
dbt docs serve | Serves the generated documentation on a local server. |
dbt seed | Loads CSV files from your project into your database. |
dbt snapshot | Executes the Snapshots defined in your project. Snapshots is a way to capture changes in your source data over time. |
dbt snapshot-freshness | Checks how up-to-date your snapshots are. |
dbt run-operation | Runs a specified operation, which is a macro that doesn't return a result set. |
Diving a bit deeper into dbt, find a brief description of its main functionalities:
Functionality | Description | Example |
---|---|---|
Sources | define the schema and tables of raw data in your warehouse. | sources: - name: my_source tables: - name: my_table |
References | (ref ) allow you to depend on another model. | SELECT * FROM {{ ref('my_other_model') }} |
Hooks | are custom operations that run at specific times in the dbt execution process. | on-run-start: - 'CREATE SCHEMA IF NOT EXISTS analytics' |
Packages | are sets of models, tests, macros, and docs that can be imported into a dbt project. | packages: - package: fishtown-analytics/dbt_utils version: 0.6.4 |
Config Models | allow you to specify configurations for models. | {% config materialized='incremental', unique_key='id' %} |
Macros | are reusable pieces of SQL code. | {% macro calculate_percentage(numerator, denominator) %} {{ numerator }} * 1.0 / {{ denominator }} {% endmacro %} |
Tests | are assertions about your data to ensure it meets certain conditions. | - name: order_id tests: - unique - not_null |
Note that for some of these functionalities, the example syntax might need to be placed in certain files or sections of a dbt project. For instance, sources are typically defined in a sources.yml
file, hooks are defined in your dbt_project.yml
file under the on-run-start
or on-run-end
keys, and package usage would be defined in the packages.yml
file.
Jinja Templating: dbt extensively uses Jinja, a templating language for Python. Jinja allows you to inject logic into your SQL code, making your transformations more dynamic and powerful.
Incremental Models: dbt supports incremental models, which only process new data since the last run. This can significantly speed up your transformations for large datasets.
Data Testing: dbt has built-in support for data testing. You can define tests that assert certain conditions about your data, such as uniqueness or non-nullness of columns, referential integrity, etc. This helps you maintain the quality and integrity of your data.
Modularity and Reusability: dbt encourages a modular approach to write SQL, where transformations are broken down into small, reusable pieces. This makes your transformations easier to understand, maintain, and test.
Documentation: dbt can generate a website with documentation for your project. This includes descriptions of your models, columns, tests, and a visual diagram of your project's dependency graph.
Version Control: dbt projects are just code, which means they can be version-controlled using tools like Git. This makes it easier to track changes over time, collaborate with others, and roll back changes if needed.
Deployment Flexibility: dbt supports multiple databases and data warehouses, including PostgreSQL, BigQuery, Snowflake, and Redshift. This gives you the flexibility to use dbt in a variety of data environments.
Remember that dbt is a tool for data transformation in the "transform" part of ELT (Extract, Load, Transform) process. It's not an extraction or loading tool, and you'll need to use it in combination with other tools for a complete ELT/ETL pipeline.
Lastly, if you're starting with dbt, I recommend going through the dbt Tutorial and the dbt Learn courses on the dbt website. They provide a hands-on introduction to dbt and cover many of its key features in detail.