What is dbt?
dbt (data build tool) is an open-source command-line tool that enables data analysts and engineers to transform data in their warehouse using SQL. It's the "T" in ELT - handling transformations after data has been loaded into your warehouse.
dbt brings software engineering practices to analytics: version control, testing, documentation, and modular code. It's become the industry standard for analytics engineering.
Core Concepts
- Models: SQL SELECT statements that define transformations
- Sources: Raw data tables loaded into your warehouse
- Tests: Assertions about your data quality
- Documentation: Descriptions and metadata for your models
- Macros: Reusable SQL snippets with Jinja templating
- Seeds: CSV files loaded into your warehouse
Project Structure
my_dbt_project/
├── dbt_project.yml # Project configuration
├── models/
│ ├── staging/ # Raw data cleaning
│ │ ├── stg_customers.sql
│ │ └── stg_orders.sql
│ ├── intermediate/ # Business logic
│ │ └── int_orders_enriched.sql
│ └── marts/ # Final analytics tables
│ ├── dim_customers.sql
│ └── fct_orders.sql
├── tests/ # Custom tests
├── macros/ # Reusable SQL functions
├── seeds/ # Static CSV data
└── snapshots/ # Track slowly changing dimensions
Writing Models
-- models/staging/stg_customers.sql
-- Configuration block
{{ config(
materialized='view',
schema='staging'
) }}
with source as (
select * from {{ source('raw', 'customers') }}
),
renamed as (
select
id as customer_id,
first_name,
last_name,
email,
created_at,
updated_at
from source
)
select * from renamed
-- models/marts/fct_orders.sql
{{ config(
materialized='table',
schema='analytics'
) }}
with orders as (
select * from {{ ref('stg_orders') }}
),
customers as (
select * from {{ ref('stg_customers') }}
),
final as (
select
o.order_id,
o.customer_id,
c.customer_name,
o.order_date,
o.total_amount,
o.status,
-- Calculate metrics
sum(o.total_amount) over (
partition by o.customer_id
order by o.order_date
) as cumulative_spend
from orders o
left join customers c on o.customer_id = c.customer_id
)
select * from final
Defining Sources
# models/staging/sources.yml
version: 2
sources:
- name: raw
database: raw_database
schema: public
tables:
- name: customers
description: "Raw customer data from CRM"
columns:
- name: id
description: "Primary key"
tests:
- unique
- not_null
- name: orders
description: "Raw order data from e-commerce platform"
freshness:
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}
Testing Your Data
# models/staging/schema.yml
version: 2
models:
- name: stg_customers
description: "Cleaned customer data"
columns:
- name: customer_id
description: "Unique customer identifier"
tests:
- unique
- not_null
- name: email
tests:
- unique
- not_null
- name: status
tests:
- accepted_values:
values: ['active', 'inactive', 'churned']
- name: fct_orders
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- order_id
- customer_id
-- tests/assert_total_amount_positive.sql
-- Custom singular test
select
order_id,
total_amount
from {{ ref('fct_orders') }}
where total_amount < 0
Macros and Jinja
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
({{ column_name }} / 100.0)::decimal(10,2)
{% endmacro %}
-- Usage in model
select
order_id,
{{ cents_to_dollars('amount_cents') }} as amount_dollars
from {{ ref('stg_orders') }}
-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) %}
{% if target.name == 'prod' %}
{{ custom_schema_name | trim }}
{% else %}
{{ default__generate_schema_name(custom_schema_name, node) }}
{% endif %}
{% endmacro %}
Incremental Models
-- models/marts/fct_events.sql
{{ config(
materialized='incremental',
unique_key='event_id',
incremental_strategy='merge'
) }}
select
event_id,
user_id,
event_type,
event_timestamp,
properties
from {{ source('raw', 'events') }}
{% if is_incremental() %}
-- Only process new records since last run
where event_timestamp > (
select max(event_timestamp) from {{ this }}
)
{% endif %}
dbt Commands
# Run all models
dbt run
# Run specific models
dbt run --select stg_customers
dbt run --select marts.*
dbt run --select +fct_orders # Run with dependencies
# Test your models
dbt test
dbt test --select stg_customers
# Generate documentation
dbt docs generate
dbt docs serve
# Build (run + test)
dbt build
# Check source freshness
dbt source freshness
# Seed static data
dbt seed
Best Practices
- Follow naming conventions: stg_, int_, dim_, fct_ prefixes
- Use refs and sources: Never hardcode table names
- Test everything: Add tests for all critical columns
- Document your models: Write descriptions for columns and models
- Keep models modular: Small, focused transformations
- Use version control: Treat dbt like any software project
Master dbt with Expert Mentorship
Our Data Engineering program covers dbt from basics to production deployment. Build modern data pipelines with guidance from industry experts.
Explore Data Engineering Program