What is dbt?

dbt (data build tool) is an open-source command-line tool that enables data analysts and engineers to transform data in their warehouse using SQL. It's the "T" in ELT - handling transformations after data has been loaded into your warehouse.

dbt brings software engineering practices to analytics: version control, testing, documentation, and modular code. It has become a de facto standard for analytics engineering.

Core Concepts

  • Models: SQL SELECT statements that define transformations
  • Sources: Raw data tables loaded into your warehouse
  • Tests: Assertions about your data quality
  • Documentation: Descriptions and metadata for your models
  • Macros: Reusable SQL snippets with Jinja templating
  • Seeds: CSV files loaded into your warehouse

Project Structure

my_dbt_project/
├── dbt_project.yml      # Project configuration
├── models/
│   ├── staging/         # Raw data cleaning
│   │   ├── stg_customers.sql
│   │   └── stg_orders.sql
│   ├── intermediate/    # Business logic
│   │   └── int_orders_enriched.sql
│   └── marts/          # Final analytics tables
│       ├── dim_customers.sql
│       └── fct_orders.sql
├── tests/              # Custom tests
├── macros/             # Reusable SQL functions
├── seeds/              # Static CSV data
└── snapshots/          # Track slowly changing dimensions
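
The dbt_project.yml at the root ties this structure together. A minimal sketch (the project and profile names here are placeholders; the profile must match an entry in your profiles.yml):

```yaml
# dbt_project.yml
name: my_dbt_project          # placeholder project name
version: '1.0.0'
profile: my_warehouse         # placeholder; must match a profile in profiles.yml

model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

models:
  my_dbt_project:
    staging:
      +materialized: view     # staging models are cheap to rebuild as views
    marts:
      +materialized: table    # marts are queried often, so persist them
```

Folder-level settings like these apply to every model in that folder, and individual models can still override them with a config() block.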

Writing Models

-- models/staging/stg_customers.sql
-- Configuration block
{{ config(
    materialized='view',
    schema='staging'
) }}

with source as (
    select * from {{ source('raw', 'customers') }}
),

renamed as (
    select
        id as customer_id,
        first_name,
        last_name,
        email,
        status,
        created_at,
        updated_at
    from source
)

select * from renamed

-- models/marts/fct_orders.sql
{{ config(
    materialized='table',
    schema='analytics'
) }}

with orders as (
    select * from {{ ref('stg_orders') }}
),

customers as (
    select * from {{ ref('stg_customers') }}
),

final as (
    select
        o.order_id,
        o.customer_id,
        c.first_name || ' ' || c.last_name as customer_name,
        o.order_date,
        o.total_amount,
        o.status,
        -- Calculate metrics
        sum(o.total_amount) over (
            partition by o.customer_id
            order by o.order_date
        ) as cumulative_spend
    from orders o
    left join customers c on o.customer_id = c.customer_id
)

select * from final

Defining Sources

# models/staging/sources.yml
version: 2

sources:
  - name: raw
    database: raw_database
    schema: public
    tables:
      - name: customers
        description: "Raw customer data from CRM"
        columns:
          - name: id
            description: "Primary key"
            tests:
              - unique
              - not_null
      - name: orders
        description: "Raw order data from e-commerce platform"
        loaded_at_field: _etl_loaded_at  # timestamp column dbt compares against
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}

Testing Your Data

# models/staging/schema.yml
version: 2

models:
  - name: stg_customers
    description: "Cleaned customer data"
    columns:
      - name: customer_id
        description: "Unique customer identifier"
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive', 'churned']

  - name: fct_orders
    tests:
      # Requires the dbt_utils package (add it to packages.yml)
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
            - customer_id

-- tests/assert_total_amount_positive.sql
-- Custom singular test
select
    order_id,
    total_amount
from {{ ref('fct_orders') }}
where total_amount < 0
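
Beyond singular tests, you can define reusable generic tests with a {% test %} block. A sketch (the test name is illustrative):

```sql
-- macros/test_is_positive.sql
-- Generic test: returns the rows that violate the assertion,
-- so the test fails whenever the column holds a negative value
{% test is_positive(model, column_name) %}
    select *
    from {{ model }}
    where {{ column_name }} < 0
{% endtest %}
```

Once defined, it can be applied in schema.yml alongside the built-in tests, e.g. under a column: tests: [is_positive].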

Macros and Jinja

-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
    ({{ column_name }} / 100.0)::decimal(10,2)
{% endmacro %}

-- Usage in model
select
    order_id,
    {{ cents_to_dollars('amount_cents') }} as amount_dollars
from {{ ref('stg_orders') }}

-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) %}
    {# Fall back to the target schema when no custom schema is set #}
    {% if custom_schema_name is none %}
        {{ target.schema }}
    {% elif target.name == 'prod' %}
        {{ custom_schema_name | trim }}
    {% else %}
        {{ default__generate_schema_name(custom_schema_name, node) }}
    {% endif %}
{% endmacro %}

Incremental Models

-- models/marts/fct_events.sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge'
) }}

select
    event_id,
    user_id,
    event_type,
    event_timestamp,
    properties
from {{ source('raw', 'events') }}

{% if is_incremental() %}
    -- Only process new records since last run
    where event_timestamp > (
        select max(event_timestamp) from {{ this }}
    )
{% endif %}
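
A strict > filter can miss late-arriving events. A common variant (a sketch; interval syntax varies by warehouse) reprocesses a short lookback window and relies on the merge strategy and unique_key to deduplicate:

```sql
{% if is_incremental() %}
    -- Reprocess the last 3 days to catch late-arriving events;
    -- merge upserts on event_id, so the overlap is deduplicated
    where event_timestamp > (
        select max(event_timestamp) - interval '3 days' from {{ this }}
    )
{% endif %}
```

The window size is a trade-off: longer windows catch more late data but reprocess more rows on each run.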

dbt Commands

# Run all models
dbt run

# Run specific models
dbt run --select stg_customers
dbt run --select marts.*
dbt run --select +fct_orders  # Also run its upstream dependencies

# Test your models
dbt test
dbt test --select stg_customers

# Generate documentation
dbt docs generate
dbt docs serve

# Build (run + test)
dbt build

# Check source freshness
dbt source freshness

# Seed static data
dbt seed
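
To illustrate seeds: a small lookup table lives as a CSV in seeds/ and, once loaded with dbt seed, is referenced like any model. The file name and contents below are hypothetical:

```sql
-- seeds/country_codes.csv (hypothetical contents):
--   country_code,country_name
--   US,United States
--   GB,United Kingdom

-- After `dbt seed`, reference the seed with ref() like any model:
select * from {{ ref('country_codes') }}
```

Seeds suit small, slowly changing reference data; large or frequently updated tables belong in your EL pipeline, not in the repo.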

Best Practices

  • Follow naming conventions: stg_, int_, dim_, fct_ prefixes
  • Use refs and sources: Never hardcode table names
  • Test everything: Add tests for all critical columns
  • Document your models: Write descriptions for columns and models
  • Keep models modular: Small, focused transformations
  • Use version control: Treat dbt like any software project
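
To illustrate the "use refs and sources" rule, compare the two styles (table names here are illustrative):

```sql
-- Avoid: hardcoded names break lineage and environment switching
select * from raw_database.public.customers

-- Prefer: dbt resolves the location per environment and records
-- the dependency in the DAG
select * from {{ source('raw', 'customers') }}
select * from {{ ref('stg_customers') }}
```

Because ref() and source() feed the dependency graph, dbt can run models in the right order and dbt docs can render full lineage.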
