Transform SQL into tested models for data analytics
dbt is an open-source transformation tool and cloud service that builds production-grade SQL models, tests, and documentation for analytics teams. It’s ideal for analytics engineers and data teams standardizing ELT workflows, with a free open-source core and a paid dbt Cloud for collaboration and scheduling (Team/Enterprise pricing for production features).
dbt (data build tool) is a SQL-first transformation framework that compiles version-controlled SQL and Jinja into repeatable, tested analytics models. Its primary capability is turning SELECT statements into materialized views, incremental tables, and dependency graphs, with built-in testing and documentation generation. dbt’s key differentiator is its open-source core plus a managed dbt Cloud that adds an IDE, job scheduler, and access controls. It serves analytics engineers, data analysts, and platform teams standardizing ELT pipelines for Snowflake, BigQuery, and Redshift. Pricing ranges from a free open-source core and dbt Cloud Free to paid Team and Enterprise tiers for production features.
dbt (data build tool) began as an open-source project from Fishtown Analytics (now dbt Labs) to bring software engineering best practices to analytics code. It positions itself as the SQL-native transformation layer in modern ELT stacks, letting teams write modular SQL while dbt manages dependency compilation, materializations, and incremental runs. The core value proposition is reproducible, testable analytics code: models are source-controlled, documented, and executed against your cloud warehouse so business logic lives centrally in SQL rather than hidden in BI reports or ad-hoc scripts.
The feature surface focuses on model management, testing, documentation, and orchestration. dbt Core compiles SQL + Jinja templates into executable SQL and supports materializations including table, view, ephemeral, and incremental models; incremental models reduce compute by processing only new or changed data. Schema and data tests enforce constraints (unique, not_null, relationships), while snapshots capture slowly changing dimension state. The documentation generator builds a browsable docs site with a DAG/lineage graph and column-level descriptions. dbt Cloud layers on a browser IDE, a job scheduler with run history and alerting, role-based access controls, and a REST API for programmatic job management and artifact retrieval.
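The incremental pattern described above can be sketched in a short model file. This is a minimal illustration, not dbt's only incremental strategy; the model and column names (`orders_daily`, `stg_orders`, `updated_at`) are hypothetical:

```sql
-- models/marts/orders_daily.sql (hypothetical model)
-- On the first run dbt builds the full table; on later runs only rows
-- newer than the current max(updated_at) are selected and merged on order_id.
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    customer_id,
    order_total,
    updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- {{ this }} refers to the already-built target table in the warehouse
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

Because the `unique_key` is set, runs after the first apply a merge/upsert rather than an append, so late-arriving updates to an existing `order_id` replace the old row.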
Pricing is split between open-source dbt Core (free) and dbt Cloud (hosted) tiers. The open-source dbt Core is free to use in any environment. dbt Cloud offers a Free tier (developer IDE and basic job runs for small teams), a Team plan that typically bills per developer seat (price varies; listed at approximately $50/seat/month by dbt Labs as of 2025), and Enterprise plans that are custom-priced for SSO, advanced SLAs, audit logs, and large-scale orchestration. Free tier limits are intentionally modest; production scheduling, advanced permissions, and enterprise support require Team or Enterprise. Exact commercial pricing is quoted via dbt Labs and can be negotiated for volume and annual commitments.
Real-world users include analytics engineers creating centralized model layers, data platform teams operationalizing ELT, and analysts publishing governed metrics. For example, an Analytics Engineer uses dbt to deploy incremental models that cut warehouse costs by 60% versus full-refresh loads, while a Data Platform Lead uses dbt Cloud to enforce CI/CD and role-based access for 20+ developers. dbt commonly pairs with Snowflake, BigQuery, or Redshift; compared to Google Cloud Dataform or Apache Airflow, dbt emphasizes SQL-native modeling, testing, and documentation rather than general orchestration.
Three capabilities set dbt apart from its nearest competitors:
- An open-source core (dbt Core) paired with a managed dbt Cloud that adds an IDE, job scheduler, and access controls.
- Built-in quality tooling: schema and data tests plus an auto-generated docs site with a DAG/lineage graph.
- SQL-first modeling: version-controlled SQL + Jinja compiled into materializations and dependency graphs, rather than general-purpose orchestration.
Current tiers and what you get at each price point. Figures below are approximate; confirm exact numbers on the vendor's pricing page.
| Plan | Price | What you get | Best for |
|---|---|---|---|
| Open-source (dbt Core) | Free | Local CLI usage; no hosted IDE, scheduler, or managed support | Individual developers and teams running CI externally |
| Cloud Free | Free | Single-developer IDE, limited job minutes, basic run history | Exploration, small projects, and trialing dbt Cloud |
| Team | $50/seat/month (approx.) | Per-developer seats; scheduled jobs, basic RBAC, email support | Small analytics teams needing scheduling and collaboration |
| Enterprise | Custom | SAML/SSO, audit logs, SLAs, unlimited seats/workloads | Large organizations with compliance and SLA requirements |
Copy these prompts into your AI assistant as-is. Each targets a different high-value dbt workflow.
You are a senior analytics engineer. Produce a single dbt model SQL file that transforms the source table raw.sales_events into a clean, production-ready incremental model. Constraints: target Snowflake, model path marts/sales_orders.sql, dbt config must set materialized='incremental' and unique_key='order_id', deduplicate by latest event_time, and include SQL comments documenting columns. Also produce a minimal marts/schema.yml that adds a not_null test on order_id and a description for the model. Output format: two labeled code blocks: (1) marts/sales_orders.sql content, (2) marts/schema.yml content. Example: use {{ config(materialized='incremental', unique_key='order_id') }}.
You are a dbt developer. Given a dbt model named marts.user_profiles, produce a schema.yml fragment that defines the model, adds human-readable descriptions, and includes these tests: not_null on user_id, unique on user_id, accepted_values for status ['active','inactive','pending'], and a relationship test linking user_id to raw.users.id. Constraints: produce valid dbt YAML, follow model name and column naming exactly, and include test severity and tags for each test. Output format: a single YAML code block labeled marts/schema.yml. Example: show how to add severity: warn under tests.
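For orientation, the YAML shape this prompt asks for looks roughly like the sketch below. Model and column names come from the prompt itself; the severity and tag values are illustrative, and the exact `config:` nesting varies slightly across dbt versions:

```yaml
# marts/schema.yml (sketch)
version: 2

models:
  - name: user_profiles
    description: "One row per user with current profile state."
    columns:
      - name: user_id
        description: "Primary key; unique user identifier."
        tests:
          - not_null:
              config:
                severity: error
                tags: ['pk']
          - unique:
              config:
                severity: error
                tags: ['pk']
          - relationships:
              to: source('raw', 'users')
              field: id
              config:
                severity: warn
                tags: ['referential']
      - name: status
        description: "Lifecycle status of the user."
        tests:
          - accepted_values:
              values: ['active', 'inactive', 'pending']
              config:
                severity: warn
                tags: ['enum']
```

Note that the relationship test assumes raw.users is declared as a dbt source; if it is a model instead, `to: ref('users')` would be used.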
You are an analytics engineer. Create a dbt incremental model for BigQuery that ingests staging.events_raw into marts.user_events, deduplicates on event_id keeping the row with the latest received_at, and supports full-refresh. Constraints: use {{ config(materialized='incremental', unique_key='event_id') }}, implement a merge-style incremental pattern compatible with BigQuery (use is_incremental()), and include one dbt test suggestion. Output format: provide (1) marts/user_events.sql full content, (2) brief 3-line explanation of the incremental logic, (3) a one-block schema.yml snippet adding not_null on event_id. Example: show the WHERE clause used when is_incremental() is true.
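For reference, the is_incremental() filter and dedup step this prompt asks about typically look like the following sketch. Table and column names are taken from the prompt; the batch-level dedup via QUALIFY is one common BigQuery approach, not the only one:

```sql
-- models/marts/user_events.sql (sketch, assuming staging.events_raw is a declared source)
{{ config(materialized='incremental', unique_key='event_id') }}

select *
from {{ source('staging', 'events_raw') }}

{% if is_incremental() %}
  -- Only scan rows that arrived after the newest row already in the target;
  -- dbt then merges them on event_id, replacing older duplicates.
  where received_at > (select max(received_at) from {{ this }})
{% endif %}

-- BigQuery supports QUALIFY: keep only the latest row per event_id within this batch
qualify row_number() over (partition by event_id order by received_at desc) = 1
```

On `dbt run --full-refresh`, the is_incremental() branch is skipped and the table is rebuilt from scratch, so the same file covers both paths.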
You are a data platform engineer. Provide a refactor plan and file templates to convert three ad-hoc models into a dbt package that uses sources for raw tables and exposures for two dashboards. Constraints: models are raw.orders, raw.users, raw.products; target Redshift naming conventions: schema = analytics, models prefix = marts_. Output format: a numbered list of files to create (sources.yml, marts_orders.sql, marts_users.sql, marts_products.sql, exposures.yml), with full contents for each file (YAML or SQL). Include comments explaining key lines and a short migration checklist (3 steps).
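The sources and exposures files this prompt requests follow a standard dbt YAML shape, sketched below. Schema and table names come from the prompt; the dashboard name, owner, and email are hypothetical placeholders:

```yaml
# models/sources.yml (sketch)
version: 2

sources:
  - name: raw
    schema: raw
    tables:
      - name: orders
      - name: users
      - name: products
```

```yaml
# models/exposures.yml (sketch; dashboard and owner details are placeholders)
version: 2

exposures:
  - name: revenue_dashboard
    type: dashboard
    owner:
      name: Analytics Team
      email: analytics@example.com
    depends_on:
      - ref('marts_orders')
      - ref('marts_products')
```

Declaring sources lets the models reference raw tables via {{ source('raw', 'orders') }} instead of hard-coded names, and exposures make the two dashboards visible as downstream nodes in the lineage graph.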
You are a data platform lead designing CI/CD for a 20+ developer analytics team using dbt Cloud. Deliver a production-ready CI/CD proposal: Git branching strategy, dbt Cloud job definitions (build, test, snapshot, seed), schedule and concurrency limits, role-based access controls, and a GitHub Actions workflow that runs model linting and unit tests on PRs. Constraints: include rollback strategy, environment promotion (dev->staging->prod), and Slack alerts for failures. Output format: structured sections with YAML job examples (dbt Cloud job JSON/YAML), a GitHub Actions workflow file, and a short RBAC table mapping roles to permissions. Include two short examples of job schedules.
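A minimal shape for the PR workflow this prompt requests might look like the sketch below. The repository layout, the CI profiles directory, the prod-artifacts path, and the secret name are all assumptions to adapt:

```yaml
# .github/workflows/dbt-ci.yml (sketch)
name: dbt CI
on:
  pull_request:
    branches: [main]

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dbt and sqlfluff
        run: pip install dbt-snowflake sqlfluff
      - name: Lint SQL
        run: sqlfluff lint models/
      - name: Build and test only changed models in a CI schema
        env:
          DBT_PROFILES_DIR: ./ci                      # assumes a CI profiles.yml in ./ci
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
        run: |
          dbt deps
          dbt build --select state:modified+ --defer --state ./prod-artifacts
```

The `state:modified+ --defer` selection assumes production manifest artifacts are downloaded to `./prod-artifacts` before the run; it builds only models changed in the PR plus their descendants, deferring unchanged upstream references to production.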
You are a lead analytics engineer tasked with reducing warehouse costs by converting heavy full-refresh models to incremental and materialized views across Snowflake. Produce a prioritized migration plan for up to 12 models: include selection criteria (cost, row growth, last_modified, dependencies), an estimated % cost reduction per model, required schema changes, sample converted SQL for two representative models (one high-cardinality, one low-cardinality), impact on tests, and monitoring KPIs to track post-migration. Constraints: provide a rollout schedule (weeks), owner assignment template, and rollback/validation steps. Output format: prioritized table, two SQL examples, and a 6-step rollout checklist.
Choose dbt over Google Cloud Dataform if you prioritize an open-source SQL-first modeling framework with built-in testing and docs.
Head-to-head, dbt's closest alternatives are Google Cloud Dataform and Apache Airflow: Dataform offers a comparable SQL modeling workflow managed within Google Cloud, while Airflow is a general-purpose orchestrator rather than a SQL-native modeling, testing, and documentation layer.