Section 14
Local Data System (DuckDB)
A complete analytics stack that runs on a laptop. Free, fast, and shippable.
DuckDB is an in-process columnar database. Combined with Python and a tidy folder layout, it's enough to power real analytics for small teams.
Folder structure
my-data-system/
├── data/
│ ├── raw/ # source dumps (csv/json/parquet)
│ ├── warehouse.duckdb
│ └── exports/ # excel / csv outputs
├── pipelines/
│ ├── __init__.py
│ ├── ingest_claims.py
│ ├── transform_silver.py
│ └── build_gold.py
├── sql/
│ ├── bronze/
│ ├── silver/
│ ├── gold/
│ └── audit/
├── scripts/
│ └── run_all.py
├── tests/
├── .env
├── pyproject.toml
└── README.md

DuckDB schema design
sql/bronze/_schemas.sql
CREATE SCHEMA IF NOT EXISTS bronze;
CREATE SCHEMA IF NOT EXISTS silver;
CREATE SCHEMA IF NOT EXISTS gold;
CREATE SCHEMA IF NOT EXISTS audit;

Example bronze table
sql/bronze/raw_claims.sql
CREATE TABLE IF NOT EXISTS bronze.raw_claims (
    claim_id      VARCHAR,
    member_id     VARCHAR,
    provider_id   VARCHAR,
    service_date  DATE,
    amount_billed DECIMAL(12,2),
    status        VARCHAR,
    payload       JSON,
    _source       VARCHAR NOT NULL,
    _loaded_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Python pipeline skeleton
pipelines/ingest_claims.py
import duckdb

DB = "data/warehouse.duckdb"
SOURCE = "data/raw/claims_2025_01.csv"

def run():
    con = duckdb.connect(DB)
    con.execute("BEGIN")
    try:
        # SELECT * maps positionally, so the CSV's columns must match
        # bronze.raw_claims in order; read_csv_auto infers the types.
        rows = con.execute(f"""
            INSERT INTO bronze.raw_claims
            SELECT *,
                   '{SOURCE}' AS _source,
                   CURRENT_TIMESTAMP AS _loaded_at
            FROM read_csv_auto('{SOURCE}', header=true)
        """).fetchone()[0]  # DuckDB's INSERT returns the inserted row count
        con.execute("""
            INSERT INTO audit.pipeline_runs(job, status, "rows", ran_at)
            VALUES ('ingest_claims', 'success', ?, CURRENT_TIMESTAMP)
        """, [rows])
        con.execute("COMMIT")
    except Exception as e:
        con.execute("ROLLBACK")
        # The failure record lands outside the rolled-back transaction.
        con.execute("""
            INSERT INTO audit.pipeline_runs(job, status, error, ran_at)
            VALUES ('ingest_claims', 'failure', ?, CURRENT_TIMESTAMP)
        """, [str(e)])
        raise
if __name__ == "__main__":
    run()

Excel export
scripts/export_excel.py
import duckdb

# read_only lets exports run alongside other readers of the same file
con = duckdb.connect("data/warehouse.duckdb", read_only=True)
df = con.execute("SELECT * FROM gold.claims_summary").df()  # .df() returns a pandas DataFrame
df.to_excel("data/exports/claims_summary.xlsx", index=False)  # needs openpyxl installed
print(f"Exported {len(df)} rows")

Why DuckDB locally
Single file, zero ops, columnar speed, reads CSV/Parquet/JSON directly. Perfect for prototyping the same shape you'll deploy to Postgres or Snowflake later.