Section 09

Operations Stack

A real data system needs more than tables. It needs repeatable operations.

Use this minimum operations stack:

Concern    | Simple System   | Production System
Scheduling | cron            | Dagster or Airflow
Testing    | SQL checks      | dbt tests + Great Expectations
CI/CD      | Manual run      | GitHub Actions
Alerts     | Email/manual    | Slack webhook / Sentry
Logs       | Terminal output | Structured logs
Backfills  | Manual rerun    | Parameterized backfill job

cron example

bash
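# Runs daily at 06:00. cron starts in the home directory, so cd into the project first (the path is a placeholder).
0 6 * * * cd /path/to/project && python main.py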

Makefile

Makefile
.PHONY: install run

install:
	pip install -r requirements.txt

run:
	python main.py

GitHub Actions

.github/workflows/pipeline.yml
name: Run Data Pipeline

on:
  workflow_dispatch:
  schedule:
    - cron: "0 6 * * *"  # 06:00 UTC daily

jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python main.py

Slack alerting

python
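import os

import requests

def notify_slack(message):
    # Read the webhook URL from the environment so the secret stays out of source control.
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    response = requests.post(webhook_url, json={"text": message}, timeout=10)
    response.raise_for_status()  # surface a failed alert instead of ignoring it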
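One way to wire the alert into the pipeline is to wrap each run so that any exception is reported before it propagates. The sketch below is a minimal illustration, not a prescribed design: `run_with_alert` and its `notify` parameter are hypothetical names, and `notify` can be the `notify_slack` function above or any callable that accepts a string.

```python
def run_with_alert(step, notify):
    """Run step() and report any failure through notify() before re-raising."""
    try:
        return step()
    except Exception as exc:
        # Send a short summary; the exception is re-raised so the run still
        # fails and the full traceback reaches the logs.
        notify(f"Pipeline failed: {type(exc).__name__}: {exc}")
        raise


if __name__ == "__main__":
    def failing_step():
        # Hypothetical failure used only to demonstrate the alert path.
        raise ValueError("bronze load returned no rows")

    try:
        run_with_alert(failing_step, notify=print)  # print stands in for notify_slack
    except ValueError:
        pass  # a real run would exit non-zero here
```

Passing the notifier as an argument keeps the wrapper testable without a network call, since a test can collect messages in a list instead of posting to Slack.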

Testing example

A simple SQL check counts the silver rows that are missing a key. The query only returns a number, so the pipeline should fail the run when the count is greater than zero:

sql
SELECT COUNT(*) FROM silver.claims_clean WHERE claim_id IS NULL;
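To turn the count into a pass/fail result, some code has to run the query and raise an error on a non-zero count. The following is a minimal sketch using an in-memory SQLite database so that it runs anywhere; the `amount` column, the function names, and the use of SQLite are assumptions for illustration, and a real pipeline would connect to its own warehouse.

```python
import sqlite3

CHECK_SQL = "SELECT COUNT(*) FROM silver.claims_clean WHERE claim_id IS NULL;"


def assert_no_null_keys(conn):
    """Raise ValueError if the check query finds any row with a NULL claim_id."""
    (null_count,) = conn.execute(CHECK_SQL).fetchone()
    if null_count > 0:
        raise ValueError(
            f"{null_count} row(s) in silver.claims_clean have a NULL claim_id"
        )
    return null_count


def demo_connection(rows):
    """Build an in-memory SQLite database with a 'silver' schema for the demo."""
    conn = sqlite3.connect(":memory:")
    # ATTACH gives SQLite a schema named 'silver' so the query runs unchanged.
    conn.execute("ATTACH DATABASE ':memory:' AS silver")
    conn.execute("CREATE TABLE silver.claims_clean (claim_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO silver.claims_clean VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = demo_connection([("C1", 10.0), (None, 25.0)])
    try:
        assert_no_null_keys(conn)
    except ValueError as err:
        print(f"Check failed: {err}")
```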

Logging example

Log every step so failures are easy to locate:

python
print("Running bronze load step...")
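For the production column of the table ("Structured logs"), the same message can be emitted as one JSON object per line using Python's standard `logging` module. This is a sketch under assumptions: the `JsonFormatter` class, the `build_logger` helper, and the `step` field are illustrative names, not part of any existing pipeline code.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # 'step' is an illustrative custom field passed through `extra=`.
            "step": getattr(record, "step", None),
        }
        return json.dumps(payload)


def build_logger(name="pipeline", stream=None):
    """Return a logger that writes JSON lines to the given stream (stdout by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("Running bronze load step...", extra={"step": "bronze_load"})
```

Because each line is valid JSON, a log aggregator can filter on fields such as `step` or `level` instead of searching free text.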

Backfill example

Run the same pipeline for a historical date:

bash
python main.py --date=2024-01-01
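For the command above to work, `main.py` has to accept a `--date` argument. The section does not show how `main.py` parses it, so the following is only a sketch of one possible approach with `argparse`. The `--end-date` flag, `dates_to_run`, and `run_pipeline` are hypothetical additions that show how a single-day run can extend to a parameterized backfill over a range.

```python
import argparse
from datetime import date, timedelta


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Run the pipeline for one day or a date range."
    )
    parser.add_argument("--date", type=date.fromisoformat, default=None,
                        help="Run date as YYYY-MM-DD (defaults to today).")
    # Hypothetical flag: last date of a backfill range, inclusive.
    parser.add_argument("--end-date", type=date.fromisoformat, default=None,
                        help="Optional end of a backfill range as YYYY-MM-DD.")
    return parser.parse_args(argv)


def dates_to_run(start, end=None):
    """Return every date from start to end inclusive (just start if end is omitted)."""
    end = end or start
    if end < start:
        raise ValueError("end date must not be before start date")
    return [start + timedelta(days=offset) for offset in range((end - start).days + 1)]


def run_pipeline(run_date):
    # Placeholder for the real pipeline steps, which the section does not show.
    print(f"Running pipeline for {run_date.isoformat()}")


if __name__ == "__main__":
    args = parse_args()
    start = args.date or date.today()
    for run_date in dates_to_run(start, args.end_date):
        run_pipeline(run_date)
```

Under these assumptions, `python main.py --date=2024-01-01 --end-date=2024-01-07` would rerun one week, one day at a time, which only gives correct results if each daily run is safe to repeat.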