Section 09

Operations Stack

A real data system needs more than tables. It needs repeatable operations.

Use this minimum operations stack:

Concern	Simple System	Production System
Scheduling	cron	Dagster or Airflow
Testing	SQL checks	dbt tests + Great Expectations
CI/CD	Manual run	GitHub Actions
Alerts	Email/manual	Slack webhook / Sentry
Logs	Terminal output	Structured logs
Backfills	Manual rerun	Parameterized backfill job

cron example

bash

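# Daily at 06:00 server time; /path/to/project is a placeholder for the repo location.
0 6 * * * cd /path/to/project && python main.py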

Makefile

.PHONY: install run

install:
	pip install -r requirements.txt

run:
	python main.py

GitHub Actions

.github/workflows/pipeline.yml

name: Run Data Pipeline

on:
  workflow_dispatch:
  schedule:
    - cron: "0 6 * * *"  # 06:00 UTC every day (GitHub Actions schedules run in UTC)

jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python main.py

Slack alerting

python

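import os

import requests

def notify_slack(message):
    # Read the webhook URL from the environment; never hard-code the secret.
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    response = requests.post(webhook_url, json={"text": message}, timeout=10)
    # Raise if Slack rejects the request so a failed alert is not silent.
    response.raise_for_status()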
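
An alert is most useful when it fires automatically on failure. The following is a minimal sketch, assuming a hypothetical `run_with_alert` wrapper (not part of any library) and a notifier that is any callable taking a message string, such as the `notify_slack` function above.

```python
from typing import Callable


def run_with_alert(step: Callable[[], None], notify: Callable[[str], None]) -> None:
    """Run a pipeline step; send an alert and re-raise if it fails.

    `run_with_alert` is a hypothetical helper used for illustration.
    """
    try:
        step()
    except Exception as exc:
        # Include the exception type and message so the alert is actionable.
        notify(f"Pipeline failed: {type(exc).__name__}: {exc}")
        # Re-raise so cron or CI still sees a failing exit code.
        raise
```

Re-raising after the alert keeps the scheduler's view of the run accurate: the job is still marked as failed instead of being silently swallowed.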

Testing example

A simple SQL check that counts silver rows missing a key. The run should fail if the count is greater than zero:

sql

SELECT COUNT(*) FROM silver.claims_clean WHERE claim_id IS NULL;
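
The query on its own only returns a number; something has to turn a nonzero count into a failure. A minimal sketch of that step follows. It assumes a `sqlite3` connection as a stand-in for the real warehouse, drops the `silver.` schema prefix because an in-memory SQLite database has no such schema, and uses a hypothetical helper named `check_no_null_keys`.

```python
import sqlite3


def check_no_null_keys(conn, table: str, key: str) -> None:
    """Raise ValueError if any row in `table` has a NULL `key`.

    Hypothetical helper; assumes `conn` is a sqlite3 connection.
    """
    # Identifiers cannot be bound as query parameters, so pass only trusted names.
    query = f"SELECT COUNT(*) FROM {table} WHERE {key} IS NULL"
    null_count = conn.execute(query).fetchone()[0]
    if null_count > 0:
        raise ValueError(f"{null_count} row(s) in {table} have NULL {key}")


if __name__ == "__main__":
    # Demo data: one valid row and one row with a missing claim_id.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims_clean (claim_id TEXT, amount REAL)")
    conn.execute("INSERT INTO claims_clean VALUES ('C1', 10.0), (NULL, 5.0)")
    try:
        check_no_null_keys(conn, "claims_clean", "claim_id")
    except ValueError as err:
        print(f"Check failed: {err}")
```

In a real pipeline the exception would be left uncaught so that the process exits with an error and the scheduler marks the run as failed.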

Logging example

Log every step so failures are easy to locate:

python

print("Running bronze load step...")
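
Plain `print` output is hard to filter or search once a pipeline grows. The "Structured logs" entry in the table can be approximated with the standard `logging` module and a small JSON formatter. This is a minimal sketch; the `JsonFormatter` class, the `pipeline` logger name, and the `step` field are illustrative assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            # `step` is supplied through the `extra` argument; None if absent.
            "step": getattr(record, "step", None),
        }
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("pipeline")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("Running bronze load step", extra={"step": "bronze_load"})
```

Each line is then machine-readable, so a failure can be located by filtering on `level` or `step` instead of scanning terminal output.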

Backfill example

Run the same pipeline for a historical date:

bash

python main.py --date=2024-01-01
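
For this command to work, `main.py` has to accept a date parameter. A minimal sketch using `argparse` follows. The `--date` flag comes from the command above; `--end-date`, `parse_args`, `dates_to_run`, and `run_pipeline` are hypothetical names added for illustration.

```python
import argparse
from datetime import date, timedelta


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run the pipeline for one or more dates.")
    # date.fromisoformat validates YYYY-MM-DD and rejects malformed input.
    parser.add_argument("--date", type=date.fromisoformat, default=date.today())
    # Hypothetical optional flag: when given, run every date up to and including it.
    parser.add_argument("--end-date", type=date.fromisoformat, default=None)
    return parser.parse_args(argv)


def dates_to_run(start, end=None):
    """Return every date from start to end inclusive (only start if end is None)."""
    end = end or start
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]


def run_pipeline(run_date):
    # Placeholder for the real pipeline steps.
    print(f"Running pipeline for {run_date.isoformat()}")


if __name__ == "__main__":
    args = parse_args()
    for run_date in dates_to_run(args.date, args.end_date):
        run_pipeline(run_date)
```

A backfill like this is only safe to rerun if each run replaces the data for its date instead of appending to it; otherwise repeated runs can duplicate rows.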