Section 09
Operations Stack
A real data system needs more than tables. It needs repeatable operations.
Use this minimum operations stack:
| Concern | Simple System | Production System |
|---|---|---|
| Scheduling | cron | Dagster or Airflow |
| Testing | SQL checks | dbt tests + Great Expectations |
| CI/CD | Manual run | GitHub Actions |
| Alerts | Email/manual | Slack webhook / Sentry |
| Logs | Terminal output | Structured logs |
| Backfills | Manual rerun | Parameterized backfill job |
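cron example
Run the pipeline every day at 06:00 server time:
```bash
# cron starts jobs in $HOME, so cd into the project first (replace /path/to/project)
0 6 * * * cd /path/to/project && python main.py
```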
Makefile
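Give the common commands short, repeatable names (`make install`, `make run`):
```makefile
.PHONY: install run
# Recipe lines must be indented with a tab, not spaces
install:
	pip install -r requirements.txt

run:
	python main.py
```
GitHub Actions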
Save this workflow as `.github/workflows/pipeline.yml` to run the pipeline on a schedule or on demand:
```yaml
name: Run Data Pipeline

on:
  workflow_dispatch:
  schedule:
    - cron: "0 6 * * *"  # GitHub Actions schedules run in UTC

jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python main.py
```
Slack alerting
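Post alerts to a Slack channel through an incoming webhook:
```python
import os
import requests

def notify_slack(message):
    # Read the webhook URL from the environment (or a secret store); never commit it
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    response = requests.post(webhook_url, json={"text": message}, timeout=10)
    response.raise_for_status()  # surface failed deliveries instead of ignoring them
```
Testing example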
A simple SQL check: fail the run if this count is greater than zero, because it means some silver rows are missing a key:
```sql
SELECT COUNT(*) AS missing_claim_ids
FROM silver.claims_clean
WHERE claim_id IS NULL;
```
Logging example
Log every step so failures are easy to locate:
```python
print("Running bronze load step...")
```
Backfill example
Run the same pipeline for a historical date:
```bash
python main.py --date=2024-01-01
```
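The backfill command assumes `main.py` accepts a `--date` argument. Below is a minimal sketch of one way to wire that up with `argparse`. The `--end-date` flag (for backfilling a range) and the `run_pipeline` function are hypothetical names used for illustration, not part of the setup above.

```python
import argparse
import logging
from datetime import date, timedelta

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run the pipeline for one date or a date range.")
    # argparse calls date.fromisoformat on the string, so --date=2024-01-01 becomes a date object
    parser.add_argument("--date", type=date.fromisoformat, default=date.today(),
                        help="Run date (YYYY-MM-DD); defaults to today.")
    # Hypothetical flag: inclusive end date for a multi-day backfill
    parser.add_argument("--end-date", type=date.fromisoformat, default=None,
                        help="Optional inclusive end date for a backfill range.")
    return parser.parse_args(argv)


def dates_to_run(start, end=None):
    """Yield each date from start to end inclusive (only start when end is None)."""
    end = end or start
    if end < start:
        raise ValueError("--end-date must not be before --date")
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def run_pipeline(run_date):
    """Hypothetical placeholder: call the real bronze/silver/gold steps here."""
    log.info("Running pipeline for %s", run_date.isoformat())


def main(argv=None):
    args = parse_args(argv)
    processed = []
    # Process one date at a time, exactly as a normal scheduled run would
    for run_date in dates_to_run(args.date, args.end_date):
        run_pipeline(run_date)
        processed.append(run_date)
    return processed


if __name__ == "__main__":
    main()
```

With this sketch, `python main.py --date=2024-01-01 --end-date=2024-01-07` would rerun the seven days in order. Reusing the same code path for scheduled runs and backfills means a backfill cannot silently diverge from the daily job.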