The Complete Guide to Installing Apache Airflow with pip(Tested & Op

Published on 2025/03/26

After deploying Airflow across dozens of production environments, here's the battle-tested installation guide I wish I had when starting out. No fluff - just what actually works in 2024.

Why This Guide?
✔ Production-ready from day one (not just local testing)
✔ Version-locked dependencies to prevent "dependency hell"
✔ Performance-optimized configuration
✔ Troubleshooting tips from real-world experience
🚀 Installation Steps

  1. System Requirements

    Python 3.8-3.11 (Python 3.12 is not supported by the Airflow 2.8.1 release installed below)

    4GB+ RAM (8GB+ recommended for production)

    PostgreSQL (SQLite works for testing only)
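Before installing anything, it can help to check the interpreter against the range above. The following is a minimal sketch, not an Airflow tool: `python_supported` is a hypothetical helper name, and the 3.8-3.11 bounds are simply the ones listed in this section.

```bash
# Return 0 if a "major.minor[.patch]" Python version falls inside the
# 3.8-3.11 range listed above, non-zero otherwise.
# python_supported is a hypothetical helper name (an assumption).
python_supported() {
    major=${1%%.*}      # text before the first dot
    minor=${1#*.}       # text after the first dot
    minor=${minor%%.*}  # drop a trailing patch level, if any
    [ "$major" -eq 3 ] && [ "$minor" -ge 8 ] && [ "$minor" -le 11 ]
}
```

One way to use it: `python_supported "$(python --version | cut -d " " -f 2)" || echo "Unsupported Python version"`.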

  2. Create Virtual Environment

    python -m venv airflow_env
    source airflow_env/bin/activate    # Linux/Mac
    # Windows (cmd.exe) equivalent: airflow_env\Scripts\activate
    # Note: Airflow itself is only supported on Windows through WSL2
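A common slip at this point is running `pip install` in a shell where the environment was never activated. As a rough guard, the sketch below relies on the `VIRTUAL_ENV` variable that the venv `activate` script exports; `require_venv` is a hypothetical helper name, not something shipped with Python or Airflow.

```bash
# Succeed only when a virtual environment is active. The activate script
# exports VIRTUAL_ENV, so an empty or unset value means pip would install
# into the system Python. require_venv is a hypothetical helper name.
require_venv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "No virtual environment active; run: source airflow_env/bin/activate" >&2
        return 1
    fi
}
```

Calling `require_venv || exit 1` at the top of an install script stops it before anything touches the system interpreter.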

  3. Install with Production Extras

    AIRFLOW_VERSION=2.8.1
    PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"

    # Critical: constraint file to avoid broken dependencies
    CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

    pip install "apache-airflow[postgres,celery]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
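The constraint URL is just the Airflow version and the Python `major.minor` version dropped into a fixed pattern. To make that explicit, here is a small sketch; `constraint_url` is a hypothetical helper name, and the URL layout is the one used in the install step above.

```bash
# Build the constraints URL for an Airflow version ($1) and a Python
# "major.minor" version ($2). constraint_url is a hypothetical helper
# name; the URL layout is the one used in the install step above.
constraint_url() {
    printf 'https://raw.githubusercontent.com/apache/airflow/constraints-%s/constraints-%s.txt\n' "$1" "$2"
}
```

For example, `pip install "apache-airflow[postgres,celery]==2.8.1" --constraint "$(constraint_url 2.8.1 3.11)"` pins Airflow 2.8.1 against the Python 3.11 constraint set.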

💡 Pro Tip: Always include postgres and celery extras even for local development - it prevents headaches when moving to production.

⚙️ Post-Install Setup

  1. Initialize Airflow

    export AIRFLOW_HOME=~/airflow    # Set this permanently in your .bashrc/.zshrc
    airflow db migrate               # replaces "airflow db init", deprecated since Airflow 2.7
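Setting `AIRFLOW_HOME` permanently usually means appending the export line to a shell profile. The sketch below does that without duplicating the line on repeated runs; `persist_airflow_home` is a hypothetical helper name, and the profile path is whatever file you pass in.

```bash
# Append the AIRFLOW_HOME export to a shell profile only if the exact
# line is not already present, so re-running setup stays idempotent.
# persist_airflow_home is a hypothetical helper; $1 is the profile path.
persist_airflow_home() {
    profile=$1
    line='export AIRFLOW_HOME=~/airflow'
    touch "$profile"
    grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}
```

Typical use would be `persist_airflow_home ~/.bashrc` (or `~/.zshrc`), followed by opening a new shell.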

  2. Create Admin User

    airflow users create \
      --username admin \
      --firstname Admin \
      --lastname User \
      --role Admin \
      --email admin@example.com \
      --password your_secure_password

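  3. Configure airflow.cfg

    [core]
    executor = CeleryExecutor
    # 2-4x CPU cores
    parallelism = 32

    [database]
    # This setting moved from [core] to [database] in Airflow 2.3.
    # Re-run "airflow db migrate" after changing it so the schema is created in PostgreSQL.
    sql_alchemy_conn = postgresql+psycopg2://user:password@localhost/airflow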
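Any of these settings can also be supplied through environment variables named `AIRFLOW__{SECTION}__{KEY}`, which take precedence over airflow.cfg. The sketch below builds such a name and applies the "2-4x CPU cores" rule of thumb; both function names are hypothetical helpers, and the default multiplier of 2 is an assumption picked from the low end of that range.

```bash
# Build the AIRFLOW__{SECTION}__{KEY} environment variable name for a
# config section ($1) and key ($2). airflow_env_name is a hypothetical
# helper name, not an Airflow command.
airflow_env_name() {
    printf 'AIRFLOW__%s__%s\n' "$1" "$2" | tr '[:lower:]' '[:upper:]'
}

# Apply the "2-4x CPU cores" rule of thumb for parallelism.
# $1 = CPU core count, $2 = multiplier (assumed default: 2).
suggest_parallelism() {
    echo $(( $1 * ${2:-2} ))
}
```

On Linux, something like `export "$(airflow_env_name core parallelism)=$(suggest_parallelism "$(nproc)" 4)"` would set parallelism to four times the core count without editing the file.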

🚨 Common Issues & Fixes

| Problem | Solution |
| --- | --- |
| ImportError: cannot import name '...' | Always use constraint files; never run pip install apache-airflow alone |
| Port 8080 already in use | Change the port in the [webserver] section of airflow.cfg: web_server_port = 8081 |
| DAGs not showing up | Run airflow db migrate (airflow db upgrade before Airflow 2.7) and check the scheduler logs |

💡 Pro Tips

For Ubuntu 24.04: The default Python is 3.12, so create the virtualenv with Python 3.11 and use the 3.11 constraints

Memory Issues: Add worker_autoscale = 16,12 to the [celery] section of airflow.cfg (the format is max,min worker processes)

Upgrades: Always test new versions in a virtualenv first
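Because `worker_autoscale` expects the maximum first and the minimum second, swapped values are an easy mistake. Here is a tiny sanity check as a sketch; `autoscale_ok` is a hypothetical helper name and only verifies the ordering, nothing else.

```bash
# worker_autoscale takes "max,min" worker processes per Celery worker.
# autoscale_ok is a hypothetical helper that catches swapped values by
# checking that the first number is not smaller than the second.
autoscale_ok() {
    max=${1%%,*}  # text before the comma
    min=${1#*,}   # text after the comma
    [ "$max" -ge "$min" ]
}
```

For instance, `autoscale_ok 16,12` succeeds, while `autoscale_ok 12,16` returns a non-zero status.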

📚 Further Reading

Official Airflow Docs

Production Deployment Guide

Need help? Drop a comment below! 👇

— DataPipelinePro (Running 500+ DAGs in production since 2020)

How to Reference This Guide:

[Complete Airflow pip install guide] pip-install-apache-airflow

This guide combines years of production experience with the latest 2024 best practices. Bookmark it for your next Airflow deployment!
