How to monitor Apache Airflow DAG schedules
Monitor Airflow DAG schedules by watching the scheduler's heartbeat via the /api/v2/monitor/health endpoint, which flags the scheduler unhealthy if its last heartbeat is more than 30 seconds old by default. For individual DAG run monitoring, use the on_failure_callback on a task or DAG to send an alert when a task fails, and check data_interval_end against the actual execution time to catch late or missed runs. The scheduler's heartbeat is the single point of failure — if it stops, all DAG runs stop with it.
How do I check if the Airflow scheduler is healthy?
Airflow exposes a health endpoint that reports the scheduler's last heartbeat. If the heartbeat is stale, the scheduler has stopped dispatching runs:
curl -s https://<your-airflow-host>/api/v2/monitor/health | python3 -m json.tool
# Look for: "scheduler": {"status": "healthy", "latest_scheduler_heartbeat": "..."}
# "status": "unhealthy" means the scheduler is not running or is hung.The scheduler is considered unhealthy when its last heartbeat is more than 30 seconds old (the default scheduler_health_check_threshold). Poll this endpoint from an external monitor and alert when status is not 'healthy'.
How do I get alerted when a task or DAG run fails?
Use on_failure_callback on the DAG or individual tasks to run a function when a task fails. A common pattern sends a notification to Slack, email, or a webhook:
import httpx
from airflow.sdk import DAG, task
from datetime import datetime
def on_failure(context):
task_id = context["task"].task_id
dag_id = context["dag"].dag_id
httpx.post("https://hooks.slack.com/your-webhook", json={
"text": f"Airflow task failed: {dag_id}/{task_id}"
}, timeout=10)
with DAG(
"nightly_report",
schedule="0 3 * * *",
start_date=datetime(2026, 1, 1),
catchup=False,
on_failure_callback=on_failure,
) as dag:
@task
def run_report():
# job logic here
pass
run_report()How do I detect a missed or late DAG run?
Airflow's DAG view shows each run's data_interval_end (when the run was scheduled to cover data through) and the actual execution start. A run that is far past its data_interval_end without starting is late or missed. You can query this via the Airflow REST API and alert when a run in 'queued' or 'scheduled' state is overdue:
# List the latest DAG run and its state
curl -s "https://<host>/api/v2/dags/nightly_report/dagRuns?limit=1&order_by=-start_date" \
-u airflow:airflow | python3 -m json.toolAdd a missed-run alert to this job
The free tier gives you a heartbeat endpoint and an email alert when an expected ping doesn't arrive. Paid tiers add the log-aware diagnosis — the last log line and a likely cause in the alert. The heartbeat receiver ships in an upcoming release; see the plans to learn what each tier adds.
Frequently asked questions
- What does 'unhealthy' mean in the Airflow health endpoint?
- The scheduler reports unhealthy when its last heartbeat is older than scheduler_health_check_threshold (30 seconds by default). An unhealthy scheduler means no new task instances are being dispatched, so DAGs that were due to run will queue but never start.
- Does Airflow retry failed tasks automatically?
- Yes, if you set retries on a task (or globally in default_args). Each retry is a separate task instance. On_failure_callback fires after all retries are exhausted.