
What a heartbeat monitor is and how it works

A heartbeat monitor (also called a dead man's switch) is a monitoring pattern in which your job checks in by sending an HTTP ping on a regular, known schedule. The monitor stays silent while pings arrive on time and alerts you when a ping is overdue. This inverts the usual push alert: your job is responsible for reporting success, and silence is the failure signal. That is why it catches the cases push alerts miss: a job that never ran, a machine that went down, or a scheduler that stopped without logging an error.

How does a heartbeat monitor work?

You create a monitor with an expected period (how often the ping should arrive) and a grace time (how late a ping is allowed to be before the monitor alerts). The job sends an HTTP GET to the monitor's URL as its last action on success. The monitor tracks when it last heard from the job:

  • Ping arrives on time: the monitor resets its timer. No alert fires.
  • Ping is late but within grace: the monitor waits. Still no alert.
  • Ping is overdue past grace: the monitor fires an alert — the job missed its expected window.
  • Job pings a fail URL: some monitors support a separate fail endpoint so a job that ran but produced an error can report that explicitly.
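The on-time, late-but-within-grace and overdue cases above come down to two deadlines: the ping is due at last ping + period, and the alert fires after due + grace. Here is a minimal bash sketch of that decision, assuming all times are Unix epoch seconds. The function name heartbeat_status and its labels (up, grace, down) are hypothetical illustrations, not the API of any real monitor:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the monitor-side check (not a real receiver's code).
# All arguments are Unix epoch seconds:
#   $1 last_ping  when the last ping arrived
#   $2 period     how often a ping is expected
#   $3 grace      how late a ping may be before alerting
#   $4 now        current time (defaults to the system clock)
heartbeat_status() {
  local last_ping=$1 period=$2 grace=$3 now=${4:-$(date +%s)}
  local due=$(( last_ping + period ))   # when the next ping is expected
  local deadline=$(( due + grace ))     # last moment before an alert fires
  if (( now <= due )); then
    echo "up"      # next ping not yet due: stay silent
  elif (( now <= deadline )); then
    echo "grace"   # ping is late but within grace: keep waiting
  else
    echo "down"    # overdue past grace: this is when the alert fires
  fi
}
```

For a daily job (period 86400) with a 15-minute grace (900), heartbeat_status "$last_ping" 86400 900 prints down once more than 24 hours and 15 minutes have passed since the last ping.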

What failure modes does a heartbeat catch that push alerts miss?

  • The whole machine went down: the job never ran, so no push alert was sent. The heartbeat simply stops, and the monitor alerts.
  • The scheduler was disabled: GitHub Actions automatically disables scheduled workflows in public repositories after 60 days without repository activity. The job never ran, sent no error, and triggered no push alert. The heartbeat catches it.
  • The job exited non-zero but produced no output: cron mails a job's output, not its exit status, so a job that fails without printing anything sends no email. The heartbeat catches it.
  • The job ran but the important work failed: a job that exits 0 after skipping its core work sends no push alert either. The heartbeat catches this only if the ping comes after the critical step, so move the ping there.
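One way to put the ping after the critical step is to check the work's result before pinging, not just the exit code. Here is a minimal bash sketch that assumes the critical output is a file that must exist and be non-empty, such as a backup. The helper name ping_if_output_ok is hypothetical. Substitute whatever check actually proves your job did its work:

```bash
#!/usr/bin/env bash
# Hypothetical helper: ping only if the job's critical output exists and is
# non-empty. "Critical output is a file" is an assumption for illustration.
#   $1 output_file  file the job must have produced
#   $2 ping_url     the monitor's ping URL
ping_if_output_ok() {
  local output_file=$1 ping_url=$2
  if [[ ! -s "$output_file" ]]; then
    echo "critical output missing or empty: $output_file" >&2
    return 1   # no ping is sent, so the monitor alerts once grace runs out
  fi
  curl -fsS -m 10 --retry 3 "$ping_url"
}
```

From a script that defines this function, a call could look like /path/to/backup.sh && ping_if_output_ok /path/to/backup.sql.gz "https://ping.cronshield.com/<your-check-id>", where both paths are placeholders.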

What is the canonical heartbeat pattern?

The simplest possible heartbeat
# Run the job. Ping only if it exits 0.
/path/to/job.sh && curl -fsS -m 10 --retry 3 "https://ping.cronshield.com/<your-check-id>"

The && is load-bearing: it makes the ping fire only on a clean exit. With ;, or with the curl on its own line in a script that does not use set -e, the ping fires even after a failure, and you'd never know the job failed.
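If your monitor offers the separate fail endpoint described earlier, a wrapper can report failures explicitly instead of relying only on silence, while still passing the job's exit status through to cron or CI. Here is a minimal bash sketch. The function name run_with_heartbeat is hypothetical, and the "/fail" suffix on the ping URL is an assumed convention, so check your monitor's documentation for its actual fail URL:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command, then send a success ping on exit 0
# or a fail ping otherwise. The "/fail" URL suffix is an assumption.
#   $1     ping_url  the monitor's ping URL
#   $2...  the job command and its arguments
run_with_heartbeat() {
  local ping_url=$1; shift
  local rc=0
  "$@" || rc=$?   # run the job and capture its exit status
  if (( rc == 0 )); then
    curl -fsS -m 10 --retry 3 "$ping_url" || true        # a failed ping must not mask the job's status
  else
    curl -fsS -m 10 --retry 3 "$ping_url/fail" || true
  fi
  return "$rc"    # preserve the job's own exit status
}
```

Because the wrapper returns the job's own status, anything that inspects the exit code (cron mail, CI) still sees the real failure. A silent machine or disabled scheduler still shows up as a missing ping.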

Set the grace period above the job's expected duration: a daily job that normally takes 5 minutes should have a grace period of at least 15 minutes, so a slower-than-normal run doesn't trigger a false alarm. <your-check-id> in the URL above is a placeholder for the ID you get when you create a monitor.
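Picking a grace period is easier with real durations than with guesses. Here is a minimal bash sketch with two hypothetical helpers. time_job runs a command, appends "<start epoch> <seconds>" to a log file of your choosing, and returns the command's exit status. max_duration reports the longest run recorded:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for measuring job duration (illustrative names).
#   time_job LOG_FILE COMMAND [ARGS...]  -> appends "<start> <seconds>" to LOG_FILE
#   max_duration LOG_FILE                -> prints the longest recorded duration
time_job() {
  local log_file=$1; shift
  local start end rc=0
  start=$(date +%s)
  "$@" || rc=$?   # run the job, remembering its exit status
  end=$(date +%s)
  echo "$start $(( end - start ))" >> "$log_file"
  return "$rc"    # keep && chaining meaningful for the ping
}

max_duration() {
  # Column 2 holds the duration in seconds; print its maximum (0 if empty).
  awk '$2+0 > max+0 { max = $2+0 } END { print max+0 }' "$1"
}
```

Inside a script that defines these functions, time_job /path/to/durations.log /path/to/job.sh && curl -fsS -m 10 --retry 3 "https://ping.cronshield.com/<your-check-id>" keeps the && semantics, because time_job returns the job's own status. After a number of runs, set the grace period comfortably above what max_duration reports.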

Add a missed-run alert to this job

The free tier gives you a heartbeat endpoint and an email alert when an expected ping doesn't arrive. Paid tiers add log-aware diagnosis: the alert includes the last log line and a likely cause. The heartbeat receiver ships in an upcoming release; see the plans to learn what each tier adds.

Frequently asked questions

What is the difference between a heartbeat monitor and uptime monitoring?
Uptime monitoring probes your service from outside (HTTP checks, ICMP ping) to confirm it responds. A heartbeat monitor is the inverse: your job reports to the monitor. Uptime monitoring confirms a server is reachable; a heartbeat confirms a specific job ran and succeeded.
Should I set grace time to zero?
No. Most schedulers have some timing imprecision: GitHub Actions scheduled runs can be delayed by minutes, platform crons can fire a few seconds early or late, and a job that normally takes 10 seconds might occasionally take 2 minutes. Set grace time above the job's longest expected duration to avoid false alarms.