How to detect a cron job that runs longer than its interval
To detect a cron job that runs longer than its interval, send a start signal when the job begins and a success signal when it ends. A monitor that receives both can measure the actual run duration and alert when a job is still running past its expected duration. Without start signals, a monitor sees only the success pings, so it cannot tell a slow job from a fast one, or from one that has silently hung.
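The bookkeeping on the monitor side is small. The sketch below is not the monitor's actual implementation; it is a hypothetical `RunTracker` that records the /start time, turns the gap to the success ping into a run duration, and flags a run that is still open past an assumed `max_duration` (in seconds). Timestamps are passed in explicitly so the logic is easy to follow and test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunTracker:
    """Hypothetical monitor-side state for one check.

    max_duration is the longest a run is expected to take, in seconds.
    """

    max_duration: float
    started_at: Optional[float] = None
    last_duration: Optional[float] = None

    def on_start(self, now: float) -> None:
        # A /start ping opens a run.
        self.started_at = now

    def on_success(self, now: float) -> None:
        # A success ping closes the run; the gap is the actual run duration.
        if self.started_at is not None:
            self.last_duration = now - self.started_at
        self.started_at = None

    def is_running_long(self, now: float) -> bool:
        # Still "started" past the expected maximum: time to alert.
        return self.started_at is not None and now - self.started_at > self.max_duration


if __name__ == "__main__":
    tracker = RunTracker(max_duration=600)  # expect runs to finish within 10 minutes
    tracker.on_start(now=0)
    print(tracker.is_running_long(now=300))  # False: still within the window
    print(tracker.is_running_long(now=900))  # True: started 15 minutes ago, no success yet
    tracker.on_success(now=950)
    print(tracker.last_duration)  # 950
```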
Why do overlapping cron runs cause problems?
A cron job that runs longer than its schedule interval will be triggered again before the first run finishes. Depending on how the job is written, this can cause:
- Duplicate work: two runs processing the same data simultaneously.
- Resource contention: two heavy jobs competing for the same database or API, slowing both.
- Data corruption: two writers racing to update the same rows.
- Cascading overload: each slow run spawns another, until the system is saturated.
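Cron does not check whether the previous run is still active, so overlap is simple arithmetic. A minimal sketch, assuming run i is triggered at `i * interval` and every trigger fires; `peak_concurrency` and the sample durations are hypothetical:

```python
from typing import Sequence


def peak_concurrency(interval: float, durations: Sequence[float]) -> int:
    """Peak number of runs active at once, assuming run i starts at i * interval."""
    events = []
    for i, duration in enumerate(durations):
        start = i * interval
        events.append((start, 1))               # run begins
        events.append((start + duration, -1))   # run ends
    # Sort by time; at equal times the -1 (end) sorts before the +1 (start),
    # so a run that finishes exactly when the next fires does not overlap it.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


if __name__ == "__main__":
    print(peak_concurrency(60, [45] * 5))                   # 1: no overlap
    print(peak_concurrency(60, [90] * 5))                   # 2: steady overlap
    print(peak_concurrency(60, [50, 80, 130, 200, 290]))    # 3: each run slows the next
```

With a 60-minute interval, 45-minute runs never overlap and steady 90-minute runs keep two active at a time, while runs that each slow the next pile up further, which is the cascading case above.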
How do I send start and success signals?
Ping a /start endpoint when the job begins and the regular success URL when it finishes. A monitor that receives a /start and then no success within the expected duration alerts that the job is running long:
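```bash
#!/usr/bin/env bash
set -euo pipefail

# Signal the start of the run.
curl -fsS -m 10 --retry 3 "https://ping.cronshield.com/<your-check-id>/start"

# Do the work.
run_nightly_job

# Signal success. If run_nightly_job exits non-zero, set -e stops the script
# before this line, so no success ping fires.
curl -fsS -m 10 --retry 3 "https://ping.cronshield.com/<your-check-id>"
```

The same pattern in Python also reports failures to the /fail endpoint:

```python
import httpx

PING_URL = "https://ping.cronshield.com/<your-check-id>"


def run_with_heartbeat():
    # Signal the start of the run.
    httpx.get(f"{PING_URL}/start", timeout=10)
    try:
        run_nightly_job()  # your job's entry point
    except Exception:
        # Report the failure, then re-raise so the error still surfaces.
        httpx.get(f"{PING_URL}/fail", timeout=10)
        raise
    else:
        # Signal success only after the job completes. Keeping this outside
        # the try means a network error here is not reported as a job failure.
        httpx.get(PING_URL, timeout=10)
```

How do I prevent a second run from starting while the first is active?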
Taking a lock before the job starts, such as a lock file or a database row, prevents concurrent runs. In a shell script, flock is the standard approach:
```bash
# Run the job exclusively: a second cron invocation exits immediately if the lock is held.
flock -n /var/lock/nightly.lock /path/to/nightly.sh
```

Add a missed-run alert to this job
The free tier gives you a heartbeat endpoint and an email alert when an expected ping doesn't arrive. Paid tiers add log-aware diagnosis: the alert includes the job's last log line and a likely cause. The heartbeat receiver ships in an upcoming release; see the plans to learn what each tier adds.
Frequently asked questions
- What's the difference between a start signal and a success signal?
- The start signal tells the monitor when the job began. The success signal tells it when the job finished successfully. The monitor uses the gap between them as the actual run duration and can alert if the job is still 'started' past the expected maximum.
- Can I use flock on macOS?
- flock is a Linux utility (util-linux). On macOS, use lockfile from the procmail package or implement a lock via a database row, Redis key, or a file with a PID check. Kubernetes CronJobs use concurrencyPolicy: Forbid to prevent overlapping runs at the scheduler level.
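If the job is written in Python, the same non-blocking lock is available without the flock utility: the standard `fcntl` module wraps the flock(2) system call, which both Linux and macOS provide. This swaps in a kernel lock rather than the PID-check file mentioned above, and the kernel releases the lock automatically when the process exits, even after a crash. `run_exclusively`, its arguments, and the lock path are hypothetical; `fcntl` is Unix-only and not available on Windows.

```python
import errno
import fcntl
from typing import Callable


def run_exclusively(lock_path: str, job: Callable[[], None]) -> bool:
    """Run job() only if no other process holds the lock at lock_path.

    Returns True if the job ran, False if another run already holds the lock.
    """
    # Append mode creates the lock file if needed without truncating it.
    with open(lock_path, "a") as lock_file:
        try:
            # LOCK_EX | LOCK_NB: take an exclusive lock or fail at once,
            # mirroring `flock -n`.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            # EAGAIN/EWOULDBLOCK (EACCES on some systems) means the lock is held.
            if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK, errno.EACCES):
                return False
            raise
        job()
        return True
    # Leaving the with block closes the file, which releases the lock.


if __name__ == "__main__":
    ran = run_exclusively("/tmp/nightly.lock", lambda: print("running nightly job"))
    if not ran:
        print("previous run still active; skipping")
```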