Set service objectives (SLOs)
- Availability: Node RPC reachable > 99.9% monthly.
- Freshness: Chain tip lag < 2 blocks for 99% of minutes.
- Webhook latency: p95 delivery < 15s; success rate > 99.5%.
- Lightning payment success: p50 < 2s, p95 < 8s; success rate > 98% in steady state.
What to monitor
Chain sync: headers/blocks behind, mempool health, peer count.
Resources: disk free > 15%, disk I/O, CPU/RAM, file descriptors.
Services: bitcoind/lnd/cln processes up, RPC/REST responsiveness.
Security: auth failures, config drift, unexpected open ports.
Lightning: channel states, inbound/outbound liquidity, failed HTLCs, gossip sync.
Webhooks: delivery success, retries, signature verification, end‑to‑end latency.
Dashboards
- Overview: tip height, lag, peers, mempool size, CPU/RAM/disk.
- Lightning: channels by state, liquidity heatmap, payment success, HTLC failures.
- Webhooks: sent, succeeded, failed, p50/p95 latency (see Bitcoin Flux).
Logs & events
- Ship
bitcoindand node logs to centralized storage with retention > 30 days. - Tag events with
env,service, andhostfor search. - Keep webhook delivery logs with ids and HMAC verification status.
Alerting policy
- Page (critical): node down, lag >= 6 blocks for 5m, disk < 5%.
- Ticket (warning): lag 2–5 blocks, disk < 15%, mem > 90% for 10m.
- Info: new release available, peer churn > threshold.
Incident runbooks
- Lagging node: check peers, bandwidth, disk I/O; restart service; if corruption, reindex from snapshot.
- Webhook failures: verify DNS/HTTPS, rotate secret, replay from dead‑letter queue; confirm idempotency on receiver.
- Lightning failures: inspect liquidity on path, attempt MPP, rebalance, or fallback on‑chain.
Recommended tooling
- Metrics stack (Prometheus + Grafana or managed equivalent).
- Log shipping (vector/fluentbit) to centralized search.
- Alerting to on‑call with escalation policy.
- Bitcoin Flux for invoice/webhook observability.
Tip: Track symptom alerts (lag, failed payments) rather than only cause alerts (process down). Users feel symptoms.