2023 · Architect
Internal Observability Platform
Metrics, traces, logs — with SLOs people actually use.
Led the design and rollout of an internal observability platform. It runs traces on OpenTelemetry, metrics on Prometheus, and logs on Loki, with SLO burn-rate alerts and a service catalog that ties each SLO to its on-call runbooks.
Highlights
- SLO burn-rate alerts replaced noisy threshold alerts
- Service catalog integrated with on-call
- Team-level reliability scorecards
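As a rough illustration of why burn-rate alerts are quieter than fixed thresholds, here is a minimal sketch of a two-window burn-rate check. It is not the platform's actual rule set: the function names, the window pairing, and the 14.4 threshold (a commonly cited value for a 1-hour window against a 30-day budget) are assumptions made for this example.

```python
# Hypothetical sketch of a multi-window SLO burn-rate check.
# Names and the default threshold are illustrative assumptions.

def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0  # no traffic in the window, nothing burned
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return (errors / total) / error_budget


def should_page(
    long_window: tuple[int, int],
    short_window: tuple[int, int],
    slo_target: float,
    threshold: float = 14.4,  # assumed value, not taken from the platform
) -> bool:
    """Page only when both windows burn fast.

    The long window (errors, total) shows the burn is significant;
    the short window shows it is still happening right now.
    """
    long_burn = burn_rate(*long_window, slo_target)
    short_burn = burn_rate(*short_window, slo_target)
    return long_burn >= threshold and short_burn >= threshold


if __name__ == "__main__":
    # Sustained incident: both windows are well over budget.
    print(should_page((20, 1000), (30, 1000), slo_target=0.999))
    # Already recovering: the short window has calmed down, so no page.
    print(should_page((20, 1000), (1, 1000), slo_target=0.999))
```

Requiring both windows to exceed the threshold is what suppresses short spikes and already-resolved incidents, which a single static error-rate threshold would still page on.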
Stack
OpenTelemetry, Prometheus, Grafana, Loki, Terraform