The CRE role at Cloudflare is a promotion from Senior Escalation Engineer — and a shift in scope. Our team still own the hardest technical problems, but now we are also accountable for the long-term reliability health of our most strategic customers, not just resolving individual incidents.
In practice, this means working with customers before things break — helping them define meaningful SLOs, identify architectural risks, and build systems that are resilient from the start. When something does go wrong, me and my CRE colleagues lead the technical response: pulling telemetry from Prometheus and Grafana, tracing distributed failures across Cloudflare's infrastructure, and coordinating across engineering teams until we have a root cause and a fix.
What makes this role different is the dual accountability — I'm simultaneously the customer's advocate inside Cloudflare, and Cloudflare's technical authority to the customer. Getting that balance right requires as much judgment as it does technical skill.
We also build internal tooling to eliminate recurring manual work — scripts and automation that make the next incident faster to resolve than the last.