Through a career spanning over twenty years at ISPs, eCommerce shops, and technology giants like Twitter and Stripe, I have learned that complex systems will inevitably fail. To repair and expand these systems, we need humans! The goal is to create adaptive capacity, to be resilient.
After joining Twitter in 2012 as one of the first Site Reliability Engineers (SREs), I leaned into observability. While I still consider this work important, I concluded that charts don’t solve problems, people do. This led me to invest more in learning about resilience engineering and adaptive capacity.
- A three-part series on automation:
I speak regularly at conferences across the country, promoting resilient, thoughtful, and empathetic operations.
Having spent most of my career on call, I believe that organizations can greatly improve the happiness and effectiveness of employees and customers by investing in resilience!