The title is succinct, but in practice an organization’s “observability” efforts span a number of disciplines. This document aims to compress that breadth of topics into a short set (fitting on one printed page) of best-of-breed write-ups that avoid any ties to vendors or specific implementations.

Periodic Reading

Enjoy mailing lists and such? Here are some good ones:

  • Monitoring Weekly is exactly what it sounds like.
  • Thai Wood’s Resilience Roundup summarizes papers in the resilience space and adds special insight from his combined tech and EMT background.
  • Lex Neva’s SRE Weekly frequently hits topics in or adjacent to observability.

My Contributions

As a longtime advocate of observability, I hope it’s OK to add a few bits of my own. First, my definition:

Observability is a quality of software, services, platforms, or products that allows operators to understand how systems are working. Observability makes investigating and diagnosing problems easier; the more observable a system, the more tools we’ve made available to diagnose problems or understand behavior.

And some of my works:

Honorable Mention

I’ve not read all of these yet, but I’ve seen them referenced often enough to think they are worth a mention.

Seeking SRE is a supplement to Google’s SRE book, aimed at how the SRE role can be applied to organizations that aren’t Google.