
Cloud & Infrastructure
Site Reliability Engineering: How Google Runs Production Systems
Niall Richard Murphy, et al. · Published 2016
The foundational text on SRE, explaining how to apply engineering principles to operations to build ultra-scalable and reliable systems.
As an Amazon and Bookshelf Associate, I earn from qualifying purchases.
Why It's On My Shelf
The concept of an error budget reframes reliability as a business decision rather than an engineering argument. A team sets an explicit reliability target and treats the remaining allowance for failure as a budget it can spend on shipping faster. That way, teams stop debating ideology and start aligning on risk tolerance. CIOs often reference this model when balancing uptime commitments with innovation pressure in cloud-native environments.
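The arithmetic behind an error budget is simple: the budget is 1 minus the SLO, applied over a measurement window. Below is a minimal sketch that assumes an availability SLO expressed as the fraction of time a service is up over a fixed window. The function names `error_budget` and `can_release` are hypothetical illustrations, not code from the book, and the release gate is a deliberately simplified version of the idea that feature launches pause once the budget is spent.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Return the downtime an availability SLO allows over a window.

    slo is a fraction, e.g. 0.999 for "three nines".
    The budget is simply (1 - slo) of the window.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    # timedelta * float rounds to the nearest microsecond.
    return window * (1.0 - slo)


def can_release(slo: float, window: timedelta, observed_downtime: timedelta) -> bool:
    """Hypothetical release gate: keep shipping only while budget remains."""
    return observed_downtime < error_budget(slo, window)


if __name__ == "__main__":
    month = timedelta(days=30)
    print(error_budget(0.999, month))  # 0:43:12
    print(can_release(0.999, month, timedelta(minutes=10)))  # True
    print(can_release(0.999, month, timedelta(hours=1)))  # False
```

With a 99.9% target over 30 days, the budget comes to 43 minutes 12 seconds of downtime. A team that has used only 10 minutes can keep releasing. A team that has burned an hour has overspent and, under this simplified policy, would shift effort toward stability.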
Cloud & Infrastructure
Cloud FinOps: Collaborative, Real-Time Cloud Financial Management
J.R. Storment & Mike Fuller
The definitive guide to the practice of FinOps, bringing financial accountability to the variable spend model of the cloud.