As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today's WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today's WSCs on a single board.
Table of Contents: Introduction / Workloads and Software Infrastructure / Hardware Building Blocks / Datacenter Basics / Energy and Power Efficiency / Modeling Costs / Dealing with Failures and Repairs / Closing Remarks
An awesome book explaining not only the physical side of the datacenter, but also the requirements of the software infrastructure that needs to be built on top for developers to efficiently leverage the datacenter resources.
This book is a must-read for any infrastructure engineer.
I had read excerpts from an older copy of this book. I always wanted to finish reading it.
This book may be easy to comprehend, perhaps because it is meant for engineers and the like, but a few of the papers it cites weren't as easy to follow, at least not for me.
Chapter 4 was unusual in that it discusses construction standards, power ratings, and heat exchangers, and includes an image of a CFD analysis of an entire data center wing. The idea of applying ML to data center cooling and related optimization was interesting.
One can try building a pizza-box server out of SBCs (some may only be available in certain regions) connected over a LAN. It is an amazing way to try out certain physics simulations and distributed processing, such as finding length-limited plaintexts for cryptographically insecure hashes from known hash functions. I wanted to try out tools like Spark and Cassandra to store streamed sensor data and retrieve it in batches at a lower frequency.
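A minimal sketch of the hash experiment described above, assuming MD5 as the insecure hash and lowercase ASCII as the candidate alphabet (both are my choices for illustration, not something the book prescribes). The names `split_work`, `search_partition`, and `crack` are hypothetical. The search space is split by first character so that each board could take a disjoint slice; here the slices are simply searched one after another on a single machine.

```python
import hashlib
import itertools
import string

ALPHABET = string.ascii_lowercase  # assumed candidate alphabet


def split_work(num_workers):
    """Deal first characters round-robin so every worker gets a disjoint slice."""
    return [ALPHABET[i::num_workers] for i in range(num_workers)]


def search_partition(target_hex, first_chars, max_len):
    """Try every plaintext of length 1..max_len starting with one of first_chars."""
    for first in first_chars:
        for tail_len in range(max_len):
            for tail in itertools.product(ALPHABET, repeat=tail_len):
                candidate = first + "".join(tail)
                if hashlib.md5(candidate.encode()).hexdigest() == target_hex:
                    return candidate
    return None


def crack(target_hex, max_len=4, num_workers=4):
    """Search all partitions; returns the plaintext or None if nothing matches."""
    # On a real cluster each partition would go to a different board over
    # the LAN. This sketch runs them sequentially in one process.
    for first_chars in split_work(num_workers):
        found = search_partition(target_hex, first_chars, max_len)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    target = hashlib.md5(b"cab").hexdigest()
    print(crack(target, max_len=3))
```

Partitioning by first character keeps the workers independent, so no coordination is needed beyond handing out slices and collecting the first hit. The cost grows exponentially with `max_len`, which is why the plaintext length has to be limited.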
Notes:
- Disks are magnetic storage units.
- Topology is used in the context of networks.
I took my time to finish and truly understand the contents of this book because I was reading it for work. Our purpose was slightly different: to build a computational simulation server, obviously much smaller than warehouse-scale machines.
However, I found this book very comprehensive, and perfect for beginners in parallel computing. It covers issues I didn't think of at the beginning of our project, such as heat generation and power supply (these are more relevant to hardware and I only know a bit of numerical computing).
There is helpful "management" content as well, such as how to calculate ROI and Capex vs Opex. I totally recommend this book for any engineer who suddenly finds herself having to build a tiny "cluster" at work and has to start by googling shit up…
That said, because this book is for beginners, you'll need to start digging through the references listed at the back and read more from there.
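For a tiny cluster, the Capex vs Opex and ROI arithmetic mentioned above can be sketched roughly as follows. This assumes straight-line amortization with no interest, which is a simplification of my own and not necessarily the model the book uses. The function names and all figures are hypothetical.

```python
def monthly_cost(capex, lifetime_months, monthly_opex):
    """Capex spread evenly over the hardware lifetime, plus recurring opex."""
    return capex / lifetime_months + monthly_opex


def simple_roi(total_gain, total_cost):
    """Return on investment as a fraction of total cost."""
    return (total_gain - total_cost) / total_cost


if __name__ == "__main__":
    # Hypothetical figures: hardware bought up front and kept for 4 years,
    # plus power and maintenance paid every month.
    per_month = monthly_cost(capex=12000, lifetime_months=48, monthly_opex=150)
    total = per_month * 48
    print(per_month, total, simple_roi(total_gain=30000, total_cost=total))
```

Amortizing the purchase price puts the one-time and recurring costs on the same monthly footing, which makes it easier to compare buying hardware against renting it.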
A lot of it went over my head, but I think it’s a good intro to understanding some of the hardware components that go into a data center, and the trade-offs that are considered (i.e., why low-end CPUs are used over more expensive ones, etc.). The chapters about the economics of data centers and cooling, with all of their metrics, were a bit too detailed for me (they cover a lot of electrical engineering concepts and provide diagrams that were mostly gibberish to me), but again, that’s because I have 0 prior knowledge. The actual software chapters aren’t bad, but they are mostly useful for understanding the benefits of creating fault-tolerant software, how it helps data centers get fixed up as well, etc.
Overall, not bad. I think I would have gotten a lot more out of it if I understood the hardware pieces more; maybe I’ll read it again once I do.
The book presents valuable general info about data center management and can be a starting point. However, it already has a few parts that are outdated and it would need a new version to continue being relevant.
The book explains more or less how Google designed their data centers and what challenges they are facing in running the DCs. Some facts are interesting, but overall it's more like a report with many references to other white papers.
Feels more like a report (short and dense with statistics) than a book. There are a lot of insights into how Google operates, but it is a bit too short.