Notes from LISA 2006, Part 3

Here are my notes from Wednesday afternoon…

Site Reliability at Google/My First Year at Google

Tom Limoncelli opened his talk by discussing three ground rules. It turns out that ground rule #2 was the most important – there were certain details he simply could not talk about. In fact, rule #2 turned out to be the theme of the presentation. The result was a very high-level summary of some of the techniques used by Google to deliver their services in a fast and reliable manner. Much of it was info that I had seen at previous talks by Google folks, but a few items were new to me.

One tactic that I thought was particularly interesting was the separation between rolling out changes to the binary for a service and activating the new features contained therein. I’ve always done both operations in one step. At Google, they make a change to the binary and deploy the new binary with a command-line switch that keeps the new feature turned off. After they complete a successful rollout of the new binary and confirm that it’s still stable, they turn on the new feature. Thus, they can tell the difference between a problem caused by the new binary itself and one caused by the new feature.
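As a rough illustration of the idea (the flag name and the feature logic here are made up, not anything Google described), the new code path ships inside the binary but stays dormant behind a switch that defaults to off:

```python
# Sketch of decoupling binary rollout from feature activation.
# "--enable-new-feature" and the sort-by-length behavior are
# hypothetical stand-ins for illustration only.
import argparse

def handle_request(words, use_new_feature):
    # Both code paths live in the deployed binary; only the flag
    # decides which one actually runs.
    if use_new_feature:
        return sorted(words.split(), key=len)   # dormant new behavior
    return sorted(words.split())                # existing behavior

def main(argv=None):
    parser = argparse.ArgumentParser()
    # Rollout step 1: deploy the new binary with the flag off (default).
    # Rollout step 2: once it's confirmed stable, restart with the flag on.
    parser.add_argument("--enable-new-feature", action="store_true")
    args = parser.parse_args(argv)
    print(handle_request("dd abc e", args.enable_new_feature))

if __name__ == "__main__":
    main()
```

If a problem shows up in step 1, the binary is at fault; if it only shows up in step 2, the feature is.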

Another new piece of info for me was how they handle requests for a service that is not provided by the data center closest to the requesting party. The example that Tom used for this was Google Maps. Let’s say a user of Google Maps is network-closest to Data Center A, but that data center does not provide that service – the network-closest data center that could provide it is Data Center B. They direct the user’s browser to fetch the UI portions of each page from Data Center A, which then directs the traffic over their high-speed WAN backbone to Data Center B. This gets the UI back to the user as quickly as possible. The user’s browser fetches the map images directly from Data Center B over the public Internet. I would like to think that if I were in this sort of situation I would have thought of this tactic, but it was certainly an “oh, yeah, that’s cool” moment for me during the talk.
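The split can be sketched as page generation that points different asset URLs at different data centers (the hostnames below are invented for illustration; the real mechanics were not detailed in the talk):

```python
# Hypothetical sketch of the split-serving tactic. The nearest data
# center serves the lightweight page shell, while image URLs send the
# browser straight to the data center that actually runs the service.
NEAREST_DC = "dc-a.example.com"   # network-closest to the user
SERVING_DC = "dc-b.example.com"   # actually hosts the maps backend

def render_page(tile_ids):
    # UI assets come from Data Center A, so the shell renders quickly;
    # behind the scenes, A relays service traffic to B over the
    # private WAN backbone.
    head = f'<link rel="stylesheet" href="https://{NEAREST_DC}/maps/ui.css">'
    # Heavy map tiles are fetched by the browser directly from Data
    # Center B over the public Internet, skipping the hop through A.
    imgs = "".join(
        f'<img src="https://{SERVING_DC}/tiles/{t}.png">' for t in tile_ids
    )
    return f"<html><head>{head}</head><body>{imgs}</body></html>"
```

The design choice is that latency-sensitive but small content rides the fast private backbone, while bulky content takes the direct public path.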

Naturally, the usual suspects (GFS and MapReduce) made it into this talk as well. Those are, of course, very cool techniques too.

The Guru Is In: Spam

Doug Hughes gave some very brief opening remarks and fielded lots of questions on the spam arms race. He also made various notes on the content that was tossed out and/or discussed during the session. Rather than replicate the info here, you can take a look at the conference Wiki (for as long as it stays up, anyway).