“We are still fighting the Global War on Terror with Solaris 2.5.1”
–Andrew Seely
After a few weeks of holiday season distraction, I’ve finally finished writing Thursday’s notes. This is a time when I wish I could make it look like I posted the blog entry a while ago. 😉
Everything You Know About Monitoring Is Wrong
This was a talk where I was really glad to have the conference’s IRC channel handy. It was good to know that the places that I got lost in this talk also lost a number of other people. There were also a few times where a point in the talk got lost on me, but someone else in the IRC channel was able to put together a reasonable explanation of what Mazda was trying to say.
It seemed that Mazda’s premise was that if you take the quantity of all possible metrics that can be monitored for all of your systems, you only need to actually monitor about 40% of them in order to catch and/or proactively prevent more than 90% of the problems. To back this up, he presented a bunch of conclusions based on a simulation – including a couple slides full of very fancy and complex math. While the math certainly looked impressive, it didn’t seem to explain much in the short time it was on the screen and there didn’t seem to be anyone on the IRC channel that could explain it either. In fact, there seemed to be a distinct lack of information about the composition of the simulation, so it was difficult to judge whether it would apply accurately to typical cases.
Despite the gaps in the proof, the 40% claim does at least feel like it could be reasonable. I’m confident that I’m monitoring well under 40% of the metrics available from the systems that I’m directly responsible for. I think I’m catching the majority of problems before they become operationally impairing…
The Guru Is In: VoIP
Two years ago, this Guru, Robert Sparks, gave me the single piece of information from LISA that I have repeated the most at my job: It’s not enough simply to say that a device is SIP compliant. It is important to make sure that the device actually plays nicely with the other SIP devices that you have. For a given operation, the SIP standards often allow for multiple ways to accomplish the operation – you need to make sure your devices agree on how each of them actually does “it.” This has come into play multiple times at work where someone will suggest that we could use a given device because both it and our phone system “support SIP.” I’ve been able to adjust expectations appropriately: we need to try it out before we assume we’re home free.
Attendance at SIPit events is still a useful indication of the potential for interoperability. If you ask a vendor whether they attend SIPit and they either don’t attend or they don’t know if they attend, that’s a hint that they may lack clues.
Robert pointed out some efforts being made to deal with SIP passing through NAT. The issue is that SIP carries IP and port information at layer 5 and a lot of NAT devices don’t necessarily do the appropriate translations above layer 4. He mentioned “SIP Outbound,” which involves a TCP connection to a proxy server. There is also “TURN” (a.k.a. STUN relays) that involves remote port forwarders.
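To make the NAT problem concrete, here is a rough sketch (mine, not something Robert showed) of a simplified SIP INVITE sent from behind a NAT. The addresses, names and tags are all made up for illustration. A NAT that only rewrites the IP and port in the packet headers leaves every private address inside the message text untouched, and the little helper below just lists them.

```python
import ipaddress
import re

# A simplified, hypothetical INVITE from a phone at 192.168.1.20 behind a NAT.
# Every name, tag and address here is invented for illustration.
SIP_MESSAGE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.168.1.20:5060;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 1 INVITE\r\n"
    "Contact: <sip:alice@192.168.1.20:5060>\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 192.168.1.20\r\n"
    "s=-\r\n"
    "c=IN IP4 192.168.1.20\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)

# Loose pattern for dotted-quad IPv4 addresses in text.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def private_addresses(message):
    """Return (line, address) pairs for each private IPv4 address in the text."""
    found = []
    for line in message.splitlines():
        for candidate in IPV4.findall(line):
            try:
                addr = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # looked like an address but is not a valid one
            if addr.is_private:
                found.append((line, candidate))
    return found


if __name__ == "__main__":
    for line, addr in private_addresses(SIP_MESSAGE):
        print(f"{addr} still visible in: {line}")
```

The Via and Contact headers and the SDP o= and c= lines all still advertise the private address, so the far end would try to send signaling and audio to an address it cannot reach. That is the gap things like SIP Outbound and TURN are trying to close.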
It was mentioned that H.323 development is pretty much dead at this point, but existing H.323 stuff is still being maintained. I assume that SIP is being used as the replacement for H.323 going forward, although I don’t know if that was actually said because I didn’t write a specific note to this effect.
There was some discussion about what to use for an interface between one’s own VoIP system and the PSTN. A recommendation for Centrix Voice came up.
Something I had never noticed before was companies having their IVRs handle calls during the alerting phase of a call… For example, there is at least one major airline whose IVR interacts with incoming calls, having its outgoing audio masquerade as ringback. If you call them from a phone that displays ISDN call states (like my employer’s Cisco phones do), you’ll see that the phone still thinks the call is ringing while you’re happily(?) interacting with the IVR. Robert said that the companies don’t get charged for the call unless a human answers it, at which point the call moves to the connected state. The billing arrangement with the telco typically sets very high charges for completed calls. I guess that explains why it’s so hard to get out of IVR prison on some of those systems….
Command and Control: System Administration at U.S. Central Command
As Andrew explained it, Command and Control involves the systems that move information between the field and the command centers. This is literally a line of work where people can indeed be killed if the systems don’t get their jobs done.
One of the themes I got out of this talk was that working under pressure with stovepipe applications is not a fun situation. In Andrew’s case, it looks like a bunch of black boxes that each do a specific task and get connected to other black-box applications, like segments of a stovepipe. The problem is that each segment of the pipe was likely built as part of a separate government contract. Each contract may or may not have been well specified. The contractor (lowest bidder?) may or may not have been particularly good at implementing it. Each segment may have been built with or without knowledge of other stuff it would be connected to. During troubleshooting, the people who built a malfunctioning part of the pipe may or may not be reachable for help. Thus, the job involves a lot of trial and error with the heat turned up.
A funny moment during the talk also highlighted why it’s important to get outside your own walls. Andrew displayed a piece of code that he wrote to solve a problem he was having. The issue was that he needed to transmit a very large amount of data over unstable network links to restore a crucial system. Since it was not feasible to send the whole raft of data in one shot, he needed to split it up into several pieces. He wrote roughly a full slide’s worth of code to break the data into multiple pieces. This was the point in the presentation where a few audience members chimed in to point out the split utility. He was glad to have learned something during his trip to LISA!
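For the curious, the job itself is not complicated. What follows is my own minimal sketch of chopping a file into fixed-size pieces and gluing them back together. It is not Andrew’s code (I didn’t copy his slide down), and the function names and the “.partNNNN” naming are made up. The split utility does the same thing from the shell, for example with its byte-size option (split -b), and cat puts the pieces back together.

```python
import os


def split_file(path, chunk_size, out_dir):
    """Write consecutive chunk_size-byte pieces of path into out_dir.

    Returns the piece paths in order. The ".partNNNN" suffix is an
    arbitrary naming choice for this sketch.
    """
    pieces = []
    base = os.path.basename(path)
    with open(path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break  # end of file
            piece = os.path.join(out_dir, f"{base}.part{index:04d}")
            with open(piece, "wb") as dst:
                dst.write(data)
            pieces.append(piece)
            index += 1
    return pieces


def join_files(pieces, dest):
    """Concatenate the pieces, in the order given, into dest."""
    with open(dest, "wb") as dst:
        for piece in pieces:
            with open(piece, "rb") as src:
                dst.write(src.read())
```

The point of small pieces on a flaky link is that a dropped connection only costs you the piece in flight, not the whole transfer.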
If I had a nickel for every time I reinvented the wheel…