SuperComputing 2011 thoughts

I’ve just spent the last 5.5 hours wandering around the trade show at SuperComputing 2011. The trends I noticed the last time SC was in Seattle (I think it was 2005) don’t seem to have changed much, except for one “interesting” development, which I’ll get to in a minute.

There appear to be two primary thrusts in supercomputing this year: Massively Parallel systems based on Intel or AMD chips, and Massively Clustered systems based on Intel or AMD chips. The confusing part is that vendors refer to both of these models as High Performance Computing (HPC).

Massively Parallel systems are presented to the computing world as if they were a single host. They have very high-speed interconnects and focus on low latency between all cores in the system. They tend to keep lots of disks “somewhere else” and access data via a SAN. These boxes can be reconfigured to run multiple jobs, but the focus is on huge parallel tasks that take, say, 4K cores at a go. Jobs are submitted via a special interface or via a batch queue system like Moab. This is the class of systems that shows up on the TOP500 supercomputer list.
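
For readers who haven’t used one of these batch systems, here is a minimal sketch of the kind of PBS/TORQUE-style script a Moab-scheduled machine will typically accept via msub or qsub. The job name, node/core counts, walltime, queue name, and solver binary are all made up for illustration; every site has its own conventions.

    #!/bin/bash
    #PBS -N big_parallel_job        # illustrative job name
    #PBS -l nodes=512:ppn=8         # 512 nodes x 8 cores = the "4K cores at a go" case
    #PBS -l walltime=12:00:00       # wall-clock limit for the run
    #PBS -q batch                   # queue name varies by site

    cd $PBS_O_WORKDIR               # run from the directory the job was submitted from
    mpiexec ./solver input.dat      # hypothetical parallel solver

The scheduler then holds the job until 4K cores are actually free and places it for you.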

Massively Clustered systems are individual computers (blades or 1- or 2-RU boxes) with a standard network connection between them. (At Convex we would have called them Loosely-Coupled, Tightly-Bolted.) They can run parallel jobs via MPICH2 or a related MPI implementation, but because of the slower interconnect they aren’t as good at that class of job. Jobs are submitted either through a head node or via a job/batch processing system like Condor or Moab. These are the kinds of clusters I’ve been working on since my days on the Farm with Paul Fink, Greg Rose, and Scott Kennedy at IBM TJ Watson Research in 1991-1992.
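
For concreteness, here is a minimal MPI “every rank reports in” program in C of the sort MPICH2 runs on either class of machine. It is a sketch, not anyone’s production code; rank counts and hostnames are whatever your cluster happens to give you.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);                   /* start the MPI runtime      */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* which rank am I?           */
        MPI_Comm_size(MPI_COMM_WORLD, &size);     /* how many ranks total?      */
        MPI_Get_processor_name(host, &len);       /* which node am I running on? */

        printf("rank %d of %d on %s\n", rank, size, host);

        MPI_Finalize();
        return 0;
    }

Compile with mpicc hello.c -o hello and launch with something like mpiexec -n 8 ./hello, or let Condor/Moab run that launch line for you inside a job.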

The interesting twist I mentioned earlier is that people are admitting Seymour Cray was right: vector processing is powerful. They’re admitting this via the adoption of GPU processing. There appears to be widespread adoption of NVIDIA’s methods and even a consortium developing around them. A lot of the custom and semi-custom computer-in-a-coffin systems (halfway between a blade and a standard 1U chassis) are adding GPUs.
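
To make the vector-processing point concrete, here is a minimal CUDA C sketch, assuming NVIDIA’s toolkit: one GPU thread per vector element, which is morally the same data-parallel idea the old vector units exploited. The kernel name, array size, and block size are arbitrary choices for illustration.

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    /* One GPU thread per element: c[i] = a[i] + b[i]. */
    __global__ void vec_add(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main(void)
    {
        const int n = 1 << 20;                    /* 1M elements, arbitrary size */
        size_t bytes = n * sizeof(float);

        float *ha = (float *)malloc(bytes);
        float *hb = (float *)malloc(bytes);
        float *hc = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) { ha[i] = 1.0f; hb[i] = 2.0f; }

        float *da, *db, *dc;                      /* device copies of the vectors */
        cudaMalloc((void **)&da, bytes);
        cudaMalloc((void **)&db, bytes);
        cudaMalloc((void **)&dc, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        int threads = 256;                        /* threads per block */
        int blocks = (n + threads - 1) / threads; /* enough blocks to cover n */
        vec_add<<<blocks, threads>>>(da, db, dc, n);

        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f\n", hc[0]);             /* expect 3.0 */

        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(ha); free(hb); free(hc);
        return 0;
    }

Swap the one-line kernel body for a real stencil or solver inner loop and you have the basic shape of what these GPU-augmented boxes are being sold to run.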

The big message at almost every booth was “GREEN!”, whether in power, cooling, or a combination of the two. Everyone is pushing higher density, lower power, and better cooling systems/solutions. Or, if your server room is in northeastern Canada, you just push the fact that you have glaciers outside and cooling is cheap there.

At least two booths were selling in-rack radiators for cooling at the rack level. Nice if you have the chilled water available to plumb them.

Two other trends I observed this year are the commoditization of the Lustre filesystem and the renewed popularity of submerging hardware in a cooling liquid (I saw water, oil, and mineral oil; I’m sure I missed several other options). Probably the most interesting setup I saw was the mineral oil one. They took a standard system, replaced the CPU cooling contact goop with a mineral-oil-resistant option, and submerged the box, jacks, connectors, and all. They have to do special work to submerge drives, but the rest is pretty much off the rack, as it were.

Lustre is technically open source, but about a third of the booths I looked at that were selling a product (not to be confused with the research-institution booths, which were selling themselves) were selling either specialized hardware or software solutions to improve Lustre performance and stability.

While it is heartening to know I’ve been “doing HPC right” for 20 years, I was hoping to see some genuinely new and innovative methods. What was on proud display at this year’s trade show was more of the same, just with more oomph behind it.