Notes from the Austin Sun User Group June meeting: Virtualization and Solaris

Register for the website! http://www.austinsug.org/Members.html

If you want to be part of the leadership board, contact Jeff (jeff.blanchard at sun.com).

Thanks to Sigma Solutions for providing the food and drink.

Next meeting: August 15, 2007. Solaris 10 DTrace by Jarod Jenson, chief architect at Aeysis. Jarod is the resource Sun brings into companies that need DTrace help. Very likely a hands-on meeting.

This meeting was about the different virtualization technologies in Solaris and on Sun SPARC hardware. The presentation was given by Scott Gaspard of Sun.

Years ago, in the land of dinosaurs, everyone stuck everything on a few large systems. Later, people replaced the big iron with cheap x86-based hardware. The problem was that apps couldn't scale beyond the confines of a single box, so you ended up with server sprawl: dozens of vendors, no real remote management, operating systems all over the place, and lots of hardware provisioned for a certain type of load that may never have fully utilized the resources on the system. Somewhere along the way, someone decided you could stick multiple operating systems and applications on a single piece of physical hardware and increase the utilization of the underlying hardware. That translated to lower cost, because you needed fewer resources (cooling, power, people) to run the systems.

Now, people are deploying more and more virtual systems, as a way of deploying a service rather than deploying a system.

There are different virtualization technologies. Some do it through hardware partitioning, others with virtual machines (such as Xen and VMware), and still others with OS virtualization (Solaris 10 zones/containers, FreeBSD jails).

Each of these virtualization technologies operates at a different layer in the stack, and each brings its own benefits in terms of virtualization and partitioning. From the top of the stack down:

Application virtualization

Containers (create the illusion for applications that they're in their own dedicated OS)

Virtual machines (create the illusion that each OS runs on its own dedicated hardware), e.g. Logical Domains

Hardware partitioning (creates secure, electrically isolated domains within a single server), e.g. hardware domains

As you move up the stack, you trend toward flexibility of physical resources, whereas as you move down the stack, you trend toward isolating an application in a way that prevents other applications from affecting it.

With logical domains (LDoms), a virtualization layer (a hypervisor) is created in firmware that virtualizes the hardware and provides a control domain from which you manage guest domains. The control domain sees the physical hardware; the guest domains do not. In the control domain, you create hardware "services" that let a logical domain reach the physical hardware through virtual interfaces. With Sun's sun4v architecture, they've been able to present logical domains as sun4u, so older operating systems (or even non-Solaris ones) can run under the control domain. LDoms are supported on the T1000/T2000, Netra T2000, etc. (T1-based SPARCs).
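As a sketch of what that control-domain setup looks like (the service names match the defaults used in the demo later; the net-dev value e1000g0 is an assumption):

ldm add-vds primary-vds0 primary (virtual disk service)
ldm add-vsw net-dev=e1000g0 primary-vsw0 primary (virtual switch backed by a physical NIC)
ldm add-vcc port-range=5000-5100 primary-vcc0 primary (virtual console concentrator; this is where the telnet ports in the demo come from)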

Each guest domain appears as a real physical machine, all the way down to the "ok" prompt. You can do everything you would do with a real machine.

You can have up to 32 LDoms per server (on a T1-series CPU). Ideally you would not run services in the control domain; it's there to support the other domains, and that's it. In the future you should be able to migrate domains from system to system, as Xen does. It's not there yet: right now you have to shut the domain down, transfer it, and bring it back up.
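Until then, the manual process is roughly this (assuming the guest's disk backend is on storage both machines can reach):

ldm stop-domain ldg3 (shut the guest down)
ldm unbind-domain ldg3 (release its resources)
(recreate the same domain configuration on the target machine, pointing at the shared backend)
ldm bind-domain ldg3 (bind resources on the new box)
ldm start-domain ldg3 (bring it back up)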

VMware implements a virtualization system in software. It runs as a software application on top of the hardware and creates an idealized environment (a set of simple interfaces for CPU, memory, NIC, disk, etc.) that it translates onto the underlying hardware. This gives you flexibility, because your entire virtual machine is encapsulated in a format that can run almost anywhere. The downside is that it only runs on platforms the virtualization layer has been written for. The virtualization also comes at a cost: there's enough overhead in the virtualization layer that you have to pay attention to how many systems you provision on the hardware. The idealized interface gives your OS a set of standard virtual hardware interfaces to deal with.

VMotion is VMware's live VM migration tool. The caveats: you have to use the same CPU type and stepping, and you must have shared storage so your images are available to all the physical systems that images will be transferred between. This also lets you build a high-availability environment where VMware moves your images when a hardware error is detected.

Xen implements a virtualization layer that is more similar to LDoms than to VMware, though Xen implements its hypervisor in software. With Xen, all your operating systems have to be aware that there's a hypervisor in place to interface with the hardware. Since the OSes are aware of the hypervisor, you get more performance, because you're not doing the instruction scanning that VMware does.

You can get Xen for Solaris in the OpenSolaris community release.

Paravirtualization enables high performance: it can support up to 100 active VMs on a single modern system, live migration of domains, and checkpoint/restart (suspend your OS, move it, and have it continue where it left off).
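With Xen's xm tool, checkpoint/restart and live migration look something like this (domain and host names are made up):

xm save domU /var/xen/domU.chkpt (suspend the OS and write its state to disk)
xm restore /var/xen/domU.chkpt (pick up where it left off)
xm migrate --live domU otherhost (move the running domain to another machine)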

Para vs. full virtualization: full virtualization is like VMware; you're emulating the entire system in a virtual machine. In paravirtualization, you're giving the OS a way to get to the underlying hardware through virtual interfaces.

Solaris zones: virtualized operating system environments, each looking like a Solaris instance. This is like FreeBSD jails. From the global zone, everything looks like it's running in the global zone. From within a zone, you cannot see outside of it, so you can't see what else is going on on the box. Since there are no hardware changes or requirements, you can run thousands of zones on a single global zone instance (up to 140,000 on an E25K). With zones, you're sharing the global hardware resources; there isn't the kind of hardware segregation you get with LDoms. With containers, you can assign dedicated hardware resources to that container. Within a zone there's a limited set of privileges, and certain system calls are forbidden inside a zone/container.

http://developers.sun.com/solaris/articles/application_in_zone.html has more information on zone security and the process space.
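For comparison with the ldm workflow later, a minimal zone setup goes something like this (zone name and paths are made up):

zonecfg -z webzone 'create; set zonepath=/zones/webzone' (define the zone)
zoneadm -z webzone install (populate its root file system)
zoneadm -z webzone boot (boot it)
zlogin -C webzone (attach to its console for first-boot configuration)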

Each zone is allocated its own root file system. A sparse root zone inherits some file systems from the global zone via loopback file systems; a whole root zone is a writable copy of the whole file system, isolated to that zone. With sparse root zones, all zones have to be at the same patch level as the global zone; if you need zones at different patch levels, you'll need whole root zones. ZFS clones muddy the water a bit: you can clone a file system, which makes it writable (copy-on-write), and then apply patches to that zone without greatly increasing your disk space requirements.
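The ZFS trick goes roughly like this (pool and zone names are assumptions): snapshot a template zone's file system, clone it (copy-on-write, so nearly free), and point a new zone at the clone:

zfs snapshot pool/zones/template@golden (freeze the template)
zfs clone pool/zones/template@golden pool/zones/newzone (writable, copy-on-write clone)
(configure the new zone with its zonepath on the clone, then patch it independently)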

Zone cloning: duplicating zones for fast prototyping or provisioning.

Zone moving: moving a zone to a different physical path.

Zone migration: migrating zones to different physical systems, or prepackaging zones and deploying them from a central repository as needed. (Rough examples of all three below.)
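These map onto zoneadm subcommands roughly as follows (zone names are made up; the zone has to be halted first, and a new zone has to be configured with zonecfg before cloning):

zoneadm -z newzone clone oldzone (duplicate an existing zone's files)
zoneadm -z myzone move /zones2/myzone (relocate the zonepath)
zoneadm -z myzone detach (on the source host)
zoneadm -z myzone attach (on the target host, after moving the zonepath across)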

Solaris 10 containers: reduce hardware and software costs, maintain service levels, maintain architectural flexibility.

Logical domains: http://www.sun.com/ldoms. The Sun LDom Administration Guide is excellent for getting up and running. The Sun Blueprint has a good description of what LDoms are, but sucks for actual implementation.

Virtualization solutions: http://www.sun.com/datacenter/consolidation/virtualization/
Virtualization Learning Center: http://www.sun.com/solaris/virtualization/

Scott went on to give a demo of LDoms.

There’s only one command you need to know for working with LDoms: ldm.

LDoms is at version 1.0, but it's effectively still beta; according to Scott, it's not stable enough for production yet. LDoms only work on Sun SPARC hardware that is CMT-enabled (so, the T1000/T2000 series).

ldm list-domain (list running domains)
ldm list-domain -p (machine-readable format)
ldm list-domain -l primary (long listing of a single domain)

Creating a logical domain:
ldm add-domain ldg3 (create the domain)
ldm add-vcpu 4 ldg3 (add 4 threads to ldg3)
ldm add-memory 1024m ldg3 (add 1 GB of memory)
ldm add-vnet vnet3 primary-vsw0 ldg3 (virtual network device on the primary virtual switch)
ldm add-vdsdev /dev/dsk/c0t3d0s2 vol2@primary-vds0 (export a disk slice through the virtual disk server)
ldm add-vdisk vdisk1 vol2@primary-vds0 ldg3 (attach that volume to the domain as vdisk1)
ldm set-var auto-boot?=true ldg3
ldm set-var boot-device=vdisk1 ldg3
ldm bind-domain ldg3 (bind resources to the domain)
ldm start-domain ldg3
ldm list-domain
telnet localhost 5002 (connect to the virtual console and get to the ok prompt)

Then, at the ok prompt:
show-disks
nvalias vdisk /virtual-devices@100/channel-devices@200/disk@0
show-nets
nvalias vnet /virtual-devices@100/channel-devices@200/network@0
nvstore
boot vnet - install (boots and installs from JumpStart)

With the network interfaces, the primary domain creates a virtual switch. I asked if you could snoop the traffic of virtual interfaces from the primary domain, but Scott wasn't sure. Worth a try.
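If the virtual switch is plumbable as a network device in the control domain, something like this might do it (untested, per the discussion above; the vsw0 name is an assumption):

ifconfig vsw0 plumb (make the virtual switch visible as an interface)
snoop -d vsw0 (capture traffic crossing it)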

primary-vds0 is the interface to the virtual disk controller; vdisk1 is the virtual disk attached to that virtual controller.

For the vnet, how do you know what your MAC address is? You can type "banner" at the ok prompt. The MAC address is likely derived from the hostid of the primary domain and the hostname of the logical domain.
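From the control domain, ldm should be able to show it too; list-bindings dumps the resources bound to a domain, which ought to include the vnet MAC (from memory, so verify):

ldm list-bindings ldg3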

The need to set up the nvalias entries from the ok prompt is likely a beta issue. The question was raised whether ldm should be doing this; Scott agrees, but he hasn't figured out whether it already does (or whether there's an automated way when you create the domain with ldm).

In the current implementation, you _have_ to boot off of a JumpStart server after bringing up a new logical domain. There's no way yet to provide pre-built setups (like with zones) to a logical domain for cheap and easy provisioning.

You can move CPUs live between logical domains. You cannot yet move memory, network, or disk between logical domains without shutting the domains down.
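Moving CPUs live is just a remove on one side and an add on the other (domain names made up):

ldm remove-vcpu 2 ldg2 (take two threads away from a running domain)
ldm add-vcpu 2 ldg3 (hand them to another, also live)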

You can create multiple virtual switches.

You can create a zfs filesystem, assign that as your virtual disk, and then assign that to a logical domain.
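A sketch of that, using a ZFS volume as the disk backend (pool and names are made up):

zfs create -V 10g ldompool/ldg3disk (create a 10 GB ZFS volume)
ldm add-vdsdev /dev/zvol/dsk/ldompool/ldg3disk vol3@primary-vds0 (export it through the virtual disk server)
ldm add-vdisk vdisk3 vol3@primary-vds0 ldg3 (attach it to the domain)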