I need to transfer about 40TB from one ZFS box to another because of a problem with the initial setup of box 1 that left it without redundant RAID, and it now has a failing disk. ZFS points this out quite nicely, but we have to get all that data off before the disk fails completely, even though it’s a backup box. (Apparently you can’t replace a disk in a pure stripe without ZFS having conniptions.)
So, what to do? zfs send/recv is a marvelous tool for this, especially combined with a pure network stream like ttcp or netcat. First, start up 20 listening processes on the receiver, each on a different port, piped into zfs recv. Second, take a recursive snapshot of all of the filesystems so you have a consistent starting point: zfs snapshot -r zpool1@mig1. Third, send all of these filesystems over in parallel using a quick bash script that increments the port number, roughly like the sketch below.
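Here’s roughly what that plumbing looks like. This is a reconstruction rather than the exact script I used: the destination pool name (tank2), the starting port (9000), the hostname "receiver", and the filesystems.txt list are placeholders, and netcat flag spellings vary between flavors (some want nc -l port instead of nc -l -p port, and some need -q 0 or -N to close the connection when zfs send hits EOF).

On the receiver:

#!/bin/bash
# One netcat listener per filesystem, each piped into zfs recv.
# filesystems.txt lists the child filesystems of zpool1, one per line,
# generated once with:  zfs list -H -r -o name zpool1 > filesystems.txt
port=9000
while read -r fs; do
  dest="tank2/${fs#zpool1/}"                 # map zpool1/foo -> tank2/foo
  nc -l -p "$port" | zfs recv -F "$dest" &
  port=$((port + 1))
done < filesystems.txt
wait

On the sender:

#!/bin/bash
# One zfs send per filesystem, each to its own port on the receiver.
port=9000
while read -r fs; do
  zfs send "${fs}@mig1" | nc receiver "$port" &
  port=$((port + 1))
done < filesystems.txt
wait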
All of these zfs sends are now running in parallel between two x4500s, each connected with 4x1G LACP links. It’s not going as fast as I would have thought, so I check things out. I look at the receiving-end throughput using my little sunaggrbps script (below) and it shows me something odd:
0 384
1 377033000
2 454
3 384
The first column is the interface index; the second is the number of bytes received over the last 3 seconds. We’re doing very well on e1000g1, but the others are seeing only a small amount of traffic, partly LACP control packets and other minor chatter.
Since LACP is fully dynamic, I have the opportunity to play around a little bit. So I shut down the switch port corresponding to e1000g1 on the switch side. When I do this, I see some different behavior:
1 0
2 219965912
3 112587165
0 1348
1 0
2 383779635
3 283032155
0 384
1 0
2 358813400
3 301547626
0 448
1 0
2 379984136
3 273314922
0 640
1 0
2 227159881
3 121488640
Much better! I’m now getting almost twice the throughput! There’s still only a smattering of traffic on e1000g0, and none on the shut-down port, but 2x is better than 1x in my book. I’ve talked with the switch vendor and unfortunately the LACP hashing algorithms are not adjustable, so there’s not much that I can do. Since this is an L2 link, it’s hashing on source and destination MAC addresses. I’m not sure exactly why the traffic lands the way it does, but you’d think it would even out a little bit more.
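To see why a single pair of hosts tends to end up on one link, consider a toy version of that hash. This isn’t the switch’s actual algorithm, just an illustration of how an XOR of the two MAC addresses behaves: with only one source MAC and one destination MAC on the wire, every one of the 20 streams hashes to the same member link.

# Toy L2 hash: XOR the last byte of the source and destination MACs,
# modulo the 4 links in the aggregate. The MAC bytes here are made up.
src=0x5a; dst=0x3c
echo $(( (src ^ dst) % 4 ))    # prints the same link index for every stream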
Moral: check your traffic on LACP links to make sure you really are getting your best balance and utilization. This is usually not a big deal when you have more than two hosts, but sometimes it’s worth checking.
sunaggrbps
#!/usr/bin/perl
# sunaggrbps: every 3 seconds, print the bytes received on each e1000g
# interface since the last pass, by diffing the rbytes64 kstat counters.
my $pass = 0;
my (@strs, @ostr);
while (my $d = `kstat -m e1000g -n mac -s rbytes64`) {
    @strs = split(/\s+/, $d);
    if ($pass++ > 0) {
        # In the split-up kstat output, tokens 3, 13, 23 and 33 are the
        # instance numbers for e1000g0-3, and the token six places later
        # is that instance's rbytes64 counter.
        foreach my $i (3, 13, 23, 33) {
            print "$strs[$i] " . ($strs[$i+6] - $ostr[$i+6]) . "\n";
        }
    }
    @ostr = @strs;
    sleep 3;
}
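To run it, just save it as sunaggrbps on the box you want to watch and make it executable:

chmod +x sunaggrbps
./sunaggrbps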