The following was written by my student, Timothy Trindle. I asked him to prove or disprove a few conjectures and assertions about how NFS with default ONTAP filesystem configurations suffers when massive numbers of very small files are involved. Tim devised the tests and executed them with minimal need for guidance from me. The following are his results.
A while ago we noticed that backups from our NetApp filers were running so slowly that Bacula, our backup system, would take several weeks to complete them. There were two notable differences between these and our other backups. First, we were reading through an NFS mount, since NetApps cannot run the Bacula file daemon. Our Bacula director is therefore configured to read the files using the local file daemon and an automounted directory. We suspected that something might be wrong with the NFS setup and decided to run some tests to help diagnose the problem. Second, the data on our NetApps tended to be spread across many very small files, so we also wanted to investigate the effect of file size on read times over the NFS mounts.
To test these variables, working from the machine that runs our backups, I created four directories in the transitory volume of one of the NetApp machines: test_large, test_med, test_small, and test_vsmall. Each of these directories contained 4 GB of data split across a different number of files. test_large contained one 4 GB file, test_med contained 100 files of 40 MB each, test_small had 10,000 files of 400 KB each, and test_vsmall had 1,000,000 files of 4 KB each. The files in test_vsmall had to be split between 2 sub-directories. The files were generated by running dd with input from /dev/urandom. The data was randomized to avoid potential compression over NFS and to emulate normal user data.
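The layout described above can be reproduced with a short script. This is only a sketch in Python rather than the dd commands actually used: the function name make_files, the sub0/sub1 directory names, the file naming pattern, and the choice of decimal units (1 KB = 1000 bytes) are all assumptions made for illustration.

```python
import os
from pathlib import Path

# (directory name, file count, bytes per file), taken from the text.
# Decimal units are an assumption; every row totals 4,000,000,000 bytes.
LAYOUT = [
    ("test_large", 1, 4_000_000_000),
    ("test_med", 100, 40_000_000),
    ("test_small", 10_000, 400_000),
    ("test_vsmall", 1_000_000, 4_000),
]


def make_files(root, count, size, subdirs=1, chunk=1 << 20):
    """Create `count` files of `size` random bytes under `root`.

    With subdirs > 1 the files are spread round-robin over
    sub-directories named sub0, sub1, ... (hypothetical names).
    """
    root = Path(root)
    if subdirs > 1:
        parents = [root / f"sub{k}" for k in range(subdirs)]
    else:
        parents = [root]
    for parent in parents:
        parent.mkdir(parents=True, exist_ok=True)

    for i in range(count):
        path = parents[i % len(parents)] / f"file_{i:07d}"
        with open(path, "wb") as fh:
            remaining = size
            # Write in chunks so a 4 GB file never sits in memory at once.
            while remaining > 0:
                n = min(chunk, remaining)
                fh.write(os.urandom(n))  # random data, like /dev/urandom
                remaining -= n
```

Nothing is written until make_files is called. For example, make_files("/some/mount/test_vsmall", 1_000_000, 4_000, subdirs=2) would mirror the test_vsmall case, where the mount path is a placeholder.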
After creating the directories, I wrote a script that recorded (using /usr/bin/time) how long it took to dd all of the files in each directory over a given NFS mount, sending the output to /dev/null. This operation was repeated for seven different NFS mount configurations: the dedicated backup channel (/n/chiken-b), the front channel which everything else uses (/g/tial), a manual mount with default parameters, a manual mount with the rsize parameter set to 65536 (twice the default), a manual mount with the rsize parameter set to 16384 (half the default), a manual mount with timeo set to 300 (30 seconds), and a manual mount with the noatime parameter.
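The measurement itself can be approximated in a few lines. The sketch below is a Python analogue of the dd plus /usr/bin/time approach, not the script that was actually run; the function name time_reads and the 1 MiB read buffer are assumptions. It reads every file under a directory, throws the data away, and reports both wall-clock time and user + system CPU time.

```python
import os
import time


def time_reads(root, bufsize=1 << 20):
    """Read every file under `root`, discard the data, return timings."""
    cpu_start = os.times()
    wall_start = time.perf_counter()
    nfiles = 0
    nbytes = 0

    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            # buffering=0 gives raw reads, closer to what dd does.
            with open(os.path.join(dirpath, name), "rb", buffering=0) as fh:
                while True:
                    block = fh.read(bufsize)
                    if not block:
                        break
                    nbytes += len(block)  # data is counted, then dropped
            nfiles += 1

    cpu_end = os.times()
    return {
        "files": nfiles,
        "bytes": nbytes,
        "wall": time.perf_counter() - wall_start,
        # user + system CPU seconds consumed by this process
        "cpu": (cpu_end.user - cpu_start.user)
        + (cpu_end.system - cpu_start.system),
    }
```

Calling time_reads once per test directory and once per mount point would give one row of results for each of the seven configurations.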
For reference, the backup machine on which I ran the tests has a quad-core Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz and 24 GB of RAM.
This first graph shows the total (system + user) time in seconds taken to read each directory over each NFS mount. The time it takes to read 1,000,000 4 KB files absolutely dwarfs the times for all other file sizes. This confirmed our suspicion that the large number of small files was slowing down our backups. The read time across the manual mounts is very consistent, just above 1200 seconds for all five. This indicates that fiddling with parameters on the NFS mount has little effect on the read time for large numbers of small files.
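To put the test_vsmall figure in per-file terms, the arithmetic can be written out. This is a rough sketch that takes 1200 seconds as a round stand-in for the "just above 1200 seconds" reading; the function name per_file_cost is made up for illustration.

```python
def per_file_cost(total_seconds, file_count):
    """Convert a total read time into per-file figures."""
    return {
        "ms_per_file": total_seconds * 1000 / file_count,
        "files_per_second": file_count / total_seconds,
    }


# Approximate test_vsmall reading on the manual mounts.
vsmall = per_file_cost(1200, 1_000_000)
```

With those inputs the measured time works out to roughly 1.2 ms per 4 KB file, or about 833 files per second.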
This second graph omits test_vsmall so that we can actually see the data for the other three directories. Again it seems that many small files significantly increase read time, since test_small consistently took at least twice as long as test_large and test_med. Likewise, the read times are relatively stable across the different mount setups, although setting rsize to 65536 sped up test_large and test_med by a couple of seconds.
This table shows the load average as measured before each test. For our quad-core machine, load averages of 2-3 are well below critical levels and therefore should not have affected the time taken.
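One common way to read a load average is to divide it by the number of cores. The sketch below shows that normalization; the helper names are hypothetical, and os.getloadavg is only available on Unix-like systems.

```python
import os


def load_per_core(load_average, cores):
    """Normalize a load average by the number of CPU cores."""
    return load_average / cores


def current_load_per_core():
    """1-minute load average of this machine, per core (Unix only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    return load_per_core(one_minute, os.cpu_count() or 1)
```

For the quad-core machine here, load averages of 2 and 3 correspond to 0.5 and 0.75 per core, so on average no core had more than one runnable task waiting on it.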
As the above charts show, none of the parameter changes affects read speed anywhere near enough to fix our situation. Additionally, this data shows that the problem is not specific to our dedicated backup mount but applies to NFS in general. Unfortunately, this means that there is nothing we can do to immediately speed up our backups. However, this data does provide us with concrete evidence that large quantities of very small files (which we do have on our systems) lead to severely elongated read times. The problem may therefore be alleviated by having users either clean up or zip their small unused files.