Since we want to back up and move users' home file systems regularly between Thumpers, we want this set-up to be as fast as possible. We noticed that the standard approach of
zfs send ... | ssh host zfs receive
only gave us 50 MByte/s. Since netperf easily showed more than 7 Gbit/s, we were worried.
For testing and debugging we use
mbuffer, which lets us study on which end of the connection the performance drop occurs.
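As a rough sketch of how this is used (host, pool and snapshot names below are placeholders), mbuffer is put on both ends of the pipe, and whichever buffer runs full tells us which side is the bottleneck:
zfs send testpool/fs@snap | mbuffer -m 2048M | ssh host 'mbuffer -m 2048M | zfs receive testpool/recv'
mbuffer continuously reports its in/out rates and the buffer fill level on stderr.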
There are two sections here: the first investigates which method should be used to perform the transfer over the network, while the second looks into which partitioning layout to use.
Networked Transfer
The first test just moves data from /dev/zero to /dev/null:
cat /dev/zero | mbuffer -m 2048M > /dev/null
summary: 170 GByte in 4 min 59.3 sec - average of 583 MB/s
Thus, /dev/zero produces zeroes fast enough.
cat /dev/zero | mbuffer -m 2048M | ssh localhost 'cat > /dev/null'
summary: 678 MByte in 20.9 sec - average of 32.5 MB/s
cat /dev/zero | mbuffer -m 2048M | ssh -c blowfish localhost 'cat > /dev/null'
summary: 9.9 GByte in 3 min 13.6 sec - average of 52.5 MB/s
Even locally this is far from ideal; apparently the ssh encryption overhead limits the throughput.
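Before moving on, one could also make sure ssh compression is switched off; this is only a sketch of further ssh tuning which we did not pursue, since the alternatives below turned out to be much faster:
cat /dev/zero | mbuffer -m 2048M | ssh -o Compression=no -c blowfish localhost 'cat > /dev/null'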
Now the Swiss army knife: netcat
Receiver part:
netcat -l -p 7000 | mbuffer -m 2048M > /dev/null
summary: 7410 MByte in 1 min 28.7 sec - average of 83.5 MB/s
(note that this includes a 14 second initial delay)
Sender:
cat /dev/zero | mbuffer -m 2048M | netcat localhost 7000
summary: 7410 MByte in 1 min 14.9 sec - average of 98.9 MB/s
The problem here is that the buffer on the sending side stays full:
in @ 103 MB/s, out @ 103 MB/s, 912 MB total, buffer 100% full
i.e. netcat cannot push the data out fast enough. The same speed is also seen when transferring over the actual network interfaces.
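For reference, the corresponding zfs transfer over netcat would look roughly like this (host name, port, pool and snapshot names are placeholders):
Receiver part:
netcat -l -p 7000 | mbuffer -m 2048M | zfs receive testpool/recv
Sender:
zfs send testpool/fs@snap | mbuffer -m 2048M | netcat receiverhost 7000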
Another candidate: socat
socat seems to be yet another tool that promises good performance, and indeed:
Receiver end:
LD_LIBRARY_PATH=/usr/sfw/lib ./socat TCP4-LISTEN:5678 - | /root/mbuffer-20080507/mbuffer -m 2048M > /dev/null
summary: 1e+04 MByte in 49.0 sec - average of 204 MB/s
(again modulo waiting time at the start)
Sender part:
dd if=/dev/zero bs=1024k count=10000 | /root/mbuffer-20080507/mbuffer -m 2048M | LD_LIBRARY_PATH=/usr/sfw/lib ./socat - TCP4:localhost:5678
in @ 233 MB/s, out @ 233 MB/s, 7882 MB total, buffer 100% full
summary: 1e+04 MByte in 43.3 sec - average of 231 MB/s
This means that socat is much faster than netcat (and also offers more flexibility), but still not quite fast enough locally.
Via the network the results are much better:
summary: 1e+04 MByte in 32.1 sec - average of 311 MB/s
Essentially the send-side (m)buffer never filled beyond a few percent, i.e. this combination was really up to the task.
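Putting the pieces together, a transfer between two Thumpers via socat would then look roughly as follows (host name, port, pool and snapshot names are placeholders; the socat/mbuffer paths are the same as in the tests above):
Receiver end:
socat TCP4-LISTEN:5678 - | mbuffer -m 2048M | zfs receive testpool/recv
Sender part:
zfs send testpool/fs@snap | mbuffer -m 2048M | socat - TCP4:receiverhost:5678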
Finally: ZFS transfer test
A real test showed that zfs send/receive was able to get about 160 MB/s out of the disk system. This should be pretty close to the maximum, since a plain zfs send to /dev/null reaches about the same speed (a single test showed about 180 MB/s).
Partition layout test
For these tests 46 disk drives were used to build a large zpool in different layouts (see the layout column). The speeds in the table are averaged over 5 trials, where the file system was unmounted and mounted again between runs to invalidate the buffer cache. The test size was 27 GByte, which is beyond the 16 GByte of main memory. Please also note that the quoted sizes are 1000-based while df -h reports 1024-based numbers. Finally, the disk layout always used the maximum "spread" across controllers, e.g.
zpool create -f testpool raidz c0t0d0 c1t0d0 c4t0d0 c6t0d0 c7t0d0 c0t1d0 c1t1d0 c4t1d0 raidz c5t1d0 c6t1d0 c7t1d0 c0t2d0 c1t2d0 c4t2d0 c5t2d0 c6t2d0 raidz c7t2d0 c0t3d0 c1t3d0 c4t3d0 c5t3d0 c6t3d0 c7t3d0 c0t4d0 raidz c1t4d0 c4t4d0 c6t4d0 c7t4d0 c0t5d0 c1t5d0 c4t5d0 c5t5d0 raidz c6t5d0 c7t5d0 c0t6d0 c1t6d0 c4t6d0 c5t6d0 c6t6d0 raidz c7t6d0 c0t7d0 c1t7d0 c4t7d0 c5t7d0 c6t7d0 c7t7d0
In short: for each measurement a zfs send fs@snap | mbuffer > /dev/null is performed.
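A sketch of the measurement loop (pool, file system and snapshot names are placeholders; the unmount/mount cycle is what invalidates the buffer cache between trials):
for i in 1 2 3 4 5; do
  zfs unmount testpool/fs && zfs mount testpool/fs
  zfs send testpool/fs@snap | mbuffer > /dev/null
done
The results: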
| mirror size [TB] | mirror speed [MB/s] | raidz size [TB] | raidz speed [MB/s] | raidz2 size [TB] | raidz2 speed [MB/s] | layout (disks per vdev) |
| 0.5 | 43.2 | 22.5 | 95.2 | 22 | 97.6 | 46 |
| 1 | 45.0 | 22 | 130.0 | 21 | 131.6 | 23 23 |
| 1.5 | 47.8 | 21.5 | 154.4 | 20 | 151.6 | 16 15 15 |
| 2 | 47.8 | 21 | 166.0 | 19 | 162.4 | 12 12 11 11 |
| 2.5 | 48.0 | 20.5 | 168.6 | 18 | 169.0 | 10 9 9 9 9 |
| 3 | 48.4 | 20 | 174.0 | 17 | 165.8 | 8 8 8 8 7 7 |
| 3.5 | 48.8 | 19.5 | 167.4 | 16 | 159.0 | 7 7 7 7 6 6 6 |
| 4 | 48.8 | 19 | 168.6 | 15 | 147.6 | 6 6 6 6 6 6 5 5 |
| 4.5 | 49.0 | 18.5 | 157.8 | 14 | 140.4 | 6 5 5 5 5 5 5 5 5 |
| 5 | 49.8 | 18 | 157.2 | 13 | 126.0 | 5 5 5 5 5 5 4 4 4 4 |
| 5.5 | 50.0 | 17.5 | 152.0 | 12 | 111.2 | 5 5 4 4 4 4 4 4 4 4 4 |
| 6 | 50.0 | 17 | 139.8 | 0 | --- | 4 4 4 4 4 4 4 4 4 4 3 3 |
| 6.5 | 51.0 | 16.5 | 121.4 | 0 | --- | 4 4 4 4 4 4 4 3 3 3 3 3 3 |
| 7 | 51.0 | 16.0 | 108.6 | 0 | --- | 4 4 4 4 3 3 3 3 3 3 3 3 3 3 |
| 7.5 | 51.8 | 15.5 | 96.2 | 0 | --- | 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 |
| 8 | 52.4 | 0 | --- | 0 | --- | 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 |
| 8.5 | 53.2 | 0 | --- | 0 | --- | 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 |
| 9 | 54.8 | 0 | --- | 0 | --- | 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 |
| 9.5 | 56.0 | 0 | --- | 0 | --- | 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 |
| 11.5 | 58.2 | 0 | --- | 0 | --- | 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 |
Bulk writes of /dev/zero to ZFS
For this test we create file systems as before, but use
dd if=/dev/zero count=20000 bs=1024k | mbuffer -q -m 2048M > /testpool/benchfile
to write 20 GByte of zeroes onto the file system (compression disabled). The null test
dd if=/dev/zero count=20000 bs=1024k | ./mbuffer -m 2048M > /dev/null
shows about 500 MB/s, so any result below this level is not limited by the data source. Our findings:
| mirror size [TB] | mirror speed [MB/s] | raidz size [TB] | raidz speed [MB/s] | raidz2 size [TB] | raidz2 speed [MB/s] | layout (disks per vdev) |
| 0.5 | 63.2 | 22.5 | 251.2 | 22 | 236.8 | 46 |
| 1 | 131.8 | 22 | 280.8 | 21 | 269.6 | 23 23 |
| 1.5 | 181.8 | 21.5 | 306.8 | 20 | 288.8 | 16 15 15 |
| 2 | 194.4 | 21 | 305.8 | 19 | 298.8 | 12 12 11 11 |
| 2.5 | 227.4 | 20.5 | 314.0 | 18 | 304.6 | 10 9 9 9 9 |
| 3 | 238.6 | 20 | 320.2 | 17 | 304.4 | 8 8 8 8 7 7 |
| 3.5 | 251.2 | 19.5 | 327.4 | 16 | 314.6 | 7 7 7 7 6 6 6 |
| 4 | 267.4 | 19 | 329.4 | 15 | 314.2 | 6 6 6 6 6 6 5 5 |
| 4.5 | 280.6 | 18.5 | 332.2 | 14 | 322.8 | 6 5 5 5 5 5 5 5 5 |
| 5 | 287.4 | 18 | 337.4 | 13 | 319.2 | 5 5 5 5 5 5 4 4 4 4 |
| 5.5 | 292.8 | 17.5 | 338.0 | 12 | 316.8 | 5 5 4 4 4 4 4 4 4 4 4 |
| 6 | 304.0 | 17 | 335.8 | 0 | --- | 4 4 4 4 4 4 4 4 4 4 3 3 |
| 6.5 | 314.4 | 16.5 | 332.8 | 0 | --- | 4 4 4 4 4 4 4 3 3 3 3 3 3 |
| 7 | 313.8 | 16.0 | 335.6 | 0 | --- | 4 4 4 4 3 3 3 3 3 3 3 3 3 3 |
| 7.5 | 317.4 | 15.5 | 330.4 | 0 | --- | 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 |
| 8 | 321.4 | 0 | --- | 0 | --- | 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 |
| 8.5 | 326.0 | 0 | --- | 0 | --- | 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 |
| 9 | 324.8 | 0 | --- | 0 | --- | 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 |
| 9.5 | 330.4 | 0 | --- | 0 | --- | 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 |
| 10 | 333.4 | 0 | --- | 0 | --- | 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 |
| 10.5 | 339.4 | 0 | --- | 0 | --- | 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 |
| 11 | 340.6 | 0 | --- | 0 | --- | 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 |
| 11.5 | 342.4 | 0 | --- | 0 | --- | 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 |
-- CarstenAulbert - 11 Aug 2008