[sheepdog-users] Unexpected freeze of sheep on one node
Valerio Pachera
sirio81 at gmail.com
Thu Nov 20 11:44:11 CET 2014
2014-11-20 7:30 GMT+01:00 Maxim Terletskiy <terletskiy at emu.ru>:
> I've had similar problems.
> In such cases good test will be iostat:
> iostat -dx 5 /dev/sd[a-z]
This is a very good suggestion.
My host has 2 devices: a single 2 TB disk and a RAID 5 array (3 disks of
500 GB) managed by mdadm.
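For completeness, the state of the array can be checked first; a minimal sketch, assuming the md device is /dev/md0 (the real name may differ):

cat /proc/mdstat            # overview of all md arrays, their members and sync status
mdadm --detail /dev/md0     # per-array detail: state of each member disk, degraded/faulty flags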
I ran a write test on the RAID device and checked iostat:
dd if=/dev/zero of=deleteme2 bs=4M count=$((2048/4)) oflag=direct
60,4 MB/s
(not bad)
iostat -dx 5 /dev/sd[a-c]
Device:    rrqm/s   wrqm/s     r/s     w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
sdb          8,03    19,80    0,67    0,61     38,40     80,82    187,30      0,02   12,47    11,41    13,63   4,63   0,59
sdc          8,00    19,86    0,65    0,73     38,67     81,56    174,01      0,01    9,94     9,39    10,43   4,27   0,59
sda          7,89    19,77    0,70    0,75     38,46     81,26    164,85      0,01    5,80     5,93     5,67   3,04   0,44

Device:    rrqm/s   wrqm/s     r/s     w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
sdb       2483,80  7573,40   51,20   95,00  10038,40  30672,80    556,92      1,58   10,80    20,92     5,35   4,52  66,08
sdc       2443,80  7578,80   42,60  105,40   9943,20  30736,00    549,72      1,20    8,14    18,69     3,88   3,72  55,04
sda       2461,60  7527,20   47,60  102,60  10137,60  30518,40    541,36      1,15    7,66    17,39     3,15   3,57  53,60

Device:    rrqm/s   wrqm/s     r/s     w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
sdb       2382,60  7384,20   50,00   95,60   9832,00  29744,00    543,63      1,94   13,35    29,10     5,11   4,57  66,48
sdc       2413,20  7416,40   46,60  106,20   9940,80  30063,20    523,61      1,26    8,28    18,03     4,00   3,58  54,64
sda       2419,60  7342,80   49,60  106,20   9876,80  29768,80    508,93      1,16    7,42    16,65     3,12   3,44  53,52

Device:    rrqm/s   wrqm/s     r/s     w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
sdb       2607,80  7957,00   54,60  107,60  10548,80  32383,20    529,37      1,76   10,86    23,75     4,31   3,87  62,80
sdc       2586,00  8013,60   52,80  111,80  10555,20  32506,40    523,23      1,35    8,18    17,32     3,86   3,49  57,44
sda       2598,00  8008,40   53,60  115,40  10556,00  32475,20    509,24      1,27    7,47    16,46     3,29   3,34  56,48

Device:    rrqm/s   wrqm/s     r/s     w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
sdb       2428,00  7481,00   57,20  108,40  10041,60  30409,60    488,54      1,73   10,44    22,08     4,30   3,91  64,80
sdc       2456,60  7494,60   52,20  118,20   9936,00  30476,80    474,33      1,34    7,80    17,93     3,32   3,26  55,60
sda       2425,20  7460,20   58,20  106,80   9883,20  30318,40    487,29      1,56    9,42    19,15     4,12   3,49  57,52

Device:    rrqm/s   wrqm/s     r/s     w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
sdb       2407,40  7340,20   52,40   99,40   9737,60  29757,60    520,36      1,95   12,81    27,30     5,17   4,43  67,28
sdc       2357,00  7323,80   50,20  103,80   9728,00  29709,60    512,18      1,39    9,14    18,76     4,49   3,60  55,44
sda       2409,00  7328,00   48,40  110,40   9828,80  29752,80    498,51      1,11    7,00    16,78     2,71   3,20  50,88

Device:    rrqm/s   wrqm/s     r/s     w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
sdb       2464,00  7550,00   47,00   91,00  10044,80  30460,80    587,04      1,48   10,50    20,66     5,26   4,62  63,76
sdc       2485,40  7532,00   48,80   91,80  10136,80  30493,60    577,96      1,42   10,09    19,20     5,25   4,31  60,56
sda       2470,20  7542,00   39,40  102,20  10140,00  30575,20    575,07      1,02    7,25    17,14     3,44   3,68  52,16

Device:    rrqm/s   wrqm/s     r/s     w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
sdb       1483,20  4507,00   29,80   54,60   6152,80  18347,30    580,57      1,56   18,88    36,59     9,22   6,94  58,56
sdc       1433,40  4542,20   26,20   61,00   5838,40  18412,10    556,20      0,77    8,86    18,96     4,52   3,90  34,00
sda       1417,40  4441,20   28,60   59,80   5784,00  18002,50    538,16      0,69    7,76    16,92     3,37   3,58  31,68

Device:    rrqm/s   wrqm/s     r/s     w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
sdb          1,00     2,80    1,00    2,00      8,00     15,60     15,73      0,06   21,60    18,40    23,20   8,53   2,56
sdc          2,80     1,00    0,60    2,40     13,60     10,00     15,73      0,06   19,73    24,00    18,67  10,13   3,04
sda          0,00     3,80    0,40    2,60      1,60     22,00     15,73      0,01    4,00     0,00     4,62   4,00   1,20

Device:    rrqm/s   wrqm/s     r/s     w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
sdb          0,00     0,00    0,00    0,00      0,00      0,00      0,00      0,00    0,00     0,00     0,00   0,00   0,00
sdc          0,00     0,00    0,00    0,00      0,00      0,00      0,00      0,00    0,00     0,00     0,00   0,00   0,00
sda          0,00     0,00    0,00    0,00      0,00      0,00      0,00      0,00    0,00     0,00     0,00   0,00   0,00
sdb is clearly slower than the other two devices, and it's the only one
with a SMART issue:
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 1
So it might be a good idea to replace it.
But I don't think this alone is enough to make sheep hang.
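For reference, that attribute can be read with smartctl (from smartmontools); a minimal sketch, using the same device name as above:

smartctl -A /dev/sdb    # print the SMART attribute table, including 199 UDMA_CRC_Error_Count
smartctl -H /dev/sdb    # overall health self-assessment

A non-zero UDMA_CRC_Error_Count usually points to the SATA link (cable or connector) rather than the platters, so re-seating or replacing the cable may be worth trying before swapping the disk.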
Also consider that it's much faster if I don't use oflag=direct
sync; dd if=/dev/zero of=deleteme3 bs=4M count=$((2048/4))
266 MB/s
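The higher figure is mostly the page cache: without oflag=direct, dd finishes before the data actually reaches the disks. A sketch that includes the final flush in the measured rate (same hypothetical test file):

dd if=/dev/zero of=deleteme3 bs=4M count=$((2048/4)) conv=fdatasync    # fdatasync forces the data to disk before dd reports its throughput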
I'm also testing the 2 TB disk (sdd) with
fsck.ext4 -c /dev/sdd1
(with the '-c' option, badblocks is run along with fsck).
You can see the disk is fully busy but the 'await' is very low, so I
consider the device to be healthy.
Device:    rrqm/s   wrqm/s     r/s     w/s      rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
sdd          0,00     0,00  480,00    0,00  122880,00      0,00    512,00      0,96    2,02     2,02     0,00   2,01  96,48

Device:    rrqm/s   wrqm/s     r/s     w/s      rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
sdd          0,00     0,00  482,80    0,00  123596,80      0,00    512,00      0,96    2,00     2,00     0,00   2,00  96,40
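If you prefer to check the raw device without going through fsck, a read-only badblocks pass can be run directly; a sketch (non-destructive, but it reads the whole disk, so it takes a long time on a 2 TB drive):

badblocks -sv /dev/sdd    # -s shows progress, -v reports bad blocks as they are found; read-only scan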
The only thing left to test is the network.
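A simple way to measure raw throughput between two nodes is iperf3; a sketch, with node2 as a placeholder for another sheep host:

iperf3 -s                 # on the remote node: start the server side
iperf3 -c node2 -t 30     # on this host: measure throughput towards node2 for 30 seconds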
I'll report back as soon as possible.
Thank you.
PS: running atop and looking at 'avio' (the average time per I/O request)
may also give an idea of a slow-responding disk.
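A minimal sketch of that check (the interval is just an example):

atop 5    # refresh every 5 seconds; the DSK lines report avio, the average time spent per request on each disk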