[sheepdog] [sbd] I/O stuck shortly after starting writes
Miles Fidelman
mfidelman at meetinghouse.net
Mon Jul 21 13:58:26 CEST 2014
Liu Yuan wrote:
> On Sun, Jul 20, 2014 at 01:08:12AM -0400, Miles Fidelman wrote:
>> A little late to the party here, but just saw this... question at end..
>>
>>
>> On Mon Jun 2 05:52:30 CEST 2014, Liu Yuan <namei.unix at gmail.com> wrote:
>>
>>> On Sun, Jun 01, 2014 at 09:56:00PM +0200, Marcin Mirosław wrote:
>>>> Hi!
>>>> I'm launching three sheep daemons locally and creating a vdi with EC 2:1.
>>>> Next I'm starting the sbd0 block device: mkfs.xfs /dev/sbd0 && mount ...
>>>> The next step is a simple dd command: dd if=/dev/zero
>>>> of=/mnt/test/zero bs=4M count=2000
>>>> After a short moment I've got one sheep stuck in D state:
>>>> sheepdog  4126  1.2  9.8 2199564 151052 ?   Sl   21:41   0:06 /usr/sbin/sheep -n --port 7000 -z 0 /mnt/sdb1 --pidfile /run/sheepdog/sheepdog.sdb1
>>>> sheepdog  4127  0.0  0.0   34468    396 ?   Ds   21:41   0:00 /usr/sbin/sheep -n --port 7000 -z 0 /mnt/sdb1 --pidfile /run/sheepdog/sheepdog.sdb1
>>>> sheepdog  4179  0.2  6.7 1855792 103780 ?   Sl   21:41   0:01 /usr/sbin/sheep -n --port 7001 -z 1 /mnt/sdc1 --pidfile /run/sheepdog/sheepdog.sdc1
>>>> sheepdog  4180  0.0  0.0   34468    396 ?   Ss   21:41   0:00 /usr/sbin/sheep -n --port 7001 -z 1 /mnt/sdc1 --pidfile /run/sheepdog/sheepdog.sdc1
>>>> sheepdog  4231  0.3  7.3 1863228 111700 ?   Sl   21:41   0:01 /usr/sbin/sheep -n --port 7002 -z 2 /mnt/sdd1 --pidfile /run/sheepdog/sheepdog.sdd1
>>>> sheepdog  4232  0.0  0.0   34468    400 ?   Ss   21:41   0:00 /usr/sbin/sheep -n --port 7002 -z 2 /mnt/sdd1 --pidfile /run/sheepdog/sheepdog.sdd1
>>>>
>>>> The dd also gets stuck:
>>>> root      4326  0.2  0.3   14764   4664 pts/1   D+   21:44   0:01 dd if=/dev/zero of=/mnt/test/zero bs=4M count=2000
>>>>
>>>> dmesg shows:
>>>> [ 6386.240000] INFO: rcu_sched self-detected stall on CPU { 0} (t=6001 jiffies g=139833 c=139832 q=86946)
>>>> [ 6386.240000] sending NMI to all CPUs:
>>>> [ 6386.240000] NMI backtrace for cpu 0
>>>> [ 6386.240000] CPU: 0 PID: 4286 Comm: sbd_submiter Tainted: P O 3.12.20-gentoo #1
>>>> [ 6386.240000] Hardware name: Gigabyte Technology Co., Ltd. 965P-S3/965P-S3, BIOS F14A 07/31/2008
>>>> [ 6386.240000] task: ffff88001cff3960 ti: ffff88002f78a000 task.ti: ffff88002f78a000
>>>> [ 6386.240000] RIP: 0010:[<ffffffff811d4542>] [<ffffffff811d4542>] __const_udelay+0x12/0x30
>>>> [ 6386.240000] RSP: 0000:ffff88005f403dc8 EFLAGS: 00000006
>>>> [ 6386.240000] RAX: 0000000001062560 RBX: 0000000000002710 RCX: 0000000000000006
>>>> [ 6386.240000] RDX: 0000000001140694 RSI: 0000000000000002 RDI: 0000000000418958
>>>> [ 6386.240000] RBP: ffff88005f403de8 R08: 000000000000000a R09: 00000000000002bc
>>>> [ 6386.240000] R10: 0000000000000000 R11: 00000000000002bb R12: ffffffff8149eec0
>>>> [ 6386.240000] R13: ffffffff8149eec0 R14: ffff88005f40d700 R15: 00000000000153a2
>>>> [ 6386.240000] FS: 0000000000000000(0000) GS:ffff88005f400000(0000) knlGS:0000000000000000
>>>> [ 6386.240000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [ 6386.240000] CR2: 00007f3bf6907000 CR3: 0000000001488000 CR4: 00000000000007f0
>>>> [ 6386.240000] Stack:
>>>> [ 6386.240000]  ffff88005f403de8 ffffffff8102d00a 0000000000000000 ffffffff814c43b8
>>>> [ 6386.240000]  ffff88005f403e58 ffffffff810967ac ffff88001bcb1800 0000000000000001
>>>> [ 6386.240000]  ffff88005f403e18 ffffffff81098407 ffff88002f78a000 0000000000000000
>>>> [ 6386.240000] Call Trace:
>>>> [ 6386.240000]  <IRQ>
>>>> [ 6386.240000]  [<ffffffff8102d00a>] ? arch_trigger_all_cpu_backtrace+0x5a/0x80
>>>> [ 6386.240000]  [<ffffffff810967ac>] rcu_check_callbacks+0x2fc/0x570
>>>> [ 6386.240000]  [<ffffffff81098407>] ? acct_account_cputime+0x17/0x20
>>>> [ 6386.240000]  [<ffffffff810494d3>] update_process_times+0x43/0x80
>>>> [ 6386.240000]  [<ffffffff81082621>] tick_sched_handle.isra.12+0x31/0x40
>>>> [ 6386.240000]  [<ffffffff81082764>] tick_sched_timer+0x44/0x70
>>>> [ 6386.240000]  [<ffffffff8105dc4a>] __run_hrtimer.isra.29+0x4a/0xd0
>>>> [ 6386.240000]  [<ffffffff8105e415>] hrtimer_interrupt+0xf5/0x230
>>>> [ 6386.240000]  [<ffffffff8102b7f6>] local_apic_timer_interrupt+0x36/0x60
>>>> [ 6386.240000]  [<ffffffff8102bc0e>] smp_apic_timer_interrupt+0x3e/0x60
>>>> [ 6386.240000]  [<ffffffff8136aaca>] apic_timer_interrupt+0x6a/0x70
>>>> [ 6386.240000]  <EOI>
>>>> [ 6386.240000]  [<ffffffff811d577d>] ? __write_lock_failed+0xd/0x20
>>>> [ 6386.240000]  [<ffffffff813690f2>] _raw_write_lock+0x12/0x20
>>>> [ 6386.240000]  [<ffffffffa030579b>] sheep_aiocb_submit+0x2db/0x360 [sbd]
>>>> [ 6386.240000]  [<ffffffffa030544e>] ? sheep_aiocb_setup+0x13e/0x1b0 [sbd]
>>>> [ 6386.240000]  [<ffffffffa0304740>] 0xffffffffa030473f
>>>> [ 6386.240000]  [<ffffffff8105b510>] ? finish_wait+0x80/0x80
>>>> [ 6386.240000]  [<ffffffffa03046c0>] ? 0xffffffffa03046bf
>>>> [ 6386.240000]  [<ffffffff8105af4b>] kthread+0xbb/0xc0
>>>> [ 6386.240000]  [<ffffffff8105ae90>] ? kthread_create_on_node+0x120/0x120
>>>> [ 6386.240000]  [<ffffffff81369d7c>] ret_from_fork+0x7c/0xb0
>>>> [ 6386.240000]  [<ffffffff8105ae90>] ? kthread_create_on_node+0x120/0x120
>>>> [ 6386.240000] Code: c8 5d c3 66 0f 1f 44 00 00 55 48 89 e5 ff 15 ae 2d 2d 00 5d c3 0f 1f 40 00 55 48 8d 04 bd 00 00 00 00 65 48 8b 14 25 20 0d 01 00 <48> 8d 14 92 48 89 e5 48 8d 14 92 f7 e2 48 8d 7a 01 ff 15 7f 2d
>>>> [ 6386.240208] NMI backtrace for cpu 1
>>>> [ 6386.240213] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P O 3.12.20-gentoo #1
>>>> [ 6386.240215] Hardware name: Gigabyte Technology Co., Ltd. 965P-S3/965P-S3, BIOS F14A 07/31/2008
>>>> [ 6386.240218] task: ffff88005d07b960 ti: ffff88005d09a000 task.ti: ffff88005d09a000
>>>> [ 6386.240220] RIP: 0010:[<ffffffff8100b536>] [<ffffffff8100b536>] default_idle+0x6/0x10
>>>> [ 6386.240227] RSP: 0018:ffff88005d09bea8 EFLAGS: 00000286
>>>> [ 6386.240229] RAX: 00000000ffffffed RBX: ffff88005d09bfd8 RCX: 0100000000000000
>>>> [ 6386.240231] RDX: 0100000000000000 RSI: 0000000000000000 RDI: 0000000000000001
>>>> [ 6386.240234] RBP: ffff88005d09bea8 R08: 0000000000000000 R09: 0000000000000000
>>>> [ 6386.240236] R10: ffff88005f480000 R11: 0000000000000e1e R12: ffffffff814c43b0
>>>> [ 6386.240238] R13: ffff88005d09bfd8 R14: ffff88005d09bfd8 R15: ffff88005d09bfd8
>>>> [ 6386.240241] FS: 0000000000000000(0000) GS:ffff88005f480000(0000) knlGS:0000000000000000
>>>> [ 6386.240243] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [ 6386.240246] CR2: 00007f31fe05d000 CR3: 000000002a0f5000 CR4: 00000000000007e0
>>>> [ 6386.240247] Stack:
>>>> [ 6386.240249]  ffff88005d09beb8 ffffffff8100bc56 ffff88005d09bf18 ffffffff81073c7a
>>>> [ 6386.240253]  0000000000000000 ffff88005d09bfd8 ffff88005d09bef8 15e35c4b103d3891
>>>> [ 6386.240256]  0000000000000001 0000000000000001 0000000000000001 0000000000000000
>>>> [ 6386.240260] Call Trace:
>>>> [ 6386.240264]  [<ffffffff8100bc56>] arch_cpu_idle+0x16/0x20
>>>> [ 6386.240269]  [<ffffffff81073c7a>] cpu_startup_entry+0xda/0x1c0
>>>> [ 6386.240273]  [<ffffffff8102a1f1>] start_secondary+0x1e1/0x240
>>>> [ 6386.240275] Code: 21 ff ff ff 90 48 b8 00 00 00 00 01 00 00 00 48 89 07 e9 0e ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 fe 48 c7 c7 40 e7 58 81
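For anyone who wants to reproduce this locally, the setup above boils down to roughly the sketch below. The sheep invocations are taken from the ps listing; the vdi name, size and the EC create syntax are my assumptions, and the step that attaches the vdi as /dev/sbd0 is left as a placeholder because it depends on how the sbd module is driven on your system.

    # start three sheep daemons on one box (flags taken from the ps output above)
    /usr/sbin/sheep -n --port 7000 -z 0 /mnt/sdb1 --pidfile /run/sheepdog/sheepdog.sdb1
    /usr/sbin/sheep -n --port 7001 -z 1 /mnt/sdc1 --pidfile /run/sheepdog/sheepdog.sdc1
    /usr/sbin/sheep -n --port 7002 -z 2 /mnt/sdd1 --pidfile /run/sheepdog/sheepdog.sdd1

    # create an erasure-coded vdi (2 data : 1 parity); name and size are illustrative
    dog vdi create -c 2:1 test 20G

    # attach the vdi as /dev/sbd0 here (sbd attach step omitted; it is module-specific)

    # filesystem, mount and the dd workload from the report
    mkfs.xfs /dev/sbd0
    mount /dev/sbd0 /mnt/test
    dd if=/dev/zero of=/mnt/test/zero bs=4M count=2000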
>>> Actually, this is a notorious memory deadlock and can't be solved if you run
>>> sbd and sheep on the same node. It is a problem similar to NFS, where you have
>>> the NFS server and client on the same machine.
>>>
>>> Deadlock:
>>> Suppose SBD has dirty pages to write out to the sheep backend. But sheep itself
>>> needs some clean pages to hold this dirty data before it can do the actual disk
>>> I/O, so it waits for the kernel to clean some dirty pages; unfortunately, those
>>> dirty pages need sheep's help to be written out.
>>>
>>> Solution:
>>> never run client SBD and sheep daemon on the same node.
>>>
>>> Thanks
>>> Yuan
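When the stall hits, the situation Yuan describes is usually visible from plain procfs, nothing sheepdog-specific: tasks pile up in uninterruptible sleep while the dirty/writeback counters stop draining. A quick way to watch for it:

    # tasks stuck in uninterruptible sleep (the sheep and dd processes above show up here)
    ps axo pid,stat,wchan:32,comm | awk '$2 ~ /D/'

    # dirty and writeback page counters that stop shrinking during the deadlock
    grep -E '^(Dirty|Writeback):' /proc/meminfo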
>> Here I was, all excited to start experimenting with SBDs, to see if
>> they could work as block devices for Xen, and I come across this
>> caveat: "never run client SBD and sheep daemon on the same node."
>>
>> This would seem to negate the whole purpose of exposing SBDs - i.e.,
>> making Sheepdog usable with things other than KVM/Qemu. Unless I'm
>> radically mistaken, it's standard to mount a Sheepdog volume on the
>> same node as a daemon - that's the basic architecture. Seems like
>> something is very broken if one can't use SBDs the same way.
>>
>> Does this caveat also apply to mounting iSCSI or NBD volumes?
>>
>> Miles Fidelman
> Yes, there is an identical deadlock problem for iSCSI, NBD, sheepdog and NFS.
> "Never run them together" is just a blanket solution; actually you can run SBD
> and a storage node on the same node, and with careful tuning of dirty pages you
> can probably minimize the deadlock problem even if you can't get rid of it
> entirely. See /proc/sys/vm.
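A minimal sketch of the kind of /proc/sys/vm tuning Yuan is referring to; the values are illustrative assumptions, not tested recommendations, and at best they shrink the window for the deadlock rather than remove it:

    # keep the dirty page cache small so the sbd client never builds up a large
    # backlog that the colocated sheep has to absorb (values are guesses)
    sysctl -w vm.dirty_background_bytes=$((16*1024*1024))
    sysctl -w vm.dirty_bytes=$((64*1024*1024))

    # start background writeback sooner and flush more often
    sysctl -w vm.dirty_expire_centisecs=500
    sysctl -w vm.dirty_writeback_centisecs=100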
Ahh.. did a little research - a problem common with pretty much any
distributed file system when running a client and server on the same node.
Ouch.
Thanks for the info.
Miles Fidelman
--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra