[sheepdog] [sbd] I/O stuck shortly after starting writes
Miles Fidelman
mfidelman at meetinghouse.net
Mon Jul 21 13:58:26 CEST 2014
Liu Yuan wrote:
> On Sun, Jul 20, 2014 at 01:08:12AM -0400, Miles Fidelman wrote:
>> A little late to the party here, but just saw this... question at end..
>>
>>
>> On Mon Jun 2 05:52:30 CEST 2014, Liu Yuan <namei.unix at gmail.com> wrote:
>>
>>> On Sun, Jun 01, 2014 at 09:56:00PM +0200, Marcin Mirosław wrote:
>>>> Hi!
>>>> I'm launching three sheep daemons locally and creating a vdi with EC 2:1.
>>>> Next I'm starting the sbd0 block device: mkfs.xfs /dev/sbd0 && mount ...
>>>> The next step is a simple dd command: dd if=/dev/zero
>>>> of=/mnt/test/zero bs=4M count=2000
>>>> After a short moment I've got one sheep stuck in D state:
>>>> sheepdog  4126  1.2  9.8 2199564 151052 ?   Sl   21:41   0:06 /usr/sbin/sheep -n --port 7000 -z 0 /mnt/sdb1 --pidfile /run/sheepdog/sheepdog.sdb1
>>>> sheepdog  4127  0.0  0.0   34468    396 ?   Ds   21:41   0:00 /usr/sbin/sheep -n --port 7000 -z 0 /mnt/sdb1 --pidfile /run/sheepdog/sheepdog.sdb1
>>>> sheepdog  4179  0.2  6.7 1855792 103780 ?   Sl   21:41   0:01 /usr/sbin/sheep -n --port 7001 -z 1 /mnt/sdc1 --pidfile /run/sheepdog/sheepdog.sdc1
>>>> sheepdog  4180  0.0  0.0   34468    396 ?   Ss   21:41   0:00 /usr/sbin/sheep -n --port 7001 -z 1 /mnt/sdc1 --pidfile /run/sheepdog/sheepdog.sdc1
>>>> sheepdog  4231  0.3  7.3 1863228 111700 ?   Sl   21:41   0:01 /usr/sbin/sheep -n --port 7002 -z 2 /mnt/sdd1 --pidfile /run/sheepdog/sheepdog.sdd1
>>>> sheepdog  4232  0.0  0.0   34468    400 ?   Ss   21:41   0:00 /usr/sbin/sheep -n --port 7002 -z 2 /mnt/sdd1 --pidfile /run/sheepdog/sheepdog.sdd1
>>>>
>>>> The dd also gets stuck:
>>>> root      4326  0.2  0.3   14764   4664 pts/1   D+   21:44   0:01 dd if=/dev/zero of=/mnt/test/zero bs=4M count=2000
>>>>
>>>> dmesg shows:
>>>> [ 6386.240000] INFO: rcu_sched self-detected stall on CPU { 0} (t=6001 jiffies g=139833 c=139832 q=86946)
>>>> [ 6386.240000] sending NMI to all CPUs:
>>>> [ 6386.240000] NMI backtrace for cpu 0
>>>> [ 6386.240000] CPU: 0 PID: 4286 Comm: sbd_submiter Tainted: P O 3.12.20-gentoo #1
>>>> [ 6386.240000] Hardware name: Gigabyte Technology Co., Ltd. 965P-S3/965P-S3, BIOS F14A 07/31/2008
>>>> [ 6386.240000] task: ffff88001cff3960 ti: ffff88002f78a000 task.ti: ffff88002f78a000
>>>> [ 6386.240000] RIP: 0010:[<ffffffff811d4542>] [<ffffffff811d4542>] __const_udelay+0x12/0x30
>>>> [ 6386.240000] RSP: 0000:ffff88005f403dc8 EFLAGS: 00000006
>>>> [ 6386.240000] RAX: 0000000001062560 RBX: 0000000000002710 RCX: 0000000000000006
>>>> [ 6386.240000] RDX: 0000000001140694 RSI: 0000000000000002 RDI: 0000000000418958
>>>> [ 6386.240000] RBP: ffff88005f403de8 R08: 000000000000000a R09: 00000000000002bc
>>>> [ 6386.240000] R10: 0000000000000000 R11: 00000000000002bb R12: ffffffff8149eec0
>>>> [ 6386.240000] R13: ffffffff8149eec0 R14: ffff88005f40d700 R15: 00000000000153a2
>>>> [ 6386.240000] FS: 0000000000000000(0000) GS:ffff88005f400000(0000) knlGS:0000000000000000
>>>> [ 6386.240000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [ 6386.240000] CR2: 00007f3bf6907000 CR3: 0000000001488000 CR4: 00000000000007f0
>>>> [ 6386.240000] Stack:
>>>> [ 6386.240000]  ffff88005f403de8 ffffffff8102d00a 0000000000000000 ffffffff814c43b8
>>>> [ 6386.240000]  ffff88005f403e58 ffffffff810967ac ffff88001bcb1800 0000000000000001
>>>> [ 6386.240000]  ffff88005f403e18 ffffffff81098407 ffff88002f78a000 0000000000000000
>>>> [ 6386.240000] Call Trace:
>>>> [ 6386.240000]  <IRQ>
>>>> [ 6386.240000]  [<ffffffff8102d00a>] ? arch_trigger_all_cpu_backtrace+0x5a/0x80
>>>> [ 6386.240000]  [<ffffffff810967ac>] rcu_check_callbacks+0x2fc/0x570
>>>> [ 6386.240000]  [<ffffffff81098407>] ? acct_account_cputime+0x17/0x20
>>>> [ 6386.240000]  [<ffffffff810494d3>] update_process_times+0x43/0x80
>>>> [ 6386.240000]  [<ffffffff81082621>] tick_sched_handle.isra.12+0x31/0x40
>>>> [ 6386.240000]  [<ffffffff81082764>] tick_sched_timer+0x44/0x70
>>>> [ 6386.240000]  [<ffffffff8105dc4a>] __run_hrtimer.isra.29+0x4a/0xd0
>>>> [ 6386.240000]  [<ffffffff8105e415>] hrtimer_interrupt+0xf5/0x230
>>>> [ 6386.240000]  [<ffffffff8102b7f6>] local_apic_timer_interrupt+0x36/0x60
>>>> [ 6386.240000]  [<ffffffff8102bc0e>] smp_apic_timer_interrupt+0x3e/0x60
>>>> [ 6386.240000]  [<ffffffff8136aaca>] apic_timer_interrupt+0x6a/0x70
>>>> [ 6386.240000]  <EOI>
>>>> [ 6386.240000]  [<ffffffff811d577d>] ? __write_lock_failed+0xd/0x20
>>>> [ 6386.240000]  [<ffffffff813690f2>] _raw_write_lock+0x12/0x20
>>>> [ 6386.240000]  [<ffffffffa030579b>] sheep_aiocb_submit+0x2db/0x360 [sbd]
>>>> [ 6386.240000]  [<ffffffffa030544e>] ? sheep_aiocb_setup+0x13e/0x1b0 [sbd]
>>>> [ 6386.240000]  [<ffffffffa0304740>] 0xffffffffa030473f
>>>> [ 6386.240000]  [<ffffffff8105b510>] ? finish_wait+0x80/0x80
>>>> [ 6386.240000]  [<ffffffffa03046c0>] ? 0xffffffffa03046bf
>>>> [ 6386.240000]  [<ffffffff8105af4b>] kthread+0xbb/0xc0
>>>> [ 6386.240000]  [<ffffffff8105ae90>] ? kthread_create_on_node+0x120/0x120
>>>> [ 6386.240000]  [<ffffffff81369d7c>] ret_from_fork+0x7c/0xb0
>>>> [ 6386.240000]  [<ffffffff8105ae90>] ? kthread_create_on_node+0x120/0x120
>>>> [ 6386.240000] Code: c8 5d c3 66 0f 1f 44 00 00 55 48 89 e5 ff 15 ae 2d 2d 00 5d c3 0f 1f 40 00 55 48 8d 04 bd 00 00 00 00 65 48 8b 14 25 20 0d 01 00 <48> 8d 14 92 48 89 e5 48 8d 14 92 f7 e2 48 8d 7a 01 ff 15 7f 2d
>>>> [ 6386.240208] NMI backtrace for cpu 1
>>>> [ 6386.240213] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P O 3.12.20-gentoo #1
>>>> [ 6386.240215] Hardware name: Gigabyte Technology Co., Ltd. 965P-S3/965P-S3, BIOS F14A 07/31/2008
>>>> [ 6386.240218] task: ffff88005d07b960 ti: ffff88005d09a000 task.ti: ffff88005d09a000
>>>> [ 6386.240220] RIP: 0010:[<ffffffff8100b536>] [<ffffffff8100b536>] default_idle+0x6/0x10
>>>> [ 6386.240227] RSP: 0018:ffff88005d09bea8 EFLAGS: 00000286
>>>> [ 6386.240229] RAX: 00000000ffffffed RBX: ffff88005d09bfd8 RCX: 0100000000000000
>>>> [ 6386.240231] RDX: 0100000000000000 RSI: 0000000000000000 RDI: 0000000000000001
>>>> [ 6386.240234] RBP: ffff88005d09bea8 R08: 0000000000000000 R09: 0000000000000000
>>>> [ 6386.240236] R10: ffff88005f480000 R11: 0000000000000e1e R12: ffffffff814c43b0
>>>> [ 6386.240238] R13: ffff88005d09bfd8 R14: ffff88005d09bfd8 R15: ffff88005d09bfd8
>>>> [ 6386.240241] FS: 0000000000000000(0000) GS:ffff88005f480000(0000) knlGS:0000000000000000
>>>> [ 6386.240243] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [ 6386.240246] CR2: 00007f31fe05d000 CR3: 000000002a0f5000 CR4: 00000000000007e0
>>>> [ 6386.240247] Stack:
>>>> [ 6386.240249]  ffff88005d09beb8 ffffffff8100bc56 ffff88005d09bf18 ffffffff81073c7a
>>>> [ 6386.240253]  0000000000000000 ffff88005d09bfd8 ffff88005d09bef8 15e35c4b103d3891
>>>> [ 6386.240256]  0000000000000001 0000000000000001 0000000000000001 0000000000000000
>>>> [ 6386.240260] Call Trace:
>>>> [ 6386.240264]  [<ffffffff8100bc56>] arch_cpu_idle+0x16/0x20
>>>> [ 6386.240269]  [<ffffffff81073c7a>] cpu_startup_entry+0xda/0x1c0
>>>> [ 6386.240273]  [<ffffffff8102a1f1>] start_secondary+0x1e1/0x240
>>>> [ 6386.240275] Code: 21 ff ff ff 90 48 b8 00 00 00 00 01 00 00 00 48 89 07 e9 0e ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 fe 48 c7 c7 40 e7 58 81
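For anyone who wants to reproduce this locally, the setup above boils down to roughly the sketch below. The sheep invocations are taken from the ps listing; the vdi name, size and the EC create syntax are my assumptions, and the step that attaches the vdi as /dev/sbd0 is left as a placeholder because it depends on how the sbd module is driven on your system.

    # start three sheep daemons on one box (flags taken from the ps output above)
    /usr/sbin/sheep -n --port 7000 -z 0 /mnt/sdb1 --pidfile /run/sheepdog/sheepdog.sdb1
    /usr/sbin/sheep -n --port 7001 -z 1 /mnt/sdc1 --pidfile /run/sheepdog/sheepdog.sdc1
    /usr/sbin/sheep -n --port 7002 -z 2 /mnt/sdd1 --pidfile /run/sheepdog/sheepdog.sdd1

    # create an erasure-coded vdi (2 data : 1 parity); name and size are illustrative
    dog vdi create -c 2:1 test 20G

    # attach the vdi as /dev/sbd0 here (sbd attach step omitted; it is module-specific)

    # filesystem, mount and the dd workload from the report
    mkfs.xfs /dev/sbd0
    mount /dev/sbd0 /mnt/test
    dd if=/dev/zero of=/mnt/test/zero bs=4M count=2000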
>>> Actually, this is a notorious memory deadlock and can't be solved if you run
>>> sbd and sheep on the same node. It is a problem similar to NFS, where you have
>>> the NFS server and client on the same machine.
>>>
>>> Deadlock:
>>> Suppose SBD has dirty pages to write out to the sheep backend. But sheep itself
>>> needs some clean pages to hold this dirty data before it can do the actual disk
>>> I/O, so it waits for the kernel to clean some dirty pages; unfortunately, those
>>> dirty pages need sheep's help to be written out.
>>>
>>> Solution:
>>> never run client SBD and sheep daemon on the same node.
>>>
>>> Thanks
>>> Yuan
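When the stall hits, the situation Yuan describes is usually visible from plain procfs, nothing sheepdog-specific: tasks pile up in uninterruptible sleep while the dirty/writeback counters stop draining. A quick way to watch for it:

    # tasks stuck in uninterruptible sleep (the sheep and dd processes above show up here)
    ps axo pid,stat,wchan:32,comm | awk '$2 ~ /D/'

    # dirty and writeback page counters that stop shrinking during the deadlock
    grep -E '^(Dirty|Writeback):' /proc/meminfo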
>> Here I was, all excited to start experimenting with SBDs, to see if
>> they could work as block devices for Xen, and I come across this
>> caveat: "never run client SBD and sheep daemon on the same node."
>>
>> This would seem to negate the whole purpose of exposing SBDs - i.e.,
>> making Sheepdog usable with things other than KVM/Qemu. Unless I'm
>> radically mistaken, it's standard to mount a Sheepdog volume on the
>> same node as a daemon - that's the basic architecture. Seems like
>> something is very broken if one can't use SBDs the same way.
>>
>> Does this caveat also apply to mounting iSCSI or NBD volumes?
>>
>> Miles Fidelman
> Yes, there is an identical deadlock problem for iSCSI, NBD, sheepdog and NFS.
> "Never run them together" is just a blanket solution; actually you can run SBD
> and a storage node on the same node, and with careful tuning of dirty pages you
> can probably minimize the deadlock problem even if you can't get rid of it
> entirely. See /proc/sys/vm.
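A minimal sketch of the kind of /proc/sys/vm tuning Yuan is referring to; the values are illustrative assumptions, not tested recommendations, and at best they shrink the window for the deadlock rather than remove it:

    # keep the dirty page cache small so the sbd client never builds up a large
    # backlog that the colocated sheep has to absorb (values are guesses)
    sysctl -w vm.dirty_background_bytes=$((16*1024*1024))
    sysctl -w vm.dirty_bytes=$((64*1024*1024))

    # start background writeback sooner and flush more often
    sysctl -w vm.dirty_expire_centisecs=500
    sysctl -w vm.dirty_writeback_centisecs=100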
Ahh.. did a little research - a problem common with pretty much any
distributed file system when running a client and server on the same node.
Ouch.
Thanks for the info.
Miles Fidelman
--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra