[sheepdog] [sbd] I/O stuck shortly after starting writes

Miles Fidelman mfidelman at meetinghouse.net
Sun Jul 20 07:08:12 CEST 2014


A little late to the party here, but just saw this... question at end.


On Mon Jun 2 05:52:30 CEST 2014, Liu Yuan <namei.unix at gmail.com> wrote:

> On Sun, Jun 01, 2014 at 09:56:00PM +0200, Marcin Mirosław wrote:
> > Hi!
> > I'm launching three sheeps locally, creating vdi with EC 2:1. Next I'm
> > starting sbd0 block device. mkfs.xfs /dev/sbd0 && mount ...
> > Next step is starting simple dd command: dd if=/dev/zero
> > of=/mnt/test/zero bs=4M count=2000
> > After a short moment I've got one sheep stuck in D state:
> > sheepdog  4126  1.2  9.8 2199564 151052 ?      Sl   21:41   0:06
> > /usr/sbin/sheep -n --port 7000 -z 0 /mnt/sdb1 --pidfile
> > /run/sheepdog/sheepdog.sdb1
> > sheepdog  4127  0.0  0.0  34468   396 ?        Ds   21:41   0:00
> > /usr/sbin/sheep -n --port 7000 -z 0 /mnt/sdb1 --pidfile
> > /run/sheepdog/sheepdog.sdb1
> > sheepdog  4179  0.2  6.7 1855792 103780 ?      Sl   21:41   0:01
> > /usr/sbin/sheep -n --port 7001 -z 1 /mnt/sdc1 --pidfile
> > /run/sheepdog/sheepdog.sdc1
> > sheepdog  4180  0.0  0.0  34468   396 ?        Ss   21:41   0:00
> > /usr/sbin/sheep -n --port 7001 -z 1 /mnt/sdc1 --pidfile
> > /run/sheepdog/sheepdog.sdc1
> > sheepdog  4231  0.3  7.3 1863228 111700 ?      Sl   21:41   0:01
> > /usr/sbin/sheep -n --port 7002 -z 2 /mnt/sdd1 --pidfile
> > /run/sheepdog/sheepdog.sdd1
> > sheepdog  4232  0.0  0.0  34468   400 ?        Ss   21:41   0:00
> > /usr/sbin/sheep -n --port 7002 -z 2 /mnt/sdd1 --pidfile
> > /run/sheepdog/sheepdog.sdd1
> >
> > dd is also stuck:
> > root      4326  0.2  0.3  14764  4664 pts/1    D+   21:44   0:01 dd
> > if=/dev/zero of=/mnt/test/zero bs=4M count=2000
> >
> > dmesg shows:
> >
> > [ 6386.240000] INFO: rcu_sched self-detected stall on CPU { 0}  (t=6001
> > jiffies g=139833 c=139832 q=86946)
> > [ 6386.240000] sending NMI to all CPUs:
> > [ 6386.240000] NMI backtrace for cpu 0
> > [ 6386.240000] CPU: 0 PID: 4286 Comm: sbd_submiter Tainted: P
> > O 3.12.20-gentoo #1
> > [ 6386.240000] Hardware name: Gigabyte Technology Co., Ltd.
> > 965P-S3/965P-S3, BIOS F14A 07/31/2008
> > [ 6386.240000] task: ffff88001cff3960 ti: ffff88002f78a000 task.ti:
> > ffff88002f78a000
> > [ 6386.240000] RIP: 0010:[<ffffffff811d4542>]  [<ffffffff811d4542>]
> > __const_udelay+0x12/0x30
> > [ 6386.240000] RSP: 0000:ffff88005f403dc8  EFLAGS: 00000006
> > [ 6386.240000] RAX: 0000000001062560 RBX: 0000000000002710 RCX:
> > 0000000000000006
> > [ 6386.240000] RDX: 0000000001140694 RSI: 0000000000000002 RDI:
> > 0000000000418958
> > [ 6386.240000] RBP: ffff88005f403de8 R08: 000000000000000a R09:
> > 00000000000002bc
> > [ 6386.240000] R10: 0000000000000000 R11: 00000000000002bb R12:
> > ffffffff8149eec0
> > [ 6386.240000] R13: ffffffff8149eec0 R14: ffff88005f40d700 R15:
> > 00000000000153a2
> > [ 6386.240000] FS:  0000000000000000(0000) GS:ffff88005f400000(0000)
> > knlGS:0000000000000000
> > [ 6386.240000] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 6386.240000] CR2: 00007f3bf6907000 CR3: 0000000001488000 CR4:
> > 00000000000007f0
> > [ 6386.240000] Stack:
> > [ 6386.240000]  ffff88005f403de8 ffffffff8102d00a 0000000000000000
> > ffffffff814c43b8
> > [ 6386.240000]  ffff88005f403e58 ffffffff810967ac ffff88001bcb1800
> > 0000000000000001
> > [ 6386.240000]  ffff88005f403e18 ffffffff81098407 ffff88002f78a000
> > 0000000000000000
> > [ 6386.240000] Call Trace:
> > [ 6386.240000]  <IRQ>
> >
> > [ 6386.240000]  [<ffffffff8102d00a>] ?
> > arch_trigger_all_cpu_backtrace+0x5a/0x80
> > [ 6386.240000]  [<ffffffff810967ac>] rcu_check_callbacks+0x2fc/0x570
> > [ 6386.240000]  [<ffffffff81098407>] ? acct_account_cputime+0x17/0x20
> > [ 6386.240000]  [<ffffffff810494d3>] update_process_times+0x43/0x80
> > [ 6386.240000]  [<ffffffff81082621>] tick_sched_handle.isra.12+0x31/0x40
> > [ 6386.240000]  [<ffffffff81082764>] tick_sched_timer+0x44/0x70
> > [ 6386.240000]  [<ffffffff8105dc4a>] __run_hrtimer.isra.29+0x4a/0xd0
> > [ 6386.240000]  [<ffffffff8105e415>] hrtimer_interrupt+0xf5/0x230
> > [ 6386.240000]  [<ffffffff8102b7f6>] local_apic_timer_interrupt+0x36/0x60
> > [ 6386.240000]  [<ffffffff8102bc0e>] smp_apic_timer_interrupt+0x3e/0x60
> > [ 6386.240000]  [<ffffffff8136aaca>] apic_timer_interrupt+0x6a/0x70
> > [ 6386.240000]  <EOI>
> >
> > [ 6386.240000]  [<ffffffff811d577d>] ? __write_lock_failed+0xd/0x20
> > [ 6386.240000]  [<ffffffff813690f2>] _raw_write_lock+0x12/0x20
> > [ 6386.240000]  [<ffffffffa030579b>] sheep_aiocb_submit+0x2db/0x360 [sbd]
> > [ 6386.240000]  [<ffffffffa030544e>] ? sheep_aiocb_setup+0x13e/0x1b0 [sbd]
> > [ 6386.240000]  [<ffffffffa0304740>] 0xffffffffa030473f
> > [ 6386.240000]  [<ffffffff8105b510>] ? finish_wait+0x80/0x80
> > [ 6386.240000]  [<ffffffffa03046c0>] ? 0xffffffffa03046bf
> > [ 6386.240000]  [<ffffffff8105af4b>] kthread+0xbb/0xc0
> > [ 6386.240000]  [<ffffffff8105ae90>] ? kthread_create_on_node+0x120/0x120
> > [ 6386.240000]  [<ffffffff81369d7c>] ret_from_fork+0x7c/0xb0
> > [ 6386.240000]  [<ffffffff8105ae90>] ? kthread_create_on_node+0x120/0x120
> > [ 6386.240000] Code: c8 5d c3 66 0f 1f 44 00 00 55 48 89 e5 ff 15 ae 2d
> > 2d 00 5d c3 0f 1f 40 00 55 48 8d 04 bd 00 00 00 00 65 48 8b 14 25 20 0d
> > 01 00 <48> 8d 14 92 48 89 e5 48 8d 14 92 f7 e2 48 8d 7a 01 ff 15 7f 2d
> > [ 6386.240208] NMI backtrace for cpu 1
> > [ 6386.240213] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P           O
> > 3.12.20-gentoo #1
> > [ 6386.240215] Hardware name: Gigabyte Technology Co., Ltd.
> > 965P-S3/965P-S3, BIOS F14A 07/31/2008
> > [ 6386.240218] task: ffff88005d07b960 ti: ffff88005d09a000 task.ti:
> > ffff88005d09a000
> > [ 6386.240220] RIP: 0010:[<ffffffff8100b536>]  [<ffffffff8100b536>]
> > default_idle+0x6/0x10
> > [ 6386.240227] RSP: 0018:ffff88005d09bea8  EFLAGS: 00000286
> > [ 6386.240229] RAX: 00000000ffffffed RBX: ffff88005d09bfd8 RCX:
> > 0100000000000000
> > [ 6386.240231] RDX: 0100000000000000 RSI: 0000000000000000 RDI:
> > 0000000000000001
> > [ 6386.240234] RBP: ffff88005d09bea8 R08: 0000000000000000 R09:
> > 0000000000000000
> > [ 6386.240236] R10: ffff88005f480000 R11: 0000000000000e1e R12:
> > ffffffff814c43b0
> > [ 6386.240238] R13: ffff88005d09bfd8 R14: ffff88005d09bfd8 R15:
> > ffff88005d09bfd8
> > [ 6386.240241] FS:  0000000000000000(0000) GS:ffff88005f480000(0000)
> > knlGS:0000000000000000
> > [ 6386.240243] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 6386.240246] CR2: 00007f31fe05d000 CR3: 000000002a0f5000 CR4:
> > 00000000000007e0
> > [ 6386.240247] Stack:
> > [ 6386.240249]  ffff88005d09beb8 ffffffff8100bc56 ffff88005d09bf18
> > ffffffff81073c7a
> > [ 6386.240253]  0000000000000000 ffff88005d09bfd8 ffff88005d09bef8
> > 15e35c4b103d3891
> > [ 6386.240256]  0000000000000001 0000000000000001 0000000000000001
> > 0000000000000000
> > [ 6386.240260] Call Trace:
> > [ 6386.240264]  [<ffffffff8100bc56>] arch_cpu_idle+0x16/0x20
> > [ 6386.240269]  [<ffffffff81073c7a>] cpu_startup_entry+0xda/0x1c0
> > [ 6386.240273]  [<ffffffff8102a1f1>] start_secondary+0x1e1/0x240
> > [ 6386.240275] Code: 21 ff ff ff 90 48 b8 00 00 00 00 01 00 00 00 48 89
> > 07 e9 0e ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 48 89 e5
> > fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 fe 48 c7 c7 40 e7 58 81
>
> Actually, this is a notorious memory deadlock and can't be avoided if you run
> sbd and sheep on the same node. It is the same class of problem as running an
> NFS server and an NFS client on the same machine.
>
> Deadlock:
> Suppose SBD has dirty pages to write out to the sheep backend. But sheep itself
> needs some free pages to hold this dirty data before doing the actual disk I/O,
> so it waits for the kernel to reclaim memory by cleaning some dirty pages.
> Unfortunately, those dirty pages need sheep's help to be written out, so
> neither side can make progress.
>
> Solution:
> never run the SBD client and the sheep daemon on the same node.
>
> Thanks
> Yuan
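The circular dependency Yuan describes can be sketched as a tiny wait-for graph. This is purely illustrative: the node names below are hypothetical labels for the three parties involved, not real kernel or sheepdog structures.

```python
# Toy model of the writeback deadlock: each party waits on the next.
#   - sbd has dirty pages and waits for sheep to accept the writeout
#   - sheep needs free pages and waits for the kernel to reclaim memory
#   - the kernel reclaims by flushing dirty pages, which waits on sbd
wait_for = {
    "sbd": "sheep",
    "sheep": "kernel-reclaim",
    "kernel-reclaim": "sbd",
}

def find_cycle(graph, start):
    """Walk wait-for edges from `start`; return the cycle if a node repeats."""
    seen = []
    node = start
    while node in graph and node not in seen:
        seen.append(node)
        node = graph[node]
    return seen + [node] if node in seen else None

print(find_cycle(wait_for, "sbd"))
# → ['sbd', 'sheep', 'kernel-reclaim', 'sbd']
```

A closed cycle in the wait-for graph means no party can ever make progress, which is why every process ends up in D state. Running sbd on a different node breaks the `kernel-reclaim -> sbd` edge.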

Here I was, all excited to start experimenting with SBDs, to see if they 
could work as block devices for Xen, and I come across this caveat: 
"never run client SBD and sheep daemon on the same node."

This would seem to negate the whole purpose of exposing SBDs - i.e., 
making Sheepdog usable with things other than KVM/Qemu.  Unless I'm 
radically mistaken, it's standard to mount a Sheepdog volume on the same 
node as a daemon - that's the basic architecture.  Seems like something 
is very broken if one can't use SBDs the same way.

Does this caveat also apply to mounting iSCSI or NBD volumes?

Miles Fidelman


-- 
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra
