[sheepdog] [sbd] I/O stuck shortly after starting writes
Liu Yuan
namei.unix at gmail.com
Mon Jun 2 05:52:30 CEST 2014
On Sun, Jun 01, 2014 at 09:56:00PM +0200, Marcin Mirosław wrote:
> Hi!
> I'm launching three sheeps locally, creating vdi with EC 2:1. Next I'm
> starting sbd0 block device. mkfs.xfs /dev/sbd0 && mount ...
> Next step is starting simple dd command: dd if=/dev/zero
> of=/mnt/test/zero bs=4M count=2000
> After short moment I've got man sheep stuck in D state:
> sheepdog 4126 1.2 9.8 2199564 151052 ? Sl 21:41 0:06
> /usr/sbin/sheep -n --port 7000 -z 0 /mnt/sdb1 --pidfile
> /run/sheepdog/sheepdog.sdb1
> sheepdog 4127 0.0 0.0 34468 396 ? Ds 21:41 0:00
> /usr/sbin/sheep -n --port 7000 -z 0 /mnt/sdb1 --pidfile
> /run/sheepdog/sheepdog.sdb1
> sheepdog 4179 0.2 6.7 1855792 103780 ? Sl 21:41 0:01
> /usr/sbin/sheep -n --port 7001 -z 1 /mnt/sdc1 --pidfile
> /run/sheepdog/sheepdog.sdc1
> sheepdog 4180 0.0 0.0 34468 396 ? Ss 21:41 0:00
> /usr/sbin/sheep -n --port 7001 -z 1 /mnt/sdc1 --pidfile
> /run/sheepdog/sheepdog.sdc1
> sheepdog 4231 0.3 7.3 1863228 111700 ? Sl 21:41 0:01
> /usr/sbin/sheep -n --port 7002 -z 2 /mnt/sdd1 --pidfile
> /run/sheepdog/sheepdog.sdd1
> sheepdog 4232 0.0 0.0 34468 400 ? Ss 21:41 0:00
> /usr/sbin/sheep -n --port 7002 -z 2 /mnt/sdd1 --pidfile
> /run/sheepdog/sheepdog.sdd1
>
> The dd also gets stuck:
> root 4326 0.2 0.3 14764 4664 pts/1 D+ 21:44 0:01 dd
> if=/dev/zero of=/mnt/test/zero bs=4M count=2000
>
> dmesg shows the following:
>
> [ 6386.240000] INFO: rcu_sched self-detected stall on CPU { 0} (t=6001
> jiffies g=139833 c=139832 q=86946)
> [ 6386.240000] sending NMI to all CPUs:
> [ 6386.240000] NMI backtrace for cpu 0
> [ 6386.240000] CPU: 0 PID: 4286 Comm: sbd_submiter Tainted: P
> O 3.12.20-gentoo #1
> [ 6386.240000] Hardware name: Gigabyte Technology Co., Ltd.
> 965P-S3/965P-S3, BIOS F14A 07/31/2008
> [ 6386.240000] task: ffff88001cff3960 ti: ffff88002f78a000 task.ti:
> ffff88002f78a000
> [ 6386.240000] RIP: 0010:[<ffffffff811d4542>] [<ffffffff811d4542>]
> __const_udelay+0x12/0x30
> [ 6386.240000] RSP: 0000:ffff88005f403dc8 EFLAGS: 00000006
> [ 6386.240000] RAX: 0000000001062560 RBX: 0000000000002710 RCX:
> 0000000000000006
> [ 6386.240000] RDX: 0000000001140694 RSI: 0000000000000002 RDI:
> 0000000000418958
> [ 6386.240000] RBP: ffff88005f403de8 R08: 000000000000000a R09:
> 00000000000002bc
> [ 6386.240000] R10: 0000000000000000 R11: 00000000000002bb R12:
> ffffffff8149eec0
> [ 6386.240000] R13: ffffffff8149eec0 R14: ffff88005f40d700 R15:
> 00000000000153a2
> [ 6386.240000] FS: 0000000000000000(0000) GS:ffff88005f400000(0000)
> knlGS:0000000000000000
> [ 6386.240000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 6386.240000] CR2: 00007f3bf6907000 CR3: 0000000001488000 CR4:
> 00000000000007f0
> [ 6386.240000] Stack:
> [ 6386.240000] ffff88005f403de8 ffffffff8102d00a 0000000000000000
> ffffffff814c43b8
> [ 6386.240000] ffff88005f403e58 ffffffff810967ac ffff88001bcb1800
> 0000000000000001
> [ 6386.240000] ffff88005f403e18 ffffffff81098407 ffff88002f78a000
> 0000000000000000
> [ 6386.240000] Call Trace:
> [ 6386.240000] <IRQ>
>
> [ 6386.240000] [<ffffffff8102d00a>] ?
> arch_trigger_all_cpu_backtrace+0x5a/0x80
> [ 6386.240000] [<ffffffff810967ac>] rcu_check_callbacks+0x2fc/0x570
> [ 6386.240000] [<ffffffff81098407>] ? acct_account_cputime+0x17/0x20
> [ 6386.240000] [<ffffffff810494d3>] update_process_times+0x43/0x80
> [ 6386.240000] [<ffffffff81082621>] tick_sched_handle.isra.12+0x31/0x40
> [ 6386.240000] [<ffffffff81082764>] tick_sched_timer+0x44/0x70
> [ 6386.240000] [<ffffffff8105dc4a>] __run_hrtimer.isra.29+0x4a/0xd0
> [ 6386.240000] [<ffffffff8105e415>] hrtimer_interrupt+0xf5/0x230
> [ 6386.240000] [<ffffffff8102b7f6>] local_apic_timer_interrupt+0x36/0x60
> [ 6386.240000] [<ffffffff8102bc0e>] smp_apic_timer_interrupt+0x3e/0x60
> [ 6386.240000] [<ffffffff8136aaca>] apic_timer_interrupt+0x6a/0x70
> [ 6386.240000] <EOI>
>
> [ 6386.240000] [<ffffffff811d577d>] ? __write_lock_failed+0xd/0x20
> [ 6386.240000] [<ffffffff813690f2>] _raw_write_lock+0x12/0x20
> [ 6386.240000] [<ffffffffa030579b>] sheep_aiocb_submit+0x2db/0x360 [sbd]
> [ 6386.240000] [<ffffffffa030544e>] ? sheep_aiocb_setup+0x13e/0x1b0 [sbd]
> [ 6386.240000] [<ffffffffa0304740>] 0xffffffffa030473f
> [ 6386.240000] [<ffffffff8105b510>] ? finish_wait+0x80/0x80
> [ 6386.240000] [<ffffffffa03046c0>] ? 0xffffffffa03046bf
> [ 6386.240000] [<ffffffff8105af4b>] kthread+0xbb/0xc0
> [ 6386.240000] [<ffffffff8105ae90>] ? kthread_create_on_node+0x120/0x120
> [ 6386.240000] [<ffffffff81369d7c>] ret_from_fork+0x7c/0xb0
> [ 6386.240000] [<ffffffff8105ae90>] ? kthread_create_on_node+0x120/0x120
> [ 6386.240000] Code: c8 5d c3 66 0f 1f 44 00 00 55 48 89 e5 ff 15 ae 2d
> 2d 00 5d c3 0f 1f 40 00 55 48 8d 04 bd 00 00 00 00 65 48 8b 14 25 20 0d
> 01 00 <48> 8d 14 92 48 89 e5 48 8d 14 92 f7 e2 48 8d 7a 01 ff 15 7f 2d
> [ 6386.240208] NMI backtrace for cpu 1
> [ 6386.240213] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P O
> 3.12.20-gentoo #1
> [ 6386.240215] Hardware name: Gigabyte Technology Co., Ltd.
> 965P-S3/965P-S3, BIOS F14A 07/31/2008
> [ 6386.240218] task: ffff88005d07b960 ti: ffff88005d09a000 task.ti:
> ffff88005d09a000
> [ 6386.240220] RIP: 0010:[<ffffffff8100b536>] [<ffffffff8100b536>]
> default_idle+0x6/0x10
> [ 6386.240227] RSP: 0018:ffff88005d09bea8 EFLAGS: 00000286
> [ 6386.240229] RAX: 00000000ffffffed RBX: ffff88005d09bfd8 RCX:
> 0100000000000000
> [ 6386.240231] RDX: 0100000000000000 RSI: 0000000000000000 RDI:
> 0000000000000001
> [ 6386.240234] RBP: ffff88005d09bea8 R08: 0000000000000000 R09:
> 0000000000000000
> [ 6386.240236] R10: ffff88005f480000 R11: 0000000000000e1e R12:
> ffffffff814c43b0
> [ 6386.240238] R13: ffff88005d09bfd8 R14: ffff88005d09bfd8 R15:
> ffff88005d09bfd8
> [ 6386.240241] FS: 0000000000000000(0000) GS:ffff88005f480000(0000)
> knlGS:0000000000000000
> [ 6386.240243] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 6386.240246] CR2: 00007f31fe05d000 CR3: 000000002a0f5000 CR4:
> 00000000000007e0
> [ 6386.240247] Stack:
> [ 6386.240249] ffff88005d09beb8 ffffffff8100bc56 ffff88005d09bf18
> ffffffff81073c7a
> [ 6386.240253] 0000000000000000 ffff88005d09bfd8 ffff88005d09bef8
> 15e35c4b103d3891
> [ 6386.240256] 0000000000000001 0000000000000001 0000000000000001
> 0000000000000000
> [ 6386.240260] Call Trace:
> [ 6386.240264] [<ffffffff8100bc56>] arch_cpu_idle+0x16/0x20
> [ 6386.240269] [<ffffffff81073c7a>] cpu_startup_entry+0xda/0x1c0
> [ 6386.240273] [<ffffffff8102a1f1>] start_secondary+0x1e1/0x240
> [ 6386.240275] Code: 21 ff ff ff 90 48 b8 00 00 00 00 01 00 00 00 48 89
> 07 e9 0e ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 48 89 e5
> fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 fe 48 c7 c7 40 e7 58 81
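For reference, the setup described above boils down to roughly the following
sequence. This is only a sketch: the sheep and dd invocations are taken from
the report, while the cluster format, vdi create and sbd attach steps are my
assumptions about what was run (vdi name and size are made up).

    # three local sheep daemons in separate zones (as in the ps output above)
    sheep -n --port 7000 -z 0 /mnt/sdb1
    sheep -n --port 7001 -z 1 /mnt/sdc1
    sheep -n --port 7002 -z 2 /mnt/sdd1

    # create an erasure-coded 2:1 vdi (assumed commands, illustrative name/size)
    dog cluster format
    dog vdi create -c 2:1 test 20G

    # attach the vdi as /dev/sbd0 through the sbd kernel module
    # (attach step omitted here; see the sbd documentation), then:
    mkfs.xfs /dev/sbd0
    mount /dev/sbd0 /mnt/test
    dd if=/dev/zero of=/mnt/test/zero bs=4M count=2000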
Actually, this is a notorious memory deadlock and can't be avoided if you run
sbd and sheep on the same node. It is the same class of problem as mounting an
NFS export on the machine that also runs the NFS server.

Deadlock:

Suppose SBD has dirty pages that it must write out to its backend, the sheep
daemon. Sheep itself needs free pages to hold that dirty data before it can do
the actual disk I/O, so it waits for the kernel to reclaim memory by cleaning
dirty pages. Unfortunately, those dirty pages are exactly the ones that need
sheep's help to be written out, so neither side can make progress.
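If you want to watch the cycle from outside, ordinary diagnostics are enough; a
rough sketch (nothing sheepdog-specific, and reading /proc/<pid>/stack needs a
kernel built with stack tracing):

    # dd, sbd and sheep stuck in uninterruptible sleep, and where they wait
    ps -eo pid,stat,wchan:32,cmd | grep -E 'sheep|sbd|dd if='

    # Dirty/Writeback counters that stop draining once the deadlock hits
    grep -E '^(Dirty|Writeback)' /proc/meminfo

    # kernel stack of the stuck dd (run as root)
    cat /proc/$(pgrep -x dd)/stack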
Solution:

Never run the SBD client and the sheep daemon on the same node.
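In practice the split looks roughly like this (hostnames and the vdi name are
placeholders, and the exact sbd attach interface is version-specific, so it is
only indicated by a comment):

    # storage node(s): only the sheep daemons live here
    sheep -n --port 7000 -z 0 /mnt/sdb1

    # separate client node: only the sbd module is loaded here
    modprobe sbd
    # attach the vdi over the network, pointing at the storage node's
    # address/port (interface depends on the sbd version), then:
    mkfs.xfs /dev/sbd0
    mount /dev/sbd0 /mnt/test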
Thanks
Yuan