[sheepdog-users] Stability problems with kvm using a remote sheepdog volume

David Douard david.douard at logilab.fr
Mon Jun 11 15:22:44 CEST 2012


On 09/06/2012 12:39, David Douard wrote:
> On 08/06/2012 16:48, MORITA Kazutaka wrote:
>> On Fri, Jun 8, 2012 at 9:41 PM, David Douard <david.douard at logilab.fr> wrote:
>>> Hi,
>>>
>>> I still have very serious stability problems with kvm when using remote
>>> sheepdog access.
>>>
>>> I filed a bug on github about this:
>>>
>>>  https://github.com/collie/sheepdog/issues/26
>>>
>>> Are there any other people having similar problems? What can I do to
>>> identify the problem and try to fix it?
>> Hi David,
>>
> Hi,
>
>> I'm working on fixing a race condition in the qemu sheepdog block driver.
>> I guess you are hitting the same problem.  I've pushed some half baked fixes to
>>   git://github.com/kazum/qemu.git
>>
>> Can you try this tree?
> I will.
>
> Thanks,
> David

Hmm, I spoke a bit too soon.

The kvm process does not segfault any more, but writes to the sheepdog
volume now generate errors in the guest. I have many

  end_request: I/O error, dev vdc, sector 0

in the syslog of the guest (vdc being the block device served by sheepdog).
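
To get a rough count of how often these errors occur per device, the guest's
syslog can be scanned with a few lines of Python. This is only a minimal
sketch: the function name count_io_errors is made up for illustration, and the
pattern assumes the message format shown above, with each log entry on a
single line.

```python
import re
from collections import Counter

# Matches kernel messages such as:
#   end_request: I/O error, dev vdc, sector 0
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def count_io_errors(lines):
    """Return a Counter mapping device name to its number of I/O error lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file (for example the guest's /var/log/syslog, if that is
where kernel messages end up) to count_io_errors() gives one total per block
device, which makes it easy to see whether only the sheepdog-backed device is
affected.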

While running "zcav -w", the guest froze for a while and finally produced
the traceback below.

If possible, I'd like to rebuild the kvm binary from the ubuntu package,
applying only the patches required to fix the race condition.
Kazutaka, can you please point me to the changesets in your git repo
that are strictly required, so I can apply them as patches?

David

PS: I am resending this because I first sent it from the wrong email address.




Jun  9 13:56:20 test-precise kernel: [ 3181.484106] end_request: I/O
error, dev vdc, sector 0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] INFO: rcu_sched
detected stall on CPU 0 (t=32435 jiffies)
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] sending NMI to all CPUs:
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] NMI backtrace for cpu 0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] CPU 0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] Modules linked in:
floppy psmouse serio_raw virtio_balloon 8139too 8139cp acpiphp
Jun  9 13:56:20 test-precise kernel: [ 3181.484106]
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] Pid: 0, comm:
swapper/0 Not tainted 3.2.0-23-virtual #36-Ubuntu Bochs Bochs
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] RIP:
0010:[<ffffffff81036c5f>]  [<ffffffff81036c5f>] flat_send_IPI_all+0xaf/0xd0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] RSP:
0018:ffff88007fc037e0  EFLAGS: 00010006
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] RAX:
0000000000000000 RBX: 0000000000000046 RCX: 000000000003ffff
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] RDX:
0000000000000000 RSI: 0000000000000086 RDI: 0000000000000300
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] RBP:
ffff88007fc03800 R08: 000000000000000a R09: 0000000000000000
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] R10:
0000000000000000 R11: 0000000000000000 R12: 0000000000000c00
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] R13:
0000000001000000 R14: ffff88007fc0e700 R15: 0000000000000000
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] FS:
0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] CS:  0010 DS: 0000
ES: 0000 CR0: 000000008005003b
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] CR2:
00007fb7f39f800f CR3: 000000007a231000 CR4: 00000000000006f0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] DR0:
0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] DR3:
0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] Process swapper/0
(pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020)
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] Stack:
Jun  9 13:56:20 test-precise kernel: [ 3181.484106]  0000000000000000
0000000000002710 ffffffff81c30f00 ffffffff81c31000
Jun  9 13:56:20 test-precise kernel: [ 3181.484106]  ffff88007fc03820
ffffffff810322ca 0000000000082000 ffffffff81c30f00
Jun  9 13:56:20 test-precise kernel: [ 3181.484106]  ffff88007fc03840
ffffffff810de077 ffff88007fc0e258 ffff88007fc0eb80
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] Call Trace:
Jun  9 13:56:20 test-precise kernel: [ 3181.484106]  <IRQ>
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff810322ca>]
arch_trigger_all_cpu_backtrace+0x5a/0x90
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff810de077>]
check_cpu_stall.isra.36+0x97/0xf0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff810de108>]
__rcu_pending+0x38/0x1b0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff810de579>]
rcu_check_callbacks+0x79/0x1e0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81077198>]
update_process_times+0x48/0x90
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8109a7a4>]
tick_sched_timer+0x64/0xc0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8108cfc8>]
__run_hrtimer+0x78/0x1f0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8109a740>]
? tick_nohz_handler+0x100/0x100
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8103bd39>]
? kvm_clock_get_cycles+0x9/0x10
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8108d853>]
hrtimer_interrupt+0xe3/0x200
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff816604c9>]
smp_apic_timer_interrupt+0x69/0x99
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8165e39e>]
apic_timer_interrupt+0x6e/0x80
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffffa000e56b>]
? cp_start_xmit+0x54b/0x6b0 [8139cp]
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81019e80>]
? nommu_map_sg+0xe0/0xe0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81050d16>]
? ttwu_do_activate.constprop.176+0x66/0x70
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8153bbb2>]
dev_hard_start_xmit+0x2a2/0x580
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8105e5e0>]
? try_to_wake_up+0x190/0x200
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8155850e>]
sch_direct_xmit+0xfe/0x1d0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8153bfc8>]
dev_queue_xmit+0x138/0x420
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff815724fb>]
ip_finish_output+0x16b/0x2f0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81573068>]
ip_output+0x98/0xa0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8156b61f>]
? ipv4_dst_check+0x2f/0x50
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81572759>]
ip_local_out+0x29/0x30
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff815728bc>]
ip_queue_xmit+0x15c/0x410
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158d5a0>]
? tcp_retransmit_timer+0x440/0x440
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158a9d9>]
tcp_transmit_skb+0x359/0x580
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158b941>]
tcp_retransmit_skb+0x171/0x310
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158d37b>]
tcp_retransmit_timer+0x21b/0x440
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158d688>]
tcp_write_timer+0xe8/0x110
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158d5a0>]
? tcp_retransmit_timer+0x440/0x440
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff810748f6>]
call_timer_fn+0x46/0x160
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158d5a0>]
? tcp_retransmit_timer+0x440/0x440
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81076242>]
run_timer_softirq+0x132/0x2a0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81093485>]
? ktime_get+0x65/0xe0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8106d318>]
__do_softirq+0xa8/0x210
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8103bd39>]
? kvm_clock_get_cycles+0x9/0x10
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8109a344>]
? tick_program_event+0x24/0x30
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8165fb2c>]
call_softirq+0x1c/0x30
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81015295>]
do_softirq+0x65/0xa0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8106d6fe>]
irq_exit+0x8e/0xb0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff816604ce>]
smp_apic_timer_interrupt+0x6e/0x99
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8165e39e>]
apic_timer_interrupt+0x6e/0x80
Jun  9 13:56:20 test-precise kernel: [ 3181.484106]  <EOI>
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8103be2b>]
? native_safe_halt+0xb/0x10
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8101b893>]
default_idle+0x53/0x1d0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81012236>]
cpu_idle+0xd6/0x120
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8161cb2e>]
rest_init+0x72/0x74
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81cf9c0d>]
start_kernel+0x3ba/0x3c7
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81cf9388>]
x86_64_start_reservations+0x132/0x136
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81cf9140>]
? early_idt_handlers+0x140/0x140
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81cf9459>]
x86_64_start_kernel+0xcd/0xdc
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] Code: 8b 05 06 4d ca
00 41 c1 e5 18 44 8b 60 34 ff 90 48 01 00 00 44 89 2c 25 10 b3 5f ff 41
81 cc 00 04 00 00 44 89 24 25 00 b3 5f ff <48> 89 df 57 9d 66 66 90 66
90 48 83 c4 08 5b 41 5c 41 5d 5d c3
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] Call Trace:
Jun  9 13:56:20 test-precise kernel: [ 3181.484106]  <IRQ>
[<ffffffff810322ca>] arch_trigger_all_cpu_backtrace+0x5a/0x90
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff810de077>]
check_cpu_stall.isra.36+0x97/0xf0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff810de108>]
__rcu_pending+0x38/0x1b0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff810de579>]
rcu_check_callbacks+0x79/0x1e0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81077198>]
update_process_times+0x48/0x90
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8109a7a4>]
tick_sched_timer+0x64/0xc0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8108cfc8>]
__run_hrtimer+0x78/0x1f0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8109a740>]
? tick_nohz_handler+0x100/0x100
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8103bd39>]
? kvm_clock_get_cycles+0x9/0x10
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8108d853>]
hrtimer_interrupt+0xe3/0x200
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff816604c9>]
smp_apic_timer_interrupt+0x69/0x99
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8165e39e>]
apic_timer_interrupt+0x6e/0x80
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffffa000e56b>]
? cp_start_xmit+0x54b/0x6b0 [8139cp]
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81019e80>]
? nommu_map_sg+0xe0/0xe0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81050d16>]
? ttwu_do_activate.constprop.176+0x66/0x70
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8153bbb2>]
dev_hard_start_xmit+0x2a2/0x580
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8105e5e0>]
? try_to_wake_up+0x190/0x200
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8155850e>]
sch_direct_xmit+0xfe/0x1d0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8153bfc8>]
dev_queue_xmit+0x138/0x420
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff815724fb>]
ip_finish_output+0x16b/0x2f0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81573068>]
ip_output+0x98/0xa0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8156b61f>]
? ipv4_dst_check+0x2f/0x50
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81572759>]
ip_local_out+0x29/0x30
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff815728bc>]
ip_queue_xmit+0x15c/0x410
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158d5a0>]
? tcp_retransmit_timer+0x440/0x440
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158a9d9>]
tcp_transmit_skb+0x359/0x580
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158b941>]
tcp_retransmit_skb+0x171/0x310
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158d37b>]
tcp_retransmit_timer+0x21b/0x440
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158d688>]
tcp_write_timer+0xe8/0x110
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158d5a0>]
? tcp_retransmit_timer+0x440/0x440
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff810748f6>]
call_timer_fn+0x46/0x160
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8158d5a0>]
? tcp_retransmit_timer+0x440/0x440
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81076242>]
run_timer_softirq+0x132/0x2a0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81093485>]
? ktime_get+0x65/0xe0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8106d318>]
__do_softirq+0xa8/0x210
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8103bd39>]
? kvm_clock_get_cycles+0x9/0x10
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8109a344>]
? tick_program_event+0x24/0x30
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8165fb2c>]
call_softirq+0x1c/0x30
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81015295>]
do_softirq+0x65/0xa0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8106d6fe>]
irq_exit+0x8e/0xb0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff816604ce>]
smp_apic_timer_interrupt+0x6e/0x99
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8165e39e>]
apic_timer_interrupt+0x6e/0x80
Jun  9 13:56:20 test-precise kernel: [ 3181.484106]  <EOI>
[<ffffffff8103be2b>] ? native_safe_halt+0xb/0x10
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8101b893>]
default_idle+0x53/0x1d0
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81012236>]
cpu_idle+0xd6/0x120
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff8161cb2e>]
rest_init+0x72/0x74
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81cf9c0d>]
start_kernel+0x3ba/0x3c7
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81cf9388>]
x86_64_start_reservations+0x132/0x136
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81cf9140>]
? early_idt_handlers+0x140/0x140
Jun  9 13:56:20 test-precise kernel: [ 3181.484106] [<ffffffff81cf9459>]
x86_64_start_kernel+0xcd/0xdc
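
Because the mail client wrapped the log lines, the trace above is hard to read
at a glance. The following sketch (the name call_trace_symbols is hypothetical,
and the regular expression assumes the "[<address>] symbol+0xoff/0xsize" frame
format seen in this log) collapses the wrapping and lists the function names in
order. Frames prefixed with "?" are the ones the kernel marks as unreliable and
are skipped by default.

```python
import re

# A frame looks like "[<ffffffff8158d37b>] tcp_retransmit_timer+0x21b/0x440";
# frames the kernel is unsure about carry a "? " before the symbol name.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def call_trace_symbols(text, include_uncertain=False):
    """Return the function names found in a kernel trace, in order."""
    # Wrapped lines are rejoined by collapsing all whitespace to single spaces.
    flat = " ".join(text.split())
    symbols = []
    for uncertain, symbol in FRAME.findall(flat):
        if uncertain and not include_uncertain:
            continue
        symbols.append(symbol)
    return symbols
```

Note that the RIP line uses the same frame format, so its symbol is returned as
well. Applied to this log, the reliable frames show the timer softirq running
tcp_retransmit_timer and the transmit path down to dev_hard_start_xmit when the
stall was detected.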





>
>
>> Thanks,
>>
>> Kazutaka
>
>