Could you please paste your /var/log/cluster/corosync.log and sheep.log?

Thanks,
Wenhao

On Tue, Jul 2, 2013 at 10:59 AM, George Y. Hu <huyuanyuan at gamutsoft.com> wrote:

> Dears,
>
> I installed corosync (1.4.6) + sheepdog (0.6.0) on two CentOS 6 hosts, with the
> following corosync.conf configuration:
>
> -----------------------------------------
> compatibility: whitetank
>
> totem {
>     version: 2
>     secauth: off
>     threads: 0
>     interface {
>         ringnumber: 0
>         bindnetaddr: 10.86.213.251 (10.86.213.252 on the other node)
>         mcastaddr: 226.94.1.1
>         mcastport: 5405
>         ttl: 1
>     }
> }
>
> logging {
>     fileline: off
>     to_stderr: no
>     to_logfile: yes
>     logfile: /var/log/cluster/corosync.log
>     to_syslog: yes
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: AMF
>         debug: off
>     }
> }
> --------------------------------------------
>
> When I start the sheepdog service with "sheep /var/lib/sheep", the two nodes
> do not seem to be connected, since I can see only one node in "collie node list":
>
>   M   Id   Host:Port            V-Nodes   Zone
>   -    0   10.86.213.251:7000        64   -69904886
>
> Iptables has been disabled but the problem remains.
> Could somebody help me with that?
>
>
> Best Regards,
>
> George Y. Hu
>
>
> Send sheepdog-users mailing list submissions to
>     sheepdog-users at lists.wpkg.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     http://lists.wpkg.org/mailman/listinfo/sheepdog-users
> or, via email, send a message with subject or body 'help' to
>     sheepdog-users-request at lists.wpkg.org
>
> You can reach the person managing the list at
>     sheepdog-users-owner at lists.wpkg.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of sheepdog-users digest..."
>
>
> Today's Topics:
>
>    1. Re: Problem with snapshots made with qemu-img (Liu Yuan)
>    2. Crash khugepaged (Valerio Pachera)
>    3. Re: Crash khugepaged (Valerio Pachera)
>    4.
Re: cluster format during recovery (MORITA Kazutaka)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 28 Jun 2013 18:02:59 +0800
> From: Liu Yuan <namei.unix at gmail.com>
> To: "Ing. Luca Lazzeroni - Trend Servizi Srl" <luca at gvnet.it>
> Cc: "sheepdog-users at lists.wpkg.org" <sheepdog-users at lists.wpkg.org>
> Subject: Re: [sheepdog-users] Problem with snapshots made with qemu-img
> Message-ID: <20130628100259.GC13194 at ubuntu-precise>
> Content-Type: text/plain; charset=utf-8
>
> On Fri, Jun 28, 2013 at 10:07:47AM +0200, Ing. Luca Lazzeroni - Trend
> Servizi Srl wrote:
> > Hi,
> > if I make a snapshot of a running VM using:
> >
> > qemu-img snapshot -c Pippo Pluto.raw
> >
> > the snapshot is created on all nodes, but its tag is updated on all nodes
> > except the one running the VM.
> > On the other nodes I can see, via "collie vdi list", the snapshot tag updated
> > correctly, but on the node running the VM I see 2 VDIs with the same name,
> > different IDs and an empty Tag.
>
> It seems that recent qemu-img needs fixes; we didn't test snapshots made with
> qemu-img in our functional tests. We should, though.
>
> >
> > If I create the snapshot via "collie vdi snapshot", everything works fine
> > and the tag is propagated to all nodes; but I don't know if creating a snapshot
> > with collie of a running VM with writeback cache enabled is a good idea in
> > terms of data integrity?
>
> No problem, the snapshot operation will
> 1 flush the cache first
> 2 mark the vdi as readonly
>
> If there is a problem, it is a bug that should be fixed.
>
> Thanks,
> Yuan
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 28 Jun 2013 17:40:26 +0200
> From: Valerio Pachera <sirio81 at gmail.com>
> To: Lista sheepdog user <sheepdog-users at lists.wpkg.org>
> Subject: [sheepdog-users] Crash khugepaged
> Message-ID:
> <CAHS0cb-KqoS6wWt_gT+bSQ56KS7Z5iA4yOSpX5zQsoGPX0WV=
> Q at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> What do you think about this?
>
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606691] khugepaged D ffff88021f393780 0 32 2 0x00000000
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606696] ffff880213793750 0000000000000046 ffffffff00000000 ffff880216566f60
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606701] 0000000000013780 ffff880213795fd8 ffff880213795fd8 ffff880213793750
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606713] ffff880213795730 0000000113795730 ffff88021657fe50 ffff88021f393fd0
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606718] Call Trace:
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606727] [<ffffffff810b47b3>] ? lock_page+0x20/0x20
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606732] [<ffffffff8134da71>] ? io_schedule+0x59/0x71
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606737] [<ffffffff810b47b9>] ? sleep_on_page+0x6/0xa
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606740] [<ffffffff8134deb4>] ? __wait_on_bit+0x3e/0x71
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606744] [<ffffffff810b48f5>] ? wait_on_page_bit+0x6e/0x73
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606751] [<ffffffff8105fb09>] ? autoremove_wake_function+0x2a/0x2a
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606756] [<ffffffff810c2850>] ? shrink_page_list+0x166/0x73f
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606761] [<ffffffff810c9cfa>] ? zone_page_state_add+0x14/0x23
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606765] [<ffffffff810c0e13>] ?
update_isolated_counts+0x13b/0x15a
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606769] [<ffffffff810c32c4>] ? shrink_inactive_list+0x2cd/0x3f0
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606774] [<ffffffff810be232>] ? __lru_cache_add+0x2b/0x51
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606778] [<ffffffff810c3a89>] ? shrink_zone+0x3c0/0x4e6
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606783] [<ffffffff810c3fa7>] ? do_try_to_free_pages+0x1cc/0x41c
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606787] [<ffffffff810c4462>] ? try_to_free_pages+0xa9/0xe9
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606791] [<ffffffff810364e8>] ? should_resched+0x5/0x23
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606796] [<ffffffff810bb3ee>] ? __alloc_pages_nodemask+0x4ed/0x7aa
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606801] [<ffffffff8100d69f>] ? __switch_to+0x133/0x258
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606806] [<ffffffff8134eb77>] ? _raw_spin_unlock_irqrestore+0xe/0xf
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606811] [<ffffffff810e5f05>] ? alloc_pages_vma+0x12d/0x136
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606815] [<ffffffff810ce1c5>] ? pte_pfn+0x5/0xe
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606819] [<ffffffff810ef9bd>] ? khugepaged+0x4dc/0xef3
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606823] [<ffffffff8100d69f>] ? __switch_to+0x133/0x258
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606828] [<ffffffff8105fadf>] ? add_wait_queue+0x3c/0x3c
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606833] [<ffffffff810ef4e1>] ? add_mm_counter.constprop.28+0x9/0x9
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606837] [<ffffffff8105f48d>] ? kthread+0x76/0x7e
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606842] [<ffffffff81355cb4>] ? kernel_thread_helper+0x4/0x10
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606847] [<ffffffff8105f417>] ?
kthread_worker_fn+0x139/0x139
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606851] [<ffffffff81355cb0>] ? gs_change+0x13/0x13
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606983] sheep D ffff88021f393780 0 30859 1 0x00000000
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606987] ffff880101d48730 0000000000000082 0000000000000000 ffff880216566f60
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606992] 0000000000013780 ffff8802141dffd8 ffff8802141dffd8 ffff880101d48730
> Jun 28 16:34:20 sheepdog004 kernel: [103658.606997] ffffea00048c4b20 0000000105019098 ffffea0004fdaaa8 ffff880214677be0
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607001] Call Trace:
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607005] [<ffffffff8134eac4>] ? rwsem_down_failed_common+0xe0/0x114
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607011] [<ffffffff811b3af3>] ? call_rwsem_down_write_failed+0x13/0x20
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607015] [<ffffffff8134e431>] ? down_write+0x25/0x27
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607019] [<ffffffff810d543d>] ? sys_munmap+0x2e/0x52
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607023] [<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607135] tar D ffff88021f293780 0 14370 13938 0x00000000
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607139] ffff88021472ae60 0000000000000086 ffffffff00000000 ffff8802165160c0
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607144] 0000000000013780 ffff880128b77fd8 ffff880128b77fd8 ffff88021472ae60
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607148] ffffffff8101360a 00000001810660a1 ffff880213ff3f30 ffff88021f293fd0
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607153] Call Trace:
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607157] [<ffffffff8101360a>] ? read_tsc+0x5/0x14
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607161] [<ffffffff810b47b3>] ? lock_page+0x20/0x20
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607165] [<ffffffff8134da71>] ? io_schedule+0x59/0x71
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607169] [<ffffffff810b47b9>] ? sleep_on_page+0x6/0xa
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607172] [<ffffffff8134deb4>] ? __wait_on_bit+0x3e/0x71
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607176] [<ffffffff810b48f5>] ? wait_on_page_bit+0x6e/0x73
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607181] [<ffffffff8105fb09>] ? autoremove_wake_function+0x2a/0x2a
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607186] [<ffffffff810b49cd>] ? filemap_fdatawait_range+0x74/0x139
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607191] [<ffffffff810b6181>] ? filemap_write_and_wait+0x24/0x30
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607205] [<ffffffffa053ac73>] ?
nfs_getattr+0x32/0xac [nfs]
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607211] [<ffffffff810fda17>] ? vfs_fstat+0x30/0x4e
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607214] [<ffffffff810fdb49>] ? sys_newfstat+0x12/0x2b
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607218] [<ffffffff810fa376>] ? vfs_write+0xbb/0xe9
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607221] [<ffffffff810fa554>] ? sys_write+0x5f/0x6b
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607225] [<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607335] pgrep D ffff88021f293780 0 30870 30869 0x00000000
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607339] ffff880133e7c730 0000000000000086 0000000100000000 ffff8802165160c0
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607344] 0000000000013780 ffff8801340f5fd8 ffff8801340f5fd8 ffff880133e7c730
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607349] 0000000000000020 000000011f5fcc08 0000000000000002 ffff880214677be0
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607353] Call Trace:
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607357] [<ffffffff8134eac4>] ? rwsem_down_failed_common+0xe0/0x114
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607361] [<ffffffff811b3ac4>] ? call_rwsem_down_read_failed+0x14/0x30
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607365] [<ffffffff8134e44a>] ? down_read+0x17/0x19
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607369] [<ffffffff810d1a94>] ? __access_remote_vm+0x3a/0x1c1
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607374] [<ffffffff810d2acb>] ? access_process_vm+0x48/0x65
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607378] [<ffffffff81140852>] ? proc_pid_cmdline+0x63/0xf0
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607382] [<ffffffff81141a58>] ? proc_info_read+0x5b/0xb8
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607386] [<ffffffff810fa443>] ?
vfs_read+0x9f/0xe6
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607390] [<ffffffff810fa4cf>] ? sys_read+0x45/0x6b
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607393] [<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607503] pgrep D ffff88021f293780 0 30926 30925 0x00000000
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607507] ffff880212ed2e20 0000000000000086 0000000100000000 ffff8802165160c0
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607512] 0000000000013780 ffff880132047fd8 ffff880132047fd8 ffff880212ed2e20
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607517] 0000000000000020 000000011f5fcc08 0000000000000002 ffff880214677be0
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607521] Call Trace:
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607525] [<ffffffff8134eac4>] ? rwsem_down_failed_common+0xe0/0x114
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607529] [<ffffffff811b3ac4>] ? call_rwsem_down_read_failed+0x14/0x30
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607533] [<ffffffff8134e44a>] ? down_read+0x17/0x19
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607537] [<ffffffff810d1a94>] ? __access_remote_vm+0x3a/0x1c1
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607541] [<ffffffff810d2acb>] ? access_process_vm+0x48/0x65
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607545] [<ffffffff81140852>] ? proc_pid_cmdline+0x63/0xf0
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607548] [<ffffffff81141a58>] ? proc_info_read+0x5b/0xb8
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607552] [<ffffffff810fa443>] ? vfs_read+0x9f/0xe6
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607556] [<ffffffff810fa4cf>] ? sys_read+0x45/0x6b
> Jun 28 16:34:20 sheepdog004 kernel: [103658.607559] [<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581543] Call Trace:
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581552] [<ffffffff810b47b3>] ?
lock_page+0x20/0x20
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581557] [<ffffffff8134da71>] ? io_schedule+0x59/0x71
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581561] [<ffffffff810b47b9>] ? sleep_on_page+0x6/0xa
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581565] [<ffffffff8134deb4>] ? __wait_on_bit+0x3e/0x71
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581569] [<ffffffff810b48f5>] ? wait_on_page_bit+0x6e/0x73
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581575] [<ffffffff8105fb09>] ? autoremove_wake_function+0x2a/0x2a
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581581] [<ffffffff810c2850>] ? shrink_page_list+0x166/0x73f
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581586] [<ffffffff810c9cfa>] ? zone_page_state_add+0x14/0x23
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581591] [<ffffffff810c0e13>] ? update_isolated_counts+0x13b/0x15a
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581595] [<ffffffff810c32c4>] ? shrink_inactive_list+0x2cd/0x3f0
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581600] [<ffffffff810be232>] ? __lru_cache_add+0x2b/0x51
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581604] [<ffffffff810c3a89>] ? shrink_zone+0x3c0/0x4e6
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581608] [<ffffffff810c3fa7>] ? do_try_to_free_pages+0x1cc/0x41c
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581612] [<ffffffff810c4462>] ? try_to_free_pages+0xa9/0xe9
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581616] [<ffffffff810364e8>] ? should_resched+0x5/0x23
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581621] [<ffffffff810bb3ee>] ? __alloc_pages_nodemask+0x4ed/0x7aa
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581626] [<ffffffff8100d69f>] ? __switch_to+0x133/0x258
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581631] [<ffffffff8134eb77>] ? _raw_spin_unlock_irqrestore+0xe/0xf
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581636] [<ffffffff810e5f05>] ?
alloc_pages_vma+0x12d/0x136
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581640] [<ffffffff810ce1c5>] ? pte_pfn+0x5/0xe
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581645] [<ffffffff810ef9bd>] ? khugepaged+0x4dc/0xef3
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581649] [<ffffffff8100d69f>] ? __switch_to+0x133/0x258
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581654] [<ffffffff8105fadf>] ? add_wait_queue+0x3c/0x3c
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581658] [<ffffffff810ef4e1>] ? add_mm_counter.constprop.28+0x9/0x9
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581662] [<ffffffff8105f48d>] ? kthread+0x76/0x7e
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581667] [<ffffffff81355cb4>] ? kernel_thread_helper+0x4/0x10
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581671] [<ffffffff8105f417>] ? kthread_worker_fn+0x139/0x139
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581675] [<ffffffff81355cb0>] ? gs_change+0x13/0x13
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581808] sheep D ffff88021f393780 0 30859 1 0x00000000
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581813] ffff880101d48730 0000000000000082 0000000000000000 ffff880216566f60
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581817] 0000000000013780 ffff8802141dffd8 ffff8802141dffd8 ffff880101d48730
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581822] ffffea00048c4b20 0000000105019098 ffffea0004fdaaa8 ffff880214677be0
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581827] Call Trace:
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581831] [<ffffffff8134eac4>] ? rwsem_down_failed_common+0xe0/0x114
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581842] [<ffffffff811b3af3>] ? call_rwsem_down_write_failed+0x13/0x20
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581846] [<ffffffff8134e431>] ? down_write+0x25/0x27
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581850] [<ffffffff810d543d>] ?
sys_munmap+0x2e/0x52
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581854] [<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581969] tar D ffff88021f293780 0 14370 13938 0x00000000
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581974] ffff88021472ae60 0000000000000086 ffffffff00000000 ffff8802165160c0
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581978] 0000000000013780 ffff880128b77fd8 ffff880128b77fd8 ffff88021472ae60
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581983] ffffffff8101360a 00000001810660a1 ffff880213ff3f30 ffff88021f293fd0
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581988] Call Trace:
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581992] [<ffffffff8101360a>] ? read_tsc+0x5/0x14
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581996] [<ffffffff810b47b3>] ? lock_page+0x20/0x20
> Jun 28 16:36:20 sheepdog004 kernel: [103778.581999] [<ffffffff8134da71>] ? io_schedule+0x59/0x71
> Jun 28 16:36:20 sheepdog004 kernel: [103778.582003] [<ffffffff810b47b9>] ? sleep_on_page+0x6/0xa
> ....
>
> Host with 8G of RAM.
> The host was also exporting an NFS folder.
> The guest was mounting this folder.
> The guest was decompressing a big tar.gz (77G).
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 28 Jun 2013 18:02:37 +0200
> From: Valerio Pachera <sirio81 at gmail.com>
> To: Lista sheepdog user <sheepdog-users at lists.wpkg.org>
> Subject: Re: [sheepdog-users] Crash khugepaged
> Message-ID:
> <CAHS0cb8TogSOD2pGE+TsScm+o=
> g1kEXGdUNoFWWF-xYoVgfwog at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> 2013/6/28 Valerio Pachera <sirio81 at gmail.com>:
> > What do you think about this?
>
> The crash was on the host side.
> It was difficult to interact with the host because pgrep, atop, and ps aux
> were freezing.
> 'top' and 'kill' were working.
> I had to kill -9 the guests.
> I was able to reboot the host (after first shutting down the cluster).
> "collie node list" was still showing the host inside the cluster.
>
> I wonder if the crash may be related to excessive network traffic on
> the NIC, or to the use of transparent huge pages.
> I set it back to the default value (madvise), but I'm not going to repeat the
> decompression via NFS today.
>
>
> ------------------------------
>
> Message: 4
> Date: Sat, 29 Jun 2013 12:50:06 +0900
> From: MORITA Kazutaka <morita.kazutaka at gmail.com>
> To: Valerio Pachera <sirio81 at gmail.com>
> Cc: Lista sheepdog user <sheepdog-users at lists.wpkg.org>
> Subject: Re: [sheepdog-users] cluster format during recovery
> Message-ID: <m27ghd60ep.wl%morita.kazutaka at gmail.com>
> Content-Type: text/plain; charset=US-ASCII
>
> At Thu, 27 Jun 2013 15:45:36 +0200,
> Valerio Pachera wrote:
> >
> > This is an unusual thing.
> > It's useful for testing purposes only:
> >
> > What happens if cluster format is run during a recovery?
>
> Probably, the recovery process will print a lot of error messages
> after cluster format since it cannot find any objects to be recovered.
>
> Thanks,
>
> Kazutaka
>
>
> ------------------------------
>
> _______________________________________________
> sheepdog-users mailing list
> sheepdog-users at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog-users
>
>
> End of sheepdog-users Digest, Vol 14, Issue 40
> **********************************************
>
>
> --
> sheepdog-users mailing lists
> sheepdog-users at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog-users
>