Dears,

I installed corosync (1.4.6) + sheepdog (0.6.0) on two CentOS 6 hosts, with the following corosync.conf:

-----------------------------------------
compatibility: whitetank

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 10.86.213.251    (10.86.213.252 on the other node)
                mcastaddr: 226.94.1.1
                mcastport: 5405
                ttl: 1
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
--------------------------------------------

When I start the sheepdog service with "sheep /var/lib/sheep", the two nodes do not seem to be connected, since I can see only one node in "collie node list":

M   Id   Host:Port            V-Nodes   Zone
-    0   10.86.213.251:7000        64   -69904886

Iptables has been disabled but the problem remains.

Will somebody help me with that?

Best Regards,
George Y. Hu

Send sheepdog-users mailing list submissions to
        sheepdog-users at lists.wpkg.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://lists.wpkg.org/mailman/listinfo/sheepdog-users
or, via email, send a message with subject or body 'help' to
        sheepdog-users-request at lists.wpkg.org

You can reach the person managing the list at
        sheepdog-users-owner at lists.wpkg.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of sheepdog-users digest..."

Today's Topics:

   1. Re: Problem with snapshots made with qemu-img (Liu Yuan)
   2. Crash khugepaged (Valerio Pachera)
   3. Re: Crash khugepaged (Valerio Pachera)
   4. Re: cluster format during recovery (MORITA Kazutaka)

----------------------------------------------------------------------

Message: 1
Date: Fri, 28 Jun 2013 18:02:59 +0800
From: Liu Yuan <namei.unix at gmail.com>
To: "Ing. Luca Lazzeroni - Trend Servizi Srl" <luca at gvnet.it>
Cc: "sheepdog-users at lists.wpkg.org" <sheepdog-users at lists.wpkg.org>
Subject: Re: [sheepdog-users] Problem with snapshots made with qemu-img
Message-ID: <20130628100259.GC13194 at ubuntu-precise>
Content-Type: text/plain; charset=utf-8

On Fri, Jun 28, 2013 at 10:07:47AM +0200, Ing. Luca Lazzeroni - Trend Servizi Srl wrote:
> Hi,
> if I make a snapshot of a running VM using:
>
> qemu-img snapshot -c Pippo Pluto.raw
>
> snapshot is created on all nodes, but its tag is updated on all nodes except the one running the VM.
> On other nodes I can see, via "collie vdi list" the snapshot tag updated correctly, but on the node running the VM I see 2 VDI with the same name, different ID and empty Tag.

It seems that recent qemu-img needs fixes; we haven't tested snapshots taken with qemu-img in our functional tests. We should, though.

>
> If I create the snapshot via "collie vdi snapshot", everything works fine and tag is propagated to all nodes; but I don't know if creating a snapshot with collie of a running VM with writeback cache enabled is a good idea in terms of data integrity?

No problem, the snapshot operation will
 1. flush the cache first
 2. mark the VDI as read-only

If there is a problem, it is a bug that should be fixed.

Thanks,
Yuan

------------------------------

Message: 2
Date: Fri, 28 Jun 2013 17:40:26 +0200
From: Valerio Pachera <sirio81 at gmail.com>
To: Lista sheepdog user <sheepdog-users at lists.wpkg.org>
Subject: [sheepdog-users] Crash khugepaged
Message-ID: <CAHS0cb-KqoS6wWt_gT+bSQ56KS7Z5iA4yOSpX5zQsoGPX0WV=Q at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

What do you think about this?

Jun 28 16:34:20 sheepdog004 kernel: [103658.606691] khugepaged D ffff88021f393780 0 32 2 0x00000000
Jun 28 16:34:20 sheepdog004 kernel: [103658.606696] ffff880213793750 0000000000000046 ffffffff00000000 ffff880216566f60
Jun 28 16:34:20 sheepdog004 kernel: [103658.606701] 0000000000013780 ffff880213795fd8 ffff880213795fd8 ffff880213793750
Jun 28 16:34:20 sheepdog004 kernel: [103658.606713] ffff880213795730 0000000113795730 ffff88021657fe50 ffff88021f393fd0
Jun 28 16:34:20 sheepdog004 kernel: [103658.606718] Call Trace:
Jun 28 16:34:20 sheepdog004 kernel: [103658.606727] [<ffffffff810b47b3>] ? lock_page+0x20/0x20
Jun 28 16:34:20 sheepdog004 kernel: [103658.606732] [<ffffffff8134da71>] ? io_schedule+0x59/0x71
Jun 28 16:34:20 sheepdog004 kernel: [103658.606737] [<ffffffff810b47b9>] ? sleep_on_page+0x6/0xa
Jun 28 16:34:20 sheepdog004 kernel: [103658.606740] [<ffffffff8134deb4>] ? __wait_on_bit+0x3e/0x71
Jun 28 16:34:20 sheepdog004 kernel: [103658.606744] [<ffffffff810b48f5>] ? wait_on_page_bit+0x6e/0x73
Jun 28 16:34:20 sheepdog004 kernel: [103658.606751] [<ffffffff8105fb09>] ? autoremove_wake_function+0x2a/0x2a
Jun 28 16:34:20 sheepdog004 kernel: [103658.606756] [<ffffffff810c2850>] ? shrink_page_list+0x166/0x73f
Jun 28 16:34:20 sheepdog004 kernel: [103658.606761] [<ffffffff810c9cfa>] ? zone_page_state_add+0x14/0x23
Jun 28 16:34:20 sheepdog004 kernel: [103658.606765] [<ffffffff810c0e13>] ? update_isolated_counts+0x13b/0x15a
Jun 28 16:34:20 sheepdog004 kernel: [103658.606769] [<ffffffff810c32c4>] ? shrink_inactive_list+0x2cd/0x3f0
Jun 28 16:34:20 sheepdog004 kernel: [103658.606774] [<ffffffff810be232>] ? __lru_cache_add+0x2b/0x51
Jun 28 16:34:20 sheepdog004 kernel: [103658.606778] [<ffffffff810c3a89>] ? shrink_zone+0x3c0/0x4e6
Jun 28 16:34:20 sheepdog004 kernel: [103658.606783] [<ffffffff810c3fa7>] ? do_try_to_free_pages+0x1cc/0x41c
Jun 28 16:34:20 sheepdog004 kernel: [103658.606787] [<ffffffff810c4462>] ? try_to_free_pages+0xa9/0xe9
Jun 28 16:34:20 sheepdog004 kernel: [103658.606791] [<ffffffff810364e8>] ? should_resched+0x5/0x23
Jun 28 16:34:20 sheepdog004 kernel: [103658.606796] [<ffffffff810bb3ee>] ? __alloc_pages_nodemask+0x4ed/0x7aa
Jun 28 16:34:20 sheepdog004 kernel: [103658.606801] [<ffffffff8100d69f>] ? __switch_to+0x133/0x258
Jun 28 16:34:20 sheepdog004 kernel: [103658.606806] [<ffffffff8134eb77>] ? _raw_spin_unlock_irqrestore+0xe/0xf
Jun 28 16:34:20 sheepdog004 kernel: [103658.606811] [<ffffffff810e5f05>] ? alloc_pages_vma+0x12d/0x136
Jun 28 16:34:20 sheepdog004 kernel: [103658.606815] [<ffffffff810ce1c5>] ? pte_pfn+0x5/0xe
Jun 28 16:34:20 sheepdog004 kernel: [103658.606819] [<ffffffff810ef9bd>] ? khugepaged+0x4dc/0xef3
Jun 28 16:34:20 sheepdog004 kernel: [103658.606823] [<ffffffff8100d69f>] ? __switch_to+0x133/0x258
Jun 28 16:34:20 sheepdog004 kernel: [103658.606828] [<ffffffff8105fadf>] ? add_wait_queue+0x3c/0x3c
Jun 28 16:34:20 sheepdog004 kernel: [103658.606833] [<ffffffff810ef4e1>] ? add_mm_counter.constprop.28+0x9/0x9
Jun 28 16:34:20 sheepdog004 kernel: [103658.606837] [<ffffffff8105f48d>] ? kthread+0x76/0x7e
Jun 28 16:34:20 sheepdog004 kernel: [103658.606842] [<ffffffff81355cb4>] ? kernel_thread_helper+0x4/0x10
Jun 28 16:34:20 sheepdog004 kernel: [103658.606847] [<ffffffff8105f417>] ? kthread_worker_fn+0x139/0x139
Jun 28 16:34:20 sheepdog004 kernel: [103658.606851] [<ffffffff81355cb0>] ? gs_change+0x13/0x13
Jun 28 16:34:20 sheepdog004 kernel: [103658.606983] sheep D ffff88021f393780 0 30859 1 0x00000000
Jun 28 16:34:20 sheepdog004 kernel: [103658.606987] ffff880101d48730 0000000000000082 0000000000000000 ffff880216566f60
Jun 28 16:34:20 sheepdog004 kernel: [103658.606992] 0000000000013780 ffff8802141dffd8 ffff8802141dffd8 ffff880101d48730
Jun 28 16:34:20 sheepdog004 kernel: [103658.606997] ffffea00048c4b20 0000000105019098 ffffea0004fdaaa8 ffff880214677be0
Jun 28 16:34:20 sheepdog004 kernel: [103658.607001] Call Trace:
Jun 28 16:34:20 sheepdog004 kernel: [103658.607005] [<ffffffff8134eac4>] ? rwsem_down_failed_common+0xe0/0x114
Jun 28 16:34:20 sheepdog004 kernel: [103658.607011] [<ffffffff811b3af3>] ? call_rwsem_down_write_failed+0x13/0x20
Jun 28 16:34:20 sheepdog004 kernel: [103658.607015] [<ffffffff8134e431>] ? down_write+0x25/0x27
Jun 28 16:34:20 sheepdog004 kernel: [103658.607019] [<ffffffff810d543d>] ? sys_munmap+0x2e/0x52
Jun 28 16:34:20 sheepdog004 kernel: [103658.607023] [<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b
Jun 28 16:34:20 sheepdog004 kernel: [103658.607135] tar D ffff88021f293780 0 14370 13938 0x00000000
Jun 28 16:34:20 sheepdog004 kernel: [103658.607139] ffff88021472ae60 0000000000000086 ffffffff00000000 ffff8802165160c0
Jun 28 16:34:20 sheepdog004 kernel: [103658.607144] 0000000000013780 ffff880128b77fd8 ffff880128b77fd8 ffff88021472ae60
Jun 28 16:34:20 sheepdog004 kernel: [103658.607148] ffffffff8101360a 00000001810660a1 ffff880213ff3f30 ffff88021f293fd0
Jun 28 16:34:20 sheepdog004 kernel: [103658.607153] Call Trace:
Jun 28 16:34:20 sheepdog004 kernel: [103658.607157] [<ffffffff8101360a>] ? read_tsc+0x5/0x14
Jun 28 16:34:20 sheepdog004 kernel: [103658.607161] [<ffffffff810b47b3>] ? lock_page+0x20/0x20
Jun 28 16:34:20 sheepdog004 kernel: [103658.607165] [<ffffffff8134da71>] ? io_schedule+0x59/0x71
Jun 28 16:34:20 sheepdog004 kernel: [103658.607169] [<ffffffff810b47b9>] ? sleep_on_page+0x6/0xa
Jun 28 16:34:20 sheepdog004 kernel: [103658.607172] [<ffffffff8134deb4>] ? __wait_on_bit+0x3e/0x71
Jun 28 16:34:20 sheepdog004 kernel: [103658.607176] [<ffffffff810b48f5>] ? wait_on_page_bit+0x6e/0x73
Jun 28 16:34:20 sheepdog004 kernel: [103658.607181] [<ffffffff8105fb09>] ? autoremove_wake_function+0x2a/0x2a
Jun 28 16:34:20 sheepdog004 kernel: [103658.607186] [<ffffffff810b49cd>] ? filemap_fdatawait_range+0x74/0x139
Jun 28 16:34:20 sheepdog004 kernel: [103658.607191] [<ffffffff810b6181>] ? filemap_write_and_wait+0x24/0x30
Jun 28 16:34:20 sheepdog004 kernel: [103658.607205] [<ffffffffa053ac73>] ? nfs_getattr+0x32/0xac [nfs]
Jun 28 16:34:20 sheepdog004 kernel: [103658.607211] [<ffffffff810fda17>] ? vfs_fstat+0x30/0x4e
Jun 28 16:34:20 sheepdog004 kernel: [103658.607214] [<ffffffff810fdb49>] ? sys_newfstat+0x12/0x2b
Jun 28 16:34:20 sheepdog004 kernel: [103658.607218] [<ffffffff810fa376>] ? vfs_write+0xbb/0xe9
Jun 28 16:34:20 sheepdog004 kernel: [103658.607221] [<ffffffff810fa554>] ? sys_write+0x5f/0x6b
Jun 28 16:34:20 sheepdog004 kernel: [103658.607225] [<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b
Jun 28 16:34:20 sheepdog004 kernel: [103658.607335] pgrep D ffff88021f293780 0 30870 30869 0x00000000
Jun 28 16:34:20 sheepdog004 kernel: [103658.607339] ffff880133e7c730 0000000000000086 0000000100000000 ffff8802165160c0
Jun 28 16:34:20 sheepdog004 kernel: [103658.607344] 0000000000013780 ffff8801340f5fd8 ffff8801340f5fd8 ffff880133e7c730
Jun 28 16:34:20 sheepdog004 kernel: [103658.607349] 0000000000000020 000000011f5fcc08 0000000000000002 ffff880214677be0
Jun 28 16:34:20 sheepdog004 kernel: [103658.607353] Call Trace:
Jun 28 16:34:20 sheepdog004 kernel: [103658.607357] [<ffffffff8134eac4>] ? rwsem_down_failed_common+0xe0/0x114
Jun 28 16:34:20 sheepdog004 kernel: [103658.607361] [<ffffffff811b3ac4>] ? call_rwsem_down_read_failed+0x14/0x30
Jun 28 16:34:20 sheepdog004 kernel: [103658.607365] [<ffffffff8134e44a>] ? down_read+0x17/0x19
Jun 28 16:34:20 sheepdog004 kernel: [103658.607369] [<ffffffff810d1a94>] ? __access_remote_vm+0x3a/0x1c1
Jun 28 16:34:20 sheepdog004 kernel: [103658.607374] [<ffffffff810d2acb>] ? access_process_vm+0x48/0x65
Jun 28 16:34:20 sheepdog004 kernel: [103658.607378] [<ffffffff81140852>] ? proc_pid_cmdline+0x63/0xf0
Jun 28 16:34:20 sheepdog004 kernel: [103658.607382] [<ffffffff81141a58>] ? proc_info_read+0x5b/0xb8
Jun 28 16:34:20 sheepdog004 kernel: [103658.607386] [<ffffffff810fa443>] ? vfs_read+0x9f/0xe6
Jun 28 16:34:20 sheepdog004 kernel: [103658.607390] [<ffffffff810fa4cf>] ? sys_read+0x45/0x6b
Jun 28 16:34:20 sheepdog004 kernel: [103658.607393] [<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b
Jun 28 16:34:20 sheepdog004 kernel: [103658.607503] pgrep D ffff88021f293780 0 30926 30925 0x00000000
Jun 28 16:34:20 sheepdog004 kernel: [103658.607507] ffff880212ed2e20 0000000000000086 0000000100000000 ffff8802165160c0
Jun 28 16:34:20 sheepdog004 kernel: [103658.607512] 0000000000013780 ffff880132047fd8 ffff880132047fd8 ffff880212ed2e20
Jun 28 16:34:20 sheepdog004 kernel: [103658.607517] 0000000000000020 000000011f5fcc08 0000000000000002 ffff880214677be0
Jun 28 16:34:20 sheepdog004 kernel: [103658.607521] Call Trace:
Jun 28 16:34:20 sheepdog004 kernel: [103658.607525] [<ffffffff8134eac4>] ? rwsem_down_failed_common+0xe0/0x114
Jun 28 16:34:20 sheepdog004 kernel: [103658.607529] [<ffffffff811b3ac4>] ? call_rwsem_down_read_failed+0x14/0x30
Jun 28 16:34:20 sheepdog004 kernel: [103658.607533] [<ffffffff8134e44a>] ? down_read+0x17/0x19
Jun 28 16:34:20 sheepdog004 kernel: [103658.607537] [<ffffffff810d1a94>] ? __access_remote_vm+0x3a/0x1c1
Jun 28 16:34:20 sheepdog004 kernel: [103658.607541] [<ffffffff810d2acb>] ? access_process_vm+0x48/0x65
Jun 28 16:34:20 sheepdog004 kernel: [103658.607545] [<ffffffff81140852>] ? proc_pid_cmdline+0x63/0xf0
Jun 28 16:34:20 sheepdog004 kernel: [103658.607548] [<ffffffff81141a58>] ? proc_info_read+0x5b/0xb8
Jun 28 16:34:20 sheepdog004 kernel: [103658.607552] [<ffffffff810fa443>] ? vfs_read+0x9f/0xe6
Jun 28 16:34:20 sheepdog004 kernel: [103658.607556] [<ffffffff810fa4cf>] ? sys_read+0x45/0x6b
Jun 28 16:34:20 sheepdog004 kernel: [103658.607559] [<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b
Jun 28 16:36:20 sheepdog004 kernel: [103778.581543] Call Trace:
Jun 28 16:36:20 sheepdog004 kernel: [103778.581552] [<ffffffff810b47b3>] ? lock_page+0x20/0x20
Jun 28 16:36:20 sheepdog004 kernel: [103778.581557] [<ffffffff8134da71>] ? io_schedule+0x59/0x71
Jun 28 16:36:20 sheepdog004 kernel: [103778.581561] [<ffffffff810b47b9>] ? sleep_on_page+0x6/0xa
Jun 28 16:36:20 sheepdog004 kernel: [103778.581565] [<ffffffff8134deb4>] ? __wait_on_bit+0x3e/0x71
Jun 28 16:36:20 sheepdog004 kernel: [103778.581569] [<ffffffff810b48f5>] ? wait_on_page_bit+0x6e/0x73
Jun 28 16:36:20 sheepdog004 kernel: [103778.581575] [<ffffffff8105fb09>] ? autoremove_wake_function+0x2a/0x2a
Jun 28 16:36:20 sheepdog004 kernel: [103778.581581] [<ffffffff810c2850>] ? shrink_page_list+0x166/0x73f
Jun 28 16:36:20 sheepdog004 kernel: [103778.581586] [<ffffffff810c9cfa>] ? zone_page_state_add+0x14/0x23
Jun 28 16:36:20 sheepdog004 kernel: [103778.581591] [<ffffffff810c0e13>] ? update_isolated_counts+0x13b/0x15a
Jun 28 16:36:20 sheepdog004 kernel: [103778.581595] [<ffffffff810c32c4>] ? shrink_inactive_list+0x2cd/0x3f0
Jun 28 16:36:20 sheepdog004 kernel: [103778.581600] [<ffffffff810be232>] ? __lru_cache_add+0x2b/0x51
Jun 28 16:36:20 sheepdog004 kernel: [103778.581604] [<ffffffff810c3a89>] ? shrink_zone+0x3c0/0x4e6
Jun 28 16:36:20 sheepdog004 kernel: [103778.581608] [<ffffffff810c3fa7>] ? do_try_to_free_pages+0x1cc/0x41c
Jun 28 16:36:20 sheepdog004 kernel: [103778.581612] [<ffffffff810c4462>] ? try_to_free_pages+0xa9/0xe9
Jun 28 16:36:20 sheepdog004 kernel: [103778.581616] [<ffffffff810364e8>] ? should_resched+0x5/0x23
Jun 28 16:36:20 sheepdog004 kernel: [103778.581621] [<ffffffff810bb3ee>] ? __alloc_pages_nodemask+0x4ed/0x7aa
Jun 28 16:36:20 sheepdog004 kernel: [103778.581626] [<ffffffff8100d69f>] ? __switch_to+0x133/0x258
Jun 28 16:36:20 sheepdog004 kernel: [103778.581631] [<ffffffff8134eb77>] ? _raw_spin_unlock_irqrestore+0xe/0xf
Jun 28 16:36:20 sheepdog004 kernel: [103778.581636] [<ffffffff810e5f05>] ? alloc_pages_vma+0x12d/0x136
Jun 28 16:36:20 sheepdog004 kernel: [103778.581640] [<ffffffff810ce1c5>] ? pte_pfn+0x5/0xe
Jun 28 16:36:20 sheepdog004 kernel: [103778.581645] [<ffffffff810ef9bd>] ? khugepaged+0x4dc/0xef3
Jun 28 16:36:20 sheepdog004 kernel: [103778.581649] [<ffffffff8100d69f>] ? __switch_to+0x133/0x258
Jun 28 16:36:20 sheepdog004 kernel: [103778.581654] [<ffffffff8105fadf>] ? add_wait_queue+0x3c/0x3c
Jun 28 16:36:20 sheepdog004 kernel: [103778.581658] [<ffffffff810ef4e1>] ? add_mm_counter.constprop.28+0x9/0x9
Jun 28 16:36:20 sheepdog004 kernel: [103778.581662] [<ffffffff8105f48d>] ? kthread+0x76/0x7e
Jun 28 16:36:20 sheepdog004 kernel: [103778.581667] [<ffffffff81355cb4>] ? kernel_thread_helper+0x4/0x10
Jun 28 16:36:20 sheepdog004 kernel: [103778.581671] [<ffffffff8105f417>] ? kthread_worker_fn+0x139/0x139
Jun 28 16:36:20 sheepdog004 kernel: [103778.581675] [<ffffffff81355cb0>] ? gs_change+0x13/0x13
Jun 28 16:36:20 sheepdog004 kernel: [103778.581808] sheep D ffff88021f393780 0 30859 1 0x00000000
Jun 28 16:36:20 sheepdog004 kernel: [103778.581813] ffff880101d48730 0000000000000082 0000000000000000 ffff880216566f60
Jun 28 16:36:20 sheepdog004 kernel: [103778.581817] 0000000000013780 ffff8802141dffd8 ffff8802141dffd8 ffff880101d48730
Jun 28 16:36:20 sheepdog004 kernel: [103778.581822] ffffea00048c4b20 0000000105019098 ffffea0004fdaaa8 ffff880214677be0
Jun 28 16:36:20 sheepdog004 kernel: [103778.581827] Call Trace:
Jun 28 16:36:20 sheepdog004 kernel: [103778.581831] [<ffffffff8134eac4>] ? rwsem_down_failed_common+0xe0/0x114
Jun 28 16:36:20 sheepdog004 kernel: [103778.581842] [<ffffffff811b3af3>] ? call_rwsem_down_write_failed+0x13/0x20
Jun 28 16:36:20 sheepdog004 kernel: [103778.581846] [<ffffffff8134e431>] ? down_write+0x25/0x27
Jun 28 16:36:20 sheepdog004 kernel: [103778.581850] [<ffffffff810d543d>] ? sys_munmap+0x2e/0x52
Jun 28 16:36:20 sheepdog004 kernel: [103778.581854] [<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b
Jun 28 16:36:20 sheepdog004 kernel: [103778.581969] tar D ffff88021f293780 0 14370 13938 0x00000000
Jun 28 16:36:20 sheepdog004 kernel: [103778.581974] ffff88021472ae60 0000000000000086 ffffffff00000000 ffff8802165160c0
Jun 28 16:36:20 sheepdog004 kernel: [103778.581978] 0000000000013780 ffff880128b77fd8 ffff880128b77fd8 ffff88021472ae60
Jun 28 16:36:20 sheepdog004 kernel: [103778.581983] ffffffff8101360a 00000001810660a1 ffff880213ff3f30 ffff88021f293fd0
Jun 28 16:36:20 sheepdog004 kernel: [103778.581988] Call Trace:
Jun 28 16:36:20 sheepdog004 kernel: [103778.581992] [<ffffffff8101360a>] ? read_tsc+0x5/0x14
Jun 28 16:36:20 sheepdog004 kernel: [103778.581996] [<ffffffff810b47b3>] ? lock_page+0x20/0x20
Jun 28 16:36:20 sheepdog004 kernel: [103778.581999] [<ffffffff8134da71>] ? io_schedule+0x59/0x71
Jun 28 16:36:20 sheepdog004 kernel: [103778.582003] [<ffffffff810b47b9>] ? sleep_on_page+0x6/0xa
....

The host has 8G of RAM.
The host was also exporting an NFS folder, which the guest was mounting.
The guest was decompressing a big tar.gz (77G).

------------------------------

Message: 3
Date: Fri, 28 Jun 2013 18:02:37 +0200
From: Valerio Pachera <sirio81 at gmail.com>
To: Lista sheepdog user <sheepdog-users at lists.wpkg.org>
Subject: Re: [sheepdog-users] Crash khugepaged
Message-ID: <CAHS0cb8TogSOD2pGE+TsScm+o=g1kEXGdUNoFWWF-xYoVgfwog at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

2013/6/28 Valerio Pachera <sirio81 at gmail.com>:
> What do you think about this?

The crash was on the host side.
It was difficult to interact with the host because pgrep, atop and ps aux were freezing; 'top' and 'kill' were working.
I had to kill -9 the guests.
I was able to reboot the host (after first shutting down the cluster).
'collie node list' was still showing the host inside the cluster.

I wonder if the crash is related to excessive network traffic on the NIC, or to the use of transparent huge pages.
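
A minimal sketch of how the active transparent huge page mode could be checked on the host, assuming the usual sysfs file /sys/kernel/mm/transparent_hugepage/enabled is present. The helper names parse_thp_mode and current_thp_mode are made up for illustration and are not part of sheepdog or of this thread; the kernel marks the active mode with square brackets, e.g. "always [madvise] never".

```python
import re
from pathlib import Path

# Assumed location of the THP setting; present only on kernels built with THP support.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active mode, i.e. the token shown in [brackets]."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no bracketed mode found in {text!r}")
    return match.group(1)


def current_thp_mode(path: Path = THP_ENABLED) -> str:
    """Read the sysfs file and return the active THP mode."""
    return parse_thp_mode(path.read_text())


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print("THP mode:", current_thp_mode())
    else:
        print("THP setting not found on this kernel")
```

Checking this value before and after a reproduction attempt would at least confirm which mode the host was in when khugepaged blocked.
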
I set back the default value (madvise) but I'm not going to repeat the decompression via NFS today.

------------------------------

Message: 4
Date: Sat, 29 Jun 2013 12:50:06 +0900
From: MORITA Kazutaka <morita.kazutaka at gmail.com>
To: Valerio Pachera <sirio81 at gmail.com>
Cc: Lista sheepdog user <sheepdog-users at lists.wpkg.org>
Subject: Re: [sheepdog-users] cluster format during recovery
Message-ID: <m27ghd60ep.wl%morita.kazutaka at gmail.com>
Content-Type: text/plain; charset=US-ASCII

At Thu, 27 Jun 2013 15:45:36 +0200,
Valerio Pachera wrote:
>
> This is an unusual thing.
> It's useful for testing purposes only:
>
> What happens if cluster format is run during a recovery?

Probably, the recovery process will print a lot of error messages after the cluster format, since it cannot find any objects to be recovered.

Thanks,

Kazutaka

------------------------------

End of sheepdog-users Digest, Vol 14, Issue 40
**********************************************