[sheepdog-users] Single disk getting full

Valerio Pachera sirio81 at gmail.com
Thu Aug 8 02:23:11 CEST 2013


Auch! This time things went bad:
cluster has stopped.

But it seem not to be related to the recover of the "small" disk I
inserted back (/mnt/sheep/dsk01): it didn't get full.
It seems something about the command attr I run yesterday.

sheepdog001
/dev/mapper/vg00-sheepdsk01                             167G   70G
97G  43% /mnt/sheep/dsk01
/dev/sdb1                                               466G  367G
100G  79% /mnt/sheep/dsk02
/dev/sdc1                                               1,9T  1,2T
679G  64% /mnt/sheep/dsk03

What I find unusual is a call trace on node 'sheepdog001' (the other
nodes have no call trace).
I can see that from dmesg, /var/log/messages and /var/log/syslog
[note: 1][note: 4]

On all nodes I find corsync messages in /var/log/syslog, all at the
same time 21:00 [note: 2]
In sheepdog001 there's some extra message right after the call trace [note: 4]

I report also what's int the sheep.log file of the nodes [note: 3]

I checked the filesystem of /mnt/sheep/dsk01 device and seems fine [note: 5]

Checking the smart of the disk I notice a value that might means problems:
  183 Runtime_Bad_Block

I'm running a full test to check the disk status.

[note: 1]
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725454] attr: page
allocation failure: order:4, mode:0xd0
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725468] Pid: 14171, comm:
attr Not tainted 3.2.0-4-amd64 #1 Debian 3.2.46-1
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725474] Call Trace:
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725491]
[<ffffffff810b88d7>] ? warn_alloc_failed+0x11a/0x12d
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725504]
[<ffffffff810c9c7c>] ? next_zone+0x25/0x27
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725515]
[<ffffffff810bb605>] ? __alloc_pages_nodemask+0x704/0x7aa
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725528]
[<ffffffff810eaa2d>] ? kmem_getpages+0x4c/0xd9
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725538]
[<ffffffff810ebdde>] ? fallback_alloc+0x13e/0x1e2
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725548]
[<ffffffff8111312c>] ? getxattr+0x84/0x126
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725555]
[<ffffffff810ec2e5>] ? __kmalloc+0xba/0x112
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725564]
[<ffffffff810364e8>] ? should_resched+0x5/0x23
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725573]
[<ffffffff8111312c>] ? getxattr+0x84/0x126
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725582]
[<ffffffff810ead50>] ? kmem_cache_free+0x2d/0x69
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725592]
[<ffffffff81105943>] ? user_path_at_empty+0x53/0x7b
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725603]
[<ffffffff811138d3>] ? sys_lgetxattr+0x3f/0x5c
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725612]
[<ffffffff8134f0f5>] ? page_fault+0x25/0x30
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725620]
[<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725626] Mem-Info:
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725630] Node 0 DMA per-cpu:
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725637] CPU    0: hi:
0, btch:   1 usd:   0
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725643] CPU    1: hi:
0, btch:   1 usd:   0
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725648] Node 0 DMA32 per-cpu:
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725655] CPU    0: hi:
186, btch:  31 usd:   0
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725660] CPU    1: hi:
186, btch:  31 usd:   0
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725665] Node 0 Normal per-cpu:
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725671] CPU    0: hi:
186, btch:  31 usd:   0
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725676] CPU    1: hi:
186, btch:  31 usd:   0
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725690] active_anon:12964
inactive_anon:12371 isolated_anon:0
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725693]
active_file:169712 inactive_file:547532 isolated_file:0
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725697]  unevictable:949
dirty:1378 writeback:0 unstable:0
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725700]  free:45231
slab_reclaimable:147488 slab_unreclaimable:6686
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725703]  mapped:3632
shmem:1940 pagetables:663 bounce:0
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725710] Node 0 DMA
free:15880kB min:264kB low:328kB high:396kB active_anon:0kB
inactive_anon:0kB active_file:0kB inactive_file:20kB unevictable:0kB
isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB
dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:8kB
slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725739] lowmem_reserve[]:
0 3383 3888 3888
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725748] Node 0 DMA32
free:152592kB min:58576kB low:73220kB high:87864kB active_anon:45376kB
inactive_anon:19188kB active_file:604448kB inactive_file:2116456kB
unevictable:88kB isolated(anon):0kB isolated(file):0kB
present:3464992kB mlocked:88kB dirty:4388kB writeback:0kB
mapped:2364kB shmem:820kB slab_reclaimable:415320kB
slab_unreclaimable:7688kB kernel_stack:224kB pagetables:716kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725779] lowmem_reserve[]:
0 0 505 505
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725788] Node 0 Normal
free:12452kB min:8740kB low:10924kB high:13108kB active_anon:6480kB
inactive_anon:30296kB active_file:74400kB inactive_file:73652kB
unevictable:3708kB isolated(anon):0kB isolated(file):0kB
present:517120kB mlocked:3708kB dirty:1124kB writeback:0kB
mapped:12164kB shmem:6940kB slab_reclaimable:174624kB
slab_unreclaimable:19056kB kernel_stack:1208kB pagetables:1936kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725818] lowmem_reserve[]: 0 0 0 0
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725826] Node 0 DMA: 2*4kB
2*8kB 1*16kB 3*32kB 4*64kB 3*128kB 3*256kB 2*512kB 3*1024kB 3*2048kB
1*4096kB = 15880kB
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725848] Node 0 DMA32:
25550*4kB 5749*8kB 11*16kB 4*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 2*2048kB 0*4096kB = 152592kB
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725869] Node 0 Normal:
2051*4kB 19*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB
0*2048kB 1*4096kB = 12452kB
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725891] 721148 total
pagecache pages
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725895] 1311 pages in swap cache
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725901] Swap cache stats:
add 18691, delete 17380, find 3363212/3363399
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725906] Free swap  = 7760204kB
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725910] Total swap = 7815584kB
Aug  7 15:34:05 sheepdog001 kernel: [3543736.785649] 1015792 pages RAM
Aug  7 15:34:05 sheepdog001 kernel: [3543736.785656] 32718 pages reserved
Aug  7 15:34:05 sheepdog001 kernel: [3543736.785664] 636786 pages shared
Aug  7 15:34:05 sheepdog001 kernel: [3543736.785671] 298954 pages non-shared
Aug  7 16:28:04 sheepdog001 rsyslogd-2177: imuxsock begins to drop
messages from pid 23853 due to rate-limiting

[note: 2]
Aug  7 21:00:07 sheepdog004 corosync[4365]:   [TOTEM ] Retransmit
List: 757f 7580 7581 7582 7583 7584 7585 7586 7587 7588 7589 758a 758b
758c 758d 758e 758f 7590 7591 7592
Aug  7 21:00:07 sheepdog004 corosync[4365]:   [TOTEM ] Retransmit
List: 757f 7580 7581 7582 7583 7584 7585 7586 7587 7588 7589 758a 758b
758c 758d 758e 758f 7590 7591 7592 7593 7594 7595 7596 7597 7598 7599
759a 759b 759c
...

[note: 3]
sheepdog001
...
Aug 07 21:09:53 [main] recover_object_main(626) done:208768
count:414381, oid:c8d13200032c4f
Aug 07 21:09:53 [main] recover_object_main(626) done:208769
count:414381, oid:c8d13600033c77
Aug 07 21:09:54 [main] cdrv_cpg_confchg(602) PANIC: Network partition
is detected
Aug 07 21:09:54 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
Aug 07 21:09:54 [main] sd_backtrace(834) sheep.c:183: crash_handler
Aug 07 21:09:54 [main] sd_backtrace(848)
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf02f) [0x7ff842f2102f]
Aug 07 21:09:54 [main] sd_backtrace(848)
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x34) [0x7ff84252d474]
Aug 07 21:09:54 [main] sd_backtrace(848)
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17f) [0x7ff8425306ef]
Aug 07 21:09:54 [main] sd_backtrace(834) corosync.c:602: cdrv_cpg_confchg
Aug 07 21:09:54 [main] sd_backtrace(848)
/usr/lib/libcpg.so.4(cpg_dispatch+0x594) [0x7ff842a8cd74]
Aug 07 21:09:54 [main] sd_backtrace(834) corosync.c:744: corosync_handler
Aug 07 21:09:54 [main] sd_backtrace(834) event.c:209: do_event_loop
Aug 07 21:09:54 [main] sd_backtrace(834) sheep.c:795: main
Aug 07 21:09:54 [main] sd_backtrace(848)
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfc)
[0x7ff842519eac]
Aug 07 21:09:54 [main] sd_backtrace(848) sheep() [0x405498]
Aug 07 21:09:54 [main] __dump_stack_frames(744) cannot find gdb
Aug 07 21:09:54 [main] __sd_dump_variable(694) cannot find gdb
Aug 07 21:09:54 [main] crash_handler(487) sheep pid 23853 exited unexpectedly.

sheepdog002
...
Aug 07 08:17:17 [gway 26470] wait_forward_request(176) poll timeout 1,
disks of some nodes or network is busy. Going to poll-wait again
Aug 07 08:17:17 [gway 26476] wait_forward_request(176) poll timeout 1,
disks of some nodes or network is busy. Going to poll-wait again
Aug 07 21:09:54 [main] cdrv_cpg_confchg(602) PANIC: Network partition
is detected
Aug 07 21:09:54 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
Aug 07 21:09:54 [main] sd_backtrace(834) sheep.c:183: crash_handler
Aug 07 21:09:54 [main] sd_backtrace(848)
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf02f) [0x7fd1e9afd02f]
Aug 07 21:09:54 [main] sd_backtrace(848)
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x34) [0x7fd1e9109474]
Aug 07 21:09:54 [main] sd_backtrace(848)
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17f) [0x7fd1e910c6ef]
Aug 07 21:09:54 [main] sd_backtrace(834) corosync.c:602: cdrv_cpg_confchg
Aug 07 21:09:54 [main] sd_backtrace(848)
/usr/lib/libcpg.so.4(cpg_dispatch+0x594) [0x7fd1e9668d74]
Aug 07 21:09:54 [main] sd_backtrace(834) corosync.c:744: corosync_handler
Aug 07 21:09:54 [main] sd_backtrace(834) event.c:209: do_event_loop
Aug 07 21:09:54 [main] sd_backtrace(834) sheep.c:795: main
Aug 07 21:09:54 [main] sd_backtrace(848)
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfc)
[0x7fd1e90f5eac]
Aug 07 21:09:54 [main] sd_backtrace(848) sheep() [0x405498]
Aug 07 21:09:54 [main] __dump_stack_frames(744) cannot find gdb
Aug 07 21:09:54 [main] __sd_dump_variable(694) cannot find gdb
Aug 07 21:09:54 [main] crash_handler(487) sheep pid 20833 exited unexpectedly.

sheepdog003 sheepdog003
  seep.log is empty.

[note:4]
sheepdog001 syslog
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725454] attr: page
allocation failure: order:4, mode:0xd0
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725468] Pid: 14171, comm:
attr Not tainted 3.2.0-4-amd64 #1 Debian 3.2.46-1
Aug  7 15:34:05 sheepdog001 kernel: [3543736.725474] Call Trace:
...
Aug  7 16:28:04 sheepdog001 sheep: enqueue: log area overrun, dropping message
Aug  7 16:28:04 sheepdog001 sheep: enqueue: log area overrun, dropping message
Aug  7 16:28:04 sheepdog001 sheep: enqueue: log area overrun, dropping message
...
Aug  7 16:28:04 sheepdog001 sheep: enqueue: log area overrun, dropping message
Aug  7 16:28:04 sheepdog001 rsyslogd-2177: imuxsock begins to drop
messages from pid 23853 due to rate-limiting
Aug  7 21:00:07 sheepdog001 corosync[23846]:   [TOTEM ] Retransmit
List: 757f 7580 7581 7582 7583 7584 7585 7586 7587 7588 7589 758a 758b
758c 758d 758e 758f 7590 7591 7592
Aug  7 21:00:07 sheepdog001 corosync[23846]:   [TOTEM ] Retransmit
List: 757f 7580 7581 7582 7583 7584 7585 7586 7587 7588 7589 758a 758b
758c 758d 758e 758f 7590 7591 7592 7593 7594 7595 7596
 7597 7598 7599 759a 759b 759c

[note: 5]
root at sheepdog001:~# xfs_repair -n /dev/mapper/vg00-sheepdsk01
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.



More information about the sheepdog-users mailing list