[sheepdog-users] Failed to read object 80244e5600000000 No object found

Struan Bartlett struan.bartlett at NewsNow.co.uk
Sun May 4 21:11:02 CEST 2014


Having run sheepdog-0.8.0 successfully for a number of weeks, I suddenly
found early last month that my cluster would no longer launch. After
reattempting the launch this evening, I finally got the cluster started,
but to make matters worse it now looks like sheepdog has deleted all the
underlying objects! Here is some data:

A. Before start-up today 'ls -l /var/lib/sheepdog/obj | wc -l' returned
the following on the three nodes that were running sheep:

server1
1
server2
4453
server3
4453

B. After start-up of sheep on each of the three nodes, the same command
returns only '1' on each server! I guess this means my VDIs are no more!
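
A note on reading these counts: 'ls -l' prints a leading "total" line and
skips dot-entries, so a result of '1' corresponds to a directory with no
visible files, and 4453 to 4452 entries. As a cross-check, here is a
minimal Python sketch that counts object files directly. It assumes that
sheep keeps stale objects in a '.stale' subdirectory of the object
directory; that location is an assumption on my part and is not confirmed
by the output above. The function name count_objects is hypothetical.

```python
import os


def count_objects(obj_dir):
    """Return (live, stale) counts of regular files under obj_dir.

    'live' counts regular files directly in obj_dir.
    'stale' counts regular files in obj_dir/.stale, which is an ASSUMED
    location for objects that sheep has moved aside.
    """
    live = sum(1 for entry in os.scandir(obj_dir) if entry.is_file())

    stale = 0
    stale_dir = os.path.join(obj_dir, ".stale")  # assumed path
    if os.path.isdir(stale_dir):
        stale = sum(1 for entry in os.scandir(stale_dir) if entry.is_file())

    return live, stale
```

For example, count_objects("/var/lib/sheepdog/obj") could be run on each
node. A non-zero stale count would suggest the objects were moved rather
than deleted.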

C. dog vdi list outputs the following on any of the three nodes:

Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
Failed to read object 80244e5600000000 No object found
Failed to read inode header
Failed to read object 802b5c3a00000000 No object found
Failed to read inode header
Failed to read object 802b5c3b00000000 No object found
Failed to read inode header
Failed to read object 802b5c3c00000000 No object found
Failed to read inode header
Failed to read object 80cde59c00000000 No object found
Failed to read inode header
Failed to read object 80cde59d00000000 No object found
Failed to read inode header
Failed to read object 80cde59e00000000 No object found
Failed to read inode header
Failed to read object 80cde59f00000000 No object found
Failed to read inode header
Failed to read object 80cde5a000000000 No object found
Failed to read inode header
Failed to read object 80cde5a100000000 No object found
Failed to read inode header
Failed to read object 80cde5a200000000 No object found
Failed to read inode header
Failed to read object 80cde5a300000000 No object found
Failed to read inode header
Failed to read object 80cde5a400000000 No object found
Failed to read inode header
Failed to read object 80cde5a500000000 No object found
Failed to read inode header
Failed to read object 80d8c70600000000 No object found
Failed to read inode header
Failed to read object 80ddce9a00000000 No object found
Failed to read inode header
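
To see which VDIs these failures refer to, the object IDs can be decoded.
The sketch below assumes the usual sheepdog object ID layout as I
understand it: the top bit marks a VDI (inode) object, bits 32 to 55 hold
the VDI id, and the low 32 bits hold the data object index. The constants
are my reading of that layout and should be checked against the sheepdog
source for this version. The function name decode_oid is hypothetical.

```python
# Assumed layout of a 64-bit sheepdog object ID (verify against the source):
VDI_BIT = 1 << 63                 # set for VDI (inode) objects
VDI_SPACE_SHIFT = 32              # VDI id starts at bit 32
SD_VDI_MASK = 0x00FFFFFF00000000  # 24-bit VDI id field


def decode_oid(oid_hex):
    """Split a hex object ID string into its assumed fields."""
    oid = int(oid_hex, 16)
    return {
        "is_vdi_object": bool(oid & VDI_BIT),
        "vdi_id": (oid & SD_VDI_MASK) >> VDI_SPACE_SHIFT,
        "index": oid & 0xFFFFFFFF,
    }


if __name__ == "__main__":
    info = decode_oid("80244e5600000000")
    print(info["is_vdi_object"], format(info["vdi_id"], "x"), info["index"])
```

Under that assumption, every ID in the listing above would be an inode
object, one per VDI, which is consistent with the "Failed to read inode
header" message that follows each of them.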

Here are the log lines matching the first object ID (80244e5600000000)
on each of the three nodes running sheep:

1.
Apr 09 16:46:14  DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
VDI object 80244e5600000000
Apr 09 16:46:14  DEBUG [main] err_to_sderr(100) object 80244e5600000000
not found locally
Apr 09 16:46:54  DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
VDI object 80244e5600000000
Apr 09 16:46:54  DEBUG [main] err_to_sderr(100) object 80244e5600000000
not found locally
May 04 19:38:50  DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
VDI object 80244e5600000000
May 04 19:38:50  DEBUG [main] err_to_sderr(100) object 80244e5600000000
not found locally
May 04 19:39:13  DEBUG [main] prepare_schedule_oid(566) 80244e5600000000
nr_prio_oids 1
May 04 19:39:13  DEBUG [main] request_in_recovery(195) 80244e5600000000
wait on oid
May 04 19:39:13  DEBUG [rw] default_get_hash(646) the message digest of
80244e5600000000 at epoch 12 is f5a828c6a3c5b6dc99520d31fcbc3fd76f080a34
May 04 19:39:13  DEBUG [rw] recover_replication_object(369) try recover
object 80244e5600000000 from epoch 16
May 04 19:39:13  DEBUG [rw] recover_replication_object(369) try recover
object 80244e5600000000 from epoch 15
May 04 19:39:13  DEBUG [main] wakeup_requests_on_oid(250) retry
80244e5600000000
May 04 19:39:13  DEBUG [main] request_in_recovery(195) 80244e5600000000
wait on oid
May 04 19:39:13   INFO [main] recover_object_main(855) object
80244e5600000000 is recovered (322/2745)
May 04 19:39:41  DEBUG [main] oid_in_recovery(596) 80244e5600000000 has
been already recovered
May 04 19:39:41  DEBUG [io 12960] do_process_work(1393) a4,
80244e5600000000, 17
May 04 19:39:41  DEBUG [io 12960] err_to_sderr(100) object
80244e5600000000 not found locally
May 04 19:39:41  DEBUG [io 12960] do_process_work(1400) failed: a4,
80244e5600000000 , 17, No object found
May 04 19:39:55  DEBUG [main] oid_in_recovery(596) 80244e5600000000 has
been already recovered
May 04 19:39:55  DEBUG [io 12950] do_process_work(1393) a4,
80244e5600000000, 17
May 04 19:39:55  DEBUG [io 12950] err_to_sderr(100) object
80244e5600000000 not found locally
May 04 19:39:55  DEBUG [io 12950] do_process_work(1400) failed: a4,
80244e5600000000 , 17, No object found
May 04 19:40:04  DEBUG [main] oid_in_recovery(596) 80244e5600000000 has
been already recovered
May 04 19:40:04  DEBUG [io 12960] do_process_work(1393) a4,
80244e5600000000, 17
May 04 19:40:04  DEBUG [io 12960] err_to_sderr(100) object
80244e5600000000 not found locally
May 04 19:40:04  DEBUG [io 12960] do_process_work(1400) failed: a4,
80244e5600000000 , 17, No object found

2.
Mar 31 22:14:45   INFO [main] recover_object_main(856) object
80244e5600000000 is recovered (2868/4370)
May 04 19:39:02   INFO [main] recover_object_main(856) object
80244e5600000000 is recovered (181/3034)
May 04 19:40:04  ERROR [gway 11266] gateway_replication_read(294) local
read 80244e5600000000 failed, No object found
May 04 19:41:59  ERROR [gway 12177] gateway_replication_read(294) local
read 80244e5600000000 failed, No object found
May 04 19:43:43  ERROR [gway 12177] gateway_replication_read(294) local
read 80244e5600000000 failed, No object found
May 04 19:43:58  ERROR [gway 12177] gateway_replication_read(294) local
read 80244e5600000000 failed, No object found
May 04 19:45:15  ERROR [gway 12177] gateway_replication_read(294) local
read 80244e5600000000 failed, No object found

3.
Mar 31 22:14:04   INFO [main] recover_object_main(855) object
80244e5600000000 is recovered (2868/4370)
Apr 09 12:23:11   INFO [main] recover_object_main(855) object
80244e5600000000 is recovered (2923/4452)
Apr 09 16:45:32  DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
VDI object 80244e5600000000
Apr 09 16:50:46  DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
VDI object 80244e5600000000
Apr 09 16:58:26  DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
VDI object 80244e5600000000
Apr 09 17:02:34  DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
VDI object 80244e5600000000
Apr 09 17:03:17  DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
VDI object 80244e5600000000
Apr 09 17:03:34  DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
VDI object 80244e5600000000
Apr 09 17:03:47  DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
VDI object 80244e5600000000
Apr 09 17:03:48  DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
VDI object 80244e5600000000
Apr 09 17:07:34  DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
VDI object 80244e5600000000
May 04 19:35:00  DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
VDI object 80244e5600000000
May 04 19:35:00  DEBUG [main] move_object_to_stale_dir(507) moved object
80244e5600000000
May 04 19:39:02  DEBUG [io 21128] do_process_work(1393) b4,
80244e5600000000, 17
May 04 19:39:02  DEBUG [io 21128] do_process_work(1400) failed: b4,
80244e5600000000 , 17, No object found
May 04 19:39:13  DEBUG [io 21128] do_process_work(1393) b4,
80244e5600000000, 17
May 04 19:39:13  DEBUG [io 21128] do_process_work(1400) failed: b4,
80244e5600000000 , 17, No object found
May 04 19:39:41  DEBUG [gway 21110] do_process_work(1393) 2,
80244e5600000000, 17
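
Because the excerpts above are hard-wrapped, the sequence of events for
one object is hard to follow. Here is a minimal Python sketch that
re-joins the wrapped lines and reduces each entry to a timestamp and
function name. It assumes every log entry starts with a timestamp of the
form "May 04 19:35:00", as in the excerpts. The names join_wrapped and
events_for are hypothetical helpers, not part of sheepdog.

```python
import re

# A log entry is assumed to start with e.g. "May 04 19:35:00 ".
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}\s")
# Function name followed by a source line number, e.g. "] do_process_work(1393)".
FUNC = re.compile(r"\]\s+(\w+)\(\d+\)")


def join_wrapped(lines):
    """Merge continuation lines into the log entry that precedes them."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.strip()
    return entries


def events_for(entries, oid):
    """Return (timestamp, function) pairs for entries mentioning oid."""
    events = []
    for entry in entries:
        if oid not in entry:
            continue
        match = FUNC.search(entry)
        events.append((entry[:15], match.group(1) if match else None))
    return events
```

Applied to the node 3 excerpt, this would show move_object_to_stale_dir
at May 04 19:35:00 followed by the failing do_process_work calls from
19:39:02 onwards.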

Can anyone explain what has happened, and why sheepdog appears to have
just deleted all the objects associated with my cluster? I assume this
renders it completely unrecoverable. Please let me know if there are
other investigations I should perform.

Thank you!

Struan Bartlett




