[sheepdog-users] Failed to read object 80244e5600000000 No object found
Hitoshi Mitake
mitake.hitoshi at gmail.com
Tue May 6 17:39:07 CEST 2014
Hi Struan,
At Sun, 04 May 2014 20:11:02 +0100,
Struan Bartlett wrote:
>
>
> Having been running sheepdog-0.8.0 successfully for a number of weeks,
> earlier last month I suddenly found that my cluster would no longer
> launch. After reattempting launch this evening, I finally got the
> cluster launched but, to make matters worse, it now looks like
> sheepdog has deleted all underlying objects! Here is some data:
>
> A. Before start-up today 'ls -l /var/lib/sheepdog/obj | wc -l' returned
> the following on the three nodes that were running sheep:
>
> server1
> 1
> server2
> 4453
> server3
> 4453
>
> B. After start-up of sheep on each of the three nodes, the same command
> returns only '1' on each server! I guess this means my vdis are no more!
>
> C. dog vdi list outputs the following on any of the three nodes:
>
> Name Id Size Used Shared Creation time VDI id Copies Tag
> Failed to read object 80244e5600000000 No object found
> Failed to read inode header
> Failed to read object 802b5c3a00000000 No object found
> Failed to read inode header
> Failed to read object 802b5c3b00000000 No object found
> Failed to read inode header
> Failed to read object 802b5c3c00000000 No object found
> Failed to read inode header
> Failed to read object 80cde59c00000000 No object found
> Failed to read inode header
> Failed to read object 80cde59d00000000 No object found
> Failed to read inode header
> Failed to read object 80cde59e00000000 No object found
> Failed to read inode header
> Failed to read object 80cde59f00000000 No object found
> Failed to read inode header
> Failed to read object 80cde5a000000000 No object found
> Failed to read inode header
> Failed to read object 80cde5a100000000 No object found
> Failed to read inode header
> Failed to read object 80cde5a200000000 No object found
> Failed to read inode header
> Failed to read object 80cde5a300000000 No object found
> Failed to read inode header
> Failed to read object 80cde5a400000000 No object found
> Failed to read inode header
> Failed to read object 80cde5a500000000 No object found
> Failed to read inode header
> Failed to read object 80d8c70600000000 No object found
> Failed to read inode header
> Failed to read object 80ddce9a00000000 No object found
> Failed to read inode header
>
> Here is a grep for the first object ID on each of the three nodes
> running sheep:
>
> 1.
> Apr 09 16:46:14 DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
> VDI object 80244e5600000000
> Apr 09 16:46:14 DEBUG [main] err_to_sderr(100) object 80244e5600000000
> not found locally
> Apr 09 16:46:54 DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
> VDI object 80244e5600000000
> Apr 09 16:46:54 DEBUG [main] err_to_sderr(100) object 80244e5600000000
> not found locally
> May 04 19:38:50 DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
> VDI object 80244e5600000000
> May 04 19:38:50 DEBUG [main] err_to_sderr(100) object 80244e5600000000
> not found locally
> May 04 19:39:13 DEBUG [main] prepare_schedule_oid(566) 80244e5600000000
> nr_prio_oids 1
> May 04 19:39:13 DEBUG [main] request_in_recovery(195) 80244e5600000000
> wait on oid
> May 04 19:39:13 DEBUG [rw] default_get_hash(646) the message digest of
> 80244e5600000000 at epoch 12 is f5a828c6a3c5b6dc99520d31fcbc3fd76f080a34
> May 04 19:39:13 DEBUG [rw] recover_replication_object(369) try recover
> object 80244e5600000000 from epoch 16
> May 04 19:39:13 DEBUG [rw] recover_replication_object(369) try recover
> object 80244e5600000000 from epoch 15
> May 04 19:39:13 DEBUG [main] wakeup_requests_on_oid(250) retry
> 80244e5600000000
> May 04 19:39:13 DEBUG [main] request_in_recovery(195) 80244e5600000000
> wait on oid
> May 04 19:39:13 INFO [main] recover_object_main(855) object
> 80244e5600000000 is recovered (322/2745)
> May 04 19:39:41 DEBUG [main] oid_in_recovery(596) 80244e5600000000 has
> been already recovered
> May 04 19:39:41 DEBUG [io 12960] do_process_work(1393) a4,
> 80244e5600000000, 17
> May 04 19:39:41 DEBUG [io 12960] err_to_sderr(100) object
> 80244e5600000000 not found locally
> May 04 19:39:41 DEBUG [io 12960] do_process_work(1400) failed: a4,
> 80244e5600000000 , 17, No object found
> May 04 19:39:55 DEBUG [main] oid_in_recovery(596) 80244e5600000000 has
> been already recovered
> May 04 19:39:55 DEBUG [io 12950] do_process_work(1393) a4,
> 80244e5600000000, 17
> May 04 19:39:55 DEBUG [io 12950] err_to_sderr(100) object
> 80244e5600000000 not found locally
> May 04 19:39:55 DEBUG [io 12950] do_process_work(1400) failed: a4,
> 80244e5600000000 , 17, No object found
> May 04 19:40:04 DEBUG [main] oid_in_recovery(596) 80244e5600000000 has
> been already recovered
> May 04 19:40:04 DEBUG [io 12960] do_process_work(1393) a4,
> 80244e5600000000, 17
> May 04 19:40:04 DEBUG [io 12960] err_to_sderr(100) object
> 80244e5600000000 not found locally
> May 04 19:40:04 DEBUG [io 12960] do_process_work(1400) failed: a4,
> 80244e5600000000 , 17, No object found
>
> 2.
> Mar 31 22:14:45 INFO [main] recover_object_main(856) object
> 80244e5600000000 is recovered (2868/4370)
> May 04 19:39:02 INFO [main] recover_object_main(856) object
> 80244e5600000000 is recovered (181/3034)
> May 04 19:40:04 ERROR [gway 11266] gateway_replication_read(294) local
> read 80244e5600000000 failed, No object found
> May 04 19:41:59 ERROR [gway 12177] gateway_replication_read(294) local
> read 80244e5600000000 failed, No object found
> May 04 19:43:43 ERROR [gway 12177] gateway_replication_read(294) local
> read 80244e5600000000 failed, No object found
> May 04 19:43:58 ERROR [gway 12177] gateway_replication_read(294) local
> read 80244e5600000000 failed, No object found
> May 04 19:45:15 ERROR [gway 12177] gateway_replication_read(294) local
> read 80244e5600000000 failed, No object found
>
> 3.
> Mar 31 22:14:04 INFO [main] recover_object_main(855) object
> 80244e5600000000 is recovered (2868/4370)
> Apr 09 12:23:11 INFO [main] recover_object_main(855) object
> 80244e5600000000 is recovered (2923/4452)
> Apr 09 16:45:32 DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
> VDI object 80244e5600000000
> Apr 09 16:50:46 DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
> VDI object 80244e5600000000
> Apr 09 16:58:26 DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
> VDI object 80244e5600000000
> Apr 09 17:02:34 DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
> VDI object 80244e5600000000
> Apr 09 17:03:17 DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
> VDI object 80244e5600000000
> Apr 09 17:03:34 DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
> VDI object 80244e5600000000
> Apr 09 17:03:47 DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
> VDI object 80244e5600000000
> Apr 09 17:03:48 DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
> VDI object 80244e5600000000
> Apr 09 17:07:34 DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
> VDI object 80244e5600000000
> May 04 19:35:00 DEBUG [main] init_objlist_and_vdi_bitmap(234) found the
> VDI object 80244e5600000000
> May 04 19:35:00 DEBUG [main] move_object_to_stale_dir(507) moved object
> 80244e5600000000
> May 04 19:39:02 DEBUG [io 21128] do_process_work(1393) b4,
> 80244e5600000000, 17
> May 04 19:39:02 DEBUG [io 21128] do_process_work(1400) failed: b4,
> 80244e5600000000 , 17, No object found
> May 04 19:39:13 DEBUG [io 21128] do_process_work(1393) b4,
> 80244e5600000000, 17
> May 04 19:39:13 DEBUG [io 21128] do_process_work(1400) failed: b4,
> 80244e5600000000 , 17, No object found
> May 04 19:39:41 DEBUG [gway 21110] do_process_work(1393) 2,
> 80244e5600000000, 17
>
> Can anyone explain what has happened, and why sheepdog has just now
> deleted all the objects associated with my cluster, I assume rendering
> it completely unrecoverable? Please let me know if there are other
> investigations I should perform.
>
> Thank you!
>
> Struan Bartlett
Sorry for the inconvenience. This is a very serious problem. Could you
collect as much information about your environment as you can and file
a new issue on our bug tracker*?
For example: the options passed to each sheep daemon, the sheep logs,
the logs and parameters of the cluster manager (corosync or zookeeper?),
and machine information (disks, NICs, etc.).
* https://bugs.launchpad.net/sheepdog-project
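When attaching logs, a per-object summary can be easier to read than a raw grep. The following is only a minimal sketch, not part of sheepdog: it assumes each log entry sits on a single line in the layout quoted above (timestamp, level, [thread], function(line), message), and the function name summarize_oid is made up for illustration.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the excerpts quoted in this thread:
#   "May 04 19:39:13 DEBUG [main] request_in_recovery(195) <message>"
LINE_RE = re.compile(
    r"^(?P<time>\w{3} \d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<func>\w+)\((?P<line>\d+)\)\s+"
    r"(?P<msg>.*)$"
)


def summarize_oid(lines, oid):
    """Count (level, function) pairs for log entries whose message mentions oid.

    Lines that do not match the assumed layout are skipped.
    """
    counts = Counter()
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m and oid in m.group("msg"):
            counts[(m.group("level"), m.group("func"))] += 1
    return counts
```

It could be used on an open log file, e.g. summarize_oid(open(path), "80244e5600000000"), where path is whatever file your sheep daemon logs to, and the resulting counts pasted into the bug report next to the full logs.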
Thanks,
Hitoshi