I removed the files in the journal folder and started sheep again. It started successfully. However, I found that all the VDIs are lost (recovery has completed on all nodes). What happened, and how can I get my lost VDIs back?

thanks,
Hongyi

================================================
> collie vdi list
collie vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
Failed to read object 801d5fbd00000000 No object found
Failed to read inode header
Failed to read object 80791a2400000000 No object found
Failed to read inode header
Failed to read object 809133bf00000000 No object found
Failed to read inode header
Failed to read object 809133c000000000 No object found
Failed to read inode header
Failed to read object 80d322dd00000000 No object found
Failed to read inode header
================================================

I started sheep on 3 nodes; here is what sheep.log shows:

z1:
....
May 23 21:53:49 [rw] sheep_exec_req(547) failed No object found
May 23 21:53:49 [rw] default_link(374) failed to link from /sheep/disk1/.stale/00d322dd000039f0.14 to /sheep/disk1/00d322dd000039f0, No such file or directory
May 23 21:53:49 [rw] sheep_exec_req(547) failed No object found
May 23 21:53:49 [rw] default_link(374) failed to link from /sheep/disk1/.stale/00d322dd000039f0.13 to /sheep/disk1/00d322dd000039f0, No such file or directory
May 23 21:53:49 [rw] do_epoch_log_read(93) failed to open epoch 12 log, No such file or directory
May 23 21:53:49 [main] recover_object_main(612) done:9311 count:9311, oid:d322dd000039f0
May 23 21:53:49 [main] modify_event(151) event info for fd 29 not found

z2:
....
May 23 22:14:38 [rw] default_link(374) failed to link from /sheep/disk2/.stale/001d5fbd00001480.14 to /sheep/disk2/001d5fbd00001480, No such file or directory
May 23 22:14:38 [rw] sheep_exec_req(547) failed No object found
May 23 22:14:38 [rw] do_epoch_log_read(93) failed to open epoch 13 log, No such file or directory
May 23 22:14:38 [rw] sheep_exec_req(547) failed No object found
May 23 22:14:38 [rw] do_epoch_log_read(93) failed to open epoch 12 log, No such file or directory
May 23 22:14:38 [main] recover_object_main(612) done:15244 count:15244, oid:1d5fbd00001480
May 23 22:14:38 [main] modify_event(151) event info for fd 36 not found
May 23 22:14:38 [main] modify_event(151) event info for fd 38 not found
May 23 22:14:38 [main] modify_event(151) event info for fd 39 not found
May 23 22:14:38 [main] modify_event(151) event info for fd 43 not found
May 23 22:14:38 [main] modify_event(151) event info for fd 42 not found

z3:
....
May 08 07:22:52 [io 19984] do_epoch_log_read(93) failed to open epoch 11 log, No such file or directory
May 08 07:22:52 [io 19984] do_epoch_log_read(93) failed to open epoch 10 log, No such file or directory
May 08 07:22:52 [io 19984] do_epoch_log_read(93) failed to open epoch 9 log, No such file or directory
May 08 07:22:52 [io 19984] do_epoch_log_read(93) failed to open epoch 8 log, No such file or directory
May 08 07:22:52 [io 19984] do_epoch_log_read(93) failed to open epoch 7 log, No such file or directory
May 08 07:22:52 [io 19984] do_epoch_log_read(93) failed to open epoch 6 log, No such file or directory
May 08 07:22:57 [gway 20019] gateway_read_obj(60) local read 801d5fbd00000000 failed, No object found
May 08 07:22:57 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:22:57 [gway 19975] sheep_exec_req(547) failed No object found
May 08 07:22:57 [gway 19975] sheep_exec_req(547) failed No object found
May 08 07:22:57 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:22:57 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:22:57 [gway 19975] gateway_read_obj(60) local read 809133c000000000 failed, No object found
May 08 07:22:57 [gway 19975] sheep_exec_req(547) failed No object found
May 08 07:22:57 [gway 20019] gateway_read_obj(60) local read 80d322dd00000000 failed, No object found
May 08 07:22:57 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:30:46 [gway 19975] gateway_read_obj(60) local read 801d5fbd00000000 failed, No object found
May 08 07:30:46 [gway 19975] sheep_exec_req(547) failed No object found
May 08 07:30:46 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:30:46 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:30:46 [gway 19975] sheep_exec_req(547) failed No object found
May 08 07:30:46 [gway 19975] sheep_exec_req(547) failed No object found
May 08 07:30:46 [gway 20019] gateway_read_obj(60) local read 809133c000000000 failed, No object found
May 08 07:30:46 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:30:46 [gway 19975] gateway_read_obj(60) local read 80d322dd00000000 failed, No object found
May 08 07:30:46 [gway 19975] sheep_exec_req(547) failed No object found
May 08 07:31:13 [gway 20019] gateway_read_obj(60) local read 801d5fbd00000000 failed, No object found
May 08 07:31:13 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:31:13 [gway 19975] sheep_exec_req(547) failed No object found
May 08 07:31:13 [gway 19975] sheep_exec_req(547) failed No object found
May 08 07:31:13 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:31:13 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:31:13 [gway 19975] gateway_read_obj(60) local read 809133c000000000 failed, No object found
May 08 07:31:13 [gway 19975] sheep_exec_req(547) failed No object found
May 08 07:31:13 [gway 20019] gateway_read_obj(60) local read 80d322dd00000000 failed, No object found
May 08 07:31:13 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:31:32 [gway 19975] gateway_read_obj(60) local read 801d5fbd00000000 failed, No object found
May 08 07:31:32 [gway 19975] sheep_exec_req(547) failed No object found
May 08 07:31:32 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:31:32 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:31:32 [gway 19975] sheep_exec_req(547) failed No object found
May 08 07:31:32 [gway 19975] sheep_exec_req(547) failed No object found
May 08 07:31:32 [gway 20019] gateway_read_obj(60) local read 809133c000000000 failed, No object found
May 08 07:31:32 [gway 20019] sheep_exec_req(547) failed No object found
May 08 07:31:32 [gway 19975] gateway_read_obj(60) local read 80d322dd00000000 failed, No object found
May 08 07:31:32 [gway 19975] sheep_exec_req(547) failed No object found
============================================================================

------------------ Original ------------------
From: "Liu Yuan"<namei.unix at gmail.com>;
Date: Thu, May 23, 2013 01:37 PM
To: "Hongyi Wang"<hongyi at zelin.io>;
Cc: "sheepdog"<sheepdog at lists.wpkg.org>; "k"<k at zelin.io>;
Subject: Re: [sheepdog] zookeeper quitting unexpectedly causes recovering from journal file failed

On 05/23/2013 01:29 PM, Hongyi Wang wrote:
> Hi,
>
> This follows up on our last test. One sheep node in our cluster
> timed out connecting to zookeeper, so we tried to restart sheep on that node.
> However, sheep could not be started successfully.
> I am not sure whether the zk connection timeout could somehow cause recovery
> from the journal file to fail. Is this a bug in journal replay?

I guess it is a bug in journal replay for some corner cases. Passing 'skip' for -j, or simply removing the files in the journal dir, will let sheep start again.

Thanks,
Yuan
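
For anyone trying to compare the three nodes' logs quoted in this thread, here is a rough sketch of a script that tallies which object IDs each sheep.log reports as unreadable, which stale objects failed to link, and which epoch logs could not be opened. The regular expressions are written against the exact message wording pasted above; other sheep versions may log differently, so treat the patterns (and the function name `summarize`, which is made up for this example) as assumptions rather than anything sheepdog itself provides.

```python
import re
from collections import Counter

# Patterns follow the sheep.log lines quoted in this thread (assumption:
# other sheep versions may word these messages differently).
READ_FAIL = re.compile(
    r"gateway_read_obj\(\d+\) local read ([0-9a-f]{16}) failed")
LINK_FAIL = re.compile(
    r"default_link\(\d+\) failed to link from \S*/\.stale/([0-9a-f]{16})\.(\d+) to")
EPOCH_FAIL = re.compile(
    r"do_epoch_log_read\(\d+\) failed to open epoch (\d+) log")


def summarize(lines):
    """Tally the failure messages found in an iterable of sheep.log lines.

    Returns a dict with:
      read_failures  - Counter of object IDs from failed local reads
      link_failures  - Counter of object IDs whose .stale copy failed to link
      missing_epochs - sorted list of epoch numbers whose log could not be opened
    """
    reads = Counter()
    links = Counter()
    epochs = set()
    for line in lines:
        m = READ_FAIL.search(line)
        if m:
            reads[m.group(1)] += 1
            continue
        m = LINK_FAIL.search(line)
        if m:
            links[m.group(1)] += 1
            continue
        m = EPOCH_FAIL.search(line)
        if m:
            epochs.add(int(m.group(1)))
    return {
        "read_failures": reads,
        "link_failures": links,
        "missing_epochs": sorted(epochs),
    }
```

It can be run over each node's log by passing an open file, e.g. `summarize(open(path))` with `path` pointing at that node's sheep.log (the location depends on how sheep was started). This only summarizes what the logs say; it does not recover any data.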