[sheepdog-users] problem with cluster recover

Wed Feb 13 10:54:33 CET 2013

2013/2/12 icez network <icez at icez.net>:
> How did you start the cluster back up? The recommended procedure is to start
> 2 nodes, then run 'collie cluster recover force', wait until the cluster
> recovery has completed, and then start the "new" 3rd node.

I followed the wrong procedure then:
  run sheep on the "new" node 3
  run the recover

I have now tried the recommended procedure, this time with 4 nodes:
  Shut down the cluster.
  Removed node 4.
  Ran cluster recover with the 3 remaining nodes online.
    ('collie node recovery' shows nothing)
  I didn't add node 4 back.
  The cluster status is running.
  Then I ran a guest on node 1; it starts but gets I/O errors.

Here is the portion of sheep.log (node 1):
...
Feb 13 10:37:24 [rw 464] recover_object_work(201) done:440 count:444, oid:a34c67000004d6
Feb 13 10:37:24 [rw 465] recover_object_work(201) done:441 count:444, oid:a34c67000005a0
Feb 13 10:37:24 [rw 466] recover_object_work(201) done:442 count:444, oid:a34c670000048d
Feb 13 10:37:24 [rw 467] recover_object_work(201) done:443 count:444, oid:a34c67000004ec
Feb 13 10:37:24 [main] queue_cluster_request(307) COMPLETE_RECOVERY (0x2300bb0)
Feb 13 10:38:13 [io 471] do_lookup_vdi(373) looking for squeeze (a34c67)
Feb 13 10:38:33 [gway 10025] wait_forward_request(206) fail 2
Feb 13 10:38:33 [gway 10026] wait_forward_request(206) fail 2
Feb 13 10:38:33 [gway 10027] wait_forward_request(206) fail 2
Feb 13 10:38:38 [gway 10436] wait_forward_request(206) fail 2
Feb 13 10:38:38 [gway 10438] wait_forward_request(206) fail 2
Feb 13 10:38:39 [gway 10457] wait_forward_request(206) fail 2
...
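
For anyone who wants to check a longer sheep.log for the same pattern, here
is a rough Python sketch that reports the last recovery progress line and
counts the gateway failures. The regular expressions are inferred only from
the excerpt above, not from the sheepdog source, and the function name
summarize() and the SAMPLE text are purely illustrative.

```python
import re
from collections import Counter

# Patterns guessed from the log excerpt in this mail (assumption, not
# taken from the sheepdog source code).
RECOVER_RE = re.compile(r"recover_object_work\(\d+\) done:(\d+) count:(\d+)")
FAIL_RE = re.compile(r"\[gway \d+\] wait_forward_request\(\d+\) fail (\d+)")


def summarize(log_text):
    """Return (last_done, count, failures) for a piece of sheep.log.

    last_done and count come from the last recover_object_work line seen
    (None if there is none); failures maps each 'fail N' code to the
    number of wait_forward_request lines that reported it.
    """
    last_done = None
    count = None
    failures = Counter()
    for line in log_text.splitlines():
        m = RECOVER_RE.search(line)
        if m:
            last_done, count = int(m.group(1)), int(m.group(2))
            continue
        m = FAIL_RE.search(line)
        if m:
            failures[m.group(1)] += 1
    return last_done, count, failures


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = """\
Feb 13 10:37:24 [rw 466] recover_object_work(201) done:442 count:444, oid:a34c670000048d
Feb 13 10:37:24 [rw 467] recover_object_work(201) done:443 count:444, oid:a34c67000004ec
Feb 13 10:37:24 [main] queue_cluster_request(307) COMPLETE_RECOVERY (0x2300bb0)
Feb 13 10:38:33 [gway 10025] wait_forward_request(206) fail 2
Feb 13 10:38:33 [gway 10026] wait_forward_request(206) fail 2
"""

if __name__ == "__main__":
    done, count, failures = summarize(SAMPLE)
    print("last recovery line: done=%s count=%s" % (done, count))
    for code, n in sorted(failures.items()):
        print("wait_forward_request fail %s: %d time(s)" % (code, n))
```

To run it against a real log, pass the file contents to summarize(), for
example summarize(open(path).read()), where path is wherever sheep.log
lives on that node.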

I ran a vdi check (I have only 1 vdi for testing), and after that the
guest no longer reports I/O errors.

At this point, adding the 4th node back works flawlessly.