[sheepdog-users] About split brain

Tue Aug 20 06:34:55 CEST 2013

On Mon, Aug 19, 2013 at 04:41:39PM +0200, Valerio Pachera wrote:
> Scenario:
> 
> cluster with 3 nodes and a vdi with some snapshots.
> 
> Node 0 dies (and surely it didn't flush the cache).
> 
> Before Node 0 is back online, I remove the vdi and its snapshots!
> 
> Once Node 0 is back, it still shows the vdi and the snapshots.
> 
> # collie vdi list
>   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> s big          2   50 GB  0.0 MB   44 MB 2013-08-19 16:12   4fd25e     2
> s big          3   50 GB  0.0 MB   44 MB 2013-08-19 16:12   4fd25f     2
> s big          4   50 GB  220 MB   40 MB 2013-08-19 16:12   4fd260     2
> s big          5   50 GB   40 MB  252 MB 2013-08-19 16:13   4fd261     2
> s big          6   50 GB  8.0 MB  284 MB 2013-08-19 16:13   4fd262     2
> s big          7   50 GB  460 MB   40 MB 2013-08-19 16:13   4fd263     2
>   big          0   50 GB   52 MB  496 MB 2013-08-19 16:13   4fd264     2
> 
> If I remove a snapshot from Node 0 I get an error, but it does the job:
> 
> # collie  vdi delete -s 1 big
>   Failed to delete big: Failed to find requested tag
> # collie  vdi delete -s 2 big
>   Failed to delete big: Failed to find requested tag
> 
> # collie vdi list
>   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> s big          3   50 GB  0.0 MB   44 MB 2013-08-19 16:12   4fd25f     2
> s big          4   50 GB  220 MB   40 MB 2013-08-19 16:12   4fd260     2
> s big          5   50 GB   40 MB  252 MB 2013-08-19 16:13   4fd261     2
> s big          6   50 GB  8.0 MB  284 MB 2013-08-19 16:13   4fd262     2
> s big          7   50 GB  460 MB   40 MB 2013-08-19 16:13   4fd263     2
>   big          0   50 GB   52 MB  496 MB 2013-08-19 16:13   4fd264     2
> 
> In such a situation, I think it's better to clear the node data before
> bringing it back online, as if it were new.
> 
> What do you think?

We actually purge the data of the backend store, but not the cache. I am not
sure whether we should purge the cache too. For code simplicity, I favor
purging the cache when a node joins back.
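
To illustrate the idea only: a rough sketch of "purge everything before
rejoining", written in Python rather than as sheepdog code. The function name
and the "obj" and "cache" directory names are hypothetical stand-ins for the
backend store and the object cache; they are not sheepdog's real layout or API.

```python
import shutil
import tempfile
from pathlib import Path


def purge_stale_node_data(node_dir: Path, subdirs=("obj", "cache")) -> list:
    """Empty each listed subdirectory of node_dir and return the names purged.

    "obj" and "cache" are hypothetical names standing in for the backend
    store and the object cache; they are assumptions made for this sketch.
    """
    purged = []
    for name in subdirs:
        target = node_dir / name
        if target.is_dir():
            shutil.rmtree(target)  # drop everything the node held before it died
            target.mkdir()         # recreate it empty so the node rejoins as if new
            purged.append(name)
    return purged


if __name__ == "__main__":
    # Build a throwaway node directory holding stale data, then purge it.
    with tempfile.TemporaryDirectory() as tmp:
        node = Path(tmp)
        for name in ("obj", "cache"):
            (node / name).mkdir()
            (node / name / "stale_object").write_text("old data")
        print(purge_stale_node_data(node))
```

Treating the cache the same way as the backend store keeps a single code path
for a rejoining node, which is the simplicity argument above. The cost is that
the node has to refill its cache from the other nodes afterwards.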

Thanks
Yuan