[sheepdog] Issue with "-m unsafe", copies and zones
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Tue Oct 2 20:56:40 CEST 2012
At Tue, 2 Oct 2012 10:20:53 -0400,
Shawn Moore wrote:
>
> I have been testing the 0.5.0 release and believe I have found
> regression issues related to "-m unsafe", as well as issues caused by
> losing just one zone out of three. The last time I know this worked
> was when the option was "-H" (no halt), before it became "-m OPTION".
>
>
> I have 6 nodes (2 per zone, with 3 zones). Each zone is on its own
> switch, with the switch for zone 0 bringing them all together.
> # collie node list
> M Id Host:Port V-Nodes Zone
> - 0 172.16.1.151:7000 64 0
> - 1 172.16.1.152:7000 64 0
> - 2 172.16.1.153:7000 64 1
> - 3 172.16.1.154:7000 64 1
> - 4 172.16.1.155:7000 64 2
> - 5 172.16.1.159:7000 64 2
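> (For reference, each sheep was started along these lines; the -z flag
> sets the zone id shown above, and the store path is illustrative:
> # sheep -p 7000 -z 0 /var/lib/sheepdog
> with -z 1 and -z 2 on the zone 1 and zone 2 hosts respectively.)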
>
>
> The cluster was formatted as follows:
> # collie cluster format -b farm -c 3 -m unsafe
> # collie cluster info
> Cluster status: running
> Cluster created at Mon Oct 1 15:40:55 2012
> Epoch Time Version
> 2012-10-01 15:40:55 1 [172.16.1.151:7000, 172.16.1.152:7000,
> 172.16.1.153:7000, 172.16.1.154:7000, 172.16.1.155:7000,
> 172.16.1.159:7000]
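> (For completeness: "-m" replaced the old "-H" no-halt flag. If I
> remember the 0.5.0 usage correctly it accepts something like
> # collie cluster format -b farm -c 3 -m safe|quorum|unsafe
> but the exact set of mode names is worth checking in the manpage.)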
>
>
> I created a 40GB vdi via each node.
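> (Each vdi was created from its like-named node, roughly:
> # collie vdi create test151 40G
> and likewise on the other five hosts.)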
> # collie vdi list
> Name     Id  Size   Used   Shared  Creation time     VDI id  Copies  Tag
> test159   1  40 GB  40 GB  0.0 MB  2012-10-01 16:46  279f76       3
> test153   1  40 GB  40 GB  0.0 MB  2012-10-01 16:46  27a9a8       3
> test152   1  40 GB  40 GB  0.0 MB  2012-10-01 16:46  27ab5b       3
> test151   1  40 GB  40 GB  0.0 MB  2012-10-01 16:46  27ad0e       3
> test155   1  40 GB  40 GB  0.0 MB  2012-10-01 16:46  27b3da       3
> test154   1  40 GB  40 GB  0.0 MB  2012-10-01 16:46  27b58d       3
> # collie node info
> Id Size Used Use%
> 0 476 GB 117 GB 24%
> 1 476 GB 123 GB 25%
> 2 476 GB 136 GB 28%
> 3 476 GB 104 GB 21%
> 4 476 GB 117 GB 24%
> 5 476 GB 123 GB 25%
> Total 2.8 TB 720 GB 25%
>
>
> Then I kill the uplink interface for zone 2 from the zone 0 switch.
> This leaves zones 0/1 talking to each other and zone 2 talking only to
> itself.
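> (I did this at the switch; roughly the same split could be produced
> from the zone 2 hosts themselves with something like
> # ip link set dev eth0 down
> where eth0 stands in for the actual uplink interface.)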
> # collie cluster info
> Cluster status: running
> Cluster created at Mon Oct 1 15:40:55 2012
> Epoch Time Version
> 2012-10-02 09:04:28 3 [172.16.1.151:7000, 172.16.1.152:7000,
> 172.16.1.153:7000, 172.16.1.154:7000]
> 2012-10-02 09:04:28 2 [172.16.1.151:7000, 172.16.1.152:7000,
> 172.16.1.153:7000, 172.16.1.154:7000, 172.16.1.159:7000]
> 2012-10-01 15:40:55 1 [172.16.1.151:7000, 172.16.1.152:7000,
> 172.16.1.153:7000, 172.16.1.154:7000, 172.16.1.155:7000,
> 172.16.1.159:7000]
> # collie node info
> Id Size Used Use%
> 0 476 GB 117 GB 24%
> 1 476 GB 123 GB 25%
> 2 476 GB 136 GB 28%
> 3 476 GB 104 GB 21%
> Total 1.9 TB 480 GB 25%
> At this point, every node in zones 0/1 starts logging, once per second:
> Oct 02 09:04:28 [rw 128323] get_vdi_copy_number(82) No VDI copy entry for 0 found
> The command below hangs until killed, for every vdi:
> # collie vdi object test151
> So I try to check the vdis, and they all fail with:
> # collie vdi check test151
> [main] get_vnode_next_idx(106) PANIC: can't find next new idx
> Aborted
>
>
> When I bring the interface between zones 0/1 and 2 back up, the sheep
> processes have died, stating:
> Oct 02 09:04:28 [main] cdrv_cpg_confchg(599) PANIC: Network partition is detected
> Oct 02 09:04:28 [main] crash_handler(439) sheep pid 6780 exited unexpectedly.
> Shouldn't zone 2 have remained running due to the "-m unsafe" option?
> I understand the risks of network partitioning and want this behavior
> anyway, as I can handle it myself.
I think we should add another option to disable network partition
detection. "-m unsafe" only means that I/Os are still allowed even if
there are not enough nodes in the cluster; the risk it accepts is
reading stale data. The risk of allowing a network partition is
different: we could update the same data in both sub-clusters at the
same time.
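A sketch of what that could look like at format time (the flag below
is hypothetical; it does not exist yet):

# collie cluster format -b farm -c 3 -m unsafe --allow-partition

With such a flag set, sheep would skip the PANIC in cdrv_cpg_confchg()
and let each side of the split keep serving I/O, accepting the risk of
divergent updates described above.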
> And I can't understand why zones 0/1 were affected at all when they
> still held 2 of the 3 copies, especially with "-m unsafe".
>
>
> Let me know if you need any more information or would like me to
> re-run the test a different way.
Unfortunately, I could not reproduce the problem. Does it happen only
with network errors? What happens if you simply stop the sheep
processes in zone 2 with the kill command?
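For example (assuming one sheep process per host), on 172.16.1.155 and
172.16.1.159:

# pkill sheep

and then see whether zones 0/1 show the same symptoms.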
Thanks,
Kazutaka