[Sheepdog] [PATCH 2/2] make vdi setattr atomic

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Fri Oct 14 11:59:32 CEST 2011


Hi Chris,

Okay, I'll look into this, but let me ask you some questions.

At Fri, 14 Oct 2011 10:14:47 +0100,
Chris Webb wrote:
> 
> Hi Kazutaka. I pulled your vdiattr branch to test, and it does seem to have
> changed the race behaviour so I think your diagnosis was correct. However,
> when I remove some of my debugging code (which slows down the rate at which
> collie commands are invoked because of the extensive logging), I'm now
> seeing this:
> 
> 0026# cat /tmp/collie.log 
> [2584] collie vdi create dc9d3806-cafd-47c5-8711-9b4f99b5b061 539545600
> Exit code: 0
> 
> [2584] collie vdi setattr -x dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
> Exit code: 0
> 
> [2584] collie vdi setattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 properties
> Exit code: 0
> 
> [2584] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
> Exit code: 0
> 
> [2584] collie vdi setattr -d dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
> Exit code: 0
> 
> [2584] collie vdi list --raw
> Exit code: 0
> 
> [2584] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 claimed
> no such attribute, claimed
> Exit code: 5
> 
> [2584] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 properties
> Exit code: 0
> 
> [2636] collie vdi list --raw
> Exit code: 0
> 
> [2652] collie vdi setattr -x dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
> Exit code: 0
> 
> [2652] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 claimed
> no such attribute, claimed
> Exit code: 5
> 
> [2652] collie vdi setattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 claimed
> Exit code: 0
> 
> [2652] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
> Exit code: 0
> 
> [2652] collie vdi setattr -d dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
> Exit code: 0
> 
> [2652] collie vdi write dc9d3806-cafd-47c5-8711-9b4f99b5b061 0
> failed to write object, b1028300000000 I/O error
> failed to write vdi
> Exit code: 1

Did this bug not occur before you pulled the vdiattr branch?  If
possible, could you check what is written to sheep.log when the
problem happens?
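
Something like the following might be a convenient way to collect the
relevant lines.  It is only a sketch: show_sheep_logs is a helper name
made up for this mail, and it assumes each sheep daemon writes its log
to sheep.log directly under its store directory (e.g.
/mnt/sheep-0026-00/sheep.log).  Note that the "rm -rf" in your
reproduction steps would delete those logs, so they need to be saved
before the cluster is reset.

```shell
# Print the last 50 lines of sheep.log for each store directory given.
# Assumption: the log lives at <store dir>/sheep.log.
show_sheep_logs() {
    for dir in "$@"; do
        log="$dir/sheep.log"
        if [ -f "$log" ]; then
            echo "== $log =="
            tail -n 50 "$log"
        else
            echo "== $log (not found) =="
        fi
    done
}

# Example for the three store directories in your report:
# show_sheep_logs /mnt/sheep-0026-00 /mnt/sheep-0026-01 /mnt/sheep-0026-02
```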

> 
> [2652] collie vdi setattr -x dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
> 
> There are no collie processes other than the ones listed above accessing the
> vdi, and the three processes you see above run sequentially, not
> concurrently.
> 
> At this point, all my nodes have gone away!
> 
>   0026# collie node list
>      Idx - Host:Port          Vnodes       Zone
>   ---------------------------------------------
> 
>   0026# collie vdi list
>   name        id    size    used  shared    creation time   vdi id
>   ------------------------------------------------------------------
>   Floating point exception (core dumped)

Can you get a stack trace from the core?
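
In case it helps, here is a rough sketch of one way to do that with
gdb.  core_backtrace is just a helper name made up for this mail, and
both paths in the example are placeholders: where the core file ends
up, and what it is called, depends on your system's core_pattern
setting.

```shell
# Print a backtrace of every thread recorded in a core file.
# Usage: core_backtrace <binary that crashed> <core file>
core_backtrace() {
    bin="$1"
    core="$2"
    if [ ! -f "$core" ]; then
        echo "core file not found: $core" >&2
        return 1
    fi
    # --batch makes gdb exit after running the commands given with -ex.
    gdb --batch -ex "thread apply all bt" "$bin" "$core"
}

# Example (both paths are placeholders):
# core_backtrace "$(command -v collie)" ./core
```

The trace is much more readable if collie was built with debugging
symbols (-g).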


Thanks,

Kazutaka

> 
> although the sheep processes still seem to be running:
> 
>  2512 ?        Ssl    0:00 sheep -D -p 7000 /mnt/sheep-0026-00
>  2514 ?        Ss     0:00 sheep -D -p 7000 /mnt/sheep-0026-00
>  2515 ?        Ssl    0:00 sheep -D -p 7001 /mnt/sheep-0026-01
>  2518 ?        Ss     0:00 sheep -D -p 7001 /mnt/sheep-0026-01
>  2519 ?        Ssl    0:00 sheep -D -p 7002 /mnt/sheep-0026-02
>  2535 ?        Ss     0:00 sheep -D -p 7002 /mnt/sheep-0026-02
> 
> I've reproduced this a few times from a clean
> 
>   killall sheep
>   rm -rf /mnt/sheep-0026-0*/*
>   sheep -D -p 7000 /mnt/sheep-0026-00
>   sheep -D -p 7001 /mnt/sheep-0026-01
>   sheep -D -p 7002 /mnt/sheep-0026-02
>   collie cluster format --copies=1
> 
> on my test host.
> 
> Cheers,
> 
> Chris.
> -- 
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog


