[Sheepdog] [PATCH 2/2] make vdi setattr atomic
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Fri Oct 14 11:59:32 CEST 2011
Hi Chris,
Okay, I'll look into this, but let me ask you some questions.
At Fri, 14 Oct 2011 10:14:47 +0100,
Chris Webb wrote:
>
> Hi Kazutaka. I pulled your vdiattr branch to test, and it does seem to have
> changed the race behaviour so I think your diagnosis was correct. However,
> when I remove some of my debugging code (which slows down the rate at which
> collie commands are invoked because of the extensive logging), I'm now
> seeing this:
>
> 0026# cat /tmp/collie.log
> [2584] collie vdi create dc9d3806-cafd-47c5-8711-9b4f99b5b061 539545600
> Exit code: 0
>
> [2584] collie vdi setattr -x dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
> Exit code: 0
>
> [2584] collie vdi setattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 properties
> Exit code: 0
>
> [2584] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
> Exit code: 0
>
> [2584] collie vdi setattr -d dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
> Exit code: 0
>
> [2584] collie vdi list --raw
> Exit code: 0
>
> [2584] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 claimed
> no such attribute, claimed
> Exit code: 5
>
> [2584] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 properties
> Exit code: 0
>
> [2636] collie vdi list --raw
> Exit code: 0
>
> [2652] collie vdi setattr -x dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
> Exit code: 0
>
> [2652] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 claimed
> no such attribute, claimed
> Exit code: 5
>
> [2652] collie vdi setattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 claimed
> Exit code: 0
>
> [2652] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
> Exit code: 0
>
> [2652] collie vdi setattr -d dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
> Exit code: 0
>
> [2652] collie vdi write dc9d3806-cafd-47c5-8711-9b4f99b5b061 0
> failed to write object, b1028300000000 I/O error
> failed to write vdi
> Exit code: 1
Did this bug also happen before you pulled the vdiattr branch? If
possible, can you check what is written in sheep.log when this problem
happens?
>
> [2652] collie vdi setattr -x dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
>
> There are no collie processes other than the ones listed above accessing the
> vdi, and the three processes you see above run sequentially, not
> concurrently.
>
> At this point, all my nodes have gone away!
>
> 0026# collie node list
> Idx - Host:Port Vnodes Zone
> ---------------------------------------------
>
> 0026# collie vdi list
> name id size used shared creation time vdi id
> ------------------------------------------------------------------
> Floating point exception (core dumped)
Can you get a stack trace from the core?
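For example, something along these lines should print one. This is only
a rough sketch: the helper name print_backtrace, the COLLIE_BIN and
CORE_FILE variables, and the default core path ./core are assumptions
for illustration, so adjust them to wherever your system actually wrote
the core.

```shell
#!/bin/sh
# Sketch: print a backtrace from a collie core dump with gdb.
# Assumptions (not from this thread): the core file defaults to ./core,
# and collie is looked up via PATH unless COLLIE_BIN is set.
print_backtrace() {
    bin=${COLLIE_BIN:-$(command -v collie || echo collie)}
    core=${CORE_FILE:-./core}

    # Bail out early if there is no core file to inspect.
    if [ ! -f "$core" ]; then
        echo "core file not found: $core" >&2
        return 1
    fi

    # -batch makes gdb exit after running the -ex commands;
    # "bt full" prints the backtrace including local variables.
    gdb -batch -ex "bt full" "$bin" "$core"
}
```

Running print_backtrace in the directory where collie crashed, with
CORE_FILE set if the core is named differently, should give the trace.
A collie binary built with debug symbols will make it much more useful.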
Thanks,
Kazutaka
>
> although the sheep processes still seem to be running:
>
> 2512 ? Ssl 0:00 sheep -D -p 7000 /mnt/sheep-0026-00
> 2514 ? Ss 0:00 sheep -D -p 7000 /mnt/sheep-0026-00
> 2515 ? Ssl 0:00 sheep -D -p 7001 /mnt/sheep-0026-01
> 2518 ? Ss 0:00 sheep -D -p 7001 /mnt/sheep-0026-01
> 2519 ? Ssl 0:00 sheep -D -p 7002 /mnt/sheep-0026-02
> 2535 ? Ss 0:00 sheep -D -p 7002 /mnt/sheep-0026-02
>
> I've reproduced this a few times from a clean
>
> killall sheep
> rm -rf /mnt/sheep-0026-0*/*
> sheep -D -p 7000 /mnt/sheep-0026-00
> sheep -D -p 7001 /mnt/sheep-0026-01
> sheep -D -p 7002 /mnt/sheep-0026-02
> collie cluster format --copies=1
>
> on my test host.
>
> Cheers,
>
> Chris.
> --
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog