[Sheepdog] [PATCH 2/2] make vdi setattr atomic
Chris Webb
chris at arachsys.com
Fri Oct 14 11:14:47 CEST 2011
Hi Kazutaka. I pulled your vdiattr branch to test, and it does seem to have
changed the race behaviour so I think your diagnosis was correct. However,
when I remove some of my debugging code (which slows down the rate at which
collie commands are invoked because of the extensive logging), I'm now
seeing this:
0026# cat /tmp/collie.log
[2584] collie vdi create dc9d3806-cafd-47c5-8711-9b4f99b5b061 539545600
Exit code: 0
[2584] collie vdi setattr -x dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
Exit code: 0
[2584] collie vdi setattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 properties
Exit code: 0
[2584] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
Exit code: 0
[2584] collie vdi setattr -d dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
Exit code: 0
[2584] collie vdi list --raw
Exit code: 0
[2584] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 claimed
no such attribute, claimed
Exit code: 5
[2584] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 properties
Exit code: 0
[2636] collie vdi list --raw
Exit code: 0
[2652] collie vdi setattr -x dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
Exit code: 0
[2652] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 claimed
no such attribute, claimed
Exit code: 5
[2652] collie vdi setattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 claimed
Exit code: 0
[2652] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
Exit code: 0
[2652] collie vdi setattr -d dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
Exit code: 0
[2652] collie vdi write dc9d3806-cafd-47c5-8711-9b4f99b5b061 0
failed to write object, b1028300000000 I/O error
failed to write vdi
Exit code: 1
[2652] collie vdi setattr -x dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
There are no collie processes other than the ones listed above accessing the
vdi, and the three processes you see above run sequentially, not
concurrently.
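For anyone skimming a log like the one above, the interesting lines can be pulled out mechanically. This is only a rough sketch: `collie_log_failures` is a hypothetical helper, not part of collie, and it assumes the log keeps the format shown here (a `[pid] collie ...` line followed, possibly after some error text, by an `Exit code: N` line).

```shell
# Hypothetical helper (not part of collie): list every logged command that
# exited non-zero, plus any command that never logged an exit code.
# Assumes the "[pid] collie ..." / "Exit code: N" layout shown in this post.
collie_log_failures() {
    awk '
        /^\[[0-9]+\] collie / {
            # A new command started before the previous one logged an exit code.
            if (cmd != "") print "no exit code: " cmd
            cmd = $0
            next
        }
        /^Exit code: / {
            if ($3 != 0) print "exit " $3 ": " cmd
            cmd = ""
            next
        }
        END {
            # The last command in the log never finished (or was never logged).
            if (cmd != "") print "no exit code: " cmd
        }
    ' "${1:-/tmp/collie.log}"
}
```

Run against the log above, it should flag the two `getattr ... claimed` calls (exit 5), the failed `vdi write` (exit 1), and the final `setattr -x ... lock`, which has no exit code recorded.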
At this point, all my nodes have gone away!
0026# collie node list
Idx - Host:Port Vnodes Zone
---------------------------------------------
0026# collie vdi list
name id size used shared creation time vdi id
------------------------------------------------------------------
Floating point exception (core dumped)
although the sheep processes still seem to be running:
2512 ? Ssl 0:00 sheep -D -p 7000 /mnt/sheep-0026-00
2514 ? Ss 0:00 sheep -D -p 7000 /mnt/sheep-0026-00
2515 ? Ssl 0:00 sheep -D -p 7001 /mnt/sheep-0026-01
2518 ? Ss 0:00 sheep -D -p 7001 /mnt/sheep-0026-01
2519 ? Ssl 0:00 sheep -D -p 7002 /mnt/sheep-0026-02
2535 ? Ss 0:00 sheep -D -p 7002 /mnt/sheep-0026-02
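To check for this state from a script rather than by eye, one option is to count the rows that `collie node list` prints below its header. A minimal sketch, assuming the output keeps the header-plus-dashes layout shown above; `count_listed_nodes` is a hypothetical name:

```shell
# Hypothetical helper: count node rows in `collie node list` output read
# from stdin. Everything up to and including the dashed separator line is
# treated as header; non-empty lines after it are counted as nodes.
count_listed_nodes() {
    awk 'seen && NF { n++ } /^-+$/ { seen = 1 } END { print n + 0 }'
}
```

Used as `collie node list | count_listed_nodes`, it would print 0 for the empty listing above, which can then be compared with the sheep processes still visible in ps.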
I've reproduced this a few times, each time starting from a clean cluster set up with
killall sheep
rm -rf /mnt/sheep-0026-0*/*
sheep -D -p 7000 /mnt/sheep-0026-00
sheep -D -p 7001 /mnt/sheep-0026-01
sheep -D -p 7002 /mnt/sheep-0026-02
collie cluster format --copies=1
on my test host.
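In case it helps anyone else reproduce this, the reset steps can be wrapped in a small function. This is a sketch only: `run`, `reset_cluster` and the `DRY_RUN` switch are hypothetical names, and the store prefix and ports are simply the ones from my test host. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Hypothetical wrapper around the reset steps quoted above.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reset_cluster() {
    # Store prefix from the report; override with the first argument.
    base=${1:-/mnt/sheep-0026}

    # Stop any running daemons (fails harmlessly if none are running).
    run killall sheep

    # Empty the three stores; ${base:?} aborts if the prefix is empty.
    for i in 0 1 2; do
        run rm -rf "${base:?}-0$i"/*
    done

    # Start one sheep per store on ports 7000-7002, as in the report.
    for i in 0 1 2; do
        run sheep -D -p "700$i" "$base-0$i"
    done

    run collie cluster format --copies=1
}
```

Calling `DRY_RUN=1 reset_cluster` first is a cheap way to check the paths before letting the `rm -rf` run for real.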
Cheers,
Chris.