[Sheepdog] [PATCH 2/2] make vdi setattr atomic

Chris Webb chris at arachsys.com
Fri Oct 14 11:14:47 CEST 2011


Hi Kazutaka. I pulled your vdiattr branch to test, and it does seem to have
changed the race behaviour, so I think your diagnosis was correct. However,
when I remove some of my debugging code (whose extensive logging slows down
the rate at which collie commands are invoked), I'm now seeing this:

0026# cat /tmp/collie.log 
[2584] collie vdi create dc9d3806-cafd-47c5-8711-9b4f99b5b061 539545600
Exit code: 0

[2584] collie vdi setattr -x dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
Exit code: 0

[2584] collie vdi setattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 properties
Exit code: 0

[2584] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
Exit code: 0

[2584] collie vdi setattr -d dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
Exit code: 0

[2584] collie vdi list --raw
Exit code: 0

[2584] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 claimed
no such attribute, claimed
Exit code: 5

[2584] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 properties
Exit code: 0

[2636] collie vdi list --raw
Exit code: 0

[2652] collie vdi setattr -x dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
Exit code: 0

[2652] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 claimed
no such attribute, claimed
Exit code: 5

[2652] collie vdi setattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 claimed
Exit code: 0

[2652] collie vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
Exit code: 0

[2652] collie vdi setattr -d dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock
Exit code: 0

[2652] collie vdi write dc9d3806-cafd-47c5-8711-9b4f99b5b061 0
failed to write object, b1028300000000 I/O error
failed to write vdi
Exit code: 1

[2652] collie vdi setattr -x dc9d3806-cafd-47c5-8711-9b4f99b5b061 lock

There are no collie processes other than the ones listed above accessing the
vdi, and the three processes you see above run sequentially, not
concurrently.

At this point, all my nodes have gone away!

  0026# collie node list
     Idx - Host:Port          Vnodes       Zone
  ---------------------------------------------

  0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
  ------------------------------------------------------------------
  Floating point exception (core dumped)

although the sheep processes still seem to be running:

 2512 ?        Ssl    0:00 sheep -D -p 7000 /mnt/sheep-0026-00
 2514 ?        Ss     0:00 sheep -D -p 7000 /mnt/sheep-0026-00
 2515 ?        Ssl    0:00 sheep -D -p 7001 /mnt/sheep-0026-01
 2518 ?        Ss     0:00 sheep -D -p 7001 /mnt/sheep-0026-01
 2519 ?        Ssl    0:00 sheep -D -p 7002 /mnt/sheep-0026-02
 2535 ?        Ss     0:00 sheep -D -p 7002 /mnt/sheep-0026-02

I've reproduced this a few times, each time starting from a clean setup
created with

  killall sheep
  rm -rf /mnt/sheep-0026-0*/*
  sheep -D -p 7000 /mnt/sheep-0026-00
  sheep -D -p 7001 /mnt/sheep-0026-01
  sheep -D -p 7002 /mnt/sheep-0026-02
  collie cluster format --copies=1

on my test host.
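
For anyone wanting to capture a comparable trace, a wrapper along the
following lines would write entries in the same shape as the
/tmp/collie.log excerpt above. This is only a sketch, not the harness that
produced that log: the function name `logged` and the COLLIE and LOG
variables are made up for illustration.

```shell
# Hypothetical logging wrapper; names are illustrative, not from the
# original test harness.
COLLIE=${COLLIE:-collie}        # binary to invoke
LOG=${LOG:-/tmp/collie.log}     # file that entries are appended to

# Run one collie command, appending "[pid] collie <args>", the command's
# own output, and "Exit code: <n>" followed by a blank line to $LOG.
# Returns the exit status of the command itself.
logged() {
    printf '[%s] collie %s\n' "$$" "$*" >>"$LOG"
    "$COLLIE" "$@" >>"$LOG" 2>&1
    rc=$?
    printf 'Exit code: %s\n\n' "$rc" >>"$LOG"
    return "$rc"
}
```

With that defined, `logged vdi list --raw` or
`logged vdi getattr dc9d3806-cafd-47c5-8711-9b4f99b5b061 claimed` would
each add one entry to the log, using only subcommands that already appear
in the excerpt.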

Cheers,

Chris.


