Hi Chris,

At Mon, 10 Oct 2011 12:30:56 +0100,
Chris Webb wrote:
> 
> Hi. We've finished porting our infrastructure management system to live
> entirely on top of Sheepdog, and have begun doing some testing as a result.

Great!

> We use setattr -x to implement locking in the way we've previously
> discussed, and I've noticed a few consistency problems.
> 
> Here's a first, simple example, which turned up when I was trying to
> reproduce some of the rarer odd behaviours:
> 
>   0026# collie vdi create foo 1G
>   0026# collie vdi setattr -x foo foo <<< "bar"
>   0026# collie vdi delete foo
>   0026# collie vdi create foo 1G
>   0026# collie vdi setattr -x foo foo <<< "bar"
>   the attribute already exists, foo
>   0026# collie vdi getattr foo foo
>   bar
> 
> Looks like attributes don't get cleaned away when a vdi is deleted, and a
> new vdi with the same name will end up with the same vdi id and hence 'pick
> up' the stray attributes.

I've confirmed the bug, thanks for your report.

> 
> However, I'm seeing some strange behaviours even when we're using UUID VDI
> names, so there's no risk of one ever being reused.
> 
> I arranged for the lowest level of our management system to log all collie
> invocations to a file to capture what's going on. There are no qemu-img
> operations or qemu vms running at the same time as these commands, and I
> started with a completely clean cluster of three nodes on the same box,
> empty directories, and cluster format --copies=1.
> 
> Here's an example trace:
> 
>   [4581] collie vdi create 13121389-6673-4fe1-b30a-6608b9623bbf 539545600
>   Exit code: 0
> 
>   [4581] collie vdi setattr -x 13121389-6673-4fe1-b30a-6608b9623bbf lock
>   stdin: 002689c3-aeab-433d-bafc-acfb95dafe7c:4581:1318241623
>   stdout:
>   Exit code: 0
> 
>   [4581] collie vdi setattr 13121389-6673-4fe1-b30a-6608b9623bbf properties
>   stdin: email test at test
>   name debian
>   user 00000000-0000-0000-0000-000000000000
>   stdout:
>   Exit code: 0
> 
>   [4581] collie vdi getattr 13121389-6673-4fe1-b30a-6608b9623bbf lock
>   stdin:
>   stdout:
>   Exit code: 0
> 
> So, the 'lock' attribute is found here (else exit code would be EMISSING),
> but has an empty value instead of the expected value
> 002689c3-aeab-433d-bafc-acfb95dafe7c:4581:1318241623. I'm guessing that
> what's really going on here is that the setattr operation happens
> asynchronously and hasn't finished by the time the command exits and we do
> the getattr?

Sorry, I couldn't reproduce this. "collie vdi setattr" should be a
synchronous operation. Were there node membership changes (node
join/leave) during vdi attribute operations?

Thanks,

Kazutaka
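
To help narrow this down, here is a minimal sketch of a check that sets the
'lock' attribute and immediately reads it back, as in the trace above. It
assumes that collie is on the PATH and that the value is passed on stdin, as
the trace shows. The helper names run_collie and check_lock_roundtrip are
hypothetical and not part of collie; only the "vdi setattr -x" and
"vdi getattr" invocations are taken from the trace.

```python
import subprocess


def run_collie(args, stdin_text=""):
    """Run `collie <args>` with stdin_text on stdin; return (exit_code, stdout)."""
    proc = subprocess.run(
        ["collie", *args],
        input=stdin_text,
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


def check_lock_roundtrip(vdi, value, run=run_collie):
    """Set the 'lock' attribute exclusively, then read it back at once.

    Returns (ok, message). `run` is injectable so the logic can be
    exercised without a running cluster (hypothetical helper).
    """
    # Exclusive create, as in the trace: collie vdi setattr -x <vdi> lock
    code, _ = run(["vdi", "setattr", "-x", vdi, "lock"], value)
    if code != 0:
        return False, f"setattr -x failed with exit code {code}"

    # Immediate read-back: collie vdi getattr <vdi> lock
    code, out = run(["vdi", "getattr", vdi, "lock"])
    if code != 0:
        return False, f"getattr failed with exit code {code}"

    # The reported symptom: exit code 0 but an empty (or different) value.
    if out.strip() != value.strip():
        return False, f"read back {out!r}, expected {value!r}"
    return True, "ok"
```

Calling check_lock_roundtrip() on a freshly created vdi in a loop, with and
without nodes joining or leaving, should show whether the empty value
correlates with membership changes.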