[Sheepdog] Sheepdog Read-only issues.

Eric Renfro erenfro at gmail.com
Thu Apr 7 08:06:07 CEST 2011


I just started using Sheepdog and I'm curious why the following is
occurring, and whether it's a known issue or one that is already resolved.

What's happening is that when one of my sheep servers is taken down, it
causes issues across all servers, especially with running VMs. I have 6
sheep servers running on 6 physical computers. 4 of the servers run kvm
guests along with sheep; the other 2 are storage-only servers. I currently
run sheepdog through pacemaker as a primitive lsb resource on every sheep
node. When I stop pacemaker on nas2 (a storage-only server), VMs on vservers
1-4 suddenly get I/O errors and their filesystems remount R/O. The guests
then either fail to restart properly on the same node and have to be
migrated to another node, or they do restart there. Either way, the only way
to restore access is to reboot the guest VM outright. Each guest VM uses
localhost:7000 for sheep access to the sheepdog VDIs.
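
For anyone trying to reproduce this, a minimal sketch of how one might
confirm that the local sheep endpoint still accepts TCP connections on a
vserver after nas2 goes down. It assumes only the localhost:7000 address
mentioned above; the function name sheep_reachable is hypothetical, and the
check does not speak the sheepdog protocol, it only tests whether the port
is open.

```python
import socket


def sheep_reachable(host="localhost", port=7000, timeout=2.0):
    """Return True if a plain TCP connection to host:port succeeds.

    This only checks that something is listening on the port; it does not
    verify that the daemon behind it is healthy or part of the cluster.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("sheep reachable on localhost:7000:", sheep_reachable())
```

If this reports True on a vserver while its guests are seeing I/O errors,
the local daemon is at least still listening, which would point away from a
simple local outage.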

I'm running this platform entirely on openSUSE 11.4 with qemu 0.14.0 from
the standard openSUSE repositories (not the virtualization repository) and a
reasonably current sheepdog git build.

I also set up the sheepdog cluster with collie to maintain 3 copies.

In another test, during my initial testing of sheepdog, I had just 2
vservers running sheepdog with VM guests on the same 2, using only 2 copies.
Forcing pacemaker into standby mode to test migration of the kvm sessions
ended up destroying the sheepdog cluster completely: all of the VDIs were
lost, and the cluster kept trying endlessly to find a specific obj file it
was looking for. I ended up having to reformat the cluster, which is when I
got my two storage servers rebuilt to handle 2 more sheep clusters and set
it up to use 3 copies amongst 4 servers. Finally, the 2 other vservers were
joined into the sheepdog cluster as a whole.
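
To make the copy counts in these setups concrete, here is a rough sketch of
the worst-case number of copies of an object that remain when some nodes go
down. It is generic replication arithmetic under the assumption that each
copy sits on a distinct node; it is not Sheepdog's actual placement or
recovery logic, and the function name is hypothetical.

```python
def worst_case_surviving_copies(nodes, copies, nodes_down):
    """Worst-case copies of one object left after nodes_down nodes fail.

    Assumes each copy is stored on a distinct node, so at most
    min(copies, nodes) copies exist, and in the worst case every failed
    node held one of them.
    """
    placed = min(copies, nodes)
    return max(0, placed - nodes_down)


if __name__ == "__main__":
    # (nodes, copies) pairs taken from the setups described in this post.
    for nodes, copies in [(2, 2), (4, 3), (6, 3)]:
        for down in (1, 2):
            left = worst_case_surviving_copies(nodes, copies, down)
            print(f"{nodes} nodes, {copies} copies, {down} down: "
                  f"at least {left} copy/copies left")
```

By this arithmetic alone, a single node going down should leave at least one
copy of every object in each of the setups above, which is part of why the
behavior I'm seeing is surprising to me.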

I'd be glad to hear any information regarding this problem. So far it looks
like Sheepdog is going to be very strong and powerful and meet my needs, as
long as I can get around this problem.

Eric Renfro
