[Sheepdog] Sheepdog Read-only issues.

Eric Renfro erenfro at gmail.com
Sun Apr 10 16:15:39 CEST 2011

I hit the issue again.

The first test case was simple: 2 nodes, 2 copies, and a very basic pacemaker 
configuration: a primitive to start the sheepdog servers on each node from 
lsb:sheepdog, a primitive per guest using ocf:heartbeat:VirtualDomain, and 
location constraints placing the virtual machines according to the node 
attribute scores of each node.

Something like:

node ygg1 \
         utilization memory="8192" cpu="4" \
         attributes fw1="100" com="50" vserver="1" sheep="1" standby="off"
node ygg2 \
         utilization memory="8192" cpu="4" \
         attributes fw1="50" com="100" vserver="1" sheep="1"
primitive sheep lsb:sheepdog
primitive kvm_vfw1 ocf:heartbeat:VirtualDomain \
         params config="/etc/libvirt/qemu/vfw1.xml" hypervisor="qemu:///system" migration_transport="ssh" \
         meta allow-migrate="true" priority="10" target-role="Started" is-managed="true" resource-stickiness="2" migration-threshold="2" \
         op start interval="0" timeout="120s" \
         op stop interval="0" timeout="120s" \
         op migrate_to interval="0" timeout="120s" \
         op migrate_from interval="0" timeout="120s" \
         op monitor interval="10" timeout="30" depth="0" \
         utilization memory="512" cpu="1"
primitive kvm_com ocf:heartbeat:VirtualDomain \
         params config="/etc/libvirt/qemu/com.xml" hypervisor="qemu:///system" migration_transport="ssh" \
         meta allow-migrate="true" priority="5" target-role="Started" is-managed="true" resource-stickiness="2" migration-threshold="2" \
         op start interval="0" timeout="120s" \
         op stop interval="0" timeout="120s" \
         op migrate_to interval="0" timeout="120s" \
         op migrate_from interval="0" timeout="120s" \
         op monitor interval="10" timeout="30" depth="0" \
         utilization memory="512" cpu="1"
location sheep-loc sheep \
         rule $id="sheep-loc-rule-0" -inf: not_defined sheep or sheep number:lte 0 \
         rule $id="sheep-loc-rule-1" sheep: defined sheep
location com-os-loc kvm_com \
         rule $id="com-os-loc-rule-0" -inf: not_defined com or com number:lte 0 or not_defined vserver or vserver number:lte 0 \
         rule $id="com-os-loc-rule-1" com: defined com and defined vserver
location vfw1-os-loc kvm_vfw1 \
         rule $id="vfw1-os-loc-rule-0" -inf: not_defined fw1 or fw1 number:lte 0 or not_defined vserver or vserver number:lte 0 \
         rule $id="vfw1-os-loc-rule-1" fw1: defined fw1 and defined vserver
colocation kvm_com-loc inf: kvm_com sheep:Started
colocation kvm_vfw1-loc inf: kvm_vfw1 sheep:Started
property $id="cib-bootstrap-options" \
         dc-version="1.1.5-jlkjgjhgfjhf" \
         cluster-infrastructure="openais" \
         expected-quorum-votes="2" \
         stonith-enabled="false" \
         last-lrm-refresh="1302345517"

What this setup does in plain English is:
Run sheepdog only on nodes that have sheep="1" or higher, so that 
non-storage nodes can also be part of the cluster.
Run kvm_com (and kvm_vfw1 similarly) only on nodes with com="1" or higher, 
prioritized by the attribute value. The differing scores provide seamless 
failover and failback.
Make each kvm_* resource also depend on sheepdog running on the node it's 
going to be on, otherwise it has to move or shut down.
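For reference, the node attributes those rules key on can be managed with crmsh; a minimal sketch, assuming crmsh 1.x syntax and reusing the node/attribute names from the config above:

```shell
# Mark ygg1 as storage-capable so the sheep location rule permits it
# (attribute names match the pacemaker config above; syntax assumed):
crm node attribute ygg1 set sheep 1

# Take ygg1 out of VM placement without stopping sheepdog on it:
crm node attribute ygg1 set vserver 0
```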

Here's an example libvirt domain definition:

<domain type='kvm'>
   <description>Firewall 1</description>
   <os>
     <type arch='x86_64' machine='pc-0.13'>hvm</type>
     <boot dev='hd'/>
     <boot dev='cdrom'/>
     <bootmenu enable='yes'/>
   </os>
   <cpu match='exact'>
     <feature policy='require' name='skinit'/>
     <feature policy='require' name='vme'/>
     <feature policy='require' name='mmxext'/>
     <feature policy='require' name='fxsr_opt'/>
     <feature policy='require' name='cr8legacy'/>
     <feature policy='require' name='ht'/>
     <feature policy='require' name='3dnowprefetch'/>
     <feature policy='require' name='3dnowext'/>
     <feature policy='require' name='wdt'/>
     <feature policy='require' name='extapic'/>
     <feature policy='require' name='pdpe1gb'/>
     <feature policy='require' name='osvw'/>
     <feature policy='require' name='cmp_legacy'/>
     <feature policy='require' name='3dnow'/>
   </cpu>
   <clock offset='utc'/>
   <devices>
     <disk type='network' device='disk'>
       <driver name='qemu' type='raw'/>
       <source protocol='sheepdog' name='vfw1'>
         <host name='localhost' port='7000'/>
       </source>
       <target dev='vda' bus='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
     </disk>
     <disk type='file' device='cdrom'>
       <driver name='qemu' type='raw' cache='writeback'/>
       <source file='/vm/iso/OpenSUSE_JeOS64.x86_64-1.0.0.iso'/>
       <target dev='hda' bus='ide'/>
       <address type='drive' controller='0' bus='0' unit='0'/>
     </disk>
     <controller type='ide' index='0'>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
     </controller>
     <interface type='bridge'>
       <mac address='de:ad:09:e9:01:01'/>
       <source bridge='br0'/>
       <model type='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
     </interface>
     <interface type='bridge'>
       <mac address='de:ad:09:e0:01:01'/>
       <source bridge='br1'/>
       <model type='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
     </interface>
     <input type='tablet' bus='usb'/>
     <input type='mouse' bus='ps2'/>
     <graphics type='vnc' port='5901' autoport='no' listen=''/>
     <video>
       <model type='cirrus' vram='9216' heads='1'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
     </video>
     <memballoon model='virtio'>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
     </memballoon>
   </devices>
</domain>
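The sheepdog VDI the domain's disk source refers to would have been created with collie beforehand; a sketch, assuming the collie syntax of that era (the 20G size is made up):

```shell
# Format the cluster to keep 2 copies of each object (matches the 2-node test):
collie cluster format --copies=2
# Create the VDI the libvirt domain's <source> element names (size assumed):
collie vdi create vfw1 20G
```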

Beyond that, all it took to reproduce the problem was toggling the value of 
standby between 0 and 1 three or four times at most, watching the VM migrate 
to and from the node, until sheepdog corrupted itself.
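Concretely, each toggle amounted to something like the following (assumed crmsh commands, using pacemaker's built-in standby handling):

```shell
# Drive the VM off ygg1; pacemaker migrates kvm_* to the other node:
crm node standby ygg1
# ...wait for the migration to settle, then fail back:
crm node online ygg1
```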

Second scenario: 6 nodes, 3 copies. 4 servers running virtual machines, 2 
dedicated storage-only. 2 of my VM servers, vserver2 and vserver4, went down 
due to a failing APC. This happened only once before the problem was fully 
revealed and total corruption occurred, with all VDIs wiped out due to 
missing objects.

The setup was on openSUSE 11.4 servers running pacemaker 1.1.5, with 
sheepdog built from the latest git source a week ago.

Eric Renfro

On 4/8/2011 2:49 PM, MORITA Kazutaka wrote:
> Hi Eric,
> At Thu, 7 Apr 2011 02:06:07 -0400,
> Eric Renfro wrote:
>> Hello,
>> I just started using Sheepdog and I'm curious as to why this is occurring,
>> if it's a known issue or a resolved issue.
>> What's happening to me is, when one of my sheep servers is taken down, it
>> causes server-wide issues, especially with running VM's. I have 6 sheep
>> servers running on 6 physical computers. 4 of the servers run kvm guests
>> along with sheep, 2 servers are just storage servers only. I currently run
>> sheepdog through pacemaker as a primitive lsb resource in every sheep node.
>> When I stop pacemaker on nas2 (a storage only server), vm's on vservers 1-4
>> suddenly get I/O errors and the filesystems remount R/O and either won't
>> restart properly on the same node and have to be migrated to another node,
>> or they do. Either way the only way to restore access is by rebooting the
>> guest vm outright. Each guest vm uses the localhost:7000 for sheep access to
>> the sheepdog vdi's.
> Thanks for your report!  This is not a known issue, but I confirmed
> that the sheep daemon could return EIO when the cluster membership is
> changing.  I'll dig into this problem soon.
>> I'm running this platform all on OpenSUSE 11.4 with qemu 0.14.0 from
>> standard opensuse repositories (not the virtualization repository) and
>> reasonably current sheepdog git build.
>> I setup the sheepdog collie cluster to maintain 3 copies as well.
>> In another test, I had just 2 vservers running sheepdog with vm guests on
>> the same 2, using only 2 copies, during my initial testing of sheepdog, and
>> by crowbarring pacemaker into standby mode to test migration of the kvm
>> sessions, it ended up destroying the sheepdog cluster completely losing all
> Could you explain more details about what you have done?  What's the
> configuration of your pacemaker?  What are the commands you ran to
> make pacemaker a standby mode and migrate your virtual machine?  I'd
> like to reproduce the problem.
> Thanks,
> Kazutaka
>> of the vdi's, and being unable to find a specific obj file it was looking
>> for from the cluster so it kept trying endlessly. Ended up having to
>> reformat the cluster, which is when I got my two storage servers rebuilt to
>> handle 2 more sheep clusters and set it up to use 3 copies amongst 4
>> servers, then finally the 2 other vservers were joined into the sheepdog
>> cluster as a whole.
>> Any information regarding this problem I'd be glad to hear it. So far it
>> looks like Sheepdog is going to be very strong and powerful and meet my
>> needs, as long as I can get around this current problem I have presently.
>> Eric Renfro
>> -- 
>> sheepdog mailing list
>> sheepdog at lists.wpkg.org
>> http://lists.wpkg.org/mailman/listinfo/sheepdog
