At Tue, 16 Aug 2011 21:04:07 +0100, Brian Candler wrote:
>
> I am in the process of getting a trivial (1-node) sheepdog running under
> Ubuntu 11.04 x86_64.
>
> I have the corosync package installed, copied corosync.conf.example to
> corosync.conf and set a valid bindnetaddr. It appears to start - these
> messages appear in /var/log/syslog
>
> ~~~~
> Aug 16 20:31:37 x100 corosync[15772]: [MAIN ] Corosync Cluster Engine ('1.2.1'): started and ready to provide service.
> Aug 16 20:31:37 x100 corosync[15772]: [MAIN ] Corosync built-in features: nss
> Aug 16 20:31:37 x100 corosync[15772]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Aug 16 20:31:37 x100 corosync[15772]: [TOTEM ] Initializing transport (UDP/IP).
> Aug 16 20:31:37 x100 corosync[15772]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Aug 16 20:31:37 x100 corosync[15772]: [TOTEM ] The network interface [192.168.122.1] is now up.
> Aug 16 20:31:37 x100 corosync[15772]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
> Aug 16 20:31:37 x100 corosync[15772]: [SERV ] Service engine loaded: corosync configuration service
> Aug 16 20:31:37 x100 corosync[15772]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
> Aug 16 20:31:37 x100 corosync[15772]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
> Aug 16 20:31:37 x100 corosync[15772]: [SERV ] Service engine loaded: corosync profile loading service
> Aug 16 20:31:37 x100 corosync[15772]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> Aug 16 20:31:37 x100 corosync[15772]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> Aug 16 20:31:37 x100 corosync[15772]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Aug 16 20:31:37 x100 corosync[15772]: [MAIN ] Completed service synchronization, ready to provide service.
> ~~~~
>
> However, when I try to run sheep, I get the following:
>
> ~~~~
> $ sheep -f /var/tmp/sheep
> sheep: jrnl_recover(2305) Openning the directory /var/tmp/sheep/journal/00000000/.
> sheep: create_cluster(1709) Failed to initialize cpg, 100
> sheep: create_cluster(1710) Is corosync running?
> sheep: main(150) failed to create sheepdog cluster.
> ~~~~
>
> And each time I do this, I get the following message in /var/log/syslog:
>
>     Aug 16 20:32:46 x100 corosync[15772]: [IPC ] Invalid IPC credentials.
>
> This suggests to me some sort of authentication issue between sheep and
> corosync.
>
> The usage example at https://github.com/collie/sheepdog/wiki/Getting-Started
> seems to show sheep being run as a regular user, not root. But I tried
> running as root anyway, and it seemed to work this time:
>
> ~~~~
> $ sudo sheep -f /var/tmp/sheep
> sheep: jrnl_recover(2305) Openning the directory /var/tmp/sheep/journal/00000000/.
> sheep: set_addr(1696) addr = 192.168.122.1, port = 7000
> sheep: main(154) Sheepdog daemon (version 0.2.3) started
> sheep: read_epoch(2099) failed to read epoch 0
> ~~~~
>
> OK, so let's go with that for now (although I'd prefer not to run as root)
>
> ~~~~
> $ collie cluster format --copies=2
> $ collie node list
>    Idx - Host:Port            Vnodes  Zone
> -----------------------------------------
> *    0 - 192.168.122.1:7000       64     0
> $ qemu-img create sheepdog:Test 2G
> Formatting 'sheepdog:Test', fmt=raw size=2147483648
> qemu-img: Failed to write the requested VDI, Test
>
> qemu-img: sheepdog:Test: error while creating raw: Input/output error
> ~~~~
>
> Hmm, that's not so good.
> The sheep process says:
>
> ~~~~
> sheep: cluster_queue_request(266) 0x7f6c891f4010 84
> sheep: attr(1928) use 'user_xattr' option?, user.sheepdog.copies
> sheep: __sd_deliver_done(925) unknown message 2
> sheep: cluster_queue_request(266) 0x10d3130 82
> sheep: cluster_queue_request(266) 0x10d3130 11
> sheep: do_lookup_vdi(236) looking for Test 4, ec9f05
> sheep: add_vdi(333) we create a new vdi, 0 Test (4) 2147483648, vid: ec9f05, base 0, cur 0
> sheep: add_vdi(337) qemu doesn't specify the copies... 2
> sheep: store_queue_request_local(628) use 'user_xattr' option?
> sheep: write_object(647) fail 80ec9f0500000000 6
> sheep: __sd_deliver_done(925) unknown message 2
> ~~~~
>
> Maybe I need to set copies=1 for a degraded cluster?
>
> ~~~~
> $ collie cluster format --copies=1
> $ collie node list
>    Idx - Host:Port            Vnodes  Zone
> -----------------------------------------
> *    0 - 192.168.122.1:7000       64     0
> brian@x100:/etc/corosync$ qemu-img create sheepdog:Test 2G
> Formatting 'sheepdog:Test', fmt=raw size=2147483648
> qemu-img: Failed to write the requested VDI, Test
>
> qemu-img: sheepdog:Test: error while creating raw: Input/output error
> ~~~~
>
> Same result:
>
> ~~~~
> sheep: cluster_queue_request(266) 0x10d3130 84
> sheep: attr(1928) use 'user_xattr' option?, user.sheepdog.copies
> sheep: __sd_deliver_done(925) unknown message 2
> sheep: cluster_queue_request(266) 0x10d3130 82
> sheep: cluster_queue_request(266) 0x10d3130 11
> sheep: do_lookup_vdi(236) looking for Test 4, ec9f05
> sheep: add_vdi(333) we create a new vdi, 0 Test (4) 2147483648, vid: ec9f05, base 0, cur 0
> sheep: add_vdi(337) qemu doesn't specify the copies... 1
> sheep: store_queue_request_local(628) use 'user_xattr' option?
> sheep: write_object(647) fail 80ec9f0500000000 6
> sheep: __sd_deliver_done(925) unknown message 2
> ~~~~
>
> I notice the message about "user_xattr" option.
> However this filesystem
> is ext4:
>
>     $ mount | grep "on / "
>     /dev/sda5 on / type ext4 (rw,errors=remount-ro,commit=0)
>
> and the Getting-Started guide says that user_xattr is only needed for ext3.
> However, let's try it anyway:
>
>     $ sudo mount -o remount,user_xattr /
>
> OK, that seems to work! Sheep shows:
>
> ~~~~
> sheep: cluster_queue_request(266) 0x10d3130 11
> sheep: do_lookup_vdi(236) looking for Test 4, ec9f05
> sheep: add_vdi(333) we create a new vdi, 0 Test (4) 2147483648, vid: ec9f05, base 0, cur 0
> sheep: add_vdi(337) qemu doesn't specify the copies... 1
> sheep: vdi_op_done(758) done 0 15507205
> sheep: __sd_deliver_done(925) unknown message 2
> ~~~~
>
> and I can boot with
>
>     $ qemu-system-x86_64 -cdrom /v/downloads/linux/ubuntu-10.04.3-server-amd64.iso sheepdog:Test
>
> So it looks like I have a one-node cluster:
>
> ~~~~
> # collie cluster info
> Cluster status: running
>
> Creation time        Epoch Nodes
> 1970-01-01 01:00:00      1 [192.168.122.1:7000]
> ~~~~~
>
> Anyway, my questions are:
>
> 1. Can I run sheep as a non-root user? If so, how?
>
> 2. Do I really need user_xattr even for ext4? (if so, the documentation
> needs adjusting)

Sheepdog requires the underlying filesystem to support extended
attributes. I'm not familiar with ext4, but the answer is probably yes.

>
> 3. Can I ignore the cluster creation time of '1970-01-01 01:00:00' ?

If your filesystem supports extended attributes, you can set the correct
creation time.

>
> 4. What happens if you set --copies=N but the cluster degrades to
> the point where it has fewer nodes than that? As far as I can see,
> my one-node cluster with --copies=2 does actually work. Would the
> data get copied when a new node is added?

Sheepdog keeps working even when the number of nodes is fewer than N.
When you add a new machine, the number of copies of the data is increased,
up to N.

>
> One other point.
> Experimentation shows that "collie cluster format"
> instantly destroys all existing vdis, with no confirmation - and it can be
> run as a non-root user. Can I suggest some idiot-proofing is done on this?
> e.g. if a cluster already exists then you need to add some extra parameter
> to force deletion?

Good point. I think we should add something like a "--force" option.
E.g.

  $ collie cluster format          # succeeds only when the store directory is empty
  $ collie cluster format --force  # always succeeds

Thanks,

Kazutaka
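
Until something like "--force" exists in collie itself, the proposed
behaviour can be roughly approximated with a small shell wrapper. This is
only a sketch under assumptions: safe_format, store_is_empty and the
FORMAT_CMD variable are hypothetical names, not part of Sheepdog, and
"empty" is taken literally (a directory with no entries), which may not
match what a real store looks like right after sheep has started.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of collie: refuse to format unless the
# store directory is empty or --force is given.

# store_is_empty DIR: succeeds if DIR is missing or contains no entries.
store_is_empty() {
    [ ! -d "$1" ] && return 0
    [ -z "$(ls -A "$1")" ]
}

# safe_format DIR [--force] [extra arguments passed to the format command]
safe_format() {
    store=$1
    shift
    force=no
    if [ "${1:-}" = "--force" ]; then
        force=yes
        shift
    fi
    if [ "$force" = no ] && ! store_is_empty "$store"; then
        echo "refusing to format: $store is not empty (use --force)" >&2
        return 1
    fi
    # FORMAT_CMD is an assumption so the sketch can be dry-run,
    # e.g. FORMAT_CMD=echo; it defaults to the real command.
    ${FORMAT_CMD:-collie cluster format} "$@"
}
```

With the store path from the session above, "safe_format /var/tmp/sheep
--copies=2" would format only an empty store, and "safe_format
/var/tmp/sheep --force --copies=2" would always go ahead.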