[sheepdog-users] monitor cluster to avoid corruption

Tue Dec 18 15:08:02 CET 2012

2012/12/18 Liu Yuan <namei.unix at gmail.com>:
> I think this is not easy to solve with a small change.

Well, I'm going to add more trouble :-)

During the performance tests from the other thread, my guest crashed three
or more times.
I have been able to reproduce the problem, and I describe it below.

Some days ago I tried writing some data on the guest;
I took note of the percentages reported by 'collie node info';
I deleted the data and wrote some other data (the same amount or less).
The usage percentage didn't change, because once the space has been
allocated, it is reused and no more is allocated.

During today's performance test I did the same thing, but
*when I ran dd the second time (on the same file), the sheep daemon went
crazy on my first node.*
The first node is the one running kvm.
The sheep daemon uses "all" the CPU (180%);
the node is no longer in the cluster;
it's impossible to interact with the guest (kvm uses almost no CPU).

(My cluster size is still the same)

*This is the situation after writing the 512M*
Looking at the percentages, I deduced that I could write another 512M.

collie node list
M   Id   Host:Port         V-Nodes       Zone
-    0   192.168.2.41:7000      16  688040128
-    1   192.168.2.42:7000      16  704817344
-    2   192.168.2.43:7000      161  721594560

collie node info
Id      Size    Used    Use%
 0      982 MB  320 MB   32%
 1      982 MB  228 MB   23%
 2      10.0 GB 540 MB    5%
Total   12 GB   1.1 GB    8%
Total virtual image size        10 GB
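
For anyone who wants to watch these numbers automatically, here is a minimal sketch that parses 'collie node info' output and reports the free space per node. It assumes the column layout shown above, that 1 GB is displayed as 1024 MB, and a hypothetical threshold of 700 MB; the function names are mine, not part of collie. The displayed values are rounded, and free space on one node does not say exactly how much more the guest can write, so treat the result as a rough warning only.

```python
import re

# Sample copied from the output in this mail; a real check would capture
# the output of the command instead.
SAMPLE = """\
Id      Size    Used    Use%
 0      982 MB  320 MB   32%
 1      982 MB  228 MB   23%
 2      10.0 GB 540 MB    5%
Total   12 GB   1.1 GB    8%
Total virtual image size        10 GB
"""

# Assumption: units are binary (1 GB = 1024 MB).
UNIT_TO_MB = {"MB": 1.0, "GB": 1024.0, "TB": 1024.0 * 1024.0}

# Per-node rows start with a numeric Id; header and Total lines do not match.
ROW = re.compile(
    r"^\s*(\d+)\s+([\d.]+)\s+(MB|GB|TB)\s+([\d.]+)\s+(MB|GB|TB)\s+(\d+)%"
)


def parse_node_info(text):
    """Return {node_id: {"size_mb", "used_mb", "free_mb"}} for per-node rows."""
    nodes = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        size_mb = float(m.group(2)) * UNIT_TO_MB[m.group(3)]
        used_mb = float(m.group(4)) * UNIT_TO_MB[m.group(5)]
        nodes[int(m.group(1))] = {
            "size_mb": size_mb,
            "used_mb": used_mb,
            "free_mb": size_mb - used_mb,
        }
    return nodes


def nodes_below(nodes, min_free_mb):
    """Return the sorted ids of nodes with less than min_free_mb free."""
    return sorted(i for i, n in nodes.items() if n["free_mb"] < min_free_mb)


if __name__ == "__main__":
    info = parse_node_info(SAMPLE)
    for node_id, n in sorted(info.items()):
        print(node_id, n["free_mb"], "MB free")
    print("below threshold:", nodes_below(info, 700))
```

With the sample above, node 0 is the tightest one, with about 662 MB free.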

*This is the situation after rewriting the same data*

collie node list
M   Id   Host:Port         V-Nodes       Zone
-    0   192.168.2.42:7000      11  704817344
-    1   192.168.2.43:7000      117  721594560

collie node info
Id      Size    Used    Use%
 0      982 MB  660 MB   67%  (Note: this was 55% right after the freeze)
 1      10.0 GB 660 MB    6%
Total   11 GB   1.3 GB   11%
Total virtual image size        10 GB
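
The fact that a node silently dropped out can be detected by comparing two 'collie node list' outputs. The following is only a sketch under the assumption that each node row contains one IPv4 Host:Port field, as in the listings in this mail; the helper names are hypothetical.

```python
import re

# Both samples are copied from the listings in this mail.
BEFORE = """\
M   Id   Host:Port         V-Nodes       Zone
-    0   192.168.2.41:7000      16  688040128
-    1   192.168.2.42:7000      16  704817344
-    2   192.168.2.43:7000      161  721594560
"""

AFTER = """\
M   Id   Host:Port         V-Nodes       Zone
-    0   192.168.2.42:7000      11  704817344
-    1   192.168.2.43:7000      117  721594560
"""

# Assumption: nodes are listed as IPv4 address plus port.
HOST = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}:\d+)\b")


def hosts(node_list_text):
    """Collect the Host:Port values found in a node listing."""
    found = set()
    for line in node_list_text.splitlines():
        m = HOST.search(line)
        if m:
            found.add(m.group(1))
    return found


def missing_nodes(before, after):
    """Return nodes present in the earlier listing but not in the later one."""
    return sorted(hosts(before) - hosts(after))


if __name__ == "__main__":
    for node in missing_nodes(BEFORE, AFTER):
        print("node left the cluster:", node)
```

For the two listings in this mail it reports 192.168.2.41:7000, the node running kvm.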

sheep.log
Dec 18 14:32:21 [block] do_lookup_vdi(393) looking for test (7c2b25)
Dec 18 14:34:56 [gway 805] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:56 [gway 806] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:56 [main] gateway_op_done(100) leaving sheepdog cluster
Dec 18 14:34:56 [gway 807] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:56 [gway 808] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:56 [gway 809] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:56 [main] gateway_op_done(100) leaving sheepdog cluster
Dec 18 14:34:56 [gway 811] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:56 [rw 810] get_vdi_copy_number(82) No VDI copy entry for 0 found
Dec 18 14:34:56 [rw 810] screen_object_list(545) ERROR: can not find
copy number for object fc310
Dec 18 14:34:56 [rw 810] get_vdi_copy_number(82) No VDI copy entry for 0 found
Dec 18 14:34:56 [rw 810] screen_object_list(545) ERROR: can not find
copy number for object 57
Dec 18 14:34:56 [gway 814] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:56 [gway 815] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:57 [main] queue_cluster_request(315) COMPLETE_RECOVERY (0xa921b0)
Dec 18 14:34:57 [gway 817] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:57 [gway 820] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:57 [gway 821] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:57 [gway 822] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:57 [gway 824] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:57 [gway 825] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:57 [gway 826] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:57 [gway 830] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:57 [gway 831] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:34:57 [gway 832] prealloc(303) failed to preallocate space,
No space left on device
Dec 18 14:35:01 [gway 836] prealloc(303) failed to preallocate space,
No space left on device
(END)
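
To spot this condition early, sheep.log can be scanned for the messages above. This is a rough sketch that counts log entries per function name and remembers when each was first seen. It assumes one entry per line in the real file (the wrapping above comes from the mail client) and the "date [thread] function(line) message" layout seen in this excerpt; the helper name is hypothetical.

```python
import re
from collections import Counter

# A few entries from the excerpt in this mail, with the wrapped lines joined.
SAMPLE_LOG = """\
Dec 18 14:32:21 [block] do_lookup_vdi(393) looking for test (7c2b25)
Dec 18 14:34:56 [gway 805] prealloc(303) failed to preallocate space, No space left on device
Dec 18 14:34:56 [gway 806] prealloc(303) failed to preallocate space, No space left on device
Dec 18 14:34:56 [main] gateway_op_done(100) leaving sheepdog cluster
"""

# timestamp, [thread], function(source line), message
LINE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \[([^\]]+)\] (\w+)\((\d+)\) (.*)$"
)


def summarize(log_text):
    """Return (entries per function, first timestamp per function)."""
    counts = Counter()
    first_seen = {}
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip lines that do not follow the assumed layout
        stamp, _thread, func, _lineno, _msg = m.groups()
        counts[func] += 1
        first_seen.setdefault(func, stamp)
    return counts, first_seen


if __name__ == "__main__":
    counts, first_seen = summarize(SAMPLE_LOG)
    for func, n in counts.most_common():
        print(func, n, "first seen at", first_seen[func])
```

A monitoring job could alert as soon as the count for prealloc is non-zero, which in this log happens just before the gateway leaves the cluster.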

I had to kill the guest:
  kill -9 <pid of kvm>

The disk seems to be corrupted:
  collie vdi check test
  Failed to read, No object found

sheep.log
Dec 18 14:34:58 [main] queue_cluster_request(315) COMPLETE_RECOVERY (0x2029df0)
Dec 18 14:45:41 [main] queue_cluster_request(315) LOCK_VDI (0x7f12e00008e0)
Dec 18 14:45:41 [block] do_lookup_vdi(393) looking for test (7c2b25)
Dec 18 15:01:46 [main] queue_cluster_request(315) LOCK_VDI (0x7f12e0000a00)
Dec 18 15:01:46 [block] do_lookup_vdi(393) looking for test (7c2b25)

0.5.5_6_gb3f888b