[sheepdog-users] Problems with upgrade from 0.4.0 to 0.5.1

Jens WEBER jweber at tek2b.org
Wed Oct 3 21:50:35 CEST 2012


I have tested this many times, always with the same result: the upgrade doesn't work.

Test setup:
Cluster formatted with 3 copies and -m unsafe (same result without -m unsafe):
[ ok ] sheepdog gateway-only (-p 7000 -g -z 999 /var/lib/sheepdog/disk0) is running.
[ ok ] sheepdog for Disk A (-p 7001 -z 1 /var/lib/sheepdog/disk1) is running.
[ ok ] sheepdog for Disk B (-p 7002 -z 1 /var/lib/sheepdog/disk2) is running.
[ ok ] sheepdog for Disk D (-p 7004 -z 2 /var/lib/sheepdog/disk3) is running.
[ ok ] sheepdog for Disk C (-p 7003 -z 2 /var/lib/sheepdog/disk4) is running.
[ ok ] sheepdog for Disk E (-p 7005 -z 3 /var/lib/sheepdog/disk5) is running.
[ ok ] sheepdog for Disk F (-p 7006 -z 3 /var/lib/sheepdog/disk6) is running.

Starting all nodes with the -u option looks fine so far, but:

root@sheep01:/# collie node info
Id	Size	Used	Use%
Response's result: Waiting for other nodes to join cluster
Response's result: Waiting for other nodes to join cluster
Response's result: Waiting for other nodes to join cluster
Response's result: Waiting for other nodes to join cluster
Response's result: Waiting for other nodes to join cluster
Response's result: Waiting for other nodes to join cluster
Response's result: Waiting for other nodes to join cluster
Cannot get information from any nodes

What about the config files?

root@sheep01:/# ls -l /var/lib/sheepdog/disk*/config
-rw-r----- 1 root root 16 Okt  3 21:30 /var/lib/sheepdog/disk0/config <- not converted, still the size from version 0.4.0!
-rw-r----- 1 root root 40 Okt  3 21:31 /var/lib/sheepdog/disk1/config
-rw-r----- 1 root root 40 Okt  3 21:31 /var/lib/sheepdog/disk2/config
-rw-r----- 1 root root 40 Okt  3 21:31 /var/lib/sheepdog/disk3/config
-rw-r----- 1 root root 40 Okt  3 21:31 /var/lib/sheepdog/disk4/config
-rw-r----- 1 root root 40 Okt  3 21:31 /var/lib/sheepdog/disk5/config
-rw-r----- 1 root root 40 Okt  3 21:31 /var/lib/sheepdog/disk6/config
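
The size comparison above can also be done mechanically. The following is only a minimal sketch: it assumes that a config file whose size differs from the majority is the one worth a closer look, which is a heuristic and not something sheepdog itself defines. The helper names `group_by_size` and `odd_ones_out` are made up for illustration.

```python
import glob
import os
from collections import defaultdict


def group_by_size(paths):
    """Map file size in bytes to the sorted list of paths with that size."""
    groups = defaultdict(list)
    for path in paths:
        groups[os.path.getsize(path)].append(path)
    return {size: sorted(found) for size, found in groups.items()}


def odd_ones_out(paths):
    """Return the paths whose size differs from the most common size.

    Assumption: the majority size is the "expected" one. This is only a
    heuristic for spotting outliers, not a format check.
    """
    groups = group_by_size(paths)
    if not groups:
        return []
    majority = max(groups, key=lambda size: len(groups[size]))
    return sorted(
        path
        for size, found in groups.items()
        if size != majority
        for path in found
    )


if __name__ == "__main__":
    # Same glob pattern as the ls command in the post.
    for path in odd_ones_out(glob.glob("/var/lib/sheepdog/disk*/config")):
        print(path, os.path.getsize(path))
```

Run against the listing above, this would be expected to single out the disk0 config, since it is the only file with a different size.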

Trying to stop the cluster:

root@sheep01:/# collie cluster shutdown
Response's result: Waiting for other nodes to join cluster
failed to execute request

OK, so I ran pkill -9 sheep and started everything again:

[FAIL] sheepdog gateway-only (-p 7000 -g -z 999 /var/lib/sheepdog/disk0) is not running ... failed!
[ ok ] sheepdog for Disk A (-p 7001 -z 1 /var/lib/sheepdog/disk1) is running.
[ ok ] sheepdog for Disk B (-p 7002 -z 1 /var/lib/sheepdog/disk2) is running.
[ ok ] sheepdog for Disk D (-p 7004 -z 2 /var/lib/sheepdog/disk3) is running.
[ ok ] sheepdog for Disk C (-p 7003 -z 2 /var/lib/sheepdog/disk4) is running.
[ ok ] sheepdog for Disk E (-p 7005 -z 3 /var/lib/sheepdog/disk5) is running.
[ ok ] sheepdog for Disk F (-p 7006 -z 3 /var/lib/sheepdog/disk6) is running.

The gateway-only sheep doesn't start any more! So what about the config files now?

root@sheep01:/# ls -l /var/lib/sheepdog/disk*/config
-rw-r----- 1 root root 16 Okt  3 21:30 /var/lib/sheepdog/disk0/config <- still the old size
-rw-r----- 1 root root 80 Okt  3 21:43 /var/lib/sheepdog/disk1/config <- double the size now!?
-rw-r----- 1 root root 80 Okt  3 21:43 /var/lib/sheepdog/disk2/config
-rw-r----- 1 root root 80 Okt  3 21:43 /var/lib/sheepdog/disk3/config
-rw-r----- 1 root root 80 Okt  3 21:43 /var/lib/sheepdog/disk4/config
-rw-r----- 1 root root 80 Okt  3 21:43 /var/lib/sheepdog/disk5/config
-rw-r----- 1 root root 80 Okt  3 21:43 /var/lib/sheepdog/disk6/config

What does sheep.log of disk0 say?

Oct 03 21:31:42 [main] jrnl_recover(230) opening the directory /var/lib/sheepdog/disk0/journal/
Oct 03 21:31:42 [main] jrnl_recover(235) starting journal recovery
Oct 03 21:31:42 [main] jrnl_recover(291) journal recovery complete
Oct 03 21:31:42 [main] init_config_path(96) This sheep version is not compatible with the existing data layout, 0
Oct 03 21:31:42 [main] send_join_request(1011) IPv4 ip:172.30.0.80 port:7000
Oct 03 21:31:42 [main] main(527) sheepdog daemon (version 0.5.1) started
Oct 03 21:31:42 [main] update_cluster_info(798) status = 4, epoch = 1, finished: 0
Oct 03 21:31:42 [main] sd_check_join_cb(971) 172.30.0.80:7001: ret = 0x0, cluster_status = 0x4
Oct 03 21:31:42 [main] update_cluster_info(798) status = 4, epoch = 1, finished: 1
Oct 03 21:31:43 [main] sd_check_join_cb(971) 172.30.0.80:7004: ret = 0x0, cluster_status = 0x4
Oct 03 21:31:43 [main] update_cluster_info(798) status = 4, epoch = 1, finished: 1
Oct 03 21:31:43 [main] sd_check_join_cb(971) 172.30.0.80:7002: ret = 0x0, cluster_status = 0x4
Oct 03 21:31:43 [main] update_cluster_info(798) status = 4, epoch = 1, finished: 1
Oct 03 21:31:43 [main] sd_check_join_cb(971) 172.30.0.80:7005: ret = 0x0, cluster_status = 0x4
Oct 03 21:31:43 [main] update_cluster_info(798) status = 4, epoch = 1, finished: 1
Oct 03 21:31:43 [main] sd_check_join_cb(971) 172.30.0.80:7006: ret = 0x0, cluster_status = 0x4
Oct 03 21:31:43 [main] update_cluster_info(798) status = 4, epoch = 1, finished: 1
Oct 03 21:31:43 [main] sd_check_join_cb(971) 172.30.0.80:7003: ret = 0x0, cluster_status = 0x4
Oct 03 21:31:43 [main] update_cluster_info(798) status = 4, epoch = 1, finished: 1
Oct 03 21:43:28 [main] crash_handler(439) sheep pid 10553 exited unexpectedly.
Oct 03 21:43:46 [main] jrnl_recover(230) opening the directory /var/lib/sheepdog/disk0/journal/
Oct 03 21:43:46 [main] jrnl_recover(235) starting journal recovery
Oct 03 21:43:46 [main] jrnl_recover(291) journal recovery complete
Oct 03 21:43:46 [main] init_config_path(96) This sheep version is not compatible with the existing data layout, 0
Oct 03 21:43:47 [main] send_join_request(1011) IPv4 ip:172.30.0.80 port:7000
Oct 03 21:43:47 [main] main(527) sheepdog daemon (version 0.5.1) started
Oct 03 21:43:47 [main] sd_join_handler(1029) Failed to join, exiting.
Oct 03 21:43:47 [main] crash_handler(439) sheep pid 11181 exited unexpectedly.
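
To pull the relevant entries out of a long sheep.log, the lines can be filtered by the function that logged them. This is a minimal sketch that assumes every entry follows the layout visible above (timestamp, thread name in brackets, function name with a source line number, then the message). The regular expression and the helper names `parse_line` and `messages_from` are illustrative and not part of sheepdog.

```python
import re
import sys

# Layout assumed from the excerpt above:
#   "Oct 03 21:31:42 [main] init_config_path(96) message text"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<func>\w+)\((?P<line>\d+)\) "
    r"(?P<msg>.*)$"
)


def parse_line(text):
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(text.strip())
    return match.groupdict() if match else None


def messages_from(lines, funcs):
    """Yield (stamp, func, msg) for entries logged by one of the given functions."""
    wanted = set(funcs)
    for text in lines:
        entry = parse_line(text)
        if entry and entry["func"] in wanted:
            yield entry["stamp"], entry["func"], entry["msg"]


if __name__ == "__main__":
    # Usage: python filter_log.py <path to sheep.log>
    with open(sys.argv[1]) as handle:
        interesting = ("init_config_path", "sd_join_handler", "crash_handler")
        for stamp, func, msg in messages_from(handle, interesting):
            print(stamp, func, msg)
```

Filtering on `init_config_path`, `sd_join_handler` and `crash_handler` keeps the compatibility warning, the failed join and the crash entries while dropping the repeated `update_cluster_info` lines.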

Cheers, Jens


