[sheepdog] compatibility of dog command between new and old cluster

Ruoyu liangry at ucweb.com
Tue Jul 15 11:31:20 CEST 2014


When I submit a read request with a new-version dog command (ledger object 
supported) to an old-version cluster (ledger object not supported), the 
cluster is corrupted.

Error messages in sheep.log:

Jul 15 11:17:26  ERROR [gway 24285] default_read_from_path(291) failed 
to read object 80e4a2b600000000, 
path=/mnt/sheepdog/obj/80e4a2b600000000, offset=0, size=12587576, 
result=4198976, Success
Jul 15 11:17:26  ERROR [gway 24285] err_to_sderr(114) 
oid=80e4a2b600000000, Success
Jul 15 11:17:26  ERROR [gway 24285] gateway_replication_read(270) local 
read 80e4a2b600000000 failed, Network error between sheep
Jul 15 11:17:26   INFO [main] md_remove_disk(349) /mnt/sheepdog/obj from 
multi-disk array
Jul 15 11:17:26  ERROR [gway 24285] sheep_exec_req(1114) failed Network 
error between sheep, remote address: 192.168.1.2:7000, op name: READ_PEER

As you can see, the size of the vdi object has changed: 12587576 bytes were 
expected, but only 4198976 were actually read. As a result, sheep concluded 
that the disk had an unrecoverable problem and had to be removed, which in 
turn triggers a recovery. This behavior is not very robust.
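
To illustrate the distinction I have in mind, here is a rough sketch (in 
Python rather than sheep's C, and not how sheep behaves today). It treats a 
short read with no errno set, which is what the "Success" in the log above 
indicates, as a size mismatch instead of a disk failure. The names 
ReadOutcome and classify_read are hypothetical.

```python
from enum import Enum


class ReadOutcome(Enum):
    OK = "ok"
    SIZE_MISMATCH = "size mismatch"
    IO_ERROR = "I/O error"


def classify_read(requested: int, returned: int, err: int = 0) -> ReadOutcome:
    """Classify the result of reading an object file.

    Hypothetical helper: `requested` is the size the caller asked for,
    `returned` is what the read call gave back, and `err` is the errno
    value (0 means no error was reported).
    """
    if returned < 0 or err != 0:
        # The read call itself failed: this may really be a disk problem.
        return ReadOutcome.IO_ERROR
    if returned != requested:
        # The read succeeded but the object is not the size the caller
        # expected, e.g. the two sides disagree on the object layout.
        return ReadOutcome.SIZE_MISMATCH
    return ReadOutcome.OK


# The values from the log above: size=12587576, result=4198976, errno 0.
print(classify_read(12587576, 4198976))
```

With such a distinction, only the IO_ERROR case would need to lead to 
removing the disk; a SIZE_MISMATCH could simply fail the request.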

Maybe we need something like version control to avoid this issue. What 
is your opinion?
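
One possible shape for such a check, purely as a sketch: every request 
carries the protocol version of the sender, and the receiver rejects a 
request whose version it does not support before touching any object. 
Everything below (Request, VersionMismatch, handle_request, the version 
numbers) is hypothetical and only meant to show the idea, not sheepdog's 
actual protocol.

```python
from dataclasses import dataclass

# Hypothetical version numbers: 1 = no ledger object, 2 = ledger object.
OLD_CLUSTER_VERSION = 1
NEW_DOG_VERSION = 2


@dataclass
class Request:
    proto_ver: int  # protocol version of the sender
    opcode: str     # e.g. "READ_PEER"


class VersionMismatch(Exception):
    """Raised when the sender and receiver speak different versions."""


def handle_request(req: Request, server_ver: int) -> str:
    """Refuse the request up front if the versions differ."""
    if req.proto_ver != server_ver:
        raise VersionMismatch(
            f"request version {req.proto_ver} != server version {server_ver}"
        )
    # Only reached when both sides agree on the object layout.
    return f"dispatched {req.opcode}"


try:
    handle_request(Request(NEW_DOG_VERSION, "READ_PEER"), OLD_CLUSTER_VERSION)
except VersionMismatch as exc:
    print("rejected:", exc)
```

In this sketch the new dog would get a clear error back, and the old 
cluster would never attempt the read that currently ends in a disk removal.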



