[sheepdog] compatibility of dog command between new and old cluster

Tue Jul 15 11:31:20 CEST 2014

When I submit a read request with a new-version dog command (ledger
objects supported) to an old-version cluster (ledger objects not
supported), the cluster is corrupted.

Error messages in sheep.log:

Jul 15 11:17:26  ERROR [gway 24285] default_read_from_path(291) failed 
to read object 80e4a2b600000000, 
path=/mnt/sheepdog/obj/80e4a2b600000000, offset=0, size=12587576, 
result=4198976, Success
Jul 15 11:17:26  ERROR [gway 24285] err_to_sderr(114) 
oid=80e4a2b600000000, Success
Jul 15 11:17:26  ERROR [gway 24285] gateway_replication_read(270) local 
read 80e4a2b600000000 failed, Network error between sheep
Jul 15 11:17:26   INFO [main] md_remove_disk(349) /mnt/sheepdog/obj from 
multi-disk array
Jul 15 11:17:26  ERROR [gway 24285] sheep_exec_req(1114) failed Network 
error between sheep, remote address: 192.168.1.2:7000, op name: READ_PEER

As you can see, the vdi object size has changed between versions: the
read expected 12587576 bytes but actually got 4198976. As a result,
sheep concluded that the disk had an unrecoverable problem and had to
be removed from the multi-disk array, which in turn triggers a
recovery. This behavior is not very robust.
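
To illustrate the distinction I have in mind: the trailing "Success" in
the log appears to mean that errno was 0, i.e. the read came back short
rather than failing. Below is a minimal sketch of telling those two cases
apart. The enum and the function name are hypothetical and written only
for this mail; they are not existing sheepdog code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical verdicts for this sketch; not sheepdog's SD_RES_* codes. */
enum read_verdict {
	READ_OK,       /* got exactly the requested number of bytes */
	READ_SHORT,    /* object is smaller than the caller expected */
	READ_IO_ERROR, /* the read call itself failed; errno is meaningful */
};

/*
 * Classify the return value of a pread()-style call.
 * ret:  bytes returned, or -1 on failure
 * size: bytes requested (must be > 0)
 */
static enum read_verdict classify_read(ssize_t ret, size_t size)
{
	assert(size > 0);

	if (ret < 0)
		return READ_IO_ERROR;
	if ((size_t)ret < size)
		return READ_SHORT;
	return READ_OK;
}
```

With the numbers from the log, classify_read(4198976, 12587576) yields
READ_SHORT, so a caller could reject the request without treating the
disk as broken.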

Maybe we need some kind of version check between dog and sheep to
avoid this issue. What is your opinion?
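
As a starting point for discussion, here is a minimal sketch of such a
check, assuming each request carries a protocol version number and the
receiver knows the range it supports. The enum, the function name, and
the version numbers in the usage note are all hypothetical and do not
correspond to actual sheepdog definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcomes of a version check. */
enum ver_check {
	VER_OK,      /* request version is within the supported range */
	VER_TOO_OLD, /* sender is older than anything we still support */
	VER_TOO_NEW, /* sender may use object layouts we do not know */
};

/*
 * Compare the version carried in a request against the inclusive
 * range [min_ver, max_ver] that the receiver supports.
 */
static enum ver_check check_proto_version(uint8_t req_ver,
					  uint8_t min_ver, uint8_t max_ver)
{
	assert(min_ver <= max_ver);

	if (req_ver < min_ver)
		return VER_TOO_OLD;
	if (req_ver > max_ver)
		return VER_TOO_NEW;
	return VER_OK;
}
```

For example, if an old sheep supports only version 1 and a new dog sends
version 2, check_proto_version(2, 1, 1) returns VER_TOO_NEW, and the
request could be refused before any object is read.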