[sheepdog] [PATCH V2 00/11] INTRODUCE
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Mon Aug 20 15:00:42 CEST 2012
At Thu, 9 Aug 2012 16:43:38 +0800,
Yunkai Zhang wrote:
>
> From: Yunkai Zhang <qiushu.zyk at taobao.com>
>
> V2:
> - fix a typo
> - when an object is updated, delete its old version
> - reset cluster recovery state in finish_recovery()
>
> Yunkai Zhang (11):
> sheep: enable variale-length of join_message in response of join
> event
> sheep: share joining nodes with newly added sheep
> sheep: delay to process recovery caused by LEAVE event just like JOIN
> event
> sheep: don't cleanup working directory when sheep joined back
> sheep: read objects only from live nodes
> sheep: write objects only on live nodes
> sheep: mark dirty object that belongs to the leaving nodes
> sheep: send dirty object list to each sheep when cluster do recovery
> sheep: do recovery with dirty object list
> collie: update 'collie cluster recover info' commands
> collie: update doc about 'collie cluster recover disable'
>
> collie/cluster.c | 46 ++++++++---
> include/internal_proto.h | 32 ++++++--
> include/sheep.h | 23 ++++++
> man/collie.8 | 2 +-
> sheep/cluster.h | 29 +------
> sheep/cluster/accord.c | 2 +-
> sheep/cluster/corosync.c | 9 ++-
> sheep/cluster/local.c | 2 +-
> sheep/cluster/zookeeper.c | 2 +-
> sheep/farm/trunk.c | 2 +-
> sheep/gateway.c | 39 ++++++++-
> sheep/group.c | 202 +++++++++++++++++++++++++++++++++++++++++-----
> sheep/object_list_cache.c | 182 +++++++++++++++++++++++++++++++++++++++--
> sheep/ops.c | 85 ++++++++++++++++---
> sheep/recovery.c | 133 +++++++++++++++++++++++++++---
> sheep/sheep_priv.h | 57 ++++++++++++-
> 16 files changed, 743 insertions(+), 104 deletions(-)
I've looked into this series, and IMHO the change is too complex.

With this series, when recovery is disabled and some nodes have left,
sheep can succeed in a write operation even if the data is not fully
replicated. But if we allow that, it is difficult to prevent VMs from
reading old data. In fact, this series puts a lot of effort into
preventing exactly that.

I'd suggest allowing the epoch to be incremented even when recovery is
disabled. If the recovery work recovers only rw->prio_oids and delays the
recovery of rw->oids, I think we can get a similar benefit in a much
simpler way:
http://www.mail-archive.com/sheepdog@lists.wpkg.org/msg05439.html
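
To make the idea concrete, here is a rough sketch of what "recover only
rw->prio_oids and delay rw->oids" could look like. It is not the actual
sheep code: struct recovery_work_sketch, schedule_recovery() and the
next_oid cursor are hypothetical names made up for illustration, and the
sketch assumes the two lists are disjoint. Only the field names prio_oids
and oids come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for sheep's recovery work.
 * Only the names prio_oids and oids are taken from the mail;
 * everything else is an assumption made for this sketch.
 */
struct recovery_work_sketch {
	uint32_t epoch;            /* epoch this recovery belongs to */
	const uint64_t *prio_oids; /* objects that I/O requests are waiting on */
	size_t nr_prio_oids;
	const uint64_t *oids;      /* all other objects that need recovery */
	size_t nr_oids;
	size_t next_oid;           /* next index in oids still to recover */
};

/*
 * Fill 'out' with the object IDs that should be recovered now and
 * return how many were written.  Prioritized objects are always
 * scheduled; the bulk list is only drained while recovery is enabled,
 * so its entries simply stay queued in rw otherwise.
 */
static size_t schedule_recovery(struct recovery_work_sketch *rw,
				bool recovery_enabled,
				uint64_t *out, size_t out_len)
{
	size_t n = 0;

	assert(rw != NULL && out != NULL);

	/* Objects a VM is waiting for are recovered even when disabled. */
	while (rw->nr_prio_oids > 0 && n < out_len) {
		out[n++] = *rw->prio_oids++;
		rw->nr_prio_oids--;
	}

	/* Background recovery of the remaining objects is delayed. */
	if (recovery_enabled) {
		while (rw->next_oid < rw->nr_oids && n < out_len)
			out[n++] = rw->oids[rw->next_oid++];
	}

	return n;
}
```

With recovery disabled, a call returns only the prioritized objects and
leaves next_oid untouched; once recovery is enabled again, a later call
picks up the delayed rw->oids from where it stopped. The epoch can be
incremented in both cases, so the existing epoch-based checks should keep
VMs from reading stale data without a separate dirty object list.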
Thanks,
Kazutaka