[sheepdog] [PATCH v2 1/2] sheep, dog: make recycling VID selectable

Tue Mar 17 05:40:16 CET 2015

On Tue, Mar 17, 2015 at 01:33:58PM +0900, Hitoshi Mitake wrote:
> At Tue, 17 Mar 2015 11:06:34 +0800,
> Liu Yuan wrote:
> > 
> > On Tue, Mar 17, 2015 at 11:42:01AM +0900, Hitoshi Mitake wrote:
> > > At Tue, 17 Mar 2015 10:03:53 +0800,
> > > Liu Yuan wrote:
> > > > 
> > > > On Tue, Mar 17, 2015 at 04:44:46AM +0900, MORITA Kazutaka wrote:
> > > > > At Mon, 16 Mar 2015 21:13:29 +0800,
> > > > > Liu Yuan wrote:
> > > > > > 
> > > > > > How about making 'dog vdi clone --no-share' the default clone operation? We
> > > > > > could add 'dog vdi clone --share' to keep the old behavior as an option. This
> > > > > > way, --no-share will save us from this kind of subtle problem, and your team's
> > > > > > goal regarding VDI exhaustion will be achieved :).
> > > > > 
> > > > > The --no-share option disables thin provisioning.  It shouldn't be the default,
> > > > > IMHO.
> > > > 
> > > > The fix for the following bug disables vid recycling for the old algorithm:
> > > > 
> > > > commit 21549a1bd4981fabcc09d062a647162127fe0637
> > > > Author: Hitoshi Mitake <mitake.hitoshi at gmail.com>
> > > > Date:   Sun Jun 1 23:23:18 2014 +0900
> > > > 
> > > >     sheep: don't recycle VDI ID
> > > >     
> > > >     Recycling VDI IDs of deleted VDIs is a completely wrong idea. It
> > > >     breaks relations between inode objects and data objects. For example,
> > > >     it can cause a problem of corrupting cloned VDIs (see related
> > > >     issue). This patch forbids the recycling.
> > > >     
> > > >     Related issue:
> > > >     https://bugs.launchpad.net/sheepdog-project/+bug/1317755
> > > >     
> > > >     Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > >     Signed-off-by: Liu Yuan <namei.unix at gmail.com>
> > > > 
> > > > This means we currently have no vid recycling for the old algorithm because of
> > > > this subtle problem. This is why I suggest making --no-share the default: to
> > > > bring this functionality back.
> > > 
> > > 1. clone --no-share is a much heavier operation than a whole-range lookup
> > > of the bitmap. For every object pointed to by the parent snapshot, it issues
> > > a read plus writes multiplied by the number of replicas. This means we cannot
> > > provide fast cloning, and space consumption will increase explosively.
> > > 
> > > 2. the old recycling doesn't fully take snapshots into account, as I
> > >    wrote in my other email (and as the Launchpad issue linked above
> > >    describes).
> > > 
> > > > 
> > > > > > 
> > > > > > This approach is not perfect, but it will benefit us:
> > > > > > 
> > > > > > 1. a stable code base, since the old algorithm has been tested for a long time.
> > > > > 
> > > > > Hitoshi's patch enables the stable algorithm by default.  Isn't that enough?
> > > > 
> > > > I'm afraid not.
> > > > 
> > > > 1. as mentioned above, simply disabling the new algorithm won't bring vid
> > > >    recycling back. But if we bring it back, it seems it will conflict with
> > > >    the new algorithm.
> > > 
> > > Reviving the old algorithm is completely impossible, as I described above.
> > 
> > Not every use case will hit the above-mentioned problem. Most of the time, people
> > won't destroy the whole chain and recreate it with the same name while clones
> > are running. In this sense, it is an extreme corner case that only some users
> > might hit.
> > 
> > You mentioned that NTT will take periodic snapshots and is afraid of vid
> > exhaustion; that is a valid demand. But you can't recycle vids unless you use
> > --no-share for clones, with either the new or the old algorithm, right?
> > 
> > This means both the old and the new algorithm face the same problem, no?
> > 
> > > > 2. the new algorithm has a bug that requires hacking vdi_lookup(), which will
> > > >    degrade it a lot. I'm not sure we can hack vdi_lookup() to meet both
> > > >    algorithms' needs.
> > > 
> > > Looking up the whole range unconditionally solves the problem, and it can
> > > be disabled with the option if users don't like it.
> > 
> > But even with your new algorithm, we can only recycle a vid if the whole chain
> > is deleted, meaning that --no-share still has to be used for your new algorithm
> > to take effect; otherwise, your new algorithm won't help us recycle vids, no? In
> > other words, new algorithm = old algorithm.
> > 
> > What concerns me about the new algorithm is that it is very limited, if I
> > understand it correctly.
> > 
> > The new algorithm allows recycling a vid only if we delete the whole chain and
> > use --no-share for clones to cut the relationship, and the old algorithm can
> > achieve the same purpose without changing a single line.
> > 
> > *So my question is, why do we need a new one?*
> > 
> > Also consider that the new algorithm will look up the whole range
> > unconditionally, which degrades the general case, even for people who don't
> > need vid recycling.
> > 
> > Did I misunderstand anything about your new algorithm?
> 
> Yes. The old algorithm can cause data corruption, as described in:
> https://bugs.launchpad.net/sheepdog-project/+bug/1317755
> This is because it doesn't take the family relations of VDIs into account.
> Your latest patchset can revive the above problem.
> 
> My new one does take them into account, so we can recycle VIDs safely without
> data corruption. Even though it requires --no-share cloning, it is much
> better to have the new one, at least as an option.

Okay.

> > 
> > A real new algorithm, I guess, would uproot the old algorithm completely and
> > get rid of vid exhaustion without the help of --no-share.
> > 
> 
> To do this, we need at least a mechanism to enforce COW on VMs and to
> cut dependencies between VDIs, and that is only part of the
> requirements. To detect dependencies between VDIs, we need to check
> the data_vdi_id of every VDI in the family. That will require a much
> more complex implementation and more runtime overhead.
> 

Storing {name, vid, relationship} in ZooKeeper might solve the problem. But yes,
that is another topic.

Thanks,
Yuan