[sheepdog] [PATCH v2 0/5] garbage collect needless VIDs and inode objects
Liu Yuan
namei.unix at gmail.com
Mon Mar 16 03:21:50 CET 2015
On Thu, Mar 12, 2015 at 08:14:33PM +0900, Hitoshi Mitake wrote:
> At Thu, 12 Mar 2015 14:41:56 +0800,
> Liu Yuan wrote:
> >
> > On Tue, Jan 13, 2015 at 10:37:40AM +0900, Hitoshi Mitake wrote:
> > > Currently sheepdog never recycles VIDs. This causes problems,
> > > e.g. VID space exhaustion and too many garbage inode objects.
> > >
> > > Keeping deleted inode objects is required because living inodes
> > > (snapshots or clones) can point to objects of the deleted inodes. So if
> > > every member of a VDI family is deleted, it is safe to remove the deleted
> > > inode objects.
> > >
> > > v2:
> > > - update test scripts
> >
> > All the nodes of our test cluster panicked with the following problem:
> >
> > Mar 12 00:05:03 DEBUG [main] zk_handle_notify(1216) NOTIFY
> > Mar 12 00:05:03 DEBUG [main] sd_notify_handler(960) op NOTIFY_VDI_ADD, size: 96, from: IPv4 ip:192.168.39.177 port:7000
> > Mar 12 00:05:03 DEBUG [main] do_add_vdi_state(362) 7c2b2b, 3, 0, 22, 0
> > Mar 12 00:05:03 DEBUG [main] do_add_vdi_state(362) 7c2b2c, 3, 0, 22, 7c2b2b
> > Mar 12 00:05:03 EMERG [main] update_vdi_family(127) PANIC: parent VID: 7c2b2b not found
> > Mar 12 00:05:03 EMERG [main] crash_handler(286) sheep exits unexpectedly (Aborted), si pid 4786, uid 0, errno 0, code -6
> > Mar 12 00:05:03 EMERG [main] sd_backtrace(833) sheep.c:288: crash_handler
> > Mar 12 00:05:03 EMERG [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x338200f4ff]
> > Mar 12 00:05:03 EMERG [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3381c328a4]
> > Mar 12 00:05:03 EMERG [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3381c34084]
> > Mar 12 00:05:03 EMERG [main] sd_backtrace(833) vdi.c:127: update_vdi_family
> > Mar 12 00:05:03 EMERG [main] sd_backtrace(833) vdi.c:398: add_vdi_state
> > Mar 12 00:05:03 EMERG [main] sd_backtrace(833) ops.c:711: cluster_notify_vdi_add
> > Mar 12 00:05:03 EMERG [main] sd_backtrace(833) group.c:975: sd_notify_handler
> >
> > So I traced it back to this patch set. The problem this patch set tries to
> > solve is very clear and has been with sheepdog since its birth. It actually
> > reveals a deficiency of our vdi allocation algorithm, which unfortunately is
> > not fixable; we need to rethink it and replace it with a completely new one.
> >
> > One simple rule: we can't recycle any vid once it has been created, because
> > of the current hash collision handling. Our current implementation forbids recycling.
> >
> > Instead of fixing the above panic bug, I'd suggest we revert this patch set.
> > For the problem this patch set addresses, I think we need a new algorithm and
> > implementation. Until then, we should stay with the old one: it is stable and
> > reliable, and should work for small clusters.
> >
> > What do you think, Hitoshi and Kazutaka?
>
> How about providing a switch to turn VID recycling on/off, e.g. dog cluster
> format --enable-vid-recycle? The code can easily be put into
> conditional branches. I can post a patch if this approach works for you.
>
This temporary workaround looks okay, but not good enough to me. What concerns
me is that vdi recycling will probably never be implemented if we stick to the
current vdi allocation algorithm. Once a new vdi allocation algorithm is
introduced someday, it won't have this kind of problem at all. If so, the
code we leave here would also be useless.
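
To make the hash collision point concrete, here is a minimal Python model of a hash-and-probe allocator. It assumes a simplified scheme: hash the VDI name to a home VID, probe linearly to the next free VID on a collision, and end a lookup at the first free VID. The toy hash, `NR_VIDS` and all class and method names are hypothetical and are not sheepdog's code. Freeing one VID in the middle of a probe run makes a colliding VDI unreachable:

```python
# Illustrative model only: the toy hash and linear probing are assumptions,
# not sheepdog's actual allocator.
NR_VIDS = 16  # tiny ID space so collisions are easy to provoke


def home_slot(name: str) -> int:
    """Toy hash: sum of name bytes, so "ab" and "ba" collide on purpose."""
    return sum(name.encode()) % NR_VIDS


class ProbingAllocator:
    def __init__(self):
        self.used = [False] * NR_VIDS   # in-use bitmap
        self.owner = [None] * NR_VIDS   # name recorded in each "inode"

    def create(self, name: str) -> int:
        vid = home_slot(name)
        for _ in range(NR_VIDS):
            if not self.used[vid]:
                self.used[vid] = True
                self.owner[vid] = name
                return vid
            vid = (vid + 1) % NR_VIDS   # collision: probe the next VID
        raise RuntimeError("VID space exhausted")

    def lookup(self, name: str):
        vid = home_slot(name)
        # Walk the run of used VIDs; the first free bit ends the search.
        for _ in range(NR_VIDS):
            if not self.used[vid]:
                return None
            if self.owner[vid] == name:
                return vid
            vid = (vid + 1) % NR_VIDS
        return None

    def recycle(self, vid: int) -> None:
        """Naive recycling: clear the bit so the VID can be handed out again."""
        self.used[vid] = False
        self.owner[vid] = None


alloc = ProbingAllocator()
a = alloc.create("ab")      # home slot 3 -> VID 3
b = alloc.create("ba")      # also home slot 3 -> probes on to VID 4
alloc.recycle(a)            # delete "ab" and free VID 3
print(alloc.lookup("ba"))   # prints None: the probe stops at freed VID 3
```

In this model the only safe "delete" is to leave the bit set, like a tombstone, so probe runs stay intact; as long as lookups depend on probe runs, a VID cannot be recycled.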

I think we should focus on a new vdi allocation algorithm, e.g. storing
{name, vid} directly in a kv engine, implemented either by sheep itself or with
the help of other software like zookeeper.
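
A rough sketch of that direction, assuming the {name, vid} records live in some key-value engine. A Python dict stands in for the engine here, and `KVVidAllocator` and its methods are hypothetical, not a sheepdog or ZooKeeper API. Because a lookup goes straight from name to VID, a freed VID can be returned to a pool without breaking any other lookup:

```python
import heapq


class KVVidAllocator:
    """Hypothetical name -> VID mapping; a dict stands in for the kv engine."""

    def __init__(self, nr_vids: int):
        self.kv = {}                       # the {name: vid} records
        self.free = list(range(nr_vids))   # min-heap of VIDs not in use
        heapq.heapify(self.free)

    def create(self, name: str) -> int:
        if name in self.kv:
            raise KeyError(f"VDI {name!r} already exists")
        if not self.free:
            raise RuntimeError("VID space exhausted")
        vid = heapq.heappop(self.free)     # reuse the lowest free VID
        self.kv[name] = vid
        return vid

    def lookup(self, name: str):
        return self.kv.get(name)           # direct lookup, no probe chain

    def delete(self, name: str) -> int:
        vid = self.kv.pop(name)
        heapq.heappush(self.free, vid)     # no other name depends on this VID
        return vid


alloc = KVVidAllocator(nr_vids=4)
alloc.create("ab")                              # VID 0
alloc.create("ba")                              # VID 1
alloc.delete("ab")                              # VID 0 goes back to the pool
print(alloc.lookup("ba"), alloc.create("cd"))   # prints: 1 0
```

This only covers the name-to-VID mapping; whether a deleted VDI's inode objects can be removed still depends on the snapshots and clones that point to them, as the patch set description notes.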

I'm inclined to revert the above patch set, because
1. it can't fix a problem that is inherently non-fixable
2. the code is problematic and can cause a catastrophic disaster (all nodes die)
3. we might not need it in the future, because it is specific to the current vdi
allocation algorithm.

One more point: based on our deployment experience, the least utilized
computer resource (cpu, memory, disk) is the disk, meaning that we could trade
space for other things like cpu, algorithm simplicity, etc. And since sheepdog
is designed to be a scalable storage system, if disk usage reaches the high
watermark, we can easily add nodes or disks.

Thanks,
Yuan