[sheepdog] [sheepdog-users] plan for Sheepdog 1.0 and new organization

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Thu May 23 13:03:46 CEST 2013


Hi Wenhao,

> 1.  QEMU sheepdog driver reconnect mechanism
> 
> Currently, if sheep exits, all the VMs need to be restarted. This
> feature is necessary for online sheepdog upgrades and to avoid
> restarting VMs when sheep crashes unexpectedly.

Yes, this definitely looks like a necessary feature to me.  QEMU needs
to resend all the I/O requests that were sent before the connection
was closed and have not been answered yet.  It may not be so easy to
implement.
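
To illustrate the resend idea, here is a minimal sketch in Python.  It
is not the QEMU driver code; the class name, the `connect` callable and
the `send(req_id, payload)` method are all hypothetical.  The point is
only that the client has to remember each request until it is answered,
so that it can replay the unanswered ones on a new connection.

```python
from collections import OrderedDict


class ReconnectingClient:
    """Remembers every unanswered request so it can be replayed after a reconnect."""

    def __init__(self, connect):
        # `connect` is a hypothetical callable that returns a connection
        # object exposing send(req_id, payload).
        self._connect = connect
        self._conn = connect()
        self._inflight = OrderedDict()  # req_id -> payload, in submission order
        self._next_id = 0

    def submit(self, payload):
        req_id = self._next_id
        self._next_id += 1
        # Record the request before sending, so a failure cannot lose it.
        self._inflight[req_id] = payload
        try:
            self._conn.send(req_id, payload)
        except ConnectionError:
            # The new request is already recorded, so the replay covers it too.
            self.reconnect()
        return req_id

    def complete(self, req_id):
        # Called when a response arrives; the request no longer needs resending.
        self._inflight.pop(req_id, None)

    def reconnect(self):
        # Open a new connection and replay all unanswered requests in order.
        self._conn = self._connect()
        for req_id, payload in list(self._inflight.items()):
            self._conn.send(req_id, payload)
```

A real implementation would also have to deal with requests that were
executed by sheep but whose responses were lost, which is part of why
this is not trivial.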

> 2. Zookeeper Driver Panic.
> 
> If the zk handle times out, sheep currently panics and exits because
> it doesn't know how to handle this time-out event. This needs a rework
> of the zookeeper driver to resend the event on timeout, and every
> node should detect duplicate events. @Kai is working on this.

Sounds great, I'm looking forward to his patch.
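
Just to make the duplicate detection part concrete, a minimal sketch
in Python follows.  It does not reflect the actual zookeeper driver or
Kai's patch; `EventDeduplicator` and the idea of a unique event id
(here a sender/sequence pair) are assumptions for illustration.  If a
sender resends an event after a timeout, each receiver handles it only
once.

```python
class EventDeduplicator:
    """Runs a handler at most once per event id, ignoring resent copies."""

    def __init__(self):
        self._seen = set()

    def deliver(self, event_id, handler):
        # event_id is assumed to be unique per logical event, for example
        # a (sender, sequence number) tuple chosen by the sender.
        if event_id in self._seen:
            return False  # duplicate caused by a resend; ignore it
        self._seen.add(event_id)
        handler()
        return True
```

In practice the set of seen ids would need to be bounded somehow, which
this sketch leaves out.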

> 3. Recovery I/O control.
> 
> When a node joins/leaves or a new md disk is plugged/unplugged, this
> triggers a recovery event, which takes a lot of bandwidth in
> our tests. This makes all the VM I/O very slow. As Hongyi suggested, we
> could either allocate a proportional bandwidth to the recovery I/O or
> give it a low priority.
> 
> The same could apply to other I/O channels, like vdi check.

I'm planning to improve the current recovery code in the near future
so that the recovery thread affects VM I/O as little as possible.
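
As one possible way to cap the recovery bandwidth, here is a minimal
token bucket sketch in Python.  This is not what sheepdog does today,
and the class and parameter names are hypothetical.  The recovery
thread would ask the bucket before each transfer and back off when it
is refused, so VM I/O keeps the rest of the bandwidth.

```python
import time


class TokenBucket:
    """Limits recovery traffic to an average rate with a bounded burst."""

    def __init__(self, rate_bytes_per_sec, burst_bytes, clock=time.monotonic):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes      # start with a full bucket
        self.clock = clock             # injectable so the logic can be tested
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last call, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes):
        """Return True if the recovery thread may transfer nbytes now."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

The low priority alternative mentioned above would need a different
design (for example a separate queue that is served only when no VM
request is waiting).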

> 
> 4. A parameter to adjust recovery trigger interval.
> 
> This one is related to 3.
> 
> Now sheepdog always does eager recovery when a node is found lost. In
> many cases, the node could rejoin very soon. It would be useful to
> have a parameter with which the admin could specify how long the
> recovery should wait.

Yes, specifying the wait time was proposed a long time ago, but it has
not been implemented yet.
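
The behaviour could look roughly like the following Python sketch.
Nothing here exists in sheepdog; `DelayedRecovery` and `wait_seconds`
are hypothetical names.  Recovery for a lost node starts only if the
node has not rejoined within the configured wait time.

```python
class DelayedRecovery:
    """Defers recovery for a lost node until a grace period has passed."""

    def __init__(self, wait_seconds):
        self.wait = wait_seconds
        self.pending = {}  # node -> time at which it was detected as lost

    def node_left(self, node, now):
        # Keep the first detection time if the node is reported lost twice.
        self.pending.setdefault(node, now)

    def node_joined(self, node):
        # Returns True if the rejoin cancelled a pending recovery.
        return self.pending.pop(node, None) is not None

    def due(self, now):
        """Return the nodes whose wait time has expired; the caller
        would start recovery for each of them."""
        expired = sorted(n for n, t in self.pending.items()
                         if now - t >= self.wait)
        for n in expired:
            del self.pending[n]
        return expired
```

For example, with a wait of 30 seconds, a node that leaves at t=0 and
rejoins at t=20 never triggers recovery.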


I received a lot of feedback and many bug reports on the current code,
and I think I was going too fast in trying to release a stable version.
I discussed this with Yuan, and we are planning to change the version
of the release at the end of this month to 0.6.0.

After the release, we are thinking of using the same release schedule
as QEMU (a quarterly cycle).  Then, the next release (0.7.0 or 1.0)
will be in August.

Any feedback is welcome!

Thanks,

Kazutaka


