[sheepdog] [PATCH v2] sheep/recovery: multi-threading recovery process

Liu Yuan namei.unix at gmail.com
Wed Jan 29 09:12:44 CET 2014


On Wed, Jan 29, 2014 at 05:01:52PM +0900, Hitoshi Mitake wrote:
> At Wed, 29 Jan 2014 15:53:57 +0800,
> Liu Yuan wrote:
> > 
> > On Wed, Jan 29, 2014 at 03:32:34PM +0800, Liu Yuan wrote:
> > > On Wed, Jan 29, 2014 at 04:28:35PM +0900, Hitoshi Mitake wrote:
> > > > At Tue, 28 Jan 2014 18:01:42 +0800,
> > > > Liu Yuan wrote:
> > > > > 
> > > > > Rationale for multi-threaded recovery:
> > > > > 
> > > > > 1. If one node is added, we find that all the VMs on other nodes will get
> > > > >    noticeably affected until 50% of the data is transferred to the new node.
> > > > > 
> > > > > 2. For node failure, running VMs might not have problems, but a faster
> > > > >    recovery process will benefit VM IO, with less chance of writes
> > > > >    being blocked, and will also improve reliability.
> > > > > 
> > > > > 3. For disk failure in node, this is similar to adding a node. All
> > > > >    the data on the broken disk will be recovered on other disks in
> > > > >    this node. Speedy recovery not only improves data reliability but
> > > > >    also causes less write blocking on the lost data.
> > > > > 
> > > > > Our oid scheduling algorithm is intact; we simply add multi-threading on top
> > > > > of the current recovery algorithm with minimal changes.
> > > > > 
> > > > > - we still have ->oids array to denote oids to be recovered
> > > > > - we start up 2 * nr_disks threads for recovery
> > > > > - the tricky part is that we need to wait for all the running threads to
> > > > >   complete before starting the next recovery event when multiple
> > > > >   node/disk events occur
> > > > > 
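The three points above (an ->oids array, 2 * nr_disks threads, and waiting
for every running thread before the next event) can be sketched roughly as
follows. This is only an illustration under assumptions, not the code from
the patch: it uses plain pthreads rather than sheepdog's own work queues,
and struct recovery_sketch, recover_one(), recovery_worker() and
run_recovery_event() are hypothetical names invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the recovery state; not sheepdog's real struct. */
struct recovery_sketch {
	const uint64_t *oids;   /* objects to recover, like the ->oids array */
	size_t nr_oids;
	size_t next;            /* index of the next oid to hand out */
	size_t done;            /* oids recovered so far */
	pthread_mutex_t lock;   /* protects next and done */
};

/* Placeholder for the real per-object recovery work. */
static void recover_one(uint64_t oid)
{
	(void)oid;
}

/* Each worker pulls oids in array order until none are left. */
static void *recovery_worker(void *arg)
{
	struct recovery_sketch *r = arg;

	for (;;) {
		uint64_t oid;

		pthread_mutex_lock(&r->lock);
		if (r->next >= r->nr_oids) {
			pthread_mutex_unlock(&r->lock);
			break;
		}
		oid = r->oids[r->next++];
		pthread_mutex_unlock(&r->lock);

		recover_one(oid);

		pthread_mutex_lock(&r->lock);
		r->done++;
		pthread_mutex_unlock(&r->lock);
	}
	return NULL;
}

/*
 * Run one recovery event with 2 * nr_disks threads (at least one) and
 * return only after every started thread has finished.
 * Returns the number of oids recovered.
 */
static size_t run_recovery_event(const uint64_t *oids, size_t nr_oids,
				 size_t nr_disks)
{
	struct recovery_sketch r = { .oids = oids, .nr_oids = nr_oids };
	size_t nr_threads = nr_disks ? 2 * nr_disks : 1;
	size_t started = 0;
	pthread_t *threads;

	assert(nr_oids == 0 || oids != NULL);

	threads = calloc(nr_threads, sizeof(*threads));
	if (!threads)
		return 0;
	pthread_mutex_init(&r.lock, NULL);

	while (started < nr_threads &&
	       pthread_create(&threads[started], NULL, recovery_worker, &r) == 0)
		started++;

	/* Barrier: the next event must not start until all workers are done. */
	for (size_t i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	pthread_mutex_destroy(&r.lock);
	free(threads);
	return r.done;
}
```

In this sketch a caller handling several node/disk events would call
run_recovery_event() once per event, so the join loop is what keeps two
events from overlapping.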
> > > > > This patch passes "./check -g md -md" on my local box
> > > > 
> > > > On my box, at least 32 and 33 failed. I'm seeking the root cause now
> > > > but this patch seems to be a little bit dangerous.
> > > > 
> > > 
> > > Yes, this shouldn't go to stable-0.8, but master is okay and at least we need to
> > > pass all the tests before it can go to master.
> > 
> > 32 and 33 aren't md-ready tests. Please use
> > 
> > ./check -g md -md
> > 
> > to test this patch.
> 
> 32 and 33 are tests for recovery. So I think we shouldn't exclude them
> for testing your patch, no?

Ah, how did you run ./check? I'll try to reproduce on my box.

Thanks
Yuan


