[sheepdog] [PATCH v2] sheep/recovery: multi-threading recovery process

Hitoshi Mitake mitake.hitoshi at gmail.com
Wed Jan 29 10:04:15 CET 2014


At Wed, 29 Jan 2014 17:38:34 +0900,
Hitoshi Mitake wrote:
> 
> At Wed, 29 Jan 2014 16:29:03 +0800,
> Liu Yuan wrote:
> > 
> > On Wed, Jan 29, 2014 at 05:19:09PM +0900, Hitoshi Mitake wrote:
> > > At Wed, 29 Jan 2014 16:14:56 +0800,
> > > Liu Yuan wrote:
> > > > 
> > > > On Wed, Jan 29, 2014 at 05:01:52PM +0900, Hitoshi Mitake wrote:
> > > > > At Wed, 29 Jan 2014 15:53:57 +0800,
> > > > > Liu Yuan wrote:
> > > > > > 
> > > > > > On Wed, Jan 29, 2014 at 03:32:34PM +0800, Liu Yuan wrote:
> > > > > > > On Wed, Jan 29, 2014 at 04:28:35PM +0900, Hitoshi Mitake wrote:
> > > > > > > > At Tue, 28 Jan 2014 18:01:42 +0800,
> > > > > > > > Liu Yuan wrote:
> > > > > > > > > 
> > > > > > > > > Rationale for multi-threaded recovery:
> > > > > > > > > 
> > > > > > > > > 1. When a node is added, we find that all the VMs on other nodes are
> > > > > > > > >    noticeably affected until 50% of the data has been transferred to the new node.
> > > > > > > > > 
> > > > > > > > > 2. For a node failure, running VMs might not have problems, but a faster
> > > > > > > > >    recovery process will benefit VM IO, giving writes fewer chances to
> > > > > > > > >    be blocked, and will also improve reliability.
> > > > > > > > > 
> > > > > > > > > 3. A disk failure in a node is similar to adding a node. All
> > > > > > > > >    the data on the broken disk will be recovered onto other disks in
> > > > > > > > >    this node. Speedy recovery not only improves data reliability but
> > > > > > > > >    also causes less write blocking on the lost data.
> > > > > > > > > 
> > > > > > > > > Our oid scheduling algorithm is intact; we simply add multi-threading on top
> > > > > > > > > of the current recovery algorithm with minimal changes.
> > > > > > > > > 
> > > > > > > > > - we still have the ->oids array to denote the oids to be recovered
> > > > > > > > > - we start up 2 * nr_disks threads for recovery
> > > > > > > > > - the tricky part is that we need to wait for all the running threads to
> > > > > > > > >   complete before starting the next recovery event when multiple node/disk events occur
> > > > > > > > > 
> > > > > > > > > This patch passes "./check -g md -md" on my local box
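A minimal sketch of the scheme described above, assuming plain POSIX threads: a shared oid array handed out under a mutex, 2 * nr_disks workers, and a join of every worker before the event returns. The names `struct recovery_sketch`, `recover_one()`, `recovery_worker()` and `run_recovery_event()` are hypothetical and not sheepdog's; the patch itself appears to run recovery on sheepdog's own work-queue threads (the `rw` threads and `worker_routine` frames in the log in this thread), so this only illustrates the idea.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-event recovery state; not sheepdog's real struct. */
struct recovery_sketch {
	const uint64_t *oids;   /* oids to be recovered, like ->oids */
	size_t nr_oids;
	size_t next;            /* index of the next oid to hand out */
	size_t done;            /* oids recovered so far */
	pthread_mutex_t lock;   /* protects next and done */
};

/* Hypothetical stand-in: real code would fetch the object from peers. */
static void recover_one(uint64_t oid)
{
	(void)oid;
}

/* Each worker takes oids in array order until the array is exhausted. */
static void *recovery_worker(void *arg)
{
	struct recovery_sketch *rs = arg;

	for (;;) {
		uint64_t oid;

		pthread_mutex_lock(&rs->lock);
		if (rs->next >= rs->nr_oids) {
			pthread_mutex_unlock(&rs->lock);
			return NULL;
		}
		oid = rs->oids[rs->next++];
		pthread_mutex_unlock(&rs->lock);

		recover_one(oid);

		pthread_mutex_lock(&rs->lock);
		rs->done++;
		pthread_mutex_unlock(&rs->lock);
	}
}

/*
 * Run one recovery event on 2 * nr_disks threads and join every thread
 * before returning, so the next event cannot start while a worker still
 * uses this event's state. Returns the number of recovered oids, or -1.
 */
static long run_recovery_event(const uint64_t *oids, size_t nr_oids,
			       int nr_disks)
{
	struct recovery_sketch rs = { .oids = oids, .nr_oids = nr_oids };
	pthread_t *tids;
	int nr_threads, started = 0;

	if (nr_disks <= 0)
		return -1;
	nr_threads = 2 * nr_disks;
	tids = calloc((size_t)nr_threads, sizeof(*tids));
	if (!tids)
		return -1;
	pthread_mutex_init(&rs.lock, NULL);

	while (started < nr_threads &&
	       pthread_create(&tids[started], NULL, recovery_worker, &rs) == 0)
		started++;

	/* Barrier: wait for all running threads before the next event. */
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	pthread_mutex_destroy(&rs.lock);
	free(tids);
	return started > 0 ? (long)rs.done : -1;
}
```

Joining every worker is what makes "wait for all the running threads" hold in this sketch: once `run_recovery_event()` returns, no thread can still touch `rs` or the oid array, so the caller may free or replace them before starting the next event.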
> > > > > > > > 
> > > > > > > > On my box, at least tests 32 and 33 fail. I'm looking for the root cause now,
> > > > > > > > but this patch seems to be a little bit dangerous.
> > > > > > > > 
> > > > > > > 
> > > > > > > Yes, this shouldn't go to stable-0.8, but master is okay, and at least we need to
> > > > > > > pass all the tests before it can go to master.
> > > > > > 
> > > > > > 32 and 33 aren't md-ready tests. Please use
> > > > > > 
> > > > > > ./check -g md -md
> > > > > > 
> > > > > > to test this patch.
> > > > > 
> > > > > 32 and 33 are tests for recovery, so I think we shouldn't exclude them
> > > > > when testing your patch, no?
> > > > 
> > > > I ran 'sudo tests/functional/check 32 33' several times and saw no failure.
> > > 
> > > In my environment, 32 and 33 fail every time I run them (sudo DRIVER=local
> > > ./check 32 33).
> > > 
> > > Below are 32.out.bad and 33.out.bad:
> > > 
> > > 32.out.bad:
> > > QA output created by 032
> > > using backend plain store
> > > 9c7766570b3be3aff2724f587c2f4107  -
> > > STORE/1/obj/807c2b2500000000
> > > STORE/2/obj/807c2b2500000000
> > > STORE/4/obj/807c2b2500000000
> > > STORE/5/obj/807c2b2500000000
> > > STORE/6/obj/807c2b2500000000
> > > STORE/7/obj/807c2b2500000000
> > > STORE/0/obj/007c2b2500000000
> > > STORE/1/obj/007c2b2500000000
> > > STORE/4/obj/007c2b2500000000
> > > STORE/5/obj/007c2b2500000000
> > > STORE/6/obj/007c2b2500000000
> > > STORE/7/obj/007c2b2500000000
> > > STORE/0/obj/007c2b2500000001
> > > STORE/1/obj/007c2b2500000001
> > > STORE/2/obj/007c2b2500000001
> > > STORE/5/obj/007c2b2500000001
> > > STORE/6/obj/007c2b2500000001
> > > STORE/7/obj/007c2b2500000001
> > > STORE/0/obj/007c2b2500000002
> > > STORE/2/obj/007c2b2500000002
> > > STORE/4/obj/007c2b2500000002
> > > STORE/5/obj/007c2b2500000002
> > > STORE/6/obj/007c2b2500000002
> > > STORE/7/obj/007c2b2500000002
> > > STORE/0/obj/007c2b2500000003
> > > STORE/1/obj/007c2b2500000003
> > > STORE/2/obj/007c2b2500000003
> > > STORE/5/obj/007c2b2500000003
> > > STORE/6/obj/007c2b2500000003
> > > STORE/7/obj/007c2b2500000003
> > > STORE/0/obj/007c2b2500000004
> > > STORE/1/obj/007c2b2500000004
> > > STORE/2/obj/007c2b2500000004
> > > STORE/4/obj/007c2b2500000004
> > > STORE/5/obj/007c2b2500000004
> > > STORE/6/obj/007c2b2500000004
> > > STORE/1/obj/007c2b2500000005
> > > STORE/2/obj/007c2b2500000005
> > > STORE/4/obj/007c2b2500000005
> > > STORE/5/obj/007c2b2500000005
> > > STORE/6/obj/007c2b2500000005
> > > STORE/7/obj/007c2b2500000005
> > > STORE/0/obj/007c2b2500000006
> > > STORE/1/obj/007c2b2500000006
> > > STORE/4/obj/007c2b2500000006
> > > STORE/5/obj/007c2b2500000006
> > > STORE/6/obj/007c2b2500000006
> > > STORE/7/obj/007c2b2500000006
> > > STORE/0/obj/007c2b2500000007
> > > STORE/1/obj/007c2b2500000007
> > > STORE/2/obj/007c2b2500000007
> > > STORE/4/obj/007c2b2500000007
> > > STORE/5/obj/007c2b2500000007
> > > STORE/6/obj/007c2b2500000007
> > > STORE/0/obj/007c2b2500000008
> > > STORE/1/obj/007c2b2500000008
> > > STORE/2/obj/007c2b2500000008
> > > STORE/4/obj/007c2b2500000008
> > > STORE/5/obj/007c2b2500000008
> > > STORE/6/obj/007c2b2500000008
> > > STORE/1/obj/007c2b2500000009
> > > STORE/2/obj/007c2b2500000009
> > > STORE/4/obj/007c2b2500000009
> > > STORE/5/obj/007c2b2500000009
> > > STORE/6/obj/007c2b2500000009
> > > STORE/7/obj/007c2b2500000009
> > > STORE/0/obj/007c2b250000000a
> > > STORE/1/obj/007c2b250000000a
> > > STORE/2/obj/007c2b250000000a
> > > STORE/4/obj/007c2b250000000a
> > > STORE/5/obj/007c2b250000000a
> > > STORE/6/obj/007c2b250000000a
> > > STORE/0/obj/007c2b250000000b
> > > STORE/1/obj/007c2b250000000b
> > > STORE/2/obj/007c2b250000000b
> > > STORE/4/obj/007c2b250000000b
> > > STORE/5/obj/007c2b250000000b
> > > STORE/6/obj/007c2b250000000b
> > > STORE/0/obj/007c2b250000000c
> > > STORE/1/obj/007c2b250000000c
> > > STORE/4/obj/007c2b250000000c
> > > STORE/5/obj/007c2b250000000c
> > > STORE/6/obj/007c2b250000000c
> > > STORE/7/obj/007c2b250000000c
> > > STORE/1/obj/007c2b250000000d
> > > STORE/2/obj/007c2b250000000d
> > > STORE/3/obj/007c2b250000000d
> > > STORE/4/obj/007c2b250000000d
> > > STORE/5/obj/007c2b250000000d
> > > STORE/6/obj/007c2b250000000d
> > > STORE/7/obj/007c2b250000000d
> > > STORE/1/obj/007c2b250000000e
> > > STORE/2/obj/007c2b250000000e
> > > STORE/4/obj/007c2b250000000e
> > > STORE/5/obj/007c2b250000000e
> > > STORE/6/obj/007c2b250000000e
> > > STORE/7/obj/007c2b250000000e
> > > STORE/1/obj/007c2b250000000f
> > > STORE/2/obj/007c2b250000000f
> > > STORE/4/obj/007c2b250000000f
> > > STORE/5/obj/007c2b250000000f
> > > STORE/6/obj/007c2b250000000f
> > > STORE/7/obj/007c2b250000000f
> > > STORE/0/obj/007c2b2500000010
> > > STORE/2/obj/007c2b2500000010
> > > STORE/4/obj/007c2b2500000010
> > > STORE/5/obj/007c2b2500000010
> > > STORE/6/obj/007c2b2500000010
> > > STORE/7/obj/007c2b2500000010
> > > STORE/1/obj/007c2b2500000011
> > > STORE/2/obj/007c2b2500000011
> > > STORE/4/obj/007c2b2500000011
> > > STORE/5/obj/007c2b2500000011
> > > STORE/6/obj/007c2b2500000011
> > > STORE/7/obj/007c2b2500000011
> > > STORE/1/obj/007c2b2500000012
> > > STORE/2/obj/007c2b2500000012
> > > STORE/4/obj/007c2b2500000012
> > > STORE/5/obj/007c2b2500000012
> > > STORE/6/obj/007c2b2500000012
> > > STORE/7/obj/007c2b2500000012
> > > STORE/0/obj/007c2b2500000013
> > > STORE/1/obj/007c2b2500000013
> > > STORE/2/obj/007c2b2500000013
> > > STORE/4/obj/007c2b2500000013
> > > STORE/5/obj/007c2b2500000013
> > > STORE/7/obj/007c2b2500000013
> > > STORE/0/obj/007c2b2500000014
> > > STORE/1/obj/007c2b2500000014
> > > STORE/2/obj/007c2b2500000014
> > > STORE/4/obj/007c2b2500000014
> > > STORE/5/obj/007c2b2500000014
> > > STORE/6/obj/007c2b2500000014
> > > STORE/0/obj/007c2b2500000015
> > > STORE/1/obj/007c2b2500000015
> > > STORE/2/obj/007c2b2500000015
> > > STORE/4/obj/007c2b2500000015
> > > STORE/6/obj/007c2b2500000015
> > > STORE/7/obj/007c2b2500000015
> > > STORE/0/obj/007c2b2500000016
> > > STORE/1/obj/007c2b2500000016
> > > STORE/3/obj/007c2b2500000016
> > > STORE/4/obj/007c2b2500000016
> > > STORE/5/obj/007c2b2500000016
> > > STORE/6/obj/007c2b2500000016
> > > STORE/7/obj/007c2b2500000016
> > > STORE/0/obj/007c2b2500000017
> > > STORE/1/obj/007c2b2500000017
> > > STORE/2/obj/007c2b2500000017
> > > STORE/4/obj/007c2b2500000017
> > > STORE/5/obj/007c2b2500000017
> > > STORE/6/obj/007c2b2500000017
> > > STORE/1/obj/007c2b2500000018
> > > STORE/2/obj/007c2b2500000018
> > > STORE/4/obj/007c2b2500000018
> > > STORE/5/obj/007c2b2500000018
> > > STORE/6/obj/007c2b2500000018
> > > STORE/7/obj/007c2b2500000018
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 007c2b2500000000.5
> > > 007c2b2500000001.5
> > > 007c2b2500000002.5
> > > 007c2b2500000003.5
> > > 007c2b2500000004.5
> > > 007c2b2500000005.5
> > > 007c2b2500000006.5
> > > 007c2b2500000007.5
> > > 007c2b2500000008.5
> > > 007c2b2500000009.5
> > > 007c2b250000000a.5
> > > 007c2b250000000b.5
> > > 007c2b250000000c.5
> > > 007c2b250000000d.5
> > > 007c2b250000000e.5
> > > 007c2b250000000f.5
> > > 007c2b2500000010.5
> > > 007c2b2500000011.5
> > > 007c2b2500000012.5
> > > 007c2b2500000013.5
> > > 007c2b2500000014.5
> > > 007c2b2500000015.5
> > > 007c2b2500000016.5
> > > 007c2b2500000017.5
> > > 007c2b2500000018.5
> > > 807c2b2500000000.5
> > > STORE/0/obj/.stale:
> > > STORE/1/obj/.stale:
> > > STORE/2/obj/.stale:
> > > STORE/3/obj/.stale:
> > > STORE/4/obj/.stale:
> > > STORE/5/obj/.stale:
> > > STORE/6/obj/.stale:
> > > STORE/7/obj/.stale:
> > > 9c7766570b3be3aff2724f587c2f4107  -
> > > 
> > > 
> > > 
> > > 33.out.bad:
> > > QA output created by 033
> > > using backend plain store
> > > 9c7766570b3be3aff2724f587c2f4107  -
> > > should have 2, but have 1 sheep
> > > 
> > 
> > I am also using the local driver and haven't hit a single failure yet. Could you try
> > make clean, then test?
> > 
> > It seems that 33 left a core file? If so, it would be easy to debug.
> 
> Ah yes, I found the core file. I'll look at it later.
> 
> BTW, below is the tail of the dead sheep's log:
> 
> Jan 29 17:33:15  DEBUG [rw 18759] fetch_object_list(971) 14
> Jan 29 17:33:15  DEBUG [rw 18759] prepare_object_list(1039) go to the next recovery
> Jan 29 17:33:15  DEBUG [main] run_next_rw(680) running threads nr 0
> Jan 29 17:33:15  EMERG [rw 18759] crash_handler(267) sheep exits unexpectedly (Segmentation fault).
> Jan 29 17:33:15  DEBUG [io 18806] do_process_work(1393) a1, 0, 7
> Jan 29 17:33:15  EMERG [rw 18759] sd_backtrace(817) sheep.c:269: crash_handler
> Jan 29 17:33:15  EMERG [rw 18759] sd_backtrace(831) /lib/x86_64-linux-gnu/libpthread.so.0(+0xf02f) [0x7f5dcd64302f]
> Jan 29 17:33:15  EMERG [rw 18759] sd_backtrace(817) work.c:336: worker_routine
> Jan 29 17:33:15  EMERG [rw 18759] sd_backtrace(831) /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b4f) [0x7f5dcd63ab4f]
> Jan 29 17:33:15  EMERG [rw 18759] sd_backtrace(831) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6c) [0x7f5dcccf7a7c]

I forgot to mention: I can reproduce the problem and the above log even
after cleaning and rebuilding.

Thanks,
Hitoshi
