[Sheepdog] [PATCH 0/2] fix cluster event sequences with coroutine

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Thu Nov 24 19:22:02 CET 2011


Currently, cluster drivers may call the check_join_cb() callback
before the previous join/leave event has been fully handled, so the
master server could send wrong cluster information to the newly added
node.  However, we cannot sleep in the join/leave handlers until the
event handling is finished, because the handlers are called in the
main thread.

This patchset introduces coroutines, which solve this problem simply
and elegantly.  The coroutine code is borrowed from the QEMU project
and is fairly stable.

I don't think introducing coroutines is overkill.  We suffer from
many timing problems in the main thread, and coroutines can handle
them with simple code.  For example:

 - make I/Os wait until the target objects are recovered
 - delay the epoch update until all I/O requests are flushed
 - make I/Os wait until the previous join/leave handling is finished
 - etc.

In particular, we can simplify start_cpg_event_work(), which I think
is confusing to many developers.
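
As a rough illustration of the idea (this is not the code in
lib/coroutine.c), the sketch below shows how a handler can wait for a
condition without blocking the main thread: it yields to the main
loop and is resumed later.  It assumes POSIX ucontext on Linux/glibc,
and every name in it (demo_co, demo_yield, demo_resume, join_handler,
prev_event_done, run_demo) is hypothetical and chosen only for this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical minimal coroutine, for illustration only. */
struct demo_co {
	ucontext_t ctx;          /* saved context of the coroutine */
	ucontext_t caller;       /* context to switch back to on yield */
	char stack[STACK_SIZE];  /* private stack of the coroutine */
	bool done;
};

static struct demo_co *current;
static bool prev_event_done;  /* stands in for "previous event handled" */
static int trace;             /* digits record the order of the steps */

static void step(int n)
{
	trace = trace * 10 + n;
}

/* Suspend the running coroutine and return to whoever resumed it. */
static void demo_yield(void)
{
	struct demo_co *co = current;

	swapcontext(&co->ctx, &co->caller);
}

/* Run the coroutine until it yields or finishes. */
static void demo_resume(struct demo_co *co)
{
	current = co;
	swapcontext(&co->caller, &co->ctx);
}

static void demo_create(struct demo_co *co, void (*fn)(void))
{
	if (getcontext(&co->ctx) != 0)
		abort();
	co->ctx.uc_stack.ss_sp = co->stack;
	co->ctx.uc_stack.ss_size = sizeof(co->stack);
	co->ctx.uc_link = &co->caller;  /* where to go when fn returns */
	co->done = false;
	makecontext(&co->ctx, fn, 0);
}

/* A handler that must not run until the previous event is handled. */
static void join_handler(void)
{
	step(1);                    /* handler entered */
	while (!prev_event_done)
		demo_yield();       /* wait without blocking the main loop */
	step(3);                    /* now it is safe to handle the join */
	current->done = true;
}

/* Plays the role of the main thread; returns the recorded step order. */
int run_demo(void)
{
	static struct demo_co co;

	trace = 0;
	prev_event_done = false;
	demo_create(&co, join_handler);

	demo_resume(&co);           /* handler starts, then yields */
	step(2);                    /* main loop finishes the previous event */
	prev_event_done = true;
	demo_resume(&co);           /* handler continues and completes */

	assert(co.done);
	return trace;
}
```

Calling run_demo() from a main() function returns 123: the handler
starts (1), the main loop completes the pending work while the
handler is suspended (2), and only then does the handler proceed (3).
The handler reads as straight-line code, with no extra state machine
to remember where it stopped.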


MORITA Kazutaka (2):
  introduce coroutine
  fix cluster event sequences

 include/coroutine.h |   20 +++
 lib/Makefile.am     |    2 +-
 lib/coroutine.c     |  355 +++++++++++++++++++++++++++++++++++++++++++++++++++
 sheep/group.c       |   43 +++++--
 4 files changed, 406 insertions(+), 14 deletions(-)
 create mode 100644 include/coroutine.h
 create mode 100644 lib/coroutine.c

-- 
1.7.2.5



