[Sheepdog] [PATCH 0/2] fix cluster event sequences with coroutine
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Fri Nov 25 13:57:09 CET 2011
Currently, cluster drivers may call the check_join_cb() callback before
the previous join/leave event handling has finished, so the master
server could send wrong cluster information to the newly added node.
But we cannot sleep in the join/leave handlers until the event
handling is finished, because the handlers are called in the main
thread.
This patchset introduces coroutines and solves the problem simply and
elegantly. The coroutine code is borrowed from the QEMU project and is
fairly stable.
I don't think introducing coroutines is overkill. We suffer from many
timing problems in the main thread, and coroutines let us handle them
with simple code. For example:
- make I/Os wait until the target objects are recovered
- delay the epoch update until all I/O requests are flushed
- make I/Os wait until the previous join/leave handling is finished
- etc.
In particular, we can simplify start_cpg_event_work(), which I think
is confusing to many developers.
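To illustrate the idea, here is a minimal sketch of a handler that waits
for the previous event without blocking the main thread. It is not the
code in lib/coroutine.c: it uses POSIX ucontext directly, and every name
in it (join_handler, yield_to_main, resume_handler, run_demo,
prev_event_done) is hypothetical and chosen only for this example.

```c
#include <stdbool.h>
#include <string.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

/* All names below are hypothetical; this is not the sheepdog API. */
static ucontext_t main_ctx, handler_ctx;
static char handler_stack[STACK_SIZE];
static bool prev_event_done;	/* set when the previous join/leave finishes */
static char trace[64];		/* records the order in which steps ran */

/* Suspend the handler and return control to the main thread. */
static void yield_to_main(void)
{
	swapcontext(&handler_ctx, &main_ctx);
}

/* Enter the handler, or continue it from where it last yielded. */
static void resume_handler(void)
{
	swapcontext(&main_ctx, &handler_ctx);
}

/* Runs as a coroutine: it can wait without blocking the main thread. */
static void join_handler(void)
{
	strcat(trace, "start;");
	while (!prev_event_done)
		yield_to_main();
	strcat(trace, "join;");
}

const char *run_demo(void)
{
	trace[0] = '\0';
	prev_event_done = false;

	/* Set up the coroutine with its own stack. */
	getcontext(&handler_ctx);
	handler_ctx.uc_stack.ss_sp = handler_stack;
	handler_ctx.uc_stack.ss_size = sizeof(handler_stack);
	handler_ctx.uc_link = &main_ctx;	/* where to go when it returns */
	makecontext(&handler_ctx, join_handler, 0);

	resume_handler();		/* handler runs until it has to wait */
	strcat(trace, "main;");		/* main thread keeps handling events */
	prev_event_done = true;		/* previous event handling completes */
	resume_handler();		/* handler continues after its yield */

	return trace;
}
```

Calling run_demo() returns "start;main;join;": the handler starts,
yields while the previous event is still pending, the main thread does
other work, and the handler then resumes at the point where it yielded.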
v2:
- move register/unregister_event() to the correct place
- reduce POOL_MAX_SIZE, which was too large
MORITA Kazutaka (2):
  introduce coroutine
  sheep: fix cluster event sequences
 include/coroutine.h |   20 +++
 lib/Makefile.am     |    2 +-
 lib/coroutine.c     |  355 +++++++++++++++++++++++++++++++++++++++++++++++++++
 sheep/group.c       |   52 +++++---
 4 files changed, 411 insertions(+), 18 deletions(-)
 create mode 100644 include/coroutine.h
 create mode 100644 lib/coroutine.c
--
1.7.2.5