[sheepdog] [Qemu-devel] [PATCH v4 00/10] sheepdog: reconnect server after connection failure

Stefan Hajnoczi stefanha at gmail.com
Tue Jul 30 16:13:38 CEST 2013


On Fri, Jul 26, 2013 at 03:10:42PM +0900, MORITA Kazutaka wrote:
> Currently, if a sheepdog server exits, all the connecting VMs need to
> be restarted.  This series implements a feature to reconnect the
> server, and enables us to do online sheepdog upgrade and avoid
> restarting VMs when sheepdog servers crash unexpectedly.
> 
> v4:
>  - Added comment to explain why we need a failed queue.
>  - Fixed a return value of sd_acb_cancelable().
> 
> v3:
>  - Check return values of qemu_co_recv/send more strictly.
>  - Move inflight requests to the failed list after reconnection
>    completes.  This is necessary to resend I/Os while connection is
>    lost.
>  - Check simultaneous create in resend_aioreq().
> 
> v2:
>  - Dropped nonblocking connect patches.
> 
> MORITA Kazutaka (10):
>   ignore SIGPIPE in qemu-img and qemu-io
>   iov: handle EOF in iov_send_recv
>   sheepdog: check return values of qemu_co_recv/send correctly
>   sheepdog: handle vdi objects in resend_aio_req
>   sheepdog: reload inode outside of resend_aioreq
>   coroutine: add co_aio_sleep_ns() to allow sleep in block drivers
>   sheepdog: try to reconnect to sheepdog after network error
>   sheepdog: make add_aio_request and send_aioreq void functions
>   sheepdog: cancel aio requests if possible
>   sheepdog: check simultaneous create in resend_aioreq
> 
>  block/sheepdog.c          | 320 +++++++++++++++++++++++++++++-----------------
>  include/block/coroutine.h |   8 ++
>  qemu-coroutine-sleep.c    |  47 +++++++
>  qemu-img.c                |   4 +
>  qemu-io.c                 |   4 +
>  util/iov.c                |   6 +
>  6 files changed, 269 insertions(+), 120 deletions(-)

I have done a brief review.  The biggest change I suggest is using the
new AioContext timer support that Alex Bligh and Ping Fan are working on
(see qemu-devel for the latest patches).  It provides a way to use a
timer during qemu_aio_wait() without spinning.
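
To illustrate what "without spinning" means here, below is a minimal
sketch in plain POSIX C.  It is not the AioContext timer API from the
patches mentioned above; timeout_ms_until(), monotonic_ns(), wait_once()
and NO_DEADLINE are hypothetical names made up for this example.  The
idea is only that the event loop derives its poll() timeout from the
pending timer deadline, so it sleeps in the kernel until either I/O
arrives or the timer is due.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <poll.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical sentinel: no timer is currently armed. */
#define NO_DEADLINE INT64_C(-1)

/* Convert the time remaining until deadline_ns into a poll() timeout in
 * milliseconds.  Rounds up, so the loop does not wake slightly early and
 * then busy-wait for the last fraction of a millisecond. */
int timeout_ms_until(int64_t now_ns, int64_t deadline_ns)
{
    assert(now_ns >= 0);
    if (deadline_ns == NO_DEADLINE) {
        return -1;              /* block until an fd becomes ready */
    }
    if (deadline_ns <= now_ns) {
        return 0;               /* already expired: do not block */
    }
    int64_t ms = (deadline_ns - now_ns + 999999) / 1000000;
    return ms > INT_MAX ? INT_MAX : (int)ms;
}

/* Current monotonic clock reading in nanoseconds. */
int64_t monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* One event loop iteration: sleep in poll() until I/O is ready or the
 * deadline passes.  Returns 1 if the timer has expired, 0 otherwise.
 * The caller is expected to inspect fds[].revents for I/O readiness. */
int wait_once(struct pollfd *fds, nfds_t nfds, int64_t deadline_ns)
{
    (void)poll(fds, nfds, timeout_ms_until(monotonic_ns(), deadline_ns));
    return deadline_ns != NO_DEADLINE && monotonic_ns() >= deadline_ns;
}
```

A reconnect path built this way would arm a deadline for the next
connection attempt and keep calling wait_once() in its loop, instead of
polling the clock in a tight loop while the server is down.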

CCed Nick Thomas, who worked on NBD reconnect.  Maybe your series will
motivate him to push his patches again, or he might have some review
suggestions for you.


