[sheepdog] [PATCH v2 0/7] make IO requests to wait in recovery instead of busy retrying
Liu Yuan
namei.unix at gmail.com
Thu May 24 05:22:42 CEST 2012
On 05/24/2012 11:12 AM, Liu Yuan wrote:
> I have created a branch called 'live-lock-fix' to get a better review of
> this patch set.
>
> Generally speaking, this patch set tries to fix a live lock (busy retrying)
> on some mutually influenced nodes in the recovery phase, observed on our
> simulated 960-node cluster.
I am also fixing another fatal problem: a deadlock between nodes that
send recovery requests to each other, get a retry error code, and then
retry on a timer. Those nodes never progress to completion because the
outstanding recovery requests block the confchg event (because of
sys->nr_outstanding_io), so the epoch is never lifted to an agreed value.
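To make the cycle easier to see, here is a toy model in Python. It is only a sketch under my own assumptions, not sheepdog code: Node, simulate, park_on_retry and TARGET_EPOCH are hypothetical names, and only the nr_outstanding_io counter is taken from the real code. It assumes the confchg event is deferred while any request is outstanding, and that a peer answers "retry" until it reaches the new epoch. Parking the request is one possible way out, not necessarily the fix that ends up in the tree.

```python
from dataclasses import dataclass, field

TARGET_EPOCH = 2  # hypothetical epoch that all nodes should agree on


@dataclass
class Node:
    """Toy stand-in for a node; not sheepdog's real data structures."""
    epoch: int = 1
    nr_outstanding_io: int = 0    # counter named in the mail
    confchg_pending: bool = True  # a membership change is waiting
    parked: list = field(default_factory=list)

    def handle_confchg(self) -> None:
        # Assumption: confchg is deferred while any request is outstanding.
        if self.confchg_pending and self.nr_outstanding_io == 0:
            self.epoch = TARGET_EPOCH
            self.confchg_pending = False

    def can_serve_recovery(self) -> bool:
        # Assumption: a peer replies "retry" until it reaches the new epoch.
        return self.epoch == TARGET_EPOCH


def simulate(park_on_retry: bool, ticks: int = 10):
    """Return (epoch of a, epoch of b, completed requests) after `ticks`."""
    a, b = Node(), Node()
    # Each node has one recovery request in flight to the other.
    pending = [(a, b), (b, a)]
    for sender, _ in pending:
        sender.nr_outstanding_io += 1
    completed = 0

    for _ in range(ticks):
        still_pending = []
        for sender, peer in pending:
            if peer.can_serve_recovery():
                sender.nr_outstanding_io -= 1
                completed += 1
            elif park_on_retry:
                # Park the request so it no longer counts as outstanding.
                sender.nr_outstanding_io -= 1
                sender.parked.append(peer)
            else:
                # Timer retry: the request stays outstanding.
                still_pending.append((sender, peer))
        pending = still_pending

        for node in (a, b):
            node.handle_confchg()
            if not node.confchg_pending:
                # Epoch is settled, so resume any parked requests.
                for peer in node.parked:
                    node.nr_outstanding_io += 1
                    pending.append((node, peer))
                node.parked.clear()

    return a.epoch, b.epoch, completed


if __name__ == "__main__":
    print("timer retry:", simulate(park_on_retry=False))
    print("park on retry:", simulate(park_on_retry=True))
```

With timer retry, both counters stay non-zero, so neither node ever handles confchg and both stay at the old epoch. When the request is parked instead, the counter drops to zero, confchg runs, and the resumed requests complete on the next tick.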
Thanks,
Yuan