[sheepdog] [PATCH v2 0/7] make IO requests to wait in recovery instead of busy retrying

Liu Yuan namei.unix at gmail.com
Thu May 24 05:22:42 CEST 2012


On 05/24/2012 11:12 AM, Liu Yuan wrote:

> I have created a branch called 'live-lock-fix' to get a better review of
> this patch set.
> 
> Generally speaking, this patch set tries to fix a live lock (busy
> retrying) among nodes that affect each other in the recovery phase,
> observed on our simulated 960-node cluster.


I am also fixing another fatal problem: a deadlock between nodes that
send recovery requests to each other, get a retry error code, and then
retry on a timer. Those nodes never run to completion, because the
outstanding recovery requests block the confchg event (through
sys->nr_outstanding_io), so the epoch is never raised to an agreed
value.
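
To make the cycle concrete, here is a toy model of it. This is only a
sketch under assumptions: struct toy_node and all toy_* functions are
made up for illustration and are not sheepdog code. Only the idea that
a confchg event waits while the outstanding I/O counter is non-zero
comes from the description above. The park_on_retry switch is a
hypothetical way to break the cycle (drop the request from the counter
while it waits), not necessarily the fix that will be posted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified node state; not sheepdog's real structures. */
struct toy_node {
	uint32_t epoch;          /* epoch the node currently runs at */
	uint32_t pending_epoch;  /* epoch carried by the queued confchg event */
	int nr_outstanding_io;   /* stands in for sys->nr_outstanding_io */
	bool req_done;           /* has this node's recovery request completed? */
};

/* Assumption: confchg is only handled when no I/O is outstanding. */
static bool toy_try_confchg(struct toy_node *n)
{
	if (n->nr_outstanding_io > 0)
		return false;
	n->epoch = n->pending_epoch;
	return true;
}

/* Assumption: a peer answers "retry" until it has applied its confchg. */
static bool toy_peer_ready(const struct toy_node *peer)
{
	return peer->epoch == peer->pending_epoch;
}

/* One attempt of n's recovery request against its peer. */
static void toy_attempt(struct toy_node *n, const struct toy_node *peer,
			bool park_on_retry)
{
	if (n->req_done)
		return;
	if (n->nr_outstanding_io == 0)
		n->nr_outstanding_io++;  /* a parked request is resubmitted */
	if (toy_peer_ready(peer)) {
		assert(n->nr_outstanding_io > 0);
		n->nr_outstanding_io--;  /* request completed */
		n->req_done = true;
	} else if (park_on_retry) {
		n->nr_outstanding_io--;  /* parked: no longer counted */
	}
	/* otherwise: timer retry, the request stays counted as outstanding */
}

/* Returns the round in which both requests completed, or -1 if stuck. */
static int toy_run(bool park_on_retry, int max_rounds)
{
	struct toy_node a = { 1, 2, 1, false };
	struct toy_node b = a;

	for (int round = 1; round <= max_rounds; round++) {
		toy_attempt(&a, &b, park_on_retry);
		toy_attempt(&b, &a, park_on_retry);
		toy_try_confchg(&a);
		toy_try_confchg(&b);
		if (a.req_done && b.req_done)
			return round;
	}
	return -1;
}
```

In this model toy_run(false, 1000) never completes: each node keeps its
request counted while it retries, so neither node handles confchg and
neither peer ever becomes ready. With toy_run(true, 1000) the counter
drops to zero while the requests wait, both nodes apply the new epoch,
and the requests finish on the next attempt.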

Thanks,
Yuan


