At Sat, 31 Mar 2012 18:31:00 +0800,
Liu Yuan wrote:
>
> On 03/31/2012 06:23 PM, MORITA Kazutaka wrote:
>
> > Many bad effects.  For example, imagine that join messages are
> > processed in a different order from other nodes.
> >
>
> Maybe not. I notice that every call to start_cpg_event_work() will drain
> the cpg queue, so this change will assure us that confchg will be
> handled for sure, regardless of other requests.

No, membership change events are blocked until all outstanding I/O
requests are flushed or the previous membership change event has
finished.  There are cases where the cpg queue is not empty after
start_cpg_event_work() has been called.

>
> We both do dd in guests and run a loop that creates a new vdi and
> deletes that vdi during the join/leave test.
>
> All seems good so far... look at the sequence for joining 60 nodes

This is a timing problem.  I think the problem would still happen in
other environments.

Let's take another approach.  Here is my suggestion:

 - Use different queues for I/O requests and membership events.
 - When the membership queue is empty, we can process I/O requests as
   usual.
 - When the membership queue is not empty, flush all outstanding I/Os.
   New I/O requests are blocked until the membership queue becomes
   empty.
 - SD_OP_SHUTDOWN and SD_OP_MAKE_FS should be pushed to the membership
   queue, and other operations are pushed to the I/O request queue.

Thanks,

Kazutaka
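
P.S. To make the suggestion above concrete, here is a rough sketch of
the two-queue rule.  It is only a model in Python, not sheepdog code:
the Scheduler class, its method names, and the request names such as
"read-1" or "join-node-A" are all made up for illustration, and it
assumes that one membership event is handled per step and finishes
before the next step runs.

```python
from collections import deque

# Operations routed to the membership queue, as proposed in the mail.
MEMBERSHIP_OPS = {"SD_OP_SHUTDOWN", "SD_OP_MAKE_FS"}


class Scheduler:
    """Toy model of the proposed scheme (hypothetical, not sheepdog code)."""

    def __init__(self):
        self.membership_queue = deque()  # membership events, FIFO
        self.io_queue = deque()          # I/O requests not yet started
        self.outstanding_io = set()      # I/O started but not yet completed

    def submit(self, op, is_membership_event=False):
        # Membership events and the two special operations share one queue;
        # everything else goes to the I/O request queue.
        if is_membership_event or op in MEMBERSHIP_OPS:
            self.membership_queue.append(op)
        else:
            self.io_queue.append(op)

    def complete_io(self, op):
        # Called when a previously started I/O request has finished.
        self.outstanding_io.discard(op)

    def step(self):
        """Start whatever the rules allow and return the list of started ops."""
        if self.membership_queue:
            # New I/O is blocked.  A membership event may run only after
            # all outstanding I/O has been flushed.
            if self.outstanding_io:
                return []
            return [self.membership_queue.popleft()]
        # Membership queue is empty: process I/O requests as usual.
        started = []
        while self.io_queue:
            op = self.io_queue.popleft()
            self.outstanding_io.add(op)
            started.append(op)
        return started


s = Scheduler()
s.submit("read-1")
s.submit("write-2")
print(s.step())   # both I/O requests start
s.submit("join-node-A", is_membership_event=True)
s.submit("read-3")
print(s.step())   # nothing starts: outstanding I/O must be flushed first
s.complete_io("read-1")
s.complete_io("write-2")
print(s.step())   # the membership event runs
print(s.step())   # the blocked I/O request resumes
```

The point of the model is that the ordering of membership events no
longer depends on how I/O requests happen to be interleaved with them:
"read-3" cannot start until the membership queue is empty again.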