Hi

> It is because we cannot decide the first joined node only from
> delivered messages. The first node checks whether the next nodes can
> join Sheepdog, so it is necessary to know which node is the first one.
>
>> I came up with the idea to do the following minimal change (just move
>> update_cluster_info() upwards):
>>
>> if (m->state == DM_FIN) {
>>         switch (m->op) {
>>         case SD_MSG_JOIN:
>>                 update_cluster_info((struct join_message *)m);
>>                 if (((struct join_message *)m)->cluster_status == SD_STATUS_OK)
>>                         get_vdi_bitmap_from_all();
>>                 break;
>>
>> It fixes the problem on my environment. Is it okay with you? If so, I
>> will send it as V2.
>
> No, this causes a race condition. update_cluster_info() updates
> global info, so it can be called only in the main thread
> (__sd_deliver_done() and __sd_confchg_done()).

Okay, I'll cook V2 as you specified.

By the way, this answers a question I have had for a long time while reading
through the code: we split cpg-message and socket-message handling into two
parts, one running in a worker thread context that can sleep, the other in
the main thread context that excludes race conditions. We do this mainly to
get rid of locks, right?

This is similar to qemu's main-thread/io-thread model, where having only one
thread executing at a time simplifies the logic and avoids locks. But it is
rather coarse-grained, and the main thread already causes some bottlenecks
for qemu's I/O performance and scalability. I am not sure, but I want to ask:
will we, at some point in the future, turn our coarse main thread into
multiple threads with locks, and even get rid of the current work/done pairs,
when sheepdog grows bigger with higher concurrency?

Thanks,
Yuan
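
P.S. To make sure I understand the pattern we are discussing, here is a
minimal, self-contained sketch of the work/done split as I read it -- this is
not sheepdog's actual work queue code, and the names (my_work, queue_work,
fetch_bitmap_work/done, notify_fd) are made up for illustration. The idea is
that the fn callback runs in a worker thread and may block, while the done
callback is handed back to the main loop over a pipe, so the shared state
(cluster_status here) is only ever touched by the main thread and needs no
locks.

/*
 * Sketch only: hypothetical names, not sheepdog's actual API.
 * fn   runs in a worker thread and may sleep (blocking I/O etc.)
 * done runs in the main thread, which alone touches shared state.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct my_work {
	void (*fn)(struct my_work *);   /* worker-thread part, may block */
	void (*done)(struct my_work *); /* main-thread part, no locks needed */
};

static int notify_fd[2];   /* worker -> main-thread notification pipe */
static int cluster_status; /* shared state, touched only in the main thread */

static void *worker_routine(void *arg)
{
	struct my_work *w = arg;

	w->fn(w);                       /* do the blocking work */
	/* hand the finished work back to the main loop */
	if (write(notify_fd[1], &w, sizeof(w)) != (ssize_t)sizeof(w))
		perror("write");
	return NULL;
}

static void queue_work(struct my_work *w)
{
	pthread_t t;

	pthread_create(&t, NULL, worker_routine, w);
	pthread_detach(t);
}

/* example work: stand-in for fetching a vdi bitmap over the network */
static void fetch_bitmap_work(struct my_work *w)
{
	(void)w;
	sleep(1);                       /* pretend this is blocking I/O */
}

static void fetch_bitmap_done(struct my_work *w)
{
	cluster_status = 1;             /* safe: main thread only */
	printf("bitmap fetched, cluster_status = %d\n", cluster_status);
	free(w);
}

int main(void)
{
	struct my_work *w = malloc(sizeof(*w));

	if (pipe(notify_fd) < 0)
		return 1;
	w->fn = fetch_bitmap_work;
	w->done = fetch_bitmap_done;
	queue_work(w);

	/* main loop: a single thread dispatches all "done" callbacks */
	for (;;) {
		struct my_work *finished;

		if (read(notify_fd[0], &finished, sizeof(finished)) ==
		    (ssize_t)sizeof(finished))
			finished->done(finished);
		break;                  /* one event is enough for this demo */
	}
	return 0;
}

If that matches the intended design, then moving update_cluster_info() into
the worker-thread path really would reintroduce exactly the kind of shared
state race the split is meant to avoid.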