At Tue, 13 Sep 2011 13:46:04 +0900, MORITA Kazutaka wrote: > > At Tue, 13 Sep 2011 11:12:32 +0800, > Liu Yuan wrote: > > > > On 09/13/2011 08:05 AM, MORITA Kazutaka wrote: > > > At Sun, 11 Sep 2011 00:49:20 +0800, > > > Liu Yuan wrote: > > >> From: Liu Yuan<tailai.ly at taobao.com> > > >> > > >> After the new node joins, command *collie-vdi-list* > > >> cannot list any vdi information, that are created > > >> before this node is joined. This is because in the > > >> join phase, get_vdi_bitmap_from_all() cannot get corret > > >> vdi bitmap *if* there is no node in sd_node_list. > > >> This patch moves this function after the function > > >> which updates the sd_node_list. > > >> > > >> With this patch, collie-vdi-list works as expected in > > >> newly added nodes. > > >> > > >> Signed-off-by: Liu Yuan<tailai.ly at taobao.com> > > >> --- > > >> sheep/group.c | 13 ++----------- > > >> 1 files changed, 2 insertions(+), 11 deletions(-) > > >> > > >> diff --git a/sheep/group.c b/sheep/group.c > > >> index eb0c4e2..22c4f66 100644 > > >> --- a/sheep/group.c > > >> +++ b/sheep/group.c > > >> @@ -863,17 +863,6 @@ static void __sd_deliver(struct cpg_event *cevent) > > >> break; > > >> } > > >> } > > >> - > > >> - if (m->state == DM_FIN) { > > >> - switch (m->op) { > > >> - case SD_MSG_JOIN: > > >> - if (((struct join_message *)m)->cluster_status == SD_STATUS_OK) > > >> - if (sys->status != SD_STATUS_OK) > > >> - get_vdi_bitmap_from_all(); > > >> - break; > > >> - } > > >> - } > > >> - > > >> } > > >> > > >> static void send_join_response(struct work_deliver *w) > > >> @@ -902,6 +891,8 @@ static void __sd_deliver_done(struct cpg_event *cevent) > > >> switch (m->op) { > > >> case SD_MSG_JOIN: > > >> update_cluster_info((struct join_message *)m); > > >> + if (((struct join_message *)m)->cluster_status == SD_STATUS_OK) > > >> + get_vdi_bitmap_from_all(); > > >> break; > > >> case SD_MSG_LEAVE: > > >> node = find_node(&sys->sd_node_list, m->nodeid, m->pid); > > Hi > > Thanks for your viewing and explanation > > > Thanks for your contribution, but this doesn't work. > > > > > > We cannot sleep in __sd_deliver_done (the main thread), so we cannot > > > call get_vdi_bitmap_from_all() in it; if all the sheep daemons call > > > get_vdi_bitmap_from_all() at the same time, the cluster will stuck. > > > > > > I think the right way is doing the followings in __sd_deliver (the > > > worker thread): > > > > > > - call get_vdi_bitmap_from_all() > > > - extract a new node from join_message, and get a vdi bitmap from the node > > > > > > > I don't quit get it. did you mean first call > > get_vdi_bitmap_from_all() and check then sys->vdi_inuse ourselves or > > something that can help us understand vdi_bitmap. if not as expected, we > > then > > extract a new node from join_message, and get a vdi bitmap from the node? > > Sorry, my intention was just like: > > static void __sd_deliver(struct cpg_event *cevent) > { > ... > > if (m->state == DM_FIN) { > switch (m->op) { > case SD_MSG_JOIN: > if (((struct join_message *)m)->cluster_status == SD_STATUS_OK) > if (sys->status != SD_STATUS_OK) { > get_vdi_bitmap_from_all(); > get_vdi_bitmap_from_one(m->from); And, for newly joined nodes, we also need to get a bitmap from each node in ((struct join_message *)m)->nodes. Thanks, Kazutaka > } > break; > } > } > } > > get_vdi_bitmap_from_one() is a function to get a vdi bitmap from one > node, though it is not defined yet. > > > > > I don't see helper that get vdi bitmap from one node, and from > > get_vdi_bitmap_from_all(), I gather that > > we need to iterate all the nodes to update the vdi_inuse bitmap, no? > > > > after reading the code more, I think __sd_confchg_done should get > > vdi bitmap done for every node, > > *not* only for first cpg node in the group as it is now. Because of it, > > all the nodes except the first-cpg-node will have to call > > update_cluster_info() in a later deliver messge phase (DM_FIN). So my > > question > > is, why we call udpate_cluster_info() in different places > > (__sd_confchg_done for first-cpg-node, DM_FIN for > > other). > > It is because we cannot decide the first joined node only from > delivered messages. The first node checks whether the next nodes can > join Sheepdog, so it is necessary to know which node is the first one. > > > > > I came up the idea to do the following minimal changes (just move > > update_cluster_info() upwards) > > > > if (m->state == DM_FIN) { > > switch (m->op) { > > case SD_MSG_JOIN: > > update_cluster_info((struct join_message *)m); > > if (((struct join_message *)m)->cluster_status == SD_STATUS_OK) > > get_vdi_bitmap_from_all(); > > break; > > > > It fixes the problem on my environment. Is it okay with you? If so, I will send it as V2. > > No, this causes a race condition. update_cluster_info() updates > global info, so it can be called only in main thread > (__sd_deliver_done() and __sd_confchg_done()). > > > Thanks, > > Kazutaka > -- > sheepdog mailing list > sheepdog at lists.wpkg.org > http://lists.wpkg.org/mailman/listinfo/sheepdog |