[Sheepdog] [PATCH] sheep: get vdi bitmap correctly in join phase

Thu Sep 15 03:31:50 CEST 2011

At Tue, 13 Sep 2011 14:18:42 +0800,
Liu Yuan wrote:
> 
> Hi
> > It is because we cannot decide the first joined node only from
> > delivered messages.  The first node checks whether the next nodes can
> > join Sheepdog, so it is necessary to know which node is the first one.
> >
> >>       I came up with the idea of making the following minimal change
> >> (just move update_cluster_info() upwards)
> >>
> >>           if (m->state == DM_FIN) {
> >>                   switch (m->op) {
> >>                   case SD_MSG_JOIN:
> >>                           update_cluster_info((struct join_message *)m);
> >>                           if (((struct join_message *)m)->cluster_status == SD_STATUS_OK)
> >>                                   get_vdi_bitmap_from_all();
> >>                           break;
> >>
> >> It fixes the problem on my environment. Is it okay with you? If so, I will send it as V2.
> > No, this causes a race condition.  update_cluster_info() updates
> > global info, so it can be called only in main thread
> > (__sd_deliver_done() and __sd_confchg_done()).
> >
> 
> Okay, I'll cook V2 as you specified.
> 
> And by the way, this answers a question I have had for a long time while
> reading the code: why cpg-message and socket-message handling is split
> into two parts, one in worker thread context that can sleep, the other
> in main thread context that excludes race conditions. We are doing this
> mainly in order to get rid of locks, right?

Yes.  Sheepdog uses a work queue which has multiple worker threads.
Every queued work is processed in two parts, first in a worker thread
and then in the main thread, as you say.
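
A minimal sketch of that split, to make the idea concrete.  All names
here (struct work, run_demo(), global_total, and so on) are hypothetical
and are not Sheepdog's actual work queue API.  For brevity it starts one
thread per work instead of using a pool.  The only lock is on the
hand-off list of finished works; the global state is touched only by
done callbacks in the main thread, so it needs no lock of its own.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical work item: fn runs in a worker, done runs in the main thread. */
struct work {
	void (*fn)(struct work *);	/* worker thread context, may sleep */
	void (*done)(struct work *);	/* main thread context only */
	int input;
	int result;
	struct work *next;
};

/* Hand-off list of finished works: the only state shared between threads. */
static pthread_mutex_t finished_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t finished_cond = PTHREAD_COND_INITIALIZER;
static struct work *finished_list;

/* Global info, updated only from done callbacks, so it needs no lock. */
static int global_total;

static void *worker_main(void *arg)
{
	struct work *w = arg;

	w->fn(w);	/* the slow part, outside the main thread */

	/* Hand the finished work back to the main thread. */
	pthread_mutex_lock(&finished_lock);
	w->next = finished_list;
	finished_list = w;
	pthread_cond_signal(&finished_cond);
	pthread_mutex_unlock(&finished_lock);
	return NULL;
}

static void square_fn(struct work *w)
{
	w->result = w->input * w->input;	/* touches only its own work */
}

static void add_done(struct work *w)
{
	global_total += w->result;	/* safe: main thread only */
}

/* Queue n works (inputs 1..n), run every done callback, return the total. */
int run_demo(int n)
{
	struct work works[16];
	pthread_t threads[16];
	int i, completed = 0;

	assert(n >= 0 && n <= 16);
	global_total = 0;

	for (i = 0; i < n; i++) {
		works[i] = (struct work){ .fn = square_fn, .done = add_done,
					  .input = i + 1 };
		int rc = pthread_create(&threads[i], NULL, worker_main,
					&works[i]);
		assert(rc == 0);
	}

	/* Main thread: take finished works one by one and call done. */
	while (completed < n) {
		pthread_mutex_lock(&finished_lock);
		while (finished_list == NULL)
			pthread_cond_wait(&finished_cond, &finished_lock);
		struct work *w = finished_list;
		finished_list = w->next;
		pthread_mutex_unlock(&finished_lock);

		w->done(w);
		completed++;
	}

	for (i = 0; i < n; i++)
		pthread_join(threads[i], NULL);
	return global_total;
}
```

Calling run_demo(4) squares 1..4 in worker threads and sums the results
in the main thread.  The done callbacks may run in any order, but since
only one thread runs them, the update of global_total cannot race.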

> 
> This is similar to qemu's main-thread/io-thread, where having only one
> thread executing at any time simplifies the logic and avoids locks. But
> this is rather coarse, and the main thread already causes some
> bottlenecks for qemu's IO performance and scalability. I am not sure,
> but I want to ask whether we will, at some point in the future, turn our
> coarse main thread into multiple threads with locks and even get rid of
> the current work/done pairs, when sheepdog grows bigger with higher
> concurrency?

I prefer the current approach because it can easily ensure the
correctness of the program.  Using locks blindly makes the program
hard to read, and causes other bugs like deadlocks.

However, if the main thread becomes a bottleneck in Sheepdog and there
is no other way to remove it, I'll accept the change. :)

Thanks,

Kazutaka