[sheepdog] [PATCH v6 6/6] zookeeper: handle session timeout for all zookeeper operations

Sat Jun 22 10:17:44 CEST 2013

From Kyle's iPhone

在 2013-6-22，13:22，Liu Yuan <namei.unix at gmail.com> 写道：

> On 06/21/2013 08:34 PM, Kai Zhang wrote:
>> @@ -976,18 +1071,34 @@ static void zk_event_handler(int listen_fd, int events, void *data)
>>        return;
>>    }
>> 
>> -    if (!zk_queue_peek())
>> -        goto kick_block_event;
>> +    if (zk_queue_peek(&peek) == ZOK) {
>> +        if (!peek)
>> +            goto kick_block_event;
>> +    } else {
>> +        sd_eprintf("failed");
> 
> This sd_eprintf() isn't necessary. You just return if timeout, but you
> don't handle it. This means some other nodes call event handler, but
> some nodes don't, no? If we can't handle failure gracefully, we still
> need panic(). This applies to push_join_response() too.
> 

Maybe panic is not necessary. The nodes who didn't call event handler will rejoin the cluster as a new one. Any inconsistent state will be dropped before rejoin or learn from other nodes.

A error msg is just to let people know there was a failed operation. 
However change it to sd_dprintf is fine to me.

> You seem to assume to handle session timeout in *next* event, but what
> if following case:
>  1 get one session timeout for some operation, we simply return
>  2 never trigger a zk event operation, so we don't have chance to call
> reconnect.
> 
> Why not try reconnect whenever we find a session timeout immediately?
> 

Maybe there is a misunderstanding.
Based on my knowledge, zookeeper API will return a session timeout code only when it has re-established tcp connection to zookeeper server after a connection loss. And this will definitely trigger the zk watcher with a session event.

So I think there is no need to place a reconnect logic after each zk operation returns a session timeout code. 
Just let the zk watcher to trigger a event. And the reconnect will be handled in that event handler.
This can also ensure that there is no flying zk operations.

And it will add complexity to code if we reconnect just after a zk operation.
For example, for a simple get children operation, a reconnect and retry is enough. But for a push_join_response, we have to determine the mastership before retry. 
So I would prefer to a simple uniformed rule that is described in the commit log.

> Thanks,
> Yuan
> 
>> +        return;
>> +    }
>> 
>> -    zk_queue_pop_advance(&ev);
>> +    if (zk_queue_pop_advance(&ev) != ZOK) {
>> +        sd_eprintf("failed");
>> +        return;
>> +    }
>>    if (ev.type < zk_max_event_handlers && zk_event_handlers[ev.type])
>>        zk_event_handlers[ev.type](&ev);
>>    else
>>        panic("unhandled type %d", ev.type);
>> 
>> -     /* Someone has created next event, go kick event handler. */
>> -    if (zk_queue_peek()) {
>> -        eventfd_write(efd, 1);
>> +    if (zk_queue_peek(&peek) == ZOK) {
>> +        if (peek) {
>> +            /*
>> +             * Someone has created next event,
>> +             * go kick event handler.
>> +             */
>> +            eventfd_write(efd, 1);
>> +            return;
>> +        }
>> +    } else {
>> +        sd_eprintf("failed");
>>        return;
>>    }
>