[sheepdog] [PATCH v2 0/2] zookeeper: fix error handling

Liu Yuan namei.unix at gmail.com
Fri May 31 11:18:43 CEST 2013


On 05/31/2013 03:25 PM, Liu Yuan wrote:
> On 05/30/2013 10:27 PM, MORITA Kazutaka wrote:
>> From: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
>>
>> v2:
>>  - update comment of zk_create_seq_node()
>>
>> The first patch fixes a problem under heavy network traffic, and the
>> second patch is a clean-up.
>>
>> MORITA Kazutaka (2):
>>   zookeeper: retry zk_create_seq_node on retryable error
>>   zookeeper: use offsetof to calculate offset
>>
>>  sheep/cluster/zookeeper.c |   82 ++++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 73 insertions(+), 9 deletions(-)
>>
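The second patch's title says it uses offsetof to calculate an offset. The C sketch below illustrates that general idiom. The struct `event_sketch` and the helpers `header_len_offsetof`, `header_len_manual`, `event_wire_len` and `event_new` are hypothetical and are not sheepdog's actual definitions. The sketch computes the size of a fixed header that precedes a variable-length payload, first with `offsetof` and then with the pointer-subtraction style that `offsetof` typically replaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event layout: a fixed header followed by a variable-length
 * payload. This is NOT the real struct from sheep/cluster/zookeeper.c. */
struct event_sketch {
	uint64_t id;
	int type;
	size_t buf_len;
	uint8_t buf[];	/* flexible array member holding the payload */
};

/* Header size via offsetof: computed from the type, no instance needed. */
static size_t header_len_offsetof(void)
{
	return offsetof(struct event_sketch, buf);
}

/* The pointer-arithmetic style that offsetof replaces; needs an instance. */
static size_t header_len_manual(const struct event_sketch *ev)
{
	assert(ev != NULL);
	return (size_t)((const char *)ev->buf - (const char *)ev);
}

/* Bytes to send: the header up to buf plus the payload actually used. */
static size_t event_wire_len(const struct event_sketch *ev)
{
	return offsetof(struct event_sketch, buf) + ev->buf_len;
}

/* Allocate an event and copy the payload into it; the caller frees it. */
static struct event_sketch *event_new(int type, const void *data, size_t len)
{
	struct event_sketch *ev = malloc(sizeof(*ev) + len);

	if (!ev)
		return NULL;
	ev->id = 0;
	ev->type = type;
	ev->buf_len = len;
	memcpy(ev->buf, data, len);
	return ev;
}
```

Both helpers return the same number. The offsetof form states the intent directly and does not depend on having a valid object at hand.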
> 
> Just FYI, I have run into the following scenario:
> 
> May 31 15:07:51 [main] zk_queue_push(328) create path:/sheepdog/queue/0000000181, queue_pos:0000000179, len:152  <--- zk seems to have retried internally and created 3 seq nodes
> May 31 15:07:51 [main] recalculate_vnodes(865) node 7000 has 96 vnodes, free space 355984654336
> May 31 15:07:51 [main] recalculate_vnodes(865) node 7001 has 48 vnodes, free space 178013593600
> May 31 15:07:51 [main] recalculate_vnodes(865) node 7002 has 48 vnodes, free space 178013462528
> May 31 15:07:51 [main] update_epoch_log(42) update epoch: 2, 3
> May 31 15:07:52 [rw] prepare_object_list(761) 2
> May 31 15:07:52 [rw] wait_get_vdis_done(832) waiting for vdi list
> May 31 15:07:52 [rw] wait_get_vdis_done(839) vdi list ready
> May 31 15:07:52 [rw] fetch_object_list(670) 10.32.228.126 7001
> May 31 15:07:52 [rw] sockfd_cache_get(387) 10.32.228.126:7001, idx 0
> May 31 15:07:52 [main] zk_event_handler(803) 1, 179
> May 31 15:07:52 [main] zk_queue_pop_advance(366) /sheepdog/queue/0000000179, type:7, len:152, pos:179
> May 31 15:07:52 [main] zk_handle_update_node(776) IPv4 ip:10.32.228.126 port:7000 < -- 1
> May 31 15:07:52 [main] build_node_list(433) nr_sd_nodes:3
> May 31 15:07:52 [main] listen_handler(867) accepted a new connection: 21
> May 31 15:07:52 [main] zk_event_handler(803) 1, 180
> May 31 15:07:52 [main] zk_queue_pop_advance(366) /sheepdog/queue/0000000180, type:7, len:152, pos:180
> May 31 15:07:52 [main] zk_handle_update_node(776) IPv4 ip:10.32.228.126 port:7000 < -- 2
> May 31 15:07:52 [main] build_node_list(433) nr_sd_nodes:3
> May 31 15:07:52 [main] listen_handler(867) accepted a new connection: 22
> May 31 15:07:52 [main] client_handler(808) 1, rx 0, tx 0
> May 31 15:07:52 [main] finish_rx(612) 21, 10.32.228.126:37320
> May 31 15:07:52 [main] queue_request(353) GET_OBJ_LIST, 1
> May 31 15:07:52 [main] zk_event_handler(803) 1, 181
> May 31 15:07:52 [io 3112] do_process_work(1376) a1, 0, 2
> May 31 15:07:52 [main] zk_queue_pop_advance(366) /sheepdog/queue/0000000181, type:7, len:152, pos:181
> May 31 15:07:52 [main] zk_handle_update_node(776) IPv4 ip:10.32.228.126 port:7000 < -- 3
> 
> zk_handle_update_node was called three times. That is harmless for this
> event, but if it were another event, such as a node event, I suspect it
> would break the sheep.
> 
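The log above fits a sequential create that succeeded on the server while the client saw a retryable error and tried again. Each attempt leaves a node behind (179, 180 and 181), so the consumer pops the same event three times. The C sketch below reproduces that arithmetic with an in-memory mock. `struct mock_queue`, `mock_create_seq`, `push_with_blind_retry`, `count_deliveries` and the `fail_replies` knob are hypothetical illustrations. They are not sheepdog code and not the ZooKeeper C client API.

```c
#include <assert.h>

/* Hypothetical in-memory stand-in for a queue of sequential nodes.
 * It is NOT the ZooKeeper client API. */
#define MAX_NODES 16

struct mock_queue {
	int next_seq;              /* next sequence number the "server" assigns */
	int nr_nodes;              /* nodes actually created on the "server" */
	int payload_id[MAX_NODES]; /* which client request each node carries */
	int fail_replies;          /* how many upcoming replies get "lost" */
};

enum mock_rc { MOCK_OK = 0, MOCK_CONNECTION_LOSS = -1 };

/* Create a sequential node. The node is always created server-side, but
 * the reply may be lost, so the caller sees a retryable error. */
static enum mock_rc mock_create_seq(struct mock_queue *q, int payload_id,
				    int *seq_out)
{
	assert(q->nr_nodes < MAX_NODES);
	q->payload_id[q->nr_nodes++] = payload_id;
	*seq_out = q->next_seq++;
	if (q->fail_replies > 0) {
		q->fail_replies--;
		return MOCK_CONNECTION_LOSS;
	}
	return MOCK_OK;
}

/* Naive retry: resend until a reply arrives. Every lost reply leaves an
 * extra node behind, so one logical event is queued more than once. */
static int push_with_blind_retry(struct mock_queue *q, int payload_id)
{
	int seq;

	while (mock_create_seq(q, payload_id, &seq) == MOCK_CONNECTION_LOSS)
		;
	return seq;
}

/* Consumer side: count how many queued nodes carry the given payload. */
static int count_deliveries(const struct mock_queue *q, int payload_id)
{
	int n = 0;

	for (int i = 0; i < q->nr_nodes; i++)
		if (q->payload_id[i] == payload_id)
			n++;
	return n;
}
```

Start the mock with `next_seq = 179` and `fail_replies = 2`. A single push then reports node 181, but nodes 179, 180 and 181 all carry the same payload. This mirrors the three zk_handle_update_node calls in the log. To retry a sequential create safely, the client needs some way to recognize a node that an earlier, apparently failed attempt already created.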

Oops, this was caused by my patch.

Thanks,
Yuan




More information about the sheepdog mailing list