On 05/30/2013 10:27 PM, MORITA Kazutaka wrote:
> From: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
>
> v2:
>  - update comment of zk_create_seq_node()
>
> The first patch fixes a problem under heavy network traffic, and the
> second patch is a clean-up one.
>
> MORITA Kazutaka (2):
>   zookeeper: retry zk_create_seq_node on retryable error
>   zookeeper: use offsetof to calculate offset
>
>  sheep/cluster/zookeeper.c | 82 ++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 73 insertions(+), 9 deletions(-)
>

Just FYI, I have hit the following scenario:

May 31 15:07:51 [main] zk_queue_push(328) create path:/sheepdog/queue/0000000181, queue_pos:0000000179, len:152 <--- zk seems to have retried internally and created 3 seq nodes
May 31 15:07:51 [main] recalculate_vnodes(865) node 7000 has 96 vnodes, free space 355984654336
May 31 15:07:51 [main] recalculate_vnodes(865) node 7001 has 48 vnodes, free space 178013593600
May 31 15:07:51 [main] recalculate_vnodes(865) node 7002 has 48 vnodes, free space 178013462528
May 31 15:07:51 [main] update_epoch_log(42) update epoch: 2, 3
May 31 15:07:52 [rw] prepare_object_list(761) 2
May 31 15:07:52 [rw] wait_get_vdis_done(832) waiting for vdi list
May 31 15:07:52 [rw] wait_get_vdis_done(839) vdi list ready
May 31 15:07:52 [rw] fetch_object_list(670) 10.32.228.126 7001
May 31 15:07:52 [rw] sockfd_cache_get(387) 10.32.228.126:7001, idx 0
May 31 15:07:52 [main] zk_event_handler(803) 1, 179
May 31 15:07:52 [main] zk_queue_pop_advance(366) /sheepdog/queue/0000000179, type:7, len:152, pos:179
May 31 15:07:52 [main] zk_handle_update_node(776) IPv4 ip:10.32.228.126 port:7000 <-- 1
May 31 15:07:52 [main] build_node_list(433) nr_sd_nodes:3
May 31 15:07:52 [main] listen_handler(867) accepted a new connection: 21
May 31 15:07:52 [main] zk_event_handler(803) 1, 180
May 31 15:07:52 [main] zk_queue_pop_advance(366) /sheepdog/queue/0000000180, type:7, len:152, pos:180
May 31 15:07:52 [main] zk_handle_update_node(776) IPv4 ip:10.32.228.126 port:7000 <-- 2
May 31 15:07:52 [main] build_node_list(433) nr_sd_nodes:3
May 31 15:07:52 [main] listen_handler(867) accepted a new connection: 22
May 31 15:07:52 [main] client_handler(808) 1, rx 0, tx 0
May 31 15:07:52 [main] finish_rx(612) 21, 10.32.228.126:37320
May 31 15:07:52 [main] queue_request(353) GET_OBJ_LIST, 1
May 31 15:07:52 [main] zk_event_handler(803) 1, 181
May 31 15:07:52 [io 3112] do_process_work(1376) a1, 0, 2
May 31 15:07:52 [main] zk_queue_pop_advance(366) /sheepdog/queue/0000000181, type:7, len:152, pos:181
May 31 15:07:52 [main] zk_handle_update_node(776) IPv4 ip:10.32.228.126 port:7000 <-- 3

zk_handle_update_node was called three times. It does no harm for this
event, but if this were another kind of event, such as a node event, I
guess it would screw up sheep.

Thanks,
Yuan