On 05/22/2012 11:37 PM, Christoph Hellwig wrote:
> What is the intended use of epoch_log_read_remote?
>
> There are two callers of it, local_stat_cluster and get_vnodes_from_epoch,
> and both only call it after epoch_log_read failed (in the second case
> indirected via epoch_log_read_nr), but the first thing
> epoch_log_read_remote does is to call epoch_log_read again to find the

Note that it passes the latest epoch as the argument. Not all nodes have a
complete epoch history: a newly joined node only has the epoch files
(store/epoch/numbered_file) from the point at which it joined, and crashed
nodes may have non-contiguous epoch files. So this function fetches the
epoch files that are missing on the local node from remote nodes.

> remote nodes to connect to to get a node list. Even worse
> epoch_log_read_remote returns 0 even if epoch_log_read failed, thus
> making error handling basically impossible.

0 is itself an error case: it means the requested epoch was not found on
any node. It acts as a 'null terminator' for collie cluster info, and the
recovery code does check for it.

Thanks,
Yuan
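
For illustration only, the fallback and the 0 return value described above could be sketched roughly as follows. This is not the sheepdog code: struct fake_node, node_read_epoch, read_epoch_with_fallback and count_readable_epochs are hypothetical names, and remote nodes are modelled as in-memory arrays rather than network requests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a node's epoch store; not a sheepdog type. */
#define MAX_EPOCH 8

struct fake_node {
	/* logs[e] == NULL means this node has no epoch file for epoch e */
	const char *logs[MAX_EPOCH + 1];
};

/*
 * Copy the node's log for `epoch` into buf.
 * Returns the number of bytes copied, or 0 if the epoch is missing
 * (an empty log is treated as missing in this sketch).
 */
static int node_read_epoch(const struct fake_node *n, uint32_t epoch,
			   char *buf, int len)
{
	int nr;

	assert(n != NULL && buf != NULL && len > 0);
	if (epoch > MAX_EPOCH || n->logs[epoch] == NULL)
		return 0;

	nr = (int)strlen(n->logs[epoch]);
	if (nr >= len)
		nr = len - 1;
	memcpy(buf, n->logs[epoch], (size_t)nr);
	buf[nr] = '\0';
	return nr;
}

/*
 * Try the local node first, then each remote node in turn.
 * A return value of 0 means no node had the requested epoch.
 */
static int read_epoch_with_fallback(const struct fake_node *local,
				    const struct fake_node *remotes,
				    int nr_remotes, uint32_t epoch,
				    char *buf, int len)
{
	int nr = node_read_epoch(local, epoch, buf, len);
	int i;

	for (i = 0; nr == 0 && i < nr_remotes; i++)
		nr = node_read_epoch(&remotes[i], epoch, buf, len);
	return nr;
}

/*
 * Walk the epochs downwards from `latest` and stop at the first 0,
 * which plays the role of the 'null terminator'.
 * Returns how many epochs could be read.
 */
static int count_readable_epochs(const struct fake_node *local,
				 const struct fake_node *remotes,
				 int nr_remotes, uint32_t latest)
{
	char buf[64];
	int listed = 0;
	uint32_t e;

	for (e = latest; e > 0; e--) {
		if (read_epoch_with_fallback(local, remotes, nr_remotes,
					     e, buf, (int)sizeof(buf)) == 0)
			break;
		listed++;
	}
	return listed;
}
```

The point of the sketch is that a caller has to treat 0 as "stop here", not as success with an empty node list.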