[sheepdog] a question about epoch_log_read_remote

Wed May 23 03:59:57 CEST 2012

On 05/22/2012 11:37 PM, Christoph Hellwig wrote:

> What is the intended use of epoch_log_read_remote?
> 
> There are two callers of it, local_stat_cluster and get_vnodes_from_epoch,
> and both only call it after epoch_log_read failed (in the second case
> indirected via epoch_log_read_nr), but the first thing
> epoch_log_read_remote does is to call epoch_log_read again to find the

Note that it passes the latest epoch as the argument.

Not all nodes have a complete epoch history (e.g., a newly joined node
only has the epoch files (store/epoch/numbered_file) written since it joined,
and crashed nodes may have non-contiguous epoch files), so this function
reads the epoch files that are missing on the local node from remote nodes.
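
To make the fallback concrete, here is a minimal sketch in C. It is not the
sheepdog code: struct toy_node, the toy_* functions and MAX_EPOCH are
hypothetical names made up for illustration, and an epoch "file" is reduced
to a member count. It also assumes that 0 is returned when no queried peer
has the epoch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EPOCH 8 /* hypothetical upper bound, for this sketch only */

/* Hypothetical node: nr_members[e] == 0 means the file for epoch e is missing. */
struct toy_node {
	int nr_members[MAX_EPOCH + 1];
};

/* Local read: returns the member count, or -1 if this node lacks the epoch. */
static int toy_epoch_read(const struct toy_node *n, uint32_t epoch)
{
	if (epoch == 0 || epoch > MAX_EPOCH || n->nr_members[epoch] == 0)
		return -1;
	return n->nr_members[epoch];
}

/*
 * Remote read: ask each peer (taken from the latest epoch's node list)
 * in turn. Returns the member count, or 0 if no peer has the epoch
 * (assumed meaning of 0 in this sketch).
 */
static int toy_epoch_read_remote(const struct toy_node *peers,
				 size_t nr_peers, uint32_t epoch)
{
	assert(peers != NULL || nr_peers == 0);

	for (size_t i = 0; i < nr_peers; i++) {
		int nr = toy_epoch_read(&peers[i], epoch);
		if (nr > 0)
			return nr;
	}
	return 0;
}

/* Try the local epoch file first, then fall back to the peers. */
static int toy_epoch_lookup(const struct toy_node *self,
			    const struct toy_node *peers,
			    size_t nr_peers, uint32_t epoch)
{
	int nr = toy_epoch_read(self, epoch);

	if (nr > 0)
		return nr;
	return toy_epoch_read_remote(peers, nr_peers, epoch);
}
```

A node that joined at epoch 3 would answer toy_epoch_lookup() for epoch 3
from its own store and for epoch 1 from whichever peer still has that file.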

> remote nodes to connect to to get a node list.  Even worse
> epoch_log_read_remote returns 0 even if epoch_log_read failed, thus
> making error handling basically impossible.

0 itself is an error case, that the requested epoch is found in all
nodes. It acts as a 'null terminator' for collie cluster info, and the
recovery code does check for it.
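
As a rough illustration of the 'null terminator' idea, the sketch below
walks epochs from the latest one downward and stops at the first 0. The
names (toy_lookup_fn, toy_list_epochs, toy_sample_lookup) are hypothetical,
and the downward walk is an assumption about how a caller might use the 0
return, not a description of the actual collie or recovery code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup: member count for `epoch`, or 0 if unavailable. */
typedef int (*toy_lookup_fn)(uint32_t epoch);

/* Sample lookup for the sketch: only epochs 3 and later are available. */
static int toy_sample_lookup(uint32_t epoch)
{
	return epoch >= 3 ? 3 : 0;
}

/*
 * Collect member counts from `latest` downward into `out`, treating a
 * 0 result as the end of the list. Returns the number of entries stored.
 */
static size_t toy_list_epochs(uint32_t latest, toy_lookup_fn lookup,
			      int *out, size_t cap)
{
	size_t n = 0;

	assert(lookup != NULL && out != NULL);

	for (uint32_t e = latest; e > 0 && n < cap; e--) {
		int nr = lookup(e);

		if (nr == 0)	/* terminator: stop listing here */
			break;
		out[n++] = nr;
	}
	return n;
}
```

With the sample lookup, toy_list_epochs(5, toy_sample_lookup, out, 8)
stores entries for epochs 5, 4 and 3 and then stops at epoch 2.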

Thanks,
Yuan