[sheepdog] [PATCH 1/8] use rbtree to replace the priority scheduler in recovery

Tue May 22 09:01:10 CEST 2012

On 05/22/2012 10:51 AM, levin li wrote:

> When IO request comes for an object in recovery, we put it
> into the priority rbtree, during the recovery, we make sheep
> recover the objects in the rbtree firstly, it's more efficient
> than the scheduler in is_recovering_oid()

What is the point to use rbtree instead of a list? It seems this tree
doesn't have anything to do with 'priority', the objects in the tree are
sorted by 'oid', then the smaller oid take precedence over bigger one,
why so? This strategy seems quit irreverent to recovery performance. And
the latency of recovering one object can't be estimated at all. Maybe a
simple FIFO list will do a better job.

> +again:
> +	if (rw->prior_count == 0) {
> +		if (rw->state == RW_INIT)
> +			rw->state = RW_RUN;
> +		else if (!rw->retry)
> +			rw->done++;
> +	}
> +
> +	if (rw->prior_count > 0) {
> +		node = rb_first(&rw->prior_tree);
> +		entry = rb_entry(node, struct object_rb_entry, node);
> +		oid = entry->oid;
> +		rb_erase(node, &rw->prior_tree);
> +		free(entry);
> +	} else if (rw->done < rw->count) {
> +		oid = rw->oids[rw->done];
> +		if (!oid) {
> +			dprintf("oid is zero\n");
> +			/* object has been recovered by fast recovery,
> +			 * move to the next. */
> +			goto again;
> +		}

This goto looks abusing to me. We'd better have a helper function that
code the logic more explicit and abstract away lower implementation,
such as pick_next_oid(), which tries to pick the next oid which your
algorithm thinks of appropriate.