[sheepdog] [PATCH experimental 0/2] tests: add a fault injector based on DynamoRIO

Mon Apr 15 16:36:19 CEST 2013

At Mon, 15 Apr 2013 16:41:12 +0800,
Liu Yuan wrote:
> 
> On 04/15/2013 01:07 PM, Hitoshi Mitake wrote:
> > Hitoshi Mitake (2):
> >   stop using timerfd and signalfd
> 
> The price is too high to use DR.

That limitation is not essential. As I described in the commit log,
the first patch is only a temporary workaround. I believe we should
not decide against using DR for this reason.

> 
> >   tests: add a DynamoRIO client for testing the jounaling mechanism
> 
> The purpose of this patch set is really helpful to find more subtle and
> hard-to-emulate bugs. It is better to rework DR for our needs but DR is
> such a big project and hard to tweak.
> 
> Is it possible to roll our own instrumentation infrastructure that is
> better integrated into SD? Then we could use collie to control the
> instrumentation on the functions used by Sheepdog instead of library
> symbols. I think the current tracer would be a playground for it. (I'll
> rework it to work with current master soon.)
> 
> The current tracer infrastructure is already capable of catching every
> call site at the granularity of a function call, so maybe we can further
> the work to get more instrumentation features.
> 
> What do you think?

As you say, the tracer will also be a good infrastructure for better
testing. But I think DR is the most suitable infrastructure for our
current situation, because we have some urgent requirements
(especially from internal users at our company) related to the
stability of sheep. So utilizing existing technology is important.

I have to stress that reimplementing even parts of DR would be very
difficult. Simple function call tracing is far different from what DR
does. The most important technical achievement of DR is its
transparency (from my perspective). We can write DR clients easily
because DR provides many transparency-aware APIs (e.g.
__wrap_malloc()). If we write our own instrumentation infrastructure,
we have to prepare our own transparency-aware APIs, which will be a
time-consuming task. Even if we choose the simplest, ad hoc way, we
have to prepare notrace-ed versions of some libsheepdog.a APIs, and
that will result in code duplication. The paper [1] shows how
difficult it is to implement transparency-aware APIs.

So I think that we should use DR for the fault injector. Even if we
implement our own instrumentation infrastructure, that development
should be done in parallel.

In addition, I believe merging the patchset into a dedicated branch
will be useful, because new failure scenarios can then be implemented
easily as patches for that branch. For example, I'm planning to
implement a new scenario which emulates a machine crash while a
journal record is being written.
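
To make the crash-during-journal-write scenario concrete, here is a
rough sketch in plain C. It does not use any DR API and it does not
reflect the real journal format of sheep: struct journal_record,
JOURNAL_MAGIC, torn_write() and record_is_valid() are all hypothetical
names made up for illustration. It only shows the idea of cutting a
write short and then checking that recovery rejects the torn record.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout, NOT the actual Sheepdog journal format. */
struct journal_record {
	uint32_t magic;     /* marks the start of a record */
	uint32_t len;       /* number of valid bytes in data[] */
	uint8_t  data[56];  /* payload */
	uint32_t checksum;  /* written last, covers data[0..len) */
};

#define JOURNAL_MAGIC 0x4a524e4cu

/* Simple additive checksum, seeded so that an all-zero record never passes. */
static uint32_t record_checksum(const struct journal_record *r)
{
	uint32_t sum = JOURNAL_MAGIC;

	for (uint32_t i = 0; i < r->len; i++)
		sum += r->data[i];
	return sum;
}

static void record_init(struct journal_record *r, const void *payload,
			uint32_t len)
{
	assert(len <= sizeof(r->data));
	memset(r, 0, sizeof(*r));
	r->magic = JOURNAL_MAGIC;
	r->len = len;
	memcpy(r->data, payload, len);
	r->checksum = record_checksum(r);
}

/*
 * Fault injection point: only the first `cut` bytes of the record reach
 * the "disk" buffer, as if the machine went down in the middle of the write.
 */
static void torn_write(uint8_t *disk, const struct journal_record *r,
		       size_t cut)
{
	assert(cut <= sizeof(*r));
	memcpy(disk, r, cut);
}

/* Recovery-side check: returns 1 only for a complete, consistent record. */
static int record_is_valid(const uint8_t *disk)
{
	struct journal_record r;

	memcpy(&r, disk, sizeof(r));
	if (r.magic != JOURNAL_MAGIC || r.len > sizeof(r.data))
		return 0;
	return r.checksum == record_checksum(&r);
}
```

With a zero-filled disk buffer, a full torn_write() of sizeof(struct
journal_record) bytes should pass record_is_valid(), while any shorter
cut should be rejected. In a DR-based injector the cut would presumably
be applied on the real write path of sheep instead of through an
explicit helper like this.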

[1] http://www.burningcutlery.com/derek/docs/transparency-VEE12.pdf

Thanks,
Hitoshi