[sheepdog] [PATCH experimental 0/2] tests: add a fault injector based on DynamoRIO

Mon May 6 15:44:50 CEST 2013

At Mon, 15 Apr 2013 23:36:19 +0900,
Hitoshi Mitake wrote:
> 
> At Mon, 15 Apr 2013 16:41:12 +0800,
> Liu Yuan wrote:
> > 
> > On 04/15/2013 01:07 PM, Hitoshi Mitake wrote:
> > > Hitoshi Mitake (2):
> > >   stop using timerfd and signalfd
> > 
> > The price is too high to use DR.
> 
> The limitation is not essential. As I describe in the commit log, the
> first patch is a temporary measure. I believe we should not decide to
> avoid DR for this reason.
> 
> > 
> > >   tests: add a DynamoRIO client for testing the journaling mechanism
> > 
> > This patch set is really helpful for finding subtle and
> > hard-to-emulate bugs. It would be better to rework DR for our needs,
> > but DR is such a big project and hard to tweak.
> > 
> > Is it possible to roll our own instrumentation infrastructure that is
> > better integrated into SD? Then we could use collie to control the
> > instrumentation of the functions used by Sheepdog instead of library
> > symbols. I think the current tracer would be a playground for it. (I'll
> > rework it to work with current master soon.)
> > 
> > The current tracer infrastructure is already capable of catching every
> > call site at the granularity of function calls, so maybe we can
> > further that work to get more instrumentation features.
> > 
> > What do you think?
> 
> As you say, the tracer will be a good infrastructure for better
> testing, too. But I think that DR is the most suitable infrastructure
> for our current situation, because we have some urgent requirements
> (especially from internal users at our company) related to the
> stability of sheep. So utilizing existing technologies is important.
> 
> I have to stress that mimicking some parts of DR would be very
> difficult. Simple function call tracing is far different from DR. The
> most important technical achievement of DR is its transparency (from
> my perspective). We can write DR clients easily because DR provides
> many transparency-aware APIs (e.g. __wrap_malloc()). If we write our
> own instrumentation infrastructure, we have to prepare our own
> transparency-aware APIs. It will be a time-consuming task. Even if we
> choose the simplest, ad hoc way, we have to prepare some notrace-ed
> APIs of libsheepdog.a, which will result in code duplication. The
> paper [1] shows how difficult implementing transparency-aware APIs is.
> 
> So I think that we should use DR for the fault injector. Even if we
> implement our own instrumentation infrastructure, the development
> should be done in parallel.
> 
> In addition, I believe merging the patchset to a dedicated branch will
> be useful, because new failure scenarios can then be implemented easily
> as patches for the branch. For example, I'm planning to implement a new
> scenario which emulates a machine crash while writing a journal record.
> 
> [1] http://www.burningcutlery.com/derek/docs/transparency-VEE12.pdf

ping?

Thanks,
Hitoshi