[sheepdog] [PATCH experimental 0/2] tests: add a fault injector based on DynamoRIO

Mon May 6 16:08:35 CEST 2013

At Mon, 06 May 2013 21:54:51 +0800,
Liu Yuan wrote:
> 
> On 05/06/2013 09:44 PM, Hitoshi Mitake wrote:
> > At Mon, 15 Apr 2013 23:36:19 +0900,
> > Hitoshi Mitake wrote:
> >>
> >> At Mon, 15 Apr 2013 16:41:12 +0800,
> >> Liu Yuan wrote:
> >>>
> >>> On 04/15/2013 01:07 PM, Hitoshi Mitake wrote:
> >>>> Hitoshi Mitake (2):
> >>>>   stop using timerfd and signalfd
> >>>
> >>> The price is too high to use DR.
> >>
> >> The limitation is not essential. As I describe in the commit log,
> >> the first patch is a temporary measure. I believe we should not
> >> decide against using DR for this reason.
> >>
> >>>
> >>>>   tests: add a DynamoRIO client for testing the journaling mechanism
> >>>
> >>> This patch set is really helpful for finding more subtle and
> >>> hard-to-emulate bugs. It would be better to rework DR for our needs,
> >>> but DR is such a big project and hard to tweak.
> >>>
> >>> Is it possible to roll our own instrumentation infrastructure that is
> >>> better integrated into SD? For example, we could use collie to control
> >>> the instrumentation of the functions used by Sheepdog instead of
> >>> library symbols. I think the current tracer would be a playground for
> >>> it. (I'll rework it to work with the current master soon.)
> >>>
> >>> The current tracer infrastructure is already capable of catching every
> >>> call site at the granularity of a function call, so maybe we can
> >>> further the work to get more instrumentation features.
> >>>
> >>> What do you think?
> >>
> >> As you say, the tracer will be a good infrastructure for better
> >> testing, too. But I think that DR is the most suitable infrastructure
> >> for our current situation, because we have some urgent requirements
> >> (especially from internal users of our company) related to the
> >> stability of sheep. So utilizing existing technologies is important.
> >>
> >> I have to stress that imitating some parts of DR would be very
> >> difficult. Simple function call tracing is far different from DR. The
> >> most important technical achievement of DR is its transparency (from
> >> my perspective). We can write DR clients easily because DR provides
> >> many transparency-aware APIs (e.g. __wrap_malloc()). If we write our
> >> own instrumentation infrastructure, we have to prepare our own
> >> transparency-aware APIs. It will be a time-consuming task. Even if we
> >> choose the simplest, ad hoc way, we have to prepare some notrace-ed
> >> APIs of libsheepdog.a, and that will result in code duplication. We
> >> can learn the difficulty of implementing transparency-aware APIs from
> >> the paper [1].
> >>
> >> So I think that we should use DR for the fault injector. Even if we
> >> implement our own instrumentation infrastructure, the development
> >> should be done in parallel.
> >>
> >> In addition, I believe merging the patchset to a dedicated branch will
> >> be useful, because new failure scenarios can be implemented easily as
> >> patches for the branch. For example, I'm planning to implement a new
> >> scenario which emulates a machine crash while a journal record is
> >> being written.
> >>
> >> [1] http://www.burningcutlery.com/derek/docs/transparency-VEE12.pdf
> > 
> > ping?
> > 
> > Thanks,
> > Hitoshi
> > 
> 
> Ping or what? Do you want to merge this into the master branch? The
> price of removing timerfd and signalfd is too high.

The patchset is not suitable for the master branch. But I think if you
can create a new dedicated branch for the patchset and maintain it,
the fault injector would be helpful.

If you don't agree with creating a new branch, of course I'll maintain
the patchset in my own repository.

Thanks,
Hitoshi