[sheepdog] [PATCH experimental 0/2] tests: add a fault injector based on DynamoRIO

Mon May 6 15:54:51 CEST 2013

On 05/06/2013 09:44 PM, Hitoshi Mitake wrote:
> At Mon, 15 Apr 2013 23:36:19 +0900,
> Hitoshi Mitake wrote:
>>
>> At Mon, 15 Apr 2013 16:41:12 +0800,
>> Liu Yuan wrote:
>>>
>>> On 04/15/2013 01:07 PM, Hitoshi Mitake wrote:
>>>> Hitoshi Mitake (2):
>>>>   stop using timerfd and signalfd
>>>
>>> The price of using DR is too high.
>>
>> The limitation is not essential. As I describe in the commit log,
>> the first patch is a temporary measure. I believe we should not
>> rule out DR for this reason.
>>
>>>
>>>>   tests: add a DynamoRIO client for testing the journaling mechanism
>>>
>>> This patch set is really helpful for finding more subtle and
>>> hard-to-emulate bugs. It would be better to rework DR for our needs,
>>> but DR is such a big project and hard to tweak.
>>>
>>> Is it possible to roll our own instrumentation infrastructure that is
>>> better integrated into SD? For example, we could use collie to control
>>> the instrumentation of the functions used by Sheepdog instead of
>>> library symbols. I think the current tracer would be a playground for
>>> it. (I'll rework it to work with the current master soon.)
>>>
>>> The current tracer infrastructure is already capable of catching every
>>> call site at the granularity of a function call, so maybe we can
>>> further the work to get more instrumentation features.
>>>
>>> What do you think?
>>
>> As you say, the tracer will be a good infrastructure for better
>> testing, too. But I think that DR is the most suitable infrastructure
>> for our current situation, because we have some urgent requirements
>> (especially from internal users at our company) related to the
>> stability of sheep. So utilizing existing technologies is important.
>>
>> I have to stress that imitating some parts of DR would be very
>> difficult. Simple function call tracing is far different from DR. The
>> most important technical achievement of DR is its transparency (from
>> my perspective). We can write DR clients easily because DR provides
>> many transparency-aware APIs (e.g. __wrap_malloc()). If we write our
>> own instrumentation infrastructure, we have to prepare our own
>> transparency-aware APIs. It will be a time-consuming task. Even if we
>> choose the simplest, ad hoc way, we have to prepare some notrace-ed
>> APIs of libsheepdog.a, and that will result in code duplication. We
>> can learn about the difficulty of implementing transparency-aware
>> APIs from the paper [1].
>>
>> So I think that we should use DR for the fault injector. Even if we
>> implement our own instrumentation infrastructure, the development
>> should be done in parallel.
>>
>> In addition, I believe merging the patchset to a dedicated branch will
>> be useful, because new failure scenarios can be implemented easily as
>> patches for the branch. E.g. I'm planning to implement a new scenario
>> which emulates a machine crash while a journal record is being written.
>>
>> [1] http://www.burningcutlery.com/derek/docs/transparency-VEE12.pdf
> 
> ping?
> 
> Thanks,
> Hitoshi
> 

Ping for what? Do you want to merge this into the master branch? The
price of removing timerfd and signalfd is too high.

Thanks,
Yuan