[Sheepdog] [PATCH v3 06/13] trace: low-level trace infrastructure proper

Liu Yuan namei.unix at gmail.com
Mon Apr 9 07:42:49 CEST 2012


On 04/09/2012 01:02 PM, MORITA Kazutaka wrote:

> On Mon, Apr 9, 2012 at 1:14 PM, Liu Yuan <namei.unix at gmail.com> wrote:
>> On 04/09/2012 10:47 AM, MORITA Kazutaka wrote:
>>
>>> On Mon, Apr 9, 2012 at 10:28 AM, Liu Yuan <namei.unix at gmail.com> wrote:
>>>> On 04/06/2012 12:09 AM, MORITA Kazutaka wrote:
>>>>
>>>>> At Thu,  1 Mar 2012 10:20:18 +0800,
>>>>> Liu Yuan wrote:
>>>>>>
>>>>>> +}
>>>>>> +
>>>>>> +notrace struct caller *trace_lookup_ip(unsigned long ip, int create)
>>>>>> +{
>>>>>> +    int h = trace_hash(ip);
>>>>>> +    struct hlist_head *head = trace_hashtable + h;
>>>>>> +    struct hlist_node *node;
>>>>>> +    struct ipinfo info;
>>>>>> +    struct caller *new = NULL;
>>>>>> +
>>>>>> +    pthread_mutex_lock(&trace_lock);
>>>>>> +    if (hlist_empty(head))
>>>>>
>>>>> hlist_empty() calls trace_lookup_ip() again if it is not inlined, and
>>>>> it results in a deadlock.
>>>>>
>>>>> How about preventing a recursive call of trace_lookup_ip()?
>>>>>
>>>>> E.g.:
>>>>>
>>>>> notrace struct caller *trace_lookup_ip(unsigned long ip, int create)
>>>>> {
>>>>>         int h;
>>>>>         struct hlist_head *head;
>>>>>         struct hlist_node *node;
>>>>>         struct ipinfo info;
>>>>>         struct caller *new = NULL;
>>>>>         static __thread int looking_up_ip;
>>>>>
>>>>>         if (looking_up_ip)
>>>>>                 return NULL;
>>>>>
>>>>>         looking_up_ip = 1;
>>>>>
>>>>>         pthread_mutex_lock(&trace_lock);
>>>>>
>>>>>         ...
>>>>>
>>>>>         pthread_mutex_unlock(&trace_lock);
>>>>>
>>>>>         looking_up_ip = 0;
>>>>>         return new;
>>>>> }
>>>>>
>>>>
>>>>
>>>> Nope, this brings more problems than it resolves:
>>>>
>>>> 1) returning NULL while the entry does exist will break the current trace logic
>>>
>>> Is it difficult to change the logic?
>>
>>
>> Yes, kind of hard: currently we call trace_lookup_ip() to check whether an
>> IP is registered, and the upper layer (graph.c) cannot handle the case
>> where trace_lookup_ip() fails but the IP is already registered.
> 
> If trace_lookup_ip() returns NULL, the tracer returns without doing anything.
> Doesn't it work?
> ==
> diff --git a/sheep/trace/graph.c b/sheep/trace/graph.c
> index fff9060..8c9405a 100644
> --- a/sheep/trace/graph.c
> +++ b/sheep/trace/graph.c
> @@ -86,6 +86,8 @@ static notrace void graph_tracer(unsigned long ip,
> unsigned long *ret_addr)
>         struct caller *cr;
> 
>         cr = trace_lookup_ip(ip, 0);
> +       if (!cr)
> +               return;
>         assert(cr->namelen + 1 < TRACE_FNAME_LEN);
>         memcpy(trace.fname, cr->name, cr->namelen);
>         memset(trace.fname + cr->namelen, '\0', 1);
> ==
> 


I don't think changing the semantics of trace_lookup_ip() is
acceptable; it is much easier for callers to deal with a "Yes" or "No"
answer than with a "Yes" or "Maybe" one.

I'd suggest removing the re-entrance race inside trace_lookup_ip()
instead of working around it outside. Your patch lets the re-entrance
happen and simply ignores it when it does, which means we don't trace
the function while the race is happening.

The tracer itself is a very isolated subsystem with a small code base,
so it is worth our time to keep it clean and handle the 'notrace' and
'always_inline' attributes ourselves with care.

Since xzalloc causes problems, I think we can call malloc or calloc
instead and leave a warning for any future patcher: "Hey, we should not
call sheep functions inside trace_lookup_ip() that are expected to be
traced".

>>
>>
>>>
>>>> 2) a __thread variable has no effect on excluding concurrent access
>>>
>>> Concurrent access is excluded by pthread_mutex_lock().  The __thread
>>> variable just prevents the same thread from re-entering the core of
>>> trace_lookup_ip code.
>>
>>
>>>> 3) a one-line function like hlist_empty() will be inlined by any sane
>>>> compiler that supports inlining.
>>>
>>> If we add --enable-debug to configure options, the function is not
>>> inlined.
>>>
>>
>>
>> Even now, I can't see the point of --enable-debug, which simply adds
>> -O0 to gcc; does this mean we don't trust GCC reordering & optimisation? I
>> don't think sheepdog is low-level enough to run into this problem.
> 
> Sometimes -O0 is useful to debug Sheepdog with GDB.
> 


I can't imagine what kind of problem needs -O0, but anyway, if anyone
finds it useful, we should keep it.

>>
>> If you are really concerned about this, we can add an always_inline
>> attribute to hlist_empty().
> 
> It means we need to add always_inline each time we add a function
> to trace_lookup_ip(). That's not good.
> 
>>
>>>>
>>>>
>>>>>
>>>>>> +            goto not_found;
>>>>>> +
>>>>>> +    hlist_for_each_entry(new, node, head, hash) {
>>>>>> +            if (new->mcount == ip)
>>>>>> +                    goto out;
>>>>>> +    }
>>>>>> +not_found:
>>>>>> +    if (get_ipinfo(ip, &info) < 0) {
>>>>>> +            dprintf("ip: %lx not found\n", ip);
>>>>>> +            new = NULL;
>>>>>> +            goto out;
>>>>>> +    }
>>>>>> +    if (create) {
>>>>>> +            /* unlock to avoid deadlock */
>>>>>> +            pthread_mutex_unlock(&trace_lock);
>>>>>
>>>>> I think we don't need unlock here if we prevent recursion.
>>>>>
>>>>>
>>>>>> +            new = xzalloc(sizeof(*new));
>>>>>> +            pthread_mutex_lock(&trace_lock);
>>>
>>> IIUC, you call pthread_mutex_unlock()/lock() here because xzalloc()
>>> will call another trace_lookup_ip() and it results in deadlock, yes?
>>>
>>
>>
>> Yes
>>
>>> I think this code is really hard to maintain because we cannot call
>>> any function that might not be inlined.  In addition, it smells
>>> like a race condition, since another thread can enter the core of
>>> trace_lookup_ip() just after we release the lock here.
>>>
>>
>>
>> Yes, some Linux kernel code has this race too (an ugly practice), so
>> here I just assume it won't happen. But if we are really concerned
>> about it, we can add 'notrace' to get rid of the race completely.
> 
> It's also not good to add notrace each time we add a function in
> trace_lookup_ip(). It will reduce code maintainability.

I'm afraid we have to take extra care ourselves when patching tracer
code, but once that is done, tracer users need no extra care.
Tracer-internal code is a bit lower level than other sheep code, so I
think it deserves our special care.

>>
>> BTW, speaking of code maintainability, the coroutine code is much harder to
>> read & maintain.
> 
> I don't think so, but I don't mind dropping the coroutine code.
> 
> Anyway, it's okay for me to merge the trace code without addressing
> the above comments, but please disable the tracer in configure by
> default until the code becomes mature and the majority of users want
> to use it.

Yes, I'll disable it by default; it also needs quite a lot of rebasing
onto master.

Thanks,
Yuan


