[stgt] [PATCH] new timer-based work scheduling

Mon Jan 17 15:55:18 CET 2011

On 01/17/2011 01:05 PM, FUJITA Tomonori wrote:
> On Sun, 9 Jan 2011 18:02:20 +0200
> Alexander Nezhinsky <alexandern at Voltaire.COM> wrote:

>> Work items are scheduled from various application contexts, and put on
>> a queue. The current time from gettimeofday() is used instead of
>> a jiffies-based mechanism used previously. The work item is stamped with
>> the expiration time, obtained as the current time plus the timeout period.
>>
>> A global timer (one per process) is registered and fired periodically,
>> a few times a second. When the timer signal is handled, a word is written
>> to a dedicated pipe, whose fd is registered with the event loop.
>>
>> The event handler reads from the pipe, and examines the inactive work
>> queue. All items that have expired (the immediate current time is used
>> for comparison, not the timer or pipe event handling times) are moved
>> to the active list and processed one after another. The items that
>> might have accumulated and expired in the meantime are not handled
>> and are postponed until the next timer event.
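
For reference, a minimal sketch of the signal-to-pipe mechanism described above. It assumes setitimer(ITIMER_REAL) with SIGALRM as the per-process timer; the function and variable names are hypothetical and not taken from the patch. The handler does only async-signal-safe work (one write()), and the read end is what would be registered with the event loop.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical names; not tgtd's actual code. */
static int sig_timer_pipe[2] = { -1, -1 };

/* SIGALRM handler: write one byte, nothing else. */
static void sig_timer_handler(int signo)
{
	int saved_errno = errno;   /* write() may clobber errno */
	char c = (char)signo;
	/* EAGAIN on a full pipe is harmless: a tick is already pending */
	ssize_t r = write(sig_timer_pipe[1], &c, 1);

	(void)r;
	errno = saved_errno;
}

/* Arm a periodic SIGALRM and return the pipe's read end (-1 on error). */
int sig_timer_start(unsigned int period_ms)
{
	struct sigaction sa;
	struct itimerval itv;
	int i;

	if (pipe(sig_timer_pipe) < 0)
		return -1;
	for (i = 0; i < 2; i++) {
		int fl = fcntl(sig_timer_pipe[i], F_GETFL);

		if (fl < 0 || fcntl(sig_timer_pipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
			return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sig_timer_handler;
	sigemptyset(&sa.sa_mask);
	/* restart slow syscalls where possible; epoll_wait() still gets EINTR */
	sa.sa_flags = SA_RESTART;
	if (sigaction(SIGALRM, &sa, NULL) < 0)
		return -1;

	itv.it_interval.tv_sec = period_ms / 1000;
	itv.it_interval.tv_usec = (period_ms % 1000) * 1000;
	itv.it_value = itv.it_interval;   /* first tick after one period */
	if (setitimer(ITIMER_REAL, &itv, NULL) < 0)
		return -1;
	return sig_timer_pipe[0];
}

/* Event handler for the read end: drain pending ticks, return the count.
 * The caller would then scan the inactive work queue. */
int sig_timer_drain(int fd)
{
	char buf[64];
	int ticks = 0;
	ssize_t n;

	while ((n = read(fd, buf, sizeof(buf))) > 0)
		ticks += (int)n;
	return ticks;
}

/* Disarm the interval timer. */
void sig_timer_stop(void)
{
	struct itimerval off;

	memset(&off, 0, sizeof(off));
	setitimer(ITIMER_REAL, &off, NULL);
}
```

Each tick thus costs a signal delivery, a write() and a read(), plus a gettimeofday() in the handler.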

> This signal code gets a signal, writes to and reads from a pipe, and calls
> gettimeofday every 250 msecs.
> 
> On the other hand, the timerfd code just reads every 250 msecs.
> 
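
For comparison, a minimal timerfd sketch (Linux 2.6.25 or later; the TFD_* flags need 2.6.27). The helper names are hypothetical. The fd becomes readable every period, and one 8-byte read() returns the number of expirations since the previous read, so no signal handler or pipe is involved.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Create a periodic, non-blocking timerfd; register it with epoll. */
int tick_timerfd_start(unsigned int period_ms)
{
	struct itimerspec its;
	int fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);

	if (fd < 0)
		return -1;
	memset(&its, 0, sizeof(its));
	its.it_interval.tv_sec = period_ms / 1000;
	its.it_interval.tv_nsec = (long)(period_ms % 1000) * 1000000L;
	its.it_value = its.it_interval;   /* first expiration after one period */
	if (timerfd_settime(fd, 0, &its, NULL) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* The whole tick handler: read the expiration count (0 if none pending). */
uint64_t tick_timerfd_read(int fd)
{
	uint64_t expirations = 0;

	if (read(fd, &expirations, sizeof(expirations)) != (ssize_t)sizeof(expirations))
		return 0;
	return expirations;
}
```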

I'm in favor of using timerfd where possible. Another question, which I think
is orthogonal to using timerfd, is whether to call gettimeofday().

Note that gettimeofday() is called in two contexts: first, when a new work
unit is scheduled; second, in the timer handler.

There are three different sources of inaccuracy.
The first is the random distribution of add_work() calls within the
250 msec (or whatever period is chosen) timer interval. The second is the
random distribution of work expiration times within the same interval.
Each is bounded by 250 msec and averages 125 msec.
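
To make the first two sources concrete, a toy model in integer milliseconds. It assumes ticks at exact multiples of the period and no handler delay (the third source), and, for the tick-counting scheme, that the deadline is the last tick seen plus the timeout rounded up to whole ticks. Those are assumptions about a generic jiffies scheme, not a description of either patch.

```c
/* Toy model: times in ms, ticks at multiples of 'period'. */

/* Tick counting (assumed): deadline = ticks seen so far + timeout rounded
 * up to whole ticks; the work runs at that tick. */
long tick_count_fire_time(long t_add, long timeout, long period)
{
	long ticks_seen = t_add / period;
	long ticks_to_wait = (timeout + period - 1) / period;

	return (ticks_seen + ticks_to_wait) * period;
}

/* Timestamping: deadline = t_add + timeout in real time; the work runs
 * at the first tick at or after the deadline. */
long timestamp_fire_time(long t_add, long timeout, long period)
{
	long deadline = t_add + timeout;

	return ((deadline + period - 1) / period) * period;
}
```

With period 250 and timeout 1000, work added at t = 240 fires at 1000 under tick counting (240 ms early) but at 1250 under timestamping (10 ms late). With timeout 1010 added at t = 0, tick counting fires at 1250 (240 ms late). In this model tick counting can be off by almost a full period in either direction, while timestamping is only ever late, by less than one period.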

The third source of inaccuracy is the random delay introduced by other
event handlers. In theory it is unbounded, but in practice I believe its
average is very small, below 1 msec.

Using gettimeofday() eliminates the first inaccuracy and leaves the second
in place, though somewhat mitigated. Because work descriptors are stamped
with the exact expiration time (current time + timeout), and the decision
that an item has expired is also based on the actual time, we can sometimes
benefit from a delay in the timer event: if the work expires by the time
the timer event is actually handled, it is serviced in that round rather
than at the next tick.
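
A sketch of that mitigation, assuming a hypothetical work_item with a struct timeval deadline (not tgtd's actual structure). In real use both now values would come from gettimeofday(); here they are parameters so the logic can be checked. The handler reads now once, when the event is actually handled, so items that expired during a late wakeup are run as well.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical work item stamped with an absolute expiration time. */
struct work_item {
	struct timeval when;   /* scheduling time + timeout */
	int done;
};

/* At add_work() time: stamp the absolute deadline. */
void work_stamp(struct work_item *w, const struct timeval *now, long timeout_ms)
{
	struct timeval to;

	to.tv_sec = timeout_ms / 1000;
	to.tv_usec = (timeout_ms % 1000) * 1000;
	timeradd(now, &to, &w->when);
	w->done = 0;
}

/* In the timer event handler: run every item with when <= now.
 * Returns the number of items run. */
size_t work_run_expired(struct work_item *items, size_t n, const struct timeval *now)
{
	size_t i, ran = 0;

	for (i = 0; i < n; i++) {
		/* portable form of "when <= now": !(when > now) */
		if (!items[i].done && !timercmp(&items[i].when, now, >)) {
			items[i].done = 1;   /* stand-in for calling the work function */
			ran++;
		}
	}
	return ran;
}
```

For example, with items due at 1.000 s and 1.002 s, a handler running exactly at 1.000 s runs one item; one running 3 ms late runs both.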

In your patch, both large inaccuracies remain, so the worst-case error is
doubled, and there is no such mitigation. That said, if you want to avoid
the gettimeofday() system calls, simply shortening the timer interval gives
a similar result.

> If you don't care about old kernels much, I'd like to have the timerfd
> and old jiffies code. If you do care, I'll add the timerfd code and yours.

I do care about old kernels, and I don't like combining timerfd with jiffies.
So please add the timerfd / timer-signal combination. Whether to retain
gettimeofday() or not is up to you; I'm OK with either.
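
One way to build that combination is to probe at startup and fall back to the signal/pipe timer when timerfd is unavailable. A sketch, assuming glibc provides <sys/timerfd.h> at build time (glibc 2.8 or later); only the run-time kernel check is shown, and the enum and function name are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical selector for the tick mechanism. */
enum tick_source { TICK_TIMERFD, TICK_SIGNAL_PIPE, TICK_ERROR };

/* Probe once at startup. timerfd_create() fails with ENOSYS on kernels
 * older than 2.6.25; on 2.6.25 and 2.6.26 flags must be zero, so the
 * TFD_* flags are rejected with EINVAL. */
enum tick_source choose_tick_source(void)
{
	int fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);

	if (fd >= 0) {
		close(fd);
		return TICK_TIMERFD;
	}
	if (errno == ENOSYS || errno == EINVAL)
		return TICK_SIGNAL_PIPE;   /* old kernel: setitimer + SIGALRM + pipe */
	return TICK_ERROR;             /* e.g. EMFILE: a real error, not an old kernel */
}
```

A variant could retry with flags set to 0 and apply O_NONBLOCK with fcntl(), so that 2.6.25 and 2.6.26 also use timerfd.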

>  	sched_remains = tgt_exec_scheduled();
> -	timeout = sched_remains ? 0 : TGTD_TICK_PERIOD * 1000;
> +	timeout = sched_remains ? 0 : -1;
>  
>  	nevent = epoll_wait(ep_fd, events, ARRAY_SIZE(events), timeout);

This is, of course, right. 
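
Because the tick now arrives through an fd, the loop no longer needs the tick period as an epoll_wait() timeout. It polls (timeout 0) while scheduled work remains and otherwise blocks (timeout -1) until some fd, the timer fd included, becomes ready. A sketch of one pass, where sched_remains stands in for the result of tgt_exec_scheduled() and the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>   /* the tick fd is assumed to be a timerfd */
#include <time.h>
#include <unistd.h>

/* One pass of the event loop. Returns the timer expirations consumed,
 * 0 if interrupted by a signal, or -1 on another epoll error. */
int event_loop_pass(int ep_fd, int tick_fd, int sched_remains)
{
	struct epoll_event events[16];
	int timeout = sched_remains ? 0 : -1;   /* as in the patch above */
	int i, nevent, ticks = 0;

	nevent = epoll_wait(ep_fd, events, 16, timeout);
	if (nevent < 0)
		return errno == EINTR ? 0 : -1;

	for (i = 0; i < nevent; i++) {
		if (events[i].data.fd == tick_fd) {
			uint64_t exp;

			if (read(tick_fd, &exp, sizeof(exp)) == (ssize_t)sizeof(exp))
				ticks += (int)exp;
		}
		/* other fds would be dispatched to their handlers here */
	}
	return ticks;
}
```
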