[stgt] [PATCH] nonblocking epoll_wait loop, sched events, ISER/IB polling
FUJITA Tomonori
fujita.tomonori at lab.ntt.co.jp
Fri Sep 26 12:25:18 CEST 2008
On Thu, 25 Sep 2008 19:09:09 +0300
Alexander Nezhinsky <nezhinsky at gmail.com> wrote:
> This patch introduces a few interconnected improvements that
> ultimately lead to a significant reduction in the interrupt rate
> for iser/ib, while adding flexibility to tgtd's event processing scheme.
>
> First, it implements a kind of "scheduler" list to replace the
> counter events. Event handlers can schedule other events, which are
> placed on the scheduler's list. This has some small advantages in
> itself: the same event descriptor is used for all events in the
> system, events are executed in the order they were scheduled, they
> can be removed from the list, and several instances of the same
> event may be scheduled.
>
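[Editor's note: the scheduler-list mechanism described above can be sketched as a self-contained toy. The `sev_*` names, the minimal list implementation, and the demo handler below are invented for illustration; the real patch reuses tgtd's `struct tgt_event` and `list_head` helpers. Note how the `scheduled` flag makes add/remove idempotent, and how a pass runs only the events that were queued when it started.]

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list, standing in for tgtd's list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static int  list_is_empty(struct list_node *n) { return n->next == n; }
static void list_append(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev; n->next = head;
	head->prev->next = n; head->prev = n;
}
static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next; n->next->prev = n->prev;
	list_init(n);
}

/* One schedulable event; 'scheduled' makes add/remove idempotent. */
struct sev {
	void (*handler)(struct sev *);
	int scheduled;
	struct list_node node;
};

static struct list_node sched_list;

static void sev_add(struct sev *e)
{
	if (!e->scheduled) {
		e->scheduled = 1;
		list_append(&e->node, &sched_list);
	}
}

static void sev_remove(struct sev *e)
{
	if (e->scheduled) {
		e->scheduled = 0;
		list_unlink(&e->node);
	}
}

/* Run only the events present at entry; events scheduled by the
 * handlers themselves stay queued for the next pass. */
static void sev_exec_scheduled(void)
{
	struct list_node *last = sched_list.prev;

	while (!list_is_empty(&sched_list)) {
		struct list_node *n = sched_list.next;
		struct sev *e =
			(struct sev *)((char *)n - offsetof(struct sev, node));
		sev_remove(e);
		e->handler(e);
		if (n == last)
			break;
	}
}

/* Demo handler: counts runs; ev[0] schedules ev[2] mid-pass. */
static int runs[3];
static struct sev ev[3];

static void demo_handler(struct sev *e)
{
	runs[e - ev]++;
	if (e == &ev[0])
		sev_add(&ev[2]);	/* deferred to the next pass */
}
```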
> But the most important change is the logic of the processing loop,
> as a whole. It goes as follows:
>
> The scheduler walks the events on the list and processes them; this
> processing can schedule more items, but the new items are not
> processed in the same pass (to avoid infinite/long loops).
>
> If after processing all the "old" events some "new" ones were
> scheduled, epoll_wait() is executed in a non-blocking manner. This
> guarantees that the file descriptors that became ready during the
> scheduler processing are handled, but if there are no ready fds, the
> remaining scheduler events are processed immediately.
>
> But if nothing new was added while processing the scheduler list,
> then epoll_wait() blocks as in the current scheme.
>
> This way we can be very flexible, because event handlers and deferred
> work cannot block one another. Potentially long handlers can easily
> be split into shorter ones without blocking the entire target.
>
> Finally, the IB completion queue is the first to benefit, because we
> can postpone interrupt re-arming until no completion entries remain,
> and schedule CQ "draining" after all events are serviced.
>
> When a completion event is handled, CQ is polled so that up to a
> given number (now set to 8) of WCs are processed.
>
> If more completions are left on the CQ, essentially the same
> handler is scheduled, to carry out the next round of polling,
> but in the meanwhile, other events in the system can be serviced.
>
> If the CQ is found empty, interrupts are re-armed and a handler is
> scheduled to consume any completions that could sneak in between the
> moment the CQ was seen empty and the moment the interrupts were
> re-armed.
>
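[Editor's note: the budget-and-reschedule scheme described above can be sketched with a simulated CQ. Everything here — `BUDGET`, the `cq_*` names, and the counters — is invented for illustration; in the real iser code the polling and re-arming are done with libibverbs' `ibv_poll_cq()` and `ibv_req_notify_cq()`, and the "loop" in `cq_drain()` is really the tgtd scheduler re-running the rescheduled event.]

```c
#include <assert.h>

#define BUDGET 8	/* max completions handled per round, as in the patch */

static int cq_pending;	/* completions waiting on the simulated CQ */
static int wc_handled;	/* total completions consumed */
static int rearms;	/* times notification was re-armed */
static int rounds;	/* polling rounds (handler invocations) */

/* Stand-in for ibv_poll_cq(): drain up to 'budget' entries. */
static int cq_poll(int budget)
{
	int n = cq_pending < budget ? cq_pending : budget;

	cq_pending -= n;
	wc_handled += n;
	return n;
}

/* One scheduled round of CQ draining.  Returns 1 if it rescheduled
 * itself (entries remain), 0 once the CQ was seen empty and the
 * notification interrupt re-armed. */
static int cq_poll_round(void)
{
	rounds++;
	cq_poll(BUDGET);
	if (cq_pending > 0)
		return 1;	/* reschedule: let other events run first */
	rearms++;		/* CQ empty: re-arm interrupts... */
	cq_poll(BUDGET);	/* ...then consume any completions that
				   snuck in before the re-arm */
	return 0;
}

/* Drive rounds until the handler stops rescheduling itself. */
static void cq_drain(void)
{
	while (cq_poll_round())
		;	/* in tgtd, the scheduler provides this repetition */
}
```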
> Thus for iSER/IB there is a marked reduction in the interrupt rate.
> Here are typical results:
> Current target: 62-65 KIOPS, ~110,000 interrupt/sec, CPU% ~68%
> Patched target: 65-70 KIOPS, ~65,000 interrupt/sec, CPU% ~65%
>
> Signed-off-by: Alexander Nezhinsky <nezhinsky at gmail.com>
Thanks,
Seems that you fixed most of the style issues but there are still some:
ERROR: trailing whitespace
#294: FILE: usr/iscsi/iscsi_rdma.c:1058:
+/* Scheduled to poll cq after a completion event has been $
ERROR: trailing whitespace
#304: FILE: usr/iscsi/iscsi_rdma.c:1068:
+ after the cq had been seen empty but just before $
ERROR: trailing whitespace
#305: FILE: usr/iscsi/iscsi_rdma.c:1069:
+ the notification interrupts were re-armed. $
ERROR: trailing whitespace
#338: FILE: usr/iscsi/iscsi_rdma.c:1102:
+^I/* if a poll was previosuly scheduled, remove it, $
total: 4 errors, 0 warnings, 565 lines checked
> diff --git a/usr/tgtd.c b/usr/tgtd.c
> index 0b1cb4c..ac3ff76 100644
> --- a/usr/tgtd.c
> +++ b/usr/tgtd.c
> @@ -38,26 +38,13 @@
> #include "work.h"
> #include "util.h"
>
> -struct tgt_event {
> - union {
> - event_handler_t *handler;
> - counter_event_handler_t *counter_handler;
> - };
> - union {
> - int fd;
> - int *counter;
> - };
> - void *data;
> - struct list_head e_list;
> -};
Exporting struct tgt_event is fine for now but I don't fancy it much.
If we had proper APIs for it, we could avoid it, I guess. But I've not
seen the rdma code so I'm not sure.
> -static void event_loop(void)
> +void tgt_init_sched_event(struct tgt_event *evt,
> + sched_event_handler_t sched_handler, void *data)
> {
> - int nevent, i, done, timeout = TGTD_TICK_PERIOD * 1000;
> - struct epoll_event events[1024];
> + evt->sched_handler = sched_handler;
> + evt->scheduled = 0;
> + evt->data = data;
> + INIT_LIST_HEAD(&evt->e_list);
> +}
> +
> +void tgt_add_sched_event(struct tgt_event *evt)
> +{
> + if (!evt->scheduled) {
> + evt->scheduled = 1;
> + list_add_tail(&evt->e_list, &tgt_sched_events_list);
> + }
> +}
> +
> +void tgt_remove_sched_event(struct tgt_event *evt)
> +{
> + if (evt->scheduled) {
> + evt->scheduled = 0;
> + list_del_init(&evt->e_list);
> + }
> +}
> +
> +static void tgt_exec_scheduled(void)
> +{
> + struct list_head *last_sched;
> struct tgt_event *tev, *tevn;
>
> -retry:
> - /*
> - * Check the counter events to see if they have any work to run.
> - */
> - do {
> - done = 1;
> - list_for_each_entry_safe(tev, tevn, &tgt_counter_events_list,
> - e_list) {
> - if (*tev->counter) {
> - done = 0;
> - tev->counter_handler(tev->counter, tev->data);
> - }
> - }
> - } while (!done);
> + if (list_empty(&tgt_sched_events_list))
> + return;
> +
> + /* execute only work scheduled till now */
> + last_sched = tgt_sched_events_list.prev;
> + list_for_each_entry_safe(tev, tevn, &tgt_sched_events_list, e_list) {
> + tgt_remove_sched_event(tev);
> + tev->sched_handler(tev);
> + if (&tev->e_list == last_sched)
> + break;
> + }
> +}
> +
> +static void tgt_poll_events(int timeout)
> +{
> + int nevent, i;
> + struct tgt_event *tev;
> + struct epoll_event events[1024];
>
> nevent = epoll_wait(ep_fd, events, ARRAY_SIZE(events), timeout);
> if (nevent < 0) {
> @@ -255,9 +229,19 @@ retry:
> }
> } else
> schedule();
> +}
>
> - if (system_active)
> - goto retry;
> +static void event_loop(void)
> +{
> + int timeout, wait_timeout = TGTD_TICK_PERIOD * 1000;
> +
> + while (system_active) {
> + tgt_exec_scheduled();
> + /* wait if no scheduled work, poll if there is */
> + timeout = list_empty(&tgt_sched_events_list) ?
> + wait_timeout : 0;
> + tgt_poll_events(timeout);
> + }
Can you avoid rewriting (or cleaning up) event_loop()?
That's not related to this patch. You want to replace the first half
(the counter_events code) in the original event_loop(). I don't think
that you need to rewrite event_loop() in the above way, right?
If you do something like the following, your patch becomes smaller, I
think.
static void event_loop(void)
{
	int nevent, i, done, timeout = TGTD_TICK_PERIOD * 1000;
	struct epoll_event events[1024];
	struct tgt_event *tev, *tevn;
retry:
	done = schedule_event(); /* choose whatever name you like */
	if (!done)
		timeout = 0;

	nevent = epoll_wait(ep_fd, events, ARRAY_SIZE(events), timeout);
	if (nevent < 0) {
		if (errno != EINTR) {
			eprintf("%m\n");
			exit(1);
		}
	} else if (nevent) {
		for (i = 0; i < nevent; i++) {
			tev = (struct tgt_event *) events[i].data.ptr;
			tev->handler(tev->fd, events[i].events, tev->data);
		}
	} else
		schedule();

	if (system_active)
		goto retry;
}
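[Editor's note: the `schedule_event()` helper assumed in the sketch above is not defined in the thread; its contract would presumably be "run one scheduler pass, then report whether the list is now empty", so the caller can choose a blocking timeout (1) or a non-blocking poll (0). The toy below illustrates just that contract; `queued`, `deferred`, and `exec_pass()` are invented stand-ins for tgtd's scheduler list and `tgt_exec_scheduled()`.]

```c
#include <assert.h>

/* Toy stand-ins: 'queued' counts events on the scheduler list;
 * 'deferred' counts events that handlers schedule mid-pass. */
static int queued;
static int deferred;

/* Run the events present at entry; mid-pass events remain queued. */
static void exec_pass(void)
{
	queued = deferred;
	deferred = 0;
}

/* Run one scheduler pass and report whether the list is now empty,
 * so the caller can block in epoll_wait() (1) or poll it with a
 * zero timeout (0). */
static int schedule_event(void)
{
	exec_pass();
	return queued == 0;
}
```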
--
To unsubscribe from this list: send the line "unsubscribe stgt" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html