[stgt] [PATCH v2] redirect: protect against tgtd process hang as of cluster software hang

Or Gerlitz ogerlitz at mellanox.com
Mon Mar 21 11:50:28 CET 2011


FUJITA Tomonori wrote:
> The temporary solution is fine by me.

okay

>> +		do {
[...]
>> +			ret_sel = select(fds[0]+1, &rfds, NULL, NULL, &tv);
>> +		} while (ret_sel < 0 && errno == EINTR);
>> +		if (ret_sel <= 0) { /* error or timeout */
>> +			eprintf("timeout on redirect callback, \
>> +					terminating child pid %d\n", pid);
>> +			kill(pid, SIGTERM);
>> +		}

> Why is this necessary before waitpid()?
> I thought that we need to make sure that fds[0] is readable (with
> a timeout) before calling read() for fds[0].

If the timeout expires and fds[0] still isn't readable, we don't want to wait
for the child process any longer, so we must terminate it. Otherwise we could
hang in waitpid() forever, which is what we wanted to avoid in the first place.
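
To make the ordering concrete, here is a minimal standalone sketch of the same
pattern outside tgtd. It is an illustration under assumptions, not the patch
itself: wait_child_with_timeout() and its two parameters are hypothetical
names, the child just sleeps and writes one byte in place of the real redirect
callback, and fprintf() stands in for eprintf().

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of tgtd): fork a child that sleeps for
 * child_delay_sec seconds and then writes one byte to a pipe. The parent
 * waits at most timeout_sec seconds for the pipe to become readable.
 *
 * Returns 1 if the child answered in time, 0 if it was terminated after
 * a timeout or select() error, and -1 on setup failure.
 */
int wait_child_with_timeout(unsigned int child_delay_sec, long timeout_sec)
{
	int fds[2], ret_sel, status, answered = 0;
	fd_set rfds;
	struct timeval tv;
	pid_t pid;
	char byte;

	assert(timeout_sec >= 0);	/* a negative timeout is a caller bug */

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* child: stands in for the possibly hanging callback */
		close(fds[0]);
		sleep(child_delay_sec);
		if (write(fds[1], "x", 1) < 0)
			_exit(1);
		_exit(0);
	}

	close(fds[1]);
	do {
		/* re-arm both arguments: select() may modify them */
		FD_ZERO(&rfds);
		FD_SET(fds[0], &rfds);
		tv.tv_sec = timeout_sec;
		tv.tv_usec = 0;
		ret_sel = select(fds[0] + 1, &rfds, NULL, NULL, &tv);
	} while (ret_sel < 0 && errno == EINTR);

	if (ret_sel <= 0) {
		/* error or timeout: terminate the child so waitpid() returns */
		fprintf(stderr, "timeout, terminating child pid %d\n", (int)pid);
		kill(pid, SIGTERM);
	} else if (read(fds[0], &byte, 1) == 1) {
		answered = 1;
	}

	close(fds[0]);
	/* safe now: the child has either exited or been sent SIGTERM */
	while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
		;
	return answered;
}
```

Calling wait_child_with_timeout(5, 1) exercises the timeout path: select()
gives up after one second, the child gets SIGTERM, and waitpid() returns
promptly instead of blocking for the child's full sleep. Note that a child
which ignores or blocks SIGTERM would still stall waitpid(); the sketch does
not cover that case.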

Or.



