[Stgt-devel] data corruption problems with stgt (aborted journal, remounting ro)?

Sat Feb 2 10:39:02 CET 2008

FUJITA Tomonori wrote:
> On Fri, 01 Feb 2008 13:05:41 +0100
> Tomasz Chmielewski <mangoo at wpkg.org> wrote:
> 
>> Tomasz Chmielewski wrote:
>>> It doesn't look like my posts get to this list... or is it just lagging a lot? 
>>> Resending.
>>>
>>>
>>> Perhaps I'm doing something wrong - but with stgt I'm facing problems I 
>>> didn't have with IET or SCST.
>>>
>>> Whenever I kill the tgtd daemon and start it again (i.e., target server 
>>> restart), the initiator detects an aborted journal and remounts the 
>>> device ro.
>>>
>>> Why is it so?
>>>
>>> What is the recommended way to kill the tgtd daemon? It doesn't seem to 
>>> react to the TERM signal.
>> Hello, anyone there?
>>
>> Is there a way to restart tgtd daemon or a machine running tgtd, so that 
>> iSCSI connections don't break?
> 
> What does your 'restart tgtd daemon' mean? For me, 'restart' involves
> stopping tgtd daemon and it closes all the iSCSI connections.

Stop it, and start again?

Imagine you want to upgrade your tgtd daemon or the kernel running on that 
machine, or you have to restart the target machine for some other reason 
(e.g. your target machine died).

With IET or SCST there is no problem with that - stop the target, and 
the iSCSI initiator will try to reconnect.

By default, open-iscsi tries to reconnect for 120 seconds without 
returning an error to the SCSI layer, as defined in /etc/iscsi/iscsid.conf:

node.session.timeo.replacement_timeout = 120

But it is no problem to increase that value to even a couple of days 
(imagine: admin stops the target machine erroneously on Friday, comes 
back to work on Monday, starts the target, and initiators continue to 
work as if nothing happened - processes were in an uninterruptible sleep 
state, waiting for I/O operations to complete).
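
To make the "couple of days" case concrete, here is a rough sketch of the 
arithmetic, since the setting is given in seconds. The helper name 
replacement_timeout_line is made up for illustration, and three days is 
just an assumed upper bound for a Friday-to-Monday outage:

```python
from datetime import timedelta


def replacement_timeout_line(outage: timedelta) -> str:
    """Format an iscsid.conf line whose timeout covers the given outage.

    Hypothetical helper, not part of open-iscsi; it only converts a
    duration to whole seconds and formats the setting shown above.
    """
    seconds = int(outage.total_seconds())
    return f"node.session.timeo.replacement_timeout = {seconds}"


# Assumption: a Friday-to-Monday outage fits within three full days.
print(replacement_timeout_line(timedelta(days=3)))
```

The printed line would replace the default entry in 
/etc/iscsi/iscsid.conf on the initiator side.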

With tgtd, it seems impossible to me - or am I wrong?

-- 
Tomasz Chmielewski
http://wpkg.org