[stgt] tgtd does data corruption after "Forcing release of tx task"?

Wed Jun 29 08:52:26 CEST 2011

On Tue, 28 Jun 2011 11:38:51 +0200
Roland Friedwagner <roland.friedwagner at wu.ac.at> wrote:

> we have a weird data integrity problem with tgtd.
> 
> The problem shows up as random rpm checksum errors on the
> iSCSI clients when running "rpm -V -a" consistency checks.
> (The checksum errors affect different files on each run after
> a drop_caches.)
> 
> Further tests also revealed data corruption when writing to
> the target.
> About one block per GByte gets corrupted.
> The amount is strongly workload dependent.
> 
> The tgtd daemon was running without any problems before.
> The current tgtd version is 1.0.8 (as provided by RHEL5).
> 
> The only difference from before was a backing storage
> expansion carried out the day before.
> During the storage expansion the controller disabled its caching
> module and the backing storage became very slow for about 4 hours.
> 
> During this expansion period, messages of this kind were logged to
> syslog by tgtd:
> 
> ...
> tgtd: conn_close(129) Forcing release of tx task 0x16eb12c0 0 0
> ...
> tgtd: conn_close(163) Forcing release of tx task 0x16fd1570 0
> ...
> tgtd: conn_close(129) Forcing release of tx task 0x16f9b010 10000038 1
> ...
> (about 190 lines, with different addresses and numbers; no errors were
> logged after the expansion completed)
> 
> No other indication of an error was found.
> After restarting tgtd the problem vanished.

The disk was probably too slow, so some requests from the initiators
timed out. File systems should handle such timeouts without corrupting
data, but some might not. I recommend checking the integrity of all
the data on the file system.
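
One way to do such a check, for files that rpm does not cover, is to
compare checksums against a known-good copy. The following is only a
minimal sketch: it assumes a trusted reference copy (for example a
backup) is available, and the function names are made up for
illustration; nothing here is part of tgt. As noted in the report
above, the page cache on the client should be dropped before
re-reading, otherwise cached data is checked instead of what the
target returns.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (by relative path) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files are hashed.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def diff_manifests(reference, current):
    """Return sorted paths from reference that differ or are missing in current."""
    return sorted(p for p in reference if current.get(p) != reference[p])
```

Building one manifest from the reference copy and one from the file
system on the iSCSI volume, then passing both to diff_manifests(),
lists the files whose contents no longer match. Files that legitimately
changed since the reference was taken will be listed too, so the result
needs manual review.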