On Tue, 28 Jun 2011 11:38:51 +0200
Roland Friedwagner <roland.friedwagner at wu.ac.at> wrote:

> we have a weird data integrity problem with tgtd.
>
> The problem becomes evident when we see random rpm checksum errors
> on iSCSI clients during "rpm -V -a" consistency checks.
> (The checksum errors are on different files on each run after
> a drop_caches.)
>
> Further tests also revealed data corruption when writing to
> the target.
> About one block per GByte gets corrupted.
> The amount is strongly workload dependent.
>
> The tgtd daemon had been running without any problem before.
> Current tgtd version is 1.0.8 (as provided by RHEL5).
>
> The only difference from before was a backing storage
> expansion that took place the day before.
> During the storage expansion the controller disabled its caching
> module, and the backing storage was very slow for about 4 hours.
>
> During this expansion period, messages of this kind were logged to
> syslog by tgtd:
>
> ...
> tgtd: conn_close(129) Forcing release of tx task 0x16eb12c0 0 0
> ...
> tgtd: conn_close(163) Forcing release of tx task 0x16fd1570 0
> ...
> tgtd: conn_close(129) Forcing release of tx task 0x16f9b010 10000038 1
> ...
> (about 190 lines, with different addresses and numbers; no errors were
> logged after the expansion completed)
>
> No other indication of an error was found.
> After restarting tgtd the problem vanished.

Probably the disk was too slow, so some requests from the initiators
timed out. File systems should handle such timeouts to avoid data
corruption, but some might not.

I recommend checking the integrity of all the data on the file system.
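One way to act on the recommendation above, for data that "rpm -V -a" does not cover, is to hash every file and compare the result against a manifest taken from a known-good copy. The following is a minimal Python sketch, assuming such a copy (for example a backup) exists; the function names sha256_of, build_manifest and compare_manifests are hypothetical and are not part of tgtd or rpm.

```python
from __future__ import annotations

import hashlib
import os
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each regular file (path relative to root) to its SHA-256 digest.

    Symlinks are skipped so that only real file contents are checked.
    """
    manifest: dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            p = Path(dirpath) / name
            if p.is_file() and not p.is_symlink():
                manifest[str(p.relative_to(root))] = sha256_of(p)
    return manifest


def compare_manifests(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return sorted relative paths whose digest changed or that disappeared."""
    return sorted(path for path, digest in old.items() if new.get(path) != digest)
```

Run build_manifest over the known-good copy and over the iSCSI-backed mount, then pass both results to compare_manifests. Re-running the scan after "echo 3 > /proc/sys/vm/drop_caches" (as root) forces the data to be read from the target again rather than from the page cache, which matches how the reporter saw checksum errors move between files.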