Hi,

we have a weird data integrity problem with tgtd. The problem became evident when we saw random rpm checksum errors on iSCSI clients while running "rpm -V -a" consistency checks. (After a drop_caches, the checksum errors show up on different files on each run.) Further tests also revealed data corruption when writing to the target. About one block per GByte gets corrupted; the amount is strongly workload dependent.

The tgtd daemon had been running without any problems before. The current tgtd version is 1.0.8 (as provided by RHEL5). The only change was a backing storage expansion the day before. During the expansion the controller disabled its caching module, and the backing storage became very slow for about 4 hours. During this period tgtd logged messages like these to syslog:

... tgtd: conn_close(129) Forcing release of tx task 0x16eb12c0 0 0
... tgtd: conn_close(163) Forcing release of tx task 0x16fd1570 0
... tgtd: conn_close(129) Forcing release of tx task 0x16f9b010 10000038 1
...

(about 190 lines, with different addresses and numbers; no errors were logged after the expansion completed)

We found no other indication of an error. After restarting tgtd the problem vanished.

IO stack: cciss <-> LVM <-> tgtd <=Ethernet=> open-iscsi <-> KVM

I dumped cores with gcore from the tgtd processes before terminating the (corrupted) tgtd daemon. (See http://bach.wu.ac.at/rfried/tgtd_cores.zip)

Has anyone ever had problems like this?

Kind Regards,
Roland