Pete Wyckoff wrote:
> erezz at Voltaire.COM wrote on Tue, 18 Dec 2007 14:11 +0200:
>
>>>>>> We ran some tests on it. Most of them are ok except for fsck. We ran it
>>>>>> in the following way:
>>>>>>
>>>>>> seed5:/tmp/regtest # parted -s /dev/sdb mkpart primary 0 8500
>>>>>> seed5:/tmp/regtest # for ((i=1;i<=1000;i++)) do mkfs -t ext2 -q
>>>>>> /dev/sdb1; fsck -y -ft ext2 /dev/sdb1; echo iteration $i is done; done
>>>>>>
>>>>>> fsck is ok most of the time, but once in a while it looks like this
>>>>>> (after ~300 iterations):
>>>>>>
>>>>>> fsck 1.38 (30-Jun-2005)
>>>>>> e2fsck 1.38 (30-Jun-2005)
>>>>>> Pass 1: Checking inodes, blocks, and sizes
>>>>>> Pass 2: Checking directory structure
>>>>>> Pass 3: Checking directory connectivity
>>>>>> Pass 4: Checking reference counts
>>>>>> Pass 5: Checking group summary information
>>>>>> /dev/sdb1: 11/1038336 files (0.0% non-contiguous), 32599/2075195 blocks
>>>>>> seed5:/tmp/regtest # mkfs -t ext2 -q /dev/sdb1
>>>>>> seed5:/tmp/regtest # fsck -y -ft ext2 /dev/sdb1
>>>>>>
>>>>> Sounds like data corruption. Do you see the same problem with IPoIB?
>>>>>
>>>> I'm working with Erez and I tried this with a TCP session and there
>>>> weren't any problems.
>>>>
>>> Thanks for confirming. So it's the iSER problem.
>>>
>>> I might have broken Pete's iSER code, so can you revert the latest three
>>> patches and try the same tests?
>>>
>>> rouen:~/git/tgt$ git-reset --hard HEAD~3
>>> HEAD is now at 224ca81... iscsi: add iser support
>>>
>>> rouen:~/git/tgt$ git-log |head -5
>>> commit 224ca81bca8dead8dd355d62422e11fe23f7bdc4
>>> Author: Pete Wyckoff <pw at osc.edu>
>>> Date: Mon Dec 10 10:06:27 2007 -0500
>>>
>> Yes, I still see the same bad behavior with iSER. Pete & Robin - can you
>> try to run the same test (see above) with iSER and see if you get the
>> same behavior?
>>
>
> I tried your exact script above with 2100 MB then 8500 MB as you did
> and could not get any corruption for 1000 iterations. Maybe my disk
> is too slow---internal ATA accessed via file in ext3. Likely some
> sort of iser issue, though there is an off-chance of a race in bs_sync or
> that neighborhood that only appears at high speeds.
>
> You were able to get lm_dd to break iser in the past. That was
> something I could repeat and fix. Any more failures there? Or if
> you can help figure out the nature of the corruption: missing
> blocks or rearrangements, etc., that would definitely help.
>

I ran it again on a different target machine with different storage (I
suspect that the other storage that I used is bad), and it looks ok now
(no fsck errors). We will run more tests, and if we find anything, I
will send a message.

Erez
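
The loop quoted above keeps going after fsck reports a problem, so the
next mkfs overwrites the failing filesystem before anyone can look at
it. Below is a minimal sketch of a variant that stops at the first
non-zero exit status. run_until_failure and mkfs_fsck_once are
hypothetical helper names (not part of tgt or e2fsprogs), and the device
path is the one from the test above. fsck(8) documents exit status 0 as
"no errors", so anything else is treated as a failure here.

```shell
#!/bin/sh
# Sketch only: repeat a command until it fails, then stop so the failing
# state is left on disk for inspection.
# run_until_failure is a hypothetical helper, not an existing tool.
run_until_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        status=0
        "$@" || status=$?
        if [ "$status" -ne 0 ]; then
            echo "iteration $i failed with exit status $status"
            return "$status"
        fi
        echo "iteration $i is done"
        i=$((i + 1))
    done
    return 0
}

# One mkfs + fsck round, using the same invocations as the quoted test.
# Hypothetical helper; it destroys all data on the given device.
mkfs_fsck_once() {
    mkfs -t ext2 -q "$1" || return
    fsck -y -ft ext2 "$1"    # non-zero exit status: fsck found or fixed something
}

# Usage, assuming /dev/sdb1 is the iSER-backed partition under test:
#   run_until_failure 1000 mkfs_fsck_once /dev/sdb1
```

When the loop stops, the partition still holds whatever fsck complained
about, which may make it easier to tell missing blocks from rearranged
ones.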