[Stgt-devel] Errors in fsck with iSER

Pete Wyckoff pw
Tue Dec 18 19:11:09 CET 2007


erezz at Voltaire.COM wrote on Tue, 18 Dec 2007 14:11 +0200:
> >>>> We ran some tests on it. Most of them are ok except for fsck. We ran it
> >>>> in the following way:
> >>>>
> >>>> seed5:/tmp/regtest # parted -s /dev/sdb mkpart primary 0 8500
> >>>> seed5:/tmp/regtest # for ((i=1;i<=1000;i++)) do mkfs -t ext2 -q
> >>>> /dev/sdb1; fsck -y -ft ext2 /dev/sdb1; echo iteration $i is done; done
> >>>>
> >>>> fsck is ok most of the time, but once in a while it looks like this
> >>>> (after ~300 iterations):
> >>>>
> >>>> fsck 1.38 (30-Jun-2005)
> >>>> e2fsck 1.38 (30-Jun-2005)
> >>>> Pass 1: Checking inodes, blocks, and sizes
> >>>> Pass 2: Checking directory structure
> >>>> Pass 3: Checking directory connectivity
> >>>> Pass 4: Checking reference counts
> >>>> Pass 5: Checking group summary information
> >>>> /dev/sdb1: 11/1038336 files (0.0% non-contiguous), 32599/2075195 blocks
> >>>> seed5:/tmp/regtest # mkfs -t ext2 -q /dev/sdb1
> >>>> seed5:/tmp/regtest # fsck -y -ft ext2 /dev/sdb1
> >>>>
> >>> Sounds like data corruption. Do you see the same problem with IPoIB?
> >>>
> >> I'm working with Erez and I tried this with tcp session and there 
> >> weren't any problems.
> >
> > Thanks for confirming. So it's the iSER problem.
> >
> > I might break Pete's iSER code so can you revoke the latest three
> > patches and try the same tests?
> >
> >
> > rouen:~/git/tgt$ git-reset --hard HEAD~3
> > HEAD is now at 224ca81... iscsi: add iser support
> >
> > rouen:~/git/tgt$ git-log |head -5
> > commit 224ca81bca8dead8dd355d62422e11fe23f7bdc4
> > Author: Pete Wyckoff <pw at osc.edu>
> > Date:   Mon Dec 10 10:06:27 2007 -0500
> 
> Yes, I still see the same bad behavior with iSER. Pete & Robin - can you
> try to run the same test (see above) with iSER and see if you get the
> same behavior?

I tried your exact script above with 2100 MB then 8500 MB as you did
and could not get any corruption for 1000 iterations.  Maybe my disk
is too slow: internal ATA accessed via a file in ext3.  This is likely
some sort of iSER issue, though there is an off chance of a race in
bs_sync or that neighborhood that only appears at high speeds.

You were able to get lm_dd to break iSER in the past.  That was
something I could repeat and fix.  Any more failures there?  Or if
you can help figure out the nature of the corruption (missing
blocks, rearrangements, etc.), that would definitely help.
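
One way to tell those cases apart, as a rough sketch only: fill the
device with blocks that each carry their own block number, read them
back, and sort every mismatch into "moved" (an intact block that
belongs at another offset), "zeroed" (all zero bytes), or "garbled"
(anything else).  The names make_block, write_pattern and classify,
the MAGIC tag, and the 4096-byte block size are all made up for this
illustration and are not part of tgt or lm_dd.  Note that a read-back
served from the initiator's page cache never crosses the transport,
so the cache has to be bypassed or dropped between the two steps.

```python
import os
import struct

BLOCK_SIZE = 4096     # assumed block size; must be a multiple of 16
MAGIC = b"STGTBLK0"   # hypothetical 8-byte tag, chosen for this sketch


def make_block(index, block_size=BLOCK_SIZE):
    """Return one block: MAGIC plus the block index, repeated to fill it."""
    header = MAGIC + struct.pack("<Q", index)  # 16 bytes
    return header * (block_size // len(header))


def write_pattern(path, nblocks, block_size=BLOCK_SIZE):
    """Write nblocks self-describing blocks to path and flush them out."""
    with open(path, "wb") as f:
        for i in range(nblocks):
            f.write(make_block(i, block_size))
        f.flush()
        os.fsync(f.fileno())


def classify(path, nblocks, block_size=BLOCK_SIZE):
    """Read the pattern back and sort each bad block by kind of damage."""
    report = {"ok": 0, "moved": [], "zeroed": [], "garbled": []}
    with open(path, "rb") as f:
        for i in range(nblocks):
            data = f.read(block_size)
            if data == make_block(i, block_size):
                report["ok"] += 1
            elif len(data) == block_size and not any(data):
                # Block never arrived, or was overwritten with zeros.
                report["zeroed"].append(i)
            elif len(data) == block_size and data[:8] == MAGIC:
                found = struct.unpack("<Q", data[8:16])[0]
                if data == make_block(found, block_size):
                    # Intact block that landed at the wrong offset.
                    report["moved"].append((i, found))
                else:
                    report["garbled"].append(i)
            else:
                # Short read or contents that match no written block.
                report["garbled"].append(i)
    return report
```

Calling write_pattern("/dev/sdb1", n) followed by
classify("/dev/sdb1", n) would destroy whatever is on the partition,
the same as the mkfs loop does.  A report with many "moved" entries
would point at data landing at the wrong offset, while "zeroed" or
"garbled" entries would point at lost or partial transfers.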

		-- Pete


