[Stgt-devel] Errors in fsck with iSER

Wed Dec 26 15:23:00 CET 2007

Pete Wyckoff wrote:
> erezz at Voltaire.COM wrote on Tue, 18 Dec 2007 14:11 +0200:
>   
>>>>>> We ran some tests on it. Most of them are ok except for fsck. We ran it
>>>>>> in the following way:
>>>>>>
>>>>>> seed5:/tmp/regtest # parted -s /dev/sdb mkpart primary 0 8500
>>>>>> seed5:/tmp/regtest # for ((i=1;i<=1000;i++)) do mkfs -t ext2 -q
>>>>>> /dev/sdb1; fsck -y -ft ext2 /dev/sdb1; echo iteration $i is done; done
>>>>>>
>>>>>> fsck is ok most of the time, but once in a while it looks like this
>>>>>> (after ~300 iterations):
>>>>>>
>>>>>> fsck 1.38 (30-Jun-2005)
>>>>>> e2fsck 1.38 (30-Jun-2005)
>>>>>> Pass 1: Checking inodes, blocks, and sizes
>>>>>> Pass 2: Checking directory structure
>>>>>> Pass 3: Checking directory connectivity
>>>>>> Pass 4: Checking reference counts
>>>>>> Pass 5: Checking group summary information
>>>>>> /dev/sdb1: 11/1038336 files (0.0% non-contiguous), 32599/2075195 blocks
>>>>>> seed5:/tmp/regtest # mkfs -t ext2 -q /dev/sdb1
>>>>>> seed5:/tmp/regtest # fsck -y -ft ext2 /dev/sdb1
>>>>>>
>>>>> Sounds like data corruption. Do you see the same problem with IPoIB?
>>>>>
>>>> I'm working with Erez and I tried this with a TCP session and there
>>>> weren't any problems.
>>>>         
>>> Thanks for confirming. So it's an iSER problem.
>>>
>>> I might have broken Pete's iSER code, so can you revert the latest three
>>> patches and try the same tests?
>>>
>>>
>>> rouen:~/git/tgt$ git-reset --hard HEAD~3
>>> HEAD is now at 224ca81... iscsi: add iser support
>>>
>>> rouen:~/git/tgt$ git-log |head -5
>>> commit 224ca81bca8dead8dd355d62422e11fe23f7bdc4
>>> Author: Pete Wyckoff <pw at osc.edu>
>>> Date:   Mon Dec 10 10:06:27 2007 -0500
>>>       
>> Yes, I still see the same bad behavior with iSER. Pete & Robin - can you
>> try to run the same test (see above) with iSER and see if you get the
>> same behavior?
>>     
>
> I tried your exact script above with 2100 MB then 8500 MB as you did
> and could not get any corruption for 1000 iterations.  Maybe my disk
> is too slow---internal ATA accessed via file in ext3.  Likely some
> sort of iser issue, though there is an off-chance of a race in bs_sync or
> that neighborhood that only appears at high speeds.
>
> You were able to get lm_dd to break iser in the past.  That was
> something I could repeat and fix.  Any more failures there?  Or if
> you can help figure out the nature of the corruption:  missing
> blocks or rearrangements, etc., that would definitely help.
>   

I ran it again on a different target machine with different storage (I
suspect that the other storage that I used is bad), and it looks ok now
(no fsck errors). We will run more tests, and if we find anything, I
will send a message.
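
If the errors come back, one possible way to answer the question about
the nature of the corruption (missing blocks or rearrangements) is to
stamp every block with its own index and a pass number, then read the
device back and classify each mismatch. The following is only a rough
sketch and not part of tgt: the 512-byte block size, the marker string,
and the names make_block, write_pattern and classify are all made up
for illustration. On a real target the read-back would also have to
bypass the initiator's page cache (for example by dropping caches or
reconnecting the session), which this sketch does not do.

```python
import os
import struct

BLOCK = 512                       # assumed block size, for illustration
MAGIC = b"TGTCHK01"               # hypothetical marker, not a tgt constant
HEADER = struct.Struct("<8sQQ")   # marker, block index, pass number


def make_block(index, pass_no):
    """Build one block that records where and when it was written."""
    header = HEADER.pack(MAGIC, index, pass_no)
    filler = bytes([index & 0xFF]) * (BLOCK - HEADER.size)
    return header + filler


def write_pattern(path, nblocks, pass_no):
    """Overwrite the first nblocks blocks of path (destroys existing data)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    with os.fdopen(fd, "wb") as f:
        for index in range(nblocks):
            f.write(make_block(index, pass_no))
        f.flush()
        os.fsync(f.fileno())


def classify(path, nblocks, pass_no):
    """Read the blocks back and sort every mismatch into a category."""
    report = {"ok": 0, "stale": [], "misplaced": [], "garbage": []}
    with open(path, "rb") as f:
        for index in range(nblocks):
            data = f.read(BLOCK)
            if data == make_block(index, pass_no):
                report["ok"] += 1
                continue
            if len(data) < BLOCK:
                report["garbage"].append(index)   # short read
                continue
            magic, found_index, found_pass = HEADER.unpack_from(data)
            if magic != MAGIC or data != make_block(found_index, found_pass):
                report["garbage"].append(index)   # not a block we wrote
            elif found_index != index:
                # an intact block that belongs at another offset
                report["misplaced"].append((index, found_index))
            else:
                # right place, older pass: the new write never landed
                report["stale"].append((index, found_pass))
    return report
```

Running write_pattern with an increasing pass number, followed by
classify with the same number, would show whether a failure is a lost
write ("stale"), a rearrangement ("misplaced"), or something else
("garbage"). Note that pointing it at /dev/sdb1 wipes whatever is on
the partition.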

Erez