[Stgt-devel] Errors in fsck with iSER
Erez Zilber
erezz
Wed Dec 26 15:23:00 CET 2007
Pete Wyckoff wrote:
> erezz at Voltaire.COM wrote on Tue, 18 Dec 2007 14:11 +0200:
>
>>>>>> We ran some tests on it. Most of them are ok except for fsck. We ran it
>>>>>> in the following way:
>>>>>>
>>>>>> seed5:/tmp/regtest # parted -s /dev/sdb mkpart primary 0 8500
>>>>>> seed5:/tmp/regtest # for ((i=1;i<=1000;i++)) do mkfs -t ext2 -q
>>>>>> /dev/sdb1; fsck -y -ft ext2 /dev/sdb1; echo iteration $i is done; done
>>>>>>
>>>>>> fsck is ok most of the time, but once in a while it looks like this
>>>>>> (after ~300 iterations):
>>>>>>
>>>>>> fsck 1.38 (30-Jun-2005)
>>>>>> e2fsck 1.38 (30-Jun-2005)
>>>>>> Pass 1: Checking inodes, blocks, and sizes
>>>>>> Pass 2: Checking directory structure
>>>>>> Pass 3: Checking directory connectivity
>>>>>> Pass 4: Checking reference counts
>>>>>> Pass 5: Checking group summary information
>>>>>> /dev/sdb1: 11/1038336 files (0.0% non-contiguous), 32599/2075195 blocks
>>>>>> seed5:/tmp/regtest # mkfs -t ext2 -q /dev/sdb1
>>>>>> seed5:/tmp/regtest # fsck -y -ft ext2 /dev/sdb1
>>>>>>
>>>>>>
>>>>>>
>>>>> Sounds like data corruption. Do you see the same problem with IPoIB?
>>>>>
>>>>>
>>>> I'm working with Erez and I tried this with tcp session and there
>>>> weren't any problems.
>>>>
>>> Thanks for confirming. So it's the iSER problem.
>>>
>>> I might break Pete's iSER code so can you revoke the latest three
>>> patches and try the same tests?
>>>
>>>
>>> rouen:~/git/tgt$ git-reset --hard HEAD~3
>>> HEAD is now at 224ca81... iscsi: add iser support
>>>
>>> rouen:~/git/tgt$ git-log |head -5
>>> commit 224ca81bca8dead8dd355d62422e11fe23f7bdc4
>>> Author: Pete Wyckoff <pw at osc.edu>
>>> Date: Mon Dec 10 10:06:27 2007 -0500
>>>
>> Yes, I still see the same bad behavior with iSER. Pete & Robin - can you
>> try to run the same test (see above) with iSER and see if you get the
>> same behavior?
>>
>
> I tried your exact script above with 2100 MB then 8500 MB as you did
> and could not get any corruption for 1000 iterations. Maybe my disk
> is too slow---internal ATA accessed via file in ext3. Likely some
> sort of iser issue, though there is an off-chance of a race in bs_sync or
> that neighborhood that only appears at high speeds.
>
> You were able to get lm_dd to break iser in the past. That was
> something I could repeat and fix. Any more failures there? Or if
> you can help figure out the nature of the corruption: missing
> blocks or rearrangements, etc., that would definitely help.
>
I ran it again on a different target machine with different storage (I
suspect that the other storage that I used is bad), and it looks ok now
(no fsck errors). We will run more tests, and if we find anything, I
will send a message.
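
On the question of whether a corruption is missing blocks or a
rearrangement: a minimal sketch of one way to tell the two apart in later
runs, independent of fsck. Each block is stamped with its own index, then
read back and classified. The function names, the "blk N" stamp format and
the 4096-byte block size are assumptions made for this sketch; none of it
comes from tgt or from the test script quoted above.

```shell
#!/bin/sh
# Hypothetical helper, not part of tgt or of the original regression test.
# Block N of the target is stamped with the text "blk N" (NUL padded), so a
# read-back can distinguish rearranged blocks from missing ones.
BS=4096   # block size assumed for this sketch

# stamp_blocks TARGET COUNT: write COUNT stamped blocks to TARGET.
stamp_blocks() {
    i=0
    while [ "$i" -lt "$2" ]; do
        # conv=sync pads the short stamp with NULs up to a full block;
        # notrunc keeps the rest of a regular file intact.
        printf 'blk %d\n' "$i" |
            dd of="$1" bs="$BS" seek="$i" conv=sync,notrunc 2>/dev/null
        i=$((i + 1))
    done
}

# check_blocks TARGET COUNT: read every block back and classify mismatches.
# Prints one line per bad block plus a summary; returns non-zero if any
# block is bad.
check_blocks() {
    i=0
    bad=0
    while [ "$i" -lt "$2" ]; do
        # First line of the block, with the NUL padding stripped.
        got=$(dd if="$1" bs="$BS" skip="$i" count=1 2>/dev/null |
              head -n 1 | tr -d '\000')
        case $got in
            "blk $i")
                ;;  # block is where it belongs
            "blk "[0-9]*)
                src=${got#blk }
                echo "block $i: rearranged (holds data for $src)"
                bad=$((bad + 1))
                ;;
            *)
                echo "block $i: missing or garbage"
                bad=$((bad + 1))
                ;;
        esac
        i=$((i + 1))
    done
    echo "$bad of $2 blocks bad"
    [ "$bad" -eq 0 ]
}
```

Usage would be along the lines of `stamp_blocks /dev/sdb1 1000` followed by
`check_blocks /dev/sdb1 1000` on the initiator. This overwrites the start of
the device. The read-back may also be served from the initiator's page cache
rather than from the target, so a real test would probably want to bypass the
cache (GNU dd's iflag=direct is one option).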
Erez