[Sheepdog] support object recovery - too many open files

Piavlo piavka at cs.bgu.ac.il
Mon Jan 25 13:33:47 CET 2010


MORITA Kazutaka wrote:
> On 2010/01/25 18:39, Piavlo wrote:
>   
>>>>  During first image creation with
>>>> fire-srv3 ~ # qemu-img convert -f raw -O sheepdog  /dev/sys/kvm-img zopa
>>>> sd_claim 1351: zopa
>>>>
>>>> I get "Too many open files" for the collie process on just one of the
>>>> nodes -> fire-srv4:
>>>>     
>>>>         
>>> There seem to be fd leaks, sorry.
>>> Probably it is because of the request forwarding code.
>>> I'll fix it later.
>>>   
>>>       
>> Was it fixed with today's merge of the next and master branches?
>>     
>
> Probably, it is fixed.
> Could you try the current master branch?
>   

Now it gets stuck on an error:

fire-srv3 ~ #  qemu-img convert -f raw -O sheepdog  /dev/sys/kvm-img zopa
sd_claim 1351: zopa
aio_read_response 855: No object found
....

There are errors in the logs on fire-srv4 only:
Jan 25 14:26:16 localhost collie: store_queue_request(540) 0, 4, 40144 , 3, 3
Jan 25 14:26:16 localhost collie: store_queue_request(540) 0, 4, 40144 , 3, 3
Jan 25 14:26:16 localhost collie: store_queue_request(540) 0, 4, 40144 , 3, 3
Jan 25 14:26:16 localhost collie: store_queue_request(540) 0, 4, 40144 , 3, 3
Jan 25 14:26:16 localhost collie: store_queue_request(540) 0, 4, 40144 , 3, 3
Jan 25 14:26:16 localhost collie: store_queue_request(573) failed, 0, 4, 40144 , 3, 3

and the collie process has 244 sockets open (though not 1024)
lsof -p 7159 | awk '$5 ~ /^sock$/' | wc
    214    2354   19688
fire-srv4 ~ # 

On the other nodes, the collie processes do not have any such sockets open.
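
To watch the socket count against the fd limit over time, the lsof check above could be done with something like the following. This is only a rough sketch and assumes Linux, where /proc/<pid>/fd holds one symlink per open descriptor (sockets show up as "socket:[inode]") and /proc/<pid>/limits lists the "Max open files" limit. The function names are made up for illustration and are not part of sheepdog.

```python
import os


def soft_nofile_limit(limits_text):
    """Return the soft "Max open files" limit parsed from the text of
    /proc/<pid>/limits, or None if it is missing or unlimited."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # fields: Max, open, files, <soft>, <hard>, files
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def count_sockets(link_targets):
    """Count fd link targets that look like "socket:[inode]"."""
    return sum(1 for target in link_targets if target.startswith("socket:"))


def fd_targets(pid):
    """Read the symlink target of every open fd of a process (Linux /proc)."""
    fd_dir = "/proc/%s/fd" % pid
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            pass  # the fd was closed while we were listing
    return targets


def report(pid):
    """Print total fds, socket fds and the soft limit for one process."""
    targets = fd_targets(pid)
    with open("/proc/%s/limits" % pid) as f:
        limit = soft_nofile_limit(f.read())
    print("pid %s: %d fds, %d sockets, soft limit %s"
          % (pid, len(targets), count_sockets(targets), limit))
```

Calling report(7159) on fire-srv4 (as root or as the user running collie, since other users cannot read the process's fd directory) would print one summary line for the collie process.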

> Thanks,
>
> Kazutaka Morita
>   



