[Sheepdog] support object recovery - too many open files

Piavlo piavka at cs.bgu.ac.il
Sun Jan 31 07:21:29 CET 2010


 Hi,

Any progress with this issue?

Thanks
Alex


Piavlo wrote:
> MORITA Kazutaka wrote:
>   
>> On 2010/01/25 18:39, Piavlo wrote:
>>   
>>     
>>>>>  During the first image creation with
>>>>> fire-srv3 ~ # qemu-img convert -f raw -O sheepdog  /dev/sys/kvm-img zopa
>>>>> sd_claim 1351: zopa
>>>>>
>>>>> I get "Too many open files" for the collie process on just one of the
>>>>> nodes -> fire-srv4:
>>>>>     
>>>>>         
>>>>>           
>>>> There seem to be fd leaks, sorry.
>>>> It is probably caused by the request forwarding code.
>>>> I'll fix it later.
>>>>   
>>>>       
>>>>         
>>> Was it fixed with today's merge of the next and master branches?
>>>     
>>>       
>> Probably, it is fixed.
>> Could you try the current master branch?
>>   
>>     
>
> Now it gets stuck on an error:
>
> fire-srv3 ~ #  qemu-img convert -f raw -O sheepdog  /dev/sys/kvm-img zopa
> sd_claim 1351: zopa
> aio_read_response 855: No object found
> ....
>
> There is an error in the fire-srv4 logs only:
> Jan 25 14:26:16 localhost collie: store_queue_request(540) 0, 4, 40144 , 3, 3
> Jan 25 14:26:16 localhost collie: store_queue_request(540) 0, 4, 40144 , 3, 3
> Jan 25 14:26:16 localhost collie: store_queue_request(540) 0, 4, 40144 , 3, 3
> Jan 25 14:26:16 localhost collie: store_queue_request(540) 0, 4, 40144 , 3, 3
> Jan 25 14:26:16 localhost collie: store_queue_request(540) 0, 4, 40144 , 3, 3
> Jan 25 14:26:16 localhost collie: store_queue_request(573) failed, 0, 4, 40144 , 3, 3
>
> and the collie process has 244 sockets open (though not 1024):
> lsof -p 7159 | awk '$5 ~ /^sock$/' | wc
>     214    2354   19688
> fire-srv4 ~ # 
>
> on other nodes collies do not have any such sockets open
>
>   
>> Thanks,
>>
>> Kazutaka Morita
>>   
>>     
>
>
>   
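
For anyone who wants to repeat the socket count quoted above without lsof,
here is a rough sketch that reads /proc directly. It assumes Linux with /proc
mounted and enough permission to read the target process's fd directory. The
function names count_sockets and fd_soft_limit are made up for illustration
and are not part of sheepdog.

```shell
#!/bin/sh
# Sketch only: count socket descriptors held by a process via /proc.
# Assumes Linux; count_sockets and fd_soft_limit are hypothetical helpers.

count_sockets() {
    pid=$1
    n=0
    for fd in /proc/"$pid"/fd/*; do
        # readlink prints something like "socket:[12345]" for a socket
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            socket:*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Print the soft "Max open files" limit of a process.
fd_soft_limit() {
    awk '/^Max open files/ { print $4 }' /proc/"$1"/limits
}
```

With the collie PID from the report this would be run as
"count_sockets 7159" and "fd_soft_limit 7159". Sampling the count every few
seconds during the qemu-img convert should show whether it keeps growing
towards the limit, which is what a descriptor leak would look like.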



