[Sheepdog] support object recovery - too many open files

Piavlo piavka at cs.bgu.ac.il
Thu Jan 21 08:36:22 CET 2010


MORITA Kazutaka wrote:
> Hi all,
>
> I've updated the next branch.
>   
During the first image creation with
fire-srv3 ~ # qemu-img convert -f raw -O sheepdog  /dev/sys/kvm-img zopa
sd_claim 1351: zopa

I get "Too many open files" for the collie process on just one of the
nodes -> fire-srv4:
...
Jan 21 09:17:58 localhost collie: store_queue_request(540) 0, 4, 40176 , 3, 3
Jan 21 09:17:58 localhost collie: store_queue_request(540) 0, 4, 40176 , 3, 3
Jan 21 09:17:58 localhost collie: listen_handler(313) can't accept a new connection, Too many open files
Jan 21 09:17:59 localhost collie: listen_handler(313) can't accept a new connection, Too many open files
Jan 21 09:17:59 localhost collie: listen_handler(313) can't accept a new connection, Too many open files
Jan 21 09:17:59 localhost collie: listen_handler(313) can't accept a new connection, Too many open files
...
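
To see how close a process is to its descriptor limit, one option on
Linux is to read the "Max open files" row of /proc/<pid>/limits. The
sketch below only parses that row; the function name
parse_max_open_files is made up for illustration, and it assumes the
usual column layout of that file (limit name, soft limit, hard limit,
units).

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    A value of None stands for "unlimited". The helper name and the
    return shape are illustrative assumptions, not part of sheepdog.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns: soft limit, hard limit, units.
            fields = line[len(label):].split()
            soft, hard = fields[0], fields[1]

            def to_int(value):
                return None if value == "unlimited" else int(value)

            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' row found")
```

For the collie process shown in this report it could be called as
parse_max_open_files(open("/proc/15235/limits").read()).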

The collie process has all of its fds consumed by the following sockets:
...
collie  15235 root    9u  sock                0,6      0t0 195033402 can't identify protocol
collie  15235 root   10u  sock                0,6      0t0 195033404 can't identify protocol
collie  15235 root   11u  sock                0,6      0t0 195033406 can't identify protocol
collie  15235 root   12u  sock                0,6      0t0 195033408 can't identify protocol
collie  15235 root   13u  sock                0,6      0t0 195033410 can't identify protocol
collie  15235 root   14u  sock                0,6      0t0 195033412 can't identify protocol
collie  15235 root   15u  sock                0,6      0t0 195033414 can't identify protocol
...
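
The same picture can be obtained without lsof by reading the symlinks
under /proc/<pid>/fd, where socket descriptors show up as targets of
the form "socket:[inode]". The following is a minimal sketch under
that assumption; count_fd_targets is a hypothetical helper, not
something shipped with sheepdog.

```python
import os


def count_fd_targets(fd_dir):
    """Count (sockets, others) among the symlinks in a /proc/<pid>/fd directory.

    Hypothetical helper for illustration: it classifies each entry by
    the text of its symlink target, as exposed by Linux procfs.
    """
    sockets = others = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor may have been closed after listdir() ran.
            continue
        if target.startswith("socket:"):
            sockets += 1
        else:
            others += 1
    return sockets, others
```

Running count_fd_targets("/proc/15235/fd") periodically would show
whether the socket count keeps growing while the image is converted.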

As a result, all cluster queries that depend on the collie process on
the fire-srv4 node hang. For example, on fire-srv3:
shepherd info -t dog -> works
shepherd info -t sheep -> hangs
> git://sheepdog.git.sourceforge.net/gitroot/sheepdog/sheepdog next
>
> Object recovery is partially supported.
> If multiple nodes are down sequentially, object recovery wouldn't work.
> But otherwise, lost object should be recovered correctly.
>
> Thanks.
>
>   



