[Sheepdog] support object recovery - too many open files
Piavlo
piavka at cs.bgu.ac.il
Thu Jan 21 08:36:22 CET 2010
MORITA Kazutaka wrote:
> Hi all,
>
> I've updated the next branch.
>
During the first image creation with
fire-srv3 ~ # qemu-img convert -f raw -O sheepdog /dev/sys/kvm-img zopa
sd_claim 1351: zopa
I get "Too many open files" for the collie process on just one of the
nodes -> fire-srv4:
...
Jan 21 09:17:58 localhost collie: store_queue_request(540) 0, 4, 40176 , 3, 3
Jan 21 09:17:58 localhost collie: store_queue_request(540) 0, 4, 40176 , 3, 3
Jan 21 09:17:58 localhost collie: listen_handler(313) can't accept a new connection, Too many open files
Jan 21 09:17:59 localhost collie: listen_handler(313) can't accept a new connection, Too many open files
Jan 21 09:17:59 localhost collie: listen_handler(313) can't accept a new connection, Too many open files
Jan 21 09:17:59 localhost collie: listen_handler(313) can't accept a new connection, Too many open files
...
The collie process has all its fds consumed by the following sockets:
...
collie 15235 root 9u sock 0,6 0t0 195033402 can't identify protocol
collie 15235 root 10u sock 0,6 0t0 195033404 can't identify protocol
collie 15235 root 11u sock 0,6 0t0 195033406 can't identify protocol
collie 15235 root 12u sock 0,6 0t0 195033408 can't identify protocol
collie 15235 root 13u sock 0,6 0t0 195033410 can't identify protocol
collie 15235 root 14u sock 0,6 0t0 195033412 can't identify protocol
collie 15235 root 15u sock 0,6 0t0 195033414 can't identify protocol
...
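To watch how the descriptors pile up, something like the following could
be used. It is only a minimal sketch, assuming a Linux /proc filesystem;
the helper names (classify, summarize, read_fd_targets) are made up for
illustration and are not part of sheepdog. It groups the open descriptors
of a process by kind, so a growing "socket" count like the one in the
listing above stands out. The limit it prints is the soft RLIMIT_NOFILE of
the script itself, which may differ from the limit of the collie process.

```python
import os
import resource
import sys
from collections import Counter


def classify(target):
    """Map a /proc/<pid>/fd link target to a coarse kind."""
    # Link targets look like "socket:[195033402]", "pipe:[1234]",
    # "anon_inode:[eventpoll]", or a plain path for regular files.
    for prefix in ("socket", "pipe", "anon_inode"):
        if target.startswith(prefix + ":"):
            return prefix
    return "file"


def summarize(targets):
    """Count descriptors per kind from an iterable of link targets."""
    return Counter(classify(t) for t in targets)


def read_fd_targets(pid):
    """Return the link target of every open descriptor of pid (Linux only)."""
    fd_dir = "/proc/%s/fd" % pid
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            pass  # descriptor was closed while we were scanning
    return targets


if __name__ == "__main__":
    # Hypothetical usage: pass the collie pid, or nothing to inspect this script.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    if os.path.isdir("/proc/%s/fd" % pid):
        counts = summarize(read_fd_targets(pid))
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        print("open descriptors:", sum(counts.values()), dict(counts))
        print("soft RLIMIT_NOFILE of this script:", soft)
```

Run as root with the collie pid (15235 here) as the only argument, a few
seconds apart; if the socket count keeps climbing toward the limit, the
sockets are most likely not being closed somewhere.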
And of course all cluster queries that depend on the collie process on
the fire-srv4 node hang. For example, on fire-srv3:
shepherd info -t dog -> works
shepherd info -t sheep -> hangs
> git://sheepdog.git.sourceforge.net/gitroot/sheepdog/sheepdog next
>
> Object recovery is partially supported.
> If multiple nodes are down sequentially, object recovery wouldn't work.
> But otherwise, lost object should be recovered correctly.
>
> Thanks.
>
>