After some more thinking, this is also wrong. It would be really great to have some documentation about this: how exactly do reads and writes work during recovery?

I first thought we could simply reject reads. Writes we could reject until object recovery starts; after that point we cache the written data (but still reject reads). When we have received the whole object, we merge it with the write cache. But that is not the way it works?

- Dietmar

From: Dietmar Maurer
Sent: Friday, 20 July 2012 17:59
To: Dietmar Maurer; Liu Yuan
Cc: Chris Webb; sheepdog at lists.wpkg.org
Subject: RE: [sheepdog] [PATCH] sheep: add a kill node operation

I guess I finally got it ;-)

We only get slow responses for the few objects where we received an epoch mismatch; most requests are directed to a living node which already has the object?

That still means that I/O on the KVM side is extremely slow during the (2h) recovery?

From: Liu Yuan [mailto:namei.unix at gmail.com]
Sent: Friday, 20 July 2012 16:18
To: Dietmar Maurer
Cc: sheepdog at lists.wpkg.org; Chris Webb
Subject: RE: [sheepdog] [PATCH] sheep: add a kill node operation

Nope, at most a few dozen seconds, as far as I have observed.

On 2012-7-20 at 10:04 PM, "Dietmar Maurer" <dietmar at proxmox.com> wrote:
> > Let's assume a complete recovery takes about 2 hours. Does that mean
> > my VMs are blocked for 2 hours (instead of continuing operation on other
> > nodes)?
> >
>
> This is actually why we spent so many lines on the recovery and IO patches. There are
> some mechanisms, such as request retry and oid scheduling, that try to complete
> any request within a very short period, because I/O from a VM is timed out by the
> guest kernel, e.g. after 120 seconds for the Linux kernel.

So if a VM accesses such an object, it would block for 2 hours (confused)?
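The scheme Dietmar proposes at the top of this thread (reject reads; reject writes until object recovery starts; after that, cache writes and merge them once the whole object has arrived) can be sketched as a small per-object state machine. This is a minimal hypothetical model of that proposal only, not of how sheepdog actually handles I/O during recovery: the names `State`, `RetryLater`, and `RecoveringObject` are invented for illustration, "reject" is modelled as raising an exception the client would retry on, and a single in-memory buffer stands in for the object's storage.

```python
from enum import Enum, auto


class State(Enum):
    WAITING = auto()     # object is stale on this node; recovery not started yet
    RECOVERING = auto()  # object data is being fetched from other nodes
    RECOVERED = auto()   # full object received and merged with cached writes


class RetryLater(Exception):
    """Hypothetical 'rejected, client should retry' signal."""


class RecoveringObject:
    """Hypothetical model of a single object under the proposed scheme."""

    def __init__(self, size):
        self.size = size
        self.state = State.WAITING
        self.data = bytearray(size)
        # Writes accepted during recovery, as (offset, payload) in arrival order.
        self.write_cache = []

    def start_recovery(self):
        if self.state is not State.WAITING:
            raise RuntimeError("recovery already started")
        self.state = State.RECOVERING

    def read(self, offset, length):
        # Reads are rejected until the whole object is present and merged.
        if self.state is not State.RECOVERED:
            raise RetryLater("object not recovered yet")
        return bytes(self.data[offset:offset + length])

    def write(self, offset, payload):
        if offset < 0 or offset + len(payload) > self.size:
            raise ValueError("write outside object bounds")
        if self.state is State.WAITING:
            # Before object recovery starts, writes are rejected as well.
            raise RetryLater("recovery not started")
        if self.state is State.RECOVERING:
            # Buffer the write; it is applied on top of the recovered data later.
            self.write_cache.append((offset, bytes(payload)))
            return
        self.data[offset:offset + len(payload)] = payload

    def finish_recovery(self, recovered):
        if self.state is not State.RECOVERING:
            raise RuntimeError("recovery not in progress")
        if len(recovered) != self.size:
            raise ValueError("incomplete object")
        self.data = bytearray(recovered)
        # Replay cached writes in arrival order so later writes win on overlap.
        for offset, payload in self.write_cache:
            self.data[offset:offset + len(payload)] = payload
        self.write_cache.clear()
        self.state = State.RECOVERED


obj = RecoveringObject(8)
obj.start_recovery()
obj.write(2, b"XY")                  # cached, not yet visible
obj.finish_recovery(b"abcdefgh")     # recovered data plus the cached write
print(obj.read(0, 8))                # b'abXYefgh'
```

The cost of this design shows up directly in the model: any read of an object still in `WAITING` or `RECOVERING` fails until that whole object has been fetched, so a VM touching it keeps retrying for as long as that object's recovery takes, which is exactly the latency concern raised in the thread. Liu Yuan's reply instead points to request retry and oid scheduling as the mechanisms meant to keep such requests short.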