[sheepdog] Re: BUG: dirty object cache stops pushing

ivanzhu 402017647 at qq.com
Sat Jul 5 18:07:07 CEST 2014


Hi, Xu Fang

Good catch!

This issue arises when the number of nodes is less than the number of
VDIs cached on a host, because the push work for one VDI cache requires
at least 2 threads. Right?

I'm not familiar enough with the sheepdog code, but I agree that
WQ_UNLIMITED is not reasonable. (2 * number of object caches) is a good
choice, but first you need to get that number. I looked through the code
quickly and haven't found any record in global_cache of how many VDIs
are in the cache, so this may need more work, and we should do it.
We could avoid side effects on the other WQ_DYNAMIC users via a new
option WQ_CACHE, besides WQ_ORDERED, WQ_DYNAMIC, and WQ_UNLIMITED.
Hope other guys can give a more direct way. :)
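
For illustration, a minimal sketch of what that new option might look
like: an extra case in wq_get_roof(), reading a hypothetical
nr_object_caches counter that the object cache layer would have to
maintain (neither WQ_CACHE nor the counter exists in the current code):

     case WQ_CACHE:
          /* hypothetical: 2 workers per live VDI cache, so every
           * do_background_push work can have a do_push_object
           * worker running beside it */
          nr = uatomic_read(&nr_object_caches) * 2;
          break;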


For now, a quick and simple way to work around this issue is to deploy
sheepdog on more nodes.


Thanks & Regards
Ivan

Date: Fri, 4 Jul 2014 12:21:21 +0800
From: Xu Fang <xufango at gmail.com>
To: sheepdog at lists.wpkg.org, namei.unix at gmail.com
Subject: [sheepdog] BUG: dirty object cache stops pushing

5 sheepdog nodes are running with object cache enabled, and more than 10
VMs are running on each node.

I mount a tmpfs at the /cache directory and start sheep with:

sheep -l level=debug -n \
  /home/admin/sheepdogmetadata,/disk1/sheepdogstoredata,/disk2/sheepdogstoredata,/disk3/sheepdogstoredata,/disk4/sheepdogstoredata,/disk5/sheepdogstoredata,/disk7/sheepdogstoredata,/disk8/sheepdogstoredata,/disk9/sheepdogstoredata \
  -w size=20G dir=/cache -b 0.0.0.0 -y **.**.**.** \
  -c zookeeper:**.**.**.**:2181

There is a possibility that all object push threads end up running
do_background_push work, while no thread is running do_push_object
work.

In my test environment, this occurs:

[1] 13:09:30 [SUCCESS] vmsecdomainhost1
Name                    Tag      Total    Dirty    Clean
win7_type4_node8.img             4.7 GB   4.7 GB   4.0 MB
standard.img            images   0.0 MB   0.0 MB   0.0 MB
win7_type4_node1.img             4.8 GB   4.8 GB   28 MB
win7_type4_node10.img            5.0 GB   4.9 GB   32 MB
win7_type4_node2.img             4.7 GB   4.6 GB   68 MB
win7_type4_node3.img             4.7 GB   4.7 GB   4.0 MB
win7_type4_node6.img             4.8 GB   4.7 GB   40 MB
win7_type4_node4.img             4.8 GB   4.7 GB   20 MB
win7_type4_node7.img             4.8 GB   4.8 GB   24 MB
win7_type4_node9.img             4.7 GB   4.7 GB   32 MB
win7_type4_node5.img             4.2 GB   4.2 GB   8.0 MB

Cache size 20 GB, used 47 GB, non-directio


I found that 7 object push threads are working on the "oc_push" work
queue, all with the identical call stack:

Thread 37 (Thread 0x7f3c2a1fc700 (LWP 116747)):
#0  0x0000003916eda37d in read () from /lib64/libc.so.6
#1  0x0000003916ee7a1e in eventfd_read () from /lib64/libc.so.6
#2  0x000000000042a89d in eventfd_xread ()
#3  0x0000000000419acb in object_cache_push ()
*#4  0x0000000000419b83 in do_background_push ()*
#5  0x000000000042e56a in worker_routine ()
#6  0x0000003917207851 in start_thread () from /lib64/libpthread.so.0
#7  0x0000003916ee767d in clone () from /lib64/libc.so.6

Threads 36 (LWP 116775), 35 (LWP 116889), 34 (LWP 116891), 33 (LWP
117040), 32 (LWP 117041), and 31 (LWP 117044) show exactly the same
stack: all are blocked in eventfd_read() inside object_cache_push(),
called from do_background_push().

No thread is actually pushing objects, so no object_cache_push work ever
finishes.


In gdb, we can see the state of each object cache inside
object_cache_push:

vid = 9627038, push_count = 26, dirty_count = 150, total_count = 154
vid = 3508964, push_count = 22, dirty_count = 1456, total_count = 1464
vid = 360229, push_count = 18, dirty_count = 1437, total_count = 1444
vid = 9678955, push_count = 34, dirty_count = 1462, total_count = 1470
vid = 9008538, push_count = 17, dirty_count = 1490, total_count = 1493
vid = 2383510, push_count = 28, dirty_count = 1494, total_count = 1498
vid = 16192623, push_count = 19, dirty_count = 1447, total_count = 1451

push_count is far less than dirty_count, and no thread is doing
do_push_object work, so the completion signal at the end of
do_push_object:

static void do_push_object(struct work *work)
{
     ...
     /* the last finished push wakes up the waiting object_cache_push() */
     if (uatomic_sub_return(&oc->push_count, 1) == 0)
          eventfd_xwrite(oc->push_efd, 1);
}

will never fire.
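
For context, here is a simplified sketch of the waiting side (the exact
field and list names below are my reading, not a verbatim quote):
object_cache_push() queues one do_push_object work per dirty object on
the same "oc_push" queue and then blocks in eventfd_xread(), exactly
frames #0-#3 in the stacks above:

static int object_cache_push(struct object_cache *oc)
{
     struct object_cache_entry *entry;

     list_for_each_entry(entry, &oc->dirty_head, dirty_list) {
          uatomic_inc(&oc->push_count);
          entry->work.fn = do_push_object;
          queue_work(sys->oc_push_wqueue, &entry->work);
     }
     /* blocks until push_count drops to 0; but with every oc_push
      * thread parked here, the queued do_push_object works never
      * get a thread to run on */
     eventfd_xread(oc->push_efd);
     return SD_RES_SUCCESS;
}

So each do_background_push worker occupies one oc_push thread while it
waits for child works on the very same bounded queue: a classic
self-deadlock once the thread roof is reached.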

And in

static bool wq_need_grow(struct wq_info *wi)
{
     if (wi->nr_threads < uatomic_read(&wi->nr_queued_work) &&
         wi->nr_threads * 2 <= wq_get_roof(wi)) {
          wi->tm_end_of_protection = get_msec_time() +
               WQ_PROTECTION_PERIOD;
          return true;
     }

     return false;
}

nr_threads is 7 and wq_get_roof(wi) returns 10 (2 * 5 nodes), so the
grow condition fails (7 * 2 = 14 > 10). No more threads will be
created, and all existing threads are waiting for do_push_object work
that can never run.


I hope the above information makes the problem clear to everyone.


Let's discuss the solution now.
The oc_push_wqueue is created with WQ_DYNAMIC:

sys->oc_push_wqueue = create_work_queue("oc_push", WQ_DYNAMIC)

So the roof on the number of threads will be:

     case WQ_DYNAMIC:
          /* FIXME: 2 * nr_nodes threads. No rationale yet. */
          nr = nr_nodes * 2;
          break;

There are also other work queues created with WQ_DYNAMIC:

wq = create_work_queue("vdi check", WQ_DYNAMIC);

sys->http_wqueue = create_work_queue("http", WQ_DYNAMIC);


Creating oc_push with WQ_UNLIMITED would not be reasonable either.


*I think that the number of threads working on oc_push should be (2 *
number of object caches), not (2 * nr_nodes), to ensure that there are
always enough threads doing do_push_object work.*
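
As a sketch of where that number could come from (hypothetical names;
the current code keeps no such counter), the cache layer could maintain
a live count of VDI caches, updated from the cache create and free
paths, for the work queue code to read:

/* hypothetical counter of live VDI caches (not in current sheepdog) */
static int nr_object_caches;

/* call with +1 when a VDI cache is created and -1 when it is freed */
static void account_object_cache(int delta)
{
     uatomic_add(&nr_object_caches, delta);
}

/* read by wq_get_roof() to size the oc_push work queue */
int get_object_cache_count(void)
{
     return uatomic_read(&nr_object_caches);
}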


With your advice, I would like to submit patches to solve this problem.


Thanks.

-- 
Xu Fang





