[sheepdog] [PATCH 0/2] avoid heavy disk I/O in the main thread
Hitoshi Mitake
mitake.hitoshi at lab.ntt.co.jp
Wed Oct 15 11:20:06 CEST 2014
At Wed, 15 Oct 2014 17:26:50 +0900,
tuji wrote:
>
> Hi,
>
> > I am going to teset it.
> > And I will check real elapsed time of purgeing for our system.
> > It will help to know it is bottoleneck or not.
>
> I check elapsed time.
>
> It was 101 sec for about 1TB(263967 object) delete.
>
> Oct 15 10:58:10 INFO [main] recover_object_main(624) object c7ab8e0001de02 is recovered (263966/263967)
> Oct 15 10:58:11 INFO [main] recover_object_main(624) object c7ab8e0003fe15 is recovered (263967/263967)
> Oct 15 10:58:11 INFO [main] default_cleanup(177) cleaning pruge_state_dir
> Oct 15 10:59:52 INFO [main] default_cleanup(179) cleaning pruge_state_dir done
>
> It is not moment time. I think we needs mechanism avoid-blocking during
> pruge state dir.
>
> Mitake-san,
>
> Becase our system running on 0.7.x,
> I can not test with your avoid-blocking branch.
>
> I have tested v0.7.6 + your patch.
> https://github.com/masa-tu/sheepdog/tree/v0.7.6-avoid-blocking
>
> let me know if this patch is not correct or If it is better test with your
> original blanch.
>
> The result was not good.
> I/O had been blocked when cleaning pruge_state_dir was started.
> and after 2 min, It said "Cannot allocate memory" and crash.
>
> Oct 15 16:28:33 INFO [main] recover_object_main(624) object 84a0df00005937 is recovered (155121/155125)
> Oct 15 16:28:33 INFO [main] recover_object_main(624) object 84a0df00013fd9 is recovered (155122/155125)
> Oct 15 16:28:33 INFO [main] recover_object_main(624) object 84a0df00011bbf is recovered (155123/155125)
> Oct 15 16:28:33 INFO [main] recover_object_main(624) object 84a0df0002fdfe is recovered (155124/155125)
> Oct 15 16:28:33 INFO [main] recover_object_main(624) object 84a0df00002ec0 is recovered (155125/155125)
> Oct 15 16:28:33 INFO [main] default_cleanup(168) cleaning pruge_state_dir start
> Oct 15 16:30:20 ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:21 ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:21 ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:21 ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:21 ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:21 ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> 〜省略〜
> Oct 15 16:30:37 ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:37 ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:37 ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:37 ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:37 ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:37 ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:37 ERROR [main] crash_handler(524) sheep pid 6694 exited unexpectedly.
>
> I had been running stress command and watching by vmstat command on the
> vm.
> vmstat saied that the I/O was stop at 16:28:34.
>
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ ---timestamp---
> r b swpd free buff cache si so bi bo in cs us sy id wa st
> 1 3 0 971316 28908 792136 0 0 0 34380 259 181 0 8 14 78 0 2014-10-15 16:28:08 JST
> 0 3 0 941928 28908 820760 0 0 0 30696 161 132 0 4 0 96 1 2014-10-15 16:28:09 JST
> 0 3 0 903116 28908 857456 0 0 0 53328 233 185 0 7 0 93 0 2014-10-15 16:28:10 JST
> 1 3 0 865668 28908 896516 0 0 0 21044 180 86 0 5 0 95 0 2014-10-15 16:28:11 JST
> 0 3 0 807760 28908 951316 0 0 0 71156 314 145 0 10 1 88 1 2014-10-15 16:28:12 JST
> 0 3 0 757912 28908 1000272 0 0 0 33776 279 195 0 8 12 79 2 2014-10-15 16:28:13 JST
> 0 3 0 711040 28908 1045128 0 0 0 57344 255 129 0 7 0 92 1 2014-10-15 16:28:14 JST
> 0 3 0 648048 28908 1106340 0 0 0 57344 297 138 0 8 0 92 0 2014-10-15 16:28:15 JST
> 0 3 0 597456 28912 1151224 0 0 0 74236 246 133 0 8 0 92 1 2014-10-15 16:28:16 JST
> 0 1 0 570052 28912 1179744 0 0 0 33272 130 56 0 5 15 80 0 2014-10-15 16:28:17 JST
> 0 3 0 519584 28916 1228716 0 0 0 35416 188 96 0 7 4 90 0 2014-10-15 16:28:18 JST
> 0 3 0 481888 28916 1265460 0 0 0 34840 160 87 0 5 7 88 0 2014-10-15 16:28:19 JST
> 0 3 0 456964 28916 1289912 0 0 0 27616 131 94 0 4 24 71 2 2014-10-15 16:28:20 JST
> 0 3 0 431420 28916 1314392 0 0 0 41968 117 79 0 4 6 91 0 2014-10-15 16:28:21 JST
> 0 3 0 1379524 28916 387052 0 0 0 20540 403 151 0 17 0 83 0 2014-10-15 16:28:22 JST
> 0 3 0 1352120 28916 417192 0 0 0 0 137 118 0 3 32 65 1 2014-10-15 16:28:23 JST
> 0 3 0 1323228 28916 450744 0 0 0 0 193 146 0 3 47 50 0 2014-10-15 16:28:24 JST
> 0 1 0 1383616 28928 388208 0 0 0 64012 404 304 0 9 20 71 1 2014-10-15 16:28:25 JST
> 0 1 0 1383740 28928 388200 0 0 0 27676 71 20 0 2 50 48 0 2014-10-15 16:28:26 JST
> 0 2 0 1313556 28928 457508 0 0 0 55808 274 42 0 10 39 50 1 2014-10-15 16:28:27 JST
> 0 2 0 1268792 28928 502428 0 0 0 43000 249 138 0 6 5 89 1 2014-10-15 16:28:28 JST
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ ---timestamp---
> r b swpd free buff cache si so bi bo in cs us sy id wa st
> 1 2 0 1214728 28928 555472 0 0 0 38944 261 148 0 7 23 70 0 2014-10-15 16:28:29 JST
> 1 2 0 1152232 28936 615712 0 0 0 69776 397 166 0 12 29 58 2 2014-10-15 16:28:30 JST
> 1 3 0 1105732 28936 661564 0 0 0 29188 226 166 0 5 0 95 0 2014-10-15 16:28:31 JST
> 0 3 0 1068160 28936 698252 0 0 0 40432 249 266 0 5 0 94 1 2014-10-15 16:28:32 JST
> 0 3 0 1055140 28936 710492 0 0 0 12300 80 85 1 2 0 98 0 2014-10-15 16:28:33 JST
> 0 3 0 1055140 28936 710492 0 0 0 0 30 34 0 0 0 100 0 2014-10-15 16:28:34 JST
> 1 3 0 1055140 28936 710492 0 0 0 0 49 32 0 0 0 100 0 2014-10-15 16:28:35 JST
> 0 3 0 1055140 28936 710492 0 0 0 0 31 36 0 0 0 100 0 2014-10-15 16:28:36 JST
> 0 3 0 1055140 28936 710492 0 0 0 0 22 27 0 0 0 100 0 2014-10-15 16:28:37 JST
> 0 3 0 1055140 28936 710492 0 0 0 0 22 29 0 0 0 100 0 2014-10-15 16:28:38 JST
> 0 3 0 1055140 28936 710492 0 0 0 0 30 29 0 0 0 100 0 2014-10-15 16:28:39 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 29 30 0 0 0 100 0 2014-10-15 16:28:40 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 26 30 0 0 0 100 0 2014-10-15 16:28:41 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 23 31 0 0 0 100 0 2014-10-15 16:28:42 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 28 28 0 0 0 100 0 2014-10-15 16:28:43 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 30 30 0 0 0 99 0 2014-10-15 16:28:44 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 24 30 0 0 0 100 0 2014-10-15 16:28:45 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 29 39 0 0 0 99 1 2014-10-15 16:28:46 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 28 28 0 0 0 100 0 2014-10-15 16:28:47 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 30 31 0 0 0 100 0 2014-10-15 16:28:48 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 31 28 0 0 0 100 0 2014-10-15 16:28:49 JST
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ ---timestamp---
> r b swpd free buff cache si so bi bo in cs us sy id wa st
> 0 3 0 1055264 28936 710492 0 0 0 0 29 35 0 0 0 100 0 2014-10-15 16:28:50 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 27 34 0 0 0 100 0 2014-10-15 16:28:51 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 31 35 0 0 0 100 0 2014-10-15 16:28:52 JST
> 1 3 0 1055264 28936 710492 0 0 0 0 33 28 0 0 0 100 0 2014-10-15 16:28:53 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 36 33 0 0 0 100 0 2014-10-15 16:28:54 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 25 30 0 0 0 100 1 2014-10-15 16:28:55 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 34 33 0 0 0 100 0 2014-10-15 16:28:56 JST
> 1 3 0 1055264 28936 710492 0 0 0 0 42 32 0 0 0 100 0 2014-10-15 16:28:57 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 32 37 0 0 0 100 0 2014-10-15 16:28:58 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 27 29 0 0 0 100 0 2014-10-15 16:28:59 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 27 30 0 0 0 100 0 2014-10-15 16:29:00 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 34 31 0 0 0 100 0 2014-10-15 16:29:01 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 45 32 0 0 0 100 0 2014-10-15 16:29:02 JST
> 1 3 0 1055264 28936 710492 0 0 0 0 33 29 0 0 0 100 0 2014-10-15 16:29:03 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 26 31 0 0 0 100 0 2014-10-15 16:29:04 JST
> 0 3 0 1055264 28936 710492 0 0 0 0 28 27 0 0 0 100 0 2014-10-15 16:29:05 JST
> 0 3 0 1055140 28936 710496 0 0 0 0 31 33 0 0 0 100 0 2014-10-15 16:29:06 JST
> 0 3 0 1055140 28936 710496 0 0 0 0 24 27 0 0 0 100 0 2014-10-15 16:29:07 JST
> 0 3 0 1055140 28936 710496 0 0 0 0 36 30 0 0 0 100 0 2014-10-15 16:29:08 JST
> 0 3 0 1055140 28936 710496 0 0 0 0 30 29 0 0 0 100 0 2014-10-15 16:29:09 JST
> 0 3 0 1055140 28936 710496 0 0 0 0 30 35 0 0 0 100 0 2014-10-15 16:29:10 JST
>
>
> Masahiro Tsuji
>
Thanks for your testing. And sorry for the bad result. I'll dig in it soon.
Thanks,
Hitoshi
More information about the sheepdog
mailing list