[sheepdog] [PATCH 0/2] avoid heavy disk I/O in the main thread

Hitoshi Mitake mitake.hitoshi at lab.ntt.co.jp
Wed Oct 15 11:20:06 CEST 2014


At Wed, 15 Oct 2014 17:26:50 +0900,
tuji wrote:
> 
> Hi,
> 
> > I am going to teset it.
> > And I will check real elapsed time of  purgeing  for our system.
> > It will help to know it is bottoleneck or not.
> 
> I check elapsed time.
> 
> It was  101 sec for about 1TB(263967 object) delete.
> 
> Oct 15 10:58:10   INFO [main] recover_object_main(624) object c7ab8e0001de02 is recovered (263966/263967)
> Oct 15 10:58:11   INFO [main] recover_object_main(624) object c7ab8e0003fe15 is recovered (263967/263967)
> Oct 15 10:58:11   INFO [main] default_cleanup(177) cleaning pruge_state_dir
> Oct 15 10:59:52   INFO [main] default_cleanup(179) cleaning pruge_state_dir done
> 
> It is not moment time. I think we needs mechanism avoid-blocking during
> pruge  state dir.
> 
> Mitake-san,
> 
> Becase our system running on 0.7.x,
> I can not test with your avoid-blocking branch.
> 
> I have tested v0.7.6 + your patch.
> https://github.com/masa-tu/sheepdog/tree/v0.7.6-avoid-blocking
> 
> let me know if this patch is not correct or If it is better test with your
> original blanch.
> 
> The result was not good.
> I/O had been blocked when cleaning pruge_state_dir was started.
> and after 2 min, It said "Cannot allocate memory" and crash.
> 
> Oct 15 16:28:33   INFO [main] recover_object_main(624) object 84a0df00005937 is recovered (155121/155125)
> Oct 15 16:28:33   INFO [main] recover_object_main(624) object 84a0df00013fd9 is recovered (155122/155125)
> Oct 15 16:28:33   INFO [main] recover_object_main(624) object 84a0df00011bbf is recovered (155123/155125)
> Oct 15 16:28:33   INFO [main] recover_object_main(624) object 84a0df0002fdfe is recovered (155124/155125)
> Oct 15 16:28:33   INFO [main] recover_object_main(624) object 84a0df00002ec0 is recovered (155125/155125)
> Oct 15 16:28:33   INFO [main] default_cleanup(168) cleaning pruge_state_dir start
> Oct 15 16:30:20  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:21  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:21  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:21  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:21  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:21  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> 〜省略〜
> Oct 15 16:30:37  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:37  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:37  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:37  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:37  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:37  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
> Oct 15 16:30:37  ERROR [main] crash_handler(524) sheep pid 6694 exited unexpectedly.
> 
> I had been running stress command and watching by vmstat command on the
> vm.
> vmstat saied that the I/O was stop at 16:28:34.
> 
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ ---timestamp---
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  1  3      0 971316  28908 792136    0    0     0 34380  259  181  0  8 14 78  0        2014-10-15 16:28:08 JST
>  0  3      0 941928  28908 820760    0    0     0 30696  161  132  0  4  0 96  1        2014-10-15 16:28:09 JST
>  0  3      0 903116  28908 857456    0    0     0 53328  233  185  0  7  0 93  0        2014-10-15 16:28:10 JST
>  1  3      0 865668  28908 896516    0    0     0 21044  180   86  0  5  0 95  0        2014-10-15 16:28:11 JST
>  0  3      0 807760  28908 951316    0    0     0 71156  314  145  0 10  1 88  1        2014-10-15 16:28:12 JST
>  0  3      0 757912  28908 1000272    0    0     0 33776  279  195  0  8 12 79  2       2014-10-15 16:28:13 JST
>  0  3      0 711040  28908 1045128    0    0     0 57344  255  129  0  7  0 92  1       2014-10-15 16:28:14 JST
>  0  3      0 648048  28908 1106340    0    0     0 57344  297  138  0  8  0 92  0       2014-10-15 16:28:15 JST
>  0  3      0 597456  28912 1151224    0    0     0 74236  246  133  0  8  0 92  1       2014-10-15 16:28:16 JST
>  0  1      0 570052  28912 1179744    0    0     0 33272  130   56  0  5 15 80  0       2014-10-15 16:28:17 JST
>  0  3      0 519584  28916 1228716    0    0     0 35416  188   96  0  7  4 90  0       2014-10-15 16:28:18 JST
>  0  3      0 481888  28916 1265460    0    0     0 34840  160   87  0  5  7 88  0       2014-10-15 16:28:19 JST
>  0  3      0 456964  28916 1289912    0    0     0 27616  131   94  0  4 24 71  2       2014-10-15 16:28:20 JST
>  0  3      0 431420  28916 1314392    0    0     0 41968  117   79  0  4  6 91  0       2014-10-15 16:28:21 JST
>  0  3      0 1379524  28916 387052    0    0     0 20540  403  151  0 17  0 83  0       2014-10-15 16:28:22 JST
>  0  3      0 1352120  28916 417192    0    0     0     0  137  118  0  3 32 65  1       2014-10-15 16:28:23 JST
>  0  3      0 1323228  28916 450744    0    0     0     0  193  146  0  3 47 50  0       2014-10-15 16:28:24 JST
>  0  1      0 1383616  28928 388208    0    0     0 64012  404  304  0  9 20 71  1       2014-10-15 16:28:25 JST
>  0  1      0 1383740  28928 388200    0    0     0 27676   71   20  0  2 50 48  0       2014-10-15 16:28:26 JST
>  0  2      0 1313556  28928 457508    0    0     0 55808  274   42  0 10 39 50  1       2014-10-15 16:28:27 JST
>  0  2      0 1268792  28928 502428    0    0     0 43000  249  138  0  6  5 89  1       2014-10-15 16:28:28 JST
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ ---timestamp---
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  1  2      0 1214728  28928 555472    0    0     0 38944  261  148  0  7 23 70  0       2014-10-15 16:28:29 JST
>  1  2      0 1152232  28936 615712    0    0     0 69776  397  166  0 12 29 58  2       2014-10-15 16:28:30 JST
>  1  3      0 1105732  28936 661564    0    0     0 29188  226  166  0  5  0 95  0       2014-10-15 16:28:31 JST
>  0  3      0 1068160  28936 698252    0    0     0 40432  249  266  0  5  0 94  1       2014-10-15 16:28:32 JST
>  0  3      0 1055140  28936 710492    0    0     0 12300   80   85  1  2  0 98  0       2014-10-15 16:28:33 JST
>  0  3      0 1055140  28936 710492    0    0     0     0   30   34  0  0  0 100  0      2014-10-15 16:28:34 JST
>  1  3      0 1055140  28936 710492    0    0     0     0   49   32  0  0  0 100  0      2014-10-15 16:28:35 JST
>  0  3      0 1055140  28936 710492    0    0     0     0   31   36  0  0  0 100  0      2014-10-15 16:28:36 JST
>  0  3      0 1055140  28936 710492    0    0     0     0   22   27  0  0  0 100  0      2014-10-15 16:28:37 JST
>  0  3      0 1055140  28936 710492    0    0     0     0   22   29  0  0  0 100  0      2014-10-15 16:28:38 JST
>  0  3      0 1055140  28936 710492    0    0     0     0   30   29  0  0  0 100  0      2014-10-15 16:28:39 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   29   30  0  0  0 100  0      2014-10-15 16:28:40 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   26   30  0  0  0 100  0      2014-10-15 16:28:41 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   23   31  0  0  0 100  0      2014-10-15 16:28:42 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   28   28  0  0  0 100  0      2014-10-15 16:28:43 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   30   30  0  0  0 99  0       2014-10-15 16:28:44 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   24   30  0  0  0 100  0      2014-10-15 16:28:45 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   29   39  0  0  0 99  1       2014-10-15 16:28:46 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   28   28  0  0  0 100  0      2014-10-15 16:28:47 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   30   31  0  0  0 100  0      2014-10-15 16:28:48 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   31   28  0  0  0 100  0      2014-10-15 16:28:49 JST
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ ---timestamp---
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  0  3      0 1055264  28936 710492    0    0     0     0   29   35  0  0  0 100  0      2014-10-15 16:28:50 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   27   34  0  0  0 100  0      2014-10-15 16:28:51 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   31   35  0  0  0 100  0      2014-10-15 16:28:52 JST
>  1  3      0 1055264  28936 710492    0    0     0     0   33   28  0  0  0 100  0      2014-10-15 16:28:53 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   36   33  0  0  0 100  0      2014-10-15 16:28:54 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   25   30  0  0  0 100  1      2014-10-15 16:28:55 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   34   33  0  0  0 100  0      2014-10-15 16:28:56 JST
>  1  3      0 1055264  28936 710492    0    0     0     0   42   32  0  0  0 100  0      2014-10-15 16:28:57 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   32   37  0  0  0 100  0      2014-10-15 16:28:58 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   27   29  0  0  0 100  0      2014-10-15 16:28:59 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   27   30  0  0  0 100  0      2014-10-15 16:29:00 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   34   31  0  0  0 100  0      2014-10-15 16:29:01 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   45   32  0  0  0 100  0      2014-10-15 16:29:02 JST
>  1  3      0 1055264  28936 710492    0    0     0     0   33   29  0  0  0 100  0      2014-10-15 16:29:03 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   26   31  0  0  0 100  0      2014-10-15 16:29:04 JST
>  0  3      0 1055264  28936 710492    0    0     0     0   28   27  0  0  0 100  0      2014-10-15 16:29:05 JST
>  0  3      0 1055140  28936 710496    0    0     0     0   31   33  0  0  0 100  0      2014-10-15 16:29:06 JST
>  0  3      0 1055140  28936 710496    0    0     0     0   24   27  0  0  0 100  0      2014-10-15 16:29:07 JST
>  0  3      0 1055140  28936 710496    0    0     0     0   36   30  0  0  0 100  0      2014-10-15 16:29:08 JST
>  0  3      0 1055140  28936 710496    0    0     0     0   30   29  0  0  0 100  0      2014-10-15 16:29:09 JST
>  0  3      0 1055140  28936 710496    0    0     0     0   30   35  0  0  0 100  0      2014-10-15 16:29:10 JST
> 
> 
> Masahiro Tsuji
> 

Thanks for your testing. And sorry for the bad result. I'll dig in it soon.

Thanks,
Hitoshi



More information about the sheepdog mailing list