[sheepdog] [PATCH 0/2] avoid heavy disk I/O in the main thread

tuji tuji at atworks.co.jp
Wed Oct 15 10:26:50 CEST 2014


Hi,

> I am going to teset it.
> And I will check real elapsed time of  purgeing  for our system.
> It will help to know it is bottoleneck or not.

I check elapsed time.

It was  101 sec for about 1TB(263967 object) delete.

Oct 15 10:58:10   INFO [main] recover_object_main(624) object c7ab8e0001de02 is recovered (263966/263967)
Oct 15 10:58:11   INFO [main] recover_object_main(624) object c7ab8e0003fe15 is recovered (263967/263967)
Oct 15 10:58:11   INFO [main] default_cleanup(177) cleaning pruge_state_dir
Oct 15 10:59:52   INFO [main] default_cleanup(179) cleaning pruge_state_dir done

It is not moment time. I think we needs mechanism avoid-blocking during
pruge  state dir.

Mitake-san,

Becase our system running on 0.7.x,
I can not test with your avoid-blocking branch.

I have tested v0.7.6 + your patch.
https://github.com/masa-tu/sheepdog/tree/v0.7.6-avoid-blocking

let me know if this patch is not correct or If it is better test with your
original blanch.

The result was not good.
I/O had been blocked when cleaning pruge_state_dir was started.
and after 2 min, It said "Cannot allocate memory" and crash.

Oct 15 16:28:33   INFO [main] recover_object_main(624) object 84a0df00005937 is recovered (155121/155125)
Oct 15 16:28:33   INFO [main] recover_object_main(624) object 84a0df00013fd9 is recovered (155122/155125)
Oct 15 16:28:33   INFO [main] recover_object_main(624) object 84a0df00011bbf is recovered (155123/155125)
Oct 15 16:28:33   INFO [main] recover_object_main(624) object 84a0df0002fdfe is recovered (155124/155125)
Oct 15 16:28:33   INFO [main] recover_object_main(624) object 84a0df00002ec0 is recovered (155125/155125)
Oct 15 16:28:33   INFO [main] default_cleanup(168) cleaning pruge_state_dir start
Oct 15 16:30:20  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
Oct 15 16:30:21  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
Oct 15 16:30:21  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
Oct 15 16:30:21  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
Oct 15 16:30:21  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
Oct 15 16:30:21  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
〜省略〜
Oct 15 16:30:37  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
Oct 15 16:30:37  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
Oct 15 16:30:37  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
Oct 15 16:30:37  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
Oct 15 16:30:37  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
Oct 15 16:30:37  ERROR [main] create_worker_threads(147) failed to create worker thread: Cannot allocate memory
Oct 15 16:30:37  ERROR [main] crash_handler(524) sheep pid 6694 exited unexpectedly.

I had been running stress command and watching by vmstat command on the
vm.
vmstat saied that the I/O was stop at 16:28:34.

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ ---timestamp---
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  3      0 971316  28908 792136    0    0     0 34380  259  181  0  8 14 78  0        2014-10-15 16:28:08 JST
 0  3      0 941928  28908 820760    0    0     0 30696  161  132  0  4  0 96  1        2014-10-15 16:28:09 JST
 0  3      0 903116  28908 857456    0    0     0 53328  233  185  0  7  0 93  0        2014-10-15 16:28:10 JST
 1  3      0 865668  28908 896516    0    0     0 21044  180   86  0  5  0 95  0        2014-10-15 16:28:11 JST
 0  3      0 807760  28908 951316    0    0     0 71156  314  145  0 10  1 88  1        2014-10-15 16:28:12 JST
 0  3      0 757912  28908 1000272    0    0     0 33776  279  195  0  8 12 79  2       2014-10-15 16:28:13 JST
 0  3      0 711040  28908 1045128    0    0     0 57344  255  129  0  7  0 92  1       2014-10-15 16:28:14 JST
 0  3      0 648048  28908 1106340    0    0     0 57344  297  138  0  8  0 92  0       2014-10-15 16:28:15 JST
 0  3      0 597456  28912 1151224    0    0     0 74236  246  133  0  8  0 92  1       2014-10-15 16:28:16 JST
 0  1      0 570052  28912 1179744    0    0     0 33272  130   56  0  5 15 80  0       2014-10-15 16:28:17 JST
 0  3      0 519584  28916 1228716    0    0     0 35416  188   96  0  7  4 90  0       2014-10-15 16:28:18 JST
 0  3      0 481888  28916 1265460    0    0     0 34840  160   87  0  5  7 88  0       2014-10-15 16:28:19 JST
 0  3      0 456964  28916 1289912    0    0     0 27616  131   94  0  4 24 71  2       2014-10-15 16:28:20 JST
 0  3      0 431420  28916 1314392    0    0     0 41968  117   79  0  4  6 91  0       2014-10-15 16:28:21 JST
 0  3      0 1379524  28916 387052    0    0     0 20540  403  151  0 17  0 83  0       2014-10-15 16:28:22 JST
 0  3      0 1352120  28916 417192    0    0     0     0  137  118  0  3 32 65  1       2014-10-15 16:28:23 JST
 0  3      0 1323228  28916 450744    0    0     0     0  193  146  0  3 47 50  0       2014-10-15 16:28:24 JST
 0  1      0 1383616  28928 388208    0    0     0 64012  404  304  0  9 20 71  1       2014-10-15 16:28:25 JST
 0  1      0 1383740  28928 388200    0    0     0 27676   71   20  0  2 50 48  0       2014-10-15 16:28:26 JST
 0  2      0 1313556  28928 457508    0    0     0 55808  274   42  0 10 39 50  1       2014-10-15 16:28:27 JST
 0  2      0 1268792  28928 502428    0    0     0 43000  249  138  0  6  5 89  1       2014-10-15 16:28:28 JST
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ ---timestamp---
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  2      0 1214728  28928 555472    0    0     0 38944  261  148  0  7 23 70  0       2014-10-15 16:28:29 JST
 1  2      0 1152232  28936 615712    0    0     0 69776  397  166  0 12 29 58  2       2014-10-15 16:28:30 JST
 1  3      0 1105732  28936 661564    0    0     0 29188  226  166  0  5  0 95  0       2014-10-15 16:28:31 JST
 0  3      0 1068160  28936 698252    0    0     0 40432  249  266  0  5  0 94  1       2014-10-15 16:28:32 JST
 0  3      0 1055140  28936 710492    0    0     0 12300   80   85  1  2  0 98  0       2014-10-15 16:28:33 JST
 0  3      0 1055140  28936 710492    0    0     0     0   30   34  0  0  0 100  0      2014-10-15 16:28:34 JST
 1  3      0 1055140  28936 710492    0    0     0     0   49   32  0  0  0 100  0      2014-10-15 16:28:35 JST
 0  3      0 1055140  28936 710492    0    0     0     0   31   36  0  0  0 100  0      2014-10-15 16:28:36 JST
 0  3      0 1055140  28936 710492    0    0     0     0   22   27  0  0  0 100  0      2014-10-15 16:28:37 JST
 0  3      0 1055140  28936 710492    0    0     0     0   22   29  0  0  0 100  0      2014-10-15 16:28:38 JST
 0  3      0 1055140  28936 710492    0    0     0     0   30   29  0  0  0 100  0      2014-10-15 16:28:39 JST
 0  3      0 1055264  28936 710492    0    0     0     0   29   30  0  0  0 100  0      2014-10-15 16:28:40 JST
 0  3      0 1055264  28936 710492    0    0     0     0   26   30  0  0  0 100  0      2014-10-15 16:28:41 JST
 0  3      0 1055264  28936 710492    0    0     0     0   23   31  0  0  0 100  0      2014-10-15 16:28:42 JST
 0  3      0 1055264  28936 710492    0    0     0     0   28   28  0  0  0 100  0      2014-10-15 16:28:43 JST
 0  3      0 1055264  28936 710492    0    0     0     0   30   30  0  0  0 99  0       2014-10-15 16:28:44 JST
 0  3      0 1055264  28936 710492    0    0     0     0   24   30  0  0  0 100  0      2014-10-15 16:28:45 JST
 0  3      0 1055264  28936 710492    0    0     0     0   29   39  0  0  0 99  1       2014-10-15 16:28:46 JST
 0  3      0 1055264  28936 710492    0    0     0     0   28   28  0  0  0 100  0      2014-10-15 16:28:47 JST
 0  3      0 1055264  28936 710492    0    0     0     0   30   31  0  0  0 100  0      2014-10-15 16:28:48 JST
 0  3      0 1055264  28936 710492    0    0     0     0   31   28  0  0  0 100  0      2014-10-15 16:28:49 JST
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ ---timestamp---
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  3      0 1055264  28936 710492    0    0     0     0   29   35  0  0  0 100  0      2014-10-15 16:28:50 JST
 0  3      0 1055264  28936 710492    0    0     0     0   27   34  0  0  0 100  0      2014-10-15 16:28:51 JST
 0  3      0 1055264  28936 710492    0    0     0     0   31   35  0  0  0 100  0      2014-10-15 16:28:52 JST
 1  3      0 1055264  28936 710492    0    0     0     0   33   28  0  0  0 100  0      2014-10-15 16:28:53 JST
 0  3      0 1055264  28936 710492    0    0     0     0   36   33  0  0  0 100  0      2014-10-15 16:28:54 JST
 0  3      0 1055264  28936 710492    0    0     0     0   25   30  0  0  0 100  1      2014-10-15 16:28:55 JST
 0  3      0 1055264  28936 710492    0    0     0     0   34   33  0  0  0 100  0      2014-10-15 16:28:56 JST
 1  3      0 1055264  28936 710492    0    0     0     0   42   32  0  0  0 100  0      2014-10-15 16:28:57 JST
 0  3      0 1055264  28936 710492    0    0     0     0   32   37  0  0  0 100  0      2014-10-15 16:28:58 JST
 0  3      0 1055264  28936 710492    0    0     0     0   27   29  0  0  0 100  0      2014-10-15 16:28:59 JST
 0  3      0 1055264  28936 710492    0    0     0     0   27   30  0  0  0 100  0      2014-10-15 16:29:00 JST
 0  3      0 1055264  28936 710492    0    0     0     0   34   31  0  0  0 100  0      2014-10-15 16:29:01 JST
 0  3      0 1055264  28936 710492    0    0     0     0   45   32  0  0  0 100  0      2014-10-15 16:29:02 JST
 1  3      0 1055264  28936 710492    0    0     0     0   33   29  0  0  0 100  0      2014-10-15 16:29:03 JST
 0  3      0 1055264  28936 710492    0    0     0     0   26   31  0  0  0 100  0      2014-10-15 16:29:04 JST
 0  3      0 1055264  28936 710492    0    0     0     0   28   27  0  0  0 100  0      2014-10-15 16:29:05 JST
 0  3      0 1055140  28936 710496    0    0     0     0   31   33  0  0  0 100  0      2014-10-15 16:29:06 JST
 0  3      0 1055140  28936 710496    0    0     0     0   24   27  0  0  0 100  0      2014-10-15 16:29:07 JST
 0  3      0 1055140  28936 710496    0    0     0     0   36   30  0  0  0 100  0      2014-10-15 16:29:08 JST
 0  3      0 1055140  28936 710496    0    0     0     0   30   29  0  0  0 100  0      2014-10-15 16:29:09 JST
 0  3      0 1055140  28936 710496    0    0     0     0   30   35  0  0  0 100  0      2014-10-15 16:29:10 JST


Masahiro Tsuji




More information about the sheepdog mailing list