
#35 Excessive (over a minute) time spent in garbage collection.

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2017-11-13
Created: 2017-08-14
Creator: Bob Isch
Private: No

Consider the following example code, which allocates a large number of records and stores them in a local array, rs("foo",1..n). At about 500 records we start seeing memory expand and apparent garbage collection take place; at about 2000 to 4000 records these events start taking over 10ms; at 8000 to 10000 we start approaching 100ms; and at around 40000 we start exceeding 1s and regularly see multi-second events. Worse, at slightly over 200K records we see cases of 54s and 78s, which seem to be very extreme examples.

Tests were run with GT.M V6.3-001A (which seemed to exhibit some slight improvements over V6.3-001) on an Ubuntu VM with 2 cores of an i7 processor and 4GB of memory allocated.

Apparently, garbage collection with this amount of memory usage (30 to 60MB) causes some sort of pathological behavior. Further evidence, shown in gcTest2 and gcTest3, is that "pre-allocating" and then freeing some memory makes the example run MUCH faster, although it still takes several (10 to 15) seconds instead of nearly 3 minutes.

We are seeing this occur in production with 40K to 50K records. This example uses 300K records to demonstrate the problem. Also, in production we appear to see similar issues when memory is reclaimed but that is not demonstrated here.

Pre-allocating memory, while a possible work-around in some cases, is not a viable solution since the amount of memory required is highly variable and not known prior to process initiation. Guessing too low still results in hitting the one-minute garbage-collection times, so it does little good. Having many (dozens of) processes pre-allocate this much memory would exhaust system memory quite quickly, especially if the run times exceed the expected "few seconds", resulting in even more simultaneous processes.

Obviously, using globals instead of locals eliminates the problem (and runs reasonably fast for building the structures), but the data still cannot be loaded into local memory without experiencing even worse performance hits (several minutes simply to merge the result from a global into a local, and pre-allocating does NOT seem to help here).
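
For illustration, a minimal sketch of the alternative (the global name ^gcRS is hypothetical): walk the global directly with $ORDER and process each record in place, rather than MERGE-ing the whole tree into a local:

consume  ; Sketch only: read ^gcRS("foo",i,fld) in place instead of merging it into a local
   n i,fld
   s i="" f  s i=$o(^gcRS("foo",i)) q:i=""  d
   .  s fld="" f  s fld=$o(^gcRS("foo",i,fld)) q:fld=""  d
   .  .  ; process ^gcRS("foo",i,fld) here; nothing is copied into the local string pool
   q  ; >>> consume

This trades the one-time MERGE cost for a global reference per node, so whether it helps depends on how the records are consumed.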

We typically run in UTF-8 mode, which has its own performance issues (with long strings, for example); however, this was tested in both M and UTF-8 mode with similar results.

QUESTIONS: Is this "normal" due to the amount of local memory being used? Are there any mechanisms to "tune" the memory management for something like this?

$ mumps gcTest.m && time mumps -run 'gcTest1^gcTest'
tm:     0.0000000s 101960,87,0 test-Start
tm:     0.0004250s 101960,198,0 test-0
tm:     0.0033930s 101960,80741,0 test-537 101960,101892,0
tm:     0.0007770s 130184,97517,0 test-649 101960,101859,0
tm:     0.0012620s 130184,123020,0 test-819 130184,130093,0
tm:     0.0005700s 163000,128564,0 test-856 130184,129969,0
tm:     0.0015660s 163000,155445,0 test-1035 163000,162919,0
tm:     0.0009150s 203864,161265,0 test-1074 163000,162780,0
tm:     0.0013280s 203864,194421,0 test-1295 203864,203726,0
tm:     0.0006080s 258712,201771,0 test-1344 203864,203686,0
tm:     0.0028550s 258712,246015,0 test-1639 258712,258514,0
tm:     0.0008290s 323896,255921,0 test-1705 258712,258561,0
tm:     0.0097580s 323896,308715,0 test-2057 323896,323665,0
tm:     0.0014770s 408520,320571,0 test-2136 323896,323770,0
tm:     0.0032640s 408520,388971,0 test-2592 408520,408387,0
tm:     0.0032780s 511880,404115,0 test-2693 408520,408272,0
tm:     0.0000430s 511880,404415,0 test-2694 408520,404009,0
tm:     0.0035150s 511880,487821,0 test-3251 511880,511724,0
tm:     0.0014610s 642296,506571,0 test-3376 511880,511754,0
tm:     0.0171080s 642296,612015,0 test-4079 642296,642058,0
tm:     0.0048870s 806760,635571,0 test-4236 642296,642124,0
tm:     0.0252160s 806760,768615,0 test-5123 806760,806570,0
tm:     0.0039530s 1011432,798325,0 test-5321 806760,806637,0
tm:     0.0194400s 1011432,961407,0 test-6408 1011432,1011353,0
tm:     0.0267270s 1265224,999657,0 test-6663 1011432,1011363,0
tm:     0.0221780s 1265224,1199883,0 test-7998 1265224,1265118,0
tm:     0.0693960s 1581880,1249260,0 test-8327 1265224,1265156,0
tm:     0.0186070s 1581880,1500033,0 test-9999 1581880,1581790,0
tm:     0.0369740s 1978648,1561876,0 test-10411 1581880,1581823,0
tm:     0.1100580s 1978648,1876110,0 test-12506 1978648,1978583,0
tm:     0.0679510s 2475208,1953326,0 test-13021 1978648,1978397,0
tm:     0.0310030s 2475208,2346784,0 test-15644 2475208,2475105,0
tm:     0.1110180s 3094280,2443711,0 test-16290 2475208,2475140,0
tm:     0.2662200s 3094280,2934027,0 test-19559 3094280,3094044,0
tm:     0.2697210s 3871128,3054976,0 test-20365 3094280,3094223,0
tm:     0.3646630s 3871128,3670077,0 test-24466 3871128,3870877,0
tm:     0.4478820s 4840584,3821776,0 test-25477 3871128,3871068,0
tm:     0.0978320s 4840584,4589734,0 test-30597 4840584,4840458,0
tm:     0.0510140s 6052984,4778884,0 test-31858 4840584,4840475,0
tm:     0.1280970s 6052984,5739334,0 test-38261 6052984,6052883,0

tm:     1.3926120s 7569512,5975727,0 test-39837 6052984,6052760,0

tm:     2.5212680s 7569512,7177077,0 test-47846 7569512,7569320,0

tm:     2.1682290s 9464600,7472877,0 test-49818 7569512,7569307,0
tm:     0.2267600s 9464600,8974234,0 test-59827 9464600,9464470,0

tm:     3.9496750s 11831816,9343831,0 test-62291 9464600,9464372,0

tm:     5.8819670s 11831816,11171438,0 test-74475 11831816,11831600,0

tm:     6.1044010s 14792440,11654775,0 test-77697 11831816,11831744,0
tm:     0.4754620s 14792440,13950646,0 test-93003 14792440,14792301,0

tm:     9.7952690s 18491000,14566743,0 test-97110 14792440,14792377,0
tm:     0.6320740s 18491000,17438145,0 test-116253 18491000,18490854,0
tm:     0.3481780s 23114104,18208547,0 test-121389 18491000,18490821,0

tm:     1.1334930s 23114104,21798047,0 test-145319 23114104,23113993,0
tm:     0.5759040s 28893400,22761046,0 test-151739 23114104,23113943,0

tm:     1.5282990s 28893400,27248147,0 test-181653 28893400,28893212,0
tm:     0.6180470s 36119256,28452047,0 test-189679 28893400,28893273,0

tm:    54.3811250s 36119256,34062189,0 test-227080 36119256,36119048,0

tm:     1.2386150s 45152648,35567447,0 test-237115 36119256,36119160,0

tm:    78.5540950s 45152648,42580989,0 test-283872 45152648,45152428,0

tm:     1.4756170s 56442264,44462747,0 test-296417 45152648,45152510,0
tm:     0.3716910s 56442264,45197506,0 test-DONE

real    2m55.625s
user    2m42.316s
sys 0m2.340s

$ mumps gcTest.m && time mumps -run 'gcTest2^gcTest'
tm:     0.0000000s 101960,87,0
tm:     0.0345950s 69238344,34700161,0 Initial space.
tm:     0.0000700s 69238344,34700255,0 Kill...
tm:     0.0000250s 69238344,34700349,0 GC:test1-DONE
tm:     0.0000250s 69238344,34700443,0 test-Start
tm:     0.0000640s 69238344,34700599,0 test-0

tm:     3.9339810s 69238344,25271747,0 test-168477 69238344,69238180,0

tm:     8.5417300s 69238344,52234206,0 test-DONE

real    0m12.545s
user    0m11.556s
sys 0m0.552s

$ mumps gcTest.m && time mumps -run 'gcTest3^gcTest'
tm:     0.0000000s 101960,87,0
tm:     0.0329500s 69238344,34700161,0 Initial space.
tm:     0.0001200s 69238344,34700255,0 Kill...
tm:     0.0000320s 69238344,103,0 GC:test1-DONE
tm:     0.0000150s 69238344,182,0 test-Start
tm:     0.0000300s 69238344,308,0 test-0

tm:    11.3966260s 69238344,61334997,0 test-DONE

real    0m11.475s
user    0m10.364s
sys 0m0.224s

Code:

gcTest   ; Examine some GT.M GC issues.

gcTest1  ; Basic test - 94.8s.
   d dotest()
   q  ; >>> gcTest1

gcTest2  ; Test with preallocation but no forced GC - 5.0s.
   d prealloc(50E6,0)
   d dotest()
   q  ; >>> gcTest2

gcTest3  ; Test with preallocation and forced GC - 4.5s.
   d prealloc(50E6,1)
   d dotest()
   q  ; >>> gcTest3

zhtm(zh,ms)  ; Return timestamp based on $zh value
   ;
   n (z,zh,ms)
   s zh=$g(zh,$zh)
   s t=$p(zh,",")*3600*24+$p(zh,",",2)+($p(zh,",",3)/1E6)
   i '$quit d
   .  s z("lszh")=$g(z("lszh"),t)
   .  w:t-z("lszh")>1 !
   .  w "tm:"_$j(t-z("lszh"),14,7)_"s"_" "_$VIEW("SPSIZE")_" "_$g(ms),!
   .  s z("lszh")=t
   q:$quit t q  ; >>> zhtm

dotest()  ; Try to produce a catastrophic GC situation.
   ;
   n (z)
   ;
   d zhtm(,"test-Start")
   ;
   ; 200,000 * 10 * (15+6) = ~42,000,000
   s lsspsz=""
   f i=0:1:300000 d
   .  s spsz=$VIEW("SPSIZE")
   .  i (+spsz'=+lsspsz)!($p(spsz,",",2)<$p(lsspsz,",",2)) d zhtm(,"test-"_i_" "_lsspsz)
   .  s lsspsz=spsz
   .  k rc
   .  f r=1:1:10 d
   .  .  s rc($ze("abcdefghijklmnopqrstuvwxyz",r,r+5))=$zj("",15) ; ="aaaaabbbbbccccc" (constant here results in interesting optimizations)
   .  m rs("foo",i)=rc
   .  ;m rs("foo",300000-i)=rc ; Reversing subscripts - no help
   .  ;m rs("foo",-i)=rc   ; Negative subscript - no help
   .  ;s n="" f  s n=$o(rc(n)) q:n=""  s rs("foo",i,n)=rc(n)  ; No change here.
   ;
   d zhtm(,"test-DONE")
   ;
   q  ; >>> dotest

prealloc(sz,md)  ; Test pre-allocating string-pool space (heap):
   ;
   n (z,sz,md)
   ;
   s sz=$g(sz,1000E6)
   s md=$g(md)
   ;
   d zhtm()
   f i=0:1:1000 s i(i)=$zj(i,100000) q:$VIEW("SPSIZE")>sz
   d zhtm(,"Initial space.")
   k i
   d zhtm(,"Kill...")
   ;s n3=$VIEW("LV_GCOL")
   ;d zhtm(,"GC:"_n3)
   i md d
   .  ; --- Full garbage collection/cleanup
   .  VIEW "LV_GCOL"    ;- Starts a data-space garbage collection.
   .  VIEW "LV_REHASH"  ;- Starts a reorg of the local variable lookup table.
   .  VIEW "STP_GCOL"   ;- Starts a string-pool garbage collection.
   ; $VIEW:
   ;  - LV_CREF    - Returns the number of references by alias containers to the array associated...
   ;  - LV_GCOL    * Returns the number of data-spaces recovered during a local variable data-space GC it triggers.
   ;  - LV_REF     - Returns the total number of references to data-space associated with an unsubscripted LVN.
   d zhtm(,"GC:"_"test1-DONE")
   ;
   q  ; >>> prealloc

Discussion

  • FIS GT.M Support

    Bob:

    Thank you for providing the sample program and being a long time user of GT.M 😊

    From below:
    Obviously, using globals instead of locals eliminates the problem…

    Would it be possible to just use globals?

    Process-specific globals can be created by using an environment variable in the global directory (and setting the environment variable appropriately for each process) or by using the process ID as one of the indices.
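
    For example, a minimal sketch of the process-ID approach (the global name ^gcTmp is illustrative, not an existing global):

        n pid s pid=$job                     ; the process ID becomes the first index
        s ^gcTmp(pid,"foo",1)="record data"  ; build ^gcTmp(pid,...) the same way rs(...) is built
        ; ... read it back with $ORDER over ^gcTmp(pid,"foo",...) ...
        k ^gcTmp(pid)                        ; remove this process' subtree when finished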

    More recent versions of GT.M support automatically creating databases (they still need to be manually deleted):
    http://tinco.pair.com/bhaskar/gtm/doc/articles/GTM_V6.3-001_Release_Notes.html
    The GDE ADD, CHANGE, and TEMPLATE commands accept the following qualifiers for REGION objects:
    • -[NO]AUTODB specifies whether GT.M should automatically create the associated database file if it does not exist when a process first attempts to access it; it defaults to NOAUTODB. This feature is not intended to replace MUPIP CREATE, but rather permit somewhat easier management of temporary databases or process private databases.
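
    For instance, a GDE session along these lines (the SCRATCH region/segment and file name are illustrative, not from the release notes) would map such temporary globals to an automatically created database:

        $ mumps -run GDE
        GDE> add -segment SCRATCH -file_name=/tmp/scratch.dat
        GDE> add -region SCRATCH -dynamic_segment=SCRATCH -autodb
        GDE> add -name gcTmp -region=SCRATCH
        GDE> exit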

    The amount of work the garbage collector needs to do increases relative to the number of local variables. Getting the string pool to the appropriate size for your working set will improve performance (as you demonstrated). You should try to use the fewest “temporary” local variables to do this (since garbage collection work is relative to the number of local variables). Note: local variables can now hold up to 1MB (you used 100K).
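
    As a sketch of the "fewest temporary local variables" point (this is a hypothetical variant of the prealloc loop above, not GT.M-supplied code), the string pool can be grown with a handful of ~1MB strings rather than hundreds of 100K ones:

        n i,sz s sz=50E6                                      ; target string-pool size, in bytes
        f i=0:1:100 s i(i)=$zj("",1E6) q:$VIEW("SPSIZE")>sz   ; ~1MB per temporary instead of 100K
        k i                                                   ; free the temporaries; the expanded pool remains

    Fewer live nodes means less work for each garbage-collection pass while still pre-expanding the pool.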

    I have created a tracking number for allowing a process to specify its initial string pool size (included in the subject line).

    Thanks,
    Bill

     
  • Bob Isch

    Bob Isch - 2017-09-09

    Hi Bill,

    Thanks for the feedback. Using globals is always an option, but it just seemed like the amount of time being taken was totally out of line. Not that I've worked on garbage collection code lately, but if we can create a "mess" of the memory in less than 1 CPU second, we should be able to "reorganize" it in less than 100 times that.

    Thanks again for taking a look though,
    -bob

     
  • K.S. Bhaskar

    K.S. Bhaskar - 2017-11-10

    On YottaDB (http://yottadb.com / https://github.com/YottaDB/YottaDB), which is built on the foundation of the GT.M code base, we were able to replicate this issue (https://github.com/YottaDB/YottaDB/issues/85) and fix it (in one case, a test that took 83 seconds now takes 0.5 seconds). The fix will be in the next YottaDB release; if you need it sooner, please download the source code committed at https://github.com/YottaDB/YottaDB and build it.

     
  • Bob Isch

    Bob Isch - 2017-11-13

    Hi Bhaskar. This is awesome news! Thanks to you and the others involved who tracked this down.

    Does anyone know if/when this will be backported into GT.M?

     
    • K.S. Bhaskar

      K.S. Bhaskar - 2017-11-13

      Bob, as I don't work for FIS, I cannot speak to FIS' intentions. I would certainly welcome it if they did.

       
