From: Mark U. <ma...@cs...> - 2013-04-13 03:30:23
Andrius,

Very interesting that duplication of big ints and signatures etc. is consuming so much memory.

Assuming that these are immutable after they have been created, it sounds like it would be worth using a factory design pattern that caches the constructed objects. It would need to be a weak cache though, so the garbage collector could reclaim structures that are in the cache but are not used elsewhere. Like the Google Guava Interner ( http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Interner.html ).

Cheers
Mark

On 13 April 2013 01:40, Andrius Velykis <and...@ne...> wrote:

> Yes, the big integers will get large. But in your previous e-mail you said "given most of them are shared"? So I wanted to object, that I think they are not actually shared.
>
> -16..16 comes from BigInteger.valueOf() method - check the Java sources. I wanted to say that the only sharing of BigInteger objects is in this range..
>
> So I just wanted to say that BigIntegers do constitute a significant part of CZT memory usage :)
>
> Andrius
>
> On Fri, Apr 12, 2013 at 4:28 PM, Leo Freitas <leo...@ne...> wrote:
>
>> Hi Andrius,
>>
>> I don't think so. The big integers are used for the line / column / buffer length numbers, no?
>> So if you have large files they will get large. Or am I mistaken? Where did the -16..16 come from?
>>
>> Leo
>>
>> On 12 Apr 2013, at 09:12, Andrius Velykis <and...@ne...> wrote:
>>
>> Hi Leo,
>>
>> (reviving an old post..)
>>
>>> As is, the only major problem with memory consumption is the duplication (e.g., ArrayList/Object[]).
>>> The presence of various (4-6) BigIntegers in LocAnn is also an issue, but given most of them are shared it's not such a big deal.
>>
>> From what I gather, BigIntegers are only shared when in the range of [-16, 16] -- looking at BigInteger.valueOf()? Or am I mistaken? If so, most of the BigIntegers would not be shared in CZT..
>>
>> Andrius
>>
>> On Wed, Jan 25, 2012 at 6:33 AM, Leo Freitas <leo...@ne...> wrote:
>>
>>> Hi Tim,
>>>
>>> Yes, after parsing. And yes, I found that rather odd too. There is something funny going on in Garbage Collection somewhere. I guess the trouble is that there are certain structures (e.g., TokenSequence iterators) that end up retaining the biggest part of the object memory.
>>>
>>> Having said that, I was now trying to find the sources of the problem(s) using more intrusive snapshot points (AKA memory dumps at particular execution points).
>>>
>>> As is, the only major problem with memory consumption is the duplication (e.g., ArrayList/Object[]). The presence of various (4-6) BigIntegers in LocAnn is also an issue, but given most of them are shared it's not such a big deal.
>>>
>>> I've tweaked the PerformanceSettings constant(s) a little bit more to keep arraylists/object[] capacity 0 at creation, and that just about halved the memory used (!)... possibly at some speed expense given it will take some time to increase capacity.
>>>
>>> At the profiling sessions done, this wasn't a problem: I've put initial capacity at 0, 1, and 10 (e.g., only very few go beyond that, mainly on type info), and the CPU increase was marginal (e.g., with 10 it was 300ms - 2.75% - quicker; with 1 100ms quicker), yet the memory gain was significant (e.g., with 10 and 1 it was twice the heap at 100MB instead of 50MB for 0).
>>>
>>> One nice thing would be to have this as a configuration file or a SectionManager option.... once all key bottleneck points are identified. By default I will keep it at 0, as the performance penalty in CPU time seems marginal/acceptable (e.g., 2-3% penalty for halving memory needs).
>>>
>>> Suggestions / comments?
>>>
>>> Best,
>>> Leo
>>>
>>> On 24 Jan 2012, at 23:11, Tim Miller wrote:
>>>
>>> > Nice work!
>>> >
>>> > When you say you added "parser = null", I assume you mean after you do the parsing? This would mean that, prior to your change, a reference to the parser is being kept even after the variable scope of the variable "parser" ends? That seems odd.
>>> >
>>> > On 25/01/12 01:22, Leo Freitas wrote:
>>> >> Tim,
>>> >>
>>> >> I forgot to mention below that I was talking about parsing and typechecking only.
>>> >>
>>> >> I've just run the same profiling setup on the VCG for these larger examples and yes, the worst ones were Object[] (24%), char[] (17.3%), ArrayList[] (9%) and BigInteger (2-3%).
>>> >>
>>> >> Surprising ones were int[] (6.5%) and short[] (6%) arrays, since they are not explicitly created anywhere in CZT. LocAnn took 3%; Iterators were also a surprise: both iterator() and listIterator() cover about 10%, where more obvious ones like String (0.9%) and ZName (1.7%) were quite low.
>>> >>
>>> >> I've run the profiling sessions about 3-5 times on each example, taking the average.
>>> >>
>>> >> ----
>>> >>
>>> >> Searching for potential sources of such unexpected types, I managed to find a few successful candidates:
>>> >>
>>> >> a) ParseUtils
>>> >>
>>> >> char[], short[], and int[] are mostly present in the Java CUP low-level classes to represent the internal parsing tables from the grammar. I simply added an explicit "parser = null" to the main ParseUtils.parse method. These arrays combined account for about 30% of memory.
>>> >>
>>> >> After the change, these arrays now account for 1.1% (!!!); that's quite an improvement.
>>> >>
>>> >> b) ListIterator and Iterator
>>> >>
>>> >> These appear in various places, but places that should influence all runs are only on SmartScanner (and possibly PrettyPrinter). After this change, they went from a combined (ListIterator + Iterator) 49% footprint to 17%; another good improvement.
>>> >>
>>> >> d) ArrayList and Object[]
>>> >>
>>> >> There are various places, and identifying the ones of most significance is tedious/time-consuming. Instead, I've done a thorough search through various projects and changed default constructors to more sensible values, when possible, or to a parameterised default when not (e.g., PerformanceSettings interface in util project).
>>> >>
>>> >> This led to a decrease in the memory footprint of these objects of....
>>> >>
>>> >> e) BigInteger
>>> >>
>>> >> Why do we need big integers within LocAnn and other places? I guess because long would be potentially too small? But would there really be something bigger than 2^32 or 2^64 in number of lines of source in a file, say? Hum...
>>> >>
>>> >> In terms of memory footprint, it varied between 0.7% and 3%, depending on the example. I won't change this one for now.
>>> >>
>>> >> f) Garbage collection time
>>> >>
>>> >> At first it was about 16-20%; now it is about 46.1% of time taken. Don't know if that's related to changes, but looks like it.
>>> >>
>>> >> ======
>>> >>
>>> >> Changes a) and b) led to a drastic memory footprint decrease from 8MB to 0.91MB; other changes led to a minor improvement in comparison (i.e., 0.80MB).
>>> >>
>>> >> That's despite the fact that changes in d) were the most numerous - they didn't amount to much improvement, but some.
>>> >>
>>> >> Hopefully this will enable much better performance on larger specs. I am committing it now to see how it goes.
>>> >>
>>> >> Best,
>>> >> Leo
>>> >>
>>> >> On 23 Jan 2012, at 14:00, Leo Freitas wrote:
>>> >>
>>> >>> Hi Tim,
>>> >>>
>>> >>> That's interesting. I've been doing similar (profiling) tests over specs of some size (e.g., Mondex, Tokeneer, Xenon, IEEE float point unit); although I guess they are smaller than iFACTS - apart from Xenon, which is quite large.
>>> >>>
>>> >>> On the profiling sessions, the worst culprit was "char[]" arrays, mostly from the java_cup lexer. The Object[] and ArrayList[] were about 2-3% each, whereas the char[] was 90% (!)... Similarly, on profiling CPU, it was the IO operations on zzRefill within the java_cup scanner that took the largest chunk of the time (27%). Smart scanning (e.g., lookahead) has taken only about 2-3%.
>>> >>>
>>> >>> I wonder... what was the profiling setup that you used to get to the creation of Object[] ArrayList as the main problem? Although the change you refer to below shouldn't be relatively simple to change, as Petra pointed out.
>>> >>>
>>> >>> I just want to get the right picture to tackle such performance problems for larger specs.
>>> >>>
>>> >>> Best,
>>> >>> Leo
>>> >>>
>>> >>> On 5 Jan 2012, at 23:51, Tim Miller wrote:
>>> >>>
>>> >>>> Hi everyone,
>>> >>>>
>>> >>>> Anthony Hall and I have been discussing some memory problems that CZT has when parsing large specifications. Anthony has been trying to typecheck the iFacts specification, but without much luck due to the large memory resources required.
>>> >>>>
>>> >>>> We've each been playing around with VisualVM, and Anthony pointed out that the two largest memory hogs are Object[] and ArrayList, taking up around 23% and 10% of the heap respectively. Most of the object arrays and ArrayLists contain exactly 10 items, and almost all items in these lists are null.
>>> >>>>
>>> >>>> After some poking around, I discovered that when ArrayList is created using the default constructor, it allocates 10 items initially. This appears to be where the 10 items come from in each case. I suspect this may also be contributing to some of the memory problems, considering that CZT has so many empty annotation lists, etc.
>>> >>>> Creating ArrayLists with an initial capacity of 0 or 1 (using the constructor ArrayList(int initialCapacity)) may give us some substantial space savings.
>>> >>>>
>>> >>>> It appears that most ArrayLists are created in the gnast-generated code. Petra, how difficult would it be to create these lists using this other constructor?
>>> >>>>
>>> >>>> Regards,
>>> >>>> Tim
>>> >>>>
>>> >>>> _______________________________________________
>>> >>>> CZT-Devel mailing list
>>> >>>> CZT...@li...
>>> >>>> https://lists.sourceforge.net/lists/listinfo/czt-devel
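
For reference, the weak cache suggested at the top of this thread could look roughly like the sketch below. It is only an illustration using the plain JDK: the names WeakInternDemo, WeakInterner and intern are made up here and are not part of CZT. Guava's Interners.newWeakInterner() offers the same idea ready-made. The sketch assumes the interned values are immutable with sensible equals/hashCode, as BigInteger is.

```java
import java.lang.ref.WeakReference;
import java.math.BigInteger;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo class; nothing here exists in CZT.
class WeakInternDemo {

    /** Weak canonicalising cache for immutable values (illustrative only). */
    static final class WeakInterner<T> {
        // WeakHashMap holds its keys weakly. The value is wrapped in a
        // WeakReference too, otherwise the entry's value would keep its own
        // key strongly reachable and nothing would ever be reclaimed.
        private final Map<T, WeakReference<T>> pool = new WeakHashMap<>();

        /** Returns the canonical instance equal to sample, registering sample if there is none. */
        synchronized T intern(T sample) {
            WeakReference<T> ref = pool.get(sample);
            T canonical = (ref == null) ? null : ref.get();
            if (canonical == null) {
                pool.put(sample, new WeakReference<>(sample));
                canonical = sample;
            }
            return canonical;
        }
    }

    public static void main(String[] args) {
        // valueOf may reuse instances for small values; the exact range is an
        // implementation detail (the thread reports -16..16 from the sources).
        System.out.println(BigInteger.valueOf(16) == BigInteger.valueOf(16));
        System.out.println(BigInteger.valueOf(1000) == BigInteger.valueOf(1000));

        // Two equal but distinct objects collapse to one canonical instance.
        WeakInterner<BigInteger> interner = new WeakInterner<>();
        BigInteger first = interner.intern(new BigInteger("1000"));
        BigInteger second = interner.intern(new BigInteger("1000"));
        System.out.println(first == second);
    }
}
```

A factory for line/column values would route every newly built BigInteger through intern(), so equal values share one object while anything no longer referenced elsewhere can still be garbage collected.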