Re: [CZT-Devel] Reducing memory usage: Signatures are the biggest culprit?


Hi Andrius,

Brilliant! Another point of memory improvement.

Anthony, the first round of memory reclamation (a few months / years ago) improved usage by about 50-70%.
Those savings had to do with dangling references held by the CUP parser, as well as empty Ann lists
(with 10 elements in memory) being created for every AST. The many names in circulation can now be traced
using the extra code in ZNameImpl plus the debugging info that outputs it, in case you want to see how names proliferate.

A long time ago Petra and I started a flyweight factory in Corejava but didn't get very far. It seems like
this would be the way forward.
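
For readers unfamiliar with the pattern, here is a minimal sketch of a flyweight factory for names. It is not the code Petra and Leo started: `InternedName` and `NameFactory` are hypothetical classes invented for illustration, and the sketch assumes that names are immutable and are fully identified by a (word, id) pair.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable name; NOT CZT's ZName or ZNameImpl.
final class InternedName {
    private final String word;
    private final String id;

    InternedName(String word, String id) {
        this.word = word;
        this.id = id;
    }

    String word() { return word; }
    String id() { return id; }
}

// Flyweight factory: hands out one shared instance per (word, id) pair.
final class NameFactory {
    private final Map<List<String>, InternedName> pool = new ConcurrentHashMap<>();

    InternedName get(String word, String id) {
        // A two-element list gives value-based equals/hashCode for the key.
        List<String> key = Arrays.asList(word, id);
        return pool.computeIfAbsent(key, k -> new InternedName(word, id));
    }

    int size() { return pool.size(); }
}
```

Two requests for the same (word, id) pair return the same object, so memory grows with the number of distinct names rather than the number of occurrences. Note that this pool holds strong references and never shrinks; a real implementation would probably need weak references so that unused names can still be collected.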

@all another point of interest is the use of mutable references returned by various AST methods, which
leads to potential memory leaks (i.e. references that would never be GCed). For instance, AST methods
have "getXXX()" which returns a List<??> that can be modified and referenced from outside. If that
reference persists, the AST will never be GCed.
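
One common way to limit this kind of aliasing is to return a read-only view from the getter. The sketch below uses a hypothetical `ParaNode` class, not a CZT AST class, and assumes callers only need to read the list. It prevents outside modification; it does not by itself stop a caller from keeping the list's elements alive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical AST node; not a CZT class.
final class ParaNode {
    private final List<String> names = new ArrayList<>();

    // Mutation goes through the node itself.
    void addName(String name) {
        names.add(name);
    }

    // Read-only view: callers can read and iterate, but any attempt to
    // modify the list throws UnsupportedOperationException.
    List<String> getNames() {
        return Collections.unmodifiableList(names);
    }
}
```

The view still reflects later changes made through `addName`, so existing read-only callers keep working while the node's internal list stays private.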

@Mark thanks for the pointer on Guava. I will try and have a look at this next week. Also, it's incredible
to think CZT is 10 years old! I am remembering my first encounters with it through Jaza during my PhD :-)

Best,
Leo

On 13 Apr 2013, at 09:46, Anthony Hall <an...@an...> wrote:

Dear Andrius

This is brilliant!

Since I had a profiler on CZT running for catching some leftover debugging code, I thought I would give a quick look into the CZT memory usage as well.

I have discovered that most of the large number of ZNames that are created are referenced from typechecker Signatures (via NameTypePairs). This happens because, when type signatures are created during typechecking, the Signatures are duplicated in a lot of places.

A prominent example of this is in net.sourceforge.czt.typechecker.z.ExprChecker:94 (method visitRefExpr). This bit of code calculates the type of a RefExpr. If a reference is to a Schema, it duplicates the Schema signature (assigns new IDs to all its names) and then uses the new signature within the power type.

As a quick test, I tried removing this duplication and instead reusing the original schema signature (with the original ZName IDs): `Signature sig = signature;`

This single change reduced the memory consumption of typechecked `spec.tex` by about 60%!
Excellent!

This led me to thinking - do signatures really need to be duplicated everywhere? I imagine that the signature of schema A would be the same everywhere? Can we reuse the signature, or is it important to duplicate it and assign new IDs to the names?

The massive space saving becomes quite obvious when you think about it. If we have schema references, every RefExpr would duplicate the whole schema definition when typechecked, hence creating a lot of new ZName instances and consuming all this memory.
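
The effect can be illustrated with a toy model. This is only a sketch: `SimpleName`, `SimpleSignature` and `SignatureReuseDemo` are hypothetical stand-ins rather than CZT's ZName, Signature or ExprChecker, and the model assumes that the type computed for each reference is retained, as it would be when attached to the AST.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a name with an ID; not CZT's ZName.
final class SimpleName {
    static int created = 0; // counts every allocation, for illustration only
    final String word;
    final int id;

    SimpleName(String word, int id) {
        this.word = word;
        this.id = id;
        created++;
    }
}

// Hypothetical stand-in for a typechecker signature.
final class SimpleSignature {
    final List<SimpleName> names;

    SimpleSignature(List<SimpleName> names) {
        this.names = names;
    }

    // Copies the signature, giving every name a fresh object and a fresh ID.
    SimpleSignature duplicate(int firstNewId) {
        List<SimpleName> copy = new ArrayList<>();
        int id = firstNewId;
        for (SimpleName n : names) {
            copy.add(new SimpleName(n.word, id++));
        }
        return new SimpleSignature(copy);
    }
}

final class SignatureReuseDemo {
    // Returns how many names are allocated when a schema with `fields` names
    // is referenced `refs` times, with or without duplication per reference.
    static int namesAllocated(int fields, int refs, boolean duplicate) {
        SimpleName.created = 0;
        List<SimpleName> declared = new ArrayList<>();
        for (int i = 0; i < fields; i++) {
            declared.add(new SimpleName("field" + i, i));
        }
        SimpleSignature schema = new SimpleSignature(declared);

        // Retained types, one per reference, like annotations kept on an AST.
        List<SimpleSignature> typesOfRefs = new ArrayList<>();
        int nextId = fields;
        for (int r = 0; r < refs; r++) {
            if (duplicate) {
                typesOfRefs.add(schema.duplicate(nextId));
                nextId += fields;
            } else {
                typesOfRefs.add(schema); // share the existing reference
            }
        }
        return SimpleName.created;
    }
}
```

In this model, a schema with 10 names referenced 100 times allocates 1010 names with duplication and 10 without. Sharing is only safe if nothing mutates a signature after it is created, which is the open question for Tim.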

Leo advised that Tim would be the person to ask about the Signature duplication in the typechecker. Is it really necessary, or can we reuse the objects?

Andrius, this is exactly the sort of thing I hoped might be possible. I don’t know how much of the previous discussion you’re aware of, but at first sight it does look as if objects are being copied many, many times and the total number of objects created seems to be bigger than should be needed, not by a few percent, but by orders of magnitude.

If Tim can confirm that these objects and others like them are indeed immutable, then it seems not only more efficient but also the Right Thing simply to copy the object reference. If there are a few more places you can find savings of this sort of percentage then the problem would be fixed. (Two more 60% and you’d have an order of magnitude already!)
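
To spell out the arithmetic behind the parenthetical, assuming each saving is an independent 60% reduction of whatever memory remains after the previous one:

```latex
\[
(1 - 0.6)^3 = 0.4^3 = 0.064,
\qquad
\frac{1}{0.064} \approx 15.6
\]
```

Three such savings would leave about 6.4% of the original footprint, a reduction by a factor of roughly 15.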

I have a selfish follow-up question. Assuming Tim can confirm that this, or something like it, is acceptable, would you be able to do a bit of work on this? It’s true that I had very diffidently volunteered to look at this area, but it would clearly be much more effective and efficient for you, who obviously know what you are doing, to look at it than for me to spend a lot of time groping around and wasting your time asking dumb questions. Of course I’d be extremely happy to help with any testing, especially if you get to the point where typechecking large specifications becomes feasible (at the moment it simply isn’t possible to typecheck the whole of the iFACTS spec in CZT on any machine I’ve been able to try).

Anyway, I’m really encouraged by this result and thank you so much for doing this – I’m delighted (and impressed!)

Anthony
