Dev owner: Peter Donovan
My initial strategy has been to focus on the Asc front end and on specific high-cost (hot-spot) classes and methods. My goal is not only overall compile-time improvement but also changes that yield a clearer, more modular internal compiler structure and a better overall understanding of the compiler and the language.
Results are promising but mixed, and progress has been slower than I'd like. Changes to the Asc front end have improved overall compiler performance significantly.
Profiler-based hot-spot improvements, however, have been less successful, though quite enlightening: Java profiling tools are not always accurate, and garbage collection within the compiler causes large variations in compile time. It has also become clear that there is significant overhead and complexity within the compiler that can be reduced incrementally, but the net result of such improvements may be smaller than we'd hope.
Overall, I expect full compile performance can be improved by at least 25% within 6 months without significant restructuring.
To achieve a 3-4x compile-time speedup, we need to redesign the overall compiler flow and its main data structures, such as the intermediate form and the symbol tables. I recommend a stage-by-stage approach in which we replace major components of the compiler incrementally. The goals should not be compile-time performance alone; they should include better debugging, object space/speed optimization, and a more modular, reusable compiler framework.
Profiling showed 'Names' to be one of the most expensive classes in the Asc compiler. To better understand what was going on, I added some counters to the Names hash table entry/lookup methods.
On the redesigned hash table, while compiling frameworks, I got the following results:
| Metric | Count |
| --- | --- |
| Tables | 11,541 |
| Lookups | 13,042,018 |
| Misses | 10,053,528 |
| Collisions | 16,951,525 |
| Name collisions | 13,172,160 |
| Names | 166,306 |
| Table grows | 8,286 |
From this it's pretty clear that Name lookup happens a lot, and that its design (or usage model) is inefficient for large numbers of Name tables.
The prior version of Names.get searched through all names in a table to find occurrences of the same unqualified name with differing namespaces. Its lookup count was roughly double the figure shown above.
Rather than using a quadratic jump and hashing on name&type, I used a bucket list hashed off of name alone.
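A minimal sketch of that bucket-list approach, hashing on the unqualified name alone so every namespace-qualified binding of a name lands in one bucket. The class and field names here are illustrative placeholders, not the actual Asc compiler identifiers, and `HashMap` stands in for whatever hashing scheme the real table uses:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: hash on the name alone and keep a small bucket of
// (namespace, binding) entries per name, instead of quadratic probing on
// the (name, type) pair.
class NameTable {
    // One bucket per unqualified name; each entry is {namespace, binding}.
    private final Map<String, List<String[]>> buckets = new HashMap<>();

    void put(String name, String namespace, String binding) {
        buckets.computeIfAbsent(name, k -> new ArrayList<>())
               .add(new String[] { namespace, binding });
    }

    // All bindings of an unqualified name, regardless of namespace, come
    // back in one bucket scan instead of a search over the whole table.
    List<String[]> get(String name) {
        return buckets.getOrDefault(name, new ArrayList<>());
    }
}
```

The point of the layout is that the common query, "all namespaces binding this name," touches exactly one bucket rather than iterating the full table.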
Compile tests showed about a 3% performance improvement for smaller apps; hospitalApp, however, takes about 10% longer, probably due to increased memory use (the slowdown is consistent with garbage-collection effects).
Conclusions:
Secondary effects of the n-squared iteration over Names are probably responsible for a lot of the memory and time spent in the compiler. Improving this will require changing the way the tables are organized and used. That will take several weeks and has to be weighed against other optimization opportunities, unless we treat a clean redesign of the Names functionality as a module we can carry forward.
Java solves some programmer inefficiencies and creates others of its own.
I added a Name class to represent the {name, namespace, binding} tuple, plus an array for linear iteration, and it added a LOT of space: Java object overhead appears to be 16 bytes per object (roughly 4 MB for 169,000 names). Shifting back to separate arrays was much smaller.
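The two layouts can be contrasted as follows. With roughly 16 bytes of header per object, one wrapper object per tuple adds about 16 * N bytes before any payload, while the parallel-array form pays the header cost only once per array. Class names here are hypothetical illustrations, not the compiler's own:

```java
// Hypothetical object-per-tuple layout: ~16-byte header for every entry.
class NamesByObjects {
    static final class Name {
        final String name, namespace, binding;
        Name(String name, String namespace, String binding) {
            this.name = name; this.namespace = namespace; this.binding = binding;
        }
    }
    final Name[] entries;   // N small objects plus the array itself
    NamesByObjects(Name[] entries) { this.entries = entries; }
}

// Hypothetical parallel-array layout: header cost paid once per array,
// not once per tuple; tuple i is column i of each array.
class NamesByArrays {
    final String[] names;
    final String[] namespaces;
    final String[] bindings;
    NamesByArrays(String[] names, String[] namespaces, String[] bindings) {
        this.names = names; this.namespaces = namespaces; this.bindings = bindings;
    }
    String binding(int i) { return bindings[i]; }
}
```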
In one case, the Names.get method was being used to return a Boolean. Since this method constructed a set of matching names as its result, it was quite inefficient. I replaced it with a much simpler boolean-returning search method, which cut memory usage and improved overall compiler performance by \~2%.
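The shape of that fix, sketched with placeholder names (not the compiler's actual API): a caller that only needs a yes/no answer should not pay for constructing a result set.

```java
import java.util.HashSet;
import java.util.Set;

class NameSearch {
    private final String[] names;
    NameSearch(String[] names) { this.names = names; }

    // Old shape: builds a set of matches; the caller only tests isEmpty(),
    // so the allocation and the full scan are wasted work.
    Set<String> getMatches(String wanted) {
        Set<String> out = new HashSet<>();
        for (String n : names) {
            if (n.equals(wanted)) out.add(n);
        }
        return out;
    }

    // New shape: returns as soon as one match is found; no allocation.
    boolean contains(String wanted) {
        for (String n : names) {
            if (n.equals(wanted)) return true;
        }
        return false;
    }
}
```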
Asc InputBuffer/Scanner/Parser performance improvements account for an over-12% decrease in full compile time for most applications. Assuming the Asc parse/scan phase was responsible for 25% of overall compile time, this means the parser is now more than twice as fast as it was.
For certain applications, garbage collection was responsible for over 25% of compile time. InputBuffer changes that decreased memory use dropped overall memory utilization below the garbage-collection startup thresholds, resulting in disproportionate performance improvements. For example, CompileTimeTestApp1 performance improved overall by 35%.
The InputBuffer was simplified to block load all files. This eliminated a complicated pile of code that had grown over time. This change resulted in an overall compiler performance improvement of \~10%.
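A minimal sketch of block loading, shown as the general technique rather than the Asc InputBuffer code itself: read the whole source file in one call, so the scanner indexes directly into one string instead of refilling a windowed buffer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: one read, one allocation per source file.
final class BlockLoader {
    static String load(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```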
Source line reference management currently requires a line map and the source text to be buffered. Each Ast node's source reference information consists of an integer 'pos' field. This should be replaced by an Ast reference to a SourceReference class, which for each node contains the source file path, line number, column, and file position, stored in some reasonably efficient way. This would eliminate complexity and excess memory use in many areas of the compiler. The more generally used 'Source' class could be extended for this.
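One possible shape for the proposed class, with illustrative field names (the real design would likely intern or share the file path per compilation unit rather than store a string per node):

```java
// Hypothetical sketch of the proposed SourceReference: each Ast node would
// hold one of these instead of a bare integer 'pos' that must be resolved
// against a retained line map and buffered source text.
final class SourceReference {
    final String filePath;    // candidate for per-file interning/sharing
    final int line;
    final int column;
    final int filePosition;   // absolute offset into the source file

    SourceReference(String filePath, int line, int column, int filePosition) {
        this.filePath = filePath;
        this.line = line;
        this.column = column;
        this.filePosition = filePosition;
    }

    @Override
    public String toString() {
        return filePath + ":" + line + ":" + column;
    }
}
```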
Figure 1: Asc front-end performance changes (framework)
Framework performance improvements are shown above: over 20% overall compile-time speedup.
Figure 2: Compiler performance-change tracking gets skewed by ongoing library development
Compile-time and memory-use variation over the last four months has been significant. Ongoing changes to library (framework) code and to the compiler itself make verifying performance improvements difficult. For each performance improvement, two performance test runs were done: one with a compiler containing the improvement and one without. Establishing a static benchmark across multiple revisions of the compiler was not possible.