
Notes on Compiler Performance Improvements

SourceForge Editorial Staff


Dev owner: Peter Donovan

Summary

My initial strategy has been to focus on the Asc front end and on specific high-cost (hot-spot) classes and methods. The goal is not only overall compile performance, but also changes that yield a clearer, more modular internal compiler structure and a better overall understanding of the compiler and the language.

Results are promising but mixed, and have come more slowly than I'd like. Changes to the Asc front end have improved overall compiler performance significantly.
Profiler-based hot-spot improvements have been less successful, but quite enlightening: Java profiling tools are not always accurate, and garbage collection within the compiler causes large swings in compile time. It has also become clear that there is significant overhead and complexity within the compiler that can be improved incrementally, but the net result of such improvements may be less than we'd hope for.

Overall, I expect full compile performance can be improved by at least 25% within 6 months without significant restructuring.
In order to achieve a 3-4x compile-time speedup, we need to redesign the overall compiler flow and its main classes, such as the intermediate form and the symbol tables. I recommend a stage-by-stage approach in which we replace the main components of the compiler incrementally. The goals should not be solely compile-time performance; they should include better debugging, object space/speed optimization, and a more modular, reusable compiler framework.

Asc Names class hot-spot performance effort

Profiling showed 'Names' to be one of the most expensive classes in the Asc compiler. To better understand what was going on, I added some counters to the Names hash table entry/lookup methods.
On the redesigned hash table, while compiling frameworks, I got the following results:

Tables            11,541
Lookups           13,042,018
Misses            10,053,528
Collisions        16,951,525
Name collisions   13,172,160
Names             166,306
Table grows       8,286

From this it's pretty clear that Name lookup happens a lot, and that its design (or usage model) is inefficient for large numbers of Name tables.
The prior version of Names.get searched through all names in a table to find occurrences of the same unqualified name with differing namespaces.

The number of lookups in the prior version of Names was about double the number shown above.

Rather than using a quadratic jump and hashing on name and type, I used a bucket list hashed off of the name alone.
Compile tests showed about a 3% performance improvement for smaller apps, but hospitalApp takes about 10% longer, probably due to increased memory use (the timing pattern is consistent with garbage collection).
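As a rough illustration of the "bucket list hashed off of the name alone" approach, here is a minimal sketch: a map keyed by the unqualified name, where each bucket holds the namespace variants of that name. All class and method names here (NameTable, containsName, the slot field) are illustrative, not the actual Asc internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a name table hashed on the unqualified name alone. All namespace
// variants of one name share a single bucket, so finding every
// (name, namespace) pair for a given name is one hash probe plus a short
// scan, instead of an iteration over the whole table.
public class NameTable {
    private static final class Entry {
        final String namespace;
        final int slot; // hypothetical binding-slot index
        Entry(String namespace, int slot) { this.namespace = namespace; this.slot = slot; }
    }

    private final Map<String, List<Entry>> buckets = new HashMap<>();

    public void put(String name, String namespace, int slot) {
        buckets.computeIfAbsent(name, k -> new ArrayList<>()).add(new Entry(namespace, slot));
    }

    /** Returns the slot for (name, namespace), or -1 if absent. */
    public int get(String name, String namespace) {
        List<Entry> bucket = buckets.get(name);
        if (bucket == null) return -1;
        for (Entry e : bucket) {
            if (e.namespace.equals(namespace)) return e.slot;
        }
        return -1;
    }

    /** True if the unqualified name exists under any namespace. */
    public boolean containsName(String name) {
        return buckets.containsKey(name);
    }
}
```

The point of the layout is that the common query, "all namespaces under which this name is bound," touches only one bucket rather than every entry in the table.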

Conclusions:

Secondary effects of the n-squared iteration over Names are probably responsible for a lot of the memory and time spent in the compiler. Improving this will require changing the way the tables are organized and used. That will take several weeks and has to be weighed against other optimization opportunities, unless we treat a clean redesign of the Names functionality as a module we can carry forward.

Java solves some programmer inefficiencies and creates others of its own.
I added a Name class to represent the {name, namespace, binding} tuple, plus an array for linear iteration, and it added a lot of space: Java object overhead appears to be 16 bytes per object (roughly 4 MB for 169,000 names). Shifting back to separate arrays was much smaller.
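The two layouts can be sketched side by side. This is illustrative only (the field and class names are mine, not Asc's): layout A pays a per-entry object header plus a reference from the array, while layout B pays only three array headers in total for the same fields.

```java
// Two layouts for the same {name, namespace, slot} data. With Java's
// per-object overhead (~16 bytes of header, plus alignment and an array
// reference), one object per entry costs megabytes more at ~169,000 names
// than parallel arrays holding the same fields.
public class NameStorage {
    // Layout A: one object per entry -- convenient, but each entry carries
    // object-header and reference overhead.
    static final class Name {
        final String name;
        final String namespace;
        final int slot;
        Name(String name, String namespace, int slot) {
            this.name = name; this.namespace = namespace; this.slot = slot;
        }
    }

    // Layout B: separate (parallel) arrays -- index i describes entry i,
    // with no per-entry object at all.
    static final class Names {
        final String[] names;
        final String[] namespaces;
        final int[] slots;
        int count;
        Names(int capacity) {
            names = new String[capacity];
            namespaces = new String[capacity];
            slots = new int[capacity];
        }
        /** Appends an entry and returns its index. */
        int add(String name, String namespace, int slot) {
            names[count] = name; namespaces[count] = namespace; slots[count] = slot;
            return count++;
        }
    }
}
```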

In one case, the Names.get method was being used to return a boolean. Since this method constructed a set of matching names as its result, it was quite inefficient. I replaced it with a much simpler boolean-returning search method, which cut memory usage and improved overall compiler performance by ~2%.
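The shape of that fix is easy to show in miniature. Below is a hypothetical before/after, not the actual Asc code: a get() that allocates a result list just to answer a yes/no question, versus a contains() that stops at the first hit and allocates nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative before/after for replacing a set-building lookup with a
// boolean-returning search. Method names are hypothetical.
public class NameSearch {
    private final List<String> names = new ArrayList<>();

    public void add(String name) { names.add(name); }

    // Before: constructs a list of matches even when the caller only
    // checks whether the result is empty.
    public List<String> getMatches(String name) {
        List<String> matches = new ArrayList<>();
        for (String n : names) {
            if (n.equals(name)) matches.add(n);
        }
        return matches;
    }

    // After: no allocation, early exit on the first match.
    public boolean contains(String name) {
        for (String n : names) {
            if (n.equals(name)) return true;
        }
        return false;
    }
}
```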

Asc front-end performance changes

Asc InputBuffer/Scanner/Parser performance improvements account for an over-12% decrease in full compile time for most applications. Assuming the Asc parse/scan phase was responsible for 25% of overall compile time, this means the parser is now more than twice as fast as it was.
For certain applications, garbage collection was responsible for over 25% of compile time. InputBuffer changes that decreased memory use dropped overall memory utilization below garbage-collection startup thresholds, resulting in disproportionate performance improvements: CompileTimeTestApp1, for example, improved overall by 35%.

The InputBuffer was simplified to block-load all files. This eliminated a complicated pile of code that had grown over time, and resulted in an overall compiler performance improvement of ~10%.
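"Block load" here means reading each whole source file into one flat buffer up front, so the scanner indexes into an array instead of managing incremental refill logic. A minimal sketch (class and method names are illustrative, not the Asc API):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of a block-loading input buffer: the entire file is read once into
// a char array, and the scanner just advances an index into it.
public class InputBuffer {
    final char[] text;
    int pos;

    InputBuffer(char[] text) { this.text = text; }

    /** Reads the whole file into memory in one operation. */
    static InputBuffer fromFile(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new InputBuffer(new String(bytes, StandardCharsets.UTF_8).toCharArray());
    }

    /** Next character, or -1 at end of input. */
    int nextChar() {
        return pos < text.length ? text[pos++] : -1;
    }
}
```

The memory cost of buffering whole files is bounded by source size, which is small compared to the compiler's other data structures, while the refill bookkeeping it removes runs on every character.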

  1. Reserved-word recognition in the Scanner was reduced from an 8000-line state machine to a Hashtable lookup. This actually showed as a minor slowdown in performance tests; however, real code sped up significantly due to reduced thrash time from the smaller code size.
  2. The parser primitive methods, lookahead and match, showed as hugely inefficient under profiling tools and so were targeted for simplification. The lookahead method was changed to reduce calls by over 60%, but the performance improvement was minimal. Some analysis led me to conclude that some profilers alter runtime behavior: the instrumentation code they insert interferes with just-in-time code optimizations, leading to skewed results.
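For item 1, the table-lookup form of reserved-word recognition is worth a sketch, since it shows why thousands of lines of state machine collapse into a few dozen: classification becomes one hash probe. The word list and token codes below are illustrative, not the ActionScript keyword set.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of reserved-word recognition as a single table lookup, replacing a
// hand-written character-by-character state machine.
public class ReservedWords {
    static final int IDENTIFIER = 0;

    private static final Map<String, Integer> KEYWORDS = new HashMap<>();
    static {
        // Illustrative subset; real keyword tables list the full language.
        String[] words = { "break", "case", "class", "else", "for",
                           "function", "if", "return", "var", "while" };
        for (int i = 0; i < words.length; i++) {
            KEYWORDS.put(words[i], i + 1); // token codes start after IDENTIFIER
        }
    }

    /** Token code for a scanned word: a keyword code, or IDENTIFIER. */
    static int classify(String word) {
        Integer code = KEYWORDS.get(word);
        return code != null ? code : IDENTIFIER;
    }
}
```

The slowdown-in-benchmarks-but-speedup-in-practice result is plausible because the hash probe costs slightly more per call than a hot state machine, while the code-size reduction pays off in instruction-cache behavior on real workloads.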

Further performance improvements in the Asc front end are possible; work remains in the following areas:

Source-line reference management currently requires a line map and the source text to be buffered. Each Ast node's source-reference information consists of an integer 'pos' field. This should be replaced by an Ast reference to a SourceReference class that, for each node, contains the source file path, line number, column, and file position, stored in some reasonably efficient way. This would eliminate complexity and excess memory use in many areas of the compiler. The more generally used 'Source' class could be extended for this.
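A minimal sketch of the proposed SourceReference follows. The field layout is my guess at "reasonably efficient": one file-path reference shared by all nodes from the same file, plus three ints per node, so neither the line map nor the source text needs to stay buffered.

```java
// Sketch of the proposed SourceReference class. Each Ast node would hold a
// reference to one of these instead of a bare integer 'pos'. Field names
// are illustrative.
public class SourceReference {
    final String filePath; // shared per file, not copied per node
    final int line;
    final int column;
    final int filePosition;

    SourceReference(String filePath, int line, int column, int filePosition) {
        this.filePath = filePath;
        this.line = line;
        this.column = column;
        this.filePosition = filePosition;
    }

    @Override
    public String toString() {
        // The usual file:line:column form used in compiler diagnostics.
        return filePath + ":" + line + ":" + column;
    }
}
```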

  1. The parser could be rewritten using a grammar. This could improve error handling, clarify the language definition and simplify ongoing language evolution.
  2. An Ast node should not contain a Token. The attributes of a Token that are of interest in the Ast node should be placed there. There should not be a Token array.
  3. Memory management for Ast nodes should be explicit and not garbage collected.
  4. There are efficiency problems related to String use that should be investigated.
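On item 4, one common Java String inefficiency worth investigating is duplicated identifier strings: each occurrence of an identifier can end up as a separate String instance, wasting space and forcing equals() comparisons. String.intern() maps equal strings to one shared instance, so duplicates share storage and identity comparison becomes valid. This is a general Java pattern offered as an example, not a confirmed finding about Asc.

```java
// Demonstrates String.intern(): equal strings collapse to one shared
// pool instance, enabling == comparison and eliminating duplicates.
public class InternDemo {
    public static void main(String[] args) {
        String a = new String("compile"); // a distinct heap instance
        String b = "compile";             // the string-pool instance
        System.out.println(a == b);          // false: different objects
        System.out.println(a.intern() == b); // true: interned to the pool literal
    }
}
```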


Figure 1: Asc front-end performance changes (frameworks). Over 20% overall compile-time speedup.

Figure 2: Compiler performance change tracking is skewed by ongoing library development.

Compile-time and memory-use variation over the last 4 months has been significant. Ongoing changes to the library (framework) code and to the compiler itself make verifying performance improvements difficult. For each performance improvement, two performance test runs were done: one on a compiler with the improvement, one without. Establishing a static benchmark across multiple revisions of the compiler was not possible.


