[openvrml-develop] Status update: breaking up large files

The occurrence of a few particularly large source files in the OpenVRML
codebase has been a point of annoyance for several (potential) users
over the years.

My inclination toward files of such magnitude (~20k lines) came from the
fact that the only real way to hide a symbol in C++ was to put it in an
unnamed namespace, making it local to a particular translation unit.  So
any code that shares non-public implementation details has to live in
the same file in order to keep those symbols properly hidden.
Inevitably, as code grows and matures under such a scheme, commonalities
push related implementation code together.  And sometimes there can be a
*lot* of such related code.
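
To make the constraint concrete, here is a minimal sketch of the
unnamed-namespace approach.  The names (to_lower, equal_ignoring_case,
namespace example) are hypothetical and not taken from OpenVRML.  The
helper has internal linkage, so any code that wants to call it has to
live in this same file.

```cpp
#include <cassert>
#include <cctype>
#include <string>

namespace {
    // Hypothetical helper.  The unnamed namespace gives it internal
    // linkage, so no other translation unit can refer to it.
    std::string to_lower(std::string s)
    {
        for (std::string::size_type i = 0; i < s.size(); ++i) {
            s[i] = static_cast<char>(
                std::tolower(static_cast<unsigned char>(s[i])));
        }
        return s;
    }
}

namespace example {
    // Public function (hypothetical).  It can use to_lower only
    // because both are defined in the same translation unit.
    bool equal_ignoring_case(const std::string & a, const std::string & b)
    {
        assert(!a.empty() && !b.empty());
        return to_lower(a) == to_lower(b);
    }
}
```

Every additional caller of to_lower has to be added to this file, which
is how a file ends up at 20k lines.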

There are a couple of significant downsides to large translation units:
      * They parallelize poorly.  While a single-processor machine might
        process more code in fewer files somewhat faster, parallel
        builds process multiple smaller files much more efficiently.
      * They consume a lot of memory when compiling.  gcc's memory
        demands seem to have gone up significantly in recent years,
        making this even more of a problem than it once was.

In the last few gcc releases, "symbol visibility" attributes have been
introduced.  These allow library authors to state explicitly whether a
symbol should be exported from the compiled library, rather than
relying on details that may or may not be implied by particular language
features.  Using these attributes, it's no longer necessary to bury
implementation details in unnamed namespaces (or similar) to keep them
hidden in the compiled binary.
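
As a rough illustration of the attributes, the sketch below wraps them
in two macros.  EXAMPLE_API, EXAMPLE_LOCAL, and the functions are
hypothetical names for this example only; OpenVRML's own macro
definitions may well differ.  With compilers that lack the attribute,
the macros expand to nothing.

```cpp
#include <cassert>

// Hypothetical macros wrapping gcc's visibility attribute.
#if defined(__GNUC__) && __GNUC__ >= 4
#   define EXAMPLE_API   __attribute__((visibility("default")))
#   define EXAMPLE_LOCAL __attribute__((visibility("hidden")))
#else
#   define EXAMPLE_API
#   define EXAMPLE_LOCAL
#endif

namespace example {
    // Hidden: has external linkage, so other translation units in the
    // same library can call it, but it is not exported from the
    // shared library.
    EXAMPLE_LOCAL int clamp_to_byte(int value);

    // Exported: part of the library's public interface.
    EXAMPLE_API int brightness(int r, int g, int b);

    int clamp_to_byte(int value)
    {
        return value < 0 ? 0 : (value > 255 ? 255 : value);
    }

    int brightness(int r, int g, int b)
    {
        const int sum =
            clamp_to_byte(r) + clamp_to_byte(g) + clamp_to_byte(b);
        assert(sum >= 0 && sum <= 3 * 255);
        return sum / 3;
    }
}
```

The point is that clamp_to_byte could now be defined in a different
.cpp file from brightness without becoming part of the exported
interface.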

So I've been taking advantage of this feature to attack the problem.
The most egregious offender, vrml97node.cpp, was broken up some months
ago before I started converting openvrml-xembed to use D-Bus.  More
recently I've broken up the other node implementation files and put a
significant dent in the second worst offender, browser.cpp.

I've added a namespace openvrml::local as a place to put things that
will have hidden symbols (i.e., the OPENVRML_LOCAL macro is applied),
yet need to be part of more than one translation unit.  Note, though,
that the headers associated with this namespace *do not get installed*.
That means that no public headers are allowed to include them.
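
Roughly, the arrangement looks like the sketch below, shown as a single
block with comments marking where the file boundaries would be.  The
namespace sketch::local, the SKETCH_* macros, and the functions are
hypothetical stand-ins; the real code uses openvrml::local and
OPENVRML_LOCAL, and its file layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the library's visibility macros.
#if defined(__GNUC__) && __GNUC__ >= 4
#   define SKETCH_API   __attribute__((visibility("default")))
#   define SKETCH_LOCAL __attribute__((visibility("hidden")))
#else
#   define SKETCH_API
#   define SKETCH_LOCAL
#endif

// --- private header (not installed; never included by public headers)
namespace sketch {
    namespace local {
        SKETCH_LOCAL bool is_valid_id(const std::string & id);
    }
}

// --- first .cpp file: defines the hidden helper
namespace sketch {
    namespace local {
        bool is_valid_id(const std::string & id)
        {
            return !id.empty() && id.find(' ') == std::string::npos;
        }
    }
}

// --- second .cpp file: includes the private header and uses the helper
namespace sketch {
    SKETCH_API std::size_t count_valid_ids(const std::string * ids,
                                           std::size_t n)
    {
        assert(ids || n == 0);
        std::size_t count = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (local::is_valid_id(ids[i])) { ++count; }
        }
        return count;
    }
}
```

Because the declaration of is_valid_id lives in a header that is never
installed, client code has no way to name it, and the hidden visibility
keeps it out of the library's exported symbols.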

So far I've pared browser.cpp down to a little more than 6000 lines.
Compiling on my x86_64 Linux machine, its high-water mark in memory is
around 1.3 GB.  If that sounds big, consider that before this surgery it
was taking at least 2.2 GB to compile.  (And recall that for a 32-bit
platform, you can expect to cut this memory footprint roughly in half.)

I suspect that the better part of that 1.3 GB footprint has to do with
the fact that two big Spirit parsers get instantiated in browser.cpp.
Pushing these instantiations out to different translation units would
likely be a significant win, and that's something I'll probably pursue
before releasing 0.18.

-- 
Braden McDaniel                           e-mail: <br...@en...>
<http://endoframe.com>                    Jabber: <br...@ja...>