[Somelib] SOMELib STL and library bloat


I recently had an e-mail discussion with Pierre Phaneuf, the author of
another in-process object manager, XPLC (http://xplc.sourceforge.net), and
an early contributor to the old SOMLib. He is very vocal in his peer
review, which I think is invaluable in assessing a project's strengths
and weaknesses.  I will post his e-mail as well as my reply at the bottom
of this message.

One point that he brought up was the sheer size of our "simple" object
model.  With optimizations turned on it compiles down to a somewhat
hefty 200K, which is fine for a large system but not so much for embedded
systems and other small-memory systems.  I did some tests and this is
indeed a result of the STL.  A simple hello world program compiled down
to 23 bytes.  When string was added it jumped up to 400 bytes; vector
gave 23k, map 30k, and multiple instantiations of vector and map churned
out a 43k executable.  I think using the STL was a good start because it
allowed us to create the system rapidly and fairly bug-free, and it also
lets developers use familiar tools instead of writing to confusing
non-standard interfaces.  I think we must now assess each data structure
and algorithm and use only those that are needed, while perhaps creating
our own pared-down, STL-compatible data structures.  Our first goal is
simplicity and we should not sacrifice it for the size of the library,
but beyond that, whatever we can do to cut the library size would be
great.  Suggestions?

The e-mail exchange follows:
from: <pp...@lu...>

Note: I can seem "abrasive" in some comments that I make in this
message, but I do not have any bad feelings toward you or your library.
I am just being honest and expressing my impressions, which might or
might not be the reality. If I say that SOMELib sucks (which I don't), it
could very well be just because I do not understand it. So, having said
this...

> Our code has vastly changed and is moving to 1.0.  Get the new
> code from CVS and see what you think.  There are a few examples
> on how to use it in the examples dir.  Perhaps we can meld the
> two having XPLC be an extra component built on top of SOMELib
> much like SOMELib is an extra component built on top of libdl.
> Collaboration is good if it reduces work :-)

My goals with XPLC seem so similar to those of SOMELib that XPLC
wouldn't gain much by being on top of SOMELib (having parallel things
for almost everything in it). In particular, I have additional goals of
having this work in large complex systems like a web browser, a window
manager or a network service server. I see you are working with strings,
which, for example, I have found to be a great impediment toward this
goal.

I remember contributing something to SOMELib a long time ago, but I
can't remember what... The makefiles seem a bit familiar, but they are
recursive, unlike those I usually make...

I checked SOMELib and even parts of Snaglepuss. Could you do the same
with XPLC? The test program exercises all the current classes and
components of XPLC; even if they are not very visually interesting, the
code is.

You seem to have good ideas, but you seem to lack some kind of inner
feeling in your code and design; there are some things you did not seem
to break down to their lowest level to understand. Experience maybe...

For example, let me explain the way the XPLC service manager came into
being as it is now...

Firstly the goal: I wanted to do something like using a regular C++
library, but at runtime instead of at compile/link-time.

I found two basic parts to a C++ library, the headers and the object
files. The headers are pretty easy to deal with, these *have* to be
stable, and are dealt with using the interface system (based on
IObject). Pretty easy (consider that I stole the idea anyway).

Making a runtime linker is much harder though... A linker works by
resolving symbols and giving them to whoever needs them. One of the
problems often encountered with "regular" C++ development is symbol
collision (it occurs more often than you'd think in larger programs).
Using UUIDs for global symbols resolves this problem, and you can always
make a local alias for a UUID so that you do not have to type them in.

So I made a table, similar to the map STL template (and once was
implemented with map, actually). This was good I thought. Now I had a
runtime linker. But this is not that easy.

I found that having all the required factories in the table (so that you
could create any component at a moment's notice) would require having
all the modules loaded at all times, to have their factory objects in
the service manager! This was not acceptable. Keeping a map of UUIDs to
files and demand-loading them was not very interesting to me, because I
was starting to bloat the service manager.

So I had the idea of having "handlers". When the service manager didn't
have an object in its map<>, it asked its "handlers" to see if they
could get it for it. One of the handlers would be a dynamic loader and
would keep the mapping of UUIDs to module files, keeping related things
together.

This was good. But the design of the service manager suffered: there
were more methods now, and the object lookup code was done twice, once
for its map, and once for the handlers.

Some more thinking, and I got the idea of spinning off the map<> into
its own handler, making the service manager use only handlers. At
initialization time, the "static service handler" is populated with all
the components that are included with XPLC, then added to the service
manager. From there, the application can instantiate the dynamic module
handler component and add it to the service manager as another handler,
configuring things like where to look for modules first.
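
The handler arrangement described above can be sketched roughly as
follows. This is only an illustration, not XPLC's actual code: the names
Uuid, Component, ServiceHandler, StaticServiceHandler and ServiceManager
are all hypothetical, and std::map and std::vector are used purely to
keep the sketch short (a size-conscious implementation would presumably
replace them).

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical 128-bit identifier; the real UUID type is not shown in the message.
struct Uuid {
    std::uint64_t hi;
    std::uint64_t lo;
    bool operator<(const Uuid& other) const {
        return hi != other.hi ? hi < other.hi : lo < other.lo;
    }
};

// Stand-in for whatever common interface the components share (an assumption).
struct Component {
    virtual ~Component() = default;
};

// A handler answers one question: "do you have an object for this UUID?"
struct ServiceHandler {
    virtual ~ServiceHandler() = default;
    virtual Component* getObject(const Uuid& id) = 0;  // nullptr if unknown
};

// Handler backed by a table that is filled in at initialization time.
class StaticServiceHandler : public ServiceHandler {
public:
    void addObject(const Uuid& id, Component* object) { table_[id] = object; }

    Component* getObject(const Uuid& id) override {
        auto it = table_.find(id);
        return it == table_.end() ? nullptr : it->second;
    }

private:
    std::map<Uuid, Component*> table_;
};

// The service manager keeps no table of its own; it only walks its handlers
// in the order they were added and returns the first match.
class ServiceManager {
public:
    void addHandler(ServiceHandler* handler) { handlers_.push_back(handler); }

    Component* getObject(const Uuid& id) {
        for (ServiceHandler* handler : handlers_) {
            if (Component* found = handler->getObject(id)) {
                return found;
            }
        }
        return nullptr;
    }

private:
    std::vector<ServiceHandler*> handlers_;
};
```

A dynamic module handler would be one more ServiceHandler subclass that
maps UUIDs to module files and loads them on demand; the lookup code in
the manager would not change.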

This results in a stripped libxplc.so of less than 24k, compared to a
stripped libSOME.so (which doesn't do much more) of over 200k. When I
say "lightweight", I mean it, don't I? ;-)

So, saving a few kilobytes of code from XPLC by using your
multi-hundred-kilobyte SOMELib doesn't seem like a very good idea to me.
Once I have a table-based getInterface implementation (this weekend), I
think XPLC binary and source code size will decrease even more.

I tried to understand why it is so big; I think it is because of the
templates. Beware, as templates are a very static thing that does not
meld very well with dynamic code. They have good uses though, mainly as
a super-inlining and super-clean macro facility (compare my
GenericComponent template to the fantastically obscure macro that XPCOM
uses to do the same thing!). As smart pointers, they can be useful. I
found the STL to be a particularly nasty thing to use if you try to be
lightweight... :-)

About your RANT file: garbage collection is very good to have in a
complex system, because you can then share more general objects between
the subsystems. For example, if you have an object for the X connection,
you can have unrelated objects create windows, each of which increases
the connection's refcount; the connection is then destroyed when the
last window is destroyed. Or maybe not, if something else decided it
might have to recreate a window later (for example, a screen saver).

But I *know* why you found it hard to get working correctly. There is
one key realization to garbage collection: some references are strong
and others are weak. Only the strong ones should prevent the destruction
of an object.

The weak reference implementation of XPCOM is a nice one, and pretty
easy to understand.
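
The strong/weak distinction can be illustrated with the standard C++
smart pointers. This is a minimal sketch under stated assumptions, not
the XPCOM mechanism mentioned above: Connection, Window and
ConnectionMonitor are hypothetical names, and std::shared_ptr and
std::weak_ptr stand in for a hand-written reference count.

```cpp
#include <memory>

// Shared resource, standing in for the X connection in the example above.
struct Connection {};

// Each window holds a strong reference: the connection stays alive
// for as long as at least one window exists.
struct Window {
    std::shared_ptr<Connection> connection;
};

// A monitor only observes the connection. Its weak reference does not
// contribute to the count, so it never keeps the connection alive.
struct ConnectionMonitor {
    std::weak_ptr<Connection> connection;

    bool connectionAlive() const { return !connection.expired(); }
};
```

With two windows and one monitor, the connection is destroyed when the
second window goes away, even though the monitor still refers to it.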

If I keep going at the rate of nearly a thousand lines of code a week
here on XPLC that I'm already going, I estimate that by early January,
XPLC will have a whole lot of features over SOMELib.

--
"How much does it cost to entice a dope-smoking Unix system guru
to Dayton?" -- Brian Boyle, UNIX/WORLD's First Annual Salary Survey

-----------------------my reply-----------------------------
Pierre,

Sorry for not writing back sooner, but with the holidays and all it has
been hectic.  You contributed the makefiles at one point.  I had to
change them back to recursive makefiles because they are just easier to
maintain.  Also, I would get dependency warnings because of header files
it could not find every time I did a clean build.  I was considering
using XPLC at one point, but it took too long to get into a usable
state, and a bunch of people contacted me over the summer about porting
SOMELib to Windows, so I put it on SourceForge and the rest is history.
Some counterpoints on your design vs. our design:

In regards to IObject, we decided to chuck any standard base classes.
While most current object models require you to use some sort of base
class or interface (IObject in your implementation), we felt that this,
although easy to use, was not desirable.  As it stands now, you can take
any C++ class or object file that has its own pure virtual interface
and, without modifying the code, link it to a SOMELib class descriptor
and load it up using SOMELib.  This leaves the implementation of a base
interface up to the user.

On the topic of UUIDs, objects are loaded in SOMELib by categorization.
In essence, if a user wishes to use UUIDs to categorize objects, then
they can simply add a UUID category and assign each class a UUID in the
class descriptor.  This is not imposed by SOMELib.
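
To make the two points above concrete, here is a rough sketch of a
descriptor that pairs a user-defined interface with a factory and a set
of category values. It does not reproduce the SOMELib API: Shape,
Square, ClassDescriptor, Catalog and makeSquare are hypothetical names,
and the UUID string is a made-up placeholder.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A user-defined pure virtual interface; there is no library base class.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// An existing implementation that knows nothing about descriptors.
struct Square : Shape {
    double area() const override { return 4.0; }
};

// Hypothetical descriptor: free-form category/value pairs plus a factory.
template <typename Interface>
struct ClassDescriptor {
    std::map<std::string, std::string> categories;
    std::unique_ptr<Interface> (*create)() = nullptr;
};

// Hypothetical catalog: constructs from the first descriptor whose
// category has the requested value, or returns nullptr if none matches.
template <typename Interface>
class Catalog {
public:
    void add(ClassDescriptor<Interface> descriptor) {
        descriptors_.push_back(std::move(descriptor));
    }

    std::unique_ptr<Interface> create(const std::string& category,
                                      const std::string& value) const {
        for (const auto& descriptor : descriptors_) {
            auto it = descriptor.categories.find(category);
            if (it != descriptor.categories.end() && it->second == value) {
                return descriptor.create();  // the object is built only here
            }
        }
        return nullptr;
    }

private:
    std::vector<ClassDescriptor<Interface>> descriptors_;
};

// The factory lives outside Square, so the class itself stays unmodified.
std::unique_ptr<Shape> makeSquare() { return std::make_unique<Square>(); }
```

A user who wants UUID lookup adds a "uuid" entry to each descriptor's
categories; a user who prefers names adds a "name" entry instead. The
catalog treats both the same way.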

On loaded modules.  Our system works by loading descriptors (Catalogs in
SOMELib) of classes and not the actual classes themselves until an
object is physically constructed so memory usage is kept at a minimum.
In the future a caching algorithm will be added to also unload all
catalogs not used in a while.  Your idea of loading handlers was a topic
we had thought of when discussing the ability to swap in and out
different algorithms for caching but it would mean that there would have
to be a configuration file or environment variable somewhere telling
SOMELib where to find its own components. It remains an option in the
future.

As for the STL, I looked at the size of the library and I concur that
it is a bit large.  We wanted to get all the features in and bug-free
before we started optimization.  Now we might just go in and take out
the STL pieces, replacing them with our own similar versions.  I still
want to work with iterators because they are a lot easier to work with
and understand.
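
As a rough idea of what a pared-down replacement that keeps iterators
could look like, here is a minimal fixed-capacity array. It is a sketch
only: SmallVector and sum are hypothetical names, not SOMELib classes,
and the element type is assumed to be default-constructible and
copy-assignable.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity container with STL-style iteration.
template <typename T, std::size_t Capacity>
class SmallVector {
public:
    // Plain pointers serve as iterators, so no iterator classes are needed.
    using iterator = T*;
    using const_iterator = const T*;

    // Returns false instead of growing when the storage is full.
    bool push_back(const T& value) {
        if (size_ == Capacity) {
            return false;
        }
        items_[size_++] = value;
        return true;
    }

    std::size_t size() const { return size_; }

    iterator begin() { return items_; }
    iterator end() { return items_ + size_; }
    const_iterator begin() const { return items_; }
    const_iterator end() const { return items_ + size_; }

private:
    T items_[Capacity] = {};
    std::size_t size_ = 0;
};

// begin()/end() are enough for a range-based for loop to work.
inline int sum(const SmallVector<int, 8>& values) {
    int total = 0;
    for (int value : values) {
        total += value;
    }
    return total;
}
```

Because the iterators are raw pointers, callers that already loop from
begin() to end() would need little or no change, while the container
itself pulls in nothing beyond <cstddef>.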

Perhaps our projects can't work together, but they can feed off each
other.  Thank you for the input.

--John Palmieri