From: Mateu B. <mb...@gm...> - 2007-04-02 18:13:59
Hi guys,

I'd like to share some details about an extension we have made to the Nebula 2 engine kernel, to start some discussion and get some comments. Basically it is the replacement of four character codes (fourcc or 4cc) by symbols. I'll introduce the concepts with some examples and details about the implementation. BTW, this is a copy of a post from my blog, which I inaugurated recently, http://sharedming.blogspot.com, although there is not much to see there yet.

A fourcc is basically a very efficient way to represent a string. It is small (just the size of an integer variable), and comparing fourccs is much faster than comparing strings. An additional advantage is that it can be used as a hash key for fast lookups. Those were the advantages, but what are the drawbacks? Just one: it makes programming more cumbersome. Some examples of fourccs are 'SCPN' for "SetCompanyName" and 'GCRS' for "GetCurrentState".

I agree that fourccs are efficient and I'd like to keep that somehow, but they make the programmer's life more difficult: they force us to write more code than needed, which makes the resulting code harder to read and maintain. "Less code, better code". Let's analyze what a programmer needs to do in order to use fourccs:

* Create a fourcc for each string (and remember it).
* Register somehow the relation between the fourcc and its string.
* Write two versions of each method, one accessing by string (slower) and another accessing by fourcc (faster but harder to use and read).

There are several examples of this in the Nebula 2 code base, like the command names and the signal names. And there are even more examples in Nebula 3, like the class names and the attributes.

Let's get to the point. A symbol is basically a constant string, a string that does not change during the runtime of the application.
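For comparison with what follows, here is a minimal sketch of the fourcc bookkeeping listed above. All the names in it (FourCC, MakeFourCC, CommandRegistry and its methods) are made up for illustration and are not the actual Nebula 2 API; it only shows the three chores: inventing a code, registering the code/string pair, and keeping two lookup versions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// All names here are hypothetical; this is not the Nebula 2 API.
typedef std::uint32_t FourCC;

// Pack four characters into one 32-bit value, first character in the high byte.
constexpr FourCC MakeFourCC(char a, char b, char c, char d)
{
    return (static_cast<FourCC>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<FourCC>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<FourCC>(static_cast<unsigned char>(c)) << 8)  |
            static_cast<FourCC>(static_cast<unsigned char>(d));
}

// 1. Invent (and remember) a code for every name.
constexpr FourCC kSetCompanyName  = MakeFourCC('S', 'C', 'P', 'N');
constexpr FourCC kGetCurrentState = MakeFourCC('G', 'C', 'R', 'S');

class CommandRegistry
{
public:
    // 2. Register the relation between the fourcc and its string.
    void Register(FourCC id, const std::string& name)
    {
        nameById_[id] = name;
        idByName_[name] = id;
    }

    // 3a. Lookup by fourcc: cheap integer comparisons, but cryptic at the call site.
    bool HasCommandById(FourCC id) const
    {
        return nameById_.find(id) != nameById_.end();
    }

    // 3b. Lookup by string: readable, but pays for string comparisons.
    bool HasCommandByName(const std::string& name) const
    {
        return idByName_.find(name) != idByName_.end();
    }

private:
    std::map<FourCC, std::string> nameById_;
    std::map<std::string, FourCC> idByName_;
};

// Usage sketch: the caller has to know both the code and the string.
inline void FourCCUsageExample()
{
    CommandRegistry registry;
    registry.Register(kSetCompanyName, "SetCompanyName");
    assert(registry.HasCommandById(kSetCompanyName));
    assert(registry.HasCommandByName("SetCompanyName"));
}
```

Every new command means another hand-picked constant plus another Register call, which is the overhead symbols are meant to remove.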
Using symbols, the programmer just has to remember one string and write one version of the function. That's all: less work and, more importantly, code that is easier to read and therefore to maintain. Let's see some usage examples to clarify:

    void IncIntAttribute(nSymbol attributeName)
    {
        int val = this->GetIntAttribute(attributeName);
        val++;
        this->SetIntAttribute(attributeName, val);
    }

    obj->RegisterIntAttribute( NS(LoopCount) );
    obj->SetIntAttribute( NS(LoopCount), 0 );
    obj->IncIntAttribute( NS(LoopCount) );

Note: NS(XXX) is a preprocessor macro that does some magic to convert the parameter XXX into an actual value (NS is a shortcut for NEBULASYMBOL). Actually this is the hard part of the system, but it can be done since symbols are known at compile time.

Implementation details:

* There is a preprocessor macro NS(XXX) which basically maps to a C++ preprocessor define. These defines are generated automatically in a process explained later on. For example NS(LoopCount) translates into the preprocessor define NSYMBOLID_LoopCount (which maps to an integer).

    #define NS(XXX) NSYMBOLID_ ## XXX

* There is an nSymbolId type, which is basically a typedef of an int. This is the same size as a fourcc.
* There is an nSymbol C++ object, which wraps an nSymbolId and provides some handy functions for conversions to and from strings and for fast symbol comparison. Passing nSymbol and nSymbolId as function arguments is as efficient as passing fourccs.
* How to calculate the symbol id? Any mapping from string to integer can be used, but one property must be enforced: it always has to give the same value for the same symbol, in any source file. That's why we use the CRC (Cyclic Redundancy Check) algorithm, which could produce collisions in theory (two different symbols given the same id), but that has never happened in practice. If it does happen, we detect it and warn the programmer.
* When to calculate the symbol id?
  It can be done in several ways, but basically it has to happen after the source code is written and before it is compiled. It could be a pre-compile build step, or part of the Nebula build system. This process generates an include file common to the whole target, containing all the symbols used in the target, for example:

    #define NSYMBOLID_CLASS 2819245958
    #define NSYMBOLID_nroot 4018013252

* Additionally, we have a symbol table, which basically maps from nSymbolId to a C string. So when the string has to be recovered from the symbol there is a small penalty, although this operation is not done normally (and in some systems it is just kept as a debug feature). There is also an autogenerated C++ file (generated in the same build process) which does the automatic registration of the symbol ids and symbol strings.

cheers
Mateu
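To make the id calculation, the generated defines and the symbol table described above a bit more concrete, here is a rough sketch. It rests on several assumptions: the post only says "CRC", so the standard CRC-32 (reflected polynomial 0xEDB88320) is my guess and the ids it produces are not guaranteed to match the NSYMBOLID values shown above; SymbolId, Crc32, DefineLine and SymbolTable are hypothetical names, not the real Nebula classes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for nSymbolId (assumed to be a 32-bit unsigned value here).
typedef std::uint32_t SymbolId;

// Bitwise CRC-32 (assumed variant: reflected polynomial 0xEDB88320).
// The same string always yields the same id, in any file and any build.
inline SymbolId Crc32(const std::string& text)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char ch : text)
    {
        crc ^= ch;
        for (int bit = 0; bit < 8; ++bit)
        {
            crc = (crc & 1u) ? ((crc >> 1) ^ 0xEDB88320u) : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// What the pre-compile step could write into the shared include file.
inline std::string DefineLine(const std::string& symbol)
{
    return "#define NSYMBOLID_" + symbol + " " + std::to_string(Crc32(symbol));
}

// Maps ids back to strings and detects collisions at registration time.
class SymbolTable
{
public:
    // Returns false if a different string is already registered under the same id.
    bool Register(const std::string& symbol)
    {
        const SymbolId id = Crc32(symbol);
        std::map<SymbolId, std::string>::const_iterator it = names_.find(id);
        if (it != names_.end())
        {
            return it->second == symbol; // same string again is fine
        }
        names_[id] = symbol;
        return true;
    }

    // Returns the registered string, or a null pointer for an unknown id.
    const char* ToString(SymbolId id) const
    {
        std::map<SymbolId, std::string>::const_iterator it = names_.find(id);
        return it != names_.end() ? it->second.c_str() : 0;
    }

private:
    std::map<SymbolId, std::string> names_;
};

// Usage sketch: what an autogenerated registration file might boil down to.
inline void SymbolTableUsageExample()
{
    SymbolTable table;
    assert(table.Register("LoopCount"));
    assert(std::string(table.ToString(Crc32("LoopCount"))) == "LoopCount");
}
```

In a real build the Register calls would be emitted by the same step that writes the defines, so a collision would show up as a warning before compilation rather than at runtime.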