tcl9-colibri Mailing List for Tcl9
Status: Alpha
Brought to you by:
fbonnet
You can subscribe to this list here.
2009 |
Jan
|
Feb
(1) |
Mar
|
Apr
(1) |
May
|
Jun
|
Jul
|
Aug
|
Sep
(1) |
Oct
|
Nov
|
Dec
|
---|---|---|---|---|---|---|---|---|---|---|---|---|
2010 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
(1) |
Sep
|
Oct
(2) |
Nov
|
Dec
(1) |
2011 |
Jan
|
Feb
(1) |
Mar
(1) |
Apr
(1) |
May
|
Jun
(1) |
Jul
|
Aug
|
Sep
|
Oct
|
Nov
(7) |
Dec
(2) |
From: Frédéric B. <fre...@fr...> - 2011-12-02 16:16:25
|
ANNOUNCE: Colibri version 0.13 ============================== As part of my ongoing work on the Cloverfield project, I am pleased to announce the thirteenth version of Colibri. CHANGES SINCE VERSION 0.12 ========================== 1. Bugfix release: there was a deadlock in the Unix version where a pthread mutex was entered recursively, which is forbidden. WHAT IS CLOVERFIELD? ==================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STAND FOR? ============================ Colibris, known in English as hummingbirds, are a family of birds known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renown for their stationary and backward flight abilities on par with flying insects, which allow them to feed on the nectar of plants and flowers in-flight. I've chosen this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI? ================ Colibri is an abstract data type infrastructure. It features: - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes on 32-bit systems, but can take more space if needed * words can have synonyms, i.e. words of any type * synonyms can form chains of arbitrary length, so a word can have several synonyms with different representations -- no more shimmering! * predefined types such as ints, floating points, single chars and small strings (up to 3 chars on 32-bit and 7 on 64-bit systems) are represented as immediate values: the value is stored directly in the pointer whenever possible rather than in an allocated structure. Everything is abstracted behind accessors. * several high level datastructures are provided, such as ropes (binary tree-based strings, see http://wiki.tcl.tk/20690), vectors (flat arrays), lists (binary trees of vectors with support for cycles and sparse representation), integer- and string-keyed maps, along with easy to use iterator interfaces * ropes are immutable strings that allow for fast operations (extraction, insertion, deletion, concatenation) with maximized data sharing * string types support the full Unicode range (up to 24-bit codepoints) in 1/2/4-byte fixed-width encodings as well as variable-width UTF-8, with transparent conversions and access * vectors and lists come in both immutable and mutable forms. The former share the same features as ropes. The latter implement in-place modifications and gracious degradation to immutable forms * custom word types are supported: lazy or procedural string representations, mutable or immutable types, etc. - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 4-word cells (16 bytes on 32-bit, 32 on 64-bit systems) fit most needs, but elements can allocatean arbitrary number of cells if needed * overhead is small: 2 bits per cell * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) be declared by the application, using a simple API * mutations are automatically detected thanks to page write tracking * custom types can define a finalizer that is called at deletion time; one of the applications is to plug in externally managed structures (e.g. using malloc/free) * the GC process is fully controllable (pause/resume) so that the applications don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often as they get promoted to older generations * promotion is done by moving whole pages between memory pools, optionally performing compaction when fragmentation exceeds a certain threshold, thus limiting the overhead and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs support circular references without memory leaks nor specific handling; word synonym chains take advantage of this, being implemented as circular lists * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 67kB. The source code is heavily commented using the very readable Natural Docs syntax. HOW DOES COLIBRI RELATE TO TCL? =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. PLANNED FEATURES FOR NEXT VERSIONS ================================== 1. Improvements on map types: custom key types 2. Use word synonym chains instead of strings as the basic type in Picolibri PORTABILITY =========== My main development platform is Windows, so the source archive primarily includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. I also use Ubuntu Karmic Koala 9.10 for Linux 32-bit development (Ubuntu 10.10 for 64-bit) so the archive also includes minimalist GNU makefiles for building with the provided GCC compiler. However it makes no use of other parts of the GNU toolchain (autoconf and the like). The code is fairly portable on 32-bit and 64-bit systems: all public and internal types are based on portable C-99 types such as intptr_t. The only parts that need platform-specific code are low-level page allocation, memory protection and related exception/signal handling, and GC-related synchronization. Colibri needs system calls that allocate boundary-aligned pages, as well as synchronization primitives such as mutexes and condition variables. At present both Windows and Unix (Linux) versions are provided, the latter using mmap. Porting to other systems should require only minimal effort, as the platform-specific code is limited to a handful of functions gathered in a platform-specific source subtree. Platform-specific peculiarities should not impact the overall architecture. GRAND UNIFICATION SCHEME ======================== A great side project would be to reimplement Tcl over Colibri as a replacement for the current Tcl_Obj based code. Of course the resulting library would be incompatible on an ABI level, but this would provide a great testbed for Colibri in real-world cases (using pure script applications) as well as a comparison point with the official Tcl core, and will possibly open the path to Tcl9. This most likely involves a lot of work. However I believe that this conversion is possible starting at version 0.9, as all the necessary building blocks are now present (data structures, threading model...). A quick port of the Picol miminimalist Tcl implementation has been done with success (see here http://wiki.tcl.tk/17893 ). At present it uses a string-based model and makes systematic string conversions but I plan to convert it to the Tcl_Obj-like word synonym model. I also intend to experiment integration with other C-based Tcl implementations such as Jim in the near future. WHERE CAN IT BE FOUND? ====================== Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: http://sourceforge.net/projects/tcl9/ Mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.13/colibri0.13.src.zip/download - documentation: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.13/colibri0.13.doc.zip/download - Windows binary: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.13/colibri0.13.win32.zip/download LICENSE ======= The license is the same as for Tcl. |
From: Frédéric B. <fre...@fr...> - 2011-12-01 08:42:37
|
Le 30/11/2011 19:17, Christophe CURIS a écrit : > Answering to myself... > > Picked the wrong version, because the page: > http://wiki.tcl.tk/Colibri > seems quite outdated! > > Just tryed with the real v0.12, and it compiles fine under Linux/x86-64. All > tests passed. > > Sorry for the noise, > Christophe. You're welcome, thanks for the feedback as I haven't manage to compile on my Ubuntu/64 for some obscure reason. Maybe it's due to my compiling from a SMB share? I really should update this wikipage ;) Cheers, Fred |
From: Steve B. <st...@wo...> - 2011-11-30 22:37:26
|
On 30/11/2011, at 11:34 PM, Frédéric Bonnet wrote: > ANNOUNCE: Colibri version 0.12 > ============================== > As part of my ongoing work on the Cloverfield project, I am pleased to announce > the twelveth version of Colibri. > > [snip] > A quick port of the Picol miminimalist Tcl implementation has been done with > success (see here http://wiki.tcl.tk/17893 ). At present it uses a string-based > model and makes systematic string conversions but I plan to convert it to the > Tcl_Obj-like word synonym model. I also intend to experiment integration with > other C-based Tcl implementations such as Jim in the near future. I would be quite interested to see how a port of Jim Tcl to Colibri looks in terms of size, speed and ease of interfacing with applications and extensions. Cheers, Steve -- µWeb: Embedded Web Framework - http://uweb.workware.net.au/ WorkWare Systems Pty Ltd W: www.workware.net.au P: +61 434 921 300 E: st...@wo... F: +61 7 3391 6002 |
From: Christophe C. <chr...@fr...> - 2011-11-30 18:11:36
|
Answering to myself... Picked the wrong version, because the page: http://wiki.tcl.tk/Colibri seems quite outdated! Just tryed with the real v0.12, and it compiles fine under Linux/x86-64. All tests passed. Sorry for the noise, Christophe. On Wednesday 30 November 2011 19:04, Christophe CURIS wrote: > Hi Frédéric, > > On Wednesday 30 November 2011 17:52, Frédéric Bonnet wrote: > > Linux/x86 and Windows/x86. It compiles on Linux/x86-64 but the linker > > spits a strange error on a missing symbol (oddly enough it compiled just > > fine on version 0.11), so I haven't had the opportunity to test it. I > > have no Win64 toolchain but given the overall portability of the code it > > should work OK (platform-specific code is minimal). > > If your actual error on Linux/x86-64 is this one: > > colAlloc.o: could not read symbols: Bad value > > then the solution is really simple, you just have to add "-fPIC" to the gcc > command. This is a requirement for all "so" files now under x86_64 > architectures. > > Please note that on my side it did compile well with the option, and the > test thing looked like it ran, thanks to some tweaking on the Makefile due > to TCL paths (I do not have ActiveTcl, so hacked to link against my > system's Tcl-8.5). > > Regards, > Christophe. > > > --------------------------------------------------------------------------- >--- All the data continuously generated in your IT infrastructure > contains a definitive record of customers, application performance, > security threats, fraudulent activity, and more. Splunk takes this > data and makes sense of it. IT sense. And common sense. > http://p.sf.net/sfu/splunk-novd2d > _______________________________________________ > Tcl-Core mailing list > Tcl...@li... > https://lists.sourceforge.net/lists/listinfo/tcl-core |
From: Christophe C. <chr...@fr...> - 2011-11-30 17:58:45
|
Hi Frédéric, On Wednesday 30 November 2011 17:52, Frédéric Bonnet wrote: > > Linux/x86 and Windows/x86. It compiles on Linux/x86-64 but the linker spits > a strange error on a missing symbol (oddly enough it compiled just fine on > version 0.11), so I haven't had the opportunity to test it. I have no Win64 > toolchain but given the overall portability of the code it should work OK > (platform-specific code is minimal). If your actual error on Linux/x86-64 is this one: colAlloc.o: could not read symbols: Bad value then the solution is really simple, you just have to add "-fPIC" to the gcc command. This is a requirement for all "so" files now under x86_64 architectures. Please note that on my side it did compile well with the option, and the test thing looked like it ran, thanks to some tweaking on the Makefile due to TCL paths (I do not have ActiveTcl, so hacked to link against my system's Tcl-8.5). Regards, Christophe. |
From: Frédéric B. <fre...@fr...> - 2011-11-30 16:53:11
|
Hi Larry, Le 30/11/2011 15:59, Larry McVoy a écrit : > Strangely enough, we're playing similar games with write protection. Can > you tell me > > - which platforms have you tested this on? Linux/x86 and Windows/x86. It compiles on Linux/x86-64 but the linker spits a strange error on a missing symbol (oddly enough it compiled just fine on version 0.11), so I haven't had the opportunity to test it. I have no Win64 toolchain but given the overall portability of the code it should work OK (platform-specific code is minimal). As for Unix flavours, as long as you have mmap, pthreads and sigaction it should work fine, 32 or 64 bits. > - Is there a regression suite that tests the protection stuff? If so, > a readme on how to run it and we'll see if we can test it on our > cluster (all of the above and more) Yes, there are both a regression test suite and a benchmark suite. The latter is mostly for data structure comparison, so we can ignore it for now. The test suite is in the /Test dir, there is a series of .c for each test group plus a main test.c file. By default it runs the whole Colibri test suite testColibri(), but you can comment it out and run individual tests. Additionnally, there is a group of individual stress tests at the end of TestMain() that aren't by nature integrated in the regression test suite and can run independently. The ones we want are testChildVect() and testChildren(). The former allocates mutable elements and links them so as to buid a chain in reverse order (older vectors point to newer ones). The latter does more or less the same thing but with "word synonym chains" (in short, a circular linked list of related objects). Both hit memory heavily and make an intense use of parent tracking. To run these tests, simply build the whole stuff with the provided makefiles, run the regression test once to check that everything is OK, then modify test.c to run the tests you want. You can also tweak the constants in the tests if you want to adjust the memory usage, but defaults are fine. > - did you see any perf gain or loss? On regular usage (mutable hash and trie maps, both string- and integer-based), my benchmarks show a speedup of up to 25% on write access compared to the previous version 0.11 where mutable objects and client code had to explicitly call Col_WordSetModified() upon each mutable operation (updating the parent tracking structures). Read access is mostly unchanged. But this doesn't take into account the GC process. For that, the stress tests mentioned above are interesting because they are designed to spend most of their time in GC. Overall I've observed a performance penalty of about 10%, but: - In real world applications, GC run much less often than actual read/write access code, so this penalty is more than compensated by the speedup in write access. - This penalty is mostly due not to the page write tracking itself but to the changes in internal structures: I've made tests with manual vs. automatic write tracking using the same internals (calling the same book-keeping code respectively from client code and exception/signal handler), and found the latter to be slightly faster, to my surprise: I thought at first that exception/signal handling would induce a lot of lost CPU cycles, but since it only occurs on the first write, the cost seems amortized compared to the inline overhead of calling Col_WordSetModified() explicitly. > Cool stuff! Thanks! Feel free to steal the code, it's BSD-licensed (the stuff you want is in platform/unix/colUnixPlatform.c and plaform/win32colWin32Platform.c for platform-specific stuff, and colAlloc.c for memory management) Cheers, Fred |
From: Larry M. <lm...@bi...> - 2011-11-30 15:00:07
|
On Wed, Nov 30, 2011 at 02:34:45PM +0100, Fr?d?ric Bonnet wrote: > 1. Implemented page write protection-based parent tracking > > This is a major internal change in the generational GC. In previous > versions, parents (i.e. cells having children in younger generations) had > to be tracked explicitly by client code calls to Col_WordSetModified each > time a mutable cell was modified. This caused a significant performance > penalty and added complexity in client code. Now parents are automatically > tracked thanks to memory page write protection: pages of older generations > are write-protected, and write attempts are caught (using exceptions on > Windows and SIGSEGV signal handling on Unix). When such a page write is > detected, the write protection is removed and the page marked as modified, > and the client code resumes. All this happens transparently, so that client > code no longer have to track modifications. Upon GC, cells belonging to > modified pages are followed as potential parents so that younger reachable > cells are marked as well. Strangely enough, we're playing similar games with write protection. Can you tell me - which platforms have you tested this on? [ ] linux/x86 [ ] linux/x86_64 [ ] macos/x86 [ ] windows/x86 2000? XP? 2003server? 2008server? Win7? [ ] HPUX/ia64 [ ] AIX [ ] NET/Free/OpenBSD - Is there a regression suite that tests the protection stuff? If so, a readme on how to run it and we'll see if we can test it on our cluster (all of the above and more) - did you see any perf gain or loss? Cool stuff! |
From: Frédéric B. <fre...@fr...> - 2011-11-30 14:04:33
|
ANNOUNCE: Picolibri version 0.1 =============================== As part of my ongoing work on the Cloverfield project, I am pleased to announce the first version of Picolibri, a port of Picol 0.1.22 to the Colibri library (currently at version 0.12) WHAT IS CLOVERFIELD? ==================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT IS PICOL? ============== Picolibri started as a minimalist Tcl interpreter written by Salvatore Sanfilippo, written in 550 lines and code. It was further enhanced by Richard Suchenwirth to include modern Tcl features such as apply. Wiki page: http://wiki.tcl.tk/Picol " SS: Picol (http://antirez.com/page/picol) is a Tcl interpreter in 500 lines of C code! If you like it vote my entry on reddit here http://programming.reddit.com/info/1aft9/comments (the reddit url changed because of a reddit bug). RS Since 2007-04-01, I'm playing with Picol at home - it's been a long while that I prefer to do a breakfast fun project in C :) Basically I'm adding features to bring it closer to Tcl (7.x - everything is a string really). Linecount is at about 1700 now. Latest additions focused on 8.5 features like apply (see below), {*}, in/ni etc. DISCLAIMER: This is an educational toy model only, it is weaker and slower that real Tcl. I'm not planning to add regexp, Unicode, full expr parsing etc... But it's great for experimenting with C, I've had weeks of fun with it. " See also the provided README.txt for complete details about Salvatore & Richard work. WHAT IS COLIBRI? ================ Wiki page: http://wiki.tcl.tk/Colibri (outdated) SF.net project page: https://sourceforge.net/projects/tcl9/files/colibri/ " Colibri is an abstract data type infrastructure. It features: - Extensible data type sytem dubbed "words" - Fast and efficient cell-based allocator - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector " WHAT IS PICOLIBRI? ================== Picolibri is a straight port of Picol (in its current version 0.1.22) to the Colibri library (version 0.12), replacing plain C strings with Colibri ropes, and malloc'd memory management with automatic garbage collection. The modified source is about 50 lines longer than the original (1735 vs 1701 for picol.c, and 147 vs. 127 for picol.h). Most changes involve replacing C calls such as strlen by their Colibri counterparts, mostly through macros for readability, and also providing string to Colibri structure conversions (ropes, lists, maps...). Picolibri is mostly an experiment to demonstrate and evaluate the usability of Colibri in real word applications, and especially in the context of interpreted languages. It greatly helped identify weaknesses and lacks of feature in the Colibri interface (especially regarding high level string handling). However it is quite usable as such (and no less usable than the original Picol); it provides a fairly complete Tcl interpreter in a very small footprint and with minimal code: source code is about 1900 lines of C code in two files, and the resulting Windows binary is 57 KB along with the 70 KB Colibri DLL. While not production quality, it proved a serious enough educational toy to play & experiment with. WHERE CAN IT BE FOUND? ====================== The modified Picol source is part of the Colibri source distribution here: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.12/colibri0.12.src.zip/download A pre-compiled Windows library is available here, it includes the command-line executable along with Colibri DLL and the original Picol scripts: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.12/picolibri0.1.win32.zip/download LICENSE ======= The license is the same as for Tcl. |
From: Frédéric B. <fre...@fr...> - 2011-11-30 13:34:59
|
ANNOUNCE: Colibri version 0.12 ============================== As part of my ongoing work on the Cloverfield project, I am pleased to announce the twelveth version of Colibri. CHANGES SINCE VERSION 0.11 ========================== 1. Implemented page write protection-based parent tracking This is a major internal change in the generational GC. In previous versions, parents (i.e. cells having children in younger generations) had to be tracked explicitly by client code calls to Col_WordSetModified each time a mutable cell was modified. This caused a significant performance penalty and added complexity in client code. Now parents are automatically tracked thanks to memory page write protection: pages of older generations are write-protected, and write attempts are caught (using exceptions on Windows and SIGSEGV signal handling on Unix). When such a page write is detected, the write protection is removed and the page marked as modified, and the client code resumes. All this happens transparently, so that client code no longer have to track modifications. Upon GC, cells belonging to modified pages are followed as potential parents so that younger reachable cells are marked as well. As a result, client code is much simpler to write (this is notable for all mutable structures such as maps or vectors) and less error prone (failure to track parents can result in catastrophic memory corruption), but runs faster too due to the absence of book-keeping on the client side. Some benchmarks show a speedup of 25%. 2. Added and improved high level rope functions: - Col_CompareRopes now have several versions for simpler usage (similar to strcmp/strncmp) - Col_RopeFind (similar to strchr/strrchr) - Col_RopeSearch (similar to strstr) 3. Added Col_NewChar for easier single-char strings. Such strings use immutable words and thus take up no allocated space. 4. Improved iterator & traversal interfaces. Ropes, maps and lists can now be iterated and traversed in reverse. 5. Some bug fixes 6. Now includes a port of Picol, a minimalist Tcl implementation (see Wiki page here http://wiki.tcl.tk/17893 ). Read the separate announcement for "Picolibri" WHAT IS CLOVERFIELD? ==================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STAND FOR? ============================ Colibris, known in English as hummingbirds, are a family of birds known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renown for their stationary and backward flight abilities on par with flying insects, which allow them to feed on the nectar of plants and flowers in-flight. I've chosen this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI? ================ Colibri is an abstract data type infrastructure. It features: - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes on 32-bit systems, but can take more space if needed * words can have synonyms, i.e. words of any type * synonyms can form chains of arbitrary length, so a word can have several synonyms with different representations -- no more shimmering! * predefined types such as ints, floating points, single chars and small strings (up to 3 chars on 32-bit and 7 on 64-bit systems) are represented as immediate values: the value is stored directly in the pointer whenever possible rather than in an allocated structure. Everything is abstracted behind accessors. * several high level datastructures are provided, such as ropes (binary tree-based strings, see http://wiki.tcl.tk/20690), vectors (flat arrays), lists (binary trees of vectors with support for cycles and sparse representation), integer- and string-keyed maps, along with easy to use iterator interfaces * ropes are immutable strings that allow for fast operations (extraction, insertion, deletion, concatenation) with maximized data sharing * string types support the full Unicode range (up to 24-bit codepoints) in 1/2/4-byte fixed-width encodings as well as variable-width UTF-8, with transparent conversions and access * vectors and lists come in both immutable and mutable forms. The former share the same features as ropes. The latter implement in-place modifications and gracious degradation to immutable forms * custom word types are supported: lazy or procedural string representations, mutable or immutable types, etc. - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 4-word cells (16 bytes on 32-bit, 32 on 64-bit systems) fit most needs, but elements can allocatean arbitrary number of cells if needed * overhead is small: 2 bits per cell * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) be declared by the application, using a simple API * mutations are automatically detected thanks to page write tracking * custom types can define a finalizer that is called at deletion time; one of the applications is to plug in externally managed structures (e.g. using malloc/free) * the GC process is fully controllable (pause/resume) so that the applications don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often as they get promoted to older generations * promotion is done by moving whole pages between memory pools, optionally performing compaction when fragmentation exceeds a certain threshold, thus limiting the overhead and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs support circular references without memory leaks nor specific handling; word synonym chains take advantage of this, being implemented as circular lists * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 67kB. The source code is heavily commented using the very readable Natural Docs syntax. HOW DOES COLIBRI RELATE TO TCL? =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. PLANNED FEATURES FOR NEXT VERSIONS ================================== 1. Improvements on map types: custom key types 2. Use word synonym chains instead of strings as the basic type in Picolibri PORTABILITY =========== My main development platform is Windows, so the source archive primarily includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. I also use Ubuntu Karmic Koala 9.10 for Linux 32-bit development (Ubuntu 10.10 for 64-bit) so the archive also includes minimalist GNU makefiles for building with the provided GCC compiler. However it makes no use of other parts of the GNU toolchain (autoconf and the like). The code is fairly portable on 32-bit and 64-bit systems: all public and internal types are based on portable C-99 types such as intptr_t. The only parts that need platform-specific code are low-level page allocation, memory protection and related exception/signal handling, and GC-related synchronization. Colibri needs system calls that allocate boundary-aligned pages, as well as synchronization primitives such as mutexes and condition variables. At present both Windows and Unix (Linux) versions are provided, the latter using mmap. Porting to other systems should require only minimal effort, as the platform-specific code is limited to a handful of functions gathered in a platform-specific source subtree. Platform-specific peculiarities should not impact the overall architecture. GRAND UNIFICATION SCHEME ======================== A great side project would be to reimplement Tcl over Colibri as a replacement for the current Tcl_Obj based code. Of course the resulting library would be incompatible on an ABI level, but this would provide a great testbed for Colibri in real-world cases (using pure script applications) as well as a comparison point with the official Tcl core, and will possibly open the path to Tcl9. This most likely involves a lot of work. However I believe that this conversion is possible starting at version 0.9, as all the necessary building blocks are now present (data structures, threading model...). A quick port of the Picol miminimalist Tcl implementation has been done with success (see here http://wiki.tcl.tk/17893 ). At present it uses a string-based model and makes systematic string conversions but I plan to convert it to the Tcl_Obj-like word synonym model. I also intend to experiment integration with other C-based Tcl implementations such as Jim in the near future. WHERE CAN IT BE FOUND? ====================== Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: http://sourceforge.net/projects/tcl9/ Mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.12/colibri0.12.src.zip/download - documentation: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.12/colibri0.12.doc.zip/download - Windows binary: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.12/colibri0.12.win32.zip/download LICENSE ======= The license is the same as for Tcl. |
From: Frédéric B. <fre...@fr...> - 2011-06-23 21:36:59
|
ANNOUNCE: Colibri version 0.11 ============================== As part of my ongoing work on the Cloverfield project, I am pleased to announce the eleventh version of Colibri. CHANGES SINCE VERSION 0.10 ========================== 1. Some internal changes: - Reworked cyclic list handling. Circular lists are now immediate words that wrap regular lists; cyclic lists are built by concatenating a regular and a circular list. - Some immediate word binary tag changes to make room for floating points. 2. Added floating point word type, in both immediate and regular versions. 3. Added immutable internal map structures (both hash and trie maps); maps are still mutable datatypes but can now share data with copy-on-write semantics. 4. Improved trie map access: tries can now be iterated in reverse and use key prefix matching. 5. Removed COL_CHAR type, single-char words now use COL_STRING. WHAT IS CLOVERFIELD? ==================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STAND FOR? ============================ Colibris, known in English as hummingbirds, are a family of birds known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renown for their stationary and backward flight abilities on par with flying insects, which allow them to feed on the nectar of plants and flowers in-flight. I've chosen this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI? ================ Colibri is an abstract data type infrastructure. It features: - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes on 32-bit systems, but can take more space if needed * words can have synonyms, i.e. words of any type * synonyms can form chains of arbitrary length, so a word can have several synonyms with different representations -- no more shimmering! * predefined types such as ints, floating points, single chars and small strings (up to 3 chars on 32-bit and 7 on 64-bit systems) are represented as immediate values: the value is stored directly in the pointer whenever possible rather than in an allocated structure. Everything is abstracted behind accessors. * several high level datastructures are provided, such as ropes (binary tree-based strings, see http://wiki.tcl.tk/20690), vectors (flat arrays), lists (binary trees of vectors with support for cycles and sparse representation), integer- and string-keyed maps, along with easy to use iterator interfaces * ropes are immutable strings that allow for fast operations (extraction, insertion, deletion, concatenation) with maximized data sharing * string types support the full Unicode range (up to 24-bit codepoints) in 1/2/4-byte fixed-width encodings as well as variable-width UTF-8, with transparent conversions and access * vectors and lists come in both immutable and mutable forms. The former share the same features as ropes. The latter implement in-place modifications and gracious degradation to immutable forms * custom word types are supported: lazy or procedural string representations, mutable or immutable types, etc. - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 4-word cells (16 bytes on 32-bit, 32 on 64-bit systems) fit most needs, but elements can allocatean arbitrary number of cells if needed * overhead is small: 2 bits per cell * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) and cell mutations be declared by the application, using a simple API * custom types can define a finalizer that is called at deletion time; one of the applications is to plug in externally managed structures (e.g. using malloc/free) * the GC process is fully controllable (pause/resume) so that the applications don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often as they get promoted to older generations * promotion is done by moving whole pages between memory pools, optionally performing compaction when fragmentation exceeds a certain threshold, thus limiting the overhead and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs support circular references without memory leaks nor specific handling; word synonym chains take advantage of this, being implemented as circular lists * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 67kB. The source code is heavily commented using the very readable Natural Docs syntax. HOW DOES COLIBRI RELATE TO TCL? =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. PLANNED FEATURES FOR NEXT VERSIONS ================================== 1. Improvements on map types: custom key types PORTABILITY =========== My main development platform is Windows, so the source archive primarily includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. I also use Ubuntu Karmic Koala 9.10 for Linux 32-bit development (Ubuntu 10.10 for 64-bit) so the archive also includes minimalist GNU makefiles for building with the provided GCC compiler. However it makes no use of other parts of the GNU toolchain (autoconf and the like). The code is fairly portable on 32-bit and 64-bit systems: all public and internal types are based on portable C-99 types such as intptr_t. The only parts that need platform-specific code are low-level page allocation and GC-related synchronization. Colibri needs system calls that allocate boundary-aligned pages, as well as synchronization primitives such as mutexes and condition variables. At present both Windows and Unix (Linux) versions are provided, the latter using mmap. Porting to other systems should require only minimal effort, as the platform-specific code is limited to a handful of functions gathered in a platform-specific source subtree. Platform-specific peculiarities should not impact the overall architecture. GRAND UNIFICATION SCHEME ======================== A great side project would be to reimplement Tcl over Colibri as a replacement for the current Tcl_Obj based code. Of course the resulting library would be incompatible on an ABI level, but this would provide a great testbed for Colibri in real-world cases (using pure script applications) as well as a comparison point with the official Tcl core, and will possibly open the path to Tcl9. This most likely involves a lot of work. However I believe that this conversion is possible starting at version 0.9, as all the necessary building blocks are now present (data structures, threading model...). I intend to experiment integration with other C-based Tcl implementations such as Jim or Picol in the near future. WHERE CAN IT BE FOUND? ====================== Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: http://sourceforge.net/projects/tcl9/ Mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.11/colibri0.11.src.zip/download - documentation: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.11/colibri0.11.doc.zip/download - Windows binary: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.11/colibri0.11.win32.zip/download LICENSE ======= The license is the same as for Tcl. |
From: Frédéric B. <fre...@fr...> - 2011-04-19 09:44:04
|
ANNOUNCE: Colibri version 0.10 ============================== As part of my ongoing work on the Cloverfield project, I am pleased to announce the tenth version of Colibri. CHANGES SINCE VERSION 0.9 ========================= 1. Added 64-bit support on Linux - port was rather straightforward thanks to previous efforts on portability (especially through the use of C-99 style typedefs such as intptr_t) - untested on Windows 64 but should work unchanged (system-specific code only uses portable types and calls) 2. Public & Internal documentation! - documentation is automatically generated from source files thanks to the fantastic software Natural Docs: http://www.naturaldocs.org/ - high quality, cross-linked HTML documentation with full indexes and tooltips for both public API and internal code WHAT IS CLOVERFIELD? ==================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STAND FOR? ============================ Colibris, known in English as hummingbirds, are a family of birds known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renown for their stationary and backward flight abilities on par with flying insects, which allow them to feed on the nectar of plants and flowers in-flight. I've chosen this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI? ================ Colibri is an abstract data type infrastructure. It features: - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes on 32-bit systems, but can take more space if needed * words can have synonyms, i.e. words of any type * synonyms can form chains of arbitrary length, so a word can have several synonyms with different representations -- no more shimmering! * predefined types such as ints, single chars and small strings (up to 3 chars on 32-bit and 7 on 64-bit systems) are represented as immediate values: the value is stored directly in the pointer whenever possible rather than in an allocated structure. Everything is abstracted behind accessors. * several high level datastructures are provided, such as ropes (binary tree-based strings, see http://wiki.tcl.tk/20690), vectors (flat arrays), lists (binary trees of vectors with support for cycles and sparse representation), integer- and string-keyed maps, along with easy to use iterator interfaces * ropes are immutable strings that allow for fast operations (extraction, insertion, deletion, concatenation) with maximized data sharing * string types support the full Unicode range (up to 24-bit codepoints) in 1/2/4-byte fixed-width encodings as well as variable-width UTF-8, with transparent conversions and access * vectors and lists come in both immutable and mutable forms. The former share the same features as ropes. The latter implement in-place modifications and gracious degradation to immutable forms * custom word types are supported: lazy or procedural string representations, mutable or immutable types, etc. - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 4-word cells (16 bytes on 32-bit, 32 on 64-bit systems) fit most needs, but elements can allocatean arbitrary number of cells if needed * overhead is small: 2 bits per cell * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) and cell mutations be declared by the application, using a simple API * custom types can define a finalizer that is called at deletion time; one of the applications is to plug in externally managed structures (e.g. using malloc/free) * the GC process is fully controllable (pause/resume) so that the applications don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often as they get promoted to older generations * promotion is done by moving whole pages between memory pools, optionally performing compaction when fragmentation exceeds a certain threshold, thus limiting the overhead and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs support circular references without memory leaks nor specific handling; word synonym chains take advantage of this, being implemented as circular lists * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 62kB. The source code is heavily commented using the very readable Natural Docs syntax. HOW DOES COLIBRI RELATE TO TCL? =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. PLANNED FEATURES FOR NEXT VERSIONS ================================== 1. Improvements on custom types: custom hash maps, generic map access 2. Immutable map types PORTABILITY =========== My main development platform is Windows, so the source archive primarily includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. I also use Ubuntu Karmic Koala 9.10 for Linux 32-bit development (Ubuntu 10.10 for 64-bit) so the archive also includes minimalist GNU makefiles for building with the provided GCC compiler. However it makes no use of other parts of the GNU toolchain (autoconf and the like). The code is fairly portable on 32-bit and 64-bit systems: all public and internal types are based on portable C-99 types such as intptr_t. The only parts that need platform-specific code are low-level page allocation and GC-related synchronization. Colibri needs system calls that allocate boundary-aligned pages, as well as synchronization primitives such as mutexes and condition variables. At present both Windows and Unix (Linux) versions are provided, the latter using mmap. Porting to other systems should require only minimal effort, as the platform-specific code is limited to a handful of functions gathered in a platform-specific source subtree. Platform-specific peculiarities should not impact the overall architecture. GRAND UNIFICATION SCHEME ======================== A great side project would be to reimplement Tcl over Colibri as a replacement for the current Tcl_Obj based code. Of course the resulting library would be incompatible on an ABI level, but this would provide a great testbed for Colibri in real-world cases (using pure script applications) as well as a comparison point with the official Tcl core, and will possibly open the path to Tcl9. This most likely involves a lot of work. However I believe that this conversion is possible starting at version 0.9, as all the necessary building blocks are now present (data structures, threading model...). I intend to experiment integration with other C-based Tcl implementations such as Jim or Picol in the near future. WHERE CAN IT BE FOUND? ====================== Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: http://sourceforge.net/projects/tcl9/ Mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.10/colibri0.10.src.zip/download - documentation: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.10/colibri0.10.doc.zip/download - Windows binary: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.10/colibri0.10.win32.zip/download LICENSE ======= The license is the same as for Tcl. |
From: Frédéric B. <fre...@fr...> - 2011-03-10 10:59:28
|
ANNOUNCE: Colibri version 0.9 ============================= As part of my ongoing work on the Cloverfield project, I am pleased to announce the ninth version of Colibri. CHANGES SINCE VERSION 0.8 ========================= 1. Improved threading model - previous versions used a strict appartment model: each thread had its own memory pools & GC process. Consequently data could not be shared across threads, which is suboptimal when using immutable structures (especially strings and ropes). Individual threads can now choose which model to follow at initialization time. - COL_SINGLE model behaves as previous versions: strict appartment, synchronous GC, no data sharing, deterministic behavior. - COL_ASYNC model resembles COL_SINGLE but the GC process uses its own thread for asynchronous processing. This model is well suited for single-threaded, non-deterministic applications, such as I/O intensive, even-driven or interactive processes, where the GC can be performed as a backgroud task without blocking the main thread (as long as it doesn't need to access the managed data). - COL_SHARED model allows several threads to share data. There can be several distinct "groups" of threads in the same process, each having shared memory pools and a single background GC thread. For example, COL_SHARED and COL_SHARED+1 define two distinct shared groups. Immutable data can be shared safely within a group without specific handling, however mutable data needs explicit, external synchronization depending on the application's needs. Contention is minimal: memory allocation is thread-local and as fast as with single-threaded models, older structures are merged in the shared pool during the GC process. Blocking only occurs when a GC is ongoing. Benchmarks show that this model scales very well. 2. Real-world usability and transition path to Tcl9 - the feature list is now sufficient for real-world applications regarding memory data storage - this includes Tcl itself: Colibri now provides data structures and threading models that I believe can cover the needs of the Tcl core or any of its reimplementations WHAT IS CLOVERFIELD? ==================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STAND FOR? ============================ Colibris, known in English as hummingbirds, are a family of birds known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renown for their stationary and backward flight abilities on par with flying insects, which allow them to feed on the nectar of plants and flowers in-flight. I've chosen this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI? ================ Colibri is an abstract data type infrastructure. It features: - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes, but can take more space if needed * words can have synonyms, i.e. words of any type * synonyms can form chains of arbitrary length, so a word can have several synonyms with different representations -- no more shimmering! * predefined types such as ints, single chars and small strings (up to 3 chars on 32-bit systems) are represented as immediate values: the value is stored directly in the pointer whenever possible rather than in an allocated structure. Everything is abstracted behind accessors. * several high level datastructures are provided, such as ropes (binary tree-based strings, see http://wiki.tcl.tk/20690), vectors (flat arrays), lists (binary trees of vectors with support for cycles and sparse representation), integer- and string-keyed maps, along with easy to use iterator interfaces * ropes are immutable strings that allow for fast operations (extraction, insertion, deletion, concatenation) with maximized data sharing * string types support the full Unicode range (up to 32-bit codepoints) in 1/2/4-byte fixed-width encodings as well as variable-width UTF-8, with transparent conversions and access * vectors and lists come in both immutable and mutable forms. The former share the same features as ropes. The latter implement in-place modifications and gracious degradation to immutable forms * custom word types are supported: lazy or procedural string representations, mutable or immutable types, etc. - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 16-byte cells on 32-bit systems fit most needs, but elements can allocate an arbitrary number of cells if needed * overhead is small: 2 bits per 16-byte cell * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) and cell mutations be declared by the application, using a simple API * custom types can define a finalizer that is called at deletion time; one of the applications is to plug in externally managed structures (e.g. using malloc/free) * the GC process is fully controllable (pause/resume) so that the applications don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often as they get promoted to older generations * promotion is done by moving whole pages between memory pools, optionally performing compaction when fragmentation exceeds a certain threshold, thus limiting the overhead and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs support circular references without memory leaks nor specific handling; word synonym chains take advantage of this, being implemented as circular lists * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 62kB. The source code is heavily commented and follows the Tcl quality standards. HOW DOES COLIBRI RELATE TO TCL? =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. PLANNED FEATURES FOR NEXT VERSIONS ================================== 1. 64-bit support 2. Improvements on custom types: custom hash maps, generic map access, immutable map types 3. Proper internal and user documentation WHAT NEEDS TO BE DONE? ====================== My main development platform is Windows, so the source archive primarily includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. I also use an Ubuntu Karmic Koala 9.10 system for Linux development, so the archive also includes minimalist GNU makefiles for building with the provided GCC compiler. However it makes no use of other parts of the GNU toolchain (autoconf and the like). The code is fairly portable on 32-bit systems. 64-bit support will need more work because all the internals are fine-tuned and optimized at the bit level; however porting to 64-bit should be rather straightforward: the algorithms will remain unchanged, structure access is abstracted behing macros, and cell size is proportional to the machine word size (a cell should be able to store 4 words, which add up to 16 bytes on 32-bit systems). The only parts that need platform-specific code are low-level page allocation and GC-related synchronization. Colibri needs system calls that allocate boundary-aligned pages, as well as synchronization primitives such as mutexes and condition variables. At present both Windows and Unix (Linux) versions are provided, the latter using mmap. Porting to other systems should require only minimal effort, as the platform-specific code is limited to a handful of functions gathered in a platform-specific source subtree. Platform-specific peculiarities should not impact the overall architecture. Last, it lacks user and design documentation, although the source code is extensively commented. GRAND UNIFICATION SCHEME ======================== A great side project would be to reimplement Tcl over Colibri as a replacement for the current Tcl_Obj based code. Of course the resulting library would be incompatible on an ABI level, but this would provide a great testbed for Colibri in real-world cases (using pure script applications) as well as a comparison point with the official Tcl core, and will possibly open the path to Tcl9. This most likely involves a lot of work. However I believe that this conversion is possible starting at version 0.9, as all the necessary building blocks are now present (data structures, threading model...). WHERE CAN IT BE FOUND? ====================== Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: http://sourceforge.net/projects/tcl9/ Mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.9/colibri0.9.src.zip/download - Windows binary: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.9/colibri0.9.win32.zip/download LICENSE ======= The license is the same as for Tcl. |
From: Frédéric B. <fre...@fr...> - 2011-02-21 08:38:55
|
ANNOUNCE: Colibri version 0.8 ============================= As part of my ongoing work on the Cloverfield project, I am pleased to announce the eighth version of Colibri. CHANGES SINCE VERSION 0.7 ========================= 1. Major rework of internals - merged ropes and words: ropes are now regular words - as a consequence, raw C string are no longer valid ropes (however conversion procs are provided) - removed old string word types, ropes can now take advantage of words' "immediate" representation (e.g. a 3-byte string takes up no allocated space and is stored within the pointer value) - custom rope types are now custom words - completely reworked parent handling (i.e. cells of older generations having children in younger generations). The previous implementation used one cell per parent, now parents are handled on a per-page basis. This paves the way for transparent support of parent detection through page guards 2. Rework of word synonym handling - immutable word types no longer have a synonym field internally, but use a special wrapper word when needed. The goal is to distinguish immutable vs. mutable data for better memory management (and possibly memory protection and inter-thread sharing in the future) 3. Rope and list traversal improvement - an arbitrary number of ropes or lists can be traversed in parallel, à la Tcl's [foreach]. This is useful for e.g. string comparisons 4. String-keyed map types - generic interface - two implementations: * hash map: similar to Tcl hash tables, unordered collection with O(1) access, but O(k) hash computation (k being the key length) * crit-bit trie: ordered collection, binary tree with O(log(n)) access, but better mean access than hash maps with long key strings 5. Map iterator interface - generic interface for map types with the same key type (integer or string) WHAT IS CLOVERFIELD? ==================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STAND FOR? ============================ Colibris, known in English as hummingbirds, are a family of birds known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renown for their stationary and backward flight abilities on par with flying insects, which allow them to feed on the nectar of plants and flowers in-flight. I've chosen this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI? ================ Colibri is an abstract data type infrastructure. It features: - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes, but can take more space if needed * words can have synonyms, i.e. words of any type * synonyms can form chains of arbitrary length, so a word can have several synonyms with different representations -- no more shimmering! * predefined types such as ints, single chars and small strings (up to 3 chars on 32-bit systems) are represented as immediate values: the value is stored directly in the pointer whenever possible rather than in an allocated structure. Everything is abstracted behind accessors. * several high level datastructures are provided, such as ropes (binary tree-based strings, see http://wiki.tcl.tk/20690), vectors (flat arrays), lists (binary trees of vectors with support for cycles and sparse representation), integer- and string-keyed maps, along with easy to use iterator interfaces * ropes are immutable strings that allow for fast operations (extraction, insertion, deletion, concatenation) with maximized data sharing * string types support the full Unicode range (up to 32-bit codepoints) in 1/2/4-byte fixed-width encodings as well as variable-width UTF-8, with transparent conversions and access * vectors and lists come in both immutable and mutable forms. The former share the same features as ropes. The latter implement in-place modifications and gracious degradation to immutable forms * custom word types are supported: lazy or procedural string representations, mutable or immutable types, etc. - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 16-byte cells on 32-bit systems fit most needs, but elements can allocate an arbitrary number of cells if needed * overhead is small: 2 bits per 16-byte cell * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) and cell mutations be declared by the application, using a simple API * custom types can define a finalizer that is called at deletion time; one of the applications is to plug in externally managed structures (e.g. using malloc/free) * the GC process is fully controllable (pause/resume) so that the applications don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often as they get promoted to older generations * promotion is done by moving whole pages between memory pools, optionally performing compaction when fragmentation exceeds a certain threshold, thus limiting the overhead and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs support circular references without memory leaks nor specific handling; word synonym chains take advantage of this, being implemented as circular lists * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 61kB. The source code is heavily commented and follows the Tcl quality standards. HOW DOES COLIBRI RELATE TO TCL? =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. PLANNED FEATURES FOR NEXT VERSIONS ================================== 1. Improvements on custom types (e.g. custom hash maps) 2. Improvements on the threading model 3. Proper internal and user documentation WHAT NEEDS TO BE DONE? ====================== My main development platform is Windows, so the source archive primarily includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. I also use an Ubuntu Karmic Koala 9.10 system for Linux development, so the archive also includes minimalist GNU makefiles for building with the provided GCC compiler. However it makes no use of other parts of the GNU toolchain (autoconf and the like). The code is fairly portable on 32-bit systems. 64-bit support will need more work because all the internals are fine-tuned and optimized at the bit level; however porting to 64-bit should be rather straightforward: the algorithms will remain unchanged, structure access is abstracted behing macros, and cell size is proportional to the machine word size (a cell should be able to store 4 words, which add up to 16 bytes on 32-bit systems). The only part that needs platform-specific code is the low-level page allocator. Colibri needs system calls that allocate boundary-aligned pages. At present both Windows and Unix (Linux) versions are provided, the latter using mmap. Porting to other systems should require only minimal effort, as the platform-specific code is limited to a handful of functions gathered in a platform-specific source subtree. Platform-specific peculiarities should not impact the overall architecture. Last, it lacks user and design documentation, although the source code is extensively commented. GRAND UNIFICATION SCHEME ======================== A great side project would be to reimplement Tcl over Colibri as a replacement for the current Tcl_Obj based code. Of course the resulting library would be incompatible on an ABI level, but this would provide a great testbed for Colibri in real-world cases (using pure script applications) as well as a comparison point with the official Tcl core, and will possibly open the path to Tcl9. This most likely involves a lot of work. WHERE CAN IT BE FOUND? ====================== Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: http://sourceforge.net/projects/tcl9/ Mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.8/colibri0.8.src.zip/download - Windows binary: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.8/colibri0.8.win32.zip/download LICENSE ======= The license is the same as for Tcl. |
From: Frédéric B. <fre...@fr...> - 2010-12-20 09:31:44
|
ANNOUNCE: Colibri version 0.7 ============================= As part of my ongoing work on the Cloverfield project, I am pleased to announce the seventh version of Colibri. CHANGES SINCE VERSION 0.6 ========================= 1. Major rework of internals - system page allocator now supports arbitrarily sized pages - data structures are no longer limited in size (previous version were constrained by the page size, 1KB on 32-bit systems) - cells can be pinned, i.e. marked as unmovable when compacting GC is performed - roots no longer wrap the preserved cells but instead are stored in a dedicated trie indexed by the cell address; cells with positive refcount are pinned so that their addres doesn't change - parents (i.e. cells from older generations pointing to newer ones) are now managed on a per page basis; this improves the locality of reference during GC, and simplifies declaration: instead of calling Col_DeclareChild with parent and child cells, client code simply calls Col_SetModified on the parent - a lot of performance optimizations 2. Benchmarking framework - based on Tcl: specific tests are exposed as Tcl commands, and stress tests are written as Tcl scripts - used to compare Colibri datatype performances to Tcl, STL and others 3. Integer-keyed map types. - generic interface - two implementations: * hash map: similar to Tcl hash tables, unordered collection with O(1) access * crit-bit trie: ordered collection, binary tree with O(log(n)) access WHAT IS CLOVERFIELD? ==================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STAND FOR? ============================ Colibris, known in English as hummingbirds, are a family of birds known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renown for their stationary and backward flight abilities on par with flying insects, which allow them to feed on the nectar of plants and flowers in-flight. I've chosen this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI? ================ Colibri is a string and data type infrastructure. It features: - Rope-based strings (see http://wiki.tcl.tk/20690) : * ropes are immutable ... but ... * extraction, insertion, deletion, concatenation... are fast * limited compatibility with native C strings allocated in static space (or guaranteed to survive the whole application lifetime) * support for the full Unicode range (up to 32-bit codepoints) * 1/2/4-byte fixed-width encodings as well as variable-width UTF-8 with transparent conversions and access * custom rope types allows for lazy or procedural representations * iterator interface for easy access - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes, but can take more space if needed * words can have synonyms, i.e. words of any type * synonyms can form chains of arbitrary length, so a word can have several synonyms with different representations -- no more shimmering! * predefined types such as ints, single chars and small strings (up to 3 chars on 32-bit systems) are represented as immediate values: the value is stored directly in the pointer whenever possible rather than in an allocated structure. Everything is abstracted behind accessors. * several high level datastructures are provided, such as vectors (flat arrays), lists (binary trees of vectors with support for cycles and sparse representation), integer-keyed maps, along with an easy to use iterator interface - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 16-byte cells on 32-bit systems fit most needs, but elements can allocate an arbitrary number of cells if needed * overhead is small: 2 bits per 16-byte cell * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) and changes to parent cells must be declared by the application, using a simple API * custom types can define a finalizer; one of the applications is to plug in externally managed structures (e.g. malloc/free) * the GC process is fully controllable (pause/resume) so that the applications don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often as they get promoted to older generations * promotion is done by moving whole pages between memory pools, optionally performing compaction when fragmentation exceeds a certain threshold, thus limiting the overhead and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs allow circular references without memory leaks; word synonym chains take advantage of this, being implemented as circular lists * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 50kB. The source code is heavily commented and follows the Tcl quality standards. HOW DOES COLIBRI RELATE TO TCL? =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. PLANNED FEATURES FOR NEXT VERSIONS ================================== 1. String-keyed maps, and improvement to the generic and integer-keyed map interfaces (e.g. iterators). 2. Proper internal and user documentation WHAT NEEDS TO BE DONE? ====================== My main development platform is Windows, so the source archive primarily includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. I also use an Ubuntu Karmic Koala 9.10 system for Linux development, so the archive also includes minimalist GNU makefiles for building with the provided GCC compiler. However it makes no use of other parts of the GNU toolchain (autoconf and the like). The code is fairly portable on 32-bit systems. 64-bit support will need more work because all the internals are fine-tuned and optimized at the bit level; however porting to 64-bit should be rather straightforward: the algorithms will remain unchanged, structure access is abstracted behing macros, and cell size is proportional to the machine word size (a cell should be able to store 4 words, which add up to 16 bytes on 32-bit systems). The only part that needs platform-specific code is the low-level page allocator. Colibri needs system calls that allocate boundary-aligned pages. At present both Windows and Unix (Linux) versions are provided, the latter using mmap. Porting to other systems should require only minimal effort, as the platform-specific code is limited to a handful of functions gathered in a platform-specific source subtree. Platform-specific peculiarities should not impact the overall architecture. Last, it lacks user and design documentation, although the source code is extensively commented. GRAND UNIFICATION SCHEME ======================== A great side project would be to reimplement Tcl over Colibri as a replacement for the current Tcl_Obj based code. Of course the resulting library would be incompatible on an ABI level, but this would provide a great testbed for Colibri in real-world cases (using pure script applications) as well as a comparison point with the official Tcl core, and will possibly open the path to Tcl9. This most likely involves a lot of work. WHERE CAN IT BE FOUND? ====================== Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: http://sourceforge.net/projects/tcl9/ Mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.7/colibri0.7.src.zip/download - Windows binary: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.7/colibri0.7.win32.zip/download - Linux binary (x86): http://sourceforge.net/projects/tcl9/files/colibri/colibri0.7/colibri0.7.linux-x86.zip/download LICENSE ======= The license is the same as for Tcl. |
From: Frédéric B. <fre...@fr...> - 2010-10-25 09:50:16
|
ANNOUNCE: Colibri version 0.6 ============================= As part of my ongoing work on the Cloverfield project, I am pleased to announce the sixth version of Colibri. CHANGES SINCE VERSION 0.5 ========================= 1. Added mutable list type. - efficient sparse representation of very large lists thanks to "void lists". - copy on write model, can be mixed with immutable types while preserving respective semantics. - can be "frozen" and downgraded to immutable lists at no cost. 2. Test suite! - follows the xUnit model. - implemented as a set of macros in a small, single C header file. 3. Lots of bug fixes thanks to the test suite. WHAT IS CLOVERFIELD? ==================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STAND FOR? ============================ Colibris, known in English as hummingbirds, are a family of birds known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renown for their stationary and backward flight abilities on par with flying insects, which allow them to feed on the nectar of plants and flowers in-flight. I've chosen this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI? ================ Colibri is a string and data type infrastructure. It features: - Rope-based strings (see http://wiki.tcl.tk/20690) : * ropes are immutable ... but ... * extraction, insertion, deletion, concatenation... are fast * limited compatibility with native C strings allocated in static space (or guaranteed to survive the whole application lifetime) * support for the full Unicode range (up to 32-bit codepoints) * 1/2/4-byte fixed-width encodings as well as variable-width UTF-8 with transparent conversions and access * custom rope types allows for lazy or procedural representations * iterator interface for easy access - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes, but can take more space if needed * words can have synonyms, i.e. words of any type * synonyms can form chains of arbitrary length, so a word can have several synonyms with different representations -- no more shimmering! * predefined types such as ints, single chars and small strings (up to 3 chars on 32-bit systems) are represented as immediate values: the value is stored directly in the pointer whenever possible rather than in an allocated structure. Everything is abstracted behind accessors. * several high level datastructures are provided, such as vectors (flat arrays) and lists (binary trees of vectors with support for cycles and sparse representation), along with an easy to use iterator interface - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 16-byte cells on 32-bit systems fit most needs, but elements can allocate more cells if needed (up to 63 cells on 32-bit systems) * overhead is small: 2 bits per 16-byte cell. * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) and parent-child relations must be declared by the application, using a simple API * custom types can define a finalizer; one of the applications is to plug in externally managed structures (e.g. malloc/free) * the GC process is fully controllable (pause/resume) so that the application don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often as they get promoted to older generations * promotion is done by moving whole pages between memory pools, optionally performing compaction when fragmentation exceeds a certain threshold, thus limiting the overhead and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs allow circular references without memory leaks; word synonym chains take advantage of this, being implemented as circular lists * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 50kB. The source code is heavily commented and follows the Tcl quality standards. HOW DOES COLIBRI RELATE TO TCL? =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. PLANNED FEATURES FOR NEXT VERSIONS ================================== 1. Maps (AKA dicts) with string keys. Candidate models include Patricia tries, critbits, ternary search trees, hash tables, array hashes, and hash-indexed sparse lists. Several implementations could be provided that will use the same high level interface (access, iteration...), and could be selected programmatically through a capability-based interface (e.g. lexical sorting, order preservation...). Benchmarks and stress tests will be performed on all models including the native Tcl hash table implementation. 2. Test suite improvements and better code coverage. 3. Proper internal and user documentation WHAT NEEDS TO BE DONE? ====================== My main development platform is Windows, so the source archive primarily includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. I also use an Ubuntu Karmic Koala 9.10 system for Linux development, so the archive also includes minimalist GNU makefiles for building with the provided GCC compiler. However it makes no use of other parts of the GNU toolchain (autoconf and the like). The code is fairly portable on 32-bit systems. 64-bit support will need more work because all the internals are fine-tuned and optimized at the bit level; however porting to 64-bit should be rather straightforward: the algorithms will remain unchanged, structure access is abstracted behing macros, and cell size is proportional to the machine word size (a cell should be able to store 4 words, which add up to 16 bytes on 32-bit systems). The only part that needs platform-specific code is the low-level page allocator. Colibri needs system calls that allocate boundary-aligned pages. At present both Windows and Unix (Linux) versions are provided, the latter using mmap. Porting to other systems should require only minimal effort, as the platform-specific code is limited to a handful of functions gathered in a platform-specific source subtree. Platform-specific peculiarities should not impact the overall architecture. Tests have been run extensively during development, both on stability and performance. The source includes a test application that runs the test suite by default. Other performance and stress tests are included; note that some are configured in such a way that they require a system with a large memory size (2GB), so make sure to tune their parameters before running them. Last, it lacks user and design documentation, although the source code is extensively commented. GRAND UNIFICATION SCHEME ======================== A great side project would be to reimplement Tcl over Colibri as a replacement for the current Tcl_Obj based code. Of course the resulting library would be incompatible on an ABI level, but this would provide a great testbed for Colibri in real-world cases (using pure script applications) as well as a comparison point with the official Tcl core, and will possibly open the path to Tcl9. This most likely involves a lot of work. WHERE CAN IT BE FOUND? ====================== Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: http://sourceforge.net/projects/tcl9/ Mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.6/colibri0.6.src.zip/download - Windows binary: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.6/colibri0.6.win32.zip/download - Linux binary (x86): http://sourceforge.net/projects/tcl9/files/colibri/colibri0.6/colibri0.6.linux-x86.zip/download LICENSE ======= The license is the same as for Tcl. |
From: Frédéric B. <fre...@fr...> - 2010-10-08 12:45:19
|
ANNOUNCE: Colibri version 0.5 ============================= As part of my ongoing work on the Cloverfield project, I am pleased to announce the fifth version of Colibri. CHANGES SINCE VERSION 0.4 ========================= 1. Removed sequence type. It was awkward, hard to use, and didn't quite fit the purpose. Besides, doing a mutable version was too complicated. Replaced by list type improvements. 2. Improved list type. - support for cyclic lists: each list has a loop length field that, when non-zero, gives the length of the terminating loop. Cyclic loops can also easily be detected during iteration thanks to the iterator index. - support for sparse lists: they can now contain void parts (i.e. whose elements are all nil) of arbitrary lengths at the cost of a single machine word. 3. Added mutable vector type. - can grow and shrink arbitrarily within a reserved size set at creation time - can be "frozen" and downgraded to immutable vectors at no cost. 4. Preliminary work for mutable list type. - will use the new sparse list facility. - will downgrade to immutable form easily. 5. Basic exception handling. - catches critical conditions and validation errors with meaningful message - default handler calls abort(), can be overridden by user proc. WHAT IS CLOVERFIELD? ==================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STAND FOR? ============================ Colibris, known in English as hummingbirds, are a family of birds known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renown for their stationary and backward flight abilities on par with flying insects, which allow them to feed on the nectar of plants and flowers in-flight. I've chosen this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI? ================ Colibri is a string and data type infrastructure. It features: - Rope-based strings (see http://wiki.tcl.tk/20690) : * ropes are immutable ... but ... * extraction, insertion, deletion, concatenation... are fast * limited compatibility with native C strings allocated in static space (or guaranteed to survive the whole application lifetime) * support for the full Unicode range (up to 32-bit codepoints) * 1/2/4-byte fixed-width encodings as well as variable-width UTF-8 with transparent conversions and access * custom rope types allows for lazy or procedural representations * iterator interface for easy access - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes, but can take more space if needed * words can have synonyms, i.e. words of any type * synonyms can form chains of arbitrary length, so a word can have several synonyms with different representations -- no more shimmering! * predefined types such as ints, single chars and small strings (up to 3 chars on 32-bit systems) are represented as immediate values: the value is stored directly in the pointer whenever possible rather than in an allocated structure. Everything is abstracted behind accessors. * several high level datastructures are provided, such as vectors (flat arrays) and lists (binary trees of vectors with support for cycles and sparse representation), along with an easy to use iterator interface - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 16-byte cells on 32-bit systems fit most needs, but elements can allocate more cells if needed (up to 63 cells on 32-bit systems) * overhead is small: 2 bits per 16-byte cell. * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) and parent-child relations must be declared by the application, using a simple API * custom types can define a finalizer; one of the applications is to plug in externally managed structures (e.g. malloc/free) * the GC process is fully controllable (pause/resume) so that the application don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often as they get promoted to older generations * promotion is done by moving whole pages between memory pools, optionally performing compaction when fragmentation exceeds a certain threshold, thus limiting the overhead and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs allow circular references without memory leaks; word synonym chains take advantage of this, being implemented as circular lists * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 40kB. The source code is heavily commented and follows the Tcl quality standards. HOW DOES COLIBRI RELATE TO TCL? =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. PLANNED FEATURES FOR NEXT VERSIONS ================================== 1. Mutable data types with graceful degradation to immutable forms: - Mutable lists. - Maps (AKA dicts) with string keys. Candidate models include Patricia trees and Ternary Search Trees, but not hash tables. 2. Proper internal and user documentation 3. Better test suite WHAT NEEDS TO BE DONE? ====================== My main development platform is Windows, so the source archive primarily includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. I also uses an Ubuntu Karmic Koala 9.10 system for Linux development, so the archive also includes minimalist GNU makefiles for building using the included GCC compiler. However it makes no use of other parts of the GNU toolchain (autoconf and the like). The code is fairly portable on 32-bit systems. 64-bit support will need more work because all the internals are fine-tuned and optimized at the bit level; however porting to 64-bit should be rather straightforward: the algorithms will remain unchanged, structure access is abstracted behing macros, and cell size is proportional to the machine word size (a cell should be able to store 4 words, which add up to 16 bytes on 32-bit systems). The only part that needs platform-specific code is the low-level page allocator. Colibri needs system calls that allocate boundary-aligned pages. At present both Windows and Unix (Linux) version is provided, the latter using mmap. Porting to other systems should require only minimal effort, as the platform-specific code is limited to a handful of functions gathered in a platform-specific source tree. Platform-specific peculiarities should not impact the overall architecture. Tests have been run extensively during development, both on stability and performance. However Colibri lacks a real test suite (including unit testings). The source includes a test application that has to be hand-edited to run or customize specific tests. Note that some stress tests are configured in such a way that they require a system with a large memory size (2GB), so make sure to tune their parameters before running them. Last, it lacks user and design documentation, although the source code is extensively commented. GRAND UNIFICATION SCHEME ======================== A great side project would be to reimplement Tcl over Colibri as a replacement for the current Tcl_Obj based code. Of course the resulting library would be incompatible on an ABI level, but this would provide a great testbed for Colibri in real-world cases (using pure script applications) as well as a comparison point with the official Tcl core, and will possibly open the path to Tcl9. This most likely involves a lot of work. WHERE CAN IT BE FOUND? ====================== Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: http://sourceforge.net/projects/tcl9/ Mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.5/colibri0.5.src.zip/download - Windows binary: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.5/colibri0.5.win32.zip/download - Linux binary (x86): http://sourceforge.net/projects/tcl9/files/colibri/colibri0.5/colibri0.5.linux-x86.zip/download LICENSE ======= The license is the same as for Tcl. |
From: Frédéric B. <fre...@fr...> - 2010-08-19 13:09:56
|
ANNOUNCE: Colibri version 0.4 ============================= As part of my ongoing work on the Cloverfield project, I am pleased to announce the fourth version of Colibri. CHANGES SINCE VERSION 0.3 ========================= 1. Added sequence and reference word types. - potentially unlimited collections - self-referencing structures with efficient cycle detection 2. Type- and nil/NULL- related cleanup. 3. Improved source code documentation on the internal data structures. WHAT IS CLOVERFIELD? ==================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STAND FOR? ============================ Colibris, known in English as hummingbirds, are a family of birds known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renown for their stationary and backward flight abilities on par with flying insects, which allow them to feed on the nectar of plants and flowers in-flight. I've chosen this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI? ================ Colibri is a string and data type infrastructure. It features: - Rope-based strings (see http://wiki.tcl.tk/20690) : * ropes are immutable ... but ... * extraction, insertion, deletion, concatenation... are fast * limited compatibility with native C strings allocated in static space (or guaranteed to survive the whole application lifetime) * support for the full Unicode range (up to 32-bit codepoints) * 1/2/4-byte fixed-width encodings as well as variable-width UTF-8 with transparent conversions and access * custom rope types allows for lazy or procedural representations * iterator interface for easy access - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes, but can take more space if needed * words can have synonyms, i.e. words of any type * synonyms can form chains of arbitrary length, so a word can have several synonyms with different representations -- no more shimmering! * predefined types such as ints, single chars and small strings (up to 3 chars on 32-bit systems) are represented as immediate values: the value is stored directly in the pointer rather than in an allocated structure * several high level datastructures are provided, such as vectors (flat arrays), lists (binary trees of vectors), and sequences (potentially cyclic and unlimited linked lists of lists and references to other sequences), along with an easy to use iterator interface - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 16-byte cells on 32-bit systems fit most needs, but elements can allocate more cells if needed (up to 63 cells on 32-bit systems) * overhead is small: 2 bits per 16-byte cell. * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) and parent-child relations must be declared by the application, using a simple API * custom types can define a finalizer; one of the applications is to plug in externally managed structures (e.g. malloc/free) * the GC process is fully controllable (pause/resume) so that the application don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often as they get promoted to older generations * promotion is done by moving whole pages between memory pools, optionally performing compaction when fragmentation exceeds a certain threshold, thus limiting the overhead and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs allow circular references without memory leaks; word synonym chains take advantage of this, being implemented as circular lists * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 39kB. The source code is heavily commented and follows the Tcl quality standards. HOW DOES COLIBRI RELATE TO TCL? =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. PLANNED FEATURES FOR NEXT VERSIONS ================================== 1. Mutable data types with circular references and graceful degradation to immutable types: - Mutable lists. Candidate models include unrolled linked lists and skip lists. - Maps (AKA dicts) with string keys. Candidate models include Patricia trees and Ternary Search Trees, but not hash tables. 2. Proper internal and user documentation 3. Better test suite WHAT NEEDS TO BE DONE? ====================== My main development platform is Windows, so the source archive primarily includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. I also uses a CentOS 5.2 Linux VMware image for Linux development, so the archive also includes minimalist GNU makefiles for building using the included GCC compiler. However it makes no use of other parts of the GNU toolchain (autoconf and the like). The code is fairly portable on 32-bit systems. 64-bit support will need more work because all the internals are fine-tuned and optimized at the bit level; however porting to 64-bit should be rather straightforward: the algorithms will remain unchanged, structure access is abstracted behing macros, and cell size is proportional to the word size (a cell should be able to store 4 pointers, which add up to 16 bytes on 32-bit systems). The only part that needs platform-specific code is the low-level page allocator. Colibri needs system calls that allocate boundary-aligned pages. At present both Windows and Unix (Linux) version is provided, the latter using mmap. Porting to other systems should require only minimal effort, as the platform-specific code is limited to a handful of functions gathered in a platform-specific source tree. Platform-specific peculiarities should not impact the overall architecture. Exception handling is poor, especially because Colibri is meant to be part of a larger system that provides the appropriate infrastructure. "Exception handling" includes what to do upon allocation failures due to low-memory conditions. As Colibri uses a non-prehemptive model it cannot stop-the-world and GC like other implementations do to free memory. More generally, due to the nature of the model, such conditions would only occur when the memory is actually used or when the GC is not triggered often enough (for example during a long computation phase involving a large number of temporary allocations), both being unrecoverable anyway without explicit programming. Tests have been run extensively during development, both on stability and performance. However Colibri lacks a real test suite (including unit testings). The source includes a test application that has to be hand-edited to run or customize specific tests. Note that some stress tests are configured in such a way that they require a system with a large memory size (2GB), so make sure to tune their parameters before running them. Last, it lacks user and design documentation, although the source code is extensively commented. GRAND UNIFICATION SCHEME ======================== A great side project would be to reimplement Tcl over Colibri as a replacement for the current Tcl_Obj based code. Of course the resulting library would be incompatible on an ABI level, but this would provide a great testbed for Colibri in real-world cases (using pure script applications) as well as a comparison point with the official Tcl core, and will possibly open the path to Tcl9. This most likely involves a lot of work. WHERE CAN IT BE FOUND? ====================== Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: http://sourceforge.net/projects/tcl9/ Mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.4/colibri0.4.src.zip/download - Windows binary: http://sourceforge.net/projects/tcl9/files/colibri/colibri0.4/colibri0.4.win32.zip/download - Linux binary (x86): http://sourceforge.net/projects/tcl9/files/colibri/colibri0.4/colibri0.4.linux-x86.zip/download LICENSE ======= The license is the same as for Tcl. |
From: Frédéric B. <fre...@fr...> - 2009-09-22 20:25:48
|
ANNOUNCE: Colibri version 0.3 ============================= As part of my ongoing work on the Cloverfield project, I am pleased to announce the third version of Colibri. WHAT IS CLOVERFIELD? ==================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STAND FOR? ============================ Colibris, known in English as hummingbirds, are a family of birds known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renown for their stationary and backward flight abilities on par with flying insects, which allow them to feed on the nectar of plants and flowers in-flight. I've chosen this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI? ================ Colibri is a string and data type infrastructure. It features: - Rope-based strings (see http://wiki.tcl.tk/20690) : * ropes are immutable ... but ... * extraction, insertion, deletion, concatenation... are fast * limited compatibility with native C strings allocated in static space (or guaranteed to survive the whole application lifetime) * support for the full Unicode range (up to 32-bit codepoints) * 1/2/4-byte fixed-width encodings as well as variable-width UTF-8 with transparent conversions and access * custom rope types allows for lazy or procedural representations * iterator interface for easy access - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes, but can take more space if needed * words can have synonyms, i.e. words of any type * synonyms can form chains of arbitrary length, so a word can have several synonyms with different representations -- no more shimmering! * predefined types such as ints, single chars and small strings (up to 3 chars on 32-bit systems) are represented as immediate values: the value is stored directly in the pointer rather than in an allocated structure - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 16-byte cells on 32-bit systems fit most needs, but elements can allocate more cells if needed (up to 63 cells on 32-bit systems) * overhead is small: 2 bits per 16-byte cell. * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) and parent-child relations must be declared by the application, using a simple API * custom types can define a finalizer; one of the applications is to plug in externally managed structures (e.g. malloc/free) * the GC process is fully controllable (pause/resume) so that the application don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often as they get promoted to older generations * promotion is done by moving whole pages between memory pools, optionally performing compaction when fragmentation exceeds a certain threshold, thus limiting the overhead and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs allow circular references without memory leaks; word synonym chains take advantage of this, being implemented as circular lists * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 32kB. The source code is heavily commented and follows the Tcl quality standards. HOW DOES COLIBRI RELATE TO TCL? =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. CHANGES SINCE VERSION 0.2 ========================= 1. Multithreading support - Appartment model, each thread has its own memory pool 2. Linux port and code portability improvements - Now uses C99 types such as int8_t and uintptr_t - Moved platform-specific code to platform/* subtree - Linux version uses mmap and pthreads 3. Test app overhaul - Each test has its own C file - Multithreading support PLANNED FEATURES FOR NEXT VERSION ================================= 1. Mutable data types with circular references and graceful degradation to immutable types: - Mutable lists. Candidate models include unrolled linked lists and skip lists. - Maps (AKA dicts) with string keys. Candidate models include Patricia trees and Ternary Search Trees, but not hash tables. 2. Proper internal and user documentation WHAT NEEDS TO BE DONE? ====================== My main development platform is Windows, so the source archive primarily includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. I also uses a CentOS 5.2 Linux VMware image for Linux development, so the archive also includes minimalist GNU makefiles for building using the included GCC compiler. However it makes no use of other parts of the GNU toolchain (autoconf and the like). The code is fairly portable on 32-bit systems. 64-bit support will need more work because all the internals are fine-tuned and optimized at the bit level; however porting to 64-bit should be rather straightforward: the algorithms will remain unchanged, structure access is abstracted behing macros, and cell size is proportional to the word size (a cell should be able to store 4 pointers, which add up to 16 bytes on 32-bit systems). The only part that needs platform-specific code is the low-level page allocator. Colibri needs system calls that allocate boundary-aligned pages. At present both Windows and Unix (Linux) version is provided, the latter using mmap. Porting to other systems should require only minimal effort, as the platform-specific code is limited to a handful of functions gathered in a platform-specific source tree. Platform-specific peculiarities should not impact the overall architecture. Exception handling is poor, especially because Colibri is meant to be part of a larger system that provides the appropriate infrastructure. "Exception handling" includes what to do upon allocation failures due to low-memory conditions. As Colibri uses a non-prehemptive model it cannot stop-the-world and GC like other implementations do to free memory. More generally, due to the nature of the model, such conditions would only occur when the memory is actually used or when the GC is not triggered often enough (for example during a long computation phase involving a large number of temporary allocations), both being unrecoverable anyway without explicit programming. Tests have been run extensively during development, both on stability and performance. However Colibri lacks a real test suite (including unit testings). The source includes a test application that has to be hand-edited to run or customize specific tests. Note that some tests are configured in such a way that they require a system with a large memory size (2GB), so make sure to tune their parameters before running them. Last, it lacks user and design documentation, although the source code is extensively commented. GRAND UNIFICATION SCHEME ======================== A great side project would be to reimplement Tcl over Colibri as a replacement for the current Tcl_Obj based code. Of course the resulting library would be incompatible on an ABI level, but this would provide a great testbed for Colibri in real-world cases (using pure script applications) as well as a comparison point with the official Tcl core, and will possibly open the path to Tcl9. This most likely involves a lot of work. WHERE CAN IT BE FOUND? ====================== Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: https://sourceforge.net/projects/tcl9/ Mailing list: https://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://downloads.sourceforge.net/tcl9/colibri0.3.src.zip?use_mirror= - Windows binary: http://downloads.sourceforge.net/tcl9/colibri0.3.win32.zip?use_mirror= - Linux binary (x86): http://downloads.sourceforge.net/tcl9/colibri0.3.linux-x86.zip?use_mirror= LICENSE ======= The license is the same as for Tcl. |
From: Frédéric B. <fre...@fr...> - 2009-04-16 14:18:25
|
ANNOUNCE: Colibri version 0.2 ============================= As part of my ongoing work on the Cloverfield project, I am pleased to announce the second version of colibri. WHAT IS CLOVERFIELD? ==================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STAND FOR? ============================ Colibris, known in English as hummingbirds, are a family of birds known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renown for their stationary and backward flight abilities on par with flying insects, which allow them to feed on the nectar of plants and flowers in-flight. I've chosen this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI? ================ Colibri is a string and data type infrastructure. It features: - Rope-based strings (see http://wiki.tcl.tk/20690) : * ropes are immutable ... but ... * extraction, insertion, deletion, concatenation... are fast * limited compatibility with native C strings allocated in static space (or guaranteed to survive the whole application lifetime) * support for the full Unicode range (up to 32-bit codepoints) * 1/2/4-byte fixed-width encodings as well as variable-width UTF-8 with transparent conversions and access * custom rope types allows for lazy or procedural representations * iterator interface for easy access - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes, but can take more space if needed * words can have synonyms, i.e. words of any type * synonyms can form chains of arbitrary length, so a word can have several synonyms with different representations -- no more shimmering! * predefined types such as ints, single chars and small strings (up to 3 chars on 32-bit systems) are represented as immediate values: the value is stored directly in the pointer rather than in an allocated structure - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 16-byte cells on 32-bit systems fit most needs, but elements can allocate more cells if needed (up to 63 cells on 32-bit systems) * overhead is small: 2 bits per 16-byte cell. * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) and parent-child relations must be declared by the application, using a simple API * custom types can define a finalizer; one of the applications is to plug in externally managed structures (e.g. malloc/free) * the GC process is fully controllable (pause/resume) so that the application don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often as they get promoted to older generations * promotion is done by moving whole pages between memory pools, optionally performing compaction when fragmentation exceeds a certain threshold, thus limiting the overhead and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs allow circular references without memory leaks; word synonym chains take advantage of this, being implemented as circular lists * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 32kB. The source code is heavily commented and follows the Tcl quality standards. HOW DOES COLIBRI RELATE TO TCL? =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. CHANGES SINCE VERSION 0.1 ========================= 1. Major rewrite of GC code: - Roots are now sorted by generation for better performances with large number of instances. - Reworked parent-child relation handling. Now uses one cell per parent instead of one per parent-child pair, in combination with a parent flag (replaces the now suppressed generation flag). - Roots and parent-child relations now use cells allocated in a dedicated memory pool. - No more per-cell generation flag; all cells in a page belong to the same generation. Promotion is now immediate upon GC. - Promotion is done per page rather than per cell: whole pages are moved up one generation at each GC. CPU overhead is much smaller. - As a consequence, the first generation is always emptied at each GC, which speeds up further allocations. - The last collected generation is by default merged with the first uncollected one, but may be relocated if fragmentation reaches a certain threshold; this is done by copying individual cells instead of moving pages. - Relocation and redirection are now performed inline during the mark phase, versus during later steps in previous versions. The resulting code is much simpler, and avoids cache-intensive traversal of whole memory pools. 2. Improved performances in every domain thanks to point 1. 3. Added two high-level data types: vectors and lists. - Vectors are flat collections of words. Their size is limited, as they must fit within one single page. They are directly addressable. - Lists are linear collections of words, made by assembling vectors into balanced binary trees. They are to vectors what ropes are to strings. Like ropes, lists are immutable. Iterator and traversal APIs are provided. PLANNED FEATURES FOR NEXT VERSION ================================= 1. Mutable data types with circular references and graceful degradation to immutable types: - Mutable lists. Candidate models include unrolled linked lists and skip lists. - Maps (AKA dicts) with string keys. Candidate models include Patricia trees and Ternary Search Trees, but not hash tables. 2. Proper internal and user documentation WHAT NEEDS TO BE DONE? ====================== My main development platform is Windows, so for now the source archive only includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. The code is fairly portable on 32-bit systems. 64-bit support will need more work because all the internals are fine-tuned and optimized at the bit level; however porting to 64-bit should be rather straightforward: the algorithms will remain unchanged, structure access is abstracted behing macros, and cell size is proportional to the word size (a cell should be able to store 4 pointers, which add up to 16 bytes on 32-bit systems). The only part that needs platform-specific code is the low-level page allocator. Colibri needs system calls that allocate boundary-aligned pages. At present only a Windows version is provided, but porting to other systems should require minimal effort, as the platform-specific code is limited to a handful of functions. Platform-specific peculiarities should not impact the overall architecture. Exception handling is poor, especially because Colibri is meant to be part of a larger system that provides the appropriate infrastructure. The current implementation is not thread-safe as it uses unprotected global structures. However I don't think that adding synchronization is necessary given that it should eventually be integrated into a larger application with the same appartment threading model as Tcl. The main advantage of having a global allocator is inter-thread data sharing, and Tcl typically shares no data between threads. As Colibri is not designed to be a general-purpose solution, and unless there is a real benefit in using a global thread-safe allocator, converting global structures to thread-local should fit our needs. Tests have been run extensively during development, both on stability and performance. However Colibri lacks a real test suite. The source includes a test application that has to be hand-edited to run or customize specific tests. Note that the current values defined by this application require a system with a large memory size (2GB). Last, it lacks user and design documentation, although the source code is extensively commented. GRAND UNIFICATION SCHEME ======================== A great side project would be to reimplement Tcl over Colibri as a replacement for the current Tcl_Obj based code. Of course the resulting library would be incompatible on an ABI level, but this would provide a great testbed for Colibri in real-world cases (using pure script applications) as well as a comparison point with the official Tcl core, and will possibly open the path to Tcl9. This most likely involves a lot of work. WHERE CAN IT BE FOUND? ====================== Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: https://sourceforge.net/projects/tcl9/ Mailing list: https://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://downloads.sourceforge.net/tcl9/colibri0.2.src.zip?use_mirror= - Windows binary: http://downloads.sourceforge.net/tcl9/colibri0.2.win32.zip?use_mirror= LICENSE ======= The license is the same as for Tcl. |
From: Frédéric B. <fre...@fr...> - 2009-02-13 23:49:16
|
ANNOUNCE: Colibri version 0.1 ============================= As part of my ongoing work on the Cloverfield project, I am pleased to announce the first version of Colibri. WHAT IS CLOVERFIELD ? ===================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STANDS FOR ? ============================== Colibris, known in English as hummingbirds, are a family of bird known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renowed for their stationary and backward flight abilities on par with flying insects, which allows them to feed on the nectar of plants and flowers in-flight. I've chose this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI ? ================= Colibri is a string and data type infrastructure. It features: - Rope-based strings (see http://wiki.tcl.tk/20690) : * ropes are immutable ... but ... * extraction, insertion, deletion, concatenation... are fast * limited compatibility with native C strings allocated in static space (or guaranteed to survive the whole application lifetime) * support for the full Unicode range (up to 32-bit codepoints) * 1/2/4-byte fixed-width encodings as well as variable-width UTF-8 with transparent conversions and access * custom rope types allows for lazy or procedural representations * iterator interface for easy access - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes, but can take more space if needed * words have values that can be of any type, including other words * values can form chains of arbitrary length, so a word can have several values with different representations -- no more shimmering! * predefined types such as ints, single chars and small strings (up to 3 chars on 32-bit systems) are represented as immediate values: the value is stored directly in the pointer rather than in an allocated structure - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 16-byte cells on 32-bit systems fit most needs, but elements can allocate more cells if needed (up to 63 cells on 32-bit systems) * overhead is small: 2 bits per 16-byte cell. * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) and parent-child relations must be declared by the application, using a simple API * custom types can define a finalizer * the GC process is fully controllable (pause/resume) so that the application don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often and get promoted after two cycles * promotion is done by copying old elements to older pools, which performs compaction as a side effect, thus limiting the fragmentation and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs allow circular references without memory leaks * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 27kB. The source code is heavily commented and follows the Tcl quality standards. HOW DOES COLIBRI RELATES TO TCL =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. WHAT NEEDS TO BE DONE ? ======================= My main development platform is Windows, so for now the source archive only includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. The code is fairly portable on 32-bit systems. 64-bit support will need more work because all the internals are fine-tuned and optimized at the bit level; however porting to 64-bit should be rather straightforward: the algorithms will remain unchanged, structure access is abstracted behing macros, and cell size is proportional to the word size (a cell should be able to store 4 pointers, which add up to 16 bytes on 32-bit systems). The only part that needs platform-specific code is the low-level page allocator. Colibri needs system calls that allocate boundary-aligned pages. At present only a Windows version is provided, but porting to other systems should require minimal effort, as the platform-specific code is limited to a handful of functions. Platform-specific peculiarities should not impact the overall architecture. Exception handling is poor, especially because Colibri is meant to be part of a larger system that provides the appropriate infrastructure. The current implementation is not thread-safe as it uses unprotected global structures. However I don't think that adding synchronization is necessary given that it should eventually be integrated into a larger application with the same appartment threading model as Tcl. The main advantage of having a global allocator is inter-thread data sharing, and Tcl typically shares no data between threads. As Colibri is not designed to be a general-purpose solution, and unless there is a real benefit in using a global thread-safe allocator, converting global structures to thread-local should fit our needs. Tests have been run extensively during development, both on stability and performance. However Colibri lacks a real test suite. The source includes a test application that has to be hand-edited to run or customize specific tests. Note that the current values defined by this application require a system with a large memory size (2GB). Last, it lacks user and design documentation, although the source code is extensively commented. WHERE CAN IT BE FOUND ? ======================= Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: https://sourceforge.net/projects/tcl9/ Mailing list: https://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://downloads.sourceforge.net/tcl9/colibri0.1.src.zip?use_mirror= - Windows binary: http://downloads.sourceforge.net/tcl9/colibri0.1.win32.zip?use_mirror= LICENSE ======= The license is the same as for Tcl. |