tcl9-cloverfield Mailing List for Tcl9

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

ANNOUNCE: Colibri version 0.13
==============================
As part of my ongoing work on the Cloverfield project, I am pleased to announce
the thirteenth version of Colibri.

CHANGES SINCE VERSION 0.12
==========================

1. Bugfix release: there was a deadlock in the Unix version where a pthread
    mutex was entered recursively, which is forbidden.

WHAT IS CLOVERFIELD?
====================

Wiki page: http://wiki.tcl.tk/Cloverfield

WHAT DOES COLIBRI STAND FOR?
============================

Colibris, known in English as hummingbirds, are a family of birds known for
their small size and high wing speed. The bee hummingbird (Mellisuga helenae),
found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total
length of about 5cm. They are renown for their stationary and backward flight
abilities on par with flying insects, which allow them to feed on the nectar of
plants and flowers in-flight.

I've chosen this name for this project because its goal is to be fast and
lightweight, and to follow the feather and bird theme shared with Tcl and many
related projects.

WHAT IS COLIBRI?
================

Colibri is an abstract data type infrastructure. It features:

  - Extensible data type sytem dubbed "words"

      * similar to Tcl_Obj
      * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes on 32-bit
        systems, but can take more space if needed
      * words can have synonyms, i.e. words of any type
      * synonyms can form chains of arbitrary length, so a word can have several
        synonyms with different representations -- no more shimmering!
	 * predefined types such as ints, floating points, single chars and small
        strings (up to 3 chars on 32-bit and 7 on 64-bit systems) are represented
	   as immediate values: the value is stored directly in the pointer whenever
	   possible rather than in an allocated structure. Everything is abstracted
	   behind accessors.
      * several high level datastructures are provided, such as ropes (binary
        tree-based strings, see http://wiki.tcl.tk/20690), vectors (flat
        arrays), lists (binary trees of vectors with support for cycles and
        sparse representation), integer- and string-keyed maps, along with easy
        to use iterator interfaces
      * ropes are immutable strings that allow for fast operations (extraction,
        insertion, deletion, concatenation) with maximized data sharing
      * string types support the full Unicode range (up to 24-bit codepoints) in
        1/2/4-byte fixed-width encodings as well as variable-width UTF-8, with
        transparent conversions and access
      * vectors and lists come in both immutable and mutable forms. The former
        share the same features as ropes. The latter implement in-place
        modifications and gracious degradation to immutable forms
      * custom word types are supported: lazy or procedural string
        representations, mutable or immutable types, etc.

  - Fast and efficient cell-based allocator

      * page-based allocation for optimal locality of reference and cache use
      * 4-word cells (16 bytes on 32-bit, 32 on 64-bit systems) fit
        most needs, but elements can allocatean arbitrary number of cells if
        needed
      * overhead is small: 2 bits per cell
      * raw alloc performances are competitive with the stock system malloc,
        and in many cases much faster (up to 5 times as fast on small strings and
        on words vs. Tcl_Obj-like structs)
      * single cell allocation (the most frequent case) is very fast

  - Automatic memory management thanks to an exact (AKA accurate or precise),
    generational, copying, mark-and-sweep, garbage collector

      * exact GC implies that roots (externally referenced elements) be declared
        by the application, using a simple API
      * mutations are automatically detected thanks to page write tracking
      * custom types can define a finalizer that is called at deletion time; one
        of the applications is to plug in externally managed structures (e.g.
        using malloc/free)
      * the GC process is fully controllable (pause/resume) so that the
        applications don't get interrupted unexpectedly
      * generational GC limits the overhead by restricting the collection to
        newer elements, which are typically short-lived
      * longer-living elements are collected less often as they get promoted to
        older generations
      * promotion is done by moving whole pages between memory pools, optionally
        performing compaction when fragmentation exceeds a certain threshold,
        thus limiting the overhead and improving the cache-friendliness over time
      * contrary to reference-counting schemes (RC), GCs support circular
        references without memory leaks nor specific handling; word synonym
        chains take advantage of this, being implemented as circular lists
      * studies show that GCs consistently outperform RC in raw performances and
        real-world cases, because the cost (both space and time) of maintaining
        reference counters outweights the amortized GC overhead, especially with
        generational GCs

Colibri is written in plain C and is free of any dependency besides system
libraries. The compiled binary DLL on Windows is about 67kB. The source code is
heavily commented using the very readable Natural Docs syntax.

HOW DOES COLIBRI RELATE TO TCL?
===============================

 From the Cloverfield announcement:

"   The last major point of the project is related to implementation and low
     level issues. The existing Tcl implementation is notoriously solid,
     however some changes are too radical to be performed on the current code
     base, so a reimplementation is certainly needed on vast portions. For
     example, the string representations, object structures, and virtual
     machines will certainly require complete rewrite to accommodate with the
     needed changes, which means that a lot of code won't be reused. However
     vast portions of library code and algorithms certainly will (clock scan,
     bigint, regexp...), as well as the high coding standards and QA that are
     the trademarks of our beloved language.  "

So Colibri is a candidate infrastructure for Tcl9 as an alternative to the
current Tcl_Obj-based core implementation. I believe that the features provided
by Colibri shall yield higher levels of performances than the current
architecture, at the price of an ABI incompatibility (for which major versions
are made anyway), but with a similar philosophy that should ease conversion of
existing code.

PLANNED FEATURES FOR NEXT VERSIONS
==================================

1. Improvements on map types: custom key types

2. Use word synonym chains instead of strings as the basic type in Picolibri

PORTABILITY
===========

My main development platform is Windows, so the source archive primarily
includes Microsoft Visual Studio project files. Microsoft provides a free
edition of their development environment known as Visual Studio Express for
those willing to compile and test the library without having to buy an expensive
license. Other systems need a makefile and/or autoconf scripts.

I also use Ubuntu Karmic Koala 9.10 for Linux 32-bit development (Ubuntu 10.10
for 64-bit) so the archive also includes minimalist GNU makefiles for building
with the provided GCC compiler. However it makes no use of other parts of the
GNU toolchain (autoconf and the like).

The code is fairly portable on 32-bit and 64-bit systems: all public and
internal types are based on portable C-99 types such as intptr_t.

The only parts that need platform-specific code are low-level page allocation,
memory protection and related exception/signal handling, and GC-related
synchronization. Colibri needs system calls that allocate boundary-aligned
pages, as well as synchronization primitives such as mutexes and condition
variables. At present both Windows and Unix (Linux) versions are provided, the
latter using mmap. Porting to other systems should require only minimal effort,
as the platform-specific code is limited to a handful of functions gathered in a
platform-specific source subtree. Platform-specific peculiarities should not
impact the overall architecture.

GRAND UNIFICATION SCHEME
========================

A great side project would be to reimplement Tcl over Colibri as a replacement
for the current Tcl_Obj based code. Of course the resulting library would be
incompatible on an ABI level, but this would provide a great testbed for
Colibri in real-world cases (using pure script applications) as well as a
comparison point with the official Tcl core, and will possibly open the path to
Tcl9. This most likely involves a lot of work. However I believe that this
conversion is possible starting at version 0.9, as all the necessary building
blocks are now present (data structures, threading model...).

A quick port of the Picol miminimalist Tcl implementation has been done with
success (see here http://wiki.tcl.tk/17893 ). At present it uses a string-based
model and makes systematic string conversions but I plan to convert it to the
Tcl_Obj-like word synonym model. I also intend to experiment integration with
other C-based Tcl implementations such as Jim in the near future.

WHERE CAN IT BE FOUND?
======================

Wiki page: http://wiki.tcl.tk/Colibri
Project page @ SourceForge.net: http://sourceforge.net/projects/tcl9/
Mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri
Direct Download:
     - source: 
http://sourceforge.net/projects/tcl9/files/colibri/colibri0.13/colibri0.13.src.zip/download
     - documentation: 
http://sourceforge.net/projects/tcl9/files/colibri/colibri0.13/colibri0.13.doc.zip/download
     - Windows binary: 
http://sourceforge.net/projects/tcl9/files/colibri/colibri0.13/colibri0.13.win32.zip/download

LICENSE
=======

The license is the same as for Tcl.

2008	Jan	Feb (3)	Mar (9)	Apr (22)	May (64)	Jun (13)	Jul (1)	Aug	Sep	Oct	Nov	Dec
2009	Jan (12)	Feb (1)	Mar	Apr (1)	May	Jun	Jul	Aug	Sep (1)	Oct	Nov	Dec
2010	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug (1)	Sep	Oct (2)	Nov	Dec (1)
2011	Jan	Feb (1)	Mar (1)	Apr (1)	May	Jun (1)	Jul	Aug	Sep	Oct	Nov (5)	Dec (1)

tcl9-cloverfield Mailing List for Tcl9

tcl9-cloverfield — Mailing list dedicated to the Cloverfield project