CPPML Code
Status: Beta
Brought to you by:
braxtonmckee
File | Date | Author | Commit |
---|---|---|---|
Makefile | 2011-05-11 | braxtonmckee | [r54] moved allocation, refcounting, and memo-cycle d... |
README | 2010-09-14 | braxtonmckee | [r51] added "type capture" to lambdas. |
codemodel.ml | 2011-08-20 | braxtonmckee | [r57] rewrote cppml to use proper classes instead of ... |
errs.ml | 2010-09-09 | braxtonmckee | [r41] A bunch of changes including |
gccml.py | 2010-09-14 | braxtonmckee | [r50] changed gccml.py for OSX. |
lexer.mll | 2010-09-13 | braxtonmckee | [r47] working lambda functions! |
parser.mly | 2011-08-08 | braxtonmckee | [r56] cppml handles anonymous namespaces correctly now. |
run.ml | 2010-09-09 | braxtonmckee | [r41] A bunch of changes including |
test.cppml | 2011-08-20 | braxtonmckee | [r57] rewrote cppml to use proper classes instead of ... |
test.hpp | 2010-09-14 | braxtonmckee | [r50] changed gccml.py for OSX. |
************** OVERVIEW *********** CPPML is a simple overlay on top of C++ syntax that creates a concise syntax to mimic the typing and pattern matching found in ML based languages. In ocaml, we might write type list = Nil | Node of int * list; in CPPML, @type List = Nil of () -| Node of int, List; In particular, this syntax makes it easy to create simple tuples or tagged union types. These classes protect their members, have correct copy constructors and operator= functions. CPPML consists of a program "cppml" which takes C++ code that has already been passed through a preprocessor and expands CPPML language extensions back into regular C++ code ready to be fed back into a compiler. gccml.py is a simple wrapper for gcc. If the first argument is a file ending in .cppml or .hppml, it preprocesses the file, passes it through CPPML, and then passes it back to gcc. eventually gccml.py will become a full wrapper for gcc to support the language exension. *********** RUNNING AND INSTALLATION ************ CPPML is written in OCaml, so you need OCaml installed. gccml.py is a python script, so you'll need python. Run "make cppml" to build the cppml binary, which you need in your path for gccml.py to work. Run "gccml.py X.cppml <options> -o whatever.o" will compile X.cppml into an object file. Run "make test" to build the test app out of test.cppml as a simple example, and "./test" to run it. Check out test.cppml to see some example code. Use "-atomic_ops" within gccml.py to cause cppml to use atomic increment and decrement operations, making tagged unions thread safe. ************ LANGUAGE EXAMPLES ************* The extension @type allows us to easily declare tuples and tagged union types. For instance, to declare a type T1 which is a tuple of int and int, we write @type T1 = int, int; The tuple elements are accessible as getM1() and getM2() when they're not named. The tuple has a copy constructor and operator= defined, so its safe to copy around. We also define functions getM1, getM2, etc. to access the relevant members. We can name the members as well. In this example @type T1 = int a, int b; the elements are accessible as a() and b(), as well as getM1() and getM2(). Then we can write T1 t(1,2); assert(t.a() + t.b() == 3); Any type declared this way can be given a body to expose member functions @type T1 = int a, int b { public: int sum(void) const { return a() + b(); }; }; Simple pattern matching can take apart a tuple and bind the tuple elements to variables in the subpattern. for instance, we can sum the elements of the tuple like this: int sum = @match T1(t) -| x,y ->> x + y; in ocaml, we would have written type T1 = int, int;; match t with x,y -> x+y;; here's what these pieces are: -| indicates a term in a match. like | in ocaml matching x,y the pattern to match. decomposes the tuple and binds x and y to the members of the tuple ->> indicates the pattern is over, time to have a result x+y; predicate. "x" is a reference to the first element of the tuple, "y" is a reference to the second element result gets passed back We can also generate tagged unions. For instance, in @type T2 = Tag1 of int, int -| Tag2 of string, string; An element of type T2 is either a "Tag1" with a pair of ints, or a "Tag2" with a pair of strings. in ocaml we would have written type T2 = Tag1 of int * int | Tag2 of string * string The resulting union is stored as a pointer to a block of memory containing the tag, a refcount, and the data of the tuple represented by the tag. GC is done using the refcount. currently its not threadsafe, but its pretty cheap to copy around. Instances of the tagged union are created using a static member function. In this case, T2::Tag1 and T2::Tag2. T2 aTag1Object = T2::Tag1(1,2); T2 aTag2Object = T2::Tag2("hello", "world"); are equivalent to ocaml code let aTag1Object = Tag1(1,2); let aTag2Object = Tag2("hello", "world"); We can query a tagged union rather than using the pattern matching: //we can find out which tag we have: assert(aTag1Object.isTag1()); assert(!aTag1Object.isTag2()); we can also get the tag1 object out as a tuple and query it. The parser generates "getX" and "isX" functions for every possible tag. "getX" functions don't check what kind of object you have to reduce overhead, so be careful. Use @match to be sure... assert(aTag1Object.getTag1().getM1() == 1); We support pattern matching on tagged unions like ocaml has using @match: @match T2(aTag1Object) -| Tag1(a,b) ->> "was a tag1 with integers..." -| Tag2(a,b) ->> "was a tag2 with " + a + b; In this case, we match the first line if "aTag1Object" is an object with tag Tag1 or the second line otherwise. Patterns can be arbitrarily complicated trees. If the set of matches doen't match the result the operation throws an exception. For instance, this would throw T2::MatchError: @match T2(aTag1Object) -| Tag2(a,b) ->> "not going to match"; we can get around this by writing @match T2(aTag1Object) -| Tag2(a,b) ->> "not going to match"; -| _ ->> "matches anything and throws away the result"; since "_" matches anything and doesn't bind the variable. We implicitly add as many _'s as necessary to fill out the match, so @match T2(aTag1Object) -| Tag2(a) ->> ... would work as well. Of course the power of this sort of typing comes when you make the types refer to themselves. Here's the definition of a list of integers, defined in ocaml as type list = Nil | Node of int*list; and in CPPML as @type List = Nil of () // empty tag denoted by "()" -| Node of int, List as >>=; // >>= is a right associative operator in C. // the "as" syntax generates a function // List operator >>=(int, List) // which just calls the right constructor //make a list List nil = List::Nil(); //create a list with "10" in it. two ways to do it List withOneThing = 10 >>= nil; List withOneThingNotUsingAnOperator = List::Node(10, nil); //operators are more conscise than constructors. these are equivalent: List withSeveralThings = 1 >>= 2 >>= 3 >>= nil; List withSeveralThings2 = List::Node(1, List::Node(2, List::Node(3, nil))); We can sum a list using recursion and pattern matching. Unlike OCaml, c++ compilers will usually not turn the tail recursion into a loop. int sumList(List l) { @match List(l) -| Nil() ->> 0 -| Node(head,tail) ->> head + sumList(tail); } A non-recursive loop would look like this int sumListLoop(List l) { int tr = 0; while (! l.isNil() ) l = @match List(l) -| Node(head, tail) ->> (tr += head, tail); return tr; } We can define several types that refer to each other at once @type T1 = Tag1 of int -| Tag2 of T2 and T2 = Tag3 of int -| Tag4 of T1; And we also support templates template <class T> @type ListT = Nil of () -| Node of T head, ListT<T> tail as >>= { public: T sum(void) const { return @match self_type(*this) -| Nil() ->> T() -| Node(head,tail) ->> head + tail.sum(); } }; (10 >>= List<int>::Nil()).sum(); ("test" >>= List<string>::Nil()).sum(); **************** LAMBDA FUNCTIONS ****************** Clients can define inline, anonymous lambda functions using the following syntax: @lambda <captured_types> return_type [local variables to capture](arguments) { function body } as an example, @lambda int [](x) { return x + 1; } produces an anonymous function with one argument that binds no variables. To implement this, we add class lambda_..._cls { public: template<class x_type> class result_type { public: typedef int result_type; }; template<class x_type> int operator()(const x_type& x) { return x + 1; } }; lambda_..._cls lambda_...(void) { return lambda_..._cls(); } to the top of the file in the CPPML namespace, and replace the original instantiation with lambda_...() where ... is a name computed using an MD5 hash of all of the data going into the lambda function. In the case where we bind local variables, lambda_..._ becomes a template, holding a reference to the original object. In this case int y = 1; @lambda int [y](x) { y = y + 1; return x + y; } expands to template<class in_y_type> class lambda_..._cls { public: typedef in_y_type y_type; lambda_..._cls(y_type& in_y) : y(in_y) { } template<class x_type> class result_type { public: typedef int result_type; }; template<class x_type> int operator()(const x_type& x) { y = y + 1; return x + y; } private: mutable y_type& y; }; template<class y_type> lambda_..._cls<y_type> lambda_...(y_type& y) { return lambda_..._cls<y_type>(y); } Because of our implementation places the lambda classes at the top of the file and the bodies at the end, we allow clients to specify external types to capture at the call site. That is, typedef ... X; @lambda<X> X() { ... } allows "X" to be captured in the lambda. in the headers, and lambda_...(y) at the original call site. We also provide an internal template "result_type" which holds the result type in a variable "result". If "T" is a lambda taking N arguments, then T::template result_type<arg_1_type, arg_2_type, ..., arg_N_type>::result is the result type that is produced by calling the function with arguments of type arg_1_type through arg_N_type. The bodies of lambda functions are instantiated as template definitiones at the end of the file. As a result, they won't have the typedefs or using declarations that are active at the point of creation. ********** COMMON DATA ************************** We allow clients to define "common" data members that exist for all alternatives of a tagged union. As an example, @type Alt = -| Option1 of string val -| Option2 of double val with int commonInteger ; "commonInteger" is now defined for both alternatives. Constructors are Alt::Option1(int, string) Alt::Option2(int, double) to reflect the new value. the common value can be extracted as Alt a; ... a.commonInteger() without having to use a pattern match. The value is available in the pattern match. For instance Alt a; ... @match Alt(a) -| Option1(val) with (ci) ->> ... ; is valid. values placed in the "with' clause are full pattern matchers ***************** MEMOIZATION ************************ We also support adding common memo-values to tagged union. These values are computed once, on demand, the first time they are accessed. As an example @type Alt = -| Option1 of float val -| Option2 of double val with double square = (@match Alt(*this) -| Option1(val) ->> (double)val*val -| Option2(val) ->> val*val); ; "square", when accessed, is computed once and then cached. This is particularly useful for caching hashes of constant objects. If a memo-function is called recursively while it is being computed, it throws an exception. ************* METADATA ******************************** cppml creates metadata classes for all tuples and tagged unions. These classes allow for template metaprogramming, letting you automate 1) creation of serializers 2) boost::python wrappers 3) pretty printers and other output more documentation on these classes will be forthcoming. for the moment, email me at braxton.mckee@gmail.com if you're interested and i'll send you some source code. ************** KNOWN ISSUES ********************** 1) destructors operate using recursion, so you can run out of stack space if you have large lists. 2) you cannot use @type within template specializations right now. 3) CPPML doesn't know about structs right now, so you can't put @type declarations inside them.