CPPML Code

Status: Beta
Brought to you by: braxtonmckee
Tree [r57] /

History

HTTPS access
File	Date	Author	Commit
Makefile	2011-05-11	braxtonmckee	[r54] moved allocation, refcounting, and memo-cycle d...
README	2010-09-14	braxtonmckee	[r51] added "type capture" to lambdas.
codemodel.ml	2011-08-20	braxtonmckee	[r57] rewrote cppml to use proper classes instead of ...
errs.ml	2010-09-09	braxtonmckee	[r41] A bunch of changes including
gccml.py	2010-09-14	braxtonmckee	[r50] changed gccml.py for OSX.
lexer.mll	2010-09-13	braxtonmckee	[r47] working lambda functions!
parser.mly	2011-08-08	braxtonmckee	[r56] cppml handles anonymous namespaces correctly now.
run.ml	2010-09-09	braxtonmckee	[r41] A bunch of changes including
test.cppml	2011-08-20	braxtonmckee	[r57] rewrote cppml to use proper classes instead of ...
test.hpp	2010-09-14	braxtonmckee	[r50] changed gccml.py for OSX.
Read Me

************** OVERVIEW ***********

CPPML is a simple overlay on top of C++ syntax that creates a concise
syntax to mimic the typing and pattern matching found in ML based languages.
In ocaml, we might write

	type list = Nil | Node of int * list;

in CPPML,

	@type List = Nil of () -| Node of int, List;

In particular, this syntax makes it easy to create simple tuples or tagged union
types. These classes protect their members, have correct copy constructors and
operator= functions.

CPPML consists of a program "cppml" which takes C++ code that has already
been passed through a preprocessor and expands CPPML language extensions back
into regular C++ code ready to be fed back into a compiler.

gccml.py is a simple wrapper for gcc. If the first argument is a file
ending in .cppml or .hppml, it preprocesses the file, passes it through
CPPML, and then passes it back to gcc. eventually gccml.py will become a full
wrapper for gcc to support the language exension.

*********** RUNNING AND INSTALLATION ************

CPPML is written in OCaml, so you need OCaml installed. gccml.py is a
python script, so you'll need python.

Run "make cppml" to build the cppml binary, which you need in your path for
gccml.py to work.

Run "gccml.py X.cppml <options> -o whatever.o" will compile X.cppml into an
object file.

Run "make test" to build the test app out of test.cppml as a simple example,
and "./test" to run it. Check out test.cppml to see some example code.

Use "-atomic_ops" within gccml.py to cause cppml to use atomic increment
and decrement operations, making tagged unions thread safe.



************ LANGUAGE EXAMPLES *************

The extension @type allows us to easily declare tuples and tagged union types.
For instance, to declare a type T1 which is a tuple of int and int, we write

	@type T1 = int, int;

The tuple elements are accessible as getM1() and getM2() when they're not named.
The tuple has a copy constructor and operator= defined, so its safe to copy
around. We also define functions getM1, getM2, etc. to access the relevant members.

We can name the members as well. In this example

	@type T1 = int a, int b;

the elements are accessible as a() and b(), as well as getM1() and getM2(). Then
we can write

	T1 t(1,2);
	assert(t.a() + t.b() == 3);

Any type declared this way can be given a body to expose member functions

	@type T1 = int a, int b
	{
	public:
			int sum(void) const { return a() + b(); };
	};

Simple pattern matching can take apart a tuple and bind the tuple elements
to variables in the subpattern. for instance, we can sum the elements of the
tuple like this:

	int sum = @match T1(t) -| x,y ->> x + y;

in ocaml, we would have written

	type T1 = int, int;;
	match t with x,y -> x+y;;

here's what these pieces are:

	-|				 indicates a term in a match. like | in ocaml matching
	x,y   		 	 the pattern to match. decomposes the tuple and binds
						x and y to the members of the tuple
	->> 			 indicates the pattern is over, time to have a result
	x+y; 			 predicate. "x" is a reference to the first element
			        	of the tuple, "y" is a reference to the second element
						result gets passed back

We can also generate tagged unions. For instance, in

	@type T2 =
		Tag1 of int, int
	-| 	Tag2 of string, string;


An element of type T2 is either a "Tag1" with a pair of ints, or a "Tag2" with
a pair of strings. in ocaml we would have written

	type T2 = Tag1 of int * int | Tag2 of string * string

The resulting union is stored as a pointer to a block of memory containing
the tag, a refcount, and the data of the tuple represented by the tag. GC is
done using the refcount. currently its not threadsafe, but its pretty cheap to
copy around.

Instances of the tagged union are created using a static member function. In
this case, T2::Tag1 and T2::Tag2.

	T2 aTag1Object = T2::Tag1(1,2);
	T2 aTag2Object = T2::Tag2("hello", "world");

are equivalent to ocaml code

	let aTag1Object = Tag1(1,2);
	let aTag2Object = Tag2("hello", "world");

We can query a tagged union rather than using the pattern matching:

	//we can find out which tag we have:
	assert(aTag1Object.isTag1());
	assert(!aTag1Object.isTag2());

we can also get the tag1 object out as a tuple and query it. The parser generates
"getX" and "isX" functions for every possible tag. "getX" functions don't check
what kind of object you have to reduce overhead, so be careful. Use @match
to be sure...

	assert(aTag1Object.getTag1().getM1() == 1);

We support pattern matching on tagged unions like ocaml has using @match:

	@match T2(aTag1Object)
		-|	Tag1(a,b) ->> "was a tag1 with integers..."
		-|	Tag2(a,b) ->> "was a tag2 with " + a + b;

In this case, we match the first line if "aTag1Object" is an object with tag Tag1
or the second line otherwise. Patterns can be arbitrarily complicated trees.
If the set of matches doen't match the result the operation throws an exception.
For instance, this would throw T2::MatchError:

	@match T2(aTag1Object)
		-|	Tag2(a,b) ->> "not going to match";

we can get around this by writing

	@match T2(aTag1Object)
		-|	Tag2(a,b) ->> "not going to match";
		-|	_ ->> "matches anything and throws away the result";

since "_" matches anything and doesn't bind the variable. We implicitly add
as many _'s as necessary to fill out the match, so

	@match T2(aTag1Object)
		-| Tag2(a) ->> ...

would work as well.

Of course the power of this sort of typing comes when you make the types
refer to themselves. Here's the definition of a list of integers, defined in
ocaml as

	type list = Nil | Node of int*list;

and in CPPML as

	@type List =
		Nil of ()					// empty tag denoted by "()"
	-|	Node of int, List as >>=;	// >>= is a right associative operator in C.
									// the "as" syntax generates a function
									// List operator >>=(int, List)
									// which just calls the right constructor
	//make a list
	List nil = List::Nil();

	//create a list with "10" in it. two ways to do it
	List withOneThing = 10 >>= nil;
	List withOneThingNotUsingAnOperator = List::Node(10, nil);

	//operators are more conscise than constructors. these are equivalent:
	List withSeveralThings = 1 >>= 2 >>= 3 >>= nil;
	List withSeveralThings2 = List::Node(1, List::Node(2, List::Node(3, nil)));

We can sum a list using recursion and pattern matching. Unlike OCaml,
c++ compilers will usually not turn the tail recursion into a loop.

	int sumList(List l)
		{
		@match List(l)
			-|	Nil()	->> 0
			-|	Node(head,tail) ->> head + sumList(tail);
		}

A non-recursive loop would look like this

	int sumListLoop(List l)
		{
		int tr = 0;
		while (! l.isNil() )
			l = @match List(l) -| Node(head, tail) ->> (tr += head, tail);
		return tr;
		}

We can define several types that refer to each other at once

	@type T1 =
			Tag1 of int
		-| 	Tag2 of T2
	and T2 =
			Tag3 of int
		-|	Tag4 of T1;

And we also support templates

	template <class T>
	@type ListT = 		Nil of ()
					-| 	Node of T head, ListT<T> tail as >>=
	{
	public:
			T sum(void) const
				{
				return @match self_type(*this)
					-| Nil() ->> T()
					-| Node(head,tail) ->> head + tail.sum();
				}
	};

	(10 >>= List<int>::Nil()).sum();
	("test" >>= List<string>::Nil()).sum();

**************** LAMBDA FUNCTIONS ******************

Clients can define inline, anonymous lambda functions using the following syntax:

	@lambda <captured_types> return_type [local variables to capture](arguments) { function body }

as an example,

	@lambda int [](x) { return x + 1; }

produces an anonymous function with one argument that binds no variables.  To implement this,
we add

	class lambda_..._cls {
	public:
			template<class x_type>
			class result_type {
			public:
					typedef int result_type;
			};


			template<class x_type>
			int operator()(const x_type& x)
				{
				return x + 1;
				}
	};

	lambda_..._cls lambda_...(void) { return lambda_..._cls(); }

to the top of the file in the CPPML namespace, and replace the original instantiation with

	lambda_...()

where ... is a name computed using an MD5 hash of all of the data going into the lambda function.

In the case where we bind local variables, lambda_..._ becomes a template, holding a reference
to the original object. In this case

	int y = 1;
	@lambda int [y](x) { y = y + 1; return x + y; }

expands to

	template<class in_y_type>
	class lambda_..._cls {
	public:
			typedef in_y_type y_type;

			lambda_..._cls(y_type& in_y) : y(in_y)
				{
				}

			template<class x_type>
			class result_type {
			public:
					typedef int result_type;
			};

			template<class x_type>
			int operator()(const x_type& x)
				{
				y = y + 1; return x + y;
				}

	private:
			mutable y_type& y;
	};

	template<class y_type>
	lambda_..._cls<y_type> lambda_...(y_type& y) { return lambda_..._cls<y_type>(y); }

Because of our implementation places the lambda classes at the top of the file and the
bodies at the end, we allow clients to specify external types to capture at the call site. That is,

	typedef ... X;

	@lambda<X> X() { ... }

allows "X" to be captured in the lambda.

in the headers, and

	lambda_...(y)

at the original call site.

We also provide an internal template "result_type" which holds the result type in a variable "result".  If "T" is
a lambda taking N arguments, then T::template result_type<arg_1_type, arg_2_type, ..., arg_N_type>::result is the
result type that is produced by calling the function with arguments of type arg_1_type through arg_N_type.

The bodies of lambda functions are instantiated as template definitiones at the end of the file. As a result, they
won't have the typedefs or using declarations that are active at the point of creation.

********** COMMON DATA **************************

We allow clients to define "common" data members that exist for all alternatives of a tagged union.

As an example,

	@type Alt =
		-|	Option1 of string val
		-|	Option2 of double val
	with
		int commonInteger
		;

"commonInteger" is now defined for both alternatives.  Constructors are

	Alt::Option1(int, string)
	Alt::Option2(int, double)

to reflect the new value. the common value can be extracted as

	Alt a;
	...
	a.commonInteger()

without having to use a pattern match.  The value is available in the pattern match. For instance

	Alt a;
	...
	@match Alt(a)
		-|	Option1(val) with (ci) ->> ...
		;

is valid.  values placed in the "with' clause are full pattern matchers

***************** MEMOIZATION ************************

We also support adding common memo-values to tagged union.  These values are computed
once, on demand, the first time they are accessed.  As an example

	@type Alt =
		-|	Option1 of float val
		-|	Option2 of double val
	with
		double square = (@match Alt(*this) -| Option1(val) ->> (double)val*val -| Option2(val) ->> val*val);
		;

"square", when accessed, is computed once and then cached. This is particularly useful for caching hashes of constant objects.

If a memo-function is called recursively while it is being computed, it throws an exception.

************* METADATA ********************************

cppml creates metadata classes for all tuples and tagged unions. These
classes allow for template metaprogramming, letting you automate

	1) creation of serializers
	2) boost::python wrappers
	3) pretty printers and other output

more documentation on these classes will be forthcoming. for the moment,
email me at braxton.mckee@gmail.com if you're interested and i'll send you some source code.

************** KNOWN ISSUES **********************
1) destructors operate using recursion, so you can run
out of stack space if you have large lists.

2) you cannot use @type within template specializations right now.

3) CPPML doesn't know about structs right now, so you can't put
@type declarations inside them.
CPPML Code

Tree [r57] / Download Snapshot History

Read Me

Tree [r57] /

History