datadraw-user Mailing List for DataDraw (Page 2)
Brought to you by:
smilindog2000
You can subscribe to this list here.
| 2006 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
(4) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2007 |
Jan
(6) |
Feb
(7) |
Mar
(1) |
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
| 2008 |
Jan
|
Feb
|
Mar
(1) |
Apr
|
May
(17) |
Jun
(1) |
Jul
(7) |
Aug
(3) |
Sep
|
Oct
|
Nov
|
Dec
(23) |
| 2009 |
Jan
(8) |
Feb
|
Mar
(1) |
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
| 2010 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
(3) |
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
| 2011 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
(1) |
Sep
|
Oct
|
Nov
|
Dec
|
|
From: Richard P. <rdp...@gm...> - 2008-12-27 21:12:29
|
I notice that I mention this message in my previous post while it bounced. So here it is again. On Sat, Dec 27, 2008 at 12:13 PM, Richard Prescott <rdp...@gm...>wrote: > I like those discussions.... I wish we could take a beer around that. > > Happy Holidays to everybody on the list. I forgot to mention it last time. > > Oups! my last mail wasn't signed. So yes, it was me: Richard. rdprescott > on skype and gmail, richardprescott on linkedin, if anybody wonder. And by > the way, I am a software contractor looking for jobs right now. > > I just hope that all this is not too much noise for others on the list. > Sorry if it is the case for you. Although, the archive button is easy. > > I will try to separate my though onto subjects. Separate emails could have > been irritating. I am tempted to create a google doc on this. > > *Naming: grail, 42* > > I am a big fan of The Guide. Although, 42.org and fortytwo.org are > already took. That makes me sorry, I liked 42. There is certainly another > way to homage Douglas Adams by taking another name from its novel. > > *Strong typing* > > I have a new opinion on the subject. Strong typing for data and parameter > typing determine at compile-time as your min-max example. That would cover > two important features of modern languages which are templates and > interfaces in a simple elegant way. This is now possible given your though > (which I totally share) regarding reuse. The only drawback would be the > amount of code generated, but even there, I think it wouldn't be that bad. > Small code base can get reused a lot, large code base tend to be used once. > > > *Line continuation with \* > > I was thinking of line continuation with backslash for code not for > comment. Personally, I don't like multi-line comment and I am not forced to > use it ;-) > > *Returning tuples and passing by reference* > > I notice it in the fifo.gr example. I had a bad feeling about it but my > mind wasn't set. As normal object passing will be through handles, a method > can modify object's properties no problem. So the only place a reference is > useful is for basic types (including handles themselves) > > On the other hand, tuples may not be too hard to implement (maybe not in > the first version) > > Let > retval, error function(param) > > which consist of returning a tuple but its C implementation could be using > a reference. > retval_type function(Param_type param, error_type * error) > > if the error code is not used, it could be optimized out: > > foreach parent child > child.attribute = function(child) > > *Iterators, list comprehension and more* > > The strongest feature of 42 is definitely the relationship keyword and all > good data types borrowed from datadraw. Iterators as you defined it is > intimately related to it. I do not see the need of fancier iterators given > that fact. > > Now, list comprehension is the answer of python to remove filter, mapoperators in p3k. I don't know what should be our approach, what is the > best approach. What I know is that I insist on having a way to wrote data > transformation in a simple elegant way. Something that will generate code > using the iterators as you defined it. No need for more! > > parent1.child1 = [ foreach parent2 child2: function(child2) ] > > or > > parent1.child1 = map(function, parent2, child2) > > which is equivalent of > > parent1.deleteAllChild1() > foreach parent2 child2 > Child1 c = function(child2) > parent1.child1 += c > > Now that I am writing it down, I really think that a more modern type of > iterator is possible, here is what I am thinking of: > > class SpatialTree > Polygon polygons linked_list # bad... > > # note that polygon must be of type Polygon, no doubt, > # so why specify it? > # and box: as soon as it can gives > # left, right, bottom top that is alright > > iterator touching(box,polygon) > # very bad implementation > foreach polygons(polygon) # why naming the local variable? > if polygon.touching(box) > yield > > # and now its usage > Polygon polygon > foreach my_spatial_tree.touching(window.box, polygon) > window.draw(polygon) > > I love your depth first example. > > > *Break and continue statement* > > Well, the example you provided tells me that break may be syntactic sugar > but looks better. > > > *Class inheritance* > > It is syntactic sugar. Less used as new concept are brought such as > interfaces and dynamic types in my opinion. Very useful to write UML poem > such as "a rose is a rose". At the same time, we may loose users just > because they think 42 is not OO because it have no class inheritance. So > why not? > > The way UML view it is one kind of relationship. Programming language make > it very strong but not UML. Actually, the way relationship one-to-one > relationship works in datadraw would show no overhead because of that. In > addition to this one-to-one relationship, it adds dual cascade and a way to > "import" functions symbols from parent to child. For abstract based class > and/or virtual functions, implicit one to one relationship to methods? That > sounds pretty straightforward? > > importing symbol could be explicit alias as in python. and because you > named the reuse, it can become very fancy mapping. As soon as it can be > evaluated at compile-time. > > > *module parameters extention* > > That is a place where a default value shows its value. ;-) > > Up to now, it is very easy to extend module and classes without modify the > original author version. simply with inheritance and adding definitions. > > *String* > > String was my first answer but it is so a hot topic in c++ that I didn't > want to go there. I agree with you. > > *CLI* > > I disagree, I know that was not your first goal. But as you wrote: people > will use your code in a way you didn't expect! ;-) Here is my though: > > You will need a frontend parser for the language and three backends: one > for C, one for verilog, and one for compile-time expression evaluation. > Three for me means multiples. Enough to justify some abstraction and allow > adding more: VHDL, direct machine language, virtual machine binary, > whatever. None of them is part of your goals but could be done by third > party. > > What I like from a CLI (command line interface or shell) is that it allows > quicker development. I like to test concept in python shell when working on > a larger python code base. > > Also, the datadraw shell have shown its value. You want to loose it? > > *Closure* > > Let start with an example: > global_type global_var > def f1() > local_type local_var > closure_type closure_var # looks like local but is not > def f2() > mutate_var += global_var > return f2 > > I just expect that internally something such as the following happen: > > relationship f2 closure_type:closure_var mandatory > > That's it! > > *Namespace* > > Agree, I don't know what is best regarding operators but I prefer > consistency all over the place. Regarding compatibility with datadraw, it > could be possible to wrote a different frontend? ;-) > > About hierarchical modules, I think it is a good idea. This culture have > been brought by Java but we see that in C++, Perl, Python, probably Ruby. > That solve a lot of issue when the user-base growth. Also, I don't think > that will be hard to implement and use. especially when people reuse. > Which is one of the goals, isn't it? > > > See you later ! > > Regards > > Richard > > |
|
From: Richard P. <rdp...@gm...> - 2008-12-27 20:46:47
|
(continuation of last message)
*About BNF*
I am brainstorming here, so it may don't make sense at all....
Actually, ten years ago A friend and I chatted for a new language too. I
was not ready at that time. We wanted to go beyond c++ and in my internal
notes I called it c4+ (pronounced *see for more*) what follows probably will
recap some though of that time. Souvenirs....
I don't know antlr so I can just compare with lex/yacc
To correct from lex/yacc:
- they are to separate programs,
- there is no standardize simple way to build and use a library of
expression. How many time have you wrote floating point regular
expression? How many time have you saw it wrong?
- streaming one onto another is possible but doesn't come to mind of the
user
- not everything is LALR/GLR, well I guess that needs to be parse the
classical way then
- (when it make sense) better support for binary format
I would like to be able to embed the regular expression used in tokenizer
into the BNF expression. I am sure that would save all the states used in
lex. Although, that means that tokenizer needs to be within the same
engine. Otherwise how would it be possible to determine the nature of the
token if the token is not yet parse? Also ordering of lex is lost, meaning
that it is not possible anymore to try and guess.
Is there someone on the list with enough knowledge of LALR or GLR to comment
on on implementing regular expression within that engine? How about
performance? a hit or a gain?
I am sure there is something sexy to do with iterators and parsers to save
memory (like a network packet filter for instance) and pipe from one parser
to another. (video stream for instance)
and/or automatically build the correct set of class and relationship from
the BNF
import parser
alias ws = parser.optional_whitespace
class Attribute
sym name
string value
format xml name ws "=" ws "\"" value "\"" ws
format json "\"" name "\"" ws ":" ws "\"" value "\"" ws
class Tag
sym name
format xml "<" ws name ws attributes "/>" ws
format xml "<" ws name ws attributes ">" ws children "</" ws name ws
">" ws
format json "\"" name "\"" ws ":" ws "{" attributes children "}" ws
relationship Tag Attribute:attributes tail_linked cascade
relationship Tag Tag:children tail_linked cascade
# what about:
# relationship Tag Tag as children tail_linked cascade
# I am too bias by Python:
from file import file
from gzip import gzip
# would it be possible?:
file("result.json.gz").save(gzip(
json.save(xml.load(file("input.xml").iterate_by_char())))
Note that because we are compiling, iterator object becomes inline #defines.
So actually we are asking the compiler to resolve it at compilation time not
later on. It means that most complex iterators are possible as soon as they
are resolvable. Good news isn't it?
My example is pretty weak though...
Regards
Richard (again)
|
|
From: Richard P. <rdp...@gm...> - 2008-12-27 17:13:13
|
I like those discussions.... I wish we could take a beer around that. Happy Holidays to everybody on the list. I forgot to mention it last time. Oups! my last mail wasn't signed. So yes, it was me: Richard. rdprescott on skype and gmail, richardprescott on linkedin, if anybody wonder. And by the way, I am a software contractor looking for jobs right now. I just hope that all this is not too much noise for others on the list. Sorry if it is the case for you. Although, the archive button is easy. I will try to separate my though onto subjects. Separate emails could have been irritating. I am tempted to create a google doc on this. *Naming: grail, 42* I am a big fan of The Guide. Although, 42.org and fortytwo.org are already took. That makes me sorry, I liked 42. There is certainly another way to homage Douglas Adams by taking another name from its novel. *Strong typing* I have a new opinion on the subject. Strong typing for data and parameter typing determine at compile-time as your min-max example. That would cover two important features of modern languages which are templates and interfaces in a simple elegant way. This is now possible given your though (which I totally share) regarding reuse. The only drawback would be the amount of code generated, but even there, I think it wouldn't be that bad. Small code base can get reused a lot, large code base tend to be used once. *Line continuation with \* I was thinking of line continuation with backslash for code not for comment. Personally, I don't like multi-line comment and I am not forced to use it ;-) *Returning tuples and passing by reference* I notice it in the fifo.gr example. I had a bad feeling about it but my mind wasn't set. As normal object passing will be through handles, a method can modify object's properties no problem. So the only place a reference is useful is for basic types (including handles themselves) On the other hand, tuples may not be too hard to implement (maybe not in the first version) Let retval, error function(param) which consist of returning a tuple but its C implementation could be using a reference. retval_type function(Param_type param, error_type * error) if the error code is not used, it could be optimized out: foreach parent child child.attribute = function(child) *Iterators, list comprehension and more* The strongest feature of 42 is definitely the relationship keyword and all good data types borrowed from datadraw. Iterators as you defined it is intimately related to it. I do not see the need of fancier iterators given that fact. Now, list comprehension is the answer of python to remove filter, mapoperators in p3k. I don't know what should be our approach, what is the best approach. What I know is that I insist on having a way to wrote data transformation in a simple elegant way. Something that will generate code using the iterators as you defined it. No need for more! parent1.child1 = [ foreach parent2 child2: function(child2) ] or parent1.child1 = map(function, parent2, child2) which is equivalent of parent1.deleteAllChild1() foreach parent2 child2 Child1 c = function(child2) parent1.child1 += c Now that I am writing it down, I really think that a more modern type of iterator is possible, here is what I am thinking of: class SpatialTree Polygon polygons linked_list # bad... # note that polygon must be of type Polygon, no doubt, # so why specify it? # and box: as soon as it can gives # left, right, bottom top that is alright iterator touching(box,polygon) # very bad implementation foreach polygons(polygon) # why naming the local variable? if polygon.touching(box) yield # and now its usage Polygon polygon foreach my_spatial_tree.touching(window.box, polygon) window.draw(polygon) I love your depth first example. *Break and continue statement* Well, the example you provided tells me that break may be syntactic sugar but looks better. *Class inheritance* It is syntactic sugar. Less used as new concept are brought such as interfaces and dynamic types in my opinion. Very useful to write UML poem such as "a rose is a rose". At the same time, we may loose users just because they think 42 is not OO because it have no class inheritance. So why not? The way UML view it is one kind of relationship. Programming language make it very strong but not UML. Actually, the way relationship one-to-one relationship works in datadraw would show no overhead because of that. In addition to this one-to-one relationship, it adds dual cascade and a way to "import" functions symbols from parent to child. For abstract based class and/or virtual functions, implicit one to one relationship to methods? That sounds pretty straightforward? importing symbol could be explicit alias as in python. and because you named the reuse, it can become very fancy mapping. As soon as it can be evaluated at compile-time. *module parameters extention* That is a place where a default value shows its value. ;-) Up to now, it is very easy to extend module and classes without modify the original author version. simply with inheritance and adding definitions. *String* String was my first answer but it is so a hot topic in c++ that I didn't want to go there. I agree with you. *CLI* I disagree, I know that was not your first goal. But as you wrote: people will use your code in a way you didn't expect! ;-) Here is my though: You will need a frontend parser for the language and three backends: one for C, one for verilog, and one for compile-time expression evaluation. Three for me means multiples. Enough to justify some abstraction and allow adding more: VHDL, direct machine language, virtual machine binary, whatever. None of them is part of your goals but could be done by third party. What I like from a CLI (command line interface or shell) is that it allows quicker development. I like to test concept in python shell when working on a larger python code base. Also, the datadraw shell have shown its value. You want to loose it? *Closure* Let start with an example: global_type global_var def f1() local_type local_var closure_type closure_var # looks like local but is not def f2() mutate_var += global_var return f2 I just expect that internally something such as the following happen: relationship f2 closure_type:closure_var mandatory That's it! *Namespace* Agree, I don't know what is best regarding operators but I prefer consistency all over the place. Regarding compatibility with datadraw, it could be possible to wrote a different frontend? ;-) About hierarchical modules, I think it is a good idea. This culture have been brought by Java but we see that in C++, Perl, Python, probably Ruby. That solve a lot of issue when the user-base growth. Also, I don't think that will be hard to implement and use. especially when people reuse. Which is one of the goals, isn't it? See you later ! Regards Richard On Sat, Dec 27, 2008 at 5:44 AM, <dat...@li...>wrote: > Hi, Richard. So far as I know, only Richard knows this much about > programming languages, and also knows DataDraw well, so it must be you! > Thanks for the insightful feedback. I've embedded discussion below. > > On Fri, 2008-12-26 at 20:37 -0500, dat...@li... > wrote: > > Hello Bill, > > > > I am always in for this kind of discussion. ;-) Except me to do > > devil's advocate though before becoming proud defender. > > Thank you! Nice to have a strong programmer to bounce these ideas off > of. > > > I understand you want reuse and compatibility with datadraw. > > Important choice here I think. It is sort of a way to reuse and > > extend an existing language. My natural choice would have been to > > start from a language with a larger user-base such as Python and > > compile it to C using datadraw advantage. In the case of Python, it > > would have been translate AST in datadrawish-C (py2exe?) or verilog > > (MyHDL?). > > > > Unfortunately, by looking at the examples and explanation you > > provided, it feel to me that grail (or 42) would definitely be > > strongly typed and mapping a weakly typed language such as Python or > > Ruby to a strongly one probably means overhead which doesn't seem > > acceptable for one of your objective. (I am not even sure of this > > statement but anyway...) > > Yes, I agree. To compile into both hardware and software, and to get > DataDraw-backed-C speed, I believe strong typing is required. Another > possibility is to try and be similar to C, the way C++, Java, C# and > others do. However, C was conceived in 1969, when constraints on > programming were far different. One thing I like about Python (I'm no > expert in Python, BTW), is how it drops ties to old languages. I would > just hate to burden future programmers with the age-old fight of where > to put the braces, or to wear out their pinkies typing semicolons. > > Probably a fairer description of 42 (the preference for 42 over Grail > was almost unanimous) would be to say it's motivated by DataDraw-backed > C, with some influence from Python. > > I don't see how dynamic typing can be supported, so direct compatibility > with Python seems impossible. However, optional types are a feature > I've enjoyed supporting before. For example: > > def min(a, b) > return a <= b? a : b > > This function can be instantiated for each different set of types passed > to it, so the language is still strongly typed. I hate introducing the > 'def' keyword, but I need a way to distinguish a function call from a > function definition without a declared return value. Can you think of > anything better? > > > In bold I highlight the features I found... > > > > # based comments. > > line continuation with \ ? > > Yes, \ line continuation. I also am thinking to use { ... } as block > comments, which can be recursively embedded, so the following would be > legal, and entirely commented out: > > {# In single-line comments, braces like { are ignored > {This block comment is inside another one}} > > > as few noisy character (!@$%^&) as possible. > > > > Less typing then C equivalent. Working with datadraw usually means a > > lot of typing. Although I saw no good example of powerful expression > > such as default parameters, named parameters, list comprehension. > > What are the plans? How powerful would it be? > > Yes, DataDraw based applications require tons of typing. Only good > typists could possibly like DataDraw. That's one strong motivation for > 42. > > I'm not a big fan of default parameters, though I could probably be > convinced of their use. I'm also not a fan of overloading functions, > but I do want to support user-defined operators. Named parameters will > be supported. Do you think native lists are important? I find in > DataDraw backed applications I have less need for lists. > > One thing I haven't mentioned was Pascal style variable parameters. I > was thinking of using * or & to declare that a parameter is passed by > reference rather than value. For example: > > def swap(int &a, int &b) > int temp = a; > a = b > b = temp > > That gets rid of the need to return multiple values, so I don't think I > need tupples. > > > How to create and delete object? > > I was thinking Classes would have some default methods: alloc, free, and > destroy. These will be virtually the same as in DataDraw. Maybe there > should be some special syntax for calling them? For example: > > Node node = Node() > > This looks a bit like Python. Since this kind of thing can be added in > libraries rather than the core language, I'm not to worried about this > level of detail yet. > > I'm working on a BNF interpreter in 42 that will allow users to define > new syntax in libraries rather than modifying the compiler. It also > could be useful for parsing input files without requiring lex/yacc or > ANTLR. > > > How to bind a library? > > C functions that don't take or return pointers could be bound with a > simple declaration. Printf could be bound that way. However, even > simple functions like fopen will need wrappers. These wrappers could be > written in DataDraw backed C, which would be compatible with the output > of 42. I was thinking it would be easier, but I was wrong. Can it be > done better? > > > templates and C preprocessor is replaced by module parameters, that is > > cool plus. > > > > iterators are limited compared to their high level language > > counterparts and are meant for relationship usage only. (Do we need > > more?) Nice way to implement forloops though. > > Here's some thoughts about code reuse. Previous languages I'm familiar > with put the author of code in charge of how his code can be reused. > Code can only be reused exactly as the author predicted and expressly > supports. In practice, I've found that users often figure out new uses > for code that the author never predicted. Previous languages assume > code needs to be reusable even when the source is not available. This > has not worked out well in practice - I never use libraries when I can't > get source, and I know few companies that do. I currently expect 42 to > only compile programs from source, not object code. This allows global > optimizations not possible otherwise, like eliminating function pointers > and substituting switch statements. > > In 42, a major goal is to make all code reusable at a source level, > without burdening the author with templatizing his code. I assume you > have access to the source, and if required, you can add module > parameters and even submit a patch to the author. A major goal of many > object-oriented languages is to hide implementations and protect code > from abuse by users. They're set up as if the authors of code are > omniscient, while users are dorks. This makes code much harder to > reuse, since authors simply can't predict the future for their work, and > who has either the time or desire to muck-up their code by turning it > into templates? > > Iterators in 42 are more limited than in other languages. I don't know > how to implement higher-level language iterators without a performance > hit, and I run across few examples where I need higher-level iterator > objects. > > The example I often use is depth-first recursion in graphs. If I want > to visit nodes in depth-first order, how do I write the iterator? In C, > we wind up using callback functions, and in C++, we sometimes use a > slower implementation, but one that works as we would like. In 42, I > think it would look something like: > > iterator depthFirstOutToIn(Graph graph, Node node) > foreach graphNode(graph, node) > node.visited = false > def visitNode(Node node) > node.visited = true > yield > Edge edge > foreach nodeInEdge(node, edge) > Node nextNode = edge.fromNode > if !nextNode.visited > visitNode(nextNode) > foreach graphNode(graph, node) > if !node.visited > visitNode(node) > > Note that functions can be declared inside functions, and that yield can > be called in a subroutine. This sort of code can be translated to be > exactly what I would type if I were to write a custom depth-first > traversal. > > > The relationship in between the iterator definition and its usage is > > totally counter intuitive unless you have a deep datadraw background: > > safe foreach parent clabel child on one side and > > iterator safeParentClabelChild(Parent parent, Child child) on the > > other > > > > What about something such as: > > safe iterator clabel(Parent parent, Child child) > > This is an error in my examples. I changed foreach loops after reading > up on Python iterators again, but forgot to change the foreach loops in > the examples. To iterate through a linked list, the foreach loop to > find the previous child in linkedList.gr should look like: > > foreach parentClabelChild(parent, nextChild) > if child == nextChild > break > prevChild = nextChild > > By the way, I'm not a big fan of "break" and "continue" statements, but > it seems they can be syntactic sugar rather than part of the core > language. The break statement could go away if we transform the code > like this: > > def loopFunction() > foreach parentClabelChild(parent, nextChild) > if child == nextChild > return > prevChild = nextChild > loopFunction() > > > module inheritance is implemented with the reuse keyword. > > > > Is naming the reused module necessary? what happen if you reuse the > > same module twice? (you cannot unless supporting virtual inheritance) > > Naming the reused module is optional, but it is useful in some cases. > In particular, you can reuse a module multiple times. Each instance of > a linked list in your module is simply another reuse of the LinkedList > module. Relationships are syntactic sugar for reuse statements where > only two classes are remapped. By naming reuse instances, it is > possible for other modules to reuse your code while changing how you > reused other modules. > > A good example is the graph module. By default, Nodes have doubly > linked lists of Edges. However, in the user's database, the > relationship is very likely something else, like a plain linked list, a > hash table, or whatever. You need a way to say "reuse the Graph module, > but use hash tables for the Node Edge relationships". Relationships > always translate into reuse statements with names. The name is just > like DataDraw's default relationship name: parentClabelChild. > > > what about class inheritance? just no example I guess. > > I'm unconvinced (but very likely convincable) of it's value, given > module reuse already exists. However, there will be something similar > to virtual functions. I recently wrote a file stream class with several > streams: two different types of encryption, and two different types of > compression, and file input/output streams. It was one of the strongest > cases I've seen where I wanted to define an abstract base class stream > and inherit from it for each stream type. > > I'll write up a reusable file stream example in 42, and see how that > goes. I'll likely talk myself into some sort of single-class based > inheritance. > > > The module keyword have several meaning: relationship implementation > > (huge plus) and HDL module. Not sure it is good to have two concepts > > used the same way. > > This is an error in my hardware examples. Originally, I was working > with a friend on a language we were calling SynC, which was very C-ish, > but also compilable into hardware. The code looked a lot like C, but > borrowed keywords from Verilog. It's a good idea, I think, possibly > wiser than this concept for 42. However, I just couldn't stomach > forcing all that ancient C baggage on users of the new language. > > For now, I'm using the keyword 'circuit' instead of 'module' for > hardware modules. > > > module implementing a relationship have a long list of parameters, > > what will happen when you/we will add a new one? > > Just do it in your own copy of the code, and submit a patch to the > author. This is one way that open source kicks butt on closed-source: > users get to contribute. All reusable 42 source code must be available > to the user, or he wont be able to reuse it. > > > array char should be sym for plabel and clabel in my understanding. > > Sym does read nicer than array char. I'm not a fan of DataDraw's array > keyword, either. I was being lazy when I wrote it. However, syms are > objects in a global hash table, while clabel and plabel in linkedList.gr > are just strings used as template parameters. I don't see a need for > them to be in the symbol hash table. How about "string plabel" and > "string clabel"? It's simple to have string mean array char. > > > I saw function calls (assert for instance) without parenthesis. This > > is good if it ever be possible to use the language as an environment > > CLI of an application. (lisp for autocad, skill for cadence, tcl for > > stabiesoft, grail for mynextsoftware?) > > I wrote a long response to this, trying to figure out how the syntax > could allow me to distinguish between a variable declaration and a > function call. Then I remembered that 42 is a compiled language, and > unsuitable as an embedded tool control language, like TCL. I think for > 42, there's not much benefit for the command-line-like syntax. > > > A class can be extended anywhere in the file (only the file?) by using > > a relationship of by defining a new method named Classname.Methodname. > > Would it be possible to declare a method underneath a class? (easy > > one) > > Yes, let's make it ok to define methods within the class declaration, > and like C++, let's assume they are inline functions. > > > Compiling a circuit in C will involve a event engine. Something that > > can become very useful for general purpose programming too. We need > > to elaborate on that. Event engine could support multi-threading in a > > elegant matter. (I know what you think of multi-threading and I share > > most of your opinion.) Event engines often use > > signal/callback/connect mechanism. Something which is compatible with > > HDL. The strongest way to express a good callback is by using > > closures or anonymous function. A concept that could aply to HDL too. > > (Yes closure without dynamic types.) > > I'm impressed. I didn't think anyone would realize from the examples > that the runtime for 42 code includes an event engine, but yes, it is > required. In hardware, signals compile into wires, but in software, you > need to process events. > > As you suggest, I would like to use this for > multi-threading/parallel-programming in 42. This needs a lot of work in > 42, and any help would be great. The part I have so far is signals. I > also have example code with the dreaded C "struct" declaration. It > would be used for defining conglomerate types that live in variables and > get passed by value, rather than being objects that we create and > destroy manually. > > I'm very interested in your concept of closures without dynamic types. > How can this be done? > > > Actually all the namespace management is very blurry for me up to now. > > That something important to define. What are the plans? > > Namespaces will be hierarchical, and the '.' operator is used to select > them. I don't know if there needs to be hierarchy for module name > spaces or not. For now, I'm assuming that all modules names are global, > and that identifiers within modules can be accessed with the dot > operator. If you import a module, it's identifiers become a fallback > namespaces. If you can't find a local or global identifier definition, > then imported modules are checked in the order they are imported. If > there is a name collision between modules, you can resolve it with the > module names and dot operator. I'm also thinking that there might be a > less inclusive way to import a module, perhaps something like "use > module", where identifiers are not made accessible unless they are > declared "extern", or something like that. > > > In classical datadraw, the ':' operator can be used to name or to > > indicate from which module the symbol is coming from. (A feature that > > I hate.) Now I saw the '=>' operator. What are the plans? > > How about using the dot operator instead? So we could say > > db.Inst inst > > rather than > > Inst:db inst > > The "=>" syntax may need to be changed. Basically, in a reuse > statement, you can map any class in your database to any class in the > reused module. You can also overide that module's reuse declarations > (if they're named), override any function by defining them locally, and > overide class members simply by redefining them. What syntax would you > suggest for class mappings in a reuse statement? > > > Is it me or you inverted name and type with the ':' operator as > > opposed to datadraw? What about compatibility mentioned? > > I did invert them, and compatibility with DataDraw wont be perfect. I'd > like to get 42 right, and then if people want, DataDraw could be > modified to be compatible with 42 rather than the other way around. > There are still few enough datadraw databases in use that I don't think > it'd cause too much effort to make minor changes. > > > Embedded documentation string as of Python. > > > > > > That is all for now. > > Thanks for the feedback! I'm currently working on the BNF interpreter > build into 42, so I'm able to make progress even while the language is > in flux, and hopefully much of the stuff that changes will be in > libraries anyway, rather than in the core. > > Best regards, and happy holidays. > Bill > > > > ------------------------------------------------------------------------------ > _______________________________________________ > Datadraw-user mailing list > Dat...@li... > https://lists.sourceforge.net/lists/listinfo/datadraw-user > |
|
From: Bill C. <bi...@bi...> - 2008-12-27 17:09:46
|
I've worked a bit on the common class inheritance. Here's a file stream example. In it's implementation, there wont actually be virtual function pointers - the l42 compiler will resolve them and replace function pointer calls with switch statements to the actual functions. I don't know how to compile virtual functions into hardware, so this seems to be required. Function pointers in general will have to be resolved by the compiler, rather than actually doing indirect branches. DataDraw objects are organized differently in memory than standard C/C++ objects. Because of this, you can't simply cast a pointer to a derived class to a pointer to the base class. Instead, there are actually objects of both classes created, with cross-pointers under the hood. The derived class will inherit base class functions, and calling the base-class destructor will destroy the derived class object as well. The base class will have a type field that says what kind of object it is, so there's no unsafe type casting. Regards, Bill |
|
From: Bill C. <bi...@bi...> - 2008-12-27 14:41:25
|
No problem. I've set it not to hide the from-address. If anyone is worried about privacy, let me know (direct-email: bi...@bi...). I can always set it back to the way it was before. Regards, Bill On Sat, 2008-12-27 at 13:43 +0000, dat...@li... wrote: > On Sat, 2008-12-27 at 05:44 -0500, dat...@li... > wrote: > > Hi, Richard. So far as I know, only Richard knows this much about > > programming languages, and also knows DataDraw well, so it must be you! > > For the rest of us, would it be possible to fix the mailing list > settings so it doesn't munge the "From" header to say > dat...@li... > > Since the Reply-To is set correctly, this shouldn't cause a problem. > > Best wishes, > |
|
From: <dat...@li...> - 2008-12-27 13:43:20
|
On Sat, 2008-12-27 at 05:44 -0500, dat...@li... wrote: > Hi, Richard. So far as I know, only Richard knows this much about > programming languages, and also knows DataDraw well, so it must be you! For the rest of us, would it be possible to fix the mailing list settings so it doesn't munge the "From" header to say dat...@li... Since the Reply-To is set correctly, this shouldn't cause a problem. Best wishes, -- Peter Clifton Electrical Engineering Division, Engineering Department, University of Cambridge, 9, JJ Thomson Avenue, Cambridge CB3 0FA Tel: +44 (0)7729 980173 - (No signal in the lab!) |
|
From: <dat...@li...> - 2008-12-27 10:44:34
|
Hi, Richard. So far as I know, only Richard knows this much about
programming languages, and also knows DataDraw well, so it must be you!
Thanks for the insightful feedback. I've embedded discussion below.
On Fri, 2008-12-26 at 20:37 -0500, dat...@li...
wrote:
> Hello Bill,
>
> I am always in for this kind of discussion. ;-) Except me to do
> devil's advocate though before becoming proud defender.
Thank you! Nice to have a strong programmer to bounce these ideas off
of.
> I understand you want reuse and compatibility with datadraw.
> Important choice here I think. It is sort of a way to reuse and
> extend an existing language. My natural choice would have been to
> start from a language with a larger user-base such as Python and
> compile it to C using datadraw advantage. In the case of Python, it
> would have been translate AST in datadrawish-C (py2exe?) or verilog
> (MyHDL?).
>
> Unfortunately, by looking at the examples and explanation you
> provided, it feel to me that grail (or 42) would definitely be
> strongly typed and mapping a weakly typed language such as Python or
> Ruby to a strongly one probably means overhead which doesn't seem
> acceptable for one of your objective. (I am not even sure of this
> statement but anyway...)
Yes, I agree. To compile into both hardware and software, and to get
DataDraw-backed-C speed, I believe strong typing is required. Another
possibility is to try and be similar to C, the way C++, Java, C# and
others do. However, C was conceived in 1969, when constraints on
programming were far different. One thing I like about Python (I'm no
expert in Python, BTW), is how it drops ties to old languages. I would
just hate to burden future programmers with the age-old fight of where
to put the braces, or to wear out their pinkies typing semicolons.
Probably a fairer description of 42 (the preference for 42 over Grail
was almost unanimous) would be to say it's motivated by DataDraw-backed
C, with some influence from Python.
I don't see how dynamic typing can be supported, so direct compatibility
with Python seems impossible. However, optional types are a feature
I've enjoyed supporting before. For example:
def min(a, b)
return a <= b? a : b
This function can be instantiated for each different set of types passed
to it, so the language is still strongly typed. I hate introducing the
'def' keyword, but I need a way to distinguish a function call from a
function definition without a declared return value. Can you think of
anything better?
> In bold I highlight the features I found...
>
> # based comments.
> line continuation with \ ?
Yes, \ line continuation. I also am thinking to use { ... } as block
comments, which can be recursively embedded, so the following would be
legal, and entirely commented out:
{# In single-line comments, braces like { are ignored
{This block comment is inside another one}}
> as few noisy character (!@$%^&) as possible.
>
> Less typing then C equivalent. Working with datadraw usually means a
> lot of typing. Although I saw no good example of powerful expression
> such as default parameters, named parameters, list comprehension.
> What are the plans? How powerful would it be?
Yes, DataDraw based applications require tons of typing. Only good
typists could possibly like DataDraw. That's one strong motivation for
42.
I'm not a big fan of default parameters, though I could probably be
convinced of their use. I'm also not a fan of overloading functions,
but I do want to support user-defined operators. Named parameters will
be supported. Do you think native lists are important? I find in
DataDraw backed applications I have less need for lists.
One thing I haven't mentioned was Pascal style variable parameters. I
was thinking of using * or & to declare that a parameter is passed by
reference rather than value. For example:
def swap(int &a, int &b)
int temp = a;
a = b
b = temp
That gets rid of the need to return multiple values, so I don't think I
need tupples.
> How to create and delete object?
I was thinking Classes would have some default methods: alloc, free, and
destroy. These will be virtually the same as in DataDraw. Maybe there
should be some special syntax for calling them? For example:
Node node = Node()
This looks a bit like Python. Since this kind of thing can be added in
libraries rather than the core language, I'm not to worried about this
level of detail yet.
I'm working on a BNF interpreter in 42 that will allow users to define
new syntax in libraries rather than modifying the compiler. It also
could be useful for parsing input files without requiring lex/yacc or
ANTLR.
> How to bind a library?
C functions that don't take or return pointers could be bound with a
simple declaration. Printf could be bound that way. However, even
simple functions like fopen will need wrappers. These wrappers could be
written in DataDraw backed C, which would be compatible with the output
of 42. I was thinking it would be easier, but I was wrong. Can it be
done better?
> templates and C preprocessor is replaced by module parameters, that is
> cool plus.
>
> iterators are limited compared to their high level language
> counterparts and are meant for relationship usage only. (Do we need
> more?) Nice way to implement forloops though.
Here's some thoughts about code reuse. Previous languages I'm familiar
with put the author of code in charge of how his code can be reused.
Code can only be reused exactly as the author predicted and expressly
supports. In practice, I've found that users often figure out new uses
for code that the author never predicted. Previous languages assume
code needs to be reusable even when the source is not available. This
has not worked out well in practice - I never use libraries when I can't
get source, and I know few companies that do. I currently expect 42 to
only compile programs from source, not object code. This allows global
optimizations not possible otherwise, like eliminating function pointers
and substituting switch statements.
In 42, a major goal is to make all code reusable at a source level,
without burdening the author with templatizing his code. I assume you
have access to the source, and if required, you can add module
parameters and even submit a patch to the author. A major goal of many
object-oriented languages is to hide implementations and protect code
from abuse by users. They're set up as if the authors of code are
omniscient, while users are dorks. This makes code much harder to
reuse, since authors simply can't predict the future for their work, and
who has either the time or desire to muck-up their code by turning it
into templates?
Iterators in 42 are more limited than in other languages. I don't know
how to implement higher-level language iterators without a performance
hit, and I run across few examples where I need higher-level iterator
objects.
The example I often use is depth-first recursion in graphs. If I want
to visit nodes in depth-first order, how do I write the iterator? In C,
we wind up using callback functions, and in C++, we sometimes use a
slower implementation, but one that works as we would like. In 42, I
think it would look something like:
iterator depthFirstOutToIn(Graph graph, Node node)
foreach graphNode(graph, node)
node.visited = false
def visitNode(Node node)
node.visited = true
yield
Edge edge
foreach nodeInEdge(node, edge)
Node nextNode = edge.fromNode
if !nextNode.visited
visitNode(nextNode)
foreach graphNode(graph, node)
if !node.visited
visitNode(node)
Note that functions can be declared inside functions, and that yield can
be called in a subroutine. This sort of code can be translated to be
exactly what I would type if I were to write a custom depth-first
traversal.
> The relationship in between the iterator definition and its usage is
> totally counter intuitive unless you have a deep datadraw background:
> safe foreach parent clabel child on one side and
> iterator safeParentClabelChild(Parent parent, Child child) on the
> other
>
> What about something such as:
> safe iterator clabel(Parent parent, Child child)
This is an error in my examples. I changed foreach loops after reading
up on Python iterators again, but forgot to change the foreach loops in
the examples. To iterate through a linked list, the foreach loop to
find the previous child in linkedList.gr should look like:
foreach parentClabelChild(parent, nextChild)
if child == nextChild
break
prevChild = nextChild
By the way, I'm not a big fan of "break" and "continue" statements, but
it seems they can be syntactic sugar rather than part of the core
language. The break statement could go away if we transform the code
like this:
def loopFunction()
foreach parentClabelChild(parent, nextChild)
if child == nextChild
return
prevChild = nextChild
loopFunction()
> module inheritance is implemented with the reuse keyword.
>
> Is naming the reused module necessary? what happen if you reuse the
> same module twice? (you cannot unless supporting virtual inheritance)
Naming the reused module is optional, but it is useful in some cases.
In particular, you can reuse a module multiple times. Each instance of
a linked list in your module is simply another reuse of the LinkedList
module. Relationships are syntactic sugar for reuse statements where
only two classes are remapped. By naming reuse instances, it is
possible for other modules to reuse your code while changing how you
reused other modules.
A good example is the graph module. By default, Nodes have doubly
linked lists of Edges. However, in the user's database, the
relationship is very likely something else, like a plain linked list, a
hash table, or whatever. You need a way to say "reuse the Graph module,
but use hash tables for the Node Edge relationships". Relationships
always translate into reuse statements with names. The name is just
like DataDraw's default relationship name: parentClabelChild.
> what about class inheritance? just no example I guess.
I'm unconvinced (but very likely convincable) of it's value, given
module reuse already exists. However, there will be something similar
to virtual functions. I recently wrote a file stream class with several
streams: two different types of encryption, and two different types of
compression, and file input/output streams. It was one of the strongest
cases I've seen where I wanted to define an abstract base class stream
and inherit from it for each stream type.
I'll write up a reusable file stream example in 42, and see how that
goes. I'll likely talk myself into some sort of single-class based
inheritance.
> The module keyword have several meaning: relationship implementation
> (huge plus) and HDL module. Not sure it is good to have two concepts
> used the same way.
This is an error in my hardware examples. Originally, I was working
with a friend on a language we were calling SynC, which was very C-ish,
but also compilable into hardware. The code looked a lot like C, but
borrowed keywords from Verilog. It's a good idea, I think, possibly
wiser than this concept for 42. However, I just couldn't stomach
forcing all that ancient C baggage on users of the new language.
For now, I'm using the keyword 'circuit' instead of 'module' for
hardware modules.
> module implementing a relationship have a long list of parameters,
> what will happen when you/we will add a new one?
Just do it in your own copy of the code, and submit a patch to the
author. This is one way that open source kicks butt on closed-source:
users get to contribute. All reusable 42 source code must be available
to the user, or he wont be able to reuse it.
> array char should be sym for plabel and clabel in my understanding.
Sym does read nicer than array char. I'm not a fan of DataDraw's array
keyword, either. I was being lazy when I wrote it. However, syms are
objects in a global hash table, while clabel and plabel in linkedList.gr
are just strings used as template parameters. I don't see a need for
them to be in the symbol hash table. How about "string plabel" and
"string clabel"? It's simple to have string mean array char.
> I saw function calls (assert for instance) without parenthesis. This
> is good if it ever be possible to use the language as an environment
> CLI of an application. (lisp for autocad, skill for cadence, tcl for
> stabiesoft, grail for mynextsoftware?)
I wrote a long response to this, trying to figure out how the syntax
could allow me to distinguish between a variable declaration and a
function call. Then I remembered that 42 is a compiled language, and
unsuitable as an embedded tool control language, like TCL. I think for
42, there's not much benefit for the command-line-like syntax.
> A class can be extended anywhere in the file (only the file?) by using
> a relationship of by defining a new method named Classname.Methodname.
> Would it be possible to declare a method underneath a class? (easy
> one)
Yes, let's make it ok to define methods within the class declaration,
and like C++, let's assume they are inline functions.
> Compiling a circuit in C will involve a event engine. Something that
> can become very useful for general purpose programming too. We need
> to elaborate on that. Event engine could support multi-threading in a
> elegant matter. (I know what you think of multi-threading and I share
> most of your opinion.) Event engines often use
> signal/callback/connect mechanism. Something which is compatible with
> HDL. The strongest way to express a good callback is by using
> closures or anonymous function. A concept that could aply to HDL too.
> (Yes closure without dynamic types.)
I'm impressed. I didn't think anyone would realize from the examples
that the runtime for 42 code includes an event engine, but yes, it is
required. In hardware, signals compile into wires, but in software, you
need to process events.
As you suggest, I would like to use this for
multi-threading/parallel-programming in 42. This needs a lot of work in
42, and any help would be great. The part I have so far is signals. I
also have example code with the dreaded C "struct" declaration. It
would be used for defining conglomerate types that live in variables and
get passed by value, rather than being objects that we create and
destroy manually.
I'm very interested in your concept of closures without dynamic types.
How can this be done?
> Actually all the namespace management is very blurry for me up to now.
> That something important to define. What are the plans?
Namespaces will be hierarchical, and the '.' operator is used to select
them. I don't know if there needs to be hierarchy for module name
spaces or not. For now, I'm assuming that all modules names are global,
and that identifiers within modules can be accessed with the dot
operator. If you import a module, it's identifiers become a fallback
namespaces. If you can't find a local or global identifier definition,
then imported modules are checked in the order they are imported. If
there is a name collision between modules, you can resolve it with the
module names and dot operator. I'm also thinking that there might be a
less inclusive way to import a module, perhaps something like "use
module", where identifiers are not made accessible unless they are
declared "extern", or something like that.
> In classical datadraw, the ':' operator can be used to name or to
> indicate from which module the symbol is coming from. (A feature that
> I hate.) Now I saw the '=>' operator. What are the plans?
How about using the dot operator instead? So we could say
db.Inst inst
rather than
Inst:db inst
The "=>" syntax may need to be changed. Basically, in a reuse
statement, you can map any class in your database to any class in the
reused module. You can also overide that module's reuse declarations
(if they're named), override any function by defining them locally, and
overide class members simply by redefining them. What syntax would you
suggest for class mappings in a reuse statement?
> Is it me or you inverted name and type with the ':' operator as
> opposed to datadraw? What about compatibility mentioned?
I did invert them, and compatibility with DataDraw wont be perfect. I'd
like to get 42 right, and then if people want, DataDraw could be
modified to be compatible with 42 rather than the other way around.
There are still few enough datadraw databases in use that I don't think
it'd cause too much effort to make minor changes.
> Embedded documentation string as of Python.
>
>
> That is all for now.
Thanks for the feedback! I'm currently working on the BNF interpreter
build into 42, so I'm able to make progress even while the language is
in flux, and hopefully much of the stuff that changes will be in
libraries anyway, rather than in the core.
Best regards, and happy holidays.
Bill
|
|
From: <dat...@li...> - 2008-12-27 01:52:56
|
Hello Bill, I am always in for this kind of discussion. ;-) Except me to do devil's advocate though before becoming proud defender. I understand you want reuse and compatibility with datadraw. Important choice here I think. It is sort of a way to reuse and extend an existing language. My natural choice would have been to start from a language with a larger user-base such as Python and compile it to C using datadraw advantage. In the case of Python, it would have been translate AST in datadrawish-C (py2exe?) or verilog (MyHDL?). Unfortunately, by looking at the examples and explanation you provided, it feel to me that grail (or 42) would definitely be *strongly typed* and mapping a weakly typed language such as Python or Ruby to a strongly one probably means overhead which doesn't seem acceptable for one of your objective. (I am not even sure of this statement but anyway...) In bold I highlight the features I found... *# based comments*. line continuation with \ ? *as few noisy character (!@$%^&) as possible.* *Less typing* then C equivalent. Working with datadraw usually means a lot of typing. Although I saw no good example of powerful expression such as default parameters, named parameters, list comprehension. What are the plans? How powerful would it be? How to create and delete object? How to bind a library? templates and C preprocessor is replaced by *module parameters*, that is cool plus. iterators are limited compared to their high level language counterparts and are meant for relationship usage only. (Do we need more?) Nice way to implement forloops though. The relationship in between the iterator definition and its usage is totally counter intuitive unless you have a deep datadraw background: safe foreach parent clabel child on one side and iterator safeParentClabelChild(Parent parent, Child child) on the other What about something such as: safe iterator clabel(Parent parent, Child child) *module inheritance* is implemented with the reuse keyword. Is naming the reused module necessary? what happen if you reuse the same module twice? (you cannot unless supporting virtual inheritance) what about class inheritance? just no example I guess. The module keyword have several meaning: relationship implementation (huge plus) and HDL module. Not sure it is good to have two concepts used the same way. module implementing a relationship have a long list of parameters, what will happen when you/we will add a new one? array char should be sym for plabel and clabel in my understanding. I saw *function calls* (assert for instance) *without parenthesis*. This is good if it ever be possible to use the language as an environment CLI of an application. (lisp for autocad, skill for cadence, tcl for stabiesoft, grail for mynextsoftware?) A class can be *extended anywhere* in the file (only the file?) by using a relationship of by defining a new method named Classname.Methodname. Would it be possible to declare a method underneath a class? (easy one) Compiling a circuit in C will involve a *event engine*. Something that can become very useful for general purpose programming too. We need to elaborate on that. Event engine could support multi-threading in a elegant matter. (I know what you think of multi-threading and I share most of your opinion.) Event engines often use signal/callback/connect mechanism. Something which is compatible with HDL. The strongest way to express a good callback is by using closures or anonymous function. A concept that could aply to HDL too. (Yes closure without dynamic types.) Actually all the *namespace* management is very blurry for me up to now. That something important to define. What are the plans? In classical datadraw, the ':' operator can be used to name or to indicate from which module the symbol is coming from. (A feature that I hate.) Now I saw the '=>' operator. What are the plans? Is it me or you inverted name and type with the ':' operator as opposed to datadraw? What about compatibility mentioned? *Embedded documentation string* as of Python. That is all for now. On Wed, Dec 24, 2008 at 9:36 AM, <dat...@li...>wrote: > Hi, all. > > It's the holiday season, and once again work has backed off long enough > for my head to become full of all the cool stuff I wish I had time to > do. This time, it's yet-another-computer-language... again. > > Grail will have these three major goals: > > - Foster extreme code reuse > - Run much faster than C > - Compile to both software and hardware > > Grail will be motivated by Python, Verilog, and of course DataDraw. The > name is from the movie "Monty Python and the Holly Grail." > > If anyone is interested in a discussion of Grail features, motivations, > etc, and has time to kill over the holidays, feel free to join in here. > > Some cool stuff I'm envisioning in Grail include: > > - DataDraw-like data structures, with the speed that goes with it > - Powerful "reuse" capability - I'll attached examples > - Extensible syntax - Grail will parse itself and anything else > - Verilog-like structural hardware/software description capability > > Most of Grail will be written in Grail, including most of the stuff > DataDraw generates today. Syntax for relationships, for example, will > be in a library, not in the language. If you still want to use DataDraw > generated data structures with C, there will be virtually no changes - > DataDraw database files are legal Grail code, and will still compile to > database.c and database.h files like before. However, all that code > DataDraw creates with code generators today will instead just be in much > simpler and more extensible and maintainable Grail libraries. > > I often get excited about my projects. However, this time the pieces > seem to be falling in place in an elegant powerful way. Check out the > examples to see how well the "reuse" construct works, and compare this > to C++ templates, which can't even embed the next pointer of a linked > list in the child objects. Compare the beginning of a reusable graph > module to the C++ Boost Graph Library. I'd say the Grail code will be > an order of magnitude simpler and clearer. I've also attached a circuit > description for a FIFO, though some syntax issues still have to be > resolved. > > Anyone up for some holiday brainstorming on Grail? > > Happy holidays, > Bill > > > ------------------------------------------------------------------------------ > > _______________________________________________ > Datadraw-user mailing list > Dat...@li... > https://lists.sourceforge.net/lists/listinfo/datadraw-user > > |
|
From: <dat...@li...> - 2008-12-24 21:13:47
|
My friend Peter Schafer thinks that a better name than Grail would be 42. It's the holidays, so why not focus on the stupid stuff like the name? So, any votes? 42? Grail? Something else? Bill |
|
From: <dat...@li...> - 2008-12-24 14:58:46
|
Hi, all. It's the holiday season, and once again work has backed off long enough for my head to become full of all the cool stuff I wish I had time to do. This time, it's yet-another-computer-language... again. Grail will have these three major goals: - Foster extreme code reuse - Run much faster than C - Compile to both software and hardware Grail will be motivated by Python, Verilog, and of course DataDraw. The name is from the movie "Monty Python and the Holly Grail." If anyone is interested in a discussion of Grail features, motivations, etc, and has time to kill over the holidays, feel free to join in here. Some cool stuff I'm envisioning in Grail include: - DataDraw-like data structures, with the speed that goes with it - Powerful "reuse" capability - I'll attached examples - Extensible syntax - Grail will parse itself and anything else - Verilog-like structural hardware/software description capability Most of Grail will be written in Grail, including most of the stuff DataDraw generates today. Syntax for relationships, for example, will be in a library, not in the language. If you still want to use DataDraw generated data structures with C, there will be virtually no changes - DataDraw database files are legal Grail code, and will still compile to database.c and database.h files like before. However, all that code DataDraw creates with code generators today will instead just be in much simpler and more extensible and maintainable Grail libraries. I often get excited about my projects. However, this time the pieces seem to be falling in place in an elegant powerful way. Check out the examples to see how well the "reuse" construct works, and compare this to C++ templates, which can't even embed the next pointer of a linked list in the child objects. Compare the beginning of a reusable graph module to the C++ Boost Graph Library. I'd say the Grail code will be an order of magnitude simpler and clearer. I've also attached a circuit description for a FIFO, though some syntax issues still have to be resolved. Anyone up for some holiday brainstorming on Grail? Happy holidays, Bill |
|
From: <dat...@li...> - 2008-12-24 14:49:44
|
Hi, all. It's the holiday season, and once again work has backed off long enough for my head to become full of all the cool stuff I wish I had time to do. This time, it's yet-another-computer-language... again. Grail will have these three major goals: - Foster extreme code reuse - Run much faster than C - Compile to both software and hardware Grail will be motivated by Python, Verilog, and of course DataDraw. The name is from the movie "Monty Python and the Holly Grail." If anyone is interested in a discussion of Grail features, motivations, etc, and has time to kill over the holidays, feel free to join in here. Some cool stuff I'm envisioning in Grail include: - DataDraw-like data structures, with the speed that goes with it - Powerful "reuse" capability - I'll attached examples - Extensible syntax - Grail will parse itself and anything else - Verilog-like structural hardware/software description capability Most of Grail will be written in Grail, including most of the stuff DataDraw generates today. Syntax for relationships, for example, will be in a library, not in the language. If you still want to use DataDraw generated data structures with C, there will be virtually no changes - DataDraw database files are legal Grail code, and will still compile to database.c and database.h files like before. However, all that code DataDraw creates with code generators today will instead just be in much simpler and more extensible and maintainable Grail libraries. I often get excited about my projects. However, this time the pieces seem to be falling in place in an elegant powerful way. Check out the examples to see how well the "reuse" construct works, and compare this to C++ templates, which can't even embed the next pointer of a linked list in the child objects. Compare the beginning of a reusable graph module to the C++ Boost Graph Library. I'd say the Grail code will be an order of magnitude simpler and clearer. I've also attached a circuit description for a FIFO, though some syntax issues still have to be resolved. Anyone up for some holiday brainstorming on Grail? Happy holidays, Bill |
|
From: <dat...@li...> - 2008-08-08 12:34:51
|
Thanks Bill. On Thu, Aug 7, 2008 at 2:59 PM, <dat...@li...> wrote: > Hi, Richard. > > I've signed in a fix that allows your example to compile. I haven't > tested it in executing code, but the generated code looks right. > > Regards, > Bill > > On Thu, 2008-08-07 at 10:22 -0400, dat...@li... > wrote: > > Hello! > > > > I am wondering what I am doing wrong here. Is what I am doing legal? > > See example in attchmnt to be clear. > > > > Basically, I import a class from a foreign module and I want to create > > a hash to it using as key one of its attribute. My way doesn't look > > to be the way of doing such thing. > > > > How should I do it? > > > > Thanks > > > > Richard > > > > > > > > ------------------------------------------------------------------------- > > This SF.Net email is sponsored by the Moblin Your Move Developer's > challenge > > Build the coolest Linux based applications with Moblin SDK & win great > prizes > > Grand prize is a trip for two to an Open Source event anywhere in the > world > > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > > _______________________________________________ Datadraw-user mailing > list Dat...@li... > https://lists.sourceforge.net/lists/listinfo/datadraw-user > > > ------------------------------------------------------------------------- > This SF.Net email is sponsored by the Moblin Your Move Developer's > challenge > Build the coolest Linux based applications with Moblin SDK & win great > prizes > Grand prize is a trip for two to an Open Source event anywhere in the world > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > Datadraw-user mailing list > Dat...@li... > https://lists.sourceforge.net/lists/listinfo/datadraw-user > -- +1 514-518-8172 |
|
From: <dat...@li...> - 2008-08-07 18:59:42
|
Hi, Richard. I've signed in a fix that allows your example to compile. I haven't tested it in executing code, but the generated code looks right. Regards, Bill On Thu, 2008-08-07 at 10:22 -0400, dat...@li... wrote: > Hello! > > I am wondering what I am doing wrong here. Is what I am doing legal? > See example in attchmnt to be clear. > > Basically, I import a class from a foreign module and I want to create > a hash to it using as key one of its attribute. My way doesn't look > to be the way of doing such thing. > > How should I do it? > > Thanks > > Richard > > > > ------------------------------------------------------------------------- > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge > Build the coolest Linux based applications with Moblin SDK & win great prizes > Grand prize is a trip for two to an Open Source event anywhere in the world > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ Datadraw-user mailing list Dat...@li... https://lists.sourceforge.net/lists/listinfo/datadraw-user |
|
From: <dat...@li...> - 2008-07-09 14:12:49
|
Thanks for finding this. I've updated the manual. For starting a project that has persistence enabled, I recommend 2 things: - Start with the examples/graph example - all you really need is the main function - Keep my Skype address handy (xyzzy_bill). There are going to be bugs for a while, but I'll try to address them promptly. Regards, Bill On Wed, 2008-07-09 at 13:21 +0200, dat...@li... wrote: > In the docs is written: > 20.12: bool utStartPersistence(char*) > > > but the signature is: > > char *directory, > bool useTextDatabaseFormat, > bool keepBackup > > > Greetings > ------------------------------------------------------------------------- > Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW! > Studies have shown that voting for your favorite open source project, > along with a healthy diet, reduces your potential for chronic lameness > and boredom. Vote Now at http://www.sourceforge.net/community/cca08 > _______________________________________________ Datadraw-user mailing list Dat...@li... https://lists.sourceforge.net/lists/listinfo/datadraw-user |
|
From: <dat...@li...> - 2008-07-09 11:21:27
|
In the docs is written:
20.12: bool utStartPersistence(char*)
but the signature is:
char *directory,
bool useTextDatabaseFormat,
bool keepBackup
Greetings
|
|
From: <dat...@li...> - 2008-07-08 13:16:13
|
My mistake... the manual is out of date. The syntax should read:
module Union
enum ShapeType
LINE
CIRCLE
RECTANGLE
class Line
class Circle
class Rectangle
class Shape
ShapeType type
union type
Line line cascade: LINE
Circle circle cascade: CIRCLE
Rectangle rectangle cascade: RECTANGLE
I've signed in corrections for the manual.
Regards,
Bill
On Tue, 2008-07-08 at 09:02 -0400, dat...@li...
wrote:
> I just hope I've done something wrong... I'll dig a bit more now...
>
> Here is an example of a Datadraw file that doesn't go through with svn
> revision 509. It is a copy paste from the manual plus three class
> definitions and a module header. Unfortunately here is what I've got
> from it:
>
> [rip@localhost union]$ datadraw Union.dd
> Generating data structure for database description file Union.dd
> Reading DataDraw file Union.dd
> Line 15, token ":": Module LINE not found
> Exiting due to errors
> [rip@localhost union]$ datadraw
> Expecting a database description file
> Usage: datadraw [options] file...
> -h file -- Use file as the output header file
> -I path -- Add a directory to the module search path
> -l file -- Use file as a debugging log file
> -m -- Start the database manager to examine datadraw's
> database
> -p -- Set the module as persistent.
> -s file -- Use file as the output for the source file
> -u -- Set the module as undo_redo
> This program generates C data structures from their DataDraw file
> description.
> This is version 3.1.X-unstable, compiled on Jul 8 2008 08:54:34
>
>
> Richard
> -------------------------------------------------------------------------
> Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
> Studies have shown that voting for your favorite open source project,
> along with a healthy diet, reduces your potential for chronic lameness
> and boredom. Vote Now at http://www.sourceforge.net/community/cca08
> _______________________________________________ Datadraw-user mailing list Dat...@li... https://lists.sourceforge.net/lists/listinfo/datadraw-user
|
|
From: <dat...@li...> - 2008-07-07 14:37:48
|
I've signed in a fix, and it compiles and runs ok on the hash example. Give it a try. On another note, for some reason, users keep using the same names for classes, fields, and relationship labels, which makes me cross-eyed trying to debug it! I would suggest a class called "Codes" with field "Code" be called "Code", with field "Value". Also the Codes class having a "Stats" relationship to "Statistics", which also has field "Stat" just kills me. Thanks for finding the bug though! If anyone would like to ask me about style issues for DataDraw database definitions, feel free to IM me at Yahoo:xyzzy_bill, or Skype:xyzzy_bill. Try not to use some girlie name like "Foxy" or a cute message like "would you like to be my friend?", or I'll just ignore it! If you say anything that includes "datadraw", I'll reply. You can also e-mail me directly at bi...@bi.... Put 'datadraw' in the subject, or you wont get through. Regards, Bill On Mon, 2008-07-07 at 09:41 -0400, dat...@li... wrote: > You've found a bug in unordered hash table traversals. It's fairly new > code, so it hasn't been tested much. I'm going to rewrite the traversal > so it's a bit more efficient, and wont require the relationship to be > bi-directional, with access to the parent. It'll take me a bit, but I > should get it fixed today. For now, just comment out the 'child_only' > attribute, and it should work ok. > > Regards, > Bill > > On Mon, 2008-07-07 at 14:50 +0200, dat...@li... > wrote: > > > > I have the following compilation error: > > > > > > $ cc -Wall -c cpdatabase.c > > cpdatabase.c: In function `cpStatisticsGetNextCodesStatsStatistics': > > cpdatabase.c:653: warning: implicit declaration of function > > `cpStatisticsGetCodes' > > > > ========= CODE ========= > > module Tmp cp persistent > > > > class Codes > > uint code > > > > class Statistics > > uint stat > > uint tmpfield > > uint code > > > > relationship Codes Statistics:Stats hashed tmpfield code child_only > > mandatory unordered > > relationship Codes Statistics hashed tmpfield child_only > > mandatory unordered > > ========= CODE ========= > > > > > > What am I doing wrong here? > > > > > > Thanks :-) > > ------------------------------------------------------------------------- > > Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW! > > Studies have shown that voting for your favorite open source project, > > along with a healthy diet, reduces your potential for chronic lameness > > and boredom. Vote Now at http://www.sourceforge.net/community/cca08 > > _______________________________________________ Datadraw-user mailing list Dat...@li... https://lists.sourceforge.net/lists/listinfo/datadraw-user > > > ------------------------------------------------------------------------- > Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW! > Studies have shown that voting for your favorite open source project, > along with a healthy diet, reduces your potential for chronic lameness > and boredom. Vote Now at http://www.sourceforge.net/community/cca08 > _______________________________________________ > Datadraw-user mailing list > Dat...@li... > https://lists.sourceforge.net/lists/listinfo/datadraw-user |
|
From: <dat...@li...> - 2008-07-07 13:41:59
|
You've found a bug in unordered hash table traversals. It's fairly new code, so it hasn't been tested much. I'm going to rewrite the traversal so it's a bit more efficient, and wont require the relationship to be bi-directional, with access to the parent. It'll take me a bit, but I should get it fixed today. For now, just comment out the 'child_only' attribute, and it should work ok. Regards, Bill On Mon, 2008-07-07 at 14:50 +0200, dat...@li... wrote: > > I have the following compilation error: > > > $ cc -Wall -c cpdatabase.c > cpdatabase.c: In function `cpStatisticsGetNextCodesStatsStatistics': > cpdatabase.c:653: warning: implicit declaration of function > `cpStatisticsGetCodes' > > ========= CODE ========= > module Tmp cp persistent > > class Codes > uint code > > class Statistics > uint stat > uint tmpfield > uint code > > relationship Codes Statistics:Stats hashed tmpfield code child_only > mandatory unordered > relationship Codes Statistics hashed tmpfield child_only > mandatory unordered > ========= CODE ========= > > > What am I doing wrong here? > > > Thanks :-) > ------------------------------------------------------------------------- > Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW! > Studies have shown that voting for your favorite open source project, > along with a healthy diet, reduces your potential for chronic lameness > and boredom. Vote Now at http://www.sourceforge.net/community/cca08 > _______________________________________________ Datadraw-user mailing list Dat...@li... https://lists.sourceforge.net/lists/listinfo/datadraw-user |
|
From: <dat...@li...> - 2008-07-07 12:50:53
|
I have the following compilation error: $ cc -Wall -c cpdatabase.c cpdatabase.c: In function `cpStatisticsGetNextCodesStatsStatistics': cpdatabase.c:653: warning: implicit declaration of function `cpStatisticsGetCodes' ========= CODE ========= module Tmp cp persistent class Codes uint code class Statistics uint stat uint tmpfield uint code relationship Codes Statistics:Stats hashed tmpfield code child_only mandatory unordered relationship Codes Statistics hashed tmpfield child_only mandatory unordered ========= CODE ========= What am I doing wrong here? Thanks :-) |
|
From: <dat...@li...> - 2008-06-18 23:08:10
|
Hi, all. I compared the DataDraw version of the graph benchmark to the raw C version in more detail. The C version starts to suffer terribly from L2 cache misses somewhere between 10,000 and 40,000 nodes. At 40,000 nodes, L2 cache misses take 80% of the CPU time. Num Nodes Runtime L2 miss rate DataDraw 9,000,000 8.64s 0.4% Raw C 40,000 40.3s 5.6% In other words, I can allocate 225 times more memory in the DataDraw version, yet I still don't suffer a significant L2 miss penalty, while the raw C version chokes on L2 misses. Regards, Bill |
|
From: <dat...@li...> - 2008-05-27 18:52:50
|
Hi. The cache_together directive is signed into svn and documented in the manual. It reduced runtime of DataDraw's red-black tree code in one benchmark from 42 seconds down to 28. I strongly recommend using this directive to speed up critical inner loops. Cache efficiency is the major reason DataDraw wins benchmarks vs other systems, such as C++ STL. All the wrapper functions in the database.h files are now static inline, rather than #defines. This seems to have no impact on speed, and it's kind of nice to be able to step into the functions. All class NULL values are now 0, rather than 0xffffffff, which provides a minor speedup. Best regards, Bill |
|
From: <dat...@li...> - 2008-05-27 08:29:19
|
Hi, Kai.
Thanks for the help in benchmarking and optimization. I didn't find any
difference in the speed of static inline functions vs #defines, so I'm
converting over to static inline. Also, I saw many places in the
assembly code where using 0xffffffff for NULL required an extra
instruction, so I converted DataDraw to use 0 for all NULL values.
DataDraw already uses memory pools, so allocation/freeing of objects is
fast. However, I always am eager to hear about ideas for improving
speed. Let me know if you see anything that could be improved.
Best regards,
Bill
On Tue, 2008-05-27 at 09:46 +0200, dat...@li...
wrote:
> Hi Bill,
>
> great that there are some people out there hunting for speed like me :)
>
> another thing I always use (especially when using a algorithm with different functions (compile-time inheritation) and only with a cpp-compiler):
>
> examplecode is from a image-resize-routine:
> class HermiteFilter {
> public:
> static float getDefaultFilterRadius() { return 1.0f; }
> static float getValue(float val) { .... }
> };
> class ImageResize {
> public:
> template<class filter> static uint32_t *resample(....);
> };
> template<class filter> uint32_t *ImageResize::resample(....) {
> ...
> float wdth = filter::getDefaultFilterRadius() / scaleX;
> ....
> float weight = filter::getValue(center-j-0.5f);
> .....
> }
>
> that get inlined, too:
>
> if(scaleX < 1.0f) {
> 0040C581 fld1 <-- note this :)
> 0040C583 fld dword ptr [esp+14h]
> 0040C587 mov ebx,eax
> 0040C589 fcom st(1)
> 0040C58B add esp,4
> 0040C58E mov dword ptr [esp+18h],ebx
> 0040C592 fnstsw ax
> 0040C594 fldz
> 0040C596 test ah,5
> 0040C599 jp 0040CAFD
> //scales from bigger to smaller width
> float wdth = filter::getDefaultFilterRadius() / scaleX;
>
> for(unsigned int i=0; i<outputSizeX; ++i) {
> 0040C59F test esi,esi
> 0040C5A1 fld st(2) <--- filter::getDefaultFilterRadius() returns with in this version 1.0f and
> the constant 1.0f is loaded previously into floatingpoint-registers
> 0040C5A3 fdivrp st(2),st
> 0040C5A5 mov dword ptr [esp+24h],0
> 0040C5AD fxch st(1)
> 0040C5AF fst dword ptr [esp+28h]
> 0040C5B3 fst dword ptr [esp+34h]
> 0040C5B7 jbe 0040CE44
>
> another thing about your benchmarks: I've done benchmarks, too. but I've found that in algorithms like trees and arrays normally the most time-consuming factor was the memory-management (malloc, free). so I think your cache-idea is great for traversing the tree, but to speed up insertion/deletion of entries you could optimise the system with object-pools. you could use an extra layer for allocations (with inlined or defined functions :) so that's easy to use pools or don't use them...
>
> keep coding, datadraw is a great tool!
> kai
>
>
> ----- original Nachricht --------
>
> Betreff: Re: [Datadraw-user] Red-black trees beat treaps
> Gesendet: Mo, 26. Mai 2008
> Von: dat...@li...
>
> > Hi, Kai.
> >
> > I inspected the assembly-code carefully, and found several issues.
> > First, the reason the #define version was so much faster is that it
> > didn't cast the return value to int. Since my keys were unsigned ints,
> > the right key minus left key was always positive, which led to an
> > ordered tree rather than random. When I fixed it, and improved some
> > cache efficiency issues, DataDraw now beats STDCXX at 29 seconds
> > compared to 30, and beats STL by two seconds. The DataDraw version uses
> > about 60% as much memory.
> >
> > For ordered traversals, however, DataDraw did quite a bit better. At 10
> > traversals of 5 million nodes, DataDraw came in at 6.35 seconds vs 10.35
> > for STL. The reason for this is cache efficiency. I organized the data
> > to group only left child, right child, and the key as a single
> > structure, so every read of a node's key would load a cache line that
> > also contained the node's left and right child. In addition, it fills
> > the cache line with only left, right, and key values of nearby nodes,
> > causing in-order traversal to really speed up. The STL version loads
> > mostly data unrelated to the loop into cache, and all the pointers are
> > 2X as big, making cache misses much more common.
> >
> > One thing really hit home: speed is now all about cache performance.
> > The DataDraw inner loop is 9 assembly instructions due to poor
> > optimization from the GNU C compiler, while the STL version is only 6.
> > However the DataDraw version wins big due to better cache organization.
> >
> > I don't know if inline functions in GNU C are as good as #defines. The
> > last time I checked, about 10 years ago, they slowed down my code by
> > 10-20%. It's probably time to check again, now that all C compilers I
> > care about support inline.
> >
> > Since cache organization is soooo important, I'm adding cache hints to
> > the .dd format. For this example, I think it will look like:
> >
> > class Node
> > Node parent
> > Node left
> > Node right
> > uint32 key
> > bool color
> > cache_together left right key
> >
> > This will cause a these three fields to be defined in a struct of their
> > own, and there will be an array of these structures rather than an array
> > for each separate field. It should be a nice hint to introduce in
> > algorithms after profiling.
> >
> > Best regards,
> > Bill
> >
> > On Fri, 2008-05-23 at 18:11 +0200, dat...@li...
> > wrote:
> > > Hi there :)
> > >
> > > it would be interesting to inspect the assembly-code of #define and the
> > static code. I guess the #define really gets inlined and the static is an
> > extra function call.... inspecting code is rather easy in visual studio, but
> > I know you're working in a unix-environment.
> > >
> > > Optimising code is always a great challenge. But nowadays optimising
> > memory-layout is much better than optimising only functions (ok, ok, using
> > the right algorithm for the problem is the best ;)
> > >
> > > so congratulations for you fast routines! I'm really impressed.
> > > kai
> > >
> > > ----- original Nachricht --------
> > >
> > > Betreff: Re: [Datadraw-user] Red-black trees beat treaps
> > > Gesendet: Do, 22. Mai 2008
> > > Von: dat...@li...
> > >
> > > > Hi, Kai.
> > > >
> > > > For some reason, every compiler I've tested has trouble optimizing when
> > > > using inline functions instead of #defines. That's why #defines are
> > > > still in use in datadraw, though for debugging, we undefine most of
> > them
> > > > and provide real functions instead.
> > > >
> > > > I tested multiset from stdcxx. You'll be happy to know it beats the
> > STL
> > > > library. For 5 million nodes, my random insert/remove benchmark takes
> > > > 30 seconds, while STL takes 31. However, DataDraw does it in 8, while
> > > > doing more work (it inserts real objects with key values, rather than
> > > > just ints), and uses only about 60% as much memory on my 64-bit
> > machine.
> > > > I've attached the files I used to do the benchmark. The DataDraw
> > > > benchmarks are signed in under trunk/examples/redblack_benchmarks
> > > >
> > > > I just love winning speed benchmarks :-)
> > > >
> > > > Best regards,
> > > > Bill
> > > >
> > > > On Thu, 2008-05-22 at 13:42 -0400, dat...@li...
> > > > wrote:
> > > > > I am surprise about the static versus #define because I though static
> > > > > could be interpreted as inline by the optimizer.
> > > > >
> > > > > Great to know.
> > > > >
> > > > > Regards
> > > > >
> > > > > On Thu, May 22, 2008 at 12:38 PM,
> > > > > <dat...@li...> wrote:
> > > > > Hi, Kai.
> > > > >
> > > > > Thanks for the link. I've downloaded it and hope to compare
> > > > > to stdcxx
> > > > > soon. In the meantime, I compared DataDraw ordered_lists to
> > > > > the g++ STL
> > > > > multiset. With just integers, the STL code takes 33 seconds
> > > > > for 5
> > > > > million creates, 5 million random insert delete pairs, and 5
> > > > > million
> > > > > deletes. We use to run in 45 seconds... I found out why we
> > > > > were slower,
> > > > > and fixed it. DataDraw now runs in 8 seconds!
> > > > >
> > > > > Best regards,
> > > > > Bill
> > > > >
> > > > > P.S. The reason the datadraw implementation was slow is a bit
> > > > > scary.
> > > > > Static inline functions DO NOT run as fast as #defines! I
> > > > > converted the
> > > > > compare function to #define base, and the speed when from 43
> > > > > seconds to
> > > > > 8!
> > > > >
> > > > > On Thu, 2008-05-22 at 11:59 +0200,
> > > > > dat...@li...
> > > > >
> > > > > wrote:
> > > > > > Hi Guys,
> > > > > >
> > > > > > I've heard that the stdcxx-code should be really fast, but
> > > > > I've not measured the performance against the
> > > > > wikipedia-entry.... what I've done is to measure the
> > > > > performance of the stl-, crisscross-implementation and some
> > > > > other and found that the stdcxx was the fastest for me. you
> > > > > can find the code here: http://stdcxx.apache.org/
> > > > > >
> > > > > > regards and happy coding :)
> > > > > > kai
> > > > > >
> > > > > > ----- original Nachricht --------
> > > > > >
> > > > > > Betreff: Re: [Datadraw-user] Red-black trees beat treaps
> > > > > > Gesendet: Mi, 21. Mai 2008
> > > > > > Von: dat...@li...
> > > > > >
> > > > > > > It's done. We now have red-black tree implementations
> > for
> > > > > ordered_list
> > > > > > > relationships. This is the fastest code I've found on
> > the
> > > > > Internet for
> > > > > > > ordered lists after both Richard and I did a fairly
> > > > > lengthy search.
> > > > > > > Thanks for the help, Richard!
> > > > > > >
> > > > > > > Bill
> > > > > > >
> > > > > > > On Wed, 2008-05-21 at 14:43 -0400,
> > > > > dat...@li...
> > > > > > > wrote:
> > > > > > > > ?Hi.
> > > > > > > >
> > > > > > > > I hand-coded both treap and black-red tree benchmarks.
> > > > > The red-black
> > > > > > > > code I used is the code from the wikipedia.org page.
> > > > > For random insert
> > > > > > > > and delete of 10 million nodes, treaps took 63 seconds,
> > > > > while red-black
> > > > > > > > took 43 seconds. Treaps actually trounced red-black
> > > > > trees when used as
> > > > > > > > a fifo (deleting only smallest, inserting only
> > largest),
> > > > > but I think
> > > > > > > > random is the more important case for ordered lists.
> > > > > > > >
> > > > > > > > I'm going to convert over to wikipedia style red-black
> > > > > trees if there
> > > > > > > > are no objections. You can find my benchmarks in
> > > > > > > > examples/treap_benchark and
> > examples/redblack_benchmark.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Bill
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > >
> > > >
> > -------------------------------------------------------------------------
> > > > > > > > This SF.net email is sponsored by: Microsoft
> > > > > > > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > > > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > > > > > > _______________________________________________
> > > > > > > > Datadraw-user mailing list
> > > > > > > > Dat...@li...
> > > > > > > >
> > > > > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > -------------------------------------------------------------------------
> > > > > > > This SF.net email is sponsored by: Microsoft
> > > > > > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > > > > > _______________________________________________
> > > > > > > Datadraw-user mailing list
> > > > > > > Dat...@li...
> > > > > > >
> > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > > > > > >
> > > > > >
> > > > > > --- original Nachricht Ende ----
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > -------------------------------------------------------------------------
> > > > > > This SF.net email is sponsored by: Microsoft
> > > > > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > > > > _______________________________________________
> > > > > > Datadraw-user mailing list
> > > > > > Dat...@li...
> > > > > > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > > > >
> > > > >
> > > > >
> > > >
> > -------------------------------------------------------------------------
> > > > > This SF.net email is sponsored by: Microsoft
> > > > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > > > _______________________________________________
> > > > > Datadraw-user mailing list
> > > > > Dat...@li...
> > > > > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > +1 514-518-8172
> > > > >
> > -------------------------------------------------------------------------
> > > > > This SF.net email is sponsored by: Microsoft
> > > > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > > > _______________________________________________ Datadraw-user mailing
> > list
> > > > Dat...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > > >
> > >
> > > --- original Nachricht Ende ----
> > >
> > >
> > > -------------------------------------------------------------------------
> > > This SF.net email is sponsored by: Microsoft
> > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > _______________________________________________
> > > Datadraw-user mailing list
> > > Dat...@li...
> > > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> >
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Microsoft
> > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > _______________________________________________
> > Datadraw-user mailing list
> > Dat...@li...
> > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> >
>
> --- original Nachricht Ende ----
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Datadraw-user mailing list
> Dat...@li...
> https://lists.sourceforge.net/lists/listinfo/datadraw-user
|
|
From: <dat...@li...> - 2008-05-27 07:46:18
|
Hi Bill,
great that there are some people out there hunting for speed like me :)
another thing I always use (especially when using a algorithm with different functions (compile-time inheritation) and only with a cpp-compiler):
examplecode is from a image-resize-routine:
class HermiteFilter {
public:
static float getDefaultFilterRadius() { return 1.0f; }
static float getValue(float val) { .... }
};
class ImageResize {
public:
template<class filter> static uint32_t *resample(....);
};
template<class filter> uint32_t *ImageResize::resample(....) {
...
float wdth = filter::getDefaultFilterRadius() / scaleX;
....
float weight = filter::getValue(center-j-0.5f);
.....
}
that get inlined, too:
if(scaleX < 1.0f) {
0040C581 fld1 <-- note this :)
0040C583 fld dword ptr [esp+14h]
0040C587 mov ebx,eax
0040C589 fcom st(1)
0040C58B add esp,4
0040C58E mov dword ptr [esp+18h],ebx
0040C592 fnstsw ax
0040C594 fldz
0040C596 test ah,5
0040C599 jp 0040CAFD
//scales from bigger to smaller width
float wdth = filter::getDefaultFilterRadius() / scaleX;
for(unsigned int i=0; i<outputSizeX; ++i) {
0040C59F test esi,esi
0040C5A1 fld st(2) <--- filter::getDefaultFilterRadius() returns with in this version 1.0f and
the constant 1.0f is loaded previously into floatingpoint-registers
0040C5A3 fdivrp st(2),st
0040C5A5 mov dword ptr [esp+24h],0
0040C5AD fxch st(1)
0040C5AF fst dword ptr [esp+28h]
0040C5B3 fst dword ptr [esp+34h]
0040C5B7 jbe 0040CE44
another thing about your benchmarks: I've done benchmarks, too. but I've found that in algorithms like trees and arrays normally the most time-consuming factor was the memory-management (malloc, free). so I think your cache-idea is great for traversing the tree, but to speed up insertion/deletion of entries you could optimise the system with object-pools. you could use an extra layer for allocations (with inlined or defined functions :) so that's easy to use pools or don't use them...
keep coding, datadraw is a great tool!
kai
----- original Nachricht --------
Betreff: Re: [Datadraw-user] Red-black trees beat treaps
Gesendet: Mo, 26. Mai 2008
Von: dat...@li...
> Hi, Kai.
>
> I inspected the assembly-code carefully, and found several issues.
> First, the reason the #define version was so much faster is that it
> didn't cast the return value to int. Since my keys were unsigned ints,
> the right key minus left key was always positive, which led to an
> ordered tree rather than random. When I fixed it, and improved some
> cache efficiency issues, DataDraw now beats STDCXX at 29 seconds
> compared to 30, and beats STL by two seconds. The DataDraw version uses
> about 60% as much memory.
>
> For ordered traversals, however, DataDraw did quite a bit better. At 10
> traversals of 5 million nodes, DataDraw came in at 6.35 seconds vs 10.35
> for STL. The reason for this is cache efficiency. I organized the data
> to group only left child, right child, and the key as a single
> structure, so every read of a node's key would load a cache line that
> also contained the node's left and right child. In addition, it fills
> the cache line with only left, right, and key values of nearby nodes,
> causing in-order traversal to really speed up. The STL version loads
> mostly data unrelated to the loop into cache, and all the pointers are
> 2X as big, making cache misses much more common.
>
> One thing really hit home: speed is now all about cache performance.
> The DataDraw inner loop is 9 assembly instructions due to poor
> optimization from the GNU C compiler, while the STL version is only 6.
> However the DataDraw version wins big due to better cache organization.
>
> I don't know if inline functions in GNU C are as good as #defines. The
> last time I checked, about 10 years ago, they slowed down my code by
> 10-20%. It's probably time to check again, now that all C compilers I
> care about support inline.
>
> Since cache organization is soooo important, I'm adding cache hints to
> the .dd format. For this example, I think it will look like:
>
> class Node
> Node parent
> Node left
> Node right
> uint32 key
> bool color
> cache_together left right key
>
> This will cause a these three fields to be defined in a struct of their
> own, and there will be an array of these structures rather than an array
> for each separate field. It should be a nice hint to introduce in
> algorithms after profiling.
>
> Best regards,
> Bill
>
> On Fri, 2008-05-23 at 18:11 +0200, dat...@li...
> wrote:
> > Hi there :)
> >
> > it would be interesting to inspect the assembly-code of #define and the
> static code. I guess the #define really gets inlined and the static is an
> extra function call.... inspecting code is rather easy in visual studio, but
> I know you're working in a unix-environment.
> >
> > Optimising code is always a great challenge. But nowadays optimising
> memory-layout is much better than optimising only functions (ok, ok, using
> the right algorithm for the problem is the best ;)
> >
> > so congratulations for you fast routines! I'm really impressed.
> > kai
> >
> > ----- original Nachricht --------
> >
> > Betreff: Re: [Datadraw-user] Red-black trees beat treaps
> > Gesendet: Do, 22. Mai 2008
> > Von: dat...@li...
> >
> > > Hi, Kai.
> > >
> > > For some reason, every compiler I've tested has trouble optimizing when
> > > using inline functions instead of #defines. That's why #defines are
> > > still in use in datadraw, though for debugging, we undefine most of
> them
> > > and provide real functions instead.
> > >
> > > I tested multiset from stdcxx. You'll be happy to know it beats the
> STL
> > > library. For 5 million nodes, my random insert/remove benchmark takes
> > > 30 seconds, while STL takes 31. However, DataDraw does it in 8, while
> > > doing more work (it inserts real objects with key values, rather than
> > > just ints), and uses only about 60% as much memory on my 64-bit
> machine.
> > > I've attached the files I used to do the benchmark. The DataDraw
> > > benchmarks are signed in under trunk/examples/redblack_benchmarks
> > >
> > > I just love winning speed benchmarks :-)
> > >
> > > Best regards,
> > > Bill
> > >
> > > On Thu, 2008-05-22 at 13:42 -0400, dat...@li...
> > > wrote:
> > > > I am surprise about the static versus #define because I though static
> > > > could be interpreted as inline by the optimizer.
> > > >
> > > > Great to know.
> > > >
> > > > Regards
> > > >
> > > > On Thu, May 22, 2008 at 12:38 PM,
> > > > <dat...@li...> wrote:
> > > > Hi, Kai.
> > > >
> > > > Thanks for the link. I've downloaded it and hope to compare
> > > > to stdcxx
> > > > soon. In the meantime, I compared DataDraw ordered_lists to
> > > > the g++ STL
> > > > multiset. With just integers, the STL code takes 33 seconds
> > > > for 5
> > > > million creates, 5 million random insert delete pairs, and 5
> > > > million
> > > > deletes. We use to run in 45 seconds... I found out why we
> > > > were slower,
> > > > and fixed it. DataDraw now runs in 8 seconds!
> > > >
> > > > Best regards,
> > > > Bill
> > > >
> > > > P.S. The reason the datadraw implementation was slow is a bit
> > > > scary.
> > > > Static inline functions DO NOT run as fast as #defines! I
> > > > converted the
> > > > compare function to #define base, and the speed when from 43
> > > > seconds to
> > > > 8!
> > > >
> > > > On Thu, 2008-05-22 at 11:59 +0200,
> > > > dat...@li...
> > > >
> > > > wrote:
> > > > > Hi Guys,
> > > > >
> > > > > I've heard that the stdcxx-code should be really fast, but
> > > > I've not measured the performance against the
> > > > wikipedia-entry.... what I've done is to measure the
> > > > performance of the stl-, crisscross-implementation and some
> > > > other and found that the stdcxx was the fastest for me. you
> > > > can find the code here: http://stdcxx.apache.org/
> > > > >
> > > > > regards and happy coding :)
> > > > > kai
> > > > >
> > > > > ----- original Nachricht --------
> > > > >
> > > > > Betreff: Re: [Datadraw-user] Red-black trees beat treaps
> > > > > Gesendet: Mi, 21. Mai 2008
> > > > > Von: dat...@li...
> > > > >
> > > > > > It's done. We now have red-black tree implementations
> for
> > > > ordered_list
> > > > > > relationships. This is the fastest code I've found on
> the
> > > > Internet for
> > > > > > ordered lists after both Richard and I did a fairly
> > > > lengthy search.
> > > > > > Thanks for the help, Richard!
> > > > > >
> > > > > > Bill
> > > > > >
> > > > > > On Wed, 2008-05-21 at 14:43 -0400,
> > > > dat...@li...
> > > > > > wrote:
> > > > > > > ?Hi.
> > > > > > >
> > > > > > > I hand-coded both treap and black-red tree benchmarks.
> > > > The red-black
> > > > > > > code I used is the code from the wikipedia.org page.
> > > > For random insert
> > > > > > > and delete of 10 million nodes, treaps took 63 seconds,
> > > > while red-black
> > > > > > > took 43 seconds. Treaps actually trounced red-black
> > > > trees when used as
> > > > > > > a fifo (deleting only smallest, inserting only
> largest),
> > > > but I think
> > > > > > > random is the more important case for ordered lists.
> > > > > > >
> > > > > > > I'm going to convert over to wikipedia style red-black
> > > > trees if there
> > > > > > > are no objections. You can find my benchmarks in
> > > > > > > examples/treap_benchark and
> examples/redblack_benchmark.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Bill
> > > > > > >
> > > > > > >
> > > > > > >
> > > >
> > >
> -------------------------------------------------------------------------
> > > > > > > This SF.net email is sponsored by: Microsoft
> > > > > > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > > > > > _______________________________________________
> > > > > > > Datadraw-user mailing list
> > > > > > > Dat...@li...
> > > > > > >
> > > > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > > > > >
> > > > > >
> > > > > >
> > > >
> > >
> -------------------------------------------------------------------------
> > > > > > This SF.net email is sponsored by: Microsoft
> > > > > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > > > > _______________________________________________
> > > > > > Datadraw-user mailing list
> > > > > > Dat...@li...
> > > > > >
> https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > > > > >
> > > > >
> > > > > --- original Nachricht Ende ----
> > > > >
> > > > >
> > > > >
> > > >
> > >
> -------------------------------------------------------------------------
> > > > > This SF.net email is sponsored by: Microsoft
> > > > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > > > _______________________________________________
> > > > > Datadraw-user mailing list
> > > > > Dat...@li...
> > > > > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > > >
> > > >
> > > >
> > >
> -------------------------------------------------------------------------
> > > > This SF.net email is sponsored by: Microsoft
> > > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > > _______________________________________________
> > > > Datadraw-user mailing list
> > > > Dat...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > +1 514-518-8172
> > > >
> -------------------------------------------------------------------------
> > > > This SF.net email is sponsored by: Microsoft
> > > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > > _______________________________________________ Datadraw-user mailing
> list
> > > Dat...@li...
> > > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > >
> >
> > --- original Nachricht Ende ----
> >
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Microsoft
> > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > _______________________________________________
> > Datadraw-user mailing list
> > Dat...@li...
> > https://lists.sourceforge.net/lists/listinfo/datadraw-user
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Datadraw-user mailing list
> Dat...@li...
> https://lists.sourceforge.net/lists/listinfo/datadraw-user
>
--- original Nachricht Ende ----
|
|
From: <dat...@li...> - 2008-05-26 16:50:28
|
Hi, Kai.
I inspected the assembly-code carefully, and found several issues.
First, the reason the #define version was so much faster is that it
didn't cast the return value to int. Since my keys were unsigned ints,
the right key minus left key was always positive, which led to an
ordered tree rather than random. When I fixed it, and improved some
cache efficiency issues, DataDraw now beats STDCXX at 29 seconds
compared to 30, and beats STL by two seconds. The DataDraw version uses
about 60% as much memory.
For ordered traversals, however, DataDraw did quite a bit better. At 10
traversals of 5 million nodes, DataDraw came in at 6.35 seconds vs 10.35
for STL. The reason for this is cache efficiency. I organized the data
to group only left child, right child, and the key as a single
structure, so every read of a node's key would load a cache line that
also contained the node's left and right child. In addition, it fills
the cache line with only left, right, and key values of nearby nodes,
causing in-order traversal to really speed up. The STL version loads
mostly data unrelated to the loop into cache, and all the pointers are
2X as big, making cache misses much more common.
One thing really hit home: speed is now all about cache performance.
The DataDraw inner loop is 9 assembly instructions due to poor
optimization from the GNU C compiler, while the STL version is only 6.
However the DataDraw version wins big due to better cache organization.
I don't know if inline functions in GNU C are as good as #defines. The
last time I checked, about 10 years ago, they slowed down my code by
10-20%. It's probably time to check again, now that all C compilers I
care about support inline.
Since cache organization is soooo important, I'm adding cache hints to
the .dd format. For this example, I think it will look like:
class Node
Node parent
Node left
Node right
uint32 key
bool color
cache_together left right key
This will cause a these three fields to be defined in a struct of their
own, and there will be an array of these structures rather than an array
for each separate field. It should be a nice hint to introduce in
algorithms after profiling.
Best regards,
Bill
On Fri, 2008-05-23 at 18:11 +0200, dat...@li...
wrote:
> Hi there :)
>
> it would be interesting to inspect the assembly-code of #define and the static code. I guess the #define really gets inlined and the static is an extra function call.... inspecting code is rather easy in visual studio, but I know you're working in a unix-environment.
>
> Optimising code is always a great challenge. But nowadays optimising memory-layout is much better than optimising only functions (ok, ok, using the right algorithm for the problem is the best ;)
>
> so congratulations for you fast routines! I'm really impressed.
> kai
>
> ----- original Nachricht --------
>
> Betreff: Re: [Datadraw-user] Red-black trees beat treaps
> Gesendet: Do, 22. Mai 2008
> Von: dat...@li...
>
> > Hi, Kai.
> >
> > For some reason, every compiler I've tested has trouble optimizing when
> > using inline functions instead of #defines. That's why #defines are
> > still in use in datadraw, though for debugging, we undefine most of them
> > and provide real functions instead.
> >
> > I tested multiset from stdcxx. You'll be happy to know it beats the STL
> > library. For 5 million nodes, my random insert/remove benchmark takes
> > 30 seconds, while STL takes 31. However, DataDraw does it in 8, while
> > doing more work (it inserts real objects with key values, rather than
> > just ints), and uses only about 60% as much memory on my 64-bit machine.
> > I've attached the files I used to do the benchmark. The DataDraw
> > benchmarks are signed in under trunk/examples/redblack_benchmarks
> >
> > I just love winning speed benchmarks :-)
> >
> > Best regards,
> > Bill
> >
> > On Thu, 2008-05-22 at 13:42 -0400, dat...@li...
> > wrote:
> > > I am surprise about the static versus #define because I though static
> > > could be interpreted as inline by the optimizer.
> > >
> > > Great to know.
> > >
> > > Regards
> > >
> > > On Thu, May 22, 2008 at 12:38 PM,
> > > <dat...@li...> wrote:
> > > Hi, Kai.
> > >
> > > Thanks for the link. I've downloaded it and hope to compare
> > > to stdcxx
> > > soon. In the meantime, I compared DataDraw ordered_lists to
> > > the g++ STL
> > > multiset. With just integers, the STL code takes 33 seconds
> > > for 5
> > > million creates, 5 million random insert delete pairs, and 5
> > > million
> > > deletes. We use to run in 45 seconds... I found out why we
> > > were slower,
> > > and fixed it. DataDraw now runs in 8 seconds!
> > >
> > > Best regards,
> > > Bill
> > >
> > > P.S. The reason the datadraw implementation was slow is a bit
> > > scary.
> > > Static inline functions DO NOT run as fast as #defines! I
> > > converted the
> > > compare function to #define base, and the speed when from 43
> > > seconds to
> > > 8!
> > >
> > > On Thu, 2008-05-22 at 11:59 +0200,
> > > dat...@li...
> > >
> > > wrote:
> > > > Hi Guys,
> > > >
> > > > I've heard that the stdcxx-code should be really fast, but
> > > I've not measured the performance against the
> > > wikipedia-entry.... what I've done is to measure the
> > > performance of the stl-, crisscross-implementation and some
> > > other and found that the stdcxx was the fastest for me. you
> > > can find the code here: http://stdcxx.apache.org/
> > > >
> > > > regards and happy coding :)
> > > > kai
> > > >
> > > > ----- original Nachricht --------
> > > >
> > > > Betreff: Re: [Datadraw-user] Red-black trees beat treaps
> > > > Gesendet: Mi, 21. Mai 2008
> > > > Von: dat...@li...
> > > >
> > > > > It's done. We now have red-black tree implementations for
> > > ordered_list
> > > > > relationships. This is the fastest code I've found on the
> > > Internet for
> > > > > ordered lists after both Richard and I did a fairly
> > > lengthy search.
> > > > > Thanks for the help, Richard!
> > > > >
> > > > > Bill
> > > > >
> > > > > On Wed, 2008-05-21 at 14:43 -0400,
> > > dat...@li...
> > > > > wrote:
> > > > > > ?Hi.
> > > > > >
> > > > > > I hand-coded both treap and black-red tree benchmarks.
> > > The red-black
> > > > > > code I used is the code from the wikipedia.org page.
> > > For random insert
> > > > > > and delete of 10 million nodes, treaps took 63 seconds,
> > > while red-black
> > > > > > took 43 seconds. Treaps actually trounced red-black
> > > trees when used as
> > > > > > a fifo (deleting only smallest, inserting only largest),
> > > but I think
> > > > > > random is the more important case for ordered lists.
> > > > > >
> > > > > > I'm going to convert over to wikipedia style red-black
> > > trees if there
> > > > > > are no objections. You can find my benchmarks in
> > > > > > examples/treap_benchark and examples/redblack_benchmark.
> > > > > >
> > > > > > Regards,
> > > > > > Bill
> > > > > >
> > > > > >
> > > > > >
> > >
> > -------------------------------------------------------------------------
> > > > > > This SF.net email is sponsored by: Microsoft
> > > > > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > > > > _______________________________________________
> > > > > > Datadraw-user mailing list
> > > > > > Dat...@li...
> > > > > >
> > > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > > > >
> > > > >
> > > > >
> > >
> > -------------------------------------------------------------------------
> > > > > This SF.net email is sponsored by: Microsoft
> > > > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > > > _______________________________________________
> > > > > Datadraw-user mailing list
> > > > > Dat...@li...
> > > > > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > > > >
> > > >
> > > > --- original Nachricht Ende ----
> > > >
> > > >
> > > >
> > >
> > -------------------------------------------------------------------------
> > > > This SF.net email is sponsored by: Microsoft
> > > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > > _______________________________________________
> > > > Datadraw-user mailing list
> > > > Dat...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > >
> > >
> > >
> > -------------------------------------------------------------------------
> > > This SF.net email is sponsored by: Microsoft
> > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > _______________________________________________
> > > Datadraw-user mailing list
> > > Dat...@li...
> > > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> > >
> > >
> > >
> > >
> > > --
> > > +1 514-518-8172
> > > -------------------------------------------------------------------------
> > > This SF.net email is sponsored by: Microsoft
> > > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > _______________________________________________ Datadraw-user mailing list
> > Dat...@li...
> > https://lists.sourceforge.net/lists/listinfo/datadraw-user
> >
>
> --- original Nachricht Ende ----
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Datadraw-user mailing list
> Dat...@li...
> https://lists.sourceforge.net/lists/listinfo/datadraw-user
|
|
From: <dat...@li...> - 2008-05-23 16:32:21
|
Hi there :) it would be interesting to inspect the assembly-code of #define and the static code. I guess the #define really gets inlined and the static is an extra function call.... inspecting code is rather easy in visual studio, but I know you're working in a unix-environment. Optimising code is always a great challenge. But nowadays optimising memory-layout is much better than optimising only functions (ok, ok, using the right algorithm for the problem is the best ;) so congratulations for you fast routines! I'm really impressed. kai ----- original Nachricht -------- Betreff: Re: [Datadraw-user] Red-black trees beat treaps Gesendet: Do, 22. Mai 2008 Von: dat...@li... > Hi, Kai. > > For some reason, every compiler I've tested has trouble optimizing when > using inline functions instead of #defines. That's why #defines are > still in use in datadraw, though for debugging, we undefine most of them > and provide real functions instead. > > I tested multiset from stdcxx. You'll be happy to know it beats the STL > library. For 5 million nodes, my random insert/remove benchmark takes > 30 seconds, while STL takes 31. However, DataDraw does it in 8, while > doing more work (it inserts real objects with key values, rather than > just ints), and uses only about 60% as much memory on my 64-bit machine. > I've attached the files I used to do the benchmark. The DataDraw > benchmarks are signed in under trunk/examples/redblack_benchmarks > > I just love winning speed benchmarks :-) > > Best regards, > Bill > > On Thu, 2008-05-22 at 13:42 -0400, dat...@li... > wrote: > > I am surprise about the static versus #define because I though static > > could be interpreted as inline by the optimizer. > > > > Great to know. > > > > Regards > > > > On Thu, May 22, 2008 at 12:38 PM, > > <dat...@li...> wrote: > > Hi, Kai. > > > > Thanks for the link. I've downloaded it and hope to compare > > to stdcxx > > soon. In the meantime, I compared DataDraw ordered_lists to > > the g++ STL > > multiset. With just integers, the STL code takes 33 seconds > > for 5 > > million creates, 5 million random insert delete pairs, and 5 > > million > > deletes. We use to run in 45 seconds... I found out why we > > were slower, > > and fixed it. DataDraw now runs in 8 seconds! > > > > Best regards, > > Bill > > > > P.S. The reason the datadraw implementation was slow is a bit > > scary. > > Static inline functions DO NOT run as fast as #defines! I > > converted the > > compare function to #define base, and the speed when from 43 > > seconds to > > 8! > > > > On Thu, 2008-05-22 at 11:59 +0200, > > dat...@li... > > > > wrote: > > > Hi Guys, > > > > > > I've heard that the stdcxx-code should be really fast, but > > I've not measured the performance against the > > wikipedia-entry.... what I've done is to measure the > > performance of the stl-, crisscross-implementation and some > > other and found that the stdcxx was the fastest for me. you > > can find the code here: http://stdcxx.apache.org/ > > > > > > regards and happy coding :) > > > kai > > > > > > ----- original Nachricht -------- > > > > > > Betreff: Re: [Datadraw-user] Red-black trees beat treaps > > > Gesendet: Mi, 21. Mai 2008 > > > Von: dat...@li... > > > > > > > It's done. We now have red-black tree implementations for > > ordered_list > > > > relationships. This is the fastest code I've found on the > > Internet for > > > > ordered lists after both Richard and I did a fairly > > lengthy search. > > > > Thanks for the help, Richard! > > > > > > > > Bill > > > > > > > > On Wed, 2008-05-21 at 14:43 -0400, > > dat...@li... > > > > wrote: > > > > > ?Hi. > > > > > > > > > > I hand-coded both treap and black-red tree benchmarks. > > The red-black > > > > > code I used is the code from the wikipedia.org page. > > For random insert > > > > > and delete of 10 million nodes, treaps took 63 seconds, > > while red-black > > > > > took 43 seconds. Treaps actually trounced red-black > > trees when used as > > > > > a fifo (deleting only smallest, inserting only largest), > > but I think > > > > > random is the more important case for ordered lists. > > > > > > > > > > I'm going to convert over to wikipedia style red-black > > trees if there > > > > > are no objections. You can find my benchmarks in > > > > > examples/treap_benchark and examples/redblack_benchmark. > > > > > > > > > > Regards, > > > > > Bill > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------- > > > > > This SF.net email is sponsored by: Microsoft > > > > > Defy all challenges. Microsoft(R) Visual Studio 2008. > > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > > > > _______________________________________________ > > > > > Datadraw-user mailing list > > > > > Dat...@li... > > > > > > > https://lists.sourceforge.net/lists/listinfo/datadraw-user > > > > > > > > > > > > > > > ------------------------------------------------------------------------- > > > > This SF.net email is sponsored by: Microsoft > > > > Defy all challenges. Microsoft(R) Visual Studio 2008. > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > > > _______________________________________________ > > > > Datadraw-user mailing list > > > > Dat...@li... > > > > https://lists.sourceforge.net/lists/listinfo/datadraw-user > > > > > > > > > > --- original Nachricht Ende ---- > > > > > > > > > > > > ------------------------------------------------------------------------- > > > This SF.net email is sponsored by: Microsoft > > > Defy all challenges. Microsoft(R) Visual Studio 2008. > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > > _______________________________________________ > > > Datadraw-user mailing list > > > Dat...@li... > > > https://lists.sourceforge.net/lists/listinfo/datadraw-user > > > > > > > ------------------------------------------------------------------------- > > This SF.net email is sponsored by: Microsoft > > Defy all challenges. Microsoft(R) Visual Studio 2008. > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > _______________________________________________ > > Datadraw-user mailing list > > Dat...@li... > > https://lists.sourceforge.net/lists/listinfo/datadraw-user > > > > > > > > > > -- > > +1 514-518-8172 > > ------------------------------------------------------------------------- > > This SF.net email is sponsored by: Microsoft > > Defy all challenges. Microsoft(R) Visual Studio 2008. > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > _______________________________________________ Datadraw-user mailing list > Dat...@li... > https://lists.sourceforge.net/lists/listinfo/datadraw-user > --- original Nachricht Ende ---- |