From: Jesse G. <je...@wi...> - 2004-07-30 21:34:37
|
Hello, The developers of extlib seem particularly bent on maximum performance. I have no problem with that. In fact, I think it's great. One of the things that attracted me to OCaml is the speed. However, since big-O notation is so widely used to shoot down new code submissions and generally describe the speed of various functions during discussion, why the heck don't we clearly state the expected big-O performance of the function in the documentation? I'm concerned about performance, but as an OCaml newbie on a tight schedule I don't have time to read/evaluate/benchmark each function in extlib or stdlib that I want to use to see just how fast it is. If we stated the presumed big-O scalability in the docs then making an intelligent decision about which module or function to use would be a lot easier/faster. In addition, I think it would be nice to put up a page on the web stating the presumed big-O scalability of as many stdlib functions as possible. This way people will have a clear reason to choose the extlib version of a function over the stdlib version, and we will be providing a valuable service to the OCaml community. What do you think? -- Jesse Guardiani, Systems Administrator WingNET Internet Services, P.O. Box 2605 // Cleveland, TN 37320-2605 423-559-LINK (v) 423-559-5145 (f) http://www.wingnet.net |
From: Brian H. <bh...@sp...> - 2004-07-30 22:10:25
|
On Fri, 30 Jul 2004, Jesse Guardiani wrote: > However, since big-O notation is so widely used > to shoot down new code submissions and generally > describe the speed of various functions during > discussion, why the heck don't we clearly state > the expected big-O performance of the function in > the documentation? I'd like to second this proposal. I don't have the time to do it this at the moment, but I think it's a real good idea. One of the biggest problems people new to Ocaml get into effectively boil down to using the wrong algorithm in the wrong situation. Using List.nth, for example, is almost always a mistake- if you need it, you probably shouldn't be using a list. > In addition, I think it would be nice to put up > a page on the web stating the presumed big-O > scalability of as many stdlib functions as possible. As a big table. Each row is an operation, each column a module or data structure. So you can go "what do I need to do with this structure?" and easily compare different structures against each other. Another thing I've been thinking about doing for a while. > What do you think? I think it's a good idea. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Nicolas C. <nca...@mo...> - 2004-07-30 22:50:10
|
> > However, since big-O notation is so widely used > > to shoot down new code submissions and generally > > describe the speed of various functions during > > discussion, why the heck don't we clearly state > > the expected big-O performance of the function in > > the documentation? > > I'd like to second this proposal. I don't have the time to do it this at > the moment, but I think it's a real good idea. > > One of the biggest problems people new to Ocaml get into effectively boil > down to using the wrong algorithm in the wrong situation. Using List.nth, > for example, is almost always a mistake- if you need it, you probably > shouldn't be using a list. > > > In addition, I think it would be nice to put up > > a page on the web stating the presumed big-O > > scalability of as many stdlib functions as possible. > > As a big table. Each row is an operation, each column a module or data > structure. So you can go "what do I need to do with this structure?" and > easily compare different structures against each other. > > Another thing I've been thinking about doing for a while. > > > What do you think? > > I think it's a good idea. I think the people using List.nth either : - didn't understood that it was a linked list, so we might clarify this in the documentation - don't know at all about complexity, so having big-O notations won't help them either Adding big-O notations in the documentation is only necessary when the actual implementation is different from what can be naturally expected from it, or when the algorithm is unknown ( a good example for this is O(1) for String.length). So I think we might not put big O notations everywhere in the OCamldoc's of ExtLib, but having a separate HTML page - that would be put on the ExtLib website - comparing data structures and algorithms complexity with a short introduction for each would be definitly a nice thing to do. Is the author is willing to do so, he's welcome. Regards, Nicolas Cannasse |
From: Dustin S. <du...@sp...> - 2004-07-30 23:12:28
|
On Jul 30, 2004, at 15:50, Nicolas Cannasse wrote: > I think the people using List.nth either : > - didn't understood that it was a linked list, so we might clarify > this in > the documentation > - don't know at all about complexity, so having big-O notations won't > help > them either How about: - find it efficient enough to solve their problems without doing something less elegant. Most of the time, it's not performance at all costs. Using nth or length on lists that are guaranteed to be small will probably not have much of an affect on performance. O(n) for a fast function with guaranteed really small values of n can certainly be faster than O(1) for a slow function. I mean, I realize this is a general purpose library and all that...but whatever happened to optimizing when things need to be optimized? You can't just declare people ignorant because they use a function that could theoretically be a bottleneck in an extreme case. -- SPY My girlfriend asked me which one I like better. pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <du...@sp...> | Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE L_______________________ I hope the answer won't upset her. ____________ |
From: Brian H. <bh...@sp...> - 2004-07-30 23:23:04
|
On Fri, 30 Jul 2004, Dustin Sallings wrote: > - find it efficient enough to solve their problems without doing > something less elegant. > > Most of the time, it's not performance at all costs. Using nth or > length on lists that are guaranteed to be small will probably not have > much of an affect on performance. O(n) for a fast function with > guaranteed really small values of n can certainly be faster than O(1) > for a slow function. I am of the opinion that if you (think you) know what you're doing, have all the rope you want. Don't say we didn't warn you, however. Because you *might* be right. > > I mean, I realize this is a general purpose library and all that...but > whatever happened to optimizing when things need to be optimized? You > can't just declare people ignorant because they use a function that > could theoretically be a bottleneck in an extreme case. > One of the design criteria I think about in trying to design APIs for fundamental data structures is how easy it is to swap one structure out for another. As that encourages to use the "obvious" data structure in developing code, and then when performance isn't what you expected/needed, makes it easier to swap to different (more complicated) data structures. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: skaller <sk...@us...> - 2004-07-31 03:48:10
|
On Sat, 2004-07-31 at 08:50, Nicolas Cannasse wrote: > I think the people using List.nth either : > - didn't understood that it was a linked list, so we might clarify this in > the documentation > - don't know at all about complexity, so having big-O notations won't help > them either Nah, I use it, and know both these things. I don't care. I doesn't matter. Finding the tail element of a list (the worst possible value of n) I do quite often: List.length must do this too. The thing is my list might be the list of parameters of a function where the length is 0, 1 or 2 usually, sometimes even 5 or 6 elements in the list -- who writes functions with 20 arguments will expect the compiler to take 0.01 microseconds longer ... One needs to get things in proportion -- lists are easy to handle in Ocaml: they're functional, you can match on them, they're very cheap to construct: in the case of parameters I could actually use an array (i know in advance how many there are) but its easier to get a tail recursive pattern match list visitor to work than get the bounds of a for loop right. I also use List.nth on a HUGE list -- all tokens of a program. That function is only called to report an error, and at most once, since the next thing I do is terminate. So who cares -- I'm just scanning for enough context to print a decent error message -- if it takes a few seconds it just doesn't matter. -- John Skaller, mailto:sk...@us... voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net |
From: Jesse G. <je...@wi...> - 2004-08-01 14:04:13
|
Brian Hurt wrote: > On Fri, 30 Jul 2004, Jesse Guardiani wrote: > >> However, since big-O notation is so widely used >> to shoot down new code submissions and generally >> describe the speed of various functions during >> discussion, why the heck don't we clearly state >> the expected big-O performance of the function in >> the documentation? > > I'd like to second this proposal. I don't have the time to do it this at > the moment, but I think it's a real good idea. I *might*. It seems like a neat project, and a good excuse to learn mod_caml, so I think I'm going to spend a half hour or hour a day discussing and designing this for the next few weeks and see where it gets me. If I make decent progress and have some good ideas that a lot of people like, then maybe I'll continue on to the coding phase. On the drive home, Friday, I was thinking that it would be best to make a general repository of Big-O information. It can start out as an OCaml repository, then once the OCaml stdlib and Extlib are mostly documented, we can open it up to the other ML languages, Java, Python, Perl, PHP, etc... The people using those languages (except maybe the MLs) typically don't care so much about performance as the people using OCaml, but it would be useful info nevertheless. I figure the website will take the following form: We'll have a top level menu of languages, then projects (i.e. stdlib vs extlib), then modules, then classes, then functions. Once you click on a function, we will *try* to list the theoretical Big-O scalability of that function (I won't be doing the listing. I'll just write and host the website and let more knowledgable volunteers fill in the blanks). Then, on the same page, below the theoretical Big-O, we can have a section for real-world benchmarks supporting or refuting the theoretical Big-O scalability. This section will list benchmarks by OS type and version, arch type, CPU speed, and by the submitter's name. We'll also have a place for the benchmark code and results. Finally, I think it would be a good idea to have a discussion board below each benchmark section, like they do in the on-line MySQL manuals. I'd like to keep this all 100% OCaml based, to provide a little positive PR for OCaml, so does anyone have an OCaml based web forum or blog available for assimilation? This way, when someone performs a search for "String.length", we can display the theoretical Big-O notation, the number of benchmarks supporting, and the number of benchmarks refuting, and a percentile of the two. The user can then make an informed decision to use or not use the function based on this information, or if the user is in doubt, he/she can click on the function and view the benchmarks and any discussions. The only thing I'm hazy on right now is how to handle version control. I want to be able to list function versions, and which version of the project introduced the function so a user can compare Big-O between versions. The first step will be to create a sourceforge project page and a mailing list, but I'd like to get some additional feedback from this list before I do so. Criticism, priase, and new ideas are all welcome. What do you think? -- Jesse Guardiani, Systems Administrator WingNET Internet Services, P.O. Box 2605 // Cleveland, TN 37320-2605 423-559-LINK (v) 423-559-5145 (f) http://www.wingnet.net |
From: Jesse G. <je...@wi...> - 2004-08-01 14:10:35
|
Brian Hurt wrote: > On Fri, 30 Jul 2004, Jesse Guardiani wrote: > >> However, since big-O notation is so widely used >> to shoot down new code submissions and generally >> describe the speed of various functions during >> discussion, why the heck don't we clearly state >> the expected big-O performance of the function in >> the documentation? > > I'd like to second this proposal. I don't have the time to do it this at > the moment, but I think it's a real good idea. > > One of the biggest problems people new to Ocaml get into effectively boil > down to using the wrong algorithm in the wrong situation. Using List.nth, > for example, is almost always a mistake- if you need it, you probably > shouldn't be using a list. > >> In addition, I think it would be nice to put up >> a page on the web stating the presumed big-O >> scalability of as many stdlib functions as possible. > > As a big table. Each row is an operation, each column a module or data > structure. So you can go "what do I need to do with this structure?" and > easily compare different structures against each other. Could you give an example? I'm having a hard time visualizing what you mean to compare with my own ideas. -- Jesse Guardiani, Systems Administrator WingNET Internet Services, P.O. Box 2605 // Cleveland, TN 37320-2605 423-559-LINK (v) 423-559-5145 (f) http://www.wingnet.net |
From: Brian H. <bh...@sp...> - 2004-08-01 14:20:50
|
On Sun, 1 Aug 2004, Jesse Guardiani wrote: > Brian Hurt wrote: > > > On Fri, 30 Jul 2004, Jesse Guardiani wrote: > > > >> However, since big-O notation is so widely used > >> to shoot down new code submissions and generally > >> describe the speed of various functions during > >> discussion, why the heck don't we clearly state > >> the expected big-O performance of the function in > >> the documentation? > > > > I'd like to second this proposal. I don't have the time to do it this at > > the moment, but I think it's a real good idea. > > > > One of the biggest problems people new to Ocaml get into effectively boil > > down to using the wrong algorithm in the wrong situation. Using List.nth, > > for example, is almost always a mistake- if you need it, you probably > > shouldn't be using a list. > > > >> In addition, I think it would be nice to put up > >> a page on the web stating the presumed big-O > >> scalability of as many stdlib functions as possible. > > > > As a big table. Each row is an operation, each column a module or data > > structure. So you can go "what do I need to do with this structure?" and > > easily compare different structures against each other. > > Could you give an example? I'm having a hard time visualizing what you > mean to compare with my own ideas. > ------------+-------+-------+-------+ | | A | | | L | r | T | | i | r | r | | s | a | e | Operation | t | y | e | ------------+-------+-------+-------+ Preppend | 1 | N | lg N | ------------+-------+-------+-------+ Append | N | N | lg N | ------------+-------+-------+-------+ Insert | N | N | lg N | ------------+-------+-------+-------+ First | 1 | 1 | lg N | ------------+-------+-------+-------+ Last | N | 1 | lg N | ------------+-------+-------+-------+ Nth | N | 1 | lg N | ------------+-------+-------+-------+ -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: skaller <sk...@us...> - 2004-08-01 19:22:04
|
On Mon, 2004-08-02 at 00:28, Brian Hurt wrote: > On Sun, 1 Aug 2004, Jesse Guardiani wrote: > ------------+-------+-------+-------+ > | | A | | > | L | r | T | > | i | r | r | > | s | a | e | > Operation | t | y | e | > ------------+-------+-------+-------+ > Preppend | 1 | N | lg N | > ------------+-------+-------+-------+ > Append | N | N | lg N | > ------------+-------+-------+-------+ > Insert | N | N | lg N | > ------------+-------+-------+-------+ > First | 1 | 1 | lg N | > ------------+-------+-------+-------+ > Last | N | 1 | lg N | > ------------+-------+-------+-------+ > Nth | N | 1 | lg N | > ------------+-------+-------+-------+ See the C++ Standard. It basically names two kinds of containers, Sequences and Associative Containers, which are sequences with 'extra' operations. Generally, I'd like to believe the C++ Standard has created a de-facto proper way to think about performance: The complexity of a function BELONGS DIRECTLY IN THE LIBRARY DOCUMENTATION. It is every bit a fundamental part of the interface specification. Sure, build a comparative web-site, but that's no excuse for leaving the complexity information out of the library specification. In ISO C++, its a *specification*. An implementation that doesn't meet the requirements simply isn't conforming, and thus a programmer can actual reason about performance. C++ is VERY strict here. For some STL algorithms, it isn't just big-O notation that is specified -- it actually specifies the exact number of operations allowed. -- John Skaller, mailto:sk...@us... voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net |
From: Brian H. <bh...@sp...> - 2004-08-02 00:01:54
|
On 2 Aug 2004, skaller wrote: > On Mon, 2004-08-02 at 00:28, Brian Hurt wrote: > > On Sun, 1 Aug 2004, Jesse Guardiani wrote: > > > ------------+-------+-------+-------+ > > | | A | | > > | L | r | T | > > | i | r | r | > > | s | a | e | > > Operation | t | y | e | > > ------------+-------+-------+-------+ > > Preppend | 1 | N | lg N | > > ------------+-------+-------+-------+ > > Append | N | N | lg N | > > ------------+-------+-------+-------+ > > Insert | N | N | lg N | > > ------------+-------+-------+-------+ > > First | 1 | 1 | lg N | > > ------------+-------+-------+-------+ > > Last | N | 1 | lg N | > > ------------+-------+-------+-------+ > > Nth | N | 1 | lg N | > > ------------+-------+-------+-------+ > > > See the C++ Standard. It basically names two > kinds of containers, Sequences and Associative > Containers, which are sequences with 'extra' operations. > > Generally, I'd like to believe the C++ Standard > has created a de-facto proper way to think about > performance: Because of my experience with C++, I tend to assume that whatever C++ does is wrong, unless proven otherwise. I don't want to get into a C++ advocacy flame war. My point here is that Ocaml is a *very* different language in a large number of aspects, and blindly following C++ is certainly a bad idea. In this case, however: > > The complexity of a function BELONGS DIRECTLY IN THE LIBRARY > DOCUMENTATION. It is every bit a fundamental part of the interface > specification. This was the original proposal, which I agree with. The table was in addition to the library documentaion, as an aid and not a primary source. > C++ is VERY strict here. For some STL algorithms, it isn't > just big-O notation that is specified -- it actually > specifies the exact number of operations allowed. This is one place where I think C++ was stupid. The assumption here is that fewer steps is always faster, but this isn't true on architectures where some "steps" are 100x slower than other "steps"- which is basically every modern architecture (adding two integers costs < 1 clock cycle, reading main memory on a cache miss takes 100-250 cycles). The days of counting instructions are long gone. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: skaller <sk...@us...> - 2004-08-02 02:59:39
|
On Mon, 2004-08-02 at 10:09, Brian Hurt wrote: > Because of my experience with C++, I tend to assume that whatever C++ does > is wrong, unless proven otherwise. hehe. > > > > The complexity of a function BELONGS DIRECTLY IN THE LIBRARY > > DOCUMENTATION. It is every bit a fundamental part of the interface > > specification. > > This was the original proposal, which I agree with. The table was in > addition to the library documentaion, as an aid and not a primary source. > > > C++ is VERY strict here. For some STL algorithms, it isn't > > just big-O notation that is specified -- it actually > > specifies the exact number of operations allowed. > > This is one place where I think C++ was stupid. The assumption here is > that fewer steps is always faster, In C++ iterator based operations can call arbitrary methods of class objects. Such methods can have side-effects. Key algorithms like copy() would be utterly useless without the rigid guarrantee. To a less extent but vastly more extensively this applies to copy constructors: more extensively meaning 'this issue pervades the a huge number of operations' and less extent because the compiler has freedom to insert and elide copy constructors 'almost everywhere' -- thus in the case of copy ctors the issue is only one of cost. The thing is -- this kind of detail is simply mandatory in procedural language library interfaces, and that is precisely the issue with Enums -- hand written code gives precise control of sequencing of coupled side effects, and any indeterminacy -- such as laziness or buffering or indeterminate evaluation order -- can impact semantics and therefore mandate the hand written code over the library. When it comes to integration of procedural and functional code, C++ and Ocaml are both pretty bad: Ocaml's virtue is that you can avoid the issues using the strong functional subsystem :) I hate saying 'trust me', but I'm hardly a C++ advocate, despite 15 years as a member of the ISO C++ committee .. however, some parts of C++ are quite good, and some things were quite well handled by the committee. STL was built, after all, by a theoretician, who wasn't even on the committee -- and it was Stepanov who convinced the committee to change from the old ways and include performance guarrantees in the library. Most of the C++ community now accepts that this is a Good Thing. -- John Skaller, mailto:sk...@us... voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net |
From: Brian H. <bh...@sp...> - 2004-08-02 03:40:17
|
On 2 Aug 2004, skaller wrote: > > This is one place where I think C++ was stupid. The assumption here is > > that fewer steps is always faster, > > In C++ iterator based operations > can call arbitrary methods of class objects. > Such methods can have side-effects. > > Key algorithms like copy() would be utterly useless > without the rigid guarrantee. The problem is that a specific count of steps basically demands a specific implementation. IMHO, you should never modify a mutable data structure while iterating over it (simply iterating over the data structure shouldn't change it). And mutable data structures include I/O streams in this statement. Writting to a file on one file handle, while reading from it with another, or even writting to the same file with two different file handles, produces "interesting" results. Functional data structures don't have this problem. > > To a less extent but vastly more extensively this > applies to copy constructors: more extensively > meaning 'this issue pervades the a huge number > of operations' and less extent because the compiler > has freedom to insert and elide copy constructors > 'almost everywhere' -- thus in the case of copy ctors > the issue is only one of cost. > > The thing is -- this kind of detail is simply mandatory > in procedural language library interfaces, and that is > precisely the issue with Enums -- hand written code > gives precise control of sequencing of coupled side effects, > and any indeterminacy -- such as laziness or buffering > or indeterminate evaluation order -- can impact semantics > and therefore mandate the hand written code over the > library. This is also means there is a huge amount of dependency between the code and the library. And this is bad. The more we discourage this level of intimacy between modules, the better off everyone is. > > When it comes to integration of procedural and functional > code, C++ and Ocaml are both pretty bad: Ocaml's virtue > is that you can avoid the issues using the strong functional > subsystem :) There's not a whole lot you can do with the imperitive style of programming. > > I hate saying 'trust me', but I'm hardly a C++ advocate, > despite 15 years as a member of the ISO C++ committee .. > however, some parts of C++ are quite good, and some > things were quite well handled by the committee. I don't claim to be a C++ expert. I have used C++ professional on mid-sized projects. > > STL was built, after all, by a theoretician, who wasn't > even on the committee -- and it was Stepanov who convinced > the committee to change from the old ways and include > performance guarrantees in the library. Most of the C++ > community now accepts that this is a Good Thing. > We're of a mind, basically, here. Big-O complexity measures in the library documentation is good. I just think specific number of steps is bad. I don't mind stealing other good ideas. I kind of like the sequence vr.s association differentiation. Two tweaks: first of all, I'd call it positional instead of sequence. A quad tree is a different beast than a sequence of sequences. Assuming all sequences are one-dimensional is a bad idea. Second, I'd add a third category- graphs. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Jesse G. <je...@wi...> - 2004-08-02 13:40:55
|
On Sunday 01 August 2004 10:28, Brian Hurt wrote: > On Sun, 1 Aug 2004, Jesse Guardiani wrote: > > Brian Hurt wrote: > > > On Fri, 30 Jul 2004, Jesse Guardiani wrote: > > >> However, since big-O notation is so widely used > > >> to shoot down new code submissions and generally > > >> describe the speed of various functions during > > >> discussion, why the heck don't we clearly state > > >> the expected big-O performance of the function in > > >> the documentation? > > > > > > I'd like to second this proposal. I don't have the time to do it this > > > at the moment, but I think it's a real good idea. > > > > > > One of the biggest problems people new to Ocaml get into effectively > > > boil down to using the wrong algorithm in the wrong situation. Using > > > List.nth, for example, is almost always a mistake- if you need it, you > > > probably shouldn't be using a list. > > > > > >> In addition, I think it would be nice to put up > > >> a page on the web stating the presumed big-O > > >> scalability of as many stdlib functions as possible. > > > > > > As a big table. Each row is an operation, each column a module or data > > > structure. So you can go "what do I need to do with this structure?" > > > and easily compare different structures against each other. > > > > Could you give an example? I'm having a hard time visualizing what you > > mean to compare with my own ideas. > > ------------+-------+-------+-------+ > > | | A | | > | > | L | r | T | > | i | r | r | > | s | a | e | > > Operation | t | y | e | > ------------+-------+-------+-------+ > Preppend | 1 | N | lg N | > ------------+-------+-------+-------+ > Append | N | N | lg N | > ------------+-------+-------+-------+ > Insert | N | N | lg N | > ------------+-------+-------+-------+ > First | 1 | 1 | lg N | > ------------+-------+-------+-------+ > Last | N | 1 | lg N | > ------------+-------+-------+-------+ > Nth | N | 1 | lg N | > ------------+-------+-------+-------+ I like that. And I agree, we should make an effort to be able to provide special database "views" like that. But I think this should be a comparison page, and not the only method for querying Big-O data. After all, not all modules have all those functions, and I'd like to be able to list Big-O for those other module functions too. -- Jesse Guardiani, Systems Administrator WingNET Internet Services, P.O. Box 2605 // Cleveland, TN 37320-2605 423-559-LINK (v) 423-559-5145 (f) http://www.wingnet.net |