Thread: [Ocaml-lib-devel] big-O notation and the stdlib/extlib

[Ocaml-lib-devel] big-O notation and the stdlib/extlib

From: Jesse G. <je...@wi...> - 2004-07-30 21:34:37

Hello,

The developers of extlib seem particularly bent
on maximum performance. I have no problem with
that. In fact, I think it's great. One of the
things that attracted me to OCaml is the speed.

However, since big-O notation is so widely used
to shoot down new code submissions and generally
describe the speed of various functions during
discussion, why the heck don't we clearly state
the expected big-O performance of the function in
the documentation?

I'm concerned about performance, but as an OCaml
newbie on a tight schedule I don't have time to
read/evaluate/benchmark each function in extlib
or stdlib that I want to use to see just how fast
it is. If we stated the presumed big-O scalability
in the docs then making an intelligent decision
about which module or function to use would be a
lot easier/faster.

In addition, I think it would be nice to put up
a page on the web stating the presumed big-O
scalability of as many stdlib functions as possible.

This way people will have a clear reason to
choose the extlib version of a function over
the stdlib version, and we will be providing
a valuable service to the OCaml community.

What do you think?

-- 
Jesse Guardiani, Systems Administrator
WingNET Internet Services,
P.O. Box 2605 // Cleveland, TN 37320-2605
423-559-LINK (v)  423-559-5145 (f)
http://www.wingnet.net

Re: [Ocaml-lib-devel] big-O notation and the stdlib/extlib

From: Brian H. <bh...@sp...> - 2004-07-30 22:10:25

On Fri, 30 Jul 2004, Jesse Guardiani wrote:

> However, since big-O notation is so widely used
> to shoot down new code submissions and generally
> describe the speed of various functions during
> discussion, why the heck don't we clearly state
> the expected big-O performance of the function in
> the documentation?

I'd like to second this proposal.  I don't have the time to do this at 
the moment, but I think it's a real good idea.

One of the biggest problems people new to OCaml run into effectively boils 
down to using the wrong algorithm in the wrong situation.  Using List.nth, 
for example, is almost always a mistake: if you need it, you probably 
shouldn't be using a list. 
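
A minimal sketch of the List.nth point, assuming a plain int list; the
function names (sum_by_nth, sum_by_fold, sum_by_index) are made up for
illustration and are not part of stdlib or ExtLib:

```ocaml
(* List.nth walks the list from the head, so each call costs O(n). *)
let sum_by_nth (l : int list) : int =
  let total = ref 0 in
  for i = 0 to List.length l - 1 do
    total := !total + List.nth l i   (* O(i) per call, O(n^2) overall *)
  done;
  !total

(* The same traversal as a fold touches each cell once: O(n) overall. *)
let sum_by_fold (l : int list) : int =
  List.fold_left (+) 0 l

(* If random access really is needed, an array gives O(1) indexing. *)
let sum_by_index (a : int array) : int =
  let total = ref 0 in
  for i = 0 to Array.length a - 1 do
    total := !total + a.(i)
  done;
  !total
```

All three return the same sum; only the first one gets quadratically
slower as the list grows.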

> In addition, I think it would be nice to put up
> a page on the web stating the presumed big-O
> scalability of as many stdlib functions as possible.

As a big table.  Each row is an operation, each column a module or data 
structure.  So you can go "what do I need to do with this structure?" and 
easily compare different structures against each other.

Another thing I've been thinking about doing for a while.
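
As a rough illustration of such a table (not an actual ExtLib page), here
is a sketch that stores rows of operations against columns of structures
and prints them. The row labels, the choice of three structures, and the
names structures, rows, cost and print_table are all assumptions made for
the example; the costs are the usual textbook ones for a singly linked
list, a flat array and a balanced-tree map.

```ocaml
(* Columns: the data structures being compared. *)
let structures = [ "List"; "Array"; "Map" ]

(* Rows: one operation each, with its usual cost per structure, in column
   order.  "add one element" means cons at the head for a list, a full
   copy for an array, and an insertion for a map. *)
let rows = [
  ("lookup by index/key", [ "O(n)"; "O(1)"; "O(log n)" ]);
  ("add one element",     [ "O(1)"; "O(n)"; "O(log n)" ]);
  ("membership test",     [ "O(n)"; "O(n)"; "O(log n)" ]);
]

(* Look up one cell of the table.  Raises Not_found for an unknown
   operation or structure. *)
let cost op structure =
  let costs = List.assoc op rows in
  List.assoc structure (List.combine structures costs)

(* Print the whole table, one operation per line. *)
let print_table () =
  Printf.printf "%-22s" "";
  List.iter (Printf.printf "%-10s") structures;
  print_newline ();
  List.iter
    (fun (op, costs) ->
      Printf.printf "%-22s" op;
      List.iter (Printf.printf "%-10s") costs;
      print_newline ())
    rows

let () = print_table ()
```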

> What do you think?

I think it's a good idea.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

Re: [Ocaml-lib-devel] big-O notation and the stdlib/extlib

From: Nicolas C. <nca...@mo...> - 2004-07-30 22:50:10

> > However, since big-O notation is so widely used
> > to shoot down new code submissions and generally
> > describe the speed of various functions during
> > discussion, why the heck don't we clearly state
> > the expected big-O performance of the function in
> > the documentation?
>
> I'd like to second this proposal.  I don't have the time to do this at
> the moment, but I think it's a real good idea.
>
> One of the biggest problems people new to OCaml run into effectively boils
> down to using the wrong algorithm in the wrong situation.  Using List.nth,
> for example, is almost always a mistake: if you need it, you probably
> shouldn't be using a list.
>
> > In addition, I think it would be nice to put up
> > a page on the web stating the presumed big-O
> > scalability of as many stdlib functions as possible.
>
> As a big table.  Each row is an operation, each column a module or data
> structure.  So you can go "what do I need to do with this structure?" and
> easily compare different structures against each other.
>
> Another thing I've been thinking about doing for a while.
>
> > What do you think?
>
> I think it's a good idea.

I think the people using List.nth either:
- didn't understand that it is a linked list, so we might clarify this in
the documentation
- don't know about complexity at all, so having big-O notation won't help
them either
Adding big-O notation to the documentation is only necessary when the
actual implementation differs from what one would naturally expect, or
when the algorithm is unknown (a good example of this is O(1) for
String.length).
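
To make the String.length example concrete, here is a small sketch
contrasting it with List.length. It assumes a reasonably recent OCaml (for
List.init); my_length is a hypothetical name that just spells out the walk
List.length has to do.

```ocaml
(* The length of a string is kept with the string's memory block, so
   String.length does not scan the characters: O(1). *)
let s = String.make 1_000_000 'x'
let n_string = String.length s

(* A list stores no size, so counting means following every link: O(n).
   This is the same walk List.length performs. *)
let rec my_length acc = function
  | [] -> acc
  | _ :: tl -> my_length (acc + 1) tl

let l = List.init 1_000_000 (fun i -> i)
let n_list = List.length l
```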

So I think we need not put big-O notation everywhere in the OCamldoc of
ExtLib, but having a separate HTML page - to be put on the ExtLib
website - comparing the complexity of data structures and algorithms, with
a short introduction for each, would definitely be a nice thing to do.
If the author is willing to do so, he's welcome.

Regards,
Nicolas Cannasse

Re: [Ocaml-lib-devel] big-O notation and the stdlib/extlib

From: Dustin S. <du...@sp...> - 2004-07-30 23:12:28

On Jul 30, 2004, at 15:50, Nicolas Cannasse wrote:

> I think the people using List.nth either :
> - didn't understand that it was a linked list, so we might clarify 
> this in
> the documentation
> - don't know at all about complexity, so having big-O notations won't 
> help
> them either

	How about:

- find it efficient enough to solve their problems without doing 
something less elegant.

	Most of the time, it's not performance at all costs.  Using nth or 
length on lists that are guaranteed to be small will probably not have 
much of an effect on performance.  O(n) for a fast function with 
guaranteed really small values of n can certainly be faster than O(1) 
for a slow function.
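
	One way to check this on a given machine is a quick timing harness like 
the sketch below.  The three-entry association list, the helper names 
(find_list, find_hash, time) and the iteration count are assumptions for 
illustration; which lookup wins depends on the machine and the data, so no 
result is claimed here.

```ocaml
let small = [ ("a", 1); ("b", 2); ("c", 3) ]

let table : (string, int) Hashtbl.t = Hashtbl.create 16
let () = List.iter (fun (k, v) -> Hashtbl.replace table k v) small

(* O(n) scan, but n = 3 and each step is just a comparison. *)
let find_list k = List.assoc k small

(* O(1) on average, but every call has to hash the key first. *)
let find_hash k = Hashtbl.find table k

(* Run [f] [iters] times and return the CPU seconds used. *)
let time iters f =
  let t0 = Sys.time () in
  for _i = 1 to iters do ignore (f ()) done;
  Sys.time () -. t0

let () =
  let iters = 1_000_000 in
  Printf.printf "List.assoc:   %.3fs\n" (time iters (fun () -> find_list "c"));
  Printf.printf "Hashtbl.find: %.3fs\n" (time iters (fun () -> find_hash "c"))
```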

	I mean, I realize this is a general purpose library and all that...but 
whatever happened to optimizing when things need to be optimized?  You 
can't just declare people ignorant because they use a function that 
could theoretically be a bottleneck in an extreme case.

--
SPY                      My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <du...@sp...>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________

Re: [Ocaml-lib-devel] big-O notation and the stdlib/extlib

From: Brian H. <bh...@sp...> - 2004-07-30 23:23:04

On Fri, 30 Jul 2004, Dustin Sallings wrote:

> - find it efficient enough to solve their problems without doing 
> something less elegant.
> 
> 	Most of the time, it's not performance at all costs.  Using nth or 
> length on lists that are guaranteed to be small will probably not have 
> much of an effect on performance.  O(n) for a fast function with 
> guaranteed really small values of n can certainly be faster than O(1) 
> for a slow function.

I am of the opinion that if you (think you) know what you're doing, have 
all the rope you want.  Don't say we didn't warn you, however.

Because you *might* be right.

> 
> 	I mean, I realize this is a general purpose library and all that...but 
> whatever happened to optimizing when things need to be optimized?  You 
> can't just declare people ignorant because they use a function that 
> could theoretically be a bottleneck in an extreme case.
> 

One of the design criteria I think about when designing APIs for 
fundamental data structures is how easy it is to swap one structure out 
for another.  That encourages people to use the "obvious" data structure 
while developing code, and then, when performance isn't what they expected 
or needed, makes it easier to switch to a different (more complicated) one.
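
A sketch of what that can look like with OCaml's module system, assuming a 
made-up signature SEQ and two hypothetical implementations (ListSeq, 
ArraySeq); none of these names come from stdlib or ExtLib.  Client code 
written against the signature does not change when the structure does.

```ocaml
(* A common interface for indexable sequences. *)
module type SEQ = sig
  type 'a t
  val of_list : 'a list -> 'a t
  val length : 'a t -> int
  val get : 'a t -> int -> 'a
  val iter : ('a -> unit) -> 'a t -> unit
end

(* The "obvious" structure: get is O(n). *)
module ListSeq : SEQ = struct
  type 'a t = 'a list
  let of_list l = l
  let length = List.length
  let get = List.nth
  let iter = List.iter
end

(* The replacement when indexing turns out to matter: get is O(1). *)
module ArraySeq : SEQ = struct
  type 'a t = 'a array
  let of_list = Array.of_list
  let length = Array.length
  let get = Array.get
  let iter = Array.iter
end

(* Client code only sees SEQ, so swapping is a one-line change below. *)
module Report (S : SEQ) = struct
  let last xs = S.get xs (S.length xs - 1)
end

module R1 = Report (ListSeq)
module R2 = Report (ArraySeq)
```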

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

Re: [Ocaml-lib-devel] big-O notation and the stdlib/extlib

From: skaller <sk...@us...> - 2004-07-31 03:48:10

On Sat, 2004-07-31 at 08:50, Nicolas Cannasse wrote:

> I think the people using List.nth either :
> - didn't understand that it was a linked list, so we might clarify this in
> the documentation
> - don't know at all about complexity, so having big-O notations won't help
> them either

Nah, I use it, and know both these things.

I don't care. It doesn't matter. Finding the tail element
of a list (the worst possible value of n) is something I do quite often: 
List.length must do this too.

The thing is, my list might be the list of parameters 
of a function, where the length is usually 0, 1 or 2, sometimes
even 5 or 6 -- anyone who writes functions with 20 arguments
can expect the compiler to take 0.01 microseconds longer ...

One needs to get things in proportion -- lists are easy to 
handle in OCaml: they're functional, you can match on them,
they're very cheap to construct: in the case of parameters
I could actually use an array (I know in advance how many
there are) but it's easier to get a tail-recursive, pattern-matching
list visitor to work than to get the bounds of a for
loop right.
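
A sketch of the two styles side by side, assuming the parameters are
represented as plain strings naming their types; count_ints and
count_ints_array are hypothetical names for the example.

```ocaml
(* Tail-recursive visitor: the match covers the empty and non-empty
   cases, so there is no index to get wrong. *)
let rec count_ints acc = function
  | [] -> acc
  | p :: rest -> count_ints (if p = "int" then acc + 1 else acc) rest

(* Array version: the loop bounds (0 to length - 1) are the
   programmer's job. *)
let count_ints_array (params : string array) =
  let n = ref 0 in
  for i = 0 to Array.length params - 1 do
    if params.(i) = "int" then incr n
  done;
  !n
```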

I also use List.nth on a HUGE list -- all tokens of
a program.  That function is only called to report
an error, and at most once, since the next thing I do
is terminate. So who cares -- I'm just scanning for
enough context to print a decent error message -- if it takes
a few seconds it just doesn't matter.
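
Roughly what that error path can look like, as a sketch only: context is
a hypothetical helper, and tokens are assumed to be strings. It calls
List.nth once per token in the window, which is fine for code that runs
once before exiting.

```ocaml
(* Return the tokens within [radius] positions of index [i], clamped to
   the ends of the list.  Each List.nth call is O(n). *)
let context (tokens : string list) (i : int) (radius : int) : string list =
  let n = List.length tokens in
  let lo = max 0 (i - radius) and hi = min (n - 1) (i + radius) in
  let rec collect k acc =
    if k < lo then acc else collect (k - 1) (List.nth tokens k :: acc)
  in
  collect hi []
```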

-- 
John Skaller, mailto:sk...@us...
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net

[Ocaml-lib-devel] Re: big-O notation and the stdlib/extlib

From: Jesse G. <je...@wi...> - 2004-08-01 14:04:13

Brian Hurt wrote:

> On Fri, 30 Jul 2004, Jesse Guardiani wrote:
> 
>> However, since big-O notation is so widely used
>> to shoot down new code submissions and generally
>> describe the speed of various functions during
>> discussion, why the heck don't we clearly state
>> the expected big-O performance of the function in
>> the documentation?
> 
> I'd like to second this proposal.  I don't have the time to do this at
> the moment, but I think it's a real good idea.

I *might*. It seems like a neat project, and a good excuse to learn mod_caml,
so I think I'm going to spend a half hour or hour a day discussing and designing
this for the next few weeks and see where it gets me. If I make decent progress
and have some good ideas that a lot of people like, then maybe I'll continue on
to the coding phase.

On the drive home, Friday, I was thinking that it would be best to make a
general repository of Big-O information. It can start out as an OCaml repository,
then once the OCaml stdlib and Extlib are mostly documented, we can open it up
to the other ML languages, Java, Python, Perl, PHP, etc...

The people using those languages (except maybe the MLs) typically don't care as
much about performance as the people using OCaml, but it would be useful info
nevertheless.

I figure the website will take the following form: We'll have a top level menu
of languages, then projects (i.e. stdlib vs extlib), then modules, then classes,
then functions. Once you click on a function, we will *try* to list the
theoretical Big-O scalability of that function (I won't be doing the listing.
I'll just write and host the website and let more knowledgeable volunteers fill
in the blanks).
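
As a first sketch of that menu structure, the hierarchy could be modelled
with nested records. Every type, field and function name here (func,
klass, modul, project, language, lookup) is made up for illustration, and
the two entries are just sample data.

```ocaml
(* One level per menu: language > project > module > class > function. *)
type func = { fname : string; big_o : string option }  (* None = not filled in yet *)
type klass = { cname : string; methods : func list }
type modul = { mname : string; classes : klass list; functions : func list }
type project = { pname : string; modules : modul list }
type language = { lname : string; projects : project list }

(* Sample data only. *)
let ocaml = {
  lname = "OCaml";
  projects = [
    { pname = "stdlib";
      modules = [
        { mname = "String"; classes = [];
          functions = [ { fname = "length"; big_o = Some "O(1)" } ] };
        { mname = "List"; classes = [];
          functions = [ { fname = "nth"; big_o = Some "O(n)" };
                        { fname = "sort"; big_o = None } ] } ] } ] }

(* Walk down the menus to one function.  Raises Not_found if any level
   is missing. *)
let lookup lang ~project ~modul ~func =
  let p = List.find (fun p -> p.pname = project) lang.projects in
  let m = List.find (fun m -> m.mname = modul) p.modules in
  (List.find (fun f -> f.fname = func) m.functions).big_o
```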

Then, on the same page, below the theoretical Big-O, we can have a section for
real-world benchmarks supporting or refuting the theoretical Big-O scalability.
This section will list benchmarks by OS type and version, arch type, CPU speed,
and by the submitter's name. We'll also have a place for the benchmark code
and results.
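
A submitted benchmark could be as small as the sketch below, which times
List.nth on the last element at doubling list sizes so the growth can be
compared with the claimed O(n). The helper names (time, time_nth), the
sizes and the iteration count are assumptions, and it needs a reasonably
recent OCaml for List.init. No numbers are given because they depend on
the machine.

```ocaml
(* CPU seconds spent running [f] [iters] times. *)
let time iters f =
  let t0 = Sys.time () in
  for _i = 1 to iters do ignore (f ()) done;
  Sys.time () -. t0

(* Time fetching the last element of a list of length [n]. *)
let time_nth n =
  let l = List.init n (fun i -> i) in
  time 1_000 (fun () -> List.nth l (n - 1))

(* If List.nth is O(n), each line should take about twice as long as
   the one before it. *)
let () =
  List.iter
    (fun n -> Printf.printf "n = %7d  %.4fs\n" n (time_nth n))
    [ 1_000; 2_000; 4_000; 8_000 ]
```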

Finally, I think it would be a good idea to have a discussion board below each
benchmark section, like they do in the on-line MySQL manuals. I'd like to keep
this all 100% OCaml based, to provide a little positive PR for OCaml, so does
anyone have an OCaml based web forum or blog available for assimilation?

This way, when someone performs a search for "String.length", we can display
the theoretical Big-O notation, the number of benchmarks supporting, and the
number of benchmarks refuting, and the percentage of each. The user can then
make an informed decision to use or not use the function based on this information,
or if the user is in doubt, he/she can click on the function and view the
benchmarks and any discussions.
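
The search-result summary could be computed along these lines. This is a
sketch under assumptions: the record fields follow the benchmark details
listed above, but the type names (verdict, benchmark, summary) and the
function summarize are hypothetical.

```ocaml
type verdict = Supports | Refutes

(* One submitted benchmark for a function. *)
type benchmark = {
  submitter : string;
  os : string;
  arch : string;
  cpu_mhz : int;
  verdict : verdict;
}

type summary = {
  supporting : int;
  refuting : int;
  percent_supporting : float option;  (* None when there are no benchmarks *)
}

(* Count supporting and refuting benchmarks and the share that support. *)
let summarize (benchmarks : benchmark list) : summary =
  let supporting =
    List.length (List.filter (fun b -> b.verdict = Supports) benchmarks) in
  let refuting = List.length benchmarks - supporting in
  let total = supporting + refuting in
  let percent_supporting =
    if total = 0 then None
    else Some (100.0 *. float_of_int supporting /. float_of_int total)
  in
  { supporting; refuting; percent_supporting }
```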

The only thing I'm hazy on right now is how to handle version control. I want
to be able to list function versions, and which version of the project introduced
the function so a user can compare Big-O between versions.

The first step will be to create a sourceforge project page and a mailing list,
but I'd like to get some additional feedback from this list before I do so.
Criticism, praise, and new ideas are all welcome.

What do you think?

-- 
Jesse Guardiani, Systems Administrator
WingNET Internet Services,
P.O. Box 2605 // Cleveland, TN 37320-2605
423-559-LINK (v)  423-559-5145 (f)
http://www.wingnet.net

[Ocaml-lib-devel] Re: big-O notation and the stdlib/extlib

From: Jesse G. <je...@wi...> - 2004-08-01 14:10:35

Brian Hurt wrote:

> On Fri, 30 Jul 2004, Jesse Guardiani wrote:
> 
>> However, since big-O notation is so widely used
>> to shoot down new code submissions and generally
>> describe the speed of various functions during
>> discussion, why the heck don't we clearly state
>> the expected big-O performance of the function in
>> the documentation?
> 
> I'd like to second this proposal.  I don't have the time to do this at
> the moment, but I think it's a real good idea.
> 
> One of the biggest problems people new to OCaml run into effectively boils
> down to using the wrong algorithm in the wrong situation.  Using List.nth,
> for example, is almost always a mistake: if you need it, you probably
> shouldn't be using a list.
> 
>> In addition, I think it would be nice to put up
>> a page on the web stating the presumed big-O
>> scalability of as many stdlib functions as possible.
> 
> As a big table.  Each row is an operation, each column a module or data
> structure.  So you can go "what do I need to do with this structure?" and
> easily compare different structures against each other.

Could you give an example? I'm having a hard time visualizing what you
mean to compare with my own ideas.

-- 
Jesse Guardiani, Systems Administrator
WingNET Internet Services,
P.O. Box 2605 // Cleveland, TN 37320-2605
423-559-LINK (v)  423-559-5145 (f)
http://www.wingnet.net