From: skaller <sk...@us...> - 2004-12-17 13:07:45
On Fri, 2004-12-17 at 22:36, Bardur Arantsson wrote:

> > Doing things 'the right way' can't be overkill can it?
> > I would expect this to be lightning fast and it should
> > make it easy to generalise/extend ..?
>
> I'm not sure there will ever be any need to
> generalise/extend, but anyway...

Consider:

(a) C has certain rules for resolving #include filenames.
    Recall even "kjkjh" and <kjhgkjg> are distinct ..

(b) Felix (and Interscript) also have rules for resolving
    filenames. Interscript's rules are in fact quite interesting:
    names of include files in Interscript are *required* to be
    Unix relative filenames. From the command line, a native
    filename prefix is taken. The two are spliced together, after
    converting the Interscript name to a native one. [This ensures
    everything 'in document' is as OS independent as possible.]

(c) There are several conventions for PATH names. The Unix one
    (separator ':') is one, but TeX uses kpathsea ..

(d) I have used an archaic system which is much better than any of
    the above .. a TI OS which has no subdirectories at all.
    Instead it has something much better: an environment plus a
    structured filename convention. Filename components are
    replaced from the environment, subsuming the current-directory
    idea completely.

(e) Hmm, what about URL/URI things .. :)

In any case the idea of using a parser for filenames isn't overkill
IMHO .. on the contrary, the problem is more likely to be that mere
LALR(1) parsing and Ocamllex lexing simply isn't good enough (some
OSes use UCS-2 filenames .. Solaris and Win32 for example ..).
Ocamllex, for example, can't even translate UTF-8 (I tried once; it
blows the lexer generator's brains out). The real advantage of a
lexer/parser combination seems to be that the specification is
heavily declarative.

> To my mind, using full-blown parsers is overkill for
> splitting UNIX paths into their constituent parts,

But we're not restricted to Unix ..

> Another concern, which is more related to the interface, is
> that the module seems to raise exceptions in situations
> one wouldn't normally expect. As an example:
>
>   FilePath.check_extension p ext
>
> raises an exception if the filename doesn't have an
> extension. I can't tell whether this is a part of the
> interface

Agree. I think exceptions should be reserved for unrecoverable
errors if possible, but in any case it should be documented.

> Apart from that, I feel that simple 'shortcut' path
> queries like is_dir, is_link, etc. should be added to the
> FilePath module. I realize that this removes the
> separation of purely abstract paths and concrete files, but
> it's just too convenient to pass up IMO.

I think the separation is an *essential* feature. It's there by
design and for a good reason: you can have Unix, MacOS, and Win32
modules all available at once. These modules must not be permitted
to touch the file system. However, short versions in the FileUtil
module, or a third module (e.g. FileQuery), may make sense.
(Especially as 'fstat' and friends are fairly OS-specific animals.)
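Roughly the kind of split I have in mind, sketched below with
hypothetical signatures (not the existing FilePath/FileUtil
interface), keeping pure path manipulation separate from anything
that touches the filesystem:

  (* Purely abstract path manipulation: no filesystem access, so the
     Unix, Win32 and MacOS flavours can all be linked in at once. *)
  module type PATH = sig
    type t
    val of_string : string -> t
    val to_string : t -> string
    val concat : t -> t -> t
    (* Return None instead of raising when there is no extension. *)
    val extension : t -> string option
    val check_extension : t -> string -> bool
  end

  (* Concrete queries live elsewhere; only this module may touch the
     filesystem, and it can be as OS-specific as it needs to be. *)
  module type QUERY = sig
    type path
    val is_dir : path -> bool
    val is_link : path -> bool
    val exists : path -> bool
  end

That keeps the FilePath-style modules purely abstract, while a
FileQuery-style module (or extra functions in FileUtil) provides
is_dir/is_link over whichever path flavour you picked.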
> All of the above is stuff that's fixable, so I guess the
> best idea would be to just decide on an answer to the
> question
>
> Do we want a path/file query/manipulation module in ExtLib?
>
> My answer would definitely be 'yes'

Mine too. Filenames are needed even in Pervasives ..

> Maybe I'm just stupid, but I don't see why a test harness
> would require lots of work...?

Because it has to

(a) collate all the tests
(b) run the tests -- terminating rogues
(c) collate the results
(d) standardise a way to actually report results

Point (d) is extremely difficult.

> Of course, writing individual test cases for all the
> modules would be a lot of work, but this is work that can
> be done incrementally.

Yes, I don't think writing the tests is the issue. We can start
with just half a dozen, and make sure that for every bug a
regression test is generated.

> Am I missing something?

Yeah -- designing a test system is probably harder than designing a
library or an average application. This is especially the case for
Ocaml, I suspect, since it doesn't support dynamic loading and
unloading. That seems to mean each test must be a separate process.

In addition, some tests are dangerous, especially ones that mess
with the filesystem or network -- usually you'd run your tests in a
suitably restricted environment as a low-privilege user .. which
makes the test harness OS-specific ..

In addition, we will need community support: something like a web
page for submitting tests, so that they're easy to install and meet
requirements such as having a description, expected output, etc ..

I personally think 'unit' testing is a bit silly. It rarely finds
bugs because it can't cover enough cases, can't handle integration,
etc. I prefer a sloppier concept of just collecting whatever test
code you can and running it, just to get some confidence you didn't
completely mess something up whilst committing a minor change to
CVS. (A rough sketch of what I mean is appended below my sig.)

--
John Skaller, mailto:sk...@us...
voice: 061-2-9660-0850,
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net
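[Appendix: a minimal sketch of the sort of 'sloppy' runner referred
to above. It is purely illustrative: it assumes each test is just a
named unit -> bool function, counts an escaping exception as a
failure, and skips all the hard parts (process isolation, killing
rogues, standardised reporting).]

  (* Deliberately naive test collector: each test is a name plus a
     thunk returning true on success. *)
  let tests : (string * (unit -> bool)) list ref = ref []

  let register name f = tests := (name, f) :: !tests

  let run_all () =
    let failed = ref 0 in
    List.iter
      (fun (name, f) ->
         let ok = (try f () with _ -> false) in
         if not ok then begin
           incr failed;
           print_endline ("FAIL: " ^ name)
         end)
      (List.rev !tests);
    Printf.printf "%d test(s) failed\n" !failed

  (* Example: a regression test registered next to the bug it covers. *)
  let () =
    register "check_suffix_ml"
      (fun () -> Filename.check_suffix "foo.ml" ".ml");
    run_all ()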