Re: status update

First a word on GSoC: I'm of course sad that I didn't make it through
the final evaluation, but it was definitely the right decision. I had
not achieved the main goal (making a release, updating gnulib) and
thus would've decided the same way (there was a checkbox asking for
my own impression of the project status, where I noted exactly this
myself, too).

> BUFSIZ is a pretty standard constant for all string buffers.

Eh, can you support that? It's a standard constant for STREAM
buffers. This is even defined by the C (C99) standard. Using it for
anything that's not directly related to a stream thus seems wrong to
me.

The only thing that might be interpreted as "standard for [..] string
buffers" is this quote from the glibc documentation:

> Sometimes people also use BUFSIZ as the allocation size of buffers
> used for related purposes, such as strings used to receive a line of
> input with fgets (see Character Input). There is no particular
> reason to use BUFSIZ for this instead of any other integer, except
> that it might lead to doing I/O in chunks of an efficient size.

Though this, too, does not say whether BUFSIZ will be small enough
to be put onto the stack. Moreover, it's only in the documentation of a
single libc; there might be systems that have a huge BUFSIZ but only
provide limited stack space.
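
To illustrate the concern, here is a minimal sketch (not code from
CLISP; read_line_heap is a hypothetical helper name) that keeps a
BUFSIZ-sized line buffer on the heap, so stack usage does not depend
on how large a given libc makes BUFSIZ:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of CLISP: read one line from `in`
   into a heap buffer of BUFSIZ bytes and strip the trailing newline.
   Returns a malloc'd string the caller must free, or NULL on EOF,
   read error, or allocation failure. */
char *read_line_heap(FILE *in)
{
    char *buf = malloc(BUFSIZ);
    if (buf == NULL)
        return NULL;
    if (fgets(buf, BUFSIZ, in) == NULL) {
        free(buf);
        return NULL;
    }
    buf[strcspn(buf, "\n")] = '\0';  /* cut at the first newline, if any */
    return buf;
}
```

Lines longer than BUFSIZ-1 characters are simply truncated by fgets
here; real code would have to loop or grow the buffer.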

> Let us revisit this issue at a later date.
> I think the with_string_0 mechanism is good enough.
> If disagree, you will have to argue for it to be changed pervasively
> throughout CLISP.

with_string_0 is not involved here. I'm concerned by this (from regexi.c):

begin_system_call();
ret = (regmatch_t*)alloca((re->re_nsub+1)*sizeof(regmatch_t));
end_system_call();

re->re_nsub is the number of subexpressions, and if the regex is in
any way modifiable by a malicious actor (e.g. a POST parameter for a
search field), then that actor could pass a regex with lots of
subexpressions, thus causing the above alloca to produce a stack
overflow (in the best case).
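
As a rough sketch of what I have in mind (plain POSIX regex.h, not
the actual regexi.c code; MAX_MATCH_SLOTS, alloc_match_slots and
match_on_heap are hypothetical names, and the CLISP
begin_system_call/end_system_call wrappers are left out), the match
array could be allocated on the heap with an upper bound on its size:

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical cap on the number of match slots; the value is an
   assumption for illustration, not something CLISP defines. */
#define MAX_MATCH_SLOTS 4096

/* Allocate room for the whole match plus re->re_nsub subexpressions
   on the heap instead of the stack. Returns NULL if the count reaches
   the cap or malloc fails. The caller frees the result. */
regmatch_t *alloc_match_slots(const regex_t *re, size_t *count_out)
{
    assert(re != NULL && count_out != NULL);
    if (re->re_nsub >= MAX_MATCH_SLOTS)
        return NULL;
    size_t count = re->re_nsub + 1;
    regmatch_t *slots = malloc(count * sizeof *slots);
    if (slots != NULL)
        *count_out = count;
    return slots;
}

/* Returns 1 on match, 0 on no match, -1 on compile or allocation
   failure. */
int match_on_heap(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    size_t count = 0;
    regmatch_t *slots = alloc_match_slots(&re, &count);
    int result = -1;
    if (slots != NULL) {
        result = (regexec(&re, text, count, slots, 0) == 0) ? 1 : 0;
        free(slots);
    }
    regfree(&re);
    return result;
}
```

With this, an oversized regex leads to an ordinary error return
rather than to an unchecked stack allocation.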

> This is why we don't run gnulib-tool in that directory!
> We only ever run it in src.

Hm, I'm afraid that this is not a good idea, at least not a scalable
one. Let me explain: We have modules because we don't want to have
their code in core CLISP, and we want to be able to provide modules
(or let the user provide them) to extend CLISP at will (and even at
runtime, with dynamic loading).

And adding a new module should not require changes to core CLISP,
right? A user should be able to write some module, and let clisp-link
from the installed CLISP do its magic.

Now assume a CLISP module needs (for example) access to some_function,
but core CLISP does not. The gnulib module the_module provides that
some_function on systems where it's not available. Should we add
the_module to core CLISP? I think no, because that would bloat core
CLISP (and we'd need special linker flags to actually have it in the
resulting binary). Thus we add it to the CLISP module, and
everything's fine.

Until we extend the CLISP module and it suddenly needs
another_function. Core CLISP happens to need this function, too. So
the gnulib module another_module which provides this function is
already included in CLISP. Now we could just use that and be done with
it.

Until we need another version (newer, older, using xalloc-die instead
of xalloc, ...) of it. Or another_module and the_module both depend on
a gnulib module lowlevel_thing. another_module only works with
lowlevel_thing from a year ago, but the_module needs a recent one.

My point is: gnulib is not designed to be something "shareable"
across projects (and core CLISP and a module is basically that: two
separate projects).

Thus IMO the correct approach to using gnulib is to have a gnulib
checkout for core CLISP, using only the gnulib modules needed by core
CLISP, and to let each CLISP module (which wants/needs to use gnulib)
maintain its own gnulib checkout with only the gnulib modules needed by
that particular CLISP module.

This adds some bloat when functionality overlaps (rawsock and
socket.d, and basic stuff like file IO), but probably only on "gnulib
intensive" platforms (windows?):

https://sourceforge.net/p/clisp/bugs/634/

Also I think complaining about a missing libgnu.so won't help. That's
just not the way gnulib is supposed to be used.

> what was the problem [updating gnulib for core CLISP]?

I'm not entirely sure. Makefile.devel tried to update (?) something
(configure?) for all modules first, but this failed for all
modules. IMO updating gnulib should - in the end - be no more effort
than running gnulib-tool --update in the correct directories, that is
the top level directory and every module directory that has its own
gnulib checkout. We could then put that into a script or
Makefile.devel.

> Fine. This means that the change necessary for a release is actually
> quite small:
> [snip]
> Right?

Unfortunately it's not that easy: rawsock fails to build on Windows
(MinGW) because the gnulib code (from core CLISP) is too old and makes
a (now) false assumption about the internals of MinGW header files.

Thus, the necessary changes are:

1) Either update gnulib for core CLISP, or give rawsock its own gnulib
   (that's what I did and - due to the reasoning above - I'd argue
   for)

2) Remove "windows" specific code from rawsock.c: The typedef,
   including windows headers, the parts with #if defined(WIN32_NATIVE)

At this point it will compile, but has reduced functionality, because
gnulib has a (IMHO) design flaw: If there's no (for example) netdb.h,
then the corresponding gnulib module provides one. But it does not
#define HAVE_NETDB_H, thus our code would not use it (because of the
#if defined(HAVE_NETDB_H) conditional source parts).

Therefore we need to either remove the #if defined(..) stuff, or
(better) find a way to determine whether gnulib does provide a
replacement or not.

The first option is straightforward but will lead to issues on
platforms that are not supported by gnulib or where gnulib does not
provide a replacement (should we target them again).
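
The second option could look roughly like the sketch below. Note
that GNULIB_REPLACES_NETDB_H is a purely hypothetical macro: gnulib
does not define it as far as I know, so our own configure check would
have to. RAWSOCK_USE_NETDB and rawsock_netdb_enabled are likewise
made-up names:

```c
/* HAVE_NETDB_H: defined by configure when the system ships <netdb.h>.
   GNULIB_REPLACES_NETDB_H: HYPOTHETICAL macro that our own configure
   check would define when gnulib supplies a replacement header. */
#if defined(HAVE_NETDB_H) || defined(GNULIB_REPLACES_NETDB_H)
# include <netdb.h>
# define RAWSOCK_USE_NETDB 1
#else
# define RAWSOCK_USE_NETDB 0
#endif

/* Report whether the netdb-dependent code paths were compiled in. */
int rawsock_netdb_enabled(void)
{
    return RAWSOCK_USE_NETDB;
}
```

The existing #if defined(HAVE_NETDB_H) blocks would then test
RAWSOCK_USE_NETDB instead, so platforms with neither a native header
nor a gnulib replacement still compile with reduced functionality.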

> PLAN:
>
> -1- fix rawsock on windows and make a release (2.50)
> 2/3 *.d --> *.c rename
> 2/3 switch to autotools (dropping generated files)
> -4- update gnulib
> -5- your proposed regexp changes
> -6- release (3.0)

Due to all of the above discussion I think the first and most
important thing to do is to find a consensus on how we use gnulib, and
then update it (otherwise rawsock will not work).

> Okay, so you want to go the way of Emacs - the developers have to
> install autotools and the generated files are excluded from VCS.
> Fine.
> Let us do that after the release.
>
> Note, however, that you should use "hg mv" for configure.in -->
> configure.am transition and make changes to configure.am only after
> committing the "mv" operation (same for _all_ renaming).

Ok :)