From: <li...@mi...> - 2007-06-07 15:07:49
|
Hi Mike, I suggested to use Genparse for command line parsing in the GNU Coreutils. I very quickly got a reply from Jim Meyering from the Coreutils team. See http://lists.gnu.org/archive/html/bug-coreutils/2007-06/msg00035.html for the thread on the Coreutils mailing list. I don't think it is appropriate to post all the Genparse problems to the Coreutils mailing list. But I'm posting to the "genparse-users" mailing list in order to make it available to a broader audience. In this email I'm focusing on only 1 of Jim's concerns. See my comments below. On Thu, Jun 07, 2007 at 02:44:38PM +0200, Jim Meyering wrote: > li...@mi... (Michael Geng) wrote: > > would it be an option to use Genparse (http://genparse.sourceforge.net/) > > for command line parsing in the GNU Coreutils? > > > > I'm one of the developers of Genparse and I recently used some of the > > well known Coreutils as an exercise for testing Genparse (see > > http://genparse.sourceforge.net/examples.html). Using Genparse for > > generating the command line parsing code could reduce the amount of > > coreutils source code because the input to Genparse is a short config > > file only. The overhead of writing the parser code would be delegated > > to the tool then. > > > > The Genparse generated parsers call getopt() (or getopt_long()) exactly > > the same way the Coreutils's command line parsers do it today. This > > wouldn't be changed. So the code Genparse generates will be very similar > > to the existing hand written parsers of the individual Coreutils tools. > > But calling getopt() is only part of the work, preparing and evaluating > > the results of getopt() also causes coding work which could be delegated > > to Genparse. Genparse also automatically generates a help screen which > > would no longer have to be done manually. > > Hi Michael, > > Genparse looks promising. > I like the examples. But there are almost 100 programs in the > coreutils. If genparse can really handle all of those use cases > without causing any significant degradation in the tools, then > it will be hard to object. > > How does it deal with internationalization? > > However, before I even consider it seriously, it'll need > some improvements: > > - it must detect any and all write failures[*] > - packaging so that I can run ./configure && make && make check, and > if I don't happen to have cppunit infrastructure, gcj or something > similar, it should tell me about it directly, rather than causing > harder-to-diagnose build failures. > - one of the generated .c files I looked at calls strdup but doesn't > check for a NULL return value or (less important) attempt to avoid > the leak when the corresponding --backup=S option appears twice or more. I guess Jim looked into the mv example (http://genparse.sourceforge.net/examples/mv_clp.c). I propose not to allocate any memory dynamically in Genparse at all and use the stack instead. malloc() and strdup() might not be fast enough in all applications. For instance in the mv command I can hardly imagine that it is acceptable to have a call to malloc() for allocating the arg_t struct which would have to be executed with every call to mv no matter what command line switches or arguments were specified. By the way, the result of this call to malloc() is also not checked for failure. In principle the Coreutils implementation of the mv command does it similarly: The optind pointer which is returned by getopt_long() is first stored in a variable on the heap and later the content of the string is copied to a string with strdup() which allocates from the heap. In order to leave the freedom to do so to the user of Genparse it is required that Genparse uses only the stack. > - not exactly essential, but highly recommended: it should work > with the latest autoconf and automake > > > [*] with genparse-0.6.3, I get this: > > $ strace -o log -e write ./genparse -o /full/tmp/foo ../examples/ls.gp \ > && echo exit status 0 > creating /full/tmp/foo.h...done > creating /full/tmp/foo.c...done > exit status 0 > $ tail -2 log > write(3, "/*******************************"..., 80) = -1 ENOSPC (No space left on device) > write(1, "creating /full/tmp/foo.c...done\n", 32) = 32 > > The two files it claims to have created are both empty, > and genparse exited successfully. > This means genparse is not detecting write or close failures. > > Note that in the example above, /full/tmp is a full file system that still > has free inodes, so open can create new files, but writes always fail. > > > _______________________________________________ > Bug-coreutils mailing list > Bug...@gn... > http://lists.gnu.org/mailman/listinfo/bug-coreutils |
From: Mike B. <mi...@bo...> - 2007-06-12 01:20:06
|
Michael, Thanks for the note. If I understand correctly, you are changing the generated code so that is does not allocate any memory. This is a good technique that I have actually required of my vendors from time to time. It certainly helps with the debugging. I'm a bit embarrassed that I never thought of this 7 years ago :) Will the user of genparse now have to allocate memory before calling the genparse functions? If so we should document that very clearly. Thanks, Mike Michael Geng wrote: > Hi Mike, > > I suggested to use Genparse for command line parsing in the > GNU Coreutils. I very quickly got a reply from Jim Meyering > from the Coreutils team. See > http://lists.gnu.org/archive/html/bug-coreutils/2007-06/msg00035.html > for the thread on the Coreutils mailing list. I don't think > it is appropriate to post all the Genparse problems to the Coreutils > mailing list. But I'm posting to the "genparse-users" mailing list > in order to make it available to a broader audience. > > In this email I'm focusing on only 1 of Jim's concerns. See > my comments below. > > On Thu, Jun 07, 2007 at 02:44:38PM +0200, Jim Meyering wrote: >> li...@mi... (Michael Geng) wrote: >>> would it be an option to use Genparse (http://genparse.sourceforge.net/) >>> for command line parsing in the GNU Coreutils? >>> >>> I'm one of the developers of Genparse and I recently used some of the >>> well known Coreutils as an exercise for testing Genparse (see >>> http://genparse.sourceforge.net/examples.html). Using Genparse for >>> generating the command line parsing code could reduce the amount of >>> coreutils source code because the input to Genparse is a short config >>> file only. The overhead of writing the parser code would be delegated >>> to the tool then. >>> >>> The Genparse generated parsers call getopt() (or getopt_long()) exactly >>> the same way the Coreutils's command line parsers do it today. This >>> wouldn't be changed. So the code Genparse generates will be very similar >>> to the existing hand written parsers of the individual Coreutils tools. >>> But calling getopt() is only part of the work, preparing and evaluating >>> the results of getopt() also causes coding work which could be delegated >>> to Genparse. Genparse also automatically generates a help screen which >>> would no longer have to be done manually. >> Hi Michael, >> >> Genparse looks promising. >> I like the examples. But there are almost 100 programs in the >> coreutils. If genparse can really handle all of those use cases >> without causing any significant degradation in the tools, then >> it will be hard to object. >> >> How does it deal with internationalization? >> >> However, before I even consider it seriously, it'll need >> some improvements: >> >> - it must detect any and all write failures[*] >> - packaging so that I can run ./configure && make && make check, and >> if I don't happen to have cppunit infrastructure, gcj or something >> similar, it should tell me about it directly, rather than causing >> harder-to-diagnose build failures. >> - one of the generated .c files I looked at calls strdup but doesn't >> check for a NULL return value or (less important) attempt to avoid >> the leak when the corresponding --backup=S option appears twice or more. > > I guess Jim looked into the mv example > (http://genparse.sourceforge.net/examples/mv_clp.c). > > I propose not to allocate any memory dynamically in Genparse at all and use > the stack instead. malloc() and strdup() might not be fast enough in all > applications. For instance in the mv command I can hardly imagine that it > is acceptable to have a call to malloc() for allocating the arg_t struct > which would have to be executed with every call to mv no matter what > command line switches or arguments were specified. By the way, the result > of this call to malloc() is also not checked for failure. > > In principle the Coreutils implementation of the mv command does it > similarly: The optind pointer which is returned by getopt_long() is first > stored in a variable on the heap and later the content of the string is > copied to a string with strdup() which allocates from the heap. In order > to leave the freedom to do so to the user of Genparse it is required that > Genparse uses only the stack. > >> - not exactly essential, but highly recommended: it should work >> with the latest autoconf and automake >> >> >> [*] with genparse-0.6.3, I get this: >> >> $ strace -o log -e write ./genparse -o /full/tmp/foo ../examples/ls.gp \ >> && echo exit status 0 >> creating /full/tmp/foo.h...done >> creating /full/tmp/foo.c...done >> exit status 0 >> $ tail -2 log >> write(3, "/*******************************"..., 80) = -1 ENOSPC (No space left on device) >> write(1, "creating /full/tmp/foo.c...done\n", 32) = 32 >> >> The two files it claims to have created are both empty, >> and genparse exited successfully. >> This means genparse is not detecting write or close failures. >> >> Note that in the example above, /full/tmp is a full file system that still >> has free inodes, so open can create new files, but writes always fail. >> >> >> _______________________________________________ >> Bug-coreutils mailing list >> Bug...@gn... >> http://lists.gnu.org/mailman/listinfo/bug-coreutils > -- Mike Borella mike at borella dot net http://www.borella.net/mike |
From: <li...@mi...> - 2007-06-13 18:00:45
|
On Mon, Jun 11, 2007 at 06:01:51PM -0500, Mike Borella wrote: > Michael, > > Thanks for the note. If I understand correctly, you are changing the > generated code so that is does not allocate any memory. This is a good > technique that I have actually required of my vendors from time to time. > It certainly helps with the debugging. One more argument for doing this change is that then the C code will be closer to the C++ code because the C++ code doesn't allocate any memory right now. > I'm a bit embarrassed that I never thought of this 7 years ago :) > > Will the user of genparse now have to allocate memory before calling the > genparse functions? If so we should document that very clearly. I absolutely agree. I plan to look through the man and info pages and I will have to regenerate the C examples. My question is: Do we really have to support both solutions (the existing one which allocates memory) and the new one (which doesn't allocate any memory)? The problem with throwing away the old solution is that this would break compatibility. How popular is Genparse today? Can we still afford to make incompatible changes? Will it be a problem for you personally, Mike? The big advantage of dropping the old memory allocating solution is simplicity. The code will become simpler and we avoid adding one more command line option to Genparse. I would vote for dropping the old solution and not confuse any users with 2 different implementations for the C output. Michael |
From: <li...@mi...> - 2007-06-16 17:24:39
|
On Thu, Jun 07, 2007 at 02:44:38PM +0200, Jim Meyering wrote: > However, before I even consider it seriously, it'll need > some improvements: > > - it must detect any and all write failures[*] I just added a new release of genparse (http://sourceforge.net/project/showfiles.php?group_id=4341&package_id=4354&release_id=516584) which now also checks file write and close operations, not only file open operations as it was before. > - packaging so that I can run ./configure && make && make check, and > if I don't happen to have cppunit infrastructure, gcj or something > similar, it should tell me about it directly, rather than causing > harder-to-diagnose build failures. I went through all these tools and updated configure.ac (and related files) such that you get a warning if something is missing but you can still compile genparse. For example if you don't have gcj installed then it automatically removes the java example from the list of build targets. It now checks for cunit, cppunit, junit, gcj, doxygen, texi2html, texi2pdf and man2html. > - one of the generated .c files I looked at calls strdup but doesn't > check for a NULL return value or (less important) attempt to avoid > the leak when the corresponding --backup=S option appears twice or more. I'm working on that and I'll let you know when this is finished. > - not exactly essential, but highly recommended: it should work > with the latest autoconf and automake I'm using autoconf version 2.59 and automake 1.9.5. Aditionally I renamed configure.in to configure.ac which appears to be the standard today and removed acconfig.h because it appears to be deprecated. This might have produced some warnings before. > [*] with genparse-0.6.3, I get this: > > $ strace -o log -e write ./genparse -o /full/tmp/foo ../examples/ls.gp \ > && echo exit status 0 > creating /full/tmp/foo.h...done > creating /full/tmp/foo.c...done > exit status 0 > $ tail -2 log > write(3, "/*******************************"..., 80) = -1 ENOSPC (No space left on device) > write(1, "creating /full/tmp/foo.c...done\n", 32) = 32 > > The two files it claims to have created are both empty, > and genparse exited successfully. > This means genparse is not detecting write or close failures. > > Note that in the example above, /full/tmp is a full file system that still > has free inodes, so open can create new files, but writes always fail. I could verify this behavior and I hope it's fixed now. Now it tells you that it can't write to the file and returns with a non zero exit status. Thanks again for your input, Michael |
From: <li...@mi...> - 2007-06-23 15:22:53
|
On Sat, Jun 16, 2007 at 07:23:09PM +0200, Michael Geng wrote: > On Thu, Jun 07, 2007 at 02:44:38PM +0200, Jim Meyering wrote: > > However, before I even consider it seriously, it'll need > > some improvements: > > > > - it must detect any and all write failures[*] > > I just added a new release of genparse > (http://sourceforge.net/project/showfiles.php?group_id=4341&package_id=4354&release_id=516584) > which now also checks file write and close operations, not only > file open operations as it was before. > > > - packaging so that I can run ./configure && make && make check, and > > if I don't happen to have cppunit infrastructure, gcj or something > > similar, it should tell me about it directly, rather than causing > > harder-to-diagnose build failures. > > I went through all these tools and updated configure.ac (and related files) > such that you get a warning if something is missing but you can still > compile genparse. For example if you don't have gcj installed then it > automatically removes the java example from the list of build targets. > It now checks for cunit, cppunit, junit, gcj, doxygen, texi2html, texi2pdf > and man2html. > > > - one of the generated .c files I looked at calls strdup but doesn't > > check for a NULL return value or (less important) attempt to avoid > > the leak when the corresponding --backup=S option appears twice or more. > > I'm working on that and I'll let you know when this is finished. I again released a new version of genparse (0.6.5) which doesn't allocate any memory on the heap any more because I think this is a better concept. So it doesn't generate any calls to strdup() or malloc() any more. As a consequence the parser function changed from struct arg_t *Cmdline(int argc, char *argv[]) to void Cmdline(struct arg_t *my_args, int argc, char *argv[]). See the Genparse examples page (http://genparse.sourceforge.net/examples.html) which is also updated. > > > - not exactly essential, but highly recommended: it should work > > with the latest autoconf and automake > > I'm using autoconf version 2.59 and automake 1.9.5. Aditionally I renamed > configure.in to configure.ac which appears to be the standard today and > removed acconfig.h because it appears to be deprecated. This might have > produced some warnings before. > > > [*] with genparse-0.6.3, I get this: > > > > $ strace -o log -e write ./genparse -o /full/tmp/foo ../examples/ls.gp \ > > && echo exit status 0 > > creating /full/tmp/foo.h...done > > creating /full/tmp/foo.c...done > > exit status 0 > > $ tail -2 log > > write(3, "/*******************************"..., 80) = -1 ENOSPC (No space left on device) > > write(1, "creating /full/tmp/foo.c...done\n", 32) = 32 > > > > The two files it claims to have created are both empty, > > and genparse exited successfully. > > This means genparse is not detecting write or close failures. > > > > Note that in the example above, /full/tmp is a full file system that still > > has free inodes, so open can create new files, but writes always fail. > > I could verify this behavior and I hope it's fixed now. Now it tells you > that it can't write to the file and returns with a non zero exit status. > > Thanks again for your input, > Michael > > > _______________________________________________ > Bug-coreutils mailing list > Bug...@gn... > http://lists.gnu.org/mailman/listinfo/bug-coreutils |
From: <li...@mi...> - 2007-06-08 13:10:51
|
On Thu, Jun 07, 2007 at 05:06:33PM +0200, Michael Geng wrote: > On Thu, Jun 07, 2007 at 02:44:38PM +0200, Jim Meyering wrote: > > - one of the generated .c files I looked at calls strdup but doesn't > > check for a NULL return value or (less important) attempt to avoid > > the leak when the corresponding --backup=S option appears twice or more. > > I guess Jim looked into the mv example > (http://genparse.sourceforge.net/examples/mv_clp.c). > > I propose not to allocate any memory dynamically in Genparse at all and use > the stack instead. malloc() and strdup() might not be fast enough in all > applications. For instance in the mv command I can hardly imagine that it > is acceptable to have a call to malloc() for allocating the arg_t struct > which would have to be executed with every call to mv no matter what > command line switches or arguments were specified. By the way, the result > of this call to malloc() is also not checked for failure. > > In principle the Coreutils implementation of the mv command does it > similarly: The optind pointer which is returned by getopt_long() is first > stored in a variable on the heap and later the content of the string is > copied to a string with strdup() which allocates from the heap. In order > to leave the freedom to do so to the user of Genparse it is required that > Genparse uses only the stack. The problem with not using malloc() and strdup() is that this would break compatibility of Genparse. For this reason I suggest to add a new command line switch -s/--noheap to Genparse. If this switch is not set (the default) then Genparse will stay compatible to previous versions. If it is set then the parser interface will change: ----------------------------- The compatible version of the generated parser of the mycopy3 example (--noheap not set) will look as follows. I marked the lines which have changed with '>'. struct arg_t *Cmdline(int argc, char *argv[]) { extern char *optarg; extern int optind; int c; struct arg_t *my_args; int errflg = 0; static struct option long_options[] = { {"iterations", 1, 0, 'i'}, {"outfile", 1, 0, 'o'}, {"longopt", 0, 0, 256}, {"help", 0, 0, 'h'}, {"version", 0, 0, 'v'}, {0, 0, 0, 0} }; my_args = (struct arg_t *) malloc (sizeof(struct arg_t)); > if (my_args == NULL) > { > fprintf(stderr, "error: out of memory\n"); > return NULL; > } my_args->i = 1; my_args->o = NULL; my_args->longopt = false; my_args->h = false; my_args->v = false; optind = 0; while ((c = getopt_long(argc, argv, "i:o:hv", long_options, &optind)) != EOF) { switch(c) { case 'i': my_args->i = atoi(optarg); if (my_args->i < 0) { fprintf(stderr, "parameter range error: i must be >= 0\n"); errflg++; } break; case 'o': > if (my_args->o != NULL) > free(my_args->o); > if ((my_args->o = strdup(optarg)) == NULL) > { > fprintf(stderr, "error: out of memory\n"); > errflg++; > } break; case 256: my_args->longopt = true; break; case 'h': my_args->h = true; usage(argv[0]); break; case 'v': my_args->v = true; break; default: usage(argv[0]); } } /* while */ if (errflg) usage(argv[0]); if (optind >= argc) my_args->optind = 0; else my_args->optind = optind; return my_args; } ----------------------------- The new version which doesn't use malloc() and strdup() (--noheap set) will look as follows. void Cmdline(struct arg_t *my_args, int argc, char *argv[]) { extern char *optarg; extern int optind; int c; int errflg = 0; static struct option long_options[] = { {"iterations", 1, 0, 'i'}, {"outfile", 1, 0, 'o'}, {"longopt", 0, 0, 256}, {"help", 0, 0, 'h'}, {"version", 0, 0, 'v'}, {0, 0, 0, 0} }; my_args->i = 1; my_args->o = NULL; my_args->longopt = false; my_args->h = false; my_args->v = false; optind = 0; while ((c = getopt_long(argc, argv, "i:o:hv", long_options, &optind)) != EOF) { switch(c) { case 'i': my_args->i = atoi(optarg); if (my_args->i < 0) { fprintf(stderr, "parameter range error: i must be >= 0\n"); errflg++; } break; case 'o': > my_args->o = optarg; break; case 256: my_args->longopt = true; break; case 'h': my_args->h = true; usage(argv[0]); break; case 'v': my_args->v = true; break; default: usage(argv[0]); } } /* while */ if (errflg) usage(argv[0]); if (optind >= argc) my_args->optind = 0; else my_args->optind = optind; return my_args; } ----------------------------- The latter version is clearly the most simple one. For this reason I would recommend to always set the --noheap switch. I would also like to update all the examples including those in the info pages to use --noheap. I found another issue with the present version of Genparse: If a string has a default value then setting this parameter will always cause a memory leak because it is initialized with a strdup() result and when the parameter is actually processed it is overwritten with another strdup() result without freeing the old value. The same happens if you specify the same string parameter twice on the command line as Jim mentioned. Both cases will be solved in the code above no matter if Genparse is called with or without --noheap. Michael |