From: Andrew J. S. <as...@te...> - 2023-08-21 10:30:21
Hi,

This is not a bug. Please set XMLMODE=0 and see that it works. Without that, it is trying to parse the output from sort as XML data.

Regards,
Andy

On Thu, Aug 17, 2023 at 09:23:57PM +0200, Jürgen Kahrs via Gawkextlib-users wrote:
> Hello,
>
> the following posting looks like a reasonable question to our list.
> For some unknown reason it was auto-discarded by the mail server.
> Therefore I am re-posting it myself with a copy going to the author.
>
> The origin of the problem probably is that the loading of the XML extension
> switches the behaviour of the interpreter when processing two-way I/O.
> To the best of my knowledge, this is intended and documented somewhere.
>
> Jürgen Kahrs
>
>
> Hi everyone,
>
> I'm a huge fan of the xml Extension for gawk. Great stuff!
> Recently I noticed that I cannot open another process within gawk when the
> xml extension is loaded (Two-way I/O (The GNU Awk User’s Guide))
>
> Simple script to reproduce (I basically took the example from the man
> page):
>
> Does not work:
>
> @load "xml"
> BEGIN {
>     command = "LC_ALL=C sort"
>     n = split("abc", a, "")
>
>     for (i = n; i > 0; i--)
>         print a[i] |& command
>     close(command, "to")
>
>     while ((command |& getline line) > 0)
>         print "got", line
>     close(command)
> }
>
> Works (notice the "@load" statement is commented out):
>
> #@load "xml"
> BEGIN {
>     command = "LC_ALL=C sort"
>     n = split("abc", a, "")
>
>     for (i = n; i > 0; i--)
>         print a[i] |& command
>     close(command, "to")
>
>     while ((command |& getline line) > 0)
>         print "got", line
>     close(command)
> }
>
> Interestingly enough if I load other extensions (e.g. @load "json") it is no
> problem.
> It seems to be an issue specifically with xml.
>
> Am I doing something wrong here? Can anyone confirm?
> Maybe this was already reported
>
> Versions I used:
>   * GNU Awk 5.2.2
>   * XML: 1.1.1
>
> Greetings
> Joseph

--
Andrew Schorr                      e-mail: as...@te...
Telemetry Investments, L.L.C.      phone:  917-305-1748
152 W 36th St, #402                fax:    212-425-5550
New York, NY 10018-8765
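A minimal sketch of the fix suggested in this reply, assuming it is enough to assign XMLMODE = 0 at the top of the BEGIN rule, before the co-process is opened. The placement of the assignment is an assumption; the thread only says to set the variable. The rest is the reporter's script unchanged, and it needs gawk with the xml extension installed.

```awk
@load "xml"

BEGIN {
    # Assumption: disabling XML parsing here, before any input is read,
    # makes gawk treat the co-process output as ordinary text records.
    XMLMODE = 0

    command = "LC_ALL=C sort"
    n = split("abc", a, "")

    # Send the letters to sort in reverse order.
    for (i = n; i > 0; i--)
        print a[i] |& command
    close(command, "to")

    # Read the sorted lines back from the co-process.
    while ((command |& getline line) > 0)
        print "got", line
    close(command)
}
```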
From: <jue...@go...> - 2023-08-17 19:24:08
Hello,

the following posting looks like a reasonable question to our list. For some unknown reason it was auto-discarded by the mail server. Therefore I am re-posting it myself with a copy going to the author.

The origin of the problem probably is that the loading of the XML extension switches the behaviour of the interpreter when processing two-way I/O. To the best of my knowledge, this is intended and documented somewhere.

Jürgen Kahrs

> Hi everyone,
>
> I'm a huge fan of the xml Extension for gawk. Great stuff!
> Recently I noticed that I cannot open another process within gawk when the xml extension is loaded (Two-way I/O (The GNU Awk User’s Guide) <https://www.gnu.org/software/gawk/manual/html_node/Two_002dway-I_002fO.html>)
>
> Simple script to reproduce (I basically took the example from the man page):
>
> Does not work:
>
> @load "xml"
> BEGIN {
>     command = "LC_ALL=C sort"
>     n = split("abc", a, "")
>
>     for (i = n; i > 0; i--)
>         print a[i] |& command
>     close(command, "to")
>
>     while ((command |& getline line) > 0)
>         print "got", line
>     close(command)
> }
>
> Works (notice the "@load" statement is commented out):
>
> #@load "xml"
> BEGIN {
>     command = "LC_ALL=C sort"
>     n = split("abc", a, "")
>
>     for (i = n; i > 0; i--)
>         print a[i] |& command
>     close(command, "to")
>
>     while ((command |& getline line) > 0)
>         print "got", line
>     close(command)
> }
>
> Interestingly enough if I load other extensions (e.g. @load "json") it is no problem.
> It seems to be an issue specifically with xml.
>
> Am I doing something wrong here? Can anyone confirm?
> Maybe this was already reported
>
> Versions I used:
>
>   * GNU Awk 5.2.2
>   * XML: 1.1.1
>
> Greetings
> Joseph
From: Andrew J. S. <as...@te...> - 2023-02-21 14:56:12
Hi,

Thanks so much for drilling down and finding this. I just reported it to bug-gawk and cc'ed you.

Best,
Andy

On Mon, Feb 20, 2023 at 03:37:56PM -0600, Daniel Pouzzner via Gawkextlib-users wrote:
> I was able to whittle away the xml parts of the logic until none was left. This
> turns out to be a gawk core bug. Reproducer:
>
> #!/usr/bin/gawk -f
>
> function f(x) {
>     return x;
> }
>
> BEGIN {
>     print "a[b] is " (a["b"] ? "true" : "false");
>
>     f(a["b"]);
>
>     print "a[b] is " (a["b"] ? "true" : "false");
>
>     print a["b"];
> }
>
> Result on 5.1.1:
>
> $ /tmp/arraybug.awk
> a[b] is false
> a[b] is false
>
> On 5.2.1:
>
> $ /tmp/portage/sys-apps/gawk-5.2.1/image/usr/bin/gawk -f /tmp/arraybug.awk
> a[b] is false
> a[b] is true
> free(): double free detected in tcache 2
> Aborted
>
> The syndrome in a nutshell: if a nonexistent array element is passed as an
> argument to a function, the element is sortof-created, such that testing it
> somehow evaluates to true, but its state/internal pointers are invalid. I've
> actually gotten scripts to outright SEGV and exhibit various other obviously
> undefined behavior, like printing characters from the name of the redirect
> target ("/dev/stde" etc), by just changing the length of words in a printf
> format (constant string).
>
> Do I need to refile a bug on gawk core, or have I "done enough", as it were?
>
> Oh and thanks for the quick turnaround!
From: Daniel P. <do...@me...> - 2023-02-20 21:38:14
I was able to whittle away the xml parts of the logic until none was left. This turns out to be a gawk core bug. Reproducer:

    #!/usr/bin/gawk -f

    function f(x) {
        return x;
    }

    BEGIN {
        print "a[b] is " (a["b"] ? "true" : "false");

        f(a["b"]);

        print "a[b] is " (a["b"] ? "true" : "false");

        print a["b"];
    }

Result on 5.1.1:

    $ /tmp/arraybug.awk
    a[b] is false
    a[b] is false

On 5.2.1:

    $ /tmp/portage/sys-apps/gawk-5.2.1/image/usr/bin/gawk -f /tmp/arraybug.awk
    a[b] is false
    a[b] is true
    free(): double free detected in tcache 2
    Aborted

The syndrome in a nutshell: if a nonexistent array element is passed as an argument to a function, the element is sortof-created, such that testing it somehow evaluates to true, but its state/internal pointers are invalid. I've actually gotten scripts to outright SEGV and exhibit various other obviously undefined behavior, like printing characters from the name of the redirect target ("/dev/stde" etc), by just changing the length of words in a printf format (constant string).

Do I need to refile a bug on gawk core, or have I "done enough", as it were?

Oh and thanks for the quick turnaround!

On Mon, 2023-02-20 at 10:13 -0500, Andrew J. Schorr wrote:
> Hi,
>
> On Mon, Feb 20, 2023 at 02:43:58AM -0600, Daniel Pouzzner via Gawkextlib-users wrote:
> > Is gawkextlib xml expected to work with gawk 5.2.1 (API 3.2), with the new
> > AWK_BOOL?
>
> I naively expect it to work. :-) If it doesn't work, then we've got a problem.
>
> > It works as expected with awk 5.1.1, and with all earlier versions going back to
> > 4.1.3. I've been using it regularly since 2017.
>
> Glad to hear you've been finding it useful.
>
> > But with 5.2.1 I'm seeing anomalous behavior where empty xml elements (e.g.
> > <doi></doi>) are evaluating as true even though they string-equal "".
> >
> > In connection with that empty xml field, gawk 5.2.1 crashes with
> >
> > gawk: ../mkbib.awk:1142: (FILENAME=buzsaki_2003_EEG_source.xml FNR=173) fatal: internal error: file eval.c, line 1358: unexpected parameter type Node_illegal
> >
> > If I build with sanitizer, I see concat_exp() doing a double-free of an arg that
> > was earlier freed by r_interpret().
> >
> > I did a whole slew of experiments to try to understand what's happening, but
> > it's a large and tricky code base. It seems to have something to do with
> > Node_var appearing where usually Node_val is, but I was quickly in over my head.
> >
> > libgawkextlib and xml.so were both built with gawk-5.2.1 installed. I tried
> > with old release code and with the latest git sources -- same result, as above.
> >
> > If it's useful I can share buzsaki_2003_EEG_source.xml and even the script
> > that's crashing on 5.2.1.
>
> Do you have a small test case that reproduces the problem? That would be very
> helpful for debugging. If you don't have a small test case, then I guess a large
> test case may be better than nothing.
>
> Regards,
> Andy
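One defensive pattern on an affected gawk is to test membership with `in` before handing an array element to a function, so that a nonexistent element is never passed as an argument. This is only a sketch of a possible workaround: it was not proposed or verified in this thread, and the helper name `safe_f` is hypothetical.

```awk
#!/usr/bin/gawk -f

function f(x) {
    return x
}

# Hypothetical helper: call f() only when the element already exists,
# so a missing element is never passed as a function argument.
function safe_f(arr, key) {
    if (key in arr)
        return f(arr[key])
    return ""    # treat a missing element as the empty string
}

BEGIN {
    # The "in" operator tests membership without creating the element.
    print "a[b] exists: " (("b" in a) ? "yes" : "no")
    safe_f(a, "b")
    print "a[b] exists: " (("b" in a) ? "yes" : "no")
}
```

Both lines should report "no", because neither the membership test nor the guarded call references `a["b"]` directly.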
From: <jue...@go...> - 2023-02-20 15:45:25
Hello Daniel,

welcome to our users mailing list.

> Is gawkextlib xml expected to work with gawk 5.2.1 (API 3.2), with the new
> AWK_BOOL?

We have a build server running that always builds the latest development branch of GNU Awk and the latest development branch of all the GNU Awk extensions that SourceForge hosts for us. Automatic testing is also done, but I have not seen an error message with the combination of the latest branches.

> It works as expected with awk 5.1.1, and with all earlier versions going back to
> 4.1.3. I've been using it regularly since 2017.
>
> But with 5.2.1 I'm seeing anomalous behavior where empty xml elements (e.g.
> <doi></doi>) are evaluating as true even though they string-equal "".

This sounds like it could become a pretty small regression test case, assuming that empty XML input elements always cause differing behaviour.

> In connection with that empty xml field, gawk 5.2.1 crashes with
>
> gawk: ../mkbib.awk:1142: (FILENAME=buzsaki_2003_EEG_source.xml FNR=173) fatal: internal error: file eval.c, line 1358: unexpected parameter type Node_illegal
>
> If I build with sanitizer, I see concat_exp() doing a double-free of an arg that
> was earlier freed by r_interpret().
>
> I did a whole slew of experiments to try to understand what's happening, but
> it's a large and tricky code base. It seems to have something to do with
> Node_var appearing where usually Node_val is, but I was quickly in over my head.
>
> libgawkextlib and xml.so were both built with gawk-5.2.1 installed. I tried
> with old release code and with the latest git sources -- same result, as above.
>
> If it's useful I can share buzsaki_2003_EEG_source.xml and even the script
> that's crashing on 5.2.1.

As Andrew said, having a small test case would be nice. If your application works on electroencephalogram data, I might be inclined to have a look at it. In the early 1990s I used the original GNU Awk for processing tons of EEG data. Seeing how the processing of EEG data has changed over the last 30 years may be interesting.

Thanks to Daniel for the precise report and to Andrew for the quick reply.

Juergen Kahrs
From: Andrew J. S. <as...@te...> - 2023-02-20 15:30:00
Hi,

On Mon, Feb 20, 2023 at 02:43:58AM -0600, Daniel Pouzzner via Gawkextlib-users wrote:
> Is gawkextlib xml expected to work with gawk 5.2.1 (API 3.2), with the new
> AWK_BOOL?

I naively expect it to work. :-) If it doesn't work, then we've got a problem.

> It works as expected with awk 5.1.1, and with all earlier versions going back to
> 4.1.3. I've been using it regularly since 2017.

Glad to hear you've been finding it useful.

> But with 5.2.1 I'm seeing anomalous behavior where empty xml elements (e.g.
> <doi></doi>) are evaluating as true even though they string-equal "".
>
> In connection with that empty xml field, gawk 5.2.1 crashes with
>
> gawk: ../mkbib.awk:1142: (FILENAME=buzsaki_2003_EEG_source.xml FNR=173) fatal: internal error: file eval.c, line 1358: unexpected parameter type Node_illegal
>
> If I build with sanitizer, I see concat_exp() doing a double-free of an arg that
> was earlier freed by r_interpret().
>
> I did a whole slew of experiments to try to understand what's happening, but
> it's a large and tricky code base. It seems to have something to do with
> Node_var appearing where usually Node_val is, but I was quickly in over my head.
>
> libgawkextlib and xml.so were both built with gawk-5.2.1 installed. I tried
> with old release code and with the latest git sources -- same result, as above.
>
> If it's useful I can share buzsaki_2003_EEG_source.xml and even the script
> that's crashing on 5.2.1.

Do you have a small test case that reproduces the problem? That would be very helpful for debugging. If you don't have a small test case, then I guess a large test case may be better than nothing.

Regards,
Andy
From: Daniel P. <do...@me...> - 2023-02-20 09:10:56
Is gawkextlib xml expected to work with gawk 5.2.1 (API 3.2), with the new AWK_BOOL?

It works as expected with awk 5.1.1, and with all earlier versions going back to 4.1.3. I've been using it regularly since 2017.

But with 5.2.1 I'm seeing anomalous behavior where empty xml elements (e.g. <doi></doi>) are evaluating as true even though they string-equal "".

In connection with that empty xml field, gawk 5.2.1 crashes with

    gawk: ../mkbib.awk:1142: (FILENAME=buzsaki_2003_EEG_source.xml FNR=173) fatal: internal error: file eval.c, line 1358: unexpected parameter type Node_illegal

If I build with sanitizer, I see concat_exp() doing a double-free of an arg that was earlier freed by r_interpret().

I did a whole slew of experiments to try to understand what's happening, but it's a large and tricky code base. It seems to have something to do with Node_var appearing where usually Node_val is, but I was quickly in over my head.

libgawkextlib and xml.so were both built with gawk-5.2.1 installed. I tried with old release code and with the latest git sources -- same result, as above.

If it's useful I can share buzsaki_2003_EEG_source.xml and even the script that's crashing on 5.2.1.
From: Andrew J. S. <as...@te...> - 2021-06-28 13:20:44
On Mon, Jun 28, 2021 at 09:57:51AM -0300, Vinícius dos Santos Oliveira wrote:
> On Mon, Jun 28, 2021 at 09:51, Andrew J. Schorr
> <as...@te...> wrote:
> > On Mon, Jun 28, 2021 at 09:56:03AM +0200, Manuel Collado wrote:
> > > >Maybe that'd do. Can such a function use the statement next? How'd
> > > >that work, the plugin calling the function triggering next?
> >
> > That's a very interesting question. I imagine that all hell
> > would break loose.
>
> Unfortunately triggering next would be one of the obvious error
> handling approaches just like triggering nextfile.
>
> On file-level error, user triggers nextfile on BEGINFILE.
>
> On record-level error, user triggers next on JSONRECORDERROR.

I really don't know. If you can figure out how to create an API to call an awk function, then maybe "next" will just work. But I don't think it will be terribly easy to implement this idea in the first place. I think you just need to mess around. It will be cool if you can pull it off, but it's nontrivial.

Regards,
Andy
From: Vinícius d. S. O. <vin...@gm...> - 2021-06-28 12:58:49
On Mon, Jun 28, 2021 at 09:51, Andrew J. Schorr <as...@te...> wrote:
> On Mon, Jun 28, 2021 at 09:56:03AM +0200, Manuel Collado wrote:
> > >Maybe that'd do. Can such a function use the statement next? How'd
> > >that work, the plugin calling the function triggering next?
>
> That's a very interesting question. I imagine that all hell
> would break loose.

Unfortunately triggering next would be one of the obvious error handling approaches just like triggering nextfile.

On file-level error, user triggers nextfile on BEGINFILE.

On record-level error, user triggers next on JSONRECORDERROR.

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
From: Andrew J. S. <as...@te...> - 2021-06-28 12:51:46
On Mon, Jun 28, 2021 at 09:56:03AM +0200, Manuel Collado wrote:
> >Maybe that'd do. Can such a function use the statement next? How'd
> >that work, the plugin calling the function triggering next?

That's a very interesting question. I imagine that all hell would break loose.

> The idea was to let the plugin invoke the user function. But I was a
> bit wrong. The current API doesn't provide such facility. It would
> be necessary to extend the API in order to allow an extension to
> call a gawk user function.

That seems like a more plausible API extension to me. The code is probably similar to how indirect function calls are implemented (Op_indirect_func_call), I'd guess.

Regards,
Andy
From: Manuel C. <mco...@gm...> - 2021-06-28 07:56:13
On 28/06/2021 at 9:40, Vinícius dos Santos Oliveira wrote:
> On Mon, Jun 28, 2021 at 04:27, Manuel Collado
> <mco...@gm...> wrote:
>> ...
>> I.e., instead of expecting the user to write some "JSONRECORDERROR"
>> rule, expect the user to write some "JsonRecordError()" function. If
>> there is such function the extension will invoke it. If not, the
>> extension will execute a default error handling action.
>
> Maybe that'd do. Can such a function use the statement next? How'd
> that work, the plugin calling the function triggering next?

The idea was to let the plugin invoke the user function. But I was a bit wrong. The current API doesn't provide such a facility. It would be necessary to extend the API in order to allow an extension to call a gawk user function.

> Thanks for the idea anyway. It does seem simpler (to implement).

You are welcome.

--
Manuel Collado - http://mcollado.z15.es
From: Vinícius d. S. O. <vin...@gm...> - 2021-06-28 07:41:51
On Mon, Jun 28, 2021 at 04:27, Manuel Collado <mco...@gm...> wrote:
> The fact that a rule exist doesn't ensures it will be executed. Use of
> 'next' or 'getline' can prevent the execution of some rules.

That's not a problem. Wrong code can always be written.

> It seems you want to check if the user wants to handle record errors.

Yes.

> It this is want you want, then an alternative is to check the existence of
> a specific user-defined function, instead of a rule.
>
> I.e., instead of expecting the user to write some "JSONRECORDERROR"
> rule, expect the user to write some "JsonRecordError()" function. If
> there is such function the extension will invoke it. If not, the
> extension will execute a default error handling action.

Maybe that'd do. Can such a function use the statement next? How'd that work, the plugin calling the function triggering next?

Thanks for the idea anyway. It does seem simpler (to implement).

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
From: Manuel C. <mco...@gm...> - 2021-06-28 07:27:51
On 26/06/2021 at 0:54, Vinícius dos Santos Oliveira wrote:
> On Fri, Jun 25, 2021 at 10:07, Andrew J. Schorr
> <as...@te...> wrote:
>> I'm not sure that I understand what "rule" means in an awk script.
> ...
> I mean the pattern. On plugin startup, I want to traverse all the
> program rules to check if any pattern in them matches the AST
> "somevar".
>
> I'll code the plugin's behaviour to depend on whether such a rule
> exists in the program. The end result gives an approximation of what's
> offered by BEGINFILE rules to deal with errors on single files, but
> for record-level errors.

The fact that a rule exists doesn't ensure it will be executed. Use of 'next' or 'getline' can prevent the execution of some rules.

It seems you want to check if the user wants to handle record errors. If this is what you want, then an alternative is to check the existence of a specific user-defined function, instead of a rule.

I.e., instead of expecting the user to write some "JSONRECORDERROR" rule, expect the user to write some "JsonRecordError()" function. If there is such a function the extension will invoke it. If not, the extension will execute a default error handling action.

Checking the existence of a function is straightforward. Just look for the corresponding entry in the FUNCTAB array. No need to extend the API.

Is this what you want?

Regards.

--
Manuel Collado - http://mcollado.z15.es
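The FUNCTAB check described above can be tried at the awk level. The sketch below is only an illustration in gawk script code, not extension code: `report_record_error` is a hypothetical stand-in for whatever the plugin would do when it hits a bad record, and the handler is invoked with gawk's indirect-call syntax. How a C extension would perform the same lookup and call is not shown.

```awk
# Optional user-supplied handler; delete it to exercise the default path.
function JsonRecordError(msg) {
    print "handler saw: " msg
}

# Hypothetical dispatcher standing in for the extension's error path.
function report_record_error(msg,    handler) {
    handler = "JsonRecordError"
    if (handler in FUNCTAB)
        @handler(msg)                      # gawk indirect function call
    else {
        # Default action when the user wrote no handler: abort.
        print "fatal: " msg > "/dev/stderr"
        exit 1
    }
}

BEGIN {
    report_record_error("truncated JSON object")
}
```

With the handler defined, running this under gawk prints `handler saw: truncated JSON object`. FUNCTAB and indirect calls are gawk extensions, so the script will not run in POSIX or traditional mode.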
From: Vinícius d. S. O. <vin...@gm...> - 2021-06-25 22:55:37
On Fri, Jun 25, 2021 at 10:07, Andrew J. Schorr <as...@te...> wrote:
> I'm not sure that I understand what "rule" means in an awk script.
> From the POSIX spec:
>
>    https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
>
>    An awk program is composed of pairs of the form:
>
>       pattern { action }
>
> So what's a "rule"? In the source code, it appears that the word "rule"
> is used to refer to the combination of a pattern and an action. So
> is the idea that this API hook would depend on detecting whether a combined
> pattern & action is present?

I mean the pattern. On plugin startup, I want to traverse all the program rules to check if any pattern in them matches the AST "somevar".

I'll code the plugin's behaviour to depend on whether such a rule exists in the program. The end result gives an approximation of what's offered by BEGINFILE rules to deal with errors on single files, but for record-level errors.

> The implementation of BEGINFILE and ENDFILE requires a lot of
> special support in the interpreter. It seems to me that implementing
> this sort of thing requires diving into the guts of how a program
> is compiled in awkgram.y, which is quite frankly an area of the
> code that I've never messed with. In main.c, the program is parsed
> into code_block:
>
>    INSTRUCTION *code_block = NULL;
>    ...
>         /* Read in the program */
>         if (parse_program(& code_block, false) != 0 || dash_v_errs > 0)
>                 exit(EXIT_FAILURE);
>
> And then it runs the program by calling interpret(code_block).
>
> So any info about the running program would have to be extracted by
> parsing the INSTRUCTION data pointed to by code_block, as far
> as I can tell.

Thanks for the pointer. Now I know where to start.

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
From: Andrew J. S. <as...@te...> - 2021-06-25 13:07:21
Hi Jürgen,

On Fri, Jun 25, 2021 at 10:09:54AM +0200, Jürgen Kahrs via Gawkextlib-users wrote:
> I think he wants to detect the presence (and not the value) of a
> certain AWK rule at run-time. The term "rule" here seems to mean
> a keyword that exists after being declared by the plugin.

I'm not sure that I understand what "rule" means in an awk script. From the POSIX spec:

   https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html

   An awk program is composed of pairs of the form:

      pattern { action }

So what's a "rule"? In the source code, it appears that the word "rule" is used to refer to the combination of a pattern and an action. So is the idea that this API hook would depend on detecting whether a combined pattern & action is present?

> I agree with you (Andrew) that the script example that he gave
> should be accompanied by a more detailed implementation of
> the plugin. Maybe it would suffice to have a way of looking up
> all the keywords that are defined after loading a plugin.

The implementation of BEGINFILE and ENDFILE requires a lot of special support in the interpreter. It seems to me that implementing this sort of thing requires diving into the guts of how a program is compiled in awkgram.y, which is quite frankly an area of the code that I've never messed with. In main.c, the program is parsed into code_block:

   INSTRUCTION *code_block = NULL;
   ...
        /* Read in the program */
        if (parse_program(& code_block, false) != 0 || dash_v_errs > 0)
                exit(EXIT_FAILURE);

And then it runs the program by calling interpret(code_block).

So any info about the running program would have to be extracted by parsing the INSTRUCTION data pointed to by code_block, as far as I can tell.

Regards,
Andy
From: <jue...@go...> - 2021-06-25 08:10:06
Hello Andrew,

> Hi,
>
> On Thu, Jun 24, 2021 at 12:19:17PM -0300, Vinícius dos Santos Oliveira wrote:
>> Would it be possible to add an API to gawk that allows the plugin author to
>> check whether there's a certain rule in the AWK script?
> ...
>
> I don't have a strong view on this, but I'm also not certain that I understand
> precisely what you want. If you were to develop and share a gawk patch to
> implement your desired behavior, then I think we would all understand better
> and could offer an opinion on whether the patch might be accepted. You are of
> course always free to build gawk with your own patches; that's the beauty of
> free software.
>
> Regards,
> Andy

I think he wants to detect the presence (and not the value) of a certain AWK rule at run-time. The term "rule" here seems to mean a keyword that exists after being declared by the plugin.

I agree with you (Andrew) that the script example that he gave should be accompanied by a more detailed implementation of the plugin. Maybe it would suffice to have a way of looking up all the keywords that are defined after loading a plugin.
From: Andrew J. S. <as...@te...> - 2021-06-25 01:00:29
Hi,

On Thu, Jun 24, 2021 at 12:19:17PM -0300, Vinícius dos Santos Oliveira wrote:
> Would it be possible to add an API to gawk that allows the plugin author to
> check whether there's a certain rule in the AWK script?
...

I don't have a strong view on this, but I'm also not certain that I understand precisely what you want. If you were to develop and share a gawk patch to implement your desired behavior, then I think we would all understand better and could offer an opinion on whether the patch might be accepted. You are of course always free to build gawk with your own patches; that's the beauty of free software.

Regards,
Andy
From: Vinícius d. S. O. <vin...@gm...> - 2021-06-24 15:20:10
Would it be possible to add an API to gawk that allows the plugin author to check whether there's a certain rule in the AWK script?

The use-case is that I want to offer something similar to BEGINFILE. If the user doesn't write a rule to handle some specific use case, the program will abort when/if it happens. If the user does write a rule for it then the program doesn't abort and just executes the user's rule.

I just need to check a very simple expression, "read var X". For instance (the first rule):

    JSONRECORDERROR { next }
    .. { ... }
    .. { ... }
    .. { ... }

No other types of AWK expressions need to be checked against. I don't need to know if the user wrote the rule "JSONRECORDERROR && somethingelse". I only need to know if he wrote the rule "JSONRECORDERROR" exactly (no other variation).

There's no need to "merge multiple rules" in the likes of BEGINFILE either. Just consume the AWK program as it normally would and nothing else. All I want is to check on plugin startup if the user wrote a certain rule of a very simple pattern. The abort behaviour is something I write on my own plugin and shouldn't be part of this API addition.

How does that sound? Too fancy? Too weirdo? Too far-fetched? Or is it reasonable after all?

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
From: Galen T. <gta...@ya...> - 2021-06-07 15:16:08
A little more flexible, perhaps, than using an environment variable, would be to use a gawk command line option to indicate how your extension should handle errors. If you're not yet familiar with how to do so, see the documentation for getopt() at Getopt Function (The GNU Awk User’s Guide).

But I don't know if you can do this easily from an extension.

Galen

On Monday, June 7, 2021, 9:34:38 AM EDT, Vinícius dos Santos Oliveira <vin...@gm...> wrote:
> On Mon, Jun 7, 2021 at 10:23, Andrew J. Schorr <as...@te...> wrote:
> > You are certainly free to use global variables to configure the
> > behavior of your library. Many extensions do this; for example,
> > the XML extension uses the XMLMODE variable. But if you expect gawk
> > to implement this policy for you, then I don't know how that would
> > work. It would be the extension's responsibility to implement this type
> > of behavior. How could gawk enforce it for you? To do that, we'd have
> > to change the API, which seems unlikely.
>
> Okay. Thanks. I was just uneasy given I don't have much actual experience with AWK, so I wanted to check whether a current or similar idiom/convention for this was already in place as to not create new friction between extensions. I'll introduce a new var on my plugin and roll with it.
>
> --
> Vinícius dos Santos Oliveira
> https://vinipsmaker.github.io/
From: Vinícius d. S. O. <vin...@gm...> - 2021-06-07 13:34:32
On Mon, Jun 7, 2021 at 10:23, Andrew J. Schorr <as...@te...> wrote:
> You are certainly free to use global variables to configure the
> behavior of your library. Many extensions do this; for example,
> the XML extension uses the XMLMODE variable. But if you expect gawk
> to implement this policy for you, then I don't know how that would
> work. It would be the extension's responsibility to implement this type
> of behavior. How could gawk enforce it for you? To do that, we'd have
> to change the API, which seems unlikely.

Okay. Thanks. I was just uneasy given I don't have much actual experience with AWK, so I wanted to check whether a current or similar idiom/convention for this was already in place, so as not to create new friction between extensions. I'll introduce a new var on my plugin and roll with it.

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
From: Andrew J. S. <as...@te...> - 2021-06-07 13:23:05
|
Hi,

On Mon, Jun 07, 2021 at 10:05:46AM -0300, Vinícius dos Santos Oliveira wrote:
> Just an idea. Maybe the user should fill some field in PROCINFO informing
> whether he wants the input parser to silently discard broken records, to print
> a warning before discarding the broken record, or to allow the user to consume
> and handle the error explicitly.

You are certainly free to use global variables to configure the behavior of your library. Many extensions do this; for example, the XML extension uses the XMLMODE variable. But if you expect gawk to implement this policy for you, then I don't know how that would work. It would be the extension's responsibility to implement this type of behavior. How could gawk enforce it for you? To do that, we'd have to change the API, which seems unlikely.

Regards,
Andy |
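The global-variable approach described above can be sketched in plain awk, wrapped in a shell function, rather than as a real extension. This is only an illustration under stated assumptions: ONERROR and its three values (discard, warn, handle) are hypothetical names, not something gawk or any existing extension defines, and a "broken" record is assumed to be any line that does not have exactly three fields.

```shell
# Hypothetical sketch: ONERROR is an illustrative variable name, not part of
# gawk or of any gawkextlib extension.
# Usage: ... | parse_with_policy discard|warn|handle
parse_with_policy() {
    awk -v ONERROR="$1" '
    # Assumption for this sketch: a record is "broken" unless it has 3 fields.
    NF != 3 {
        if (ONERROR == "warn")
            # Warn on stderr, then drop the record.
            print "warning: record " NR ": expected 3 fields, got " NF > "/dev/stderr"
        else if (ONERROR == "handle")
            # Pass the broken record on so the caller can deal with it.
            print "BAD " NR ": " $0
        # Any other value (e.g. "discard") drops the record silently.
        next
    }
    { print "ok " $1 }
    '
}

# Example run: the second record is broken and triggers a warning on stderr.
printf 'a b c\nbroken\nd e f\n' | parse_with_policy warn
```

In every mode the stream keeps going after a broken record; only what happens to that one record changes. An extension could read such a variable in the same spirit as the XML extension reads XMLMODE, but how it does so is up to the extension.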
From: Vinícius d. S. O. <vin...@gm...> - 2021-06-07 13:13:13
|
On Mon, Jun 7, 2021 at 00:51, Manuel Collado <mco...@gm...> wrote:
> The relevant sections of the gawk manual are:
>
> 17.4.6 Printing Messages
> 17.4.7 Updating ERRNO

Currently that's what I do. I just print a warning using the warning() function.

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/ |
From: Vinícius d. S. O. <vin...@gm...> - 2021-06-07 13:06:18
|
On Sun, Jun 6, 2021 at 23:15, Andrew J. Schorr <as...@te...> wrote:
> Are you asking how an input parser get_record function should behave?

Not really. Setting the extension side aside, what would be the pure AWK way to handle parsing errors on individual records? I only started coding in AWK last year, so I'm not very familiar with AWK idioms and the like.

[...]

> Do you have something else in mind besides the above behavior?

Just an idea. Maybe the user should fill some field in PROCINFO informing whether he wants the input parser to silently discard broken records, to print a warning before discarding the broken record, or to allow the user to consume and handle the error explicitly.

> In general, if an input parser cannot read
> a record, then I think it has to barf. What else could make sense?

For some streams it's possible to ignore just the broken record and proceed to parsing the rest of the stream. It might not make sense for gawkextlib's current plugins, but it does make sense for the plugin I'm writing right now.

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/ |
From: Manuel C. <mco...@gm...> - 2021-06-07 08:53:50
|
Resent. The previous attempt to respond failed.

On June 4, 2021 at 12:25, Vinícius dos Santos Oliveira wrote:
> Hi,
>
> I've been playing with GAWK extension API again and I'm wondering what's
> the proper way to report errors on individual records.
>
> If there was a problem to extract a single record on the stream, how
> should I report this to the user?

The relevant sections of the gawk manual are:

17.4.6 Printing Messages
17.4.7 Updating ERRNO

> I only know two GAWK extensions that play with input modes. XML where
> failing the whole stream is fine, and CSV where I just don't see errors
> on single records. Therefore I don't know where to look for current
> practice.

The gawk-csv extension reports syntax errors on a per-record basis. It can even give several error messages for the same input record.

$ echo 'a"b" ,"c,"x' | gawk -lcsv -v CSVMODE=1 '{1}'
gawk: error: csvinput: Unexpected quote a" ^
gawk: error: csvinput: Unexpected quote a"b" ^
gawk: error: csvinput: Unexpected character a"b" ,"c,"x ^

Regards.
--
Manuel Collado - http://mcollado.z15.es |
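To illustrate what "several error messages for the same input record" can look like without the extension, here is a rough plain-awk sketch wrapped in a shell function. It is not the gawk-csv algorithm and does not reproduce its messages: check_quotes is a hypothetical name, fields are split naively on commas (so a comma inside quotes is not handled), and a field counts as bad if it contains a quote but is not entirely wrapped in one pair of quotes.

```shell
# Hypothetical sketch, NOT how gawk-csv works: naive per-field quote check.
check_quotes() {
    awk -F, '
    {
        for (i = 1; i <= NF; i++) {
            f = $i
            # Accept fields with no quote at all, or fully wrapped in quotes.
            if (f ~ /"/ && f !~ /^"[^"]*"$/) {
                # One message per offending field, so a single record
                # can produce several messages; processing continues.
                printf "error: record %d, field %d: unexpected quote in <%s>\n", NR, i, f > "/dev/stderr"
                bad++
            }
        }
    }
    END { printf "%d bad field(s)\n", bad }
    '
}

# Same input record as in the message above.
printf '%s\n' 'a"b" ,"c,"x' | check_quotes
```

The point of the sketch is only the reporting pattern: each problem is written to stderr as it is found, the record is not fatal to the stream, and a summary is available at the end.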
From: Manuel C. <mco...@gm...> - 2021-06-07 03:51:05
|
On June 4, 2021 at 12:25, Vinícius dos Santos Oliveira wrote:
> Hi,
>
> I've been playing with GAWK extension API again and I'm wondering what's
> the proper way to report errors on individual records.
>
> If there was a problem to extract a single record on the stream, how
> should I report this to the user?

The relevant sections of the gawk manual are:

17.4.6 Printing Messages
17.4.7 Updating ERRNO

> I only know two GAWK extensions that play with input modes. XML where
> failing the whole stream is fine, and CSV where I just don't see errors
> on single records. Therefore I don't know where to look for current
> practice.

The gawk-csv extension reports syntax errors on a per-record basis. It can even give several error messages for the same input record.

$ echo 'a"b" ,"c,"x' | gawk -lcsv -v CSVMODE=1 '{1}'
gawk: error: csvinput: Unexpected quote a" ^
gawk: error: csvinput: Unexpected quote a"b" ^
gawk: error: csvinput: Unexpected character a"b" ,"c,"x ^

Regards.
--
Manuel Collado - http://mcollado.z15.es |