From: Vinícius d. S. O. <vin...@gm...> - 2020-09-08 19:31:38
On Tue, Sep 8, 2020 at 16:20, Andrew J. Schorr <as...@te...> wrote:

> So we're stuck at the same place.

Please run:

    $ git submodule update --init

> I think this project needs a README file that explains how to install
> and use it.

This is true. Meson is similar to cmake/autotools. After it runs, it
generates a build.ninja file that can be used to run ninja and compile the
project. So typically these are the steps you have to run:

    $ mkdir build
    $ cd build
    $ meson ..
    $ ninja

This will generate a jsonstream.so file in the build dir that can be loaded
in gawk through -l jsonstream (if you also set the AWKLIBPATH env var).
I'll be renaming the plug-in to tabjson later on.

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
From: Vinícius d. S. O. <vin...@gm...> - 2020-09-08 19:26:00
On Tue, Sep 8, 2020 at 15:58, Jürgen Kahrs via Gawkextlib-users <gaw...@li...> wrote:

> Following your lead, I also tried a build (easier than I thought)
> and ran into the same problem. I see the following reasons for the
> build failure:
> 1. the project requires boost 1.69 and I have 1.66 only.
> 2. package missing: libboost_headers1_66_0-devel

Could you edit the meson.build file and change 1.69 to 1.66 and test if it
works? If the project happens to compile with this old Boost version, I'll
gladly change this definition in the main repository too.

> 3. package missing: re2c

I use re2c to parse the JSON Pointer expressions. re2c is actually a quite
popular (and old) project, so it shouldn't be hard to find a package in
your distro. re2c is a code generator, so it's actually a binary I depend
on, not header/development files. The binary will read a .ypp file and spit
C++ source code out of it.

> 4. directory missing: "mkdir 3rd/trial.protocol/include"

Please run:

    $ git submodule update --init

It'll download the submodule dependencies (just one: the JSON library that
I use in the project).

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
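Vinícius mentions above that re2c parses the JSON Pointer expressions used in JPAT. For readers unfamiliar with JSON Pointer (RFC 6901), its resolution semantics can be sketched in a few lines of Python; this is only an illustrative model, not the plug-in's actual re2c/C++ implementation:

```python
import json

def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer against a parsed JSON document."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    value = doc
    for token in pointer.split("/")[1:]:
        # Unescape per RFC 6901: "~1" -> "/" first, then "~0" -> "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(value, list):
            value = value[int(token)]  # array indices are 0-based
        else:
            value = value[token]
    return value

record = json.loads('{"name": "Kathy", "pay_rate": 4.00, "hours_worked": 10}')
print(resolve_pointer(record, "/name"))          # -> Kathy
print(resolve_pointer(record, "/hours_worked"))  # -> 10
```

This also shows why the pointer syntax is 0-indexed for arrays, a point the announcement lists under BUGS as clashing with AWK's 1-indexed mindset.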
From: Andrew J. S. <as...@te...> - 2020-09-08 19:20:15
My bleeding-edge Fedora 32 system does have boost 1.69, but I had failed to
install boost-devel and re2c. With those, I now get:

bash$ meson builddir
The Meson build system
Version: 0.55.1
Source dir: /home/users/schorr/gawk-tabjson
Build dir: /home/users/schorr/gawk-tabjson/builddir
Build type: native build
Project name: gawk-jsonstream
Project version: undefined
C++ compiler for the host machine: c++ (gcc 10.2.1 "c++ (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1)")
C++ linker for the host machine: c++ ld.bfd 2.34-4
Host machine cpu family: x86_64
Host machine cpu: x86_64
Found pkg-config: /usr/bin/pkg-config (1.6.3)
Run-time dependency Boost found: YES 1.69.0 (/usr)
Program re2c found: YES
meson.build:23:0: ERROR: Include dir 3rd/trial.protocol/include does not exist.

A full log can be found at /home/users/schorr/gawk-tabjson/builddir/meson-logs/meson-log.txt

So we're stuck at the same place. I think this project needs a README file
that explains how to install and use it.

Regards,
Andy

On Tue, Sep 08, 2020 at 08:57:58PM +0200, Jürgen Kahrs via Gawkextlib-users wrote:
> Following your lead, I also tried a build (easier than I thought)
> and ran into the same problem. I see the following reasons for the
> build failure:
> 1. the project requires boost 1.69 and I have 1.66 only.
> 2. package missing: libboost_headers1_66_0-devel
> 3. package missing: re2c
> 4. directory missing: "mkdir 3rd/trial.protocol/include"
>
> [...]

--
Andrew Schorr                      e-mail: as...@te...
Telemetry Investments, L.L.C.      phone: 917-305-1748
152 W 36th St, #402                fax: 212-425-5550
New York, NY 10018-8765
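The JPAT/J mechanism announced in this thread can be modeled outside gawk. The sketch below is a hypothetical Python rendering of the described semantics (fill J[i] from the JSON Pointer in JPAT[i] for each record of a newline-delimited stream), not the extension's actual code:

```python
import json

def tabulate(ndjson_text, jpat):
    """Model of the plug-in's JPAT/J semantics: for each record in a
    newline-delimited JSON stream, fill j[i] from the pointer jpat[i]."""
    rows = []
    for line in ndjson_text.strip().splitlines():
        doc = json.loads(line)
        j = {}
        for i, pointer in jpat.items():
            value = doc
            for token in pointer.split("/")[1:]:
                # RFC 6901 unescaping: "~1" -> "/", then "~0" -> "~"
                token = token.replace("~1", "/").replace("~0", "~")
                value = value[int(token)] if isinstance(value, list) else value[token]
            j[i] = value
        rows.append(j)
    return rows

stream = '''{"name": "Kathy", "pay_rate": 4.00, "hours_worked": 10}
{"name": "Mark", "pay_rate": 5.00, "hours_worked": 20}'''

jpat = {1: "/name", 2: "/pay_rate", 3: "/hours_worked"}
for j in tabulate(stream, jpat):
    if j[3] > 0:                  # same predicate as the AWK rule: J[3] > 0
        print(j[1], j[2] * j[3])  # -> "Kathy 40.0", then "Mark 100.0"
```

The AWK version differs only in number formatting (gawk's print would emit 40 rather than 40.0, subject to OFMT/CONVFMT).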
From: Vinícius d. S. O. <vin...@gm...> - 2020-09-08 19:20:12
On Tue, Sep 8, 2020 at 15:42, Andrew J. Schorr <as...@te...> wrote:

> But I see that "jsonstream" is used in some places, but the
> namespace was set to "json". Might it make sense to use
> "jsonstream" for the namespace to avoid colliding with the
> existing gawk "json" library?

I actually don't like the name jsonstream. I'll update the plug-in name to
tabjson later. The repo name already reflects the current name, but I need
to update the source code too.

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
From: Vinícius d. S. O. <vin...@gm...> - 2020-09-08 19:18:12
On Tue, Sep 8, 2020 at 14:11, Andrew J. Schorr <as...@te...> wrote:

> Hi,

Hi Andrew,

> Are you aware that there's an existing gawk json extension
> for converting between JSON and gawk arrays? It's very simple
> and straightforward:
> http://gawkextlib.sourceforge.net/json/json.html#:~:text=Summary,gawk%20associative%20arrays%20and%20JSON

Yes, I'm aware of this plug-in.

> I wonder whether you might consider using a different namespace
> to avoid colliding with the existing json extension.

I could change the namespace, but in truth I like the json namespace. If I
can think of another identifier that I like and is short, I'll update it
right away.

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
From: Vinícius d. S. O. <vin...@gm...> - 2020-09-08 19:13:30
On Tue, Sep 8, 2020 at 13:53, Jürgen Kahrs via Gawkextlib-users <gaw...@li...> wrote:

> Hello Vinícius,

Hello Jürgen, thanks for taking the time to look at the project.

> After looking at your web page, I got the impression that your
> project is not an "extension" to GAWK in the established technical
> sense of GAWK extensions.
> https://www.gnu.org/software/gawk/manual/html_node/Extension-Intro.html
> Your source code looks rather like a proposal for a change to the
> source code of GAWK.

Sorry about the unfamiliar directory structure. The project is actually an
extension. I compile it and the process generates a jsonstream.so file in
the build directory that can be loaded with gawk via the -l command line
flag if the AWKLIBPATH env var is properly set.

> Did you consider packaging your source code as an extension and
> supplying the doc with the packaged extension ?

The code is already an extension. I'll write a manpage later.

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
From: <jue...@go...> - 2020-09-08 18:58:14
Following your lead, I also tried a build (easier than I thought) and ran
into the same problem. I see the following reasons for the build failure:

1. the project requires boost 1.69 and I have 1.66 only.
2. package missing: libboost_headers1_66_0-devel
3. package missing: re2c
4. directory missing: "mkdir 3rd/trial.protocol/include"

Now I run out of ideas:

meson builddir
The Meson build system
Version: 0.46.0
Source dir: /home/kahrs/work/gawk/jsonstream/gawk-tabjson
Build dir: /home/kahrs/work/gawk/jsonstream/gawk-tabjson/builddir
Build type: native build
Project name: gawk-jsonstream
Native C++ compiler: c++ (gcc 7.5.0 "c++ (SUSE Linux) 7.5.0")
Build machine cpu family: x86_64
Build machine cpu: x86_64
Dependency Boost () found: YES 1.66
Program re2c found: YES (/usr/bin/re2c)
Build targets in project: 1
Found ninja-1.8.2 at /usr/bin/ninja

kahrs@linux-q7qy:~/work/gawk/jsonstream/gawk-tabjson> cd builddir/
kahrs@linux-q7qy:~/work/gawk/jsonstream/gawk-tabjson/builddir> meson compile
Error during basic setup:

Neither directory contains a build file meson.build.

kahrs@linux-q7qy:~/work/gawk/jsonstream/gawk-tabjson/builddir> ls -l
insgesamt 24
-rw-r--r-- 1 kahrs users 5062  8. Sep 20:55 build.ninja
drwxr-xr-x 2 kahrs users 4096  8. Sep 20:55 compile
-rw-r--r-- 1 kahrs users 1533  8. Sep 20:55 compile_commands.json
lrwxrwxrwx 1 kahrs users   15  8. Sep 20:55 jsonstream.so -> jsonstream.so.0
lrwxrwxrwx 1 kahrs users   19  8. Sep 20:55 jsonstream.so.0 -> jsonstream.so.0.1.0
drwxr-xr-x 2 kahrs users 4096  8. Sep 20:55 meson-logs
drwxr-xr-x 2 kahrs users 4096  8. Sep 20:55 meson-private

Perhaps some files were not put into the repo.

> I installed meson and boost on my Fedora 32 system, but I get an error
> when I run "meson builddir":
>
> sh-5.0$ meson builddir
> The Meson build system
> Version: 0.55.1
> Source dir: /home/users/schorr/gawk-tabjson
> Build dir: /home/users/schorr/gawk-tabjson/builddir
> Build type: native build
> Project name: gawk-jsonstream
> Project version: undefined
> C++ compiler for the host machine: c++ (gcc 10.2.1 "c++ (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1)")
> C++ linker for the host machine: c++ ld.bfd 2.34-4
> Host machine cpu family: x86_64
> Host machine cpu: x86_64
> Found pkg-config: /usr/bin/pkg-config (1.6.3)
> Run-time dependency Boost found: NO
>
> meson.build:3:0: ERROR: Dependency "boost" not found
>
> A full log can be found at /home/users/schorr/gawk-tabjson/builddir/meson-logs/meson-log.txt
>
> So I'm stuck.
>
> But I see that "jsonstream" is used in some places, but the
> namespace was set to "json". Might it make sense to use
> "jsonstream" for the namespace to avoid colliding with the
> existing gawk "json" library?
>
> Regards,
> Andy
>
> On Tue, Sep 08, 2020 at 01:11:30PM -0400, Andrew J. Schorr wrote:
>> Hi,
>>
>> If you take a look at the source code, it looks to me as if it is
>> packaged as an extension by calling register_input_parser and
>> adding a json::output function. You can see the code like so:
>>
>>     git clone https://gitlab.com/vinipsmaker/gawk-tabjson.git
>>
>> I confess that I have no experience with the Meson build
>> system, so I haven't tried building it.
>>
>> I agree that this looks like an interesting feature set.
>> Are you aware that there's an existing gawk json extension
>> for converting between JSON and gawk arrays? It's very simple
>> and straightforward:
>> http://gawkextlib.sourceforge.net/json/json.html#:~:text=Summary,gawk%20associative%20arrays%20and%20JSON
>>
>> I wonder whether you might consider using a different namespace
>> to avoid colliding with the existing json extension.
>>
>> Regards,
>> Andy
>>
>> On Tue, Sep 08, 2020 at 06:53:00PM +0200, Jürgen Kahrs via Gawkextlib-users wrote:
>>> Hello Vinícius,
>>> after reading your example scripts, your project reminded me of
>>> the XML extension that exists for GNU Awk. In both cases AWK
>>> supplies the syntax for the scripting and the extension supplies
>>> the access mechanism to the data files (JSON / XML).
>>> In cases like these, AWK's syntax is really much more readable
>>> than the syntax of other tools that were created especially for
>>> one kind of new data format and nothing else.
>>>
>>> After looking at your web page, I got the impression that your
>>> project is not an "extension" to GAWK in the established technical
>>> sense of GAWK extensions.
>>> https://www.gnu.org/software/gawk/manual/html_node/Extension-Intro.html
>>> Your source code looks rather like a proposal for a change to the
>>> source code of GAWK. This idea (building access to a special file
>>> format into GAWK itself) has already been discussed several times.
>>> For example, reading CSV, XML, PNG and ZIP files could have been built
>>> into GAWK's released source code. Such ideas have been rejected
>>> for good reasons. Most importantly, GAWK would grow too fat for
>>> small embedded systems, would depend on too many external libraries,
>>> and the workload of source code management, testing and documentation
>>> would all be put onto the shoulders of the GAWK maintainer.
>>> Did you know that mawk is still so popular mostly because it is not as
>>> "fat and slow as GAWK has become over the years"?
>>> (sorry Arnold, I am exaggerating a bit to make the point)
>>>
>>> Now you see the way my argument goes: keep file format access
>>> outside of the original GAWK source code and put it into an extension.
>>> Andrew Schorr has done the same for the extension that reads XML files.
>>> Did you consider packaging your source code as an extension and
>>> supplying the doc with the packaged extension?
>>>
>>> Thanks for sending a notice to us.
>>>
>>> Juergen Kahrs
>>>
>>> Hi,
>>>
>>> I'd like to announce a new JSON plug-in that I've been working on over
>>> the past days to consume JSON streams.
>>>
>>> This new project is inspired by gawk's FPAT variable. As you all know,
>>> FS allows you to specify how to find the field separator, and FPAT
>>> allows you to specify how to find the field contents. That's how I
>>> designed the plug-in. There is a JPAT variable that allows you to
>>> specify how to find the contents. It works just like FPAT, but for JSON.
>>>
>>> JPAT is an array and each index tells the plug-in how to fill the
>>> same index in the J array. The values of JPAT are JSON Pointers[1].
>>> For instance, given the following stream:
>>>
>>> {"name": "Beth", "pay_rate": 4.00, "hours_worked": 0}
>>> {"name": "Dan", "pay_rate": 3.75, "hours_worked": 0}
>>> {"name": "Kathy", "pay_rate": 4.00, "hours_worked": 10}
>>> {"name": "Mark", "pay_rate": 5.00, "hours_worked": 20}
>>> {"name": "Mary", "pay_rate": 5.50, "hours_worked": 22}
>>> {"name": "Susie", "pay_rate": 4.25, "hours_worked": 18}
>>>
>>> And the following AWK program:
>>>
>>> BEGIN {
>>>     JPAT[1] = "/name"
>>>     JPAT[2] = "/pay_rate"
>>>     JPAT[3] = "/hours_worked"
>>> }
>>>
>>> J[3] > 0 { print J[1], J[2] * J[3] }
>>>
>>> The output will be:
>>>
>>> Kathy 40
>>> Mark 100
>>> Mary 121
>>> Susie 76.5
>>>
>>> So the idea is to give AWK a tabular view of the data. AWK is great at
>>> processing tabular data, so I think this fits well with the rest of
>>> AWK's design.
>>>
>>> The tool is robust and the fields inside the JSON can come in any
>>> order; the order in which you specified them doesn't matter. Arbitrary
>>> nesting is supported. The JSON is parsed in a one-pass fashion and only
>>> the fields you requested are decoded. No DOM tree is saved at any time.
>>>
>>> You can update the JSON by calling the function json::output(), which
>>> returns a new serialized JSON string. Creating the new JSON is cheap,
>>> as its only job is to concatenate strings from the offsets saved during
>>> the parsing stage. If a value was originally boolean and you set J[n]
>>> to 1 or 0, the field will stay a boolean (but any other numeric value
>>> will coerce the type to a double). Values deleted from J are converted
>>> to the null JSON value.
>>>
>>> You can also use the unwind operation to flatten an array in the
>>> document. For instance, given the following data set:
>>>
>>> { "_id" : "jane", "joined" : "2011-03-02", "likes" : ["golf", "racquetball"] }
>>> { "_id" : "joe", "joined" : "2012-07-02", "likes" : ["tennis", "golf", "swimming"] }
>>>
>>> And the following AWK program:
>>>
>>> BEGIN {
>>>     JPAT[1] = "/likes"
>>>     JUNWIND = 1
>>> }
>>>
>>> { print NR, json::output() }
>>>
>>> The output will be:
>>>
>>> 1 { "_id" : "jane", "joined" : "2011-03-02", "likes" : "golf" }
>>> 2 { "_id" : "jane", "joined" : "2011-03-02", "likes" : "racquetball" }
>>> 3 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "tennis" }
>>> 4 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "golf" }
>>> 5 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "swimming" }
>>>
>>> And this other AWK program:
>>>
>>> BEGIN {
>>>     JPAT[1] = "/likes"
>>>     JUNWIND = 1
>>> }
>>>
>>> { likes[J[1]] += 1 }
>>>
>>> END {
>>>     PROCINFO["sorted_in"] = "@val_num_desc"
>>>     for (like in likes) {
>>>         print like, likes[like]
>>>     }
>>> }
>>>
>>> Will output:
>>>
>>> golf 2
>>> tennis 1
>>> swimming 1
>>> racquetball 1
>>>
>>> The unwind operation idea comes from the MongoDB aggregation framework,
>>> which has an $unwind operator. As each document is synthesized for an
>>> array element, the variable JUIND will hold the index of that element
>>> in the array. If the field wasn't an array and the document is just
>>> being passed through, JUIND has the value 0. If JUIND is different from
>>> 0, JULENGTH is also set to the array length. These two variables let
>>> you delimit array elements that originate in the same document.
>>>
>>> The homepage of the project is: https://gitlab.com/vinipsmaker/gawk-tabjson
>>>
>>> I'd like to know how strange this idea is for AWK. Do you believe it is
>>> too far off? Or do you think the AWK spirit of tabular data processing
>>> has been properly absorbed here? Which one (it can't be both)?
>>>
>>> BUGS
>>>
>>> - The README is pretty outdated. This very email is more accurate than
>>>   the README page. That's on me. I like to document how I'd like to use
>>>   a tool before I implement it, so implementation came after the README.
>>> - Right now, I don't like the name json::output(). If you know a better
>>>   name, please let me know.
>>> - The JSON Pointer syntax is 0-indexed as opposed to AWK's 1-indexed
>>>   mindset. I did it for compatibility with JSON Pointer expressions,
>>>   but I believe this decision was a mistake and I'll be changing it
>>>   later.
>>> - OFMT isn't respected yet.
>>> - I don't know how to report invalid JSONs from the stream. I don't
>>>   know what'd be the AWK way here.
>>> - Right now the plug-in always takes over streams. I'm planning to
>>>   check the value of JPAT to decide whether the plug-in's getline
>>>   should be activated, and to also provide a split()-like function so
>>>   you could use the same functionality on a per-record basis. This idea
>>>   requires thought and design on how to best proceed.
>>> - You have to pass the flag --characters-as-bytes (or the shorthand -b)
>>>   when invoking gawk.
>>> - Only newline-delimited JSONs are supported right now, but it's
>>>   possible to improve the plug-in to automatically recognize RS.
>>> - The plug-in can't insert new fields into the documents right now.
>>>   This operation is also possible with the chosen approach, but I
>>>   haven't spent the time here.
>>> - A handle for DOM-parsed subtrees could be created, but this feature
>>>   would also require thought and design.
>>> - Right now I don't use mmap() when available. mmap() should improve
>>>   plug-in performance (although it has already beaten jq's performance
>>>   in my tests).
>>> - There isn't MPFR support yet.
>>>
>>> And I hope I didn't get you too excited, as the time I have available
>>> to spend on this project this year is nearing its end and further work
>>> will have to be postponed.
>>>
>>> [1] https://tools.ietf.org/html/rfc6901
>>>
>>> --
>>> Vinícius dos Santos Oliveira
>>> https://vinipsmaker.github.io/
>>>
>>> _______________________________________________
>>> Gawkextlib-users mailing list
>>> Gaw...@li...
>>> https://lists.sourceforge.net/lists/listinfo/gawkextlib-users
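The unwind operation quoted above can likewise be modeled outside gawk. The sketch below is a hypothetical Python rendering of the described semantics (one synthesized document per array element, with JUIND/JULENGTH analogues), using a plain object key rather than a full JSON Pointer for brevity; the 1-based element index is an assumption, since the announcement only states that 0 means pass-through:

```python
import json

def unwind(ndjson_text, key):
    """For each record, if doc[key] is an array, emit one synthesized
    document per element; otherwise pass the document through unchanged.
    Returns (document, juind, julength) triples: juind is the assumed
    1-based element index (0 for pass-through) and julength the array
    length (None for pass-through), mirroring JUIND/JULENGTH."""
    out = []
    for line in ndjson_text.strip().splitlines():
        doc = json.loads(line)
        field = doc.get(key)
        if isinstance(field, list):
            for idx, element in enumerate(field, start=1):
                synthesized = dict(doc)
                synthesized[key] = element  # replace array with one element
                out.append((synthesized, idx, len(field)))
        else:
            out.append((doc, 0, None))
    return out

stream = '''{"_id": "jane", "joined": "2011-03-02", "likes": ["golf", "racquetball"]}
{"_id": "joe", "joined": "2012-07-02", "likes": ["tennis", "golf", "swimming"]}'''

for nr, (doc, juind, julength) in enumerate(unwind(stream, "likes"), start=1):
    print(nr, json.dumps(doc))  # five synthesized records, as in the quoted example
```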
From: Andrew J. S. <as...@te...> - 2020-09-08 18:42:03
|
I installed meson and boost on my Fedora 32 system, but I get an error when I run "meson builddir":

sh-5.0$ meson builddir
The Meson build system
Version: 0.55.1
Source dir: /home/users/schorr/gawk-tabjson
Build dir: /home/users/schorr/gawk-tabjson/builddir
Build type: native build
Project name: gawk-jsonstream
Project version: undefined
C++ compiler for the host machine: c++ (gcc 10.2.1 "c++ (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1)")
C++ linker for the host machine: c++ ld.bfd 2.34-4
Host machine cpu family: x86_64
Host machine cpu: x86_64
Found pkg-config: /usr/bin/pkg-config (1.6.3)
Run-time dependency Boost found: NO
meson.build:3:0: ERROR: Dependency "boost" not found
A full log can be found at /home/users/schorr/gawk-tabjson/builddir/meson-logs/meson-log.txt

So I'm stuck. I also see that "jsonstream" is used in some places, although the namespace was set to "json". Might it make sense to use "jsonstream" for the namespace to avoid colliding with the existing gawk "json" library?

Regards,
Andy

On Tue, Sep 08, 2020 at 01:11:30PM -0400, Andrew J. Schorr wrote: > Hi, > > If you take a look at the source code, it looks to me as if it is > packaged as an extension by calling register_input_parser and > adding a json::output function. You can see the code like so: > git clone https://gitlab.com/vinipsmaker/gawk-tabjson.git > I confess that I have no experience with the Meson build > system, so I haven't tried building it. > > I agree that this looks like an interesting feature set. > Are you aware that there's an existing gawk json extension > for converting between JSON and gawk arrays? It's very simple > and straightforward: > http://gawkextlib.sourceforge.net/json/json.html#:~:text=Summary,gawk%20associative%20arrays%20and%20JSON. > > I wonder whether you might consider using a different namespace > to avoid colliding with the existing json extension. 
> > Regards, > Andy > > On Tue, Sep 08, 2020 at 06:53:00PM +0200, Jürgen Kahrs via Gawkextlib-users wrote: > > Hello Vinícius, > > after reading your examples scripts, your project reminded me of > > the XML extension that exists for GNU Awk. In both cases AWK > > supplies the syntax for the scripting and the extension supplies > > the access mechansim to the data files (JSON / XML). > > In cases like these, AWK's syntax is really much more readable > > than the syntax of other tools that were created especially for > > one kind of new data format and nothing else. > > > > After looking at your web page, I got the impression that your > > project is not an "extension" to GAWK in the established technical > > sense of GAWK extensions. > > https://www.gnu.org/software/gawk/manual/html_node/Extension-Intro.html > > Your source code looks rather like a proposal for a change to the > > source code of GAWK. This idea (building access to a special file > > format into GAWK itself) has already been discussed several times. > > For example, reading CSV, XML, PNG and ZIP files could have been built > > into GAWK's released source code. Such ideas have been rejected > > for good reasons. Most importantly, GAWK would grow too fat for > > small embedded systems, would depend on too many external libraries > > and the work load of source code management, testing and documentation > > would be put all onto the shoulders of the GAWK maintainer. > > Did you know that mawk is still so popular mostly because it is not as > > "fat and slow as GAWK has become over the years" ? > > (sorry Arnold, I am exaggerating a bit to make the point) > > > > Now you see the way my argument goes: Keep file format access > > outside of the original GAWK source code and put it into an extension. > > Andrew Schorr has done the same for the extension that reads XML files. > > Did you consider packaging your source code as an extension and > > supplying the doc with the packaged extension ? 
> > > > Thanks for sending a notice to us. > > > > Juergen Kahrs > > > > > > Hi, > > > > I'd like to announce a new JSON plug-in that I've been working on the past > > days to consume JSON streams. > > > > This new project is inspired by the FPAT gawk's variable. As you all know, > > FS allows you to specify how to find the field separator. And FPAT allows > > you to specify how to find the field contents. That's how I designed the > > plugin. There is a JPAT variable that allows you to specify how to find the > > contents. It works just like FPAT, but for JSON. > > > > JPAT is an array and each index will tell how the plug-in should fill the > > same index in the J array. The values of JPAT are JSON Pointers[1]. For > > instance, given the following stream: > > > > {"name": "Beth", "pay_rate": 4.00, "hours_worked": 0} > > {"name": "Dan", "pay_rate": 3.75, "hours_worked": 0} > > {"name": "Kathy", "pay_rate": 4.00, "hours_worked": 10} > > {"name": "Mark", "pay_rate": 5.00, "hours_worked": 20} > > {"name": "Mary", "pay_rate": 5.50, "hours_worked": 22} > > {"name": "Susie", "pay_rate": 4.25, "hours_worked": 18} > > > > And the following AWK program: > > > > BEGIN { > > JPAT[1] = "/name" > > JPAT[2] = "/pay_rate" > > JPAT[3] = "/hours_worked" > > } > > J[3] > 0 { print J[1], J[2] * J[3] } > > > > The output will be: > > > > Kathy 40 > > Mark 100 > > Mary 121 > > Susie 76.5 > > > > So the idea is to give AWK a tabular view of the array. AWK is great to > > process tabular data, so I think this fits well with the rest of AWK's > > design. > > > > The tool is robust and the fields inside the JSON can come in any order. It > > doesn't matter for the order you specified. Arbitrary nesting is supported. > > The JSON is parsed in a one-pass fashion and only decodes the fields you > > requested. No DOM tree is saved at any time. > > > > You can update the JSON by calling the function json::output() which will > > give you a new serialized JSON string as its return. 
To create a new JSON > > is a cheap process as its only job is to concatenate strings from the > > offsets saved during the parsing stage. If a value was originally boolean > > and you set J[n] to 1 or 0, the field will stay a boolean (but any other > > numeric value will coerce the type to a double). Values deleted from J are > > converted to the null JSON value. > > > > You can also use the unwind operation to flatten an array in the document. > > For instance, given the following data set: > > > > { "_id" : "jane", "joined" : "2011-03-02", "likes" : ["golf", > > "racquetball"] } > > { "_id" : "joe", "joined" : "2012-07-02", "likes" : ["tennis", "golf", > > "swimming"] } > > > > And the following AWK program: > > > > BEGIN { > > JPAT[1] = "/likes" > > JUNWIND = 1 > > } > > { print NR, json::output() } > > > > The output will be: > > > > 1 { "_id" : "jane", "joined" : "2011-03-02", "likes" : "golf" } > > 2 { "_id" : "jane", "joined" : "2011-03-02", "likes" : "racquetball" } > > 3 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "tennis" } > > 4 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "golf" } > > 5 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "swimming" } > > > > And this other AWK program: > > > > BEGIN { > > JPAT[1] = "/likes" > > JUNWIND = 1 > > } > > { likes[J[1]] += 1 } > > END { > > PROCINFO["sorted_in"] = "@val_num_desc" > > for (like in likes) { > > print like, likes[like] > > } > > } > > > > Will output: > > > > golf 2 > > tennis 1 > > swimming 1 > > racquetball 1 > > > > The unwind operation idea comes from the MongoDB aggregation framework for > > which an $unwind operator exists. As each document is synthesized for that > > array element, the variable JUIND will hold the index from that element in > > the array. If that field wasn't an array and the document is just being > > passed through, JUIND has the value 0. If JUIND value is different than 0, > > JULENGTH is also set with the array length. 
These two variables let you > > delimit array elements that have origin in the same document. > > > > The homepage of the project is: https://gitlab.com/vinipsmaker/gawk-tabjson > > > > I'd like to know how strange this idea is for AWK. Do you believe it is too > > far off? Or maybe do you think the AWK spirit for tabular data processing > > has been properly absorbed here? Which one (it can't be both)? > > > > BUGS > > > > □ The README is pretty outdated. This very email is more accurate than > > the README page. That's on me. I like to document how I'd like to use a > > tool before I implement it, so implementation came after the README. > > □ Right now, I don't like the name json::output(). If you know a better > > name, please let me know. > > □ The JSON pointer syntax is 0-indexed as opposed to AWK's 1-indexed > > mindset. I did it for compatibility with JSON pointer expressions, but > > I believe this decision was a mistake and I'll be changing it later. > > □ OFMT isn't respected yet. > > □ I don't know how to report invalid JSONs from the stream. I don't know > > what'd be the AWK way here. > > □ Right now the plug-in always takes over streams. I'm planning to check > > for the value of JPAT to decide if the plug-in's getline should be > > activated. And to also provide a split()-like function so you could use > > the same functionality on a per-record basis. This idea requires > > thought and design on how to best proceed. > > □ You have to pass the flag --characters-as-bytes (or the shorthand -b) > > when invoking gawk. > > □ Only newline delimited JSONs are supported right now, but it's possible > > to improve the plug-in to automatically recognize RS. > > □ The plug-in can't right now insert new fields in the documents. This > > operation is also possible with the chosen approach, but I haven't > > spent the time here. > > □ A handle for DOM-parsed subtrees could be created, but this feature > > would also require thought and design. 
> > □ Right now I don't use mmap() when available. mmap() should improve > > plug-in performance (although it has already beaten jq performance on > > my tests). > > □ There isn't MPFR support yet. > > > > And I hope I didn't get you too excited as the time I have available to > > spend on this project for this year is nearing its end and further work > > will have to be postponed to later. > > > > [1] https://tools.ietf.org/html/rfc6901 > > > > -- > > Vinícius dos Santos Oliveira > > https://vinipsmaker.github.io/ > > > > > > > > _______________________________________________ > > Gawkextlib-users mailing list > > Gaw...@li... > > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > > > > > > > > _______________________________________________ > > Gawkextlib-users mailing list > > Gaw...@li... > > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > > > -- > Andrew Schorr e-mail: as...@te... > Telemetry Investments, L.L.C. phone: 917-305-1748 > 152 W 36th St, #402 fax: 212-425-5550 > New York, NY 10018-8765 > > > _______________________________________________ > Gawkextlib-users mailing list > Gaw...@li... > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users -- Andrew Schorr e-mail: as...@te... Telemetry Investments, L.L.C. phone: 917-305-1748 152 W 36th St, #402 fax: 212-425-5550 New York, NY 10018-8765 |
From: Andrew J. S. <as...@te...> - 2020-09-08 17:11:50
|
Hi, If you take a look at the source code, it looks to me as if it is packaged as an extension by calling register_input_parser and adding a json::output function. You can see the code like so: git clone https://gitlab.com/vinipsmaker/gawk-tabjson.git I confess that I have no experience with the Meson build system, so I haven't tried building it. I agree that this looks like an interesting feature set. Are you aware that there's an existing gawk json extension for converting between JSON and gawk arrays? It's very simple and straightforward: http://gawkextlib.sourceforge.net/json/json.html#:~:text=Summary,gawk%20associative%20arrays%20and%20JSON. I wonder whether you might consider using a different namespace to avoid colliding with the existing json extension. Regards, Andy On Tue, Sep 08, 2020 at 06:53:00PM +0200, Jürgen Kahrs via Gawkextlib-users wrote: > Hello Vinícius, > after reading your examples scripts, your project reminded me of > the XML extension that exists for GNU Awk. In both cases AWK > supplies the syntax for the scripting and the extension supplies > the access mechansim to the data files (JSON / XML). > In cases like these, AWK's syntax is really much more readable > than the syntax of other tools that were created especially for > one kind of new data format and nothing else. > > After looking at your web page, I got the impression that your > project is not an "extension" to GAWK in the established technical > sense of GAWK extensions. > https://www.gnu.org/software/gawk/manual/html_node/Extension-Intro.html > Your source code looks rather like a proposal for a change to the > source code of GAWK. This idea (building access to a special file > format into GAWK itself) has already been discussed several times. > For example, reading CSV, XML, PNG and ZIP files could have been built > into GAWK's released source code. Such ideas have been rejected > for good reasons. 
Most importantly, GAWK would grow too fat for > small embedded systems, would depend on too many external libraries > and the work load of source code management, testing and documentation > would be put all onto the shoulders of the GAWK maintainer. > Did you know that mawk is still so popular mostly because it is not as > "fat and slow as GAWK has become over the years" ? > (sorry Arnold, I am exaggerating a bit to make the point) > > Now you see the way my argument goes: Keep file format access > outside of the original GAWK source code and put it into an extension. > Andrew Schorr has done the same for the extension that reads XML files. > Did you consider packaging your source code as an extension and > supplying the doc with the packaged extension ? > > Thanks for sending a notice to us. > > Juergen Kahrs > > > Hi, > > I'd like to announce a new JSON plug-in that I've been working on the past > days to consume JSON streams. > > This new project is inspired by the FPAT gawk's variable. As you all know, > FS allows you to specify how to find the field separator. And FPAT allows > you to specify how to find the field contents. That's how I designed the > plugin. There is a JPAT variable that allows you to specify how to find the > contents. It works just like FPAT, but for JSON. > > JPAT is an array and each index will tell how the plug-in should fill the > same index in the J array. The values of JPAT are JSON Pointers[1]. 
For > instance, given the following stream: > > {"name": "Beth", "pay_rate": 4.00, "hours_worked": 0} > {"name": "Dan", "pay_rate": 3.75, "hours_worked": 0} > {"name": "Kathy", "pay_rate": 4.00, "hours_worked": 10} > {"name": "Mark", "pay_rate": 5.00, "hours_worked": 20} > {"name": "Mary", "pay_rate": 5.50, "hours_worked": 22} > {"name": "Susie", "pay_rate": 4.25, "hours_worked": 18} > > And the following AWK program: > > BEGIN { > JPAT[1] = "/name" > JPAT[2] = "/pay_rate" > JPAT[3] = "/hours_worked" > } > J[3] > 0 { print J[1], J[2] * J[3] } > > The output will be: > > Kathy 40 > Mark 100 > Mary 121 > Susie 76.5 > > So the idea is to give AWK a tabular view of the array. AWK is great to > process tabular data, so I think this fits well with the rest of AWK's > design. > > The tool is robust and the fields inside the JSON can come in any order. It > doesn't matter for the order you specified. Arbitrary nesting is supported. > The JSON is parsed in a one-pass fashion and only decodes the fields you > requested. No DOM tree is saved at any time. > > You can update the JSON by calling the function json::output() which will > give you a new serialized JSON string as its return. To create a new JSON > is a cheap process as its only job is to concatenate strings from the > offsets saved during the parsing stage. If a value was originally boolean > and you set J[n] to 1 or 0, the field will stay a boolean (but any other > numeric value will coerce the type to a double). Values deleted from J are > converted to the null JSON value. > > You can also use the unwind operation to flatten an array in the document. 
> For instance, given the following data set: > > { "_id" : "jane", "joined" : "2011-03-02", "likes" : ["golf", > "racquetball"] } > { "_id" : "joe", "joined" : "2012-07-02", "likes" : ["tennis", "golf", > "swimming"] } > > And the following AWK program: > > BEGIN { > JPAT[1] = "/likes" > JUNWIND = 1 > } > { print NR, json::output() } > > The output will be: > > 1 { "_id" : "jane", "joined" : "2011-03-02", "likes" : "golf" } > 2 { "_id" : "jane", "joined" : "2011-03-02", "likes" : "racquetball" } > 3 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "tennis" } > 4 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "golf" } > 5 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "swimming" } > > And this other AWK program: > > BEGIN { > JPAT[1] = "/likes" > JUNWIND = 1 > } > { likes[J[1]] += 1 } > END { > PROCINFO["sorted_in"] = "@val_num_desc" > for (like in likes) { > print like, likes[like] > } > } > > Will output: > > golf 2 > tennis 1 > swimming 1 > racquetball 1 > > The unwind operation idea comes from the MongoDB aggregation framework for > which an $unwind operator exists. As each document is synthesized for that > array element, the variable JUIND will hold the index from that element in > the array. If that field wasn't an array and the document is just being > passed through, JUIND has the value 0. If JUIND value is different than 0, > JULENGTH is also set with the array length. These two variables let you > delimit array elements that have origin in the same document. > > The homepage of the project is: https://gitlab.com/vinipsmaker/gawk-tabjson > > I'd like to know how strange this idea is for AWK. Do you believe it is too > far off? Or maybe do you think the AWK spirit for tabular data processing > has been properly absorbed here? Which one (it can't be both)? > > BUGS > > □ The README is pretty outdated. This very email is more accurate than > the README page. That's on me. 
I like to document how I'd like to use a > tool before I implement it, so implementation came after the README. > □ Right now, I don't like the name json::output(). If you know a better > name, please let me know. > □ The JSON pointer syntax is 0-indexed as opposed to AWK's 1-indexed > mindset. I did it for compatibility with JSON pointer expressions, but > I believe this decision was a mistake and I'll be changing it later. > □ OFMT isn't respected yet. > □ I don't know how to report invalid JSONs from the stream. I don't know > what'd be the AWK way here. > □ Right now the plug-in always takes over streams. I'm planning to check > for the value of JPAT to decide if the plug-in's getline should be > activated. And to also provide a split()-like function so you could use > the same functionality on a per-record basis. This idea requires > thought and design on how to best proceed. > □ You have to pass the flag --characters-as-bytes (or the shorthand -b) > when invoking gawk. > □ Only newline delimited JSONs are supported right now, but it's possible > to improve the plug-in to automatically recognize RS. > □ The plug-in can't right now insert new fields in the documents. This > operation is also possible with the chosen approach, but I haven't > spent the time here. > □ A handle for DOM-parsed subtrees could be created, but this feature > would also require thought and design. > □ Right now I don't use mmap() when available. mmap() should improve > plug-in performance (although it has already beaten jq performance on > my tests). > □ There isn't MPFR support yet. > > And I hope I didn't get you too excited as the time I have available to > spend on this project for this year is nearing its end and further work > will have to be postponed to later. > > [1] https://tools.ietf.org/html/rfc6901 > > -- > Vinícius dos Santos Oliveira > https://vinipsmaker.github.io/ > > > > _______________________________________________ > Gawkextlib-users mailing list > Gaw...@li... 
> https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > > > _______________________________________________ > Gawkextlib-users mailing list > Gaw...@li... > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users -- Andrew Schorr e-mail: as...@te... Telemetry Investments, L.L.C. phone: 917-305-1748 152 W 36th St, #402 fax: 212-425-5550 New York, NY 10018-8765 |
From: <jue...@go...> - 2020-09-08 16:53:25
|
Hello Vinícius, after reading your example scripts, your project reminded me of the XML extension that exists for GNU Awk. In both cases AWK supplies the syntax for the scripting and the extension supplies the access mechanism to the data files (JSON / XML). In cases like these, AWK's syntax is really much more readable than the syntax of other tools that were created especially for one kind of new data format and nothing else. After looking at your web page, I got the impression that your project is not an "extension" to GAWK in the established technical sense of GAWK extensions. https://www.gnu.org/software/gawk/manual/html_node/Extension-Intro.html Your source code looks rather like a proposal for a change to the source code of GAWK. This idea (building access to a special file format into GAWK itself) has already been discussed several times. For example, reading CSV, XML, PNG and ZIP files could have been built into GAWK's released source code. Such ideas have been rejected for good reasons. Most importantly, GAWK would grow too fat for small embedded systems, would depend on too many external libraries, and the workload of source code management, testing and documentation would all be put onto the shoulders of the GAWK maintainer. Did you know that mawk is still so popular mostly because it is not as "fat and slow as GAWK has become over the years"? (sorry Arnold, I am exaggerating a bit to make the point) Now you see the way my argument goes: keep file format access outside of the original GAWK source code and put it into an extension. Andrew Schorr has done the same for the extension that reads XML files. Did you consider packaging your source code as an extension and supplying the doc with the packaged extension? Thanks for sending a notice to us. Juergen Kahrs > Hi, > > I'd like to announce a new JSON plug-in that I've been working on the past days to consume JSON streams. > > This new project is inspired by the FPAT gawk's variable. 
As you all know, FS allows you to specify how to find the /field separator/. And FPAT allows you to specify how to find the /field contents/. That's how I designed the plugin. There is a JPAT variable that allows you to specify how to find the contents. It works just like FPAT, but for JSON. > > JPAT is an array and each index will tell how the plug-in should fill the same index in the J array. The values of JPAT are JSON Pointers[1]. For instance, given the following stream: > > {"name": "Beth", "pay_rate": 4.00, "hours_worked": 0} > {"name": "Dan", "pay_rate": 3.75, "hours_worked": 0} > {"name": "Kathy", "pay_rate": 4.00, "hours_worked": 10} > {"name": "Mark", "pay_rate": 5.00, "hours_worked": 20} > {"name": "Mary", "pay_rate": 5.50, "hours_worked": 22} > {"name": "Susie", "pay_rate": 4.25, "hours_worked": 18} > > And the following AWK program: > > BEGIN { > JPAT[1] = "/name" > JPAT[2] = "/pay_rate" > JPAT[3] = "/hours_worked" > } > J[3] > 0 { print J[1], J[2] * J[3] } > > The output will be: > > Kathy 40 > Mark 100 > Mary 121 > Susie 76.5 > > So the idea is to give AWK a /tabular view/ of the array. AWK is great to process tabular data, so I think this fits well with the rest of AWK's design. > > The tool is robust and the fields inside the JSON can come in any order. It doesn't matter for the order you specified. Arbitrary nesting is supported. The JSON is parsed in a one-pass fashion and only decodes the fields you requested. No DOM tree is saved at any time. > > You can update the JSON by calling the function json::output() which will give you a new serialized JSON string as its return. To create a new JSON is a cheap process as its only job is to concatenate strings from the offsets saved during the parsing stage. If a value was originally boolean and you set J[n] to 1 or 0, the field will stay a boolean (but any other numeric value will coerce the type to a double). Values deleted from J are converted to the null JSON value. 
> > You can also use the unwind operation to flatten an array in the document. For instance, given the following data set: > > { "_id" : "jane", "joined" : "2011-03-02", "likes" : ["golf", "racquetball"] } > { "_id" : "joe", "joined" : "2012-07-02", "likes" : ["tennis", "golf", "swimming"] } > > And the following AWK program: > > BEGIN { > JPAT[1] = "/likes" > JUNWIND = 1 > } > { print NR, json::output() } > > The output will be: > > 1 { "_id" : "jane", "joined" : "2011-03-02", "likes" : "golf" } > 2 { "_id" : "jane", "joined" : "2011-03-02", "likes" : "racquetball" } > 3 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "tennis" } > 4 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "golf" } > 5 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "swimming" } > > And this other AWK program: > > BEGIN { > JPAT[1] = "/likes" > JUNWIND = 1 > } > { likes[J[1]] += 1 } > END { > PROCINFO["sorted_in"] = "@val_num_desc" > for (like in likes) { > print like, likes[like] > } > } > > Will output: > > golf 2 > tennis 1 > swimming 1 > racquetball 1 > > The unwind operation idea comes from the MongoDB aggregation framework for which an $unwind operator exists. As each document is synthesized for that array element, the variable JUIND will hold the index from that element in the array. If that field wasn't an array and the document is just being passed through, JUIND has the value 0. If JUIND value is different than 0, JULENGTH is also set with the array length. These two variables let you delimit array elements that have origin in the same document. > > The homepage of the project is: https://gitlab.com/vinipsmaker/gawk-tabjson > > I'd like to know how strange this idea is for AWK. Do you believe it is too far off? Or maybe do you think the AWK spirit for tabular data processing has been properly absorbed here? Which one (it can't be both)? > > BUGS > > * The README is pretty outdated. This very email is more accurate than the README page. That's on me. 
I like to document how I'd like to use a tool before I implement it, so implementation came after the README. > * Right now, I don't like the name json::output(). If you know a better name, please let me know. > * The JSON pointer syntax is 0-indexed as opposed to AWK's 1-indexed mindset. I did it for compatibility with JSON pointer expressions, but I believe this decision was a mistake and I'll be changing it later. > * OFMT isn't respected yet. > * I don't know how to report invalid JSONs from the stream. I don't know what'd be the AWK way here. > * Right now the plug-in always takes over streams. I'm planning to check for the value of JPAT to decide if the plug-in's getline should be activated. And to also provide a split()-like function so you could use the same functionality on a per-record basis. This idea requires thought and design on how to best proceed. > * You have to pass the flag --characters-as-bytes (or the shorthand -b) when invoking gawk. > * Only newline delimited JSONs are supported right now, but it's possible to improve the plug-in to automatically recognize RS. > * The plug-in can't right now insert new fields in the documents. This operation is also possible with the chosen approach, but I haven't spent the time here. > * A handle for DOM-parsed subtrees could be created, but this feature would also require thought and design. > * Right now I don't use mmap() when available. mmap() should improve plug-in performance (although it has already beaten jq performance on my tests). > * There isn't MPFR support yet. > > > And I hope I didn't get you too excited as the time I have available to spend on this project for this year is nearing its end and further work will have to be postponed to later. > > [1] https://tools.ietf.org/html/rfc6901 > > -- > Vinícius dos Santos Oliveira > https://vinipsmaker.github.io/ > > > _______________________________________________ > Gawkextlib-users mailing list > Gaw...@li... 
> https://lists.sourceforge.net/lists/listinfo/gawkextlib-users |
From: Vinícius d. S. O. <vin...@gm...> - 2020-09-08 10:17:36
|
Hi,

I'd like to announce a new JSON plug-in that I've been working on over the past few days to consume JSON streams.

This new project is inspired by gawk's FPAT variable. As you all know, FS allows you to specify how to find the *field separator*. And FPAT allows you to specify how to find the *field contents*. That's how I designed the plug-in. There is a JPAT variable that allows you to specify how to find the contents. It works just like FPAT, but for JSON.

JPAT is an array and each index tells the plug-in how to fill the same index in the J array. The values of JPAT are JSON Pointers[1]. For instance, given the following stream:

{"name": "Beth", "pay_rate": 4.00, "hours_worked": 0}
{"name": "Dan", "pay_rate": 3.75, "hours_worked": 0}
{"name": "Kathy", "pay_rate": 4.00, "hours_worked": 10}
{"name": "Mark", "pay_rate": 5.00, "hours_worked": 20}
{"name": "Mary", "pay_rate": 5.50, "hours_worked": 22}
{"name": "Susie", "pay_rate": 4.25, "hours_worked": 18}

And the following AWK program:

BEGIN {
    JPAT[1] = "/name"
    JPAT[2] = "/pay_rate"
    JPAT[3] = "/hours_worked"
}
J[3] > 0 { print J[1], J[2] * J[3] }

The output will be:

Kathy 40
Mark 100
Mary 121
Susie 76.5

So the idea is to give AWK a *tabular view* of the data. AWK is great at processing tabular data, so I think this fits well with the rest of AWK's design.

The tool is robust: the fields inside the JSON can come in any order, regardless of the order in which you specified them. Arbitrary nesting is supported. The JSON is parsed in a one-pass fashion and only the fields you requested are decoded. No DOM tree is saved at any time.

You can update the JSON by calling the function json::output(), which gives you a new serialized JSON string as its return value. Creating a new JSON is cheap, as its only job is to concatenate strings from the offsets saved during the parsing stage. 
If a value was originally boolean and you set J[n] to 1 or 0, the field will stay a boolean (but any other numeric value will coerce the type to a double). Values deleted from J are converted to the null JSON value.

You can also use the unwind operation to flatten an array in the document. For instance, given the following data set:

{ "_id" : "jane", "joined" : "2011-03-02", "likes" : ["golf", "racquetball"] }
{ "_id" : "joe", "joined" : "2012-07-02", "likes" : ["tennis", "golf", "swimming"] }

And the following AWK program:

BEGIN {
    JPAT[1] = "/likes"
    JUNWIND = 1
}
{ print NR, json::output() }

The output will be:

1 { "_id" : "jane", "joined" : "2011-03-02", "likes" : "golf" }
2 { "_id" : "jane", "joined" : "2011-03-02", "likes" : "racquetball" }
3 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "tennis" }
4 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "golf" }
5 { "_id" : "joe", "joined" : "2012-07-02", "likes" : "swimming" }

And this other AWK program:

BEGIN {
    JPAT[1] = "/likes"
    JUNWIND = 1
}
{ likes[J[1]] += 1 }
END {
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (like in likes) {
        print like, likes[like]
    }
}

Will output:

golf 2
tennis 1
swimming 1
racquetball 1

The unwind operation idea comes from the MongoDB aggregation framework, for which an $unwind operator exists. As each document is synthesized for an array element, the variable JUIND holds the index of that element in the array. If the field wasn't an array and the document is just being passed through, JUIND has the value 0. If JUIND is nonzero, JULENGTH is also set to the array length. These two variables let you delimit array elements that originate from the same document.

The homepage of the project is: https://gitlab.com/vinipsmaker/gawk-tabjson

I'd like to know how strange this idea is for AWK. Do you believe it is too far off? Or do you think the AWK spirit for tabular data processing has been properly absorbed here? Which one (it can't be both)? 
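The tabular view is exactly what classic awk consumes. As a point of comparison (a sketch only, no plug-in involved), the same pay-rate computation works on a pre-flattened, whitespace-separated version of the stream with plain awk:

```shell
# The pay-rate example run over a pre-flattened version of the JSON
# stream -- plain awk, no plug-in required.
printf '%s\n' \
  'Beth 4.00 0' \
  'Dan 3.75 0' \
  'Kathy 4.00 10' \
  'Mark 5.00 20' \
  'Mary 5.50 22' \
  'Susie 4.25 18' |
awk '$3 > 0 { print $1, $2 * $3 }'
# Prints:
#   Kathy 40
#   Mark 100
#   Mary 121
#   Susie 76.5
```

The plug-in's contribution is precisely the flattening step that this sketch assumes has already happened.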
BUGS

- The README is pretty outdated. This very email is more accurate than the README page. That's on me: I like to document how I'd like to use a tool before I implement it, so the implementation came after the README.
- Right now I don't like the name json::output(). If you know a better name, please let me know.
- The JSON Pointer syntax is 0-indexed, as opposed to AWK's 1-indexed mindset. I did it for compatibility with JSON Pointer expressions, but I believe this decision was a mistake and I'll be changing it later.
- OFMT isn't respected yet.
- I don't know how to report invalid JSON from the stream. I don't know what the AWK way would be here.
- Right now the plug-in always takes over streams. I'm planning to check the value of JPAT to decide whether the plug-in's getline should be activated, and also to provide a split()-like function so you can use the same functionality on a per-record basis. This idea requires thought and design on how best to proceed.
- You have to pass the flag --characters-as-bytes (or the shorthand -b) when invoking gawk.
- Only newline-delimited JSON is supported right now, but it's possible to improve the plug-in to recognize RS automatically.
- The plug-in can't currently insert new fields into documents. This operation is also possible with the chosen approach, but I haven't spent the time on it.
- A handle for DOM-parsed subtrees could be created, but this feature would also require thought and design.
- Right now I don't use mmap() when available. mmap() should improve the plug-in's performance (although it already beats jq in my tests).
- There isn't MPFR support yet.

And I hope I didn't get you too excited, as the time I have available to spend on this project this year is nearing its end, and further work will have to be postponed.

[1] https://tools.ietf.org/html/rfc6901

-- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/ |
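[Editor's note: putting the pieces of this thread together, a tentative build-and-run session could look like the following. This is a sketch assembled from the instructions quoted elsewhere in the thread (Meson/Ninja build, AWKLIBPATH, the -l jsonstream load, and the mandatory -b flag); employees.ndjson is a made-up name for the sample stream, and the repo layout may have changed since this was posted.]

```shell
# Fetch and build (Meson generates build.ninja; ninja produces jsonstream.so)
git clone https://gitlab.com/vinipsmaker/gawk-tabjson
cd gawk-tabjson
git submodule update --init
mkdir build && cd build
meson ..
ninja

# Run gawk with the extension; note the mandatory -b flag (see BUGS above)
AWKLIBPATH=$PWD gawk -b -l jsonstream '
BEGIN { JPAT[1] = "/name"; JPAT[2] = "/pay_rate"; JPAT[3] = "/hours_worked" }
J[3] > 0 { print J[1], J[2] * J[3] }' employees.ndjson
```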
From: Manuel C. <mco...@gm...> - 2020-05-28 16:33:42
|
On 28/05/2020 at 13:12, Phil wrote:
> Hi,
>
> I've just come across gawkextlib, having previously used a regex in FPAT to handle CSV properly. It's very useful, but I have one question.
>
> I often modify a specific CSV column based on the value in another column. Without the CSV extension this is done by comparing $n and assigning to $m, where n and m are the test and modified field indexes, respectively.
>
> Whilst there is a nice helper function csvfield() to return fields by column name, there is no equivalent to update fields. However, looking into the source code I can see the internal array _csv_label exists, and I seem to be able to assign back to it.
>
> My question is: is there a more idiomatic way to do this, and if not, is it safe to use _csv_label as in the example below?

I'm afraid not. At least not with the current version. Maybe a future version will make _csv_label visible, probably under a better name like csvindex.

> If the field "target_type" starts with the text "Slow_Target", then split the field "qualifier" on the first underscore and replace the qualifier with the first split-out element:
>
> awk -i csv -i inplace -vCSVMODE=1 'NR>1 && substr(csvfield("target_type"),1,11)=="Slow_Target" { split(csvfield("qualifier"),isin,"_");$_csv_label["qualifier"]=isin[1];csvprint($0);next }{ print CSVRECORD }' example.csv

You can store the field indexes at NR==1:

NR==1 {type=_csv_label["target_type"]; qual=_csv_label["qualifier"]; next}
NR>1 && substr($type,1,11)=="Slow_Target" ... split($qual,isin,"_") ...

> I note this is a little slower than using FPAT and indexes too, but I guess there must be a cost to using the CSV column names and translating them?

The intended advantage of CSVMODE=1 is to simplify computations with the effective field values. Fields are automatically unquoted on input, and requoted by csvprint() on output.
If the required processing just handles CSV text chunks without modifying the quoting, then the direct FPAT approach is probably enough.

> Thanks,
> Phil.

Thanks for the report.

-- Manuel Collado - http://mcollado.z15.es |
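[Editor's note: Manuel's suggestion of caching column indexes at NR==1 can also be sketched without the CSV extension at all, for simple input with no quoted fields. This is a plain-awk sketch; the file name, column names, and data are made up, and it handles none of the quoting that CSVMODE handles for you.]

```shell
cat > example.csv <<'EOF'
target_type,qualifier
Slow_Target_A,abc_def
Fast_Target,xyz_123
EOF

awk -F, -v OFS=, '
NR == 1 {                                  # cache column indexes by header name
    for (i = 1; i <= NF; i++) idx[$i] = i
    type = idx["target_type"]
    qual = idx["qualifier"]
    print
    next
}
substr($type, 1, 11) == "Slow_Target" {    # keep only the part before the first "_"
    split($qual, isin, "_")
    $qual = isin[1]
}
{ print }                                  # record is rebuilt with OFS="," if a field was assigned
' example.csv
```

For the sample data this prints the header unchanged, rewrites abc_def to abc on the Slow_Target row, and passes the Fast_Target row through untouched.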
From: Phil <ph...@be...> - 2020-05-28 12:18:38
|
Hi,

I've just come across gawkextlib, having previously used a regex in FPAT to handle CSV properly. It's very useful, but I have one question.

I often modify a specific CSV column based on the value in another column. Without the CSV extension this is done by comparing $n and assigning to $m, where n and m are the test and modified field indexes, respectively.

Whilst there is a nice helper function csvfield() to return fields by column name, there is no equivalent to update fields. However, looking into the source code I can see the internal array _csv_label exists, and I seem to be able to assign back to it.

My question is: is there a more idiomatic way to do this, and if not, is it safe to use _csv_label as in the example below?

If the field "target_type" starts with the text "Slow_Target", then split the field "qualifier" on the first underscore and replace the qualifier with the first split-out element:

awk -i csv -i inplace -vCSVMODE=1 'NR>1 && substr(csvfield("target_type"),1,11)=="Slow_Target" { split(csvfield("qualifier"),isin,"_");$_csv_label["qualifier"]=isin[1];csvprint($0);next }{ print CSVRECORD }' example.csv

I note this is a little slower than using FPAT and indexes too, but I guess there must be a cost to using the CSV column names and translating them?

Thanks,
Phil. |
From: Andrew J. S. <as...@te...> - 2020-01-05 14:32:04
|
Hi,

On Fri, Jan 03, 2020 at 10:05:28PM +0000, lemur117 wrote:
> Are there any simple tutorials I could follow for building an input parser? The documentation stated to check readdir.c for inspiration, but I am having a hard time understanding how to even compile it. The only way that I've gotten to work is if I do verbatim what is done by "make" on the gawk source code: `gcc -DHAVE_CONFIG_H -I. -I./.. -g -O2 -DNDEBUG -MT readdir.lo -MD -MP -MF .deps/readdir.Tpo -c -o readdir.lo readdir.c`. The library object "readdir.lo" is created, but I don't know how that helps me create my own usable parser.
>
> What I'm looking for, if I'm not asking too much, is something like:
>
> 1. C/C++ code for a trivial input parser, e.g. one that changes each line of input to be the number of characters that appeared on that line ('hello\nworld!\n' becomes '5\n6\n').
> 2. Compilation instructions, including what to do with the output of compilation (object file?).
> 3. Awk code that invokes the new parser.
>
> readdir.c is a candidate for the first item, but I think it has added complexity to fit it in with the rest of the gawk source code build process (e.g. config.h, support for Windows, etc.), which a standalone parser may not need. I'd like something pretty barebones to be able to wrap my feeble mind around.
>
> Your consideration is appreciated.

You can find more examples of input parsers in the gawkextlib project in xml/xml_interface.c and in csv/csv.c. With those in addition to readdir.c, I think you should be able to get a good idea of how this stuff works. Also note that there are test cases for all 3 of these extensions that show how to use them.

Regarding build instructions: I think you simply need to dig a bit more deeply into the "make" output. For example:

bash-4.2$ touch extension/readdir.c
bash-4.2$ make 2>&1 | grep readdir
/bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I./.. -g -O2 -DARRAYDEBUG -DYYDEBUG -DLOCALEDEBUG -DMEMDEBUG -Wall -fno-builtin -g3 -ggdb3 -MT readdir.lo -MD -MP -MF .deps/readdir.Tpo -c -o readdir.lo readdir.c
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I./.. -g -O2 -DARRAYDEBUG -DYYDEBUG -DLOCALEDEBUG -DMEMDEBUG -Wall -fno-builtin -g3 -ggdb3 -MT readdir.lo -MD -MP -MF .deps/readdir.Tpo -c readdir.c -fPIC -DPIC -o .libs/readdir.o
mv -f .deps/readdir.Tpo .deps/readdir.Plo
/bin/sh ./libtool --tag=CC --mode=link gcc -g -O2 -DARRAYDEBUG -DYYDEBUG -DLOCALEDEBUG -DMEMDEBUG -Wall -fno-builtin -g3 -ggdb3 -module -avoid-version -no-undefined -o readdir.la -rpath /extra_disk/tmp/gawk-test/lib/gawk readdir.lo -lm
libtool: link: rm -fr .libs/readdir.la .libs/readdir.lai .libs/readdir.so
libtool: link: gcc -shared -fPIC -DPIC .libs/readdir.o -lm -g -O2 -g3 -ggdb3 -Wl,-soname -Wl,readdir.so -o .libs/readdir.so
libtool: link: ( cd ".libs" && rm -f "readdir.la" && ln -s "../readdir.la" "readdir.la" )

So, as you can see, after compiling you need to run the linker to build the shared library readdir.so ("gcc -shared -fPIC ..."). It's just 2 gcc commands: one to compile, and another to link. You can ignore the libtool nonsense, assuming you are on Linux. You can then load readdir.so using the -l readdir command-line argument, or @load "readdir" in the awk code, as long as you have placed readdir.so somewhere in the path designated by AWKLIBPATH.

Regards,
Andy |
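[Editor's note: condensed from the libtool output above, the two essential commands are just a compile and a link. This is a sketch; myparser.c is a stand-in for your own extension source, and the -I path must point at wherever gawkapi.h lives on your system.]

```shell
# 1) Compile the extension source to a position-independent object file
gcc -fPIC -c -I/path/to/gawk/headers myparser.c -o myparser.o

# 2) Link it into a shared library that gawk can load at runtime
gcc -shared myparser.o -o myparser.so

# 3) Load it, with myparser.so somewhere on AWKLIBPATH:
#      gawk -l myparser 'BEGIN { ... }' somefile
#    or put  @load "myparser"  in the awk program itself.
```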
From: lemur117 <lem...@pr...> - 2020-01-03 22:05:47
|
Are there any simple tutorials I could follow for building an input parser? The documentation stated to check readdir.c for inspiration, but I am having a hard time understanding how to even compile it. The only way that I've gotten to work is if I do verbatim what is done by "make" on the gawk source code: `gcc -DHAVE_CONFIG_H -I. -I./.. -g -O2 -DNDEBUG -MT readdir.lo -MD -MP -MF .deps/readdir.Tpo -c -o readdir.lo readdir.c`. The library object "readdir.lo" is created, but I don't know how that helps me create my own usable parser.

What I'm looking for, if I'm not asking too much, is something like:

1. C/C++ code for a trivial input parser, e.g. one that changes each line of input to be the number of characters that appeared on that line ('hello\nworld!\n' becomes '5\n6\n').
2. Compilation instructions, including what to do with the output of compilation (object file?).
3. Awk code that invokes the new parser.

readdir.c is a candidate for the first item, but I think it has added complexity to fit it in with the rest of the gawk source code build process (e.g. config.h, support for Windows, etc.), which a standalone parser may not need. I'd like something pretty barebones to be able to wrap my feeble mind around.

Your consideration is appreciated.

-Jon

Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, January 3, 2020 11:58 AM, lemur117 via Gawkextlib-users <gaw...@li...> wrote:

> Thanks! This looks exactly like what I need. I'll play around with it.
>
> -Jon
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Friday, January 3, 2020 10:15 AM, Andrew J. Schorr as...@te... wrote:
>
> > Hi,
> > On Fri, Jan 03, 2020 at 03:50:00PM +0000, lemur117 via Gawkextlib-users wrote:
> >
> > > Is it possible for a dynamic extension to preprocess a file before feeding it
> > > into awk's pattern-action rules?
> >
> > Yes! See the docs here:
> > https://www.gnu.org/software/gawk/manual/html_node/Input-Parsers.html
> > Regards,
> > Andy
>
> Gawkextlib-users mailing list
> Gaw...@li...
> https://lists.sourceforge.net/lists/listinfo/gawkextlib-users |
See the docs here: > > https://www.gnu.org/software/gawk/manual/html_node/Input-Parsers.html > > Regards, > > Andy > > Gawkextlib-users mailing list > Gaw...@li... > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users |
From: lemur117 <lem...@pr...> - 2020-01-03 17:59:24
|
Thanks! This looks exactly like what I need. I'll play around with it.

-Jon

Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, January 3, 2020 10:15 AM, Andrew J. Schorr <as...@te...> wrote:

> Hi,
>
> On Fri, Jan 03, 2020 at 03:50:00PM +0000, lemur117 via Gawkextlib-users wrote:
>
> > Is it possible for a dynamic extension to preprocess a file before feeding it
> > into awk's pattern-action rules?
>
> Yes! See the docs here:
>
> https://www.gnu.org/software/gawk/manual/html_node/Input-Parsers.html
>
> Regards,
> Andy |
From: Andrew J. S. <as...@te...> - 2020-01-03 16:15:50
|
Hi,

On Fri, Jan 03, 2020 at 03:50:00PM +0000, lemur117 via Gawkextlib-users wrote:
> Is it possible for a dynamic extension to preprocess a file before feeding it
> into awk's pattern-action rules?

Yes! See the docs here:

https://www.gnu.org/software/gawk/manual/html_node/Input-Parsers.html

Regards,
Andy |
From: lemur117 <lem...@pr...> - 2020-01-03 15:50:29
|
Is it possible for a dynamic extension to preprocess a file before feeding it into awk's pattern-action rules? For instance, if the input files are compressed, the preprocessor would decompress them; if they are encoded, it would decode them; and if they are encrypted, it would decrypt them.

I have built many shell scripts to preprocess files externally before providing them to awk as input, but there have always been disadvantages. So I am exploring the possibility of implementing a preprocessor as a dynamic extension to mitigate those disadvantages.

As a simple example, assume I have three gzipped files (file1.gz, file2.gz, file3.gz) that I want awk to operate on, and a compiled C program, named "my_gunzip", that can unzip a file to stdout. Can a dynamic extension be built that would allow awk to be invoked on the input as:

"awk -f my_script.awk file1.gz file2.gz file3.gz"

where my_script.awk calls "my_gunzip" to preprocess each file before feeding it to awk's pattern-action rules?

Thanks for your consideration.

Sent with [ProtonMail](https://protonmail.com) Secure Email. |
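[Editor's note: the external shell-pipeline workaround the poster describes can be sketched as follows, using gzip in place of the hypothetical my_gunzip, and reusing the 'hello\nworld!\n' becomes '5\n6\n' line-length example from earlier in this thread.]

```shell
# Create a small compressed input file
printf 'hello\nworld!\n' > file1
gzip -f file1                        # replaces file1 with file1.gz

# awk never sees the compressed bytes: gzip decompresses to stdout,
# and awk reads plain text on stdin
gzip -dc file1.gz | awk '{ print length($0) }'
```

This prints 5 and then 6. The disadvantage is that awk no longer sees the individual input file names, which is one motivation for doing the decompression inside an input-parser extension instead.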
From: <jue...@go...> - 2019-12-27 14:14:59
|
Hello, our mailing list for *users* is still working, and we have a new subscriber (Galen Tackett). Galen sent the following question to the *developers* list, which has not existed since the re-organization of the mailing lists at SourceForge some years ago. Please feel free to comment on Galen's question.

Juergen Kahrs

-------- Forwarded Message --------
Subject: [Gawkextlib-developer] Some revisions and corrections for gawk-xml manual
Date: Fri, 27 Dec 2019 13:57:11 +0000 (UTC)
From: Galen Tackett via Gawkextlib-developer <gaw...@li...>
Reply-To: For internal communication among developers <gaw...@li...>
To: gaw...@li... <gaw...@li...>
Cc: Galen Tackett <glt...@ya...>

I have a few corrections and revisions to the gawk-xml manual that I'd like to offer. How should I go about this?

My goal has been to fix typographic errors, correct a few problems with the index and make some additions to it, and clarify the text in a few places by rendering it in more idiomatic English.

Thanks!

Galen |
From: <jue...@go...> - 2019-09-22 18:53:22
|
Hello, the attachment contains a mail that was automatically discarded by the SourceForge mailing system. The reason for the automatic discard was its length, which exceeded our limit of 40 KB. I have increased the limit to 100 KB, so this posting should be accepted by the mailing system and reach all subscribers. |
From: Andrew J. S. <as...@te...> - 2019-09-22 17:45:32
|
Hi, First of all, please group reply. The gawkextlib project is a collective effort. I will not respond to any further personal emails. Second, you need to fork the repo, apply your changes, and then submit a merge request. See the sourceforge git docs here: https://sourceforge.net/p/forge/documentation/Git/ There's a video here: https://www.youtube.com/watch?v=Rp8f0cLmMrI Regards, Andy On Sun, Sep 22, 2019 at 08:29:25PM +0300, Oğuz wrote: > Okay, what should I do right now? How do I make a proper diff so that you guys > can get a working copy of the extension? > > On Sun, Sep 22, 2019 at 6:57 PM Andrew J. Schorr < > as...@te...> wrote: > > Hi, > > It doesn't need a "manual", but it will need a man page explaining > what it does. But maybe somebody else can help with that. > > I'm cc'ing the gawkextlib-users mailing list. Let's please discuss > on list. > > Thanks, > Andy > > On Sun, Sep 22, 2019 at 06:50:51PM +0300, Oğuz wrote: > > Well, I managed to make it compile with my extension. But writing a > manual etc. > > looks like too much work since I suck at English > > > > On Sun, Sep 22, 2019 at 5:52 PM Andrew J. Schorr < > > as...@te...> wrote: > > > > Hi, yes, sorry, I should not have said "donating". The gawkextlib > project > > is simply a place to post GPL'ed gawk libraries. > > > > There is more info here: > > http://gawkextlib.sourceforge.net/ > > and more particularly: > > http://gawkextlib.sourceforge.net/Development.html > > > > Regards, > > Andy > > > > On Sun, Sep 22, 2019 at 07:53:49AM -0600, ar...@sk... wrote: > > > Donating was a poor word. Allowing the project to use your code > would > > > be better, you retain ownership and copyright. > > > > > > I'm cc-ing Andy who can help out in more detail. > > > > > > For a start, clone the gawkextlib project and see if you can add a > > > new library for your code using the scripts there. Then replace > the > > > stub code with your code. Get it all to compile, and then send > Andy > > > a diff. Voila! 
:-) > > > > > > Arnold > > > > > > Oğuz <ogu...@gm...> wrote: > > > > > > > It doesn't say anything about *donating your code*, what is the > > protocol > > > > for that? > > > > > > > > On Sun, Sep 22, 2019 at 3:40 PM Aharon Robbins <ar...@sk... > > > > wrote: > > > > > > > > > Please consider donating your code to the gawkextlib project. > See the > > gawk > > > > > doc > > > > > for details... > > > > > > > > > > Thanks, > > > > > > > > > > Arnold > > > > > > > > > > In article <qm4o7h$ed6$1...@do...> you write: > > > > > >On Tue, 17 Sep 2019 18:01:15 +0000, Kenny McCormack wrote: > > > > > > > > > > > >> In article <qlnfmn$qqv$1...@do...>, > > > > > >> Oðuz <ogu...@gm...> wrote: > > > > > >>>> 4) I did something similar - quite some time ago - > called > > 'spawn' > > > > > >>>> - that provides both 'spawn' and 'exec' > functionality. > > 'spawn' > > > > > >>>> runs the program as a child process, while 'exec' > replaces > > the > > > > > >>>> GAWK program with another program. You might want to > > consider > > > > > >>>> renaming your 'exec' to 'spawn', since 'exec' means, > well, > > exec(). > > > > > >>>> > > > > > >>>> 5) If you're interested in my approach to this > problem, let > > me > > > > > >>>> know, > > > > > >>>> but I'm assuming that since you've gone ahead and done > it > > > > > >>>> yourself, you're more in tune with your own code. > > > > > >>> > > > > > >>>Well, "spawn" sounds way better, thanks! And yes, I would > like to > > see > > > > > >>>your approach. > > > > > >>> > > > > > >>> > > > > > >> Try this command: > > > > > >> > > > > > >> $ wget http://shell.xmission.com:PORT/spawn.zip > > > > > >> > > > > > >> where PORT is 65401. > > > > > >> > > > > > >> I think it contains everything you need. Note that it is > written > > for > > > > > >> the "old" GAWK API (pre-gawk-4.2). 
> > > > > >> > > > > > >> This version supports 3 types of calls: > > > > > >> > > > > > >> 1) split("ls -l foo",A) > > > > > >> spawn(x,0,A[1],A) # Read the source for what to > put in > > for > > > > > >'x'. > > > > > >> 2) exec(A[1],A) > > > > > >> 3) exec("ls","ls","-l","foo") > > > > > >> > > > > > >> Note that for completeness, I should probably extend "spawn" > the > > same > > > > > >> way, so that you can supply the args directly instead of via > the > > array, > > > > > >> but I never got around to doing it. > > > > > >> > > > > > >> Also note that the purpose of spawn_setInterrupt() is kind > of the > > > > > >> opposite of what system() does (and which you re-created in > your > > code). > > > > > >> That is, it allows the child to get killed without killing > the > > parent. > > > > > >> I always found the (usual and standard) behavior of system() > with > > regard > > > > > >> to signals weird and unhelpful. > > > > > > > > > > > >I added support for supplying args via an array, here is the > last > > version > > > > > >if you're interested: > > > > > > > > > > > > > > > > > >/* provide spawn function for GAWK > > > > > > * > > > > > > * build (GCC/clang) > > > > > > * cc -shared -fPIC -o spawn.so spawn.c > > > > > > */ > > > > > > > > > > > >#include <errno.h> > > > > > >#include <signal.h> > > > > > >#include <stddef.h> > > > > > >#include <stdio.h> > > > > > >#include <stdlib.h> > > > > > >#include <string.h> > > > > > >#include <sys/stat.h> > > > > > >#include <sys/types.h> > > > > > >#include <sys/wait.h> > > > > > >#include <unistd.h> > > > > > >#include <gawkapi.h> > > > > > > > > > > > >#define error(how, func) \ > > > > > > how ## fatal(ext_id, "%s: %s", func, strerror(errno)) > > > > > > > > > > > >int plugin_is_GPL_compatible; > > > > > > > > > > > >static const gawk_api_t *api; > > > > > >static awk_ext_id_t ext_id; > > > > > >static const char *ext_version = NULL; > > > > > >static awk_bool_t (*init_func)(void) = NULL; > > > > > > > > > > 
> >/* do_spawn - spawn(arg0[, ...]) > > > > > > * spawn a child process, wait for its termination and > return its > > exit > > > > > >status. > > > > > > * arg0 can be an array, in such case; arg0[0], arg0[1], ... > will > > > > > >constitute the > > > > > > * argument list to the new process > > > > > > * > > > > > > * return value: > > > > > > * * exit status of spawned process - if everything goes > well > > > > > > * * status of spawned process as - if it gets > signaled > > > > > > * described in waitpid(3) > > > > > > * * 127 - if execvp() fails > > > > > > * * -1 - otherwise > > > > > > * > > > > > > * causes GAWK to exit if: > > > > > > * * arg0 is not an array, and an argument after it is of > > non-scalar > > > > > >type, > > > > > > * * arg0 is an array, and an element in it is of > non-scalar > > type, > > > > > > * * malloc() or waitpid() fails > > > > > > */ > > > > > > > > > > > >static awk_value_t * > > > > > >do_spawn (int nargs, awk_value_t *result, awk_ext_func_t > *unused) > > > > > >{ > > > > > > char **args; > > > > > > > > > > > > awk_value_t arg; > > > > > > if (get_argument(0, AWK_ARRAY, &arg)) { > > > > > > awk_array_t a_cookie = arg.array_cookie; > > > > > > get_element_count(a_cookie, (size_t *) &nargs); > > > > > > args = malloc((nargs + 1) * sizeof(char *)); > > > > > > if (! args) { > > > > > > error(,"malloc"); > > > > > > } > > > > > > > > > > > > awk_value_t index; > > > > > > for (int i = 0; i < nargs; i++) { > > > > > > /* note that a missing index will also > cause > > an > > > > > >error, e.g: > > > > > > * > > > > > > * a[0] = "echo" > > > > > > * a[2] = "hello world" > > > > > > * spawn(a) > > > > > > */ > > > > > > make_number(i, &index); > > > > > > if (! 
get_array_element(a_cookie, & > index, > > > > > >AWK_STRING, &arg)) { > > > > > > errno = EINVAL; > > > > > > error(,"get_array_element"); > > > > > > } > > > > > > args[i] = arg.str_value.str; > > > > > > } > > > > > > args[nargs] = NULL; > > > > > > } > > > > > > else { > > > > > > args = malloc((nargs + 1) * sizeof(char *)); > > > > > > if (! args) { > > > > > > error(,"malloc"); > > > > > > } > > > > > > > > > > > > for (int i = 0; i < nargs; i++) { > > > > > > if (! get_argument(i, AWK_STRING, & > arg)) { > > > > > > errno = EINVAL; > > > > > > error(,"get_argument"); > > > > > > } > > > > > > args[i] = arg.str_value.str; > > > > > > } > > > > > > args[nargs] = NULL; > > > > > > } > > > > > > > > > > > > /* handle signals the way system() does. */ > > > > > > struct sigaction sa, intr, quit; > > > > > > sigset_t omask; > > > > > > sa.sa_handler = SIG_IGN; > > > > > > sigemptyset(&sa.sa_mask); > > > > > > sa.sa_flags = 0; > > > > > > sigemptyset(&intr.sa_mask); > > > > > > sigemptyset(&quit.sa_mask); > > > > > > sigaction(SIGINT, &sa, &intr); > > > > > > sigaction(SIGQUIT, &sa, &quit); > > > > > > sigaddset(&sa.sa_mask, SIGCHLD); > > > > > > sigprocmask(SIG_BLOCK, &sa.sa_mask, &omask); > > > > > > > > > > > > pid_t pid = fork(); > > > > > > if (! pid) { > > > > > > sigaction(SIGINT, &intr, NULL); > > > > > > sigaction(SIGQUIT, &quit, NULL); > > > > > > sigprocmask(SIG_SETMASK, &omask, NULL); > > > > > > > > > > > > /* note that unlike system(), > > > > > > * this can be tricked with a tainted PATH. > > > > > > * > > > > > > * if it's a concern, programmer should either > use > > > > > >absolute paths, > > > > > > * or sanitize PATH before calling spawn(). 
> > > > > > * > > > > > > * e.g: > > > > > > * spawn("/usr/bin/rm",$1) > > > > > > * # or, > > > > > > * ENVIRON["PATH"]="/bin:/usr/bin:/usr/ > local/bin" > > > > > > * spawn("rm",$1) > > > > > > */ > > > > > > execvp(args[0], args); > > > > > > error(non, "execvp"); > > > > > > exit(127); > > > > > > } > > > > > > else if (pid > 0) { > > > > > > int status; > > > > > > > > > > > > do { > > > > > > if (waitpid(pid, &status, 0) < 0) { > > > > > > error(,"waitpid"); > > > > > > } > > > > > > } > > > > > > while (! WIFEXITED(status) && ! WIFSIGNALED > (status)); > > > > > > > > > > > > sigaction(SIGINT, &intr, NULL); > > > > > > sigaction(SIGQUIT, &quit, NULL); > > > > > > sigprocmask(SIG_SETMASK, &omask, NULL); > > > > > > > > > > > > if (WIFEXITED(status)) { > > > > > > return make_number(WEXITSTATUS(status), > > result); > > > > > > } > > > > > > else { > > > > > > return make_number(status, result); > > > > > > } > > > > > > } > > > > > > else { > > > > > > error(non, "fork"); > > > > > > } > > > > > > > > > > > > free(args); > > > > > > > > > > > > return make_number(-1, result); > > > > > >} > > > > > > > > > > > >static awk_ext_func_t func_table[] = { > > > > > > { "spawn", do_spawn, 0, 1, awk_true, NULL } > > > > > >}; > > > > > > > > > > > >dl_load_func(func_table, spawn, "") > > > > > > > > > > > > > > > > > > > > > -- > > > > > Aharon (Arnold) Robbins arnold AT skeeve DOT > com > > > > > > > > > > > > -- > > Andrew Schorr e-mail: > > as...@te... > > Telemetry Investments, L.L.C. phone: 917-305-1748 > > 545 Fifth Ave, Suite 1108 fax: 212-425-5550 > > New York, NY 10017-3630 > > > > -- > Andrew Schorr e-mail: > as...@te... > Telemetry Investments, L.L.C. phone: 917-305-1748 > 545 Fifth Ave, Suite 1108 fax: 212-425-5550 > New York, NY 10017-3630 > -- Andrew Schorr e-mail: as...@te... Telemetry Investments, L.L.C. phone: 917-305-1748 545 Fifth Ave, Suite 1108 fax: 212-425-5550 New York, NY 10017-3630 |
From: Andrew J. S. <as...@te...> - 2019-09-22 16:13:20
|
Hi, It doesn't need a "manual", but it will need a man page explaining what it does. But maybe somebody else can help with that. I'm cc'ing the gawkextlib-users mailing list. Let's please discuss on list. Thanks, Andy On Sun, Sep 22, 2019 at 06:50:51PM +0300, Oğuz wrote: > Well, I managed to make it compile with my extension. But writing a manual etc. > looks like too much work since I suck at English > > On Sun, Sep 22, 2019 at 5:52 PM Andrew J. Schorr < > as...@te...> wrote: > > Hi, yes, sorry, I should not have said "donating". The gawkextlib project > is simply a place to post GPL'ed gawk libraries. > > There is more info here: > http://gawkextlib.sourceforge.net/ > and more particularly: > http://gawkextlib.sourceforge.net/Development.html > > Regards, > Andy > > On Sun, Sep 22, 2019 at 07:53:49AM -0600, ar...@sk... wrote: > > Donating was a poor word. Allowing the project to use your code would > > be better, you retain ownership and copyright. > > > > I'm cc-ing Andy who can help out in more detail. > > > > For a start, clone the gawkextlib project and see if you can add a > > new library for your code using the scripts there. Then replace the > > stub code with your code. Get it all to compile, and then send Andy > > a diff. Voila! :-) > > > > Arnold > > > > Oğuz <ogu...@gm...> wrote: > > > > > It doesn't say anything about *donating your code*, what is the > protocol > > > for that? > > > > > > On Sun, Sep 22, 2019 at 3:40 PM Aharon Robbins <ar...@sk...> > wrote: > > > > > > > Please consider donating your code to the gawkextlib project. See the > gawk > > > > doc > > > > for details... 
> > > > > > > > Thanks, > > > > > > > > Arnold > > > > > > > > In article <qm4o7h$ed6$1...@do...> you write: > > > > >On Tue, 17 Sep 2019 18:01:15 +0000, Kenny McCormack wrote: > > > > > > > > > >> In article <qlnfmn$qqv$1...@do...>, > > > > >> Oðuz <ogu...@gm...> wrote: > > > > >>>> 4) I did something similar - quite some time ago - called > 'spawn' > > > > >>>> - that provides both 'spawn' and 'exec' functionality. > 'spawn' > > > > >>>> runs the program as a child process, while 'exec' replaces > the > > > > >>>> GAWK program with another program. You might want to > consider > > > > >>>> renaming your 'exec' to 'spawn', since 'exec' means, well, > exec(). > > > > >>>> > > > > >>>> 5) If you're interested in my approach to this problem, let > me > > > > >>>> know, > > > > >>>> but I'm assuming that since you've gone ahead and done it > > > > >>>> yourself, you're more in tune with your own code. > > > > >>> > > > > >>>Well, "spawn" sounds way better, thanks! And yes, I would like to > see > > > > >>>your approach. > > > > >>> > > > > >>> > > > > >> Try this command: > > > > >> > > > > >> $ wget http://shell.xmission.com:PORT/spawn.zip > > > > >> > > > > >> where PORT is 65401. > > > > >> > > > > >> I think it contains everything you need. Note that it is written > for > > > > >> the "old" GAWK API (pre-gawk-4.2). > > > > >> > > > > >> This version supports 3 types of calls: > > > > >> > > > > >> 1) split("ls -l foo",A) > > > > >> spawn(x,0,A[1],A) # Read the source for what to put in > for > > > > >'x'. > > > > >> 2) exec(A[1],A) > > > > >> 3) exec("ls","ls","-l","foo") > > > > >> > > > > >> Note that for completeness, I should probably extend "spawn" the > same > > > > >> way, so that you can supply the args directly instead of via the > array, > > > > >> but I never got around to doing it. > > > > >> > > > > >> Also note that the purpose of spawn_setInterrupt() is kind of the > > > > >> opposite of what system() does (and which you re-created in your > code). 
> > > > >> That is, it allows the child to get killed without killing the parent.
> > > > >> I always found the (usual and standard) behavior of system() with regard
> > > > >> to signals weird and unhelpful.
> > > > >
> > > > >I added support for supplying args via an array; here is the last version
> > > > >if you're interested:
> > > > >
> > > > >
> > > > >/* provide spawn function for GAWK
> > > > > *
> > > > > * build (GCC/clang)
> > > > > * cc -shared -fPIC -o spawn.so spawn.c
> > > > > */
> > > > >
> > > > >#include <errno.h>
> > > > >#include <signal.h>
> > > > >#include <stddef.h>
> > > > >#include <stdio.h>
> > > > >#include <stdlib.h>
> > > > >#include <string.h>
> > > > >#include <sys/stat.h>
> > > > >#include <sys/types.h>
> > > > >#include <sys/wait.h>
> > > > >#include <unistd.h>
> > > > >#include <gawkapi.h>
> > > > >
> > > > >#define error(how, func) \
> > > > >        how ## fatal(ext_id, "%s: %s", func, strerror(errno))
> > > > >
> > > > >int plugin_is_GPL_compatible;
> > > > >
> > > > >static const gawk_api_t *api;
> > > > >static awk_ext_id_t ext_id;
> > > > >static const char *ext_version = NULL;
> > > > >static awk_bool_t (*init_func)(void) = NULL;
> > > > >
> > > > >/* do_spawn - spawn(arg0[, ...])
> > > > > * spawn a child process, wait for its termination and return its exit
> > > > > * status.
> > > > > * arg0 can be an array; in that case arg0[0], arg0[1], ... will
> > > > > * constitute the argument list to the new process
> > > > > *
> > > > > * return value:
> > > > > *   * exit status of spawned process - if everything goes well
> > > > > *   * status of spawned process as described in waitpid(3) - if it gets signaled
> > > > > *   * 127 - if execvp() fails
> > > > > *   * -1 - otherwise
> > > > > *
> > > > > * causes GAWK to exit if:
> > > > > *   * arg0 is not an array, and an argument after it is of non-scalar type,
> > > > > *   * arg0 is an array, and an element in it is of non-scalar type,
> > > > > *   * malloc() or waitpid() fails
> > > > > */
> > > > >
> > > > >static awk_value_t *
> > > > >do_spawn (int nargs, awk_value_t *result, awk_ext_func_t *unused)
> > > > >{
> > > > >        char **args;
> > > > >
> > > > >        awk_value_t arg;
> > > > >        if (get_argument(0, AWK_ARRAY, &arg)) {
> > > > >                awk_array_t a_cookie = arg.array_cookie;
> > > > >                get_element_count(a_cookie, (size_t *) &nargs);
> > > > >                args = malloc((nargs + 1) * sizeof(char *));
> > > > >                if (! args) {
> > > > >                        error(,"malloc");
> > > > >                }
> > > > >
> > > > >                awk_value_t index;
> > > > >                for (int i = 0; i < nargs; i++) {
> > > > >                        /* note that a missing index will also cause an error, e.g:
> > > > >                         *
> > > > >                         * a[0] = "echo"
> > > > >                         * a[2] = "hello world"
> > > > >                         * spawn(a)
> > > > >                         */
> > > > >                        make_number(i, &index);
> > > > >                        if (! get_array_element(a_cookie, &index, AWK_STRING, &arg)) {
> > > > >                                errno = EINVAL;
> > > > >                                error(,"get_array_element");
> > > > >                        }
> > > > >                        args[i] = arg.str_value.str;
> > > > >                }
> > > > >                args[nargs] = NULL;
> > > > >        }
> > > > >        else {
> > > > >                args = malloc((nargs + 1) * sizeof(char *));
> > > > >                if (! args) {
> > > > >                        error(,"malloc");
> > > > >                }
> > > > >
> > > > >                for (int i = 0; i < nargs; i++) {
> > > > >                        if (! get_argument(i, AWK_STRING, &arg)) {
> > > > >                                errno = EINVAL;
> > > > >                                error(,"get_argument");
> > > > >                        }
> > > > >                        args[i] = arg.str_value.str;
> > > > >                }
> > > > >                args[nargs] = NULL;
> > > > >        }
> > > > >
> > > > >        /* handle signals the way system() does. */
> > > > >        struct sigaction sa, intr, quit;
> > > > >        sigset_t omask;
> > > > >        sa.sa_handler = SIG_IGN;
> > > > >        sigemptyset(&sa.sa_mask);
> > > > >        sa.sa_flags = 0;
> > > > >        sigemptyset(&intr.sa_mask);
> > > > >        sigemptyset(&quit.sa_mask);
> > > > >        sigaction(SIGINT, &sa, &intr);
> > > > >        sigaction(SIGQUIT, &sa, &quit);
> > > > >        sigaddset(&sa.sa_mask, SIGCHLD);
> > > > >        sigprocmask(SIG_BLOCK, &sa.sa_mask, &omask);
> > > > >
> > > > >        pid_t pid = fork();
> > > > >        if (! pid) {
> > > > >                sigaction(SIGINT, &intr, NULL);
> > > > >                sigaction(SIGQUIT, &quit, NULL);
> > > > >                sigprocmask(SIG_SETMASK, &omask, NULL);
> > > > >
> > > > >                /* note that unlike system(),
> > > > >                 * this can be tricked with a tainted PATH.
> > > > >                 *
> > > > >                 * if that is a concern, the programmer should either use
> > > > >                 * absolute paths, or sanitize PATH before calling spawn().
> > > > >                 *
> > > > >                 * e.g:
> > > > >                 * spawn("/usr/bin/rm",$1)
> > > > >                 * # or,
> > > > >                 * ENVIRON["PATH"]="/bin:/usr/bin:/usr/local/bin"
> > > > >                 * spawn("rm",$1)
> > > > >                 */
> > > > >                execvp(args[0], args);
> > > > >                error(non, "execvp");
> > > > >                exit(127);
> > > > >        }
> > > > >        else if (pid > 0) {
> > > > >                int status;
> > > > >
> > > > >                do {
> > > > >                        if (waitpid(pid, &status, 0) < 0) {
> > > > >                                error(,"waitpid");
> > > > >                        }
> > > > >                }
> > > > >                while (! WIFEXITED(status) && ! WIFSIGNALED(status));
> > > > >
> > > > >                sigaction(SIGINT, &intr, NULL);
> > > > >                sigaction(SIGQUIT, &quit, NULL);
> > > > >                sigprocmask(SIG_SETMASK, &omask, NULL);
> > > > >
> > > > >                if (WIFEXITED(status)) {
> > > > >                        return make_number(WEXITSTATUS(status), result);
> > > > >                }
> > > > >                else {
> > > > >                        return make_number(status, result);
> > > > >                }
> > > > >        }
> > > > >        else {
> > > > >                error(non, "fork");
> > > > >        }
> > > > >
> > > > >        free(args);
> > > > >
> > > > >        return make_number(-1, result);
> > > > >}
> > > > >
> > > > >static awk_ext_func_t func_table[] = {
> > > > >        { "spawn", do_spawn, 0, 1, awk_true, NULL }
> > > > >};
> > > > >
> > > > >dl_load_func(func_table, spawn, "")
> > > >
> > > > --
> > > > Aharon (Arnold) Robbins                 arnold AT skeeve DOT com
>
> --
> Andrew Schorr                      e-mail: as...@te...
> Telemetry Investments, L.L.C.      phone: 917-305-1748
> 545 Fifth Ave, Suite 1108          fax: 212-425-5550
> New York, NY 10017-3630

--
Andrew Schorr                      e-mail: as...@te...
Telemetry Investments, L.L.C.      phone: 917-305-1748
545 Fifth Ave, Suite 1108          fax: 212-425-5550
New York, NY 10017-3630
|
From: Andrew J. S. <as...@te...> - 2019-06-17 12:44:56
|
Hi Luis,

You do not need to compile your own gawk if your system is running gawk 5.0,
which is pretty new. In general, your problem seems to be that things are
getting installed in the wrong place. When gawk is compiled, the library
search path is fixed for that binary. In your case, it clearly does not
include /usr/local/lib/gawk. You can override the built-in setting by
configuring a value for AWKLIBPATH in your environment. For more info:
https://www.gnu.org/software/gawk/manual/html_node/AWKLIBPATH-Variable.html

You can see the default value like so:

bash-4.2$ /bin/gawk 'BEGIN {print ENVIRON["AWKLIBPATH"]}'
/usr/lib64/gawk

In conclusion, you must do one of 3 things:

1. Use --prefix to install gawk-xml (and gawkextlib) in your system's
   standard location compiled into the gawk binary.

2. Compile your own version of gawk that will use a compatible value for
   --prefix so that it will be able to find the gawk-xml extension that
   you installed.

3. Configure AWKLIBPATH in your environment so that gawk knows where to
   search for the gawk-xml extension.

Regards,
Andy

On Mon, Jun 17, 2019 at 11:25:08AM +0100, Luis P. Mendes wrote:
> Hi Andy,
>
> I cannot reproduce the problem, even after untarring the gawkextlib
> and gawk-xml tarballs again, without setting any --prefix=/usr/local
> for the configuration script.
> Even the make check for gawk-xml works fine now.
>
> But now, still cannot use the xml tool:
> $ awk -l xml workingdocuments.awk awk_demos/a.xml
> awk: fatal: can't open shared library `xml' for reading (No such file
> or directory)
>
> The compilation arguments for my distro's gawk are:
> configure_args="--with-readline"
>
> $ whereis awk
> awk: /usr/bin/awk /usr/libexec/awk /usr/share/awk
> /usr/share/man/man1p/awk.1p /usr/share/man/man1/awk.1
>
> The installation destination for the xml library:
> /usr/local/lib/gawk/xml.so
>
> Do I need to compile a different gawk myself and place it under
> /usr/local?
> > > Thanks, > > > Luis > > > On 20190616 09:50:04 -0400, Andrew J. Schorr wrote: > > Hi Luis, > > > > The default autoconf prefix is /usr/local. If that's not where you intend > > to install, then you should specify a prefix explicitly. I recommend that > > you set --prefix explicitly instead of relying upon defaults. If you do > > that, does the build succeed? > > > > If you email the entire sequence of commands that you are running, then it > > will enable us to duplicate the problem. > > > > Also, that "Configured with" message seems to be incorrect. I tried > > the same thing, and it does say "Configured with: ../configure --prefix=/usr ...", > > but that's simply not true. That message seems to be a bug in the > > autoconf tools. If you run "grep prefix= config.log", you should see at the end: > > prefix='/usr/local' > > > > Regards, > > Andy > > > > On Sat, Jun 15, 2019 at 11:35:33PM +0100, Luis P. Mendes wrote: > > > Hi Andy, > > > > > > > > > # gawk -V > > > GNU Awk 5.0.0, API: 2.0 > > > > > > > > > I've downloaded and installed the latest gawkextlib I saw from > > > sourceforge: gawkextlib-1.0.4.tar.gz > > > > > > > > > I think I found an error. 
> > > This is in the config.log of gawkextlib: > > > Configured with: /builddir/gcc-8.3.0/configure > > > --build=x86_64-unknown-linux-gnu --enable-fast-character > > > --enable-vtable-verify --prefix=/usr --mandir=/usr/share/man > > > --infodir=/usr/share/info --libexecdir=/usr/ lib --libdir=/usr/lib > > > --enable-threads=posix --enable-__cxa_atexit --disable-multilib > > > --with-system-zlib --enable-shared --enable-lto --enable-plugins > > > --enable-linker-build-id --disable-werror --disable-nls -- > > > enable-default-pie --enable-default-ssp --enable-checking=release > > > --disable-libstdcxx-pch --with-isl --with-linker-hash-style=gnu > > > --disable-libunwind-exceptions --disable-target-libiberty > > > --enable-serial-confi gure > > > --enable-languages=c,c++,objc,obj-c++,fortran,lto,go,ada > > > > > > Please note that prefix is /usr, as I didn't specified any in the > > > configure command. > > > > > > But, > > > > > > # find /usr/ -name libgawkextlib.so.0 > > > /usr/local/lib/libgawkextlib.so.0 > > > > > > # ll /usr/local/lib/libgawkextlib.so > > > lrwxrwxrwx 1 root root 22 jun 15 18:26 /usr/local/lib/libgawkextlib.so > > > -> libgawkextlib.so.0.0.0 > > > > > > It seems that /usr/local/lib was used instead of /usr/lib. 
> > > > > > > > > > > > In the config.log of gawk-xml: > > > Configured with: /builddir/gcc-8.3.0/configure > > > --build=x86_64-unknown-linux-gnu --enable-fast-character > > > --enable-vtable-verify --prefix=/usr --mandir=/usr/share/man > > > --infodir=/usr/share/info --libexecdir=/usr/ lib --libdir=/usr/lib > > > --enable-threads=posix --enable-__cxa_atexit --disable-multilib > > > --with-system-zlib --enable-shared --enable-lto --enable-plugins > > > --enable-linker-build-id --disable-werror --disable-nls -- > > > enable-default-pie --enable-default-ssp --enable-checking=release > > > --disable-libstdcxx-pch --with-isl --with-linker-hash-style=gnu > > > --disable-libunwind-exceptions --disable-target-libiberty > > > --enable-serial-confi gure > > > --enable-languages=c,c++,objc,obj-c++,fortran,lto,go,ada > > > > > > > > > After a make clean in gawk-xml, the make command outputs an error: > > > gawk: xmlbase:14: fatal: load_ext: cannot open library > > > `../.libs/xml.so' (libgawkextlib.so.0: cannot open shared object file: > > > No such file or directory) > > > > > > > > > Thanks, > > > > > > > > > Luis > > > > > > > > > > > > On 20190615 15:41:07 -0400, Andrew J. Schorr wrote: > > > > Hi Luis, > > > > > > > > Which versions of gawk and gawkextlib are you using? > > > > Can you please share the entire sequence of commands you are running > > > > up through the "make check" that fails? It is clearly not configuring > > > > the environment variables properly, so it can't find the extension, > > > > as Jürgen pointed out. > > > > > > > > Regards, > > > > Andy > > > > > > > > On Sat, Jun 15, 2019 at 07:38:34PM +0100, Luis P. Mendes wrote: > > > > > Jürgen, > > > > > > > > > > I'm on Linux, not MacOSX: 4.19.50_1 #1 SMP PREEMPT x86_64 GNU/Linux > > > > > > > > > > > > > > > On Sat, Jun 15, 2019 at 7:29 PM Jürgen Kahrs via Gawkextlib-users < > > > > > gaw...@li...> wrote: > > > > > > > > > > Hello Luis, > > > > > I guess you tried this on MacOSX. 
> > > > > This seems to be the reason for the failed test cases: > > > > > .. LD_LIBRARY_PATH= DYLD_LIBRARY_PATH= gawk > > > > > > > > > > The environment variable DYLD_LIBRARY_PATH should have been > > > > > expanded as $DYLD_LIBRARY_PATH during configuration. > > > > > I am not quite sure where this information got lost, in the source > > > > > code or during configuration on your machine. Any idea ? > > > > > > > > > > > > > > > Jürgen Kahrs > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > After some years since last used this wonderful tool, I'd like to > > > > > install gawk-xml. > > > > > As per the README, I've installed gawkextlib without problems. > > > > > Next, `configure` and ``make` for gawk-xml went fine, but not `make > > > > > check`. > > > > > Output of the command below. > > > > > Do I need to install extra packages? > > > > > Thanks > > > > > > > > > > $ make check > > > > > Making check in awklib > > > > > make[1]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > > > gawk-xml-1.1.1/awklib' > > > > > make[1]: Nothing to be done for 'check'. > > > > > make[1]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > > > gawk-xml-1.1.1/awklib' > > > > > Making check in po > > > > > make[1]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > > > gawk-xml-1.1.1/po' > > > > > make[1]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > > > gawk-xml-1.1.1/po' > > > > > Making check in packaging > > > > > make[1]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > > > gawk-xml-1.1.1/packaging' > > > > > make[1]: Nothing to be done for 'check'. 
> > > > > make[1]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > > > gawk-xml-1.1.1/packaging' > > > > > Making check in test > > > > > make[1]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > > > gawk-xml-1.1.1/test' > > > > > > > > > > AWK = LC_ALL=C LANG=C AWKLIBPATH=../.libs:/usr/lib/gawk PATH=/home/lupe > > > > > /bin:/home/lupe/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/ > > > > > sbin:/sbin:/opt/texlive/2019/bin/x86_64-linux:/usr/local/bin/fim:/opt/ > > > > > texlive/2017/bin/x86_64-linux:/home/lupe/go/bin:/home/lupe/.fzf/bin:/ > > > > > usr/local/bin/fim:/opt/texlive/2017/bin/x86_64-linux:/home/lupe/prog/go > > > > > /bin LD_LIBRARY_PATH= DYLD_LIBRARY_PATH= gawk > > > > > > > > > > Locale environment: > > > > > LC_ALL="C" LANG="C" > > > > > > > > > > ======== Starting XML extension tests ======== > > > > > xdocbook > > > > > ./xdocbook.ok _xdocbook diferem: byte 1, linha 1 > > > > > make[1]: [Makefile:574: xdocbook] Error 1 (ignored) > > > > > xdeep2 > > > > > ./xdeep2.ok _xdeep2 diferem: byte 1, linha 1 > > > > > make[1]: [Makefile:602: xdeep2] Error 1 (ignored) > > > > > xattr > > > > > ./xattr.ok _xattr diferem: byte 1, linha 1 > > > > > make[1]: [Makefile:608: xattr] Error 1 (ignored) > > > > > xfujutf8 > > > > > ./xfujutf8.ok _xfujutf8 diferem: byte 1, linha 1 > > > > > make[1]: [Makefile:614: xfujutf8] Error 1 (ignored) > > > > > xotlsjis > > > > > ./xotlsjis.ok _xotlsjis diferem: byte 1, linha 1 > > > > > make[1]: [Makefile:626: xotlsjis] Error 1 (ignored) > > > > > xfujeucj > > > > > ./xfujeucj.ok _xfujeucj diferem: byte 1, linha 1 > > > > > make[1]: [Makefile:620: xfujeucj] Error 1 (ignored) > > > > > ./xload.ok _xload diferem: byte 52, linha 2 > > > > > make[1]: [Makefile:631: xload] Error 1 (ignored) > > > > > xmlinterleave > > > > > ./xmlinterleave.ok _xmlinterleave diferem: byte 1, linha 1 > > > > > make[1]: [Makefile:582: xmlinterleave] Error 1 (ignored) > > > > > beginfile > > > > > ./beginfile.ok 
_beginfile diferem: byte 1, linha 1 > > > > > make[1]: [Makefile:588: beginfile] Error 1 (ignored) > > > > > ======== Done with XML extension tests ======== > > > > > make[2]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > > > gawk-xml-1.1.1/test' > > > > > 9 TESTS FAILED > > > > > make[2]: *** [Makefile:516: pass-fail] Error 1 > > > > > make[2]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > > > gawk-xml-1.1.1/test' > > > > > make[1]: *** [Makefile:555: check] Error 2 > > > > > make[1]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > > > gawk-xml-1.1.1/test' > > > > > make: *** [Makefile:591: check-recursive] Error 1 > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Gawkextlib-users mailing list > > > > > Gaw...@li... > > > > > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > > > > > > > > > > > > > > > _______________________________________________ > > > > > Gawkextlib-users mailing list > > > > > Gaw...@li... > > > > > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Gawkextlib-users mailing list > > > > > Gaw...@li... > > > > > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > > > > > > > > > > > > > > > > > _______________________________________________ > > > Gawkextlib-users mailing list > > > Gaw...@li... > > > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > > > > > _______________________________________________ > Gawkextlib-users mailing list > Gaw...@li... > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users -- Andrew Schorr e-mail: as...@te... Telemetry Investments, L.L.C. phone: 917-305-1748 545 Fifth Ave, Suite 1108 fax: 212-425-5550 New York, NY 10017-3630 |
From: Luis P. M. <lui...@gm...> - 2019-06-17 10:25:20
|
Hi Andy,

I cannot reproduce the problem, even after untarring the gawkextlib and
gawk-xml tarballs again, without setting any --prefix=/usr/local for the
configuration script.
Even the make check for gawk-xml works fine now.

But now, still cannot use the xml tool:
$ awk -l xml workingdocuments.awk awk_demos/a.xml
awk: fatal: can't open shared library `xml' for reading (No such file or
directory)

The compilation arguments for my distro's gawk are:
configure_args="--with-readline"

$ whereis awk
awk: /usr/bin/awk /usr/libexec/awk /usr/share/awk
/usr/share/man/man1p/awk.1p /usr/share/man/man1/awk.1

The installation destination for the xml library:
/usr/local/lib/gawk/xml.so

Do I need to compile a different gawk myself and place it under /usr/local?

Thanks,

Luis

On 20190616 09:50:04 -0400, Andrew J. Schorr wrote:
> Hi Luis,
>
> The default autoconf prefix is /usr/local. If that's not where you intend
> to install, then you should specify a prefix explicitly. I recommend that
> you set --prefix explicitly instead of relying upon defaults. If you do
> that, does the build succeed?
>
> If you email the entire sequence of commands that you are running, then it
> will enable us to duplicate the problem.
>
> Also, that "Configured with" message seems to be incorrect. I tried
> the same thing, and it does say "Configured with: ../configure --prefix=/usr ...",
> but that's simply not true. That message seems to be a bug in the
> autoconf tools. If you run "grep prefix= config.log", you should see at the end:
> prefix='/usr/local'
>
> Regards,
> Andy
>
> On Sat, Jun 15, 2019 at 11:35:33PM +0100, Luis P. Mendes wrote:
> > Hi Andy,
> >
> > # gawk -V
> > GNU Awk 5.0.0, API: 2.0
> >
> > I've downloaded and installed the latest gawkextlib I saw from
> > sourceforge: gawkextlib-1.0.4.tar.gz
> >
> > I think I found an error.
> > This is in the config.log of gawkextlib: > > Configured with: /builddir/gcc-8.3.0/configure > > --build=x86_64-unknown-linux-gnu --enable-fast-character > > --enable-vtable-verify --prefix=/usr --mandir=/usr/share/man > > --infodir=/usr/share/info --libexecdir=/usr/ lib --libdir=/usr/lib > > --enable-threads=posix --enable-__cxa_atexit --disable-multilib > > --with-system-zlib --enable-shared --enable-lto --enable-plugins > > --enable-linker-build-id --disable-werror --disable-nls -- > > enable-default-pie --enable-default-ssp --enable-checking=release > > --disable-libstdcxx-pch --with-isl --with-linker-hash-style=gnu > > --disable-libunwind-exceptions --disable-target-libiberty > > --enable-serial-confi gure > > --enable-languages=c,c++,objc,obj-c++,fortran,lto,go,ada > > > > Please note that prefix is /usr, as I didn't specified any in the > > configure command. > > > > But, > > > > # find /usr/ -name libgawkextlib.so.0 > > /usr/local/lib/libgawkextlib.so.0 > > > > # ll /usr/local/lib/libgawkextlib.so > > lrwxrwxrwx 1 root root 22 jun 15 18:26 /usr/local/lib/libgawkextlib.so > > -> libgawkextlib.so.0.0.0 > > > > It seems that /usr/local/lib was used instead of /usr/lib. 
> > > > > > > > In the config.log of gawk-xml: > > Configured with: /builddir/gcc-8.3.0/configure > > --build=x86_64-unknown-linux-gnu --enable-fast-character > > --enable-vtable-verify --prefix=/usr --mandir=/usr/share/man > > --infodir=/usr/share/info --libexecdir=/usr/ lib --libdir=/usr/lib > > --enable-threads=posix --enable-__cxa_atexit --disable-multilib > > --with-system-zlib --enable-shared --enable-lto --enable-plugins > > --enable-linker-build-id --disable-werror --disable-nls -- > > enable-default-pie --enable-default-ssp --enable-checking=release > > --disable-libstdcxx-pch --with-isl --with-linker-hash-style=gnu > > --disable-libunwind-exceptions --disable-target-libiberty > > --enable-serial-confi gure > > --enable-languages=c,c++,objc,obj-c++,fortran,lto,go,ada > > > > > > After a make clean in gawk-xml, the make command outputs an error: > > gawk: xmlbase:14: fatal: load_ext: cannot open library > > `../.libs/xml.so' (libgawkextlib.so.0: cannot open shared object file: > > No such file or directory) > > > > > > Thanks, > > > > > > Luis > > > > > > > > On 20190615 15:41:07 -0400, Andrew J. Schorr wrote: > > > Hi Luis, > > > > > > Which versions of gawk and gawkextlib are you using? > > > Can you please share the entire sequence of commands you are running > > > up through the "make check" that fails? It is clearly not configuring > > > the environment variables properly, so it can't find the extension, > > > as Jürgen pointed out. > > > > > > Regards, > > > Andy > > > > > > On Sat, Jun 15, 2019 at 07:38:34PM +0100, Luis P. Mendes wrote: > > > > Jürgen, > > > > > > > > I'm on Linux, not MacOSX: 4.19.50_1 #1 SMP PREEMPT x86_64 GNU/Linux > > > > > > > > > > > > On Sat, Jun 15, 2019 at 7:29 PM Jürgen Kahrs via Gawkextlib-users < > > > > gaw...@li...> wrote: > > > > > > > > Hello Luis, > > > > I guess you tried this on MacOSX. > > > > This seems to be the reason for the failed test cases: > > > > .. 
LD_LIBRARY_PATH= DYLD_LIBRARY_PATH= gawk > > > > > > > > The environment variable DYLD_LIBRARY_PATH should have been > > > > expanded as $DYLD_LIBRARY_PATH during configuration. > > > > I am not quite sure where this information got lost, in the source > > > > code or during configuration on your machine. Any idea ? > > > > > > > > > > > > Jürgen Kahrs > > > > > > > > > > > > > > > > Hi, > > > > > > > > After some years since last used this wonderful tool, I'd like to > > > > install gawk-xml. > > > > As per the README, I've installed gawkextlib without problems. > > > > Next, `configure` and ``make` for gawk-xml went fine, but not `make > > > > check`. > > > > Output of the command below. > > > > Do I need to install extra packages? > > > > Thanks > > > > > > > > $ make check > > > > Making check in awklib > > > > make[1]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > > gawk-xml-1.1.1/awklib' > > > > make[1]: Nothing to be done for 'check'. > > > > make[1]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > > gawk-xml-1.1.1/awklib' > > > > Making check in po > > > > make[1]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > > gawk-xml-1.1.1/po' > > > > make[1]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > > gawk-xml-1.1.1/po' > > > > Making check in packaging > > > > make[1]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > > gawk-xml-1.1.1/packaging' > > > > make[1]: Nothing to be done for 'check'. 
> > > > make[1]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > > gawk-xml-1.1.1/packaging' > > > > Making check in test > > > > make[1]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > > gawk-xml-1.1.1/test' > > > > > > > > AWK = LC_ALL=C LANG=C AWKLIBPATH=../.libs:/usr/lib/gawk PATH=/home/lupe > > > > /bin:/home/lupe/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/ > > > > sbin:/sbin:/opt/texlive/2019/bin/x86_64-linux:/usr/local/bin/fim:/opt/ > > > > texlive/2017/bin/x86_64-linux:/home/lupe/go/bin:/home/lupe/.fzf/bin:/ > > > > usr/local/bin/fim:/opt/texlive/2017/bin/x86_64-linux:/home/lupe/prog/go > > > > /bin LD_LIBRARY_PATH= DYLD_LIBRARY_PATH= gawk > > > > > > > > Locale environment: > > > > LC_ALL="C" LANG="C" > > > > > > > > ======== Starting XML extension tests ======== > > > > xdocbook > > > > ./xdocbook.ok _xdocbook diferem: byte 1, linha 1 > > > > make[1]: [Makefile:574: xdocbook] Error 1 (ignored) > > > > xdeep2 > > > > ./xdeep2.ok _xdeep2 diferem: byte 1, linha 1 > > > > make[1]: [Makefile:602: xdeep2] Error 1 (ignored) > > > > xattr > > > > ./xattr.ok _xattr diferem: byte 1, linha 1 > > > > make[1]: [Makefile:608: xattr] Error 1 (ignored) > > > > xfujutf8 > > > > ./xfujutf8.ok _xfujutf8 diferem: byte 1, linha 1 > > > > make[1]: [Makefile:614: xfujutf8] Error 1 (ignored) > > > > xotlsjis > > > > ./xotlsjis.ok _xotlsjis diferem: byte 1, linha 1 > > > > make[1]: [Makefile:626: xotlsjis] Error 1 (ignored) > > > > xfujeucj > > > > ./xfujeucj.ok _xfujeucj diferem: byte 1, linha 1 > > > > make[1]: [Makefile:620: xfujeucj] Error 1 (ignored) > > > > ./xload.ok _xload diferem: byte 52, linha 2 > > > > make[1]: [Makefile:631: xload] Error 1 (ignored) > > > > xmlinterleave > > > > ./xmlinterleave.ok _xmlinterleave diferem: byte 1, linha 1 > > > > make[1]: [Makefile:582: xmlinterleave] Error 1 (ignored) > > > > beginfile > > > > ./beginfile.ok _beginfile diferem: byte 1, linha 1 > > > > make[1]: [Makefile:588: beginfile] Error 1 
(ignored) > > > > ======== Done with XML extension tests ======== > > > > make[2]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > > gawk-xml-1.1.1/test' > > > > 9 TESTS FAILED > > > > make[2]: *** [Makefile:516: pass-fail] Error 1 > > > > make[2]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > > gawk-xml-1.1.1/test' > > > > make[1]: *** [Makefile:555: check] Error 2 > > > > make[1]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > > gawk-xml-1.1.1/test' > > > > make: *** [Makefile:591: check-recursive] Error 1 > > > > > > > > > > > > > > > > _______________________________________________ > > > > Gawkextlib-users mailing list > > > > Gaw...@li... > > > > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > > > > > > > > > > > > _______________________________________________ > > > > Gawkextlib-users mailing list > > > > Gaw...@li... > > > > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > > > > > > > > > > > > > > _______________________________________________ > > > > Gawkextlib-users mailing list > > > > Gaw...@li... > > > > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > > > > > > > > > > > > _______________________________________________ > > Gawkextlib-users mailing list > > Gaw...@li... > > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > |
From: Andrew J. S. <as...@te...> - 2019-06-16 13:50:13
|
Hi Luis,

The default autoconf prefix is /usr/local. If that's not where you intend
to install, then you should specify a prefix explicitly. I recommend that
you set --prefix explicitly instead of relying upon defaults. If you do
that, does the build succeed?

If you email the entire sequence of commands that you are running, then it
will enable us to duplicate the problem.

Also, that "Configured with" message seems to be incorrect. I tried
the same thing, and it does say "Configured with: ../configure --prefix=/usr ...",
but that's simply not true. That message seems to be a bug in the
autoconf tools. If you run "grep prefix= config.log", you should see at the end:
prefix='/usr/local'

Regards,
Andy

On Sat, Jun 15, 2019 at 11:35:33PM +0100, Luis P. Mendes wrote:
> Hi Andy,
>
> # gawk -V
> GNU Awk 5.0.0, API: 2.0
>
> I've downloaded and installed the latest gawkextlib I saw from
> sourceforge: gawkextlib-1.0.4.tar.gz
>
> I think I found an error.
> This is in the config.log of gawkextlib:
> Configured with: /builddir/gcc-8.3.0/configure
> --build=x86_64-unknown-linux-gnu --enable-fast-character
> --enable-vtable-verify --prefix=/usr --mandir=/usr/share/man
> --infodir=/usr/share/info --libexecdir=/usr/lib --libdir=/usr/lib
> --enable-threads=posix --enable-__cxa_atexit --disable-multilib
> --with-system-zlib --enable-shared --enable-lto --enable-plugins
> --enable-linker-build-id --disable-werror --disable-nls
> --enable-default-pie --enable-default-ssp --enable-checking=release
> --disable-libstdcxx-pch --with-isl --with-linker-hash-style=gnu
> --disable-libunwind-exceptions --disable-target-libiberty
> --enable-serial-configure
> --enable-languages=c,c++,objc,obj-c++,fortran,lto,go,ada
>
> Please note that prefix is /usr, as I didn't specify any in the
> configure command.
> > But, > > # find /usr/ -name libgawkextlib.so.0 > /usr/local/lib/libgawkextlib.so.0 > > # ll /usr/local/lib/libgawkextlib.so > lrwxrwxrwx 1 root root 22 jun 15 18:26 /usr/local/lib/libgawkextlib.so > -> libgawkextlib.so.0.0.0 > > It seems that /usr/local/lib was used instead of /usr/lib. > > > > In the config.log of gawk-xml: > Configured with: /builddir/gcc-8.3.0/configure > --build=x86_64-unknown-linux-gnu --enable-fast-character > --enable-vtable-verify --prefix=/usr --mandir=/usr/share/man > --infodir=/usr/share/info --libexecdir=/usr/ lib --libdir=/usr/lib > --enable-threads=posix --enable-__cxa_atexit --disable-multilib > --with-system-zlib --enable-shared --enable-lto --enable-plugins > --enable-linker-build-id --disable-werror --disable-nls -- > enable-default-pie --enable-default-ssp --enable-checking=release > --disable-libstdcxx-pch --with-isl --with-linker-hash-style=gnu > --disable-libunwind-exceptions --disable-target-libiberty > --enable-serial-confi gure > --enable-languages=c,c++,objc,obj-c++,fortran,lto,go,ada > > > After a make clean in gawk-xml, the make command outputs an error: > gawk: xmlbase:14: fatal: load_ext: cannot open library > `../.libs/xml.so' (libgawkextlib.so.0: cannot open shared object file: > No such file or directory) > > > Thanks, > > > Luis > > > > On 20190615 15:41:07 -0400, Andrew J. Schorr wrote: > > Hi Luis, > > > > Which versions of gawk and gawkextlib are you using? > > Can you please share the entire sequence of commands you are running > > up through the "make check" that fails? It is clearly not configuring > > the environment variables properly, so it can't find the extension, > > as Jürgen pointed out. > > > > Regards, > > Andy > > > > On Sat, Jun 15, 2019 at 07:38:34PM +0100, Luis P. 
Mendes wrote: > > > Jürgen, > > > > > > I'm on Linux, not MacOSX: 4.19.50_1 #1 SMP PREEMPT x86_64 GNU/Linux > > > > > > > > > On Sat, Jun 15, 2019 at 7:29 PM Jürgen Kahrs via Gawkextlib-users < > > > gaw...@li...> wrote: > > > > > > Hello Luis, > > > I guess you tried this on MacOSX. > > > This seems to be the reason for the failed test cases: > > > .. LD_LIBRARY_PATH= DYLD_LIBRARY_PATH= gawk > > > > > > The environment variable DYLD_LIBRARY_PATH should have been > > > expanded as $DYLD_LIBRARY_PATH during configuration. > > > I am not quite sure where this information got lost, in the source > > > code or during configuration on your machine. Any idea ? > > > > > > > > > Jürgen Kahrs > > > > > > > > > > > > Hi, > > > > > > After some years since last used this wonderful tool, I'd like to > > > install gawk-xml. > > > As per the README, I've installed gawkextlib without problems. > > > Next, `configure` and ``make` for gawk-xml went fine, but not `make > > > check`. > > > Output of the command below. > > > Do I need to install extra packages? > > > Thanks > > > > > > $ make check > > > Making check in awklib > > > make[1]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > gawk-xml-1.1.1/awklib' > > > make[1]: Nothing to be done for 'check'. > > > make[1]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > gawk-xml-1.1.1/awklib' > > > Making check in po > > > make[1]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > gawk-xml-1.1.1/po' > > > make[1]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > gawk-xml-1.1.1/po' > > > Making check in packaging > > > make[1]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > gawk-xml-1.1.1/packaging' > > > make[1]: Nothing to be done for 'check'. 
> > > make[1]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > gawk-xml-1.1.1/packaging' > > > Making check in test > > > make[1]: Entering directory '/home/lupe/recolhidos/gawkextlib/ > > > gawk-xml-1.1.1/test' > > > > > > AWK = LC_ALL=C LANG=C AWKLIBPATH=../.libs:/usr/lib/gawk PATH=/home/lupe > > > /bin:/home/lupe/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/ > > > sbin:/sbin:/opt/texlive/2019/bin/x86_64-linux:/usr/local/bin/fim:/opt/ > > > texlive/2017/bin/x86_64-linux:/home/lupe/go/bin:/home/lupe/.fzf/bin:/ > > > usr/local/bin/fim:/opt/texlive/2017/bin/x86_64-linux:/home/lupe/prog/go > > > /bin LD_LIBRARY_PATH= DYLD_LIBRARY_PATH= gawk > > > > > > Locale environment: > > > LC_ALL="C" LANG="C" > > > > > > ======== Starting XML extension tests ======== > > > xdocbook > > > ./xdocbook.ok _xdocbook diferem: byte 1, linha 1 > > > make[1]: [Makefile:574: xdocbook] Error 1 (ignored) > > > xdeep2 > > > ./xdeep2.ok _xdeep2 diferem: byte 1, linha 1 > > > make[1]: [Makefile:602: xdeep2] Error 1 (ignored) > > > xattr > > > ./xattr.ok _xattr diferem: byte 1, linha 1 > > > make[1]: [Makefile:608: xattr] Error 1 (ignored) > > > xfujutf8 > > > ./xfujutf8.ok _xfujutf8 diferem: byte 1, linha 1 > > > make[1]: [Makefile:614: xfujutf8] Error 1 (ignored) > > > xotlsjis > > > ./xotlsjis.ok _xotlsjis diferem: byte 1, linha 1 > > > make[1]: [Makefile:626: xotlsjis] Error 1 (ignored) > > > xfujeucj > > > ./xfujeucj.ok _xfujeucj diferem: byte 1, linha 1 > > > make[1]: [Makefile:620: xfujeucj] Error 1 (ignored) > > > ./xload.ok _xload diferem: byte 52, linha 2 > > > make[1]: [Makefile:631: xload] Error 1 (ignored) > > > xmlinterleave > > > ./xmlinterleave.ok _xmlinterleave diferem: byte 1, linha 1 > > > make[1]: [Makefile:582: xmlinterleave] Error 1 (ignored) > > > beginfile > > > ./beginfile.ok _beginfile diferem: byte 1, linha 1 > > > make[1]: [Makefile:588: beginfile] Error 1 (ignored) > > > ======== Done with XML extension tests ======== > > > make[2]: Entering 
directory '/home/lupe/recolhidos/gawkextlib/ > > > gawk-xml-1.1.1/test' > > > 9 TESTS FAILED > > > make[2]: *** [Makefile:516: pass-fail] Error 1 > > > make[2]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > gawk-xml-1.1.1/test' > > > make[1]: *** [Makefile:555: check] Error 2 > > > make[1]: Leaving directory '/home/lupe/recolhidos/gawkextlib/ > > > gawk-xml-1.1.1/test' > > > make: *** [Makefile:591: check-recursive] Error 1 > > > > > > > > > > > > _______________________________________________ > > > Gawkextlib-users mailing list > > > Gaw...@li... > > > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > > > > > > > > > _______________________________________________ > > > Gawkextlib-users mailing list > > > Gaw...@li... > > > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > > > > > > > > > > _______________________________________________ > > > Gawkextlib-users mailing list > > > Gaw...@li... > > > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users > > > > > > > _______________________________________________ > Gawkextlib-users mailing list > Gaw...@li... > https://lists.sourceforge.net/lists/listinfo/gawkextlib-users -- Andrew Schorr e-mail: as...@te... Telemetry Investments, L.L.C. phone: 917-305-1748 545 Fifth Ave, Suite 1108 fax: 212-425-5550 New York, NY 10017-3630 |