Thread: [Videlibri-xidel] documentation for syntax to extract two or more fields from same "record" ?

[Videlibri-xidel] documentation for syntax to extract two or more fields from same "record" ?

From: Marco F. <mfi...@ne...> - 2023-08-05 07:28:52

Greetings,

(I also posted this on GitHub because I am not sure what the best channel is...)
I am learning Xidel 0.9.8 on Ubuntu. My issue is that I have looked into the
documentation, but as far as I can tell it gives no clue (none I recognize, at
least) about how to do the following.

I have a JSON file with records that have, among others, id and title fields,
e.g.:

#> jq '.' test.json  | cut -c1-100 | more
[
  {
    "id": 42,
    "title": "Software is eating the world",

I want to extract with Xidel those two values, producing output lines like this:

ARTICLE: 42 ==> Software is eating the world

or at least like this:

42 ==> Software is eating the world

and I can't find, or recognize, the right syntax to use for what seems a
general, very common need to me. The closest I have come to what I want is
this:

xidel test.json -e 'for $t in $json/title return string-join(("$t/../id", $t), " ==> ")'

which produces lines like these:

$t/../id ==> Software is eating the world

What is the right way to refer to the id value of the current record?

Thanks!
Marco

Re: [Videlibri-xidel] documentation for syntax to extract two or more fields from same "record" ?

From: Reino W. <rwi...@xs...> - 2023-08-05 15:32:23

Hello Marco,

On 2023-08-05T09:09:17+0200, Marco Fioretti <mfi...@ne...> wrote:
> (I also posted this on github because I am not sure what the best channel is...)

It's Benito's project of course, but I'd say that while the GitHub issue tracker is a good place for bug reports, for questions about usage the mailing list here, the SourceForge discussion forum, or Stack Overflow is a better place.

> I am learning Xidel 0.9.8 on Ubuntu.

It's really recommended to use a more up-to-date version! See https://videlibri.sourceforge.net/xidel.html#downloads.

> I have a JSON file with records that have, among others, id and title fields,eg:
>
> #> jq '.' test.json  | cut -c1-100 | more
> [
>   {
>     "id": 42,
>     "title": "Software is eating the world",
>
> I want to extract with Xidel those two values, producing output lines like this:
>
> ARTICLE: 42 ==> Software is eating the world
> [...]

Can we assume the rest of the JSON looks a bit like this?

[
  {
    "id": 42,
    "title": "Software is eating the world"
  },
  {
    "id": 43,
    "title": "..."
  }
  ...
]

If not, then please specify.

-- 
Reino

Re: [Videlibri-xidel] documentation for syntax to extract two or more fields from same "record" ?

From: M. F. <mfi...@ne...> - 2023-08-05 16:05:41

On Sat, Aug 05, 2023 17:10:39 PM +0200, Reino Wijnsma wrote:

> Can we assume the rest of the JSON looks a bit like this?
> 
> [
>   {
>     "id": 42,
>     "title": "Software is eating the world"
>   },
>   {
>     "id": 43,
>     "title": "..."
>   }
>   ...
> ]
> 
> If not, then please specify.

Hi Reino,

yes, the JSON does all look like that. There are OTHER fields
e.g. url, creation date and so on, but all the records have the same
structure. Looking forward to suggestions. Meanwhile, I will also try
to download the newer version.

Thanks,
Marco

Re: [Videlibri-xidel] documentation for syntax to extract two or more fields from same "record" ?

From: Reino W. <rwi...@xs...> - 2023-08-05 17:01:42

On 2023-08-05T18:05:28+0200, M. Fioretti <mfi...@ne...> wrote:
> yes, the JSON does all look like that.

Then a simple string-concatenation would suffice:

xidel -s test.json -e '$json()/concat("ARTICLE: ",id," ==> ",title)'

Or with the latest XPath 4 extended-string-syntax:

xidel -s test.json -e '$json()/`ARTICLE: {id} ==> {title}`'

Your JSON is an array, so be sure to use $json(), or $json?* (XPath/XQuery 3 syntax), to iterate over its members.

On 2023-08-05T09:09:17+0200, Marco Fioretti <mfi...@ne...> wrote:
> My issue is that I have looked into the documentation, but as far as I can tell it gives no clue (none I recognize, at least) to do this.

What documentation specifically?
When I first encountered Xidel long ago I thought lots of things were not documented, until I realized it had full support for (at that time) XPath/XQuery 2.0, which has its own documentation.
Last week I made this post <https://sourceforge.net/p/xidel/discussion/help/thread/9bebdbf105/#35a8>, which in my opinion links to a lot of interesting Xidel specific and general XPath/XQuery information.

-- 
Reino

Re: [Videlibri-xidel] documentation for syntax to extract two or more fields from same "record" ?

From: M. F. <mfi...@ne...> - 2023-08-05 18:24:50

On Sat, Aug 05, 2023 19:01:34 PM +0200, Reino Wijnsma wrote:
> On 2023-08-05T18:05:28+0200, M. Fioretti <mfi...@ne...> wrote:
> 
>     yes, the JSON does all look like that.
> 
> 
> Then a simple string-concatenation would suffice:
> 
> xidel -s test.json -e '$json()/concat("ARTICLE: ",id," ==> ",title)'
> 
> Or with the latest XPath 4 extended-string-syntax:
> 
> xidel -s test.json -e '$json()/`ARTICLE: {id} ==> {title}`'

thanks a LOT. Just tried both commands, and confirm that they both
work as I need.

> Your JSON is an array, so be sure to use $json(), or $json?*...

And THIS is the one (in hindsight, obvious) thing that I was missing.

> What documentation specifically?

> When I first encountered Xidel long ago I thought lots of thing were
> not documented, until I realized it had full support for (at that
> time) XPath/ Xquery 2.0, which has its own documentation.

Indeed, the problem I have may very well be that even THAT
documentation is hard to recognize/navigate (for me, at least). For
example, now that I do have the output I asked for, all lines like:

ARTICLE: 42 ==> Software is eating the world

the next step would be to make each "column" fixed length, for
readability, i.e. this:

ARTICLE:   42 ==> Software is eating the world
ARTICLE: 3942 ==> Software licenses in the age of AI

instead of:

ARTICLE: 42 ==> Software is eating the world
ARTICLE: 3942 ==> Software licenses in the age of AI

In Perl, C, bash... I'd use sprintf, but is there an XQuery / XPath
version of it? It doesn't seem so.

Thanks,
Marco

Re: [Videlibri-xidel] documentation for syntax to extract two or more fields from same "record" ?

From: Reino W. <rwi...@xs...> - 2023-08-05 21:29:10

On 2023-08-05T20:24:40+0200, M. Fioretti <mfi...@ne...> wrote:
> the next step would be to make each "column" fixed length, for
> readability, i.e. this:
>
> ARTICLE:   42 ==> Software is eating the world
> ARTICLE: 3942 ==> Software licenses in the age of AI

If it's only about the id, and it doesn't exceed 9999, then the following trick would do:

echo '[
  {
    "id": 1,
    "title": "Software A"
  },
  {
    "id": 10,
    "title": "Software B"
  },
  {
    "id": 100,
    "title": "Software C"
  },
  {
    "id": 1000,
    "title": "Software D"
  }
]' | xidel -se '
  $json()/concat(
    "ARTICLE: ",
    substring("   ",1,4 - string-length(id))||id,
    " ==> ",
    title
  )
'
ARTICLE:    1 ==> Software A
ARTICLE:   10 ==> Software B
ARTICLE:  100 ==> Software C
ARTICLE: 1000 ==> Software D

If you have a situation where you need every "column" to have a fixed width, then it IS possible, but your query will become very complicated.
My own hobby-project for example: https://github.com/Reino17/xivid/blob/master/xivid_notes.txt#L1465-L1730.
Another example that comes to mind: https://sourceforge.net/p/xidel/discussion/help/thread/031d881982/#a3e1.
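
If post-processing outside Xidel is acceptable, one possible sketch is to let the shell's printf do the alignment, since that is the sprintf-style tool mentioned earlier in this thread. The helper name format_articles and the tab-separated "id<TAB>title" input are assumptions made purely for illustration; how to produce that input with Xidel is not shown here.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xidel): reads "id<TAB>title" lines on
# stdin and right-aligns the id in a 4-character column.
format_articles() {
  # Split on tabs only, so spaces inside the title are preserved.
  while IFS="$(printf '\t')" read -r id title; do
    printf 'ARTICLE: %4s ==> %s\n' "$id" "$title"
  done
}

# Made-up sample lines standing in for real extractor output:
printf '42\tSoftware is eating the world\n3942\tSoftware licenses in the age of AI\n' |
  format_articles
# ARTICLE:   42 ==> Software is eating the world
# ARTICLE: 3942 ==> Software licenses in the age of AI
```

As with the substring trick, ids longer than 4 characters are not truncated; they simply push the rest of the line to the right.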

-- 
Reino

Re: [Videlibri-xidel] documentation for syntax to extract two or more fields from same "record" ?

From: Reino W. <rwi...@xs...> - 2024-09-28 13:59:02

Hello videlibri-xidel and Marco,

On 2023-08-05T09:09:17+0200, Marco Fioretti <mfi...@ne...> wrote:
> I am learning Xidel 0.9.8 on Ubuntu. [...] 

I just found out that you're a freelance author, because by pure coincidence I stumbled upon your article https://www.linux-magazine.com/Issues/2023/276/Xidel.
Great job! A good article for anyone new to Xidel. I have no idea if Benito has already seen it, but I'm sure he'll agree.

-- 
Reino