How to combine a 3-part query?

Help
Anonymous
2022-09-17
2022-10-16
  • Anonymous

    Anonymous - 2022-09-17

    I'm working on

          https://github.com/ArtifexSoftware/ghostpdl-downloads/releases
    

    and I'm able to extract
    the title lines

                   --extract "txt:=//div[@class='col-md-9']//h1"
        or         --extract "txt:=//div[@class='flex-1']/h1"
    

    the released versions

                   --extract "cls:=//div[@class='col-md-2 d-flex flex-md-column flex-row flex-wrap pr-md-6 mb-2 mb-md-0 flex-items-start pt-md-4']"   
                   --extract "vrs:=$cls//span[@class = 'ml-1 wb-break-all']"
    

    the corresponding date

                   --extract "cls:=//div[@class='col-md-2 d-flex flex-md-column flex-row flex-wrap pr-md-6 mb-2 mb-md-0 flex-items-start pt-md-4']"   
                   --extract "tim:=//*[ends-with(name(),'-time')]/@datetime"
    

    All these are arrays.

    When I try to produce a single line per entry, the problems begin, and at a very basic level.
    I am not able to combine (the version)

                   --extract "vrs:=$cls//span[@class = 'ml-1 wb-break-all']"
    

    with the function

                   normalize-space (...)
    

    I get everything - syntax error, endless loop and a string of all entries - but no array of normalized entries.

    So I don't even reach the 'string-join' level.

    Any help?!

    Thousand thanks

     
  • Reino

    Reino - 2022-09-17

    All these are arrays.

    Sequences.

    I get everything - syntax error, endless loop and a string of all entries - but no array of normalized entries.

    What exactly have you tried? And what is the end-result you're actually looking for?

     
  • Anonymous

    Anonymous - 2022-09-17

    Reino, nice to meet you again ...

    I'm trying to build 'one liners' like

                2022-04-04T14:53:27Z    gs9561    Ghostscript/GhostPDL 9.56.1
                :
    

    I tried

        --extract "cls:=//div[@class='col-md-2 d-flex flex-md-column flex-row flex-wrap pr-md-6 mb-2 mb-md-0 flex-items-start pt-md-4']"
        --extract "vrs:=normalize-space($cls//span[@class = 'ml-1 wb-break-all'])"
    

    giving
    SET vrs=gs9561 gs1000rc2 gs10.0.0rc1 gs9560 gs9550 gs9560rc2 gs9560rc1 gs9540 gs9550rc1 gpdf_beta1

        --extract "vrs:=($cls)(normalize-space(//span[@class = 'ml-1 wb-break-all']))"
    

    gives
    SET vrs=

    for /f "delims=" %%r in ( 'xidel.exe --silent "%fil%" --output-format=cmd --extract-exclude=cls
        --extract "cls:=//div[@class='col-md-2 d-flex flex-md-column flex-row flex-wrap pr-md-6 mb-2 mb-md-0 flex-items-start pt-md-4']"
        --extract "vrs:=($cls)//normalize-space(span[@class = 'ml-1 wb-break-all'])"
    ') do %%r

    is an endless loop.
    

    Thanks for your attention

     
  • Reino

    Reino - 2022-09-17

    Reino, nice to meet you again ...

    Wish I could say "likewise", but with you being anonymous that's not so obvious to me.

    --extract "vrs:=($cls)//normalize-space(span[@class = 'ml-1 wb-break-all'])"

    That would be...

    --extract "vrs:=normalize-space($cls//span[@class='ml-1 wb-break-all'])"
    

    or...

    --extract "vrs:=$cls/normalize-space(.//span[@class='ml-1 wb-break-all'])"
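
For readers more at home in a general-purpose language, the difference between applying the function to the whole selection and applying it once per node can be sketched in Python. This is only an illustration under assumptions: the sample strings are invented, `normalize_space` is a hypothetical helper that approximates XPath's normalize-space(), and the "one string" behaviour mirrors the output reported earlier in this thread rather than anything guaranteed by Xidel.

```python
def normalize_space(text: str) -> str:
    """Approximate XPath normalize-space(): trim and collapse whitespace runs."""
    return " ".join(text.split())

# Invented stand-ins for the raw text of each matched <span>.
spans = ["\n      gs9561\n  ", "\n      gs1000rc2\n  "]

# Function applied to the whole selection at once: a single string comes back.
one_string = normalize_space(" ".join(spans))

# Function applied once per node (the "$cls/normalize-space(...)" form):
# a sequence of cleaned strings comes back.
per_node = [normalize_space(s) for s in spans]

print(one_string)
print(per_node)
```

The second form is the one that yields one entry per release, which is what a later join per entry needs.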
    

    I'd forget about the use of external variables if I were you, because that's something very difficult to accomplish in cmd, if possible at all.

    Instead of...

    //div[@class='col-md-2 d-flex flex-md-column flex-row flex-wrap pr-md-6 mb-2 mb-md-0 flex-items-start pt-md-4']
    

    I would take the parent node...

    //div[@class='d-flex flex-column flex-md-row my-5 flex-justify-center']
    

    as a starting-point. The first occurrence:

    xidel -s "<url>" -e ^"^
      (//div[@class='d-flex flex-column flex-md-row my-5 flex-justify-center'])[1]/(^
        div/div/local-time/@datetime,^
        div/div/a/div/span[@class='ml-1 wb-break-all'],^
        div[@class='col-md-9']//h1^
      )^
    "
    

    Or shorter:

    xidel -s "<url>" -e ^"^
      (//div[@class='d-flex flex-column flex-md-row my-5 flex-justify-center'])[1]/(^
        (.//@datetime)[1],^
        .//span[@class='ml-1 wb-break-all'],^
        div[@class='col-md-9']//h1^
      )^
    "
    2022-04-04T14:53:27Z
    
                gs9561
    
    Ghostscript/GhostPDL 9.56.1
    

    There are multiple @datetime attribute-nodes, so be sure to select the first one.

    xidel -s "<url>" -e ^"^
      (//div[@class='d-flex flex-column flex-md-row my-5 flex-justify-center'])[1] ! (^
        (.//@datetime)[1],^
        normalize-space(.//span[@class='ml-1 wb-break-all']),^
        div[@class='col-md-9']//h1^
      )^
    "
    2022-04-04T14:53:27Z
    gs9561
    Ghostscript/GhostPDL 9.56.1
    
    xidel -s "<url>" -e ^"^
      (//div[@class='d-flex flex-column flex-md-row my-5 flex-justify-center'])[1]/join(^
        (^
          (.//@datetime)[1],^
          normalize-space(.//span[@class='ml-1 wb-break-all']),^
          div[@class='col-md-9']//h1^
        ),x:cps(9)^
      )^
    "
    2022-04-04T14:53:27Z    gs9561  Ghostscript/GhostPDL 9.56.1
    

    Instead of x:cps(9) (TAB), "&#9;" should work as well. From your post I assumed that's what you want.
    And so for all occurrences:

    xidel -s "<url>" -e ^"^
      //div[@class='d-flex flex-column flex-md-row my-5 flex-justify-center']/join(^
        (^
          (.//@datetime)[1],^
          normalize-space(.//span[@class='ml-1 wb-break-all']),^
          div[@class='col-md-9']//h1^
        ),x:cps(9)^
      )^
    "
    2022-04-04T14:53:27Z    gs9561  Ghostscript/GhostPDL 9.56.1
    2022-09-07T13:05:51Z    gs1000rc2       Ghostscript/GhostPDL 10.0.0 Release Candidate 2
    2022-08-24T13:18:12Z    gs10.0.0rc1     Ghostscript/GhostPDL 10.0.0 Release Candidate 1
    2022-03-29T10:10:45Z    gs9560  Ghostscript/GhostPDL 9.56.0
    2021-09-27T09:22:22Z    gs9550  Ghostscript/GhostPDL 9.55.0
    2022-03-14T14:47:52Z    gs9560rc2       Ghostscript/GhostPDL 9.56.0rc2
    2022-03-02T12:06:19Z    gs9560rc1       Ghostscript/GhostPDL 9.56.0rc1
    2021-03-30T09:05:01Z    gs9540  Ghostscript/GhostPDL 9.54.0
    2021-09-16T10:03:33Z    gs9550rc1       Ghostscript/GhostPDL 9.55.0 Release Candidate 1
    2021-08-25T14:05:17Z    gpdf_beta1      gpdf/pdfi tech preview/beta 1
    

    Or of course minified:

    xidel -s "<url>" -e "//div[@class='d-flex flex-column flex-md-row my-5 flex-justify-center']/join(((.//@datetime)[1],normalize-space(.//span[@class='ml-1 wb-break-all']),div[@class='col-md-9']//h1),x:cps(9))"
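
The same "one parent node per release, then pick three values and join them with a tab" idea can be sketched in Python with the standard library. Everything here is an assumption made for illustration: the sample markup is an invented, heavily simplified stand-in for the releases page (including the `release` class and the second timestamp), and `release_line` is a hypothetical helper, not part of Xidel.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified stand-in for the releases page.
SAMPLE = """
<body>
  <div class="release">
    <local-time datetime="2022-04-04T14:53:27Z"/>
    <relative-time datetime="2022-04-04T14:55:00Z"/>
    <span class="ml-1 wb-break-all">
        gs9561
    </span>
    <h1>Ghostscript/GhostPDL 9.56.1</h1>
  </div>
  <div class="release">
    <local-time datetime="2022-03-29T10:10:45Z"/>
    <span class="ml-1 wb-break-all">
        gs9560
    </span>
    <h1>Ghostscript/GhostPDL 9.56.0</h1>
  </div>
</body>
"""

def release_line(block: ET.Element) -> str:
    """Build one tab-separated line from a single release block."""
    # Counterpart of (.//@datetime)[1]: first element carrying the attribute.
    stamp = block.find(".//*[@datetime]").get("datetime")
    # Counterpart of normalize-space(.//span[...]): collapse the whitespace.
    tag = " ".join(block.find(".//span[@class='ml-1 wb-break-all']").text.split())
    title = block.find(".//h1").text
    return "\t".join((stamp, tag, title))

root = ET.fromstring(SAMPLE)
lines = [release_line(div) for div in root.findall(".//div[@class='release']")]
print("\n".join(lines))
```

Because every value is looked up relative to its own block, the three columns cannot drift apart the way three independent page-wide sequences could.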
    
     
  • Anonymous

    Anonymous - 2022-09-17

    ... I'm stunned, and I have to think it over.
    For the moment I thank you for another midnight lesson.

     
  • Anonymous

    Anonymous - 2022-09-19

    Sorry I'm a bit delayed. My system's power supply had given up.

    Reino, I thank you for this piece of code.
    With its concreteness and its clarity, it is something I had long been looking for.
    Especially the clarification of how to combine addressing and processing helped me a lot.

    ... and this is what I made from it

    set "xtr=         vrs:=(//div[@class='d-flex flex-column flex-md-row my-5 flex-justify-center']/"
    set "xtr=%xtr%    join("
    set "xtr=%xtr%          ("
    set "xtr=%xtr%              (.//@datetime)[1],"
    set "xtr=%xtr%              normalize-space(.//span[@class='ml-1 wb-break-all']),"
    set "xtr=%xtr%              div[@class='col-md-9']//h1"
    set "xtr=%xtr%          ),"
    set "xtr=%xtr%          x:cps(9)"
    set "xtr=%xtr%    ))"
    
    for /f "delims=" %%l in ('xidel.exe --silent "%fil%" --output-format=cmd --extract "%xtr%"') do %%l
    

    Thank you

     
  • Reino

    Reino - 2022-09-21

    That's a creative way to assign the (prettified) extraction-query to a variable. The outer parentheses ( ) aren't necessary btw.

    Alternatively you could of course insert the extraction-query directly (with the necessary escape-characters):

    FOR /F "delims=" %A IN ('
      xidel -s "%fil%" -e ^"
        vrs:^=//div[@class^='d-flex flex-column flex-md-row my-5 flex-justify-center']/join^(
          ^(
            ^(.//@datetime^)[1]^,
            normalize-space^(.//span[@class^='ml-1 wb-break-all']^)^,
            div[@class^='col-md-9']//h1
          ^)^,x:cps^(9^)
        ^)
      ^" --output-format^=cmd
    ') DO @%A
    

    Or minified:

    FOR /F "delims=" %A IN ('
      xidel -s "%fil%" -e "vrs:=//div[@class='d-flex flex-column flex-md-row my-5 flex-justify-center']/join(((.//@datetime)[1],normalize-space(.//span[@class='ml-1 wb-break-all']),div[@class='col-md-9']//h1),x:cps(9))" --output-format^=cmd
    ') DO @%A
    
     
  • Anonymous

    Anonymous - 2022-09-26

    And this is the final result (for now):

    set "xtr=   $srt/join("         
    set "xtr=%xtr%          (.//@datetime, "
    set "xtr=%xtr%          substring(concat(normalize-space(.//span[@class='ml-1 wb-break-all']), '               '), 1, 23 ), "
    set "xtr=%xtr%          div[@class='col-md-9']//h1)"
    set "xtr=%xtr%      , x:cps(9))"
    set "xtr=%xtr%"
    
        for /f "delims=" %%l in ('xidel.exe -s "%fil%" --output-format=cmd --extract-exclude=srt
                --extract "srt:=reverse(sort(//div[@class='d-flex flex-column flex-md-row my-5 flex-justify-center'], (), function($div){$div//relative-time/@datetime}))   "
                --extract "xtr:=%xtr%"
        ') do %%l
    

    It's more impressive on https://github.com/Hibbiki/chromium-win32/releases, because the nodes there are not ordered chronologically per se.

    btw: './/@datetime' appears only once per node. The others are loaded dynamically by the browser.

    ... my name is Michael, and I appreciate your expertise as well as the concreteness and level of detail of your answers. Thank you.

     
  • Reino

    Reino - 2022-10-02

    Hello Michael,

    sort() is for individual items, like sort(//div[@class='d-flex flex-column flex-md-row my-5 flex-justify-center']//@datetime) for instance. This won't work for sorting nodes. You'll have to use the XQuery "Order By Clause" in a FLWOR expression:

    xidel -s "https://github.com/Hibbiki/chromium-win32/releases" -e "for $x in //section order by $x//@datetime descending return $x/join((.//@datetime,h2),x:cps(9))"
    xidel -s "https://github.com/Hibbiki/chromium-win32/releases" -e ^"^
      for $x in //section^
      order by $x//@datetime descending^
      return^
      $x/join((.//@datetime,h2),x:cps(9))^
    "
    2022-10-01T15:28:04Z    v106.0.5249.91-r1036826
    2022-09-28T18:04:31Z    v106.0.5249.62-r1036826
    2022-09-14T10:45:09Z    v105.0.5195.127-r1027018
    2022-09-03T08:22:22Z    v105.0.5195.102-r856
    2022-08-31T06:54:31Z    v105.0.5195.54-r1027018
    2022-08-17T19:50:35Z    v104.0.5112.102-r1012729
    2022-08-03T19:09:18Z    v104.0.5112.81-r1012729
    2022-07-20T21:55:38Z    v103.0.5060.134-r1002911
    2022-06-29T16:35:26Z    v103.0.5060.66-r1002911
    2022-06-22T20:10:51Z    v103.0.5060.53-r1002911
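
As a rough analogy for the "order by ... descending" clause, here is a small Python sketch that orders whole records by a key taken from each record and then prints one tab-separated line per record. The dictionaries are hypothetical stand-ins for the section nodes; the values come from the output above, but their initial order is invented.

```python
# Hypothetical stand-ins for <section> nodes, deliberately out of order.
sections = [
    {"datetime": "2022-09-14T10:45:09Z", "title": "v105.0.5195.127-r1027018"},
    {"datetime": "2022-10-01T15:28:04Z", "title": "v106.0.5249.91-r1036826"},
    {"datetime": "2022-09-28T18:04:31Z", "title": "v106.0.5249.62-r1036826"},
]

# ISO 8601 timestamps in the same time zone sort chronologically as plain
# strings, so the timestamp itself can serve as the sort key.
ordered = sorted(sections, key=lambda s: s["datetime"], reverse=True)

for s in ordered:
    print("\t".join((s["datetime"], s["title"])))
```

The point carried over from the query is that the key is only used for ordering, while the complete record stays available for building the output line.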
    

    It also appears that GitHub has changed their HTML source a bit. In this case it leads to a shorter and simpler query, as you can see.

    For the other GitHub URL that would be:

    xidel -s "https://github.com/ArtifexSoftware/ghostpdl-downloads/releases" -e "for $x in //section order by ($x//@datetime)[1] descending return $x/join(((.//@datetime)[1],normalize-space(.//span[@class='ml-1 wb-break-all']),h2),x:cps(9))"
    xidel -s "https://github.com/ArtifexSoftware/ghostpdl-downloads/releases" -e ^"^
      for $x in //section^
      order by ($x//@datetime)[1] descending^
      return^
      $x/join(^
        (^
          (.//@datetime)[1],^
          normalize-space(.//span[@class='ml-1 wb-break-all']),^
          h2^
        ),x:cps(9)^
      )^
    "
    2022-09-21T12:19:16Z    gs1000  Ghostscript/GhostPDL 10.0.0
    2022-09-07T13:05:51Z    gs1000rc2       Ghostscript/GhostPDL 10.0.0 Release Candidate 2
    2022-08-24T13:18:12Z    gs10.0.0rc1     Ghostscript/GhostPDL 10.0.0 Release Candidate 1
    2022-04-04T14:53:27Z    gs9561  Ghostscript/GhostPDL 9.56.1
    2022-03-29T10:10:45Z    gs9560  Ghostscript/GhostPDL 9.56.0
    2022-03-14T14:47:52Z    gs9560rc2       Ghostscript/GhostPDL 9.56.0rc2
    2022-03-02T12:06:19Z    gs9560rc1       Ghostscript/GhostPDL 9.56.0rc1
    2021-09-27T09:22:22Z    gs9550  Ghostscript/GhostPDL 9.55.0
    2021-09-16T10:03:33Z    gs9550rc1       Ghostscript/GhostPDL 9.55.0 Release Candidate 1
    2021-03-30T09:05:01Z    gs9540  Ghostscript/GhostPDL 9.54.0
    

    substring(concat(..., ' '), 1, 23 )

    There is a (cumbersome) way to automate this:

    xidel -s "https://github.com/ArtifexSoftware/ghostpdl-downloads/releases" -e "let $len:=(max(//section/(.//@datetime)[1] ! string-length(.)),max(//section/normalize-space(.//span[@class='ml-1 wb-break-all']) ! string-length(.))) for $x in //section order by ($x//@datetime)[1] descending return string-join(for $node at $i in (($x//@datetime)[1],normalize-space($x//span[@class='ml-1 wb-break-all']),$x/h2) return ($node,(1 to $len[$i] - string-length($node) + 4) ! ' '))"
    xidel -s "https://github.com/ArtifexSoftware/ghostpdl-downloads/releases" -e ^"^
      let $len:=(^
        max(//section/(.//@datetime)[1] ! string-length(.)),^
        max(//section/normalize-space(.//span[@class='ml-1 wb-break-all']) ! string-length(.))^
      )^
      for $x in //section^
      order by ($x//@datetime)[1] descending^
      return^
      string-join(^
        for $node at $i in (^
          ($x//@datetime)[1],^
          normalize-space($x//span[@class='ml-1 wb-break-all']),^
          $x/h2^
        )^
        return (^
          $node,^
          (1 to $len[$i] - string-length($node) + 4) ! ' '^
        )^
      )^
    "
    2022-09-21T12:19:16Z    gs1000         Ghostscript/GhostPDL 10.0.0
    2022-09-07T13:05:51Z    gs1000rc2      Ghostscript/GhostPDL 10.0.0 Release Candidate 2
    2022-08-24T13:18:12Z    gs10.0.0rc1    Ghostscript/GhostPDL 10.0.0 Release Candidate 1
    2022-04-04T14:53:27Z    gs9561         Ghostscript/GhostPDL 9.56.1
    2022-03-29T10:10:45Z    gs9560         Ghostscript/GhostPDL 9.56.0
    2022-03-14T14:47:52Z    gs9560rc2      Ghostscript/GhostPDL 9.56.0rc2
    2022-03-02T12:06:19Z    gs9560rc1      Ghostscript/GhostPDL 9.56.0rc1
    2021-09-27T09:22:22Z    gs9550         Ghostscript/GhostPDL 9.55.0
    2021-09-16T10:03:33Z    gs9550rc1      Ghostscript/GhostPDL 9.55.0 Release Candidate 1
    2021-03-30T09:05:01Z    gs9540         Ghostscript/GhostPDL 9.54.0
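
The padding logic of that query (take the longest entry of each column, then fill every shorter entry with spaces up to that length plus 4) might be easier to follow in a short Python sketch. The rows are copied from the output above; `GAP` and `format_row` are names introduced here purely for illustration.

```python
# A few rows copied from the output above.
rows = [
    ("2022-09-21T12:19:16Z", "gs1000", "Ghostscript/GhostPDL 10.0.0"),
    ("2022-08-24T13:18:12Z", "gs10.0.0rc1",
     "Ghostscript/GhostPDL 10.0.0 Release Candidate 1"),
    ("2022-04-04T14:53:27Z", "gs9561", "Ghostscript/GhostPDL 9.56.1"),
]
GAP = 4  # spaces between the widest entry of a column and the next column

# Widest entry of every column except the last, which needs no padding.
widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]) - 1)]

def format_row(row):
    """Pad all but the last cell to its column width plus GAP."""
    padded = [cell.ljust(width + GAP) for cell, width in zip(row, widths)]
    return "".join(padded) + row[-1]

table = [format_row(r) for r in rows]
print("\n".join(table))
```

Compared with the fixed `substring(concat(...), 1, 23)` approach, the widths here follow the data, so a longer tag name would not be cut off.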
    
     
  • Anonymous

    Anonymous - 2022-10-16

    Hello Reino,
    thank you for another elaborate reply.
    I'll be back once I have read it attentively.

     
