How to extract JSON data embedded in a HTML page

Help
lemale
2019-03-15
2019-08-03
  • lemale

    lemale - 2019-03-15

    I want to extract data from a JSON object that is assigned to a JavaScript variable. It looks like this:

    ...HTML CODE...
    <script>
    var myvar = {
        name: "John",
        lastname: "Doe"
    }
    </script>
    ...HTML CODE...
    

    I need to get:
    John
    Doe

    Can I do it using Xidel?

     
  • Benito van der Zander

    Yes, but the script is treated as plain text, so you first need to get the text that contains the object, e.g.:

     json(substring-after(//script, "myvar ="))/(name,lastname)
    

    or

    json(extract(//script, "myvar *= *(\{.*\})", 1, "s"))/(name,lastname)
    

    or in xidel 0.9.9:

     json(substring-after(//script, "myvar ="))?*
    
     

    Last edit: Benito van der Zander 2019-03-18
  • lemale

    lemale - 2019-03-23

    Thanks Benito, it works perfectly!

    The command works fine, but it doesn't work when there is another script tag before it. How do I specify which script tag I want to get?

    I also want to save the data in variables, but I seem to be doing something wrong. I'm using this command:

    for /f "delims=" %%a in ('xidel "page.html" -e "name:=json(substring-after(//script, \"myvar =\"))/(name)" -e "lastname:=json(substring-after(//script, \"myvar =\"))/(lastname)" --output-format cmd') do %%a

     
  • Reino

    Reino - 2019-08-03

    Seeing as you're on Windows, that's because using \" inside double quotes will get you into lots of trouble. Use single quotes instead:

    FOR /F "delims=" %%A IN ('xidel -s "page.html" -e "name:=json(substring-after(//script,'myvar ='))/name" -e "lastname:=json(substring-after(//script,'myvar ='))/lastname" --output-format^=cmd') DO %%A
    

    By the way, you can use a single query to export both variables:

    FOR /F "delims=" %%A IN ('xidel -s "page.html" -e "json(substring-after(//script,'myvar ='))/(name:=name,lastname:=lastname)" --output-format^=cmd') DO %%A
    

    You could even have "John" and "Doe" automatically assigned to variables named after the corresponding attributes:

    FOR /F "delims=" %%A IN ('xidel -s "page.html" --extract-exclude=json -e "json:=json(substring-after(//script,'myvar =')),$json() ! eval(x'{.}:=$json/{.}')" --output-format^=cmd') DO %%A
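
    As for picking a specific script tag: one possible approach, given here as an untested sketch, is to add an XPath predicate so that only the script whose text contains the variable assignment is selected. This assumes that exactly one script on the page contains the text "myvar ="; the file name and variable name are taken from the question above:

    xidel -s "page.html" -e "json(substring-after(//script[contains(.,'myvar =')],'myvar ='))/(name,lastname)"

    The same predicate should be usable in the FOR /F commands above, since it only needs single quotes inside the query.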
    
     

    Last edit: Reino 2019-08-03
