
Accessing Information from XML/JSON using XPath/JSON Pointer

2022-01-22
2024-03-25
  • Morten MacFly

    Morten MacFly - 2022-01-22

    Since version 22.01, WCM supports extracting information from XML and JSON documents. This information can be, for example, numbers that can be processed further with the interpreter function. The interpreter can compare them against an expected or previously seen value to create a web change alert. Furthermore, if a server offers a REST API, you can use this feature to query specific information from it with dedicated server requests.

    How does it work? Consider this document for XML:

    <books>
      <book>
            <title>A Wild Sheep Chase</title>
            <price>22.72</price>
      </book>
      <book>
            <title>The Night Watch</title>
            <price>23.58</price>
      </book>
      <book>
            <title>The Comedians</title>
            <price>21.99</price>
      </book>
    </books>
    

    ...then you can access the price of the second book with this XML/XPath notation (note that XPath positions are one-based, so the second book is book[2]):
    /books/book[2]/price
    ...which will return: 23.58
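In standard XPath, positions start at one, so `book[1]` selects the first book and `book[2]` the second. The Python sketch below checks this with the standard library's `xml.etree.ElementTree`, which implements only a small subset of XPath. It is an illustration, not WCM's own XPath engine. Because the parsed root element already is `<books>`, the path is written relative to it.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """
<books>
  <book>
    <title>A Wild Sheep Chase</title>
    <price>22.72</price>
  </book>
  <book>
    <title>The Night Watch</title>
    <price>23.58</price>
  </book>
  <book>
    <title>The Comedians</title>
    <price>21.99</price>
  </book>
</books>
"""

# fromstring() returns the <books> element itself, so the equivalent of
# the absolute XPath /books/book[2]/price is the relative path "./book[2]/price".
root = ET.fromstring(BOOKS_XML)

# XPath positions are one-based: [1] is the first <book>, [2] the second.
first_price = root.find("./book[1]/price").text
second_price = root.find("./book[2]/price").text

print(first_price)   # 22.72
print(second_price)  # 23.58
```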

    For JSON it is similar. Consider this JSON document:

    {
        "books":
        [
            {
                "title" : "A Wild Sheep Chase",
                "price" : 22.72
            },
            {
                "title" : "The Night Watch",
                "price" : 23.58
            },
            {
                "title" : "The Comedians",
                "price" : 21.99
            }
        ]
    }
    

    ...then you can access the price of the second book with this JSON Pointer notation (here, unlike XPath, array indices are zero-based, so 1 is the second book):
    /books/1/price
    ...alternatively, you can use JSONPath like this:
    $.books[1].price
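To show how a JSON Pointer such as `/books/1/price` is evaluated step by step, here is a minimal Python resolver following RFC 6901. It is a sketch for illustration only; WCM itself relies on the JSONCons library mentioned later in this thread, not on this code. The function name `resolve_pointer` is hypothetical.

```python
import json
import re

# RFC 6901 array indices: "0" or a decimal number without leading zeros.
_ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")

BOOKS_JSON = """
{
  "books": [
    {"title": "A Wild Sheep Chase", "price": 22.72},
    {"title": "The Night Watch", "price": 23.58},
    {"title": "The Comedians", "price": 21.99}
  ]
}
"""

def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer such as "/books/1/price" (hypothetical helper)."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must be empty or start with '/'")
    current = document
    for raw_token in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" -> "/" first, then "~0" -> "~".
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            # Arrays are indexed by zero-based decimal numbers.
            if not _ARRAY_INDEX.fullmatch(token):
                raise KeyError(f"invalid array index: {token!r}")
            current = current[int(token)]
        elif isinstance(current, dict):
            current = current[token]
        else:
            raise KeyError(f"cannot descend into {type(current).__name__} with {token!r}")
    return current

data = json.loads(BOOKS_JSON)
print(resolve_pointer(data, "/books/1/price"))  # 23.58
```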

    Again, if you follow this with an interpreter operation, you can track item prices, for example.
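The comparison step can be pictured with the following Python sketch. It does not reproduce WCM's interpreter syntax. `price_alert` and its `threshold` parameter are hypothetical names chosen for illustration. The sketch assumes the extracted value arrives as text (as returned by XPath or JSON Pointer) and reports a change only when it exceeds the threshold.

```python
def price_alert(previous, current, threshold=0.0):
    """Return an alert message if the price moved by more than `threshold`, else None.

    Hypothetical helper: `previous` and `current` are the extracted values as text.
    """
    old_value = float(previous)
    new_value = float(current)
    delta = new_value - old_value
    if abs(delta) > threshold:
        direction = "rose" if delta > 0 else "fell"
        return f"price {direction} from {old_value:.2f} to {new_value:.2f}"
    return None  # no change worth alerting about

print(price_alert("23.58", "21.99"))        # price fell from 23.58 to 21.99
print(price_alert("23.58", "23.60", 0.05))  # None (change below threshold)
```

A threshold avoids alerts for tiny fluctuations, such as rounding differences between two visits of the same page.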

    For XML, there is one special thing to consider:
    XML/XPath does not work on plain HTML, because HTML is usually not a well-formed XML document that XPath can be applied to. For such cases, WCM offers an option that tries to convert the HTML into a valid XML document (effectively something like XHTML). You have to enable this conversion in the options of the XML/XPath filter. On the other hand, if the web page is already XML or XHTML, you don't need the conversion and should not apply it, as it may otherwise lead to errors.
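Why plain HTML fails can be seen with a few lines of Python. HTML void elements such as `<br>` and `<img>` are never closed, which is legal HTML but not well-formed XML, so a strict XML parser rejects the page before any XPath can run. The sketch uses the standard library's `xml.etree.ElementTree` as a stand-in XML parser. It does not show how WCM's conversion works internally, and `is_well_formed_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Typical HTML: <br> and <img> are left unclosed, which is not well-formed XML.
html_snippet = '<p>Price:<br>23.58<img src="book.png"></p>'

# The same content written as XHTML: void elements are self-closed.
xhtml_snippet = '<p>Price:<br/>23.58<img src="book.png"/></p>'

def is_well_formed_xml(text):
    """Return True if an XML parser accepts `text`, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed_xml(html_snippet))   # False
print(is_well_formed_xml(xhtml_snippet))  # True

# Once the markup is well-formed, XPath-style queries work again:
# the price is the text that follows the <br/> element.
root = ET.fromstring(xhtml_snippet)
print(root.find("br").tail)  # 23.58
```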

     

    Last edit: Morten MacFly 2023-09-23
    • Gitoffthelawn

      Gitoffthelawn - 2024-03-24

      Is there a way to select the price of "The Night Watch" if its position in the list may change?

      For example, something like: /books/title:"The Night Watch"/price

      If not, this functionality would be exceptionally useful to add!

       

      Last edit: Gitoffthelawn 2024-03-24
      • Morten MacFly

        Morten MacFly - 2024-03-25

        Good question. Honest answer: I don't know. I am using the syntax of the JSONCons library (https://danielaparker.github.io/jsoncons/), which ships with many examples (on that homepage and also here: https://github.com/danielaparker/jsoncons/tree/master/examples/src and here: https://github.com/danielaparker/jsoncons?tab=readme-ov-file#E1). You might want to check whether you find something useful there for your purpose; then I can check whether the implementation needs to be changed or enhanced.
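For selecting a book by its title instead of its position, standard XPath already offers predicates: `/books/book[title='The Night Watch']/price`. The usual JSONPath counterpart is a filter expression such as `$.books[?(@.title == 'The Night Watch')].price`. Whether WCM's filters accept these expressions has not been verified here. The Python sketch below only demonstrates the idea. It uses ElementTree's `[tag='text']` predicate for XML and a plain list comprehension standing in for the JSONPath filter. The books are deliberately reordered to show that the lookup does not depend on position.

```python
import json
import xml.etree.ElementTree as ET

TITLE = "The Night Watch"

# Same data as in the first post, but in a different order.
BOOKS_XML = (
    "<books>"
    "<book><title>A Wild Sheep Chase</title><price>22.72</price></book>"
    "<book><title>The Comedians</title><price>21.99</price></book>"
    "<book><title>The Night Watch</title><price>23.58</price></book>"
    "</books>"
)
BOOKS_JSON = json.dumps({"books": [
    {"title": "The Night Watch", "price": 23.58},
    {"title": "A Wild Sheep Chase", "price": 22.72},
]})

# XML: counterpart of /books/book[title='The Night Watch']/price,
# written relative to the <books> root element.
root = ET.fromstring(BOOKS_XML)
xml_price = root.find(f"./book[title='{TITLE}']/price").text

# JSON: counterpart of $.books[?(@.title == 'The Night Watch')].price.
# Like a JSONPath filter, this yields a list of all matches.
data = json.loads(BOOKS_JSON)
json_prices = [book["price"] for book in data["books"] if book.get("title") == TITLE]

print(xml_price)    # 23.58
print(json_prices)  # [23.58]
```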

         
  • GABRIEL OLIVEIRA DINIZ

    Thank you for this update! I think it is somewhat related to my question, right? Anyway, thank you!

     
    • Morten MacFly

      Morten MacFly - 2022-04-01

      Yes, indeed, it could probably solve your problem! For me, the main goal was to support REST APIs on servers, where you send a query and often get JSON back. Glad to hear it may be of further use...

       

