
[HELP and SUGGESTION] Can't find a specific part to monitor because of multiple " </div>"s (start and end tag)

2022-01-05
2022-01-07
  • GABRIEL OLIVEIRA DINIZ

    Is it hard to implement XPath in the Content tab? I'm asking as a noob, because I really don't know. Let me explain:

    I'm trying to monitor the content in the blue square in the screenshot (website: http://www.fumarc.com.br/concursos/detalhe/escrivao-de-policia-i-/138#aba-Pub).
    If I use the HTML finder, I need to enter
    start tag: <div class="aba-concursos-internas">
    and
    end tag: </div>

    Problem: between the start tag I want and its matching end tag there are multiple </div> tags.

    I'm trying to watch for any change in the whole square (any new "topic" or link).

    If I used XPath (as given by Chrome/Edge), it would be: //*[@id="site"]/div/div[3]/div[1]/div[2]/div[2]

    JS path: document.querySelector("#site > div > div.main > div.content-internas > div.bloco-info-int > div.aba-concursos-internas")

    and full XPath: /html/body/div[1]/div/div[3]/div[1]/div[2]/div[2]

    Those "paths" would be much easier to use (since you can get them from any common browser) and would be more precise. I don't know if this is easy to implement (my guess: it isn't).

    Anyway, can anyone give me a solution to my problem for now? I'm currently using a full-page check, since that seems to be the only way, but other websites might give me problems.

    Thank you for your work.

     

    Last edit: GABRIEL OLIVEIRA DINIZ 2022-01-05
  • Morten MacFly

    Morten MacFly - 2022-01-07

    In your case I can capture the content with these two tags:

    <h2>Documentação do concurso</h2>
    

    ...and:

    <!-- Links -->
    

    ...or:

    <div class="aba-concursos-internas">
    

    ...and:

    <div id="sidebar-internas">
    

    You don't need to exactly match the inner content to track changes, did you consider that?
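
A minimal sketch of why the wider pair of markers works where `</div>` does not. It assumes the matching is plain "first start string, then next end string" text search, which is how the thread describes it; the `between` helper and the simplified `page` markup are hypothetical and are not WCM code or the real page source.

```python
from typing import Optional


def between(text: str, start: str, end: str) -> Optional[str]:
    """Return the text after the first `start` up to the next `end`, or None."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    if j == -1:
        return None
    return text[i:j]


# Hypothetical, heavily simplified stand-in for the page in question.
page = (
    '<div class="aba-concursos-internas">'
    '<h2>Documentação do concurso</h2>'
    '<div><a href="/edital.pdf">Edital</a></div>'
    '<div><a href="/retificacao.pdf">Retificação</a></div>'
    '<!-- Links -->'
    '</div>'
    '<div id="sidebar-internas">menu</div>'
)

# Ending at "</div>" stops at the FIRST closing div, so the second link is lost.
narrow = between(page, '<div class="aba-concursos-internas">', '</div>')

# Ending at a marker that follows the whole block captures every link.
wide = between(page, '<h2>Documentação do concurso</h2>', '<!-- Links -->')
```

The captured span does not have to line up with an element boundary: as long as the watched content sits somewhere between the two markers, any change to it changes the captured text.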

    WRT your second question:
    WCM was deliberately designed so that it does not need to know about the language it parses (HTML/XML or JSON, for example). This keeps the complexity low. But I agree that there are cases that would be better specified with XPath. On the other hand, most users don't know XPath, so it would be for professionals only. And with the many filter options you have, you can easily break the syntax of the underlying language so that a parser could no longer understand the content.

    However, you are not the first one asking for a parser. So I'll see what I can do and whether there are tiny XML libraries that support XPath or similar. I have worked with libxml2 a lot, and it (for example) is "too fat" to be considered. I like TinyXML2, but it comes without XPath. So if you have a suggestion for a well-supported but tiny library, let me know...

     
    • Anonymous

      Anonymous - 2022-01-07

      You don't need to exactly match the inner content to track changes, did you consider that?

      No, I did not, and that makes a lot of sense. Thank you, my friend, for the help; now I get how it works. If I capture the content BETWEEN two other points that aren't exactly what I'm looking for, I'm still watching over what I want.

      That was a lot of help, sorry for all the trouble.

      I just found out that this is hard on forums like SourceForge, since things like *11 hours ago* make changes difficult to track (the text changes every hour, even without an update).
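
One way to picture a workaround for the relative-timestamp problem is to replace such phrases with a fixed placeholder before comparing two snapshots. The sketch below is only an illustration in Python; the pattern, the `strip_relative_times` name and the sample strings are assumptions and do not describe an existing WCM filter.

```python
import re

# Matches phrases such as "11 hours ago", "1 day ago" or "an hour ago".
RELATIVE_TIME = re.compile(
    r"\b(?:\d+|an?)\s+(?:second|minute|hour|day|week|month|year)s?\s+ago\b",
    re.IGNORECASE,
)


def strip_relative_times(text: str) -> str:
    """Replace relative timestamps with a constant placeholder."""
    return RELATIVE_TIME.sub("<time>", text)


# Hypothetical snapshots of the same post, taken one hour apart.
old = "Morten MacFly posted 11 hours ago: try two tags"
new = "Morten MacFly posted 12 hours ago: try two tags"

# A snapshot where the actual post text has changed.
edited = "Morten MacFly posted 12 hours ago: try three tags"

same_after_filter = strip_relative_times(old) == strip_relative_times(new)
changed_after_filter = strip_relative_times(old) != strip_relative_times(edited)
```

With the timestamps normalized, the two unchanged snapshots compare equal, while a real edit to the post still shows up as a difference.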

       
  • Morten MacFly

    Morten MacFly - 2022-01-07

    I dug into it a little and found pugixml, which would actually be fine: a light-weight XML parser with XPath support. But now comes the actual issue: HTML is not valid XML. So regular XML parsers (including pugixml) will fail, as a lot of opening tags do not have a closing one (e.g. meta).
    So what would be needed is either a fully compliant HTML parser with XPath or a converter that converts HTML into XHTML on the fly. The latter is expensive for large home pages, and there are many. So as you can see: it's at least not that easy...
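
The "HTML is not valid XML" point can be reproduced with any strict XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` purely as a stand-in for pugixml (it is not pugixml, and it supports only a subset of XPath); the markup strings and the `parses_as_xml` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Ordinary HTML: the <meta> tag is never closed.
html = (
    '<html><head><meta charset="utf-8"></head>'
    '<body><div id="site">x</div></body></html>'
)

# The same document as XHTML: <meta/> is self-closed.
xhtml = html.replace('<meta charset="utf-8">', '<meta charset="utf-8"/>')


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


html_ok = parses_as_xml(html)      # strict parser rejects the unclosed <meta>
xhtml_ok = parses_as_xml(xhtml)    # well-formed, so it parses

# Once the markup is well-formed, a path query can address the element directly.
node = ET.fromstring(xhtml).find(".//div[@id='site']")
```

This is why the choice comes down to a tolerant HTML parser or an HTML-to-XHTML conversion step before any XPath query can run.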

    If you still need such functionality, please open a feature request and point to this thread in the forums. But maybe the hint above is good for you...

     
    • Anonymous

      Anonymous - 2022-01-07

      I really can't trouble you this much; your help was enough to make me understand how to do it.

       
