
XML parser

2000-10-04
2000-10-09
  • Sean McCombe

    Sean McCombe - 2000-10-04

    We will need to source an XML parser (presumably there must be one out there somewhere) so that we can parse the HTML documents.

    This implies the ability to convert the retrieved/stored HTML documents to a syntactically correct HTML-schema XML document. We would need to write a design for this conversion, including heuristics in those cases where the change required to the HTML document is not obvious (for example, where it is not obvious where the missing HTML end tags should be).
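    One possible shape for the end-tag heuristics is a stack-based pass over the tags. The following is only a minimal sketch under stated assumptions, not a design: the class name `TagBalancer`, the method `balance`, and the two element sets are hypothetical, and the code ignores attribute quoting, entities, comments containing tags, and script content, all of which a real converter would need to handle.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: closes missing end tags so the tag structure nests properly.
class TagBalancer {
    // Assumed list of elements that never take an end tag; emitted as <br/> etc.
    private static final Set<String> VOID =
            Set.of("br", "hr", "img", "input", "meta", "link");
    // Assumed list of elements whose start tag closes an open element of the same name.
    private static final Set<String> IMPLICIT_CLOSE =
            Set.of("p", "li", "td", "tr", "option");
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>");

    static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // innermost open element is at the head
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            out.append(html, pos, m.start()); // copy the text between tags unchanged
            pos = m.end();
            String name = m.group(2).toLowerCase();
            String attrs = m.group(3);
            boolean isEndTag = !m.group(1).isEmpty();
            boolean selfClosed = attrs.trim().endsWith("/");
            if (selfClosed) {
                attrs = attrs.replaceAll("\\s*/\\s*$", ""); // drop the trailing slash
            }
            if (isEndTag) {
                if (!open.contains(name)) {
                    continue; // stray end tag with no matching start tag: drop it
                }
                // Close every element opened after the matching start tag, then the match.
                while (!open.isEmpty()) {
                    String top = open.pop();
                    out.append("</").append(top).append('>');
                    if (top.equals(name)) {
                        break;
                    }
                }
            } else if (selfClosed || VOID.contains(name)) {
                out.append('<').append(name).append(attrs).append("/>");
            } else {
                // Heuristic: <p> directly inside an open <p> means the first </p> was omitted.
                if (IMPLICIT_CLOSE.contains(name) && name.equals(open.peek())) {
                    out.append("</").append(open.pop()).append('>');
                }
                open.push(name);
                out.append('<').append(name).append(attrs).append('>');
            }
        }
        out.append(html.substring(pos));
        // Anything still open at the end of the document gets closed, innermost first.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(balance("<p>one<p>two<br><b>bold</p>"));
    }
}
```

    The non-obvious cases mentioned above are exactly where the two sets and the "drop stray end tags" rule would need to be argued over; the sketch just picks the simplest choice for each.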

     
    • Anonymous

      Anonymous - 2000-10-05

      We should use a SAX or DOM parser, I think. I know of two that are
      free and written in Java:

      1. Apache Xerces: see http://xml.apache.org/
         Supports SAX 2, DOM Level 1, and DOM Level 2 (beta).

      2. James Clark's XP: see http://www.jclark.com/ (SAX only?)
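      To make the SAX/DOM choice concrete, here is a rough sketch that counts elements both ways. It assumes the standard JAXP interfaces (`javax.xml.parsers`) rather than any parser-specific classes, so whichever parser is on the classpath does the work; the class name `ParserComparison` and its two methods are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical comparison of the two parsing styles on the same input.
class ParserComparison {
    private static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // SAX: the parser pushes events at a handler; nothing is kept in memory.
    static int countWithSax(String xml, String elementName) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(stream(xml), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return count[0];
    }

    // DOM: the parser builds the whole tree first, then we query it.
    static int countWithDom(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(stream(xml));
            return doc.getElementsByTagName(elementName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html><body><p>a</p><p>b</p></body></html>";
        System.out.println(countWithSax(xml, "p"));
        System.out.println(countWithDom(xml, "p"));
    }
}
```

      SAX suits large documents scanned once; DOM is easier when the tree has to be walked or modified afterwards. Both will reject input that is not well-formed, which is why the HTML has to be cleaned up first.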

      Isn't there an Oracle one too?

      martin

       
      • Sean McCombe

        Sean McCombe - 2000-10-09

        I like the look of Xerces at xml.apache.org. It appears to be fairly complete and sophisticated, and of course it comes from the Apache group, so it can't be too bad to use.

        I'm all for adopting it as our standard XML parsing component and, by the looks of it, so are you, Martin. I'd say Chris would be too if he weren't so busy at uni right now :)

        I'll add a link to the home page just now.

         
