HTML file to a DB

Help
Devinder
2008-03-08
2013-04-17
  • Devinder

    Devinder - 2008-03-08

    Hi,

    I have some HTML files that I crawled using Heritrix. I want to grab the data they contain and put it into MySQL. How can the Mozilla parser help me?

     
    • Ohad Serfaty

      Ohad Serfaty - 2008-03-08

      Hi
      It depends on the format in which you want to store them in the database. The HTML parser takes the HTML string you have and parses it into a Document object, doing everything Firefox does while parsing (closing tags, fixing misplaced tags, and so on). So, essentially, if you want to insert all the data inside an HTML page, you need to parse the page and then run a depth-first search (DFS) over all the nodes in the DOM, picking out whatever you want to store in the DB. The parser will take care of parsing the page correctly for you.
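
The DFS step could look roughly like the sketch below. It is only an illustration and makes some assumptions: that you end up with a standard org.w3c.dom.Document, and that the non-empty text nodes are what you want to store. The class name DomTextCollector and the helper parseWellFormed are hypothetical. parseWellFormed uses the JDK's XML parser as a stand-in so the example runs on its own; it only accepts well-formed markup, so for real crawled pages you would pass in the Document produced by the HTML parser instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomTextCollector {

    // Hypothetical stand-in for the real HTML parser. The JDK XML parser
    // only accepts well-formed markup, unlike a browser-grade HTML parser.
    static Document parseWellFormed(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed", e);
        }
    }

    // Returns every non-empty text node under root, in document order.
    static List<String> collectText(Node root) {
        List<String> out = new ArrayList<>();
        visit(root, out);
        return out;
    }

    // Depth-first, pre-order walk: handle the node, then each child in turn.
    private static void visit(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add(text);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            visit(child, out);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWellFormed(
                "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>");
        for (String text : collectText(doc)) {
            System.out.println(text);
        }
    }
}
```

Each collected string could then be bound to a java.sql.PreparedStatement and inserted into MySQL. The table layout depends on what you want to keep, so it is left out here.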

       
