
New Project

2008-01-28
2012-09-04
  • Robert Whitney

    Robert Whitney - 2008-01-28

    I have a great idea for a new project, but my knowledge stops at basic XML. I want to harvest specific HTML-based content and convert it into an XML file, but I have no idea where to start. Basically, I want, say, the header and body content, while ignoring most of the other markup. I am sure there is some way to automate this, and I think I've come to the right place, but I am missing some pieces. Can anyone help me?

    -Bob

     
    • Jeff

      Jeff - 2008-01-28

      Web-Harvest is a great tool for extracting information from websites. But it's geared towards programmers, in that you typically need to know (or be willing to learn) a fair amount about HTML, HTTP, regular expressions, and XPath.

      You can usually get answers to specific problems here in the forums. But if it takes longer than a few minutes for someone to answer, you'll probably have to figure it out for yourself. (Or, of course, find someone else to do it for you... hire someone, convince a friend, etc.)
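
To give a feel for the kind of task described in this thread (keep header and body text, drop the rest of the markup, write XML), here is a minimal sketch in plain Python rather than in Web-Harvest itself. It uses only the standard library. The class name `HeaderBodyExtractor`, the function `html_to_xml`, the output element names (`page`, `header`, `body`), and the choice to treat `h1` to `h3` as headers and `p` as body are all assumptions made for illustration; a real page would likely need different rules.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class HeaderBodyExtractor(HTMLParser):
    """Collects heading and paragraph text, ignoring all other markup."""

    HEADINGS = {"h1", "h2", "h3"}   # assumed to be the "header" content
    IGNORED = {"script", "style"}   # text inside these is never kept

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.items = []             # (kind, text) pairs in document order
        self._kind = None           # "header", "body", or None
        self._buffer = []
        self._ignored_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.IGNORED:
            self._ignored_depth += 1
        elif tag in self.HEADINGS:
            self._open("header")
        elif tag == "p":
            self._open("body")

    def handle_endtag(self, tag):
        if tag in self.IGNORED:
            self._ignored_depth = max(0, self._ignored_depth - 1)
        elif tag in self.HEADINGS or tag == "p":
            self.flush()

    def handle_data(self, data):
        # Text in inline tags such as <b> is kept; only the tags are dropped.
        if self._kind is not None and self._ignored_depth == 0:
            self._buffer.append(data)

    def _open(self, kind):
        self.flush()                # an unclosed <p> ends when the next block starts
        self._kind = kind

    def flush(self):
        """Store the text collected so far, with whitespace normalized."""
        if self._kind is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.items.append((self._kind, text))
        self._kind = None
        self._buffer = []


def html_to_xml(html):
    """Return an XML string holding only the header and body text."""
    parser = HeaderBodyExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()                  # keep text from a block left open at end of input
    root = ET.Element("page")
    for kind, text in parser.items:
        ET.SubElement(root, kind).text = text
    return ET.tostring(root, encoding="unicode")


sample = (
    "<html><body><div class='nav'>Menu</div>"
    "<h1>Title</h1><p>Hello <b>world</b>!</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_xml(sample))
```

Fetching the page over HTTP and coping with badly formed HTML are left out here; those are the parts where a dedicated tool such as Web-Harvest, together with some XPath knowledge, tends to pay off.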

       
