Extract META tags

J Säll
2008-12-11
2013-05-13
  • J Säll

    J Säll - 2008-12-11

    I need to extract DC meta tags, but I'm unsure how to proceed. Could someone point me in the right direction?
    My plan is to create my own RDFS "Page" and then map the extracted meta tags to my Page schema.
    Any ideas?

     
    • Antoni Mylka

      Antoni Mylka - 2008-12-28

      Sorry for the late reply.

      The component you need is the HtmlExtractor. Just get an InputStream with the content of the HTML file and pass it to the HtmlExtractor.

      There is an example on the Aperture wiki that you can use as a guideline:

      http://aperture.wiki.sourceforge.net/Extractors

      The HtmlExtractor will extract the full text and the basic metadata (the META tags) from an HTML file.
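
      To illustrate the mapping step from the original question, here is a minimal sketch that does not use Aperture at all: it pulls DC.* META tags out of an HTML string with plain JDK regular expressions and rewrites each name as a property URI in a custom namespace. The class name DcMetaSketch, the method extractDcMeta, and the namespace http://example.org/htmlpage# are assumptions made for this example, not part of Aperture or of the thread. It only handles double-quoted attribute values, so treat it as a stand-in for what a real HTML parser or the HtmlExtractor would do.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DcMetaSketch {
    // Hypothetical namespace for a custom "Page" schema (an assumption for this sketch).
    public static final String PAGE_NS = "http://example.org/htmlpage#";

    // Finds <meta ...> tags; attributes are parsed separately so their order does not matter.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // Matches attr="value" pairs (double-quoted values only).
    private static final Pattern ATTR =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*\"([^\"]*)\"");

    /** Returns a map from custom property URI to the content of each DC.* META tag. */
    public static Map<String, String> extractDcMeta(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String name = null;
            String content = null;
            Matcher attr = ATTR.matcher(tag.group(1));
            while (attr.find()) {
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                if (key.equals("name")) {
                    name = attr.group(2);
                } else if (key.equals("content")) {
                    content = attr.group(2);
                }
            }
            // Keep only Dublin Core tags, e.g. DC.title -> PAGE_NS + "title".
            if (name != null && content != null
                    && name.toLowerCase(Locale.ROOT).startsWith("dc.")) {
                String localName = name.substring(3).toLowerCase(Locale.ROOT);
                result.put(PAGE_NS + localName, content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta name=\"DC.title\" content=\"Sample page\">"
                + "<meta name=\"DC.subject\" content=\"rdf, metadata\">"
                + "<meta name=\"generator\" content=\"ignored\">"
                + "</head><body></body></html>";
        extractDcMeta(html).forEach(
                (property, value) -> System.out.println(property + " = " + value));
    }
}
```

      The resulting map could then be written out as RDF statements about the page using whatever RDF library is already in use; that step is left out here because it depends on the chosen API.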

       
    • J Säll

      J Säll - 2009-01-09

      I successfully extracted the metadata. However, the keywords end up within <Keyword> tags and not as example/htmlpage#keyword, so I'm having trouble finding the links when I search on keywords...

       
