Extract META tags

J Säll
2008-12-11
2013-05-13
  • J Säll
    2008-12-11

    I need to extract DC meta tags, but I'm unsure how to proceed. Maybe someone could point me in the right direction?
    My plan is to create my own RDFS "Page" class and then map the extracted meta tags to my Page schema.
    Any ideas?
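
    The mapping step in that plan could be sketched roughly as below. This is only an illustration in plain Java, not Aperture code: the namespace http://example.org/page#, the property names (title, author, keyword) and the class PageSchemaMappingSketch are all hypothetical stand-ins for whatever the real Page schema defines. It takes already-extracted name/content pairs and writes them out as N-Triples using the custom properties.

    ```java
    import java.util.Locale;
    import java.util.Map;

    class PageSchemaMappingSketch {
        // Hypothetical namespace for the custom RDFS "Page" schema (an assumption).
        static final String PAGE_NS = "http://example.org/page#";

        // Hypothetical mapping from lower-cased DC meta names to Page property local names.
        static final Map<String, String> DC_TO_PAGE = Map.of(
                "dc.title", "title",
                "dc.creator", "author",
                "dc.subject", "keyword");

        // Emit one N-Triples line per mapped meta tag; unmapped names are skipped.
        static String toNTriples(String pageUri, Map<String, String> meta) {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : meta.entrySet()) {
                String local = DC_TO_PAGE.get(e.getKey().toLowerCase(Locale.ROOT));
                if (local == null) {
                    continue;
                }
                sb.append('<').append(pageUri).append("> <")
                  .append(PAGE_NS).append(local).append("> \"")
                  .append(escape(e.getValue())).append("\" .\n");
            }
            return sb.toString();
        }

        // Minimal escaping for N-Triples string literals.
        static String escape(String s) {
            return s.replace("\\", "\\\\").replace("\"", "\\\"")
                    .replace("\n", "\\n").replace("\r", "\\r");
        }
    }
    ```

    Keeping the mapping in a single table means a new DC element only needs one new entry, and anything the schema does not cover is dropped rather than guessed at.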

     
    • Antoni Mylka
      2008-12-28

      Sorry for the late reply.

      The component you need is the HtmlExtractor. Just get an InputStream with the content of the HTML file and pass it to the HtmlExtractor.

      There is an example on the Aperture wiki:

      http://aperture.wiki.sourceforge.net/Extractors

      You can use it as a guideline.

      The HtmlExtractor will extract the full text and the basic metadata (the META tags) from an HTML file.
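
      To make the META tag part concrete, here is a minimal standalone sketch of what pulling DC meta tags out of an InputStream amounts to. It deliberately does not use Aperture or the HtmlExtractor (see the wiki page above for that); it is a plain-Java, regex-based stand-in, and the class name DcMetaSketch, the UTF-8 assumption and the "dc." name prefix filter are all assumptions made for illustration. A real HTML parser will handle malformed markup far better than this.

      ```java
      import java.io.IOException;
      import java.io.InputStream;
      import java.io.UncheckedIOException;
      import java.nio.charset.StandardCharsets;
      import java.util.LinkedHashMap;
      import java.util.Locale;
      import java.util.Map;
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      class DcMetaSketch {
          // Matches a whole <meta ...> tag; assumes attribute values contain no '>'.
          private static final Pattern META =
                  Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
          // Matches double-quoted attributes only, e.g. name="DC.title".
          private static final Pattern ATTR =
                  Pattern.compile("([a-zA-Z-]+)\\s*=\\s*\"([^\"]*)\"");

          // Returns name -> content for every meta tag whose name starts with "dc." (any case).
          static Map<String, String> extractDcMeta(InputStream in) {
              String html;
              try {
                  // Assumption: the page is UTF-8 encoded. Requires Java 9+ for readAllBytes.
                  html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
              } catch (IOException e) {
                  throw new UncheckedIOException(e);
              }
              Map<String, String> result = new LinkedHashMap<>();
              Matcher tag = META.matcher(html);
              while (tag.find()) {
                  String name = null;
                  String content = null;
                  Matcher attr = ATTR.matcher(tag.group());
                  while (attr.find()) {
                      String key = attr.group(1).toLowerCase(Locale.ROOT);
                      if (key.equals("name")) {
                          name = attr.group(2);
                      } else if (key.equals("content")) {
                          content = attr.group(2);
                      }
                  }
                  if (name != null && content != null
                          && name.toLowerCase(Locale.ROOT).startsWith("dc.")) {
                      result.put(name, content);
                  }
              }
              return result;
          }
      }
      ```

      Calling DcMetaSketch.extractDcMeta(stream) on a page gives back only the DC.* entries, in document order; other meta tags such as keywords are ignored by the prefix filter.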

       
    • J Säll
      2009-01-09

      I successfully extracted the metadata; however, the keywords end up within the <Keyword> tags and not as example/htmlpage#keyword. I'm having trouble finding the links when I search on keywords...