I need to extract DC (Dublin Core) meta tags, but I'm unsure how to proceed. Maybe someone could point me in the right direction?
My plan is to create my own RDFS "Page" class and then map the extracted meta tags to my Page schema.
Sorry for the late reply.
The component you need is the HtmlExtractor. Just get an InputStream with the content of the HTML file and pass it to the HtmlExtractor.
There is an example on the Aperture wiki that you can use as a guideline.
The HtmlExtractor will extract the full text and the basic metadata (the META tags) from an HTML file.
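To make the idea concrete, here is a rough stand-alone sketch of what "extract the DC meta tags and map them to your own Page schema" could look like. Note that this is not Aperture's HtmlExtractor and does not use the Aperture API at all; it only uses the JDK and a simple regex, so treat it as an illustration rather than a robust HTML parser. The class name DcMetaSketch and the namespace http://example.org/htmlpage# are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-alone sketch using only the JDK; this is NOT Aperture's HtmlExtractor.
final class DcMetaSketch {
    // Hypothetical namespace for a custom "Page" schema (an assumption for this example).
    static final String PAGE_NS = "http://example.org/htmlpage#";

    // Matches <meta ...> tags; the attributes are then picked out one by one.
    private static final Pattern META =
            Pattern.compile("<meta\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // Only handles double-quoted attribute values, which is enough for a sketch.
    private static final Pattern ATTR =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*\"([^\"]*)\"");

    // Returns the DC.* meta tags as a map: property URI in PAGE_NS -> content value.
    static Map<String, String> extract(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher meta = META.matcher(html);
        while (meta.find()) {
            String name = null;
            String content = null;
            Matcher attr = ATTR.matcher(meta.group(1));
            while (attr.find()) {
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                if (key.equals("name")) {
                    name = attr.group(2);
                } else if (key.equals("content")) {
                    content = attr.group(2);
                }
            }
            // Keep only tags whose name starts with "DC." (case-insensitive).
            if (name != null && content != null
                    && name.toLowerCase(Locale.ROOT).startsWith("dc.")) {
                String local = name.substring(3).toLowerCase(Locale.ROOT);
                result.put(PAGE_NS + local, content);
            }
        }
        return result;
    }
}
```

Calling DcMetaSketch.extract(html) on a page with a DC.title meta tag gives a map with a http://example.org/htmlpage#title key. In a real setup you would write these pairs into your RDF model instead of a Map, and let Aperture do the actual HTML parsing.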
I successfully extracted the metadata. However, the keywords end up inside <Keyword> tags and not as example/htmlpage#keyword, so I'm having trouble finding the links when I do a search on keywords...