An HTMLMetaDataExtractor is needed: it should extract the title (almost always available) and any other meta tags found, such as author and subject.
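
A rough sketch of what such an extractor might do, written in Python with only the standard library. This is an illustration of the request, not a proposed implementation: the names `MetaDataParser` and `extract_html_metadata` and the shape of the returned dictionary are assumptions made here, and only `<meta>` tags carrying both a `name` and a `content` attribute are considered.

```python
from html.parser import HTMLParser


class MetaDataParser(HTMLParser):
    """Hypothetical helper: collects the first <title> and named <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = None          # stays None when the document has no <title>
        self.meta = {}             # lower-cased meta name -> content
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names before calling this.
        if tag == "title" and self.title is None:
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = attr.get("name")
            content = attr.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        # Title text may arrive in several chunks, so collect and join later.
        if self._in_title:
            self._title_parts.append(data)


def extract_html_metadata(html):
    """Return {'title': str or None, 'meta': {name: content}} for an HTML string."""
    parser = MetaDataParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "meta": parser.meta}
```

Called on a page with a title and `author`/`subject` meta tags, this returns the title string plus a dictionary keyed by the lower-cased meta names; a page without either yields `None` and an empty dictionary.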