[sleuthkit-developers] HTML Text Extraction of comments, script, etc.
From: Brian C. <ca...@sl...> - 2012-06-06 18:22:09
Does anyone know of an open source library that extracts text from HTML files, including the comments, JavaScript, etc.? We're playing with Solr/Tika, and its HTML extraction only outputs the file's visible text and not the other stuff.

thanks,
brian
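
To make the request concrete, here is a minimal sketch of the kind of output being asked for: visible text, comments, and script/style bodies, each collected separately. It uses Python's standard-library `html.parser` rather than Solr/Tika, which is an assumption made purely for illustration. The names `AllTextExtractor` and `extract_all` are hypothetical and do not come from any existing library.

```python
from html.parser import HTMLParser


class AllTextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, comments, and
    <script>/<style> bodies into three separate lists."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = []
        self.comments = []
        self.scripts = []
        self._raw_tag = None  # name of the enclosing <script>/<style>, if any

    def handle_starttag(self, tag, attrs):
        # The parser treats the bodies of these elements as raw text.
        if tag in ("script", "style"):
            self._raw_tag = tag

    def handle_endtag(self, tag):
        if tag == self._raw_tag:
            self._raw_tag = None

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return  # skip whitespace-only runs between tags
        if self._raw_tag:
            self.scripts.append(data)
        else:
            self.text.append(data)

    def handle_comment(self, data):
        # Called with the content between "<!--" and "-->".
        self.comments.append(data.strip())


def extract_all(html):
    """Return a dict with the 'text', 'comments', and 'scripts' found in html."""
    parser = AllTextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "text": parser.text,
        "comments": parser.comments,
        "scripts": parser.scripts,
    }
```

Keeping the three kinds of content in separate lists, rather than one concatenated string, would let an indexer store them in different fields. Whether that fits a given Solr schema is not something this sketch addresses.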