Web-as-corpus tools in Java.
* Simple crawler (plus integration with Nutch and Heritrix)
* HTML cleaner to remove boilerplate code
* Language recognition
* Corpus builder
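This page does not document JavaWAC's own classes, so the code below is not its API. It is a minimal sketch of what an HTML boilerplate cleaner of this kind typically does, using a simple density heuristic:

* Split the page at block-level tags.
* Strip the remaining markup.
* Keep only blocks that have enough words and little anchor text.

The class name `BoilerplateFilter`, the list of split tags, and both thresholds (10 words, one-third link density) are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical density-based boilerplate filter (not JavaWAC's API).
 * Splits HTML into blocks at block-level tags, strips markup, and keeps
 * blocks with enough words and a low share of words inside links.
 */
public class BoilerplateFilter {
    // Block-level tags used as split points (assumed list; real cleaners use more).
    private static final Pattern BLOCK_SPLIT = Pattern.compile(
            "(?i)</?(p|div|li|ul|ol|td|tr|table|h[1-6]|br|nav|header|footer)\\b[^>]*>");
    // Anchor elements; group 1 is the link text.
    private static final Pattern ANCHOR = Pattern.compile("(?is)<a\\b[^>]*>(.*?)</a>");
    // Any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Script and style elements, removed entirely before splitting.
    private static final Pattern SCRIPT_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1>");

    static final int MIN_WORDS = 10;              // assumed threshold
    static final double MAX_LINK_DENSITY = 0.33;  // assumed threshold

    /** Returns the plain-text blocks judged to be main content. */
    public static List<String> clean(String html) {
        List<String> kept = new ArrayList<>();
        String noScripts = SCRIPT_STYLE.matcher(html).replaceAll(" ");
        for (String block : BLOCK_SPLIT.split(noScripts)) {
            // Count words that appear inside <a>...</a> in this block.
            int linkWords = 0;
            Matcher m = ANCHOR.matcher(block);
            while (m.find()) {
                linkWords += countWords(TAG.matcher(m.group(1)).replaceAll(" "));
            }
            // Strip all tags and normalise whitespace.
            String text = TAG.matcher(block).replaceAll(" ").replaceAll("\\s+", " ").trim();
            int words = countWords(text);
            if (words >= MIN_WORDS && (double) linkWords / words <= MAX_LINK_DENSITY) {
                kept.add(text);
            }
        }
        return kept;
    }

    /** Counts whitespace-separated tokens. */
    public static int countWords(String s) {
        String t = s.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    public static void main(String[] args) {
        String html = "<div><a href=\"/\">Home</a> <a href=\"/about\">About</a></div>"
                + "<p>The web as corpus approach collects large amounts of running text "
                + "from ordinary web pages for linguistic study.</p>"
                + "<footer>Copyright Example Inc.</footer>";
        for (String block : clean(html)) {
            System.out.println(block);
        }
    }
}
```

On the sample page, the short navigation block and the footer fall below the word threshold, so only the paragraph is printed. The sketch does not decode HTML entities. A corpus builder would normally pass the kept blocks on to language recognition before adding them to the corpus.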
License
Apache License V2.0