Web-as-corpus tools in Java.
* Simple crawler (with optional integration with Nutch and Heritrix)
* HTML cleaner to remove boilerplate
* Language recognition
* Corpus builder
License
Apache License V2.0