Web-as-corpus tools in Java.
* Simple crawler (also integrates with Nutch and Heritrix)
* HTML cleaner to remove boilerplate
* Language recognition
* Corpus builder
License
Apache License V2.0