Web-as-corpus tools in Java.
* Simple crawler (with optional integration with Nutch and Heritrix)
* HTML cleaner to remove boilerplate
* Language recognition
* Corpus builder
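
To illustrate what the HTML cleaning step in the list above involves, here is a minimal sketch in plain Java. It is not JavaWAC's actual API: the class name `BoilerplateSketch`, the method `extractContent`, and the `minWords` threshold are hypothetical names chosen for this example. The word-count heuristic is an assumption standing in for whatever cleaning method the project uses. The sketch drops script and style blocks, turns block-level tags into line breaks, strips the remaining tags, and keeps only lines long enough to look like running text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of JavaWAC.
public class BoilerplateSketch {
    // <script>...</script> and <style>...</style> blocks, case-insensitive, spanning lines.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Block-level tags that are treated as line boundaries.
    private static final Pattern BLOCK_TAG =
            Pattern.compile("(?i)</?(p|div|li|ul|ol|h[1-6]|br|tr|td|table)\\b[^>]*>");
    // Any remaining tag.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    // Returns the text lines that contain at least minWords whitespace-separated tokens.
    // Assumption: short lines (menus, buttons, footers) are boilerplate.
    public static List<String> extractContent(String html, int minWords) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = BLOCK_TAG.matcher(text).replaceAll("\n");
        text = ANY_TAG.matcher(text).replaceAll(" ");

        List<String> kept = new ArrayList<>();
        for (String line : text.split("\n")) {
            String normalized = line.trim().replaceAll("\\s+", " ");
            if (normalized.isEmpty()) {
                continue;
            }
            if (normalized.split(" ").length >= minWords) {
                kept.add(normalized);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String html = "<html><body><div>Home | About</div>"
                + "<p>Web corpora are built by crawling pages and keeping only running text.</p>"
                + "<script>var x = 1;</script><div>Login</div></body></html>";
        for (String line : extractContent(html, 5)) {
            System.out.println(line);
        }
    }
}
```

With the sample page in `main`, the navigation line and the login label fall below the five-word threshold, so only the paragraph sentence is kept. A real cleaner would need an HTML parser and a more robust heuristic than regular expressions and word counts.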

License

Apache License V2.0

Additional Project Details

Intended Audience

Science/Research

User Interface

Non-interactive (Daemon), Web-based

Programming Language

Java

Related Categories

Java Search Engines, Java Frameworks, Java Intelligent Agents, Java Information Analysis Software

Registered

2008-04-11