Web-as-corpus tools in Java.
* Simple crawler (with integration for Nutch and Heritrix)
* HTML cleaner to remove boilerplate content
* Language recognition
* Corpus builder

License

Apache License V2.0

Additional Project Details

Intended Audience

Science/Research

User Interface

Non-interactive (Daemon), Web-based

Programming Language

Java

Related Categories

Java Search Engines, Java Frameworks, Java Intelligent Agents, Java Information Analysis Software

Registered

2008-04-11