crawler4j is an open source web crawler for Java which provides a simple interface for crawling the Web. Using it, you can setup a multi-threaded web crawler in few minutes. You need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. shouldVisit function decides whether the given URL should be crawled or not.