Protect your business with AI policies and data loss prevention in the browser
Make AI work your way with Chrome Enterprise. Block unapproved sites and set custom data controls that align with your company's policies.
Download Chrome
G-P - Global EOR Solution
Companies searching for an Employer of Record solution to mitigate risk and manage compliance, taxes, benefits, and payroll anywhere in the world
With G-P's industry-leading Employer of Record (EOR) and Contractor solutions, you can hire, onboard and manage teams in 180+ countries — quickly and compliantly — without setting up entities.
crawler4j is an open source webcrawler for Java which provides a simple interface for crawling the Web. Using it, you can setup a multi-threaded webcrawler in few minutes. You need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. shouldVisit function decides whether the given URL should be crawled or not.
The Java Sitemap Parser can parse a website's Sitemap (http://www.sitemaps.org/). This is useful for web crawlers that want to discover URLs from a website that is using the Sitemap Protocol.
This project has been incorporated into crawler-commons (https://github.com/crawler-commons/crawler-commons) and is no longer being maintained.