WebMagic is a simple but scalable crawler framework for Java. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence, which simplifies the development of a specific crawler. WebMagic has a simple core with high flexibility and a simple API for HTML extraction. It also lets you customize a crawler through annotations on POJOs, with no configuration needed. Other features include multi-threading and distribution support.

WebMagic is easy to integrate: add its dependencies to your pom.xml. WebMagic uses slf4j with the slf4j-log4j12 implementation; if you use your own slf4j implementation, exclude slf4j-log4j12. To define a crawler, write a class that implements PageProcessor.

Features

  • Simple core with high flexibility
  • Simple API for HTML extraction
  • POJO annotations to customize a crawler, with no configuration
  • Multi-threading and distribution support
  • Easy to integrate
  • Covers the whole crawler lifecycle
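
The lifecycle stages named above (downloading, URL management, content extraction, persistence) can be illustrated with a minimal sketch that uses only the Java standard library. This is not WebMagic's API: the class name, the in-memory `FAKE_WEB` map, the `example.test` URLs, and the regex-based extraction are all assumptions made for illustration, so the code runs without network access or extra dependencies. In WebMagic itself these roles are handled by the framework's own components, such as a PageProcessor implementation for extraction.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CrawlLifecycleSketch {

    // Hypothetical in-memory "web" standing in for real HTTP downloads.
    static final Map<String, String> FAKE_WEB = new HashMap<>();
    static {
        FAKE_WEB.put("http://example.test/a",
                "<title>Page A</title><a href=\"http://example.test/b\">b</a>");
        FAKE_WEB.put("http://example.test/b",
                "<title>Page B</title><a href=\"http://example.test/a\">a</a>");
    }

    static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");
    static final Pattern LINK = Pattern.compile("href=\"(.*?)\"");

    // Downloading: fetch the raw page body for a URL (empty if unknown).
    static String download(String url) {
        return FAKE_WEB.getOrDefault(url, "");
    }

    // Content extraction: pull the page title out of the HTML.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1) : "";
    }

    // Content extraction: collect every link target found in the HTML.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Runs the full loop and returns the "persisted" results (URL -> title).
    static Map<String, String> crawl(String startUrl) {
        Deque<String> queue = new ArrayDeque<>();           // URL management: pending URLs
        Set<String> seen = new HashSet<>();                 // URL management: deduplication
        Map<String, String> store = new LinkedHashMap<>();  // persistence stand-in

        queue.add(startUrl);
        seen.add(startUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            String html = download(url);
            store.put(url, extractTitle(html));
            for (String link : extractLinks(html)) {
                if (seen.add(link)) {   // add() returns false for already-seen URLs
                    queue.add(link);
                }
            }
        }
        return store;
    }

    public static void main(String[] args) {
        crawl("http://example.test/a")
                .forEach((url, title) -> System.out.println(url + " -> " + title));
    }
}
```

The two fake pages link to each other, so the `seen` set is what stops the loop from revisiting a URL; a real crawler needs the same safeguard in its URL management stage.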

License

Apache License V2.0

Additional Project Details

Programming Language

Java

Related Categories

Java Frameworks, Java Web Scrapers

Registered

2021-06-11