go_spider is a concurrent crawler (spider) framework written in Go. The crawler is flexible and modular: you can extend it into a customized crawler, or use only the default crawl components.

The Spider takes a Request from the Scheduler; each Request holds a URL to be crawled. The Downloader then downloads the result of the Request (HTML, JSON, JSONP, or text), and the result is saved in a Page for parsing in the PageProcesser. HTML parsing is based on the goquery package, and JSON parsing is based on the simplejson package. JSONP is converted to JSON before parsing. The text form represents plain-text content that has no parser.

The PageProcesser module only parses results. It produces results (key-value pairs) and the URLs to be crawled in the next step. The key-value pairs are saved in PageItems, and the URLs are pushed to the Scheduler.

Features

  • Requires Go 1.2 or higher
  • Concurrent
  • Suited to vertical (topic-specific) communities
  • Flexible and modular
  • Native Go implementation
  • Can be expanded to an individualized crawler easily

Categories

Frameworks

License

Mozilla Public License 1.0 (MPL)

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Go

Related Categories

Go Frameworks

Registered

2023-01-27