An awesome concurrent crawler (spider) framework for Go. The crawler is flexible and modular: it can easily be extended into a customized crawler, or you can use only the default crawl components.

The Spider gets a Request, which holds the URL to be crawled, from the Scheduler. The Downloader then downloads the Request's result (HTML, JSON, JSONP, or text), and the result is stored in a Page for parsing by the PageProcesser. HTML parsing is based on the goquery package and JSON parsing on the simplejson package; JSONP is converted to JSON before parsing, and text is kept as plain content with no parser applied. The PageProcesser module only parses results: it extracts key-value pairs and the URLs to crawl in the next step. The key-value pairs are saved in PageItems, and the URLs are pushed into the Scheduler.
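
The Scheduler → Downloader → PageProcesser → PageItems flow described above can be sketched as a small offline Go program. This is a minimal illustration under stated assumptions, not go_spider's actual API: the types Request, Page, Scheduler, Downloader and PageProcesser below are hypothetical stand-ins named after the components in the description, mapDownloader serves pages from an in-memory map instead of fetching over HTTP, and lineProcesser is a toy parser that reads "title:" and "link:" lines rather than using goquery or simplejson.

```go
package main

import (
	"fmt"
	"strings"
)

// Request carries the URL to crawl. (Hypothetical type, not go_spider's API.)
type Request struct {
	URL string
}

// Page holds a downloaded body plus what the processor extracts from it.
type Page struct {
	Req       Request
	Body      string
	Items     map[string]string // extracted key-value pairs (the "PageItems")
	NextLinks []string          // URLs to push back into the Scheduler
}

// Scheduler is a FIFO queue that skips URLs it has already seen.
type Scheduler struct {
	queue []Request
	seen  map[string]bool
}

func NewScheduler() *Scheduler { return &Scheduler{seen: map[string]bool{}} }

// Push enqueues a request unless its URL was queued before.
func (s *Scheduler) Push(r Request) {
	if s.seen[r.URL] {
		return
	}
	s.seen[r.URL] = true
	s.queue = append(s.queue, r)
}

// Poll dequeues the next request; ok is false when the queue is empty.
func (s *Scheduler) Poll() (r Request, ok bool) {
	if len(s.queue) == 0 {
		return Request{}, false
	}
	r = s.queue[0]
	s.queue = s.queue[1:]
	return r, true
}

// Downloader fetches the body for a request.
type Downloader interface {
	Download(r Request) (*Page, error)
}

// PageProcesser parses a Page, filling Items and NextLinks.
type PageProcesser interface {
	Process(p *Page)
}

// mapDownloader (hypothetical) serves bodies from memory so the sketch runs offline.
type mapDownloader map[string]string

func (d mapDownloader) Download(r Request) (*Page, error) {
	body, ok := d[r.URL]
	if !ok {
		return nil, fmt.Errorf("not found: %s", r.URL)
	}
	return &Page{Req: r, Body: body, Items: map[string]string{}}, nil
}

// lineProcesser (hypothetical) treats "title: X" lines as items and "link: U" lines as next URLs.
type lineProcesser struct{}

func (lineProcesser) Process(p *Page) {
	for _, line := range strings.Split(p.Body, "\n") {
		switch {
		case strings.HasPrefix(line, "title: "):
			p.Items["title"] = strings.TrimPrefix(line, "title: ")
		case strings.HasPrefix(line, "link: "):
			p.NextLinks = append(p.NextLinks, strings.TrimPrefix(line, "link: "))
		}
	}
}

// Crawl runs the Scheduler -> Downloader -> PageProcesser loop until the
// queue is empty and returns the extracted items keyed by URL.
func Crawl(start string, d Downloader, pp PageProcesser) map[string]map[string]string {
	s := NewScheduler()
	s.Push(Request{URL: start})
	results := map[string]map[string]string{}
	for {
		req, ok := s.Poll()
		if !ok {
			return results
		}
		page, err := d.Download(req)
		if err != nil {
			continue // this sketch simply skips failed downloads
		}
		pp.Process(page)
		results[req.URL] = page.Items
		for _, u := range page.NextLinks {
			s.Push(Request{URL: u})
		}
	}
}

func main() {
	site := mapDownloader{
		"/a": "title: A\nlink: /b\nlink: /c",
		"/b": "title: B\nlink: /a",
		"/c": "title: C",
	}
	res := Crawl("/a", site, lineProcesser{})
	for _, u := range []string{"/a", "/b", "/c"} {
		fmt.Println(u, res[u]["title"]) // prints "/a A", "/b B", "/c C"
	}
}
```

The link from /b back to /a is not crawled twice because the Scheduler deduplicates URLs. Keeping the Downloader and PageProcesser behind interfaces mirrors the modular design described above: either piece can be replaced without touching the crawl loop.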

Features

  • Requires Go 1.2 or higher
  • Concurrent
  • Fit for vertical communities
  • Flexible, Modular
  • Native Go implementation
  • Easily extended into a customized crawler

Categories

Frameworks

License

Mozilla Public License 1.0 (MPL)

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Go

Related Categories

Go Frameworks

Registered

2023-01-27