go_spider

go_spider is a concurrent crawler (spider) framework written in Go. The crawler is flexible and modular: you can extend it into a customized crawler, or use only the default crawl components.

The Spider takes a Request from the Scheduler; each Request carries the URL to crawl. The Downloader then fetches the result of the Request (HTML, JSON, JSONP, or plain text) and stores it in a Page for the PageProcesser to parse. HTML parsing is based on the goquery package, and JSON parsing on the simplejson package. JSONP is converted to JSON before parsing, and text results are kept as plain content with no parser applied. The PageProcesser module only parses results: it produces key-value pairs and the URLs to crawl in the next step. The key-value pairs are saved in PageItems, and the URLs are pushed back to the Scheduler.
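The flow described above (Scheduler, Downloader, Page, PageProcesser, PageItems) can be sketched in a few lines of Go. This is a minimal illustration only, not go_spider's actual API: every type and function name below (Request, Page, Scheduler, Crawl, fakeDownload, textProcessor) is hypothetical, the "downloader" reads from an in-memory map instead of the network, and the loop runs sequentially rather than concurrently to keep the example short.

```go
package main

import (
	"fmt"
	"strings"
)

// Request is a hypothetical stand-in for a crawl request: only a URL here.
type Request struct{ URL string }

// Page holds the downloaded body plus whatever the processor extracts.
type Page struct {
	Req   Request
	Body  string
	Items map[string]string // key-value results (the PageItems role)
	Next  []Request         // URLs to crawl in the next step
}

// Scheduler is a FIFO queue that skips URLs it has already seen.
type Scheduler struct {
	queue []Request
	seen  map[string]bool
}

func NewScheduler() *Scheduler { return &Scheduler{seen: map[string]bool{}} }

// Push enqueues a request unless its URL was queued before.
func (s *Scheduler) Push(r Request) {
	if s.seen[r.URL] {
		return
	}
	s.seen[r.URL] = true
	s.queue = append(s.queue, r)
}

// Poll returns the oldest queued request, or false when the queue is empty.
func (s *Scheduler) Poll() (Request, bool) {
	if len(s.queue) == 0 {
		return Request{}, false
	}
	r := s.queue[0]
	s.queue = s.queue[1:]
	return r, true
}

// Downloader fetches the body for a request; Processor parses a page.
type Downloader func(Request) (string, error)
type Processor func(*Page)

// Crawl drains the scheduler: download, parse, collect items, enqueue new URLs.
func Crawl(seed string, download Downloader, process Processor) []map[string]string {
	s := NewScheduler()
	s.Push(Request{URL: seed})
	var results []map[string]string
	for {
		req, ok := s.Poll()
		if !ok {
			break
		}
		body, err := download(req)
		if err != nil {
			continue // skip pages that fail to download
		}
		page := &Page{Req: req, Body: body, Items: map[string]string{}}
		process(page)
		results = append(results, page.Items)
		for _, next := range page.Next {
			s.Push(next)
		}
	}
	return results
}

// fakeSite is made-up plain-text content standing in for real pages.
var fakeSite = map[string]string{
	"http://example.test/a": "title=A\nlink=http://example.test/b",
	"http://example.test/b": "title=B\nlink=http://example.test/a",
}

func fakeDownload(r Request) (string, error) {
	body, ok := fakeSite[r.URL]
	if !ok {
		return "", fmt.Errorf("no such page: %s", r.URL)
	}
	return body, nil
}

// textProcessor treats "link=..." lines as URLs and other "k=v" lines as items.
func textProcessor(p *Page) {
	for _, line := range strings.Split(p.Body, "\n") {
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			continue
		}
		if parts[0] == "link" {
			p.Next = append(p.Next, Request{URL: parts[1]})
		} else {
			p.Items[parts[0]] = parts[1]
		}
	}
}

func main() {
	for _, items := range Crawl("http://example.test/a", fakeDownload, textProcessor) {
		fmt.Println(items["title"])
	}
}
```

Keeping the processor free of any download or queueing logic mirrors the separation described above: it only reads the page and reports items and follow-up URLs, so a different parser (for HTML or JSON, say) could be swapped in without touching the crawl loop.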

Features

Requires Go 1.2 or higher
Concurrent
Fit for vertical communities
Flexible, Modular
Native Go implementation
Can be expanded to an individualized crawler easily

License

Mozilla Public License 1.0 (MPL)

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Go

Related Categories

Go Frameworks

Registered

2023-01-27
