Showing 8 open source projects for "web crawler spider"

View related business solutions
  • Achieve perfect load balancing with a flexible Open Source Load Balancer Icon
    Achieve perfect load balancing with a flexible Open Source Load Balancer

    Take advantage of Open Source Load Balancer to elevate your business security and IT infrastructure with a custom ADC Solution.

    Boost application security and continuity with SKUDONET ADC, our Open Source Load Balancer, that maximizes IT infrastructure flexibility. Additionally, save up to $470 K per incident with AI and SKUDONET solutions, further enhancing your organization’s risk management and cost-efficiency strategies.
  • Finance Automation that puts you in charge Icon
    Finance Automation that puts you in charge

    Tipalti delivers smart payables that elevate modern business.

    Our robust pre-built connectors and our no-code, drag-and-drop interface makes it easy and fast to automatically sync vendors, invoices, and invoice payment data between Tipalti and your ERP or accounting software.
  • 1
    Crawlab

    Crawlab

    Distributed web crawler admin platform for spiders management

    Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, PHP and various web crawler frameworks including Scrapy, Puppeteer, Selenium. Please use docker-compose to one-click to start up. By doing so, you don't even have to configure MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes. Master node and worker nodes communicate...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 2
    crawley

    crawley

    The unix-way web crawler

    Crawls web pages and prints any link it can find. Fast HTML SAX-parser (powered by golang.org/x/net/html) Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most of useful resources URLs (pics, videos, audios, forms, etc...) Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted) Scan depth (limited by starting host and path, by default - 0) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode - scan HTML comments for URLs...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    GoSpider

    GoSpider

    Gospider - Fast web spider written in Go

    GoSpider - Fast web spider written in Go. Fast web crawling. Brute force and parse sitemap.xml. Parse robots.txt. Generate and verify link from JavaScript files. Link Finder. Find AWS-S3 from response source. Find subdomains from the response source. Get URLs from Wayback Machine, Common Crawl, Virus Total, Alien Vault. Format output easy to Grep. Support Burp input. Crawl multiple sites in parallel.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 4
    Colly

    Colly

    Elegant Scraper and Crawler Framework for Golang

    Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving. Clean API. Fast (>1k request/sec on a single core) Manages request delays and maximum concurrency per domain. Automatic cookie and session handling. Sync/async/parallel scraping. Distributed scraping. Caching, automatic encoding of non-unicode responses...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Enterprise AI Search, Intranet, and Wiki in one platform. Icon
    Enterprise AI Search, Intranet, and Wiki in one platform.

    Your company’s all-in-one solution for trusted information

    Cut through the noise and end information overload with Guru, an all-in-one wiki, intranet, and knowledge base that serves as your company's single source of truth.
  • 5
    Pholcus

    Pholcus

    Distributed high-concurrency crawler software written in pure golang

    Pholcus is a high-concurrency crawler software written in pure Go language that supports distributed, only used for programming learning and research. It supports three operating modes of stand-alone, server and client, and has three operating interfaces, Web, GUI, and command line; simple and flexible rules, concurrent batch tasks, and rich output methods (mysql/mongodb/kafka/csv/excel, etc.); In addition, it also supports horizontal and vertical grabbing modes, and a series of advanced...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    GOPA

    GOPA

    GOPA, a spider written in Golang, for Elasticsearch

    GOPA, a spider written in Golang, for Elasticsearch. Lightweight, low footprint, memory requirement should, be 100MB. Easy to deploy, no runtime or dependency required. Easy to use, no programming or script ability needed, out-of-box features. First of all, get it, two opinions: download the pre-built package or compile it yourself. Besides Elasticsearch, Gopa doesn't require any other dependencies, just simply run ./gopa to start the program. It's safety to press ctrl+c to stop the current...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    DHT

    DHT

    BitTorrent DHT Protocol && DHT Spider.

    ... the standard BitTorrent DHT protocol, not born for crawling the DHT network. NAT Traversal issue. You run the crawler in a local network. It will block ip which looks bad and a good ip may be misjudged.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    go_spider

    go_spider

    An awesome Go concurrent Crawler(spider) framework

    An awesome Go concurrent Crawler(spider) framework. The crawler is flexible and modular. It can be expanded to an Individualized crawler easily or you can use the default crawl components only. Spider gets a Request in Scheduler that has url to be crawled. Then Downloader downloads the result(html, json, jsonp, text) of the Request. The result is saved in Page for parsing in PageProcesser. Html parsing is based on goquery package. Json parsing is based on simple JSON package. Jsonp...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next