Showing 121 open source projects for "crawl"

  • 1
    Dungeon Crawl: Stone Soup

    A game of dungeon exploration, combat and magic

    An open source roguelike adventure through dungeons filled with dangerous monsters in a quest to find the mystifyingly fabulous Orb of Zot. Crawl may seem easier than many other roguelikes at first glance, but dig a little deeper and you'll find it's just as challenging as some of the most difficult variations out there and a good deal harder than the rest. A strong set of design philosophies makes it much friendlier (and generally fairer) to the player - deaths are a learning experience...
    Downloads: 6 This Week
    See Project
  • 2
    Firecrawl

    Turn entire websites into LLM-ready markdown or structured data

    Crawl and convert any website into LLM-ready markdown or structured data. Built by Mendable.ai and the Firecrawl community, it includes powerful scraping, crawling, and data extraction capabilities. Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each; no sitemap is required. A minimal sketch of calling the API follows this entry.
    Downloads: 2 This Week
    See Project
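    Since Firecrawl is exposed as an HTTP API, a scrape is a single request. A minimal Python sketch, assuming the hosted v1 scrape endpoint and payload shape from the project's docs; the API key, target URL, and response fields are illustrative:

    ```python
    import requests

    API_KEY = "fc-YOUR_API_KEY"  # illustrative; issued by the hosted service

    resp = requests.post(
        "https://api.firecrawl.dev/v1/scrape",  # endpoint assumed from the docs
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"url": "https://example.com", "formats": ["markdown"]},
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["data"]["markdown"])  # the page as LLM-ready markdown
    ```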
  • 3
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from them. Portable and written in Python, it runs on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, yet easily extensible: simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications, such as data mining, monitoring... A minimal spider sketch follows this entry.
    Downloads: 23 This Week
    See Project
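    A minimal spider sketch; quotes.toscrape.com is Scrapy's public demo site, and the CSS selectors below are specific to it:

    ```python
    import scrapy

    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/"]

        def parse(self, response):
            # Extraction rules: yield one item per quote block on the page.
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("small.author::text").get(),
                }
            # Follow pagination; Scrapy schedules and deduplicates requests.
            yield from response.follow_all(css="li.next a", callback=self.parse)
    ```

    Running `scrapy runspider quotes_spider.py -O quotes.json` executes the spider without any project scaffolding and writes the items to JSON.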
  • 4
    Puppeteer

    Headless Chrome Node.js API

    ... such as generating page screenshots and PDFs, crawling a Single-Page Application, testing Chrome extensions, and more.
    Downloads: 14 This Week
    See Project
  • 5
    Web-Check

    All-in-one OSINT tool for analysing any website

    Comprehensive, on-demand open source intelligence for any website. Get insight into the inner workings of a given website: uncover potential attack vectors, analyse server architecture, view security configurations, and learn what technologies a site is using. Currently the dashboard will show: IP info, SSL chain, DNS records, cookies, headers, domain info, search crawl rules, page map, server location, redirect ledger, open ports, traceroute, DNS security extensions, site performance...
    Downloads: 9 This Week
    See Project
  • 6
    Nebula libp2p DHT

    A libp2p DHT crawler, monitor, and measurement tool

    A libp2p DHT crawler and monitor that tracks the liveness of peers. The crawler connects to DHT bootstrap peers and then recursively follows all entries in their k-buckets until all peers have been visited. The crawler supports the IPFS, Filecoin, Polkadot, Kusama, Rococo, Westend networks and more. The crawler can store its results as JSON documents or in a Postgres database; the --dry-run flag prevents it from doing either, and Nebula will instead print a summary of the crawl at the end. A crawl...
    Downloads: 1 This Week
    See Project
  • 7
    TorBot

    Dark Web OSINT Tool

    ... install.sh. Now you can run ./install.sh to create the torBot binary, then run ./torBot to execute the program. Feature status: crawl custom domains (completed); check if a link is live (completed); built-in updater (completed); TorBot GUI (in progress); social media integration (not started).
    Downloads: 2 This Week
    See Project
  • 8
    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    ... and META nofollow tags. Please consider the load your crawl will place on seed sites and set politeness policies accordingly. Also, always identify your crawl with contact information in the User-Agent, so that sites adversely affected by your crawl can contact you or adapt their server behavior accordingly. A generic sketch of this courtesy follows this entry.
    Downloads: 1 This Week
    See Project
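    Heritrix itself is a Java application configured through crawl-job settings rather than code like the following, but the courtesy described above is generic. A minimal Python illustration of the same idea, with the User-Agent contact details illustrative:

    ```python
    import urllib.request
    from urllib import robotparser
    from urllib.parse import urljoin

    # Identify the crawl so affected site operators can reach you (illustrative values).
    USER_AGENT = "ExampleCrawler/1.0 (+https://example.org/crawl-info; ops@example.org)"

    def polite_fetch(url: str) -> bytes | None:
        # Honor the site's robots.txt exclusion directives before fetching.
        rp = robotparser.RobotFileParser()
        rp.set_url(urljoin(url, "/robots.txt"))
        rp.read()
        if not rp.can_fetch(USER_AGENT, url):
            return None
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read()
    ```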
  • 9
    Goutte

    Goutte, a simple PHP Web Scraper

    Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method. The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler). To use your own HTTP settings, you may...
    Downloads: 3 This Week
    See Project
  • 10
    Image Downloader

    Download images from Google, Bing, Baidu

    Crawl and download images with Selenium, using Python 3 and PyQt5. Supported search engines: Google, Bing, Baidu. Keywords can be typed in or loaded from a line-separated keyword list file for batch processing. Downloads images using a customizable number of threads. Conditional search is fully supported (e.g. filetype:, site:). Switch for Google safe mode. Proxy configuration (SOCKS, HTTP). Both command-line and GUI interfaces are provided.
    Downloads: 2 This Week
    See Project
  • 11
    crwlr

    Library for Rapid (Web) Crawler and Scraper Development

    This library provides a framework and a lot of ready-to-use, so-called steps that you can use as building blocks to build your own crawlers and scrapers. Before diving into the library, let's have a look at the terms crawling and scraping: for most real-world use cases, those two things go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in them to load those as well; a generic sketch of that loop follows this entry. A crawler...
    Downloads: 2 This Week
    See Project
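    crwlr itself is a PHP library, so its steps API is best seen in its own documentation; the loop it describes (load a document, follow its links, load those as well) is language-agnostic, though. A tiny Python sketch of that loop, restricted to a single host and a rough page budget:

    ```python
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen

    class LinkParser(HTMLParser):
        """Collects href values from anchor tags."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(start_url: str, max_pages: int = 20):
        host = urlparse(start_url).netloc
        seen, queue = {start_url}, deque([start_url])
        while queue and len(seen) <= max_pages:  # rough budget on discovered URLs
            url = queue.popleft()
            try:
                html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
            except OSError:
                continue
            yield url  # one loaded document
            parser = LinkParser()
            parser.feed(html)
            for href in parser.links:
                link = urljoin(url, href).split("#")[0]  # resolve and drop fragments
                if urlparse(link).netloc == host and link not in seen:
                    seen.add(link)
                    queue.append(link)
    ```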
  • 12
    Laravel Sitemap

    Create and generate sitemaps with ease

    This package can generate a sitemap without you having to add URLs to it manually. This works by crawling your entire site. The generator can execute JavaScript on each page, so links injected into the DOM by JavaScript will be crawled as well. The easiest way is to crawl the given domain and generate a sitemap with all found links. The destination of the sitemap should be specified by $path. If you don't want a crawled link to appear in the sitemap, just don't return...
    Downloads: 0 This Week
    See Project
  • 13
    PHPScraper

    A universal web-util for PHP

    ..., including interesting attributes. You can filter and combine these to your needs. In some cases there is an option to get a simple or a detailed version. PHPScraper can assist in collecting feeds such as RSS feeds, sitemap.xml entries, and static search indexes. This can be useful when deciding on the next page to crawl or when building up a list of pages on a website.
    Downloads: 1 This Week
    See Project
  • 14
    VFS for Git

    Virtual file system for Git, enable Git at enterprise scale

    ... the files you have accessed, instead of having to examine every file in the repository. This ensures that operations like status and checkout are as fast as possible. Git struggles to handle enterprise-scale repositories. Operations like cloning will slow to a crawl when you have millions of files in a repository, and even something as simple as getting your repository status will leave you waiting.
    Downloads: 1 This Week
    See Project
  • 15
    Douyin TikTok Download API

    ... you can deploy or adapt this project yourself to add more functions, or you can call scraper.py directly in your own project, or install the existing pip package as a parsing library to easily crawl data. Supports inputting a Douyin or TikTok user homepage URL to crawl the author's homepage video data (watermark-free links, liked video list (the list's permission must be public), video comment data, background music video list data, etc.). A hedged usage sketch follows this entry.
    Downloads: 0 This Week
    See Project
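    A heavily hedged sketch of the pip-package route; the package, module, and method names below are recalled from the project's README and should be treated as assumptions to verify there:

    ```python
    # pip install douyin-tiktok-scraper   (package name assumed; verify in the README)
    import asyncio
    from douyin_tiktok_scraper.scraper import Scraper  # module path assumed

    async def main():
        api = Scraper()
        # hybrid_parsing() is described as the catch-all for Douyin/TikTok URLs (assumed).
        data = await api.hybrid_parsing("https://www.tiktok.com/@user/video/1234567890")
        print(data)

    asyncio.run(main())
    ```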
  • 16
    Gerapy

    Distributed Crawler Management Framework Based on Scrapy

    Distributed crawler management framework based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Anyone who has written crawlers in Python has probably used Scrapy. Scrapy is indeed a very powerful crawler framework, with high crawling efficiency and good scalability; it is basically a necessary tool for developing crawlers in Python. If you use Scrapy, you can of course crawl from your own host, but when the crawl is very large, we can't run...
    Downloads: 0 This Week
    See Project
  • 17
    crawley

    The unix-way web crawler

    Crawls web pages and prints any link it can find. Fast HTML SAX parser (powered by golang.org/x/net/html). Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most useful resource URLs (pics, videos, audios, forms, etc.). Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted). Scan depth (limited by starting host and path, 0 by default) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode: scan HTML comments for URLs...
    Downloads: 0 This Week
    See Project
  • 18
    ProxyPool

    An Efficient ProxyPool with Getter, Tester and Server

    Simple and efficient proxy pool, providing the following functions: regularly crawls free proxy websites, easy and scalable; uses Redis to store proxies and rank them by availability; regularly tests and screens proxies, eliminating unavailable ones and keeping the available ones; provides a proxy API for randomly selecting an available, tested proxy. The principles behind the proxy pool are analyzed in "How to Build an Efficient Proxy Pool"; it is recommended to read it before using... A sketch of consuming the API follows this entry.
    Downloads: 0 This Week
    See Project
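    Once the pool's server component is running, fetching a tested proxy is one HTTP call. A sketch assuming the default local address and /random route from the project's docs:

    ```python
    import requests

    PROXY_POOL_URL = "http://localhost:5555/random"  # default address assumed from the docs

    def get_proxy() -> str:
        # The server returns one random, recently tested proxy as plain text,
        # e.g. "127.0.0.1:8888".
        return requests.get(PROXY_POOL_URL, timeout=5).text.strip()

    proxy = get_proxy()
    print(requests.get(
        "https://httpbin.org/ip",
        proxies={"http": f"http://{proxy}", "https": f"http://{proxy}"},
        timeout=10,
    ).text)
    ```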
  • 19
    SiteOne Crawler

    SiteOne Crawler is a website analyzer and exporter

    SiteOne Crawler is a very useful and easy-to-use tool you'll ♥ as a Dev/DevOps, website owner or consultant. Works on all popular platforms - Windows, macOS, and Linux (x64 and arm64 too). It will crawl your entire website in depth, analyze and report problems, show useful statistics and reports, generate an offline version of the website, generate sitemaps, or send reports via email. Watch a detailed video with a sample report for the Astro.build website. This crawler can be used as a command...
    Downloads: 0 This Week
    See Project
  • 20
    StringZilla

    10x faster string search, split, sort, and shuffle for long strings

    ... memory-maps a file from persistent memory without loading its copy into RAM. The contents of that file would remain immutable, and the mapping can be shared by multiple Python processes simultaneously. A standard dataset pre-processing use case would be to map a sizeable textual dataset like Common Crawl into memory, spawn child processes, and split the job between them. A short sketch of this pattern follows this entry.
    Downloads: 0 This Week
    See Project
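    A short sketch of that pattern using the Str and File types described for the project's Python binding (the file path is illustrative):

    ```python
    from stringzilla import Str, File  # memory-mapping types from the Python binding

    # Memory-map the dataset: the file stays immutable on disk, and the mapping
    # can be shared by child processes instead of being copied into each one's RAM.
    text = Str(File("common_crawl_shard.txt"))  # illustrative path

    print(len(text))                  # size in bytes, without loading the file
    print(text.find("Common Crawl"))  # substring search over the whole mapping
    ```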
  • 21
    Roach

    The complete web scraping toolkit for PHP

    Roach is a complete web scraping toolkit for PHP. It is a shameless clone heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well. It’s your all-in-one resource for web scraping in PHP. Roach doesn’t depend on a specific framework. Instead, you can use the core package...
    Downloads: 0 This Week
    See Project
  • 22
    Imgbot

    An Azure Function solution to crawl through all of your image files

    Don’t spend another second worrying about compressing your images. Install Imgbot into your GitHub projects, and focus on your application. Install Imgbot from the GitHub marketplace into your projects with the click of a button. Imgbot will send you your first pull request optimizing all of the images that it can find. Imgbot watches for new images in your repository and opens more pull requests. When you’re shipping code and hitting deadlines, it’s easy to forget about optimizing your...
    Downloads: 0 This Week
    See Project
  • 23
    WFDownloader App

    Free batch downloader for image, wallpaper, video, audio, document,

    Use as an image gallery, wallpaper, audio/music, video, document, and other media bulk downloader from supported websites. Also use it to download sequential website URLs that have a certain pattern (e.g. image01.png to image100.png). Also use the app's built-in site crawler for advanced link search or extraction. There is also special support for forum media and open directory downloading. It's a programmable downloader and also works with password-protected sites. Say goodbye to downloading one...
    Downloads: 108 This Week
    See Project
  • 24
    Snap Lens Web Crawler

    Crawl and download Snap Lenses from lens.snapchat.com with ease.

    This crawler is a dependency of Snap Camera Server: https://snap-camera-server.sourceforge.io
    Downloads: 0 This Week
    See Project
  • 25
    PGMania

    Astrophotography image & video processor. Full processing cycle

    ... the pair of lights is a pair of lights. Computer processing of images in the style of "this pixel we paint over, another we smear, and this we smudge" is not performed. Everything happens thanks to mathematics and the laws of physics, as in nature. For this reason, PGMania is environmentally friendly: it doesn't litter your photos with things that were never there. A processor with a magnifier will not crawl through your pictures blurring hot pixels, hiding satellites and cosmic rays.
    Downloads: 9 This Week
    See Project