Showing 24 open source projects for "crawling"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Dun and Bradstreet Risk Analytics - Supplier Intelligence Icon
    Dun and Bradstreet Risk Analytics - Supplier Intelligence

    Use an AI-powered solution for supply and compliance teams who want to mitigate costly supplier risks intelligently.

    Risk, procurement, and compliance teams across the globe are under pressure to deal with geopolitical and business risks. Third-party risk exposure is impacted by rapidly scaling complexity in domestic and cross-border businesses, along with complicated and diverse regulations. It is extremely important for companies to proactively manage their third-party relationships. An AI-powered solution to mitigate and monitor counterparty risks on a continuous basis, this cutting-edge platform is powered by D&B’s Data Cloud with 520M+ Global Business Records and 2B+ yearly updates for third-party risk insights. With high-risk procurement alerts and multibillion match points, D&B Risk Analytics leverages best-in-class risk data to help drive informed decisions. Perform quick and comprehensive screening, using intelligent workflows. Receive ongoing alerts of key business indicators and disruptions.
    Learn More
  • 1
    X-Crawl

    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Spatie Crawler

    Spatie Crawler

    An easy to use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    Scrapy

    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications.
    Downloads: 19 This Week
    Last Update:
    See Project
  • 4
    Douyin TikTok Download API

    Douyin TikTok Download API

    Douyin TikTok Download API

    ...Fast, asynchronous, free, open source, ad-free, long-term maintenance. This project is based on PyWebIO , FastAPI , HTTPX , a fast and asynchronous Douyin / TikTok data crawling tool, and realizes online batch parsing and downloading of watermark-free videos or atlases through the web, data crawling API, and iOS shortcut instructions for watermark-free download and other functions. You can deploy or transform this project yourself to achieve more functions, or you can directly call scraper.py in your project or install an existing pip package as a parsing library to easily crawl data, etc. ...
    Downloads: 9 This Week
    Last Update:
    See Project
  • Skillfully - The future of skills based hiring Icon
    Skillfully - The future of skills based hiring

    Realistic Workplace Simulations that Show Applicant Skills in Action

    Skillfully transforms hiring through AI-powered skill simulations that show you how candidates actually perform before you hire them. Our platform helps companies cut through AI-generated resumes and rehearsed interviews by validating real capabilities in action. Through dynamic job specific simulations and skill-based assessments, companies like Bloomberg and McKinsey have cut screening time by 50% while dramatically improving hire quality.
    Learn More
  • 5
    Python-Spider

    Python-Spider

    Python3 web crawler practice

    Python-Spider is a repository intended to teach or provide examples for writing web spiders / crawlers in Python — part of a broader learning and resource collection by its author. The code and documentation are oriented toward beginners or intermediate learners who want to learn how to fetch, parse, and extract data from websites programmatically. As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    Laravel Sitemap

    Laravel Sitemap

    Create and generate sitemaps with ease

    This package can generate a sitemap without you having to add urls to it manually. This works by crawling your entire site. The generator has the ability to execute JavaScript on each page so links injected into the dom by JavaScript will be crawled as well. The easiest way is to crawl the given domain and generate a sitemap with all found links. The destination of the sitemap should be specified by $path. If you don't want a crawled link to appear in the sitemap, just don't return it in the callable you pass to hasCrawled. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    SiteOne Crawler (desktop app)

    SiteOne Crawler (desktop app)

    A free, feature-rich web analyzer and exporter/cloner you will love!

    A free in-depth website analyzer providing audits of security, performance, SEO, accessibility and other technical aspects. Available as a desktop application for Windows/macOS/Linux and as a CLI tool for advanced users and CI/CD processes. It also includes an offline web page exporter (website clone, mirror).
    Downloads: 16 This Week
    Last Update:
    See Project
  • 8
    Gerapy

    Gerapy

    Distributed Crawler Management Framework Based on Scrapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Someone who has worked as a crawler with Python may use Scrapy. Scrapy is indeed a very powerful crawler framework. It has high crawling efficiency and good scalability. It is basically a necessary tool for developing crawlers using Python. If you use Scrapy as a crawler, then of course we can use our own host to crawl when crawling, but when the crawl is very large, we can’t run the crawler on our own machine, a good one. The method is to deploy Scrapy to a remote server for execution. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Goutte

    Goutte

    Goutte, a simple PHP Web Scraper

    Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method.
    Downloads: 5 This Week
    Last Update:
    See Project
  • Reach Your Audience with Rise Vision, the #1 Cloud Digital Signage Software Solution Icon
    Reach Your Audience with Rise Vision, the #1 Cloud Digital Signage Software Solution

    K-12 Schools, Higher Education, Businesses, Restaurants

    Rise Vision is the #1 digital signage company, offering easy-to-use cloud digital signage software compatible with any player across multiple screens. Forget about static displays. Save time and boost sales with 500+ customizable content templates for your screens. If you ever need help, get free training and exceptionally fast support.
    Learn More
  • 10
    File System Crawler for Elasticsearch

    File System Crawler for Elasticsearch

    Elasticsearch File System Crawler (FS Crawler)

    This crawler helps to index binary documents such as PDF, Open Office, MS Office. Local file system (or a mounted drive) crawling and indexing new files, updating existing ones, and removing old ones. Remote file system over SSH/FTP crawling. REST interface to let you “upload” your binary documents to elastic search.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    GoSpider

    GoSpider

    Gospider - Fast web spider written in Go

    GoSpider - Fast web spider written in Go. Fast web crawling. Brute force and parse sitemap.xml. Parse robots.txt. Generate and verify link from JavaScript files. Link Finder. Find AWS-S3 from response source. Find subdomains from the response source. Get URLs from Wayback Machine, Common Crawl, Virus Total, Alien Vault. Format output easy to Grep. Support Burp input. Crawl multiple sites in parallel.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    django-dynamic-scraper

    django-dynamic-scraper

    Creating Scrapy scrapers via the Django admin interface

    Django Dynamic Scraper (DDS) is an app for Django build on top of the scraping framework Scrapy. While preserving many of the features of Scrapy it lets you dynamically create and manage spiders via the Django admin interface. With Django Dynamic Scraper (DDS) you can define your Scrapy scrapers dynamically via the Django admin interface and save your scraped items in the database you defined for your Django project. Since it simplifies things DDS is not usable for all kinds of scrapers, but...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    DHT

    DHT

    BitTorrent DHT Protocol && DHT Spider.

    DHT implements the bittorrent DHT protocol in Go. It contains two modes, the standard mode, and the crawling mode. The standard mode follows the BEPs, and you can use it as a standard dht server. The crawling mode aims to crawl as much metadata info as possible. It doesn't follow the standard BEPs protocol. With the crawling mode, you can build another BTDigg. The default crawl mode configuration costs about 300M RAM. Set MaxNodes and BlackListMaxSize to fit yourself. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    Node Crawler

    Node Crawler

    Web Crawler/Spider for NodeJS + server-side jQuery

    Most powerful, popular and production crawling/scraping package for Node, happy hacking.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Nightmare

    Nightmare

    A high-level browser automation library

    ...The goal is to expose a few simple methods that mimic user actions (like goto, type and click), with an API that feels synchronous for each block of scripting, rather than deeply nested callbacks. It was originally designed for automating tasks across sites that don't have APIs, but is most often used for UI testing and crawling. Segment started with an open source project. Since then, we’ve open sourced hundreds of our repos. We want to continue supporting the community by publishing our code and other developers’ awesome open source projects. We use these open source projects as the foundations of our infrastructure to handle billions of API calls per day and to allow us to rapidly build and test code on the client.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    druid4arduino

    druid4arduino

    An automatic, configuration-less, GUI for Arduino projects.

    Druid4Arduino provides a simple GUI (graphical user interface) to interact with any SerialUI-based Arduino project. It works its magic by crawling the menu hierarchy (commands and sub-menus) provided by SerialUI and automatically re-configuring it’s user interface to match whatever options you’ve provided. It will connect to you’re arduino project through the USB serial port and display a reflection of all the commands and sub-menus defined in your program/sketch. It will also request and transmit any required input or error messages.
    Leader badge
    Downloads: 12 This Week
    Last Update:
    See Project
  • 17
    Zoozle Search & Download Suchmaschine

    Zoozle Search & Download Suchmaschine

    Zoozle 2008 - 2010 Webpage, Tools and SQL Files

    Download search engine and directory with Rapidshare and Torrent - zoozle Download Suchmaschine All The files that run the World Leading German Download Search Engine in 2010 with 500 000 unique visitors a day - all the tools you need to set up a clone. Code Contains: - PHP Files for zoozle - Perl Crawler for gathering new content to database and all other cool tools i have...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Subversion Indexer Widget for GDS
    svnindexer gives the ability of crawling into a Subversion repository to Google Desktop Search.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    A toolkit for crawling information from web pages by combining different kinds of "actions". Actions are simple operations such as navigation to a specified url or extraction of text from the html. Also available is a graphic user interface.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    JLinkCheck is an Ant Task written in Java for checking links in websites. It is not just checking one single page, but crawling a whole site like a spider, generating a report in XML and (X)HTML. JReptator will be its succesor with many more features
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    JCrawler is a perfect cralwing/load-testing tool which is cookie-enabled and follows human crawling pattern (hit/second).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    LCGML (Legacy Crawling Game Markup Language) makes it possible do script old fashioned text adventures. It is an XML-driven Document format. For the stable product, an perl interpreter is also planned
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23

    Luanium

    A Lua-based crawling scripting language and leveraging selenium

    ...The trick here is to add the crawling commands into the Lua interpreter.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Blackfire Player

    Blackfire Player

    Web Crawling, Web Testing, and Web Scraping application

    Blackfire Player is a powerful Web Crawling, Web Testing, and Web Scraper application. It provides a nice DSL to crawl HTTP services, assert responses, and extract data from HTML/XML/JSON responses. Some Blackfire Player use cases: Crawl a website/API and check expectations -- aka Acceptance Tests; Scrape a website/API and extract values; Monitor a website; Test code with unit test integration (PHPUnit, Behat, Codeception, ...); Test code behavior from the outside thanks to the native Blackfire Profiler integration -- aka Unit Tests from the HTTP layer (tm). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next