Showing 1069 open source projects for "web crawler source code"

  • 1
    Decap

    A Git-based CMS for Static Site Generators

    Open source content management for your Git workflow. Use Decap CMS with any static site generator for a faster and more flexible web project. Get the speed, security, and scalability of a static site, while still providing a convenient editing interface for content. Content is stored in your Git repository alongside your code for easier versioning, multi-channel publishing, and the option to handle content updates directly in Git.
    Downloads: 1 This Week
  • 2
    Mink

    PHP web browser emulator abstraction

    Mink is an open-source PHP library that provides a browser abstraction layer for web application testing, allowing developers to simulate user interactions with websites in a consistent and flexible way. Instead of tying test logic to a specific browser driver, Mink introduces a unified API that can work with multiple drivers such as Goutte, Selenium, ChromeDriver, or BrowserKit.
    Downloads: 0 This Week
  • 3
    ciao

    HTTP checks & tests (private & public) monitoring

    ciao checks HTTP(S) URL endpoints for an HTTP status code (or errors on the lower TCP stack) and sends a notification on status change via e-mail or webhooks. It uses cron syntax to schedule the checks and comes with a web UI and a RESTful JSON API. The project's goal is an open-source web application for checking URL statuses, with a UI and a REST API, that is easy to install and maintain (no external dependencies such as databases or caches) in public and private environments.
    Downloads: 0 This Week
  • 4
    Return YouTube Dislike

    Chrome extension to return YouTube dislikes

    Return YouTube Dislike is an open-source extension that returns the YouTube dislike count. Available for Chrome and Firefox as a Web Extension. Also available for other browsers as JS Userscript. Additionally, the dislike field in the YouTube API was removed on December 13th, 2021, removing any ability to judge the quality of content before watching. With the removal of dislike stats from the YouTube API, our backend switched to using a combination of scraped dislike stats, and estimates extrapolated from extension user data. ...
    Downloads: 0 This Week
  • 5
    CapRover

    Scalable PaaS (automated Docker+nginx), aka Heroku on Steroids

    CapRover is an extremely easy-to-use app/database deployment & web server manager for your NodeJS, Python, PHP, ASP.NET, Ruby, MySQL, MongoDB, Postgres, WordPress (etc.) applications! It's blazingly fast and very robust as it uses Docker, Nginx, Let's Encrypt and NetData under the hood behind its simple-to-use interface. For a developer who does not like spending hours and days setting up a server, building tools, sending code to the server, building it, getting an SSL certificate,...
    Downloads: 1 This Week
  • 6
    Middy

    The stylish Node.js middleware engine for AWS Lambda

    Middy is a very simple middleware engine that allows you to simplify your AWS Lambda code when using Node.js. If you have used web frameworks like Express, then you will be familiar with the concepts adopted in Middy and you will be able to get started very quickly. A middleware engine allows you to focus on the strict business logic of your Lambda and then attach additional common elements like authentication, authorization, validation, serialization, etc. in a modular and reusable way by...
    Downloads: 3 This Week
  • 7
    rebrowser-patches

    Patches for Puppeteer and Playwright to reduce automation detection

    rebrowser-patches is an open source collection of patches designed to improve the stealth capabilities of browser automation frameworks. It focuses primarily on enhancing Puppeteer and Playwright by modifying parts of their source code that may reveal automation activity to websites. Many modern websites rely on bot detection mechanisms that identify automation through behavioral or technical signals, and these patches aim to reduce those detection vectors. By applying targeted fixes, the...
    Downloads: 0 This Week
  • 8
    NukeViet

    NukeViet CMS is multi Content Management System

    NukeViet is the first open-source CMS in Vietnam. The latest version, NukeViet 4, was coded from the ground up and supports the latest web technologies, including responsive web design (using HTML5, CSS3, Composer, XTemplate, jQuery, Ajax...), enabling you to build websites and online applications rapidly. With its own core libraries built in, NukeViet 4 is cross-platform and framework-independent. With basic knowledge of PHP and MySQL, you can easily extend NukeViet for your purposes. NukeViet core is simply...
    Downloads: 0 This Week
  • 9
    SPX

    A simple & straight-to-the-point PHP profiling extension

    SPX, which stands for Simple Profiling eXtension, is just another profiling extension for PHP. It differentiates itself from other similar extensions as being totally free and confined to your infrastructure (i.e. no data leaks to a SaaS). Very simple to use: just set an environment variable (command line) or switch on a radio button (web request) to profile your script. Thus, you are free of manually instrumenting your code (Ctrl-C a long running command line script is even supported)....
    Downloads: 3 This Week
  • 10
    Jimp

    An image processing library written entirely in JavaScript for Node

    An image processing library for Node written entirely in JavaScript, with zero native dependencies. If you're using this library with TypeScript, the method of importing differs slightly from JavaScript. Instead of using require, you must import it with the ES6 default import scheme. If you're using a web bundler (webpack, rollup, parcel), you can benefit from using the module build of jimp. Using the module build will allow your bundler to understand your code better and exclude things you aren't...
    Downloads: 4 This Week
  • 11
    Akka HTTP

    The Streaming-first HTTP server/module of Akka

    The Akka HTTP modules implement a full server- and client-side HTTP stack on top of akka-actor and akka-stream. It’s not a web framework but rather a more general toolkit for providing and consuming HTTP-based services. While interaction with a browser is of course also in scope it is not the primary focus of Akka HTTP. Akka HTTP follows a rather open design and many times offers several different API levels for “doing the same thing”. You get to pick the API level of abstraction that is...
    Downloads: 1 This Week
  • 12
    HTTP Kit

    Clojure HTTP server/client library with WebSocket support

    http-kit is a minimalist, event-driven, high-performance Clojure HTTP server/client library with WebSocket and asynchronous support. It is an (almost) drop-in replacement for the standard Ring Jetty adapter, so you can use it with all your current libraries (e.g. Compojure) and middleware. Using an event-driven architecture like Nginx, http-kit is very, very fast. It comfortably handles tens of thousands of...
    Downloads: 1 This Week
  • 13
    PHPScraper

    A universal web-util for PHP

    PHPScraper is a universal web-scraping util for PHP, built with simplicity in mind. The goal is to make XPath selectors optional and avoid the commonly needed boilerplate code. Just create an instance of PHPScraper, go to a website, and start collecting data. All scraping functionality can be accessed either as a function call or a property call. For example, the title can be accessed in two ways. Many common use cases are covered already. You can find prepared extractors for various HTML...
    Downloads: 0 This Week
  • 14
    AWS Lambda NodeJS Runtime Interface

    Extend your preferred base images to be Lambda compatible

    We have open-sourced a set of software packages, Runtime Interface Clients (RIC), that implement the Lambda Runtime API, allowing you to seamlessly extend your preferred base images to be Lambda compatible. The Lambda Runtime Interface Client is a lightweight interface that allows your runtime to receive requests from and send requests to the Lambda service. The Lambda NodeJS Runtime Interface Client is vended through npm. You can include this package in your preferred base image to make...
    Downloads: 1 This Week
  • 15
    Russh

    Rust SSH client & server library

    Russh provides a Rust library for implementing SSH clients and servers with a modern, async-friendly design. It exposes building blocks for authentication, channel management, port forwarding, and key handling, allowing you to embed SSH functionality directly into Rust applications. The API is designed to be explicit and composable, making it possible to implement custom behaviors like reverse tunnels, interactive shells, and service multiplexing. Because performance and safety are central,...
    Downloads: 0 This Week
  • 16

    ahCrawler

    A PHP search engine for your website and web analytics tool. GNU GPL3

    ahCrawler is a toolset for implementing your own search on your website, plus an analyzer for your web content. It can be used on shared hosting. It consists of a crawler (spider) and indexer, a search for your website(s), search statistics, and a website analyzer (HTTP headers, short titles and keywords, link checker, ...). You need to install it on your own server, so all crawled data stays in your environment. You never know when an external webspider updated your content. Trigger a rescan...
    Downloads: 0 This Week
  • 17
    python-fxxk-spider

    Collection of 100+ Python web scraping projects and crawler examples

    python-fxxk-spider is a curated collection of Python web scraping and crawler projects gathered in a single repository for reference and learning. It aggregates many independent scraping examples that target a wide range of websites, online services, and public data sources. Instead of being a single crawler tool, it functions as a catalog of ready-made Python spider implementations that demonstrate different scraping techniques. python-fxxk-spider includes scrapers for social media,...
    Downloads: 3 This Week
  • 18
    Lobo Evolution - Java Web Browser

    Lobo Evolution is an extensible all-Java web browser and RIA platform

    ...I'm waiting your first commit! Source code: https://github.com/LoboEvolution/LoboEvolution
    Downloads: 2 This Week
  • 19
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, aims to make it easy to build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from the terminal/shell/command line by supplying URLs, the output filename of your choice, and the paths to your Python scripts to the dude scrape command.
    Downloads: 0 This Week
  • 20
    MisterHouse:   Home Automation with Perl
    MisterHouse is a Windows/Unix home automation program written in Perl. It can respond to voice commands, web browsers, time of day, serial port and X10 data, external files, etc., and can speak via text-to-speech engines. Support is at https://sourceforge.net/p/misterhouse/mailman/misterhouse-users/ and the code is maintained at https://github.com/hollie/misterhouse
    Downloads: 5 This Week
  • 21
    AeroFTP

    AeroFTP is a Cross-platform desktop client for FTP, SFTP, WebDAV, S3

    AeroFTP is a cross-platform file transfer client that goes beyond traditional FTP. Connect to 25+ protocols (FTP/FTPS, SFTP, WebDAV, S3, Google Drive, Dropbox, OneDrive, MEGA, Box, pCloud, Azure, Filen, and more) from a single interface. Security-first: AeroVault v2 encrypted containers (AES-256-GCM-SIV), Cryptomator support, and zero telemetry. Built-in AeroAgent AI assistant with 19 providers and 47 tools for file operations and workflow automation. Includes Monaco editor,...
    Downloads: 455 This Week
  • 22
    Brill Software

    A faster way to develop React Web Applications

    The Brill Framework allows React web applications to be built quickly using a "Low Code" approach. A Content Management System (CMS) supports editing of pages containing React components. The React components communicate with each other and the server using middleware based on WebSockets. With a "No Code" solution, there's always something you require that's not supported. You spend ages bending the product to your requirements or pay the supplier to provide the components...
    Downloads: 0 This Week
  • 23
    BDE QR Generator

    Offline QR code generator for BDE

    ...Crucially, this does not use any web api calls. It can run offline. I'm not a coder so this isn't error handled anywhere near as well as it should. Source is in the files section. Built on QRCoder by Raffael Herrmann, released under MIT license. https://github.com/codebude/QRCode
    Downloads: 2 This Week
  • 24
    TeemIp - IPAM and DDI solution

    IP Address Management - CMDB - Ticketing - DNS and Zone Management

    ...Project source code is located on https://github.com/TeemIP
    Downloads: 220 This Week
  • 25
    NaviServer

    NaviServer, a high performance web server written in C and Tcl

    NaviServer is an extensible web server suited to create scalable websites and services. Originally based on AOLserver (http://www.aolserver.com), the ongoing development is done independently under Mozilla Public License by a core group of people that use it for their businesses and by other supporters.

    Features: High performance multi-threaded architecture, massively scalable and extensible, many modules, dynamic scripted pages (ADP), caching functions (static files, Tcl byte code, chunks), pooled database connections, thread shared arrays, introspection commands, mass virtual hosting (no server restart), watchdog, control port and command mode, efficient handling of down-/uploads with async I/O, IPv4/IPv6

    Core developers: Vlad Seryakov, Stephen Deasey, Zoran Vasiljevic, Gustaf Neumann

    Source: https://github.com/naviserver-project/naviserver
    Info: https://wiki.tcl-lang.org/page/NaviServer
    Documentation: https://naviserver.sourceforge.io/n/toc.html
    Downloads: 26 This Week