Showing 1069 open source projects for "web crawler source code"

  • 1
    FlareSolverr

    Proxy server to bypass Cloudflare protection

    FlareSolverr is a proxy server to bypass Cloudflare and DDoS-GUARD protection. FlareSolverr starts a proxy server that waits for user requests in an idle state, using few resources. When a request arrives, it uses puppeteer with the stealth plugin to create a headless browser (Firefox). It opens the URL with the user's parameters and waits until the Cloudflare challenge is solved (or the timeout expires). The HTML code and the cookies are sent back to the user, and those cookies can be used to bypass...
    Downloads: 42 This Week
  • 2
    LinkChecker

    Check links in web documents or full websites

    LinkChecker is a free, GPL-licensed website validator. LinkChecker checks links in web documents or full websites. It runs on Python 3 systems, requiring Python 3.8 or later. The version in the pip repository may be old; to find out how to get the latest code, plus platform-specific information and other advice, see doc/install.txt in the source code archive. If you do not want to install any additional libraries or dependencies, you can use the Docker image published on GitHub Packages.
    Downloads: 0 This Week
  • 3
    Zenario

    Zenario is a web-based content management system (CMS)

    Zenario is a web-based content management system (CMS). It can be used for simple sites, with many WYSIWYG features for making regular web pages, news items, blogs, and so on. It has powerful features for running extranet sites, such as customer portals, and online databases (e.g. of products, documents or videos). It also has multilingual features built in from the core, so that a site can easily be set up to deliver content in multiple languages.
    Downloads: 1 This Week
  • 4
    GoatCounter

    Easy web analytics. No tracking of personal data

    GoatCounter is an open-source web analytics platform available as a hosted service (free for non-commercial use) or self-hosted app. It aims to offer easy-to-use and meaningful privacy-friendly web analytics as an alternative to Google Analytics or Matomo. Privacy-aware; doesn’t track users with unique identifiers and doesn't need a GDPR notice. Fine-grained control over which data is collected. Also see the privacy policy and GDPR consent notices. Lightweight and fast; adds just ~3.5KB of...
    Downloads: 1 This Week
  • 5
    Mini QR

    Create & scan cute QR codes easily

    Mini QR is a web app focused on making QR codes feel friendly and design-forward, combining a polished QR generator with a built-in scanner so you can both create and decode codes in the same place. It emphasizes customization so the QR you generate can match a brand, event theme, or personal style, including color and styling controls, framed layouts with labels, and the ability to add a logo image. Because QR reliability matters as much as looks, it exposes practical settings like error...
    Downloads: 13 This Week
  • 6
    crawlee

    A web scraping and browser automation library for Node.js

    Crawlee is a web scraping and browser automation library. It helps you build reliable crawlers. Fast. Crawlee won't fix broken selectors for you (yet), but it helps you build and maintain your crawlers faster. When a website adds JavaScript rendering, you don't have to rewrite everything, only switch to one of the browser crawlers. When you later find a great API to speed up your crawls, flip the switch back. It keeps your proxies healthy by rotating them smartly with good fingerprints that...
    Downloads: 1 This Week
  • 7
    wombat

    Lightweight Ruby DSL for scraping structured data from web pages

    Wombat is a lightweight web crawling and scraping library written in Ruby that focuses on extracting structured data from web pages using a concise domain-specific language (DSL). It is designed to simplify the process of defining how information should be collected from HTML documents without requiring large amounts of scraping boilerplate code. Developers can declare the data fields they want and specify selectors or rules for retrieving them, allowing Wombat to parse and return structured...
    Downloads: 0 This Week
  • 8
    changedetection.io

    The best free open source website change detection and restock service

    Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. From simply monitoring web pages for changes (such as watching prices and restock notifications) to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock; get restock alerts via Discord, Slack, email, and many other platforms. Using the...
    Downloads: 7 This Week
  • 9
    Matomo

    Alternative to Google Analytics that gives you full control over data

    Google Analytics alternative that protects your data and your customers' privacy. Take back control with Matomo – a powerful web analytics platform that gives you 100% data ownership. You could lose your customers’ trust and risk damaging your reputation if people learn their data is used for Google’s “own purposes”. By choosing the ethical alternative, Matomo, you won’t make privacy sacrifices or compromise your site. You can even use Matomo without needing to ask for consent. With 100%...
    Downloads: 4 This Week
  • 10
    Dev Browser

    A Claude Skill to give your agent the ability to use a web browser

    Dev Browser is a browser automation skill/plugin that enables an AI agent to control a real browser for verification and testing during development. Its purpose is to close the gap between “code was written” and “the UI actually works,” by letting the agent navigate, interact with pages, and validate behavior in a live environment. A key idea is persistence: the browser can keep pages open so the agent can navigate once and then perform multiple interactions across scripts without losing...
    Downloads: 18 This Week
  • 11
    Guzzle

    An extensible PHP HTTP client

    Guzzle is a PHP HTTP client that makes it easy to send HTTP requests and trivial to integrate with web services. Simple interface for building query strings, POST requests, streaming large uploads, streaming large downloads, using HTTP cookies, uploading JSON data, etc. Can send both synchronous and asynchronous requests using the same interface. Uses PSR-7 interfaces for requests, responses, and streams. This allows you to utilize other PSR-7 compatible libraries with Guzzle. Abstracts...
    Downloads: 1 This Week
  • 12
    MemFree

    Hybrid AI Search Engine & AI Page Generator

    MemFree is an open source hybrid AI search engine and page generation platform designed to help users retrieve information from both personal knowledge bases and the public web through a unified interface. The project combines retrieval-augmented search with AI summarization to deliver concise answers instead of forcing users to manually sift through multiple sources.
    Downloads: 0 This Week
  • 13
    Serverless Express by Vendia

    Run Node.js web applications and APIs using existing frameworks

    Run REST APIs and other web applications using your existing Node.js application framework (Express, Koa, Hapi, Sails, etc.), on top of AWS Lambda and Amazon API Gateway. Vendia is the real-time data cloud for rapidly building applications that securely share data across departments, companies, clouds, and regions. We’re excited to announce the latest release of Vendia Share! This release includes new features like smart contracts, user-level transactions, the beta release of Azure support,...
    Downloads: 2 This Week
  • 14
    qrcp

    Transfer files over wifi from your computer to your mobile device

    qrcp binds a web server to the address of your Wi-Fi network interface on a random port and creates a handler for it. The default handler serves the content and exits the program when the transfer is complete. When used to receive files, qrcp serves an upload page and handles the transfer. Most QR apps can detect URLs in decoded text and act accordingly (i.e. open the decoded URL with the default browser), so when the QR code is scanned the content will begin downloading by the mobile...
    Downloads: 6 This Week
  • 15
    BrowserOS

    Agentic browser; privacy-first alternative to ChatGPT Atlas

    BrowserOS is an open-source, agentic web browser built on a Chromium base that integrates AI agents directly into the browsing experience. Rather than just doing standard browsing, it places AI intelligence at the core: you can connect your own API keys (e.g., OpenAI, Anthropic, Google Gemini) or run local models (e.g., via Ollama) so that your browsing data and automation stay on your machine; privacy and control are emphasized throughout. The interface remains familiar to users of...
    Downloads: 16 This Week
  • 16
    Eclipse Che

    Next-gen container development platform, workspace server & cloud IDE

    Eclipse Che is a Kubernetes-native IDE that makes Kubernetes development accessible for development teams. It places everything a developer could need into containers in Kubernetes pods, including dependencies, embedded containerized runtimes, a web IDE, and project code. With the Kubernetes application in your development environment and an in-browser IDE, you can code, build, test, and run applications exactly as they run in production, from any machine.
    Downloads: 0 This Week
  • 17
    HTTP Client

    Async HTTP/1.1+2 client for PHP based on Amp

    This package provides an asynchronous HTTP client for PHP based on Amp. Its API simplifies standards-compliant HTTP resource traversal and RESTful web service consumption without obscuring the underlying protocol. The library manually implements HTTP over TCP sockets; as such it has no dependency on ext/curl. Streams entity bodies for memory management with large transfers. Supports all standard and custom HTTP method verbs. Simplifies HTTP form submissions. Implements secure-by-default TLS....
    Downloads: 5 This Week
  • 18
    Netlify CMS

    A Git-based CMS for static site generators

    Open source content management for your Git workflow. Use Netlify CMS with any static site generator for a faster and more flexible web project. Get the speed, security, and scalability of a static site, while still providing a convenient editing interface for content. Content is stored in your Git repository alongside your code for easier versioning, multi-channel publishing, and the option to handle content updates directly in Git.
    Downloads: 10 This Week
  • 19
    ffsend

    Easily and securely share files from the command line

    Easily and securely share files and directories from the command line through a safe, private and encrypted link using a single simple command. Files are shared using the Send service and may be up to 1GB. Others are able to download these files with this tool, or through their web browser. All files are always encrypted on the client, and secrets are never shared with the remote host. An optional password may be specified, and a default file lifetime of 1 (up to 20) download or 24 hours is...
    Downloads: 1 This Week
  • 20
    Monolog

    Sends logs to files, sockets, inboxes, databases and web services

    ...As of 1.11.0 Monolog public APIs will also accept PSR-3 log levels. Internally Monolog still uses its own level scheme since it predates PSR-3. Tidelift delivers commercial support and maintenance for the open source dependencies you use to build your applications. Save time, reduce risk, and improve code health, while paying the maintainers of the exact dependencies you use. Monolog 1.x support is somewhat limited at this point and only important fixes will be done. You should migrate to Monolog 2 where possible to benefit from all the features.
    Downloads: 3 This Week
  • 21
    GhostText

    Use your text editor to write in your browser

    Whenever you’re writing more than a little snippet of code anywhere on the web, activate GhostText to open your preferred text editor and enjoy your own development environment. GhostText is a browser extension that connects to your editor via its own extension. Install both extensions and, if necessary, start the GhostText server in the editor’s extension. Most editor extensions are authored by third parties. You can create more extensions for your favorite editor! Refer to the protocol...
    Downloads: 7 This Week
  • 22
    Mezzanine

    CMS framework for Django

    Mezzanine is a powerful open source content management platform built using the Django framework. In many ways it is like many other content management tools, offering an intuitive interface for managing all of your content. But Mezzanine is different in that it provides most of its functionality by default. While other platforms rely heavily on modules or reusable applications, Mezzanine comes ready with all the functionality you need, making it the more efficient choice. Mezzanine has a...
    Downloads: 2 This Week
  • 23
    newspaper4k

    Python library for scraping and analyzing online news articles easily

    Newspaper4k is a Python library designed for extracting, processing, and analyzing news articles from websites. It is a continuation and active fork of the original newspaper3k library, which had stopped receiving updates, with the goal of keeping the ecosystem maintained while adding improvements and bug fixes. It provides developers with tools to automatically download web pages, extract the main article content, and collect associated metadata such as titles, authors, images, and...
    Downloads: 0 This Week
  • 24
    Routr

    The future of programmable SIP servers

    Lightweight SIP proxy, location server, and registrar that provides a reliable and scalable SIP infrastructure for telephony carriers, communication service providers, and integrators. Routr provides all the tools required to deploy your VoIP network, including a command-line for remote server control. It can also be controlled via a RESTful API or a web-based GUI. Routr includes all tools for deploying your VoIP network. It offers remote server control via command-line, RESTful API, or a...
    Downloads: 0 This Week
  • 25
    PHPSocket.IO

    A server side alternative implementation of socket.io

    phpSocket.io is a PHP implementation of the popular Socket.IO real-time communication protocol. It enables real-time, bidirectional communication between web clients and servers using WebSockets, with a syntax and structure similar to the original Node.js version. Built on top of Workerman, phpSocket.io is capable of handling thousands of concurrent connections and is ideal for building chat apps, live notifications, and collaborative tools in PHP.
    Downloads: 0 This Week