Showing 30 open source projects for "data scraper website"

  • 1
    LLM Scraper

    Extract structured data from webpages using LLM-powered scraping

    LLM Scraper is a TypeScript library designed to extract structured data from webpages using large language models. Instead of relying on fragile HTML selectors or manual parsing rules, the tool interprets webpage content with language models and converts it into structured data according to a defined schema. Developers can specify the data structure using tools such as Zod or JSON Schema, enabling the model to extract relevant information directly into typed objects. ...
    Downloads: 6 This Week
  • 2
    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. ...
    Downloads: 7 This Week
  • 3
    Firecrawl

    Turn entire websites into LLM-ready markdown or structured data

    Crawl and convert any website into LLM-ready markdown or structured data. Built by Mendable.ai and the Firecrawl community. Includes powerful scraping, crawling, and data extraction capabilities. Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each.
    Downloads: 23 This Week
  • 4
    MDCx

    Movie metadata scraper and organizer for media libraries and NFO

    MDCx is an open source media metadata scraping and organization tool designed to automate the process of collecting detailed information for movie files. It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports...
    Downloads: 12 This Week
  • 5
    Webstudio

    Open source website builder and Webflow alternative

    Webstudio is an open source visual development platform that enables developers, designers, and cross-functional teams to build modern websites through a powerful visual builder while maintaining full ownership of their data and infrastructure. The project positions itself as a Webflow alternative but emphasizes openness, portability, and deep control over the generated frontend code. It connects to any headless CMS and exposes the full power of CSS within a visual interface, allowing users...
    Downloads: 16 This Week
  • 6
    Logseq

    A privacy-first, open-source platform for knowledge management

    ...Logseq is a platform for knowledge management and collaboration. It focuses on privacy, longevity, and user control. The server will never store or analyze your private notes. Your data are plain text files and we currently support both Markdown and Emacs Org-mode (more to be added soon). In the unlikely event that the website is down or cannot be maintained, your data is, and will always be yours. No data lock-in, no proprietary formats, you can edit the same Markdown/Org-mode file with any tools at the same time. ...
    Downloads: 26 This Week
  • 7
    Astro

    The web framework for content-driven websites

    Astro powers the world's fastest marketing sites, blogs, e-commerce websites, and more. Astro improves website performance by rendering components on the server, sending lightweight HTML to the browser with zero unnecessary JavaScript overhead. Astro was designed to work with your content, no matter where it lives. Load data from your file system, external API, or your favorite CMS. Extend Astro with your favorite tools. Bring your own JavaScript UI components, CSS libraries, themes, integrations, and more. ...
    Downloads: 11 This Week
  • 8
    Coral

    A better commenting experience from Vox Media

    ...Coral increases loyalty and engagement on your website, giving you complete control of the interactions and data, without inserting any ads or trackers on your page. And it works great on mobile. Developers love working with Coral. You can easily connect it to your existing registration system, fully customize the look and feel, and extend the platform with our GraphQL API and by adding features to the open source codebase.
    Downloads: 6 This Week
  • 9
    Checkmate

    Checkmate is an open-source, self-hosted tool

    Checkmate is an open-source, self-hosted infrastructure monitoring platform that provides real-time visibility into server health, uptime, response times, and incident activity through a modern web interface. The application continuously checks whether websites and services are accessible and performing optimally, generating alerts and reports when availability or performance degrades. It supports detailed infrastructure monitoring through an optional agent called Capture, which collects...
    Downloads: 4 This Week
  • 10
    JupyterLite

    Wasm powered Jupyter running in the browser

    ...Built using JupyterLab components and powered by WebAssembly technologies, it allows users to run Python and other language kernels directly in the browser through tools like Pyodide or Xeus. This architecture eliminates the need for installation or server infrastructure, making it highly accessible for education, demonstrations, and lightweight data science workflows. JupyterLite supports many core Jupyter features, including notebooks, code consoles, and interactive visualizations, while storing files locally using browser storage mechanisms such as IndexedDB. It is designed to be easily deployable as a static website, enabling developers to host fully functional notebook environments on platforms like GitHub Pages.
    Downloads: 9 This Week
  • 11
    Surmon.me

    Personal website and blog

    Surmon.me is a full-featured personal website and blog platform built with Vue and designed as part of a larger ecosystem of interconnected applications and services. The project functions as a server-side rendered (SSR) web application that delivers content dynamically while maintaining performance and SEO optimization. It is powered by a dedicated backend service called NodePress, which provides RESTful APIs for content management, data retrieval, and system operations. ...
    Downloads: 7 This Week
  • 12
    Suna

    Suna - Open Source Generalist AI Agent

    ...Designed to assist users in accomplishing real-world tasks through natural conversation, Suna combines powerful capabilities with an intuitive interface. It serves as a digital companion for research, data analysis, and everyday challenges, integrating tools like browser automation, file management, web crawling, command-line execution, website deployment, and API integration. Suna's architecture comprises a FastAPI-based backend, a Next.js/React frontend, an agent Docker environment, and a Supabase database for state management. This modular design allows for seamless interaction and task execution through simple conversations.
    Downloads: 11 This Week
  • 13
    pwa-asset-generator

    Automates PWA asset generation and image declaration

    Automatically generates icon and splash screen images, favicons and mstile images. Updates manifest.json and index.html files with the generated images according to Web App Manifest specs and Apple Human Interface guidelines. When you build a PWA with the goal of providing native-like experiences on multiple platforms and stores, your PWA assets need to meet the criteria of those platforms and stores; icon sizes and splash...
    Downloads: 10 This Week
  • 14
    react-map-gl

    React friendly API wrapper around MapboxGL JS

    react-map-gl is a suite of React components designed to provide a React API for Mapbox GL JS-compatible libraries. More information is in the online documentation. Starting with v2.0, mapbox-gl requires a Mapbox token for any usage, with or without the Mapbox data service. See about Mapbox tokens for your options. To show maps from a service such as Mapbox you will need to register on their website in order to retrieve an access token required by the map component, which will be used to identify you and start serving up map tiles. The service will be free until a certain level of traffic is exceeded. ...
    Downloads: 8 This Week
  • 15
    Payload

    Free and Open-source Headless CMS and Application Framework

    Payload has, hands-down, the best developer experience out of any headless CMS. Build whatever you need, however you want, and never hit a functionality roadblock. Payload is the go-to headless CMS for websites, SaaS apps, native apps, and anything else you need to build. Power any website, from enterprise to personal portfolio with Payload as a headless CMS. Its powerful version system and layout-building functionality unlocks the best CMS experience for your editors on the market. Payload...
    Downloads: 18 This Week
  • 16
    GPT Crawler

    Crawl a site to generate knowledge files to create your own custom GPT

    GPT Crawler is an open-source tool designed to automatically crawl websites and generate structured knowledge that can be used to build AI assistants and retrieval systems. It focuses on extracting high-quality textual content from web pages and preparing it in formats suitable for embedding, indexing, or fine-tuning workflows. The project is especially useful for teams that want to turn documentation sites or knowledge bases into conversational AI backends without building custom scrapers...
    Downloads: 6 This Week
  • 17
    Rocket.Chat Mobile

    Rocket.Chat mobile clients

    ...Engage in contextual interactions with customers irrespective of how they contact you. Ensure long-term relationships and improved business outcomes. Create custom messaging experiences within your app or website by integrating and white labeling Rocket.Chat components and enterprise features. Extend and customize your workspace with custom apps, open APIs, powerful plugins and webhooks.
    Downloads: 3 This Week
  • 18
    barba.js

    Create badass, fluid and smooth transitions between website’s pages

    Barba.js (aka Barba) is a small (7kb minified and compressed), easy-to-use library that helps you create fluid and smooth transitions between your website's pages. It makes your website run like a SPA (Single Page Application) and helps reduce the delay between your pages, minimize browser HTTP requests and enhance your user's web experience.
    Downloads: 1 This Week
  • 19
    SurveyJS

    JavaScript Survey and Form Library

    SurveyJS Form Library is distributed as npm packages and as scripts and style sheets that you can reference on your page. You can use it in any React, Angular, Vue, Knockout, or jQuery application. React, Angular, Knockout, and Vue3 are supported natively. To communicate with the server, the libraries use JSON objects that represent form schemas (content and layout of a form) and form results (answers). You have the option to build dynamic JSON-driven forms using our free full-featured...
    Downloads: 10 This Week
  • 20
    Vue Styleguidist

    Created from react styleguidist for Vue Components with a guide

    ...Focus on one component at a time, see all its variants and work faster with hot reload. Share components with your team, including designers and developers. See how components respond to different props and data right in the browser. vue-styleguidist takes the results of vue-docgen-api and creates a website to showcase and develop components. vue-docgen-api parses Vue components and loads their documentation into a JavaScript object. vue-inbrowser-compiler takes Vue component code written in ES6 and uses buble to make it compatible with all browsers. vue-docgen-cli is a command line interface that generates documentation files automatically from vue-docgen-api. ...
    Downloads: 0 This Week
  • 21
    Reader LLM

    Convert any URL to an LLM-friendly input with a simple prefix

    Reader LLM is an open-source tool designed to convert web content into formats that are easier for large language models to process. The system works by transforming a webpage into a clean text or Markdown representation that removes unnecessary formatting and highlights the core information within the page. Developers can use a simple URL prefix to retrieve a version of a webpage that has been optimized for machine consumption, making it suitable for use in AI agents or retrieval-augmented...
    Downloads: 0 This Week
  • 22
    resumake.io

    A website for automatically generating elegant LaTeX resumes

    An open‑source web application (built with Node.js, Koa, React/Redux) that lets users create elegant LaTeX resumes via a graphical interface—no manual LaTeX coding required. Templates are selectable, inputs are interactive, and PDF outputs are generated on‑the‑fly without storing user data.
    Downloads: 1 This Week
  • 23
    NSFW Filter

    Google Chrome extension that blocks NSFW images

    A Google Chrome extension that uses TensorFlow JS to block NSFW images on the web pages that you load. NSFW Filter allows you to block inappropriate, Not-Safe-For-Work content, protecting you online. When a web page is loaded, all the images remain hidden until they have been classified as NSFW or not. If they are found to be NSFW, they remain...
    Downloads: 1 This Week
  • 24
    LoopGate

    A Proof-of-Concept to token-gate content using Loopring L2 NFTs

    LoopGate is a web app that allows creators to token-gate content based on Loopring Layer-2 NFTs. It uses the Loopring API alongside the Piñata Submarine API to unlock hidden content hosted on IPFS. LoopGate is built in TypeScript using NextJS and TailwindCSS, and it relies on external SDKs/APIs. Most importantly: the Loopring API, which queries the Loopring blockchain for NFT ownership data; the Piñata API, which queries and unlocks submarined content on Piñata; and ConnectKit, which provides a...
    Downloads: 0 This Week
  • 25
    Supercookie

    Browser fingerprinting via favicon!

    Supercookie uses favicons to assign a unique identifier to website visitors. Unlike traditional tracking methods, this ID can be stored almost persistently and cannot be easily cleared by the user. The tracking method works even in the browser's incognito mode and is not cleared by flushing the cache, closing the browser or restarting the operating system, using a VPN or installing AdBlockers. The demo of "supercookie", as well as the publication of the source code of this repository, is...
    Downloads: 0 This Week