Browse free open source JavaScript Web Scrapers and projects below. Use the toggles on the left to filter open source JavaScript Web Scrapers by OS, license, language, programming language, and project status.

  • 1
    ast-hook-forjs-re

    AST-based JavaScript reverse engineering and variable tracing toolkit

    ast-hook-for-js-RE is an open source JavaScript reverse engineering toolkit designed to help analysts locate and understand client-side encryption logic used by web applications. It works by intercepting browser traffic through a local proxy server and modifying JavaScript code before it executes in the browser. Using Abstract Syntax Tree (AST) transformations, it injects hook functions into the code to monitor variable assignments and other runtime changes during execution. This allows ast-hook-for-js-RE to capture variable values in memory and store them in a searchable database, effectively enabling variable-level monitoring of program execution. When a user encounters encrypted parameters in network requests, the captured variable data can be searched to determine where those values originated in the code. Once the relevant variable and code location are identified, analysts can trace backward to extract or reproduce the encryption logic used by the site.
    Downloads: 0 This Week
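
The hook-injection idea described for ast-hook-for-js-RE can be illustrated with a toy example. This is a minimal sketch, not the project's code: it assumes a hand-built ESTree-shaped AST instead of a real parser, and the names `injectHooks`, `generate`, `__hook`, and `findOrigin` are hypothetical. It wraps every variable initializer in a hook call, runs the rewritten code, and then searches the captured values.

```javascript
// Toy ESTree-shaped AST for:  var token = secret + "-abc";
const program = {
  type: "Program",
  body: [{
    type: "VariableDeclaration",
    kind: "var",
    declarations: [{
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "token" },
      init: {
        type: "BinaryExpression",
        operator: "+",
        left: { type: "Identifier", name: "secret" },
        right: { type: "Literal", value: "-abc" },
      },
    }],
  }],
};

// Rewrite each declarator so its initializer becomes __hook("<name>", <init>).
function injectHooks(node) {
  if (node.type === "Program") {
    return { ...node, body: node.body.map(injectHooks) };
  }
  if (node.type === "VariableDeclaration") {
    return { ...node, declarations: node.declarations.map(injectHooks) };
  }
  if (node.type === "VariableDeclarator" && node.init) {
    return {
      ...node,
      init: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "__hook" },
        arguments: [{ type: "Literal", value: node.id.name }, node.init],
      },
    };
  }
  return node;
}

// Minimal code generator covering only the node types used above.
function generate(node) {
  switch (node.type) {
    case "Program":
      return node.body.map(generate).join("\n");
    case "VariableDeclaration":
      return `${node.kind} ${node.declarations.map(generate).join(", ")};`;
    case "VariableDeclarator":
      return node.init
        ? `${generate(node.id)} = ${generate(node.init)}`
        : generate(node.id);
    case "BinaryExpression":
      return `(${generate(node.left)} ${node.operator} ${generate(node.right)})`;
    case "CallExpression":
      return `${generate(node.callee)}(${node.arguments.map(generate).join(", ")})`;
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value);
    default:
      throw new Error(`unsupported node type: ${node.type}`);
  }
}

// The hook records each assignment and passes the value through unchanged.
const captured = [];
function hook(name, value) {
  captured.push({ name, value });
  return value;
}

// Look up which captured variables held a given value.
function findOrigin(needle) {
  return captured
    .filter((entry) => String(entry.value).includes(needle))
    .map((entry) => entry.name);
}

const hookedSource = generate(injectHooks(program));
const run = new Function("__hook", "secret", `${hookedSource}\nreturn token;`);
const result = run(hook, "s3cr3t");
```

Here an in-memory array stands in for the searchable database mentioned in the description. A real tool would parse arbitrary scripts and handle many more node types.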
  • 2
    datalus
    PHP web API designed to simplify object handling (loading, saving, querying, displaying, and editing), abstract data from its display structure and layout, and deliver the target data in any supported format without special logic.
    Downloads: 0 This Week
  • 3
    diskover-community

    Open source file indexing & storage analytics powered by Elasticsearch

    Diskover Community Edition is an open source file system indexing and storage analytics platform designed to help organizations understand and manage large volumes of file data. It crawls file systems and indexes metadata using Elasticsearch, enabling fast search, analysis, and organization of files stored across different storage systems. It allows administrators and users to explore file structures, monitor storage usage, and gain insights into how data is distributed across infrastructure. By indexing file metadata from sources such as local file systems, network shares like NFS and SMB, and cloud storage, the tool provides a centralized way to analyze heterogeneous storage environments. Diskover also helps identify outdated or unused files, duplicate data, and inefficient storage usage that can waste resources or increase operational costs. A Python-based indexing engine performs the scanning and indexing tasks.
    Downloads: 0 This Week
  • 4
    html-metadata

    HTML metadata scraper and parser for Node.js (supports Promises)

    The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. It currently supports Schema.org microdata through a third-party library; native implementations of BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS; and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag or meta description tags). Support for RDFa, AGLS, and other metadata types is planned. Contributions and requests for other metadata types are welcome! You can also pass an options object as the first argument containing extra parameters. Some websites require the user-agent or cookies to be set in order to return a response.
    Downloads: 0 This Week
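
To make "HTML-embedded metadata" concrete, here is a minimal sketch of pulling Open Graph properties out of a page. It does not use html-metadata's API; `extractOpenGraph` is a hypothetical helper, and the regular expressions are a simplification that assumes well-formed, quoted attributes (a real parser would be more robust).

```javascript
// Collect <meta property="og:..." content="..."> pairs from an HTML string.
function extractOpenGraph(html) {
  const result = {};
  const metaTag = /<meta\b[^>]*>/gi;
  const propertyAttr = /\bproperty\s*=\s*(?:"og:([^"]+)"|'og:([^']+)')/i;
  const contentAttr = /\bcontent\s*=\s*(?:"([^"]*)"|'([^']*)')/i;

  for (const [tag] of html.matchAll(metaTag)) {
    const property = propertyAttr.exec(tag);
    const content = contentAttr.exec(tag);
    if (property && content) {
      // Either the double-quoted or the single-quoted group matched.
      const key = property[1] ?? property[2];
      result[key] = content[1] ?? content[2];
    }
  }
  return result;
}

const sampleHtml = `
  <head>
    <title>Example page</title>
    <meta property="og:title" content="Example page">
    <meta property="og:type" content="article">
    <meta name="description" content="Not an Open Graph tag">
  </head>`;

const openGraph = extractOpenGraph(sampleHtml);
console.log(openGraph);
```

Tags without an `og:` property, such as the plain description tag in the sample, are skipped. They would fall under the "general metadata" category mentioned above.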
  • 5
    lightcrawler

    Website crawler that audits site pages automatically with Lighthouse

    Lightcrawler is a command-line tool designed to crawl a website and run automated audits on the discovered pages using Google Lighthouse. It works by starting from a given URL and recursively exploring linked pages to collect a set of pages that should be analyzed. Each discovered page is then evaluated using Lighthouse, which performs checks related to performance, accessibility, and web development best practices. This allows developers to audit multiple pages of a site automatically instead of manually running Lighthouse on each individual page. Lightcrawler supports configuration through a JSON configuration file, enabling users to customize how the crawler operates and which Lighthouse audits should be executed. Settings such as crawl depth and the number of concurrent browser instances can be configured to control how aggressively the crawler scans a site. It was created as a developer utility to help identify issues across an entire website more efficiently.
    Downloads: 0 This Week
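
The crawl step described for Lightcrawler, starting from one URL and following links up to a configured depth, can be sketched as a breadth-first traversal. This is an illustration under stated assumptions, not Lightcrawler's code: `discoverPages`, `getLinks`, and the in-memory `siteLinks` map are hypothetical stand-ins for real page fetching, and the audit step is omitted.

```javascript
// Hypothetical in-memory site map standing in for real HTTP fetches.
const siteLinks = {
  "/": ["/about", "/blog"],
  "/about": ["/"],
  "/blog": ["/blog/post-1"],
  "/blog/post-1": ["/blog/post-2"],
};

function getLinks(url) {
  return siteLinks[url] ?? [];
}

// Breadth-first discovery: follow links level by level, at most maxDepth hops
// away from the start URL, visiting each page only once.
function discoverPages(startUrl, getLinks, maxDepth) {
  const seen = new Set([startUrl]);
  let frontier = [startUrl];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      for (const link of getLinks(url)) {
        if (!seen.has(link)) {
          seen.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return [...seen];
}

// Each discovered page would then be handed to an audit step.
const pages = discoverPages("/", getLinks, 2);
console.log(pages);
```

With a depth of 2, `/blog/post-2` is not reached because it sits three links away from the start page. A concurrency setting would bound how many of the discovered pages are audited at once.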
  • 6
    owllook

    Vertical novel search engine with unified reading and tracking tools

    Owllook is an open source vertical search engine designed for discovering and reading online novels from multiple sources. Instead of redirecting users to different sites, the system parses content from many novel platforms and presents it in a unified reading interface. It focuses on providing a simple and comfortable reading experience with features such as searching for books, following updates, bookmarking chapters, and maintaining a personal bookshelf. It aggregates results from multiple search engines and applies parsing rules to extract novel metadata, chapters, and content in a consistent format. Owllook also includes functionality for tracking reading history, displaying rankings based on search activity, and recommending books using a similarity-based approach. Owllook is built using asynchronous technologies to support efficient data retrieval and responsive interactions while reading or searching.
    Downloads: 0 This Week
  • 7
    pandora-box

    Lightweight cross-platform desktop client for managing Mihomo proxies

    Pandora-Box is a lightweight desktop client designed to provide a graphical interface for the Mihomo proxy core. It allows users to manage proxy configurations and subscriptions through a simple and user-friendly interface rather than working directly with configuration files. Pandora-Box supports multiple proxy protocols and provides tools to organize and control network routing rules. It is designed to work for both casual users who want an easy setup and advanced users who need more control over proxy behavior. It also supports automatic rule grouping and features such as TUN mode to enable system-wide proxy routing. Pandora-Box focuses on delivering a clean interface with practical features for importing, managing, and converting proxy subscriptions. Pandora-Box combines a desktop interface with backend components to create a functional proxy management environment that simplifies complex networking configurations.
    Downloads: 0 This Week
  • 8
    rebrowser-patches

    Patches for Puppeteer and Playwright to reduce automation detection

    rebrowser-patches is an open source collection of patches designed to improve the stealth capabilities of browser automation frameworks. It focuses primarily on enhancing Puppeteer and Playwright by modifying parts of their source code that may reveal automation activity to websites. Many modern websites rely on bot detection mechanisms that identify automation through behavioral or technical signals, and these patches aim to reduce those detection vectors. By applying targeted fixes, the project helps developers minimize automation leaks that are difficult or impossible to address through configuration options alone. The patches can be applied directly to installed libraries or used through drop-in replacements that already contain the modifications. Developers can enable, disable, or remove the patches when needed, making it flexible for experimentation or debugging workflows. rebrowser-patches is maintained with the expectation that automation libraries evolve over time.
    Downloads: 0 This Week
  • 9
    serverless-chrome

    Run headless Chrome/Chromium on AWS Lambda

    Serverless Chrome contains everything you need to get started running headless Chrome on AWS Lambda (possibly Azure and GCP Functions soon). The aim of this project is to provide the scaffolding for using headless Chrome during a serverless function invocation. Serverless Chrome takes care of building and bundling the Chrome binaries and making sure Chrome is running when your serverless function executes. In addition, this project provides a few example services for common patterns (e.g. taking a screenshot of a page, printing to PDF, some scraping, etc.). Why? Because it's neat. It also opens up interesting possibilities for using the Chrome DevTools Protocol (and tools like Chromeless or Puppeteer) in serverless architectures for testing/CI, web scraping, pre-rendering, etc. You must configure your AWS credentials either by defining the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables or by using an AWS profile.
    Downloads: 0 This Week
  • 10
    single-file-cli

    CLI tool to save complete web pages as a single self-contained HTML file

    SingleFile CLI is an open source command-line tool designed to save complete web pages as a single self-contained HTML file. It captures the rendered page in a headless browser and embeds all required resources directly into the output document, including stylesheets, scripts, images, and fonts. By consolidating every dependency into one file, it allows users to preserve a faithful copy of a web page that can be viewed offline without requiring external assets. SingleFile CLI works by controlling a browser through the Chrome DevTools Protocol, rendering the page before extracting and packaging all necessary resources. This approach helps ensure that the saved page closely matches the original appearance and functionality. SingleFile CLI can be used for automated archiving, research, documentation, or offline reading workflows where preserving a page exactly as displayed is important.
    Downloads: 0 This Week
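
Embedding resources directly into the output document typically relies on `data:` URIs. The following is a minimal Node.js sketch of that idea for images only. It is not SingleFile CLI's implementation; `inlineImages` and the `resources` map are hypothetical, and the regular expression assumes double-quoted `src` attributes.

```javascript
// Replace <img src="..."> references with base64 data: URIs, using a map of
// already-downloaded resources keyed by their original URL.
function inlineImages(html, resources) {
  const imgSrc = /(<img\b[^>]*?\bsrc=")([^"]+)(")/gi;
  return html.replace(imgSrc, (match, before, src, after) => {
    const resource = resources[src];
    if (!resource) return match; // leave unknown references untouched
    const base64 = Buffer.from(resource.data).toString("base64");
    return `${before}data:${resource.mimeType};base64,${base64}${after}`;
  });
}

// Hypothetical resource map: three placeholder bytes stand in for image data.
const resources = {
  "logo.png": { mimeType: "image/png", data: Uint8Array.from([1, 2, 3]) },
};

const page = '<p><img alt="logo" src="logo.png"><img src="missing.png"></p>';
const selfContained = inlineImages(page, resources);
console.log(selfContained);
```

The same substitution applies in principle to stylesheets, scripts, and fonts, each with its own MIME type.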
  • 11
    studiMaps is a web-based application for visualizing and analyzing social networks. It consists of two software components: a web crawler that collects the data and a web-based application that visualizes it.
    Downloads: 0 This Week