Showing 68 open source projects for "web scraper extractor"

View related business solutions
  • Powerful App Monitoring Without Surprise Bills Icon
    Powerful App Monitoring Without Surprise Bills

    AppSignal starts at $23/month with all features included. No overages, no hidden fees. 30-day free trial.

    Tired of monitoring tools that punish you for scaling? AppSignal offers transparent, predictable pricing with every feature unlocked on every plan. Track errors, monitor performance, detect anomalies, and manage logs across Ruby, Python, Node.js, and more. Trusted by developers since 2012 with free dev-to-dev support. No credit card required to start your 30-day trial.
    Try AppSignal Free
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 1
    Article Extractor

    Article Extractor

    To extract main article from given URL with Node.js

    A Node.js library for extracting main content from web articles, removing unnecessary clutter like ads and navigation elements.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 2
    shot-scraper

    shot-scraper

    A command-line utility for taking automated screenshots of websites

    shot-scraper is a command-line utility for taking automated screenshots of web pages using a headless browser engine. After installation, a single command can capture a full-page screenshot of a URL and save it to a file, making it ideal for documentation, monitoring, and visual regression tasks. Under the hood it uses a modern browser (installed via a one-time shot-scraper install step) and exposes options for viewport size, full-page versus clipped screenshots, and device emulation. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Google Maps Extractor

    Google Maps Extractor

    Free Google Map Extractor(With Email) | Google Maps Scraper

    A free Google Map extractor for business leadsโ€”fast & efficient! This Google Maps scraper extracts phone numbers, emails, locations, and social media profiles, then exports to CSV. Visit: https://gmplus.io/
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    dude uncomplicated data extraction

    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, was to easily build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your python scripts to dude scrape command.
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhereโ€”across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    CyberScraper 2077

    CyberScraper 2077

    A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

    CyberScraper 2077 is not just another web scraping tool โ€“ it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Scraper of Death
    Scraper of Death is a web scraper. Multiple Scraping Methods Requests + BeautifulSoup (fast, lightweight) Selenium (JavaScript support, dynamic content)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    JobFunnel

    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    Scrape job websites into a single spreadsheet with no duplicates. Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If you...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    html-metadata

    html-metadata

    MetaData html scraper and parser for Node.js (supports Promises

    The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently, it supports Schema.org microdata using a third-party library, a native BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS implementation, and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags). Planned is support for RDFa, AGLS, and other yet unheard-of...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    ScrapeGraphAI

    ScrapeGraphAI

    Python scraper based on AI

    Extracting content from websites and local documents using LLM. ScrapeGraphAI is a web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 10 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 10
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    In Files there is WebCrawlerMySQL.jar which supports MySql Connection Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby and MySQL Database - Written in Java Cross Platform Also See Free email Sender : https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 9 This Week
    Last Update:
    See Project
  • 11
    Ulixee Hero

    Ulixee Hero

    The web browser built for scraping

    It's the first modern headless browsers designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS allowing you bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning fast rendering. Emulators make it easy to disguise...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 12

    Media Extractor

    Google Chrome extension designed to intercept and download media files

    # Media Extractor Media Extractor is a Google Chrome extension designed to intercept and download media files directly from websites. The extension monitors network requests in the browser and allows users to download detected media files such as video and audio streams. ## Key Features - Intercepts media files from web pages - Supports video and audio downloads - Works directly inside Google Chrome - Simple and intuitive interface - No external tools required ## Use Cases - Downloading embedded video content - Saving audio streams from websites - Analyzing media network requests - Offline media access ## Download - Chrome Web Store (if published): https://github.com/exxellengames/Media-Extractor/releases ## Official Website - EN: https://exxellengames.great-site.net/en/ - RU: https://exxellengames.great-site.net/ru/ ## Author exxellengames
    Downloads: 8 This Week
    Last Update:
    See Project
  • 13
    Crawl4AI

    Crawl4AI

    Open-source LLM Friendly Web Crawler & Scraper

    Crawl4AI is a high-performance, AIโ€‘ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Spider

    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15
    MDCx

    MDCx

    Movie metadata scraper and organizer for media libraries and NFO

    MDCx is an open source media metadata scraping and organization tool designed to automate the process of collecting detailed information for movie files. It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 16
    crwlr

    crwlr

    Library for Rapid (Web) Crawler and Scraper Development

    This library provides kind of a framework and a lot of ready-to-use, so-called steps, that you can use as building blocks, to build your own crawlers and scrapers with. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, those two things go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in it to load them as well. A crawler...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17
    Trafilatura

    Trafilatura

    Python & command-line tool to gather text on the Web

    ...The extractor tries to strike a balance between limiting noise (precision) and including all valid parts (recall). It also has to be robust and reasonably fast, it runs in production on millions of documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Snap Lens File Extractor

    Snap Lens File Extractor

    Online file extractor for the Snapcha lens file format

    Web Browser based JavaScript Online File Extractor, Parser, Unpacker and Zip File Converter. Reads and unpacks the Snap Camera / Snapchat Lens File Format (lens.lns / *.lns) Snap Lens Tool https://snap-lens-tool.sourceforge.io Snap Lens File Format https://snap-lens-file-format.sourceforge.io
    Downloads: 3 This Week
    Last Update:
    See Project
  • 19
    ai-scrapper
    ๐Ÿš€ Discover AI Web Scraper! ๐Ÿš€ Tired of copying and pasting data from websites? I developed a desktop application with Electron and Gemini AI to extract structured data easily and efficiently! ๐Ÿค–โœจ
    Downloads: 8 This Week
    Last Update:
    See Project
  • 20

    Media Dock

    Extract and download media from any website with ease and speed.

    ...It operates securely without tracking your activity and ensures smooth performance without slowing down your browsing experience. Perfect for students, content creators, and professionals, Media Extractor empowers users to access web media effortlessly.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    linkedin2username

    linkedin2username

    Generate probable usernames from LinkedIn company employee lists

    ...This process helps security researchers, penetration testers, and investigators perform reconnaissance by building potential username lists for further security testing or OSINT analysis. Unlike tools that rely on official APIs, linkedin2username operates as a pure web scraper and therefore does not require API keys. The script uses Selenium to automate browser interactions and perform searches within LinkedIn to gather employee data.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 22
    FungiRegEx

    FungiRegEx

    FungiRegEx

    This tool is a web-based search engine for regular expressions in the proteomes, all the information is obtained from the JGI (Joint Genome Institute) database through a scraper for all the available species; therefore this tool only considers fungi organisms. In this version, we use React JS in front-end and NodeJS + Express for back-end. Full Documentation Available on: https://victormiguelterronmacias.slite.page/p/J7BJU3hXhd72EJ/FungiRegEx-Software-documentation If you want to buy me a coffee: https://www.paypal.com/donate/?...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Goutte

    Goutte

    Goutte, a simple PHP Web Scraper

    Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method. The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler). To use your own HTTP settings, you may...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    In Files there is WebCrawlerMySQL.jar which supports MySql Connection Please follow this link to get latest version https://sourceforge.net/projects/web-spider-web-crawler-extract/ Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby OR MySQL Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby Database - Written in Java Cross Platform See also Free Email Sender in this link: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 7 This Week
    Last Update:
    See Project
  • 25
    AutoScraper

    AutoScraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    This project is made for automatic web scraping to make scraping easy. It gets a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, URL or any HTML tag value of that page. It learns the scraping rules and returns similar elements. Then you can use this learned object with new URLs to get similar content or the exact same element of those new pages.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB