89 projects for "data scraper website" with 2 filters applied:

  • 1
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. Its design, inspired by Flask, aims to make it easy to build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha, so expect breaking changes. You can run your scraper from the terminal by supplying URLs, the output filename of your choice, and the paths to your Python scripts to the dude scrape command.
    Downloads: 0 This Week
  • 2
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...
    Downloads: 6 This Week
  • 3
    Matomo

    Alternative to Google Analytics that gives you full control over data

    Google Analytics alternative that protects your data and your customers' privacy. Take back control with Matomo – a powerful web analytics platform that gives you 100% data ownership. You could lose your customers’ trust and risk damaging your reputation if people learn their data is used for Google’s “own purposes”. By choosing the ethical alternative, Matomo, you won’t make privacy sacrifices or compromise your site.
    Downloads: 6 This Week
  • 4
    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. ...
    Downloads: 3 This Week
  • 5
    Laravel Sharp

    Laravel 10+ Content management framework

    Sharp is a content management framework, a toolset that provides help to build a CMS section in a website, with some rules in mind. The public website should not have any knowledge of the CMS, the CMS is a part of the system, not the center of it. In fact, removing the CMS should not have any effect on the project. Content administrators should work with their data and terminology, not CMS terms. I mean, if the project is about spaceships, space travels, and pilots, why would the CMS talk about articles, categories, and tags? ...
    Downloads: 7 This Week
  • 6
    MDCx

    Movie metadata scraper and organizer for media libraries and NFO

    MDCx is an open source media metadata scraping and organization tool designed to automate the process of collecting detailed information for movie files. It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports...
    Downloads: 8 This Week
  • 7
    SkyCrypt

    A Hypixel skyblock stats website

    SkyCrypt is a web-based application that allows players of Hypixel SkyBlock to view and share detailed information about their in-game profiles through a visually rich interface. It aggregates data from the Hypixel API and presents it in an organized format, including player statistics, skills, equipment, and inventory details. The project is built with a Node.js-based stack and integrates additional technologies such as MongoDB and Redis to handle data storage and caching. SkyCrypt enhances...
    Downloads: 8 This Week
  • 8
    ZetaJS

    JS wrapper for ZetaOffice in the browser

    The zeta.js library provides the facilities to run an instance of ZetaOffice integrated into your web site, allowing you to control it with JavaScript code via the LibreOffice UNO technology. Use cases range from an in-browser office suite that looks and feels just like its desktop counterpart, to fine-tuned custom text editing and spreadsheet capabilities embedded in your website, to a headless zetajs instance that does document conversion in the background.
    Downloads: 5 This Week
  • 9
    Surmon.me

    Personal website and blog

    Surmon.me is a full-featured personal website and blog platform built with Vue and designed as part of a larger ecosystem of interconnected applications and services. The project functions as a server-side rendered (SSR) web application that delivers content dynamically while maintaining performance and SEO optimization. It is powered by a dedicated backend service called NodePress, which provides RESTful APIs for content management, data retrieval, and system operations. ...
    Downloads: 7 This Week
  • 10
    Scrapling

    An adaptive Web Scraping framework

    Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling,...
    Downloads: 2 This Week
  • 11
    Hibernate

    An object relational-mapping (ORM) library for Java

    The Hibernate projects offer a suite of powerful Java libraries to work with data. It is best known for Hibernate ORM, which provides relational persistence for Java models and is an implementation of the Jakarta Persistence specification. Hibernate projects do not consistently release binaries or documentation to SourceForge anymore. For up-to-date information, refer to the Hibernate website: * Hibernate ORM: https://hibernate.org/orm/ * Hibernate Validator: https://hibernate.org/validator/ * Hibernate Search: https://hibernate.org/search/ That website will also be updated with newer projects, such as Hibernate Reactive.
    Leader badge
    Downloads: 3,148 This Week
  • 12

    Pimped Apache Server Status

    Enhanced Apache Server Status page - for one or multiple servers

    The pimped Apache status makes the Apache server status readable, sortable and searchable. It can merge the status of several servers, which makes it possible to identify the troublemaker even on a load-balanced website. The web-based tool offers a multilanguage, skinnable interface with a built-in updater. Several views show the most requested pages, vhosts, methods used, the IPs that make the most requests, and more. All views are sortable tables that you can filter by keyword, and they are also available as API requests that return the data as CSV, XML or JSON. ...
    Leader badge
    Downloads: 58 This Week
  • 13
    FullSync

    Easy file synchronization for everyone

    FullSync is a powerful tool that helps you keep multiple copies of various data in sync. For example, it can update your website using (S)FTP, back up your data, or refresh a working copy from a remote server. It offers flexible rules, a scheduler, and more.
    Downloads: 8 This Week
  • 14
    dirhunt

    Web crawler that finds hidden web directories without brute force

    Dirhunt is an open source security tool designed to discover web directories and analyze website structures without relying on brute-force techniques. Instead of sending large numbers of guess-based requests, it operates as a specialized crawler that intelligently explores websites to identify accessible or hidden directories. Dirhunt can detect directories that expose “Index Of” listings, which may reveal files and other resources that were not intended to be publicly visible. It can also...
    Downloads: 5 This Week
  • 15
    MediaWiki Community Edition For Intranet

    The Free & Popular MediaWiki Web Software in Complete Virtual Machine

    ...If you are new to virtual machines, please watch the video below (taken from my other project; just replace td with mw wherever mentioned). After starting this VM, log in to its administration panel from any PC on your local network with: Website Address: https://mw.local/ (accept any warnings caused by the self-signed HTTPS certificate) Admin Username: admin Admin Password: change_this. Explore all the left-side menu options and the user guide before entering any pages. Some demo page data is also loaded for reference; feel free to clear it as required. ...
    Downloads: 1 This Week
  • 16
    mlscraper

    ML-based HTML scraper that learns extraction rules from examples

    ...Once trained, the generated scraper can process new pages and return the extracted data in structured formats such as dictionaries or lists. This approach simplifies web scraping tasks by shifting the focus from rule-writing to example-based training. Internally, the project processes HTML documents, identifies relevant elements in the DOM, and builds extraction logic based on statistical or heuristic analysis of the training samples.
    Downloads: 1 This Week
  • 17
    DecryptLogin

    Python library providing APIs for automated website login workflows

    DecryptLogin is a Python library designed to simplify automated login processes for many popular websites by providing ready-to-use APIs that simulate authentication behavior. It focuses on implementing login mechanisms through HTTP requests, allowing developers to programmatically authenticate with supported services without manually replicating complex login flows. It includes modules that handle different authentication modes such as PC login, mobile login, and QR code login depending on...
    Downloads: 0 This Week
  • 18
    DropYet

    It's the simple cloud.

    With DropYet you can manage your personal files in the easiest way. And it's not that bad looking. :) Enjoy the security, simplicity and beauty of DropYet. Upload, rename and delete files. The simplest manager of files and folders. There are also features like secure file sharing, password encryption and encrypted file detection. Setting it up is easier than ever before. Just try DropYet. Currently working on MORE features for you. Now with Dark...
    Downloads: 6 This Week
  • 19
    ECommerceCrawlers

    Collection of Python ecommerce and website crawler examples projects

    ECommerceCrawlers is a collection of practical Python web crawler projects designed to gather data from a variety of ecommerce platforms, websites, and online services. It aggregates many independent crawler examples created by contributors and organized into separate subprojects that target specific sites or data sources. These examples demonstrate how to build and operate web scrapers capable of collecting structured information such as product listings, news content, job postings, social...
    Downloads: 7 This Week
  • 20
    mzitu

    Python crawler that downloads image galleries and analyzes titles

    ...Using text segmentation and frequency analysis, the project can create a word cloud representing common keywords found in the dataset. This makes the repository both a scraping example and a small data analysis experiment built around the collected content. Overall, mzitu serves as a learning-oriented implementation of Python web scraping, data processing, and visualization techniques.
    Downloads: 3 This Week
  • 21
    CrazyStat

    Free PHP web analytics script

    CrazyStat is a web analytics script written in PHP. It does not need access to server log files or a MySQL database to generate statistics about your website visitors. The script has very good usability and still has lots of features. The stats need only one screen length to present all the information. It's fast because of its caching technology and uses minimal webspace by compressing log files. It's free and released under the GPL as open source software. With CrazyStat, it is easy to respect...
    Downloads: 3 This Week
  • 22
    Zenario

    One of the world's leading multilingual website platforms

    View the Demo - http://zenar.io/demo Zenario is a web-based content management system. It can be used for simple sites, with many "wysiwyg" features, but is really designed to run extranet sites, such as customer portals. It also has multi-lingual features built in from the core.
    Downloads: 1 This Week
  • 23

    ScraperEdit for XBMC

    XML bindings and a GUI for creating and editing XBMC Scrapers

    This program is an editor for creating XBMC Scrapers. It is similar to ScraperEditor, another editor using ScraperXML, which runs under the .NET environment. This program runs under Sun/Oracle's Java Runtime. HELP WANTED! I am looking for someone who would help me write documentation, such as a user's manual and online help. Also, if someone wants to help, translated language files are always welcome...
    Downloads: 0 This Week
  • 24
    Wap Auto Index Advance

    Auto Index wap is Advance of Download Portal (Multi Language)

    Djamolwap 13v - Advance Auto Index With Web Admin Panel + Multi Language + Themes. New Updates: - Multi Language Website 1) English 2) Urdu 3) Gujarati 4) Russian - User/Visitor manual change language website - Multi Language Plugin On/Off - Added Function in Admin Panel - Automatic All Mp3 Tag Setting Added. Official Website: http://ai.djamol.com Demo...
    Downloads: 0 This Week
  • 25
    dbierekTiledSite

    A tile based website

    This is a templated HTML5 website using AJAX, PHP, CSS, and jQuery. See it in action: http://wicked-lightning-34-144532.usw1-2.nitrousbox.com/
    Downloads: 0 This Week