Showing 218 open source projects for "data"

View related business solutions
  • $300 in Free Credit for Your Google Cloud Projects Icon
    $300 in Free Credit for Your Google Cloud Projects

    Build, test, and explore on Google Cloud with $300 in free credit. No hidden charges. No surprise bills.

    Launch your next project with $300 in free Google Cloud credit—no hidden charges. Test, build, and deploy without risk. Use your credit across the Google Cloud platform to find what works best for your needs. After your credits are used, continue building with free monthly usage products. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • Managed MySQL, PostgreSQL, and SQL Databases on Google Cloud Icon
    Managed MySQL, PostgreSQL, and SQL Databases on Google Cloud

    Get back to your application and leave the database to us. Cloud SQL automatically handles backups, replication, and scaling.

    Cloud SQL is a fully managed relational database for MySQL, PostgreSQL, and SQL Server. We handle patching, backups, replication, encryption, and failover—so you can focus on your app. Migrate from on-prem or other clouds with free Database Migration Service. IDC found customers achieved 246% ROI. New customers get $300 in credits plus a 30-day free trial.
    Try Cloud SQL Free
  • 1
    AWS Data Wrangler

    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    An AWS Professional Service open-source python initiative that extends the power of Pandas library to AWS connecting DataFrames and AWS data-related services. Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON, and EXCEL). Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions to execute usual ETL tasks like load/unload data from Data Lakes, Data Warehouses, and Databases. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    dude uncomplicated data extraction

    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, was to easily build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your python scripts to dude scrape command.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    theHarvester

    theHarvester

    E-mails, subdomains and names

    ...Use it for open source intelligence (OSINT) gathering to help determine a company's external threat landscape on the internet. The tool gathers emails, names, subdomains, IPs and URLs using multiple public data sources.
    Downloads: 79 This Week
    Last Update:
    See Project
  • 4
    Mihomo

    Mihomo

    A simple Python Pydantic model for Honkai

    Mihomo is a Python client library leveraging Pydantic to model parsed Honkai: Star Rail user data from the Mihomo public API. It provides structured types, type hints, and convenience methods to fetch and transform player profiles, daily stats, and character details efficiently.
    Downloads: 71 This Week
    Last Update:
    See Project
  • Run Any Workload on Compute Engine VMs Icon
    Run Any Workload on Compute Engine VMs

    From dev environments to AI training, choose preset or custom VMs with 1–96 vCPUs and industry-leading 99.95% uptime SLA.

    Compute Engine delivers high-performance virtual machines for web apps, databases, containers, and AI workloads. Choose from general-purpose, compute-optimized, or GPU/TPU-accelerated machine types—or build custom VMs to match your exact specs. With live migration and automatic failover, your workloads stay online. New customers get $300 in free credits.
    Try Compute Engine
  • 5
    Scrapy

    Scrapy

    A fast, high-level web crawling and web scraping framework

    ...It can be used for data mining, monitoring and automated testing.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 6
    Helium Browser

    Helium Browser

    Private, fast, and honest web browser

    ...Helium blocks ads and trackers by default through an integrated, unbiased uBlock Origin extension prepackaged as a native browser component. Its UI and feature set emphasize minimalism, no “smart” recommendations, account sync, or background data collection, resulting in a distraction-free browsing experience that respects user autonomy. The browser is available across macOS, Linux, and Windows, each version built from a fully open source pipeline for reproducibility and trust. Development focuses on maintaining compatibility with modern web standards while decoupling Chromium from its Google dependencies and services.
    Downloads: 141 This Week
    Last Update:
    See Project
  • 7
    Snoop Project

    Snoop Project

    This is the most powerful software taking into account CIS location

    ...Snoop is a research work (own database / closed bugbounty) in the field of searching and processing public data on the Internet. In terms of specialized search, Snoop is able to compete with traditional search engines.
    Downloads: 31 This Week
    Last Update:
    See Project
  • 8
    CyberScraper 2077

    CyberScraper 2077

    A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

    CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 9
    PostHog

    PostHog

    PostHog provides open-source web & product analytics

    PostHog is an all‑in‑one open‑source platform for product and web analytics—offering event-based analytics, session recording, feature flagging, A/B testing, cohorts, and more—that you can self‑host, with full support for data privacy and enterprise compliance. Sync data from external tools like Stripe, Hubspot, your data warehouse, and more. Query it alongside your product data. Run custom filters and transformations on your incoming data. Send it to 25+ tools or any webhook in real time or batch export large amounts to your warehouse. Capture traces, generations, latency, and cost for your LLM-powered app.
    Downloads: 5 This Week
    Last Update:
    See Project
  • Ship AI Apps Faster with Vertex AI Icon
    Ship AI Apps Faster with Vertex AI

    Go from idea to deployed AI app without managing infrastructure. Vertex AI offers one platform for the entire AI development lifecycle.

    Ship AI apps and features faster with Vertex AI—your end-to-end AI platform. Access Gemini 3 and 200+ foundation models, fine-tune for your needs, and deploy with enterprise-grade MLOps. Build chatbots, agents, or custom models. New customers get $300 in free credit.
    Try Vertex AI Free
  • 10
    AIOHTTP

    AIOHTTP

    Asynchronous HTTP client/server framework for asyncio and Python

    ...Now it is possible by registering special signal handlers on every request processing stage. The main change is dropping yield from support and using async/await everywhere. Farewell, Python 3.4. You often want to send some sort of data in the URL’s query string. If you were constructing the URL by hand, this data would be given as key/value pairs in the URL after a question mark, e.g. httpbin.org/get?key=val. Requests allows you to provide these arguments as a dict, using the params keyword argument. aiohttp internally performs URL canonicalization before sending request.
    Downloads: 45 This Week
    Last Update:
    See Project
  • 11
    hosts

    hosts

    Consolidate and extend hosts files from several well-curated sources

    ...Currently, we offer the following categories: fakenews, social, gambling, and porn. Extensions are optional, and can be combined in various ways with the base hosts file. The combined products are stored in the alternates folder. Data for extensions are stored in the extensions folder. You manage extensions by curating this folder tree, where you will find the data for fakenews, social, gambling, and porn extension data that we maintain and provide for you. Create an optional blacklist file. The contents of this file (containing a listing of additional domains in hosts file format) are appended to the unified hosts file during the update process. ...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 12
    openvpn-monitor

    openvpn-monitor

    openvpn-monitor is a web based OpenVPN monitor

    ...It typically runs on the same host as the OpenVPN server, however, it does not necessarily need to. OpenVPN-monitor is a web-based OpenVPN monitor, that shows current connection information, such as users, location, and data transferred.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Dataproc Templates

    Dataproc Templates

    Dataproc templates and pipelines for solving simple in-cloud data task

    Dataproc templates are designed to address various in-cloud data tasks, including data import/export/backup/restore and bulk API operations. These templates leverage the power of Google Cloud's Dataproc, supporting both Dataproc Serverless and Dataproc clusters. Google provides this collection of pre-implemented Dataproc templates as a reference and for easy customization.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    OpenWPM

    OpenWPM

    A web privacy measurement framework

    OpenWPM is a web privacy measurement framework that makes it easy to collect data for privacy studies on a scale of thousands to millions of websites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection. Check out the instrumentation section below for more details. OpenWPM is tested on Ubuntu 18.04 via TravisCI and is commonly used via the docker container that this repo builds, which is also based on Ubuntu. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    Algo VPN

    Algo VPN

    Set of Ansible scripts that simplifies the setup of a personal VPN

    ...For anyone who is privacy conscious, travels for work frequently, or can’t afford a dedicated IT department, this one’s for you. Really, the paid-for services are just commercial honeypots. If an attacker can compromise a VPN provider, they can monitor a whole lot of sensitive data. Paid-for VPNs tend to be insecure: they share keys, their weak cryptography gives a false sense of security, and they require you to trust their operators. Even if you’re not doing anything wrong, you could be sharing the same endpoint with someone who is. In that case, your network traffic will be analyzed when law enforcement makes that seizure.
    Downloads: 39 This Week
    Last Update:
    See Project
  • 16
    Scout Suite

    Scout Suite

    Multi-cloud security auditing tool

    ...Once the data has been gathered, all users may be performed offline. Our self-service cloud account monitoring platform, NCC Scout, is a user-friendly SaaS providing you with the ability to constantly monitor your public cloud accounts, allowing you to check they’re configured to comply with industry best practice.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 17
    CloudEvents

    CloudEvents

    CloudEvents Specification

    ...The lack of a common way of describing events means developers must constantly re-learn how to consume events. This also limits the potential for libraries, tooling and infrastructure to aide the delivery of event data across environments, like SDKs, event routers or tracing systems. The portability and productivity we can achieve from event data is hindered overall. CloudEvents is a specification for describing event data in common formats to provide interoperability across services, platforms and systems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    DuckDuckGo Android App

    DuckDuckGo Android App

    Privacy browser for Android

    DuckDuckGo is an app that gives you utmost privacy when browsing online. It stops you from getting tracked and protects your personal and private information, no matter where the internet may take you. Apart from providing standard browsing functionality, DuckDuckGo blocks all hidden third-party trackers, forces sites to use an encrypted connection where available, and provides a Privacy Grade rating for each website you visit.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 19
    Linkedin Scraper

    Linkedin Scraper

    A library that scrapes Linkedin for user data

    Linkedin Scraper is a library that scrapes Linkedin for user data. Version 2.0.0 and before is called linkedin_user_scraper and can be installed via pip3 install --user linkedin_user_scraper. The reason is that LinkedIn has recently blocked people from viewing certain profiles without having previously signed in. So by setting scrape=False, it doesn't automatically scrape the profile, but Chrome will open the linkedin page anyways.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    HTTPie

    HTTPie

    A CLI, cURL-like tool for humans

    HTTPie is a modern command-line HTTP client that makes CLI interaction with web services as human-friendly as possible. It offers a plethora of friendly features that make it an excellent curl alternative. It is equipped with an intuitive UI, JSON support, syntax highlighting and so much more. HTTPie gives a single http command for sending arbitrary HTTP requests with a simple, natural syntax, and displayed in a formatted, colorized terminal output. HTTPie can be installed on macOS,...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 21
    Crawl4AI

    Crawl4AI

    Open-source LLM Friendly Web Crawler & Scraper

    Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    changedetection.io

    changedetection.io

    The best free open source website change detection and restock service

    Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. From simply monitoring website pages that have a change (such as watching prices, and restocking notifications), to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    Letterboxd Recommendations

    Letterboxd Recommendations

    Scraping publicly-accessible Letterboxd data for movie recommendations

    Scraping publicly-accessible Letterboxd data and creating a movie recommendation model with it that can generate recommendations when provided with a Letterboxd username. A user's "star" ratings are scraped from their Letterboxd profile and assigned numerical ratings from 1 to 10 (accounting for half stars). Their ratings are then combined with a sample of ratings from the top 4000 most active users on the site to create a collaborative filtering recommender model using singular value decomposition (SVD). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Pelican

    Pelican

    Static site generator that supports Markdown and reST syntax

    Pelican is a static site generator that requires no database or server-side logic. Chronological content (e.g., articles, blog posts) as well as static pages. Integration with external services. Site themes (created using Jinja2 templates). Publication of articles in multiple languages. Generation of Atom and RSS feeds. Code syntax highlighting via Pygments. Import existing content from WordPress, Dotclear, or RSS feeds. Fast rebuild times due to content caching and selective output writing....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    finvizfinance

    finvizfinance

    Finviz analysis python library

    ...Stock charts, fundamental & technical information, insider information and stock news. Forex charts and performance. Crypto charts and performance. Screener and Group provide data frames for comparing stocks according to different filters and trading signals. Getting information (fundament, description, outer rating, stock news, inside trader) of an individual stock.
    Downloads: 2 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB
Gen AI apps are built with MongoDB Atlas
Atlas offers built-in vector search and global availability across 125+ regions. Start building AI apps faster, all in one place.
Try Free →