Showing 3462 open source projects for "data"

View related business solutions
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 1
    ShredOS

    ShredOS

    Shredos Disk Eraser 64 bit for all Intel 64 bit processors

    For all Intel and compatible 64 & 32 bit processors. ShredOS is a USB bootable (BIOS or UEFI) small linux distribution with the sole purpose of securely erasing the entire contents of your disks using the program nwipe. If you are familiar with dwipe from DBAN then you will feel right at home with ShredOS and nwipe. What are the advantages of nwipe over dwipe/DBAN? Well as everybody probably knows, DBAN development stopped in 2015 which means it has not received any further bug fixes or...
    Downloads: 472 This Week
    Last Update:
    See Project
  • 2
    Label Studio

    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    ...Support for multiple data types including images, audio, text, HTML, time-series, and video.
    Downloads: 24 This Week
    Last Update:
    See Project
  • 3
    Anna’s Archive

    Anna’s Archive

    Comprehensive search engine for books, papers, comics, magazines

    ...It relies heavily on technologies such as Elasticsearch for search functionality and MariaDB for structured data storage, enabling fast and efficient querying across massive datasets. The system is designed with redundancy and replication in mind, allowing distributed deployments and mirrored environments to handle high traffic and large data volumes. It also includes tooling for importing datasets, managing metadata, and maintaining structured archives using custom formats.
    Downloads: 56 This Week
    Last Update:
    See Project
  • 4
    OCRmyPDF

    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 97 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    Mihomo

    Mihomo

    A simple Python Pydantic model for Honkai

    Mihomo is a Python client library leveraging Pydantic to model parsed Honkai: Star Rail user data from the Mihomo public API. It provides structured types, type hints, and convenience methods to fetch and transform player profiles, daily stats, and character details efficiently.
    Downloads: 132 This Week
    Last Update:
    See Project
  • 6
    scikit-learn

    scikit-learn

    Machine learning in Python

    scikit-learn is an open source Python module for machine learning built on NumPy, SciPy and matplotlib. It offers simple and efficient tools for predictive data analysis and is reusable in various contexts.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 7
    geowifi

    geowifi

    OSINT tool for locating WiFi networks using BSSID or SSID data

    ...It queries several public WiFi geolocation databases and aggregates the results to help identify the approximate location of a wireless access point. By combining multiple data sources such as Wigle, Apple, Google, WifiDB, Mylnikov, and Combain, the tool can provide location data that may include coordinates and additional network metadata. Users can run searches through a command-line interface by specifying either the BSSID (MAC address) or the SSID of a network. The results can be displayed in different formats, including a structured JSON output or an interactive HTML map showing the discovered locations. geowifi also supports API-based integrations with certain services, which allows geowifi to retrieve more accurate or detailed geolocation data when valid API credentials are configured.
    Downloads: 22 This Week
    Last Update:
    See Project
  • 8
    graphify

    graphify

    AI coding assistant skill (Claude Code, Codex, OpenCode, OpenClaw)

    ...Overall, graphify serves as a bridge between raw data and visual insight.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 9
    Instagram OSINT Tool

    Instagram OSINT Tool

    Instagram OSINT tool for gathering profile data and public posts

    ...The results are saved locally in structured formats such as JSON-style data inside text files, making them easy to analyze or integrate into other applications. InstagramOSINT also exposes a Python API so developers can import the functionality.
    Downloads: 29 This Week
    Last Update:
    See Project
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 10
    Great Expectations

    Great Expectations

    Always know what to expect from your data

    Great Expectations helps data teams eliminate pipeline debt, through data testing, documentation, and profiling. Software developers have long known that testing and documentation are essential for managing complex codebases. Great Expectations brings the same confidence, integrity, and acceleration to data science and data engineering teams. Expectations are assertions for data.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    SDGym

    SDGym

    Benchmarking synthetic data generation methods

    The Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating synthetic data. Measure performance and memory usage across different synthetic data modeling techniques – classical statistics, deep learning and more! The SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its synthesizers, datasets or metrics for benchmarking.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 12
    Mimesis

    Mimesis

    High-performance fake data generator for Python

    Mimesis is an open source high-performance fake data generator for Python, able to provide data for various purposes in various languages. It's currently the fastest fake data generator for Python, and supports many different data providers that can produce data related to people, food, transportation, internet and many more. Mimesis is really easy to use, with everything you need just an import away.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 13
    MiroFish

    MiroFish

    A Simple and Universal Swarm Intelligence Engine

    MiroFish is a next-generation artificial intelligence prediction engine that leverages multi-agent technology and swarm-intelligence simulation to model, simulate, and forecast complex real-world scenarios. The system extracts “seed” information from sources such as breaking news, policy documents, and market signals to construct a high-fidelity digital parallel world populated by thousands of virtual agents with independent memory and behavior rules. Users can inject variables or conditions...
    Downloads: 669 This Week
    Last Update:
    See Project
  • 14
    folium

    folium

    Python data, Leaflet.js maps

    folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the leaflet.js library. Manipulate your data in Python, then visualize it in on a Leaflet map via folium. folium makes it easy to visualize data that’s been manipulated in Python on an interactive leaflet map. It enables both the binding of data to a map for choropleth visualizations as well as passing rich vector/raster/HTML visualizations as markers on the map. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15
    Cleanlab

    Cleanlab

    The standard data-centric AI package for data quality and ML

    cleanlab helps you clean data and labels by automatically detecting issues in a ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms, published in this paper and blog.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 16
    spyder

    spyder

    The scientific Python development environment

    Spyder is a free and open source scientific environment written in Python, for Python, and designed by and for scientists, engineers and data analysts. It features a unique combination of the advanced editing, analysis, debugging, and profiling functionality of a comprehensive development tool with the data exploration, interactive execution, deep inspection, and beautiful visualization capabilities of a scientific package. Spyder’s multi-language Editor integrates a number of powerful tools right out of the box for an easy to use, efficient editing experience. ...
    Downloads: 170 This Week
    Last Update:
    See Project
  • 17
    Positron

    Positron

    Positron, a next-generation data science IDE

    Positron is a next-generation integrated development environment (IDE) created by Posit PBC (formerly RStudio Inc) specifically tailored for data science workflows in Python, R, and multi-language ecosystems. It aims to unify exploratory data analysis, production code, and data-app authoring in a single environment so that data scientists move from “question → insight → application” without switching tools. Built on the open-source Code-OSS foundation, Positron provides a familiar coding experience along with specialized panes and tooling for variable inspection, data-frame viewing, plotting previews, and interactive consoles designed for analytical work. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 18
    Amulet Map Editor

    Amulet Map Editor

    A new Minecraft world editor and converter

    ...The program works natively with the block state format introduced in 1.13 which enables editing of all world formats. Amulet is built on top of a world converter that converts all world data into a custom superset format. This means that all worlds can be modified in the same way rather than having custom logic for each world format. Amulet comes with a built-in world converter that can be used to convert any world Amulet can open into any other world Amulet can open.
    Downloads: 561 This Week
    Last Update:
    See Project
  • 19
    Diffgram

    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    From ingesting data to exploring it, annotating it, and managing workflows. Diffgram is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is world’s first truly open source training data platform that focuses on giving its users an unlimited experience. This is aimed to reduce your data labeling bills and increase your Training Data Quality.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    Streamlit

    Streamlit

    The fastest way to build data apps in Python

    A faster way to build and share data apps. Streamlit turns data scripts into shareable web apps in minutes. All in pure Python. No front‑end experience is required. Build an app in a few lines of code with our magically simple API. Then see it automatically update as you iteratively save the source file. Adding a widget is the same as declaring a variable. No need to write a backend, define routes, handle HTTP requests, connect a frontend, write HTML, CSS, JavaScript, etc. ...
    Downloads: 38 This Week
    Last Update:
    See Project
  • 21
    TOML

    TOML

    Tom Preston-Werner's obvious, minimal language

    ...TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics. TOML is designed to map unambiguously to a hash table. TOML should be easy to parse into data structures in a wide variety of languages. TOML shares traits with other file formats used for application configuration and data serialization, such as YAML and JSON. TOML and JSON both are simple and use ubiquitous data types, making them easy to code for or parse with machines. TOML and YAML both emphasize human readability features, like comments that make it easier to understand the purpose of a given line. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 22
    DataChain

    DataChain

    AI-data warehouse to enrich, transform and analyze unstructured data

    ...The resulting datasets can be saved, versioned, and sent directly to PyTorch and TensorFlow for training. Datachain can persist features of Python objects returned by AI models, and enables vectorized analytical operations over them. The typical use cases are data curation, LLM analytics and validation, image segmentation, pose detection, and GenAI alignment. Datachain is especially helpful if batch operations can be optimized – for instance, when synchronous API calls can be parallelized or where an LLM API offers batch processing.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 23
    Arize Phoenix

    Arize Phoenix

    Uncover insights, surface problems, monitor, and fine tune your LLM

    Phoenix provides ML insights at lightning speed with zero-config observability for model drift, performance, and data quality. Phoenix is an Open Source ML Observability library designed for the Notebook. The toolset is designed to ingest model inference data for LLMs, CV, NLP and tabular datasets. It allows Data Scientists to quickly visualize their model data, monitor performance, track down issues & insights, and easily export to improve. Deep Learning Models (CV, LLM, and Generative) are an amazing technology that will power many of future ML use cases. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 24
    Matplotlib

    Matplotlib

    matplotlib: plotting with Python

    Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible. Matplotlib ships with several add-on toolkits, including 3D plotting with mplot3d, axes helpers in axes_grid1 and axis helpers in axisartist. A large number of third party packages extend and build on Matplotlib functionality, including several higher-level plotting interfaces (seaborn, HoloViews, ggplot, ...), and a...
    Downloads: 18 This Week
    Last Update:
    See Project
  • 25
    Sweetviz

    Sweetviz

    Visualize and compare datasets, target values and associations

    Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. Output is a fully self-contained HTML application. The system is built around quickly visualizing target values and comparing datasets. Its goal is to help quick analysis of target characteristics, training vs testing data, and other such data characterization tasks. Shows how a target value (e.g. ...
    Downloads: 3 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB