Page 2 | data free download

Showing 3462 open source projects for "data"

1

ShredOS

ShredOS Disk Eraser 64-bit for all Intel 64-bit processors

For all Intel and compatible 64-bit and 32-bit processors. ShredOS is a small USB-bootable (BIOS or UEFI) Linux distribution with the sole purpose of securely erasing the entire contents of your disks using the program nwipe. If you are familiar with dwipe from DBAN, you will feel right at home with ShredOS and nwipe. What are the advantages of nwipe over dwipe/DBAN? DBAN development stopped in 2015, which means it has not received any further bug fixes or...

Downloads: 472 This Week

Last Update: 2026-04-02
2

Label Studio

Label Studio is a multi-type data labeling and annotation tool

...Support for multiple data types including images, audio, text, HTML, time-series, and video.

Downloads: 24 This Week

Last Update: 2026-03-13
3

Anna’s Archive

Comprehensive search engine for books, papers, comics, magazines

...It relies heavily on technologies such as Elasticsearch for search functionality and MariaDB for structured data storage, enabling fast and efficient querying across massive datasets. The system is designed with redundancy and replication in mind, allowing distributed deployments and mirrored environments to handle high traffic and large data volumes. It also includes tooling for importing datasets, managing metadata, and maintaining structured archives using custom formats.

Downloads: 56 This Week

Last Update: 2026-03-23
4

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.

Downloads: 97 This Week

Last Update: 3 days ago
5

Mihomo

A simple Python Pydantic model for Honkai

Mihomo is a Python client library leveraging Pydantic to model parsed Honkai: Star Rail user data from the Mihomo public API. It provides structured types, type hints, and convenience methods to fetch and transform player profiles, daily stats, and character details efficiently.

Downloads: 132 This Week

Last Update: 4 days ago
6

scikit-learn

Machine learning in Python

scikit-learn is an open source Python module for machine learning built on NumPy, SciPy and matplotlib. It offers simple and efficient tools for predictive data analysis and is reusable in various contexts.

Downloads: 10 This Week

Last Update: 2025-12-10
7

geowifi

OSINT tool for locating WiFi networks using BSSID or SSID data

...It queries several public WiFi geolocation databases and aggregates the results to help identify the approximate location of a wireless access point. By combining multiple data sources such as Wigle, Apple, Google, WifiDB, Mylnikov, and Combain, the tool can provide location data that may include coordinates and additional network metadata. Users can run searches through a command-line interface by specifying either the BSSID (MAC address) or the SSID of a network. The results can be displayed in different formats, including a structured JSON output or an interactive HTML map showing the discovered locations. geowifi also supports API-based integrations with certain services, which allows geowifi to retrieve more accurate or detailed geolocation data when valid API credentials are configured.

Downloads: 22 This Week

Last Update: 3 days ago
8

graphify

AI coding assistant skill (Claude Code, Codex, OpenCode, OpenClaw)

...Overall, graphify serves as a bridge between raw data and visual insight.

Downloads: 10 This Week

Last Update: 2 days ago
9

Instagram OSINT Tool

Instagram OSINT tool for gathering profile data and public posts

...The results are saved locally in structured formats such as JSON-style data inside text files, making them easy to analyze or integrate into other applications. InstagramOSINT also exposes a Python API so developers can import the functionality.

Downloads: 29 This Week

Last Update: 4 days ago
10

Great Expectations

Always know what to expect from your data

Great Expectations helps data teams eliminate pipeline debt through data testing, documentation, and profiling. Software developers have long known that testing and documentation are essential for managing complex codebases. Great Expectations brings the same confidence, integrity, and acceleration to data science and data engineering teams. Expectations are assertions for data.

Downloads: 3 This Week

Last Update: 2026-04-15
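The idea that "expectations are assertions for data" can be illustrated with a minimal plain-Python sketch. This does not use the Great Expectations library or its API; the names `ExpectationResult` and `expect_values_between` and the sample rows are hypothetical, invented only to show the concept of declaring a rule and collecting the values that violate it.

```python
from dataclasses import dataclass


@dataclass
class ExpectationResult:
    """Outcome of one check (hypothetical type, not the library's)."""
    success: bool
    unexpected: list


def expect_values_between(rows, column, low, high):
    """Check that every value in `column` lies in the closed range [low, high]."""
    unexpected = [
        row[column] for row in rows if not (low <= row[column] <= high)
    ]
    # The check passes only when no value falls outside the range.
    return ExpectationResult(success=not unexpected, unexpected=unexpected)


# Invented sample data for the illustration.
rows = [{"age": 34}, {"age": 51}, {"age": -2}]
result = expect_values_between(rows, "age", 0, 120)
print(result.success, result.unexpected)
```

Returning the offending values rather than raising on the first failure is a design choice in this sketch: it lets a pipeline report every bad record in one pass.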
11

SDGym

Benchmarking synthetic data generation methods

The Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating synthetic data. Measure performance and memory usage across different synthetic data modeling techniques – classical statistics, deep learning and more! The SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its synthesizers, datasets or metrics for benchmarking.

Downloads: 5 This Week

Last Update: 6 days ago
12

Mimesis

High-performance fake data generator for Python

Mimesis is an open source high-performance fake data generator for Python, able to provide data for various purposes in various languages. It's currently the fastest fake data generator for Python, and supports many different data providers that can produce data related to people, food, transportation, internet and many more. Mimesis is really easy to use, with everything you need just an import away.

Downloads: 3 This Week

Last Update: 2026-01-11
13

MiroFish

A Simple and Universal Swarm Intelligence Engine

MiroFish is a next-generation artificial intelligence prediction engine that leverages multi-agent technology and swarm-intelligence simulation to model, simulate, and forecast complex real-world scenarios. The system extracts “seed” information from sources such as breaking news, policy documents, and market signals to construct a high-fidelity digital parallel world populated by thousands of virtual agents with independent memory and behavior rules. Users can inject variables or conditions...

Downloads: 669 This Week

Last Update: 2026-03-05
14

folium

Python data, Leaflet.js maps

folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it on a Leaflet map via folium. folium makes it easy to visualize data that’s been manipulated in Python on an interactive Leaflet map. It enables both the binding of data to a map for choropleth visualizations as well as passing rich vector/raster/HTML visualizations as markers on the map. ...

Downloads: 5 This Week

Last Update: 2025-06-16
15

Cleanlab

The standard data-centric AI package for data quality and ML

cleanlab helps you clean data and labels by automatically detecting issues in an ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms, published in a paper and a blog post.

Downloads: 4 This Week

Last Update: 2026-01-13
16

spyder

The scientific Python development environment

Spyder is a free and open source scientific environment written in Python, for Python, and designed by and for scientists, engineers and data analysts. It combines the advanced editing, analysis, debugging, and profiling functionality of a comprehensive development tool with the data exploration, interactive execution, deep inspection, and visualization capabilities of a scientific package. Spyder’s multi-language Editor integrates a number of powerful tools right out of the box for an easy-to-use, efficient editing experience. ...

Downloads: 170 This Week

Last Update: 2026-04-07
17

Positron

Positron, a next-generation data science IDE

Positron is a next-generation integrated development environment (IDE) created by Posit PBC (formerly RStudio Inc) specifically tailored for data science workflows in Python, R, and multi-language ecosystems. It aims to unify exploratory data analysis, production code, and data-app authoring in a single environment so that data scientists move from “question → insight → application” without switching tools. Built on the open-source Code-OSS foundation, Positron provides a familiar coding experience along with specialized panes and tooling for variable inspection, data-frame viewing, plotting previews, and interactive consoles designed for analytical work. ...

Downloads: 3 This Week

Last Update: 2026-04-14
18

Amulet Map Editor

A new Minecraft world editor and converter

...The program works natively with the block state format introduced in 1.13 which enables editing of all world formats. Amulet is built on top of a world converter that converts all world data into a custom superset format. This means that all worlds can be modified in the same way rather than having custom logic for each world format. Amulet comes with a built-in world converter that can be used to convert any world Amulet can open into any other world Amulet can open.

Downloads: 561 This Week

Last Update: 2026-04-08
19

Diffgram

Training data (data labeling, annotation, workflow) for all data types

From ingesting data to exploring it, annotating it, and managing workflows, Diffgram is a single application that aims to improve your data labeling and bring all aspects of training data under a single roof. Diffgram describes itself as the world’s first truly open source training data platform, focused on giving its users an unlimited experience. It aims to reduce your data labeling bills and increase your training data quality.

Downloads: 2 This Week

Last Update: 2024-10-14
20

Streamlit

The fastest way to build data apps in Python

A faster way to build and share data apps. Streamlit turns data scripts into shareable web apps in minutes. All in pure Python. No front‑end experience is required. Build an app in a few lines of code with our magically simple API. Then see it automatically update as you iteratively save the source file. Adding a widget is the same as declaring a variable. No need to write a backend, define routes, handle HTTP requests, connect a frontend, write HTML, CSS, JavaScript, etc. ...

Downloads: 38 This Week

Last Update: 2026-03-31
21

TOML

Tom Preston-Werner's obvious, minimal language

...TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics. TOML is designed to map unambiguously to a hash table. TOML should be easy to parse into data structures in a wide variety of languages. TOML shares traits with other file formats used for application configuration and data serialization, such as YAML and JSON. TOML and JSON both are simple and use ubiquitous data types, making them easy to code for or parse with machines. TOML and YAML both emphasize human readability features, like comments that make it easier to understand the purpose of a given line. ...

Downloads: 5 This Week

Last Update: 2025-12-18
22

DataChain

AI-data warehouse to enrich, transform and analyze unstructured data

...The resulting datasets can be saved, versioned, and sent directly to PyTorch and TensorFlow for training. DataChain can persist features of Python objects returned by AI models, and enables vectorized analytical operations over them. The typical use cases are data curation, LLM analytics and validation, image segmentation, pose detection, and GenAI alignment. DataChain is especially helpful if batch operations can be optimized, for instance when synchronous API calls can be parallelized or where an LLM API offers batch processing.

Downloads: 5 This Week

Last Update: 3 days ago
23

Arize Phoenix

Uncover insights, surface problems, monitor, and fine tune your LLM

Phoenix provides ML insights at lightning speed with zero-config observability for model drift, performance, and data quality. Phoenix is an open source ML observability library designed for the notebook. The toolset is designed to ingest model inference data for LLMs, CV, NLP and tabular datasets. It allows data scientists to quickly visualize their model data, monitor performance, track down issues and insights, and easily export data to improve their models. Deep learning models (CV, LLM, and generative) are a technology that will power many future ML use cases. ...

Downloads: 5 This Week

Last Update: 2 days ago
24

Matplotlib

matplotlib: plotting with Python

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible. Matplotlib ships with several add-on toolkits, including 3D plotting with mplot3d, axes helpers in axes_grid1 and axis helpers in axisartist. A large number of third party packages extend and build on Matplotlib functionality, including several higher-level plotting interfaces (seaborn, HoloViews, ggplot, ...), and a...

Downloads: 18 This Week

Last Update: 2025-11-13
25

Sweetviz

Visualize and compare datasets, target values and associations

Sweetviz is an open-source Python library that generates high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. Output is a fully self-contained HTML application. The system is built around quickly visualizing target values and comparing datasets. Its goal is to support quick analysis of target characteristics, training vs testing data, and other such data characterization tasks. Shows how a target value (e.g. ...

Downloads: 3 This Week

Last Update: 2026-04-11