Page 30 | data free download

Showing 3124 open source projects for "data"

View related business solutions

Python Clear Filters & Widen Search

Managed MySQL, PostgreSQL, and SQL Databases on Google Cloud
Get back to your application and leave the database to us. Cloud SQL automatically handles backups, replication, and scaling.

Cloud SQL is a fully managed relational database for MySQL, PostgreSQL, and SQL Server. We handle patching, backups, replication, encryption, and failover—so you can focus on your app. Migrate from on-prem or other clouds with free Database Migration Service. IDC found customers achieved 246% ROI. New customers get $300 in credits plus a 30-day free trial.

Try Cloud SQL Free
$300 in Free Credit for Your Google Cloud Projects
Build, test, and explore on Google Cloud with $300 in free credit. No hidden charges. No surprise bills.

Launch your next project with $300 in free Google Cloud credit—no hidden charges. Test, build, and deploy without risk. Use your credit across the Google Cloud platform to find what works best for your needs. After your credits are used, continue building with free monthly usage products. Only pay when you're ready to scale. Sign up in minutes and start exploring.

Start Free Trial
1

nanoGPT

The simplest, fastest repository for training/finetuning models

NanoGPT is a minimalistic yet powerful reimplementation of GPT-style transformers created by Andrej Karpathy for educational and research use. It distills the GPT architecture into a few hundred lines of Python code, making it far easier to understand than large, production-scale implementations. The repo is organized with a training pipeline (dataset preprocessing, model definition, optimizer, training loop) and inference script so you can train a small GPT on text datasets like Shakespeare...

Downloads: 0 This Week

Last Update: 2025-11-12
See Project
2

gplearn

Genetic Programming in Python, with a scikit-learn inspired API

...It begins by building a population of naive random formulas to represent a relationship between known independent variables and their dependent variable targets in order to predict new data. Each successive generation of programs is then evolved from the one that came before it by selecting the fittest individuals from the population to undergo genetic operations.

Downloads: 0 This Week

Last Update: 2026-01-07
See Project
3

Semantic Router

Superfast AI decision making and processing of multi-modal data

Semantic Router is a superfast decision-making layer for your LLMs and agents. Rather than waiting for slow, unreliable LLM generations to make tool-use or safety decisions, we use the magic of semantic vector space — routing our requests using semantic meaning. Combining LLMs with deterministic rules means we can be confident that our AI systems behave as intended. Cramming agent tools into the limited context window is expensive, slow, and fundamentally limited. Semantic Router enables...

Downloads: 0 This Week

Last Update: 2025-11-18
See Project
4

Kapitan

Generic templated configuration management for Kubernetes

...Kapitan's inventory-driven model, powerful templating capabilities, and native secret management tools offer granular control, fostering consistency, reducing errors, and safeguarding sensitive data. Empower your team to make changes to your infrastructure whilst maintaining full control, with a GitOps approach and full transparency.

Downloads: 0 This Week

Last Update: 2025-08-12
See Project
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
5

garak

Developers and anyone seeking an LLM solution to scan for vulnerabilit

garak checks if an LLM can be made to fail in a way we don't want. garak probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. garak's a free tool, we love developing it and are always interested in adding functionality to support applications. garak is a command-line tool, it's developed in Linux and OSX. Just grab it from PyPI and you should be good to go.

Downloads: 0 This Week

Last Update: 2026-02-04
See Project
6

Changelog CI

Changelog CI is a GitHub Action that enables a project

...First, it tries to get the latest release from the repository (If available). Then, it checks all the pull requests/commits merged after the last release using the GitHub API. After that, it parses the data and generates the changelog. It is able to use Markdown or reStructuredText to generate a Changelog. Finally, It writes the generated changelog at the beginning of the CHANGELOG.md/CHANGELOG.rst (or user-provided filename) file. In addition to that, if a user provides a configuration file (JSON/YAML), Changelog CI parses the user-provided configuration file and renders the changelog according to user's configuration.

Downloads: 0 This Week

Last Update: 2025-07-15
See Project
7

Ludwig AI

Low-code framework for building custom LLMs, neural networks

...Ludwig is a low-code framework for building custom AI models like LLMs and other deep neural networks. Declarative YAML configuration file is all you need to train a state-of-the-art LLM on your data. Support for multi-task and multi-modality learning. Comprehensive config validation detects invalid parameter combinations and prevents runtime failures. Automatic batch size selection, distributed training (DDP, DeepSpeed), parameter efficient fine-tuning (PEFT), 4-bit quantization (QLoRA), and larger-than-memory datasets. Retain full control of your models down to the activation functions. ...

Downloads: 0 This Week

Last Update: 2024-07-30
See Project
8

Trafilatura

Python & command-line tool to gather text on the Web

...Going from raw HTML to essential parts can alleviate many problems related to text quality, first by avoiding the noise caused by recurring elements (headers, footers, links/blogroll etc.) and second by including information such as author and date in order to make sense of the data. The extractor tries to strike a balance between limiting noise (precision) and including all valid parts (recall). It also has to be robust and reasonably fast, it runs in production on millions of documents.

Downloads: 0 This Week

Last Update: 2024-12-03
See Project
9

Graphene-Django

Integrate GraphQL into your Django project

Graphene-Django is built on top of Graphene. Graphene-Django provides some additional abstractions that make it easy to add GraphQL functionality to your Django project. First time? We recommend you start with the installation guide to get set up and the basic tutorial. It is worth reading the core graphene docs to familiarize yourself with the basic utilities. Graphene Django has a number of additional features that are designed to make working with Django easy. Our primary focus in this...

Downloads: 0 This Week

Last Update: 2025-03-13
See Project
Cut Cloud Costs with Google Compute Engine
Save up to 91% with Spot VMs and get automatic sustained-use discounts. One free VM per month, plus $300 in credits.

Save on compute costs with Compute Engine. Reduce your batch jobs and workload bill 60-91% with Spot VMs. Compute Engine's committed use offers customers up to 70% savings through sustained use discounts. Plus, you get one free e2-micro VM monthly and $300 credit to start.

Try Compute Engine
10

Bootstrap Your Own Latent (BYOL)

Usable Implementation of "Bootstrap Your Own Latent" self-supervised

...This repository offers a module that one can easily wrap any image-based neural network (residual network, discriminator, policy network) to immediately start benefitting from unlabelled image data. There is now new evidence that batch normalization is key to making this technique work well. A new paper has successfully replaced batch norm with group norm + weight standardization, refuting that batch statistics are needed for BYOL to work. Simply plugin your neural network, specifying (1) the image dimensions as well as (2) the name (or index) of the hidden layer, whose output is used as the latent representation used for self-supervised training.

Downloads: 0 This Week

Last Update: 2024-07-15
See Project
11

Determined

Determined, deep learning training platform

The fastest and easiest way to build deep learning models. Distributed training without changing your model code. Determined takes care of provisioning machines, networking, data loading, and fault tolerance. Build more accurate models faster with scalable hyperparameter search, seamlessly orchestrated by Determined. Use state-of-the-art algorithms and explore results with our hyperparameter search visualizations. Interpret your experiment results using the Determined UI and TensorBoard, and reproduce experiments with artifact tracking. ...

Downloads: 0 This Week

Last Update: 2025-03-19
See Project
12

Google Spreadsheets Python

Google Sheets Python API

gspread is a Python API for Google Sheets. A service account is a special type of Google account intended to represent a non-human user that needs to authenticate and be authorized to access data in Google APIs [sic]. Since it’s a separate account, by default it does not have access to any spreadsheet until you share it with this account. Just like any other Google account. To access spreadsheets via Google Sheets API you need to authenticate and authorize your application. Older versions of gspread have used oauth2client. ...

Downloads: 0 This Week

Last Update: 2025-05-14
See Project
13

Full Stack FastAPI and PostgreSQL

Full stack, modern web application generator

...REST backend tests based on Pytest, integrated with Docker, so you can test the full API interaction, independent on the database. As it runs in Docker, it can build a new data store from scratch each time.

Downloads: 0 This Week

Last Update: 2026-01-23
See Project
14

Agent Lightning

The absolute trainer to light up AI agents

...It’s designed to be compatible with a wide range of agent architectures and frameworks — from LangChain and OpenAI Agent SDKs to AutoGen and custom Python agents — making it broadly applicable across different agent tooling ecosystems. Agent-Lightning introduces a lightweight training pipeline that observes agents’ execution traces, converts them into structured data, and feeds them into training algorithms, enabling users to improve agent behaviors systematically. The project emphasizes minimalist integration, so you can drop this into existing systems without extensive rewrites, focusing instead on iterative performance improvement.

Downloads: 0 This Week

Last Update: 2026-02-06
See Project
15

Agent SOP

Natural language workflows for AI agents

...It defines reusable SOP templates that agents can instantiate with context-specific parameters, allowing organizations to codify best practices for customer support, data processing, document workflows, or incident response. The framework supports monitoring and state tracking, so external systems can observe progress, intervene if necessary, and log outcomes for compliance or auditing. Integrations with common messaging and task orchestration systems enable SOP agents to interact with email, ticket queues, and databases as part of their workflows.

Downloads: 0 This Week

Last Update: 2026-02-05
See Project
16

claude-code-transcripts

Tools for publishing transcripts for Claude Code sessions

claude-code-transcripts is a command-line utility that takes session files exported from Claude Code (in JSON or JSONL format) and turns them into clean, navigable HTML transcripts that can be viewed in any modern web browser. It is designed to make the often dense and verbose outputs from AI coding sessions easier to read, share, and archive by breaking conversations into paginated, annotated pages with navigable timelines of prompts and responses. Users can run this tool locally or fetch...

Downloads: 0 This Week

Last Update: 2026-01-30
See Project
17

sqlit

A user friendly TUI for SQL databases

...For querying, it emphasizes productivity features like syntax highlighting, searchable query history, and vim-style keybindings so power users can move fast. For exploring data at scale, it can load and inspect very large result sets and provides filtering and fuzzy search to find rows and values efficiently.

Downloads: 0 This Week

Last Update: 2026-02-01
See Project
18

Self-hosted AI Package

Run all your local AI together in one package

...The stack typically includes Ollama for running local large language models, n8n as a low-code workflow automation platform, Supabase for database and vector storage, Open WebUI for interacting with models, Flowise for agent building, and additional services like SearXNG, Neo4j, and Langfuse for search, knowledge graphs, and observability. This integrated setup allows users to experiment with RAG pipelines, automated workflows, AI agents, and project data management without relying on external hosted services, increasing flexibility and privacy. The repository comes with example workflows (such as Local RAG AI Agent workflows) and environment configurations that help streamline setup and encourage customization.

Downloads: 0 This Week

Last Update: 2026-02-01
See Project
19

StatsForecast

Fast forecasting with statistical and econometric models

...The library implements a broad set of models, including AutoARIMA, ETS, CES, Theta, plus a battery of benchmarking and baseline methods, giving users flexibility in selecting forecasting approaches depending on data characteristics (trend, seasonality, intermittent demand, etc.). Its internal implementation leverages numba to compile performance-critical code to optimized machine-level instructions, which makes the models much faster than many traditional Python counterparts.

Downloads: 0 This Week

Last Update: 2025-11-26
See Project
20

Atheris

A Coverage-Guided, Native Python Fuzzer

...The tool integrates smoothly with Python’s packaging and unit-test ecosystems, so you can wrap existing tests as fuzz targets and keep results understandable. It supports structured input strategies and custom mutators, which is especially helpful for text and data formats common in Python workloads. In practice, Atheris compresses weeks of edge-case brainstorming into hours of automated exploration with actionable, minimized reproductions.

Downloads: 0 This Week

Last Update: 2025-11-25
See Project
21

Penzai

A JAX research toolkit to build, edit, & visualize neural networks

Penzai, developed by Google DeepMind, is a JAX-based library for representing, visualizing, and manipulating neural network models as functional pytree data structures. It is designed to make machine learning research more interpretable and interactive, particularly for tasks like model surgery, ablation studies, architecture debugging, and interpretability research. Unlike conventional neural network libraries, Penzai exposes the full internal structure of models, enabling fine-grained inspection and modification after training. ...

Downloads: 0 This Week

Last Update: 2025-10-09
See Project
22

JEPA

PyTorch code and models for V-JEPA self-supervised learning from video

...This makes learning focus on semantics and structure, yielding features that transfer well with simple linear probes and minimal fine-tuning. The repository provides training recipes, data pipelines, and evaluation utilities for image JEPA variants and often includes ablations that illuminate which masking and architectural choices matter. Because the objective is non-autoregressive and operates in embedding space, JEPA tends to be compute-efficient and stable at scale. The approach has become a strong alternative to contrastive or pixel-reconstruction methods for representation learning.

Downloads: 0 This Week

Last Update: 2025-10-07
See Project
23

DLRM

An implementation of a deep learning recommendation model (DLRM)

...The implementation is optimized for performance at scale, supporting multi-GPU and multi-node execution, quantization, embedding partitioning, and pipelined I/O to feed huge embeddings efficiently. It includes data loaders for standard benchmarks (like Criteo), training scripts, evaluation tools, and capabilities like mixed precision, gradient compression, and memory fusion to maximize throughput.

Downloads: 0 This Week

Last Update: 2026-01-12
See Project
24

MoCo (Momentum Contrast)

Self-supervised visual learning using momentum contrast in PyTorch

MoCo is an open source PyTorch implementation developed by Facebook AI Research (FAIR) for the papers “Momentum Contrast for Unsupervised Visual Representation Learning” (He et al., 2019) and “Improved Baselines with Momentum Contrastive Learning” (Chen et al., 2020). It introduces Momentum Contrast (MoCo), a scalable approach to self-supervised learning that enables visual representation learning without labeled data. The core idea of MoCo is to maintain a dynamic dictionary with a momentum-updated encoder, allowing efficient contrastive learning across large batches. The repository includes implementations for both MoCo v1 and MoCo v2, the latter improving training stability and performance through architectural and augmentation enhancements. ...

Downloads: 0 This Week

Last Update: 22 hours ago
See Project
25

DINOv2

PyTorch code and models for the DINOv2 self-supervised learning

DINOv2 is a self-supervised vision learning framework that produces strong, general-purpose image representations without using human labels. It builds on the DINO idea of student–teacher distillation and adapts it to modern Vision Transformer backbones with a carefully tuned recipe for data augmentation, optimization, and multi-crop training. The core promise is that a single pretrained backbone can transfer well to many downstream tasks—from linear probing on classification to retrieval, detection, and segmentation—often requiring little or no fine-tuning. The repository includes code for training, evaluating, and feature extraction, with utilities to run k-NN or linear evaluation baselines to assess representation quality. ...

Downloads: 0 This Week

Last Update: 2025-12-22
See Project