data.6bin free download

Showing 878 open source projects for "data.6bin"

View related business solutions

Artificial Intelligence Python Clear Filters & Widen Search

MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
$300 in Free Credit Towards Top Cloud Services
Build VMs, containers, AI, databases, storage—all in one place.

Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.

Get Started
1

Data-Juicer

Data processing for and with foundation models

Data-Juicer is an open-source data processing and augmentation framework designed to enhance the quality and diversity of datasets for machine learning tasks. It includes a modular pipeline for scalable data transformation.

Downloads: 0 This Week

Last Update: 2026-03-17
See Project
2

Synthetic Data Generator

SDG is a specialized framework

...It also includes a data processing module capable of handling different data types, preprocessing columns, managing missing values, and converting formats automatically before model training.

Downloads: 1 This Week

Last Update: 2026-03-06
See Project
3

Data Version Control

Git-based data version control for machine learning workflows

DVC (Data Version Control) is an open source tool designed to bring version control principles to machine learning and data science workflows. It enables developers and data scientists to track datasets, machine learning models, and experiment results in a way that integrates with existing Git repositories. Instead of storing large datasets directly in Git, DVC keeps lightweight metadata in the repository while storing the actual data in external storage systems. ...

Downloads: 0 This Week

Last Update: 2026-03-31
See Project
4

Data Science Articles from CodeCut

Collection of useful data science topics along with articles

The Data-science repository from CodeCutTech is a curated collection of educational content focused on practical tools and workflows used in modern data science projects. Instead of providing a single software package, the repository aggregates articles, tutorials, and examples covering many topics within the data science ecosystem. The materials address areas such as MLOps, data management, project organization, testing practices, visualization techniques, and productivity tools used by data scientists. ...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
Full-stack observability with actually useful AI | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
5

Data Science Interviews

Data science interview questions and answers

Data Science Interviews is an open-source repository that collects common data science interview questions along with community-provided answers and explanations. The project serves as a preparation resource for students, job seekers, and professionals who want to review the technical knowledge required for data science roles. The repository organizes questions into different categories including theoretical machine learning concepts, technical programming questions, and probability or statistics problems. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
6

Dash Data Agent

Self-learning data agent that grounds its answers in layers of content

Dash is a self-learning data agent built by the Agno AI community that generates grounded answers to English queries over structured data by synthesizing SQL and reasoning based on six layers of context, improving automatically with each run. It sidesteps common limitations of simple text-to-SQL agents by incorporating multiple context layers — including schema structure, human annotations, known query patterns, institutional knowledge from docs, machine-discovered error patterns, and live runtime context — to generate SQL queries that are both technically correct and semantically meaningful. ...

Downloads: 0 This Week

Last Update: 2026-04-08
See Project
7

Agentic Data Scientist

An end-to-end Data Scientist

Agentic Data Scientist is an experimental AI-driven research framework that orchestrates data science workflows through autonomous agents that can reason, plan, and execute complex analytics tasks. Unlike traditional scripted pipelines, this project lets AI agents break down high-level research goals into sub-tasks such as data acquisition, cleaning, modeling, evaluation, and reporting, with minimal human direction.

Downloads: 0 This Week

Last Update: 2026-02-05
See Project
8

Synthetic Data Vault (SDV)

Synthetic Data Generation for tabular, relational and time series data

The Synthetic Data Vault (SDV) is a Synthetic Data Generation ecosystem of libraries that allows users to easily learn single-table, multi-table and timeseries datasets to later on generate new Synthetic Data that has the same format and statistical properties as the original dataset. Synthetic data can then be used to supplement, augment and in some cases replace real data when training Machine Learning models.

Downloads: 2 This Week

Last Update: 2026-04-24
See Project
9

cracking-the-data-science-interview

A Collection of Cheatsheets, Books, Questions, and Portfolio

Cracking the Data Science Interview is an open educational repository that collects study materials, resources, and reference links for preparing for data science interviews. The project organizes content across many fundamental areas of data science, including statistics, probability, SQL, machine learning, and deep learning. It includes cheat sheets that summarize important technical concepts commonly discussed during technical interviews.

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
Go From AI Idea to AI App Fast
One platform to build, fine-tune, and deploy ML models. No MLOps team required.

Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.

Try Free
10

DeerFlow

Deep Research framework, combining language models with tools

DeerFlow is an open-source, community-driven “deep research” framework / multi-agent orchestration platform developed by ByteDance. It aims to combine the reasoning power of large language models (LLMs) with automated tool-use — such as web search, web crawling, Python execution, and data processing — to enable complex, end-to-end research workflows. Instead of a monolithic AI assistant, DeerFlow defines multiple specialized agents (e.g. “planner,” “searcher,” “coder,” “report generator”) that collaborate in a structured workflow, allowing tasks like literature reviews, data gathering, data analysis, code execution, and final report generation to be largely automated. ...

Downloads: 167 This Week

Last Update: 2 days ago
See Project
11

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.

Downloads: 106 This Week

Last Update: 2026-04-20
See Project
12

Label Studio

Label Studio is a multi-type data labeling and annotation tool

...Support for multiple data types including images, audio, text, HTML, time-series, and video.

Downloads: 18 This Week

Last Update: 2026-03-13
See Project
13

scikit-learn

Machine learning in Python

scikit-learn is an open source Python module for machine learning built on NumPy, SciPy and matplotlib. It offers simple and efficient tools for predictive data analysis and is reusable in various contexts.

Downloads: 8 This Week

Last Update: 2025-12-10
See Project
14

graphify

AI coding assistant skill (Claude Code, Codex, OpenCode, OpenClaw)

...Overall, graphify serves as a bridge between raw data and visual insight.

Downloads: 8 This Week

Last Update: 15 hours ago
See Project
15

marimo

A reactive notebook for Python

...Run one cell and marimo reacts by automatically running affected cells, eliminating the error-prone chore of managing the notebook state. marimo's reactive UI elements, like data frame GUIs and plots, make working with data feel refreshingly fast, futuristic, and intuitive. Version with git, run as Python scripts, import symbols from a notebook into other notebooks or Python files, and lint or format with your favorite tools. You'll always be able to reproduce your collaborators' results. Notebooks are executed in a deterministic order, with no hidden state, delete a cell and marimo deletes its variables while updating affected cells.

Downloads: 6 This Week

Last Update: 1 day ago
See Project
16

cognee

Deterministic LLMs Outputs for AI Applications and AI Agents

Cognee implements scalable, modular data pipelines that allow for creating the LLM-enriched data layer using graph and vector stores. Cognee acts a semantic memory layer, unveiling hidden connections within your data and infusing it with your company's language and principles. This self-optimizing process ensures ultra-relevant, personalized, and contextually aware LLM retrievals.

Downloads: 9 This Week

Last Update: 12 hours ago
See Project
17

MiroFish

A Simple and Universal Swarm Intelligence Engine

MiroFish is a next-generation artificial intelligence prediction engine that leverages multi-agent technology and swarm-intelligence simulation to model, simulate, and forecast complex real-world scenarios. The system extracts “seed” information from sources such as breaking news, policy documents, and market signals to construct a high-fidelity digital parallel world populated by thousands of virtual agents with independent memory and behavior rules. Users can inject variables or conditions...

Downloads: 397 This Week

Last Update: 2026-03-05
See Project
18

GPT-SoVITS

1 min voice data can also be used to train a good TTS model

GPT‑SoVITS is a state-of-the-art voice conversion and TTS system that enables zero‑shot and few‑shot synthesis based on a short vocal sample (e.g., 5 seconds). It supports cross‑lingual speech synthesis across English, Chinese, Japanese, Korean, Cantonese, and more. It's powered by VITS architecture enhanced for few‑sample adaptation and real‑time usability.

Downloads: 49 This Week

Last Update: 2025-07-29
See Project
19

RAGFlow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine

...It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.

Downloads: 10 This Week

Last Update: 6 days ago
See Project
20

X-AnyLabeling

Effortless data labeling with AI support from Segment Anything

X-AnyLabeling is an open-source data annotation platform designed to streamline the process of labeling datasets for computer vision and multimodal AI applications. The software integrates an AI-powered labeling engine that allows users to generate annotations automatically with the assistance of modern vision models such as Segment Anything and various object detection frameworks.

Downloads: 29 This Week

Last Update: 2026-04-26
See Project
21

GeoAI

GeoAI: Artificial Intelligence for Geospatial Data

GeoAI is a comprehensive open-source Python package designed to integrate artificial intelligence techniques with geospatial data analysis, enabling users to perform advanced geographic modeling and visualization tasks with ease. It provides a unified framework that combines machine learning libraries such as PyTorch and Transformers with geospatial tools, allowing users to process satellite imagery, aerial photos, and vector datasets in a streamlined workflow.

Downloads: 10 This Week

Last Update: 2026-04-15
See Project
22

Cleanlab

The standard data-centric AI package for data quality and ML

cleanlab helps you clean data and labels by automatically detecting issues in a ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms, published in this paper and blog.

Downloads: 1 This Week

Last Update: 2026-01-13
See Project
23

DATAGEN

AI-driven multi-agent research assistant automating hypothesis

...The project integrates several modern AI frameworks including LangChain, LangGraph, and large language models to manage reasoning and data processing tasks. Through this architecture, the system can combine structured data analysis with natural language reasoning to generate insights and research outputs. The platform is designed for researchers, analysts, and developers who want to accelerate data exploration and automate parts of the research lifecycle.

Downloads: 2 This Week

Last Update: 2026-04-18
See Project
24

LlamaIndex

Central interface to connect your LLM's with external data

LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLM's with external data. LlamaIndex is a simple, flexible interface between your external data and LLMs. It provides the following tools in an easy-to-use fashion. Provides indices over your unstructured and structured data for use with LLM's. These indices help to abstract away common boilerplate and pain points for in-context learning. Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when the context is too big. ...

Downloads: 3 This Week

Last Update: 2026-04-21
See Project
25

SDGym

Benchmarking synthetic data generation methods

The Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating synthetic data. Measure performance and memory usage across different synthetic data modeling techniques – classical statistics, deep learning and more! The SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its synthesizers, datasets or metrics for benchmarking.

Downloads: 1 This Week

Last Update: 2026-04-17
See Project