data free download - SourceForge

Showing 58 open source projects for "data"

Filters: Data Science, Linux

1

Cookiecutter Data Science

Project structure for doing and sharing data science work

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. When we think about data analysis, we often think just about the resulting reports, insights, or visualizations. While these end products are generally the main event, it's easy to focus on making the products look nice and ignore the quality of the code that generates them. Because these end products are created programmatically, code quality is still important! ...

Downloads: 0 This Week

Last Update: 2025-07-24
See Project
2

AI Data Science Team

An AI-powered data science team of agents

AI Data Science Team is a Python library and agent ecosystem designed to accelerate and automate common data science workflows by modeling them as specialized AI “agents” that can be orchestrated to perform tasks like data cleaning, transformation, analysis, visualization, and machine learning. It provides a modular agent framework where each agent focuses on a step in the typical data science pipeline — for example, loading data from CSV/Excel files, cleaning and wrangling messy datasets, engineering predictive features, building models with AutoML, connecting to SQL databases, and producing visual outputs — all driven by natural language or programmatic instructions. ...

Downloads: 0 This Week

Last Update: 2026-01-26
See Project
3

RStudio

RStudio is an integrated development environment (IDE) for R

RStudio is a powerful, full-featured integrated development environment (IDE) tailored primarily for the R programming language but increasingly supportive of other languages like Python and Julia. It brings together console, editor, plotting, workspace, history, and file-management panes into a unified interface, helping data scientists, statisticians, and analysts to work more productively. The IDE is cross-platform: there are desktop versions for Windows, macOS and Linux, as well as a server version for remote or multi-user deployment via a web browser. In addition to code editing and execution, RStudio offers extensive support for reproducible research via R Markdown, notebooks, and integration with version control systems like Git and SVN. ...

Downloads: 80 This Week

Last Update: 2026-02-04
See Project
4

ggplot2

An implementation of the Grammar of Graphics in R

...ggplot2 is a part of the tidyverse, an ecosystem of R packages designed for data science.

Downloads: 23 This Week

Last Update: 2026-02-03
See Project
5

Quadratic

Data science spreadsheet with Python & SQL

Quadratic enables your team to work together on data analysis to deliver better results, faster. You already know how to use a spreadsheet, but you’ve never had this much power before. Quadratic is a Web-based spreadsheet application that runs in the browser and as a native app (via Electron). Our goal is to build a spreadsheet that enables you to pull your data from its source (SaaS, Database, CSV, API, etc) and then work with that data using the most popular data science tools today (Python, Pandas, SQL, JS, Excel Formulas, etc). ...

Downloads: 8 This Week

Last Update: 1 day ago
See Project
6

Positron

Positron, a next-generation data science IDE

Positron is a next-generation integrated development environment (IDE) created by Posit PBC (formerly RStudio Inc) specifically tailored for data science workflows in Python, R, and multi-language ecosystems. It aims to unify exploratory data analysis, production code, and data-app authoring in a single environment so that data scientists move from “question → insight → application” without switching tools. Built on the open-source Code-OSS foundation, Positron provides a familiar coding experience along with specialized panes and tooling for variable inspection, data-frame viewing, plotting previews, and interactive consoles designed for analytical work. ...

Downloads: 5 This Week

Last Update: 2026-02-10
See Project
7

Milvus

Vector database for scalable similarity search and AI applications

...Rich APIs designed for data science workflows. Consistent user experience across laptop, local cluster, and cloud. Embed real-time search and analytics into virtually any application. Milvus’ built-in replication and failover/failback features ensure data and applications can maintain business continuity in the event of a disruption. Component-level scalability makes it possible to scale up and down on demand.

Downloads: 9 This Week

Last Update: 2026-02-12
See Project
8

marimo

A reactive notebook for Python

...Run one cell and marimo reacts by automatically running affected cells, eliminating the error-prone chore of managing the notebook state. marimo's reactive UI elements, like data frame GUIs and plots, make working with data feel refreshingly fast, futuristic, and intuitive. Version with Git, run as Python scripts, import symbols from a notebook into other notebooks or Python files, and lint or format with your favorite tools. You'll always be able to reproduce your collaborators' results. Notebooks are executed in a deterministic order, with no hidden state: delete a cell and marimo deletes its variables while updating affected cells.
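
The reactive behavior described above can be illustrated with a minimal sketch in plain Python. This is not marimo's API or implementation: the `ReactiveNotebook` class and its `cell` and `run` methods are hypothetical names invented for this example, and cells are assumed to be registered in dependency order.

```python
class ReactiveNotebook:
    """Toy model of reactive execution (hypothetical, not marimo's API)."""

    def __init__(self):
        self.cells = {}  # variable name -> (function, names of variables it reads)
        self.state = {}  # variable name -> current value
        self.log = []    # order in which cells were executed

    def cell(self, defines, reads, fn):
        # Register (or redefine) the cell that produces the variable `defines`.
        self.cells[defines] = (fn, tuple(reads))

    def _downstream(self, name):
        # Collect `name` plus every cell that depends on it, directly or indirectly.
        affected = {name}
        changed = True
        while changed:
            changed = False
            for other, (_, reads) in self.cells.items():
                if other not in affected and affected.intersection(reads):
                    affected.add(other)
                    changed = True
        return affected

    def run(self, name):
        # Re-run the cell and its dependents exactly once each.
        # Assumption: registration order is a valid dependency order.
        affected = self._downstream(name)
        for cell_name, (fn, reads) in self.cells.items():
            if cell_name in affected:
                self.state[cell_name] = fn(*(self.state[r] for r in reads))
                self.log.append(cell_name)


nb = ReactiveNotebook()
nb.cell("x", [], lambda: 2)
nb.cell("y", ["x"], lambda x: x * 10)
nb.cell("z", ["x", "y"], lambda x, y: x + y)
nb.run("x")                     # runs x, then y and z because they depend on x

nb.cell("x", [], lambda: 5)     # edit the first cell
nb.run("x")                     # dependents are refreshed automatically
```

After the second `run`, `nb.state` holds values derived from the edited cell only, which is the "no hidden state" property in miniature.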

Downloads: 6 This Week

Last Update: 2026-02-12
See Project
9

cuDF

GPU DataFrame Library

Built on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. cuDF provides a pandas-like API that will be familiar to data engineers & data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming. For additional examples, browse our complete API documentation, or check out our more detailed notebooks. cuDF can be installed with conda (miniconda, or the full Anaconda distribution) from the rapidsai channel. cuDF is supported only on Linux, with Python versions 3.7 and later. ...

Downloads: 4 This Week

Last Update: 2026-02-05
See Project
10

XGBoost

Scalable and Flexible Gradient Boosting

...XGBoost implements machine learning algorithms under the Gradient Boosting framework. It offers parallel tree boosting (also known as GBDT, GBRT or GBM) that can quickly and accurately solve many data science problems. XGBoost can be used from Python, Java, Scala, R, C++ and more. It can run on a single machine, Hadoop, Spark, Dask, Flink and most other distributed environments, and scales to problems beyond billions of examples.

Downloads: 7 This Week

Last Update: 2026-02-10
See Project
11

Great Expectations

Always know what to expect from your data

Great Expectations helps data teams eliminate pipeline debt through data testing, documentation, and profiling. Software developers have long known that testing and documentation are essential for managing complex codebases. Great Expectations brings the same confidence, integrity, and acceleration to data science and data engineering teams. Expectations are assertions for data.
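
To make "assertions for data" concrete, here is a minimal sketch in plain Python. It does not use the Great Expectations library or its API: the helpers `expect_not_null`, `expect_between`, and `validate`, the `age` column, and the sample rows are all hypothetical, chosen only for illustration.

```python
def expect_not_null(rows, column):
    """Return the indexes of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def expect_between(rows, column, low, high):
    """Return the indexes of rows whose non-null value lies outside [low, high]."""
    return [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]


def validate(rows):
    # Each entry maps a human-readable expectation to the rows that violate it,
    # so the same definitions can serve as both a test and documentation.
    return {
        "age is not null": expect_not_null(rows, "age"),
        "age between 0 and 120": expect_between(rows, "age", 0, 120),
    }


# Hypothetical sample data: one clean row, one missing value, one outlier.
sample_rows = [{"age": 34}, {"age": None}, {"age": 150}]
report = validate(sample_rows)
```

An empty list for an expectation means every row passed; a pipeline could stop or raise an alert whenever any list is non-empty.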

Downloads: 1 This Week

Last Update: 2026-02-13
See Project
12

Metaflow

A framework for real-life data science

Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.

Downloads: 1 This Week

Last Update: 2026-02-09
See Project
13

PySyft

Data science on data without acquiring a copy

...Wherever your data wants to live in your ownership, the Syft ecosystem exists to help keep it there while allowing it to be used privately.

Downloads: 1 This Week

Last Update: 2025-02-13
See Project
14

Awesome Fraud Detection Research Papers

A curated list of data mining papers about fraud detection

A curated list of data mining papers about fraud detection from several conferences.

Downloads: 0 This Week

Last Update: 2026-01-05
See Project
15

Nuclio

High-Performance Serverless event and data processing platform

Nuclio is an open source and managed serverless platform used to minimize development and maintenance overhead and automate the deployment of data-science-based applications. Real-time performance running up to 400,000 function invocations per second. Portable across low-power devices, laptops, edge, on-prem and multi-cloud deployments. The first serverless platform supporting GPUs for optimized utilization and sharing. Automated deployment to production in a few clicks from a Jupyter notebook. ...

Downloads: 1 This Week

Last Update: 8 hours ago
See Project
16

NVIDIA Merlin

Library providing end-to-end GPU-accelerated recommender systems

...For more information, see NVIDIA Merlin on the NVIDIA developer website. Transform data (ETL) for preprocessing and engineering features. Accelerate your existing training pipelines in TensorFlow, PyTorch, or FastAI by leveraging optimized, custom-built data loaders. Scale large deep learning recommender models by distributing large embedding tables that exceed available GPU and CPU memory. Deploy data transformations and trained models to production with only a few lines of code.

Downloads: 0 This Week

Last Update: 2024-06-14
See Project
17

DearPyGui

Graphical User Interface Toolkit for Python with minimal dependencies

...DPG is well suited for creating simple user interfaces as well as developing complex and demanding graphical interfaces. DPG offers a solid framework for developing scientific, engineering, gaming, data science and other applications that require fast and interactive interfaces. The Tutorials will provide a great overview and links to each topic in the API Reference for more detailed reading. Complete theme and style control. GPU-based rendering and efficient C/C++ code.

Downloads: 2 This Week

Last Update: 2026-02-04
See Project
18

Dask

Parallel computing with task scheduling

...It integrates with familiar tools like NumPy, Pandas, and scikit-learn while enabling execution across cores or nodes with minimal code changes. Dask excels at handling large datasets that don’t fit into memory and is widely used in data science, machine learning, and big data pipelines.

Downloads: 0 This Week

Last Update: 2026-01-30
See Project
19

NannyML

Detecting silent model failure. NannyML estimates performance

NannyML is an open-source Python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface and interactive visualizations, is completely model-agnostic, and currently supports all tabular classification use cases. NannyML closes the loop with performance monitoring and post-deployment data science, empowering data scientists to quickly understand and automatically detect silent model failure. ...

Downloads: 0 This Week

Last Update: 2025-07-12
See Project
20

SageMaker Training Toolkit

Train machine learning models within Docker containers

Train machine learning models within a Docker container using Amazon SageMaker. Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. To train a model, you can include your training script and dependencies in a Docker container that runs your training code. A container provides an effectively isolated environment, ensuring a consistent runtime and reliable training process. ...

Downloads: 2 This Week

Last Update: 2025-09-22
See Project
21

the-turing-way

Book repository for The Turing Way

A community‑led open handbook and living documentation project from the Alan Turing Institute, providing best practices and open guidance for reproducible, ethical, collaborative data science and research.

Downloads: 0 This Week

Last Update: 2025-08-06
See Project
22

AWS SDK for pandas

Easy integration with Athena, Glue, Redshift, Timestream, Neptune

aws-sdk-pandas (formerly AWS Data Wrangler) bridges pandas with the AWS analytics stack so DataFrames flow seamlessly to and from cloud services. With a few lines of code, you can read from and write to Amazon S3 in Parquet/CSV/JSON/ORC, register tables in the AWS Glue Data Catalog, and query with Amazon Athena directly into pandas. The library abstracts efficient patterns like partitioning, compression, and vectorized I/O so you get performant data lake operations without hand-rolling boilerplate. ...

Downloads: 0 This Week

Last Update: 2026-02-05
See Project
23

targets

Function-oriented Make-like declarative workflows for R

The targets package is a pipeline / workflow management tool in R, designed to coordinate multi‐step computational workflows in data science / statistics. It tracks dependencies between “targets” (computational steps), skips steps whose upstream data or code hasn’t changed, supports parallel computation, branching (dynamic generation of sub‐targets), file format abstractions, and encourages reproducible and efficient analyses. It’s something like GNU Make for R, but more integrated. ...
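
The skip-if-unchanged idea can be sketched in a few lines. The example below is written in Python rather than R and does not reflect the targets package's API or internals: the `Pipeline` class, its `target` and `make` methods, and the fingerprinting scheme are hypothetical, and steps are assumed to be registered in dependency order.

```python
import hashlib
import pickle


class Pipeline:
    """Toy Make-like pipeline (hypothetical, not the targets API)."""

    def __init__(self):
        self.steps = {}  # name -> (function, upstream step names)
        self.cache = {}  # name -> (fingerprint, cached value)
        self.ran = []    # steps rebuilt by the most recent make()

    def target(self, name, fn, deps=()):
        self.steps[name] = (fn, tuple(deps))

    def _fingerprint(self, fn, inputs):
        # Approximate "code + upstream data" identity: bytecode, constants,
        # and pickled inputs. A real tool would track far more than this.
        code = fn.__code__
        payload = (code.co_code, repr(code.co_consts), pickle.dumps(inputs))
        return hashlib.sha256(repr(payload).encode()).hexdigest()

    def make(self):
        self.ran = []
        values = {}
        for name, (fn, deps) in self.steps.items():
            inputs = [values[d] for d in deps]
            fingerprint = self._fingerprint(fn, inputs)
            cached = self.cache.get(name)
            if cached is not None and cached[0] == fingerprint:
                values[name] = cached[1]          # up to date: skip the step
            else:
                values[name] = fn(*inputs)        # outdated: rebuild and cache
                self.cache[name] = (fingerprint, values[name])
                self.ran.append(name)
        return values


pipe = Pipeline()
pipe.target("raw", lambda: [1, 2, 3])
pipe.target("total", lambda raw: sum(raw), deps=["raw"])

first = pipe.make()
first_ran = list(pipe.ran)      # both steps run on the first build

second = pipe.make()
second_ran = list(pipe.ran)     # nothing changed, so nothing is rebuilt

pipe.target("raw", lambda: [1, 2, 3, 4])   # change the upstream step's code
third = pipe.make()
third_ran = list(pipe.ran)      # the change propagates to the downstream step
```

Fingerprinting the inputs as well as the code is what lets a downstream step be skipped when an upstream step is rebuilt but happens to produce the same value.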

Downloads: 0 This Week

Last Update: 2026-02-09
See Project
24

Recommenders

Best practices on recommendation systems

...Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. Please see the setup guide for more details on setting up your machine locally, on a data science virtual machine (DSVM) or on Azure Databricks. Independent or incubating algorithms and utilities are candidates for the contrib folder. ...

Downloads: 0 This Week

Last Update: 2024-12-23
See Project
25

ClearML

Streamline your ML workflow

...The ClearML Server stores experiment, model, and workflow data, and supports the Web UI experiment manager and ML-Ops automation for reproducibility and tuning. It is available as a hosted service and as open source for you to deploy your own ClearML Server. The ClearML Agent provides ML-Ops orchestration, experiment and workflow reproducibility, and scalability.

Downloads: 0 This Week

Last Update: 2026-01-25
See Project