Showing 1214 open source projects for "python data analysis"

View related business solutions
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 1
    Weights and Biases

    Weights and Biases

    Tool for visualizing and tracking your machine learning experiments

    Use W&B to build better models faster. Track and visualize all the pieces of your machine learning pipeline, from datasets to production models. Quickly identify model regressions. Use W&B to visualize results in real time, all in a central dashboard. Focus on the interesting ML. Spend less time manually tracking results in spreadsheets and text files. Capture dataset versions with W&B Artifacts to identify how changing data affects your resulting models. Reproduce any model, with saved...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 2
    Griptape

    Griptape

    Python framework for AI workflows and pipelines with chain of thought

    The Griptape framework provides developers with the ability to create AI systems that operate across two dimensions: predictability and creativity. For predictability, Griptape enforces structures like sequential pipelines, DAG-based workflows, and long-term memory. To facilitate creativity, Griptape safely prompts LLMs with tools (keeping output data off prompt by using short-term memory), which connects them to external APIs and data stores. The framework allows developers to transition...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 3
    RecBole

    RecBole

    A unified, comprehensive and efficient recommendation library

    A unified, comprehensive and efficient recommendation library. We design general and extensible data structures to unify the formatting and usage of various recommendation datasets. We implement more than 100 commonly used recommendation algorithms and provide formatted copies of 28 recommendation datasets. We support a series of widely adopted evaluation protocols or settings for testing and comparing recommendation algorithms.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    fastai

    fastai

    Deep learning library

    ...It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying patterns of many deep learning and data processing techniques in terms of decoupled abstractions. These abstractions can be expressed concisely and clearly by leveraging the dynamism of the underlying Python language and the flexibility of the PyTorch library. fastai is organized around two main design goals: to be approachable and rapidly productive, while also being deeply hackable and configurable. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 5
    gTTS

    gTTS

    Python library and CLI tool to interface with Google Translate

    gTTS (Google Text-to-Speech) is a Python library and command-line tool that wraps the speech functionality of Google Translate. It lets you send text to the Google Translate TTS endpoint and receive spoken audio back as MP3 data, either written to a file, a file-like object, or standard output. The library is designed to handle long texts, using a speech-specific sentence tokenizer that keeps intonation and punctuation natural while splitting requests into acceptable chunks. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 6
    Pedalboard

    Pedalboard

    A Python library for audio

    pedalboard is a Python library for working with audio: reading, writing, rendering, adding effects, and more. It supports the most popular audio file formats and a number of common audio effects out of the box and also allows the use of VST3® and Audio Unit formats for loading third-party software instruments and effects. pedalboard was built by Spotify’s Audio Intelligence Lab to enable using studio-quality audio effects from within Python and TensorFlow.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 7
    OpenMLDB

    OpenMLDB

    OpenMLDB is an open-source machine learning database

    ...However, a feature engineering script developed by data scientists (Python scripts in most cases) cannot be directly deployed into production for online inference because it usually cannot meet the engineering requirements, such as low latency, high throughput and high availability.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 8
    OpenMemory

    OpenMemory

    Local long-term memory engine for AI apps with persistent storage

    OpenMemory is a self-hosted memory engine designed to provide long-term, persistent storage for AI and LLM-powered applications. It enables developers to give otherwise stateless models a structured memory layer that can store, retrieve, and manage contextual information over time. OpenMemory is built around a hierarchical memory architecture that organizes data into semantic sectors and connects them through a graph-based structure for efficient retrieval. It supports multiple embedding...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    BrowserOS

    BrowserOS

    Agentic browser; privacy-first alternative to ChatGPT Atlas

    BrowserOS is an open-source, agentic web browser built on a Chromium base that integrates AI agents directly into the browsing experience. Rather than just doing standard browsing, it places AI intelligence at the core: you can connect your own API keys (for e.g., OpenAI, Anthropic, Google Gemini) or run local models (via e.g., Ollama) so that your browsing data and automation stay on your machine — privacy and control are emphasized throughout. The interface remains familiar to users of...
    Downloads: 35 This Week
    Last Update:
    See Project
  • Train ML Models With SQL You Already Know Icon
    Train ML Models With SQL You Already Know

    BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

    Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
    Try Free
  • 10
    MegaParse

    MegaParse

    File Parser optimised for LLM Ingestion with no loss

    MegaParse is a file parser optimized for Large Language Model (LLM) ingestion, ensuring no loss of information. It efficiently parses various document formats, such as PDFs, DOCX, and PPTX, converting them into formats ideal for processing by LLMs. This tool is essential for applications that require accurate and comprehensive data extraction from diverse document types.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Matrix

    Matrix

    Multi-Agent daTa geneRation Infra and eXperimentation framework

    Matrix is a distributed, large-scale engine for multi-agent synthetic data generation and experiments: it provides the infrastructure to run thousands of “agentic” workflows concurrently (e.g. multiple LLMs interacting, reasoning, generating content, data-processing pipelines) by leveraging distributed computing (like Ray + cluster management). The idea is to treat data generation as a “data-to-data” transformation: each input item defines a task, and the runtime orchestrates asynchronous,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    CodiumAI PR-Agent

    CodiumAI PR-Agent

    AI-Powered tool for automated pull request analysis

    CodiumAI PR-Agent is an open-source tool aiming to help developers review pull requests faster and more efficiently. It automatically analyzes the pull request and can provide several types of commands. See the Usage Guide for instructions how to run the different tools from CLI, online usage, Or by automatically triggering them when a new PR is opened. You can try GPT-4 powered PR-Agent, on your public GitHub repository, instantly. Just mention @CodiumAI-Agent and add the desired command in...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 13
    Cube Studio

    Cube Studio

    Cube Studio open source cloud native one-stop machine learning

    Cube Studio is an open-source, cloud-native end-to-end machine learning and AI platform designed to support the full lifecycle of AI development — from data preparation and interactive notebook coding to distributed training, model tuning, and deployment in production-ready environments. It provides a unified interface where teams can manage data sources, track datasets, and build pipelines using drag-and-drop workflow orchestration, making it accessible for both engineers and data...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    CoPaw

    CoPaw

    Your Personal AI Assistant; easy to install, deploy on local or coud

    CoPaw is a personal AI assistant designed to run on your own machine or in the cloud, giving you full control over memory, models, and data. Built by the AgentScope team, it connects to multiple chat platforms—including DingTalk, Feishu, QQ, Discord, iMessage, and more—through a single unified assistant. CoPaw supports both cloud-based LLM providers and fully local models such as llama.cpp, MLX, and Ollama, allowing you to operate without API keys if preferred. It includes a browser-based...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 15
    TorchAudio

    TorchAudio

    Data manipulation and transformation for audio signal processing

    The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration, having a focus on trainable features through the autograd system, and having consistent style (tensor names and dimension names). Therefore, it is primarily a machine learning library and not a general signal processing library. The benefits of PyTorch can be seen in torchaudio through having all the computations be through PyTorch...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    Label Sleuth

    Label Sleuth

    Open source no-code system for text annotation and building of text

    An open-source no-code system for text annotation and building text classifiers. No AI knowledge needed. From task definition to working model in just a few hours! While domain experts label their data, Label Sleuth automatically trains in the background-appropriate machine learning models. To avoid wasted labeling effort, Label Sleuth employs active learning techniques to guide the user in what they should be labeled next. Domain experts can quickly start labeling their data through an...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 17
    Marvin

    Marvin

    A batteries-included library for building AI-powered software

    Meet Marvin: a batteries-included library for building AI-powered software. Marvin's job is to integrate AI directly into your codebase by making it look and feel like any other function. Marvin introduces a new concept called AI Functions. These functions differ from conventional ones in that they don’t rely on source code, but instead generate their outputs on-demand through AI. With AI functions, you don't have to write complex code for tasks like extracting entities from web pages,...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 18
    TensorFlow

    TensorFlow

    TensorFlow is an open source library for machine learning

    Originally developed by Google for internal use, TensorFlow is an open source platform for machine learning. Available across all common operating systems (desktop, server and mobile), TensorFlow provides stable APIs for Python and C as well as APIs that are not guaranteed to be backwards compatible or are 3rd party for a variety of other languages. The platform can be easily deployed on multiple CPUs, GPUs and Google's proprietary chip, the tensor processing unit (TPU). TensorFlow...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 19
    LongBench

    LongBench

    LongBench v2 and LongBench (ACL 25'&24')

    LongBench is a comprehensive benchmark designed to evaluate the ability of large language models to understand and reason over very long textual contexts. Traditional language model benchmarks typically evaluate tasks involving relatively short inputs, which does not reflect many real-world applications such as analyzing large documents or entire code repositories. LongBench addresses this gap by providing datasets that require models to process and reason over long sequences of text across...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    EverMemOS

    EverMemOS

    Long-term memory OS for AI with structured recall and context awarenes

    EverMemOS is an open-source memory operating system built to give AI agents long-term, structured memory. It captures conversations, transforms them into organized memory units, and enables agents to recall past interactions with context and meaning. Instead of treating each prompt independently, it builds evolving user profiles, tracks preferences, and connects related events into coherent narratives. Its architecture combines memory storage, indexing, and retrieval with agent-level...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 21
    VIPER

    VIPER

    AI-powered red team platform for adversary simulation toolkit

    Viper is a comprehensive red teaming and adversary simulation platform designed to support cybersecurity professionals in conducting advanced security assessments. It integrates a wide range of tools and capabilities required for penetration testing, post-exploitation, and attack simulation workflows into a unified environment. Viper emphasizes ease of use through a graphical interface, allowing users to manage complex operations without relying solely on command-line tools. It includes a...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    GluonTS

    GluonTS

    Probabilistic time series modeling in Python

    ...We split the dataset into train and test parts, by removing the last three years (36 months) from the train data. Thus, we will train a model on just the first nine years of data. Python has the notion of extras – dependencies that can be optionally installed to unlock certain features of a package. We make extensive use of optional dependencies in GluonTS to keep the amount of required dependencies minimal. To still allow users to opt-in to certain features, we expose many extra dependencies.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    E2M

    E2M

    E2M converts various file types (doc, docx, epub, html, htm, url

    E2M is a SourceForge mirror of the e2m open-source project, which focuses on providing tools or services designed to convert or process content between different formats or systems. Projects with similar naming conventions typically emphasize automation workflows where input data from one environment is transformed into another representation or output structure. The mirrored repository allows users to access the project’s codebase independently from its original hosting platform while...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    stt

    stt

    Voice Recognition to Text Tool

    stt is a standalone speech recognition tool that locally converts spoken content in audio or video files into textual formats without requiring internet access, giving users control over their data and reducing reliance on external APIs. It leverages open-source speech models such as Faster-Whisper to recognize and transcribe human speech into plain text, structured JSON objects, or subtitle files with time codes, making it suitable for both personal and professional transcription tasks. The project is designed to be easy to deploy: you can run a local Python server that exposes an HTTP API for uploading audio/video files and retrieving transcriptions in different formats. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 25
    Avalanche

    Avalanche

    End-to-End Library for Continual Learning based on PyTorch

    Avalanche is an end-to-end Continual Learning library based on Pytorch, born within ContinualAI with the unique goal of providing a shared and collaborative open-source (MIT licensed) codebase for fast prototyping, training and reproducible evaluation of continual learning algorithms. Avalanche can help Continual Learning researchers in several ways. This module maintains a uniform API for data handling: mostly generating a stream of data from one or more datasets. It contains all the major...
    Downloads: 3 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB