Showing 1057 open source projects for "python data analysis"

View related business solutions
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    Get Physics Done (GPD)

    Get Physics Done (GPD)

    The first open-source agentic AI physicist

    Get Physics Done (GPD) is an open-source project designed to accelerate scientific research in physics by leveraging modern computational tools and automation techniques. It aims to simplify the process of performing simulations, calculations, and experimental analysis by providing structured workflows that integrate computational physics methods with reproducible research practices. The project focuses on reducing the friction involved in setting up experiments, running simulations, and...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    DeiT (Data-efficient Image Transformers)
    DeiT (Data-efficient Image Transformers) shows that Vision Transformers can be trained competitively on ImageNet-1k without external data by using strong training recipes and knowledge distillation. Its key idea is a specialized distillation strategy—including a learnable “distillation token”—that lets a transformer learn effectively from a CNN or transformer teacher on modest-scale datasets. The project provides compact ViT variants (Tiny/Small/Base) that achieve excellent...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    SpeechRecognition

    SpeechRecognition

    Speech recognition module for Python

    ...The first software requirement is Python 2.6, 2.7, or Python 3.3+. This is required to use the library. PyAudio is required if and only if you want to use microphone input (Microphone). PyAudio version 0.2.11+ is required, as earlier versions have known memory management bugs when recording from microphones in certain situations. To hack on this library, first make sure you have all the requirements listed in the "Requirements" section.
    Downloads: 20 This Week
    Last Update:
    See Project
  • 4
    Airtable MCP

    Airtable MCP

    Airtable integration for AI-powered applications

    Airtable MCP is an integration tool that enables AI-powered applications to access and manipulate Airtable databases directly from the IDE using Anthropic's Model Context Protocol (MCP). It allows querying, creating, updating, and deleting records using natural language, facilitating seamless data management. ​
    Downloads: 4 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 5
    MineContext

    MineContext

    MineContext is your proactive context-aware AI partner

    MineContext is an open-source, proactive AI assistant designed to capture, understand, and leverage a user’s digital context in order to provide meaningful insights, summaries, and productivity support. The system continuously collects contextual data from sources such as screenshots and user activity, then processes and organizes this information into structured knowledge that can be reused later. Unlike traditional chat-based assistants, MineContext operates in the background and delivers...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 6
    Strix

    Strix

    Open-source AI hackers to find and fix your app’s vulnerabilities

    Strix is an open source agent-driven security platform that uses autonomous AI agents to identify, investigate, and validate vulnerabilities in software applications. The system is designed to mimic the behavior of real attackers by executing dynamic testing and verifying findings through proof-of-concept exploitation. Unlike traditional vulnerability scanners that rely heavily on static analysis, Strix agents actively run code, probe systems, and attempt exploitation to confirm whether...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 7
    Robyn

    Robyn

    Experimental, AI/ML-powered and open sourced Marketing Mix Modeling

    Robyn is an open-source, AI/ML-powered Marketing Mix Modeling (MMM) toolkit developed by Meta Marketing Science under the “facebookexperimental” GitHub umbrella. Its goal is to democratize rigorous MMM: what traditionally required expert statisticians and expensive consulting becomes accessible to any company with data. Robyn takes in historical data (spends on different marketing channels, conversions, or revenue, and optional context or organic-media variables) and uses a combination of...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 8
    Petastorm

    Petastorm

    Petastorm library enables single machine or distributed training

    ...It can also be used from pure Python code. A dataset created using Petastorm is stored in Apache Parquet format. On top of a Parquet schema, petastorm also stores higher-level schema information that makes multidimensional arrays into a native part of a petastorm dataset. Petastorm supports extensible data codecs. These enable a user to use one of the standard data compressions (jpeg, png) or implement her own.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    SLM Lab

    SLM Lab

    Modular Deep Reinforcement Learning framework in PyTorch

    SLM Lab is a modular and extensible deep reinforcement learning framework designed for research and practical applications. It provides implementations of various state-of-the-art RL algorithms and emphasizes reproducibility, scalability, and detailed experiment tracking. SLM Lab is structured around a flexible experiment management system, allowing users to define, run, and analyze RL experiments efficiently.
    Downloads: 6 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 10
    ToolUniverse

    ToolUniverse

    Democratizing AI scientists with ToolUniverse

    ...Instead of requiring custom pipelines or fine-tuning, ToolUniverse wraps around existing models and enables them to reason, experiment, and iterate on complex workflows such as drug discovery, data analysis, and hypothesis testing. The platform abstracts tool usage behind a consistent interface, allowing AI agents to compose multi-step workflows, refine tool definitions automatically, and even generate new tools from natural language descriptions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    notebooklm-py

    notebooklm-py

    Unofficial Python API and agentic skill for Google NotebookLM

    ...The project covers notebook management, source ingestion, conversational querying, research workflows, and sharing controls, while also enabling the generation of a wide range of study and media artifacts. These outputs include audio overviews, videos, slide decks, infographics, quizzes, flashcards, reports, data tables, and mind maps, with configurable formats and export options.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 12
    Sage Chat

    Sage Chat

    Chat with any codebase in under two minutes | Fully local

    Sage is an open-source AI developer assistant designed to help engineers understand and work with complex codebases more effectively. The tool functions similarly to an intelligent research agent that can analyze a repository and answer questions about how the software works. Instead of focusing solely on code generation, Sage emphasizes code comprehension, system architecture analysis, and integration guidance. Developers can ask natural language questions about a project, and the system...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Trail of Bits Skills Marketplace

    Trail of Bits Skills Marketplace

    Trail of Bits Claude Code skills for security research, vulnerability

    Trail of Bits Skills Marketplace is a specialized Claude Code skills marketplace built by the security research firm Trail of Bits that focuses on enhancing AI-assisted workflows for vulnerability discovery, testing, and secure development. The repository groups a set of plug-in skills tailored toward static analysis, code auditing, secure defaults detection, and other practices that matter in software security. Users can easily add the marketplace to a Claude Code environment, browse...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Adversarial Robustness Toolbox

    Adversarial Robustness Toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for ML security

    Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to evaluate, defend, certify and verify Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference. ART supports all popular machine learning frameworks (TensorFlow, Keras, PyTorch, MXNet, sci-kit-learn, XGBoost, LightGBM, CatBoost, GPy, etc.), all data types (images, tables, audio, video, etc.) and machine learning tasks (classification, object detection, generation, certification, etc.).
    Downloads: 9 This Week
    Last Update:
    See Project
  • 15
    Unstructured.IO

    Unstructured.IO

    Open source libraries and APIs to build custom preprocessing pipelines

    The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. The use cases of unstructured revolve around streamlining and optimizing the data processing workflow for LLMs. unstructured modular bricks and connectors form a cohesive system that simplifies data ingestion and pre-processing, making it adaptable to different platforms and is efficient in transforming unstructured data into...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 16
    Vanna

    Vanna

    Chat with your SQL database

    Vanna.AI is an AI-powered tool for natural language database querying, enabling users to interact with databases using simple English queries. It converts natural language questions into SQL queries, making data access more intuitive for non-technical users.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 17
    Code-Graph-RAG

    Code-Graph-RAG

    The ultimate RAG for your monorepo

    Code-Graph-RAG is an advanced retrieval-augmented generation system designed specifically for understanding and interacting with large, multi-language codebases by transforming them into structured knowledge graphs. It uses Tree-sitter to parse source code into abstract syntax trees, extracting relationships between functions, classes, and modules to build a graph-based representation of the entire codebase. This structured approach enables more accurate and context-aware querying compared...
    Downloads: 21 This Week
    Last Update:
    See Project
  • 18
    MLRun

    MLRun

    Machine Learning automation and tracking

    MLRun is an open MLOps framework for quickly building and managing continuous ML and generative AI applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications, significantly reducing engineering efforts, time to production, and computation resources. MLRun breaks the silos between data, ML, software, and DevOps/MLOps teams, enabling collaboration and fast continuous...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 19
    refinery

    refinery

    Open-source choice to scale, assess and maintain natural language data

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact. You are one of the people we've built refinery for. refinery helps you to build better NLP models in a data-centric approach. Semi-automate your labeling, find low-quality subsets in your training data, and monitor your data in one place. refinery doesn't get rid of manual labeling, but it makes sure that your valuable time is spent well. Also, the makers...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    SurfSense

    SurfSense

    Connect any LLM to your internal knowledge sources

    SurfSense is an open-source AI research and knowledge assistant platform that connects any large language model to internal knowledge sources so teams and individuals can explore, query, and collaborate on insights in real time. Built as an alternative to proprietary tools like NotebookLM, Perplexity, and Glean, SurfSense allows integrations with a wide range of external data sources including Slack, Notion, Google Drive, GitHub, YouTube, and many enterprise systems, making it possible to...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 21
    ChatDev

    ChatDev

    Create Customized Software using Natural Language Idea

    ChatDev is an AI-powered development tool designed to simulate the software development lifecycle using multi-agent collaboration. It allows multiple AI agents to take on roles such as product managers, developers, and testers to collaboratively generate, refine, and evaluate software code. This project explores how AI can be leveraged to automate and optimize development workflows.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 22
    JSON_REPAIR

    JSON_REPAIR

    A python module to repair invalid JSON from LLMs

    json_repair is an open-source Python library designed to automatically fix malformed JSON data and convert it into valid, parseable structures. The tool is particularly useful in scenarios where JSON output is generated by large language models or external services that may produce syntactically invalid responses. Instead of failing when encountering errors such as missing quotes, trailing commas, or incomplete objects, the library analyzes the malformed data and reconstructs it into valid JSON. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    NeMo Curator

    NeMo Curator

    Scalable data pre processing and curation toolkit for LLMs

    NeMo Curator is a Python library specifically designed for fast and scalable dataset preparation and curation for large language model (LLM) use-cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT) and paramter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    StatsForecast

    StatsForecast

    Fast forecasting with statistical and econometric models

    ...The library implements a broad set of models, including AutoARIMA, ETS, CES, Theta, plus a battery of benchmarking and baseline methods, giving users flexibility in selecting forecasting approaches depending on data characteristics (trend, seasonality, intermittent demand, etc.). Its internal implementation leverages numba to compile performance-critical code to optimized machine-level instructions, which makes the models much faster than many traditional Python counterparts.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 25
    Hazm

    Hazm

    Persian NLP Toolkit

    Hazm is a natural language processing (NLP) library for Persian text, offering various tools for text preprocessing, tokenization, part-of-speech tagging, and more.
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB