Showing 1058 open source projects for "python data analysis"

  • 1
    LitterBox

    A secure sandbox environment for malware developers and red teamers

    LitterBox is a controlled malware-analysis and payload-testing sandbox aimed at red teams who need to validate evasions and behaviors before deployment. It provides an isolated environment to exercise payloads against modern detection stacks, verify signatures and heuristics, and observe runtime characteristics without leaking binaries to third-party vendors. The README frames typical use cases: testing evasion, validating detections, analyzing behavior, and keeping sensitive tooling...
    Downloads: 1 This Week
  • 2
    Memori

    SQL-native memory layer enabling persistent context for AI agents

    Memori is an open source SQL-native memory engine designed to add persistent memory capabilities to AI applications, large language models, and multi-agent systems. It provides a memory layer that automatically captures conversations and interactions between users and AI models, allowing systems to retain knowledge across sessions instead of operating statelessly. It extracts structured information such as facts, preferences, rules, and summaries from interactions and stores them in standard...
    Downloads: 4 This Week
  • 3
    Hamilton DAGWorks

    Helps scientists define testable, modular, self-documenting dataflow

    Hamilton is a lightweight Python library for directed acyclic graphs (DAGs) of data transformations. Your DAG is portable; it runs anywhere Python runs, whether it's a script, notebook, Airflow pipeline, FastAPI server, etc. Your DAG is expressive; Hamilton has extensive features to define and modify the execution of a DAG (e.g., data validation, experiment tracking, remote execution).
    Downloads: 0 This Week
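    Hamilton's idea of declaring a dataflow as plain functions can be illustrated in a few lines. In the minimal sketch below, a function's name is the node it produces and its parameter names are the nodes or inputs it depends on. The `execute` resolver and the example node functions are hypothetical and are not Hamilton's driver API, which additionally offers features such as data validation, experiment tracking, and remote execution.

```python
import inspect
from typing import Any, Callable, Dict

# Hypothetical node functions: the function name is the output it produces,
# and each parameter name refers to another node or to a user-supplied input.
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups

def spend_per_signup_pct(spend_per_signup: float, target: float) -> float:
    return 100.0 * spend_per_signup / target

def execute(funcs: Dict[str, Callable[..., Any]], output: str,
            inputs: Dict[str, Any]) -> Any:
    """Toy resolver: compute `output` by recursively resolving dependencies (memoized)."""
    cache: Dict[str, Any] = dict(inputs)

    def resolve(name: str, visiting: frozenset) -> Any:
        if name in cache:
            return cache[name]
        if name not in funcs:
            raise KeyError(f"no node or input named {name!r}")
        if name in visiting:
            raise ValueError(f"cycle detected at {name!r}")
        fn = funcs[name]
        # Parameter names of the function are its upstream dependencies.
        kwargs = {p: resolve(p, visiting | {name})
                  for p in inspect.signature(fn).parameters}
        cache[name] = fn(**kwargs)
        return cache[name]

    return resolve(output, frozenset())

nodes = {f.__name__: f for f in (spend_per_signup, spend_per_signup_pct)}
print(execute(nodes, "spend_per_signup_pct",
              {"spend": 200.0, "signups": 10, "target": 40.0}))
```

    Because dependencies are read from function signatures, each node stays an ordinary, unit-testable Python function, which is what makes such dataflows portable between scripts, notebooks, and services.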
  • 4
    Physical Symbolic Optimization (Φ-SO)

    Physical Symbolic Optimization

    Physical Symbolic Optimization (Φ-SO) is a symbolic optimization package built for physics. Its symbolic regression module uses deep reinforcement learning to infer analytical physical laws that fit data points by searching the space of functional forms.
    Downloads: 0 This Week
  • 5
    nesa

    Run AI models end-to-end encrypted

    nesa is an open-source initiative focused on building decentralized AI infrastructure that enables secure, verifiable, and privacy-preserving machine learning and inference across distributed environments. The project aims to address key challenges in modern AI systems, such as data privacy, trust, and centralization, by leveraging cryptographic techniques and decentralized architectures. nesa allows developers to run AI computations in a way that ensures data integrity and confidentiality,...
    Downloads: 2 This Week
  • 6
    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load data from 160+ sources and formats, ranging from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.
    Downloads: 2 This Week
  • 7
    Superduper

    Superduper: Integrate AI models and machine learning workflows

    Superduper is a Python-based framework for building end-to-end AI-data workflows and applications on your own data, integrating with major databases. It supports current technologies and techniques, including LLMs, vector search, RAG, and multimodality, as well as classical AI and ML paradigms. Developers can use Superduper by building compositional, declarative objects that outsource the details of deployment, orchestration, versioning, and more to the Superduper engine. ...
    Downloads: 0 This Week
  • 8
    Featuretools

    An open source Python library for automated feature engineering

    An open source Python framework for automated feature engineering. Featuretools automatically creates features from temporal and relational datasets using Deep Feature Synthesis (DFS). You can combine your raw data with what you know about your data to build meaningful features for machine learning and predictive modeling. Featuretools provides APIs to ensure only valid data is used for calculations, keeping your feature vectors safe from common label leakage problems. ...
    Downloads: 0 This Week
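    Deep Feature Synthesis stacks aggregation primitives such as COUNT, SUM, and MEAN across parent-child relationships in the data. The sketch below computes a single level of such aggregations in plain Python for a hypothetical `transactions` child table keyed by customer ID; the feature names only loosely imitate Featuretools' naming, and the code does not use the Featuretools API. The `cutoff` parameter illustrates the idea of using only rows observed before a prediction time, which is one way label leakage is avoided.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical child table: (customer_id, timestamp, amount).
transactions: List[Tuple[str, int, float]] = [
    ("a", 1, 10.0), ("a", 2, 30.0), ("a", 9, 100.0),
    ("b", 3, 5.0),
]

def aggregate_features(rows: List[Tuple[str, int, float]],
                       cutoff: int) -> Dict[str, Dict[str, float]]:
    """Build per-customer COUNT/SUM/MEAN features from rows strictly before `cutoff`."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for customer_id, timestamp, amount in rows:
        if timestamp < cutoff:  # rows at or after the cutoff would leak future information
            grouped[customer_id].append(amount)
    return {
        cid: {
            "COUNT(transactions)": float(len(amounts)),
            "SUM(transactions.amount)": sum(amounts),
            "MEAN(transactions.amount)": mean(amounts),
        }
        for cid, amounts in grouped.items()
    }

print(aggregate_features(transactions, cutoff=5)["a"])
```

    Customers with no rows before the cutoff are simply omitted in this sketch; a fuller implementation would also stack primitives over several relationship levels.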
  • 9
    AutoResearchClaw

    Autonomous research from idea to paper. Chat an Idea. Get a Paper 🦞

    AutoResearchClaw is an open-source framework designed to automatically generate full academic research papers from a single idea or topic. Built in Python, it orchestrates a multi-stage research pipeline that gathers literature, formulates hypotheses, runs experiments, analyzes results, and writes the final paper. The system retrieves real academic references from sources such as arXiv and Semantic Scholar to ensure credible citations. It can automatically generate code for experiments, run...
    Downloads: 31 This Week
  • 10
    TimeMixer

    Decomposable Multiscale Mixing for Time Series Forecasting

    TimeMixer is a deep learning framework designed for advanced time series forecasting and analysis using a multiscale neural architecture. The model focuses on decomposing time series data into multiple temporal scales in order to capture both short-term seasonal patterns and long-term trends. Instead of relying on traditional recurrent or transformer-based architectures, TimeMixer is implemented as a fully multilayer perceptron–based model that performs temporal mixing across different resolutions of the data. ...
    Downloads: 0 This Week
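    The multiscale decomposition that TimeMixer builds on, viewing one series at several temporal resolutions and splitting each into seasonal and trend parts, can be sketched without any deep-learning code. This sketch assumes average pooling for downsampling and a centered moving average for the trend, a common decomposition choice; it shows only this preprocessing idea, not TimeMixer's learned MLP-based mixing layers.

```python
from typing import List, Tuple

def downsample(x: List[float], factor: int = 2) -> List[float]:
    """Average-pool non-overlapping windows to obtain a coarser time scale."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]

def moving_average(x: List[float], k: int = 3) -> List[float]:
    """Centered moving average; edges are padded by repeating end values (k odd)."""
    half = k // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [sum(padded[i:i + k]) / k for i in range(len(x))]

def decompose(x: List[float], k: int = 3) -> Tuple[List[float], List[float]]:
    """Split a series into (seasonal, trend), where seasonal = x - trend."""
    trend = moving_average(x, k)
    seasonal = [v - t for v, t in zip(x, trend)]
    return seasonal, trend

def multiscale_decompose(x: List[float], num_scales: int = 3, k: int = 3):
    """Decompose the series at its original scale and at successively coarser ones."""
    scales = [list(x)]
    for _ in range(num_scales - 1):
        scales.append(downsample(scales[-1]))
    return [decompose(s, k) for s in scales]

series = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0, 4.0, 6.0]
for seasonal, trend in multiscale_decompose(series):
    print(len(trend), [round(t, 2) for t in trend])
```

    The sketch does not guard against series that become empty after repeated downsampling, so `num_scales` must stay small relative to the series length.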
  • 11
    deepjazz

    Deep learning driven jazz generation using Keras & Theano

    deepjazz is a deep learning project that generates jazz music using recurrent neural networks trained on MIDI files. The repository demonstrates how machine learning can learn musical structure and produce original compositions. It uses the Keras and Theano libraries to build a two-layer Long Short-Term Memory network capable of learning temporal patterns in music. The system analyzes musical sequences from an input MIDI file and then generates new musical notes that follow similar stylistic...
    Downloads: 0 This Week
  • 12
    X-AnyLabeling

    Effortless data labeling with AI support from Segment Anything

    X-AnyLabeling is an open-source data annotation platform designed to streamline the process of labeling datasets for computer vision and multimodal AI applications. The software integrates an AI-powered labeling engine that allows users to generate annotations automatically with the assistance of modern vision models such as Segment Anything and various object detection frameworks. It supports labeling tasks across images and videos and enables developers to prepare training datasets for...
    Downloads: 23 This Week
  • 13
    gensim

    Topic Modelling for Humans

    Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.
    Downloads: 2 This Week
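    Similarity retrieval over a corpus is commonly done by ranking documents by the cosine similarity of their TF-IDF vectors. The pure-Python sketch below shows that idea on a hypothetical three-document corpus of pre-tokenized text. It is not gensim's API, which provides its own dictionary, TF-IDF model, and similarity-index classes designed for large corpora.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]

def idf_table(docs: List[List[str]]) -> Vector:
    """Inverse document frequency: log(N / df) for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf(doc: List[str], idf: Vector) -> Vector:
    """Raw term frequency times IDF; terms unseen in the corpus are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query: List[str], docs: List[List[str]]) -> List[Tuple[int, float]]:
    """Return (doc_index, similarity) pairs, most similar first."""
    idf = idf_table(docs)
    q = tfidf(query, idf)
    scores = [(i, cosine(q, tfidf(d, idf))) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

docs = [["topic", "model", "corpus"],
        ["vector", "space", "similarity"],
        ["topic", "similarity", "retrieval"]]
print(rank(["topic", "retrieval"], docs))
```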
  • 14
    Cleanlab

    The standard data-centric AI package for data quality and ML

    cleanlab helps you clean data and labels by automatically detecting issues in an ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms, published in an accompanying paper and blog post. See some of the datasets cleaned with cleanlab at labelerrors.com. This package helps you...
    Downloads: 1 This Week
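    The intuition behind confident learning can be shown with per-class thresholds: an example is flagged when the model confidently predicts a class other than its given label. The sketch below is a simplified, hypothetical version of that idea; it skips the confident-joint counting and calibration of the full method and does not call the cleanlab API. It assumes `pred_probs` are out-of-sample predicted probabilities, for example from cross-validation.

```python
from typing import List, Sequence

def per_class_thresholds(pred_probs: Sequence[Sequence[float]],
                         labels: Sequence[int], num_classes: int) -> List[float]:
    """t_j = mean predicted probability of class j over examples labeled j."""
    thresholds = []
    for j in range(num_classes):
        probs_j = [p[j] for p, y in zip(pred_probs, labels) if y == j]
        # Assumption: a class with no labeled examples gets the strictest threshold.
        thresholds.append(sum(probs_j) / len(probs_j) if probs_j else 1.0)
    return thresholds

def find_suspect_labels(pred_probs: Sequence[Sequence[float]],
                        labels: Sequence[int]) -> List[int]:
    """Return indices whose confidently predicted class differs from the given label."""
    num_classes = len(pred_probs[0])
    t = per_class_thresholds(pred_probs, labels, num_classes)
    suspects = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        confident = [j for j in range(num_classes) if p[j] >= t[j]]
        if not confident:
            continue  # not confidently any class: no evidence either way
        likely = max(confident, key=lambda j: p[j])
        if likely != y:
            suspects.append(i)
    return suspects

# Hypothetical out-of-sample predicted probabilities and given labels.
pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.95, 0.05]]
labels = [0, 0, 1, 1, 1]
print(find_suspect_labels(pred_probs, labels))  # → [4]
```

    Using per-class thresholds rather than a single global cutoff accounts for classes on which the model is systematically more or less confident.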
  • 15
    CTGAN

    Conditional GAN for generating synthetic tabular data

    CTGAN is a collection of deep-learning-based synthetic data generators for single-table data, which can learn from real data and generate synthetic data with high fidelity. If you're just getting started with synthetic data, we recommend installing the SDV library, which provides user-friendly APIs for accessing CTGAN. The SDV library provides wrappers for preprocessing your data as well as additional usability features like constraints. When using the CTGAN library directly, you may...
    Downloads: 0 This Week
  • 16
    OpenDataMCP

    Connect any Open Data to any LLM with Model Context Protocol

    An initiative aimed at connecting open datasets to Large Language Models (LLMs) using the Model Context Protocol, facilitating seamless access and integration of public data into AI applications.
    Downloads: 0 This Week
  • 17
    Transformer Debugger

    Tool for exploring and debugging transformer model behaviors

    Transformer Debugger (TDB) is a research tool developed by OpenAI’s Superalignment team to investigate and interpret the behaviors of small language models. It combines automated interpretability methods with sparse autoencoders, enabling researchers to analyze how specific neurons, attention heads, and latent features contribute to a model’s outputs. TDB allows users to intervene directly in the forward pass of a model and observe how such interventions change predictions, making it...
    Downloads: 1 This Week
  • 18
    OpenAdapt

    Open Source Generative Process Automation

    OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web Graphical User Interfaces (GUIs). OpenAdapt learns to automate your desktop and web workflows by observing your demonstrations. Spend less time on repetitive tasks and more on work that truly matters. Boost team productivity in HR operations. Automate candidate sourcing using LinkedIn Recruiter, LinkedIn Talent Solutions, GetProspect, Reply.io, outreach.io, Gmail/Outlook, and...
    Downloads: 1 This Week
  • 19
    TPOT

    A Python Automated Machine Learning tool that optimizes ML

    Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT stands for Tree-based Pipeline Optimization Tool.
    Downloads: 0 This Week
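    The evolutionary search behind TPOT can be illustrated with a toy loop of selection, crossover, and mutation over pipeline configurations. Note that this is a plain genetic algorithm over fixed-length configurations, a simplification of TPOT's genetic programming over tree-shaped pipelines. The search space, step names, and `toy_fitness` scores are hypothetical stand-ins for real estimators and cross-validated accuracy, and the code does not use the TPOT API.

```python
import random
from typing import Callable, Dict, List

Pipeline = Dict[str, str]

# Hypothetical search space: one option per pipeline step.
SEARCH_SPACE: Dict[str, List[str]] = {
    "scaler": ["none", "standard", "minmax"],
    "selector": ["none", "variance", "kbest"],
    "model": ["tree", "forest", "logistic"],
}

def random_pipeline(rng: random.Random) -> Pipeline:
    return {step: rng.choice(opts) for step, opts in SEARCH_SPACE.items()}

def crossover(a: Pipeline, b: Pipeline, rng: random.Random) -> Pipeline:
    """Uniform crossover: each step is inherited from a randomly chosen parent."""
    return {step: (a if rng.random() < 0.5 else b)[step] for step in SEARCH_SPACE}

def mutate(p: Pipeline, rng: random.Random, rate: float = 0.2) -> Pipeline:
    """Re-draw each step independently with probability `rate`."""
    return {step: rng.choice(SEARCH_SPACE[step]) if rng.random() < rate else choice
            for step, choice in p.items()}

def evolve(fitness: Callable[[Pipeline], float], generations: int = 10,
           pop_size: int = 12, seed: int = 0) -> Pipeline:
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

def toy_fitness(p: Pipeline) -> float:
    """Stand-in for cross-validated accuracy; the scores are arbitrary."""
    score = {"tree": 0.70, "forest": 0.85, "logistic": 0.78}[p["model"]]
    score += 0.05 if p["scaler"] == "standard" else 0.0
    score += 0.03 if p["selector"] == "kbest" else 0.0
    return score

best = evolve(toy_fitness)
print(best, round(toy_fitness(best), 2))
```

    Because the fitter half is carried over unchanged each generation, the best fitness in the population never decreases; in a real AutoML setting each fitness call would train and cross-validate a pipeline, which dominates the cost.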
  • 20
    PySyft

    Data science on data without acquiring a copy

    Most software libraries let you compute over the information you own and see inside of machines you control. However, this means that you cannot compute on information without first obtaining (at least partial) ownership of that information. It also means that you cannot compute using machines without first obtaining control over those machines. This is very limiting to human collaboration and systematically drives the centralization of data, because you cannot work with a bunch of data...
    Downloads: 0 This Week
  • 21
    OpenPlanter

    Language-model investigation agent with a terminal UI

    OpenPlanter is an open-source Python project focused on building an intelligent automated planting or gardening system powered by software control and data processing. The repository is designed to help developers and hobbyists create programmable plant management workflows that can monitor, schedule, and optimize growing conditions. It emphasizes automation and extensibility, allowing integration with sensors, environmental data, and control logic for smart cultivation setups. ...
    Downloads: 0 This Week
  • 22
    Petastorm

    Petastorm library enables single-machine or distributed training

    ...It can also be used from pure Python code. A dataset created using Petastorm is stored in Apache Parquet format. On top of a Parquet schema, Petastorm also stores higher-level schema information that makes multidimensional arrays a native part of a Petastorm dataset. Petastorm supports extensible data codecs, which let a user apply one of the standard data compressions (JPEG, PNG) or implement their own.
    Downloads: 0 This Week
  • 23
    A.I.G

    Full-stack AI Red Teaming platform

    AI-Infra-Guard is a powerful open-source security platform from Tencent’s Zhuque Lab designed to assess the safety and resilience of AI infrastructures, codebases, and components through automated scanning and evaluation tools. It brings together AI infrastructure vulnerability scanning, MCP server risk analysis, and jailbreak evaluation into a unified workflow so that enterprises and individuals can identify critical security issues without relying on external services. Users can deploy it...
    Downloads: 2 This Week
  • 24
    SuperDuperDB

    Integrate, train and manage any AI models and APIs with your database

    Build and manage AI applications without moving your data into complex pipelines and specialized vector databases. Integrate AI and vector search directly with your database, including real-time inference and model training, using only Python. Run all your AI models and APIs as a single scalable deployment that is kept up to date automatically, with new data processed immediately. There is no need to introduce an additional database and duplicate your data to use vector search and build on top of it. ...
    Downloads: 0 This Week
  • 25
    WhatsApp MCP Server

    WhatsApp MCP server enabling AI access to chats and messaging

    ...It acts as a bridge between WhatsApp and large language models, allowing controlled access to messages, chats, and contacts. whatsapp-mcp is composed of two main components: a Go-based bridge that connects to the WhatsApp Web API and stores data locally, and a Python-based MCP server that exposes tools for AI interaction. All message data is stored in a local SQLite database and is only accessed when explicitly requested through defined tools, giving users control over how their data is used. It supports both sending and receiving messages, including various media types such as images, audio, videos, and documents. ...
    Downloads: 2 This Week