Showing 118 open source projects for "data modeling"

View related business solutions
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    More flexibility. More control.

    Generate interest, access liquidity without selling, and execute trades seamlessly. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • 1
    NeuroMatch Academy (NMA)

    NeuroMatch Academy (NMA)

    NMA Computational Neuroscience course

    ...We have curated a curriculum that spans most areas of computational neuroscience (a hard task in an increasingly big field!). We will expose you to both theoretical modeling and more data-driven analyses. The Neuro Video Series is a series of 12 videos that covers basic neuroscience concepts and neuroscience methods. These videos are completely optional and do not need to be watched in a fixed order so you can pick and choose which videos will help you brush up on your knowledge. The pre-reqs refresher days are asynchronous, so you can go through the material on your own time. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    GluonTS

    GluonTS

    Probabilistic time series modeling in Python

    ...We split the dataset into train and test parts, by removing the last three years (36 months) from the train data. Thus, we will train a model on just the first nine years of data. Python has the notion of extras – dependencies that can be optionally installed to unlock certain features of a package. We make extensive use of optional dependencies in GluonTS to keep the amount of required dependencies minimal. To still allow users to opt-in to certain features, we expose many extra dependencies.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    WrenAI

    WrenAI

    Open-source SQL AI Agent for Text-to-SQL. Make Text2SQL Easy

    ...With Wren AI, you can process metadata, schema, terminology, data relationships, and the logic behind calculations and aggregations with “Modeling Definition Language”, to generate accurate SQL queries with semantic context. When starting a new conversation in Wren AI, your question is used to find the most relevant tables. From these, LLM generates three relevant questions for the user to choose from.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 4
    ContextGem

    ContextGem

    ContextGem: Effortless LLM extraction from documents

    ContextGem is an open-source framework designed to simplify the extraction of structured data and insights from documents using large language models (LLMs). It provides a flexible, intuitive API that minimizes boilerplate code, enabling developers to build complex extraction workflows efficiently. ContextGem supports various document formats and integrates with multiple LLM providers, making it a versatile tool for tasks like contract analysis, anomaly detection, and information retrieval.​
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    Claude Ads

    Claude Ads

    Comprehensive paid advertising audit & optimization skill

    ...The architecture uses parallel subagents to speed up audits and includes financial modeling and A/B testing guidance. It runs locally, ensuring privacy and control over sensitive advertising data. The project is aimed at replacing manual audit workflows with fast, automated, and repeatable analysis.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    BuildingAI

    BuildingAI

    Build your own AI application system for free

    BuildingAI is an open-source project focused on applying artificial intelligence techniques to architectural design and building information modeling workflows. The platform aims to bridge the gap between natural language interfaces and building design tools by allowing AI systems to interpret user instructions and convert them into structured architectural operations. By combining generative AI capabilities with building data models, the system can assist with tasks such as design generation, spatial reasoning, and building component creation. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Claude for Financial Services

    Claude for Financial Services

    Reference agents, skills, and data for the financial-services

    ...It supports deployment either as Claude Cowork plugins or through the Claude Managed Agents API, allowing organizations to integrate the same logic into internal systems and automation pipelines. The repository includes tools for competitive analysis, financial modeling, market research, data-pack generation, and strategic synthesis. Its architecture emphasizes modularity, enabling firms to customize workflows and extend functionality for proprietary use cases. Overall, the project serves as a foundation for building AI-enhanced financial research and decision-support systems.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    PySINDy

    PySINDy

    A package for the sparse identification of nonlinear dynamical systems

    PySINDy is a Python library that implements the Sparse Identification of Nonlinear Dynamics (SINDy) method for discovering mathematical models of dynamical systems from data. The framework focuses on identifying governing equations that describe the behavior of complex physical systems by selecting sparse combinations of candidate functions. Instead of fitting a purely predictive machine learning model, PySINDy attempts to recover interpretable differential equations that explain how a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    EconML

    EconML

    Python Package for ML-Based Heterogeneous Treatment Effects Estimation

    EconML is a Python package for estimating heterogeneous treatment effects from observational data via machine learning. This package was designed and built as part of the ALICE project at Microsoft Research with the goal of combining state-of-the-art machine learning techniques with econometrics to bring automation to complex causal inference problems. One of the biggest promises of machine learning is to automate decision-making in a multitude of domains. At the core of many data-driven...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 10
    Machine Learning Study

    Machine Learning Study

    This repository is for helping those interested in machine learning

    ...It often demonstrates how to implement algorithms using widely used libraries such as NumPy, pandas, scikit-learn, and TensorFlow. Many examples include dataset preparation, visualization of results, and experimentation with different modeling approaches.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    NVIDIA PhysicsNeMo

    NVIDIA PhysicsNeMo

    Open-source deep-learning framework for building and training

    ...The framework focuses on the emerging field of physics-informed machine learning, where neural networks are used alongside physical equations to model complex scientific systems. PhysicsNeMo provides modular Python components that allow developers to create scalable training and inference pipelines for models that combine data-driven learning with physics-based constraints. It is built on top of the PyTorch ecosystem and integrates with GPU-accelerated computing environments to handle computationally demanding simulations and datasets. The framework supports a wide range of scientific applications, including computational fluid dynamics, climate modeling, weather prediction, and engineering simulations.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    NBA Sports Betting Machine Learning

    NBA Sports Betting Machine Learning

    NBA sports betting using machine learning

    NBA-Machine-Learning-Sports-Betting is an open-source Python project that applies machine learning techniques to predict outcomes of National Basketball Association games for analytical and betting-related research. The system gathers historical team statistics and game data spanning multiple seasons, beginning with the 2007–2008 NBA season and continuing through the present. Using this dataset, the project constructs matchup features that represent team performance trends and contextual...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Alpamayo 1

    Alpamayo 1

    Bridging Reasoning and Action Prediction

    ...The model is designed as a foundational component rather than a complete driving stack, allowing developers to build custom autonomous vehicle applications on top of it. It incorporates vision-language-action modeling, enabling it to process sensor data and contextual information simultaneously. Alpamayo supports tasks such as trajectory prediction, auto-labeling, and reasoning-based decision making. The system is optimized for high-performance GPU environments and is intended primarily for experimentation and benchmarking. Overall, it represents an advanced step toward integrating reasoning into autonomous driving pipelines.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Kaldi

    Kaldi

    kaldi-asr/kaldi is the official location of the Kaldi project

    ...Kaldi is designed for researchers who need a highly customizable environment to experiment with new algorithms, as well as for practitioners who want robust, production-ready ASR pipelines. It includes extensive tools for data preparation, feature extraction, acoustic and language modeling, decoding, and evaluation. With its modular design, Kaldi allows users to adapt the system to a wide range of languages and domains. As one of the most influential projects in speech recognition, it has become a foundation for much of the modern work in ASR.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    mlforecast

    mlforecast

    Scalable machine learning for time series forecasting

    mlforecast is a time-series forecasting framework built around machine-learning models, designed to make forecasting both efficient and scalable. It lets you apply any regressor that follows the typical scikit-learn API, for example, gradient-boosted trees or linear models, to time-series data by automating much of the messy feature engineering and data preparation. Instead of writing custom code to build lagged features, rolling statistics, and date-based predictors, mlforecast generates...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    UNO

    UNO

    A Universal Customization Method for Single and Multi Conditioning

    UNO is a project by ByteDance introduced in 2025, titled “A Universal Customization Method for Both Single and Multi-Subject Conditioning.” It suggests a framework for image (or more general generative) modeling where the model can be conditioned either on a single subject or multiple subjects — which may correspond to generating or customizing images featuring specific people, styles, or objects, possibly with fine-grained control over subject identity or composition. Because the project is...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Large Concept Model

    Large Concept Model

    Language modeling in a sentence representation space

    Large Concept Model is a research codebase centered on concept-centric representation learning at scale, aiming to capture shared structure across many categories and modalities. It organizes training around concepts (rather than just raw labels), encouraging models to understand attributes, relations, and compositional structure that transfer across tasks. The repository provides training loops, data tooling, and evaluation routines to learn and probe these concept embeddings, typically...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    deepjazz

    deepjazz

    Deep learning driven jazz generation using Keras & Theano

    deepjazz is a deep learning project that generates jazz music using recurrent neural networks trained on MIDI files. The repository demonstrates how machine learning can learn musical structure and produce original compositions. It uses the Keras and Theano libraries to build a two-layer Long Short-Term Memory network capable of learning temporal patterns in music. The system analyzes musical sequences from an input MIDI file and then generates new musical notes that follow similar stylistic...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    JiT

    JiT

    PyTorch implementation of JiT

    JiT is an open-source PyTorch implementation of a state-of-the-art image diffusion model designed around a minimalist yet powerful architecture for pixel-level generative modeling, based on the paper Back to Basics: Let Denoising Generative Models Denoise. Rather than predicting noise, JiT models directly predict clean image data, which the research suggests aligns better with the manifold structure of natural images and leads to stronger generative performance at high resolution. This implementation supports training on large datasets like ImageNet with configurable model variants, and practical scripts for setup, training, and evaluation on GPUs are included, leveraging PyTorch’s ecosystem for real-world experimentation. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Perception Models

    Perception Models

    State-of-the-art Image & Video CLIP, Multimodal Large Language Models

    ...The PE module is a family of vision encoders designed to excel in image and video understanding, surpassing models like SigLIP2, InternVideo2, and DINOv2 across multiple benchmarks. Meanwhile, PLM integrates with PE to power vision-language modeling, achieving results competitive with leading multimodal systems such as QwenVL2.5 and InternVL3, all while being fully reproducible with open data. The project supports a wide range of research applications, from visual recognition and dense prediction to fine-grained multimodal understanding. Additionally, it includes several large-scale open datasets for both image and video perception.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 21
    AutoMLPipeline.jl

    AutoMLPipeline.jl

    Package that makes it trivial to create and evaluate machine learning

    ...To illustrate, here is a pipeline expression and evaluation of a typical machine learning workflow that extracts numerical features (numf) for ica (Independent Component Analysis) and pca (Principal Component Analysis) transformations, respectively, concatenated with the hot-bit encoding (ohe) of categorical features (catf) of a given data for rf (Random Forest) modeling.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Featuretools

    Featuretools

    An open source python library for automated feature engineering

    ...Featuretools automatically creates features from temporal and relational datasets. Featuretools uses DFS for automated feature engineering. You can combine your raw data with what you know about your data to build meaningful features for machine learning and predictive modeling. Featuretools provides APIs to ensure only valid data is used for calculations, keeping your feature vectors safe from common label leakage problems. You can specify prediction times row-by-row. Featuretools come with a library of low-level functions that can be stacked to create features. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    BAML

    BAML

    The AI framework that adds the engineering to prompt engineering

    BAML is an open-source framework and domain-specific language designed to bring structured engineering practices to prompt development for large language model applications. Instead of treating prompts as unstructured text, BAML introduces a schema-driven approach where prompts are defined as typed functions with explicit inputs and outputs. This design allows developers to treat language model interactions as predictable software components rather than ad-hoc prompt strings. The framework...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    StatsForecast

    StatsForecast

    Fast forecasting with statistical and econometric models

    StatsForecast is a Python library for time-series forecasting that delivers a suite of classical statistical and econometric forecasting models optimized for high performance and scalability. It is designed not just for academic experiments but for production-level time-series forecasting, meaning it handles forecasting for many series at once, efficiently, reliably, and with minimal overhead. The library implements a broad set of models, including AutoARIMA, ETS, CES, Theta, plus a battery...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    AudioCraft

    AudioCraft

    Audiocraft is a library for audio processing and generation

    ...It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. The repo provides inference scripts, checkpoints, and simple Python APIs so you can generate clips from prompts or incorporate the models into applications. It also contains training code and recipes, so researchers can fine-tune on custom data or explore new objectives without building infrastructure from scratch. Example notebooks, CLI tools, and audio utilities help with prompt design, conditioning on reference audio, and post-processing to produce ready-to-share outputs.
    Downloads: 8 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB