Showing 1355 open source projects for "linux ai"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    DeepSeek VL

    DeepSeek VL

    Towards Real-World Vision-Language Understanding

    DeepSeek-VL is DeepSeek’s initial vision-language model that anchors their multimodal stack. It enables understanding and generation across visual and textual modalities—meaning it can process an image + a prompt, answer questions about images, caption, classify, or reason about visuals in context. The model is likely used internally as the visual encoder backbone for agent use cases, to ground perception in downstream tasks (e.g. answering questions about a screenshot). The repository...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2
    DeepSeek VL2

    DeepSeek VL2

    Mixture-of-Experts Vision-Language Models for Advanced Multimodal

    DeepSeek-VL2 is DeepSeek’s vision + language multimodal model—essentially the next-gen successor to their first vision-language models. It combines image and text inputs into a unified embedding / reasoning space so that you can query with text and image jointly (e.g. “What’s going on in this scene?” or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 3
    Megatron

    Megatron

    Ongoing research training transformer models at scale

    Megatron is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This repository is for ongoing research on training large transformer language models at scale. We developed efficient, model-parallel (tensor, sequence, and pipeline), and multi-node pre-training of transformer based models such as GPT, BERT, and T5 using mixed precision. Megatron is also used in NeMo Megatron, a framework to help enterprises overcome the challenges of building and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Ultravox

    Ultravox

    Fast multimodal LLM for real-time voice interaction and AI apps

    Ultravox is an open source multimodal large language model designed specifically for real-time voice-based interactions. It is built to process both text and spoken audio directly, eliminating the need for a separate speech recognition stage and enabling more seamless conversational experiences. Ultravox works by combining text prompts with encoded audio inputs, allowing it to understand spoken language alongside written instructions in a unified pipeline. Internally, it leverages pretrained...
    Downloads: 6 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 5
    MLE-bench

    MLE-bench

    AI multi-agent framework for automating data-driven R&D workflows

    RD-Agent is an open source AI framework designed to automate research and development workflows in data-driven domains. It uses large language models and multiple collaborating agents to simulate the typical cycle of research, experimentation, and improvement that human data scientists follow. It separates the process into two core phases: a research stage that proposes hypotheses and ideas, and a development stage that implements and evaluates them through code execution and experiments. By...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 6
    xFormers

    xFormers

    Hackable and optimized Transformers building blocks

    xformers is a modular, performance-oriented library of transformer building blocks, designed to allow researchers and engineers to compose, experiment, and optimize transformer architectures more flexibly than monolithic frameworks. It abstracts components like attention layers, feedforward modules, normalization, and positional encoding, so you can mix and match or swap optimized kernels easily. One of its key goals is efficient attention: it supports dense, sparse, low-rank, and...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 7
    Tiktoken

    Tiktoken

    tiktoken is a fast BPE tokeniser for use with OpenAI's models

    tiktoken is a high-performance, tokenizer library (based on byte-pair encoding, BPE) designed for use with OpenAI’s models. It handles encoding and decoding text to token IDs efficiently, with minimal overhead. Because tokenization is a fundamental step in preparing text for models, tiktoken is optimized for speed, memory, and correctness in model contexts (e.g. matching OpenAI’s internal tokenization). The repo supports multiple encodings (e.g. “cl100k_base”) and lets users switch encoding...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 8
    Vidi2

    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    Vidi is a family of large multimodal models developed for deep video understanding and editing tasks, integrating vision, audio, and language to allow sophisticated querying and manipulation of video content. It’s designed to process long-form, real-world videos and answer complex queries such as “when in this clip does X happen?” or “where in the frame is object Y during that moment?” — offering temporal retrieval, spatio-temporal grounding (i.e. locating objects over time + space), and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Prompt flow

    Prompt flow

    Build high-quality LLM apps

    Prompt flow is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, and evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 10
    claude-code-best-practice

    claude-code-best-practice

    Practice made claude perfect

    claude-code-best-practice is a structured knowledge repository that documents advanced workflows, architectural patterns, and optimization strategies for developers using Claude Code in agentic development environments. Rather than being a traditional software library, the project functions as a living playbook that demonstrates how to compose skills, agents, memory files, and rules into maintainable AI-assisted coding systems. The repository emphasizes modularity and progressive disclosure,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    SAM 3D Objects

    SAM 3D Objects

    Models for object and human mesh reconstruction

    SAM 3D Objects is a foundation model that reconstructs full 3D geometry, texture, and spatial layout of objects and scenes from a single image. Given one RGB image and object masks (for example, from the Segment Anything family), it can generate a textured 3D mesh for each object, including pose and approximate scene layout. The model is specifically designed to be robust in real-world images with clutter, occlusions, small objects, and unusual viewpoints, where many earlier 3D-from-image...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 12
    AIConfig

    AIConfig

    AIConfig is a config-based framework to build generative AI apps

    AIConfig is an open-source framework designed to simplify the development and management of generative AI applications by separating AI logic from application code. The framework allows prompts, model configurations, and parameters to be stored as structured configuration files that can be version controlled and managed independently from the rest of the software system. This approach improves collaboration between developers, prompt engineers, and machine learning practitioners by turning...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 13
    GLM-4.6V

    GLM-4.6V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.6V represents the latest generation of the GLM-V family and marks a major step forward in multimodal AI by combining advanced vision-language understanding with native “tool-call” capabilities, long-context reasoning, and strong generalization across domains. Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    text-extract-api

    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parse API

    text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models. Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15
    Agently 4

    Agently 4

    Build GenAI application quick and easy

    Agently is a Python framework for building generative-AI (“GenAI”) applications; it focuses on enabling developers to orchestrate AI agents, workflows, and event-driven logic in a robust, reusable way. With Agently, one can define agents that call different models, chain tasks, trigger workflows based on events, and switch models with minimal code changes. It abstracts away boilerplate around model API calls, tool usage, prompt management, and workflow state. The project aims at...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 16
    DOLMA

    DOLMA

    Data and tools for generating and inspecting OLMo pre-training data

    DOLMA (Data Optimization and Learning for Model Alignment) is a framework designed to manage large-scale datasets for training and fine-tuning language models efficiently.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 17
    UpTrain

    UpTrain

    Your open-source LLM evaluation toolkit

    Get scores for factual accuracy, context retrieval quality, guideline adherence, tonality, and many more. You can’t improve what you can’t measure. UpTrain continuously monitors your application's performance on multiple evaluation criterions and alerts you in case of any regressions with automatic root cause analysis. UpTrain enables fast and robust experimentation across multiple prompts, model providers, and custom configurations, by calculating quantitative scores for direct comparison...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 18
    docext

    docext

    An on-premises, OCR-free unstructured data extraction

    docext is a document intelligence toolkit that uses vision-language models to extract structured information from documents such as PDFs, forms, and scanned images. The system is designed to operate entirely on-premises, allowing organizations to process sensitive documents without relying on external cloud services. Unlike traditional document processing pipelines that rely heavily on optical character recognition, docext leverages multimodal AI models capable of understanding both visual...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 19
    Streamer-Sales

    Streamer-Sales

    LLM Large Model of Selling Anchor

    Streamer-Sales is an open-source large language model system designed specifically for e-commerce live streaming and automated product promotion. The project focuses on generating persuasive product descriptions and live presentation scripts that mimic the style of professional online sales hosts. By analyzing product characteristics and marketing information, the model can produce engaging explanations that emphasize benefits, features, and emotional appeal to encourage viewers to make...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 20
    ChatGLM-6B

    ChatGLM-6B

    ChatGLM-6B: An Open Bilingual Dialogue Language Model

    ChatGLM-6B is an open bilingual (Chinese + English) conversational language model based on the GLM architecture, with approximately 6.2 billion parameters. The project provides inference code, demos (command line, web, API), quantization support for lower memory deployment, and tools for finetuning (e.g., via P-Tuning v2). It is optimized for dialogue and question answering with a balance between performance and deployability in consumer hardware settings. Support for quantized inference...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 21
    Pedalboard

    Pedalboard

    A Python library for audio

    pedalboard is a Python library for working with audio: reading, writing, rendering, adding effects, and more. It supports the most popular audio file formats and a number of common audio effects out of the box and also allows the use of VST3® and Audio Unit formats for loading third-party software instruments and effects. pedalboard was built by Spotify’s Audio Intelligence Lab to enable using studio-quality audio effects from within Python and TensorFlow. Internally at Spotify, pedalboard...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 22
    Cleanlab

    Cleanlab

    The standard data-centric AI package for data quality and ML

    cleanlab helps you clean data and labels by automatically detecting issues in a ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms, published in this paper and blog. See some of the datasets cleaned with cleanlab at labelerrors.com. This package helps you...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 23
    BertViz

    BertViz

    BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)

    BertViz is an interactive tool for visualizing attention in Transformer language models such as BERT, GPT2, or T5. It can be run inside a Jupyter or Colab notebook through a simple Python API that supports most Huggingface models. BertViz extends the Tensor2Tensor visualization tool by Llion Jones, providing multiple views that each offer a unique lens into the attention mechanism. The head view visualizes attention for one or more attention heads in the same layer. It is based on the...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 24
    PyTorch Lightning

    PyTorch Lightning

    The lightweight PyTorch wrapper for high-performance AI research

    Scale your models, not your boilerplate with PyTorch Lightning! PyTorch Lightning is the ultimate PyTorch research framework that allows you to focus on the research while it takes care of everything else. It's designed to decouple the science from the engineering in your PyTorch code, simplifying complex network coding and giving you maximum flexibility. PyTorch Lightning can be used for just about any type of research, and was built for the fast inference needed in AI research and...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 25
    VibeThinker

    VibeThinker

    Diversity-driven optimization and large-model reasoning ability

    VibeThinker is a compact but high-capability open-source language model released by WeiboAI (Sina AI Lab). It contains about 1.5 billion parameters, far smaller than many “frontier” models, yet it is explicitly optimized for reasoning, mathematics, and code generation tasks rather than general open-domain chat. The innovation lies in its training methodology: the team uses what they call the Spectrum-to-Signal Principle (SSP), where a first stage emphasizes diversity of reasoning paths (the...
    Downloads: 1 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB