Showing 107 open source projects for "foundation"

View related business solutions
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • 1
    Hugging Face - Speech To Speech

    Hugging Face - Speech To Speech

    Open speech-to-speech models and pipelines by Hugging Face toolkit AI

    ...It is designed to help researchers and developers experiment with multilingual and cross-lingual voice applications. It integrates with the broader Hugging Face ecosystem, making it easier to load pretrained models and run inference. It also serves as a foundation for building real-time or batch audio transformation systems. Overall, it highlights an emerging approach to voice technology that reduces latency and preserves more of the original speech characteristics.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    CogView4

    CogView4

    CogView4, CogView3-Plus and CogView3(ECCV 2024)

    CogView4 is the latest generation in the CogView series of vision-language foundation models, developed as a bilingual (Chinese and English) open-source system for high-quality image understanding and generation. Built on top of the GLM framework, it supports multimodal tasks including text-to-image synthesis, image captioning, and visual reasoning. Compared to previous CogView versions, CogView4 introduces architectural upgrades, improved training pipelines, and larger-scale datasets, enabling stronger alignment between textual prompts and generated visual content. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    memsearch

    memsearch

    A Markdown-first memory system, a standalone library for any AI agent

    ...Memsearch is designed to be agent-friendly, making it easy to plug into existing AI workflows and enhance reasoning capabilities. Its markdown-first approach ensures transparency and portability of stored knowledge. Overall, it provides a robust foundation for building AI systems with persistent and intelligent memory.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    HolmesGPT

    HolmesGPT

    CNCF Sandbox Project

    ...Rather than requiring engineers to manually correlate large volumes of monitoring data, HolmesGPT automatically synthesizes evidence and presents explanations in natural language. The project is developed by Robusta and has been accepted as a Cloud Native Computing Foundation Sandbox project, highlighting its relevance to the cloud-native ecosystem. It is designed to operate as an automated troubleshooting assistant that can analyze incidents continuously and support on-call engineers during outages.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 5
    BCEmbedding

    BCEmbedding

    Netease Youdao's open-source embedding and reranker models

    ...It includes an EmbeddingModel for semantic vector generation and a RerankerModel for refining and ordering search results. The project is optimized for bilingual and cross-lingual retrieval, especially across Chinese and English. It is used as a foundation for RAG systems such as QAnything and other Youdao products. The models are designed to work directly without fine-tuning across common business scenarios such as education, medicine, law, finance, literature, FAQs, textbooks, and general conversation. BCEmbedding also provides integrations for popular RAG frameworks, making it easier to add semantic search and reranking to AI applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    AG2

    AG2

    Framework for building and orchestrating multi-agent AI systems

    ...AG2 is intended for developers experimenting with autonomous systems, research prototypes, or production-grade agent pipelines. AG2 emphasizes flexibility, allowing users to integrate different models and customize behaviors depending on their use case. Overall, it serves as a foundation for building scalable and modular AI agent ecosystems.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Magicoder

    Magicoder

    Empowering Code Generation with OSS-Instruct

    ...The project focuses on improving the quality and diversity of code generation by training models with a novel dataset construction approach known as OSS-Instruct. This technique uses open-source code repositories as a foundation for generating more realistic and diverse instruction datasets for training language models. By grounding training data in real open-source examples, Magicoder aims to reduce bias and improve the reliability of code generation results compared to models trained solely on synthetic instructions. The project includes model implementations, training resources, and evaluation benchmarks that demonstrate how the approach improves instruction-following and code synthesis capabilities. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    LLaMA Models

    LLaMA Models

    Utilities intended for use with Llama models

    This repository serves as the central hub for the Llama foundation model family, consolidating model cards, licenses and use policies, and utilities that support inference and fine-tuning across releases. It ties together other stack components (like safety tooling and developer SDKs) and provides canonical references for model variants and their intended usage. The project’s issues and releases reflect an actively used coordination point for the ecosystem, where guidance, utilities, and compatibility notes are published. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    OpenAI Harmony

    OpenAI Harmony

    Renderer for the harmony response format to be used with gpt-oss

    ...For users accessing gpt-oss through third-party providers like HuggingFace, Ollama, or vLLM, Harmony formatting is handled automatically, but developers building custom inference setups must implement it directly. With its flexible design, Harmony serves as the foundation for creating more interpretable, controlled, and extensible interactions with open-weight language models.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 10
    Vision Agents

    Vision Agents

    Open Vision Agents by Stream. Build voice and vision agents quickly

    ...Vision Agents is model-agnostic, so developers can connect providers such as OpenAI, Gemini, Claude, Hugging Face, YOLO, Roboflow, and others. Its main value is giving developers a flexible foundation for multimodal agents that operate on live audio and video instead of only static prompts.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    CS-Ebook

    CS-Ebook

    Curated list of classic, high-quality computer science books

    ...It spans core areas such as computer fundamentals, programming languages, software engineering, mathematics, data science, and artificial intelligence, making it suitable for learners at different stages. Rather than hosting files, the project serves as a discovery guide, helping users identify essential reading materials and build a strong technical foundation. CS-Ebook is actively maintained and updated to reflect relevant and modern resources while preserving foundational texts. Its organized structure allows users to navigate topics efficiently and follow a progressive learning path. Contributions are encouraged, ensuring the list evolves with community input and continues to highlight valuable resources.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    OpenTinker

    OpenTinker

    OpenTinker is an RL-as-a-Service infrastructure for foundation models

    OpenTinker is an open-source Reinforcement Learning-as-a-Service (RLaaS) infrastructure intended to democratize reinforcement learning for large language model (LLM) agents. Traditional RL setups can be monolithic and difficult to configure, but OpenTinker separates concerns across agent definition, environment interaction, and execution, which lets developers focus on defining the logic of agents and environments separately from how training and inference are run. It introduces a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Qwen3-VL-Embedding

    Qwen3-VL-Embedding

    Multimodal embedding and reranking models built on Qwen3-VL

    Qwen3-VL-Embedding (with its companion Qwen3-VL-Reranker) is a state-of-the-art multimodal embedding and reranking model suite built on the open-sourced Qwen3-VL foundation, developed to handle diverse inputs including text, images, screenshots, and videos. The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a given query and candidate documents, enhancing retrieval accuracy in complex multimodal tasks. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    NeMo Curator

    NeMo Curator

    Scalable data pre processing and curation toolkit for LLMs

    NeMo Curator is a Python library specifically designed for fast and scalable dataset preparation and curation for large language model (LLM) use-cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT) and paramter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, simplifying pipeline expansion and accelerating model convergence through the preparation of high-quality tokens. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    MetaVoice-1B

    MetaVoice-1B

    Foundational model for human-like, expressive TTS

    MetaVoice — in the form of its source repository “metavoice-src” — is a large-scale text-to-speech (TTS) model. Specifically, the base model (MetaVoice-1B) uses around 1.2 billion parameters and has been trained on a massive dataset — reportedly around 100,000 hours of speech data. The goal is to provide human-like, expressive, and flexible TTS: able to generate natural-sounding speech that can handle diverse inputs and likely generalize over voice styles, intonation, prosody, and perhaps...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Claude for Financial Services

    Claude for Financial Services

    Reference agents, skills, and data for the financial-services

    ...Its architecture emphasizes modularity, enabling firms to customize workflows and extend functionality for proprietary use cases. Overall, the project serves as a foundation for building AI-enhanced financial research and decision-support systems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    llms-from-scratch-cn

    llms-from-scratch-cn

    Build a large language model from 0 only with Python foundation

    llms-from-scratch-cn is an educational open-source project designed to teach developers how to build large language models step by step using practical code and conceptual explanations. The repository provides a hands-on learning path that begins with the fundamentals of natural language processing and gradually progresses toward implementing full GPT-style architectures from the ground up. Rather than focusing on using pre-trained models through APIs, the project emphasizes understanding...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    FireRed-Image-Edit

    FireRed-Image-Edit

    General-purpose image editing model that delivers high-fidelity

    FireRed-Image-Edit is an open-source general-purpose image editing model and toolset designed to deliver high-fidelity, visually coherent edits across a wide range of editing tasks, from simple object modifications to complex enhancements like restoration and style preservation. It is built on a flexible text-to-image foundation model that has been extended with training paradigms including pretraining, supervised fine-tuning, and reinforcement learning to imbue the system with strong instruction following and editing consistency. The model excels in maintaining visual and text stylistic fidelity, allowing users to preserve the original artistic qualities of an image while applying creative changes according to natural language instructions. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Ling-V2

    Ling-V2

    Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI

    Ling-V2 is an open-source family of Mixture-of-Experts (MoE) large language models developed by the InclusionAI research organization with the goal of combining state-of-the-art performance, efficiency, and openness for next-generation AI applications. It introduces highly sparse architectures where only a fraction of the model’s parameters are activated per input token, enabling models like Ling-mini-2.0 to achieve reasoning and instruction-following capabilities on par with much larger...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Skill Scanner

    Skill Scanner

    Security Scanner for Agent Skills

    ...While still evolving with contributions and issue discussions, it shows the community’s interest in building safer AI ecosystems around reusable capabilities. The scanner also serves as a foundation for more sophisticated vetting frameworks that might be incorporated into CI/CD pipelines.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Chinese-XLNet

    Chinese-XLNet

    Chinese XLNet pre-trained model

    Chinese-XLNet is a Chinese language pre-trained model based on the XLNet architecture, providing an advanced foundation for natural language processing tasks in Mandarin and other Chinese dialects. Unlike traditional masked language modeling, XLNet uses a permutation language modeling objective that captures bidirectional context more effectively by training over all possible token orderings, yielding richer contextual representations. This model is trained on large-scale Chinese text datasets to learn linguistic patterns, long-range dependencies, and semantic nuance typical of Chinese writing, making it useful for tasks like text classification, question answering, named entity recognition, and language generation. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Dolphin

    Dolphin

    Document Image Parsing via Heterogeneous Anchor Prompting”

    ...Because multimedia delivery requirements vary widely (adaptive streaming, live feeds, cross-platform compatibility, custom UI, performance constraints), Dolphin aims to offer a foundation that developers can build upon or adapt to their needs. It is designed to integrate with other tools and libraries and provide stable playback or media-processing pipelines, while remaining open-source so that users can inspect, extend, and adapt it.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    4M

    4M

    4M: Massively Multimodal Masked Modeling

    4M is a training framework for “any-to-any” vision foundation models that uses tokenization and masking to scale across many modalities and tasks. The same model family can classify, segment, detect, caption, and even generate images, with a single interface for both discriminative and generative use. The repository releases code and models for multiple variants (e.g., 4M-7 and 4M-21), emphasizing transfer to unseen tasks and modalities.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    HunyuanVideo-I2V

    HunyuanVideo-I2V

    A Customizable Image-to-Video Model based on HunyuanVideo

    HunyuanVideo-I2V is a customizable image-to-video generation framework from Tencent Hunyuan, built on their HunyuanVideo foundation. It extends video generation so that given a static reference image plus an optional prompt, it generates a video sequence that preserves the reference image’s identity (especially in the first frame) and allows stylized effects via LoRA adapters. The repository includes pretrained weights, inference and sampling scripts, training code for LoRA effects, and support for parallel inference via xDiT. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Agent S

    Agent S

    Agent S: an open agentic framework that uses computers like a human

    ...The latest version, Agent S3, surpasses human-level performance on the OSWorld benchmark, demonstrating state-of-the-art results in complex multi-step computer tasks. Agent S combines powerful foundation models (such as GPT-5) with grounding models like UI-TARS to translate visual inputs into precise executable actions. It supports flexible deployment via CLI, SDK, or cloud, and integrates with multiple model providers including OpenAI, Anthropic, Gemini, Azure, and Hugging Face endpoints. With optional local code execution, reflection mechanisms, and compositional planning, Agent S provides a scalable and research-driven framework for building advanced computer-use agents.
    Downloads: 2 This Week
    Last Update:
    See Project
Auth0 Logo