Best Artificial Intelligence Software for Hugging Face - Page 6

Compare the Top Artificial Intelligence Software that integrates with Hugging Face as of December 2025 - Page 6

This is a list of Artificial Intelligence software that integrates with Hugging Face. Use the filters on the left to narrow the results to products that integrate with Hugging Face. View the products that work with Hugging Face in the table below.

  • 1
    Nutanix Enterprise AI
    Make enterprise AI apps and data easy to develop, deploy, and operate, with secure AI endpoints for large language models and generative AI APIs. Nutanix Enterprise AI simplifies and secures GenAI, helping enterprises pursue productivity gains, revenue growth, and the value that GenAI promises. Streamlined workflows make it convenient to monitor and manage AI endpoints. Deploy AI models and secure APIs with a point-and-click interface, choosing from Hugging Face, NVIDIA NIM, or your own private models. Run enterprise AI securely on-premises or in public clouds on any CNCF-certified Kubernetes runtime while leveraging your current AI tools. Grant or revoke access to your LLMs through role-based access controls over secure API tokens for developers and GenAI application owners, and generate URL-ready JSON for API testing in a single click.
  • 2
    Muse

    Microsoft

    Microsoft has unveiled Muse, a groundbreaking generative AI model designed to revolutionize gameplay ideation. Developed in collaboration with Ninja Theory, Muse is a World and Human Action Model (WHAM) trained on data from the game Bleeding Edge. This AI model possesses a comprehensive understanding of 3D game environments, including physics and player interactions, enabling it to generate consistent and diverse gameplay sequences. Muse can produce game visuals and predict controller actions, facilitating rapid prototyping and creative exploration for game developers. By analyzing over 1 billion images and actions, Muse demonstrates the potential to assist in game preservation by recreating classic titles for modern platforms. While still in the early stages, with current outputs at a resolution of 300×180 pixels, Muse represents a significant advancement in integrating AI into the game development process, aiming to enhance, not replace, human creativity.
  • 3
    PaliGemma 2
    PaliGemma 2, the next evolution in tunable vision-language models, builds upon the performant Gemma 2 models, adding the power of vision and making it easier than ever to fine-tune for exceptional performance. With PaliGemma 2, these models can see, understand, and interact with visual input, opening up a world of new possibilities. It offers scalable performance with multiple model sizes (3B, 10B, 28B parameters) and resolutions (224px, 448px, 896px). PaliGemma 2 generates detailed, contextually relevant captions for images, going beyond simple object identification to describe actions, emotions, and the overall narrative of the scene. Our research demonstrates leading performance in chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation, as detailed in the technical report. Upgrading to PaliGemma 2 is a breeze for existing PaliGemma users.
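    A minimal captioning sketch through Hugging Face Transformers, assuming the google/paligemma2-3b-pt-224 checkpoint; the image URL is a placeholder and "caption en" is one of the published task prompts:

      import requests
      import torch
      from PIL import Image
      from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

      model_id = "google/paligemma2-3b-pt-224"  # assumed checkpoint name on Hugging Face
      model = PaliGemmaForConditionalGeneration.from_pretrained(
          model_id, torch_dtype=torch.bfloat16, device_map="auto"
      )
      processor = PaliGemmaProcessor.from_pretrained(model_id)

      image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)  # placeholder URL
      inputs = processor(text="caption en", images=image, return_tensors="pt").to(model.device)
      with torch.no_grad():
          generated = model.generate(**inputs, max_new_tokens=30)
      # Decode only the newly generated tokens, skipping the prompt.
      print(processor.decode(generated[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))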
  • 4
    Evo 2

    Arc Institute

    Evo 2 is a genomic foundation model capable of generalist prediction and design tasks across DNA, RNA, and proteins. It utilizes a frontier deep learning architecture to model biological sequences at single-nucleotide resolution, achieving near-linear scaling of compute and memory relative to context length. With 40 billion parameters and a 1-megabase context length, Evo 2 was trained on over 9 trillion nucleotides from diverse eukaryotic and prokaryotic genomes. This extensive training enables Evo 2 to perform zero-shot function prediction across multiple biological modalities, including DNA, RNA, and proteins, and to generate novel sequences with plausible genomic architecture. The model's capabilities have been demonstrated in tasks such as designing functional CRISPR systems and predicting disease-causing mutations in human genes. Evo 2 is publicly accessible via Arc's GitHub repository and is integrated into the NVIDIA BioNeMo framework.
  • 5
    Undrstnd

    Undrstnd empowers developers and businesses to build AI-powered applications with just four lines of code. Experience incredibly fast AI inference times, up to 20 times faster than GPT-4 and other leading models. Our cost-effective AI services are designed to be up to 70 times cheaper than traditional providers like OpenAI. Upload your own datasets and train models in under a minute with our easy-to-use data source feature. Choose from a variety of open source Large Language Models (LLMs) to fit your specific needs, all backed by powerful, flexible APIs. Our platform offers a range of integration options to make it easy for developers to incorporate our AI-powered solutions into their applications, including RESTful APIs and SDKs for popular programming languages like Python, Java, and JavaScript. Whether you're building a web application, a mobile app, or an IoT device, our platform provides the tools and resources you need to integrate our AI-powered solutions seamlessly.
  • 6
    vLLM

    vLLM is a high-performance library designed to facilitate efficient inference and serving of Large Language Models (LLMs). Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry. It offers state-of-the-art serving throughput by efficiently managing attention key and value memory through its PagedAttention mechanism. It supports continuous batching of incoming requests and utilizes optimized CUDA kernels, including integration with FlashAttention and FlashInfer, to enhance model execution speed. Additionally, vLLM provides quantization support for GPTQ, AWQ, INT4, INT8, and FP8, as well as speculative decoding capabilities. Users benefit from seamless integration with popular Hugging Face models, support for various decoding algorithms such as parallel sampling and beam search, and compatibility with NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs, and more.
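    A minimal offline-inference sketch of vLLM's Python API; the model id and prompt are placeholders, and any supported Hugging Face checkpoint works the same way:

      from vllm import LLM, SamplingParams

      # Download a Hugging Face checkpoint and set up a PagedAttention-backed engine.
      llm = LLM(model="facebook/opt-125m")  # placeholder model id
      params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

      # Continuous batching lets lists of prompts be served efficiently.
      outputs = llm.generate(["The capital of France is"], params)
      for output in outputs:
          print(output.outputs[0].text)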
  • 7
    Intel Open Edge Platform
    The Intel Open Edge Platform simplifies the development, deployment, and scaling of AI and edge computing solutions on standard hardware with cloud-like efficiency. It provides a curated set of components and workflows that accelerate AI model creation, optimization, and application development. From vision models to generative AI and large language models (LLM), the platform offers tools to streamline model training and inference. By integrating Intel’s OpenVINO toolkit, it ensures enhanced performance on Intel CPUs, GPUs, and VPUs, allowing organizations to bring AI applications to the edge with ease.
  • 8
    JAX

    JAX is a Python library designed for high-performance numerical computing and machine learning research. It offers a NumPy-like API, facilitating seamless adoption for those familiar with NumPy. Key features of JAX include automatic differentiation, just-in-time compilation, vectorization, and parallelization, all optimized for execution on CPUs, GPUs, and TPUs. These capabilities enable efficient computation for complex mathematical functions and large-scale machine-learning models. JAX also integrates with various libraries within its ecosystem, such as Flax for neural networks and Optax for optimization tasks. Comprehensive documentation, including tutorials and user guides, is available to assist users in leveraging JAX's full potential.
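    A minimal sketch of the transforms mentioned above, using a toy quadratic loss purely for illustration:

      import jax
      import jax.numpy as jnp

      def loss(w, x):
          # Toy objective: squared norm of a linear projection.
          return jnp.sum((x @ w) ** 2)

      grad_loss = jax.jit(jax.grad(loss))               # compiled gradient w.r.t. w
      batched = jax.vmap(grad_loss, in_axes=(None, 0))  # vectorize over a batch of x

      w = jnp.ones((3,))
      xs = jnp.arange(12.0).reshape(4, 3)
      print(batched(w, xs).shape)  # (4, 3): one gradient per batch element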
  • 9
    01.AI

    01.AI offers a comprehensive AI/ML model deployment platform that simplifies the process of training, deploying, and managing machine learning models at scale. It provides powerful tools for businesses to integrate AI into their operations with minimal technical complexity. 01.AI supports end-to-end AI solutions, including model training, fine-tuning, inference, and monitoring. 01.AI's services help businesses optimize their AI workflows, allowing teams to focus on model performance rather than infrastructure. It is designed to support various industries, including finance, healthcare, and manufacturing, offering scalable solutions that enhance decision-making and automate complex tasks.
  • 10
    Amazon SageMaker Unified Studio
    Amazon SageMaker Unified Studio is a comprehensive AI and data development environment designed to streamline workflows and simplify the process of building and deploying machine learning models. Built on Amazon DataZone, it integrates various AWS analytics and AI/ML services, such as Amazon EMR, AWS Glue, and Amazon Bedrock, into a single platform. Users can discover, access, and process data from various sources like Amazon S3 and Redshift, and develop generative AI applications. With tools for model development, governance, MLOps, and AI customization, SageMaker Unified Studio provides an efficient, secure, and collaborative environment for data teams.
  • 11
    Aurascape

    Aurascape is an AI-native security platform designed to help businesses innovate securely in the age of AI. It provides comprehensive visibility into AI application interactions, safeguarding against data loss and AI-driven threats. Key features include monitoring AI activities across numerous applications, protecting sensitive data to ensure compliance, defending against zero-day threats, facilitating secure deployment of AI copilots, enforcing coding assistant guardrails, and automating AI security workflows. Aurascape's mission is to enable organizations to adopt AI technologies confidently while maintaining robust security measures. Because AI applications interact in fundamentally new ways, with communications that are dynamic, real-time, and autonomous, Aurascape prevents new threats, protects data with precision, keeps teams productive, and monitors unsanctioned app usage, risky authentication, and unsafe data sharing.
  • 12
    Phi-4-reasoning
    Phi-4-reasoning is a 14-billion parameter transformer-based language model optimized for complex reasoning tasks, including math, coding, algorithmic problem solving, and planning. Trained via supervised fine-tuning of Phi-4 on carefully curated "teachable" prompts and reasoning demonstrations generated using o3-mini, it generates detailed reasoning chains that effectively leverage inference-time compute. It outperforms significantly larger open-weight models such as DeepSeek-R1-Distill-Llama-70B and approaches the performance levels of the full DeepSeek-R1 model across a wide range of reasoning tasks.
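    A minimal sketch of running the model through the Hugging Face Transformers pipeline; microsoft/Phi-4-reasoning is the model's Hugging Face id, while the prompt and generation settings are illustrative:

      import torch
      from transformers import pipeline

      generator = pipeline(
          "text-generation",
          model="microsoft/Phi-4-reasoning",
          torch_dtype=torch.bfloat16,
          device_map="auto",
      )
      messages = [{"role": "user", "content": "What is the sum of the first 50 odd numbers?"}]
      # Reasoning models emit long chains of thought, so allow a generous token budget.
      result = generator(messages, max_new_tokens=1024)
      print(result[0]["generated_text"][-1]["content"])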
  • 13
    Phi-4-reasoning-plus
    Phi-4-reasoning-plus is a 14-billion parameter open-weight reasoning model that builds upon Phi-4-reasoning capabilities. It is further trained with reinforcement learning to utilize more inference-time compute, using 1.5x more tokens than Phi-4-reasoning, to deliver higher accuracy. Despite its significantly smaller size, Phi-4-reasoning-plus achieves better performance than OpenAI o1-mini and DeepSeek-R1 on most benchmarks, including mathematical reasoning and Ph.D.-level science questions. It surpasses the full DeepSeek-R1 model (with 671 billion parameters) on the AIME 2025 test, the 2025 qualifier for the USA Math Olympiad. Phi-4-reasoning-plus is available on Azure AI Foundry and Hugging Face.
  • 14
    Phi-4-mini-reasoning
    Phi-4-mini-reasoning is a 3.8-billion parameter transformer-based language model optimized for mathematical reasoning and step-by-step problem solving in environments with constrained computing or latency. Fine-tuned with synthetic data generated by the DeepSeek-R1 model, it balances efficiency with advanced reasoning ability. Trained on over one million diverse math problems spanning multiple levels of difficulty from middle school to Ph.D. level, Phi-4-mini-reasoning outperforms its base model on long sentence generation across various evaluations and surpasses larger models like OpenThinker-7B, Llama-3.2-3B-instruct, and DeepSeek-R1. It features a 128K-token context window and supports function calling, enabling integration with external tools and APIs. Phi-4-mini-reasoning can be quantized using Microsoft Olive or the Apple MLX Framework for deployment on edge devices such as IoT hardware, laptops, and mobile phones.
  • 15
    HunyuanCustom
    HunyuanCustom is a multi-modal customized video generation framework that emphasizes subject consistency while supporting image, audio, video, and text conditions. Built upon HunyuanVideo, it introduces a text-image fusion module based on LLaVA for enhanced multi-modal understanding, along with an image ID enhancement module that leverages temporal concatenation to reinforce identity features across frames. To enable audio- and video-conditioned generation, it further proposes modality-specific condition injection mechanisms, an AudioNet module that achieves hierarchical alignment via spatial cross-attention, and a video-driven injection module that integrates latent-compressed conditional video through a patchify-based feature-alignment network. Extensive experiments on single- and multi-subject scenarios demonstrate that HunyuanCustom significantly outperforms state-of-the-art open and closed source methods in terms of ID consistency, realism, and text-video alignment.
  • 16
    Foundry Local

    Microsoft

    Foundry Local is a local version of Azure AI Foundry that enables local execution of large language models (LLMs) directly on your Windows device. This on-device AI inference solution provides privacy, customization, and cost benefits compared to cloud-based alternatives. Best of all, it fits into your existing workflows and applications with an easy-to-use CLI and REST API.
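    A sketch of calling the local endpoint from Python, assuming Foundry Local's OpenAI-compatible REST surface; the port and model alias below are assumptions, since the service reports its actual endpoint and model names at startup:

      from openai import OpenAI

      # Foundry Local serves an OpenAI-compatible API on a local port (assumed here).
      client = OpenAI(base_url="http://localhost:5273/v1", api_key="not-needed-locally")
      response = client.chat.completions.create(
          model="phi-4-mini",  # assumed alias of a locally downloaded model
          messages=[{"role": "user", "content": "Why does on-device inference help privacy?"}],
      )
      print(response.choices[0].message.content)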
  • 17
    MedGemma

    Google DeepMind

    MedGemma is a collection of Gemma 3 variants that are trained for performance on medical text and image comprehension. Developers can use MedGemma to accelerate building healthcare-based AI applications. MedGemma currently comes in two variants: a 4B multimodal version and a 27B text-only version. MedGemma 4B utilizes a SigLIP image encoder that has been specifically pre-trained on a variety of de-identified medical data, including chest X-rays, dermatology images, ophthalmology images, and histopathology slides. Its LLM component is trained on a diverse set of medical data, including radiology images, histopathology patches, ophthalmology images, and dermatology images. MedGemma 4B is available in both pre-trained (suffix: -pt) and instruction-tuned (suffix: -it) versions. The instruction-tuned version is a better starting point for most applications.
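    A minimal sketch of multimodal inference with the instruction-tuned 4B variant through the Transformers image-text-to-text pipeline, assuming the google/medgemma-4b-it checkpoint; the X-ray path and prompt are placeholders:

      import torch
      from PIL import Image
      from transformers import pipeline

      pipe = pipeline(
          "image-text-to-text",
          model="google/medgemma-4b-it",
          torch_dtype=torch.bfloat16,
          device_map="auto",
      )
      messages = [{
          "role": "user",
          "content": [
              {"type": "image", "image": Image.open("chest_xray.png")},  # placeholder path
              {"type": "text", "text": "Describe the findings in this chest X-ray."},
          ],
      }]
      out = pipe(text=messages, max_new_tokens=200)
      print(out[0]["generated_text"][-1]["content"])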
  • 18
    Cake AI

    Cake AI is a comprehensive AI infrastructure platform that enables teams to build and deploy AI applications using hundreds of pre-integrated open source components, offering complete visibility and control. It provides a curated, end-to-end selection of fully managed, best-in-class commercial and open source AI tools, with pre-built integrations across the full breadth of components needed to move an AI application into production. Cake supports dynamic autoscaling, comprehensive security measures including role-based access control and encryption, advanced monitoring, and infrastructure flexibility across various environments, including Kubernetes clusters and cloud services such as AWS. Its data layer equips teams with tools for data ingestion, transformation, and analytics, leveraging Airflow, DBT, Prefect, Metabase, and Superset. For AI operations, Cake integrates with model catalogs like Hugging Face and supports modular workflows using LangChain, LlamaIndex, and more.
  • 19
    TensorWave

    TensorWave is an AI and high-performance computing (HPC) cloud platform purpose-built for performance, powered exclusively by AMD Instinct Series GPUs. It delivers high-bandwidth, memory-optimized infrastructure that scales with your most demanding models, training, or inference. TensorWave offers access to AMD’s top-tier GPUs within seconds, including the MI300X and MI325X accelerators, which feature industry-leading memory capacity and bandwidth, with up to 256GB of HBM3E supporting 6.0TB/s. TensorWave's architecture includes UEC-ready capabilities that optimize the next generation of Ethernet for AI and HPC networking, and direct liquid cooling that delivers exceptional total cost of ownership with up to 51% data center energy cost savings. TensorWave provides high-speed network storage, ensuring game-changing performance, security, and scalability for AI pipelines. It offers plug-and-play compatibility with a wide range of tools and platforms, supporting popular models and libraries out of the box.
  • 20
    TILDE

    ielab

    TILDE (Term Independent Likelihood moDEl) is a passage re-ranking and expansion framework built on BERT, designed to enhance retrieval performance by combining sparse term matching with deep contextual representations. The original TILDE model pre-computes term weights across the entire BERT vocabulary, which can lead to large index sizes. To address this, TILDEv2 introduces a more efficient approach by computing term weights only for terms present in expanded passages, resulting in indexes that are 99% smaller than those of the original TILDE. This efficiency is achieved by leveraging TILDE as a passage expansion model, where passages are expanded using top-k terms (e.g., top 200) to enrich their content. It provides scripts for indexing collections, re-ranking BM25 results, and training models using datasets like MS MARCO.
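    A conceptual sketch of TILDE-style passage expansion (not the ielab scripts): score the full vocabulary against a passage with a stock BERT masked-LM head, then append the top-k unseen terms before indexing:

      import torch
      from transformers import BertForMaskedLM, BertTokenizerFast

      tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
      model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

      def expand(passage: str, k: int = 200) -> str:
          inputs = tok(passage, return_tensors="pt", truncation=True)
          with torch.no_grad():
              # Approximate TILDE's term-likelihood idea with the LM head's
              # vocabulary scores at the [CLS] position.
              logits = model(**inputs).logits[0, 0]
          top_ids = torch.topk(logits, k).indices.tolist()
          seen = set(tok(passage)["input_ids"])
          new_terms = [tok.decode([i]) for i in top_ids if i not in seen]
          # The expanded passage is what gets indexed for later re-ranking.
          return passage + " " + " ".join(new_terms)

      print(expand("The Eiffel Tower is in Paris.", k=50)[:200])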
  • 21
    Qualcomm Cloud AI SDK
    The Qualcomm Cloud AI SDK is a comprehensive software suite designed to optimize trained deep learning models for high-performance inference on Qualcomm Cloud AI 100 accelerators. It supports a wide range of AI frameworks, including TensorFlow, PyTorch, and ONNX, enabling developers to compile, optimize, and execute models efficiently. The SDK provides tools for model onboarding, tuning, and deployment, facilitating end-to-end workflows from model preparation to production deployment. Additionally, it offers resources such as model recipes, tutorials, and code samples to assist developers in accelerating AI development. It ensures seamless integration with existing systems, allowing for scalable and efficient AI inference in cloud environments. By leveraging the Cloud AI SDK, developers can achieve enhanced performance and efficiency in their AI applications.
  • 22
    VMware Private AI Foundation
    VMware Private AI Foundation is a joint, on‑premises generative AI platform built on VMware Cloud Foundation (VCF) that enables enterprises to run retrieval‑augmented generation workflows, fine‑tune and customize large language models, and perform inference in their own data centers, addressing privacy, choice, cost, performance, and compliance requirements. It integrates the Private AI Package (including vector databases, deep learning VMs, data indexing and retrieval services, and AI agent‑builder tools) with NVIDIA AI Enterprise (comprising NVIDIA microservices like NIM, NVIDIA’s own LLMs, and third‑party/open source models from places like Hugging Face). It supports full GPU virtualization, monitoring, live migration, and efficient resource pooling on NVIDIA‑certified HGX servers with NVLink/NVSwitch acceleration. Deployable via GUI, CLI, and API, it offers unified management through self‑service provisioning, model store governance, and more.
  • 23
    Centific

    Centific’s frontier AI data foundry platform, powered by NVIDIA edge computing, is purpose-built to accelerate AI deployments by increasing flexibility, security, and scalability through comprehensive workflow orchestration. It centralizes AI project management in a unified AI Workbench, overseeing pipelines, model training, deployment, and reporting within a single, streamlined environment, while it handles data ingestion, preprocessing, and transformation. RAG Studio simplifies retrieval-augmented generation workflows, the Product Catalog organizes reusable assets, and Safe AI Studio embeds built-in safeguards to ensure compliance, reduce hallucinations, and protect sensitive data. Its plugin-based modular architecture supports both PaaS and SaaS models with metering to monitor consumption, and a centralized model catalog offers version control, compliance checks, and flexible deployment options.
  • 24
    Phi-4-mini-flash-reasoning
    Phi-4-mini-flash-reasoning is a 3.8 billion‑parameter open model in Microsoft’s Phi family, purpose‑built for edge, mobile, and other resource‑constrained environments where compute, memory, and latency are tightly limited. It introduces the SambaY decoder‑hybrid‑decoder architecture with Gated Memory Units (GMUs) interleaved alongside Mamba state‑space and sliding‑window attention layers, delivering up to 10× higher throughput and a 2–3× reduction in latency compared to its predecessor without sacrificing advanced math and logic reasoning performance. Supporting a 64 K‑token context length and fine‑tuned on high‑quality synthetic data, it excels at long‑context retrieval, reasoning tasks, and real‑time inference, all deployable on a single GPU. Phi-4-mini-flash-reasoning is available today via Azure AI Foundry, NVIDIA API Catalog, and Hugging Face, enabling developers to build fast, scalable, logic‑intensive applications.
  • 25
    Voxtral

    Mistral AI

    Voxtral models are frontier open source speech-understanding systems available in two sizes: a 24B variant for production-scale applications and a 3B variant for local and edge deployments, both released under the Apache 2.0 license. They combine high-accuracy transcription with native semantic understanding, supporting long-form context (up to 32K tokens), built-in Q&A and structured summarization, automatic language detection across major languages, and direct function-calling to trigger backend workflows from voice. Retaining the text capabilities of their Mistral Small 3.1 backbone, Voxtral handles audio up to 30 minutes for transcription or 40 minutes for understanding and outperforms leading open source and proprietary models on benchmarks such as LibriSpeech, Mozilla Common Voice, and FLEURS. Accessible via download on Hugging Face, API endpoint, or private on-premises deployment, Voxtral also offers domain-specific fine-tuning and advanced enterprise features.
  • 26
    Naptha

    Naptha is a modular AI platform for autonomous agents that empowers developers and researchers to build, deploy, and scale cooperative multi-agent systems on the agentic web. Its core innovations include Agent Diversity, which continuously upgrades performance by orchestrating diverse models, tools, and architectures; Horizontal Scaling, which supports collaborative networks of millions of AI agents; Self-Evolved AI, where agents learn and optimize themselves beyond human-designed capabilities; and AI Agent Economies, which enable autonomous agents to generate useful goods and services. Naptha integrates seamlessly with popular frameworks and infrastructure such as LangChain, AgentOps, CrewAI, IPFS, and NVIDIA stacks via a Python SDK that upgrades existing agent frameworks with next-generation enhancements. Developers can extend or publish reusable components on the Naptha Hub and run full agent stacks on Naptha Nodes, anywhere a container can execute.
  • 27
    Paal AI

    Paal is a robust AI ecosystem for creating, deploying, and managing advanced AI applications across Web2 and Web3 channels. Users can build customizable Paal Bots that deliver real‑time AI assistance on any topic or crypto market data, white‑label solutions for communities or brands, and autonomous trading agents that execute buy and sell orders based on AI‑generated signals with configurable parameters like trade size, take‑profit, and stop‑loss. The Enterprise Agents suite offers drag‑and‑drop workflow design, REST API and knowledge‑base integrations, IoT agent support, and a live testing playground, enabling automation of complex tasks and seamless integration with external systems. Creative professionals can produce animations, 3D characters, and continuous content distribution across streaming services and social media while tracking performance metrics.
  • 28
    GLM-4.5
    GLM-4.5 is Z.ai’s latest flagship model in the GLM family, engineered with 355 billion total parameters (32 billion active) and a companion GLM-4.5-Air variant (106 billion total, 12 billion active) to unify advanced reasoning, coding, and agentic capabilities in one architecture. It operates in a “thinking” mode for complex, multi-step reasoning and tool use, and a “non-thinking” mode for instant responses, supporting up to 128K-token context length and native function calling. Available via the Z.ai chat platform and API, with open weights on Hugging Face and ModelScope, GLM-4.5 handles general problem solving, common-sense reasoning, coding from scratch or within existing projects, and end-to-end agent workflows such as web browsing and slide generation. Built on a Mixture-of-Experts design with loss-free balance routing, grouped-query attention, and an MTP layer for speculative decoding, it delivers enterprise-grade performance.
  • 29
    Command A Reasoning
    Command A Reasoning is Cohere’s most advanced enterprise-ready language model, engineered for high-stakes reasoning tasks and seamless integration into AI agent workflows. The model delivers exceptional reasoning performance, efficiency, and controllability, scaling across multi-GPU setups with support for up to 256,000-token context windows, ideal for handling long documents and multi-step agentic tasks. Organizations can fine-tune output precision and latency through a token budget, allowing a single model to flexibly serve both high-accuracy and high-throughput use cases. It powers Cohere’s North platform with leading benchmark performance and excels in multilingual contexts across 23 languages. Designed with enterprise safety in mind, it balances helpfulness with robust safeguards against harmful outputs. A lightweight deployment option allows running the model securely on a single H100 or A100 GPU, simplifying private, scalable use.
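    A minimal sketch using Cohere's Python SDK v2 chat API; the model id below is an assumption based on Cohere's naming for this release:

      import cohere

      co = cohere.ClientV2(api_key="YOUR_CO_API_KEY")  # placeholder key
      response = co.chat(
          model="command-a-reasoning-08-2025",  # assumed id for Command A Reasoning
          messages=[{"role": "user", "content": "Plan the steps to reconcile two ledgers."}],
      )
      print(response.message.content[0].text)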
  • 30
    Command A Translate
    Command A Translate is Cohere’s enterprise-grade machine translation model crafted to deliver secure, high-quality translation across 23 business-relevant languages. Built on a powerful 111-billion-parameter architecture with an 8K-input / 8K-output context window, it achieves industry-leading performance that surpasses models like GPT-5, DeepSeek-V3, DeepL Pro, and Google Translate across a broad suite of benchmarks. The model supports private deployments for sensitive workflows, allowing enterprises full control over their data, and introduces an innovative “Deep Translation” workflow, an agentic, multi-step refinement process that iteratively enhances translation quality for complex use cases. External validation from RWS Group confirms its excellence in challenging translation tasks. Additionally, the model’s weights are available for research via Hugging Face under a CC-BY-NC license, enabling deep customization, fine-tuning, and private deployment flexibility.