  • 1
    Free LLM API resources

    A list of free LLM inference resources accessible via API

    ...This list helps developers, hobbyists, and researchers quickly find models they can use for prototyping, experimentation, or production proofs-of-concept without needing paid subscriptions, reducing friction for innovation. The repository typically categorizes offerings by provider, type of service (text, embeddings, vision), availability conditions (open without key, free tier with key), and usage examples to make discovery and adoption easier.
    Downloads: 5 This Week
  • 2
    vLLM

    A high-throughput and memory-efficient inference and serving engine

    vLLM is a fast and easy-to-use library for LLM inference and serving. High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more.
    Downloads: 14 This Week
  • 3
    Paper2Slides

    From Paper to Presentation in One Click

    Paper2Slides is an automation tool that converts research papers, reports, and other documents into polished slide decks and posters with minimal manual effort. It is designed to replace the repetitive work of turning dense technical documents into presentation-friendly structure by extracting key points, figures, and data into a coherent visual narrative. The system supports multiple input formats, so you can process PDFs and common office documents rather than being locked to a single file type. It uses an extraction approach intended to capture critical insights comprehensively, including important visuals and data points that often get missed in naive summarization. ...
    Downloads: 4 This Week
  • 4
    KVCache-Factory

    Unified KV Cache Compression Methods for Auto-Regressive Models

    KVCache-Factory is an open-source research framework designed to explore and implement unified key-value cache compression techniques for autoregressive transformer models. In large language models, the key-value cache stores intermediate attention states that enable efficient token generation during inference, but these caches can consume large amounts of GPU memory when handling long contexts. KVCache-Factory provides a platform for implementing and evaluating multiple compression strategies that reduce memory usage while preserving model performance. ...
    Downloads: 2 This Week
  • 5
    R-KV

    Redundancy-aware KV Cache Compression for Reasoning Models

    R-KV is an open-source research project that focuses on improving the efficiency of large language model inference through key-value cache compression techniques. Modern transformer models rely heavily on KV caches during autoregressive decoding, which store intermediate attention states to accelerate generation. However, these caches can consume large amounts of memory, especially in reasoning-oriented models with long context windows. R-KV introduces a method for compressing the KV cache during decoding, allowing models to maintain reasoning performance while reducing memory consumption and computational overhead. ...
    Downloads: 1 This Week
  • 6
    Scikit-LLM

    Seamlessly integrate LLMs into scikit-learn

    Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks. At the moment, the majority of the Scikit-LLM estimators are only compatible with some of the OpenAI models. Hence, a user-provided OpenAI API key is required. Additionally, Scikit-LLM will ensure that the obtained response contains a valid label. If this is not the case, a label will be selected randomly (label probabilities are proportional to label occurrences in the training set). Note: unlike in a typical supervised setting, the performance of a zero-shot classifier greatly depends on how the label itself is structured. ...
    Downloads: 0 This Week
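The label fallback described for Scikit-LLM can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Scikit-LLM's own code: `pick_label` and its parameters are hypothetical names, and the model's raw response is assumed to be a plain string.

```python
import random
from collections import Counter


def pick_label(response, train_labels, rng=None):
    """Return the response if it is a known label, else sample a fallback.

    Hypothetical helper: the fallback label is drawn with probability
    proportional to how often each label occurs in the training set.
    """
    rng = rng or random.Random()
    counts = Counter(train_labels)
    cleaned = response.strip()
    if cleaned in counts:
        return cleaned
    labels = list(counts)
    weights = [counts[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` as `rng` makes the fallback reproducible, which helps when comparing runs.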
  • 7
    Mirascope

    LLM abstractions that aren't obstructions

    Mirascope is a powerful, flexible, and user-friendly library that simplifies the process of working with LLMs through a unified interface that works across various supported providers, including OpenAI, Anthropic, Mistral, Gemini, Groq, Cohere, LiteLLM, Azure AI, Vertex AI, and Bedrock. Whether you're generating text, extracting structured information, or developing complex AI-driven agent systems, Mirascope provides the tools you need to streamline your development process and create...
    Downloads: 4 This Week
  • 8
    UCCL

    UCCL is an efficient communication library for GPUs

    ...UCCL is designed to work with heterogeneous hardware environments, allowing GPUs from different vendors and network interfaces to communicate efficiently without vendor lock-in. The system also supports specialized workloads such as reinforcement learning weight transfers, key-value cache sharing, and expert parallelism for mixture-of-experts models. Its architecture emphasizes flexibility and extensibility so that developers can implement custom communication protocols tailored to specific machine learning workloads.
    Downloads: 0 This Week
  • 9
    CAG

    Cache-Augmented Generation: A Simple, Efficient Alternative to RAG

    ...Traditional retrieval-augmented generation systems rely on real-time retrieval of documents from databases or vector stores during inference. CAG proposes a different approach by preloading relevant knowledge into the model’s context window and precomputing the model’s key-value cache before queries are processed. This strategy allows the model to generate responses using the cached context directly, eliminating the need for repeated retrieval operations during runtime. As a result, the approach can significantly reduce latency and simplify system architecture compared with traditional RAG pipelines. ...
    Downloads: 0 This Week
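The idea of precomputing a key-value cache for preloaded context can be shown with a toy single-head attention. This is only a sketch under loud assumptions: `PrefixCache` and `project` are hypothetical names, the projection is an identity stand-in for a real model's learned key/value projections, and no actual language model is involved.

```python
import math


def project(vec):
    # Assumption: identity projection stands in for learned key/value weights.
    return list(vec), list(vec)


class PrefixCache:
    """Computes keys and values for the context once, then reuses them."""

    def __init__(self, context_vectors):
        self.keys, self.values = [], []
        self.projections = 0  # counts how often the context was projected
        for vec in context_vectors:
            key, value = project(vec)
            self.keys.append(key)
            self.values.append(value)
            self.projections += 1

    def attend(self, query):
        # Softmax attention over the cached keys; nothing is recomputed.
        scores = [sum(q * k for q, k in zip(query, key)) for key in self.keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [
            sum(w * v[i] for w, v in zip(weights, self.values)) / total
            for i in range(dim)
        ]
```

Each call to `attend` reuses the stored keys and values, so the cost of processing the context is paid once rather than per query.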
  • 10
    gptcommit

    A git prepare-commit-msg hook for authoring commit messages with GPT-3

    ...If you're not satisfied with the generated message, you can always edit it before committing. By default, gptcommit uses the GPT-3 model. Please ensure you have sufficient credits in your OpenAI account to use it. Commit messages are a key channel for developers to communicate their work with others, especially in code reviews. When making complex code changes, it can be tedious to thoroughly document the contents of each change. I often felt the impulse to just title my commit “fix bug” and move on. Surfacing these changes with gptcommit helps the author and reviewer by bringing attention to these additional changes.
    Downloads: 0 This Week
  • 11
    Lemon AI

    Full-stack Open-source Self-Evolving General AI Agent

    ...The system includes a multi-agent architecture that supports planning, action execution, reflection, and memory, allowing the agent to reason through tasks and refine results iteratively. A key component of the framework is a virtual machine sandbox environment that safely executes code generated by the agent without affecting the host system.
    Downloads: 9 This Week
  • 12
    BertViz

    BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)

    ...It is based on the excellent Tensor2Tensor visualization tool. The model view shows a bird's-eye view of attention across all layers and heads. The neuron view visualizes individual neurons in the query and key vectors and shows how they are used to compute attention.
    Downloads: 4 This Week
  • 13
    PasteGuard

    Masks sensitive data and secrets before they reach AI

    PasteGuard is an open-source privacy proxy that protects sensitive information like personal data and API secrets by detecting and masking them before they reach large language model APIs such as OpenAI or Anthropic Claude. It sits between an application and the LLM provider, automatically replacing names, emails, tokens, and other personally identifiable information (PII) with placeholders so that external services never see raw sensitive values, and then optionally unmasking them in the...
    Downloads: 1 This Week
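The mask-then-unmask flow described for PasteGuard can be illustrated with a small sketch. It is not PasteGuard's implementation: the `mask` and `unmask` helpers, the placeholder format, and the simplified email pattern are all assumptions, and only email addresses are handled here.

```python
import re

# Simplified email pattern (assumption); real PII detection is broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask(text):
    """Replace each email with a placeholder; return masked text and mapping."""
    mapping = {}  # placeholder -> original value

    def replace(match):
        value = match.group(0)
        for placeholder, original in mapping.items():
            if original == value:
                return placeholder  # reuse the placeholder for repeats
        placeholder = f"[[EMAIL_{len(mapping) + 1}]]"
        mapping[placeholder] = value
        return placeholder

    return EMAIL_RE.sub(replace, text), mapping


def unmask(text, mapping):
    """Restore original values in text returned by the provider."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text
```

The mapping stays on the proxy side, so only the placeholders would be sent to the external API.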
  • 14
    HASH

    The best way to use and work with blocks

    This is HASH's public monorepo, which contains our public code, docs, and other key resources. HASH is a platform for decision-making that helps you integrate, understand, and use data in a variety of ways. HASH does this by combining several powerful tools into one simple interface. These range from data pipelines and a graph database to an all-in-one workspace, no-code tool builder, and agent-based simulation engine.
    Downloads: 1 This Week
  • 15
    DecryptPrompt

    Summarize Prompt & LLM papers, open source data & models

    ...The project collects papers, technical reports, and research materials that explore prompting techniques, model architectures, and reasoning strategies used in modern AI systems. It serves as a structured knowledge base where developers and researchers can quickly find key papers about topics such as chain-of-thought reasoning, prompt optimization, reasoning frameworks, and model training techniques. The repository organizes research into thematic sections that cover different prompting methodologies and reasoning paradigms used in LLM development. Many of the resources focus on understanding how prompts influence model behavior and how prompting strategies can improve reasoning or efficiency.
    Downloads: 0 This Week
  • 16
    DocStrange

    Extract and convert data from any document: images, PDFs, Word docs

    ...It is built for developers who need high-quality parsing from scans, photos, PDFs, office files, and other document sources while preserving privacy and control over the processing flow. One of its key differentiators is deployment flexibility: it offers a cloud API for managed usage as well as a fully private offline mode that runs locally on a GPU. The platform also supports synchronous extraction, streaming responses, and asynchronous processing for larger documents, which makes it adaptable to both interactive workflows and heavier back-end pipelines.
    Downloads: 1 This Week
  • 17
    Prompt Engineering Techniques

    Collection of tutorials for Prompt Engineering techniques

    Prompt Engineering Techniques is a focused companion repository that teaches prompt engineering systematically, from fundamentals to advanced strategies. It contains more than twenty hands-on Jupyter notebooks, each dedicated to a specific technique such as basic prompt structures, prompt templates and variables, zero-shot prompting, few-shot prompting, chain-of-thought, self-consistency, constrained generation, role prompting, task decomposition, and more. The tutorials are designed to be...
    Downloads: 1 This Week
  • 18
    OpenAI Forward

    An efficient forwarding service designed for LLMs

    ...Its main purpose is to make model access more manageable and efficient by adding operational controls such as request rate limiting, token rate limiting, caching, logging, routing, and key management around existing LLM endpoints. The project can proxy both local and cloud-hosted language model services, which makes it useful for teams that want a single control layer regardless of whether they are using something like LocalAI or a hosted provider compatible with OpenAI-style APIs. A major emphasis of the repository is asynchronous performance, using tools such as uvicorn, aiohttp, and asyncio to support high-throughput forwarding workloads.
    Downloads: 0 This Week
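Request rate limiting of the kind listed for OpenAI Forward is commonly built on a token bucket. The sketch below shows that general technique only; the listing does not say which algorithm the project uses, and `TokenBucket` with its parameters is a hypothetical name chosen for illustration.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and refills at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0):
        # Refill according to elapsed time, capped at the bucket capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The same class could approximate token rate limiting by passing the number of LLM tokens in a request as `cost`, though that mapping is an assumption rather than something the listing describes.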
  • 19
    NLP-Knowledge-Graph

    Research and application of technologies such as natural language processing

    ...The project aims to help researchers and developers understand how structured knowledge representations can enhance language processing systems. It includes curated materials covering key topics such as knowledge graph construction, entity recognition, relation extraction, graph embeddings, and semantic reasoning. By combining NLP techniques with graph-based data models, knowledge graphs allow systems to represent complex relationships between entities and improve tasks such as question answering, information retrieval, and recommendation systems. ...
    Downloads: 0 This Week
  • 20
    SageAttention

    [NeurIPS2025 Spotlight] Quantized Attention

    ...Since attention operations are often the most computationally expensive component of modern AI models, SageAttention introduces quantization techniques that significantly reduce computational overhead while preserving model accuracy. The system achieves this by using low-precision numerical formats such as INT4, FP8, or INT8 to represent key matrices within the attention computation. These optimizations allow models to perform matrix operations faster and consume less memory during inference. SageAttention is designed to function as a plug-and-play replacement for standard attention implementations, enabling developers to accelerate existing models without modifying their architecture.
    Downloads: 0 This Week
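To make the low-precision idea concrete, here is a sketch of plain symmetric per-tensor INT8 quantization applied to a query-key dot product. SageAttention's actual quantization scheme and kernels are not described in this listing and may differ; `quantize_int8`, `dequantize`, and `approx_dot` are hypothetical helper names.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


def approx_dot(query, key):
    """Approximate a query-key score using integer multiplication."""
    q_int, q_scale = quantize_int8(query)
    k_int, k_scale = quantize_int8(key)
    integer_dot = sum(a * b for a, b in zip(q_int, k_int))
    return integer_dot * q_scale * k_scale
```

The integer products are the part that low-precision hardware can accelerate; the two scales are applied once at the end to return to floating point.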
  • 21
    RAG from Scratch

    Demystify RAG by building it from scratch

    ...Instead of relying on complex frameworks or cloud services, the repository demonstrates the entire RAG pipeline using transparent and minimal implementations. The project walks through key concepts such as generating embeddings, building vector databases, retrieving relevant documents, and integrating the retrieved context into language model prompts. Each example is written with detailed explanations so that developers can understand the internal mechanics of semantic search and context-aware language generation. The repository emphasizes learning through direct implementation, allowing users to see how each component of the RAG architecture functions independently.
    Downloads: 0 This Week
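The pipeline stages named above (embeddings, a vector store, retrieval, and prompt assembly) can be sketched with a toy bag-of-words embedding. This is not code from the repository: `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names, and word counts stand in for a real embedding model.

```python
import math
from collections import Counter


def embed(text):
    # Toy embedding (assumption): lowercase word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=1):
    """Place the retrieved context ahead of the question."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Swapping `embed` for a learned embedding model and the list of documents for a vector database would keep the same overall shape.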
  • 22
    AgentEvolver

    Towards Efficient Self-Evolving Agent System

    ...The system focuses on improving the efficiency and scalability of training autonomous agents by allowing them to generate tasks, explore environments, and refine strategies without heavy reliance on manually curated datasets. Its architecture combines reinforcement learning with LLM-driven reasoning mechanisms to guide exploration and learning. The framework introduces several key mechanisms, including self-questioning to create new learning tasks, self-navigating to improve exploration through experience reuse, and self-attributing to assign rewards based on the usefulness of actions. These mechanisms enable agents to continuously improve their capabilities while interacting with complex environments and tools. ...
    Downloads: 0 This Week
  • 23
    LearnLLM.AI

    Sharing knowledge about big models that everyone can understand

    LLMForEverybody is an open-source educational repository designed to make large language model concepts accessible to a broad audience, including beginners, developers, and job candidates preparing for AI-related interviews. The project organizes knowledge about LLMs into a structured learning path that begins with foundational research papers and progresses through the evolution of modern model architectures. It covers a wide range of topics including attention mechanisms, tokenization...
    Downloads: 0 This Week
  • 24
    LLM-Agent-Paper-List

    The paper list of the 86-page SCIS cover paper

    ...The project functions as a curated companion resource to a comprehensive survey paper examining the rise and potential of LLM-driven agent systems. Within the repository, research papers are categorized according to key aspects of agent architecture, including components such as reasoning systems, perception mechanisms, and action modules. The project also organizes literature into thematic sections that explore different application scenarios, including single-agent systems, collaborative multi-agent environments, and human-agent interaction. By structuring the literature in this way, the repository helps researchers navigate a rapidly growing body of work related to autonomous AI systems built on language models. ...
    Downloads: 0 This Week
  • 25
    LLM TLDR

    95% token savings. 155x faster queries. 16 languages

    ...The system supports both extractive and abstractive summarization styles so that users can choose whether they want condensed highlights or a more narrative paraphrase of key ideas. To enhance usability, LLM-TLDR includes command-line tools and integration examples for common workflows like batch summarization, webhook ingestion, and automation in documentation pipelines.
    Downloads: 0 This Week