Showing 48 open source projects for "tests"

  • 1
    Qodo Cover

    AI tool that generates tests to improve code coverage quickly

    Qodo Cover is an open source developer tool designed to automate the creation of unit tests using generative AI, helping teams improve code coverage with minimal manual effort. It operates as a command-line interface and can also be integrated into continuous integration workflows, making it adaptable to different development environments. It analyzes an existing codebase, identifies gaps in test coverage, and generates new tests that target uncovered or weakly tested areas. ...
    Downloads: 0 This Week
  • 2
    CodiumAI Cover-Agent

    CodiumAI Cover-Agent: An AI-Powered Tool for Automated Test Generation

    CodiumAI Cover Agent aims to increase code coverage efficiently by automatically generating qualified tests that enhance existing test suites.
    Downloads: 0 This Week
  • 3
    HumanEval

    Code for the paper "Evaluating Large Language Models Trained on Code"

    human-eval is a benchmark dataset and evaluation framework created by OpenAI for measuring the ability of language models to generate correct code. It consists of hand-written programming problems with unit tests, designed to assess functional correctness rather than superficial metrics like text similarity. Each task includes a natural language prompt and a function signature, requiring the model to generate an implementation that passes all provided tests. The benchmark has become a standard for evaluating code generation models, including those in the Codex and GPT families. ...
    Downloads: 2 This Week
  • 4
    Giskard

    Collaborative & Open-Source Quality Assurance for all AI models

    ...The Giskard scan automatically detects vulnerabilities such as performance bias, data leakage, lack of robustness, spurious correlations, overconfidence, underconfidence, and ethical issues. Giskard then generates relevant tests based on the vulnerabilities the scan detects. You can customize the tests for your use case by defining domain-specific data slicers and transformers as fixtures of your test suites.
    Downloads: 2 This Week
  • 5
    HunyuanVideo

    HunyuanVideo: A Systematic Framework For Large Video Generation Model

    ...The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. The release includes FP8 model weights that reduce GPU memory usage and improve efficiency, parallel inference code that speeds up sampling, and utilities and tests.
    Downloads: 10 This Week
  • 6
    LMCache

    Supercharge Your LLM with the Fastest KV Cache Layer

    ...Its design supports reuse beyond strict prefix matching and enables sharing across serving instances, improving efficiency under real multi-tenant traffic. The broader project includes examples, tests, a server component, and public posts describing cross-engine sharing and inter-GPU KV transfers. These capabilities aim to lower latency, cut GPU cycles, and stabilize performance for production workloads with overlapping prompts or retrieval-augmented contexts. The end result is a cache fabric for LLMs that complements engines rather than replacing them.
    Downloads: 11 This Week
  • 7
    AI Runner

    Offline inference engine for art, real-time voice conversations

    ...The project has a strong focus on developer ergonomics, with thorough development guidelines, environment configuration using .env variables, and a clear structure for tests, tools and agents.
    Downloads: 13 This Week
  • 8
    Evidently

    Evaluate and monitor ML models from validation to production

    Evidently is an open-source Python library for data scientists and ML engineers. It helps evaluate, test, and monitor ML models from validation to production. It works with tabular data, text data, and embeddings.
    Downloads: 7 This Week
  • 9
    Fara-7B

    An Efficient Agentic Model for Computer Use

    ...It provides stakeholders with a way to benchmark and evaluate models across dimensions such as fairness, robustness, security, privacy, and ethical considerations. Rather than relying on ad-hoc or manual review processes, FARA enables organizations to profile AI behavior using standardized tests, metrics, and reporting templates, making evaluations reproducible and comparable over time. The framework supports plugin-based modules that can be tailored to industry-specific concerns or regulatory requirements, helping compliance teams, auditors, and engineers collaborate on shared assessment goals.
    Downloads: 0 This Week
  • 10
    Anthropic's Original Performance

    Anthropic's original performance take-home, now open for you to try

    ...The project sets up a baseline performance problem where participants work to reduce simulated “clock cycles” required to run a given workload, effectively challenging them to engineer faster code under constraints. This take-home includes starter code, tests, and tools to debug performance, aiming to measure how effectively one can apply algorithmic improvements and optimizations. Because it’s framed around beating baseline scores — and even outperforming previous automated systems — it encourages both deep knowledge of Python and creative problem-solving.
    Downloads: 0 This Week
  • 11
    Agentless

    An agentless approach to automatically solve software development

    ...It then generates multiple candidate patches for the identified locations using language model reasoning and diff-style edits. In the final stage, the framework validates potential patches by running regression tests and additional reproduction tests to confirm whether the fix resolves the original error. Based on these results, the system ranks the candidate patches and selects the most reliable solution to submit.
    Downloads: 0 This Week
  • 12
    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    ...With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and improving markdown conversion, reflecting active community use in research flows. It’s designed to be drop-in for MCP clients, giving them typed inputs/outputs and predictable errors around a well-known academic corpus. For developers building research copilots, it removes the glue work of wiring arXiv APIs into an agent toolchain.
    Downloads: 0 This Week
  • 13
    CodiumAI PR-Agent

    AI-Powered tool for automated pull request analysis

    CodiumAI PR-Agent is an open-source tool that helps developers review pull requests faster and more efficiently. It automatically analyzes a pull request and supports several types of commands. See the Usage Guide for instructions on how to run the different tools from the CLI, use them online, or trigger them automatically when a new PR is opened. You can try the GPT-4-powered PR-Agent instantly on your public GitHub repository. Just mention @CodiumAI-Agent and add the desired command in...
    Downloads: 7 This Week
  • 14
    AI Chatbot Framework

    Python chatbot framework with Natural Language Understanding

    Building a chatbot can sound daunting, but it’s totally doable. AI Chatbot Framework is an AI-powered conversational dialog interface built in Python. With this tool, it’s easy to create natural-language conversational scenarios with no coding effort. The smooth UI makes it effortless to create conversations and train the bot, which continuously gets smarter as it learns from the conversations it has with people. AI Chatbot Framework can live on any channel of your choice (such as...
    Downloads: 9 This Week
  • 15
    Serena

    Agent toolkit providing semantic retrieval and editing capabilities

    Serena is a coding-focused agent toolkit that turns an LLM into a practical software-engineering agent with semantic retrieval and editing over real repositories. It operates as an MCP server (and other integrations), exposing IDE-like tools so agents can locate symbols, reason about code structure, make targeted edits, and validate changes. The toolkit is LLM-agnostic and framework-agnostic, positioning itself as a drop-in capability for different chat UIs, orchestrators, or custom agent...
    Downloads: 8 This Week
  • 16
    pmdarima

    Statistical library designed to fill the void in Python's time series

    A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
    Downloads: 0 This Week
  • 17
    GLM-4.5

    GLM-4.5: Open-source LLM for intelligent agents by Z.ai

    GLM-4.5 is a cutting-edge open-source large language model designed by Z.ai for intelligent agent applications. The flagship GLM-4.5 model has 355 billion total parameters with 32 billion active parameters, while the compact GLM-4.5-Air version offers 106 billion total parameters and 12 billion active parameters. Both models unify reasoning, coding, and intelligent agent capabilities, providing two modes: a thinking mode for complex reasoning and tool usage, and a non-thinking mode for...
    Downloads: 40 This Week
  • 18
    LLM Colosseum

    Benchmark LLMs by fighting in Street Fighter 3

    ...The system places language models inside the environment of the classic video game Street Fighter III, where they must interpret the game state and decide which actions to perform during combat. This setup creates a dynamic environment that tests reasoning, situational awareness, and decision-making abilities in real time. Instead of relying purely on reward signals as reinforcement learning agents do, the models analyze contextual information and generate strategic actions based on the game environment. Performance is evaluated using a competitive ranking system that assigns each model an Elo rating based on its results across matches against other models.
    Downloads: 6 This Week
  • 19
    gTTS

    Python library and CLI tool to interface with Google Translate

    gTTS (Google Text-to-Speech) is a Python library and command-line tool that wraps the speech functionality of Google Translate. It lets you send text to the Google Translate TTS endpoint and receive spoken audio back as MP3 data, either written to a file, a file-like object, or standard output. The library is designed to handle long texts, using a speech-specific sentence tokenizer that keeps intonation and punctuation natural while splitting requests into acceptable chunks. It supports...
    Downloads: 11 This Week
  • 20
    ASSERT

    Requirement-driven evaluation harness for AI agents and LLM

    ASSERT is a requirement-driven evaluation harness for AI agents and LLM applications. It turns natural-language specifications, policies, product requirements, and launch criteria into structured tests that can be reviewed, executed, scored, and improved. The pipeline derives behavior categories, generates single-turn and multi-turn test cases, runs them against a target system, and uses an LLM judge to score conversations against the stated policies. It can evaluate hosted models, custom agents, multi-agent systems, REST clients, and frameworks such as LangGraph, CrewAI, AutoGen, DSPy, LlamaIndex, and OpenAI Agents SDK. ...
    Downloads: 2 This Week
  • 21
    NVIDIA Earth2Studio

    Open-source deep-learning framework

    ...Users can extend Earth2Studio with optional model packs, advanced data interfaces, statistical operators, and backend integrations that support flexible workflows from simple tests to large-scale operational inference.
    Downloads: 3 This Week
  • 22
    Robyn

    Experimental, AI/ML-powered and open sourced Marketing Mix Modeling

    Robyn is an open-source, AI/ML-powered Marketing Mix Modeling (MMM) toolkit developed by Meta Marketing Science under the “facebookexperimental” GitHub umbrella. Its goal is to democratize rigorous MMM: what traditionally required expert statisticians and expensive consulting becomes accessible to any company with data. Robyn takes in historical data (spends on different marketing channels, conversions, or revenue, and optional context or organic-media variables) and uses a combination of...
    Downloads: 7 This Week
  • 23
    AWS Deep Learning Containers

    A set of Docker images for training and serving models in TensorFlow

    AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and MXNet. Deep Learning Containers provide optimized environments with TensorFlow and MXNet, Nvidia CUDA (for GPU instances), and Intel MKL (for CPU instances) libraries and are available in the Amazon Elastic Container Registry (Amazon ECR). The AWS DLCs are used in Amazon SageMaker as the default vehicles for your SageMaker jobs such as training, inference,...
    Downloads: 3 This Week
  • 24
    AI Marketing Skills

    Open-source AI marketing skills for Claude Code

    ...The system is organized into multiple domains such as growth experimentation, sales pipeline generation, content production, outbound marketing, SEO optimization, and financial analysis, effectively covering the entire revenue lifecycle of a business. Each skill functions as an executable capability that can be invoked on demand, enabling users to perform tasks like running A/B tests, generating high-quality content, or analyzing conversion funnels with minimal manual effort.
    Downloads: 2 This Week
  • 25
    Devon

    Open source AI pair programmer for coding, debugging, automation

    ...Devon integrates with multiple large language models, allowing users to choose between different providers for performance, cost, and latency considerations. It is capable of performing tasks such as debugging, writing tests, analyzing code structure, and navigating complex repositories. Devon also includes features for session management, enabling users to start, pause, and revert actions while maintaining context.
    Downloads: 0 This Week
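
The LLM Colosseum entry above mentions ranking models by an Elo rating computed from match results. As a rough illustration of how such a rating update usually works, here is a minimal sketch of the standard Elo formula. It is not taken from the project's code: the function names, the K-factor of 32, and the 400-point scale are assumptions based on the conventional chess formulation.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both new ratings after one match.

    score_a is 1.0 if A won, 0.0 if A lost, and 0.5 for a draw.
    k (assumed here to be 32) controls how far a single match moves a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


if __name__ == "__main__":
    # Two hypothetical models start level; model A wins one match.
    print(update_elo(1500.0, 1500.0, 1.0))
```

With two equally rated models, the winner gains exactly what the loser gives up, so the example moves the ratings to 1516.0 and 1484.0. A larger K-factor would make the leaderboard react faster to each match at the cost of more volatility.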