scoring free download

31 projects for "scoring" with 2 filters applied:

Filters: Artificial Intelligence, BSD

1

Prometheus-Eval

Evaluate your LLM's response with Prometheus and GPT-4

...It also provides training data and utilities for fine-tuning evaluator models so they can assess outputs according to custom scoring rubrics such as helpfulness, accuracy, or style.

Downloads: 0 This Week

Last Update: 2026-03-09
2

CyberStrikeAI

CyberStrikeAI is an AI-native security testing platform built in Go

...It supports role-based testing, letting teams define security roles with tailored tool access and prompts, and includes a skills system that encapsulates specialized testing strategies that the AI can incorporate into its planning. Through comprehensive lifecycle management, results are tracked, aggregated, and visualized, with support for versioned persistence, search, and risk severity scoring.

Downloads: 2 This Week

Last Update: 2 days ago
3

darwin-skill

Autoresearch-inspired autonomous skill optimization for Claude Code

...Instead of treating prompts or skill definitions as static assets, the system applies a continuous improvement cycle that evaluates performance, proposes changes, tests outcomes, and either retains or reverts modifications. The framework introduces a scoring system across multiple dimensions, enabling quantitative assessment of skill quality so that only changes that raise the score are kept. It incorporates a “ratchet mechanism,” similar to version-control workflows, so that the measured score never decreases as iterations progress. The system also separates the agents responsible for editing and evaluating skills to avoid bias, which improves the reliability of optimization results.

Downloads: 1 This Week

Last Update: 2026-06-14
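The retain-or-revert loop described for darwin-skill can be illustrated with a minimal Python sketch. This is not darwin-skill's code: `ratchet_optimize`, `toy_propose`, `toy_evaluate`, and the equal-weight averaging in `aggregate` are hypothetical stand-ins. Keeping the editor (`propose`) and the judge (`evaluate`) as separate callables mirrors the separation of editing and evaluating agents mentioned in the description.

```python
import random
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]


def aggregate(scores: Scores) -> float:
    """Collapse per-dimension scores into one number (equal weights assumed)."""
    return sum(scores.values()) / len(scores)


def ratchet_optimize(
    skill: str,
    propose: Callable[[str], str],      # hypothetical editor: returns a modified skill
    evaluate: Callable[[str], Scores],  # hypothetical judge: scores a skill per dimension
    iterations: int,
) -> Tuple[str, float, List[float]]:
    """Keep a candidate only if it strictly beats the best score so far."""
    best = skill
    best_score = aggregate(evaluate(best))
    history = [best_score]
    for _ in range(iterations):
        candidate = propose(best)
        score = aggregate(evaluate(candidate))
        if score > best_score:
            best, best_score = candidate, score  # retain the improvement
        # Otherwise the candidate is discarded, i.e. the change is reverted.
        history.append(best_score)
    return best, best_score, history


rng = random.Random(0)


def toy_propose(s: str) -> str:
    # Toy edit: append a random character.
    return s + rng.choice("ab")


def toy_evaluate(s: str) -> Scores:
    # Toy two-dimension rubric: reward 'a', penalize 'b'.
    return {"a_count": float(s.count("a")), "b_penalty": -float(s.count("b"))}


best, score, history = ratchet_optimize("", toy_propose, toy_evaluate, iterations=20)
print(best, score)
```

Because a candidate replaces the current best only on a strict improvement, the recorded best score can stay flat but never drop.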
4

AutoAgent AI

Autonomous harness engineering

...Instead of manually tuning prompts or workflows, developers define high-level goals in a configuration file, and the system continuously modifies its own tools, orchestration, and logic based on benchmark performance. It operates through a loop of testing, analyzing failures, and refining the agent’s configuration to maximize a scoring metric. The framework uses a single-file agent harness combined with structured tasks and evaluation suites to guide optimization. It runs inside Docker for safe execution and reproducibility. This approach shifts agent development from manual design to automated optimization. The system is particularly useful for building domain-specific agents that need continuous performance improvement.

Downloads: 2 This Week

Last Update: 2026-04-28
5

Stop Slop

A skill file for removing AI tells from prose

...The project targets common AI habits such as filler openings, overused contrasts, unnecessary adverbs, vague language, passive phrasing, and metronomic sentence rhythm. It also includes a scoring rubric that rates drafts across dimensions such as directness, rhythm, trust, authenticity, and density. The skill is useful for drafting, editing, polishing, and quality-checking prose before publication. Its main value is giving writers and AI assistants a practical checklist for making text feel less synthetic and more intentional.

Downloads: 0 This Week

Last Update: 2026-05-25
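The Stop Slop rubric is described only by its dimensions. As a hypothetical illustration, not the skill file's actual format, a draft could be rated per dimension and the ratings combined as below. The 0 to 10 scale, equal weighting, and the `score_draft` name are assumptions.

```python
from dataclasses import dataclass
from typing import Dict

# Dimension names come from the Stop Slop description; the 0-10 scale is an assumption.
DIMENSIONS = ("directness", "rhythm", "trust", "authenticity", "density")


@dataclass
class RubricResult:
    total: int
    percent: float
    weakest: str  # the dimension to revise first


def score_draft(ratings: Dict[str, int], max_per_dimension: int = 10) -> RubricResult:
    """Sum per-dimension ratings and report the weakest dimension."""
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    for d in DIMENSIONS:
        if not 0 <= ratings[d] <= max_per_dimension:
            raise ValueError(f"{d} must be between 0 and {max_per_dimension}")
    total = sum(ratings[d] for d in DIMENSIONS)
    percent = 100 * total / (max_per_dimension * len(DIMENSIONS))
    # Ties go to the dimension listed first in DIMENSIONS.
    weakest = min(DIMENSIONS, key=lambda d: ratings[d])
    return RubricResult(total=total, percent=percent, weakest=weakest)


result = score_draft(
    {"directness": 8, "rhythm": 6, "trust": 9, "authenticity": 7, "density": 5}
)
print(result)  # RubricResult(total=35, percent=70.0, weakest='density')
```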
6

ChainForge

An open-source visual programming environment

...The platform enables rapid experimentation by generating permutations of prompts and inputs, making it possible to test hundreds of variations in parallel and analyze performance trends more effectively. It also includes evaluation nodes that allow developers to define scoring functions, enabling automated benchmarking of outputs based on custom criteria such as accuracy, formatting, or relevance.

Downloads: 0 This Week

Last Update: 2026-03-18
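ChainForge does this through a visual node graph rather than code. To show the underlying idea only, here is a minimal Python sketch that fills prompt templates with every combination (a Cartesian product) of input values and averages a custom scoring function per template. `expand_prompts`, `benchmark`, `echo_model`, and `ends_with_period` are hypothetical names, not ChainForge's API, and the echo model stands in for a real LLM call.

```python
from itertools import product
from typing import Callable, Dict, List


def expand_prompts(template: str, variables: Dict[str, List[str]]) -> List[str]:
    """Fill one template with every combination of variable values."""
    names = list(variables)
    return [
        template.format(**dict(zip(names, values)))
        for values in product(*(variables[n] for n in names))
    ]


def benchmark(
    templates: List[str],
    variables: Dict[str, List[str]],
    model: Callable[[str], str],     # stand-in for an LLM call
    scorer: Callable[[str], float],  # custom scoring function
) -> Dict[str, float]:
    """Return the mean score of each template across all its filled-in prompts."""
    results = {}
    for template in templates:
        responses = [model(p) for p in expand_prompts(template, variables)]
        results[template] = sum(scorer(r) for r in responses) / len(responses)
    return results


templates = ["Translate '{word}' into {lang}.", "What is '{word}' in {lang}?"]
variables = {"word": ["cat", "dog"], "lang": ["French", "German", "Spanish"]}


def echo_model(prompt: str) -> str:
    # Placeholder "model" that echoes the prompt so the example runs offline.
    return prompt


def ends_with_period(response: str) -> float:
    # Toy formatting check: 1.0 if the response ends with a period.
    return 1.0 if response.strip().endswith(".") else 0.0


print(len(expand_prompts(templates[0], variables)))  # 6
print(benchmark(templates, variables, echo_model, ends_with_period))
```

Two words times three languages gives six prompts per template; swapping `echo_model` for a real model call and `ends_with_period` for an accuracy or relevance check gives the benchmarking pattern the description outlines.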
7

what-to-eat

An AI-based intelligent recipe generation platform

...It supports a wide range of cuisines, including traditional Chinese regional styles and international dishes, making it versatile for different cultural preferences. The system goes beyond simple recipe suggestions by including features such as wine pairing recommendations, sauce design, and health scoring, providing a more holistic cooking experience. It also includes a dynamic configuration system that allows users to switch between AI models and adjust parameters in real time without restarting the application.

Downloads: 0 This Week

Last Update: 2026-03-18
8

AI Marketing Skills

Open-source AI marketing skills for Claude Code

AI Marketing Skills is a comprehensive open-source framework designed to transform AI agents into fully operational marketing and sales systems by equipping them with structured, reusable “skills” that automate real business workflows. Instead of simple prompts, the project provides complete operational modules that include scripts, scoring systems, and decision-making logic, allowing AI tools like Claude Code to execute complex marketing tasks end-to-end. The system is organized into multiple domains such as growth experimentation, sales pipeline generation, content production, outbound marketing, SEO optimization, and financial analysis, effectively covering the entire revenue lifecycle of a business. ...

Downloads: 1 This Week

Last Update: 2026-05-28
9

WebGLM

An Efficient Web-enhanced Question Answering System

...WebGLM introduces several components that coordinate this process, including a retrieval module that selects relevant web documents, a generator that produces answers, and a scoring system that evaluates the quality of generated responses. The architecture aims to improve the reliability and usefulness of AI systems that answer questions about current or external knowledge sources.

Downloads: 0 This Week

Last Update: 2026-03-06
10

MLflow

Open source platform for the machine learning lifecycle

MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc.), wherever you currently run ML code (e.g., in notebooks, standalone applications, or the cloud).

Downloads: 6 This Week

Last Update: 5 days ago
11

CLIP

CLIP: predict the most relevant text snippet given an image

CLIP (Contrastive Language-Image Pretraining) is a neural model that links images and text in a shared embedding space, allowing zero-shot image classification, similarity search, and multimodal alignment. It was trained on large sets of (image, caption) pairs using a contrastive objective: images and their matching text are pulled together in embedding space, while mismatches are pushed apart. Once trained, you can give it any text labels and ask it to pick which label best matches a given...

Downloads: 3 This Week

Last Update: 2026-03-25
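The zero-shot step in the CLIP description reduces to comparing one image embedding against several text-label embeddings. The sketch below assumes the embeddings have already been computed (real ones come from CLIP's image and text encoders; the 3-dimensional vectors here are made up) and uses plain Python rather than the CLIP package. The fixed scale of 100 is an assumption standing in for CLIP's learned temperature, and `zero_shot_classify` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def zero_shot_classify(
    image_embedding: Vector,
    label_embeddings: Dict[str, Vector],
    scale: float = 100.0,  # assumed logit scale; CLIP learns its own
) -> Tuple[str, Dict[str, float]]:
    """Pick the label whose text embedding has the highest cosine similarity."""
    image = normalize(image_embedding)
    logits = {
        label: scale * dot(image, normalize(emb))
        for label, emb in label_embeddings.items()
    }
    # Softmax over labels (subtract the max for numerical stability).
    peak = max(logits.values())
    exps = {label: math.exp(z - peak) for label, z in logits.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    return max(probs, key=probs.get), probs


# Toy 3-d vectors standing in for real encoder outputs.
labels = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}
best, probs = zero_shot_classify([0.9, 0.1, 0.0], labels)
print(best)  # a photo of a cat
```

Because the labels are just text, changing the candidate set needs no retraining; only the label embeddings change.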
12

Supermemory

Memory engine and app that is extremely fast, scalable

...It often incorporates clustering, semantic search, and summarization modules to reduce cognitive load and surface key ideas, which makes it useful for research, study, writing, and long-term project tracking. Users can interact with the system via conversational queries or traditional search interfaces, and the system leverages vector embeddings and memory scoring to prioritize the most relevant results.

Downloads: 1 This Week

Last Update: 2026-06-13
13

MemPalace

The highest-scoring AI memory system ever benchmarked

MemPalace is an open-source AI memory system designed to solve one of the most persistent limitations of large language models: the loss of context between sessions. Instead of relying on summarization or selective extraction like most memory tools, it takes a radically different approach by storing conversations in their entirety and making them retrievable through structured organization and semantic search. The system is inspired by the classical “memory palace” mnemonic technique,...

Downloads: 3 This Week

Last Update: 7 days ago
14

Cheat on Content

Workflow that turns every post into a calibrated experiment

Cheat on Content is an AI-assisted workflow for creators who want to make content performance measurable instead of relying on instinct alone. It turns every post into a structured experiment by asking creators to score ideas, make blind predictions, publish, review results after a defined time window, and evolve their own content rubric. Rather than generating posts for the creator, it focuses on sharpening judgment and helping users understand why certain content performs better. The...

Downloads: 1 This Week

Last Update: 2026-05-17
15

VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs)

VLMEvalKit is an open-source evaluation toolkit designed for benchmarking large vision-language models that combine visual understanding with natural language reasoning. The toolkit provides a unified framework that allows researchers and developers to evaluate multimodal models across a wide range of datasets and standardized benchmarks with minimal setup. Instead of requiring complex data preparation pipelines or multiple repositories for each benchmark, the system enables evaluation...

Downloads: 1 This Week

Last Update: 2026-03-05
16

rag-search

RAG Search API

...It is built to be easily deployable, requiring only environment configuration and dependency installation to run a functional RAG service. The system supports configurable filtering, scoring thresholds, and reranking options, allowing developers to fine-tune retrieval quality. Its architecture is modular, separating handlers, services, and utilities to support customization and extension. Overall, rag-search serves as a practical starter backend for teams building AI search or question-answering applications on their own data.

Downloads: 1 This Week

Last Update: 2026-03-03
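The rag-search description does not document its configuration keys, so the following is only a generic Python sketch of what scoring thresholds and reranking usually mean: discard first-stage hits below a minimum score, then reorder the survivors with a second scorer. `Hit`, `filter_and_rerank`, `min_score`, `top_k`, and the word-overlap reranker are hypothetical names, not rag-search's API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    doc_id: str
    text: str
    score: float  # first-stage retrieval score, e.g. vector similarity


def filter_and_rerank(
    hits: List[Hit],
    query: str,
    min_score: float,
    rerank: Callable[[str, str], float],
    top_k: int,
) -> List[Hit]:
    """Drop hits below the threshold, reorder the rest with a second scorer, keep top_k."""
    kept = [h for h in hits if h.score >= min_score]
    # Sort by reranker score, breaking ties with the first-stage score.
    kept.sort(key=lambda h: (rerank(query, h.text), h.score), reverse=True)
    return kept[:top_k]


def word_overlap(query: str, text: str) -> float:
    # Toy reranker: count shared lowercase words. Real systems often use a learned model.
    return float(len(set(query.lower().split()) & set(text.lower().split())))


hits = [
    Hit("a", "How to reset your password", 0.82),
    Hit("b", "Password policy overview", 0.75),
    Hit("c", "Office opening hours", 0.40),
    Hit("d", "Reset a forgotten password quickly", 0.71),
]
top = filter_and_rerank(hits, "reset password", min_score=0.5, rerank=word_overlap, top_k=2)
print([h.doc_id for h in top])  # ['a', 'd']
```

Raising `min_score` trades recall for precision before the more expensive reranker runs; here hit `c` never reaches the reranker, and `d` overtakes `b` despite its lower first-stage score.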
17

React Doctor

Your agent writes bad React

React Doctor is a developer tool that scans React codebases and identifies problems that commonly appear in AI-generated or poorly maintained frontend code. It gives projects a clear health score from 0 to 100, making technical issues easier to understand, prioritize, and communicate. The scanner checks areas such as state management, effects, performance, architecture, accessibility, security, and dead code. It works across popular React environments, including Next.js, Vite, and React...

Downloads: 0 This Week

Last Update: 2 days ago
18

Agent Behavior Monitoring

The open source post-building layer for agents

Agent Behavior Monitoring is an open-source framework designed to monitor, evaluate, and improve the behavior of AI agents operating in real or simulated environments. It collects interaction data and analyzes how agents perform across different scenarios and tasks. Developers can use the framework to observe agent actions in both online production environments and offline evaluation settings, making it useful for debugging and performance...

Downloads: 1 This Week

Last Update: 2026-05-27
19

ASSERT

Requirement-driven evaluation harness for AI agents and LLM

ASSERT is a requirement-driven evaluation harness for AI agents and LLM applications. It turns natural-language specifications, policies, product requirements, and launch criteria into structured tests that can be reviewed, executed, scored, and improved. The pipeline derives behavior categories, generates single-turn and multi-turn test cases, runs them against a target system, and uses an LLM judge to score conversations against the stated policies. It can evaluate hosted models, custom...

Downloads: 0 This Week

Last Update: 2026-06-04
20

SEO Machine

A specialized Claude Code workspace for creating long-form

SEO Machine is an AI-powered content production system built as a structured workspace for generating long-form, SEO-optimized blog content through automated workflows. It integrates research, writing, analysis, and optimization into a single pipeline, allowing users to produce high-quality articles tailored to search engine performance. The system uses specialized commands and agents to perform tasks such as keyword research, competitor analysis, content drafting, and optimization. It...

Downloads: 0 This Week

Last Update: 2026-04-10
21

Qwen3-VL-Embedding

Multimodal embedding and reranking models built on Qwen3-VL

Qwen3-VL-Embedding (with its companion Qwen3-VL-Reranker) is a state-of-the-art multimodal embedding and reranking model suite built on the open-sourced Qwen3-VL foundation, developed to handle diverse inputs including text, images, screenshots, and videos. The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a...

Downloads: 0 This Week

Last Update: 2026-04-08
22

Atropos

Language Model Reinforcement Learning Environments frameworks

...This framework facilitates experimentation with RLHF (Reinforcement Learning from Human Feedback), RLAIF, or multi-turn training approaches by abstracting environment logic, scoring, and logging into reusable components.

Downloads: 0 This Week

Last Update: 2026-03-10
23

LLM-Pruner

On the Structural Pruning of Large Language Models

LLM-Pruner is an open-source framework designed to compress large language models through structured pruning techniques while maintaining their general capabilities. Large language models often require enormous computational resources, making them expensive to deploy and inefficient for many practical applications. LLM-Pruner addresses this issue by identifying and removing non-essential components within transformer architectures, such as redundant attention heads or feed-forward...

Downloads: 0 This Week

Last Update: 2026-03-09
24

uqlm

Uncertainty Quantification for Language Models, a Python package

UQLM is a Python library developed to detect hallucinations and quantify uncertainty in the outputs of large language models. The system implements a variety of uncertainty quantification techniques that assign confidence scores to model responses. These scores help developers determine how likely a generated answer is to contain errors or fabricated information. The library includes both black-box and white-box approaches to uncertainty estimation. Black-box methods evaluate model outputs...

Downloads: 0 This Week

Last Update: 2026-06-08
25

Hallucination Leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations

Hallucination Leaderboard is an open research project that tracks and compares the tendency of large language models to produce hallucinated or inaccurate information when generating summaries. The project provides a standardized benchmark that evaluates different models using a dedicated hallucination detection system known as the Hallucination Evaluation Model. Each model is tested on document summarization tasks to measure how often generated responses introduce information that is not...

Downloads: 0 This Week

Last Update: 2026-05-11