Automated Interpretability

The automated-interpretability repository implements tools and pipelines for automatically generating, simulating, and scoring explanations of neuron (or latent feature) behavior in neural networks. Rather than relying purely on manual, ad hoc probing, it aims to scale interpretability with algorithmic methods that produce candidate explanations and assess their quality. A “neuron explainer” component takes a target neuron or latent feature and proposes natural language explanations or heuristics (e.g. “this neuron activates when the input has property X”), then simulates activations across example inputs to test whether the explanation holds. A “neuron viewer” web component supports interactive, exploratory browsing of neurons, explanations, and activation patterns.

Features

A neuron explainer module that proposes natural language or rule-based explanations for neuron/latent feature behavior
Simulation / scoring of explanations by comparing predicted activations vs true activations across inputs
A neuron viewer UI to browse neurons, see activations, and inspect explanations
Demo notebooks illustrating how explanations are generated and evaluated (e.g. explain_puzzles.ipynb)
Infrastructure for activation capture and analysis (e.g. modules like activations.py)
Ranking / scoring heuristics to decide which explanations are more faithful or useful
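
The simulate, score, and rank steps listed above can be sketched in a few lines. This is a minimal illustration only, not the repository's code: the names keyword_simulator, pearson, and rank_explanations are hypothetical, the "explanations" are simple keyword rules, the activation values are made-up toy numbers, and using Pearson correlation as the score is an assumption about one reasonable way to compare predicted and true activations.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical type: a simulator turns tokens into predicted activations.
Simulator = Callable[[Sequence[str]], List[float]]


def keyword_simulator(keywords: Set[str]) -> Simulator:
    """Toy stand-in for an explanation of the form 'fires on these words'."""
    def simulate(tokens: Sequence[str]) -> List[float]:
        return [1.0 if tok.lower() in keywords else 0.0 for tok in tokens]
    return simulate


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_explanations(
    candidates: Dict[str, Simulator],
    tokens: Sequence[str],
    true_activations: Sequence[float],
) -> List[Tuple[str, float]]:
    """Score each candidate by correlation with the true activations, best first."""
    scored = [
        (name, pearson(simulate(tokens), true_activations))
        for name, simulate in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up toy data for illustration; not measurements from any real model.
tokens = ["The", "cat", "sat", "near", "a", "dog"]
true_activations = [0.0, 0.9, 0.1, 0.0, 0.0, 0.8]

candidates = {
    "animal nouns": keyword_simulator({"cat", "dog"}),
    "articles": keyword_simulator({"the", "a"}),
}

ranking = rank_explanations(candidates, tokens, true_activations)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

An explanation whose simulated activations track the real ones scores near 1, while one that fires on the wrong tokens scores near zero or below, so sorting by score gives a simple faithfulness ranking.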

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Artificial Intelligence Software, Python Large Language Models (LLM)

Registered

2025-10-03

Code for the paper “Language models can explain neurons in language models”
