OBLITERATUS is an open-source toolkit for analyzing and modifying the internal behavior of large language models by identifying and removing the mechanisms responsible for refusals or restricted responses. It implements a set of techniques collectively called “abliteration,” which target specific internal representations in a network to change how the model responds to certain prompts. Unlike fine-tuning, OBLITERATUS operates directly on model activations, so behavior can be changed without retraining. The toolkit provides a full pipeline for probing, analyzing, and modifying model behavior, along with visualization tools that show where and how refusal mechanisms are encoded. It supports analytical methods such as PCA (principal component analysis) and SVD (singular value decomposition) for locating these behavioral directions within model layers.
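
To make "locating a behavioral direction with PCA or SVD" concrete, here is a minimal sketch on synthetic data. It is not the OBLITERATUS API: the function name `principal_direction`, the array shapes, and the synthetic "activations" are all assumptions made for illustration, and only NumPy is used. The idea shown is that when two sets of activation vectors differ mainly along one axis, the top singular vector of the centered, stacked data recovers that axis.

```python
import numpy as np

def principal_direction(acts_a, acts_b):
    """Return a unit vector along the dominant axis separating two activation sets.

    Hypothetical helper for illustration; not part of any real toolkit.
    """
    # Difference of the two set means, used only to fix the sign of the result.
    mean_diff = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    # Stack both sets and center them; the top right-singular vector of the
    # centered matrix is the first principal component.
    stacked = np.vstack([acts_a, acts_b])
    centered = stacked - stacked.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    # The sign of a singular vector is arbitrary; orient it from set b toward set a.
    if direction @ mean_diff < 0:
        direction = -direction
    return direction / np.linalg.norm(direction)

# Synthetic stand-in for per-prompt activations at one layer (assumed shapes).
rng = np.random.default_rng(0)
hidden = 16
true_dir = np.zeros(hidden)
true_dir[3] = 1.0  # the two sets differ only along coordinate 3
acts_a = rng.normal(scale=0.1, size=(200, hidden)) + 2.0 * true_dir
acts_b = rng.normal(scale=0.1, size=(200, hidden)) - 2.0 * true_dir

d = principal_direction(acts_a, acts_b)
cosine = float(d @ true_dir)  # close to 1 when the direction is recovered
```

In practice the two sets would come from a real model's hidden states rather than random draws, and the separation is rarely this clean, which is presumably why the toolkit pairs the analysis with visualization.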

Features

  • Identification and removal of refusal behaviors in language models
  • Techniques such as PCA and SVD for analyzing model activations
  • Modification of model behavior without retraining
  • Visualization tools for understanding internal model representations
  • Python API for advanced experimentation and integration
  • Optional telemetry for contributing to collaborative research
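
As a rough illustration of what "modification without retraining" can mean at the level of activations, the sketch below projects a single direction out of a batch of vectors. This is generic linear algebra on random data, not the OBLITERATUS Python API: `project_out` and the array shapes are hypothetical names and values chosen for the example.

```python
import numpy as np

def project_out(acts, direction):
    """Remove the component of each row of `acts` along `direction`.

    Hypothetical helper for illustration; no weights are trained or updated.
    """
    unit = direction / np.linalg.norm(direction)
    # Scalar projection of every activation vector onto the unit direction.
    coeffs = acts @ unit
    # Subtract that component, leaving only the part orthogonal to `unit`.
    return acts - np.outer(coeffs, unit)

rng = np.random.default_rng(1)
acts = rng.normal(size=(5, 8))      # 5 activation vectors of width 8 (assumed)
direction = rng.normal(size=8)      # stand-in for a direction found by PCA/SVD

cleaned = project_out(acts, direction)
unit = direction / np.linalg.norm(direction)
residual = float(np.abs(cleaned @ unit).max())  # component left along the direction
```

The operation is idempotent: applying `project_out` a second time with the same direction leaves `cleaned` unchanged up to floating-point error.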

License

GNU Affero General Public License

Additional Project Details

Programming Language

Python

Related Categories

Python Artificial Intelligence Software

Registered

4 days ago