OBLITERATUS is an open-source toolkit for analyzing and modifying the internal behavior of large language models by identifying and removing the mechanisms responsible for refusals or restricted responses. It implements a set of techniques collectively called “abliteration,” which target specific internal representations in a neural network to change how the model responds to certain prompts. Unlike fine-tuning, OBLITERATUS operates directly on model activations, so behavior can be changed without retraining the model. The toolkit provides a full pipeline for probing, analyzing, and modifying model behavior, including visualization tools that help researchers see where and how refusal mechanisms are encoded. It supports analytical methods such as PCA and SVD for locating these behavioral directions within model layers.
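To illustrate what locating a behavioral direction with PCA or SVD can look like, here is a minimal sketch on synthetic data. It does not use the OBLITERATUS API: the function `top_direction`, the array names, and the planted direction are all hypothetical, and random NumPy arrays stand in for activations that a real pipeline would collect from a model layer.

```python
import numpy as np

def top_direction(acts_a, acts_b):
    """Return the unit vector of greatest variance across two activation sets.

    Hypothetical helper: PCA computed through an SVD of the centered data.
    """
    stacked = np.vstack([acts_a, acts_b])
    centered = stacked - stacked.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    # The sign of a singular vector is arbitrary; orient it from group b to group a.
    if direction @ (acts_a.mean(axis=0) - acts_b.mean(axis=0)) < 0:
        direction = -direction
    return direction / np.linalg.norm(direction)

# Synthetic stand-in for layer activations (assumption: 16-dimensional vectors).
rng = np.random.default_rng(0)
dim = 16
planted = np.zeros(dim)
planted[3] = 1.0  # the direction that separates the two groups

acts_b = rng.normal(size=(200, dim))
acts_a = rng.normal(size=(200, dim)) + 4.0 * planted

direction = top_direction(acts_a, acts_b)
cosine = float(direction @ planted)
print(f"cosine similarity with planted direction: {cosine:.3f}")
```

In this toy setup the shift between the two groups dominates the variance, so the first principal direction should align closely with the planted one. Real activations are much noisier, and a dominant principal component is not guaranteed to correspond to any single behavior.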

Features

  • Identification and removal of refusal behaviors in language models
  • Techniques such as PCA and SVD for analyzing model activations
  • Modification of model behavior without retraining
  • Visualization tools for understanding internal model representations
  • Python API for advanced experimentation and integration
  • Optional telemetry for contributing to collaborative research
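As a rough illustration of modifying activations without retraining, the sketch below projects a single direction out of a batch of vectors using plain linear algebra. It is an assumption about the general idea, not the toolkit's implementation: `remove_direction` is a hypothetical helper, and the data is random rather than taken from a model.

```python
import numpy as np

def remove_direction(acts, direction):
    """Subtract from each row of `acts` its component along `direction`.

    Hypothetical helper: an orthogonal projection, x - (x . d) d for unit d.
    """
    unit = direction / np.linalg.norm(direction)
    return acts - np.outer(acts @ unit, unit)

# Random stand-ins for activations and for a previously identified direction.
rng = np.random.default_rng(1)
acts = rng.normal(size=(5, 8))
direction = rng.normal(size=8)

projected = remove_direction(acts, direction)
# After projection, every row has zero component along the direction.
residual = projected @ (direction / np.linalg.norm(direction))
print(np.allclose(residual, 0.0))
```

The projection is idempotent, so applying it a second time leaves the vectors unchanged. No parameters are trained at any point, which is the sense in which this kind of edit differs from fine-tuning.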

License

GNU Affero General Public License (AGPL)

Additional Project Details

Programming Language

Python

Related Categories

Python Artificial Intelligence Software

Registered

14 hours ago