Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team to support investigations into specific behaviors of small language models. The tool combines automated interpretability techniques with sparse autoencoders.
TDB enables rapid exploration before needing to write code, with the ability to intervene in the forward pass and see how it affects a particular behavior. It can be used to answer questions like, "Why does the model output token A instead of token B for this prompt?" or "Why does attention head H attend to token T for this prompt?" It does so by identifying specific components (neurons, attention heads, autoencoder latents) that contribute to the behavior, showing automatically generated explanations of what causes those components to activate most strongly, and tracing connections between components to help discover circuits.
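The core idea behind "why token A instead of token B?" can be sketched in plain NumPy: because logits are linear in the residual stream, the A-vs-B logit difference decomposes exactly into per-component contributions, and zero-ablating a component shows its causal effect. This is a minimal illustrative sketch with synthetic values, not TDB's actual API; all names (`head_0`, `mlp_3`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Toy stand-ins: per-component writes to the residual stream at the final
# position (e.g. from attention heads and MLP neurons), plus unembedding
# rows for two candidate next tokens A and B. All values are synthetic.
components = {name: rng.normal(size=d_model) for name in ["head_0", "head_1", "mlp_3"]}
w_A = rng.normal(size=d_model)  # unembedding row for token A
w_B = rng.normal(size=d_model)  # unembedding row for token B

def logit_diff(resid):
    """logit(A) - logit(B) for a given residual-stream vector."""
    return float(resid @ (w_A - w_B))

resid = sum(components.values())
print("baseline logit diff:", logit_diff(resid))

# Direct attribution: the logit diff decomposes linearly over components.
attributions = {name: logit_diff(vec) for name, vec in components.items()}
for name, a in sorted(attributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {a:+.3f}")

# Intervene: zero-ablate the most influential component and re-read the diff,
# mimicking the "change the forward pass, observe the behavior" loop.
top = max(attributions, key=lambda k: abs(attributions[k]))
ablated = resid - components[top]
print(f"after ablating {top}:", logit_diff(ablated))
```

In a real model the component writes come from a recorded forward pass rather than random vectors, but the attribution and ablation logic is the same.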
These videos give an overview of TDB and show how it can be used to investigate indirect object identification in GPT-2 small:
Follow these steps to install the repo. You'll first need python/pip, as well as node/npm.
Though optional, we recommend you use a virtual environment or equivalent:
```shell
# If you're already in a venv, deactivate it.
deactivate
# Create a new venv.
python -m venv ~/.virtualenvs/transformer-debugger
# Activate the new venv.
source ~/.virtualenvs/transformer-debugger/bin/activate
```
Once your environment is set up, follow these steps:
```shell
git clone git@github.com:openai/transformer-debugger.git
cd transformer-debugger

# Install neuron_explainer.
pip install -e .

# Set up the pre-commit hooks.
pre-commit install

# Install neuron_viewer.
cd neuron_viewer
npm install
cd ..
```
To run the TDB app, you'll then need to follow the instructions to set up the activation server backend and neuron viewer frontend.
To validate changes:
```shell
# Run the tests.
pytest
# Run the type checks.
mypy --config=mypy.ini .
```
Please cite as:
Mossing et al., “Transformer Debugger”, GitHub, 2024.
BibTeX citation:
```bibtex
@misc{mossing2024tdb,
  title={Transformer Debugger},
  author={Mossing, Dan and Bills, Steven and Tillman, Henk and Dupré la Tour, Tom and Cammarata, Nick and Gao, Leo and Achiam, Joshua and Yeh, Catherine and Leike, Jan and Wu, Jeff and Saunders, William},
  year={2024},
  publisher={GitHub},
  howpublished={\url{https://github.com/openai/transformer-debugger}},
}
```