AI Control with Mechanistic Interpretability

March 19, 2024

I’m excited to share our recent paper on AI control through mechanistic interpretability.