AI Control with Mechanistic Interpretability
I’m excited to share our recent paper on AI Control through Mechanistic Interpretability approaches. You can read the full paper here.