Gerard Boxó
Researcher in AI Safety & AIxBio. MSc in Bioinformatics.
Welcome to my research blog. I explore the inner workings of artificial intelligence models, focusing on two critical domains:
- AI Safety — Mechanistic Interpretability applied to making AI systems aligned, controllable, and transparent.
- AI x Bio — Investigating how LLMs and specialized models can accelerate protein engineering and drug discovery.
Recent Posts
- Few-Shot Awareness in Reasoning Models
- Can LLMs Design Nanobodies at Scale?
- BioAgent Eval Traces
- Instruction Following inside CoT for Better Model Organisms
- Towards Mitigating Information Leakage When Evaluating Safety Monitors
- Sparse Autoencoders in Protein Engineering Campaigns: Steering and Model Diffing
- AISC: Probing for Deception Detection
- Interpretability Hackathon
- AI Policy Hackathon
- Mechanistic Exploration of Gemma 2 List Generation
- AI Safety Fundamentals Final Project
- AI Control with Mechanistic Interpretability
- Towards a Probabilistic Disentanglement of Transformer Activations, Part 2
- Towards a Probabilistic Disentanglement of Transformer Activations, Part 1
- Word Embeddings: A Comprehensive Guide, Part 2
- Word Embeddings: A Comprehensive Guide, Part 1
- Balanced Sentence, Part 1
- Introduction to Mechanistic Interpretability
- My First Post