Gerard Boxó
Researcher in AI Safety & AIxBio. MSc in Bioinformatics.
Welcome to my research blog. I explore the inner workings of artificial intelligence models, focusing on two critical domains:
- AI Safety — Mechanistic Interpretability applied to making AI systems aligned, controllable, and transparent.
- AI x Bio — Investigating how LLMs and specialized models can accelerate protein engineering and drug discovery.
Recent Posts
- Few-Shot Awareness in Reasoning Models
- Can LLMs Design Nanobodies at Scale?
- BioAgent Eval Traces
- Instruction Following inside CoT for Better Model Organisms
- Towards Mitigating Information Leakage When Evaluating Safety Monitors
- Sparse Autoencoders in Protein Engineering Campaigns: Steering and Model Diffing
- AISC: Probing for Deception Detection
- Interpretability Hackathon
- AI Policy Hackathon
- Mechanistic Exploration of Gemma 2 List Generation
- AI Safety Fundamentals Final Project
- AI Control with Mechanistic Interpretability
- Towards a Probabilistic Disentanglement of Transformer Activations, Part 2
- Towards a Probabilistic Disentanglement of Transformer Activations, Part 1
- Word Embeddings: A Comprehensive Guide, Part 2
- Word Embeddings: A Comprehensive Guide, Part 1
- Balanced Sentence, Part 1
- Introduction to Mechanistic Interpretability
- My First Post