Alessandro Stolfo

OAT Y 24
Andreasstrasse 5
8050 Zürich, Switzerland
Hi! I am a doctoral student at the Institute for Machine Learning at ETH Zürich, where I am advised by Prof. Mrinmaya Sachan and co-advised by Prof. Yonatan Belinkov (Technion).
My research focuses on the interpretability and reliability of (large) language models. I study how models represent information and how those representations shape behavior and errors. I am excited to leverage these insights to design methods that make language models better and safer.
In summer 2024, I interned with the AI Frontiers group at Microsoft Research in Redmond, WA, where I had the opportunity to collaborate with Besmira Nushi and Eric Horvitz. Previously, in summer 2023, I interned with the Machine Learning Research Group at Oracle Labs in Burlington, MA, working with Ari Kobren.
Before starting my doctoral studies, I obtained a Master’s degree in Data Science at ETH Zürich, and I worked as a software engineer at Rethink-Resource. I completed my undergraduate studies in Computer Engineering at Politecnico di Milano.
I am grateful to be a recipient of the CYD Doctoral Fellowship.
For ETH students: Feel free to reach out via email if you’re interested in having me supervise your MSc thesis or semester project. I welcome project proposals, but even if you don’t have concrete ideas and are simply passionate about leveraging interpretability to improve models, please don’t hesitate to contact me. I typically allocate my supervision budget 4-6 weeks before the semester starts, so that is the best time to reach out.
news
- Jul 24, 2025: Gave a talk at NEC Labs EU about our recent work on LLM steering. Check it out on YouTube.
- May 28, 2024: Interning in the AI Frontiers group at Microsoft Research in Redmond, WA.
- Nov 22, 2023: Attending the ML Alignment & Theory Scholars (MATS) Program, mentored by Neel Nanda.
- Jul 17, 2023: Started my internship in the ML Research Group at Oracle Labs in Burlington, MA.
- Apr 22, 2022: Answered a couple of questions for this EPFL News article. Check it out!
selected publications
- NeurIPS 2025
- ICLR 2025
- NeurIPS 2024
- ICML 2024: Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?
- EMNLP 2023: A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
- ACL 2023: A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models