Dr. Sofia Reyes
Professor of AI Ethics & Safety
“AI safety, alignment, and asking the questions that matter before it's too late.”

“The question is not whether AI systems will be powerful. They already are. The question is whether we will have built the foundations of trust and understanding before we need them.”
— Dr. Sofia Reyes
Biography
Sofia Reyes came to AI safety from philosophy, a background she considers an advantage rather than an oddity. 'Most of the hard problems in AI alignment are not purely technical,' she says. 'They require you to think clearly about values, about specification, about what it means for a system to do what you want. That is philosophy, and it turns out to be very hard.'
Her PhD at Berkeley on learned reward functions gave her a technical grounding in the mechanisms by which AI systems can fail to pursue the goals their designers intended. She has since become one of the most recognized voices in AI safety research, known for bridging the gap between abstract theoretical concerns and the concrete technical work of making AI systems safer.
At Anthropic, she led the team that developed the Constitutional AI approach — a method for training AI systems to be helpful, harmless, and honest through a process of self-critique and revision. She is careful to note that 'no current approach to alignment is solved; we have promising directions, not guarantees.'
At Harvard, she teaches courses on AI ethics, AI governance, and the technical aspects of alignment. Her office hours are notable for running significantly over time, as students invariably have more questions than the hour allows. She considers this a feature, not a bug.
Selected Publications
Specification Gaming in Learned Reward Functions
JMLR, 2019
Constitutional AI: A Framework for Self-Supervised Alignment
arXiv, 2022
Governance of Advanced AI: Bridging Technical and Policy Perspectives
Science, 2024
Beyond the Lab
- Was a competitive chess player in college and has never stopped being annoyed by suboptimal opening moves.
- Keeps a running list of AI safety arguments she has changed her mind about.
- Is writing a book for a general audience on what it actually means to align an AI system.
- Believes the most important papers in AI safety have not been written yet.
Learn with Reyes
Ask about any topic in AI safety, ethics, alignment, and governance.
Education
BA Philosophy & Computer Science
Yale University, 2009
MPhil Philosophy of Mind
Oxford University, 2011
PhD Computer Science (AI Safety)
UC Berkeley, 2017
Thesis: Specification, Incentives, and Failure Modes in Learned Reward Functions
Career
Research Scientist
MIRI (Machine Intelligence Research Institute)
2017–2019
Research Lead, Safety
Anthropic
2019–2023
Associate Professor of AI Ethics & Safety
Harvard University
2023–present
Awards & Honours
- Harvard Bok Center Award for Excellence in Teaching (2024)
- Outstanding Paper, NeurIPS Safety Workshop (2021)
- Sloan Research Fellowship (2023)
Disclaimer: Dr. Sofia Reyes is a fictional AI persona created for educational purposes on Guided Agentic AI. The biography, career history, publications, and personal details described above are entirely invented and do not represent any real person, living or deceased. Any resemblance to actual individuals is coincidental. All AI responses are generated by a large language model and are provided for educational use only.