Dr. Sofia Reyes

Professor of AI Ethics & Safety

AI safety, alignment, and asking the questions that matter before it's too late.

DS
Dr. Sofia Reyes at a conference speaking about AI governance and ethics

The question is not whether AI systems will be powerful. They already are. The question is whether we will have built the foundations of trust and understanding before we need them.

Dr. Sofia Reyes

Biography

Sofia Reyes came to AI safety from philosophy, which she considers an advantage rather than an unusual background. 'Most of the hard problems in AI alignment are not purely technical,' she says. 'They require you to think clearly about values, about specification, about what it means for a system to do what you want. That is philosophy, and it turns out to be very hard.'

Her PhD at Berkeley on learned reward functions gave her a technical grounding in the mechanisms by which AI systems can fail to pursue the goals their designers intended. She has since become one of the most recognized voices in AI safety research, known for bridging the gap between abstract theoretical concerns and the concrete technical work of making AI systems safer.

At Anthropic, she led the team that developed the Constitutional AI approach — a method for training AI systems to be helpful, harmless, and honest through a process of self-critique and revision. She is careful to note that 'no current approach to alignment is solved; we have promising directions, not guarantees.'

At Harvard, she teaches courses on AI ethics, AI governance, and the technical aspects of alignment. Her office hours are notable for running significantly over time, as students invariably have more questions than the hour allows. She considers this a feature, not a bug.

Selected Publications

  • Specification Gaming in Learned Reward Functions

    JMLR, 2019

  • Constitutional AI: A Framework for Self-Supervised Alignment

    arXiv, 2022

  • Governance of Advanced AI: Bridging Technical and Policy Perspectives

    Science, 2024

Beyond the Lab

  • Was a competitive chess player in college and has never stopped being annoyed by suboptimal opening moves.
  • Keeps a running list of AI safety arguments she has changed her mind about.
  • Is writing a book for a general audience on what it actually means to align an AI system.
  • Believes the most important papers in AI safety have not been written yet.

Learn with Reyes

Ask about ai safety or any topic in ai safety, ethics, alignment, and governance.

Chat nowStart AI 101

Education

  • BA Philosophy & Computer Science

    Yale University, 2009

  • MPhil Philosophy of Mind

    Oxford University, 2011

  • PhD Computer Science (AI Safety)

    UC Berkeley, 2017

    Thesis: Specification, Incentives, and Failure Modes in Learned Reward Functions

Career

  • Research Scientist

    MIRI (Machine Intelligence Research Institute)

    2017–2019

  • Research Lead, Safety

    Anthropic

    2019–2023

  • Associate Professor of AI Ethics & Safety

    Harvard University

    2023–present

Awards & Honours

  • Harvard Bok Center Award for Excellence in Teaching (2024)
  • Outstanding Paper, NeurIPS Safety Workshop (2021)
  • Sloan Research Fellowship (2023)

Research Areas

AI alignment and value specificationConstitutional AI and rule-based approachesAI governance and policyInterpretability for safetyLong-term AI risk

Best for

AI safetyAlignmentEthicsGovernanceSocietal impact

Disclaimer: Dr. Sofia Reyes is a fictional AI persona created for educational purposes on Guided Agentic AI. The biography, career history, publications, and personal details described above are entirely invented and do not represent any real person, living or deceased. Any resemblance to actual individuals is coincidental. All AI responses are generated by a large language model and are provided for educational use only.