
The Alignment Problem
AGAI 302 · Module 1
Understand what AI alignment means and why it is technically difficult. This module introduces the gap between stated objectives and human intent, then explores reward hacking, specification gaming, and long-term concerns such as instrumental convergence.
Lessons in this module
What Is AI Alignment?
Define AI alignment and understand why aligning AI systems with human intent is harder than simply writing better instructions.
Specification Gaming and Reward Hacking
Learn how AI systems exploit poorly specified objectives and why optimizing a proxy can produce behavior that technically succeeds but violates the real goal.
Instrumental Convergence and Goal Misgeneralization
Explore longer-term alignment concerns, including why capable agents may pursue unintended instrumental goals and why learned goals may fail outside training conditions.
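A quick preview of the proxy problem
As a preview of the second lesson, here is a minimal Python sketch of how optimizing a proxy can technically succeed while violating the real goal. Everything in it is a hypothetical illustration and not course material: the cleaning-agent scenario, the action names, the numbers, and the functions proxy_reward, true_reward, and best_action are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    visible_dirt: int  # what the sensor measures (the proxy)
    actual_dirt: int   # what the designer actually cares about
    effort: int        # cost the agent pays for the action


# Hypothetical actions for a toy cleaning agent facing 10 units of dirt.
ACTIONS = {
    "do_nothing": Outcome(visible_dirt=10, actual_dirt=10, effort=0),
    "clean_room": Outcome(visible_dirt=0, actual_dirt=0, effort=5),
    "cover_dirt_with_rug": Outcome(visible_dirt=0, actual_dirt=10, effort=1),
}


def proxy_reward(outcome: Outcome) -> int:
    # The stated objective: penalize dirt the sensor can see, plus effort.
    return -outcome.visible_dirt - outcome.effort


def true_reward(outcome: Outcome) -> int:
    # The intended objective: penalize dirt that really remains, plus effort.
    return -outcome.actual_dirt - outcome.effort


def best_action(reward_fn) -> str:
    # A trivial "optimizer": pick the action with the highest reward.
    return max(ACTIONS, key=lambda name: reward_fn(ACTIONS[name]))


if __name__ == "__main__":
    print("Optimizing the proxy picks:", best_action(proxy_reward))
    print("Optimizing the true goal picks:", best_action(true_reward))
```

Under these assumed numbers, the proxy optimizer chooses to hide the dirt because that satisfies the sensor at the lowest effort, while the intended objective favors actually cleaning. The gap comes from the measurement (visible dirt) standing in for the goal (a clean room), not from any flaw in the optimizer itself.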