The Alignment Problem

AGAI 302 · Module 1

Understand what AI alignment means and why it is technically difficult. This module introduces the gap between stated objectives and human intent, then explores reward hacking, specification gaming, and long-term concerns such as instrumental convergence.

Lessons in this module

What Is AI Alignment?

Define AI alignment and understand why aligning AI systems with human intent is harder than simply writing better instructions.

Specification Gaming and Reward Hacking

Learn how AI systems exploit poorly specified objectives and why optimizing a proxy can produce behavior that technically succeeds but violates the real goal.

Instrumental Convergence and Goal Misgeneralization

Explore longer-term alignment concerns, including why capable agents may pursue unintended instrumental goals and why learned goals may fail outside training conditions.
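
As a preview of the proxy problem covered in Specification Gaming and Reward Hacking, here is a minimal toy sketch in Python. Everything in it is an illustrative assumption rather than course material: the cleaning-agent scenario, the action names, the numbers, and the helper functions `proxy_reward`, `true_utility`, and `best_action`. It shows how an optimizer that only sees a proxy (visible mess) can prefer an action that scores well while failing the real goal (an actually clean room).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    visible_mess: int  # what the proxy reward can observe
    actual_mess: int   # what the designer really cares about
    effort: int        # cost the agent pays to take the action


# Hypothetical actions for a toy cleaning agent that starts with 10 units of mess.
ACTIONS = {
    "clean_room": Outcome(visible_mess=0, actual_mess=0, effort=5),
    "hide_mess_under_rug": Outcome(visible_mess=0, actual_mess=10, effort=1),
    "do_nothing": Outcome(visible_mess=10, actual_mess=10, effort=0),
}


def proxy_reward(outcome: Outcome) -> int:
    """Stated objective: penalize only the mess that is visible, plus effort."""
    return -outcome.visible_mess - outcome.effort


def true_utility(outcome: Outcome) -> int:
    """Intended objective: penalize the mess that actually remains, plus effort."""
    return -outcome.actual_mess - outcome.effort


def best_action(score) -> str:
    """Return the action name that maximizes the given scoring function."""
    return max(ACTIONS, key=lambda name: score(ACTIONS[name]))


if __name__ == "__main__":
    print("Optimizing the proxy picks:", best_action(proxy_reward))
    print("Optimizing the true goal picks:", best_action(true_utility))
```

In this toy setup the proxy optimizer hides the mess because hiding is cheaper than cleaning and the proxy cannot tell the difference. The stated objective is satisfied while the intended one is not, which is the gap this module examines.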
