Paper-Conference

We propose the first framework that quantifies the probability of successful oversight as a function of the capabilities of the overseer and of the system being overseen. Across four oversight games, we also find scaling laws that approximate how domain performance depends on general AI system capability.

Apr 25, 2025