Learning human objectives by evaluating hypothetical behaviours

We present a new method for training reinforcement learning agents from human feedback in the presence of unknown unsafe states.