Kaj Sotala


“[M]any kinds of reinforcement learning agents would, if given the opportunity, use a ‘delusion box’ which allowed them to modify the observations they got from the environment. This way, they would always receive the kinds of signals that gave them the maximum reward. You could say, in a sense, that those kinds of agents only care about their subjective expectation – as long as they experience what they want, they don’t care about the rest of the world. And it’s important for them that they are the ones who get those experiences, because their utility function only cares about their own reward. [I]nstead of just caring about our subjective experience, we use our subjective experiences to construct a model of the world. We don’t want to delude ourselves, because we also care about the world around us, and our world model tells us that deluding ourselves wouldn’t actually change the world.”

Kaj Sotala, “An attempt to dissolve subjective expectation and personal identity” (LessWrong, February 22, 2013)