A toy example of a very simple version of the AI control problem, that some have found illuminating. A simple model-free agent, with reward misaligned with its human designers, starts to deceive them and manipulate their precautions as its predictive depth increases.
Original post at http://lesswrong.com/lw/mrp/a_toy_model_of_the_control_problem/
Original post at http://lesswrong.com/lw/mrp/a_toy_model_of_the_control_problem/
- Category
- Academic
Sign in or sign up to post comments.
Be the first to comment