Stuart Armstrong discusses the No Free Lunch result in value learning (you cannot deduce the preferences of a potentially irrational agent by observing its behaviour alone, and appealing to simplicity doesn't help), explains how this connects with humans' theory of mind, and sketches out his research agenda for learning human preferences despite this impossibility result.
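As a rough illustration of the impossibility claim, here is a toy Python sketch. It assumes an agent can be split into a "planner" and a reward function, and shows three different (planner, reward) pairs producing exactly the same behaviour, so behaviour alone cannot tell them apart. All names, states, and reward values are hypothetical and chosen for illustration; this is not code from the talk or the paper.

```python
# Toy sketch: three (planner, reward) pairs that yield the same observed policy.
# States, actions, and reward values are made up for illustration.

STATES = ["s0", "s1"]
ACTIONS = ["left", "right"]

REWARD_TABLE = {
    ("s0", "left"): 1.0, ("s0", "right"): 0.0,
    ("s1", "left"): 0.0, ("s1", "right"): 2.0,
}

def reward(state, action):
    return REWARD_TABLE[(state, action)]

def negated_reward(state, action):
    return -reward(state, action)

def zero_reward(state, action):
    return 0.0

def rational_planner(r):
    # Picks the reward-maximising action in every state.
    return {s: max(ACTIONS, key=lambda a: r(s, a)) for s in STATES}

def anti_rational_planner(r):
    # Picks the reward-minimising action in every state.
    return {s: min(ACTIONS, key=lambda a: r(s, a)) for s in STATES}

def make_indifferent_planner(fixed_policy):
    # Ignores the reward entirely and always returns the same policy.
    def planner(r):
        return dict(fixed_policy)
    return planner

# The behaviour we actually get to observe.
observed = rational_planner(reward)

# Three incompatible explanations of that behaviour.
candidates = {
    "rational, reward R": (rational_planner, reward),
    "anti-rational, reward -R": (anti_rational_planner, negated_reward),
    "indifferent, reward 0": (make_indifferent_planner(observed), zero_reward),
}

policies = {name: planner(r) for name, (planner, r) in candidates.items()}

for name, policy in policies.items():
    print(f"{name}: {policy} (matches observed: {policy == observed})")
```

Each candidate attributes a different preference to the agent (R, its negation, or nothing at all), yet all three fit the observations perfectly, and none of them is much more complicated to write down than the others.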
Relevant links: "Occam's razor is insufficient to infer the preferences of irrational agents" https://arxiv.org/abs/1712.05812
"Research Agenda v0.9: Synthesising a human's preferences into a utility function" https://www.lesswrong.com/posts/CSEdLLEkap2pubjof/research-agenda-v0-9-synthesising-a-human-s-preferences-into