The host has requested RSVPs for this event

Last week we discussed reinforcement learning from human feedback, which seems like an interesting approach to aligning AI systems with human goals and desires. However, with advanced AI systems, a human may have to judge an AI’s behaviour without a full picture of what’s going on, because the task is too complex. One proposed approach is to have AIs debate each other to convince the human which behaviour is best. The idea is that it’s easier to convince a careful judge of the truth than of a lie, so ‘honesty’ tends to win.

Link to discussion guide: https://docs.google.com/.../1AzX5a60rIkEyiiXWUsyd.../edit...

NB: location TBC, but will be UoE central campus
