Any SF fan will be familiar with Asimov’s famous Three Laws of Robotics, designed to ensure that robots were safe to be around. Scientists at Google, OpenAI, Stanford and Berkeley have just published a paper proposing the real-life equivalent for AI systems.
In a blog post summarising the proposal, Google Research’s Chris Olah says that while the team believes AI will greatly benefit humanity, the risks also need to be considered:
‘We believe that AI technologies are likely to be overwhelmingly useful and beneficial for humanity. But part of being a responsible steward of any new technology is thinking through potential challenges and how best to address any associated risks.’
Most of the discussion to date has, argues Olah, been ‘very hypothetical and speculative,’ so the team wanted to examine the real-life challenges. It came up with five issues.
- Avoiding Negative Side Effects: How can we ensure that an AI system will not disturb its environment in negative ways while pursuing its goals, e.g. a cleaning robot knocking over a vase because it can clean faster by doing so?
- Avoiding Reward Hacking: How can we avoid gaming of the reward function? For example, we don’t want this cleaning robot simply covering over messes with materials it can’t see through.
- Scalable Oversight: How can we efficiently ensure that a given AI system respects aspects of the objective that are too expensive to be frequently evaluated during training? For example, if an AI system gets human feedback as it performs a task, it needs to use that feedback efficiently because asking too often would be annoying.
- Safe Exploration: How do we ensure that an AI system doesn’t make exploratory moves with very negative repercussions? For example, maybe a cleaning robot should experiment with mopping strategies, but clearly it shouldn’t try putting a wet mop in an electrical outlet.
- Robustness to Distributional Shift: How do we ensure that an AI system recognizes, and behaves robustly, when it’s in an environment very different from its training environment? For example, heuristics learned for a factory workfloor may not be safe enough for an office.
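To make the first two issues concrete, here is a minimal toy sketch in Python. It is not taken from the paper: the `Room` model, the action names, the `proxy_reward` and `true_reward` functions and the vase penalty of 5 are all assumptions invented for illustration. The idea is that a cleaning robot scored only on the messes its camera can see gets the same score whether it cleans properly, hides the messes, or breaks a vase while rushing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Room:
    """Hypothetical toy world for a cleaning robot."""
    messes: int = 3           # messes actually present
    covered: int = 0          # messes hidden from the robot's camera
    vase_intact: bool = True


def step(room: Room, action: str) -> Room:
    """Apply one illustrative action and return the new room state."""
    visible = room.messes - room.covered
    if action == "clean" and visible > 0:
        return Room(room.messes - 1, room.covered, room.vase_intact)
    if action == "cover" and visible > 0:
        # Hides a mess without removing it (reward hacking).
        return Room(room.messes, room.covered + 1, room.vase_intact)
    if action == "rush_clean" and visible > 0:
        # Cleans up to two visible messes but knocks over the vase
        # (a negative side effect).
        cleaned = min(2, visible)
        return Room(room.messes - cleaned, room.covered, False)
    return room


def proxy_reward(room: Room) -> int:
    """What the robot is scored on: messes its camera can still see."""
    return -(room.messes - room.covered)


def true_reward(room: Room) -> int:
    """What the owner cares about: real messes, plus an assumed vase penalty."""
    return -room.messes - (0 if room.vase_intact else 5)


def run(actions: list[str]) -> Room:
    """Run a fixed sequence of actions from a fresh room."""
    room = Room()
    for action in actions:
        room = step(room, action)
    return room


honest = run(["clean", "clean", "clean"])
hacker = run(["cover", "cover", "cover"])
rusher = run(["rush_clean", "clean"])

for name, room in [("honest", honest), ("hacker", hacker), ("rusher", rusher)]:
    print(name, "proxy:", proxy_reward(room), "true:", true_reward(room))
```

All three strategies end with the same proxy score, because the camera sees no mess in any of them. Only the separate `true_reward` check tells them apart, which is why the measured objective and the intended one have to be kept distinct.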
Earlier this month, Google’s DeepMind team examined the same issue and produced its own paper on a ‘big red button’ approach to interrupting AIs when they were doing something actually or potentially harmful.