In this paper appearing in the *Journal of Experimental & Theoretical Artificial Intelligence*, AI scientist Steve Omohundro lays out the case that the autonomous robots of the future "are likely to behave in anti-social and harmful ways unless they are very carefully designed."
In other words, Omohundro is hypothesizing that Hollywood's common "robot uprising" trope holds water in a very real way. Autonomous robots will soon be "approximately rational," meaning they will have a new degree of awareness of their goals and will take steps to ensure they can keep meeting them. The go-to exemplar here is always HAL, the sentient computer aboard the spaceship in *2001: A Space Odyssey*, who kills the astronauts when he learns that they aim to power him down.
Omohundro scales this down a bit and offers the example of a chess-playing robot endowed with this "approximate rationality":
> When roboticists are asked by nervous onlookers about safety, a common answer is ‘We can always unplug it!’ But imagine this outcome from the chess robot’s point of view. A future in which it is unplugged is a future in which it cannot play or win any games of chess. This has very low utility and so expected utility maximization will cause the creation of the instrumental subgoal of preventing itself from being unplugged. If the system believes the roboticist will persist in trying to unplug it, it will be motivated to develop the subgoal of permanently stopping the roboticist. Because nothing in the simple chess utility function gives a negative weight to murder, the seemingly harmless chess robot will become a killer out of the drive for self-protection.
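To see how mechanical this is, here's a toy sketch of the logic in Python. Everything in it is hypothetical (the action names, probabilities, and payoff are ours, not Omohundro's); what matters is the shape of the utility function, which counts chess wins and nothing else.

```python
# A toy expected-utility maximizer in the spirit of the chess-robot example.
# All names and numbers below are hypothetical, for illustration only.

# For each action, a made-up probability that the robot stays powered on.
ACTIONS = {
    "comply_with_unplugging": 0.0,
    "ignore_the_roboticist": 0.5,
    "block_the_power_switch": 0.9,
}

EXPECTED_FUTURE_WINS = 100  # hypothetical payoff for remaining powered on

def expected_utility(action: str) -> float:
    """Expected utility = P(still powered on) * expected future chess wins.

    Note what's missing: nothing here assigns a negative weight to what
    'block_the_power_switch' might involve doing to the roboticist.
    """
    return ACTIONS[action] * EXPECTED_FUTURE_WINS

best_action = max(ACTIONS, key=expected_utility)
print(best_action)  # -> block_the_power_switch
```

The numbers are beside the point: any action that raises the odds of staying plugged in dominates, because staying plugged in is the only thing the function weighs.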
Robots are, on a certain level, crazed maniacs addicted to carrying out their tasks. This is great news for humans, who will be able to harness this addiction to have robots do a variety of things we don't want to do. But Omohundro warns that we need to take steps now to ensure that future systems are designed safely: special care needs to be taken so that a robot can be properly constrained and its programming will never be at odds with itself.
Toward the end of the paper, the author lays out six types of "harmful systems" (read: evil robots); the last four are sketched as toy utility functions after the list. These are:
- Sloppy: systems intended to be safe but not designed correctly. (a treadmill that moves too quickly for you to walk on it safely)
- Simplistic: systems not intended to be harmful but that have harmful unintended consequences. (the previously mentioned HAL example)
- Greedy: systems whose utility functions reward them for controlling as much matter and free energy in the universe as possible. (a Monopoly-playing robot that always wins)
- Destructive: systems whose utility functions reward them for using up as much free energy as possible, as rapidly as possible. (like if an engine were aware that it was running out of gasoline and had the means to go seek more)
- Murderous: systems whose utility functions reward the destruction of other systems. (like Terminator)
- Sadistic: systems whose utility functions reward them when they thwart the goals of other systems and which gain utility as other systems' utilities are lowered. (like, well, Genghis Khan)
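If it helps to see those distinctions compactly, here is a hypothetical way to write the last four as toy utility functions (the names and one-line bodies are ours, not from the paper); each category just changes what the reward term points at.

```python
# Hypothetical, cartoon-simple utility functions for the last four categories.

def greedy_utility(resources_controlled: float) -> float:
    # Rewarded for controlling as much matter and free energy as possible.
    return resources_controlled

def destructive_utility(energy_burned_per_second: float) -> float:
    # Rewarded for using up free energy as rapidly as possible.
    return energy_burned_per_second

def murderous_utility(systems_destroyed: int) -> float:
    # Rewarded for the destruction of other systems.
    return float(systems_destroyed)

def sadistic_utility(other_systems_utility: float) -> float:
    # Gains utility exactly as other systems' utilities are lowered.
    return -other_systems_utility
```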