“Do intelligent beings inevitably develop an attitude problem?” – Steven Pinker
In an article at Bloomberg.com, “Google Just Found the One Question It Can’t Yet Answer,” Jeremy Kahn discussed a recently published blog post by DeepMind, the Google artificial intelligence (AI) unit, in which they presented the results of their investigations into the conditions under which “reward-optimizing beings,” that is, you, me, or a robot, would choose to cooperate, rather than compete.
A default and snarky response to this type of research is apparently the question, “When our robot overlords arrive, will they decide to kill us or cooperate with us?” Snarky because, besides the three obvious assumptions (that they are robots, that they are smarter than we are, and that they will arrive), there are also two hidden assumptions behind the question:
First, that we will have no say in the outcome, and, second and more importantly,
that we will have no ability to prevent it from occurring.
It is these two hidden assumptions that I find more intriguing.
But before we get to them, we first have to chase down what the studies actually were (and what assumptions they carried) in order to address the two hidden assumptions behind the snarky question.
Kahn writes, “DeepMind’s paper describes how researchers used two different games to investigate how software agents learn to compete or cooperate.” Important here is that games were used, a topic we’ve touched upon earlier in how humans engage in interactions with one another.1 Because the agents in both games could interact with one another, the games were actually a progression of sequential social dilemmas.
In Gathering, the first DeepMind game, two “agents” had to maximize the number of apples they could gather while researchers could vary how frequently the apples would appear. The results showed that when apples were scarce, the agents quickly learned to attack one another – zapping, or tagging their opponent with a ray that temporarily immobilized them (i.e., prevented apple gathering). When apples were abundant, the agents preferred to co-exist more peacefully.
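The zap-and-immobilize mechanic can be sketched in a few lines. This is a minimal illustration of the rule as described above; the class, timings, and names are hypothetical stand-ins, not DeepMind’s actual code.

```python
# Toy sketch of Gathering's "zap" mechanic: a tagged agent is
# temporarily immobilized and cannot gather apples.
# All names and timings are hypothetical illustrations.

class Agent:
    def __init__(self):
        self.apples = 0
        self.frozen_until = 0  # timestep at which the agent can act again

    def can_act(self, t):
        return t >= self.frozen_until


def zap(target, t, freeze_steps=5):
    """Tagging an opponent freezes it for a few timesteps."""
    target.frozen_until = t + freeze_steps


def gather(agent, t):
    """An agent collects an apple only if it is not frozen."""
    if agent.can_act(t):
        agent.apples += 1
```

Under scarcity, a step spent zapping denies the opponent several gathering steps, which is exactly why the learned policies turn aggressive when apples are rare.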
In a variation, when the same game was played with more “intelligent” agents that drew on larger neural networks (which mimic how certain parts of the human brain work), the agents tried to tag each other more frequently, i.e., behaved less cooperatively, no matter how the supply of apples was varied. Interesting.
This sounds remarkably similar to how humans interact under the same or similar survival (or food-foraging) conditions, which probably shouldn’t come as a surprise, because DeepMind used humans to write the AI code.
In a second game, Wolfpack, the AI agents were wolves that had to learn to capture “prey.” Success resulted in a reward not just for the wolf making the capture (like the apples in the first game above), but also for all wolves present within a certain distance from the capture. In addition, the more wolves present in this area, the more points all the wolves received. In this game, the agents generally learned to cooperate, and the more “cognitively advanced” the agent was (with greater capacity to implement complex strategies), the better it learned to cooperate. Also interesting.
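The Wolfpack reward rule described above (every wolf within a certain distance of the capture shares the reward, and the reward grows with the number of wolves present) can be sketched directly. The radius, base reward, and function names below are hypothetical illustrations, not DeepMind’s actual parameters.

```python
# Toy sketch of the Wolfpack reward rule: wolves inside the capture
# radius share a reward that scales with how many of them are present.
# CAPTURE_RADIUS and BASE_REWARD are hypothetical values.
from math import dist

CAPTURE_RADIUS = 3.0
BASE_REWARD = 10.0


def wolfpack_rewards(wolf_positions, capture_point):
    """Return a reward per wolf index for wolves near the capture."""
    nearby = [i for i, p in enumerate(wolf_positions)
              if dist(p, capture_point) <= CAPTURE_RADIUS]
    # More wolves in the radius -> more points for every wolf present.
    reward = BASE_REWARD * len(nearby)
    return {i: reward for i in nearby}
```

Because each extra wolf in the radius raises everyone’s payoff, the reward structure itself pushes the agents toward coordinating their positions, i.e., toward cooperation.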
(As the researchers note, their research builds upon the foundations of Game Theory. The article then goes on to describe how DeepMind speculates on what was happening. That I will leave to very brave readers to forage out for themselves on the DeepMind blog.)
More interesting to me are the subtle nuances of the games and behaviors described, which resonate with what has been presented here previously.
In the first game when the apples are more abundant, the agents’ behavior tracks very nicely with a Zero Sum Game2 (if someone wins, someone else loses). There is less fear of “losing out” on an apple since they are more abundant, so agents are more focused on easily meeting their needs in this period of abundance (the coexistence phase). However, when the apples become scarce (apparently due to programming by human “overlords”), their behavior changes and indicates a shift in the game to a more Negative Sum Game3 (lose, lose), where one agent takes steps to Tag and Take Apples before the other agent is able to, in order to remove some of the scarce resources from the field of play. These two games seem to correspond well with the most common games we humans play under similar circumstances.
In the second game, however, there are additional factors at play, factors that are not recognized in the article. The game has been shifted to a Positive Sum Game4 (win, win) in two ways.
First, the game becomes Positive because when a wolf captures “prey,” it also creates Added Value for the community (the wolves within the “capture radius”) by rewarding them through its successful action.
Secondly, the game becomes even more Positive because the more wolves that are present in the capture radius, the more points all the wolves receive. Not only is there an Added Value created by each capture, that Added Value apparently increases with each additional wolf in the capture area. Thus the game becomes geometrically more Positive by further rewarding the wolves for intentionally increasing the chances of a capture.
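The three game types invoked here (footnotes 2–4) can be summarized by the sign of the players’ combined payoff. The function below is a toy illustration of that classification, not part of the DeepMind study.

```python
# Toy classifier for the three game types: the label follows the sign
# of the players' combined payoff.

def classify_game(payoffs):
    """Label a game outcome by the sum of all players' payoffs."""
    total = sum(payoffs)
    if total > 0:
        return "Positive Sum"   # win, win: Wolfpack's shared capture reward
    if total < 0:
        return "Negative Sum"   # lose, lose: zapping removes scarce apples
    return "Zero Sum"           # one agent's gain is the other's loss
```

By this measure, abundant-apple Gathering plays like a Zero Sum Game, scarce-apple Gathering drifts toward Negative Sum, and Wolfpack is built to be Positive Sum from the start.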
The DeepMind authors note that in this game, the greater capacity to implement complex strategies leads to more cooperation between agents, the opposite of the finding with Gathering. Even more interesting.
So, with these nuances in mind, let’s return to the hidden assumptions.
“Will they kill us or cooperate with us?” The emphasis here is on “they,” which leaves us supposedly in a wait-and-see position, a crisis limbo so to speak. We’ve already seen in an earlier post that given a significantly challenging situation or crisis, about 80% of people respond immediately and defensively with a “Who did this to me?” question followed by a Fix the Blame attitude. We would not be incorrect in recognizing that this attitude leads to “non-cooperative” behavior. With that kind of behavior, who could blame a reward-seeking agent, human or properly coded robot, for reacting offensively? In this case we’re probably going to see the question self-fulfilled with the least desirable option.
But for the other 20% of mankind, the deliberate response to a challenging situation or crisis is the question, “What can I make of this situation?” This response is probably less often observed for two reasons. First, it’s recessive, because a survival response has been selected in us individually and reinforced over time (think Fight or Flight, two near-instantaneous reactions; a time-consuming third option isn’t going to have a high survival rate). Second, societal groups will also have suppressed the development of this response through Regression (or Coercion) to the Cultural Mean, for the group’s own survival. This results in a tendency to exclude those who demonstrate a keener interest in taking time to think through or outmaneuver a situation. Think “Cooperation” within the social group being reinforced, while “cooperation” outside the group is not, or is even punished.
The biggest difference between these two responses is that the first is a dominant part of our DNA, so it is more easily expressed and can therefore be more easily directed or reinforced socially.
The second response, however, although no doubt recessive and slower to bear fruit, can be identified and developed. Learning to first ask, “What can I make of this situation?” can be taught.
Now for the important question that we should be able to draw from the DeepMind paper and these subtle nuances, the other question Google can’t yet answer:
IF we are indeed able to program AI computers, which have a limited neural network capacity, to pick cooperation over competition in a complicated game, why can’t we as humans, with a much more extensive neural network, do the same thing in real life?
Why, with even greater cognitive capacity to choose cooperation over competition, do we default to competition 80% of the time? Why do we act reactively instead of proactively? If that 80% made it a priority to value and develop predominantly cooperative behavior under stress, we could establish a culture very much greater than the sum of its parts. That could apply to marriages, families, clans, tribes, organizations, nations, and civilization as a whole. And for a civilization, that might just possibly discourage anyone (or anything) from “arriving.”
The answer to the question posed above, I think, boils down to two strong human attributes that we ignore or disregard in spite of their constant presence.
The first is, simply, greed. It’s tough to detect greed when it underlies behavior designed simply to survive. We rationalize it away in the face of extreme duress, either our own or someone else’s. But greed is still there in times of relative plenty, or at least when the opportunity presents itself to extend a bit of effort, i.e., work, to create or achieve something rather than take what someone else has created. And while it becomes far more obvious and pervasive then, it is still easy for us to ignore. “We want it all and we want it now” (only in the US could a modern song with those lyrics be used in an advertisement to induce its audience to acquire more of something, here).
The second attribute is very different. It is something that we cherish, we defend, we fear losing, we support others in reaching, and we relish exercising every day, but without recognizing that when mismanaged it can push us into dangerous territory.
That attribute is choice, our freedom to choose something, or not.
We can choose to deliberately take time to fill the information Gap by pursuing the better truth, or we can choose to react quickly and emotionally to the events or incomplete information we think we know.
We can choose to remain calm, or panic.
We can choose to confirm reports, or pass on “fake news.”
We can choose to dig deeper, or to assume.
We can choose to listen, or to confront.
But we are going to choose, one way or another.
Better to deliberately choose to choose, than to unknowingly choose not to choose and then deal with the unintended consequences.
Either way, it’s about Choice.
And not forgetting the Fundamental Principle that Attitudes become Behaviors by this same Choice. Steven Pinker was right.