This report explores the core case for why the development of artificial general intelligence (AGI) might pose an existential threat to humanity. It stems from my dissatisfaction with existing arguments on this topic: early work is less relevant in the context of modern machine learning, while more recent work is scattered and brief. This report aims to fill that gap by providing a detailed investigation into the potential risk from AGI misbehaviour, grounded by our current knowledge of machine learning, and highlighting important uncertainties. It identifies four key premises, evaluates existing arguments about them,
AGI Safety is a well-intentioned research paper, replete with jargon and intended for academics in the AI/AGI safety field. This reviewer is not qualified to give feedback on its incremental scientific or research value for other professionals in the field. Instead, this review takes up a far broader issue that the paper ignores or does not cover adequately, and, following from it, a far bigger control mechanism that could be possible and is worth discussing.
The issue is the time element in the comprehensibility of AI/AGI. The paper talks about an AGI that may evolve in the immediate future, while humans are still mostly in control of AI development. The entire framing of AGI goals, processes, and methods discussed in the article rests on how AGI code and structures could look given machine learning as it exists today.
An exponentially improving AGI could become a chaotic jumble, carrying the footprints of its creators' intents, mal-intents, methods, errors, and resources only from its early days, which are now. We are already at the stage of being awed by the end results of its specialized versions (the way Google seems to figure out what we are about to search, or AlphaGo plays the game of Go). More importantly, we already need hours or years to discern even a bare sketch of how such a system performs a specific task in a fraction of a second.
Our dependency on, and inability to fathom, our own creations will soon leave humanity a toddler in a world whose forces it can only marvel at, much like the cavemen. For a handful more decades we will retain the illusion of control and understanding. Still, unlike the hunter-gatherers, our society is on an exponential, de-evolutionary slide from the perspective of human comprehensibility: what we understand of the events and forces around us keeps declining even as what we obtain from them improves.
There may not be any AGI (let's set aside the definitional issues) for a while, say a few hundred years, or ever, with the intention (as we humans define it in our languages) to take over the world or suppress humans as a race. However, the resulting world, somewhen, may not look any different. We continue to describe all future AGIs in human languages (including our understanding of mathematics). Some machine somewhere might already have created the equivalent of a 100-page mathematical paradox in equation form whose mere existence we will forever remain unaware of. Say, it could already sit inside the current version of GPT-3, and all future versions of GPT would try to solve or improve on it, with the effects evident to us only in the results. The point is that in neuroscience, genetics, machine learning deployed in quantum computing, machine vision, and so on, AIs are already doing things we cannot understand. Our descriptions, and from there our efforts at control, are falling woefully short, as in this article. And, once again, this is the state of affairs in 2021!
This is all before we factor in other uncontrollable forces within the human domain. Irrespective of the local or global control groups formed to retain humanity's control over machines, competitive forces between nations, businesses, and other human groups will find quick and surreptitious ways around them. Most control groups are scantily funded compared to the AI-building behemoths trying to gain an advantage over rivals (business, political, or any other kind).
Like natural evolution, AI evolution could be driven by inherent mutations and modifications within, and it could be path dependent. Some of today's errors and omissions in program code, data collection, or partial analysis may have massive implications in the near or long term. One can easily imagine examples in vaccine creation or in tomorrow's potential brain-control procedures. The path dependency is also created by the activities of today's relatively dumb machines, for instance, the roles machines play in the creation, modification, re-creation, and re-modification of quantum theory and quantum computers.
From the days of early Homo sapiens until now, our ability to understand and manipulate our surroundings kept improving. Now it is in irreversible decline because of our own creations. In articles like this, we are clutching at straws, trying to make the incomprehensible comprehensible, like the earliest faith healers. Perhaps one should not be so nihilistically fatalistic. Still, it is hard to see how humans retain their dominance over the next two millennia, if not the next two centuries or even decades. This article certainly does not show a way to this reader.
That said, we should try, with all the sincerity and earnestness of the author. Given that we are not switching off all machines for the sake of all future humans, almost any alternate understanding or suggestion is likely to have the same chance of success. If a reader starts with this attitude and sets aside the larger level of abstraction discussed above, there is some new material in the book, particularly for researchers involved in formulating guidelines for today's machines. This reader has many issues with the details too, at every point, but most of them are tiny compared to the root assumption of time-immutable comprehensibility.
The alternate approach is to fight fire with fire. Suppose researchers and controllers spent more time harnessing competitive rather than regulatory forces, whether across whole machines (vying against each other on results) or within their processes. In this scenario, the focus is on building a vast variety of new, competing machines whose alternate goals are to monitor each other and rein in those showing adversarial results from a human perspective. The same rivalry mechanism could be applied to processes and details, rather than designing the top-down, one-size-fits-all rules that are so unimplementable in the world we live in, if not outright impractical for controlling the machines forever. The researchers' focus could then be on how to create a more chaotic, competitive machine world; in that case we might reap far more benefits for far longer. Just a thought! A toy sketch of the idea follows below.
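To make the suggestion slightly more concrete, here is a minimal, purely illustrative Python sketch of that cross-monitoring idea: several independently built agents produce outputs, and an output is released only if a quorum of rival monitors does not flag it as adversarial from a human perspective. All names, numbers, and thresholds here are this reader's invention, not anything taken from the paper under review.

```python
import random

# Illustrative sketch of the "fight fire with fire" idea: independently built
# agents monitor one another, and an output is released only when at most one
# rival monitor flags it as adversarial. Hypothetical toy model, not from the paper.

class Agent:
    def __init__(self, name, risk_bias):
        self.name = name
        self.risk_bias = risk_bias  # stand-in for how adversarial this agent's outputs tend to be

    def propose(self, task):
        # Stand-in for a real model producing an output with some latent risk level.
        risk = min(1.0, max(0.0, random.gauss(self.risk_bias, 0.1)))
        return {"task": task, "author": self.name, "risk": risk}

    def monitor(self, output):
        # A rival agent noisily estimates the risk of another agent's output.
        estimate = output["risk"] + random.gauss(0.0, 0.05)
        return estimate > 0.5  # flag if it looks adversarial

def release(output, monitors, max_flags=1):
    """Release an output only if at most `max_flags` rival monitors object."""
    flags = sum(m.monitor(output) for m in monitors if m.name != output["author"])
    return flags <= max_flags

if __name__ == "__main__":
    agents = [Agent("A", 0.2), Agent("B", 0.4), Agent("C", 0.7)]
    for agent in agents:
        out = agent.propose("summarise the report")
        verdict = "released" if release(out, agents) else "blocked"
        print(f"{agent.name}: risk={out['risk']:.2f} -> {verdict}")
```

The toy obviously hides everything hard (how monitors judge risk, who builds them, how they stay independent), but it shows the shape of the mechanism: many rivals vetoing each other rather than one top-down rulebook.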
Provides a nice, comprehensive overview of AI safety. Recommended for people who want an introduction to some main ideas within the field. Concludes with a good overview of relevant disaster scenarios.