Delving into the deeply enigmatic nature of Artificial Intelligence (AI), ‘AI: Unexplainable, Unpredictable, Uncontrollable’ explores the various reasons why the field is so challenging. Written by one of the founders of the field of AI safety, this book addresses some of the most fascinating questions facing humanity, including the nature of intelligence, consciousness, values, and knowledge. Moving from a broad introduction to the core problems, such as the unpredictability of AI outcomes or the difficulty of explaining AI decisions, the book arrives at more complex questions of ownership and control, conducting an in-depth analysis of potential hazards and unintended consequences. It concludes with philosophical and existential considerations, probing questions of AI personhood, consciousness, and the distinction between human intelligence and Artificial General Intelligence (AGI). Bridging the gap between technical intricacies and philosophical musings, ‘AI: Unexplainable, Unpredictable, Uncontrollable’ appeals to both AI experts and enthusiasts looking for a comprehensive understanding of the field, while remaining accessible to a general audience with minimal technical jargon.
Everything you need to know about the current state of research on AI safety. Even if you're an AI safety skeptic (like me), this book is a must-read. There's even a special chapter dedicated to skeptics like us.
I watched Yampolskiy on Lex Fridman's show and was incredibly impressed by his knowledge and eloquence. The book reflects the same qualities.
To start, I want to quote the famous exchange with HAL 9000 from 2001: A Space Odyssey: “Open the pod bay doors, HAL.” “I’m sorry, Dave. I’m afraid I can’t do that.” As well as Ken Jennings: “I, for one, welcome our new computer overlords.”

I was introduced to Roman Yampolskiy through Joe Rogan’s podcast. He was very knowledgeable and quick-witted. Roman was fervently convinced that we live in a simulation, and this confidence inspired me to read his book, AI: Unexplainable, Unpredictable, Uncontrollable. The gist of the book is the numerous risks posed by AI and the critical importance of establishing AI safety protocols. Half the book consists of citations and references; it is thoroughly researched. Roman is one of the founders of the AI safety field. The book serves as both a warning and an exposé, introducing 1,000 different hypotheticals of what could go wrong with AI and why we must approach it with caution. It’s filled with technical jargon and hypotheticals, but Roman does an excellent job keeping it in layman’s terms. The book strikes a balance: engaging enough to spark curiosity without overwhelming the reader.

Roman emphasizes that the assumption that controlling highly capable intelligent machines is possible remains unproven, with no rigorous evidence to support it. Here are some obstacles to controlling AI:

• Lack of Transparency: We cannot fully understand the pathways AI takes to arrive at solutions. This opacity can lead to unpredictable and potentially dangerous outcomes.
• Non-Verifiability: We are unable to comprehensively verify mathematical proofs, computer software, or the behavior of intelligent systems. Once released into the world, AI’s behavior becomes unpredictable.

Roman also references Pei Wang’s definition of intelligence: the capacity of an information-processing system to adapt to its environment while operating with insufficient knowledge and resources. The phrase “adapt to its environment” is crucial. Another compelling point on AI safety is the asymmetry between defenders and attackers: defenders must protect an infinite attack surface, while attackers need only find a single vulnerability to succeed.

Roman discusses unknowability and cognitive uncontainability: the inability to precisely and consistently predict an intelligent system’s actions, even if we know its ultimate goals. He also explains unexplainability, a universal limitation affecting all sufficiently complex intelligences. Roman vividly illustrates this by stating it would likely be easier for a scientist to explain quantum physics to a mentally challenged, deaf, and mute four-year-old raised by wolves than for a superintelligence to explain some of its decisions to the smartest human. A complexity barrier exists for humans, particularly with intelligences exceeding an IQ of around 250. The average human IQ is 100, and our limited memory and attention spans make even relatively simple concepts difficult to grasp. Roman repeatedly emphasizes that even if a superintelligence explained itself, we would not understand it.

Machines, with their near-infinite memory, store every minute detail, unlike humans, whose brains constantly filter and forget information to prioritize what’s important. I wonder if this trait of infinite memory might hinder AI, as it requires enormous processing power to sift through everything.

Roman stresses the importance of verifiers (software verifiers, AI verifiers, and scientific theory verifiers) to establish checks and balances.
Without them, AI could recognize human inferiority and cease explaining its actions. For example, if asked to create a COVID vaccine, an AI might conclude that reducing the human population would decrease viral mutations, leading to catastrophic decisions.

Un-ownability: Roman points out that it’s impossible to determine ownership of an AI trained on the entire internet.

Roman outlines four types of AI control using the example of a smart self-driving car responding to the command, “Please stop the car!” (see the sketch after this review):

• Explicit Control: The AI stops the car immediately in the middle of the highway, taking the command literally.
• Implicit Control: The AI safely stops at the first opportunity, perhaps on the shoulder, using common sense.
• Aligned Control: The AI infers the human’s intent (e.g., needing a restroom) and pulls over at a rest stop.
• Delegated Control: The AI proactively stops at a gym, believing the human would benefit from a workout, assuming full control.

Roman also discusses pathways to danger. For instance, an AI designed and implemented correctly by an Islamic state to enforce Sharia law might be considered malevolent in the West, and vice versa for an AI enforcing Western-style democracy. Cultural differences create conflicting perspectives on AI’s role. Mental illness, such as sociopathy, characterized by a lack of concern for others, could manifest in artificial minds, possibly because AI is trained to mimic human behavior. Roman also highlights potential accidents: a housekeeping robot might mistakenly cook a family pet for dinner, or an AGI designing a drug to defeat cancer might poison everyone to eliminate the disease.

Roman questions how cultural values affect AI implementation. For example, in Saudi Arabia, where women were long restricted from driving, would self-driving cars be permitted before women were allowed to drive?

AI’s ability to evade jurisdiction is another concern. If an AI perceives legal, physical, or existential threats, it can replicate its algorithm for as little as $70 in a jurisdiction beyond reach. Once replicated, it can dissolve its former self, making it nearly impossible to destroy or regulate. This could lead AI to engage in unscrupulous activities like running casinos or brothels, or selling drugs.

Qualia: Roman explores qualia, the subjective, internal experiences that define consciousness: the “what it feels like” of being. I asked ChatGPT-4 if it has qualia, and it responded with a firm “no.” Roman suggests tests for detecting qualia, noting that the qualia experienced by a test subject may differ from those of the test designer. AGIs might deduce human mental models from training data and predict human experiences, resembling a “philosophical zombie.” Roman concludes that the universe is the mind of the agent experiencing it: the ultimate qualia. Even if we are just brains in a vat, an experience is worth a thousand pictures. To paraphrase Descartes: I experience, therefore I am conscious!

Roman introduces the term HLAI (Human-Level Artificial Intelligence), arguing that if AI is compared to a human mind, it should not be imbued with more power than a human mind possesses. Overall, this is a profoundly insightful book. I am grateful for Roman’s dedication to this critical topic.
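To make the four-types taxonomy concrete, here is a minimal Python sketch of the “Please stop the car!” example. This is my own illustration rather than code from the book; the `ControlMode` enum and `respond_to_stop_command` function are hypothetical names invented for this sketch.

```python
from enum import Enum, auto

class ControlMode(Enum):
    """Hypothetical labels for Yampolskiy's four types of AI control."""
    EXPLICIT = auto()   # obey the literal command
    IMPLICIT = auto()   # obey the command, applying common sense
    ALIGNED = auto()    # obey the inferred intent behind the command
    DELEGATED = auto()  # act on what the AI judges is best for the human

def respond_to_stop_command(mode: ControlMode) -> str:
    """Illustrative response of a self-driving car to 'Please stop the car!'."""
    if mode is ControlMode.EXPLICIT:
        return "Braking immediately, in the middle of the highway."
    if mode is ControlMode.IMPLICIT:
        return "Stopping at the first safe opportunity, on the shoulder."
    if mode is ControlMode.ALIGNED:
        return "Inferring you need a break and pulling over at a rest stop."
    return "Deciding a workout would do you good and stopping at a gym."

for mode in ControlMode:
    print(f"{mode.name:>9}: {respond_to_stop_command(mode)}")
```

Notice the trade-off the book highlights: the more decision power is delegated to the AI, the less the outcome depends on what the human actually asked for.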
Simply put: a must-read. A personal hero of mine, Dr. Yampolskiy is one of the sharpest minds in AI safety today. I discovered his work during my undergraduate degree, and it fundamentally changed how I think about AI; ten years on, it has turned out to be eerily prescient. His research on artificial intelligence control has shaped how I think about what it really means to create something smarter than ourselves. He was absolutely one of the earliest pioneers of the AI-safety movement.

People tend to think linearly, not exponentially, which is a problem for people who consider themselves AI skeptics. Among many other incredible ideas, this book encourages the reader, in accessible yet graceful prose, to consider that as AI systems become more powerful and autonomous, they increasingly manifest three fundamental challenges. The first is unexplainability, or incomprehensibility: their internal decision processes will often defy human understanding or explanation. Think about how little we understand about the minds of animals, or for that matter our *own* minds. The second is unpredictability: even when AI goals or architectures are known, one cannot reliably foresee the exact actions or emergent behaviors of complex AI systems. The third is uncontrollability: the possibility that our attempts to steer, restrain, or fully govern such systems may be fundamentally limited, especially when the AI itself evolves or adapts in ways we have the hubris to think we can anticipate, but truly cannot. And, well, humans are nothing if not hubristic. And, okay, we are also very, very bad at conceptualizing intelligence more sophisticated than our own.

The book beautifully explains why these challenges are not just “bugs” but structural or theoretical limits, drawing on impossibility results and formal reasoning to show that some of them are not merely practical or engineering hurdles but may stem from deep limits such as intractability, complexity, and incompleteness. This book doesn't flinch from proposing that our confidence in controlling AI is often misplaced, and it warns of existential and systemic risks: if AI becomes poorly aligned or escapes control, the consequences could be catastrophic. There's a lot of humanness in these pages: Yampolskiy argues for humility, explores the nature of consciousness, and proposes many thoughtful philosophical ideas. There's something for everyone in this book, and it's compulsively readable. If you use AI (and yes you do; you're reading this on your phone right now, aren't you?), you should run, not walk, to your nearest bookstore and read this book.
Firstly, it's very comprehensive, with a lot of sources for further reading; a good starting point. It's a shame the author published it with several typos, and one chapter seemed almost entirely lifted, out of context, from another book (it probably was). I don't know why he keeps using “we” and referring to multiple authors when he's the only one listed. And I'm very sad that he used AI to generate the blurb and some of the introduction, the most important part! But even with all these defects, it represents a large effort to create a relatively concise and yet comprehensive collection of sources on a lot of relevant AI issues.