1 What Did Ilya See?
The chapter opens with the viral question “What did Ilya see?” and uses it to frame a broader inquiry into how frontier AI advances collide with rising safety anxieties. It introduces Sutskever’s List—a curated set of papers he once said covered “90% of what matters”—as a lens for understanding both the evolution of modern AI and Sutskever’s mental model. Rather than presenting isolated results, the narrative weaves technical milestones with cultural shifts to surface recurring tensions: acceleration vs. caution, scaling vs. alignment, and ambition vs. uncertainty. GPT-2 serves as the early turning point that makes these stakes concrete, setting context for deeper technical discussions that follow.
The story traces Sutskever’s rise from graduate work with Geoffrey Hinton to the AlexNet breakthrough that catalyzed the deep learning era, his impact at Google Brain, and the founding of OpenAI as a counterbalance to Big Tech. Early OpenAI efforts—Gym, Dota 2, and the shift to Transformers—culminated in the GPT series, with Sutskever credited as a chief architect of GPT-3’s scaling. As his stature grew, he became more publicly reflective about AGI’s implications, even venturing provocative ideas about consciousness in large neural networks. Throughout, the chapter presents his intellectual journey not as biography but as the thread connecting research choices, institutional strategy, and an evolving philosophy of responsible progress.
Safety concerns gradually eclipse optimism: Sutskever champions “Superalignment,” urges urgency about superintelligence, and infuses OpenAI with a quasi-spiritual focus on beneficial AGI. The earlier GPT-2 episode—staged release, public warnings, and backlash over “ClosedAI”—emerges in hindsight as a rehearsal for governance dilemmas to come. Those tensions peak in late 2023 when the board ousts Sam Altman amid opaque risk worries and rumors of a breakthrough, triggering an employee revolt, Sutskever’s public regret, and his eventual 2024 departure alongside the dissolution of the Superalignment team. The enduring meme “What did Ilya see?” captures the chapter’s central motif: a field advancing by scaling, haunted by the suspicion that its own creators may glimpse risks the rest of the world has not yet fully understood.
The List
Legendary game developer John Carmack faced a challenge. After reshaping the gaming world with iconic titles like Wolfenstein, Quake, and DOOM, Carmack directed his curiosity toward artificial intelligence. Yet, AI was uncharted territory. During a recruiting meeting with OpenAI’s founders, Carmack inquired about how he could accelerate his understanding. In response, Sutskever handed him something that quickly took on a life of its own: a curated reading list that Ilya promised would deliver “90% of what matters today” in artificial intelligence.
Carmack took the challenge to heart. “And I did. I plowed through all those things, and it all started sorting out [AI] in my head.”
The notion that readers could access “90% of what matters” from so few references in a rapidly advancing field was tantalizing. The fact that Sutskever personally curated the list added to the allure, especially given his measured online presence. Despite having over 481,000 followers on X, he posts infrequently, adding to the mystique. Sutskever’s reticence transforms what might otherwise be a routine reading list into a totem.
Yet, despite Carmack frequently mentioning the List in interviews, its contents remain undisclosed. Curiously, the secrecy didn’t dampen interest; it had the opposite effect. As word spread on forums and social media, the excitement grew, especially following ChatGPT’s popularity. ChatGPT was released in November 2022 and reached over 100 million users within two months of its launch.[84] This was a catalyst, reigniting curiosity about the List. By early 2023, interest had reached a fever pitch. A Hacker News thread, “What were the papers on the list Ilya Sutskever gave John Carmack?” drew more than one hundred and thirty comments of speculation and crowdsourced guesses.[85] So many people wanted the List that Carmack posted on X, expressing his hope that Ilya would make it public. Carmack wrote that “a canonical list of references from a leading figure would be appreciated by many.”[86] Yet, that never happened.
The intrigue surrounding Sutskever’s List has grown over time. It has become a common cultural touchstone. “Have you read Sutskever’s List?” has become shorthand for asking if someone knows the fundamentals of modern AI research, even if that person hasn’t actually read the papers. Just knowing about the List carries cachet.
Today, a certain mythos clings to it, partly because Ilya has never officially published it, despite Carmack’s public nudging. Paradoxically, Sutskever’s silence reinforces its allure, drawing in everyone from eager enthusiasts to intrigued book publishers, some of whom, I can personally attest, found the temptation irresistible. In a field where breakthroughs arrive relentlessly, there’s something uniquely comforting about the notion of a stable canon, quietly handed down by one of the field’s grandmasters. Yet, to many, it’s more than a study guide. Many look at Sutskever’s List and try to decode his worldview. What exactly does Ilya Sutskever believe we need to understand about AI? Is the list for safety hawks? Alignment believers? Friends of AGI? Practitioners? In a word, yes.
The List includes papers defining the deep learning revolution in computer vision, such as AlexNet and ResNet. It also highlights attention-based neural networks like Pointer Networks and Transformers. Recurrent and long short-term memory networks are also prominent, reflecting the dominant natural language processing paradigms before attention-based architectures. The List also includes pedagogical works, such as Andrej Karpathy’s blog post The Unreasonable Effectiveness of Recurrent Neural Networks, Chris Olah’s influential essay Understanding LSTM Networks, and Alexander “Sasha” Rush’s The Annotated Transformer.
The List recognizes engineering innovations, including papers such as GPipe, which details pipeline parallelism for training large models. It also covers scaling laws, including OpenAI’s 2020 study demonstrating how model performance improves predictably as scale increases. These artifacts underscore Sutskever’s interest in understanding how far brute-force scale can push the frontier, and at what cost.
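To make the scaling-laws idea concrete, here is a minimal sketch of the power-law relationship between parameter count and loss. The constants below are the approximate values often quoted from the 2020 OpenAI study, used here purely for illustration; the point is the functional form, not the specific numbers.

```python
# Illustrative power-law scaling curve in the spirit of the 2020
# OpenAI scaling-laws study. The constants are approximate values
# quoted from that paper and serve only to illustrate the shape.

def power_law_loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Loss as a power law in parameter count: L(N) = (N_c / N) ** alpha."""
    return (n_c / n_params) ** alpha

# Doubling model size shrinks loss by a constant factor of 2 ** -alpha,
# which is why loss-vs-scale curves look like straight lines on log-log plots.
for n in (1e8, 1e9, 1e10):
    print(f"N = {n:.0e}  L = {power_law_loss(n):.3f}")
```

The key takeaway is that each doubling of scale buys a fixed multiplicative improvement, which made further scaling look like a predictable investment rather than a gamble.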
The List doesn’t stop at engineering feats or new architectural designs. It also includes papers on minimum description length, Kolmogorov complexity, and algorithmic randomness, all topics Ilya regularly discusses in interviews. The presence of Scott Aaronson’s blog post The First Law of Complexodynamics and his whimsical Coffee Automaton signals an openness to complexity theory.
In short, Sutskever’s List offers a panoramic view of the converging threads that define the era, from deep learning breakthroughs in vision and language, to scaling strategies and architectural innovations, to theoretical reflections on complexity and intelligence. The List is a compilation of the last fifteen years of machine learning research. It’s little wonder that many now treat these works as a de facto canon for serious AI practitioners, engineers, and researchers.
This book analyzes the artifacts on the List primarily by theme, though the general progression still follows a loose chronological order as we connect conceptual threads. Moreover, each artifact is given dedicated attention, covering both its technical and cultural significance. Some artifacts are treated more concisely than others, but rest assured that every artifact on the List will be explored, leaving no essential insights overlooked.
Additionally, while the List guides the book’s core narrative, primarily covering roughly a decade from 2012 to 2022, our exploration extends beyond this window. The book selectively introduces earlier and later research to provide critical context, illuminate blind spots, clarify Sutskever’s broader philosophical perspective, and trace the threads connecting older research to modern practice. Nevertheless, the primary focus remains securely tethered to these core artifacts and their immediate intellectual surroundings.
Manning books often adopt a “mental model” approach, offering readers a clear conceptual framework shaped by an author’s experiences and insights. This book is different. The mental model presented here is not my own; instead, it reflects Ilya Sutskever’s worldview, which is derived directly from the themes and artifacts in his List and shaped further by his public statements and interviews. By unpacking Ilya’s mental model, readers gain more than historical context; they acquire a conceptual toolkit for interpreting past developments, anticipating future shifts, and guiding practical decisions.
Furthermore, understanding Sutskever’s perspective equips practitioners to navigate polarized debates around AI and to recognize which architectures have enduring value. Understanding what has worked—and why—not only informs practical engineering trade-offs but also positions readers to understand how we got here and perhaps anticipate future paradigm shifts rather than react to them. In short, the book provides conceptual clarity and, where applicable, practical insights, all viewed through the eyes of one of AI’s most influential thinkers.
With this context in mind, let’s introduce the foundational themes of Sutskever’s worldview. Consider these initial insights not as definitive conclusions, but as starting points we’ll progressively refine as we dig deeper into the ideas shaping modern AI.
Don’t Bet Against Deep Learning: Ilya explicitly believes that “one doesn’t bet against deep learning.”[89] This belief is reflected in the List, which highlights deep learning and its successes while omitting references to older symbolic approaches and classical planning. Remarkably, reinforcement learning (RL) is also missing, despite Sutskever leading OpenAI’s early work on RL and despite RL’s central role in projects like AlphaGo and OpenAI Five. Instead, the List concentrates almost solely on supervised and self-supervised learning with deep networks.
Engineering Pragmatism: The List prioritizes large-scale engineering efforts over purely theoretical innovation. Influential projects like AlexNet and Deep Speech 2 illustrate this philosophy, combining established techniques and significant computational resources to deliver unprecedented real-world performance gains. While critics dismiss this approach as merely incremental “engineering” rather than innovative “science,” Ilya argues that tangible progress arises precisely from pragmatic experimentation, iterative improvement, and real-world deployment, rather than isolated theoretical insights.
Do More with Less at Scale: Sutskever’s research philosophy is that genuine progress emerges from refining ideas at scale and uncovering what works through experimentation. For example, ResNets introduced a simple architectural trick called residual connections that unlocked the training of extremely deep models.[90] Transformers replaced recurrent architectures with parallelizable attention. Even a paper like “Order Matters,” one of the more niche papers on the List, explores how rearranging sequence data can make training more efficient. These examples emphasize conceptual simplicity and practical impact at scale, aligning with Ilya’s preference for “minimum innovation for maximum results.”[91]
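The residual trick mentioned above is simple enough to sketch in a few lines. The toy block below is an illustration, not ResNet’s actual architecture: it shows how adding a block’s input back to its output keeps a deep stack close to the identity function, which is what makes very deep models trainable in practice.

```python
import numpy as np

# Toy sketch of a residual connection (the ResNet trick): the block
# computes a small correction F(x) and adds it back to its input,
# so gradients can flow through the identity path unimpeded.

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = x + F(x), where F is a small two-layer transformation."""
    return x + relu(x @ w1) @ w2

d = 16
x = rng.normal(size=(1, d))
# Near-zero weights: each block starts out close to the identity,
# so even a 100-block stack behaves sensibly from the start.
w1 = rng.normal(scale=1e-3, size=(d, d))
w2 = rng.normal(scale=1e-3, size=(d, d))

y = x
for _ in range(100):  # a plain (non-residual) stack this deep would be
    y = residual_block(y, w1, w2)  # notoriously hard to train

print("max drift from input:", float(np.abs(y - x).max()))
```

Without the `x +` term, stacking a hundred near-zero layers would collapse the signal entirely; with it, information and gradients pass through no matter how deep the stack grows.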
Emergence and Compression: Sutskever’s List explores Kolmogorov complexity, minimum description length (MDL), and even an essay pondering how complexity rises then falls in closed systems, all hinting at a philosophical bent in his thinking. Why would a busy AI engineer recommend reading about algorithmic randomness or the “coffee automaton”? Likely because Sutskever views intelligence as a compression process, one that ultimately finds simpler, more abstract representations of complex phenomena. In this view, intelligence emerges when a system can distill raw experience into a minimal, generalizable form.
With the conceptual framing in mind, let us next turn to Sutskever’s List. Before Residual Networks, before Transformers, before scaling laws, there was AlexNet. It marked the end of one era and the beginning of another.