In 1968, Terry Winograd's SHRDLU system demonstrated a revolutionary approach to natural language understanding by grounding language in a simulated blocks world. Unlike earlier pattern-matching systems, SHRDLU built genuine comprehension through spatial reasoning, reference resolution, and the connection between words and actions. This landmark system revealed both the promise and profound challenges of symbolic AI, establishing benchmarks that shaped decades of research in language understanding, knowledge representation, and embodied cognition.

This article is part of the free-to-read History of Language AI book
1968: SHRDLU
In the late 1960s, artificial intelligence research faced a critical question: could computers genuinely understand language, or were they limited to clever mimicry? Two years after ELIZA had demonstrated that pattern matching alone could create a convincing illusion of comprehension, Terry Winograd's SHRDLU system took a radically different approach to natural language understanding. Rather than relying on surface-level tricks to simulate conversation, SHRDLU attempted to build authentic understanding by grounding language in a physical environment.
The key insight driving SHRDLU was that true language comprehension requires more than just processing words. It demands an understanding of the world those words describe, the ability to reason about objects and their relationships, and the capacity to connect linguistic commands to physical actions. To explore these ideas, Winograd created a simulated environment, a world of colored blocks, pyramids, and boxes that could be manipulated by a virtual robot arm. Within this carefully constrained domain, SHRDLU demonstrated capabilities that had never been seen before in a computer system.
Understanding Through Action
SHRDLU's approach to language understanding was fundamentally grounded in action and perception. The system operated within what researchers called a "blocks world," a simplified virtual environment populated with geometric objects: blocks, pyramids, and boxes of various colors, all resting on a flat surface. A simulated robot arm could manipulate these objects, picking them up, moving them, and stacking them according to instructions. This tight coupling between language and action became the foundation for SHRDLU's remarkable capabilities.
What made SHRDLU truly revolutionary was the sophistication with which it processed language and connected it to behavior. The system could parse complex sentences using formal grammar rules, breaking down the structure of commands and questions to extract their meaning. It maintained a detailed world state, continuously tracking the positions, properties, and relationships of all objects in its environment. When given a command, SHRDLU could plan and execute appropriate actions, moving objects according to the user's instructions. Equally important, it could answer questions about its world, recalling past actions and describing the current configuration of objects.
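To make the idea of a maintained world state concrete, here is a minimal sketch of how a blocks-world model might be represented. This is purely illustrative, not Winograd's code: SHRDLU was written in Micro-Planner and Lisp, and all the object names and fields below are invented.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    shape: str          # "block", "pyramid", or "box"
    color: str
    height: int
    supported_by: str   # name of the object (or "table") beneath it

# A tiny hypothetical scene: a green block stacked on a red one,
# plus a blue pyramid resting on the table.
world = {
    "b1": Obj("b1", "block", "red", 2, "table"),
    "b2": Obj("b2", "block", "green", 3, "b1"),
    "p1": Obj("p1", "pyramid", "blue", 2, "table"),
}

def on_top_of(x: str, y: str) -> bool:
    """True if object x rests directly on object y."""
    return world[x].supported_by == y

print(on_top_of("b2", "b1"))  # True: the green block sits on the red one
```

With such a structure in place, answering a question like "what is on the red block?" reduces to a lookup over explicit facts, which is exactly the flavor of representation-plus-query that SHRDLU relied on.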
Unlike ELIZA, which created the illusion of understanding through clever pattern matching, SHRDLU represented a genuine attempt at comprehension. It built an internal model of its world, updated that model as it performed actions, and reasoned about spatial relationships and object properties using explicit rules and representations. This was the first system to convincingly demonstrate that a computer could understand language when that language was grounded in a physical context, even if that physical world was only a simulation. The system's ability to connect words to objects, actions to changes in the world, and questions to knowledge gave it a form of understanding that went far beyond surface-level pattern matching.
The Blocks World: A Perfect Testing Ground
The choice of a blocks world as SHRDLU's domain was far from arbitrary. Winograd needed an environment that was complex enough to require genuine understanding but constrained enough to make the problem tractable with 1960s computing resources and AI techniques. The blocks world struck this delicate balance perfectly.
In this simplified domain, the vocabulary was finite and unambiguous. Each object had clear, well-defined properties: color, shape, size, and position. There was no confusion about what "the red block" meant, no metaphorical language to interpret, no ambiguity about whether "pyramid" referred to an Egyptian monument or a geometric shape. This clarity allowed the system to focus on the structure of language itself rather than struggling with lexical ambiguity.
The semantics of actions were equally clear and well-defined. Commands like "pick up," "put down," and "stack" had precise, executable meanings. Each action corresponded to a specific sequence of operations the robot arm could perform, and the effects of these actions on the world state were completely predictable. This meant that SHRDLU could verify its understanding through action: if it correctly understood a command, executing that command would produce the expected result in the simulated world.
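The precondition-and-effect character of these actions can be sketched in a few lines. Again, this is a hypothetical illustration of the idea, not SHRDLU's implementation; the object names and state fields are invented.

```python
# A toy state: the green block b2 sits on the red block b1,
# so b1 is not "clear" (something rests on it).
world = {
    "b1": {"pos": "table", "clear": False},
    "b2": {"pos": "b1", "clear": True},
}
holding = None

def pick_up(name: str) -> None:
    """Execute 'pick up' if its preconditions hold, then apply its effects."""
    global holding
    obj = world[name]
    # Preconditions: the hand is empty and nothing is stacked on the target.
    assert holding is None, "hand is already full"
    assert obj["clear"], f"{name} has something on top of it"
    # Effects: the object leaves its support, which becomes clear.
    support = obj["pos"]
    if support in world:
        world[support]["clear"] = True
    obj["pos"] = "hand"
    holding = name

pick_up("b2")
print(holding)       # b2
print(world["b1"])   # b1 is now clear
```

Because every effect is explicit, the system can verify its understanding exactly as described above: if the command was parsed correctly, executing it yields the predicted world state.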
Perhaps most importantly, the blocks world required genuine spatial reasoning. Understanding commands and answering questions demanded that SHRDLU grasp spatial relationships: concepts like "on top of," "under," "beside," "taller than," and "inside." These prepositions and comparative terms required the system to reason about the three-dimensional arrangement of objects and how they related to one another. This was no simple task: it required building and maintaining a structured representation of the spatial configuration of the world.
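Some of these relations are not stored directly but must be derived. A sketch, with invented data, of how "above" could be computed by chaining direct support facts:

```python
# Direct support links: each object maps to what it rests on.
# (Hypothetical scene: pyramid p1 on block b2, which is on block b1.)
supports = {"p1": "b2", "b2": "b1", "b1": "table"}

def above(x: str, y: str) -> bool:
    """True if x is somewhere above y in a stack (transitive closure)."""
    cur = supports.get(x)
    while cur is not None:
        if cur == y:
            return True
        cur = supports.get(cur)
    return False

print(above("p1", "b1"))  # True: p1 -> b2 -> b1
print(above("b1", "p1"))  # False
```

The distinction between stored facts ("p1 rests on b2") and derived ones ("p1 is above b1") is one reason the blocks world demanded real inference rather than table lookup.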
This carefully constrained environment gave Winograd the freedom to focus on the core challenges of language understanding without confronting the infinite complexity of open-domain conversation. Within these boundaries, he could build a system that demonstrated genuine comprehension, showing that computers could understand not just isolated words but the relationships between words, the structure of sentences, and the connection between language and the world it described.
A Conversation with SHRDLU
The best way to appreciate SHRDLU's capabilities is to observe it in action. The following exchange, adapted from Winograd's original demonstrations, illustrates the system's sophisticated language understanding and reasoning abilities:

Person: Grasp the pyramid.
SHRDLU: I don't understand which pyramid you mean.
Person: Find a block which is taller than the one you are holding and put it into the box.
SHRDLU: By "it", I assume you mean the block which is taller than the one I am holding. OK.
Person: What does the box contain?
SHRDLU: The blue pyramid and the blue block.

As you read through this conversation, notice how SHRDLU handles ambiguity, resolves references, and explains its own reasoning processes.
The Breakthroughs Hidden in the Conversation
This seemingly straightforward exchange reveals several profound achievements in natural language understanding. Each turn in the conversation demonstrates capabilities that were unprecedented in 1968 and that remain challenging even for modern systems. Let's examine what makes these interactions so remarkable.
The first breakthrough appears when the human asks SHRDLU to "Grasp the pyramid." In many blocks world scenarios, there are multiple pyramids of different colors present in the scene. SHRDLU recognizes this ambiguity immediately. Rather than making an arbitrary choice, guessing which pyramid the user intended, or simply failing with an error message, the system explicitly asks for clarification: "I don't understand which pyramid you mean." This behavior demonstrates genuine reference resolution, the ability to recognize when a referring expression could point to multiple possible objects and to handle that ambiguity gracefully. SHRDLU understands not just the words but the referential problem they create.
The second breakthrough emerges in the command "Find a block which is taller than the one you are holding and put it into the box." This single sentence requires SHRDLU to perform a remarkable sequence of cognitive operations. First, it must remember what object it is currently holding, maintaining a representation of its own state. Second, it must compare the heights of all other blocks in the scene with the height of the held object, performing spatial reasoning and comparison. Third, it must identify which specific block satisfies the height constraint. Fourth, it must plan a sequence of actions to move that block into the box, potentially requiring intermediate steps if other objects are in the way. Finally, it must execute this plan. This entire chain of reasoning happens seamlessly, demonstrating that SHRDLU can handle complex nested relative clauses and translate them into structured sequences of actions.
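The chain of operations behind that command can be made concrete with a small sketch. The heights, names, and trivial "plan" below are invented; in SHRDLU the planning step could involve clearing obstructions first.

```python
# Hypothetical scene: block heights, with the arm holding b1.
blocks = {"b1": 2, "b2": 4, "b3": 3}   # name -> height
holding = "b1"
box_contents = []

# 1. Recall own state: what am I holding, and how tall is it?
held_height = blocks[holding]
# 2-3. Compare heights and find the blocks satisfying the constraint.
candidates = [b for b, h in blocks.items() if b != holding and h > held_height]
target = candidates[0]
# 4-5. Plan and execute the move (trivial here: nothing is in the way).
box_contents.append(target)

print(target)        # b2
print(box_contents)  # ['b2']
```

Each numbered comment corresponds to one step in the reasoning sequence described above, showing how a single relative clause decomposes into self-knowledge, comparison, selection, and action.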
The third breakthrough comes in SHRDLU's handling of the pronoun "it." When SHRDLU responds "By 'it', I assume you mean the block which is taller than the one I am holding," it's doing more than simple pronoun resolution. The system is demonstrating metacognitive awareness, the ability to reflect on its own understanding and communicate that understanding back to the user. SHRDLU doesn't just silently resolve the pronoun and act. Instead, it explains its interpretation, making its reasoning transparent. This allows the human to correct any misunderstandings before the action is executed, turning the interaction into a genuine collaboration.
The final question, "What does the box contain?" demonstrates yet another capability: the ability to query the current world state and report it in natural language. SHRDLU must identify the box (using reference resolution again), examine which objects are located inside it, and generate a grammatically correct response that enumerates those objects: "The blue pyramid and the blue block." This requires not just perception and knowledge representation but also natural language generation, the ability to produce appropriate linguistic responses to questions.
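The generation step, turning a set of retrieved objects into a grammatical answer, can be sketched like this. The helper and its phrasing rules are invented for illustration.

```python
# Objects retrieved from the world model as being inside the box.
contents = ["the blue pyramid", "the blue block"]

def describe(items: list[str]) -> str:
    """Render a list of object descriptions as an English answer."""
    if not items:
        return "Nothing."
    if len(items) == 1:
        return items[0].capitalize() + "."
    return ", ".join(items[:-1]).capitalize() + " and " + items[-1] + "."

print(describe(contents))  # The blue pyramid and the blue block.
```

Even this toy version shows that answering a question spans three stages: resolving "the box," querying the state for its contents, and rendering the result as a sentence.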
Together, these capabilities painted a compelling picture of what genuine language understanding might look like. SHRDLU wasn't simply matching patterns or following scripts. It was building representations, reasoning about those representations, and connecting language to a model of the world.
The Limits of Understanding
For all its impressive capabilities, SHRDLU's limitations proved as instructive as its successes. The system's constraints revealed fundamental challenges that would shape decades of subsequent research in artificial intelligence and natural language processing.
The most severe limitation was what researchers called domain brittleness. SHRDLU's understanding was entirely dependent on its blocks world. The system couldn't generalize beyond this narrow domain. If you wanted to add a new type of object, say a sphere or a cylinder, you couldn't simply tell SHRDLU about it. Instead, you had to modify the system's core knowledge representation, update its reasoning procedures, and potentially revise its grammar rules. Extending the system to handle a different domain entirely, such as a kitchen environment or a toolbox, would require rebuilding most of it from scratch. SHRDLU's knowledge was hand-coded and domain-specific, not learned or transferable.
This brittleness stemmed from a deeper problem: the knowledge acquisition bottleneck. Every piece of knowledge SHRDLU possessed had to be explicitly programmed by a human expert. The system knew that blocks could be stacked because Winograd had written code to represent that fact. It knew what "taller than" meant because that comparison had been explicitly defined. As researchers tried to expand such systems to handle richer, more realistic domains, they discovered that the amount of knowledge required grew exponentially. Each new sentence pattern demanded new grammar rules. Each new concept required new representations and reasoning procedures. The complexity quickly became unmanageable.
SHRDLU also confronted what philosophers and AI researchers called the frame problem. This is the challenge of determining which aspects of a situation are relevant when reasoning about actions and their consequences. In the simple blocks world, when you pick up a block, most things stay the same: other blocks don't move, colors don't change, the table remains where it is. But specifying all the things that don't change is surprisingly difficult, especially as domains become more complex. In richer environments, the frame problem makes complete symbolic representation intractable.
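A tiny sketch makes the frame problem tangible: a state-transition rule must state not only what an action changes but, implicitly or explicitly, that everything else persists. The facts below are invented.

```python
# World state as explicit facts (hypothetical notation).
state = {
    "pos(b1)": "table", "pos(b2)": "b1",
    "color(b1)": "red",  "color(b2)": "green",
}

def move(st: dict, obj: str, dest: str) -> dict:
    """Apply a 'move' action, copying forward every unaffected fact."""
    new = {}
    for fact, val in st.items():
        if fact == f"pos({obj})":
            new[fact] = dest   # the one fact the action changes
        else:
            new[fact] = val    # frame assumption: everything else persists
    return new

after = move(state, "b2", "table")
print(after["pos(b2)"])    # table
print(after["color(b1)"])  # red: unchanged, but the rule had to carry it over
```

In four facts this is trivial; with thousands of facts and dozens of action types, specifying or computing what stays the same is exactly where complete symbolic representation becomes intractable.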
Related to this was the symbol grounding problem, a question that became central to cognitive science and AI: how do linguistic symbols connect to real-world meaning? SHRDLU's symbols were grounded in its simulated world, but that world was itself a set of symbols. The system manipulated representations of blocks, not actual blocks. It never confronted the challenge of connecting language to raw sensory data, to actual images or physical interactions. This raised deeper questions about whether SHRDLU's "understanding" was genuine or merely a sophisticated form of symbol manipulation.
A Turning Point in the History of AI
Despite its limitations, SHRDLU fundamentally transformed how researchers thought about language understanding and artificial intelligence more broadly. The system's influence extended far beyond its technical contributions, shaping research agendas and philosophical perspectives for decades.
SHRDLU provided the first truly convincing demonstration that computers could achieve genuine language understanding, at least within a limited domain. Before SHRDLU, systems like ELIZA had created clever illusions of comprehension through pattern matching. SHRDLU showed that deeper understanding was possible when language was grounded in perception, action, and world knowledge. This shifted expectations for what natural language processing systems should aspire to achieve.
The system also highlighted the critical importance of grounding in language understanding. SHRDLU demonstrated that meaning doesn't reside in words alone but emerges from the connection between language and the world it describes. This insight anticipated decades of later research on embodied cognition, situated language understanding, and the integration of language with perception and action. Modern systems that combine language models with vision, robotics, or interactive environments are, in many ways, following the path that SHRDLU first explored.
The microworlds methodology that SHRDLU exemplified became a dominant approach in AI research for two decades. Researchers across many subfields adopted the strategy of studying intelligence within simplified, constrained domains where problems were tractable. While this approach had its limitations, it allowed researchers to make concrete progress on specific capabilities while building toward more general solutions. The microworlds tradition directly influenced research in planning, reasoning, knowledge representation, and human-computer interaction.
SHRDLU also foreshadowed modern research in what we now call embodied AI, systems that integrate language understanding with perception and physical interaction. Contemporary work on robots that follow natural language instructions, virtual assistants that control smart home devices, and multimodal models that combine vision and language all build on the insight that SHRDLU demonstrated: language understanding is most powerful when connected to action and perception.
Perhaps most significantly, SHRDLU represented both the culmination and the beginning of the end of purely rule-based approaches to natural language processing. The system showed what could be achieved through careful engineering, explicit knowledge representation, and hand-crafted rules. But it also revealed the limitations of this approach: the brittleness, the inability to scale, the knowledge acquisition bottleneck. These limitations would eventually drive the field toward statistical methods, machine learning, and data-driven approaches that could acquire linguistic knowledge automatically from large corpora of text.
In the years following SHRDLU's development, researchers would struggle with the challenge of moving beyond microworlds to handle the full complexity of human language. The path forward would require fundamental changes in approach, new mathematical frameworks, and vastly more computational power. But SHRDLU had established a crucial benchmark: it showed what genuine language understanding could look like and set a standard that future systems would strive to meet, and eventually exceed, using very different methods.
About the author: Michael Brenndoerfer
All opinions expressed here are my own and do not reflect the views of my employer.
Michael currently works as an Associate Director of Data Science at EQT Partners in Singapore, where he drives AI and data initiatives across private capital investments.
With over a decade of experience spanning private equity, management consulting, and software engineering, he specializes in building and scaling analytics capabilities from the ground up. He has published research in leading AI conferences and holds expertise in machine learning, natural language processing, and value creation through data.