A comprehensive guide to Michael Lesk's groundbreaking 1983 algorithm for word sense disambiguation. Learn how dictionary-based context overlap revolutionized computational linguistics and influenced modern language AI from embeddings to transformers.

This article is part of the free-to-read History of Language AI
1983: The Lesk Algorithm
In the early 1980s, natural language processing stood at a curious juncture. Researchers had successfully built systems that could parse sentences, extract syntactic structures, and even perform rudimentary semantic analysis. Yet these systems stumbled on a problem so fundamental, so seemingly simple, that it exposed a profound gap in computational understanding of language: words with multiple meanings. When a machine encountered the word "bank" in a sentence, how could it determine whether the text referred to a financial institution or the edge of a river? When "interest" appeared, was it curiosity or a percentage rate? This wasn't merely an academic curiosity—it was a practical barrier preventing computers from truly understanding human language.
Michael Lesk, working at Bell Laboratories in Murray Hill, New Jersey, recognized that while humans resolved these ambiguities effortlessly through context, computers lacked any systematic method for doing so. The field had dictionaries—machine-readable versions were beginning to emerge—but possessing definitions wasn't the same as knowing which definition applied. Lesk understood that the solution lay not in complex linguistic rules or hand-crafted disambiguation logic, but in a deceptively elegant idea: the words surrounding an ambiguous term would themselves provide clues, and those clues could be found by comparing context against dictionary definitions.
The algorithm Lesk proposed in his 1986 paper (though developed around 1983) would become one of the most influential approaches to word sense disambiguation, a problem that remains central to language AI four decades later. Its beauty lay in its simplicity—so simple that researchers could implement it in an afternoon, yet effective enough that variants of the algorithm are still used in modern systems. More importantly, the Lesk algorithm represented a philosophical shift in computational linguistics: rather than trying to encode human linguistic intuition into explicit rules, it leveraged the statistical properties of language itself, letting the data speak.
This approach foreshadowed the data-driven revolution that would transform language AI in the coming decades. While Lesk's method predated modern machine learning by years, it embodied a key insight that would prove prophetic: context carries information, and that information can be captured through surprisingly straightforward computational methods. The algorithm emerged at a time when WordNet was still being designed, when neural networks were out of favor, and when "corpus linguistics" meant analyzing thousands of sentences rather than billions. Yet its core principle—that meaning emerges from the company words keep—would echo through the history of language AI, from distributional semantics to word embeddings to modern transformer models.
The Ambiguity Problem
Every natural language teems with ambiguity, and polysemy—the phenomenon of single words carrying multiple distinct meanings—presents perhaps the most pervasive challenge for computational understanding. Consider the apparent simplicity of this sentence: "The bank can guarantee deposits will eventually cover all losses." To a human reader, the interpretation feels immediate and obvious, but that immediacy conceals a remarkable cognitive achievement. The word "bank" alone has more than a dozen distinct senses in most dictionaries, ranging from financial institutions to riverbanks, from turning aircraft to pool shots in billiards. Without conscious effort, readers select the appropriate meaning using contextual cues scattered throughout the sentence—"guarantee," "deposits," "losses"—words that collectively evoke the financial domain.
Computational systems of the early 1980s had no principled mechanism for performing this disambiguation. Rule-based approaches attempted to encode explicit disambiguation rules: if "bank" appears near "money," select the financial sense; if it appears near "river" or "stream," select the geographical sense. These hand-crafted rules proved brittle and labor-intensive, requiring linguists to anticipate every possible context for every ambiguous word. The approach didn't scale. English alone contains hundreds of thousands of polysemous words, and attempting to write exhaustive rules for each would require decades of expert effort. Moreover, context operates subtly—relevant clues might appear sentences away from the ambiguous word, or might emerge from the interaction of multiple contextual elements rather than any single "trigger word."
The problem became acute as researchers attempted to build practical systems. Machine translation systems mistranslated sentences because they selected wrong word senses. Information retrieval systems returned irrelevant documents because they couldn't distinguish between different meanings of search terms. Text-to-speech systems chose incorrect pronunciations for heteronyms—words spelled identically but pronounced differently depending on meaning, like "lead" (to guide) versus "lead" (the metal). Question-answering systems extracted nonsensical answers because they misunderstood ambiguous words in questions or documents. Each application exposed the same fundamental gap: without resolving lexical ambiguity, deeper language understanding remained elusive.
Previous attempts at automated disambiguation had focused primarily on syntactic or selectional restriction approaches. These methods used grammatical patterns and semantic constraints to filter impossible readings—for instance, knowing that verbs like "eat" require animate subjects and edible objects. While useful for certain types of ambiguity, these approaches failed for polysemy cases where multiple senses satisfied syntactic and basic semantic constraints. The sentence "The bank issued new bonds" works grammatically whether "bank" means riverbank or financial institution, even though only one makes semantic sense. What was needed wasn't a filter for impossible readings, but a positive method for selecting the most plausible sense given the full contextual evidence.
The Elegant Solution
Lesk's insight was to turn the dictionary itself into a disambiguation tool. Rather than viewing dictionary definitions merely as explanations for humans, he recognized them as semantic fingerprints—unique collections of words that characterized each sense. If you looked up "bank" in a dictionary, the financial institution sense would be defined using words like "money," "account," "deposit," "loan," and "credit," while the riverbank sense would use words like "river," "edge," "shore," "water," and "slope." These definition vocabularies were distinctive precisely because lexicographers had crafted them to distinguish meanings. Lesk realized that by comparing the words in these definitions against the words surrounding an ambiguous term in actual text, a system could measure which sense best "fit" the context.
The algorithm operates through a remarkably straightforward procedure. Given a sentence containing an ambiguous target word, the system first retrieves all dictionary definitions for that word. It then collects the words appearing in a fixed window around the target—perhaps five or ten words on either side. The core computation involves calculating overlaps: for each candidate sense of the target word, the algorithm counts how many words appear in both the sense's dictionary definition and the surrounding context. The sense with the maximum overlap wins. If the context window contains the words "deposits," "guarantee," and "losses," and the financial institution sense of "bank" has a definition mentioning "deposits" and "money," while the riverbank sense has no overlapping words, the financial sense receives a higher score.
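To make the procedure concrete, here is a minimal Python sketch of simplified Lesk. The sense inventory, tokenizer, and stop-word list are illustrative stand-ins rather than Lesk's original resources, and a real implementation would pull its definitions from a machine-readable dictionary, but the overlap-and-argmax loop is the entire algorithm.

```python
# A minimal sketch of simplified Lesk. The glosses below are hypothetical
# stand-ins for a machine-readable dictionary; only the overlap logic matters.

STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "by", "to",
             "or", "such", "as", "that", "will", "i"}

# Hypothetical sense inventory: sense label -> dictionary definition (gloss).
SENSES = {
    "bank": {
        "bank_financial": "an institution that accepts deposits of money and makes loans",
        "bank_river": "the sloping margin of a watercourse, such as a river or stream",
    },
}

def tokenize(text):
    """Lowercase, strip simple punctuation, and drop stop words."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return {w for w in words if w not in STOPWORDS}

def simplified_lesk(target, sentence):
    """Return the sense of `target` whose gloss shares the most words with the sentence.

    Here the whole sentence serves as the context window; Lesk used a window of
    a few words on either side of the target.
    """
    context = tokenize(sentence) - {target}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[target].items():
        overlap = len(context & tokenize(gloss))   # count shared content words
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap

print(simplified_lesk("bank", "The bank can guarantee deposits will eventually cover all losses"))
# -> ('bank_financial', 1): "deposits" appears in both the context and the financial gloss.
```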
This basic version, often called "Simplified Lesk," was just the starting point. Lesk's original formulation included a more sophisticated variant that considered not just the target word's definitions, but also the definitions of neighboring words in the context. This created a richer comparison space: the algorithm compared the definition of one sense of "bank" against the definitions of the senses of "guarantee," "deposits," and "losses." When multiple words in the context window had their own polysemy, this approach allowed the algorithm to jointly disambiguate them, selecting sense combinations that showed mutual definitional overlap. The intuition was compelling—related senses of different words should have related definitions, sharing vocabulary from the same semantic domain.
Let's walk through a concrete example to see the algorithm in action. Consider disambiguating "bank" in "I sat by the bank fishing." Using simplified Lesk with hypothetical definitions:
- Bank (financial): "An institution for receiving, lending, and safeguarding money and facilitating financial transactions."
- Bank (riverbank): "The sloping margin of a watercourse, such as a river or stream."
The context window contains "sat," "by," and "fishing." The financial definition shares no words with this context (overlap = 0), and neither does the riverbank definition, so simplified Lesk cannot separate the senses on exact matches alone. The full Lesk algorithm looks further: it also retrieves definitions for the context words. The definition of "fishing" would contain words like "catch," "fish," "water," "sport," and "angling," while the riverbank definition of "bank" contains "watercourse," "river," and "stream," which are semantic neighbors of "water" and "fish." The words are not identical, so exact-match overlap remains sparse, but the semantic clustering around water is apparent, and refinements such as morphological normalization or expanded glosses (discussed below) let the riverbank sense pull ahead.
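To make this concrete, the following sketch continues the earlier code (reusing its tokenize helper and toy sense inventory) in the spirit of Lesk's fuller variant: it pools a hypothetical gloss for "fishing" with the context and applies a crude morphological normalizer. Both the gloss and the normalizer are illustrative assumptions rather than Lesk's original resources, but they show how the riverbank sense accumulates overlap that the financial sense cannot.

```python
# Continuing the earlier sketch (tokenize and SENSES defined above): pool the
# gloss of a context word and normalize morphology. The gloss and the
# normalizer below are illustrative toys.

CONTEXT_GLOSSES = {
    "fishing": "the activity of catching fish, for food or as a sport, in rivers, lakes, or the sea",
}

def normalize(words):
    """Very crude morphological normalization: strip '-ing' and plural 's'."""
    out = set()
    for w in words:
        if w.endswith("ing") and len(w) > 5:
            w = w[:-3]
        elif w.endswith("s") and len(w) > 3:
            w = w[:-1]
        out.add(w)
    return out

def full_lesk_score(target, sentence):
    """Score each sense against the context words plus the glosses of context words."""
    context = tokenize(sentence) - {target}
    pooled = set(context)
    for word in context:
        if word in CONTEXT_GLOSSES:          # only words we have toy glosses for
            pooled |= tokenize(CONTEXT_GLOSSES[word])
    pooled = normalize(pooled)
    return {sense: len(pooled & normalize(tokenize(gloss)))
            for sense, gloss in SENSES[target].items()}

print(full_lesk_score("bank", "I sat by the bank fishing"))
# After normalization, "rivers" in the fishing gloss matches "river" in the
# riverbank gloss, while the financial gloss still scores zero.
```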
The original Lesk algorithm simply counted overlapping words, but numerous refinements have been proposed over the decades. Some variants weight overlaps by word frequency, giving less common words (which are more semantically distinctive) greater influence. Others use squared overlap counts, emphasizing definitions with multiple matching words over those with single matches. Still others expand beyond exact word matches to include morphological variants (fishing/fish/fisher) or semantic relations from resources like WordNet, treating synonyms and hypernyms as partial matches. These variations address a core limitation: dictionary definitions are often quite short, making exact lexical overlap sparse for anything but the most explicitly related contexts.
The algorithm's computational profile was remarkably modest, especially for 1983. Computing overlaps required only string matching and counting—operations any computer could perform efficiently. Unlike syntactic parsing, which required complex grammar rules and search through structural possibilities, or semantic analysis, which demanded expensive inference, Lesk's method scaled linearly with the number of candidate senses and context size. You could disambiguate a thousand-word document in seconds on 1980s hardware. This practicality mattered enormously for early adoption.
The Dictionary Dependency
The algorithm's effectiveness hinged entirely on the quality and coverage of the dictionary being used. Lesk's original experiments relied on machine-readable dictionaries that were just becoming available in the early 1980s—digitized versions of printed reference works such as the Oxford Advanced Learner's Dictionary of Current English, which Lesk himself used. These resources varied dramatically in their suitability for computational use. Some provided rich, verbose definitions with many content words; others offered terse, minimal definitions that left little surface form for overlap detection. The granularity of sense distinctions mattered too: dictionaries with very fine-grained sense divisions (distinguishing, for instance, between "bank" as a building versus "bank" as an institution) made disambiguation harder by fragmenting definitional overlap across multiple similar senses.
The choice of dictionary fundamentally shaped what the algorithm could achieve. A financial dictionary would provide excellent disambiguation for domain-specific text but fail on general language. A children's dictionary with simplified definitions might lack the technical vocabulary needed for scientific texts but perform well on informal writing. No single dictionary served all purposes, yet creating domain-specific or application-specific dictionaries required substantial lexicographic effort. This tension between generality and specificity reflected a broader challenge in early NLP: resources designed for human readers often proved awkward for computational processing, while resources designed computationally (like early semantic networks) lacked the coverage and nuance of human-curated references.
Applications and Real-World Impact
Despite its simplicity—or perhaps because of it—the Lesk algorithm found its way into numerous practical systems throughout the 1980s and 1990s. Machine translation systems incorporated Lesk-based disambiguation as a preprocessing step, resolving ambiguities in the source language before attempting translation. This proved particularly valuable for languages with extensive polysemy or where ambiguous source words mapped to distinct translations. Information retrieval systems used the algorithm to disambiguate both query terms and document terms, improving precision by matching senses rather than word forms. A search for "Java" intending the programming language could be distinguished from "Java" meaning the island or the coffee, reducing irrelevant results.
Text understanding systems—programs designed to answer questions, extract information, or summarize documents—employed Lesk-based methods to build more accurate semantic representations. Knowing that a news article about banks referred to financial institutions rather than river geography allowed systems to correctly link entities, extract relationships, and categorize content. The algorithm even found applications in linguistic research itself: computational linguists used it to automatically sense-tag corpora, creating training data for supervised learning approaches that would emerge later. These annotated corpora, created partly through Lesk-based automatic tagging followed by manual correction, became invaluable resources for the field.
One particularly notable application emerged in the construction and evaluation of semantic networks. As resources like WordNet developed in the late 1980s—George Miller and colleagues at Princeton were building a lexical database that organized word senses into hierarchical semantic relationships—researchers needed methods to map naturally occurring text onto WordNet senses. The Lesk algorithm, adapted to use WordNet glosses (short definitions) and sense relations, became a standard baseline for this task. It provided a simple, reproducible method for automatic sense tagging, enabling experiments that required large-scale sense-disambiguated data without the expense of massive human annotation efforts.
The algorithm also served an important pedagogical role. Because it was so straightforward to implement and understand, it became a teaching tool in natural language processing courses. Students could code a working word sense disambiguation system in a single assignment, gaining hands-on experience with a real NLP task and a genuine research problem. This accessibility helped train a generation of NLP researchers, many of whom later developed more sophisticated disambiguation methods while retaining an appreciation for Lesk's elegant simplicity.
Limitations and Challenges
For all its elegance, the Lesk algorithm suffered from significant limitations that prevented it from achieving human-level disambiguation accuracy. The most fundamental problem was sparse overlap: in many cases, the words in the context and the words in the correct sense definition simply didn't match, even when the sense was clearly correct to human readers. Dictionary definitions are necessarily concise, typically containing twenty to fifty words for common senses. The context window around an ambiguous word, while larger, might not contain any of these specific definitional words even when the overall semantic domain matched. A text discussing banking might use "financial," "transaction," "lending," and "credit" in the surrounding context, but the dictionary definition of "bank" might instead use "institution," "money," "safeguard," and "currency"—semantically related concepts described with different vocabulary.
This lexical gap problem was exacerbated by the rigid surface-form matching the algorithm employed. Synonyms provided no benefit: if the definition said "vehicle" and the context said "car," no overlap registered. Morphological variants similarly failed to match: "fishing" in the context didn't match "fish" in the definition unless special preprocessing handled morphology. Deeper semantic relationships like hypernymy (cat/animal) or meronymy (wheel/car) went entirely unrecognized. The algorithm operated in a world of exact string equality, blind to the rich semantic structure of language.
Short contexts proved particularly problematic. Telegraphic text—headlines, queries, social media posts, technical lists—often lacked sufficient context words for reliable overlap computation. A two-word Google search query like "java performance" might contain too little context to reliably disambiguate "java," especially if the dictionary definitions themselves were brief. The algorithm would either produce ties (multiple senses with zero overlap) or unstable results (disambiguation based on a single coincidental word match).
The algorithm also struggled with senses that had substantive overlap in their definitions. Many polysemous words have related senses—"bank" as a building and "bank" as an institution, "chicken" as an animal and "chicken" as meat. These sense distinctions, while meaningful to humans, often involve definitions that share significant vocabulary. The financial institution sense might mention "building," "customers," and "services," while the physical building sense mentions "structure," "customers," and "services." The definitional overlap makes discrimination difficult, leading to systematic confusion between closely related senses.
Perhaps more subtly, the algorithm had no mechanism for handling pragmatic or figurative language. Metaphorical uses of words—"He's a real bank of information"—would be disambiguated based on surrounding literal vocabulary, often yielding nonsensical sense assignments. Idioms posed similar challenges: in "break the bank," the phrase means something distinct from either literal sense of "bank," but the algorithm had no way to recognize this. It operated under the assumption that every word usage corresponded to one of its dictionary senses, an assumption that natural language regularly violates.
A deeper issue lurked beneath the surface: the algorithm's dependency on high-quality, comprehensive dictionaries exposed what researchers called the "knowledge acquisition bottleneck." Every improvement to disambiguation performance seemed to require more detailed lexical resources—richer definitions, more example sentences, semantic relations between senses. Creating these resources demanded extensive human effort from trained lexicographers. Even with substantial investment, dictionaries reflected the judgments and coverage decisions of their creators, potentially embedding biases and gaps. This bottleneck would drive much subsequent research toward methods that could learn from data rather than relying on hand-crafted resources, ultimately leading to the statistical and neural approaches that dominate modern language AI.
Extensions and Descendants
The research community quickly recognized both the promise and limitations of Lesk's approach, spawning decades of refinements and extensions. Many of these variants attempted to address the sparse overlap problem by expanding the comparison space. The "Extended Lesk" algorithm, proposed by Banerjee and Pedersen in 2002, incorporated not just the target word's definitions but also the definitions of words related to each sense in a semantic network like WordNet. If one sense of "bank" was linked to "money" through semantic relations, the definition of "money" would be included in the comparison, effectively enriching the definitional footprint of each sense.
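A rough sketch of this extended-gloss idea, using NLTK's WordNet interface, is shown below. It simplifies Banerjee and Pedersen's method, which also rewards longer multi-word overlaps and consults more relation types, and it assumes the NLTK WordNet data has been downloaded.

```python
# Sketch of extended gloss overlap. Assumes nltk is installed and the WordNet
# corpus has been fetched via nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def gloss_words(synset):
    """Bag of lowercase words from a synset's definition."""
    return set(synset.definition().lower().replace(",", " ").replace(";", " ").split())

def extended_gloss(synset):
    """Pool a synset's gloss with the glosses of its hypernyms and hyponyms."""
    pooled = gloss_words(synset)
    for related in synset.hypernyms() + synset.hyponyms():
        pooled |= gloss_words(related)
    return pooled

def extended_lesk(target_word, context_words):
    """Pick the WordNet sense whose extended gloss overlaps the context most."""
    context = {w.lower() for w in context_words}
    best, best_score = None, -1
    for synset in wn.synsets(target_word):
        score = len(context & extended_gloss(synset))
        if score > best_score:
            best, best_score = synset, score
    return best, best_score

sense, score = extended_lesk("bank", ["deposits", "guarantee", "losses", "money", "credit"])
print(sense, score)
```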
Other researchers adapted the algorithm to use corpus statistics. Instead of simple overlap counts, these variants weighted matching words by their inverse document frequency (IDF), emphasizing rare, semantically distinctive words over common ones. The word "institution" matching in both definition and context might receive higher weight than "the" or "of." Some approaches went further, using the entire corpus to build word association scores, then computing not just exact matches but semantic similarity between definition words and context words based on their corpus-derived associations. These hybrid methods began to blur the line between Lesk's knowledge-based approach and emerging statistical methods.
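One way such an IDF-weighted overlap might look is sketched below. The document-frequency table is hypothetical; a real system would estimate these counts from a large corpus.

```python
import math

# Hypothetical document frequencies, as if estimated from a corpus of N documents.
N = 1_000_000
DOC_FREQ = {"the": 990_000, "of": 985_000, "money": 60_000,
            "river": 25_000, "institution": 12_000, "deposits": 8_000}

def idf(word):
    """Inverse document frequency; words absent from the table get the maximum weight."""
    return math.log(N / DOC_FREQ.get(word, 1))

def weighted_overlap(context_words, gloss_words):
    """Sum the IDF weights of shared words instead of counting every match equally."""
    return sum(idf(w) for w in set(context_words) & set(gloss_words))

print(weighted_overlap({"the", "deposits", "money", "guarantee"},
                       {"the", "deposits", "institution", "money"}))
# "deposits" and "money" dominate; the near-ubiquitous "the" adds almost nothing.
```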
Another important extension addressed the context limitation by incorporating selectional preferences and syntactic information. These variants analyzed the syntactic role of the ambiguous word—was it a subject, object, or modifier?—and weighted overlaps from syntactically related words more heavily than distant words. For disambiguating a verb, the definitions of its subject and object would contribute more than random nearby words. This syntactically informed approach improved precision, especially for verb disambiguation, though it required adding a syntactic parser to the processing pipeline.
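A syntactically weighted variant might be sketched as follows using spaCy's dependency parser. The model name and the doubling of weights for words directly attached to the target are assumptions chosen for illustration, not a standard recipe.

```python
# Sketch of syntactically weighted overlap, assuming spaCy and its small English
# model are installed (pip install spacy; python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")

def syntactic_weights(sentence, target):
    """Weight each context word 2.0 if it is the target's head or dependent, else 1.0."""
    doc = nlp(sentence)
    targets = [t for t in doc if t.text.lower() == target]
    if not targets:
        return {}
    t = targets[0]
    linked = {t.head.text.lower()} | {child.text.lower() for child in t.children}
    return {tok.text.lower(): (2.0 if tok.text.lower() in linked else 1.0)
            for tok in doc if tok.text.lower() != target}

def weighted_lesk_score(sentence, target, gloss_words):
    """Overlap score in which syntactically related context words count double."""
    weights = syntactic_weights(sentence, target)
    return sum(w for word, w in weights.items() if word in gloss_words)

print(weighted_lesk_score("The river bank eroded slowly", "bank",
                          {"sloping", "margin", "watercourse", "river", "stream"}))
# With a typical parse, "river" attaches directly to "bank" and so counts double.
```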
Perhaps the most significant descendant of Lesk's approach emerged with the development of embedding-based methods in the 2010s. These modern approaches replace exact word matching with semantic similarity computed in vector space. Rather than checking if "car" equals "vehicle," they compute the cosine similarity between embedding vectors for "car" and "vehicle," capturing semantic relatedness. The core Lesk intuition remains—comparing the semantic content of context against the semantic content of sense definitions—but the comparison operates in continuous vector space rather than discrete word space. Techniques like BERT-based word sense disambiguation essentially perform a learned, contextualized variant of the Lesk algorithm, using transformer representations to measure how well a context fits each candidate sense.
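To see the connection, here is a minimal embedding-based variant. The toy vectors and the averaged-gloss scoring are placeholders for illustration; a real system would load pretrained word vectors such as GloVe or use a contextual encoder like BERT, but the structure is the same: compare a vector for the context against a vector for each candidate sense's gloss.

```python
import numpy as np

# Toy 3-dimensional vectors standing in for pretrained word embeddings;
# a real system would load GloVe/word2vec vectors or a contextual encoder.
TOY_VECTORS = {
    "river":   np.array([0.9, 0.1, 0.0]),
    "water":   np.array([0.8, 0.2, 0.1]),
    "fishing": np.array([0.7, 0.1, 0.2]),
    "money":   np.array([0.0, 0.9, 0.1]),
    "deposit": np.array([0.1, 0.8, 0.2]),
}

def embed(words):
    """Average the vectors of known words; zero vector if none are known."""
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def embedding_lesk(context_words, sense_glosses):
    """Score each sense by cosine similarity between context and gloss embeddings."""
    ctx = embed(context_words)
    return {sense: cosine(ctx, embed(gloss.lower().split()))
            for sense, gloss in sense_glosses.items()}

print(embedding_lesk(
    ["fishing", "water"],
    {"bank_river": "sloping land beside a river",
     "bank_financial": "institution that accepts money on deposit"}))
# The riverbank gloss scores higher even though no word matches the context exactly.
```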
Legacy in Modern Language AI
The Lesk algorithm occupies an interesting place in the history of language AI: it's simultaneously obsolete and foundational. Few modern systems use the original algorithm; its performance has been surpassed by supervised machine learning approaches, particularly deep learning models trained on sense-annotated corpora. Modern word sense disambiguation systems achieve accuracy rates twenty to thirty percentage points higher than classic Lesk variants, leveraging contextual embeddings, attention mechanisms, and training on millions of labeled examples. In purely practical terms, Lesk has been superseded.
Yet its conceptual legacy permeates contemporary language AI. The central insight—that context provides evidence for meaning, and that evidence can be captured computationally through comparison—echoes through the field. Distributional semantics, famously summarized by Firth's dictum "you shall know a word by the company it keeps," formalizes and extends the Lesk intuition. Word embeddings like Word2Vec and GloVe learn representations where semantically similar words cluster together, essentially learning to recognize the kind of contextual overlap that Lesk computed explicitly. These embeddings enable semantic comparison without hand-crafted definitions, but they're doing computationally what Lesk did manually: using distributional evidence to disambiguate meaning.
The algorithm also foreshadowed the shift from rule-based to data-driven NLP. While Lesk himself still relied on a hand-crafted resource (the dictionary), his method avoided writing explicit disambiguation rules. The algorithm extracted patterns from data—the data being dictionary definitions and the text itself. This represented an intermediate step between classical rule-based AI and modern statistical learning. Researchers in the 1990s and 2000s built directly on this foundation, asking: if we can use the statistical overlap between definitions and contexts, why not use statistical patterns learned from large corpora? This question led to supervised learning approaches that trained classifiers on sense-annotated examples, learning disambiguation patterns directly from data rather than inferring them from definitions.
Modern transformer models like BERT and GPT can be understood as learning contextual representations that implicitly solve word sense disambiguation. When BERT generates a context-dependent embedding for "bank" in "I sat by the bank fishing," that embedding differs from the embedding of "bank" in "The bank approved my loan," capturing the sense distinction. The model hasn't explicitly chosen between dictionary senses; instead, it's learned representations where different contexts of the same word type push its embedding toward different regions of the semantic space—regions that align with human sense distinctions. This is Lesk's principle scaled to massive data and parameterized through neural networks: context shapes meaning, and that shaping can be captured computationally.
Interestingly, the Lesk algorithm remains relevant in contemporary research not as a competitive system but as a baseline and diagnostic tool. When researchers develop new disambiguation methods, they often compare against Lesk variants to demonstrate improvement. More importantly, analyzing where Lesk succeeds and fails provides insights into the nature of disambiguation tasks. Cases where even advanced neural models struggle often align with cases where Lesk struggles—short contexts, rare senses, figurative language—suggesting these represent genuinely hard aspects of the problem rather than artifacts of any particular approach. In NLP education, Lesk continues to be taught as a first introduction to disambiguation, illustrating core concepts before students tackle more complex methods.
The Broader Lessons
Looking back from the vantage point of modern language AI, the Lesk algorithm offers several lessons that extend beyond word sense disambiguation. First, it demonstrated that even simple algorithms, applied cleverly, could make progress on problems that seemed to require deep linguistic knowledge or sophisticated reasoning. The temptation in early AI was often to build complex systems with intricate rule structures and knowledge representations. Lesk showed that sometimes a simple metric—count the overlapping words—could yield useful results. This lesson would be repeatedly relearned: in part-of-speech tagging with hidden Markov models, in parsing with probabilistic context-free grammars, in translation with phrase-based statistical models.
Second, the algorithm highlighted the fundamental importance of context in language understanding. While this seems obvious in retrospect, many early NLP systems treated words as atomic symbols, ignoring or minimizing context. Lesk provided a concrete computational mechanism for incorporating context, making explicit how surrounding words shaped meaning. This focus on context would become central to virtually every subsequent advance in language AI, from n-gram language models to recurrent neural networks to the attention mechanisms in transformers.
Third, Lesk's work illustrated the value of leveraging existing linguistic resources in novel ways. Rather than building a disambiguation system from scratch, he recognized that dictionaries—created for an entirely different purpose—could be repurposed for computation. This resourcefulness characterized much early NLP research, as researchers mined treebanks, corpora, thesauri, and other linguistic data sources for computational applications their creators never envisioned. The approach anticipated the data-driven mindset that now dominates the field: find the data that captures the patterns you need, then extract those patterns computationally.
Finally, the algorithm's limitations proved as instructive as its successes. The sparse overlap problem, the difficulty with short contexts, the struggles with figurative language—these failures delineated the boundaries of what surface-form overlap could achieve. They pointed researchers toward richer semantic representations, toward statistical learning, toward neural methods that could capture semantic similarity rather than just lexical identity. Understanding why Lesk failed helped the field understand what word sense disambiguation really required: not just matching words, but understanding meaning. That deeper challenge continues to drive research today.
Conclusion: Elegance and Influence
Michael Lesk's 1983 algorithm for word sense disambiguation exemplifies a particular kind of scientific contribution: elegant, intuitive, practical, and ultimately inspirational. It wasn't the final answer to disambiguation—no single approach has achieved that—but it provided a clear, implementable starting point that crystallized the problem and suggested paths forward. The algorithm's simplicity made it accessible, ensuring wide adoption and experimentation. Its limitations were instructive, motivating subsequent research. Its core insight—that contextual overlap provides evidence for sense selection—remains valid even as the methods for computing that overlap have evolved dramatically.
In the four decades since Lesk proposed his algorithm, word sense disambiguation has been transformed by the same forces that revolutionized all of language AI: large-scale datasets, statistical machine learning, neural networks, transfer learning, massive pre-training. Modern systems bear little surface resemblance to the original algorithm with its dictionary lookups and word counts. Yet the through-line persists: context determines meaning, and computational systems can leverage contextual evidence to disambiguate language. Lesk showed this was possible with dictionaries and overlap counts. Today's systems show it's possible at scale with billions of parameters and trillions of tokens. The tools have changed; the insight endures.
For researchers in 1983, accustomed to painstakingly crafted linguistic rules and hand-built knowledge bases, Lesk's approach offered something refreshing: a demonstration that simple, data-proximate methods could capture semantic phenomena. This pragmatic, algorithmic sensibility—identifying what can be measured, finding simple ways to measure it, and building systems that work rather than systems that perfectly mirror human cognition—helped shape the evolution of language AI. The field has grown vastly more sophisticated, but that sensibility remains valuable. Sometimes the most elegant solution is also the most direct: count what matches, and see what you can learn.