TF-IDF and Bag of Words: Complete Guide to Text Representation & Information Retrieval

Michael Brenndoerfer · August 30, 2025 · 16 min read

Learn TF-IDF and Bag of Words, including term frequency, inverse document frequency, vectorization, and text classification. Master classical NLP text representation methods with Python implementation.


TF-IDF and Bag of Words: Classical text representation methods and their applications

Before neural networks and transformers, how did we convert text into numbers that machines could process? The answer lies in two foundational techniques that remain relevant today: Bag of Words and TF-IDF (Term Frequency-Inverse Document Frequency).

These methods solve a fundamental problem in NLP: text is inherently discrete and symbolic, but most machine learning algorithms require numerical input. Bag of Words provides a simple way to count word occurrences, while TF-IDF adds sophistication by weighting words based on their importance across a document collection.

In this chapter, we'll explore how these classical techniques work, implement them from scratch, and understand both their power and limitations. You'll see how they enable everything from search engines to document classification, and why they're still used in production systems today.

Introduction

Imagine you have a collection of documents, maybe product reviews, news articles, or research papers. You want to find which documents are similar, classify them by topic, or build a search system. The challenge: computers can't directly understand words.

Bag of Words solves this by treating each document as an unordered collection of word counts. It's called a "bag" because word order doesn't matter. Only how many times each word appears matters. This simple idea enables us to represent any document as a fixed-length vector of numbers.

TF-IDF builds on this foundation. It recognizes that not all words are equally informative. The word "the" appears in almost every document, so it's not useful for distinguishing between documents. But a rare word like "quantum" might be highly informative when it appears. TF-IDF automatically downweights common words and emphasizes distinctive ones.

Together, these techniques form the backbone of classical information retrieval and text classification systems. They're fast, interpretable, and surprisingly effective for many tasks.

Bag of Words

A text representation method that converts documents into fixed-length vectors by counting word occurrences. Each dimension in the vector corresponds to a word in the vocabulary, and the value represents how many times that word appears in the document.

TF-IDF

Term Frequency-Inverse Document Frequency. A weighting scheme that multiplies term frequency (how often a word appears in a document) by inverse document frequency (how rare the word is across the collection). This emphasizes words that are frequent in a specific document but rare overall.

Technical Deep Dive

Bag of Words: The Foundation

Let's start with the simplest approach. Given a vocabulary $V$ with $|V|$ unique words, we can represent any document $d$ as a vector $\mathbf{v}_d$ of length $|V|$, where each element $v_{d,i}$ counts how many times word $i$ appears in document $d$.

The process involves three steps:

  1. Tokenization: Split documents into individual words (tokens)
  2. Vocabulary building: Collect all unique words across all documents
  3. Vectorization: For each document, count occurrences of each vocabulary word

For example, if our vocabulary is $\{\text{cat}, \text{dog}, \text{runs}\}$ and a document contains "cat runs", the vector would be $[1, 0, 1]$: one occurrence of "cat", zero of "dog", and one of "runs".

This representation has several properties:

  • Fixed dimensionality: All documents map to vectors of the same length
  • Sparsity: Most documents use only a small fraction of the vocabulary, so most vector elements are zero
  • Order independence: "cat runs" and "runs cat" produce identical vectors

The sparsity is important. In practice, vocabularies can contain tens of thousands of words, but individual documents might use only hundreds. This makes sparse matrix representations efficient for storage and computation.
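
To make the sparsity concrete, here is a small standalone sketch (separate from the step-by-step implementation later in this chapter) that stores a toy document as a dictionary of counts using Python's collections.Counter, so only the words that actually occur take up space:

Code
from collections import Counter

## Store only the words that occur; every other vocabulary word is implicitly zero
document = "cat runs and the cat sleeps"
counts = Counter(document.lower().split())

print(counts)       # Counter({'cat': 2, 'runs': 1, 'and': 1, 'the': 1, 'sleeps': 1})
print(len(counts))  # 5 non-zero entries, no matter how large the vocabulary is

With a vocabulary of tens of thousands of words, the equivalent dense vector would be almost entirely zeros, which is why libraries store these representations as sparse matrices.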

Term Frequency (TF)

Term frequency measures how often a word appears in a document. The simplest version is the raw count:

$$\text{TF}(t, d) = \text{count of term } t \text{ in document } d$$

However, longer documents naturally contain more words. To make frequencies comparable across documents of different lengths, we often normalize by document length:

$$\text{TF}(t, d) = \frac{\text{count of term } t \text{ in document } d}{\text{total words in document } d}$$

This gives us a proportion: what fraction of the document consists of this word? Normalized TF values range from 0 to 1, with 1 meaning the entire document consists of that single word.

Another common normalization uses logarithmic scaling to dampen the effect of very frequent words:

$$\text{TF}(t, d) = 1 + \log(\text{count of term } t \text{ in document } d)$$

This formula ensures that doubling the word count doesn't double the TF score, which helps prevent extremely common words from dominating the representation.
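
To see that dampening in numbers, here is a quick standalone sketch comparing raw counts with log-scaled TF:

Code
import math

## Raw counts vs. log-scaled TF as a term becomes more frequent
for count in [1, 2, 10, 100]:
    log_tf = 1 + math.log(count)
    print(f"count={count:4d}  log TF={log_tf:.2f}")

A count of 1 maps to 1.00, a count of 10 to about 3.30, and a count of 100 to about 5.61: a hundredfold increase in raw frequency translates into only a few extra units of weight.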

Inverse Document Frequency (IDF)

While term frequency tells us how important a word is within a document, inverse document frequency measures how distinctive it is across the entire collection. The key insight: words that appear in many documents are less informative than words that appear in few.

The inverse document frequency is calculated as:

$$\text{IDF}(t, D) = \log \frac{|D|}{|\{d \in D : t \in d\}|}$$

where $|D|$ is the total number of documents and $|\{d \in D : t \in d\}|$ is the number of documents containing term $t$.

Let's break this down:

  • If a word appears in all documents, the denominator equals $|D|$, so $\text{IDF} = \log(1) = 0$
  • If a word appears in only one document, the denominator is 1, so $\text{IDF} = \log(|D|)$
  • Words appearing in fewer documents get higher IDF scores

The logarithm serves two purposes: it compresses the scale (so IDF doesn't grow linearly with collection size), and it makes the metric more interpretable. Without the log, a word appearing in 1 out of 1,000 documents would get a raw ratio of 1000, while one appearing in 500 of those documents would get 2, a 500x difference. The logarithm smooths this out to roughly 6.9 versus 0.7.

Some implementations add 1 to avoid division by zero and to ensure all terms get at least some weight:

$$\text{IDF}(t, D) = \log \frac{|D| + 1}{|\{d \in D : t \in d\}| + 1}$$
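
The sketch below, using a hypothetical collection of 1,000 documents, compares the standard and smoothed formulas; note in particular the df = 0 case, where the standard formula would divide by zero:

Code
import math

## Standard vs. smoothed IDF for a hypothetical collection of 1,000 documents
n_docs = 1000
for df in [0, 1, 100, 1000]:
    standard = math.log(n_docs / df) if df > 0 else float('inf')
    smoothed = math.log((n_docs + 1) / (df + 1))
    print(f"df={df:5d}  standard={standard:>7.3f}  smoothed={smoothed:.3f}")

For a term that never appears in the collection (df = 0, which can happen when scoring query terms), the standard formula is undefined, while the smoothed version still returns a finite value.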

TF-IDF: Combining Both Components

TF-IDF multiplies term frequency by inverse document frequency:

$$\text{TF-IDF}(t, d, D) = \text{TF}(t, d) \times \text{IDF}(t, D)$$

This creates a scoring system where:

  • High TF-IDF: Words that are frequent in a specific document but rare across the collection
  • Low TF-IDF: Words that are either rare in the document or common across many documents

The multiplication is crucial. A word with high TF but low IDF (common everywhere) gets downweighted. A word with low TF but high IDF (rare but distinctive) gets some weight. Only words with both high TF and high IDF get the highest scores.

This weighting scheme automatically identifies the most distinctive words in each document, exactly what we want for tasks like search, classification, and topic modeling.

Worked Example

Let's work through a concrete example with three short documents:

  • Document 1: "the cat sat on the mat"
  • Document 2: "the dog sat on the log"
  • Document 3: "the cat and dog sat"

First, we build our vocabulary by collecting all unique words: $\{\text{the}, \text{cat}, \text{sat}, \text{on}, \text{mat}, \text{dog}, \text{log}, \text{and}\}$. That's 8 words, so our vectors will have 8 dimensions.

Bag of Words vectors:

  • Document 1: $[2, 1, 1, 1, 1, 0, 0, 0]$ (2×"the", 1×"cat", 1×"sat", 1×"on", 1×"mat")
  • Document 2: $[2, 0, 1, 1, 0, 1, 1, 0]$ (2×"the", 1×"sat", 1×"on", 1×"dog", 1×"log")
  • Document 3: $[1, 1, 1, 0, 0, 1, 0, 1]$ (1×"the", 1×"cat", 1×"sat", 1×"dog", 1×"and")

Notice that "the" appears in all three documents, "sat" appears in all three, but "mat", "log", and "and" appear in only one document each.

Calculating IDF:

  • "the": appears in 3/3 documents → IDF=log(3/3)=0\text{IDF} = \log(3/3) = 0
  • "sat": appears in 3/3 documents → IDF=log(3/3)=0\text{IDF} = \log(3/3) = 0
  • "cat": appears in 2/3 documents → IDF=log(3/2)0.405\text{IDF} = \log(3/2) \approx 0.405
  • "dog": appears in 2/3 documents → IDF=log(3/2)0.405\text{IDF} = \log(3/2) \approx 0.405
  • "mat": appears in 1/3 documents → IDF=log(3/1)1.099\text{IDF} = \log(3/1) \approx 1.099
  • "log": appears in 1/3 documents → IDF=log(3/1)1.099\text{IDF} = \log(3/1) \approx 1.099
  • "and": appears in 1/3 documents → IDF=log(3/1)1.099\text{IDF} = \log(3/1) \approx 1.099

Calculating TF-IDF for Document 1:

Using raw counts for TF:

  • "the": TF=2\text{TF} = 2, IDF=0\text{IDF} = 0TF-IDF=2×0=0\text{TF-IDF} = 2 \times 0 = 0
  • "cat": TF=1\text{TF} = 1, IDF=0.405\text{IDF} = 0.405TF-IDF=1×0.405=0.405\text{TF-IDF} = 1 \times 0.405 = 0.405
  • "mat": TF=1\text{TF} = 1, IDF=1.099\text{IDF} = 1.099TF-IDF=1×1.099=1.099\text{TF-IDF} = 1 \times 1.099 = 1.099

The word "mat" gets the highest TF-IDF score in Document 1 because it's unique to that document. "The" gets zero weight because it appears everywhere. This is exactly the behavior we want: distinctive words are emphasized, common words are suppressed.

Code Implementation

Let's implement Bag of Words and TF-IDF from scratch. We'll build this step by step, focusing on understanding each component.

Step 1: Tokenization and Vocabulary Building

First, we need to split documents into words and build our vocabulary:

In[2]:
Code
import re

def tokenize(text):
    """Split text into lowercase words, removing punctuation."""
    # Lowercase the text, then extract runs of word characters (letters, digits, underscores)
    tokens = re.findall(r'\b\w+\b', text.lower())
    return tokens

## Example documents
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat and dog sat"
]

## Tokenize all documents
tokenized_docs = [tokenize(doc) for doc in documents]
print("Tokenized documents:")
for i, tokens in enumerate(tokenized_docs, 1):
    print(f"Doc {i}: {tokens}")
Out[2]:
Console
Tokenized documents:
Doc 1: ['the', 'cat', 'sat', 'on', 'the', 'mat']
Doc 2: ['the', 'dog', 'sat', 'on', 'the', 'log']
Doc 3: ['the', 'cat', 'and', 'dog', 'sat']

Now we build the vocabulary by collecting all unique words:

In[3]:
Code
def build_vocabulary(tokenized_documents):
    """Create a sorted vocabulary from all tokenized documents."""
    # Collect all unique words
    all_words = set()
    for tokens in tokenized_documents:
        all_words.update(tokens)
    
    # Sort for consistent ordering
    vocabulary = sorted(all_words)
    
    # Create word-to-index mapping
    word_to_idx = {word: idx for idx, word in enumerate(vocabulary)}
    
    return vocabulary, word_to_idx

vocabulary, word_to_idx = build_vocabulary(tokenized_docs)
print(f"Vocabulary ({len(vocabulary)} words): {vocabulary}")
print(f"\nWord to index mapping:")
for word, idx in word_to_idx.items():
    print(f"  {word}: {idx}")
Out[3]:
Console
Vocabulary (8 words): ['and', 'cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']

Word to index mapping:
  and: 0
  cat: 1
  dog: 2
  log: 3
  mat: 4
  on: 5
  sat: 6
  the: 7

Our vocabulary contains 8 unique words. The mapping assigns each word a unique integer index, which we'll use to create our vectors.

Step 2: Bag of Words Vectorization

Now we'll convert each document into a count vector:

In[4]:
Code
def bag_of_words_vectorize(tokenized_doc, word_to_idx):
    """Convert a tokenized document to a Bag of Words vector."""
    # Initialize vector with zeros
    vector = [0] * len(word_to_idx)
    
    # Count word occurrences
    for word in tokenized_doc:
        if word in word_to_idx:
            idx = word_to_idx[word]
            vector[idx] += 1
    
    return vector

## Vectorize all documents
bow_vectors = [bag_of_words_vectorize(tokens, word_to_idx) 
               for tokens in tokenized_docs]

print("Bag of Words vectors:")
for i, vector in enumerate(bow_vectors, 1):
    print(f"Doc {i}: {vector}")
Out[4]:
Console
Bag of Words vectors:
Doc 1: [0, 1, 0, 0, 1, 1, 1, 2]
Doc 2: [0, 0, 1, 1, 0, 1, 1, 2]
Doc 3: [1, 1, 1, 0, 0, 0, 1, 1]

Each vector shows word counts. Document 1 has 2 occurrences of "the" (index 7), 1 of "cat" (index 1), and so on. Notice how sparse these vectors are: most entries are zero.

Step 3: Calculating Term Frequency

Let's implement normalized term frequency:

In[5]:
Code
import math

def term_frequency(word, tokenized_doc, use_log=False):
    """Calculate term frequency for a word in a document."""
    count = tokenized_doc.count(word)
    
    if count == 0:
        return 0.0
    
    if use_log:
        # Logarithmic scaling
        return 1 + math.log(count)
    else:
        # Normalized by document length
        return count / len(tokenized_doc)

## Calculate TF for "cat" in each document
print("Term Frequency for 'cat':")
for i, tokens in enumerate(tokenized_docs, 1):
    tf_normalized = term_frequency("cat", tokens, use_log=False)
    tf_log = term_frequency("cat", tokens, use_log=True)
    print(f"Doc {i}: normalized={tf_normalized:.3f}, log={tf_log:.3f}")
Out[5]:
Console
Term Frequency for 'cat':
Doc 1: normalized=0.167, log=1.000
Doc 2: normalized=0.000, log=0.000
Doc 3: normalized=0.200, log=1.000

Normalized TF gives us proportions: "cat" makes up 16.7% of Document 1 and 20% of Document 3. Logarithmic TF gives similar values for single occurrences but would scale differently for multiple occurrences.
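
The difference becomes clearer when a word repeats. Here is a quick sketch that reuses the term_frequency function above on a made-up document in which "cat" appears four times:

Code
## A hypothetical document in which "cat" appears four times out of eight tokens
repeated_doc = ["cat", "cat", "cat", "cat", "sat", "on", "the", "mat"]

print(f"normalized TF: {term_frequency('cat', repeated_doc, use_log=False):.3f}")  # 4/8 = 0.500
print(f"log TF:        {term_frequency('cat', repeated_doc, use_log=True):.3f}")   # 1 + log(4) ≈ 2.39

Normalized TF grows linearly with the count (up to the document length), while log TF grows much more slowly, which keeps heavily repeated words from overwhelming the representation.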

Step 4: Calculating Inverse Document Frequency

Now we'll compute IDF for each word:

In[6]:
Code
def inverse_document_frequency(word, tokenized_documents):
    """Calculate IDF for a word across a document collection."""
    # Count documents containing the word
    doc_count = sum(1 for tokens in tokenized_documents if word in tokens)
    
    if doc_count == 0:
        return 0.0  # Word doesn't appear anywhere
    
    total_docs = len(tokenized_documents)
    # Standard IDF formula
    idf = math.log(total_docs / doc_count)
    return idf

## Calculate IDF for each word in vocabulary
print("Inverse Document Frequency (IDF):")
idf_scores = {}
for word in vocabulary:
    idf = inverse_document_frequency(word, tokenized_docs)
    idf_scores[word] = idf
    print(f"  {word:6s}: {idf:.3f} (appears in {sum(1 for tokens in tokenized_docs if word in tokens)}/{len(tokenized_docs)} docs)")
Out[6]:
Console
Inverse Document Frequency (IDF):
  and   : 1.099 (appears in 1/3 docs)
  cat   : 0.405 (appears in 2/3 docs)
  dog   : 0.405 (appears in 2/3 docs)
  log   : 1.099 (appears in 1/3 docs)
  mat   : 1.099 (appears in 1/3 docs)
  on    : 0.405 (appears in 2/3 docs)
  sat   : 0.000 (appears in 3/3 docs)
  the   : 0.000 (appears in 3/3 docs)

Words that appear in all documents ("the", "sat") get IDF = 0, meaning they provide no discriminative power. Words unique to one document ("and", "log", "mat") get the highest IDF scores.
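
IDF values always depend on the collection they are computed over. As a quick illustration, the sketch below adds a hypothetical fourth document and recomputes a few scores with the same function:

Code
## Add a hypothetical fourth document and recompute IDF over the larger collection
extended_docs = tokenized_docs + [tokenize("the bird sat on the fence")]

for word in ["the", "cat", "mat"]:
    before = inverse_document_frequency(word, tokenized_docs)
    after = inverse_document_frequency(word, extended_docs)
    print(f"{word:4s}: IDF with 3 docs = {before:.3f}, with 4 docs = {after:.3f}")

"the" stays at zero because it still appears in every document, while words like "cat" and "mat" gain weight: they now distinguish an even smaller fraction of the collection.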

Step 5: Computing TF-IDF Vectors

Finally, we combine TF and IDF to create TF-IDF vectors:

In[7]:
Code
def tfidf_vectorize(tokenized_doc, word_to_idx, tokenized_documents, 
                    use_log_tf=False):
    """Convert a tokenized document to a TF-IDF vector."""
    vector = [0.0] * len(word_to_idx)
    
    for word, idx in word_to_idx.items():
        # Calculate TF
        tf = term_frequency(word, tokenized_doc, use_log=use_log_tf)
        
        # Calculate IDF
        idf = inverse_document_frequency(word, tokenized_documents)
        
        # TF-IDF is the product
        vector[idx] = tf * idf
    
    return vector

## Compute TF-IDF vectors for all documents
tfidf_vectors = [tfidf_vectorize(tokens, word_to_idx, tokenized_docs, 
                                  use_log_tf=False) 
                 for tokens in tokenized_docs]

print("TF-IDF vectors (using normalized TF):")
for i, vector in enumerate(tfidf_vectors, 1):
    print(f"\nDoc {i}:")
    # Show non-zero values for clarity
    non_zero = [(vocabulary[j], f"{vector[j]:.3f}") 
                for j, val in enumerate(vector) if val > 0]
    for word, score in sorted(non_zero, key=lambda x: float(x[1]), reverse=True):
        print(f"  {word:6s}: {score}")
Out[7]:
Console
TF-IDF vectors (using normalized TF):

Doc 1:
  mat   : 0.183
  cat   : 0.068
  on    : 0.068

Doc 2:
  log   : 0.183
  dog   : 0.068
  on    : 0.068

Doc 3:
  and   : 0.220
  cat   : 0.081
  dog   : 0.081

Perfect! The TF-IDF scores emphasize distinctive words. "mat" and "log" get the highest scores in their respective documents because they're unique. Common words like "the" and "sat" get zero weight. This is exactly what we want for distinguishing between documents.

Step 6: Document Similarity

One powerful application of TF-IDF vectors is measuring document similarity using cosine similarity:

In[8]:
Code
def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors."""
    import math
    
    # Dot product
    dot_product = sum(a * b for a, b in zip(vec1, vec2))
    
    # Magnitudes
    magnitude1 = math.sqrt(sum(a * a for a in vec1))
    magnitude2 = math.sqrt(sum(b * b for b in vec2))
    
    if magnitude1 == 0 or magnitude2 == 0:
        return 0.0
    
    return dot_product / (magnitude1 * magnitude2)

## Compare all pairs of documents
print("Cosine similarity between documents (using TF-IDF):")
for i in range(len(tfidf_vectors)):
    for j in range(i + 1, len(tfidf_vectors)):
        similarity = cosine_similarity(tfidf_vectors[i], tfidf_vectors[j])
        print(f"Doc {i+1} vs Doc {j+1}: {similarity:.3f}")
Out[8]:
Console
Cosine similarity between documents (using TF-IDF):
Doc 1 vs Doc 2: 0.107
Doc 1 vs Doc 3: 0.107
Doc 2 vs Doc 3: 0.107

Each document contains one word unique to it ("mat", "log", or "and") plus two words shared with exactly one other document, so every pair of documents overlaps on exactly one distinctive word: "on" for Documents 1 and 2, "cat" for Documents 1 and 3, and "dog" for Documents 2 and 3. That symmetric structure gives every pair the same similarity, 0.107. The words "the" and "sat" are shared by all pairs but contribute nothing because their TF-IDF weight is zero.
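
For contrast, the same pairwise comparison can be run on the raw Bag of Words vectors from Step 2:

Code
## Compare the same document pairs using raw counts instead of TF-IDF
print("Cosine similarity between documents (using raw counts):")
for i in range(len(bow_vectors)):
    for j in range(i + 1, len(bow_vectors)):
        similarity = cosine_similarity(bow_vectors[i], bow_vectors[j])
        print(f"Doc {i+1} vs Doc {j+1}: {similarity:.3f}")

With raw counts, the shared words "the" and "sat" dominate the dot products, so every pair looks much more similar (between roughly 0.6 and 0.75 for these documents). TF-IDF weighting strips that shared-word inflation away and bases similarity on the distinctive terms.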

Using scikit-learn for Production

While implementing from scratch teaches the concepts, in practice you'll use libraries like scikit-learn:

In[9]:
Code
from sklearn.feature_extraction.text import TfidfVectorizer

## Initialize vectorizer
vectorizer = TfidfVectorizer(
    lowercase=True,
    token_pattern=r'\b\w+\b',  # Word boundaries
    max_features=1000,  # Limit vocabulary size
    min_df=2,  # Ignore words appearing in < 2 documents
    max_df=0.95  # Ignore words appearing in > 95% of documents
)

## Fit and transform
tfidf_matrix = vectorizer.fit_transform(documents)

print(f"TF-IDF matrix shape: {tfidf_matrix.shape}")
print(f"Vocabulary size: {len(vectorizer.vocabulary_)}")
print(f"\nSample vocabulary words: {list(vectorizer.vocabulary_.keys())[:10]}")
Out[9]:
Console
TF-IDF matrix shape: (3, 3)
Vocabulary size: 3

Sample vocabulary words: ['cat', 'on', 'dog']

scikit-learn's TfidfVectorizer handles all the details: tokenization, vocabulary building, TF-IDF calculation, and sparse matrix storage. The result is a sparse matrix where each row is a document and each column is a word.

In[10]:
Code
import numpy as np

## Convert to dense array for display (not recommended for large datasets)
dense_matrix = tfidf_matrix.toarray()

print("TF-IDF matrix (documents × words):")
print(f"\nWords: {vectorizer.get_feature_names_out()}")
print(f"\nMatrix:\n{dense_matrix}")
Out[10]:
Console
TF-IDF matrix (documents × words):

Words: ['cat' 'dog' 'on']

Matrix:
[[0.70710678 0.         0.70710678]
 [0.         0.70710678 0.70710678]
 [0.70710678 0.70710678 0.        ]]

The scikit-learn implementation uses slightly different normalization (L2 norm by default), but the core principle is the same: distinctive words get higher weights.
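
A natural next step is to score a new query against the fitted matrix, which is the core of a TF-IDF search system. Here is a minimal sketch that reuses the vectorizer and tfidf_matrix from above together with scikit-learn's cosine_similarity helper; the query string is just an example:

Code
from sklearn.metrics.pairwise import cosine_similarity

## Transform a query with the already-fitted vectorizer (reuses the learned vocabulary and IDF)
query = "cat on a mat"
query_vector = vectorizer.transform([query])

## Score the query against every document; higher means more relevant
scores = cosine_similarity(query_vector, tfidf_matrix).flatten()
for doc_id, score in enumerate(scores, 1):
    print(f"Doc {doc_id}: {score:.3f}")

Note that query words outside the fitted vocabulary (here, "mat" was dropped by min_df=2) are simply ignored, which is the fixed-vocabulary limitation discussed in the next section.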

Limitations & Impact

Limitations

Bag of Words and TF-IDF have several well-known limitations:

  • Loss of word order: "The cat chased the dog" and "The dog chased the cat" produce identical vectors. This discards syntactic and semantic information that word order conveys.

  • No semantic understanding: These methods treat words as independent symbols. They can't understand that "car" and "automobile" are synonyms, or that "bank" can mean a financial institution or a river edge.

  • Vocabulary explosion: As document collections grow, vocabularies can become extremely large (hundreds of thousands of words), leading to high-dimensional, sparse vectors that are computationally expensive.

  • Context insensitivity: The same word always gets the same representation, regardless of context. "Apple" in "Apple stock price" and "apple pie recipe" are treated identically.

  • Fixed vocabulary: New words not seen during vocabulary building are ignored. This makes the system brittle when encountering domain-specific terminology or evolving language.

Despite these limitations, Bag of Words and TF-IDF remain valuable tools. They're fast, interpretable, and work well as baselines or feature extractors for downstream models.

Impact and Applications

These classical techniques have had enormous impact and continue to be used in production systems:

  • Search engines: Early web search (including early Google) relied heavily on TF-IDF-style term weighting for ranking. Google combined PageRank's link analysis with this kind of content-based relevance scoring.

  • Document classification: Email spam filters, news categorization, and sentiment analysis systems often use TF-IDF features with classifiers like Naive Bayes or Support Vector Machines (see the sketch after this list).

  • Information retrieval: Library systems, legal document search, and academic paper search engines use TF-IDF to match queries to relevant documents.

  • Feature engineering: Even in the era of neural networks, TF-IDF vectors are often concatenated with learned embeddings as input features, combining classical and modern approaches.

  • Baseline comparisons: New NLP methods are typically compared against TF-IDF baselines to demonstrate improvement.

  • Interpretability: Unlike black-box neural models, TF-IDF scores are directly interpretable. You can see exactly which words contribute to a document's representation and why.
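
As a small illustration of the classification use case mentioned above, here is a hypothetical sketch of a TF-IDF plus Naive Bayes pipeline in scikit-learn; the tiny labeled dataset is invented purely for demonstration:

Code
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

## Toy labeled examples, invented for illustration only
texts = [
    "win a free prize now",                # spam
    "limited time offer click here",       # spam
    "meeting rescheduled to friday",       # ham
    "please review the attached report",   # ham
]
labels = ["spam", "spam", "ham", "ham"]

## TF-IDF features feeding a multinomial Naive Bayes classifier
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["claim your free offer now", "see the attached report before the meeting"]))

In a real system the same pipeline would be trained on thousands of labeled documents, but the structure, TF-IDF features feeding a simple classifier, stays the same.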

The simplicity and effectiveness of these methods make them excellent starting points for text analysis. They teach us fundamental concepts about text representation that carry forward to more advanced techniques.

Summary

Bag of Words and TF-IDF provide foundational methods for converting text into numerical representations that machine learning algorithms can process.

Key takeaways:

  • Bag of Words represents documents as fixed-length vectors of word counts, discarding word order but enabling mathematical operations on text
  • Term Frequency (TF) measures how often a word appears in a document, often normalized by document length
  • Inverse Document Frequency (IDF) measures how distinctive a word is across a collection, downweighting common words
  • TF-IDF combines both components, emphasizing words that are frequent in specific documents but rare overall
  • These methods are fast, interpretable, and effective for many tasks, but lose word order and semantic relationships

When to use:

  • Building search systems or information retrieval applications
  • Creating baseline models for text classification
  • Feature engineering for downstream machine learning models
  • Situations where interpretability matters more than state-of-the-art performance

What's next:

While Bag of Words and TF-IDF solve the fundamental problem of text representation, they're just the beginning. In the next chapters, we'll explore word embeddings that capture semantic relationships, sequence models that preserve word order, and transformer architectures that understand context. Each builds on these classical foundations while addressing their limitations.

Quiz

Ready to test your understanding? Take this quick quiz to reinforce what you've learned about TF-IDF and Bag of Words.

