Learn how to call language models from Python code, including GPT-5, Claude Sonnet 4.5, and Gemini 2.5. Master API integration, error handling, and building reusable functions for AI agents.

This article is part of the free-to-read AI Agent Handbook
Using a Language Model in Code
In the previous section, we explored how language models work conceptually. They're prediction engines trained on massive amounts of text, learning patterns that let them generate coherent, contextually appropriate responses. Now it's time to move from theory to practice: actually calling a language model from Python code.
This is where your personal assistant starts to come alive. Instead of just understanding what a language model does, you'll see how to use one to build something real.
The Basic Pattern
Every interaction with a language model follows the same fundamental pattern, regardless of which provider you use:
- Set up a connection to the model (usually via an API)
- Send a prompt (your instruction or question)
- Receive a response (the model's generated text)
- Use the response (display it, process it, or feed it into another step)
That's it. The complexity comes in how you craft your prompts and what you do with the responses, but the basic mechanics are straightforward.
Let's see this in action.
Your First Model Call
We'll start with the simplest possible example: asking a language model a question and printing the answer.
Example (GPT-5)
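Here's a minimal sketch of such a call using the openai Python package, assuming OPENAI_API_KEY is set in your environment (the question text is just an illustration):

```python
from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment by default.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "user", "content": "What is the largest planet in our solar system?"}
    ],
)

# The generated text lives inside the response object.
print(response.choices[0].message.content)
```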
Let's unpack what's happening here:
API Key: The OPENAI_API_KEY is your authentication credential. You get this from OpenAI's website after creating an account. Store it as an environment variable (never hard-code it in your source code, as that's a security risk).
Client initialization: The OpenAI() client is your connection to the model. You create it once and reuse it for multiple requests.
Model selection: "gpt-5" specifies which language model to use. Different models have different capabilities, speeds, and costs. GPT-5 is OpenAI's latest model as of 2025, offering improved reliability and standardized responses.
Messages format: The messages parameter is a list of conversation turns. Each message has a role (who's speaking) and content (what they said). Right now we're only sending one user message, but we'll expand this shortly.
Response structure: The model returns a complex object, but what we care about is response.choices[0].message.content, which contains the actual text the model generated.
Adding Context with System Messages
The example above works, but it's missing something important: we haven't told the model who it is or how it should behave. That's where system messages come in.
Example (GPT-5)
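Here's a sketch of the same call with a system message added (the exact system-prompt wording is just one reasonable choice):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        # The system message sets the assistant's role before any user input arrives.
        {"role": "system", "content": "You are a friendly personal assistant. Keep answers brief and helpful."},
        {"role": "user", "content": "What is 2 + 2?"},
    ],
)

print(response.choices[0].message.content)
```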
The system message sets the tone and behavior. It's like giving the model a job description before asking it to work. Without it, the model will still respond, but you have less control over how it responds.
Think of it this way: if you walked up to someone and said "What is 2 + 2?", they might give you a straightforward answer, or they might be confused about why you're asking such a simple question, or they might launch into a lecture about arithmetic. The system message is like introducing yourself first: "Hi, I'm your personal assistant, and I'm here to help you with quick, friendly answers."
For intermediate readers: System messages are powerful but not foolproof. The model can still deviate from them, especially if user messages strongly contradict the system instructions. In practice, you'll often need to reinforce important behaviors through examples (few-shot prompting) or by structuring your prompts carefully. System messages work best for setting general tone and role, less well for enforcing strict rules.
Building a Reusable Function
Hard-coding each API call gets tedious quickly. Let's wrap this in a function that we can reuse throughout our assistant:
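Here's one way to write it; the function name ask_assistant and the default system prompt are our own choices, not anything required by the API:

```python
from openai import OpenAI

client = OpenAI()

def ask_assistant(question: str, system_prompt: str = "You are a helpful personal assistant.") -> str:
    """Send a single question to the model and return its text reply."""
    response = client.chat.completions.create(
        model="gpt-5",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Example usage:
print(ask_assistant("What's a good name for a pet goldfish?"))
```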
This function is the foundation of our assistant. Every time we want to ask the model something, we'll call this function (or an enhanced version of it).
The Same Pattern, Different Providers
The core pattern (send messages, get response) is universal, but each provider has slightly different syntax. Let's see the same basic interaction using Claude and Gemini.
Example (Claude Sonnet 4.5)
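Here's a sketch using the anthropic Python package, assuming ANTHROPIC_API_KEY is set in your environment (the model identifier string is illustrative; check Anthropic's docs for the current one):

```python
import anthropic

# The client reads ANTHROPIC_API_KEY from the environment by default.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",   # illustrative model ID
    max_tokens=1024,             # required: caps the length of the reply
    system="You are a helpful personal assistant.",  # system prompt is a separate parameter
    messages=[
        {"role": "user", "content": "What is the largest planet in our solar system?"}
    ],
)

print(response.content[0].text)
```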
Notice the differences: Claude uses messages.create() instead of chat.completions.create(), requires a max_tokens parameter (the maximum length of the response), and puts the system prompt as a separate parameter rather than in the messages list. But the fundamental pattern is identical.
Claude Sonnet 4.5 (released September 2025) is Anthropic's most advanced model, excelling at real-world agent tasks, coding, and computer use. It offers improved alignment and can handle extended autonomous work sessions. There's also Claude Haiku 4.5 (released October 2025), which is faster and more cost-efficient while matching Sonnet 4's performance in coding tasks.
Example (Gemini 2.5 Pro)
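Here's a sketch using the google-generativeai package, assuming GOOGLE_API_KEY is set in your environment (the model name string is illustrative):

```python
import os
import google.generativeai as genai

# Assumes your key is stored in the GOOGLE_API_KEY environment variable.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-2.5-pro")   # illustrative model name
response = model.generate_content("What is the largest planet in our solar system?")

print(response.text)
```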
Gemini's API is even simpler for basic use cases. You just call generate_content() with your prompt. The trade-off is less fine-grained control over things like system messages (though you can add them by structuring your prompt differently).
Gemini 2.5 Pro is Google's flagship model as of late 2025, offering advanced capabilities for various applications and competing with Claude Sonnet 4.5 and GPT-5.
Why show multiple providers? Because in real projects, you'll often choose different providers for different tasks. OpenAI might be best for general text generation, Claude for complex reasoning with tools, and Gemini for multimodal inputs. Understanding the pattern helps you switch between them easily.
Handling Errors Gracefully
API calls can fail. The network might be down, your API key might be invalid, or you might hit rate limits. Production code needs to handle these cases:
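Here's a sketch of a safer wrapper around ask_assistant; this ask_assistant_safe function is what the chat loop below will call:

```python
def ask_assistant_safe(question: str) -> str:
    """Like ask_assistant, but returns an apology string instead of raising on failure."""
    try:
        return ask_assistant(question)
    except Exception as e:
        # In production you'd distinguish auth errors, rate limits, and network issues.
        return f"Sorry, I ran into a problem: {e}"
```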
This isn't bulletproof error handling. In a real application, you'd want to distinguish between different error types (network issues vs. authentication problems vs. rate limits) and handle each appropriately. But it's a start.
For intermediate readers: Robust error handling for LLM APIs typically includes: exponential backoff for rate limits, retry logic for transient failures, fallback to alternative models if the primary is unavailable, timeout handling, and detailed logging for debugging. You might also want to implement circuit breakers if you're making many requests, to avoid cascading failures. The tenacity library is useful for implementing retry logic with backoff.
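As a rough illustration, a retry wrapper with tenacity might look like this (the attempt count and wait times are arbitrary choices):

```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=30))
def ask_with_retries(question: str) -> str:
    # Retries up to 3 times, backing off roughly 1s, 2s, 4s... between attempts.
    return ask_assistant(question)
```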
Controlling the Output
Language models are probabilistic. They don't always give the same answer to the same question. Traditionally, you could control this behavior with parameters like temperature, but this has changed with newer models.
Important Note about GPT-5: OpenAI has removed the temperature parameter from GPT-5 to standardize response behavior and improve reliability. If you need temperature control, use Claude Sonnet 4.5 or Gemini 2.5 instead.
Example (Claude Sonnet 4.5 with temperature control)
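Here's a sketch; the temperature value and the question are illustrative:

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",   # illustrative model ID
    max_tokens=1024,
    temperature=0.2,             # low temperature for consistent, factual answers
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

print(response.content[0].text)
```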
Temperature (when available) controls randomness. At 0.0, the model always picks the most likely next word, making responses consistent and predictable. At higher values (up to 1.0 for Claude, 2.0 for Gemini), it samples from a wider range of possibilities, making responses more creative but less reliable.
For factual questions and tasks requiring consistency, use low temperature (0.0 to 0.3). For creative tasks like writing or brainstorming, use higher temperature (0.7 to 1.0).
Claude Sonnet 4.5 and Gemini 2.5 both support temperature parameters, making them excellent choices when you need this level of control in your applications.
Other useful parameters (availability varies by model):
- max_tokens: Limits response length (useful for controlling costs and keeping responses concise). Available in all current models.
- top_p: Alternative to temperature for controlling randomness (typically use one or the other, not both). Available in Claude and Gemini.
- system: Sets the model's behavior and personality (Claude uses a separate parameter; GPT-5 and Gemini use message roles).
For most use cases with GPT-5, you'll primarily adjust max_tokens and system messages. For Claude Sonnet 4.5 and Gemini 2.5, you have additional control through temperature and top_p parameters.
Putting It Together: A Simple Chat Loop
Let's build a basic interactive chat with our assistant:
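Here's a minimal sketch; the prompt strings and the "quit" exit command are our own choices:

```python
def chat():
    print("Assistant ready. Type 'quit' to exit.")
    while True:
        user_input = input("You: ")
        if user_input.strip().lower() in ("quit", "exit"):
            print("Goodbye!")
            break
        # Each call is independent: no memory of earlier turns yet.
        answer = ask_assistant_safe(user_input)
        print(f"Assistant: {answer}")

if __name__ == "__main__":
    chat()
```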
Try this out. You now have a working conversational assistant! It's basic (it doesn't remember previous messages in the conversation, can't use tools, and has no special capabilities beyond text generation), but it's a real AI agent that can understand and respond to natural language.
Example interaction:
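Output will vary from run to run, but an exchange might look something like this:

```text
You: What is the largest planet in our solar system?
Assistant: Jupiter is the largest planet in our solar system. It's a gas giant with a diameter of about 140,000 kilometers.
You: How much larger is it than Earth?
Assistant: Jupiter is roughly 11 times wider than Earth and more than 300 times as massive.
You: quit
Goodbye!
```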
Notice a limitation: when we asked "How much larger is it than Earth?", the assistant understood we were asking about Jupiter because that's a reasonable inference from the question alone (the pronoun "it" plus the context of planetary comparisons). But it doesn't actually remember the previous exchange. Each call to ask_assistant_safe() is independent. We'll fix this in Chapter 6 when we add memory.
What We've Built
In just a few dozen lines of code, you've created the foundation of an AI agent:
- A function that can send prompts to a language model
- Error handling for when things go wrong
- A simple chat interface for interacting with the assistant
- Understanding of how to control the model's behavior
This is the core of every AI agent. Everything else we build (tools, memory, planning, reasoning) will be additions and enhancements to this basic pattern of sending messages and receiving responses.
The Cost of Conversation
One practical consideration: API calls cost money. All providers charge based on tokens (roughly, pieces of words) processed. A typical conversation turn might cost a fraction of a cent, but it adds up with heavy use.
Here's a rough guide (prices as of November 2025, but check current rates as they change frequently):
OpenAI:
- GPT-5: Pricing varies by usage tier; check OpenAI's website for current rates (most advanced, standardized responses)
Anthropic:
- Claude Sonnet 4.5: $3 per million input tokens, $15 per million output tokens (excellent for agent tasks and coding)
- Claude Haiku 4.5: $1 per million input tokens, $5 per million output tokens (faster, cost-efficient, good for high-volume tasks)
- Claude Opus 4.1: $15 per million input tokens, $75 per million output tokens (maximum capability for complex tasks)
Google:
- Gemini 2.5 Flash: Lower cost, faster responses, good for simple tasks
- Gemini 2.5 Pro: Competitive pricing with large context window (1M tokens); check Google's pricing page for current rates
A typical message might use 100-500 tokens, so a conversation might cost a fraction of a cent to a few cents depending on the model. Not much for occasional use, but significant if you're processing thousands of requests.
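As a rough back-of-the-envelope check, assuming about 400 input and 200 output tokens per turn at the Claude Sonnet 4.5 rates listed above:

```python
# Rough cost for one turn at Claude Sonnet 4.5 rates ($3 / $15 per million tokens).
input_tokens, output_tokens = 400, 200   # assumed sizes for a typical turn
cost = input_tokens / 1_000_000 * 3.00 + output_tokens / 1_000_000 * 15.00
print(f"${cost:.4f} per turn")   # about $0.0042, roughly four-tenths of a cent
```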
Design consideration: Choose the model based on task complexity. Don't use the most expensive models for simple tasks that cheaper models can handle. We'll explore optimization strategies in Chapter 15.
Looking Ahead
You now know how to call a language model from code. This is the foundation, but it's just the beginning. Our assistant can respond to individual questions, but it can't:
- Remember previous parts of the conversation
- Use tools like calculators or search engines
- Reason through complex problems step by step
- Plan multi-step tasks
- Take actions beyond generating text
Each of these capabilities builds on what we've learned here. The next chapter introduces prompting, the art and science of communicating effectively with language models. You'll learn how to craft instructions that get better results, how to guide the model's reasoning, and how small changes in wording can dramatically affect output quality.
The model is now listening. Let's learn to speak its language.
Key Concepts
API (Application Programming Interface): A way for programs to communicate with each other. In our case, we use APIs to send requests to language model providers and receive responses.
API Key: A secret credential that authenticates your requests to an API. Think of it like a password that identifies you to the service provider.
System Message: Instructions that set the language model's behavior, personality, or role. This is where you tell the model "You are a helpful assistant" or "You are an expert programmer."
Temperature: A parameter that controls randomness in the model's responses. Low temperature (0.0-0.3) makes responses more focused and deterministic; high temperature (0.7-1.0) makes them more creative and varied. Note: GPT-5 has removed this parameter; use Claude Sonnet 4.5 or Gemini 2.5 if you need temperature control.
Tokens: The units that language models process. Roughly, a token is a piece of a word (common words are one token, longer words might be two or three). Both input (your prompt) and output (the model's response) are measured in tokens.
Max Tokens: A parameter that limits how long the model's response can be. Useful for controlling costs and ensuring responses stay concise.
Further Exploration
OpenAI API Documentation: The official guide covers all parameters and best practices in detail: https://platform.openai.com/docs/guides/text-generation
Token Counting: Understanding tokens is crucial for managing costs. OpenAI provides a tokenizer tool to see how text is broken into tokens: https://platform.openai.com/tokenizer
Rate Limits: All providers limit how many requests you can make per minute. Learn about rate limits and how to handle them: https://platform.openai.com/docs/guides/rate-limits
Alternative Providers: Explore other language model APIs like Anthropic's Claude, Google's Gemini, or open-source models through Hugging Face to see how different providers compare.
Quiz
Ready to test your understanding? Take this quick quiz to reinforce what you've learned about using language models in code.





