Learn how to call language models from Python code, including GPT-5, Claude Sonnet 4.5, and Gemini 2.5. Master API integration, error handling, and building reusable functions for AI agents.

This article is part of the free-to-read AI Agent Handbook
Using a Language Model in Code
In the previous section, we explored how language models work conceptually. They're prediction engines trained on massive amounts of text, learning patterns that let them generate coherent, contextually appropriate responses. Now it's time to move from theory to practice: actually calling a language model from Python code.
This is where your personal assistant starts to come alive. Instead of just understanding what a language model does, you'll see how to use one to build something real.
The Basic Pattern
Every interaction with a language model follows the same fundamental pattern, regardless of which provider you use:
- Set up a connection to the model (usually via an API)
- Send a prompt (your instruction or question)
- Receive a response (the model's generated text)
- Use the response (display it, process it, or feed it into another step)
That's it. The complexity comes in how you craft your prompts and what you do with the responses, but the basic mechanics are straightforward.
Let's see this in action.
Your First Model Call
We'll start with the simplest possible example: asking a language model a question and printing the answer.
Example (GPT-5)
import os
from openai import OpenAI

# Initialize the client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Ask a question
# Using GPT-5 for its standardized, reliable responses
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "user", "content": "What is 2 + 2?"}
    ]
)

# Print the answer
print(response.choices[0].message.content)
# Output: "2 + 2 equals 4."

Let's unpack what's happening here:
API Key: The OPENAI_API_KEY is your authentication credential. You get this from OpenAI's website after creating an account. Store it as an environment variable (never hard-code it in your source code, as that's a security risk).
Client initialization: The OpenAI() client is your connection to the model. You create it once and reuse it for multiple requests.
Model selection: "gpt-5" specifies which language model to use. Different models have different capabilities, speeds, and costs. GPT-5 is OpenAI's latest model as of 2025, offering improved reliability and standardized responses.
Messages format: The messages parameter is a list of conversation turns. Each message has a role (who's speaking) and content (what they said). Right now we're only sending one user message, but we'll expand this shortly.
Response structure: The model returns a complex object, but what we care about is response.choices[0].message.content, which contains the actual text the model generated.
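To make a couple of these points concrete, here's a minimal sketch that fails fast if the API key isn't set and then peeks at the token usage reported on the response object. The attribute names follow the OpenAI Python SDK as of this writing; treat the details as assumptions to verify against your SDK version.

import os
from openai import OpenAI

# Fail fast if the key isn't set, instead of hitting a confusing auth error later
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError(
        "OPENAI_API_KEY is not set. Export it in your shell first, "
        "e.g. `export OPENAI_API_KEY=...`"
    )

client = OpenAI(api_key=api_key)

response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
)

# Besides the generated text, the response reports how many tokens were used.
# This becomes relevant when we talk about cost later in this section.
print(response.choices[0].message.content)
print(response.usage.prompt_tokens, response.usage.completion_tokens)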
Adding Context with System Messages
The example above works, but it's missing something important: we haven't told the model who it is or how it should behave. That's where system messages come in.
Example (GPT-5)
# Using GPT-5 for basic text generation
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You are a helpful personal assistant. Be concise and friendly."},
        {"role": "user", "content": "What is 2 + 2?"}
    ]
)

print(response.choices[0].message.content)
# Output: "It's 4! 😊"

The system message sets the tone and behavior. It's like giving the model a job description before asking it to work. Without it, the model will still respond, but you have less control over how it responds.
Think of it this way: if you walked up to someone and said "What is 2 + 2?", they might give you a straightforward answer, or they might be confused about why you're asking such a simple question, or they might launch into a lecture about arithmetic. The system message is like introducing yourself first: "Hi, I'm your personal assistant, and I'm here to help you with quick, friendly answers."
For intermediate readers: System messages are powerful but not foolproof. The model can still deviate from them, especially if user messages strongly contradict the system instructions. In practice, you'll often need to reinforce important behaviors through examples (few-shot prompting) or by structuring your prompts carefully. System messages work best for setting general tone and role, less well for enforcing strict rules.
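For example, here's a minimal sketch of few-shot prompting: a couple of hand-written example turns are placed before the real question so the model can imitate the desired tone. The example exchanges are made up purely for illustration.

# Few-shot prompting: show the model the style you want before the real question
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You are a helpful personal assistant. Be concise and friendly."},
        # Hand-written example exchanges that demonstrate the desired tone
        {"role": "user", "content": "What is 5 + 7?"},
        {"role": "assistant", "content": "That's 12!"},
        {"role": "user", "content": "What's the boiling point of water?"},
        {"role": "assistant", "content": "100°C at sea level."},
        # The real question comes last
        {"role": "user", "content": "What is 2 + 2?"}
    ]
)
print(response.choices[0].message.content)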
Building a Reusable Function
Hard-coding each API call gets tedious quickly. Let's wrap this in a function that we can reuse throughout our assistant:
def ask_assistant(user_message, system_prompt="You are a helpful personal assistant."):
    """
    Send a message to the language model and get a response.

    Args:
        user_message: The user's question or instruction
        system_prompt: Instructions for how the assistant should behave

    Returns:
        The model's response as a string
    """
    # Using GPT-5 for general-purpose text generation
    response = client.chat.completions.create(
        model="gpt-5",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ]
    )
    return response.choices[0].message.content

# Now we can use it easily
answer = ask_assistant("What's the capital of France?")
print(answer)
# Output: "The capital of France is Paris."

# Or with a custom system prompt
answer = ask_assistant(
    "What's the capital of France?",
    system_prompt="You are a geography teacher. Explain your answers."
)
print(answer)
# Output: "The capital of France is Paris. It's located in the north-central
# part of the country and has been the capital since the 12th century..."

This function is the foundation of our assistant. Every time we want to ask the model something, we'll call this function (or an enhanced version of it).
The Same Pattern, Different Providers
The core pattern (send messages, get response) is universal, but each provider has slightly different syntax. Let's see the same basic interaction using Claude and Gemini.
Example (Claude Sonnet 4.5)
import os
from anthropic import Anthropic

# Initialize the client
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

# Ask a question
# Using Claude Sonnet 4.5 for its excellent reasoning capabilities
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="You are a helpful personal assistant.",
    messages=[
        {"role": "user", "content": "What is 2 + 2?"}
    ]
)

print(response.content[0].text)
# Output: "2 + 2 equals 4."

Notice the differences: Claude uses messages.create() instead of chat.completions.create(), requires a max_tokens parameter (the maximum length of the response), and puts the system prompt in a separate parameter rather than in the messages list. But the fundamental pattern is identical.
Claude Sonnet 4.5 (released September 2025) is Anthropic's most advanced model, excelling at real-world agent tasks, coding, and computer use. It offers improved alignment and can handle extended autonomous work sessions. There's also Claude Haiku 4.5 (released October 2025), which is faster and more cost-efficient while matching Sonnet 4's performance in coding tasks.
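Switching between the two is usually just a change of model string. Here's a minimal sketch, assuming the Haiku model is exposed under an identifier like "claude-haiku-4-5"; check Anthropic's model list for the exact name before relying on it.

# Same call as before, pointed at the faster, cheaper model
# (the model identifier is an assumption; confirm it against Anthropic's docs)
response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1024,
    system="You are a helpful personal assistant.",
    messages=[
        {"role": "user", "content": "What is 2 + 2?"}
    ]
)
print(response.content[0].text)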
Example (Gemini 2.5 Pro)
import os
import google.generativeai as genai

# Configure with your API key
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Initialize the model
# Using Gemini 2.5 Pro for its large context window and multimodal capabilities
model = genai.GenerativeModel('gemini-2.5-pro')

# Ask a question
response = model.generate_content("What is 2 + 2?")

print(response.text)
# Output: "2 + 2 equals 4."

Gemini's API is even simpler for basic use cases: you just call generate_content() with your prompt. The trade-off is less fine-grained control over things like system messages (though you can add them by structuring your prompt differently).
Gemini 2.5 Pro (released October 2025) is Google's latest model, offering advanced capabilities for various applications and competing with Claude Sonnet 4.5 and GPT-5.
Why show multiple providers? Because in real projects, you'll often choose different providers for different tasks. OpenAI might be best for general text generation, Claude for complex reasoning with tools, and Gemini for multimodal inputs. Understanding the pattern helps you switch between them easily.
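One way to keep that flexibility is to hide the provider-specific syntax behind a small wrapper. The sketch below assumes the three clients from the examples above have already been initialized (named here openai_client, anthropic_client, and gemini_model); it illustrates the idea rather than offering a production-ready abstraction.

def ask_any(provider, user_message, system_prompt="You are a helpful personal assistant."):
    """Route the same question to different providers behind one interface."""
    if provider == "openai":
        response = openai_client.chat.completions.create(
            model="gpt-5",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_message},
            ],
        )
        return response.choices[0].message.content
    elif provider == "anthropic":
        response = anthropic_client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            system=system_prompt,
            messages=[{"role": "user", "content": user_message}],
        )
        return response.content[0].text
    elif provider == "gemini":
        # Gemini's simple API has no separate system parameter here,
        # so fold the instructions into the prompt itself
        response = gemini_model.generate_content(f"{system_prompt}\n\n{user_message}")
        return response.text
    raise ValueError(f"Unknown provider: {provider}")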
Handling Errors Gracefully
API calls can fail. The network might be down, your API key might be invalid, or you might hit rate limits. Production code needs to handle these cases:
def ask_assistant_safe(user_message, system_prompt="You are a helpful personal assistant."):
    """
    Send a message to the language model with error handling.
    """
    try:
        # Using GPT-5 for general-purpose text generation
        response = client.chat.completions.create(
            model="gpt-5",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_message}
            ],
            timeout=30  # Don't wait forever
        )
        return response.choices[0].message.content

    except Exception as e:
        # In production, you'd log this error properly
        return f"Sorry, I encountered an error: {str(e)}"

# Now if something goes wrong, we get a friendly message instead of a crash
answer = ask_assistant_safe("What's the weather like?")
print(answer)

This isn't bulletproof error handling. In a real application, you'd want to distinguish between different error types (network issues vs. authentication problems vs. rate limits) and handle each appropriately. But it's a start.
For intermediate readers: Robust error handling for LLM APIs typically includes: exponential backoff for rate limits, retry logic for transient failures, fallback to alternative models if the primary is unavailable, timeout handling, and detailed logging for debugging. You might also want to implement circuit breakers if you're making many requests, to avoid cascading failures. The tenacity library is useful for implementing retry logic with backoff.
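As one illustration of that advice, here's a minimal sketch using tenacity to retry the call with exponential backoff. The specific retry settings are arbitrary and worth tuning for your own application; a real implementation would also retry only on transient error types rather than on every exception.

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
def ask_with_retries(user_message, system_prompt="You are a helpful personal assistant."):
    """Retry failures up to 3 times, waiting longer between attempts."""
    response = client.chat.completions.create(
        model="gpt-5",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        timeout=30,
    )
    return response.choices[0].message.content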
Controlling the Output
Language models are probabilistic. They don't always give the same answer to the same question. Traditionally, you could control this behavior with parameters like temperature, but this has changed with newer models.
Important Note about GPT-5: OpenAI has removed the temperature parameter from GPT-5 to standardize response behavior and improve reliability. If you need temperature control, use Claude Sonnet 4.5 or Gemini 2.5 instead.
Example (Claude Sonnet 4.5 with temperature control)
# More creative/random responses
# Using Claude Sonnet 4.5 for its temperature control capabilities
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    temperature=1.0,  # Higher = more random (range: 0 to 1 for Claude)
    messages=[
        {"role": "user", "content": "Write a creative opening line for a story."}
    ]
)

# More deterministic/focused responses
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    temperature=0.0,  # Lower = more deterministic
    messages=[
        {"role": "user", "content": "What is 2 + 2?"}
    ]
)

Temperature (when available) controls randomness. At 0.0, the model always picks the most likely next word, making responses consistent and predictable. At higher values (up to 1.0 for Claude, 2.0 for Gemini), it samples from a wider range of possibilities, making responses more creative but less reliable.
For factual questions and tasks requiring consistency, use low temperature (0.0 to 0.3). For creative tasks like writing or brainstorming, use higher temperature (0.7 to 1.0).
Claude Sonnet 4.5 and Gemini 2.5 both support temperature parameters, making them excellent choices when you need this level of control in your applications.
Other useful parameters (availability varies by model):
- max_tokens: Limits response length (useful for controlling costs and keeping responses concise). Available in all current models.
- top_p: Alternative to temperature for controlling randomness (typically use one or the other, not both). Available in Claude and Gemini.
- system: Sets the model's behavior and personality (Claude uses a separate parameter; GPT-5 and Gemini use message roles).
For most use cases with GPT-5, you'll primarily adjust max_tokens and system messages. For Claude Sonnet 4.5 and Gemini 2.5, you have additional control through temperature and top_p parameters.
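For Gemini, these knobs live in a generation config object passed alongside the prompt. Here's a minimal sketch; the parameter names follow the google-generativeai package and should be double-checked against the current SDK documentation.

# Gemini: pass sampling controls through a generation config
response = model.generate_content(
    "Write a creative opening line for a story.",
    generation_config=genai.types.GenerationConfig(
        temperature=1.2,        # Gemini accepts values up to 2.0
        max_output_tokens=200,  # Gemini's name for the response-length limit
        top_p=0.95,
    ),
)
print(response.text)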
Putting It Together: A Simple Chat Loop
Let's build a basic interactive chat with our assistant:
def chat():
    """
    Simple chat loop with the assistant.
    Type 'quit' to exit.
    """
    print("Assistant: Hello! I'm your personal assistant. How can I help you today?")
    print("(Type 'quit' to exit)\n")

    while True:
        # Get user input
        user_input = input("You: ").strip()

        # Check for exit
        if user_input.lower() in ['quit', 'exit', 'bye']:
            print("Assistant: Goodbye! Have a great day!")
            break

        # Skip empty inputs
        if not user_input:
            continue

        # Get response from the model
        response = ask_assistant_safe(user_input)
        print(f"Assistant: {response}\n")

# Run the chat
chat()

Try this out. You now have a working conversational assistant! It's basic (it doesn't remember previous messages in the conversation, can't use tools, and has no special capabilities beyond text generation), but it's a real AI agent that can understand and respond to natural language.
Example interaction:
Assistant: Hello! I'm your personal assistant. How can I help you today?
(Type 'quit' to exit)

You: What's the largest planet in our solar system?
Assistant: The largest planet in our solar system is Jupiter.

You: How much larger is it than Earth?
Assistant: Jupiter is significantly larger than Earth. Its diameter is about
11 times that of Earth, and its volume is roughly 1,321 times greater.

You: quit
Assistant: Goodbye! Have a great day!

Notice a limitation: when we asked "How much larger is it than Earth?", the assistant understood we were asking about Jupiter because that's a reasonable inference from the question alone (the pronoun "it" plus the context of planetary comparisons). But it doesn't actually remember the previous exchange. Each call to ask_assistant_safe() is independent. We'll fix this in Chapter 6 when we add memory.
What We've Built
In just a few dozen lines of code, you've created the foundation of an AI agent:
- A function that can send prompts to a language model
- Error handling for when things go wrong
- A simple chat interface for interacting with the assistant
- Understanding of how to control the model's behavior
This is the core of every AI agent. Everything else we build (tools, memory, planning, reasoning) will be additions and enhancements to this basic pattern of sending messages and receiving responses.
The Cost of Conversation
One practical consideration: API calls cost money. All providers charge based on tokens (roughly, pieces of words) processed. A typical conversation turn might cost a fraction of a cent, but it adds up with heavy use.
Here's a rough guide (prices as of November 2025, but check current rates as they change frequently):
OpenAI:
- GPT-5: Pricing varies by usage tier; check OpenAI's website for current rates (most advanced, standardized responses)
Anthropic:
- Claude Sonnet 4.5: $3 per million input tokens, $15 per million output tokens (excellent for agent tasks and coding)
- Claude Haiku 4.5: $1 per million input tokens, $5 per million output tokens (faster, cost-efficient, good for high-volume tasks)
- Claude Opus 4.1: $15 per million input tokens, $75 per million output tokens (maximum capability for complex tasks)
Google:
- Gemini 2.5 Flash: Lower cost, faster responses, good for simple tasks
- Gemini 2.5 Pro: Competitive pricing with large context window (1M tokens); check Google's pricing page for current rates
A typical message might use 100-500 tokens, so a conversation might cost a fraction of a cent to a few cents depending on the model. Not much for occasional use, but significant if you're processing thousands of requests.
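To make that concrete, here's a back-of-the-envelope calculation using the Claude Sonnet 4.5 prices listed above. The token counts are made up, but they're typical of a short exchange.

# Rough cost estimate for one conversation turn with Claude Sonnet 4.5
input_tokens = 300    # system prompt + user message (assumed)
output_tokens = 150   # model's reply (assumed)

input_cost = input_tokens / 1_000_000 * 3.00     # $3 per million input tokens
output_cost = output_tokens / 1_000_000 * 15.00  # $15 per million output tokens

print(f"~${input_cost + output_cost:.5f} per turn")  # ~$0.00315, a fraction of a cent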
Design consideration: Choose the model based on task complexity. Don't use the most expensive models for simple tasks that cheaper models can handle. We'll explore optimization strategies in Chapter 15.
Looking Ahead
You now know how to call a language model from code. This is the foundation, but it's just the beginning. Our assistant can respond to individual questions, but it can't:
- Remember previous parts of the conversation
- Use tools like calculators or search engines
- Reason through complex problems step by step
- Plan multi-step tasks
- Take actions beyond generating text
Each of these capabilities builds on what we've learned here. The next chapter introduces prompting, the art and science of communicating effectively with language models. You'll learn how to craft instructions that get better results, how to guide the model's reasoning, and how small changes in wording can dramatically affect output quality.
The model is now listening. Let's learn to speak its language.
Key Concepts
API (Application Programming Interface): A way for programs to communicate with each other. In our case, we use APIs to send requests to language model providers and receive responses.
API Key: A secret credential that authenticates your requests to an API. Think of it like a password that identifies you to the service provider.
System Message: Instructions that set the language model's behavior, personality, or role. This is where you tell the model "You are a helpful assistant" or "You are an expert programmer."
Temperature: A parameter that controls randomness in the model's responses. Low temperature (0.0-0.3) makes responses more focused and deterministic; high temperature (0.7-1.0) makes them more creative and varied. Note: GPT-5 has removed this parameter; use Claude Sonnet 4.5 or Gemini 2.5 if you need temperature control.
Tokens: The units that language models process. Roughly, a token is a piece of a word (common words are one token, longer words might be two or three). Both input (your prompt) and output (the model's response) are measured in tokens.
Max Tokens: A parameter that limits how long the model's response can be. Useful for controlling costs and ensuring responses stay concise.
Further Exploration
OpenAI API Documentation: The official guide covers all parameters and best practices in detail: https://platform.openai.com/docs/guides/text-generation
Token Counting: Understanding tokens is crucial for managing costs. OpenAI provides a tokenizer tool to see how text is broken into tokens (see the short sketch after this list): https://platform.openai.com/tokenizer
Rate Limits: All providers limit how many requests you can make per minute. Learn about rate limits and how to handle them: https://platform.openai.com/docs/guides/rate-limits
Alternative Providers: Explore other language model APIs like Anthropic's Claude, Google's Gemini, or open-source models through Hugging Face to see how different providers compare.
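If you'd rather count tokens programmatically, OpenAI's tiktoken library does the same thing in code. A minimal sketch; the encoding name is an assumption for newer models (verify which encoding GPT-5 uses), and other providers use their own tokenizers.

import tiktoken

# o200k_base is the encoding used by OpenAI's recent models (assumption; verify for GPT-5)
encoding = tiktoken.get_encoding("o200k_base")

text = "What's the largest planet in our solar system?"
tokens = encoding.encode(text)
print(len(tokens), "tokens")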
Quiz
Ready to test your understanding? Take this quick quiz to reinforce what you've learned about using language models in code.