00: Start With One LLM API Call
Do not start with agents.
The first layer of a chatbot is not an agent, and it is not a framework. It is one request:
text = model.generate(messages).text
The user sends text. Your program turns that text into JSON for a provider, sends it, then extracts text from the provider's response JSON. This sounds mundane. But if this layer is fuzzy, tool calling, agent loops, and multi-agent systems will feel like floating vocabulary.
This chapter breaks the minimal chatbot into two pieces:
- Messages: what the user said and what instructions the developer supplied.
- Provider: how OpenAI, Anthropic, Gemini, and DeepSeek expect requests to look, and where each one puts the answer.
Understand this layer first. Then agent loops become much less mysterious.
A Chatbot Is an API Contract
Imagine a travel assistant. The user says:
Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.
At runtime, the program does roughly this:
flowchart LR
U["User input"] --> M["Internal messages"]
M --> P["provider adapter"]
P --> A["model API"]
A --> R["provider JSON response"]
R --> T["extract text"]
T --> O["show answer"]
A simple internal message shape might be:
messages = [
    {
        "role": "system",
        "content": "You are a careful travel assistant. Do not invent live facts.",
    },
    {
        "role": "user",
        "content": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.",
    },
]
That messages shape is ours. Providers do not all accept it as-is.
OpenAI's newer Responses endpoint takes input; its older but still common Chat Completions endpoint takes messages. Anthropic keeps system outside the messages array. Gemini uses contents and parts. DeepSeek is close to OpenAI Chat Completions, but not identical.
So the first lesson is not “learn one SDK.” It is: keep your application logic away from provider-specific response shapes.
Same Question, Different APIs
Every example below asks the same thing:
Plan a relaxed one-day Hangzhou trip.
The model names are illustrative. In real applications, put them in configuration or environment variables.
OpenAI: Responses API
For new OpenAI projects, the Responses API is usually the first interface to inspect. It groups text, multimodal input, tools, and response state under one endpoint.
The HTTP shape is roughly:
curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "instructions": "You are a careful travel assistant. Do not invent live facts.",
    "input": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking."
  }'
With the Python SDK:
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",
    instructions="You are a careful travel assistant. Do not invent live facts.",
    input="Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.",
)

text = response.output_text
Notice the shape:
- instructions is where developer/system guidance goes.
- input can be a plain string or a richer list of input items.
- text is commonly read from response.output_text.
- server-side continuation can use features such as previous_response_id or conversations, but this tutorial starts with local state.
Why avoid server-side state at first? Because agent design is easier to learn when message history, tool observations, and stop conditions are visible in your own code.
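To make the output_text convenience less magical, here is a minimal sketch of what reading text out of a raw Responses JSON body can look like. It assumes the body has an output list whose message items carry content parts typed output_text; the function name extract_output_text and the sample body are hypothetical and hand-written, not captured from a real call.

```python
def extract_output_text(response_json: dict) -> str:
    """Concatenate every output_text part found in a Responses-style body."""
    chunks = []
    for item in response_json.get("output", []):
        # Skip non-message items such as tool calls or reasoning items.
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                chunks.append(part.get("text", ""))
    return "".join(chunks)


# Hand-written sample body (an assumption for illustration only).
sample_response = {
    "output": [
        {"type": "reasoning", "summary": []},
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Morning: a slow walk by West Lake."}
            ],
        },
    ]
}

print(extract_output_text(sample_response))
```

The point is only that the answer sits inside a list of typed items, so an adapter should not assume the first item is always text.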
OpenAI: Chat Completions API
Chat Completions is older and still widespread. Many tutorials and OpenAI-compatible providers use its shape.
The HTTP shape:
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2",
    "messages": [
      {
        "role": "developer",
        "content": "You are a careful travel assistant. Do not invent live facts."
      },
      {
        "role": "user",
        "content": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking."
      }
    ]
  }'
With the Python SDK:
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {
            "role": "developer",
            "content": "You are a careful travel assistant. Do not invent live facts.",
        },
        {
            "role": "user",
            "content": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.",
        },
    ],
)

text = completion.choices[0].message.content
The difference is practical:
- Responses feels like “give the model input, receive a response object.”
- Chat Completions feels like “send a transcript, receive an assistant message.”
For ordinary chat, both can work. For newer OpenAI-native capabilities, start with Responses. For many OpenAI-compatible services, the Chat Completions shape remains common.
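The "send a transcript, receive an assistant message" shape can be sketched as two small functions over plain dicts. This is an illustration under assumptions: the function names are hypothetical, the mapping of the internal system role to developer simply mirrors the example above, and the sample response is hand-written.

```python
def to_chat_completions_payload(messages: list[dict], model: str) -> dict:
    """Map internal messages onto a Chat Completions request body."""
    # Assumption: internal "system" is sent as "developer", as in the example above.
    role_map = {"system": "developer"}
    return {
        "model": model,
        "messages": [
            {"role": role_map.get(m["role"], m["role"]), "content": m["content"]}
            for m in messages
        ],
    }


def extract_chat_text(completion_json: dict) -> str:
    """Read choices[0].message.content, tolerating missing or null values."""
    choices = completion_json.get("choices") or []
    if not choices:
        return ""
    # content can be null, for example when the reply is a tool call.
    return choices[0].get("message", {}).get("content") or ""


internal_messages = [
    {"role": "system", "content": "You are a careful travel assistant."},
    {"role": "user", "content": "Plan a relaxed one-day Hangzhou trip."},
]
payload = to_chat_completions_payload(internal_messages, model="example-model")

# Hand-written sample response (not from a real API call).
sample_completion = {
    "choices": [{"message": {"role": "assistant", "content": "Start with tea."}}]
}
print(payload["messages"][0]["role"], "|", extract_chat_text(sample_completion))
```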
Anthropic: Messages API
Anthropic's Claude models are served through the Messages API. It is also message-based, but its JSON is not the same as OpenAI Chat Completions.
Python SDK shape:
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="You are a careful travel assistant. Do not invent live facts.",
    messages=[
        {
            "role": "user",
            "content": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.",
        }
    ],
)

text = message.content[0].text
Three details matter:
- system is a top-level parameter, not a system role inside messages.
- max_tokens is explicit.
- content is a list of blocks; text is only one possible block type.
Anthropic's guide also describes Messages as stateless: for multi-turn conversation, send the full conversation history each time. That becomes important in the next chapter.
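The three details above can be sketched as a payload builder plus a text extractor working on plain dicts. The function names, the default max_tokens value, and the sample response are assumptions for illustration; joining several system messages with blank lines is one possible choice, not something the API requires.

```python
def to_anthropic_payload(messages: list[dict], model: str, max_tokens: int = 1024) -> dict:
    """Move system messages to the top-level system field; keep the other turns."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    turns = [
        {"role": m["role"], "content": m["content"]}
        for m in messages
        if m["role"] != "system"
    ]
    payload = {"model": model, "max_tokens": max_tokens, "messages": turns}
    if system_parts:
        payload["system"] = "\n\n".join(system_parts)
    return payload


def extract_anthropic_text(message_json: dict) -> str:
    """Join only the text blocks; other block types (e.g. tool_use) are skipped."""
    return "".join(
        block.get("text", "")
        for block in message_json.get("content", [])
        if block.get("type") == "text"
    )


anthropic_payload = to_anthropic_payload(
    [
        {"role": "system", "content": "You are a careful travel assistant."},
        {"role": "user", "content": "Plan a relaxed one-day Hangzhou trip."},
    ],
    model="example-model",
)

# Hand-written sample response containing a non-text block before the text.
sample_message = {
    "content": [
        {"type": "tool_use", "id": "example", "name": "lookup", "input": {}},
        {"type": "text", "text": "Begin at a tea village."},
    ]
}
print(extract_anthropic_text(sample_message))
```

Filtering by block type is more robust than reading content[0].text, which breaks as soon as the first block is not text.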
Google Gemini: GenerateContent API
Gemini uses another shape. The input is contents, and each content item contains parts.
Common Python SDK shape:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.",
)

text = response.text
A REST-like payload looks closer to this:
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking."
        }
      ]
    }
  ]
}
Gemini's multimodal inputs also live in parts: text is a part, and images, audio, or file references can be parts too. For this tutorial, remember the shape: Gemini is not messages -> choices; it is closer to contents -> candidates. SDKs often expose response.text for the simple text case.
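A minimal sketch of the contents -> candidates shape, again over plain dicts. The function names and the sample response are hypothetical. Two assumptions are worth checking against the Gemini reference before relying on them: the assistant role is sent as model, and the system text goes into a systemInstruction field rather than into contents.

```python
def to_gemini_payload(messages: list[dict]) -> dict:
    """Map internal messages onto a GenerateContent-style request body."""
    # Assumption: Gemini names the assistant role "model".
    role_map = {"user": "user", "assistant": "model"}
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    contents = [
        {"role": role_map[m["role"]], "parts": [{"text": m["content"]}]}
        for m in messages
        if m["role"] != "system"
    ]
    payload = {"contents": contents}
    if system_parts:
        payload["systemInstruction"] = {"parts": [{"text": "\n\n".join(system_parts)}]}
    return payload


def extract_gemini_text(response_json: dict) -> str:
    """Join the text parts of the first candidate; non-text parts contribute nothing."""
    candidates = response_json.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


gemini_payload = to_gemini_payload(
    [
        {"role": "system", "content": "You are a careful travel assistant."},
        {"role": "user", "content": "Plan a relaxed one-day Hangzhou trip."},
    ]
)

# Hand-written sample response (not from a real API call).
sample_gemini_response = {
    "candidates": [
        {"content": {"role": "model", "parts": [{"text": "Walk, then tea."}]}}
    ]
}
print(extract_gemini_text(sample_gemini_response))
```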
DeepSeek: Close to Chat Completions, Not Identical
DeepSeek's Chat Completion endpoint is close to OpenAI Chat Completions. A common approach is to use the OpenAI SDK with a different base_url:
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

completion = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {
            "role": "system",
            "content": "You are a careful travel assistant. Do not invent live facts.",
        },
        {
            "role": "user",
            "content": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.",
        },
    ],
)

text = completion.choices[0].message.content
“OpenAI-compatible” usually means the basic chat shape is similar. It does not guarantee identical tool semantics, reasoning fields, streaming events, errors, or provider-specific metadata.
LangChain's docs make a similar point: when you use a generic OpenAI-compatible wrapper, non-standard provider fields may not be extracted or preserved. If your agent depends on a provider-specific field, do not hide it too deeply.
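One way to avoid hiding provider-specific fields is to copy them into an explicit extras dict when parsing the response. The sketch below is an illustration, not a provider's documented parsing recipe: parse_chat_message is a hypothetical helper, and reasoning_content is used as an example of a non-standard message field that some reasoning models return; check the provider's reference for the actual field names.

```python
def parse_chat_message(
    completion_json: dict,
    extra_fields: tuple[str, ...] = ("reasoning_content",),
) -> dict:
    """Return the standard text plus any listed provider-specific message fields."""
    message = completion_json["choices"][0]["message"]
    return {
        "text": message.get("content") or "",
        # Keep non-standard fields visible instead of silently dropping them.
        "extras": {name: message[name] for name in extra_fields if name in message},
    }


# Hand-written sample with an assumed provider-specific field.
sample_with_reasoning = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "Visit a tea village in the morning.",
                "reasoning_content": "The user likes tea and easy walking.",
            }
        }
    ]
}
parsed = parse_chat_message(sample_with_reasoning)
print(parsed["text"])
print(sorted(parsed["extras"]))
```

A response without the extra field simply yields an empty extras dict, so the same parser works for the plain Chat Completions shape.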
Role and Field Mapping
The same internal concept gets different names across providers. When writing an adapter, I like to start with a mapping table:
| Internal concept | OpenAI Responses | OpenAI Chat Completions | Anthropic Messages | Gemini GenerateContent | DeepSeek Chat |
|---|---|---|---|---|---|
| Developer instruction | instructions | developer or system message | top-level system | systemInstruction / SDK config | system message |
| User input | input | messages[].role=user | messages[].role=user | contents[].role=user + parts | messages[].role=user |
| Model reply | response output item | choices[0].message | content blocks | candidates / SDK text | choices[0].message |
| Multi-turn history | local history or response/conversation continuation | send messages each time | send full messages each time | send contents each time | send messages each time |
| Common text path | response.output_text | choices[0].message.content | message.content[0].text | response.text | choices[0].message.content |
The table is not something to memorize. It is a reminder that agent code should talk about instructions, user input, model replies, history, and tool observations. Field names belong in adapters.
Why Write a Provider Adapter
If application code directly reads provider responses, it quickly gets messy:
text = response.output_text
text = completion.choices[0].message.content
text = message.content[0].text
text = response.text
All four lines mean “extract the final text.” They should not leak into the rest of your agent code.
Start with a small interface:
from dataclasses import dataclass
from typing import Literal, Protocol


@dataclass
class ChatMessage:
    role: Literal["system", "user", "assistant"]
    content: str


@dataclass
class ChatResult:
    text: str
    raw: object | None = None
    usage: dict | None = None
    stop_reason: str | None = None
    request_id: str | None = None


class ChatModel(Protocol):
    def generate(self, messages: list[ChatMessage]) -> ChatResult:
        ...
The provider adapter translates:
class OpenAIResponsesModel:
    def __init__(self, client, model: str):
        self.client = client
        self.model = model

    def generate(self, messages: list[ChatMessage]) -> ChatResult:
        instructions, input_text = split_system_and_user(messages)
        response = self.client.responses.create(
            model=self.model,
            instructions=instructions,
            input=input_text,
        )
        return ChatResult(
            text=response.output_text,
            raw=response,
        )
Here, split_system_and_user() is a small helper: it moves internal system messages into instructions and turns the user message into input. In a real app, it also needs to handle multi-turn history; this chapter stays with one request.
This adapter does not plan, call tools, or loop. It only turns our ChatMessage objects into provider-specific payloads, then turns the provider response into ChatResult.
Keep this layer thin. Thin layers are easier to reason about.
Do Not Over-Hide Provider Details
For teaching, returning a string is fine:
def generate(messages) -> str:
    ...
For a real application, I prefer returning ChatResult, because later agent code needs:
- usage: token cost for the call.
- stop_reason: whether the model stopped normally or hit a limit.
- request_id: useful when debugging production incidents.
- raw: a place to inspect tool calls, reasoning fields, safety blocks, or provider-specific metadata.
You will not use all of these fields in chapter one. Just avoid throwing them away too early.
The Minimal Chatbot
Now the actual chatbot is small:
def reply(model: ChatModel, user_text: str) -> str:
    result = model.generate([
        ChatMessage(
            role="system",
            content="You are a careful travel assistant. Do not invent live facts.",
        ),
        ChatMessage(role="user", content=user_text),
    ])
    return result.text
This is the first layer. It can answer once, but it is not an agent.
Why not?
- It does not remember the previous turn.
- It does not enforce a stable, structured (JSON) output format.
- It cannot check the weather.
- It cannot choose the next step after seeing a tool result.
- It has no stop condition because there is no loop.
So when the user asks:
What did I say I liked?
the minimal chatbot is stuck. It only sees the current request.
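The missing memory is easy to see with a stand-in model that records what it receives. RecordingModel is a hypothetical test double, not a provider adapter, and it makes no network call; ChatMessage, ChatResult, and reply are simplified copies of the chapter's definitions so the block runs on its own.

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    role: str
    content: str


@dataclass
class ChatResult:
    text: str


class RecordingModel:
    """Hypothetical stand-in that stores each request instead of calling an API."""

    def __init__(self) -> None:
        self.calls: list[list[ChatMessage]] = []

    def generate(self, messages: list[ChatMessage]) -> ChatResult:
        self.calls.append(list(messages))
        return ChatResult(text=f"(saw {len(messages)} messages)")


def reply(model, user_text: str) -> str:
    result = model.generate([
        ChatMessage(
            role="system",
            content="You are a careful travel assistant. Do not invent live facts.",
        ),
        ChatMessage(role="user", content=user_text),
    ])
    return result.text


model = RecordingModel()
reply(model, "I like tea and easy walking.")
reply(model, "What did I say I liked?")

# The second request contains only the second question, not the first turn.
second_call = model.calls[1]
print([m.content for m in second_call if m.role == "user"])
# → ['What did I say I liked?']
```

Nothing from the first call reaches the second one, because reply builds a fresh two-message list every time.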
The next layer is conversation history, covered in 01: Conversation History.
API Choice Cheat Sheet
| If you want... | Start with... |
|---|---|
| A new OpenAI project with newer OpenAI-native tools, multimodal input, or state features | OpenAI Responses API |
| Broad compatibility with OpenAI-compatible providers | Chat Completions shape |
| Direct Claude usage with Claude block/tool/thinking semantics | Anthropic Messages API |
| Natural mixing of text, images, audio, and file parts | Gemini GenerateContent |
| DeepSeek with a simple chat-compatible interface | DeepSeek Chat Completion |
How This Relates to Frameworks
Vercel AI SDK, Pydantic AI, and LangChain all solve a similar problem: they wrap provider-specific APIs behind a shared interface.
They are heavier than the small adapter in this chapter:
- Vercel AI SDK / AI Gateway fits the TypeScript and frontend ecosystem, with routing and fallback support.
- Pydantic AI is Python-first, with model/provider abstractions tied closely to structured output.
- LangChain has the widest integration surface: chat models, routers, tools, retrievers, and more.
This tutorial avoids them at first not because they are bad, but because we want to see the bottom layer clearly. Once messages -> provider payload -> provider response -> text feels obvious, those frameworks are much easier to read.
References
- OpenAI — Text generation / Responses API: https://developers.openai.com/api/docs/guides/text
- OpenAI — Chat Completions API: https://platform.openai.com/docs/api-reference/chat/create
- Anthropic — Using the Messages API: https://platform.claude.com/docs/en/build-with-claude/working-with-messages
- Google — Gemini GenerateContent API: https://ai.google.dev/api/generate-content
- DeepSeek — Create Chat Completion: https://api-docs.deepseek.com/api/create-chat-completion
- Vercel — Models & Providers: https://vercel.com/docs/ai-gateway/models-and-providers
- Pydantic AI — Models overview: https://pydantic.dev/docs/ai/models/overview/
- LangChain — Chat model integrations: https://docs.langchain.com/oss/python/integrations/chat