
00: Start With One LLM API Call

Do not start with agents.

The first layer of a chatbot is not an agent, and it is not a framework. It is one request:

text = model.generate(messages).text

The user sends text. Your program turns that text into JSON for a provider, sends it, then extracts text from the provider's response JSON. This sounds mundane. But if this layer is fuzzy, tool calling, agent loops, and multi-agent systems will feel like floating vocabulary.

This chapter breaks the minimal chatbot into two pieces:

  1. Messages: what the user said and what instructions the developer supplied.
  2. Provider: how OpenAI, Anthropic, Gemini, and DeepSeek expect requests to look, and where each one puts the answer.

Understand this layer first. Then agent loops become much less mysterious.

A Chatbot Is an API Contract

Imagine a travel assistant. The user says:

Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.

At runtime, the program does roughly this:

flowchart LR
    U["User input"] --> M["Internal messages"]
    M --> P["provider adapter"]
    P --> A["model API"]
    A --> R["provider JSON response"]
    R --> T["extract text"]
    T --> O["show answer"]

A simple internal message shape might be:

messages = [
    {
        "role": "system",
        "content": "You are a careful travel assistant. Do not invent live facts.",
    },
    {
        "role": "user",
        "content": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.",
    },
]

That messages shape is ours. Providers do not all accept it as-is.

OpenAI's newer Responses endpoint takes input; its older but still common Chat Completions endpoint takes messages. Anthropic keeps system outside the messages array. Gemini uses contents and parts. DeepSeek is close to OpenAI Chat Completions, but not identical.

So the first lesson is not “learn one SDK.” It is: keep your application logic away from provider-specific request and response shapes.

Same Question, Different APIs

Every example below asks the same thing:

Plan a relaxed one-day Hangzhou trip.

The model names are illustrative. In real applications, put them in configuration or environment variables.
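One way to keep model names out of code is a small lookup with an environment override. This is a minimal sketch: the function name model_for and the environment variable names (OPENAI_MODEL and so on) are assumptions made for illustration, and the defaults simply repeat the illustrative model names used in this chapter.

```python
import os

# Defaults repeat this chapter's illustrative model names.
DEFAULT_MODELS = {
    "openai": "gpt-5.5",
    "anthropic": "claude-sonnet-4-5",
    "gemini": "gemini-2.5-flash",
    "deepseek": "deepseek-chat",
}


def model_for(provider: str) -> str:
    """Return the model name for a provider, preferring an env override."""
    # Hypothetical naming scheme: OPENAI_MODEL, ANTHROPIC_MODEL, ...
    env_name = f"{provider.upper()}_MODEL"
    return os.environ.get(env_name, DEFAULT_MODELS[provider])
```

With this in place, switching models becomes a deployment setting rather than a code change.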

OpenAI: Responses API

For new OpenAI projects, the Responses API is usually the first interface to inspect. It groups text, multimodal input, tools, and response state under one endpoint.

The HTTP shape is roughly:

curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "instructions": "You are a careful travel assistant. Do not invent live facts.",
    "input": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking."
  }'

With the Python SDK:

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",
    instructions="You are a careful travel assistant. Do not invent live facts.",
    input="Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.",
)

text = response.output_text

Notice the shape:

  • instructions is where developer/system guidance goes.
  • input can be a plain string or a richer list of input items.
  • text is commonly read from response.output_text.
  • server-side continuation can use features such as previous_response_id or conversations, but this tutorial starts with local state.

Why avoid server-side state at first? Because agent design is easier to learn when message history, tool observations, and stop conditions are visible in your own code.

OpenAI: Chat Completions API

Chat Completions is older and still widespread. Many tutorials and OpenAI-compatible providers use its shape.

The HTTP shape:

curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2",
    "messages": [
      {
        "role": "developer",
        "content": "You are a careful travel assistant. Do not invent live facts."
      },
      {
        "role": "user",
        "content": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking."
      }
    ]
  }'

With the Python SDK:

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {
            "role": "developer",
            "content": "You are a careful travel assistant. Do not invent live facts.",
        },
        {
            "role": "user",
            "content": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.",
        },
    ],
)

text = completion.choices[0].message.content

The difference is practical:

  • Responses feels like “give the model input, receive a response object.”
  • Chat Completions feels like “send a transcript, receive an assistant message.”

For ordinary chat, both can work. For newer OpenAI-native capabilities, start with Responses. For many OpenAI-compatible services, the Chat Completions shape remains common.
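The two OpenAI shapes can be compared side by side by building both payloads from the same internal messages. This is a sketch under assumptions: the function names are hypothetical, the internal messages are plain dicts with role and content, and multiple system messages are joined with blank lines, which is a choice of this example rather than anything the API requires.

```python
GUIDANCE_ROLES = ("system", "developer")


def to_chat_completions_payload(model: str, messages: list[dict]) -> dict:
    """Chat Completions: the whole transcript goes under `messages`."""
    return {"model": model, "messages": list(messages)}


def to_responses_payload(model: str, messages: list[dict]) -> dict:
    """Responses: guidance goes under `instructions`, the rest under `input`."""
    instructions = "\n\n".join(
        m["content"] for m in messages if m["role"] in GUIDANCE_ROLES
    )
    turns = [m for m in messages if m["role"] not in GUIDANCE_ROLES]
    payload = {"model": model, "input": turns}
    if instructions:
        payload["instructions"] = instructions
    return payload
```

Both functions are pure: they build dictionaries and send nothing, so the shapes can be inspected or unit-tested without an API key.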

Anthropic: Messages API

Anthropic's Messages API for Claude also takes a messages array, but its JSON is not the same as OpenAI Chat Completions.

Python SDK shape:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="You are a careful travel assistant. Do not invent live facts.",
    messages=[
        {
            "role": "user",
            "content": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.",
        }
    ],
)

text = message.content[0].text

Three details matter:

  • system is a top-level parameter, not a system role inside messages.
  • max_tokens is explicit.
  • content is a list of blocks; text is only one possible block type.

Anthropic's guide also describes Messages as stateless: for multi-turn conversation, send the full conversation history each time. That becomes important in the next chapter.
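Because content is a list of blocks, reading content[0].text assumes the first block is a text block. A slightly more defensive reader, sketched here, joins every text block and skips the others. The helper name anthropic_text is hypothetical, and the blocks below are hand-built stand-ins for SDK objects, not real API output.

```python
from types import SimpleNamespace


def anthropic_text(content_blocks) -> str:
    """Join the text of every block whose type is "text"; skip other blocks."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Hand-built stand-ins for response blocks, for illustration only.
blocks = [
    SimpleNamespace(type="text", text="Morning: Longjing tea village. "),
    SimpleNamespace(type="tool_use", id="toolu_example", name="get_weather", input={}),
    SimpleNamespace(type="text", text="Afternoon: an easy West Lake walk."),
]
plan = anthropic_text(blocks)
```

For the single-request case in this chapter the difference rarely matters, but it starts to matter once tool-use blocks appear in later chapters.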

Google Gemini: GenerateContent API

Gemini uses another shape. The input is contents, and each content item contains parts.

Common Python SDK shape:

from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.",
)

text = response.text

A REST-like payload looks closer to this:

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking."
        }
      ]
    }
  ]
}

Gemini's multimodal inputs also live in parts: text is a part, and images, audio, or file references can be parts too. For this tutorial, remember the shape: Gemini is not messages -> choices; it is closer to contents -> candidates. SDKs often expose response.text for the simple text case.
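The translation from internal messages to the contents and parts shape can be sketched as a pure function. Assumptions in this sketch: the function name to_gemini_payload is hypothetical, internal messages are plain dicts, system messages are collected into a systemInstruction field, and the assistant role is renamed to model, which is the name Gemini uses for model turns.

```python
def to_gemini_payload(messages: list[dict]) -> dict:
    """Translate role/content messages into a contents/parts payload."""
    system_texts = [m["content"] for m in messages if m["role"] == "system"]
    contents = [
        {
            # Gemini uses "model" where the internal shape says "assistant".
            "role": "model" if m["role"] == "assistant" else "user",
            "parts": [{"text": m["content"]}],
        }
        for m in messages
        if m["role"] != "system"
    ]
    payload = {"contents": contents}
    if system_texts:
        # System guidance lives outside `contents`.
        payload["systemInstruction"] = {
            "parts": [{"text": "\n\n".join(system_texts)}]
        }
    return payload
```

The function only builds the dictionary; sending it is left to whichever HTTP client or SDK the application already uses.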

DeepSeek: Close to Chat Completions, Not Identical

DeepSeek's Chat Completion endpoint is close to OpenAI Chat Completions. A common approach is to use the OpenAI SDK with a different base_url:

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

completion = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {
            "role": "system",
            "content": "You are a careful travel assistant. Do not invent live facts.",
        },
        {
            "role": "user",
            "content": "Plan a relaxed one-day Hangzhou trip. I like tea, local food, and easy walking.",
        },
    ],
)

text = completion.choices[0].message.content

“OpenAI-compatible” usually means the basic chat shape is similar. It does not guarantee identical tool semantics, reasoning fields, streaming events, errors, or provider-specific metadata.

LangChain's docs make a similar point: when you use a generic OpenAI-compatible wrapper, non-standard provider fields may not be extracted or preserved. If your agent depends on a provider-specific field, do not hide it too deeply.
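One way to avoid hiding provider-specific fields too deeply is to read them defensively at the adapter boundary. The sketch below assumes a field named reasoning_content on the assistant message, as an example of a field that some OpenAI-compatible providers add and others do not; treat both the field name and the helper name read_message as assumptions to check against your provider's documentation.

```python
from types import SimpleNamespace


def read_message(message) -> dict:
    """Read the standard content plus one optional provider-specific field."""
    return {
        "text": message.content or "",
        # Assumed field name; None when the provider does not send it.
        "reasoning": getattr(message, "reasoning_content", None),
    }


# Hand-built stand-ins for SDK message objects.
plain = SimpleNamespace(content="Start at the tea village.")
extended = SimpleNamespace(
    content="Start at the tea village.",
    reasoning_content="The user prefers easy walking.",
)
```

Calling read_message(plain) and read_message(extended) returns the same text; only the second carries a non-empty reasoning value, so code that needs the field can tell whether it was present.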

Role and Field Mapping

The same internal concept gets different names across providers. When writing an adapter, I like to start with a mapping table:

| Internal concept | OpenAI Responses | OpenAI Chat Completions | Anthropic Messages | Gemini GenerateContent | DeepSeek Chat |
| --- | --- | --- | --- | --- | --- |
| Developer instruction | instructions | developer or system message | top-level system | systemInstruction / SDK config | system message |
| User input | input | messages[].role=user | messages[].role=user | contents[].role=user + parts | messages[].role=user |
| Model reply | response output item | choices[0].message | content blocks | candidates / SDK text | choices[0].message |
| Multi-turn history | local history or response/conversation continuation | send messages each time | send full messages each time | send contents each time | send messages each time |
| Common text path | response.output_text | choices[0].message.content | message.content[0].text | response.text | choices[0].message.content |

The table is not something to memorize. It is a reminder that agent code should talk about instructions, user input, model replies, history, and tool observations. Field names belong in adapters.
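The "common text path" row can be written down as a small dispatch table over raw response JSON. This is a sketch: the provider keys and function names are hypothetical, the bodies are assumed to be already-parsed dictionaries, and only the simple text case is handled (no tool calls, no safety blocks, no empty candidate lists).

```python
def _openai_responses(body: dict) -> str:
    # Join every output_text part found in message items.
    return "".join(
        part["text"]
        for item in body.get("output", [])
        if item.get("type") == "message"
        for part in item.get("content", [])
        if part.get("type") == "output_text"
    )


def _chat_completions(body: dict) -> str:
    return body["choices"][0]["message"]["content"] or ""


def _anthropic(body: dict) -> str:
    return "".join(b["text"] for b in body["content"] if b["type"] == "text")


def _gemini(body: dict) -> str:
    parts = body["candidates"][0]["content"]["parts"]
    return "".join(p.get("text", "") for p in parts)


# Hypothetical provider keys; DeepSeek reuses the Chat Completions path.
TEXT_EXTRACTORS = {
    "openai_responses": _openai_responses,
    "openai_chat": _chat_completions,
    "anthropic": _anthropic,
    "gemini": _gemini,
    "deepseek": _chat_completions,
}


def extract_text(provider: str, body: dict) -> str:
    """Return the reply text from a parsed provider response body."""
    return TEXT_EXTRACTORS[provider](body)
```

Everything provider-specific sits in one dictionary, so the rest of the code only ever calls extract_text().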

Why Write a Provider Adapter

If application code directly reads provider responses, it quickly gets messy:

text = response.output_text
text = completion.choices[0].message.content
text = message.content[0].text
text = response.text

All four lines mean “extract the final text.” They should not leak into the rest of your agent code.

Start with a small interface:

from dataclasses import dataclass
from typing import Literal, Protocol


@dataclass
class ChatMessage:
    role: Literal["system", "user", "assistant"]
    content: str


@dataclass
class ChatResult:
    text: str
    raw: object | None = None
    usage: dict | None = None
    stop_reason: str | None = None
    request_id: str | None = None


class ChatModel(Protocol):
    def generate(self, messages: list[ChatMessage]) -> ChatResult:
        ...

The provider adapter translates:

class OpenAIResponsesModel:
    def __init__(self, client, model: str):
        self.client = client
        self.model = model

    def generate(self, messages: list[ChatMessage]) -> ChatResult:
        instructions, input_text = split_system_and_user(messages)
        response = self.client.responses.create(
            model=self.model,
            instructions=instructions,
            input=input_text,
        )
        return ChatResult(
            text=response.output_text,
            raw=response,
        )

Here, split_system_and_user() is a small helper: it moves internal system messages into instructions and turns the user message into input. In a real app, it also needs to handle multi-turn history; this chapter stays with one request.

This adapter does not plan, call tools, or loop. It only turns our ChatMessage objects into provider-specific payloads, then turns the provider response into ChatResult.

Keep this layer thin. Thin layers are easier to reason about.

Do Not Over-Hide Provider Details

For teaching, returning a string is fine:

def generate(messages) -> str:
    ...

For a real application, I prefer returning ChatResult, because later agent code needs:

  • usage: token counts for the call, which determine its cost.
  • stop_reason: whether the model stopped normally or hit a limit.
  • request_id: useful when debugging production incidents.
  • raw: a place to inspect tool calls, reasoning fields, safety blocks, or provider-specific metadata.

You will not use all of these fields in chapter one. Just avoid throwing them away too early.

The Minimal Chatbot

Now the actual chatbot is small:

def reply(model: ChatModel, user_text: str) -> str:
    result = model.generate([
        ChatMessage(
            role="system",
            content="You are a careful travel assistant. Do not invent live facts.",
        ),
        ChatMessage(role="user", content=user_text),
    ])
    return result.text

This is the first layer. It can answer once, but it is not an agent.

Why not?

  • It does not remember the previous turn.
  • It does not enforce structured output such as stable JSON.
  • It cannot check the weather.
  • It cannot choose the next step after seeing a tool result.
  • It has no stop condition because there is no loop.

So when the user asks:

What did I say I liked?

the minimal chatbot is stuck. It only sees the current request.
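The missing memory is easy to observe without calling any provider. The sketch below swaps in a hypothetical RecordingModel that makes no network calls and simply records what it was sent; the ChatMessage and ChatResult classes are trimmed copies of the ones above so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class ChatMessage:
    role: str
    content: str


@dataclass
class ChatResult:
    text: str


@dataclass
class RecordingModel:
    """Hypothetical stand-in for a provider adapter; records each request."""
    seen: list = field(default_factory=list)

    def generate(self, messages: list[ChatMessage]) -> ChatResult:
        self.seen.append(list(messages))
        return ChatResult(text=f"(model saw {len(messages)} messages)")


def reply(model, user_text: str) -> str:
    result = model.generate([
        ChatMessage(
            role="system",
            content="You are a careful travel assistant. Do not invent live facts.",
        ),
        ChatMessage(role="user", content=user_text),
    ])
    return result.text


model = RecordingModel()
reply(model, "Plan a relaxed one-day Hangzhou trip. I like tea.")
reply(model, "What did I say I liked?")

# The second request holds only the system message and the new question.
second_request = model.seen[1]
```

Inspecting second_request shows two messages and no trace of the first turn, which is exactly why the model cannot answer the follow-up.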

The next layer is conversation history, covered in 01: Conversation History.

API Choice Cheat Sheet

| If you want... | Start with... |
| --- | --- |
| A new OpenAI project with newer OpenAI-native tools, multimodal input, or state features | OpenAI Responses API |
| Broad compatibility with OpenAI-compatible providers | Chat Completions shape |
| Direct Claude usage with Claude block/tool/thinking semantics | Anthropic Messages API |
| Natural mixing of text, images, audio, and file parts | Gemini GenerateContent |
| DeepSeek with a simple chat-compatible interface | DeepSeek Chat Completion |

How This Relates to Frameworks

Vercel AI SDK, Pydantic AI, and LangChain all solve a similar problem: they wrap provider-specific APIs behind a shared interface.

They are heavier than the small adapter in this chapter:

  • Vercel AI SDK / AI Gateway fits the TypeScript and frontend ecosystem, with routing and fallback support.
  • Pydantic AI is Python-first, with model/provider abstractions tied closely to structured output.
  • LangChain has the widest integration surface: chat models, routers, tools, retrievers, and more.

This tutorial avoids them at first not because they are bad, but because we want to see the bottom layer clearly. Once messages -> provider payload -> provider response -> text feels obvious, those frameworks are much easier to read.

References

  • OpenAI — Text generation / Responses API: https://developers.openai.com/api/docs/guides/text
  • OpenAI — Chat Completions API: https://platform.openai.com/docs/api-reference/chat/create
  • Anthropic — Using the Messages API: https://platform.claude.com/docs/en/build-with-claude/working-with-messages
  • Google — Gemini GenerateContent API: https://ai.google.dev/api/generate-content
  • DeepSeek — Create Chat Completion: https://api-docs.deepseek.com/api/create-chat-completion
  • Vercel — Models & Providers: https://vercel.com/docs/ai-gateway/models-and-providers
  • Pydantic AI — Models overview: https://pydantic.dev/docs/ai/models/overview/
  • LangChain — Chat model integrations: https://docs.langchain.com/oss/python/integrations/chat