Python for TypeScript Devs — Fast Track
You don't need to become a Python expert. You need Python good enough to read/write AI code. Target: 1 week of focused practice.
Why this page exists (and the one mental shift)
You have 7 years of TS/PHP. You already know closures, async, generics, the event loop, dependency injection, decorators (NestJS). So this is not "learn to program" — it's a transfer-learning problem. 90% of your knowledge maps over directly; the other 10% is where you'll lose hours if nobody tells you.
The single biggest mental shift, and the source of most "why is this slow / why did this crash" surprises:
- TypeScript types are erased at runtime. Python type hints are kept (in __annotations__) but the interpreter never enforces them, and the ecosystem leans on them far harder. x: int = "hello" runs fine in plain Python; nothing checks it. Pydantic and FastAPI are the exception: they read the hints at runtime via reflection and actually validate. So in AI code you'll see two worlds coexisting: un-enforced hints (most of the codebase, checked only by mypy/pyright in CI) and enforced hints (Pydantic models, FastAPI signatures). Knowing which world a given annotation lives in tells you whether a bad value blows up at the boundary or silently propagates.
- Python's async is cooperative and single-threaded, like Node, but there is no implicit event loop. In Node, top-level await/promises "just run". In Python you must explicitly start a loop (asyncio.run(main())), and a sync function calling an async one without await gets a coroutine object, not a result. That footgun has no TS equivalent (TS would at least give you a Promise you can .then). This is the #1 source of "my LLM call returned <coroutine object ...>" bugs.
Hold those two ideas and the rest of this page is muscle memory.
How a staff engineer reasons about "is this annotation enforced?"
When you read AI library code, classify every type hint into one of two buckets on sight — it tells you where the blast radius of a bad value is:
| Annotation lives on… | Enforced at runtime? | Where a bad value blows up |
|---|---|---|
| A plain function arg / local / @dataclass field | No (only mypy/pyright in CI) | Deep inside, far from the cause: a TypeError three frames down, or worse, silent wrong behavior |
| A pydantic.BaseModel field | Yes: __init__ validates via reflection | At the boundary, with a precise ValidationError naming the field |
| A FastAPI route signature (def handler(body: Req)) | Yes: FastAPI validates the request | At the HTTP boundary, returned to the caller as a 422 |
| A SQLAlchemy 2.0 Mapped[...] column | Partially: shapes the schema, not every assignment | At query/flush time |
The lesson for AI code specifically: put your LLM's output through a Pydantic model the instant it crosses your boundary. The model returns text; the moment you parse it into a BaseModel, a malformed field fails there with a named error, instead of propagating into your business logic as a None or a wrong-typed dict. This is the runtime-validation reflex you already have from zod at the edge of a TS service.
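The two buckets in the table can be seen with the standard library alone. This is a minimal sketch, not how Pydantic is implemented: check_fields is a hypothetical helper that only understands plain classes such as int and str, and Reply is an illustrative model. It shows what "a library reads the hints at runtime" means and where the failure surfaces.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class Reply:
    # Bucket 1: plain annotations. Python stores them but never checks them.
    text: str
    score: int


def check_fields(obj: object) -> None:
    """Hypothetical boundary check: compare each field to its annotated class.

    Only handles simple classes (int, str, ...), not generics like list[str].
    """
    for name, expected in get_type_hints(type(obj)).items():
        value = getattr(obj, name)
        if not isinstance(value, expected):
            raise TypeError(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )


# Un-enforced: this constructs without complaint and the bad value travels on.
# (mypy/pyright would flag it in CI; the interpreter does not.)
silent = Reply(text="ok", score="high")

# Enforced, in spirit: the same value fails at the boundary, by field name.
try:
    check_fields(silent)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

A real BaseModel does far more (coercion, nested models, generics), but the blast-radius argument is the same: the error names the field at the point of entry.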
Mental model — TS to Python mapping
| TypeScript | Python |
|---|---|
| const x = 5 | x = 5 |
| let x: number = 5 | x: int = 5 |
| function foo(a: string): boolean { return a.length > 0 } | def foo(a: str) -> bool: return len(a) > 0 |
| async function | async def |
| await | await |
| Array<T> / T[] | list[T] |
| Record<string, number> | dict[str, int] |
| interface User { ... } | class User(BaseModel): ... (with Pydantic) |
| type T = A \| B | T = A \| B (Python 3.10+) or Union[A, B] |
| import { X } from 'mod' | from mod import X |
| npm install | pip install / uv add |
| package.json | pyproject.toml |
| tsconfig.json | pyproject.toml [tool.ruff] + mypy.ini |
| try { } catch (e) { } | try: except Exception as e: |
| JSON.stringify(x) | json.dumps(x) |
| JSON.parse(s) | json.loads(s) |
| Vite / Webpack | (no equivalent: no bundling) |
| ESLint | Ruff |
| Prettier | Ruff format / Black |
Tools to use (modern Python 2026)
- Package manager: uv (NOT pip alone), from astral.sh, blazingly fast
- Linter/formatter: ruff (replaces flake8/isort/black)
- Type checker: mypy or pyright
- Environment: uv venv (venv created automatically)
- Test runner: pytest
- HTTP client: httpx (NOT requests: httpx ships an async client, requests is sync-only)
- Validation: pydantic v2
- Web framework: FastAPI (you already know this from Dravos)
The mental model: there is no bundler, and that changes everything
Coming from TS, the absence you'll feel most is the build step: there is no tsc/Vite pass that produces a self-contained artifact. Python ships source plus a resolved dependency graph; the "build" is reproducing an environment. Three consequences:
- The lockfile is the artifact. uv.lock (committed) is your package-lock.json. uv sync --frozen in CI/Docker reproduces the exact tree, the equivalent of npm ci. Without --frozen, uv may re-resolve and drift.
- The venv is the node_modules. It's a directory of installed packages, not bundled into your app. In Docker you copy pyproject.toml + uv.lock, run uv sync, then copy source: the same layer-caching trick as copying package.json before COPY . .
- pyproject.toml is package.json + tsconfig + .eslintrc + .prettierrc in one file. [project] is your deps, [tool.ruff] your lint/format config, [tool.mypy] your typecheck config. One file, many [tool.*] tables.
What a senior wires into CI on day one
The AI-code-specific reason this matters: mypy --strict is your only line of defense against the un-awaited-coroutine and wrong-content-block-type bugs that Python won't catch at runtime. Treat it like tsc --noEmit — a required gate, not optional polish.
# pyproject.toml
[tool.mypy]
strict = true # the closest thing to TS strict mode
warn_unreachable = true
[tool.ruff.lint]
select = ["E", "F", "I", "ASYNC", "B"]  # ASYNC catches blocking-in-async footguns

Set up a new Python project (cheat sheet)
# Init project
uv init my-project --python 3.12
cd my-project
# Add deps
uv add anthropic openai pydantic fastapi httpx
uv add --dev pytest ruff mypy
# Run
uv run python main.py
uv run pytest
uv run ruff check .
uv run mypy .

Async / await: same as TS but stricter
- Cannot mix sync and async cleanly. Once async, stay async. (This is "function coloring": same constraint as TS, but Python won't auto-wrap a sync call in a Promise to paper over it.)
- asyncio.gather() ≈ Promise.all().
- asyncio.gather(..., return_exceptions=True) ≈ Promise.allSettled().
- asyncio.as_completed() ≈ consuming a stream of Promises as they resolve (no direct Promise.race over many, but this is the idiom for "process whichever finishes first").
- Use httpx.AsyncClient(), not requests. requests is sync-only and blocks the loop: calling it from an async handler stalls every concurrent request, the classic Python-async production incident.
- Use asyncpg (or SQLAlchemy async) for Postgres; redis.asyncio for Redis.
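The three asyncio primitives from the list, side by side. A small sketch using asyncio.sleep as a stand-in for network calls; fetch and demo are illustrative names, not part of any library.

```python
import asyncio


async def fetch(name: str, delay: float, fail: bool = False) -> str:
    """Stand-in for a network call: sleeps, then returns or raises."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} failed")
    return name


async def demo() -> tuple[list[str], list[object], list[str]]:
    # Like Promise.all: results come back in argument order, not completion order.
    ordered = await asyncio.gather(fetch("slow", 0.05), fetch("fast", 0.01))

    # Like Promise.allSettled: exceptions are returned in place, not raised.
    settled = await asyncio.gather(
        fetch("ok", 0.01), fetch("bad", 0.01, fail=True), return_exceptions=True
    )

    # as_completed: iterate in completion order, whichever finishes first.
    finished = []
    for next_done in asyncio.as_completed([fetch("slow", 0.05), fetch("fast", 0.01)]):
        finished.append(await next_done)
    return list(ordered), list(settled), finished


ordered, settled, finished = asyncio.run(demo())
```

With return_exceptions=True you have to check each slot with isinstance before using it, which is exactly the discipline Promise.allSettled forces with its status field.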
The coloring footgun, concretely
import asyncio

async def get_completion() -> str:
    return "..."

# WRONG: no await, so result is a coroutine object, not a string.
# Python also emits "RuntimeWarning: coroutine ... was never awaited".
result = get_completion()  # <coroutine object ...>

# RIGHT: inside an async function
async def main() -> None:
    result = await get_completion()

# RIGHT: at the top level of a script
result = asyncio.run(get_completion())

In TS this class of bug surfaces as Promise<string> showing up where you wanted string, and the type checker catches it. In Python you only get a runtime warning (easy to miss in logs) and a wrong value. Run mypy/pyright; they flag un-awaited coroutines too.
Async failure modes you'll actually hit in production
gather is the easy 80%. The incidents come from the edges:
- gather is fail-fast by default. The first child exception propagates to the caller immediately; the other awaitables are not cancelled, they keep running but their results are lost. For "fan out N model calls, keep the successes" you want asyncio.gather(*tasks, return_exceptions=True) (≈ Promise.allSettled) and then filter; otherwise one rate-limited call throws away nine good responses. Decide per call site which semantics you want; the default is rarely what you want for LLM fan-out.
- A bare gather is unbounded concurrency. gather(*(call(d) for d in docs)) over 10,000 docs opens 10,000 in-flight requests and instantly trips a 429 (or OOMs the process). Bound it with an asyncio.Semaphore(N) around each call, or a worker-pool pattern. There is no Promise.all equivalent that throttles for you.
- Blocking the loop is invisible until load. Any sync call inside an async def (requests.get, a CPU-bound json.loads on a 50MB blob, time.sleep, a sync DB driver) stalls every concurrent coroutine, because it's one thread. Symptom: p50 latency fine in dev, p99 collapses under concurrency. Offload blocking calls with asyncio.to_thread(...), use a process pool for heavy CPU-bound work (threads still contend for the GIL), and use async drivers for I/O.
- Timeouts and cancellation. Wrap a flaky call in asyncio.timeout(30) (3.11+) so a hung model request doesn't pin a worker forever. Cancellation propagates as asyncio.CancelledError. Since Python 3.8 it derives from BaseException, so except Exception lets it through; you swallow it with a bare except:, with except BaseException, or by catching it and not re-raising, and any of those turns a clean shutdown into a hang.
- Don't create a client per request. AsyncAnthropic() holds an HTTP connection pool. Instantiate it once at module/app scope and reuse it; a fresh client per call defeats keep-alive and leaks connections.
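The first two failure modes and the timeout one fit in a single sketch: a semaphore caps in-flight calls, a per-call timeout bounds a hung request, and return_exceptions=True keeps the successes. Everything here is a stand-in: call_model simulates an LLM request, documents ending in "7" fail on purpose, and MAX_IN_FLIGHT / PER_CALL_TIMEOUT are assumed values you would tune against your own rate limit.

```python
import asyncio

MAX_IN_FLIGHT = 3        # assumption: tune against your real rate limit
PER_CALL_TIMEOUT = 0.5   # seconds; assumption for the demo

in_flight = 0
peak_in_flight = 0


async def call_model(doc: str) -> str:
    """Hypothetical stand-in for one LLM request."""
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    try:
        await asyncio.sleep(0.01)
        if doc.endswith("7"):
            raise RuntimeError(f"rate limited on {doc}")
        return doc.upper()
    finally:
        in_flight -= 1


async def bounded(sem: asyncio.Semaphore, doc: str) -> str:
    async with sem:  # at most MAX_IN_FLIGHT bodies run at once
        # wait_for works on older Pythons; asyncio.timeout(...) is the 3.11+ form.
        return await asyncio.wait_for(call_model(doc), PER_CALL_TIMEOUT)


async def fan_out(docs: list[str]) -> tuple[list[str], list[BaseException]]:
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    results = await asyncio.gather(
        *(bounded(sem, d) for d in docs), return_exceptions=True
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return ok, failed


ok, failed = asyncio.run(fan_out([f"doc{i}" for i in range(20)]))
```

The timeout sits inside the semaphore on purpose, so time spent waiting for a slot does not count against the call's budget.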
Why async matters disproportionately for AI code
LLM calls are the slowest thing your service does: seconds, not milliseconds, and almost entirely network-bound (you're waiting on a remote GPU). That makes them the textbook case for concurrency: a request that makes 5 independent model/tool calls should take ~max(call), not sum(call). This is the Promise.all reflex you already have, spelled asyncio.gather in Python, and the canonical pattern in the Anthropic SDK is AsyncAnthropic + asyncio.gather for parallel tool calls (see the Anthropic block below).
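The max-versus-sum claim is easy to measure. A small sketch with asyncio.sleep standing in for network-bound model calls; model_call and the latency values are made up for the demo.

```python
import asyncio
import time


async def model_call(seconds: float) -> float:
    """Hypothetical stand-in for a network-bound LLM call."""
    await asyncio.sleep(seconds)
    return seconds


async def sequential(latencies: list[float]) -> float:
    start = time.perf_counter()
    for s in latencies:
        await model_call(s)  # each call waits for the previous one
    return time.perf_counter() - start


async def concurrent(latencies: list[float]) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(model_call(s) for s in latencies))
    return time.perf_counter() - start


latencies = [0.05, 0.04, 0.03, 0.02, 0.01]
seq_elapsed = asyncio.run(sequential(latencies))  # roughly sum(latencies)
con_elapsed = asyncio.run(concurrent(latencies))  # roughly max(latencies)
print(f"sequential={seq_elapsed:.3f}s concurrent={con_elapsed:.3f}s")
```

This only holds while the calls are waiting on I/O; a blocking or CPU-bound call inside model_call would serialize them again.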
Servers: use the async client, not the sync one
The Anthropic Python SDK ships both Anthropic() (sync) and AsyncAnthropic() (async). In a NestJS-style server (FastAPI here), a sync SDK call inside an async def handler blocks the event loop for the entire multi-second model call — throughput collapses under load. Default to AsyncAnthropic on any server.
import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from env

async def summarize(doc: str) -> str:
    resp = await client.messages.create(
        model="claude-opus-4-8",  # flagship (5 USD / 25 USD per Mtok at 1M ctx)
        max_tokens=1024,
        messages=[{"role": "user", "content": f"Summarize:\n{doc}"}],
    )
    # resp.content is a list of blocks: narrow by type before reading .text
    return "".join(b.text for b in resp.content if b.type == "text")

async def summarize_many(docs: list[str]) -> list[str]:
    # 10 docs in ~one round-trip's wall-time, not 10x sequential
    return await asyncio.gather(*(summarize(d) for d in docs))

Note resp.content is a list of content blocks (a discriminated union: text, thinking, tool_use, …), not a bare string: the Python analog of TS's ContentBlock[]. resp.content[0].text works until the day a thinking block lands at index 0 and your code reads .text off the wrong block type. Narrow by b.type (or, for typed structured output, prefer client.messages.parse() with a Pydantic schema; see the Pydantic section).
The same call, production-grade
The snippet above is the teaching version. The version that survives a Monday-morning traffic spike adds five things a senior reviewer looks for: a configured client (retries + a per-call timeout), typed exception handling, streaming for large outputs, stop_reason checks, and cost logging via usage.
import logging

import anthropic
from anthropic import AsyncAnthropic

log = logging.getLogger(__name__)

# One client for the whole app. max_retries handles 429/5xx/overload with
# exponential backoff; timeout caps a single hung request.
client = AsyncAnthropic(max_retries=4, timeout=30.0)

async def summarize(doc: str) -> str:
    try:
        async with client.messages.stream(  # stream → no HTTP timeout on long output
            model="claude-opus-4-8",
            max_tokens=4096,
            messages=[{"role": "user", "content": f"Summarize:\n{doc}"}],
        ) as stream:
            msg = await stream.get_final_message()
    except anthropic.RateLimitError:
        log.warning("rate limited after retries")
        raise
    except anthropic.APIStatusError as e:  # 4xx/5xx with a status code
        log.error("anthropic API error %s: %s", e.status_code, e.message)
        raise

    if msg.stop_reason == "refusal":
        raise ValueError("model refused the request")

    # Log usage on every call: this is your cost and your prompt-cache hit rate.
    u = msg.usage
    log.info(
        "tokens in=%d out=%d cache_read=%d",
        u.input_tokens,
        u.output_tokens,
        getattr(u, "cache_read_input_tokens", None) or 0,  # may be absent or None
    )
    return "".join(b.text for b in msg.content if b.type == "text")

Why each piece earns its place:
- max_retries on the client, not a hand-rolled loop. The SDK already does exponential backoff on 429/5xx/529 overloaded. Re-implementing it is a code smell.
- Typed exceptions (RateLimitError, APIStatusError, APITimeoutError, OverloadedError), never if "429" in str(e). String-matching error messages is the Python equivalent of parsing an HTTP body with a regex.
- Stream for large max_tokens. Above ~16K output tokens a non-streaming call risks an SDK HTTP timeout; .stream() + get_final_message() sidesteps it and gives you the assembled message.
- stop_reason is load-bearing. refusal and max_tokens mean "the content is not what you asked for." Check before you use it.
- usage is your bill. Log input_tokens, output_tokens, and cache_read_input_tokens on every call; it's the only way to attribute cost and verify prompt caching is actually hitting.
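The stop_reason and block-narrowing checks can be unit-tested without touching the network. In this sketch Block and Message are hypothetical stand-ins that mirror only the fields used here (they are not the SDK's classes), and Truncated and final_text are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for an SDK content block."""
    type: str
    text: str = ""


@dataclass
class Message:
    """Hypothetical stand-in exposing only stop_reason and content."""
    stop_reason: str
    content: list[Block] = field(default_factory=list)


class Truncated(Exception):
    """Raised when the output hit the max_tokens limit."""


def final_text(msg: Message) -> str:
    # Check stop_reason first: the content is not trustworthy otherwise.
    if msg.stop_reason == "refusal":
        raise ValueError("model refused the request")
    if msg.stop_reason == "max_tokens":
        raise Truncated("output truncated; raise max_tokens or stream")
    # Narrow by type: non-text blocks (thinking, tool_use) carry no answer text.
    return "".join(b.text for b in msg.content if b.type == "text")


good = Message(
    "end_turn",
    [Block("thinking"), Block("text", "Hello"), Block("text", " world")],
)
answer = final_text(good)
```

Keeping this logic in one small function means every call site gets the same refusal and truncation handling instead of re-deriving it.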
Concepts that differ from TS
- No this: methods take self explicitly
- Decorators (@property, @staticmethod, @app.get("/")): like TS decorators but more common
- Context managers (with open(...) as f:): auto cleanup pattern
- List comprehensions: [x*2 for x in lst if x > 0]
- Generators: def gen(): yield 1; yield 2 → returns a lazy iterator
- f-strings: f"hello {name}" (like JS template literals, plus format specs such as f"{x:.2f}")
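All six concepts in one short, runnable file. The names (Celsius, tracked, countdown) are invented for the example.

```python
from contextlib import contextmanager
from typing import Iterator


class Celsius:
    def __init__(self, degrees: float) -> None:  # self is explicit; there is no this
        self.degrees = degrees

    @property  # read like an attribute: temp.fahrenheit
    def fahrenheit(self) -> float:
        return self.degrees * 9 / 5 + 32

    @staticmethod  # no self: a plain function living in the class namespace
    def from_fahrenheit(f: float) -> "Celsius":
        return Celsius((f - 32) * 5 / 9)


events: list[str] = []


@contextmanager
def tracked(name: str) -> Iterator[None]:
    events.append(f"open {name}")
    try:
        yield
    finally:  # runs even if the body raises
        events.append(f"close {name}")


def countdown(n: int) -> Iterator[int]:  # generator: values are produced lazily
    while n > 0:
        yield n
        n -= 1


with tracked("job"):
    doubled = [x * 2 for x in [3, -1, 4] if x > 0]  # list comprehension

label = f"{Celsius(100).fahrenheit:.1f}F after {list(countdown(3))}"  # f-string
```

The with block is the piece with the least TS intuition behind it: think try/finally packaged so the caller cannot forget the cleanup.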
Pydantic = your interface in Python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str | None = None

# Parse + validate
user = User(name="Achref", age=30)

# To JSON
user.model_dump_json()

# From dict
User.model_validate({"name": "Achref", "age": 30})

→ Pydantic is in EVERY Python AI lib. Master it.
Why Pydantic is the load-bearing part for AI code
In TS you'd reach for zod to validate an LLM's JSON output at runtime. Pydantic is that — but the Anthropic SDK integrates with it directly, so you rarely hand-roll the parse/validate/retry loop. The senior pattern is native structured outputs via client.messages.parse() with a Pydantic schema, not "ask for JSON in the prompt and json.loads() it":
from anthropic import AsyncAnthropic
from pydantic import BaseModel, Field

class Contact(BaseModel):
    name: str
    email: str | None = None
    interests: list[str] = Field(default_factory=list)
    demo_requested: bool = False

client = AsyncAnthropic()

async def extract(text: str) -> Contact:
    resp = await client.messages.parse(
        model="claude-opus-4-8",
        max_tokens=1024,
        messages=[{"role": "user", "content": f"Extract the contact:\n{text}"}],
        output_config={"format": Contact},  # schema-constrained decoding
    )
    # resp.parsed_output is None if the model refused: guard it (see below).
    # An explicit raise survives python -O, which strips assert statements.
    if resp.parsed_output is None:
        raise ValueError(f"no parsed output (stop_reason={resp.stop_reason})")
    return resp.parsed_output

Why this beats prompt-and-json.loads():
- The output is schema-constrained at decode time, not coaxed by prose. You don't get Here is the JSON:\n{...} preambles, trailing commas, or markdown fences to strip.
- One source of truth. The same Contact model validates the output and documents the contract; no second JSON Schema to keep in sync.
- default_factory=list, not = []. Mutable defaults are a Python footgun with no TS equivalent: a function default like def f(items=[]) is created once and shared across every call. Pydantic copies BaseModel field defaults for you, and a plain @dataclass rejects = [] with a ValueError, but the habit of never writing = [] is what saves you in function defaults and plain class attributes, where nothing guards you.
Two failure modes a staff engineer always handles:
- resp.parsed_output is None: the model can refuse (safety) and return stop_reason == "refusal", in which case there's nothing to parse. Branch on stop_reason before touching the parsed value.
- stop_reason == "max_tokens": the JSON got truncated; the parse fails. Raise max_tokens or stream. Never silently retry the same call.

Support is per model: structured outputs are supported on Opus 4.8, Sonnet 4.6, and Haiku 4.5. The first request with a new schema pays a one-time compilation cost; identical schemas hit a 24h cache after that.
🏋️ Exercises
Demanding and progressive. Each one is a real thing you'll build for agentic AI, not a toy. Do them in order; later ones assume the earlier scaffolding.
1. Port a NestJS endpoint to FastAPI + Pydantic, green CI
Goal: reproduce one real NestJS route (DTO validation, DB read, typed response) in FastAPI with uv + ruff + mypy --strict + pytest all passing.
Hint/Solution: NestJS class-validator DTO → Pydantic BaseModel request body; the route signature is the validation. Wire mypy --strict and ruff into a CI job and make it actually fail on a bad annotation before you call it done; that's the whole point. Add one pytest test using httpx.AsyncClient (with httpx.ASGITransport(app=app)) against the app.
2. Streaming Claude CLI with structured extraction
Goal: a typer CLI that takes a blob of text, calls claude-opus-4-8 via AsyncAnthropic, streams the tokens to the terminal as they arrive, and also returns a validated Pydantic object.
Hint/Solution: use async with client.messages.stream(...) and iterate text deltas for the live display, then await stream.get_final_message(). For the structured part, do a second client.messages.parse(output_config={"format": MyModel}) call, or extract from the streamed message. Handle stop_reason in {"refusal", "max_tokens"} explicitly; don't let a refusal crash with an AttributeError on parsed_output.
3. Bounded parallel fan-out that doesn't 429
Goal: summarize 500 documents concurrently. Cap in-flight requests at N so you never trip a rate limit, keep partial successes, and finish in roughly (500 / N) × per-call latency instead of the 500× sequential sum.
Hint/Solution: asyncio.Semaphore(N) (start N≈8) wrapping each summarize call; asyncio.gather(*tasks, return_exceptions=True) so one failure doesn't throw away the rest of the batch; partition results into successes vs isinstance(r, Exception). Defend your choice of N by measuring p99 latency and the 429 rate. gather without a semaphore is the wrong answer and you should be able to say why.
4. Break it, then fix it: the blocking-call incident
Goal: reproduce the classic "fine in dev, p99 collapses under load" incident, then fix it.
Hint/Solution: drop a requests.get(...) (or a time.sleep(2)) inside an async def handler and load-test it (hey/locust) at concurrency 50. Watch throughput crater because the single event-loop thread is blocked on every request. Fix path A: swap to httpx.AsyncClient. Fix path B (for blocking or CPU-bound sync code you can't make async): await asyncio.to_thread(...), or a process pool when the work is heavy enough to fight over the GIL. Prove the fix with the same load test. Bonus: enable ruff's ASYNC rules and show they flag the original.
5. Production-grade Claude client wrapper
Goal: a reusable async wrapper around AsyncAnthropic with retries, per-call timeout, typed-exception handling, usage logging, and prompt caching on a stable system prefix.
Hint/Solution: single module-scoped AsyncAnthropic(max_retries=4, timeout=30); catch RateLimitError / OverloadedError / APITimeoutError / APIStatusError distinctly; put cache_control: {"type": "ephemeral"} on the frozen system block and assert usage.cache_read_input_tokens > 0 on the second call. Log input_tokens/output_tokens/cache_read per call. Write a test that injects a 429 (monkeypatch the transport) and asserts the wrapper retried, not crashed.
6. Defend the number: token & cost budget
Goal: given a prompt + expected output size, defend a per-request cost figure and a max_tokens choice with real measurements.
Hint/Solution: count input tokens with client.messages.count_tokens(model="claude-opus-4-8", ...), not tiktoken (it's OpenAI's tokenizer and undercounts Claude by 15–20%+). Multiply input by $5/Mtok and output by $25/Mtok (Opus 4.8 at 1M context). Show how prompt caching changes the math (cache reads ≈ 0.1× input price) and what max_tokens you'd set so you neither truncate (stop_reason == "max_tokens") nor over-provision. The deliverable is a defensible number, not a guess.
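The arithmetic for exercise 6 can be sketched as a small function. The prices and the 0.1× cache-read multiplier are the figures quoted in the hint above, treated here as inputs to verify rather than constants; request_cost and the token counts are made up for the example, and the sketch deliberately ignores the cost of writing the cache on the first call.

```python
# Figures quoted in the exercise (USD per million tokens); verify before relying on them.
INPUT_PER_MTOK = 5.00
OUTPUT_PER_MTOK = 25.00
CACHE_READ_MULTIPLIER = 0.1  # cache reads ≈ 0.1x the input price


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Cost in USD for one request.

    cached_tokens is the part of input_tokens served from the prompt cache.
    """
    uncached = input_tokens - cached_tokens
    cost = (
        uncached * INPUT_PER_MTOK
        + cached_tokens * INPUT_PER_MTOK * CACHE_READ_MULTIPLIER
        + output_tokens * OUTPUT_PER_MTOK
    ) / 1_000_000
    return round(cost, 6)


# 10k input tokens, 1k output tokens, no cache hit.
cold = request_cost(input_tokens=10_000, output_tokens=1_000)

# Same request with an 8k-token system prefix read from cache.
warm = request_cost(input_tokens=10_000, output_tokens=1_000, cached_tokens=8_000)

print(f"cold=${cold:.4f} warm=${warm:.4f}")
```

In the real exercise the token counts come from count_tokens and from usage on actual responses, not from guesses like these.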
🎤 In interviews
Short, senior-level questions this topic invites, each with the one-line answer.
- "Python type hints aren't enforced at runtime, same as TS types, so why does Pydantic 'work'?" The interpreter stores the annotations but ignores them; Pydantic reads them via reflection in __init__ and validates at runtime. The hints are enforced by the library, not the interpreter, so a BaseModel/FastAPI boundary validates while a plain function arg only gets checked by mypy in CI.
- "A teammate calls an async def and gets <coroutine object> instead of a result. What happened and how does TS differ?" They forgot to await (or to run it on a loop). Python returns the un-awaited coroutine and emits only a RuntimeWarning; TS would have given a Promise the type checker flags, so run mypy/pyright, which catch un-awaited coroutines too.
- "asyncio.gather over 5,000 LLM calls: what breaks, and what do you change?" Unbounded concurrency trips a 429 (and can OOM), and with the default fail-fast the first error propagates and the good results are lost. Bound with a Semaphore and pass return_exceptions=True, then filter successes.
- "Why is AsyncAnthropic the default on a FastAPI server, and what's the worst case if you use the sync client?" A sync SDK call inside an async def blocks the single event-loop thread for the whole multi-second model call, so throughput collapses under concurrency even though it looks fine in dev. Same hazard as requests in an async handler.
- "How do you estimate the token cost of a Claude prompt, and what's the common mistake?" Call messages.count_tokens with the actual model ID (counts are model-specific); the mistake is reaching for tiktoken, which is OpenAI's tokenizer and is wrong for Claude.
Resources
- uv docs: astral.sh/uv
- Ruff: astral.sh/ruff
- FastAPI tutorial: fastapi.tiangolo.com (you might know this)
- Pydantic v2 docs: docs.pydantic.dev
- Real Python: realpython.com (quality tutorials)
When you need TypeScript instead
You can stay 80% in TypeScript for:
- Frontend (Vercel AI SDK + Next.js)
- MCP servers (Anthropic ships a TS SDK)
- Simple OpenAI/Anthropic API wrappers
- Edge functions (Cloudflare Workers, Vercel)
You NEED Python for:
- LangGraph (Python is first-class, TS is younger)
- HuggingFace ecosystem
- Ragas eval
- Most academic / research code
- Data processing at scale
→ Default to TS unless one of the above forces Python.