Python for TypeScript Devs — Fast Track

You don't need to become a Python expert. You need Python good enough to read/write AI code. Target: 1 week of focused practice.

Why this page exists (and the one mental shift)

You have 7 years of TS/PHP. You already know closures, async, generics, the event loop, dependency injection, decorators (NestJS). So this is not "learn to program" — it's a transfer-learning problem. 90% of your knowledge maps over directly; the other 10% is where you'll lose hours if nobody tells you.

The single biggest mental shift, and the source of most "why is this slow / why did this crash" surprises:

  • TypeScript types are erased at compile time, so they never exist at runtime. Python type hints are not erased (they stay readable in __annotations__), but the interpreter never enforces them: x: int = "hello" runs fine in plain Python; nothing checks it. Pydantic and FastAPI are the exception: they read the hints at runtime via introspection and actually validate. So in AI code you'll see two worlds coexisting: un-enforced hints (most of the codebase, checked only by mypy/pyright in CI) and enforced hints (Pydantic models, FastAPI signatures). Knowing which world a given annotation lives in tells you whether a bad value blows up at the boundary or silently propagates.
  • Python's async is cooperative and single-threaded, like Node — but there is no implicit event loop. In Node, top-level await/promises "just run". In Python you must explicitly start a loop (asyncio.run(main())) and a sync function calling an async one without await gets a coroutine object, not a result — a footgun with no TS equivalent (TS would at least give you a Promise you can .then). This is the #1 source of "my LLM call returned <coroutine object ...>" bugs.

Hold those two ideas and the rest of this page is muscle memory.
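
To make the first idea concrete, here is a minimal stdlib-only sketch. It shows that the interpreter ignores a wrong annotation, and that a library can still read the hints and enforce them itself. The check_hints helper is hypothetical and far simpler than what Pydantic really does; it only understands plain classes such as int and str.

```python
import inspect
from typing import get_type_hints


def double(n: int) -> int:
    # The hint says int, but nothing checks it at call time.
    return n * 2


def check_hints(func, *args):
    """Hypothetical helper: validate positional args against func's hints."""
    hints = get_type_hints(func)
    params = list(inspect.signature(func).parameters)
    for name, value in zip(params, args):
        expected = hints.get(name)
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{name} must be {expected.__name__}, got {type(value).__name__}"
            )
    return func(*args)


print(double("ab"))            # un-enforced world: runs and returns "abab"
print(double.__annotations__)  # the hints are still there to be read

try:
    check_hints(double, "ab")  # enforced world: fails at the boundary
except TypeError as exc:
    print(exc)
```

The point is only the mechanism: enforcement is something a library opts into by reading the annotations, not something the language does for you.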

How a staff engineer reasons about "is this annotation enforced?"

When you read AI library code, classify every type hint into one of two buckets on sight — it tells you where the blast radius of a bad value is:

Annotation lives on… | Enforced at runtime? | Where a bad value blows up
A plain function arg / local / @dataclass field | No (only mypy/pyright in CI) | Deep inside, far from the cause: a TypeError three frames down, or worse, silent wrong behavior
A pydantic.BaseModel field | Yes: __init__ validates via reflection | At the boundary, with a precise ValidationError naming the field
A FastAPI route signature (def handler(body: Req)) | Yes: FastAPI validates the request | At the HTTP boundary, returned to the caller as a 422
A SQLAlchemy 2.0 Mapped[...] column | Partially: shapes the schema, not every assignment | At query/flush time

The lesson for AI code specifically: put your LLM's output through a Pydantic model the instant it crosses your boundary. The model returns text; the moment you parse it into a BaseModel, a malformed field fails there with a named error, instead of propagating into your business logic as a None or a wrong-typed dict. This is the runtime-validation reflex you already have from zod at the edge of a TS service.

Mental model — TS to Python mapping

TypeScript | Python
const x = 5 | x = 5
let x: number = 5 | x: int = 5
function foo(a: string): boolean { return a.length > 0 } | def foo(a: str) -> bool: return len(a) > 0
async function | async def
await | await
Array<T> / T[] | list[T]
Record<string, number> | dict[str, int]
interface User { ... } | class User(BaseModel): ... (with Pydantic)
type T = A | B | T = A | B (Python 3.10+) or Union[A, B]
import { X } from 'mod' | from mod import X
npm install | pip install / uv add
package.json | pyproject.toml
tsconfig.json | pyproject.toml [tool.ruff] + mypy.ini
try { } catch (e) { } | try: ... except Exception as e: ...
JSON.stringify(x) | json.dumps(x)
JSON.parse(s) | json.loads(s)
Vite / Webpack | (no equivalent, no bundling)
ESLint | Ruff
Prettier | Ruff format / Black

Tools to use (modern Python 2026)

  • Package manager: uv (not pip alone), from Astral (astral.sh), very fast
  • Linter/formatter: ruff, replaces flake8/isort/black
  • Type checker: mypy or pyright
  • Environment: uv venv (uv sync and uv run create the venv automatically)
  • Test runner: pytest
  • HTTP client: httpx (not requests: httpx ships an async client, requests is sync-only)
  • Validation: pydantic v2
  • Web framework: FastAPI (you already know this from Dravos)

The mental model: there is no bundler, and that changes everything

Coming from TS, the absence you'll feel most is that there's no tsc/Vite step that produces a self-contained artifact. Python ships source plus a resolved dependency graph; the "build" is reproducing an environment. Three consequences:

  • The lockfile is the artifact. uv.lock (committed) is your package-lock.json. uv sync --frozen in CI/Docker reproduces the exact tree — the equivalent of npm ci. Without --frozen, uv may re-resolve and drift.
  • The venv is the node_modules. It's a directory of installed packages, not bundled into your app. In Docker you copy pyproject.toml + uv.lock, run uv sync, then copy source: the same layer-caching trick as copying package.json before COPY . .
  • pyproject.toml is package.json + tsconfig + .eslintrc + .prettierrc in one file. [project] is your deps, [tool.ruff] your lint/format config, [tool.mypy] your typecheck config. One file, many [tool.*] tables.

What a senior wires into CI on day one

The AI-code-specific reason this matters: mypy --strict is your only line of defense against the un-awaited-coroutine and wrong-content-block-type bugs that Python won't catch at runtime. Treat it like tsc --noEmit — a required gate, not optional polish.

toml
# pyproject.toml
[tool.mypy]
strict = true            # the closest thing to TS strict mode
warn_unreachable = true

[tool.ruff.lint]
select = ["E", "F", "I", "ASYNC", "B"]  # ASYNC catches blocking-in-async footguns

Set up a new Python project (cheat sheet)

bash
# Init project
uv init my-project --python 3.12
cd my-project

# Add deps
uv add anthropic openai pydantic fastapi httpx
uv add --dev pytest ruff mypy

# Run
uv run python main.py
uv run pytest
uv run ruff check .
uv run mypy .

Async / await — same as TS but stricter

  • Cannot mix sync and async cleanly. Once async, stay async. (This is "function coloring" — same constraint as TS, but Python won't auto-wrap a sync call in a Promise to paper over it.)
  • asyncio.gather() ≈ Promise.all(). asyncio.gather(..., return_exceptions=True) ≈ Promise.allSettled().
  • asyncio.as_completed() ≈ consuming a stream of Promises as they resolve (no direct Promise.race over many, but this is the idiom for "process whichever finishes first").
  • Use httpx.AsyncClient() not requests. requests is sync-only and blocks the loop — calling it from an async handler stalls every concurrent request, the classic Python-async production incident.
  • Use asyncpg (or SQLAlchemy async) for Postgres; redis.asyncio for Redis.
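
The gather and as_completed mappings above can be sketched with the standard library alone. fake_call is a hypothetical stand-in for a network call; it sleeps instead of contacting any real service.

```python
import asyncio


async def fake_call(name: str, delay: float, fail: bool = False) -> str:
    # Stand-in for a network call (hypothetical; no real API is contacted).
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} failed")
    return name


async def main() -> tuple[list[object], list[str]]:
    # Promise.allSettled analog: values and exceptions come back in input order.
    settled = await asyncio.gather(
        fake_call("a", 0.02),
        fake_call("b", 0.01, fail=True),
        return_exceptions=True,
    )

    # as_completed: handle results in completion order, fastest first.
    finished = []
    for next_done in asyncio.as_completed(
        [fake_call("slow", 0.05), fake_call("fast", 0.01)]
    ):
        finished.append(await next_done)
    return settled, finished


settled, finished = asyncio.run(main())
print(settled)
print(finished)
```

Note that gather preserves input order regardless of which call finishes first, while as_completed deliberately does not.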

The coloring footgun, concretely

python
async def get_completion() -> str:
    return "..."

# WRONG: no await, so result is a coroutine object, not a string.
result = get_completion()          # <coroutine object ...>; emits a RuntimeWarning ("never awaited") when discarded

# RIGHT — inside an async function
result = await get_completion()

# RIGHT — at the top level of a script
import asyncio
result = asyncio.run(get_completion())

In TS this class of bug surfaces as Promise<string> showing up where you wanted string — the type checker catches it. In Python you only get a runtime warning (easy to miss in logs) and a wrong value. Run mypy/pyright; they flag un-awaited coroutines too.

Async failure modes you'll actually hit in production

gather is the easy 80%. The incidents come from the edges:

  • gather propagates the first exception by default. The first child exception is raised to the caller straight away; the other awaitables are not cancelled and keep running, but their results are thrown away. For "fan out N model calls, keep the successes" you want asyncio.gather(*tasks, return_exceptions=True) (≈ Promise.allSettled) and then filter, otherwise one rate-limited call costs you nine good responses. Decide per call site which semantics you want; the default is rarely right for LLM fan-out.
  • A bare gather is unbounded concurrency. gather(*(call(d) for d in docs)) over 10,000 docs opens 10,000 in-flight requests and instantly trips a 429 (or exhausts the process's memory). Bound it with an asyncio.Semaphore(N) around each call, or a worker-pool pattern. There is no Promise.all equivalent that throttles for you.
  • Blocking the loop is invisible until load. Any sync call inside an async def (requests.get, a CPU-bound json.loads on a 50MB blob, time.sleep, a sync DB driver) stalls every concurrent coroutine, because it's one thread. Symptom: p50 latency fine in dev, p99 collapses under concurrency. Move blocking calls off the loop with asyncio.to_thread(...), use a process pool for heavy CPU-bound work (threads still share the GIL), and use async drivers for I/O.
  • Timeouts and cancellation. Wrap a flaky call in asyncio.timeout(30) (3.11+) so a hung model request doesn't pin a worker forever. Cancellation propagates as asyncio.CancelledError. Since Python 3.8 it derives from BaseException, so except Exception lets it through; it's a bare except: or an except BaseException without a re-raise that swallows it and turns a clean shutdown into a hang.
  • Don't create a client per request. AsyncAnthropic() holds an HTTP connection pool. Instantiate it once at module/app scope and reuse it; a fresh client per call defeats keep-alive and leaks connections.
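
A minimal sketch of bounded fan-out that combines three of the points above: a semaphore cap, a per-call timeout, and return_exceptions=True. call_model is a hypothetical stand-in that sleeps instead of calling an API, MAX_IN_FLIGHT is an arbitrary value, and the active/peak counters exist only to show that the cap holds.

```python
import asyncio

MAX_IN_FLIGHT = 3  # assumption: tune against your real rate limit
active = 0         # instrumentation only
peak = 0


async def call_model(doc: str) -> str:
    # Stand-in for an LLM request (hypothetical; sleeps instead of calling an API).
    global active, peak
    active += 1
    peak = max(peak, active)
    try:
        await asyncio.sleep(0.01)
        if doc == "bad":
            raise RuntimeError("simulated 429")
        return doc.upper()
    finally:
        active -= 1


async def bounded(sem: asyncio.Semaphore, doc: str) -> str:
    async with sem:  # at most MAX_IN_FLIGHT coroutines are inside this block
        # wait_for caps one hung call; asyncio.timeout() is the 3.11+ spelling.
        return await asyncio.wait_for(call_model(doc), timeout=5)


async def fan_out(docs: list[str]) -> tuple[list[str], list[BaseException]]:
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    results = await asyncio.gather(
        *(bounded(sem, d) for d in docs), return_exceptions=True
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return ok, failed


ok, failed = asyncio.run(fan_out(["a", "b", "bad", "c", "d", "e"]))
print(ok, failed, peak)
```

The semaphore is created inside the running loop and passed in explicitly, which keeps the helper easy to test and avoids sharing one limiter across unrelated call sites by accident.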

Why async matters disproportionately for AI code

LLM calls are the slowest thing your service does: seconds, not milliseconds, and almost entirely network-bound (you're waiting on a remote GPU). That makes them the textbook case for concurrency: a request that makes 5 independent model/tool calls should take ~max(call), not sum(call). This is the Python version of the Promise.all reflex you already have, and the canonical pattern in the Anthropic SDK is AsyncAnthropic + asyncio.gather for parallel tool calls (see the Anthropic block below).
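
The max-versus-sum claim is easy to check with simulated latencies. In this sketch tool_call is a hypothetical stand-in that only sleeps, and the delays are arbitrary.

```python
import asyncio
import time


async def tool_call(seconds: float) -> float:
    # Stand-in for a network-bound model or tool call (hypothetical).
    await asyncio.sleep(seconds)
    return seconds


async def timed() -> tuple[float, float]:
    delays = [0.05, 0.10, 0.15]

    start = time.perf_counter()
    for d in delays:  # sequential: roughly sum(delays)
        await tool_call(d)
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    await asyncio.gather(*(tool_call(d) for d in delays))  # roughly max(delays)
    concurrent = time.perf_counter() - start
    return sequential, concurrent


sequential, concurrent = asyncio.run(timed())
print(f"sequential={sequential:.2f}s concurrent={concurrent:.2f}s")
```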

Servers: use the async client, not the sync one

The Anthropic Python SDK ships both Anthropic() (sync) and AsyncAnthropic() (async). In a NestJS-style server (FastAPI here), a sync SDK call inside an async def handler blocks the event loop for the entire multi-second model call — throughput collapses under load. Default to AsyncAnthropic on any server.

python
import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from env

async def summarize(doc: str) -> str:
    resp = await client.messages.create(
        model="claude-opus-4-8",      # flagship (5 USD / 25 USD per Mtok at 1M ctx)
        max_tokens=1024,
        messages=[{"role": "user", "content": f"Summarize:\n{doc}"}],
    )
    # resp.content is a list of blocks — narrow by type before reading .text
    return "".join(b.text for b in resp.content if b.type == "text")

async def summarize_many(docs: list[str]) -> list[str]:
    # 10 docs in ~one round-trip's wall-time, not 10x sequential
    return await asyncio.gather(*(summarize(d) for d in docs))

Note resp.content is a list of content blocks (a discriminated union: text, thinking, tool_use, …), not a bare string — the Python analog of TS's ContentBlock[]. resp.content[0].text works until the day a thinking block lands at index 0 and your code reads .text off the wrong block type. Narrow by b.type (or, for typed structured output, prefer client.messages.parse() with a Pydantic schema — see the Pydantic section).

The same call, production-grade

The snippet above is the teaching version. The version that survives a Monday-morning traffic spike adds five things a senior reviewer looks for: a configured client (retries + a per-call timeout), typed exception handling, streaming for large outputs, stop_reason checks, and cost logging via usage.

python
import asyncio
import logging

import anthropic
from anthropic import AsyncAnthropic

log = logging.getLogger(__name__)

# One client for the whole app. max_retries handles 429/5xx/overload with
# exponential backoff; timeout caps a single hung request.
client = AsyncAnthropic(max_retries=4, timeout=30.0)


async def summarize(doc: str) -> str:
    try:
        async with client.messages.stream(   # stream → no HTTP timeout on long output
            model="claude-opus-4-8",
            max_tokens=4096,
            messages=[{"role": "user", "content": f"Summarize:\n{doc}"}],
        ) as stream:
            msg = await stream.get_final_message()
    except anthropic.RateLimitError:
        log.warning("rate limited after retries")
        raise
    except anthropic.APIStatusError as e:        # 4xx/5xx with a status code
        log.error("anthropic API error %s: %s", e.status_code, e.message)
        raise

    if msg.stop_reason == "refusal":
        raise ValueError("model refused the request")

    # Log usage on every call — this is your cost and your prompt-cache hit rate.
    u = msg.usage
    log.info(
        "tokens in=%d out=%d cache_read=%d",
        u.input_tokens, u.output_tokens,
        getattr(u, "cache_read_input_tokens", None) or 0,  # guards a missing or None value, which %d would reject
    )
    return "".join(b.text for b in msg.content if b.type == "text")

Why each piece earns its place:

  • max_retries on the client, not a hand-rolled loop. The SDK already does exponential backoff on 429/5xx/529 overloaded. Re-implementing it is a code smell.
  • Typed exceptions (RateLimitError, APIStatusError, APITimeoutError, OverloadedError), never if "429" in str(e). String-matching error messages is the Python equivalent of parsing an HTTP body with a regex.
  • Stream for large max_tokens. Above ~16K output tokens a non-streaming call risks an SDK HTTP timeout; .stream() + get_final_message() sidesteps it and gives you the assembled message.
  • stop_reason is load-bearing. refusal and max_tokens mean "the content is not what you asked for." Check before you use it.
  • usage is your bill. Log input_tokens, output_tokens, and cache_read_input_tokens on every call — it's the only way to attribute cost and verify prompt caching is actually hitting.

Concepts that differ from TS

  • No this — methods take self explicitly
  • Decorators (@property, @staticmethod, @app.get("/")) — like TS decorators but more common
  • Context managers (with open(...) as f:) — auto cleanup pattern
  • List comprehensions: [x*2 for x in lst if x > 0]
  • Generators: def gen(): yield 1; yield 2 → calling gen() returns a lazy iterator
  • f-strings: f"hello {name}" (like JS template literals, plus inline format specs such as f"{price:.2f}")
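
The items in the list above, combined in one small runnable sketch. Counter, tracked and evens are made-up names for illustration only.

```python
from contextlib import contextmanager


class Counter:
    def __init__(self, start: int = 0) -> None:
        self._value = start  # `self` is explicit; there is no `this`

    @property
    def value(self) -> int:  # read like a field: counter.value
        return self._value

    def bump(self) -> None:
        self._value += 1


@contextmanager
def tracked(log: list[str], name: str):
    # Code before `yield` runs on enter; the `finally` part runs on exit,
    # even if the body raises. Similar in spirit to try/finally in TS.
    log.append(f"open {name}")
    try:
        yield name
    finally:
        log.append(f"close {name}")


def evens(limit: int):
    # A generator: values are produced lazily, one per `yield`.
    for n in range(limit):
        if n % 2 == 0:
            yield n


log: list[str] = []
with tracked(log, "db") as handle:
    counter = Counter()
    counter.bump()

doubled = [x * 2 for x in evens(6)]  # comprehension over a generator
summary = f"{handle}: {counter.value} bump, ratio={1 / 3:.2f}"  # format spec
print(log, doubled, summary)
```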

Pydantic = your interface in Python

python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str | None = None

# Parse + validate
user = User(name="Achref", age=30)

# To JSON
user.model_dump_json()

# From dict
User.model_validate({"name": "Achref", "age": 30})

→ Pydantic is in EVERY Python AI lib. Master it.

Why Pydantic is the load-bearing part for AI code

In TS you'd reach for zod to validate an LLM's JSON output at runtime. Pydantic is that — but the Anthropic SDK integrates with it directly, so you rarely hand-roll the parse/validate/retry loop. The senior pattern is native structured outputs via client.messages.parse() with a Pydantic schema, not "ask for JSON in the prompt and json.loads() it":

python
from anthropic import AsyncAnthropic
from pydantic import BaseModel, Field

class Contact(BaseModel):
    name: str
    email: str | None = None
    interests: list[str] = Field(default_factory=list)
    demo_requested: bool = False

client = AsyncAnthropic()

async def extract(text: str) -> Contact:
    resp = await client.messages.parse(
        model="claude-opus-4-8",
        max_tokens=1024,
        messages=[{"role": "user", "content": f"Extract the contact:\n{text}"}],
        output_config={"format": Contact},  # schema-constrained decoding
    )
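    # resp.parsed_output is None if the model refused, so guard it (see below).
    # An explicit raise survives `python -O`, which strips assert statements.
    if resp.parsed_output is None:
        raise ValueError(f"no parsed output (stop_reason={resp.stop_reason})")
    return resp.parsed_output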

Why this beats prompt-and-json.loads():

  • The output is schema-constrained at decode time, not coaxed by prose. You don't get Here is the JSON:\n{...} preambles, trailing commas, or markdown fences to strip.
  • One source of truth. The same Contact model validates the output and documents the contract — no second JSON Schema to keep in sync.
  • default_factory=list, not = []. Mutable defaults are a Python footgun with no TS equivalent: interests: list[str] = [] shares one list across every instance. Pydantic guards BaseModel fields, but the habit (Field(default_factory=list)) saves you the day you write a plain @dataclass or a function default.

Two failure modes a staff engineer always handles:

  • resp.parsed_output is None — the model can refuse (safety) and return stop_reason == "refusal", in which case there's nothing to parse. Branch on stop_reason before touching the parsed value.
  • stop_reason == "max_tokens" — the JSON got truncated; the parse fails. Raise max_tokens or stream. Never silently retry the same call.

Availability is per model: structured outputs are supported on Opus 4.8, Sonnet 4.6, and Haiku 4.5. The first request with a new schema pays a one-time compilation cost; identical schemas hit a 24h cache after that.

🏋️ Exercises

Demanding and progressive. Each one is a real thing you'll build for agentic AI, not a toy. Do them in order — later ones assume the earlier scaffolding.

1. Port a NestJS endpoint to FastAPI + Pydantic, green CI

Goal: reproduce one real NestJS route (DTO validation, DB read, typed response) in FastAPI with uv + ruff + mypy --strict + pytest all passing.

Hint/Solution: NestJS class-validator DTO → Pydantic BaseModel request body; the route signature is the validation. Wire mypy --strict and ruff into a CI job and make it actually fail on a bad annotation before you call it done; that's the whole point. Add one pytest test using httpx.AsyncClient against the app.

2. Streaming Claude CLI with structured extraction

Goal: a typer CLI that takes a blob of text, calls claude-opus-4-8 via AsyncAnthropic, streams the tokens to the terminal as they arrive, and also returns a validated Pydantic object.

Hint/Solution: use async with client.messages.stream(...) and iterate text deltas for the live display, then await stream.get_final_message(). For the structured part, do a second client.messages.parse(output_config={"format": MyModel}) call, or extract from the streamed message. Handle stop_reason in {"refusal", "max_tokens"} explicitly; don't let a refusal crash with an AttributeError on parsed_output.

3. Bounded parallel fan-out that doesn't 429

Goal: summarize 500 documents concurrently. Cap in-flight requests so you never trip a rate limit, keep partial successes, and finish in roughly (500 / N) × per-call latency rather than the sequential sum.

Hint/Solution: asyncio.Semaphore(N) (start N≈8) wrapping each summarize call; asyncio.gather(*tasks, return_exceptions=True) so one failure doesn't throw away the rest of the batch; partition results into successes vs isinstance(r, Exception). Defend your choice of N by measuring p99 latency and the 429 rate. gather without a semaphore is the wrong answer and you should be able to say why.

4. Break it, then fix it — the blocking-call incident

Goal: reproduce the classic "fine in dev, p99 collapses under load" incident, then fix it.

Hint/Solution: drop a requests.get(...) (or a time.sleep(2)) inside an async def handler and load-test it (hey/locust) at concurrency 50. Watch throughput crater because the one event-loop thread is blocked during every request. Fix path A: swap to httpx.AsyncClient. Fix path B (for blocking or CPU-bound code you can't make async): await asyncio.to_thread(...). Prove the fix with the same load test. Bonus: enable ruff's ASYNC rules and show they flag the original.
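
Before reaching for a load tester, the incident can be reproduced in miniature. In this sketch blocking_io is a hypothetical stand-in for requests.get or a sync driver, and five concurrent "requests" are enough to show the effect.

```python
import asyncio
import time


def blocking_io() -> None:
    time.sleep(0.1)  # stand-in for requests.get or a sync DB driver


async def bad_handler() -> None:
    blocking_io()  # blocks the only event-loop thread


async def good_handler() -> None:
    await asyncio.to_thread(blocking_io)  # runs in a worker thread; loop stays free


async def load(handler, n: int = 5) -> float:
    # Fire n concurrent "requests" and measure total wall time.
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    return await load(bad_handler), await load(good_handler)


blocked, offloaded = asyncio.run(main())
print(f"blocking={blocked:.2f}s to_thread={offloaded:.2f}s")
```

The blocking variant serializes the five calls, while the to_thread variant overlaps them. A real load test is still needed to see the p99 behavior under production-like concurrency.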

5. Production-grade Claude client wrapper

Goal: a reusable async wrapper around AsyncAnthropic with retries, per-call timeout, typed-exception handling, usage logging, and prompt caching on a stable system prefix.

Hint/Solution: single module-scoped AsyncAnthropic(max_retries=4, timeout=30); catch RateLimitError / OverloadedError / APITimeoutError / APIStatusError distinctly; put cache_control: {"type": "ephemeral"} on the frozen system block and assert usage.cache_read_input_tokens > 0 on the second call. Log input_tokens/output_tokens/cache_read per call. Write a test that injects a 429 (monkeypatch the transport) and asserts the wrapper retried, not crashed.

6. Defend the number — token & cost budget

Goal: given a prompt + expected output size, defend a per-request cost figure and a max_tokens choice with real measurements.

Hint/Solution: count input tokens with client.messages.count_tokens(model="claude-opus-4-8", ...), not tiktoken (it's OpenAI's tokenizer and undercounts Claude by 15–20%+). Multiply input by $5/Mtok and output by $25/Mtok (Opus 4.8 at 1M context). Show how prompt caching changes the math (cache reads ≈ 0.1× input price) and what max_tokens you'd set so you neither truncate (stop_reason == "max_tokens") nor over-provision. The deliverable is a defensible number, not a guess.
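
The arithmetic can be sketched as a small calculator. The prices and the 0.1× cache-read multiplier are the figures quoted in this exercise and should be re-checked against the current price list; the Usage class is a hypothetical container, and the sketch ignores any surcharge for writing to the cache on the first call.

```python
from dataclasses import dataclass

# Figures quoted in the exercise text, in USD per million tokens.
INPUT_PER_MTOK = 5.00
OUTPUT_PER_MTOK = 25.00
CACHE_READ_MULTIPLIER = 0.1  # cache reads billed at roughly 0.1x input


@dataclass
class Usage:
    # Hypothetical container mirroring the usage fields logged earlier.
    input_tokens: int  # uncached input tokens
    output_tokens: int
    cache_read_input_tokens: int = 0


def request_cost(u: Usage) -> float:
    """Estimated USD cost of one request (cache-write surcharge ignored)."""
    fresh = u.input_tokens * INPUT_PER_MTOK
    cached = u.cache_read_input_tokens * INPUT_PER_MTOK * CACHE_READ_MULTIPLIER
    out = u.output_tokens * OUTPUT_PER_MTOK
    return (fresh + cached + out) / 1_000_000


# Same 20k-token prompt: once with no cache, once with 18k tokens read from cache.
cold = request_cost(Usage(input_tokens=20_000, output_tokens=1_000))
warm = request_cost(
    Usage(input_tokens=2_000, output_tokens=1_000, cache_read_input_tokens=18_000)
)
print(f"cold=${cold:.4f} warm=${warm:.4f}")
```

Feeding it measured usage from real calls, rather than guessed token counts, is what turns the output into a defensible number.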

🎤 In the interview

Short, senior-level questions this topic invites — with the one-line answer.

  • "Python doesn't enforce type hints at runtime, so why does Pydantic 'work'?" Pydantic reads the annotations via introspection and validates when the model is constructed; the hints aren't enforced by the interpreter, they're enforced by the library. So a BaseModel/FastAPI boundary validates while a plain function arg only gets checked by mypy in CI.

  • "A teammate calls an async def and gets <coroutine object> instead of a result. What happened and how does TS differ?" They forgot to await (or to run it on a loop) — Python returns the un-awaited coroutine and emits only a RuntimeWarning; TS would have given a Promise the type checker flags, so run mypy/pyright which catch un-awaited coroutines too.

  • "asyncio.gather over 5,000 LLM calls: what breaks, and what do you change?" Unbounded concurrency trips a 429 (and can exhaust memory), and by default the first error propagates immediately, so you lose the results of the calls that would have succeeded. Bound with a Semaphore and pass return_exceptions=True, then filter successes.

  • "Why is AsyncAnthropic the default on a FastAPI server, and what's the worst-case if you use the sync client?" A sync SDK call inside an async def blocks the single event-loop thread for the whole multi-second model call, so throughput collapses under concurrency even though it looks fine in dev — same hazard as requests in an async handler.

  • "How do you estimate the token cost of a Claude prompt, and what's the common mistake?" Call messages.count_tokens with the actual model ID (counts are model-specific); the mistake is reaching for tiktoken, which is OpenAI's tokenizer and is wrong for Claude.

When you need TypeScript instead

You can stay 80% in TypeScript for:

  • Frontend (Vercel AI SDK + Next.js)
  • MCP servers (Anthropic ships TS SDK)
  • Simple OpenAI/Anthropic API wrappers
  • Edge functions (Cloudflare Workers, Vercel)

You NEED Python for:

  • LangGraph (Python is first-class, TS is younger)
  • HuggingFace ecosystem
  • Ragas eval
  • Most academic / research code
  • Data processing at scale

→ Default to TS unless one of the above forces Python.

Achref's personal tech library