Skip to content

🎯 Roadmap Senior Agentic AI — recalibré 30h/sem

Recalibrage du parcours pour 4–5h/jour (~30h/sem) au lieu de 12–15h/sem. Stack réel : Python (cœur agent) · NestJS (orchestration/streaming) · Angular (UI streaming). Cible : senior agentic AI + réussir les entretiens senior. Fil rouge : une seule app, AgentDesk, qui grossit sur les 6 phases.

Le principe du recalibrage

The existing hub timeline assumed ~12-15h/week over 12 months. At 30h/week (4-5h/day, ~2x capacity) you compress that to roughly a 6-month run to senior-ready, BUT we do NOT just "go faster through the same notes" — we raise the floor. The hub's deep RAG/agentic/MCP/voice/LLMOps files are already at-or-above tutorial level; the bottleneck is NOT reading, it's that almost nothing is BUILT and wired across your real stack (Python AI core + NestJS serving/orchestration + Angular streaming UI). So the recalibration is: (1) cut reading time per topic in half (you read fast, you've shipped PHP/TS prod) and convert the freed hours into daily building; (2) every week ends with a running, demoable artifact in your three-language stack, not a checkbox; (3) inject the senior-depth gaps the reviews flagged (quantization/filtered-ANN, agent-eval rigor, prompt-injection taxonomy, prompt-caching-for-agent-loops, SLO/incident/canary, structured-output via native API) as BUILD tasks not just reading; (4) fix the hub's correctness debt (stale model IDs, broken thinking syntax, fabricated dated aliases) early so you never paste a 400/404 into a live interview. Net: ~24 weeks, 6 phases, each phase a vertical slice that gets thicker. Theory is the appetizer (60-90min/day), the build is the main course (2.5-3h/day), interview drilling is dessert (30min/day, every day, non-negotiable — at 2x pace you must externalize knowledge daily or it evaporates).

Rythme hebdomadaire

A typical 4-5h day = 60-90min THEORY (read fast — you're an experienced dev, skim deep hub files for the mental model + pitfalls + decision tables, don't re-derive what you already know) + 2.5-3h BUILD (the day's concrete artifact in Python/NestJS/Angular — this is where seniority is forged; if a day's theory and build conflict for time, theory yields) + 30min INTERVIEW DRILL/WRITING (answer 2 questions in a running doc, or whiteboard one diagram out loud, timed — daily, non-negotiable, because at 2x pace knowledge evaporates unless externalized). A typical WEEK (5-6 study days): Mon-Thu = Python/AI-core + NestJS days (build-heavy, the harder integration work when fresh); Fri = Angular/UI day (visual, slightly lighter, satisfying to see streaming work); Sat = INTEGRATION+SHIP day — wire the week's pieces end-to-end and produce the phase's running demo artifact + write a short 'what I built + what I'd say in an interview' note; Sun = light or rest (1h review of the week's drills doc, or buffer for spillover). Hard rule: never end a week without a RUNNING artifact. If you fall behind, cut theory depth, never the build.

Fil rouge stack — l'app AgentDesk

One app, 'AgentDesk', grows across all 6 phases so integration is continuous, not bolted on at the end. PYTHON (AI/agent core): FastAPI service owning the agentic loop — AsyncAnthropic streaming tool-use loop with parallel tool execution (asyncio.gather), retries/backoff/timeouts, native structured outputs, prompt caching, adaptive thinking (Phase 1); a real hybrid RAG retrieval tool over pgvector+Qdrant with reranking + metadata filtering (Phase 2); an agent-eval harness (trajectory+outcome, calibrated LLM-judge, bootstrap CIs) gating merges (Phase 3); injection-defense + guardrail layer + cross-turn prompt-cache cost optimization (Phase 4); SLI emission + OTel/Langfuse spans + RGPD erasure (Phase 5). NESTJS (serve/orchestrate): a reusable LlmModule.forRootAsync (DI-injected creds) that proxies+streams the Python agent over SSE with heartbeats, Last-Event-ID resume, and AbortController->/chat/stop (Phase 3); an MCP server endpoint (TS SDK + Zod, per-session isolation, served over stdio+Streamable HTTP) composing your tools (Phase 4); BullMQ AI jobs (WorkerHost.process making real Anthropic calls, idempotency keyed to generation id, cost-aware retry, partial-output/abort handling, progress->SSE), plus canary traffic-split with auto-rollback on SLO breach and end-to-end x-request-id propagation NestJS->Python (Phase 5). ANGULAR (stream UI): a new 07-ai-ui module — EventSource->Observable + fetch ReadableStream(getReader/TextDecoder) terminating in a signal; append-only message buffer as a SignalStore with rAF-coalesced per-token rendering under zoneless (no CD thrash); a discriminated-union step model (pending|running|streaming|done|error) rendering a collapsible tool-call trace timeline with partial/streaming tool-args; optimistic placeholder reconciled with streamed server truth; Stop button wired to AbortController; markdown+code rendering via DomSanitizer; autoscroll/stick-to-bottom (Phases 3 and 5). The seam is the contract: Python emits structured SSE events (token | tool_call_started | tool_result | step_status | usage | done | error), NestJS relays/enriches+persists+resumes them, Angular renders them as live UI. Building that one event contract end-to-end IS the senior full-stack-around-AI differentiator the career docs should sell.

Les 6 phases

Phase 1 — Reset the floor: correctness sweep + modern Python AI core + narrative fix

Durée : Weeks 1-4

Focus. Kill the hub's correctness debt, rebuild first-ai/ AI lessons to the current Anthropic surface as the Python core of your portfolio app 'AgentDesk', and re-align the career narrative to Python+NestJS+Angular. End state: an async, streaming, retry-hardened, prompt-cached Python tool-use agent with native structured outputs — runnable, tested, observable.

Journée type. 60-90min theory: re-read (fast) the relevant deep hub files — claude-api skill + ai-engineer/01-fundamentals/06-claude-api, 05-tokenization-context, 09-structured-outputs, and first-ai/lessons/ai/01-08 as your refactor targets. 2.5-3h build: day-by-day rebuild the Python core (see milestones). 30min drills: write/answer 2 interview questions in a running doc (Anthropic loop mechanics, prompt-cache breakpoints, adaptive thinking vs budget_tokens, retry/backoff strategy).

Build. AgentDesk Python core (FastAPI): AsyncAnthropic streaming tool-use loop that (a) appends assistant content verbatim, branches on stop_reason, returns tool_result keyed by tool_use_id; (b) executes multiple tool_use blocks concurrently via asyncio.gather; (c) wraps every call in try/except for RateLimitError/APIError with exponential backoff + jitter + timeout; (d) uses native structured outputs (output_config.format / messages.parse) replacing 03's hand-rolled JSON; (e) prompt-caches the system+tools prefix; (f) adaptive thinking + effort; (g) logs resp.usage tokens+cost per step. Pydantic-validate every tool input (close 05's own SENIOR-NOTE gap). pytest that MOCKS the Anthropic client and asserts: right tool/args called, non-tool stop_reason ends loop, max_steps enforced. Week 1 day 1-2 ONLY: the repo-wide model-id/pricing/thinking-syntax sweep + delete/rewrite crashing python/test.py + flesh out main.py & README. Week 4: rewrite ai-engineer/00-overview L8/L217, 08-freelance L20/L85, 06-portfolio L19, 15-owning-dravos L75 to the true stack; fix Dravos front-end claim.

✅ Done when. pytest on the AI core passes (loop, max_steps, parallel tools, retry path all covered by mocked-client tests); the agent streams tokens AND executes ≥2 tools concurrently in one turn; one induced 429 is survived via backoff; per-step token+cost is logged; grep across the repo finds zero claude-sonnet-4-7/claude-haiku-4-7/fabricated-date aliases and zero budget_tokens; you can verbally defend (in your drills doc) why adaptive thinking replaced budget_tokens and how prompt-cache breakpoints order.

Phase 2 — Real RAG + retrieval internals you can defend with numbers

Durée : Weeks 5-9

Focus. Turn the toy in-memory RAG into a production retrieval service AND close the embeddings/index/reranking senior-internals gaps by MEASURING them. End state: a pgvector+Qdrant-backed hybrid-search RAG with chunking, reranking, metadata filtering — plus a benchmark notebook proving you understand recall/latency/memory/quantization/filtered-ANN tradeoffs.

Journée type. 60-90min theory: the strong deep files — ai-engineer/02-rag-production/05-chunking, 06-query-rewriting, 04-advanced-hybrid-rerank (write the missing prose yourself), 08-infrastructure vector files; fill the 02-embeddings prose gap (cosine/dot/euclidean+normalization, HNSW ef/M, PQ/SQ/binary, Matryoshka, filtered-ANN cliff). 2.5-3h build: incrementally grow the RAG service. 30min drills: retrieval-trap questions (why RRF k=60, BM25 vs cosine score-scale normalization, recall cliff, when binary quantization).

Build. AgentDesk RAG service (Python): real chunking (try contextual-retrieval + late chunking from 05), embeddings persisted in pgvector (HNSW) AND mirrored in Qdrant; hybrid BM25+dense retrieval with RRF and explicit score normalization; cross-encoder rerank PLUS a ColBERT/late-interaction tier with a measured latency budget; metadata filtering with multi-tenant isolation. Benchmark notebook (the senior artifact): index one corpus, sweep ef_search/M -> recall@k vs p95 latency vs memory curves; add binary + scalar quantization -> recall/memory tradeoff; add a metadata filter and DEMONSTRATE the pre- vs post-filter recall cliff with real numbers and the fix (Qdrant cardinality-aware pre-filter vs pgvector post-filter). Idempotent re-indexing + embedding-model versioning. Expose RAG as a tool to the Phase 1 agent loop.

✅ Done when. The agent answers a question by calling your RAG tool end-to-end; the benchmark notebook produces real recall/latency/memory curves and a labeled recall-cliff chart; you can defend, with your own numbers, why you chose ef_search=X, why binary quantization cost you Y% recall, and why post-filtering tanked recall; hybrid fusion uses normalized scores (not naive concat); re-indexing is idempotent and tenant-isolated.

Phase 3 — NestJS orchestration + Angular streaming UI + agent EVAL rigor

Durée : Weeks 10-15

Focus. Wire the full stack: NestJS orchestrates/streams the Python agent over SSE; Angular renders streaming tokens + tool-call traces + optimistic steps; and you build a defensible agent-eval harness (trajectory + outcome). This is the phase that makes you visibly full-stack-around-AI. End state: a demoable end-to-end streaming agent with merge-gating eval.

Journée type. 60-90min theory: nestjs/07-features/09-sse-streaming + 04-advanced/05-queues-bullmq + 01-foundations DI; angular/02-reactive (signals-vs-rxjs, change-detection) + 01-foundations/02-signals; ai-engineer/03-agentic-and-mcp/02-langgraph + 06-react-plan-reflexion for loop depth; eval rigor from 02-rag-production/03-eval. 2.5-3h build: alternate NestJS days / Angular days / eval days. 30min drills: 'how do you eval an agent?', trajectory vs outcome, judge bias, defend a benchmark number.

Build. NestJS orchestrator: a reusable LlmModule.forRootAsync (DI-injected ANTHROPIC_API_KEY via ConfigService — fix the SSE chapter's new Anthropic() in a field), a controller that proxies/streams the Python agent's SSE (heartbeats, Last-Event-ID resume, AbortController -> /chat/stop), and a tool-use orchestration path. Angular UI (new 07-ai-ui module you author): EventSource->Observable (and fetch ReadableStream getReader()/TextDecoder) terminating in a signal; append-only message buffer as a SignalStore; per-token rAF-coalesced rendering for zoneless (avoid CD thrash); a discriminated-union step model (pending|running|streaming|done|error) rendering a collapsible tool-call trace timeline; optimistic placeholder reconciled with streamed truth; Stop button wired to AbortController. Agent-eval harness (Python pytest): golden TRAJECTORIES + golden outcomes; trajectory eval (did it call the right tools in a sane order) vs outcome eval (was the answer right); LLM-as-judge with a DIFFERENT model (Opus 4.8 judges), temperature=0, calibrated against ~30 of your own hand labels with reported agreement; bootstrap confidence intervals on the small golden set; merge-blocking regression gate.

✅ Done when. You can open the Angular app, type a query, watch tokens stream AND a live tool-call trace fill in, hit Stop mid-generation and it aborts cleanly; NestJS injects the LLM client via DI and resumes a dropped SSE via Last-Event-ID; the eval harness fails CI on a deliberately regressed prompt; you can state your judge-human agreement number and a 95% CI on your golden set and defend why they're trustworthy.

Phase 4 — MCP, multi-agent, security taxonomy + prompt-cache-for-loops

Durée : Weeks 16-19

Focus. Expose an MCP surface from NestJS, build a structured agent failure-mode/guardrail defense layer, and slash agent-loop cost with cross-turn prompt caching (measured). End state: an MCP-served, injection-hardened, cost-optimized agent — the core of senior agentic interviews.

Journée type. 60-90min theory: ai-engineer/03-agentic-and-mcp/03-mcp-protocol, 04-build-custom-mcp-server, 10-mcp-patterns, 05-multi-agent-orchestration, 08-computer-use (threat model); claude-api skill for MCP + caching. 2.5-3h build. 30min drills: injection taxonomy, MCP transport/auth, when multi-agent LOSES to single-agent (cite Cognition 'Don't build multi-agents').

Build. MCP: a minimal MCP server (TS SDK + Zod) exposing your RAG + calculator + a domain tool over stdio AND Streamable HTTP, with per-session isolation (closure capture) for multi-tenant safety; a NestJS endpoint that serves/composes it (DI-injected creds, OAuth2.1/PKCE or API-key decision documented); a Python MCP client that lists+calls tools. Multi-agent: a supervisor + worker pattern with a HARD per-session cost cap (abort if total_cost > threshold) and the single-agent-baseline-first comparison written up. Security layer: implement a reusable failure-mode taxonomy as code — prompt-injection-via-tool-output and tool-result poisoning defenses (spotlighting, data/instruction delimiting, allowlist on tool outputs), confused-deputy check, compounding-error/loop guard; add an input/output guardrail layer (a jailbreak/PI classifier pass + schema-constrained decoding as a safety control); write an adversarial test set that runs in CI. Cost: implement prompt-caching-for-agent-loops (cache the stable system+tools+history prefix across turns, handle invalidation when tools/model change, respect the lookback window) and MEASURE before/after cost on a 10-turn session.

✅ Done when. Claude Code / an MCP client can connect to your NestJS-served MCP server and call a tool; per-session isolation prevents tenant A from seeing tenant B's tools; your adversarial CI set catches a planted tool-output injection; the supervisor aborts on the cost cap; you have a before/after table showing prompt-caching cut a 10-turn agent session cost by a measured % and you can explain the invalidation rules; you can recite the failure-mode taxonomy from memory in a drill.

Phase 5 — Production hardening: SLOs, incidents, canary, BullMQ AI jobs, RGPD erasure

Durée : Weeks 20-23

Focus. Make AgentDesk production-operable: SLOs gating releases, incident lifecycle, canary with auto-rollback, real async AI jobs on BullMQ, and end-to-end right-to-erasure. This is the 'I RUN agents in prod' senior line and the dominant gap across the llm-ops/voice/infra reviews.

Journée type. 60-90min theory: ai-engineer/05-llm-ops (safety, rate-limiting, cost, eval-in-CI), 08-infrastructure (CDC/RGPD deletes), nestjs/04-advanced/05-queues-bullmq + 05-quality observability; the 'agentic systems in prod' gaps. 2.5-3h build. 30min drills: SLO/error-budget, incident sev classification, canary mechanics, RGPD propagation.

Build. SLOs/SLIs for AgentDesk: define p95 latency, faithfulness (Ragas), tool-success-rate SLIs with targets+windows+error budgets; wire error-budget burn to BLOCK releases. Observability: OpenTelemetry/Langfuse spans for the full agent trace (span hierarchy per tool call + per agent step), token/cost per trace, alerting on quality regression — connect to the existing fastapi/observability.py request-id pattern propagated NestJS->Python->Angular (x-request-id end to end). BullMQ AI jobs (NestJS): a WorkerHost.process that makes a real Anthropic call for a long agent task, idempotency keyed to a generation id, token-cost-aware retry/backoff, partial-output/abort handling on retry, progress surfaced back to the Angular SSE channel. Canary: NestJS traffic-split (5%->100%) for a new prompt/model with bake time and AUTOMATED rollback on SLO breach; prompt-as-code lifecycle (PR + adversarial gate from Phase 4 + canary + rollback). RGPD: a right-to-erasure runbook+code propagating a deletion across Postgres -> vector store -> Langfuse traces, verified. Incident: induce a real failure (e.g. provider 529 storm), follow a sev runbook, write a blameless post-mortem.

✅ Done when. A simulated quality regression burns the error budget and the release gate blocks; a BullMQ AI job survives a mid-generation abort+retry without double-billing or corrupt output and streams progress to Angular; a canary auto-rolls-back on an injected SLO breach; a deletion request verifiably purges a user from Postgres+vector+traces; you have one written post-mortem of an induced incident with action items.

Phase 6 — Interview mastery, vertical narrative, optional voice slice

Durée : Weeks 24-26 (+ ongoing)

Focus. Convert everything built into a senior interview + freelance positioning machine: Dravos-style top-25 defense for YOUR app and for NestJS/Angular, a vertical GTM angle, and one optional voice slice for breadth. End state: you can demo and defend a production agentic system across three languages and ace hostile senior interviews.

Journée type. 60-90min: ai-engineer/15-owning-dravos (the model), 10-vertical-positioning + 09-verticales-fr (pick ONE vertical), 11-portfolio-checklist, 08-freelance-strategy; ai-engineer/04-voice-agents if doing the voice slice. 2-3h: polish portfolio + write defense banks; OR build the voice slice. 30-45min: live mock-interview drilling (whiteboard your AgentDesk architecture out loud, timed).

Build. AgentDesk top-25 hostile-interview question bank with answer frameworks (mirroring 15-owning-dravos), covering: agent loop, eval methodology + your CIs, prompt-injection defenses, prompt-cache cost wins, RAG recall-cliff, SLO/canary/incident, MCP isolation. A parallel NestJS+Angular interview-defense bank (the asymmetric gap the reviews flagged — defend your DI/streaming/signals/zoneless choices like you defend LangGraph). A whiteboard-ready architecture diagram of the full Python->NestJS->Angular system. Pick ONE vertical (legal/finance/RH) from 09-verticales-fr and reframe AgentDesk as a vertical demo + a 1-page positioning ('I embed agents into enterprise Angular/Nest stacks'). OPTIONAL hard-mode breadth: add a voice slice — LiveKit/Realtime STT->agent->TTS with a latency budget, streamed into the same Angular UI.

✅ Done when. You can whiteboard the full architecture in <10min from memory; you have written defensible answers to 25 hostile questions about your own system AND a NestJS/Angular defense set; AgentDesk is reframed as a named vertical demo with a positioning one-pager; (optional) a voice turn round-trips under your stated p95 budget into the Angular UI.

Gaps prioritaires à combler (verdict de la revue)

⚠️ Hub correctness debt: stale/fabricated Claude model IDs (claude-sonnet-4-7, claude-haiku-4-7, claude-*-20260101, retired 20251022 snapshot), Opus 4.7-as-flagship + wrong pricing, and BROKEN extended-thinking syntax thinking={type:'enabled',budget_tokens:N} that returns HTTP 400 on Opus 4.7/4.8.

  • Pourquoi ça compte : You will copy these into a live-coding interview and hit a 404/400 in front of a panel. The pricing errors poison every ROI/cost-defense answer — a senior is judged on cost reasoning. This is a credibility landmine sitting in your strongest material (RAG/agentic files).
  • Fix : Week 1, day 1-2: do a repo-wide sweep keyed to the claude-api skill. Normalize to claude-opus-4-8 (flagship, $5/$25 1M ctx), claude-sonnet-4-6, claude-haiku-4-5 ($1/$5). Strip all fabricated date suffixes (use bare aliases). Replace budget_tokens with thinking={type:'adaptive'} + output_config.effort. Add one source-of-truth pricing/model-id snippet referenced everywhere. This is a grep+edit pass, not a study task — knock it out fast and re-run affected ROI tables.

⚠️ NOTHING is wired across your actual stack. first-ai/ has 8 toy Python AI lessons (blocking, no retries, no async tool loop, no eval, no tests, no observability, toy in-memory RAG). nestjs/ teaches SSE-streaming an LLM but NO tool-use loop, NO agent loop, NO MCP endpoint, NO real BullMQ AI job. angular/ has ZERO SSE/streaming-token/tool-trace UI. There is no end-to-end Python->NestJS->Angular agent.

  • Pourquoi ça compte : This is your #1 worry ('the code lessons may not be enough') and the reviews confirm it: senior_readiness on the code project is BELOW. Reading deep notes ≠ senior. Interviews and clients judge a running streaming agent across services, not markdown. Your differentiator is precisely full-stack-around-AI (Python+NestJS+Angular), which is currently entirely un-built.
  • Fix : The stack_integration_track below threads a single growing app ('AgentDesk') through all 6 phases: Python FastAPI agent core -> NestJS orchestrator (SSE/WS, tool-loop, BullMQ, MCP endpoint) -> Angular streaming UI (signals + SSE, tool-call traces, optimistic steps). Every phase adds one real layer. By month 3 you have a demoable agent; by month 6 it's production-hardened with eval+observability+canary.

⚠️ Agentic-AI depth + eval rigor is thin for your stated target (senior AGENTIC interviews): no structured agent failure-mode taxonomy (prompt-injection-via-tool-output, tool-result poisoning, confused deputy, compounding multi-step error), no trajectory-vs-outcome eval, no judge calibration/bias, no significance on benchmark numbers, no prompt-caching-for-agent-loops (the #1 real cost lever), no determinism/replay harness.

  • Pourquoi ça compte : Your whole career thesis is agentic AI. These are the exact senior probes: 'how do you eval an agent?', 'defend that 92% number', 'how do you stop tool-output injection?', 'how do you make a multi-step agent reproducible/cheap?'. The hub name-drops targets (92% tool accuracy) without defensible methodology.
  • Fix : Phase 3 builds an agent-eval harness (trajectory + outcome, golden trajectories, LLM-as-judge with calibration against your own labels, bootstrap CI on small golden sets) as runnable pytest gating merges. Phase 4 builds the injection/guardrail defense layer (spotlighting, data/instruction separation, allowlist, schema-constrained decoding as a safety control) and prompt-caching-for-agent-loops with measured before/after cost. You'll be able to demo AND defend every number.

⚠️ Retrieval/embeddings senior-internals: WHY cosine vs dot vs euclidean (+ normalization), quantization (PQ/SQ/binary), Matryoshka/MRL, HNSW ef_construction/ef_search/M intuition, and the filtered-ANN recall cliff (pre- vs post-filtering on HNSW). Reranking only at API level (no ColBERT/late-interaction, no latency budget, no score normalization in hybrid fusion).

  • Pourquoi ça compte : Classic senior interview traps. 'You added a metadata filter and recall tanked — why?' (recall cliff). 'How do you cut vector memory 32x?' (binary quantization). The hub's index files (02-embeddings, 04-advanced-hybrid-rerank) are checkbox-thin exactly here while the body files are deep — so you'll feel covered and get caught.
  • Fix : Phase 2: write the prose you're missing AND prove it with a benchmark notebook — index the same corpus in pgvector (HNSW) and Qdrant, sweep ef_search/M, measure recall@k vs latency vs memory, then add a metadata filter and SHOW the pre- vs post-filter recall cliff with numbers. Add binary/SQ quantization and measure the recall/memory tradeoff. Add a ColBERT/late-interaction reranker as the middle tier and put a latency budget on it. Now it's a portfolio artifact, not a fact you memorized.

⚠️ Production/SRE-for-AI is scattered and un-built: no SLO/error-budget framework tying quality/latency to release gating, no incident lifecycle (sev levels, paging, blameless post-mortem), no concrete canary/progressive-delivery with automated SLO-breach rollback, no prompt-as-code lifecycle (PR + adversarial gate + canary + rollback), no end-to-end right-to-erasure propagation (Postgres->vector->traces->FT dataset).

  • Pourquoi ça compte : This is what separates 'I built an agent' from 'I run agents in production' — the senior line. French market adds RGPD/AI Act teeth. These appear as the dominant GAP across llm-ops/voice/infra reviews and you have zero hands-on.
  • Fix : Phase 5 (the 'prod-hardening' phase): define real SLOs/SLIs for AgentDesk (p95 latency, faithfulness, tool-success), wire error-budget burn to a release gate, build a canary deploy in NestJS (traffic split + auto-rollback on SLO breach + bake time), and implement the right-to-erasure pipeline end-to-end across your real Postgres+vector+Langfuse. Write the incident runbook + a real post-mortem of a failure you induce.

⚠️ Career narrative is stale and contradicts your real stack: docs lead with 'TypeScript-first / full-stack-TS / Next.js+Vercel' differentiator, and 15-owning-dravos hardcodes 'Dashboard Next.js 15'. Your stack is Python+NestJS+Angular, NO Next.js. Angular is never leveraged as a moat. There's an asymmetric interview-defense (Dravos top-25 framework exists for Python/LangGraph but NOTHING for NestJS/Angular).

  • Pourquoi ça compte : You'd be coached to defend a Next.js dashboard that doesn't exist — a factual interview liability. And your actual strongest, rarest differentiator (a real ex-PHP/TS dev who embeds agents into enterprise Angular/NestJS stacks) is unsold. Senior interviews are won on a coherent, true narrative.
  • Fix : Phase 1 (week 1-2) and ongoing: rewrite the stack thesis to 'Python(FastAPI/LangGraph) AI core + TypeScript(NestJS) orchestration + Angular streaming UIs' as the moat ('I ship agents INTO real enterprise Angular/Nest apps, not notebooks'). Fix the Dravos diagram to your true front end. Build a NestJS/Angular interview-defense Q&A mirroring the Dravos top-25. Your AgentDesk app becomes the portfolio proof of exactly this differentiator.

⚠️ Modern Anthropic API surface absent in code: no native structured outputs (output_config.format / messages.parse — 03 hand-rolls JSON prompting), no adaptive thinking + effort, no prompt caching (cache_control), no SDK tool runner, no streaming for large max_tokens, no async AsyncAnthropic tool loop, no retries/backoff/typed-exception handling on ANY LLM call.

  • Pourquoi ça compte : These are table-stakes 2026 senior patterns and the most common live-coding asks. Hand-rolling JSON prompting when messages.parse() exists, or having zero retry/backoff, reads as junior immediately.
  • Fix : Phase 1 rebuilds the first-ai/ AI lessons to current surface as you go: AsyncAnthropic streaming tool loop with asyncio.gather over parallel tool_use blocks, try/except on RateLimitError/APIError with exponential backoff + timeouts, native structured outputs, prompt-cache the system/tool prefix, adaptive thinking. These become the Python core of AgentDesk — you fix the lessons by turning them into the real app.

Hard mode (pour viser au-dessus du senior)

  • Reproducibility/replay harness for the agent loop: record every LLM request/response + tool I/O and replay a full multi-step trajectory deterministically (and be able to explain WHY temperature=0 alone is NOT reproducible) — directly answers a senior nondeterminism probe.
  • Defend every benchmark number statistically: no bare '92% tool accuracy' — report it with a bootstrap 95% CI on your golden set and your judge-vs-human agreement, and be able to say when N is too small to trust.
  • Filtered-ANN deep cut: don't just show the recall cliff — implement BOTH pre-filtering (Qdrant cardinality-aware) and post-filtering (pgvector) and pick per query based on filter selectivity, with a measured crossover.
  • Cost-defense kata: maintain a live cost model for AgentDesk (per request, per 10-turn session, with vs without prompt caching, batch -50% where applicable) and be able to whiteboard the levers and the math under pressure with CORRECT 2026 pricing.
  • Confused-deputy + multi-tenant red-team: actually break your own MCP/RAG isolation in a test (tenant A reaching tenant B's tools/docs) and then fix it — having an exploit-then-patch story is senior gold.
  • Zoneless per-token performance: profile your Angular streaming UI under a fast token stream, prove the rAF/coalescing change cut CD work with DevTools numbers (don't just claim it).
  • End-to-end RGPD erasure VERIFICATION: not just delete — write an automated check that proves the user is gone from Postgres AND the vector store AND Langfuse traces AND any FT dataset, as one pipeline.
  • Build the NestJS/Angular interview-defense bank to the same hostile standard as Dravos top-25 — your rarest differentiator (agents-in-enterprise-Angular/Nest) is currently undefended; make it your strongest answer set.

Jalons mesurables

  • Week 1: repo correctness sweep done — zero fabricated/stale Claude model IDs, zero budget_tokens, one source-of-truth pricing snippet; crashing python/test.py fixed; career stack thesis re-aligned to Python+NestJS+Angular and Dravos front-end claim corrected.
  • Week 4: AgentDesk Python core runs — async streaming tool-use loop, parallel tools, retry/backoff, native structured outputs, prompt caching, per-step token+cost logging, all covered by mocked-client pytest.
  • Week 9: Production RAG service live as an agent tool, plus a benchmark notebook with real recall/latency/memory curves and a demonstrated filtered-ANN recall cliff + quantization tradeoff you can defend with your own numbers.
  • Week 15: Full stack demoable end-to-end — Angular streams tokens + a live tool-call trace, Stop aborts cleanly, NestJS resumes dropped SSE via Last-Event-ID; agent-eval harness (trajectory+outcome, calibrated judge, CIs) gates merges.
  • Week 19: MCP server served from NestJS with per-session isolation; injection/guardrail defense layer catches planted tool-output attacks in CI; prompt-caching-for-loops shows a measured cost cut on a 10-turn session.
  • Week 23: AgentDesk is production-operable — SLO error-budget gates releases, canary auto-rolls-back on breach, BullMQ AI jobs survive abort+retry without double-billing, RGPD erasure verifiably propagates across Postgres+vector+traces, one written incident post-mortem.
  • Week 26: Interview-ready — sub-10min whiteboard of the full architecture, a 25-question hostile-defense bank for AgentDesk + a parallel NestJS/Angular defense set, AgentDesk reframed as a vertical demo with a positioning one-pager (optional voice slice round-tripping under budget).

Ce qu'il faut SKIP (ne pas disperser les heures)

  • Do NOT re-read the deep hub files line-by-line — they're already at/above tutorial level. Skim for the mental model, pitfalls list, and decision tables, then go build. Your bottleneck is building, not reading.
  • Skip the FR/EN parity / i18n cleanup entirely for now (verbatim /fr duplicates, 23 missing fr/symfony files, config en-US-vs-FR mismatch). It's real debt but it does NOT make you senior or win interviews — defer until after Phase 6, or just keep one French-canonical tree and drop the /fr mirror.
  • Skip Symfony entirely — it's not in your target stack (Python+NestJS+Angular) and competes for the same hours.
  • Don't chase framework breadth (CrewAI vs AutoGen vs Swarm vs MetaGPT, Agno, Mastra, DSPy, LlamaIndex). Read 07-frameworks/08-comparison-matrix once for decision-grade judgment ('when single-agent wins', TCO), build everything on raw SDK + LangGraph, and be able to JUSTIFY not using the others. Senior = judgment, not tool-collecting.
  • Defer fine-tuning hands-on (LoRA/QLoRA/distillation) — the hub itself correctly de-emphasizes it (RAG-first). Read 06-fine-tuning/01-when-to-fine-tune for the decision tree + ROI ADR so you can defend NOT fine-tuning; skip actually training until a project demands it.
  • Skip computer-use/browser-agent implementation as a core track — read 08-computer-use for its genuinely senior threat-model/security posture (cite it in interviews), but don't burn build-weeks on Xvfb/VNC unless a vertical needs it.
  • Make the voice slice OPTIONAL (Phase 6 only) — it's great breadth and a strong demo, but it's not the spine of an agentic-AI senior role; only build it if Phases 1-5 land on schedule.
  • Don't write report/summary .md files about your progress — put your externalized knowledge into the interview-drills doc and the per-phase defense banks, which are the artifacts that actually convert to offers.

Bibliothèque tech perso — Achref