Who is it for?

Developer teams and technical leads who integrate LLMs into products and internal systems. Participants should be comfortable with code and APIs.

Objectives

By the end of the training, participants can:

Build against LLM APIs: structured outputs, streaming, batching, error handling
Design agentic systems: tool use, MCP, orchestration, and their limits
Build and evaluate a RAG pipeline: chunking, hybrid search, evaluation
Use AI-assisted development tools effectively as a team
Take an LLM feature to production: monitoring, evaluation, cost control

Program

Day 1 — Under the hood, and first integrations

LLM internals for engineers. Transformer intuition, tokens and embeddings, the training pipeline (pre-training, SFT, RL), base vs instruct models, inference parameters (temperature, top-p, context window). Why hallucinations are structural. Demo: running a small local model and watching it hallucinate.
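As a rough illustration of the inference parameters listed above, the sketch below shows how temperature and top-p reshape a next-token distribution. It is a toy in plain Python, not how any particular inference framework implements sampling; the function names and the example logits are assumptions made for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


# Hypothetical logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
warm = softmax_with_temperature(logits, temperature=1.0)
cold = softmax_with_temperature(logits, temperature=0.5)
nucleus = top_p_filter(warm, top_p=0.8)
```

Here `cold` puts more mass on the most likely token than `warm`, and `nucleus` drops the least likely token entirely before renormalizing.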

The ecosystem, mapped. Proprietary vs open-weight models and their licenses. Hugging Face, inference frameworks (llama.cpp, vLLM), local vs managed APIs — cost, sovereignty, latency. Managed providers: Bedrock, Azure, Vertex, EU options.

Building against the APIs. Anthropic and OpenAI SDKs: structure and differences. Structured outputs, streaming, batch processing. Error handling, rate limits, retries. Prompt caching and what it does to your bill.
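One common way to handle rate limits is to retry with exponential backoff and jitter. The sketch below is provider-agnostic and makes several assumptions: `RateLimitError` is a hypothetical stand-in for whatever exception an SDK raises on HTTP 429, and `call_with_retries` with its parameters is an illustrative helper, not part of any SDK.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""


def call_with_retries(request, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call request(); on RateLimitError, wait with exponential backoff and retry."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": a random wait up to the cap, so clients do not retry in lockstep.
            sleep(random.uniform(0, cap))
```

The `sleep` argument is injectable so the helper can be exercised in tests without actually waiting. Real SDKs often ship their own retry logic, so a wrapper like this is mainly useful where that logic is missing or needs to be tuned.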

Hands-on — first integration. Build a small extraction pipeline against a real API: structured output, validation, cost measurement.

Day 2 — Agents, RAG, AI-assisted development

Tool use and agents. Function calling from scratch. Model Context Protocol (MCP). Agentic loops: planning, execution, correction. Multi-agent orchestration — and when a deterministic chain beats an agent.

RAG, done properly. Embeddings and vector stores. Chunking strategies. Hybrid search (vector + lexical). RAG vs long context vs fine-tuning. How to evaluate a RAG pipeline before your users do.
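Hybrid search needs a way to merge a vector ranking with a lexical one. A minimal sketch follows, assuming reciprocal rank fusion (RRF) as the merging rule; this is one common choice among several, and the function name, the constant `k=60`, and the document ids are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one list, best first.

    Each document scores 1 / (k + rank) in every list that contains it,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a vector index and a lexical (keyword) index.
vector_hits = ["a", "b", "c"]
lexical_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_hits, lexical_hits])
```

RRF only looks at ranks, so it sidesteps the problem that vector similarities and lexical scores live on different scales.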

AI-assisted development. Claude Code, Cursor, Copilot: what actually changes in a team’s workflow. Agents, hooks, MCP servers in practice. Team conventions that keep AI-generated code reviewable.

Hands-on — build an agent. Integrate an agent with tool use on a real task: file access, an external API, and a verification step.

Day 3 (optional) — Production, security, evaluation

Production architecture and observability. Reference architectures. Monitoring latency, cost, error rates, and output quality. Fallback strategies and graceful degradation.

Security. Prompt injection (direct and indirect), jailbreaks, PII leakage. Technical controls: filtering, sandboxing, human review. Threat modeling an LLM application — hands-on, on your own architecture.

Evaluation and continuous quality. Building eval sets from real traffic. Automated evaluation pipelines, regression detection, LLM-as-judge and its pitfalls.

Hands-on — evaluation pipeline. Set up an automated eval on a realistic use case and use it to compare two models.

Practical details

Two to three days depending on scope, on-site or remote
Hands-on: participants build against real APIs during the training
Includes follow-up calls after the training to review your AI projects