Who is it for?
Developer teams and technical leads who integrate LLMs into products and internal systems. Comfortable with code and APIs.
Objectives
By the end of the training, participants can:
- Build against LLM APIs: structured outputs, streaming, batching, error handling
- Design agentic systems: tool use, MCP, orchestration, and their limits
- Build and evaluate a RAG pipeline: chunking, hybrid search, evaluation
- Use AI-assisted development tools effectively as a team
- Take an LLM feature to production: monitoring, evaluation, cost control
Program
Day 1 — Under the hood, and first integrations
LLM internals for engineers. Transformer intuition, tokens and embeddings, the training pipeline (pre-training, SFT, RL), base vs instruct models, inference parameters (temperature, top-p, context window). Why hallucinations are structural. Demo: running a small local model and watching it hallucinate.
The ecosystem, mapped. Proprietary vs open-weight models and their licenses. Hugging Face, inference frameworks (llama.cpp, vLLM), local vs managed APIs — cost, sovereignty, latency. Managed providers: Bedrock, Azure, Vertex, EU options.
Building against the APIs. Anthropic and OpenAI SDKs: structure and differences. Structured outputs, streaming, batch processing. Error handling, rate limits, retries. Prompt caching and what it does to your bill.
Hands-on — first integration. Build a small extraction pipeline against a real API: structured output, validation, cost measurement.
Day 2 — Agents, RAG, AI-assisted development
Tool use and agents. Function calling from scratch. Model Context Protocol (MCP). Agentic loops: planning, execution, correction. Multi-agent orchestration — and when a deterministic chain beats an agent.
RAG, done properly. Embeddings and vector stores. Chunking strategies. Hybrid search (vector + lexical). RAG vs long context vs fine-tuning. How to evaluate a RAG pipeline before your users do.
AI-assisted development. Claude Code, Cursor, Copilot: what actually changes in a team’s workflow. Agents, hooks, MCP servers in practice. Team conventions that keep AI-generated code reviewable.
Hands-on — build an agent. Integrate an agent with tool use on a real task: file access, an external API, and a verification step.
Day 3 (optional) — Production, security, evaluation
Production architecture and observability. Reference architectures. Monitoring latency, cost, error rates, and output quality. Fallback strategies and graceful degradation.
Security. Prompt injection (direct and indirect), jailbreaks, PII leakage. Technical controls: filtering, sandboxing, human review. Threat modeling an LLM application — hands-on, on your own architecture.
Evaluation and continuous quality. Building eval sets from real traffic. Automated evaluation pipelines, regression detection, LLM-as-judge and its pitfalls.
Hands-on — evaluation pipeline. Set up an automated eval on a realistic use case and use it to compare two models.
Practical details
- Two to three days depending on scope, on site or remote
- Hands-on: participants build against real APIs during the training
- Includes post-training follow-up calls to review your AI projects