How LLMs Work

A clear map of how large language models turn a prompt into an answer — and the models and hardware behind them.

Free to start · Fully editable · Export to SVG, PNG, GIF & MP4

What's in this template

7 connected components you can rename, recolor, and extend with AI.

OpenAI GPT
Anthropic Claude
Google Gemini
Meta Llama
Hugging Face
NVIDIA GPUs
PyTorch

This diagram explains how a large language model (LLM) works. A prompt is broken into tokens, the tokens are turned into embeddings, and the embeddings pass through a stack of transformer layers whose attention mechanism weighs each token against the others in the context. The model then samples the next token, appends it to the sequence, and repeats until the response is complete. Around that core sit the real players that make it possible: closed foundation models from OpenAI, Anthropic and Google, open models from Meta and the Hugging Face ecosystem, and the NVIDIA GPUs and frameworks such as PyTorch that are widely used to train and serve them.
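The sample-and-repeat step can be sketched in a few lines of Python. This is a toy illustration, not how any production model is implemented: `toy_logits` is a hypothetical stand-in for the whole transformer stack, and the four-word `VOCAB` is invented for the example.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(logits, rng, temperature=1.0):
    """Sample one token id in proportion to its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(prompt_ids, logits_fn, max_new_tokens, eos_id, rng):
    """Append sampled tokens until the end-of-sequence id or the length limit."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = sample_next(logits_fn(ids), rng)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

# Hypothetical toy vocabulary; real models use tens of thousands of subword tokens.
VOCAB = ["<eos>", "the", "cat", "sat"]

def toy_logits(ids):
    # Stand-in for the transformer: favors the id that follows the last one.
    favored = (ids[-1] + 1) % len(VOCAB)
    return [3.0 if i == favored else 0.0 for i in range(len(VOCAB))]

rng = random.Random(0)
out = generate([1], toy_logits, max_new_tokens=5, eos_id=0, rng=rng)
print(" ".join(VOCAB[i] for i in out))
```

Lowering `temperature` makes the most likely token win more often, while raising it makes the output more varied.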

Use it to teach how generative AI works, to brief a non-technical team before an AI project, or as the opening slide of an LLM talk. Every node is editable, so you can swap in the exact models and tooling your stack uses.

Great for

Explaining generative AI to a non-technical team
Opening slide for an LLM or AI talk
Onboarding docs for an AI product team
Course or tutorial on how LLMs work
Internal AI strategy briefings

Frequently asked questions

How does a large language model work?

An LLM splits your prompt into tokens, converts them to embeddings, and runs them through many transformer layers that use attention to relate every token to every other. It then predicts the next token over and over to produce an answer.
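As a rough illustration of the first two steps, here is a minimal Python sketch of tokenization and embedding lookup. It makes two simplifying assumptions: tokens are whole lowercase words (real LLM tokenizers split text into subword pieces), and the embedding table holds random numbers rather than learned values. All function names are hypothetical.

```python
import random

def build_vocab(words):
    """Map each distinct word to an integer id; id 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for w in words:
        vocab.setdefault(w, len(vocab))
    return vocab

def tokenize(text, vocab):
    """Split on whitespace and look up each word's id."""
    return [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]

def make_embedding_table(vocab_size, dim, seed=0):
    """One vector per token id; random here, learned during training in a real model."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]

def embed(token_ids, table):
    """Replace each token id with its vector."""
    return [table[i] for i in token_ids]

vocab = build_vocab("how do language models work".split())
table = make_embedding_table(len(vocab), dim=4)
ids = tokenize("How do models dream", vocab)   # "dream" is not in the vocabulary
vectors = embed(ids, table)
print(ids)  # [1, 2, 4, 0]
```

The list of vectors is what the transformer layers receive as input.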

What is the transformer in an LLM?

The transformer is the neural-network architecture behind modern LLMs. Its self-attention mechanism lets the model weigh the relevance of all tokens at once, which is what makes long-range understanding and fluent generation possible.
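To make the weighing concrete, here is a simplified Python sketch of scaled dot-product self-attention. It assumes the token vectors serve directly as queries, keys and values (real transformers apply learned projections and use many attention heads), and it applies the causal mask typical of GPT-style models, so each token attends only to itself and earlier tokens. The function name is hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x):
    """x is a list of token vectors (lists of floats of equal length)."""
    d = len(x[0])
    outputs, all_weights = [], []
    for i, query in enumerate(x):
        # Score the query against tokens 0..i only (the causal mask).
        scores = [
            sum(q * k for q, k in zip(query, x[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # The output is a weighted average of the visible token vectors.
        out = [sum(w * x[j][c] for j, w in enumerate(weights)) for c in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x)
for i, w in enumerate(weights):
    print(i, [round(v, 3) for v in w])
```

Each row of weights sums to 1, and the first token can only attend to itself, so its output equals its input.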

Which companies build the major LLMs?

OpenAI (GPT), Anthropic (Claude) and Google (Gemini) build leading closed models, while Meta (Llama) and the wider community on Hugging Face lead open models. Many of these models are trained and served on NVIDIA GPUs using frameworks like PyTorch, although some vendors also use their own accelerators, such as Google's TPUs.

Can I customise this LLM diagram?

Yes. Rename any node, swap in the exact models or hardware your team uses, change the animation style, and export to PNG, SVG, GIF or MP4.

Related templates

Claude Code Architecture

Codex Architecture

RAG Architecture Diagram

AI Agent Architecture Diagram

ML Training Pipeline Diagram

LLM Application Architecture Diagram

Recommendation System Architecture Diagram

MLOps Pipeline Diagram

Make it yours in seconds