A clear map of how large language models turn a prompt into an answer — and the models and hardware behind them.
Free to start · Fully editable · Export to SVG, PNG, GIF & MP4
7 connected components you can rename, recolor, and extend with AI.
This diagram explains how a large language model (LLM) works. A prompt is broken into tokens, turned into embeddings, and passed through a stack of transformer layers whose attention mechanism weighs every token against every other one. The model then samples the next token, again and again, to generate a response. Around that core sit the organizations and tools that make it possible: foundation models from OpenAI, Anthropic and Google, open models from Meta and the Hugging Face community, and the NVIDIA GPUs and frameworks such as PyTorch used to train and serve them.
Use it to teach how generative AI works, to brief a non-technical team before an AI project, or as the opening slide of an LLM talk. Every node is editable, so you can swap in the exact models and tooling your stack uses.
An LLM splits your prompt into tokens, converts them to embeddings, and runs them through many transformer layers that use attention to relate every token to every other. It then predicts the next token over and over to produce an answer.
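The loop described above can be sketched in a few lines of plain Python. This is a toy illustration, not how any production model is implemented: the six-word vocabulary, the whitespace tokenizer, the random weights, and the `next_token_probs` stand-in for the transformer stack are all assumptions made for the example.

```python
import math
import random

# Hypothetical toy vocabulary; real LLMs use learned subword tokenizers.
VOCAB = ["<eos>", "the", "cat", "sat", "on", "mat"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}
DIM = 4

# Random numbers stand in for trained weights (assumption for illustration).
_rng = random.Random(0)
EMBED = [[_rng.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]
W_OUT = [[_rng.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]


def tokenize(prompt):
    """Split on whitespace and map each known word to its token id."""
    return [TOKEN_ID[w] for w in prompt.lower().split() if w in TOKEN_ID]


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_probs(token_ids):
    """Stand-in for the transformer stack: average the token embeddings,
    then score every vocabulary entry against that vector."""
    vectors = [EMBED[t] for t in token_ids]
    hidden = [sum(col) / len(vectors) for col in zip(*vectors)]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W_OUT]
    return softmax(logits)


def generate(prompt, max_new_tokens=5, seed=0):
    """Sample one token at a time, feeding each choice back in as input."""
    sampler = random.Random(seed)
    ids = tokenize(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(ids)
        nxt = sampler.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        if VOCAB[nxt] == "<eos>":
            break  # the model chose to stop
        ids.append(nxt)
    return [VOCAB[i] for i in ids]
```

Calling `generate("the cat sat")` returns the prompt tokens followed by up to five sampled tokens. Because the weights are random, the continuation is meaningless; the point is the shape of the loop: tokenize, embed, score, sample, repeat.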
The transformer is the neural-network architecture behind modern LLMs. Its self-attention mechanism lets the model weigh the relevance of all tokens at once, which is what makes long-range understanding and fluent generation possible.
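As a rough sketch of that idea, the following plain-Python function computes single-head scaled dot-product self-attention. It is simplified on purpose: the input vectors serve directly as queries, keys and values (real transformer layers apply learned projections first), and the `causal` flag is an assumption added here to show the masking that text-generating models typically use so a token cannot look at later positions.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, causal=False):
    """x is a list of token vectors of equal length.
    Returns (outputs, weights), where weights[i][j] is how much
    token i attends to token j."""
    d = len(x[0])
    outputs, weights = [], []
    for i, query in enumerate(x):
        scores = []
        for j, key in enumerate(x):
            if causal and j > i:
                # Mask out future positions.
                scores.append(float("-inf"))
            else:
                dot = sum(q * k for q, k in zip(query, key))
                scores.append(dot / math.sqrt(d))
        w = softmax(scores)
        # Each output is a weighted mix of all (unmasked) token vectors.
        mixed = [sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)]
        weights.append(w)
        outputs.append(mixed)
    return outputs, weights
```

For example, `self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])` returns one output vector per token plus a 3×3 weight matrix whose rows each sum to 1. With `causal=True`, the first token can only attend to itself.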
OpenAI (GPT), Anthropic (Claude) and Google (Gemini) build leading closed models, while Meta (Llama) and the Hugging Face community lead open models. Many of these models are trained and served on NVIDIA GPUs using frameworks like PyTorch.
Yes, the diagram is fully editable. Rename any node, swap in the exact models or hardware your team uses, change the animation style, and export to PNG, SVG, GIF or MP4.
How Claude Code reads your repository, calls tools through MCP, and edits code from the terminal
How an AI coding agent like Codex plans, writes and tests code inside a secure sandbox
Map how retrieval-augmented generation grounds an LLM in your data with a vector database
Visualize the reasoning loop, tools, and memory that let an AI agent plan and act
Chart every stage from raw data to a trained, validated machine learning model
See how a production LLM app wires frontend, orchestration, model APIs, and guardrails
Show how candidate generation, ranking, and filtering produce personalized recommendations
Trace the flow from training and CI/CD to deployment, monitoring, and retraining
Open the How LLMs Work template in the Infogiph canvas, then edit, animate, and export.