Interactive AI lab

How AI Works

AI feels mysterious until you separate the parts: tokens, context, model prediction, tool use, memory, and verification. This page lets you poke each part directly.



Walk through the lab in order: system map, training, diagrams, tokens, context, prompts, tools, agent loop, trace, systems, concepts, stack building, and failure modes.

Illustrated overview

AI System Map

The illustration shows the same system as a physical workbench. Pick a station to see what that part contributes to an agent run.

[Illustration: an AI agent system with stations for input, token blocks, context window, model core, tools, memory, verification, and final output.]

Separate deep dive

Training and Alignment Roadmap

This page mostly explains what happens when you use an AI system. Model training and RLHF are a deeper topic, so they have their own full page with hands-on demos.


How models are made

The deep dive covers pretraining, fine-tuning, RLHF, safety tuning, evaluation, data quality, and the difference between base models and chat assistants — including a tiny model you can train yourself and a preference-ranking demo.

Topics: Pretraining, Fine-tuning, RLHF, Alignment, Evals, Data issues


Mental models

Three Diagrams Worth Remembering

These are the useful shapes underneath most AI systems: the text pipeline, the agent loop, and the difference between temporary context and durable memory.

Model Pipeline

Prompt → Tokens → Model → Response

Text is chopped into tokens, processed by the model, then decoded back into text. The model predicts the next token one step at a time; it is not looking up a literal answer book. Tap a stage, or send a prompt through to watch the whole trip.
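To make "predicting, not looking up" concrete, here is a toy sketch in Python. It is not how a real language model works internally: it assumes whitespace tokens and simple bigram counts, and the names `tokenize`, `train_bigrams`, and `generate` are invented for illustration. It does follow the same shape as the pipeline: text in, tokens, a prediction step, text out.

```python
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    # Toy tokenizer: lowercase and split on whitespace (real tokenizers use subword pieces).
    return text.lower().split()


def train_bigrams(corpus: str) -> dict:
    # Count which token follows which in the training text.
    counts = defaultdict(Counter)
    tokens = tokenize(corpus)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts: dict, prompt: str, max_new: int = 5) -> str:
    # Repeatedly append the most frequent follower of the last token.
    tokens = tokenize(prompt)
    if not tokens:
        return ""
    for _ in range(max_new):
        options = counts.get(tokens[-1])
        if not options:
            break  # nothing was ever seen after this token
        tokens.append(options.most_common(1)[0][0])
    return " ".join(tokens)  # "decode" back to text


counts = train_bigrams("the cat sat on the mat the cat ran")
print(generate(counts, "on", max_new=2))
```

The output is built from statistics of the training text, so the sketch can continue a prompt it has never seen as a whole, and it can also continue it wrongly.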

Agent Loop

Read → Plan → Tool → Observe (loop until done)

An agent keeps cycling until it has enough evidence or hits a limit. Good agents know when to stop and report what they verified.

Context vs Memory

Context: visible now, summary, drops out

Memory: saved fact, preference, workflow rule

Context is the working desk. Memory is the labeled drawer. If a fact needs to survive future sessions, it belongs in memory or a source file.
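The desk-and-drawer distinction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular product: `MemoryStore`, `run_session`, and the JSON file layout are hypothetical names chosen for the example.

```python
import json
from pathlib import Path


class MemoryStore:
    """Durable facts kept in a JSON file (the labeled drawer)."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self) -> dict:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, key: str, value: str) -> None:
        facts = self.load()
        facts[key] = value
        self.path.write_text(json.dumps(facts, indent=2), encoding="utf-8")


def run_session(store: MemoryStore, messages: list, remember: dict) -> list:
    """One session: context is seeded from memory, used, then discarded."""
    # The working desk: rebuilt from scratch every session.
    context = [f"{key}: {value}" for key, value in store.load().items()]
    context.extend(messages)
    # Only facts written to the store survive after this function returns.
    for key, value in remember.items():
        store.save(key, value)
    return context
```

In a second call to `run_session` with the same store, the earlier messages are gone, but any fact passed in `remember` shows up at the top of the new context.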

Text becomes pieces

Token Playground

Models do not read text exactly like humans do. They see chunks called tokens — in English, one token averages about three-quarters of a word. This is an approximate local demo, but it shows why wording and length matter. Real models accept anywhere from thousands to over a million tokens at once; the meter here fills against a tiny 120-token demo budget so you can watch it move.
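The rule of thumb above (about three-quarters of a word per token) can be turned into a rough estimator. The Python sketch below is an assumption about how such a meter could work, not the page's actual implementation; `approx_tokens` and `budget_used` are illustrative names, and a real tokenizer will give different counts.

```python
import math

DEMO_BUDGET = 120        # the demo budget mentioned above, in tokens
WORDS_PER_TOKEN = 0.75   # rough English average; an approximation only


def approx_tokens(text: str) -> int:
    # Estimate tokens from the word count: words / 0.75, rounded up.
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def budget_used(text: str, budget: int = DEMO_BUDGET) -> float:
    # Percentage of the budget consumed, capped at 100.
    return min(100.0, 100.0 * approx_tokens(text) / budget)


print(approx_tokens("Models do not read text exactly like humans do"))
```

Word counts are only a stand-in: punctuation, rare words, code, and non-English text usually split into more tokens than this estimate suggests.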

Approx tokens: 0

Demo budget used: 0%

Length: Short

What the model can see

Context Window Visualizer

A chat can be long while the model's active view is limited. Older material may stay visible, get summarized, or fall out unless it is saved somewhere durable.
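One simple way to picture the limited view is a budget that is filled from the newest turn backwards. The Python sketch below assumes that policy; real systems may also summarize or pin older material. The function name `split_context` and the word-count token function are hypothetical.

```python
def split_context(turns: list, budget: int, count_tokens) -> tuple:
    """Keep the newest turns that fit the budget; everything older drops out."""
    visible, dropped = [], []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        # Once one turn fails to fit, all older turns are dropped too,
        # so the visible window stays contiguous.
        if not dropped and used + cost <= budget:
            visible.append(turn)
            used += cost
        else:
            dropped.append(turn)
    visible.reverse()
    dropped.reverse()
    return visible, dropped


turns = ["a b", "c d e", "f", "g h"]
visible, dropped = split_context(turns, budget=4, count_tokens=lambda t: len(t.split()))
print(visible, dropped)
```

Anything in `dropped` is invisible to the model on the next turn unless it was summarized or saved somewhere durable first.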

Conversation length: 6 turns

Legend: visible now, summary, saved memory, dropped

Same model, better instructions

Prompt Quality Compare

Better prompts do not make the model magical. They reduce ambiguity, supply constraints, and tell the model what a useful answer looks like.

Choose the prompt

Likely result

Guessing vs checking

Tool Use Simulator

A plain model predicts an answer from its training and current context. A tool-using agent can inspect files, search, run commands, generate images, or ask for approval.

Model guesses

A model can produce a plausible answer from context, but it may invent details when the answer depends on current files, dates, or external state.

Agent checks

A tool-using agent can inspect the actual source of truth, then answer with less guesswork and clearer evidence.
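The difference shows up clearly with a question about the current date. In this Python sketch, `model_guess_year` stands in for a plain model that can only lean on its training data, and `agent_check_year` stands in for an agent that calls a clock tool. Both names, and the idea of a single "training cutoff year", are simplifying assumptions for illustration.

```python
from datetime import date


def model_guess_year(training_cutoff_year: int) -> int:
    # A plain model has no clock. Its best guess is that "now" looks
    # like the period its training data came from.
    return training_cutoff_year


def agent_check_year(clock=date.today) -> int:
    # A tool-using agent asks the source of truth instead of guessing.
    return clock().year


print("guess:", model_guess_year(2023))
print("check:", agent_check_year())
```

The guess is plausible and may even be right, but only the checked answer comes with evidence behind it.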

Step through it

Agent Loop Simulator

An agent is a model wrapped in a loop. It reads the request, checks context, chooses tools when needed, observes the result, and decides whether to keep going or answer.
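The loop described above can be sketched in Python without calling any model. Here `decide` is a hypothetical stand-in for the model: it receives the request and the observations so far, and returns either a tool call or a final answer. The tool names and the step limit are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple


def run_agent(
    request: str,
    decide: Callable[[str, List[str]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Read, decide, act, observe; stop on an answer or at the step limit."""
    observations: List[str] = []
    for _ in range(max_steps):
        action, argument = decide(request, observations)
        if action == "answer":
            return argument, observations
        if action not in tools:
            # Record the mistake so the next decision can react to it.
            observations.append(f"unknown tool: {action}")
            continue
        observations.append(tools[action](argument))
    return "stopped: step limit reached", observations


def scripted_decide(request: str, observations: List[str]) -> Tuple[str, str]:
    # Scripted stand-in for a model: check once, then answer with the evidence.
    if not observations:
        return "read_file", "config.txt"
    return "answer", f"Found {observations[-1]}"


fake_tools = {"read_file": lambda path: "version=2"}
print(run_agent("What version is configured?", scripted_decide, fake_tools))
```

The step limit matters as much as the loop itself: without it, a model that never chooses "answer" would run forever.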

No API required

Observable work

Trace Viewer

A useful AI run leaves an outside trail: what was requested, what it checked, what it observed, and how it verified the result. You do not need hidden chain-of-thought to judge whether the work is trustworthy.
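An outside trail like this can be represented as plain records. The Python sketch below is one possible shape, assuming the four kinds of entry named above; `TraceEvent`, `REQUIRED_KINDS`, and `missing_steps` are hypothetical names, not part of any specific tracing tool.

```python
from dataclasses import dataclass
from typing import List

# The four things a reviewer wants to see, taken from the paragraph above.
REQUIRED_KINDS = ("requested", "checked", "observed", "verified")


@dataclass
class TraceEvent:
    kind: str    # one of REQUIRED_KINDS
    detail: str  # human-readable description of what happened


def missing_steps(trace: List[TraceEvent]) -> List[str]:
    # Report which kinds of evidence never appear in the trail.
    seen = {event.kind for event in trace}
    return [kind for kind in REQUIRED_KINDS if kind not in seen]


trace = [
    TraceEvent("requested", "Summarize the changelog"),
    TraceEvent("checked", "Opened CHANGELOG.md"),
    TraceEvent("observed", "Three entries since the last release"),
]
print(missing_steps(trace))
```

A run whose trail is missing "verified" may still be correct, but a reviewer has no way to tell from the trail alone.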

Audit trail

Different jobs, different tradeoffs

Local, Cloud, Agent, Media

AI is not one thing. The useful question is what kind of system you are using and what constraints come with it. Pick a job below and see which system fits.

Match the job

Local model

Private and hardware-bound. Good for experiments, automation, and offline control. Quality depends on the model and machine.

Cloud model

Usually stronger and faster to update. Great for hard reasoning, coding, and multimodal work, with the tradeoffs of relying on an external service.

Agent

A model with tools, memory, permissions, and a loop. Powerful because it can act, risky if guardrails are sloppy.

Media model

Image, voice, music, and video models translate prompts into pixels or sound. They need creative direction (style, mood, composition), not just facts.


Plain terms

Concept Decoder

AI terms get thrown around loosely. Pick a concept to see the useful definition, a practical example, and the common misunderstanding to avoid.

Tap a term

Assemble the system

Build an Agent Stack

Most real AI systems are a stack of choices. Add or remove capabilities below and watch the profile change.
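A profile like this can be modeled as a sum over the selected components. The Python sketch below is purely illustrative: the component names and every weight in `COMPONENTS` are made-up numbers chosen for the example, not values taken from the page's widget or from any measurement.

```python
# Hypothetical weights per component: (capability, risk, oversight).
# These numbers are invented for illustration only.
COMPONENTS = {
    "tools": (3, 2, 0),
    "memory": (2, 1, 0),
    "approval_step": (0, -1, 3),  # a human approval gate lowers risk
    "trace_log": (0, 0, 2),
}


def stack_profile(selected: list) -> dict:
    """Add up the weights of the chosen components."""
    capability = risk = oversight = 0
    for name in selected:
        c, r, o = COMPONENTS[name]
        capability += c
        risk += r
        oversight += o
    # Risk is floored at zero so guardrails alone never score negative.
    return {"capability": capability, "risk": max(risk, 0), "oversight": oversight}


print(stack_profile(["tools", "memory"]))
print(stack_profile(["tools", "memory", "approval_step", "trace_log"]))
```

The point of the exercise is the shape of the tradeoff: capabilities tend to raise risk, and oversight has to be added deliberately.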

Interactive model

Capability: 0

Risk: 0

Oversight: 0

Where it goes wrong

Failure Modes

Most AI mistakes are not mysterious. They usually come from missing context, stale information, weak instructions, bad tool choice, or unclear permission boundaries.

Quick checkpoint

Question: “What changed in the latest deployed website commit?”

Pick the safer approach.