Glossary

Token

In LLMs, a token is the smallest unit of text the model processes — roughly 0.75 words or 3-4 characters in English. LLM pricing, context window limits, and generation speed are all measured in tokens.

Explanation

LLMs don't process text character-by-character or word-by-word; they process tokens. Tokens are determined by a tokenizer, typically byte-pair encoding (BPE). Common words are single tokens ('the', 'function', 'return'); rare words split into multiple tokens; code follows similar patterns with programming-specific vocabulary.

Why tokens matter for developers: context windows are measured in tokens (GPT-4o's 128K context is roughly 96,000 words of English). API costs are measured in tokens (input tokens plus output tokens, each multiplied by the price per 1M). Generation speed is measured in tokens per second. Concise prompts are cheaper and faster to process than verbose ones.

Tokenization in code: variable names (especially snake_case and camelCase) tokenize differently than dictionary words. Long descriptive names use more tokens than abbreviations. Code comments in natural language use more tokens per character than code itself. JSON uses more tokens than YAML for equivalent data.

Context window strategy: when using LLMs on code, you spend tokens on your prompt (system instructions, context, question) and receive tokens in response. With expensive models, long prompts full of irrelevant context are both slow and costly. Curating context, including only what the model needs, is both a cost optimization and a quality optimization: models perform worse when given irrelevant context.
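The ~4-characters-per-token rule of thumb can be sketched as a quick estimator for when an exact tokenizer isn't available. This is an illustrative assumption, not a real API: the function name and the divisor are made up here, and real counts require an actual tokenizer such as tiktoken.

```javascript
// Rough token estimate — assumes the ~4 chars/token English average.
// Real counts vary: code, JSON, and non-English text skew higher.
function roughTokenEstimate(text) {
  return Math.ceil(text.length / 4);
}

const prompt = 'Summarize the following function and list its edge cases.';
console.log(roughTokenEstimate(prompt));
```

A heuristic like this is fine for back-of-envelope budgeting; for billing-accurate numbers, use the tokenizer that matches your model, as in the example below.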

Code Example

javascript
// Token counting (tiktoken library)
const { encoding_for_model } = require('tiktoken');
const enc = encoding_for_model('gpt-4o');

const code = 'function calculateTax(income, rate) { return income * rate; }';
const tokens = enc.encode(code);
console.log(`${code.length} chars = ${tokens.length} tokens`);
// Roughly 62 chars ≈ 15-18 tokens, depending on the tokenizer

// Estimating API cost (GPT-4o pricing)
function estimateCost(inputTokens, outputTokens) {
  const inputCost  = inputTokens  * (2.50 / 1_000_000);   // $2.50/1M input
  const outputCost = outputTokens * (10.00 / 1_000_000);  // $10/1M output
  return inputCost + outputCost;
}

// 1000-token prompt + 500-token response
console.log(`$${estimateCost(1000, 500).toFixed(4)} per call`);
// $0.0075 per call

// Scale: 10,000 users × 10 calls/day × 1000 input tokens
const dailyCost = estimateCost(10_000 * 10 * 1000, 10_000 * 10 * 200);
console.log(`Daily AI cost: $${dailyCost.toFixed(2)}`);
// $450/day ($250 input + $200 output) ≈ $13,500/month — token economics matter at scale

// Context window budget
const MAX_TOKENS = 128_000;
const SYSTEM_PROMPT = 500;
const RESPONSE_RESERVE = 2000;
const AVAILABLE = MAX_TOKENS - SYSTEM_PROMPT - RESPONSE_RESERVE;
console.log(`Available for code context: ${AVAILABLE.toLocaleString()} tokens`);

Why It Matters for Engineers

Token awareness is a practical skill for engineers building AI-powered features or using AI tools at scale. API costs scale linearly with token usage — a product making 1 million AI calls per day with 1,000-token prompts costs 10x more than one with 100-token prompts. Optimizing prompts directly impacts infrastructure costs. Token limits also affect architectural decisions: a large codebase won't fit in any context window. Understanding token budgets guides decisions about retrieval-augmented generation (RAG), chunking strategies, and when to use smaller, cheaper models for simpler tasks.

Learn This In Practice

Go deeper with the full module on Beyond Vibe Code.

AI-Assisted Dev Foundations →