AI Glossary
Short, precise definitions of artificial-intelligence terms — without the academic fog. Every entry starts with a two-sentence answer, followed by a deeper explanation, a concrete example and related concepts. The glossary grows as new terms enter the news cycle.
Models
Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a deep learning architecture that applies learned weight-sharing filters over local patches of input data — most commonly images — to detect hierarchical spatial features such as edges, textures, and object parts.
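A minimal pure-Python sketch of the core CNN operation, assuming a single-channel image and one 2×2 filter. Like most deep learning libraries, it computes cross-correlation (what those libraries call "convolution") with no padding and stride 1. The kernel values are hand-picked for illustration, whereas a CNN learns them during training.

```python
def conv2d(image, kernel):
    """Valid (no-padding, stride-1) 2D cross-correlation of one channel.

    The same kernel weights are reused at every spatial position, which is
    the weight sharing that makes convolutional layers parameter-efficient.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the local patch at (i, j)
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out


# A 4x4 image with a dark-to-bright vertical edge, and a hand-set edge filter
image = [[0, 0, 1, 1] for _ in range(4)]
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(image, kernel))  # strongest response (2.0) in the middle column
```

The filter responds only where brightness changes from left to right. Deeper layers combine many such local responses into more complex patterns.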
Decoder-Only Architecture
A Decoder-Only architecture is a transformer variant that uses a single self-attention stack with causal (left-to-right) masking to predict each next token from its preceding context, without a separate encoder, and is the design used by most modern large language models, including the GPT family.
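The causal masking can be sketched in a few lines of Python. This is an illustrative sketch of how such a mask is typically applied, not any particular model's code: positions after the current token get a score of negative infinity before the softmax, so they receive exactly zero attention weight.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, allowed):
    """Softmax over attention scores with disallowed (future) positions removed."""
    masked = [s if ok else -math.inf for s, ok in zip(scores, allowed)]
    peak = max(masked)                            # subtract the max for stability
    exps = [math.exp(s - peak) for s in masked]   # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


mask = causal_mask(3)
# Token 0 can only see itself, even though token 2 has the highest raw score.
print(masked_softmax([1.0, 2.0, 3.0], mask[0]))  # [1.0, 0.0, 0.0]
```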
Deep Learning
Deep learning is a subfield of machine learning that trains neural networks with many layers to learn hierarchical representations directly from raw data such as pixels, audio waveforms, or text tokens, without hand-crafted features.
Diffusion Model
A diffusion model is a generative AI system that produces images, audio, or video by learning to reverse an iterative noise-addition process, starting from random noise and progressively denoising it through learned steps.
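In DDPM-style diffusion models the noise-addition half of the process has a simple closed form, sketched below in pure Python. The linear schedule from 1e-4 to 0.02 over 1,000 steps follows the original DDPM setup; treat it as one common choice, not a requirement. During training, a network (not shown) learns to predict the noise `eps` from `x_t` and `t`; generation then runs the process in reverse, starting from pure noise.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    abars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        abars.append(prod)
    return abars


def add_noise(x0, t, abars, rng):
    """Jump straight to step t of the forward (noising) process:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, 1)."""
    a = abars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


rng = random.Random(0)
abars = linear_alpha_bars(1000)
x0 = [0.5, -0.2, 0.9]            # a tiny stand-in for an image
xt, eps = add_noise(x0, 999, abars, rng)
print(abars[0], abars[-1])       # signal fraction at the first and last step
```

At the last step `abar_t` is close to zero, so `x_t` is almost pure Gaussian noise, which is why sampling can start from random noise.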
Embedding Model
An embedding model converts text, images, or other data into fixed-length numerical vectors in a high-dimensional space, where semantically similar items are geometrically close to each other.
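Closeness between embeddings is usually measured with cosine similarity. The sketch below uses hand-written 4-dimensional vectors as a hypothetical stand-in for real model output, which typically has hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, not produced by any real model
cat = [0.9, 0.1, 0.3, 0.0]
kitten = [0.85, 0.15, 0.35, 0.05]
invoice = [0.0, 0.8, 0.1, 0.6]
print(cosine_similarity(cat, kitten))   # close to 1: semantically similar
print(cosine_similarity(cat, invoice))  # much lower: unrelated
```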
Encoder–Decoder Architecture
An Encoder–Decoder architecture is a neural network design in which an encoder maps an input into a latent representation and a separate decoder generates the output sequence from that representation, making it well-suited to sequence-to-sequence tasks such as machine translation and summarization.
Foundation Model
A foundation model is a large-scale AI system pre-trained on broad, diverse data—text, images, code—to serve as a general-purpose base adaptable to many downstream tasks. The term was introduced by Stanford HAI in 2021;
Frontier Model
A frontier model is an AI system operating at the current limits of machine learning capability, typically trained with the largest known compute budgets. The term is used in AI policy to identify systems whose capabilit
Generative Adversarial Network (GAN)
A Generative Adversarial Network (GAN) is a dual-network architecture where a generator produces synthetic data and a discriminator attempts to classify it as real or fake; adversarial training between the two drives progressively more realistic outputs.
Large Language Model (LLM)
A large language model (LLM) is a transformer-based neural network with billions of parameters trained on internet-scale text to model and generate human language, capable of performing diverse tasks through natural-language prompts.
Machine Learning
Machine learning is a branch of artificial intelligence in which algorithms automatically improve their performance on a task by learning patterns from data, without being explicitly programmed with task-specific rules f
Mixture of Experts (MoE)
Mixture of Experts (MoE) is a neural network architecture where a learned routing mechanism activates only a small subset of specialized sub-networks (experts) for each input token, allowing large total parameter counts while keeping per-token compute low.
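A toy sketch of top-k routing, assuming the router logits have already been computed (in a real MoE layer they come from a small learned layer applied to the token's hidden state) and using plain Python functions as hypothetical experts. Renormalizing the softmax over only the selected experts is one common gating variant; others exist.

```python
import math


def top_k_route(router_logits, k=2):
    """Return {expert_index: gate_weight} for the k highest-scoring experts.

    The softmax is taken over the selected experts only, so weights sum to 1.
    """
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, router_logits, k=2):
    """Combine the outputs of only the selected experts, weighted by their gates."""
    gates = top_k_route(router_logits, k)
    # Unselected experts are never called, so their compute is skipped entirely.
    return sum(g * experts[i](x) for i, g in gates.items())


# Four hypothetical "experts"; in a real model each is a feed-forward sub-network.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x - 3, lambda x: 0.0]
router_logits = [0.1, 2.0, 2.0, -1.0]
print(top_k_route(router_logits))              # experts 1 and 2, equal weights
print(moe_forward(3.0, experts, router_logits))
```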
Multimodal Model
A multimodal model is an AI system that processes and generates data across more than one modality — such as text, images, audio, or video — within a single unified architecture.
Neural Network
A neural network is a computational model composed of interconnected layers of numerical units (neurons) that learn to map inputs to outputs by adjusting connection weights through exposure to training data.
Open-Weights Model
An open-weights model is an AI system whose trained parameter weights are publicly released, allowing anyone to download, run, and modify the model without accessing the original developer's infrastructure.
Reasoning Model
A reasoning model is an AI system designed to solve complex, multi-step problems by generating explicit intermediate reasoning steps — often called a chain of thought — before producing a final answer.
Recurrent Neural Network (RNN)
A Recurrent Neural Network (RNN) is a neural network that processes sequential data by maintaining a hidden state vector updated at each time step, allowing information from earlier inputs to influence later predictions.
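The hidden-state update can be sketched with a scalar Elman-style RNN, h_t = tanh(w_x·x_t + w_h·h_{t-1} + b). The weights here are fixed, hypothetical values; a real RNN uses weight matrices learned by backpropagation through time.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0, h0=0.0):
    """Scalar Elman RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).

    Returns the hidden state after every time step.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # new state mixes input and previous state
        states.append(h)
    return states


# A single input pulse at t=0 still influences the states at t=1 and t=2.
print(rnn_forward([1.0, 0.0, 0.0]))
```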
Small Language Model (SLM)
A small language model (SLM) is a language model with a compact parameter count—typically 1 to 13 billion parameters—optimized for efficient inference on consumer devices, edge hardware, or single-GPU servers rather than large multi-GPU clusters.
Speech Recognition (ASR)
Speech recognition (ASR) is a technology that converts spoken audio into written text, using machine learning models trained on large corpora of speech to accurately transcribe words and sentences in real time or from recordings.
State Space Model (SSM)
State Space Models (SSMs) are a class of sequence-processing architectures derived from control theory that represent data streams through a latent state vector updated by linear recurrences, enabling efficient processing of long sequences with compute that grows linearly with sequence length.
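A minimal sketch of the linear recurrence, assuming a single scalar state with hypothetical parameters `a`, `b`, `c`. Real SSM layers such as S4 or Mamba use structured multi-dimensional states, and Mamba additionally makes the parameters input-dependent. Because the update is linear, unrolling it turns the recurrence into a convolution with kernel c·a^k·b, and the sketch computes both forms.

```python
def ssm_recurrent(xs, a, b, c):
    """Scalar linear SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t (initial state 0)."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x   # linear state update, no nonlinearity
        ys.append(c * h)    # linear readout
    return ys


def ssm_convolution(xs, a, b, c):
    """Same model computed as a causal convolution with kernel K_k = c * a**k * b."""
    kernel = [c * a ** k * b for k in range(len(xs))]
    return [sum(kernel[k] * xs[t - k] for k in range(t + 1)) for t in range(len(xs))]


xs = [1.0, 2.0, 0.5, -1.0]
print(ssm_recurrent(xs, a=0.9, b=0.5, c=2.0))
print(ssm_convolution(xs, a=0.9, b=0.5, c=2.0))  # same values up to rounding
```

The recurrent form needs constant work per step at inference time, while the convolutional form can be computed in parallel over the whole sequence during training.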
Text-to-Image Model
A text-to-image model is a generative AI system that produces raster images from natural-language text prompts, synthesizing visual content that matches the described scene, style, or subject.
Text-to-Speech (TTS)
Text-to-speech (TTS) is a technology that converts written text into synthesized spoken audio, using AI models trained on human speech recordings to produce natural-sounding voice output.
Text-to-Video Model
A text-to-video model is a generative AI system that synthesizes video clips from natural-language text prompts, producing temporally coherent sequences of frames that match the described motion, scene, or narrative.
Transformer
A Transformer is a neural network architecture centered on self-attention, which lets every position in a sequence directly attend to every other position simultaneously, enabling fully parallel training and effective modeling of long-range dependencies.
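Scaled dot-product attention, the core of self-attention, computes softmax(QKᵀ/√d_k)·V. The pure-Python sketch below handles a single head with Q, K, and V supplied directly; a real Transformer produces them from learned projections of the input and runs several heads in parallel.

```python
import math


def softmax(row):
    peak = max(row)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Single-head scaled dot-product attention on lists of row vectors.

    Q and K are n x d_k, V is n x d_v; returns an n x d_v list of lists.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Each query scores every key directly: no recurrence across positions.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


# Two identical keys get equal weight, so the output is the mean of the values.
print(attention([[1.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 2.0]]))
```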
Vision-Language Model (VLM)
A Vision-Language Model (VLM) is an AI model that jointly processes visual inputs (images or video) and natural language text, enabling tasks such as image captioning, visual question answering, and document understanding.
World Model
A world model is an internal representation that an AI system learns of its environment's dynamics, enabling it to predict the consequences of actions and simulate future states without directly interacting with the real environment.
Training
Backpropagation
Backpropagation is an algorithm for computing the gradient of a neural network's loss with respect to its weights by propagating error signals backward through the network layers, enabling gradient-based optimization.
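A minimal sketch of backpropagation on a tiny one-input network, y = w2·tanh(w1·x), with squared-error loss. The forward pass stores intermediate values, and the backward pass applies the chain rule from the loss back to each weight. The finite-difference helper is included only to check the analytic gradients; the weight and input values are arbitrary.

```python
import math


def forward_backward(w1, w2, x, target):
    """Return the loss and its gradients with respect to w1 and w2."""
    # Forward pass: keep the intermediate values needed by the backward pass
    a = w1 * x
    h = math.tanh(a)
    y = w2 * h
    loss = (y - target) ** 2
    # Backward pass: chain rule from the loss back toward the weights
    d_y = 2 * (y - target)          # dL/dy
    d_w2 = d_y * h                  # dL/dw2 = dL/dy * dy/dw2
    d_h = d_y * w2                  # dL/dh
    d_a = d_h * (1 - h * h)         # tanh'(a) = 1 - tanh(a)^2
    d_w1 = d_a * x                  # dL/dw1
    return loss, d_w1, d_w2


def numerical_grad(f, w, eps=1e-6):
    """Central finite difference, used here only to verify the analytic gradient."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)


w1, w2, x, target = 0.7, -1.3, 0.5, 0.2
loss, d_w1, d_w2 = forward_backward(w1, w2, x, target)
check_w1 = numerical_grad(lambda w: forward_backward(w, w2, x, target)[0], w1)
check_w2 = numerical_grad(lambda w: forward_backward(w1, w, x, target)[0], w2)
print(d_w1, check_w1)
print(d_w2, check_w2)
```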
Catastrophic Forgetting
Catastrophic forgetting is the tendency of a neural network to abruptly lose performance on previously learned tasks when trained sequentially on new data, because weight updates for the new task overwrite representations needed for earlier tasks.
Continual Learning
Continual learning is a machine learning paradigm in which a model learns from a continuous stream of tasks or data over time while retaining performance on previously acquired knowledge, without full retraining from scratch.
Data Augmentation
Data augmentation is the practice of artificially expanding a training dataset by applying label-preserving transformations to existing examples — such as image flipping, cropping, or noise injection — to improve model generalization.
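A minimal sketch of two label-preserving transformations on an image stored as a list of pixel rows. The 50% flip probability and the 3×3 crop size are illustrative choices, not standard values.

```python
import random


def horizontal_flip(image):
    """Mirror each row of pixels left to right."""
    return [list(reversed(row)) for row in image]


def random_crop(image, size, rng):
    """Cut a size x size window at a random position."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]


def augment(image, label, rng, crop_size=3, flip_prob=0.5):
    """Return a randomly transformed copy of the image with the SAME label."""
    if rng.random() < flip_prob:
        image = horizontal_flip(image)
    return random_crop(image, crop_size, rng), label


rng = random.Random(42)
image = [[r * 4 + c for c in range(4)] for r in range(4)]   # toy 4x4 "image"
for _ in range(3):
    print(augment(image, "cat", rng))
```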
Direct Preference Optimization (DPO)
Direct Preference Optimization (DPO) is a training algorithm that fine-tunes language models to align with human preferences by reformulating the RLHF objective as a binary classification loss over preference pairs, eliminating the need for a separate reward model and reinforcement learning loop.
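For one preference pair, the DPO loss is -log σ(β·[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]), where y_w is the preferred response, y_l the rejected one, and π_ref a frozen copy of the starting model. The sketch below evaluates that formula from precomputed sequence log-probabilities; β = 0.1 is an illustrative default, and computing the log-probabilities with an actual model is left out.

```python
import math


def softplus(x):
    """log(1 + exp(x)) computed without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Inputs are summed log-probabilities of the chosen and rejected responses
    under the policy being trained and under the frozen reference model."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin)
    return softplus(-beta * margin)


print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # policy == reference: log 2
print(dpo_loss(-5.0, -12.0, -10.0, -10.0))   # policy prefers the chosen answer more
```

When the policy raises the chosen response's log-probability relative to the reference more than the rejected one's, the margin is positive and the loss falls below log 2 ≈ 0.693.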
Epoch
An epoch is one complete pass of the entire training dataset through a machine learning model. Models are typically trained for multiple epochs, each pass refining the model's weights through backpropagation applied to each batch of examples.
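A minimal sketch of how epochs and mini-batches relate, assuming the dataset fits in memory and is reshuffled at the start of each epoch (a common but not universal practice). The model update itself is omitted.

```python
import random


def iterate_epochs(dataset, num_epochs, batch_size, seed=0):
    """Yield (epoch, batch) pairs; each epoch visits every example exactly once."""
    rng = random.Random(seed)
    data = list(dataset)
    for epoch in range(num_epochs):
        rng.shuffle(data)  # new visiting order each epoch
        for start in range(0, len(data), batch_size):
            yield epoch, data[start:start + batch_size]


batches = list(iterate_epochs(range(10), num_epochs=3, batch_size=4))
# 10 examples in batches of 4 -> 3 batches per epoch (the last holds 2 examples)
print(len(batches))
```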
Federated Learning
Federated learning is a machine learning technique that trains a shared model across many decentralized devices or servers without centralizing raw data, transmitting only model parameter updates to a coordinating server.
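The server-side aggregation step can be sketched with the FedAvg rule: a weighted average of client weights, proportional to each client's number of local training examples. This is one widely used aggregation scheme, shown here under the assumption that clients send full weight vectors rather than compressed or encrypted updates.

```python
def federated_average(client_updates):
    """FedAvg aggregation.

    client_updates: list of (weights, num_examples) pairs, where weights is a
    list of floats. Only these numbers reach the server; raw data stays on the
    clients.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [sum(w[i] * n for w, n in client_updates) / total for i in range(dim)]


# Two hypothetical clients; the second has three times as much local data.
updates = [([1.0, 2.0], 100), ([3.0, 6.0], 300)]
print(federated_average(updates))  # [2.5, 5.0]
```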
Fine-tuning
Fine-tuning is the process of further training a pre-trained AI model on a smaller, task-specific dataset so it performs better on that task. Instead of building a model from scratch, you adapt an existing one to your domain.
Gradient Descent
Gradient descent is an iterative optimization algorithm that trains machine learning models by repeatedly adjusting parameters in the direction that most reduces a loss function, using partial derivatives computed via backpropagation.
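A minimal sketch of gradient descent fitting a one-parameter model y ≈ w·x to toy data by minimizing mean squared error. The gradient is derived by hand here; in a neural network it would come from backpropagation. The learning rate and step count are illustrative.

```python
def fit_slope(xs, ys, lr=0.05, steps=200):
    """Fit y ≈ w * x by gradient descent on mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad  # step in the direction that reduces the loss
    return w


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(fit_slope(xs, ys))  # converges toward 2.0
```

For this data, any learning rate above about 0.21 makes each step overshoot the minimum by more than it corrects, so `w` diverges instead of converging.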
Instruction Tuning
Instruction tuning is a supervised fine-tuning technique that adapts a pre-trained language model on instruction-response pairs, teaching it to follow natural-language directives rather than merely predict the next token.
Knowledge Distillation
Knowledge distillation is a compression technique in which a small student model is trained to match the output distribution of a larger teacher model, producing a compact model that retains much of the teacher's accuracy.
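A sketch of the classic distillation loss from Hinton et al. (2015): teacher and student logits are both softened with a temperature T, and the student minimizes the KL divergence to the teacher's distribution, scaled by T² to keep gradient magnitudes comparable across temperatures. In practice this term is often combined with an ordinary cross-entropy loss on the true labels; T = 2 and the logits below are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]   # higher T -> softer distribution
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)   # soft targets from the teacher
    q = softmax(student_logits, temperature)   # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2


teacher = [4.0, 1.5, 0.5]
print(distillation_loss([3.5, 1.8, 0.2], teacher))   # student close to teacher: small loss
print(distillation_loss([0.5, 1.5, 4.0], teacher))   # student disagrees: larger loss
```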
LoRA (Low-Rank Adaptation)
LoRA is a parameter-efficient fine-tuning technique that adds trainable low-rank matrix pairs to frozen pre-trained model layers, enabling large-model adaptation while training only a small fraction of the original parameter count.
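A minimal pure-Python sketch of the LoRA forward pass, y = W·x + (α/r)·B·A·x, where the frozen weight W is d×k, A is r×k, and B is d×r. Initializing B to zeros, as the LoRA paper does, means the adapted layer starts out identical to the base layer. The tiny matrices are hypothetical, and the parameter-count helper assumes a single weight matrix.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)              # frozen pre-trained path
    delta = matvec(B, matvec(A, x))  # low-rank path: k -> r -> d
    return [b + (alpha / r) * d for b, d in zip(base, delta)]


def trainable_params(d, k, r):
    """(full fine-tuning, LoRA) trainable parameter counts for one d x k matrix."""
    return d * k, r * (d + k)


W = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical frozen 2x2 weight
A = [[1.0, 1.0]]               # r x k with r = 1
B_init = [[0.0], [0.0]]        # d x r, zero-initialized
x = [1.0, 2.0]
print(lora_forward(W, A, B_init, x, alpha=1.0, r=1))  # equals W x at initialization
print(trainable_params(4096, 4096, 8))
```

For a 4096×4096 projection with rank r = 8, the helper reports 16,777,216 trainable parameters for full fine-tuning versus 65,536 for LoRA, roughly 0.4%.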
Loss Function
A loss function is a mathematical function that measures the discrepancy between a model's predictions and the true target values, producing a scalar score that optimization algorithms minimize during training.
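Two of the most common loss functions, sketched in pure Python: mean squared error for regression and cross-entropy for classification. Both return a single scalar that an optimizer tries to drive down.

```python
import math


def mse(preds, targets):
    """Mean squared error: average of squared differences (regression)."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def cross_entropy(probs, true_class):
    """Negative log-probability the model assigns to the correct class."""
    return -math.log(probs[true_class])


print(mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))
print(cross_entropy([0.1, 0.7, 0.2], true_class=1))  # confident and right: low loss
print(cross_entropy([0.7, 0.1, 0.2], true_class=1))  # confident and wrong: high loss
```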
Model Checkpoint
A model checkpoint is a saved snapshot of a neural network's weights and optimizer state at a specific point during training, enabling resumption after hardware failures and selection of the best-performing version across a training run.
Overfitting
Overfitting occurs when a machine learning model learns the training data too closely—including its noise and idiosyncrasies—resulting in high accuracy on training examples but poor generalization to unseen data.
Pre-training
Pre-training is the initial large-scale training phase in which a neural network learns general representations from a massive corpus using self-supervised objectives, before any task-specific fine-tuning.
QLoRA
QLoRA is a fine-tuning method that quantizes a frozen base model to 4-bit precision while training LoRA adapters at higher precision, enabling large language models to be fine-tuned on a single consumer or professional GPU.
Reinforcement Learning
Reinforcement learning is a machine learning paradigm in which an agent learns a decision-making policy by interacting with an environment and receiving scalar reward signals, optimizing for maximum cumulative reward wit
Reinforcement Learning from AI Feedback (RLAIF)
Reinforcement Learning from AI Feedback (RLAIF) is a variant of RLHF in which an AI model generates the preference labels used to train the reward model, reducing dependence on costly and hard-to-scale human annotation.
Reinforcement Learning from Human Feedback (RLHF)
Reinforcement Learning from Human Feedback (RLHF) is a training pipeline that collects human preference judgments between model outputs, trains a reward model on those judgments, and uses reinforcement learning to fine-tune the language model to maximize that learned reward.
Reinforcement Learning with Verifiable Rewards (RLVR)
Reinforcement Learning with Verifiable Rewards (RLVR) is a training approach in which RL reward signals come from objective, programmatically checkable criteria — such as numerical correctness of a math answer or code passing its unit tests.
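A minimal sketch of a verifiable reward for math problems. The extraction rule (take the last number in the completion and compare it with the reference) is a hypothetical simplification; real RLVR pipelines often require a stricter answer format and use more careful checkers.

```python
import re


def math_reward(completion, reference_answer):
    """Binary verifiable reward: 1.0 if the last number in the completion
    equals the reference answer, otherwise 0.0 (hypothetical extraction rule)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if float(numbers[-1]) == float(reference_answer) else 0.0


print(math_reward("12 * 3 = 36, plus 6 gives 42.", "42"))   # 1.0
print(math_reward("I think the answer is 41.", "42"))        # 0.0
```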
Scaling Laws
Scaling laws are empirical power-law relationships showing that language model performance improves predictably as model parameters, training data volume, and compute budget increase, enabling researchers to forecast capabilities of larger models before training them.
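A sketch of how a power law is used for forecasting: fit L(N) = a·N^(-α) to losses from smaller models by linear regression in log-log space, then extrapolate to a larger N. The data below are synthetic, generated from a = 10 and α = 0.08 purely for illustration; they are not measurements from any real model family.

```python
import math


def fit_power_law(ns, losses):
    """Fit L = a * N**(-alpha) via least squares on log L = log a - alpha * log N."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return math.exp(mean_y - slope * mean_x), -slope


def predict_loss(a, alpha, n):
    return a * n ** (-alpha)


# Synthetic "small model" results generated from a = 10, alpha = 0.08
ns = [1e6, 1e7, 1e8, 1e9]
losses = [10.0 * n ** -0.08 for n in ns]
a, alpha = fit_power_law(ns, losses)
print(round(alpha, 4), predict_loss(a, alpha, 1e11))  # extrapolate 100x beyond the data
```

Published scaling-law fits often add an irreducible-loss term and model parameters and data jointly; the single-term form here is the simplest version.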
Self-Supervised Learning
Self-supervised learning is a training paradigm in which a model generates its own supervisory signal from unlabeled data by solving pretext tasks, eliminating the need for costly human annotations.
Supervised Learning
Supervised learning is a machine learning paradigm in which a model is trained on a labeled dataset of input-output pairs to learn a mapping function, then applied to predict outputs for unseen inputs; it underlies most
Synthetic Data
Synthetic data is artificially generated data — produced by algorithms, simulations, or generative models rather than collected from real-world events — used to train, validate, or test machine learning systems while bypassing the privacy, cost, or scarcity constraints of real-world data.
Training Data
Training data is the labeled or unlabeled dataset fed to a machine learning model during the optimization process, allowing it to adjust internal parameters by minimizing prediction error; its quality, scale, and diversity strongly shape the resulting model's behavior and performance.
Transfer Learning
Transfer learning is a technique in which a model pre-trained on one large dataset or task is adapted to a different but related task, substantially reducing the need for labeled data and training compute.
Unsupervised Learning
Unsupervised learning is a machine learning paradigm in which models find patterns, structure, or compact representations in data without labeled examples, using techniques such as clustering, dimensionality reduction, a
Inference
Context Window
A context window is the maximum number of tokens a language model can process in a single inference call, covering both the input prompt and the generated output. Exceeding it causes input truncation or an API error; lar
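Because the prompt and the generated output share one budget, applications often drop older conversation turns to make room. The sketch below assumes per-message token counts are already known and treats the first message as a system prompt that is always kept; the function name and the keep-newest policy are hypothetical choices for illustration.

```python
def fit_to_context(messages, context_window, reserve_for_output):
    """Keep the system prompt plus as many of the newest messages as fit.

    messages: list of (text, token_count) pairs, oldest first; messages[0] is
    treated as the system prompt. Returns (kept_messages, prompt_tokens_used).
    """
    budget = context_window - reserve_for_output   # tokens left for the prompt
    system, history = messages[0], messages[1:]
    used = system[1]
    if used > budget:
        raise ValueError("system prompt alone exceeds the prompt budget")
    kept = []
    for msg in reversed(history):                   # newest first
        if used + msg[1] > budget:
            break                                   # older turns are dropped
        kept.append(msg)
        used += msg[1]
    return [system] + kept[::-1], used


messages = [("system", 100), ("m1", 500), ("m2", 300), ("m3", 200)]
kept, used = fit_to_context(messages, context_window=1000, reserve_for_output=300)
print([text for text, _ in kept], used)  # ['system', 'm2', 'm3'] 600
```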
Inference
Inference is the process of applying a trained machine learning model to new input data to produce predictions or outputs. It is the deployment-time operation, distinct from training, in which no model parameters are updated.
KV-Cache
KV-Cache (Key-Value Cache) is a memory buffer that stores the key and value tensors produced by a transformer's attention layers for already-processed tokens, eliminating redundant recomputation during autoregressive generation.
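A toy sketch contrasting decoding with and without a KV-cache. To keep the bookkeeping visible, "tokens" are scalars given up front, the query is the token itself, and the key and value projections are hypothetical scalar weights. Both versions produce identical attention outputs, but the cached version projects each token once instead of re-projecting the whole prefix at every step.

```python
import math

W_K, W_V = 0.5, 2.0   # hypothetical scalar key/value projection weights


def attend(query, keys, values):
    """Softmax-weighted average of values, scored by query * key."""
    scores = [query * k for k in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def decode_without_cache(tokens):
    outputs, projections = [], 0
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [W_K * x for x in prefix]     # the whole prefix is re-projected
        values = [W_V * x for x in prefix]   # at every step
        projections += len(prefix)
        outputs.append(attend(tokens[t], keys, values))
    return outputs, projections


def decode_with_cache(tokens):
    keys, values, outputs, projections = [], [], [], 0
    for x in tokens:
        keys.append(W_K * x)                 # only the newest token is projected;
        values.append(W_V * x)               # earlier keys/values come from the cache
        projections += 1
        outputs.append(attend(x, keys, values))
    return outputs, projections


tokens = [0.3, -1.2, 0.8, 2.0, -0.5]
print(decode_without_cache(tokens)[1], decode_with_cache(tokens)[1])  # 15 5
```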
Prompt Caching
Prompt caching is an API and serving technique that stores the computed KV-cache state for a shared prompt prefix — such as a system prompt or a large document — and reuses it across multiple separate requests, eliminating redundant recomputation of that prefix.
Quantization
Quantization is the technique of representing a neural network's weights — and optionally its activations — in lower-precision numeric formats such as INT8 or INT4 instead of the default FP16 or BF16, reducing memory footprint and often speeding up inference.
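A minimal sketch of symmetric per-tensor INT8 quantization: one floating-point scale maps the largest absolute weight to 127, and every weight is rounded to the nearest integer step. This is one of several schemes (per-channel and asymmetric variants are also common). Storing INT8 instead of FP16 halves weight memory, at the cost of a rounding error of at most half a step per weight.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: one scale for the whole tensor,
    integers clamped to [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0   # avoid dividing by zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate floating-point weights."""
    return [v * scale for v in q]


weights = [0.42, -1.3, 0.007, 0.9, -0.25]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print(restored)
```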
Token
A token is the basic unit of text that a language model processes, typically a word, subword fragment, or punctuation mark. In common English prose, one word corresponds to approximately 1.3 tokens under widely used subword tokenizers.
Tokenization
Tokenization is the process of splitting raw text into discrete units called tokens — typically subword fragments — that a language model numerically encodes and processes. A token averages roughly 4 characters in English text.
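A sketch of greedy longest-match-first subword splitting, the inference procedure used by WordPiece-style tokenizers, with a tiny hypothetical vocabulary. GPT-style models typically use byte-pair encoding (BPE) instead, which splits text by applying learned merge rules, but the effect is similar: common words stay whole, and rarer words break into known pieces.

```python
def greedy_subword_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into the longest vocabulary pieces, left to right.

    Pieces that continue a word are marked with a '##' prefix (WordPiece style).
    """
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:                  # try the longest candidate first
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]                    # no piece matches: unknown word
        tokens.append(piece)
        start = end
    return tokens


vocab = {"token", "##ization", "##ize", "un", "##related"}
print(greedy_subword_tokenize("tokenization", vocab))  # ['token', '##ization']
print(greedy_subword_tokenize("unrelated", vocab))     # ['un', '##related']
```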