AI Glossary

Short, precise definitions of artificial-intelligence terms — without the academic fog. Every entry starts with a two-sentence answer, followed by a deeper explanation, a concrete example and related concepts. The glossary grows as new terms enter the news cycle.

Models

Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a deep learning architecture that applies learned weight-sharing filters over local patches of input data — most commonly images — to detect hierarchical spatial features such as e

Decoder-Only Architecture

A Decoder-Only architecture is a transformer variant that uses a single self-attention stack with causal (left-to-right) masking to predict each next token from its preceding context, without a separate encoder, and is t

Deep Learning

Deep learning is a subfield of machine learning that trains neural networks with many layers to learn hierarchical representations directly from raw data such as pixels, audio waveforms, or text tokens, without hand-craf

Diffusion Model

A diffusion model is a generative AI system that produces images, audio, or video by learning to reverse an iterative noise-addition process, starting from random noise and progressively denoising it through learned step

Embedding Model

An embedding model converts text, images, or other data into fixed-length numerical vectors in a high-dimensional space, where semantically similar items are geometrically close to each other.
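
A minimal sketch of what "geometrically close" can mean in practice, assuming cosine similarity as the distance measure. The three 4-dimensional vectors are invented for illustration; a real embedding model would produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, hand-written for this example (not model output).
embeddings = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "kitten": [0.85, 0.9, 0.15, 0.05],
    "invoice": [0.05, 0.1, 0.9, 0.8],
}

sim_related = cosine_similarity(embeddings["cat"], embeddings["kitten"])
sim_unrelated = cosine_similarity(embeddings["cat"], embeddings["invoice"])

print(f"cat vs kitten:  {sim_related:.3f}")
print(f"cat vs invoice: {sim_unrelated:.3f}")
```

The related pair scores higher than the unrelated pair, which is the property that semantic search and clustering rely on.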

Encoder–Decoder Architecture

An Encoder–Decoder architecture is a neural network design in which an encoder maps an input into a latent representation and a separate decoder generates the output sequence from that representation, making it well-suit

Foundation Model

A foundation model is a large-scale AI system pre-trained on broad, diverse data—text, images, code—to serve as a general-purpose base adaptable to many downstream tasks. The term was introduced by Stanford HAI in 2021;

Frontier Model

A frontier model is an AI system operating at the current limits of machine learning capability, typically trained with the largest known compute budgets. The term is used in AI policy to identify systems whose capabilit

Generative Adversarial Network (GAN)

A Generative Adversarial Network (GAN) is a dual-network architecture where a generator produces synthetic data and a discriminator attempts to classify it as real or fake; adversarial training between the two drives pro

Large Language Model (LLM)

A large language model (LLM) is a transformer-based neural network with billions of parameters trained on internet-scale text to model and generate human language, capable of performing diverse tasks through natural-lang

Machine Learning

Machine learning is a branch of artificial intelligence in which algorithms automatically improve their performance on a task by learning patterns from data, without being explicitly programmed with task-specific rules f

Mixture of Experts (MoE)

Mixture of Experts (MoE) is a neural network architecture where a learned routing mechanism activates only a small subset of specialized sub-networks (experts) for each input token, allowing large total parameter counts

Multimodal Model

A multimodal model is an AI system that processes and generates data across more than one modality — such as text, images, audio, or video — within a single unified architecture.

Neural Network

A neural network is a computational model composed of interconnected layers of numerical units (neurons) that learn to map inputs to outputs by adjusting connection weights through exposure to training data.
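
As a rough illustration of "layers of units with connection weights", the sketch below runs one input through a two-layer network. The weights and biases are arbitrary values chosen for the example, and the sigmoid activation is an assumption; training would adjust these numbers, which is not shown here.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum per unit, plus bias, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output unit.
hidden_w = [[0.5, -0.4], [0.3, 0.8]]
hidden_b = [0.0, -0.1]
output_w = [[1.2, -0.7]]
output_b = [0.05]


def forward(x):
    """Map an input vector to a single output by passing it through both layers."""
    hidden = dense(x, hidden_w, hidden_b)
    return dense(hidden, output_w, output_b)[0]


prediction = forward([1.0, 2.0])
print(f"network output: {prediction:.4f}")
```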

Open-Weights Model

An open-weights model is an AI system whose trained parameter weights are publicly released, allowing anyone to download, run, and modify the model without accessing the original developer's infrastructure.

Reasoning Model

A reasoning model is an AI system designed to solve complex, multi-step problems by generating explicit intermediate reasoning steps — often called a chain of thought — before producing a final answer.

Recurrent Neural Network (RNN)

A Recurrent Neural Network (RNN) is a neural network that processes sequential data by maintaining a hidden state vector updated at each time step, allowing information from earlier inputs to influence later predictions.
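
The hidden-state update can be sketched in a few lines. This is a simplified version that assumes a scalar hidden state, a tanh activation, and made-up weights; real RNNs use vectors and weight matrices, but the recurrence has the same shape.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """Combine the current input with the previous hidden state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.9, b=0.0):
    """Process a sequence one step at a time and return every hidden state."""
    h = 0.0  # initial hidden state
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


# The two sequences differ only in their first element.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])

print("final state after [1, 0, 0]:", round(states_a[-1], 4))
print("final state after [0, 0, 0]:", round(states_b[-1], 4))
```

Although the last two inputs are identical, the final hidden states differ, because the first input is still carried forward through the recurrence.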

Small Language Model (SLM)

A small language model (SLM) is a language model with a compact parameter count—typically 1 to 13 billion parameters—optimized for efficient inference on consumer devices, edge hardware, or single-GPU servers rather than

Speech Recognition (ASR)

Speech recognition (ASR) is a technology that converts spoken audio into written text, using machine learning models trained on large corpora of speech to accurately transcribe words and sentences in real time or from re

State Space Model (SSM)

State Space Models (SSMs) are a class of sequence-processing architectures derived from control theory that represent data streams through a latent state vector updated by linear recurrences, enabling efficient processing o

Text-to-Image Model

A text-to-image model is a generative AI system that produces raster images from natural-language text prompts, synthesizing visual content that matches the described scene, style, or subject.

Text-to-Speech (TTS)

Text-to-speech (TTS) is a technology that converts written text into synthesized spoken audio, using AI models trained on human speech recordings to produce natural-sounding voice output.

Text-to-Video Model

A text-to-video model is a generative AI system that synthesizes video clips from natural-language text prompts, producing temporally coherent sequences of frames that match the described motion, scene, or narrative.

Transformer

A Transformer is a neural network architecture centered on self-attention, which lets every position in a sequence directly attend to every other position simultaneously, enabling fully parallel training and effective mo

Vision-Language Model (VLM)

A Vision-Language Model (VLM) is an AI model that jointly processes visual inputs (images or video) and natural language text, enabling tasks such as image captioning, visual question answering, and document understandin

World Model

A world model is an internal representation that an AI system learns of its environment's dynamics, enabling it to predict the consequences of actions and simulate future states without directly interacting with the real

Training

Backpropagation

Backpropagation is an algorithm for computing the gradient of a neural network's loss with respect to its weights by propagating error signals backward through the network layers, enabling gradient-based optimization.
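
To make the backward pass concrete, here is a hand-written example for a deliberately tiny network, y = w2 * tanh(w1 * x), with a squared-error loss. The network shape and the input values are assumptions chosen so each chain-rule step fits on one line; the result is checked against a finite-difference estimate.

```python
import math


def forward(w1, w2, x):
    """Forward pass: return the hidden activation and the output."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y


def loss(w1, w2, x, target):
    """Squared error between the network output and the target."""
    _, y = forward(w1, w2, x)
    return (y - target) ** 2


def backward(w1, w2, x, target):
    """Propagate the error signal from the loss back to each weight."""
    h, y = forward(w1, w2, x)
    d_y = 2.0 * (y - target)    # dLoss/dy
    d_w2 = d_y * h              # y = w2 * h
    d_h = d_y * w2              # error signal reaching the hidden unit
    d_z = d_h * (1.0 - h * h)   # derivative of tanh(z) is 1 - tanh(z)^2
    d_w1 = d_z * x              # z = w1 * x
    return d_w1, d_w2


def numeric_grad(w1, w2, x, target, eps=1e-6):
    """Central finite differences, used only to sanity-check the result."""
    g1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
    return g1, g2


analytic = backward(0.5, -1.5, 2.0, 1.0)
numeric = numeric_grad(0.5, -1.5, 2.0, 1.0)
print("backprop:          ", analytic)
print("finite differences:", numeric)
```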

Catastrophic Forgetting

Catastrophic forgetting is the tendency of a neural network to abruptly lose performance on previously learned tasks when trained sequentially on new data, because weight updates for the new task overwrite representation

Continual Learning

Continual learning is a machine learning paradigm in which a model learns from a continuous stream of tasks or data over time while retaining performance on previously acquired knowledge, without full retraining from scr

Data Augmentation

Data augmentation is the practice of artificially expanding a training dataset by applying label-preserving transformations to existing examples — such as image flipping, cropping, or noise injection — to improve model g

Direct Preference Optimization (DPO)

Direct Preference Optimization (DPO) is a training algorithm that fine-tunes language models to align with human preferences by reformulating the RLHF objective as a binary classification loss over preference pairs, elim

Epoch

An epoch is one complete pass of the entire training dataset through a machine learning model. Models are typically trained for multiple epochs, each pass refining the model's weights through backpropagation applied to e

Federated Learning

Federated learning is a machine learning technique that trains a shared model across many decentralized devices or servers without centralizing raw data, transmitting only model parameter updates to a coordinating server

Fine-Tuning

Fine-tuning is the process of further training a pre-trained AI model on a smaller, task-specific dataset so it performs better on that task. Instead of building a model from scratch, you adapt an existing one to your do

Gradient Descent

Gradient descent is an iterative optimization algorithm that trains machine learning models by repeatedly adjusting parameters in the direction that most reduces a loss function, using partial derivatives computed via ba
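
A minimal sketch of the update loop, assuming a one-parameter loss (w - 3)^2 whose gradient is written out by hand. The learning rate and step count are illustrative defaults, not recommendations.

```python
def gradient_descent(grad_fn, start, learning_rate=0.1, steps=100):
    """Repeatedly move the parameter a small step against the gradient."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad_fn(w)
    return w


def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def loss_grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


w_final = gradient_descent(loss_grad, start=0.0)
print(f"w after training: {w_final:.6f}, loss: {loss(w_final):.2e}")
```

In a neural network the same loop runs over millions or billions of parameters, with the gradient supplied by backpropagation instead of a hand-written derivative.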

Instruction Tuning

Instruction tuning is a supervised fine-tuning technique that adapts a pre-trained language model on instruction-response pairs, teaching it to follow natural-language directives rather than merely predict the next token

Knowledge Distillation

Knowledge distillation is a compression technique in which a small student model is trained to match the output distribution of a larger teacher model, producing a compact model that retains much of the teacher's accurac

LoRA (Low-Rank Adaptation)

LoRA is a parameter-efficient fine-tuning technique that adds trainable low-rank matrix pairs to frozen pre-trained model layers, enabling large-model adaptation with a fraction of the original parameter count.
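
The sketch below shows the idea on plain Python lists: the frozen weight matrix W is left untouched and a low-rank product B·A is added to its output. The function names, the `scale` parameter, and the 4096 x 4096 layer size are assumptions for illustration; setting B to zeros at the start, so the adapter initially changes nothing, reflects a common initialization.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, w_frozen, a, b, scale=1.0):
    """Compute y = W x + scale * B (A x). Only A and B would be trained."""
    base = matvec(w_frozen, x)
    update = matvec(b, matvec(a, x))
    return [y + scale * u for y, u in zip(base, update)]


def lora_param_counts(d_out, d_in, rank):
    """Parameters in the full matrix versus in the low-rank pair."""
    full = d_out * d_in
    adapter = rank * d_in + d_out * rank  # A is rank x d_in, B is d_out x rank
    return full, adapter


# Rank-1 adapter on a 2 x 2 identity layer; B starts at zero.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]
b = [[0.0], [0.0]]
print(lora_forward([1.0, 2.0], w, a, b))

full, adapter = lora_param_counts(4096, 4096, rank=8)
print(f"full layer: {full:,} params, adapter: {adapter:,} ({adapter / full:.2%})")
```

For the hypothetical 4096 x 4096 layer at rank 8, the adapter holds well under 1% of the parameters in the full matrix.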

Loss Function

A loss function is a mathematical function that measures the discrepancy between a model's predictions and the true target values, producing a scalar score that optimization algorithms minimize during training.
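
Two widely used loss functions, written out as a sketch: mean squared error for numeric predictions and cross-entropy for a predicted probability distribution over classes. The sample numbers are invented for the example.

```python
import math


def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def cross_entropy(probabilities, true_index):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(probabilities[true_index])


targets = [3.0, 5.0]
mse_good = mean_squared_error([2.9, 5.1], targets)
mse_bad = mean_squared_error([1.0, 8.0], targets)
print(f"MSE, close predictions: {mse_good:.3f}")
print(f"MSE, poor predictions:  {mse_bad:.3f}")

probs = [0.7, 0.2, 0.1]
print(f"cross-entropy if class 0 is correct: {cross_entropy(probs, 0):.3f}")
print(f"cross-entropy if class 2 is correct: {cross_entropy(probs, 2):.3f}")
```

In both cases a single number summarizes how wrong the model is, and a lower value means a better fit.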

Model Checkpoint

A model checkpoint is a saved snapshot of a neural network's weights and optimizer state at a specific point during training, enabling resumption after hardware failures and selection of the best-performing version acros

Overfitting

Overfitting occurs when a machine learning model learns the training data too closely—including its noise and idiosyncrasies—resulting in high accuracy on training examples but poor generalization to unseen data.

Pre-Training

Pre-training is the initial large-scale training phase in which a neural network learns general representations from a massive corpus using self-supervised objectives, before any task-specific fine-tuning.

QLoRA

QLoRA is a fine-tuning method that quantizes a frozen base model to 4-bit precision while training LoRA adapters at higher precision, enabling large language models to be fine-tuned on a single consumer or professional G

Reinforcement Learning

Reinforcement learning is a machine learning paradigm in which an agent learns a decision-making policy by interacting with an environment and receiving scalar reward signals, optimizing for maximum cumulative reward wit

Reinforcement Learning from AI Feedback (RLAIF)

Reinforcement Learning from AI Feedback (RLAIF) is a variant of RLHF in which an AI model generates the preference labels used to train the reward model, reducing dependence on costly and hard-to-scale human annotation.

Reinforcement Learning from Human Feedback (RLHF)

Reinforcement Learning from Human Feedback (RLHF) is a training pipeline that collects human preference judgments between model outputs, trains a reward model on those judgments, and uses reinforcement learning to fine-t

Reinforcement Learning with Verifiable Rewards (RLVR)

Reinforcement Learning with Verifiable Rewards (RLVR) is a training approach in which RL reward signals come from objective, programmatically checkable criteria — such as numerical correctness of a math answer or code pa

Scaling Laws

Scaling laws are empirical power-law relationships showing that language model performance improves predictably as model parameters, training data volume, and compute budget increase, enabling researchers to forecast cap

Self-Supervised Learning

Self-supervised learning is a training paradigm in which a model generates its own supervisory signal from unlabeled data by solving pretext tasks, eliminating the need for costly human annotations.
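
One common pretext task is next-token prediction, where the "label" for each position is simply the token that follows it in the raw text. The sketch below builds such training pairs; whitespace splitting stands in for a real tokenizer, and `context_size` is a hypothetical parameter.

```python
def next_token_pairs(tokens, context_size=3):
    """Turn an unlabeled token sequence into (context, next_token) examples."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


text = "the cat sat on the mat"
pairs = next_token_pairs(text.split())

for context, target in pairs:
    print(f"{context} -> {target}")
```

No human annotation is involved: every training target comes from the text itself.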

Supervised Learning

Supervised learning is a machine learning paradigm in which a model is trained on a labeled dataset of input-output pairs to learn a mapping function, then applied to predict outputs for unseen inputs; it underlies most

Synthetic Data

Synthetic data is artificially generated data — produced by algorithms, simulations, or generative models rather than collected from real-world events — used to train, validate, or test machine learning systems while byp

Training Data

Training data is the labeled or unlabeled dataset fed to a machine learning model during the optimization process, allowing it to adjust internal parameters by minimizing prediction error; its quality, scale, and diversi

Transfer Learning

Transfer learning is a technique in which a model pre-trained on one large dataset or task is adapted to a different but related task, substantially reducing the need for labeled data and training compute.

Unsupervised Learning

Unsupervised learning is a machine learning paradigm in which models find patterns, structure, or compact representations in data without labeled examples, using techniques such as clustering, dimensionality reduction, a

Inference

Context Window

A context window is the maximum number of tokens a language model can process in a single inference call, covering both the input prompt and the generated output. Exceeding it causes input truncation or an API error; lar

Inference

Inference is the process of applying a trained machine learning model to new input data to produce predictions or outputs. It is the deployment-time operation, distinct from training, in which no model parameters are upd

KV-Cache (Key-Value Cache)

KV-Cache (Key-Value Cache) is a memory buffer that stores the key and value tensors produced by a transformer's attention layers for already-processed tokens, eliminating redundant recomputation during autoregressive gen

Prompt Caching

Prompt caching is an API and serving technique that stores the computed KV-cache state for a shared prompt prefix — such as a system prompt or a large document — and reuses it across multiple separate requests, eliminati

Quantization

Quantization is the technique of representing a neural network's weights — and optionally its activations — in lower-precision numeric formats such as INT8 or INT4 instead of the default FP16 or BF16, reducing memory foo
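
A minimal sketch of one simple scheme, symmetric per-tensor INT8 quantization, assuming a single scale factor derived from the largest absolute weight. Production methods are more elaborate (per-channel scales, calibration data, 4-bit formats); the weights here are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("int8 values:", q)
print(f"scale: {scale:.4f}, worst-case rounding error: {max_error:.5f}")
```

Each weight now needs one byte instead of two or four, at the cost of a rounding error bounded by half the scale.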

Token

A token is the basic unit of text that a language model processes, typically a word, subword fragment, or punctuation mark. In common English prose, one word corresponds to approximately 1.3 tokens under widely used subw

Tokenization

Tokenization is the process of splitting raw text into discrete units called tokens — typically subword fragments — that a language model numerically encodes and processes. A token averages roughly 4 characters in Englis
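
The rules of thumb in this glossary (about 1.3 tokens per English word, about 4 characters per token) can be turned into a rough budget check. This is a back-of-the-envelope sketch: the function names are hypothetical, and exact counts require running the specific model's tokenizer.

```python
TOKENS_PER_WORD = 1.3   # rule of thumb for English prose
CHARS_PER_TOKEN = 4     # rule of thumb for English prose


def estimate_tokens_from_words(text):
    """Approximate token count from the number of whitespace-separated words."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def estimate_tokens_from_chars(text):
    """Approximate token count from the number of characters."""
    return round(len(text) / CHARS_PER_TOKEN)


def fits_context(text, context_window, reserved_for_output=0):
    """Check whether the prompt plus reserved output tokens stays within the window."""
    return estimate_tokens_from_words(text) + reserved_for_output <= context_window


sample = "one two three four five six seven eight nine ten"
print("estimate from words:", estimate_tokens_from_words(sample))
print("estimate from chars:", estimate_tokens_from_chars(sample))
print("fits a 16-token window with 8 reserved:", fits_context(sample, 16, 8))
```

The two estimates will rarely agree exactly, which is a reminder that both are approximations.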

Agents

AI Agent

An AI agent is a system where a language model does not just answer, but plans and executes multi-step tasks: it calls tools and APIs, reads the results and decides the next action toward a goal. Unlike a chatbot, an age

Techniques & methods

RAG (Retrieval-Augmented Generation)

RAG (retrieval-augmented generation) is a technique that lets a language model pull relevant documents from an external knowledge base before answering and ground its response in them. It reduces hallucinations and keeps