State Space Model (SSM)
State space models (SSMs) are a class of sequence-processing architectures derived from control theory. They represent a data stream through a latent state vector updated by a linear recurrence, which lets them process very long sequences at a cost that grows sub-quadratically with sequence length.
SSMs formalize sequence modeling as a dynamical system: a hidden state vector h(t) evolves according to a linear differential or difference equation driven by the input x(t), and the output y(t) is a linear projection of h(t). This formulation, standard in control engineering since the 1960s, was adapted for deep learning by treating the matrices of the recurrence as learned, structured parameters, so that information is mixed across the sequence without any attention mechanism.
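The dynamical-system view above can be made concrete with a small NumPy sketch. It assumes a single input channel, a continuous-time system h'(t) = A h(t) + B x(t), y(t) = C h(t), and the bilinear rule as the discretization (one common choice, not the only one). The function names, the step size dt, and the example matrices are illustrative assumptions rather than part of any particular library.

```python
import numpy as np

def discretize_bilinear(A, B, dt):
    """Convert continuous-time (A, B) to discrete (A_bar, B_bar) with the bilinear rule."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - (dt / 2.0) * A)
    A_bar = inv @ (I + (dt / 2.0) * A)
    B_bar = inv @ (dt * B)
    return A_bar, B_bar

def ssm_recurrence(A_bar, B_bar, C, x):
    """Run h_k = A_bar h_{k-1} + B_bar x_k and y_k = C h_k over a 1-D input x."""
    h = np.zeros(A_bar.shape[0])      # hidden state starts at zero
    y = np.empty(len(x))
    for k, x_k in enumerate(x):
        h = A_bar @ h + B_bar * x_k   # linear state update driven by the input
        y[k] = C @ h                  # output is a linear projection of the state
    return y

# Hypothetical 2-dimensional stable system, chosen only for illustration.
A = np.array([[-1.0, 0.0], [0.0, -2.0]])
B = np.array([1.0, 1.0])
C = np.array([1.0, 0.5])
A_bar, B_bar = discretize_bilinear(A, B, dt=0.1)
y = ssm_recurrence(A_bar, B_bar, C, np.ones(5))
```

Each step touches only the fixed-size state h, so the cost per input sample does not depend on how long the sequence already is.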
The practical breakthrough came with S4 (the Structured State Space sequence model, Gu et al., 2021), which showed that a diagonal-plus-low-rank parameterization of the state matrix makes the model's convolution kernel cheap to compute. Training can then run as a fast, parallel convolution over the whole sequence, matching transformer parallelism, while autoregressive inference reverts to the cheap linear recurrence. Mamba (Albert Gu and Tri Dao, 2023) introduced selective state spaces: the SSM parameters become functions of the input, which gives the model content-aware memory and removes a key limitation of purely linear, time-invariant SSMs. Because the parameters then change at every step, the convolutional form no longer applies, and Mamba trains with a hardware-aware parallel scan instead.
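The two ideas in this paragraph can be sketched in NumPy under simplifying assumptions. For a time-invariant SSM, the recurrence and a causal convolution with the kernel K[k] = C A_bar^k B_bar give the same output; the sketch builds that kernel naively, whereas S4's contribution is computing it efficiently from the structured state matrix. The last function is a toy, single-channel illustration of selectivity with a diagonal state matrix: the step size and the B and C vectors depend on the current input. All function names and the weights w_dt, w_B, w_C are hypothetical, and the sequential loop stands in for the parallel scan used in practice.

```python
import numpy as np

def ssm_recurrent(A_bar, B_bar, C, x):
    """Inference-style path: one state update per input sample."""
    h = np.zeros(A_bar.shape[0])
    y = np.empty(len(x))
    for k, x_k in enumerate(x):
        h = A_bar @ h + B_bar * x_k
        y[k] = C @ h
    return y

def ssm_kernel(A_bar, B_bar, C, length):
    """Kernel K[k] = C A_bar^k B_bar, built naively by repeated multiplication."""
    K = np.empty(length)
    v = B_bar.astype(float)
    for k in range(length):
        K[k] = C @ v
        v = A_bar @ v
    return K

def ssm_conv(A_bar, B_bar, C, x):
    """Training-style path: causal convolution of x with the kernel via FFT."""
    L = len(x)
    K = ssm_kernel(A_bar, B_bar, C, L)
    n_fft = 2 * L  # zero-padding makes the circular convolution a linear one
    y = np.fft.irfft(np.fft.rfft(x, n_fft) * np.fft.rfft(K, n_fft), n_fft)
    return y[:L]

def selective_scan(x, a, w_dt, w_B, w_C):
    """Toy selective SSM: diagonal state matrix a (shape (N,)), input-dependent dt, B, C."""
    h = np.zeros(a.shape[0])
    y = np.empty(len(x))
    for k, x_k in enumerate(x):
        dt = np.log1p(np.exp(w_dt * x_k))  # softplus keeps the step size positive
        B_k = w_B * x_k                    # input-dependent input vector
        C_k = w_C * x_k                    # input-dependent output vector
        a_bar = np.exp(dt * a)             # per-step decay of each state entry
        h = a_bar * h + dt * B_k * x_k     # parameters differ at every step
        y[k] = C_k @ h
    return y
```

For the same A_bar, B_bar, C and input, ssm_conv and ssm_recurrent should agree up to floating-point error, which is what allows parallel training and step-by-step inference to share one set of weights. The selective variant has no fixed kernel, since its effective parameters depend on the input at each position.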
SSMs matter because transformer self-attention scales quadratically with sequence length, making long contexts expensive. SSMs scale linearly with sequence length in both compute and memory during training, and at inference they carry only a fixed-size state from one step to the next. This makes them attractive for genomics (sequences of millions of base pairs), long-document processing, and continuous sensor streams. Hybrid architectures such as Jamba (AI21 Labs, 2024) and Zamba (Zyphra, 2024) interleave SSM and attention layers to capture the strengths of both.
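The scaling gap can be illustrated with a rough count, offered as an order-of-magnitude sketch only. It assumes one attention head, one SSM channel, and a hypothetical state size of 16, and it ignores model width, constant factors, and hardware effects. The function names are illustrative.

```python
def attention_pair_count(seq_len):
    """Query-key score entries in full self-attention for one head: L * L."""
    return seq_len * seq_len

def ssm_state_update_count(seq_len, state_size):
    """Scalar state updates for one SSM channel over the sequence: L * N."""
    return seq_len * state_size

STATE_SIZE = 16  # assumed state dimension, chosen only for illustration

for seq_len in (1_000, 100_000, 1_000_000):
    pairs = attention_pair_count(seq_len)
    updates = ssm_state_update_count(seq_len, STATE_SIZE)
    print(f"L={seq_len}: attention pairs={pairs}, SSM state updates={updates}")
```

Under these assumptions, doubling the sequence length quadruples the attention count but only doubles the SSM count, which is why the difference becomes decisive at lengths in the millions.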
As of 2026, SSMs have moved from research curiosities to production-ready components. Mamba-2 (2024) unified SSMs with linear attention under the structured state space duality (SSD) framework, which expresses both as multiplication by structured matrices, and demonstrated perplexity competitive with transformers at multi-billion parameter scales. State-space layers are available in Hugging Face Transformers and are being incorporated into multimodal and audio models, though transformers remain dominant for general-purpose LLMs at the largest scales.