Habr AI→ original

Why ChatGPT Forgets: Explaining the Context Window of Language Models

After an hour of conversation with ChatGPT, the model suddenly forgets the character's name from the first message and asks again about what was already…

AI-processed from Habr AI; edited by Hamidun News
Why ChatGPT Forgets: Explaining the Context Window of Language Models
Source: Habr AI. Collage: Hamidun News.
◐ Listen to article

After an hour of working with ChatGPT, the model suddenly forgets details from the first messages — contradicts itself, asks again about already agreed upon things. This is not a glitch: this is how the context window works, and understanding this mechanism is important for anyone who uses AI in their work.

What is a context window

A language model doesn't "remember" a conversation in the human sense. It processes text as a single block — the so-called context. A context window is the maximum volume of text that a model can consider in a single request. The unit of measurement is a token: approximately 3–4 characters in English or 1–2 words in Russian.

Modern models work with windows of different capacities:

  • GPT-4o — up to 128,000 tokens (about 96,000 words)
  • Claude 3.7 Sonnet — up to 200,000 tokens
  • Gemini 1.5 Pro — up to 2,000,000 tokens
  • Llama 3 — from 8,000 to 128,000 tokens depending on version

Even 128,000 tokens is about 300 pages of text. It sounds like a lot, but in real working sessions — with a codebase, documents, and lengthy dialogue — this limit is reached faster than it seems.

Why the model "forgets"

When a conversation exceeds the context window, the model doesn't "forget" — it simply doesn't see the old messages. They are technically absent from the request input data.

Most services solve this problem in one of two ways.

Truncation: the oldest messages are removed from the context. The model continues to respond, but without access to the beginning of the conversation. This is how most chat interfaces work by default.

Summarization: instead of the first N messages, their brief summary generated by the model itself is fed into the context. Details are lost, but the general thread is preserved.

There is also a third approach — RAG (Retrieval-Augmented Generation): important information is stored in an external database and loaded into the context only when needed. This is how more complex AI systems and enterprise solutions work.

Lost in the middle: a hidden problem

The issue is not only about overflow. The quality of answers degrades even before the context runs out. Researchers from Stanford and Berkeley in 2023 described the "lost in the middle" phenomenon: models significantly better utilize information from the beginning and end of the context. Data that falls in the middle is processed worse — the model seems to "overlook" it.

"Language models tend to make worse use of relevant information when it is located in the middle of a long context," — from the

Lost in the Middle research, 2023.

Practical takeaway: key instructions are better given at the beginning or at the end of the request, rather than buried in the middle of a long document.

How to work with this limitation

Several practical strategies:

  • Break down tasks — instead of one giant session, divide work into sessions with clear intermediate summaries
  • Place important information at the beginning — system prompt and key constraints work best at the beginning of the context
  • Use models with larger windows — for large documents, choose Gemini 1.5 Pro (2M tokens) or Claude with 200K
  • Summarize yourself — before a new session, ask the model to sum up the previous one and save that text
  • Estimate length in advance — 1 page of text ≈ 500 tokens, 1 code file ≈ 1,000–5,000 tokens

What this means

A context window is not a technical nuance but a central parameter of any work with language models. Understanding this limitation allows you not to blame "strange" model behavior on a glitch, but to properly organize working sessions. The race for bigger context continues: providers are competing to increase limits, but the engineering question "what the model sees right now" will remain key for a long time to come.

ZK
Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

What do you think?
Loading comments…