Why ChatGPT Forgets: Explaining the Context Window of Language Models

Q: What is the source?

Originally published on Habr AI. Hamidun News processes and adapts the material with AI.

Q: When was it published?

Jun 15, 2026. Reading time: 3 min.

After an hour of conversation with ChatGPT, the model suddenly forgets the character's name from the first message and asks again about what was already…

Hamidun News Editorial

AI monitoring · Habr AI

Jun 15, 2026· 3 min

AI-processed from Habr AI; edited by Hamidun News

Why ChatGPT Forgets: Explaining the Context Window of Language Models — Source: Habr AI. Collage: Hamidun News.

◐ Listen to article

After an hour of working with ChatGPT, the model suddenly forgets details from the first messages — contradicts itself, asks again about already agreed upon things. This is not a glitch: this is how the context window works, and understanding this mechanism is important for anyone who uses AI in their work.

What is a context window

A language model doesn't "remember" a conversation in the human sense. It processes text as a single block — the so-called context. A context window is the maximum volume of text that a model can consider in a single request. The unit of measurement is a token: approximately 3–4 characters in English or 1–2 words in Russian.

Modern models work with windows of different capacities:

GPT-4o — up to 128,000 tokens (about 96,000 words)
Claude 3.7 Sonnet — up to 200,000 tokens
Gemini 1.5 Pro — up to 2,000,000 tokens
Llama 3 — from 8,000 to 128,000 tokens depending on version

Even 128,000 tokens is about 300 pages of text. It sounds like a lot, but in real working sessions — with a codebase, documents, and lengthy dialogue — this limit is reached faster than it seems.

Why the model "forgets"

When a conversation exceeds the context window, the model doesn't "forget" — it simply doesn't see the old messages. They are technically absent from the request input data.

Most services solve this problem in one of two ways.

Truncation: the oldest messages are removed from the context. The model continues to respond, but without access to the beginning of the conversation. This is how most chat interfaces work by default.

Summarization: instead of the first N messages, their brief summary generated by the model itself is fed into the context. Details are lost, but the general thread is preserved.

There is also a third approach — RAG (Retrieval-Augmented Generation): important information is stored in an external database and loaded into the context only when needed. This is how more complex AI systems and enterprise solutions work.

Lost in the middle: a hidden problem

The issue is not only about overflow. The quality of answers degrades even before the context runs out. Researchers from Stanford and Berkeley in 2023 described the "lost in the middle" phenomenon: models significantly better utilize information from the beginning and end of the context. Data that falls in the middle is processed worse — the model seems to "overlook" it.

"Language models tend to make worse use of relevant information when it is located in the middle of a long context," — from the

Lost in the Middle research, 2023.

Practical takeaway: key instructions are better given at the beginning or at the end of the request, rather than buried in the middle of a long document.

How to work with this limitation

Several practical strategies:

Break down tasks — instead of one giant session, divide work into sessions with clear intermediate summaries
Place important information at the beginning — system prompt and key constraints work best at the beginning of the context
Use models with larger windows — for large documents, choose Gemini 1.5 Pro (2M tokens) or Claude with 200K
Summarize yourself — before a new session, ask the model to sum up the previous one and save that text
Estimate length in advance — 1 page of text ≈ 500 tokens, 1 code file ≈ 1,000–5,000 tokens

What this means

A context window is not a technical nuance but a central parameter of any work with language models. Understanding this limitation allows you not to blame "strange" model behavior on a glitch, but to properly organize working sessions. The race for bigger context continues: providers are competing to increase limits, but the engineering question "what the model sees right now" will remain key for a long time to come.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Telegram channel RSS hamidun.com

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

🎓 Academy — 7 days free Free consultation