If you keep talking to a chatbot like ChatGPT or Claude (they are LLMs, FYI), you’ll notice that they sometimes “forget” things you just told them, like your instruction to use `punkt_tab` instead of `punkt` in your script, or something you mentioned about your best friend.
While recent updates to memory features (especially in ChatGPT) prevent this to some extent, earlier versions of most LLMs suffered a lot from this kind of memory loss. In this article, we’ll talk about how and why this happens, in simple terms.
The Context Window
Most LLMs are built on the transformer architecture, either as the whole model or as its core component. These transformer models use what’s called “self-attention” to attend to every token (roughly, a word or piece of a word) in the input sequence.
But these sequences have a set maximum token limit, which varies by model. This is what we call the “context window”.
If the conversation exceeds the token limit, old tokens get truncated, and yep, the model can no longer use them in the chat.
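To make the truncation idea concrete, here’s a minimal sketch of how a chat application might drop the oldest messages once the history no longer fits a token budget. The `count_tokens` helper and the budget number are made up for illustration; real systems count tokens with the model’s own tokenizer.

```python
# Rough sketch: keep only the most recent messages that fit a token budget.
# NOTE: count_tokens() is a crude word-count stand-in, and max_tokens is an
# arbitrary example value, not any real model's limit.

def count_tokens(text: str) -> int:
    return len(text.split())

def truncate_history(messages: list[str], max_tokens: int = 4096) -> list[str]:
    kept = []
    used = 0
    # Walk backwards from the newest message so the oldest ones fall off first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # everything older than this point is "forgotten"
        kept.append(message)
        used += cost
    return list(reversed(kept))

history = [
    "(old) use punkt_tab instead of punkt in the script",
    "(older chatter that will be dropped first)",
    "(new) why is my tokenizer still failing?",
]
print(truncate_history(history, max_tokens=10))
# Only the newest message survives the tiny budget; the earlier ones are gone.
```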
But this is the most basic thing that can happen, and we’re talking about this in 2025. Most models have adopted newer techniques to work around it, like retrieval-augmented generation (RAG).
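As a rough illustration of the retrieval idea behind RAG, here’s a toy sketch that pulls only the past messages that look relevant to the new question, instead of stuffing the whole conversation into the prompt. Real systems use embeddings and a vector store; the word-overlap score and the example messages below are just stand-ins.

```python
# Toy retrieval sketch: rank past messages by overlap with the new question
# and keep only the top few. Real RAG pipelines use embedding similarity,
# not raw word overlap.

def relevance(query: str, message: str) -> int:
    # Count how many (lowercased) words the query and the message share.
    return len(set(query.lower().split()) & set(message.lower().split()))

def retrieve(query: str, past_messages: list[str], k: int = 2) -> list[str]:
    ranked = sorted(past_messages, key=lambda m: relevance(query, m), reverse=True)
    return ranked[:k]

past = [
    "Use punkt_tab instead of punkt in the tokenizer setup.",
    "My best friend's name is Sam.",
    "The script should output JSON.",
]
question = "Why does the punkt tokenizer still fail in my script?"
# These retrieved snippets would be prepended to the prompt instead of the full history.
print(retrieve(question, past))
```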
Even when the model hasn’t exceeded its context window, it can still forget the stuff you tell it, for several reasons. I’ll explain the most common ones below:
Instruction Interference
When you keep chatting with an AI model on a certain task, say writing a script, you provide different instructions along the way. Some of these instructions change as the chat goes on, which results in classic instruction overriding or some other kind of interference.
LLMs may respond to this in different ways. For instance, a model may mix up the previous instruction and the latest one and end up following a merged version of the two (i.e., trying to satisfy both to some extent). On top of this, the influence of earlier instructions also seems to decay naturally as the conversation goes on.
Pattern Following vs. True Comprehension
When you chat with an LLM, you're interacting with a “system” that's fundamentally predicting patterns rather than truly understanding your intent the way a human does.
This makes the LLM drift back to its default patterns (like tone of voice or choice of words), and it also makes some of your instructions or details easy to lose track of.
This problem gets worse because LLMs were trained on data with certain dominant patterns, so they revert to those familiar patterns whenever your instructions differ from what they saw in training. This is also why it’s hard to prompt engineer (I’m talking about basic prompt engineering in the UI) large language models like ChatGPT to, let’s say, write like you. Your 750-token writing sample doesn’t stand much of a chance against the trillions of tokens the model was trained on, and this leads to inconsistencies and, eventually, the model forgetting what your style was.
Summing Up
The “forgetting” nature of LLMs, as we discussed, is the result of a mix of technical limitations (like context windows), instruction interference, and the way they process information. Next time your AI assistant “forgets” something important, you’ll know exactly why it’s happening, and maybe you can work around it, for example by repeating your standing instructions or asking it to summarize the progress so far.
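If you want to try the “repeat your instructions” workaround programmatically, here’s one hedged sketch of it: pin your standing rules and a short running summary, and re-send them with every turn instead of the full history. The role/content layout below mirrors the common chat message format; the actual API call is intentionally left out, and the rule text and summary are just example values.

```python
# Sketch of the "repeat the constant instructions" workaround: the pinned
# rules and a running summary are rebuilt into every request, so they never
# scroll out of the context window.

PINNED_RULES = "Always use punkt_tab, never punkt. Keep the script's output in JSON."

def build_messages(running_summary: str, latest_user_message: str) -> list[dict]:
    return [
        {"role": "system", "content": PINNED_RULES},
        {"role": "system", "content": f"Progress so far: {running_summary}"},
        {"role": "user", "content": latest_user_message},
    ]

messages = build_messages(
    running_summary="Tokenizer import is fixed; next step is argument parsing.",
    latest_user_message="Now add argument parsing to the script.",
)
print(messages)  # this list is what you'd send to the model each turn
```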
Thanks for reading so far!
Interesting! This is something I've been curious about for a long time.
I like to generate longer texts with LLMs. They all have a pretty short maximum output length, so I'm forced to prompt one paragraph at a time. After a few prompts ("Write part 37 now!") they start forgetting things.
I've seen people use intermittent prompts reminding the AI of key storylines and facts from time to time to combat this. I wonder if it's purely a context window issue I'm up against, or if LLMs forget for other reasons as well.