Agentic Memory — what to remember, and what to forget.

Agent contexts are blank slates. What differentiates one agentic harness from another is how that context is meticulously crafted — not only to give the agent the best chance to succeed, but to do so the way the user prefers.

Learning user preferences is not new; it is what lets a product adapt to each user. LLMs only raise the stakes, because they are so good at reshaping both how they interact and what they produce to match exactly what a given user prefers.

But contrary to popular belief, perfect memory is more of a curse than a superpower. What matters at this very moment most likely won't matter a day from now, and remembering it is a waste of compute, if not an active distraction. The same philosophy applies to your favorite agents: you want them to remember the things that matter to you, so you don't have to repeat yourself, not to remember everything.

Recency bias

Our current selves are not a good judge of what is worth remembering. We tend to significantly overvalue recent events and experiences, thinking we must remember all of it. Thankfully, our brains have evolved to do most of their memory formation in the subconscious — and especially during sleep, at a remove from the moment that made everything feel so urgent.

The same thing happens to an agent that's asked to curate your preferences while it's in the middle of working for you. It overvalues the information right in front of it, certain that this is definitely worth remembering, when in reality it could just be a one-off task you happened to ask for today. Striking a balance between remembering too much and not enough is the name of the game.

At Brightwave, we designed our memory system with all of this in mind. Every user gets their own dedicated memory agent that runs in the background, observing their behavior across the platform. Its only job is to decide whether an interaction carries enough signal to justify a long-term memory, or whether it should instead take a note and watch for the behavior to repeat before writing anything down.

Memory forms in the background

The most important design choice is also the least obvious one: the agent that forms your memories is not the agent doing your work.

It's tempting to let the working agent curate as it goes — notice a preference mid-task and jot it down. But that's precisely the agent in the worst position to judge. It's deep in the current task, and everything about the current task feels important. That's recency bias, now wearing an LLM's clothes.

So we moved memory formation out of the critical path entirely. While you work, nothing is trying to remember anything. Every message you send simply drops a small, cheap signal: an event recording that the message happened, and nothing more. No agent runs. No memory is written. Your task stays fast, and your work never waits on a memory.

Memories form later, and somewhere else. About once a minute, a background process wakes up, gathers the signals that have piled up, and hands them to your personal memory agent — the one thing whose entire job is to decide what's worth keeping. It's the closest thing we have to consolidating during sleep: reflection happens away from the moment, with a little distance from the urgency that made everything feel essential.
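
To make the split concrete, here is a minimal Python sketch of the two halves, assuming an in-process buffer in place of whatever durable queue a production system would use. `Signal`, `SignalLog`, `sweep_once`, and the `run_memory_agent` callback are hypothetical names for illustration: the request path only appends a record, and the sweep drains everything that piled up and groups it by user.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Signal:
    """A cheap record that something happened. Writing one never starts an agent."""
    user_id: str
    conversation_id: str
    message_id: str
    created_at: float = field(default_factory=time.time)


class SignalLog:
    """Hypothetical buffer of signals waiting for the next background sweep."""

    def __init__(self) -> None:
        self._pending: list[Signal] = []

    def record(self, signal: Signal) -> None:
        # Request path: append and return. No memory is formed here.
        self._pending.append(signal)

    def drain_by_user(self) -> dict[str, list[Signal]]:
        # Background path: take everything that piled up, grouped per user.
        batches: dict[str, list[Signal]] = defaultdict(list)
        for signal in self._pending:
            batches[signal.user_id].append(signal)
        self._pending.clear()
        return dict(batches)


def sweep_once(log: SignalLog, run_memory_agent: Callable[[str, list[Signal]], None]) -> None:
    """One pass of the background sweep; a scheduler would call this about once a minute."""
    for user_id, signals in log.drain_by_user().items():
        run_memory_agent(user_id, signals)
```

A scheduler calling `sweep_once` roughly every sixty seconds would reproduce the cadence described above; nothing on the request path ever waits for it.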

Figure 1. Memory forms in the background — like consolidating during sleep. Your work drops a cheap signal and moves on; a background sweep hands the accumulated signals to your one-per-user memory agent, which is the only place a memory is ever created, updated, or deleted.

A few things make this practical. There is exactly one memory agent per user — a single, long-lived agent that persists across every project you touch, so it's building a picture of you, not of one task. The background sweep is fair: it serves whoever has been waiting longest first, so one heavy user can't starve everyone else. And it's careful never to run two copies of your memory agent at the same time, so two passes can't both decide to write the same thing.
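
The fairness and no-double-run rules reduce to a small selection step. The sketch below is one way to express it, assuming each waiting user is summarized by the arrival time of their oldest unprocessed signal; `PendingUser`, `plan_sweep`, and the `max_runs` concurrency limit are hypothetical. Ordering by oldest signal rather than by signal count is what keeps a heavy user from crowding out everyone else, since they occupy one slot no matter how many signals they produce.

```python
from dataclasses import dataclass


@dataclass
class PendingUser:
    user_id: str
    oldest_signal_at: float  # arrival time of this user's oldest unprocessed signal


def plan_sweep(pending: list[PendingUser], running: set[str], max_runs: int) -> list[str]:
    """Choose which users' memory agents to start in this sweep.

    Longest-waiting users go first, so a heavy user who keeps producing new
    signals cannot push everyone else to the back. Users whose agent is already
    running are skipped, so no user ever has two memory passes at once.
    """
    eligible = [p for p in pending if p.user_id not in running]
    eligible.sort(key=lambda p: p.oldest_signal_at)
    return [p.user_id for p in eligible[:max_runs]]
```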

Enough signal to remember

This is where the balance actually gets struck.

When the memory agent looks at your recent activity, most of it should slide right off. A one-off request is not a preference. A single instance of almost anything is rarely worth a permanent memory. So the agent is held to a deliberately high bar.

Strong, explicit signals get through immediately. When you say "always show me X" or "never do Y," there's nothing to second-guess, and it becomes a memory on the spot. But the far more common case is the weak signal: a thing you did once that might be the start of a pattern, or might be nothing at all. For those, the agent does not write a memory. It does exactly what a careful person does: it takes a note and watches.

Those notes are hypotheses, not memories: "seems to prefer concise summaries (seen once)." They sit in the agent's own working memory, and only when a pattern actually repeats, ideally across different conversations rather than twice in the same breath, does a hypothesis graduate into a real, durable memory. A hunch has to earn its place.
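
The graduation rule can be sketched as a tiny state machine. This assumes the memory agent has already labeled each observation as explicit or not and given it a stable key, which in practice is an LLM judgment rather than code; `PreferenceTracker`, `observe`, and the two-conversation threshold are illustrative assumptions, not Brightwave's actual parameters. Counting distinct conversations, rather than raw occurrences, is what stops a repeat in the same breath from promoting a hunch.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    text: str
    conversations: set[str] = field(default_factory=set)


class PreferenceTracker:
    """Hypothetical tracker: explicit preferences are kept at once; weak ones must repeat."""

    def __init__(self, min_conversations: int = 2) -> None:
        self.min_conversations = min_conversations  # assumed threshold
        self.hypotheses: dict[str, Hypothesis] = {}
        self.memories: list[str] = []

    def observe(self, key: str, text: str, conversation_id: str, explicit: bool) -> str:
        if explicit:
            # "Always show me X" / "never do Y": nothing to second-guess.
            self.memories.append(text)
            self.hypotheses.pop(key, None)
            return "remembered"
        hypothesis = self.hypotheses.setdefault(key, Hypothesis(text))
        # A set of conversation ids: repeats within one conversation count once.
        hypothesis.conversations.add(conversation_id)
        if len(hypothesis.conversations) >= self.min_conversations:
            self.memories.append(hypothesis.text)
            del self.hypotheses[key]
            return "graduated"
        return "noted"
```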

There's a subtle trick worth calling out here. Those hypotheses don't live in a database; they live in the agent's working memory — the running summary of its own long-lived conversation. That summary is constantly being compacted as the conversation grows, which would normally mean these fragile half-formed notes get smoothed away. So we taught the summarizer to treat them as its single highest priority — the one kind of content it is never allowed to drop. It's consolidation in miniature: the unproven stuff is held loosely, in a place designed to fade, and only what survives repetition gets written down for good.
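
In production the summarizer is an LLM following instructions, so there is no literal function to show. As a deterministic stand-in, the sketch below captures the priority rule: entries tagged as hypotheses are always kept, and whatever room remains goes to the newest ordinary context. The `(kind, text)` entry format, `compact`, and the character budget are hypothetical.

```python
def compact(entries: list[tuple[str, str]], budget_chars: int) -> list[tuple[str, str]]:
    """Shrink a working-memory summary while never dropping a hypothesis.

    entries: (kind, text) pairs in chronological order, where kind is
    "hypothesis" or "context". Returns the kept entries in their original order.
    """
    # Hypotheses are kept unconditionally, even if they alone exceed the budget.
    keep = {i for i, (kind, _) in enumerate(entries) if kind == "hypothesis"}
    used = sum(len(entries[i][1]) for i in keep)
    # Spend the remaining room on context, newest first; older context fades first.
    for i in range(len(entries) - 1, -1, -1):
        kind, text = entries[i]
        if kind == "hypothesis":
            continue
        if used + len(text) > budget_chars:
            break
        keep.add(i)
        used += len(text)
    return [entries[i] for i in sorted(keep)]
```

Keeping hypotheses even when they overshoot the budget mirrors "never allowed to drop," at the cost of occasionally producing a longer summary than requested.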

Figure 2. From signal to memory. Strong, explicit preferences are kept immediately; weak signals become tracked hypotheses in the agent's working memory and only graduate once the pattern repeats. Most signals are forgotten — and that's the point.

A bad memory is worse than a missing one

If forgetting is a feature, then a memory store with no limit is a bug. We give each memory agent a small budget — on the order of a few dozen memories — and the reason isn't storage. Storage is cheap. The reason is that a cluttered memory is a worse memory. Every memory you keep is something the agent has to weigh every single time it works for you, and past a certain point more memories don't make the agent more personalized — they make it more distracted. A wrong or stale memory is worse still: it quietly pulls the agent away from what you actually want.

The interesting behavior shows up when the budget is full. We don't silently delete the oldest memory to make room — that would throw away something that might genuinely matter, without anyone deciding to. Instead, the next attempt to write a memory simply fails, and the failure tells the agent, in plain language, that it's out of room and has a choice to make: consolidate two overlapping memories into one, delete something that's no longer true, or update an existing memory instead of adding another.

In other words, the limit turns the agent into its own gardener. Hitting the ceiling isn't an error to route around — it's a prompt to tend the garden. Quality over quantity, enforced by making the agent confront the tradeoff itself instead of hiding it behind an automatic eviction rule it never sees.
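
A minimal sketch of the forcing function, assuming the memory agent calls a create tool and sees the tool's return value as text. `MemoryStore`, `MemoryBudgetExceeded`, `create_memory_tool`, and the default cap of 40 (standing in for "a few dozen") are hypothetical. The important part is what's missing: there is no eviction branch, only a failure message that spells out the three ways forward.

```python
class MemoryBudgetExceeded(Exception):
    """Raised instead of evicting anything. The message is written for the agent to read."""


class MemoryStore:
    """Hypothetical per-user store with a hard cap and no automatic eviction."""

    def __init__(self, max_memories: int = 40) -> None:
        self.max_memories = max_memories
        self.memories: dict[int, str] = {}
        self._next_id = 1

    def create(self, text: str) -> int:
        if len(self.memories) >= self.max_memories:
            # No silent eviction: refuse, and spell out the agent's options.
            raise MemoryBudgetExceeded(
                f"Memory is full ({len(self.memories)}/{self.max_memories}). "
                "To add this, first consolidate two overlapping memories into one, "
                "delete a memory that is no longer true, "
                "or update an existing memory instead of creating a new one."
            )
        memory_id = self._next_id
        self._next_id += 1
        self.memories[memory_id] = text
        return memory_id

    def update(self, memory_id: int, text: str) -> None:
        if memory_id not in self.memories:
            raise KeyError(memory_id)
        self.memories[memory_id] = text

    def delete(self, memory_id: int) -> None:
        del self.memories[memory_id]


def create_memory_tool(store: MemoryStore, text: str) -> str:
    """What the agent sees: a success note, or the failure message verbatim."""
    try:
        return f"Created memory {store.create(text)}."
    except MemoryBudgetExceeded as err:
        return f"Error: {err}"
```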

Figure 3. The budget is a forcing function, not an eviction policy. At the cap, the create call fails and the failure itself tells the agent to consolidate, delete, or update — so it tends the garden instead of silently dropping the oldest memory.

Remembering why

There's a question you should be able to ask anything that claims to remember things about you: why do you think that?

Most systems can't answer it. Once a fact has been melted down into a vector and dropped into a store, the thread back to where it came from is gone. We built ours the other way around. Every memory the agent writes has to come with its receipts — the specific messages you sent that justified it. The agent literally cannot create or change a memory without pointing at that evidence, and we keep the link permanently. It points at the cause: the exact messages, in the exact conversations, that led the agent to believe this about you.

That makes the entire memory set auditable. For any memory, we can walk straight back to the moments that created it — which is how you keep a system like this honest, and how you debug it when it gets something wrong. It also draws a clean line between what the system inferred about you and what you told it directly: the agent can curate its own memories freely, but it can never overwrite or delete the ones you wrote yourself.
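
Both rules, receipts required and user-written memories off-limits, can be enforced at the write path rather than left to the agent's good behavior. The sketch assumes evidence is a `(conversation_id, message_id)` pair and that each memory records whether the user or the agent authored it; `Evidence`, `AuditedMemories`, and all method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """A receipt: one user message that justified a memory."""
    conversation_id: str
    message_id: str


@dataclass
class Memory:
    text: str
    author: str  # "user" if you wrote it yourself, "agent" if it was inferred
    evidence: list[Evidence] = field(default_factory=list)


class ReceiptError(ValueError):
    """Raised when the agent tries to write without citing evidence."""


class AuditedMemories:
    """Hypothetical store that enforces receipts and protects user-written memories."""

    def __init__(self) -> None:
        self.memories: dict[int, Memory] = {}
        # Separate, append-only trail of every change: (action, memory_id, evidence).
        self.audit_log: list[tuple[str, int, tuple[Evidence, ...]]] = []
        self._next_id = 1

    def _add(self, memory: Memory) -> int:
        memory_id = self._next_id
        self._next_id += 1
        self.memories[memory_id] = memory
        self.audit_log.append(("create", memory_id, tuple(memory.evidence)))
        return memory_id

    def user_create(self, text: str) -> int:
        return self._add(Memory(text, "user"))

    def agent_create(self, text: str, evidence: list[Evidence]) -> int:
        self._require_receipts(evidence)
        return self._add(Memory(text, "agent", list(evidence)))

    def agent_update(self, memory_id: int, text: str, evidence: list[Evidence]) -> None:
        self._require_receipts(evidence)
        memory = self._agent_owned(memory_id)
        memory.text = text
        memory.evidence.extend(evidence)  # append: earlier receipts are never discarded
        self.audit_log.append(("update", memory_id, tuple(evidence)))

    def agent_delete(self, memory_id: int, evidence: list[Evidence]) -> None:
        self._require_receipts(evidence)
        self._agent_owned(memory_id)
        del self.memories[memory_id]
        self.audit_log.append(("delete", memory_id, tuple(evidence)))

    @staticmethod
    def _require_receipts(evidence: list[Evidence]) -> None:
        if not evidence:
            raise ReceiptError("Cite the user messages that justify this change.")

    def _agent_owned(self, memory_id: int) -> Memory:
        memory = self.memories[memory_id]
        if memory.author == "user":
            raise PermissionError("The agent cannot change or delete memories the user wrote.")
        return memory
```

An update appends new evidence instead of replacing the old links, so the original receipts survive every revision, and the audit log lives apart from the memories themselves.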

Figure 4. Every memory keeps its receipts. Each memory links back to the exact user messages that caused it, kept permanently; a separate trail records each change for audit.

Putting memories to work

A memory is only useful if it shows up at the right moment. Because the set is small and clean, we can take a simpler approach than most: give the agent all of it, every time.

At the start of every turn, your full set of memories is laid in front of the working agent — short previews it can scan at a glance, with the ability to pull up the complete text of any memory that looks relevant. This is the opposite of the usual approach, where every turn fires off a similarity search and hopes the right memory ranks high enough to surface. There's no haystack to search, so there's nothing to gamble on. Keeping the set small and curated means the agent can simply know everything it knows about you, all the time. (We do keep a semantic search as well — but mostly so the memory agent can find near-duplicates before it creates yet another one.)
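
Here is a rough sketch of what laying the full set in front of the agent might look like, assuming a preview is simply the first line of each memory truncated to a fixed width. `StoredMemory`, `render_memory_block`, `read_memory`, and the 80-character limit are hypothetical; the point is that with a few dozen memories, the entire set fits comfortably into every turn's context and no ranking step is needed.

```python
from dataclasses import dataclass


@dataclass
class StoredMemory:
    memory_id: int
    text: str


def preview(text: str, limit: int = 80) -> str:
    """First line of a memory, truncated so the whole set can be scanned at a glance."""
    stripped = text.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return first_line if len(first_line) <= limit else first_line[: limit - 3] + "..."


def render_memory_block(memories: list[StoredMemory]) -> str:
    """Context block added to every turn: all memories, as short previews. No search involved."""
    lines = ["What you know about this user (call read_memory(id) for the full text):"]
    lines += [f"[{m.memory_id}] {preview(m.text)}" for m in memories]
    return "\n".join(lines)


def read_memory(memories: list[StoredMemory], memory_id: int) -> str:
    """Tool the working agent can call to pull up the complete text of one memory."""
    for m in memories:
        if m.memory_id == memory_id:
            return m.text
    return f"No memory with id {memory_id}."
```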

What holds it together

None of this works if the plumbing underneath is sloppy, so a handful of invariants do the quiet, load-bearing work.

There is a single source of truth for your memories, and the search index is treated as strictly derivative — always rebuildable from the source, never the thing of record. Memory writes and the work to re-index them are committed together, so a memory can't exist without its re-index being scheduled, and the indexing work is written to converge on the current state of the world even if it runs late, out of order, or twice.
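
These invariants map onto a familiar pattern, often called a transactional outbox. The sketch below uses Python's built-in `sqlite3`, with a plain `search_index` table standing in for a real vector index; the table and function names are hypothetical. The memory row and its re-index job share one transaction, and the indexer reads the source row as it is now instead of trusting anything captured when the job was queued, which is what makes late, reordered, or duplicate runs harmless.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE memories (id INTEGER PRIMARY KEY, user_id TEXT NOT NULL, text TEXT NOT NULL);
        CREATE TABLE reindex_jobs (id INTEGER PRIMARY KEY, memory_id INTEGER NOT NULL);
        -- Derivative only: stands in for a vector index and can always be rebuilt.
        CREATE TABLE search_index (memory_id INTEGER PRIMARY KEY, text TEXT NOT NULL);
        """
    )
    return db


def write_memory(db: sqlite3.Connection, user_id: str, text: str) -> int:
    # The memory and its re-index job commit together, or neither does.
    with db:
        cur = db.execute("INSERT INTO memories (user_id, text) VALUES (?, ?)", (user_id, text))
        db.execute("INSERT INTO reindex_jobs (memory_id) VALUES (?)", (cur.lastrowid,))
    return cur.lastrowid


def update_memory(db: sqlite3.Connection, memory_id: int, text: str) -> None:
    with db:
        db.execute("UPDATE memories SET text = ? WHERE id = ?", (text, memory_id))
        db.execute("INSERT INTO reindex_jobs (memory_id) VALUES (?)", (memory_id,))


def reindex(db: sqlite3.Connection, memory_id: int) -> None:
    """Make the index match the source as it is now. Safe to run late, out of order, or twice."""
    row = db.execute("SELECT text FROM memories WHERE id = ?", (memory_id,)).fetchone()
    with db:
        if row is None:
            # The memory was deleted after the job was queued: drop it from the index too.
            db.execute("DELETE FROM search_index WHERE memory_id = ?", (memory_id,))
        else:
            db.execute(
                "INSERT OR REPLACE INTO search_index (memory_id, text) VALUES (?, ?)",
                (memory_id, row[0]),
            )


def drain_reindex_jobs(db: sqlite3.Connection) -> int:
    jobs = db.execute("SELECT id, memory_id FROM reindex_jobs ORDER BY id").fetchall()
    for job_id, memory_id in jobs:
        reindex(db, memory_id)
        with db:
            db.execute("DELETE FROM reindex_jobs WHERE id = ?", (job_id,))
    return len(jobs)


def rebuild_index(db: sqlite3.Connection) -> None:
    """The index is never the thing of record, so it can be thrown away and regenerated."""
    with db:
        db.execute("DELETE FROM search_index")
        db.execute("INSERT INTO search_index (memory_id, text) SELECT id, text FROM memories")
```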

And the background memory agent can never run twice for the same user at once — claiming the work and marking the agent busy are the same atomic step, so a second sweep simply finds it occupied and moves on. If a run ever crashes mid-curation, it doesn't get stuck: the very next sweep notices the abandoned run and picks it back up. None of these are exotic, but together they're what let a thing that runs constantly, in the background, for every user, stay correct without a babysitter.
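
The claim-and-mark-busy rule can be expressed as a single conditional `UPDATE`, sketched here with `sqlite3`. The lease is an assumption on our part: a run marks the agent busy until a deadline, so a crashed run that never releases it simply expires and the next sweep can claim it. `memory_agents`, `busy_until`, `try_claim`, `ensure_agent`, and `LEASE_SECONDS` are hypothetical names.

```python
import sqlite3
import time
from typing import Optional

LEASE_SECONDS = 300.0  # hypothetical: how long a run may hold the agent before it counts as abandoned


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE memory_agents (user_id TEXT PRIMARY KEY, busy_until REAL NOT NULL DEFAULT 0)"
    )
    return db


def ensure_agent(db: sqlite3.Connection, user_id: str) -> None:
    # Exactly one row, and therefore one memory agent, per user.
    with db:
        db.execute("INSERT OR IGNORE INTO memory_agents (user_id) VALUES (?)", (user_id,))


def try_claim(db: sqlite3.Connection, user_id: str, now: Optional[float] = None) -> bool:
    """Check-and-mark-busy in one UPDATE statement, so two sweeps cannot both win."""
    now = time.time() if now is None else now
    with db:
        cur = db.execute(
            "UPDATE memory_agents SET busy_until = ? WHERE user_id = ? AND busy_until <= ?",
            (now + LEASE_SECONDS, user_id, now),
        )
    # rowcount 0: another run holds a live lease, so this sweep moves on.
    # An expired lease (a crashed run) matches the WHERE clause and is reclaimed.
    return cur.rowcount == 1


def release(db: sqlite3.Connection, user_id: str) -> None:
    with db:
        db.execute("UPDATE memory_agents SET busy_until = 0 WHERE user_id = ?", (user_id,))
```

A real long-running pass would also need to renew its lease periodically (a heartbeat) so that a slow but healthy run isn't mistaken for a crashed one; the sketch leaves that out.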

What we're really building

It's tempting to chase perfect recall — an agent that remembers everything you've ever done with it. We think that's the wrong goal. The agent you actually want is the one that remembers the handful of things that matter, forgets the rest without being asked, and can tell you exactly why it remembers what it does.

Forgetting isn't a limitation we're engineering around. It's the feature. The point was never to remember everything — it was so you never have to repeat yourself.
