Agentic Memory

Agent contexts are blank slates. What differentiates one agentic harness from another is how meticulously that context is crafted to serve its user. LLMs are remarkably good at adapting to the way you like to interact and the kind of work output you want. They just have to be told. And without memory, they have to be told every single time.

It is tempting to remember everything, but perfect memory is more of a curse than a superpower. Most of what matters at this very moment won’t matter a day from now, and remembering it is at best wasted compute, at worst an active distraction. What you want from an agent is what you want from a good colleague: not total recall, just the handful of things that mean you never have to repeat yourself.

Recency bias

We are poor judges, in the moment, of what’s worth remembering. Whatever just happened feels significant, which is exactly why our brains do most of their real memory formation later, in the subconscious and especially during sleep, at a remove from the moment that made everything feel so urgent.

An agent asked to curate your preferences while it works for you has the same flaw. Everything in its context window looks like the most important thing it has ever seen, so a one-off request you made on a Tuesday becomes a permanent fact about you. The hard problem in agent memory isn’t storage, and it isn’t retrieval. It’s judgment: deciding, with a little distance, what was signal and what was just Tuesday.

So we gave the judgment its own agent. Every Brightwave user gets a dedicated memory agent that runs in the background, watching how you work across the platform. It has exactly one job: decide whether what it just saw justifies a long-term memory, or whether it should only take a note and wait for the pattern to repeat before writing anything down. This isn’t a launch announcement, either: the system has been running in production since late 2025, and we’re only now getting around to writing about it. The rest of this post explains how it works.

Memory forms in the background

The most important design choice is also the least obvious one: the agent that forms your memories is not the agent doing your work.

We moved memory formation out of the critical path entirely. Brightwave already had the right backbone for this: everything that happens on the platform is recorded as an event in a single append-only log. A message sent, a document uploaded, a report generated. Memory is built as a subscriber to that log. While you work, nothing is trying to remember anything. Your message simply becomes one more event in the log, and your task moves on. No agent runs. No memory is written. Your work is never waiting on a memory.
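The post doesn’t publish the log’s schema, so the following is only a minimal sketch, assuming an in-memory list stands in for the durable log and that the event kinds and the `EventLog`/`read_after` names are hypothetical. It shows the shape of the write side: the user’s task does nothing but append and return, and the memory subscriber reads forward from its own cursor.

```python
from dataclasses import dataclass
from itertools import count
from typing import Any


@dataclass(frozen=True)
class Event:
    seq: int                 # position in the log, strictly increasing
    user_id: str
    kind: str                # hypothetical kinds, e.g. "message_sent"
    payload: dict[str, Any]


class EventLog:
    """Hypothetical in-memory stand-in for a durable append-only log."""

    def __init__(self) -> None:
        self._events: list[Event] = []
        self._seq = count(1)

    def append(self, user_id: str, kind: str, payload: dict[str, Any]) -> Event:
        # All the user's task ever does: record the event and move on.
        event = Event(next(self._seq), user_id, kind, payload)
        self._events.append(event)
        return event

    def read_after(self, cursor: int) -> list[Event]:
        # Subscribers read forward from their own cursor; nothing in the
        # log is ever edited or removed.
        return [e for e in self._events if e.seq > cursor]


log = EventLog()
log.append("u1", "message_sent", {"text": "Compare these two filings"})
log.append("u1", "document_uploaded", {"name": "10-K.pdf"})
pending = log.read_after(0)  # the memory subscriber's cursor starts at 0
print([e.kind for e in pending])  # ['message_sent', 'document_uploaded']
```

Because the subscriber only tracks a cursor, adding memory as a consumer costs the user’s task nothing beyond the append it was already doing.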

Memories form later, and somewhere else. On a steady cadence, a background sweep gathers the events that have accumulated and hands them to your personal memory agent, the one thing whose entire job is to decide what’s worth keeping. It’s the closest thing we have to consolidating during sleep: reflection happens away from the moment, with a little distance from the urgency that made everything feel essential.
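Figure 1’s caption describes the sweep as “fair to whoever has waited longest.” One plausible reading, sketched here as an assumption rather than Brightwave’s scheduler: group the accumulated events by user, serve first the users whose oldest unprocessed event is oldest, and cap each sweep at a fixed number of users (`plan_sweep` and `max_users` are hypothetical names).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingEvent:
    seq: int
    user_id: str
    created_at: float  # seconds since the epoch


def plan_sweep(events: list[PendingEvent], max_users: int) -> list[tuple[str, list[PendingEvent]]]:
    """Batch pending events per user, longest-waiting user first."""
    by_user: dict[str, list[PendingEvent]] = defaultdict(list)
    for e in events:
        by_user[e.user_id].append(e)
    # A user's wait is measured by their oldest unprocessed event.
    order = sorted(by_user, key=lambda u: min(e.created_at for e in by_user[u]))
    # Each selected user's batch is handed over in log order.
    return [(u, sorted(by_user[u], key=lambda e: e.seq)) for u in order[:max_users]]


events = [
    PendingEvent(3, "bob", 300.0),
    PendingEvent(1, "alice", 100.0),
    PendingEvent(2, "carol", 200.0),
    PendingEvent(4, "alice", 400.0),
]
plan = plan_sweep(events, max_users=2)
print([(u, [e.seq for e in batch]) for u, batch in plan])  # [('alice', [1, 4]), ('carol', [2])]
```

Bob is not dropped; his events simply stay pending until a later sweep, where his wait will put him near the front.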


Figure 1. The write path, end to end. Your work only appends a cheap event to the platform’s log and moves on; a background sweep, fair to whoever has waited longest, atomically claims your one memory agent, and the agent’s decision commits in a single transaction, together with its evidence.

One detail makes this practical. There is exactly one memory agent per user: a single, long-lived agent that persists across every project you touch, so it’s building a picture of you, not of one task.
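Having exactly one agent per user means two sweep workers must never run it at the same time over the same events, which is why Figure 1 says the sweep “atomically claims” it. The post doesn’t say how; one common way to get that guarantee, sketched here with SQLite purely as an assumption (the `memory_agents` table, its columns, and the worker names are hypothetical), is a conditional `UPDATE` that succeeds only if nobody already holds the claim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory_agents ("
    " user_id TEXT PRIMARY KEY,"
    " claimed_by TEXT)"  # NULL while no sweep worker holds this agent
)
conn.execute("INSERT INTO memory_agents (user_id, claimed_by) VALUES ('u1', NULL)")
conn.commit()


def try_claim(conn: sqlite3.Connection, user_id: str, worker: str) -> bool:
    # The WHERE clause makes the check and the write one atomic statement:
    # only one worker can flip claimed_by from NULL to its own name.
    cur = conn.execute(
        "UPDATE memory_agents SET claimed_by = ? WHERE user_id = ? AND claimed_by IS NULL",
        (worker, user_id),
    )
    conn.commit()
    return cur.rowcount == 1


def release(conn: sqlite3.Connection, user_id: str, worker: str) -> None:
    # Only the current holder can release the claim.
    conn.execute(
        "UPDATE memory_agents SET claimed_by = NULL WHERE user_id = ? AND claimed_by = ?",
        (user_id, worker),
    )
    conn.commit()


print(try_claim(conn, "u1", "sweep-a"))  # True
print(try_claim(conn, "u1", "sweep-b"))  # False: sweep-a already holds it
release(conn, "u1", "sweep-a")
print(try_claim(conn, "u1", "sweep-b"))  # True
```

A production version would also need a way to recover claims from workers that crash mid-run, such as a claim timestamp and a timeout; that part is left out here.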

Building on the event log has one more consequence worth pausing on: memory at Brightwave isn’t a chat feature. The log doesn’t care whether an event is a message you sent, a document you uploaded, or a report you reran, and neither does the pipeline that turns events into memories. Conversation is the richest signal today, but the system is designed to learn from behavior, and nearly everything you do on the platform is a behavior it can observe.

Enough signal to remember

This is where the balance actually gets struck.

When the memory agent looks at your recent activity, most of it should slide right off. A one-off request is not a preference. A single instance of almost anything is rarely worth a permanent memory. So the agent is held to a deliberately high bar.

Strong, explicit signals get through immediately. When you say “always show me X” or “never do Y,” there’s nothing to second-guess, and it becomes a memory on the spot. But the far more common case is the weak signal: a thing you did once that might be the start of a pattern, or might be nothing at all. For those, the agent does not write a memory. It does what a careful person does. It takes a note and watches.
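The real memory agent makes this call with a language model, not with rules. Still, the three possible outcomes are easy to pin down in code. In this sketch, a crude keyword pattern stands in for recognizing explicit phrasing, and the hypothetical `might_be_pattern` flag stands in for the agent’s own hunch. Neither is how Brightwave actually decides.

```python
import re
from enum import Enum


class Decision(Enum):
    REMEMBER = "remember"  # strong, explicit signal: write a durable memory now
    TRACK = "track"        # weak signal: leave a TRACKING note and watch
    DISMISS = "dismiss"    # the common case: let it slide off


# Hypothetical stand-in for the model's judgment of "explicit" phrasing.
# Word boundaries keep "whenever" from matching "never".
EXPLICIT = re.compile(r"\b(always|never|from now on)\b", re.IGNORECASE)


def triage(text: str, might_be_pattern: bool) -> Decision:
    """Route one observation; `might_be_pattern` stands in for the agent's hunch."""
    if EXPLICIT.search(text):
        return Decision.REMEMBER
    if might_be_pattern:
        return Decision.TRACK
    return Decision.DISMISS


print(triage("Always show me EBITDA margins", False))  # Decision.REMEMBER
print(triage("Can you put that in a table?", True))    # Decision.TRACK
print(triage("Summarize this earnings call", False))   # Decision.DISMISS
```

The ordering is the point. Explicit instructions skip the waiting room, hunches get tracked, and everything else defaults to being forgotten.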

Those notes are hypotheses, not memories, and they look like exactly what they are: “TRACKING: prefers tables over prose for comparisons. Observed once.” Only when a pattern actually repeats, ideally across different conversations rather than twice in the same breath, does a hypothesis graduate into a real, durable memory. A hunch has to earn its place.
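Here is a minimal sketch of a tracked hypothesis that renders itself in the note format shown above and is promoted only once it has been seen in enough distinct conversations. The post says repeats across conversations are the *ideal*; this sketch makes that a hard rule. The threshold of two and the `Hypothesis` class are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    claim: str
    conversations: set[str] = field(default_factory=set)  # where it was seen
    observations: int = 0

    def observe(self, conversation_id: str) -> None:
        self.observations += 1
        self.conversations.add(conversation_id)

    def note(self) -> str:
        # Rendered the way the post shows it in the agent's working summary.
        times = "once" if self.observations == 1 else f"{self.observations} times"
        return f"TRACKING: {self.claim}. Observed {times}."

    def ready_to_promote(self, min_conversations: int = 2) -> bool:
        # Repeats inside one conversation ("twice in the same breath")
        # don't count toward the threshold.
        return len(self.conversations) >= min_conversations


h = Hypothesis("prefers tables over prose for comparisons")
h.observe("conv-1")
print(h.note())               # TRACKING: prefers tables over prose for comparisons. Observed once.
h.observe("conv-1")
print(h.ready_to_promote())   # False: same conversation twice
h.observe("conv-2")
print(h.ready_to_promote())   # True: the pattern held across conversations
```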

Where those hypotheses live is a trick we’re fond of. They aren’t rows in a database; they live in the memory agent’s own working context, the running summary of its one long-lived conversation. That summary is continuously compacted as the conversation grows, which would normally smooth these fragile, half-formed notes right out of existence. So the summarizer carries one hard rule: tracked hypotheses are the single thing it is never allowed to drop. We took a mechanism that is designed to be lossy and made it the home for hunches. Unproven things are held loosely, in a place where fading away is free, and only what survives repetition gets written down for good.
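The real summarizer is a model compressing a conversation, but its one hard rule can be shown with a plain truncation. In this sketch, which uses a hypothetical line-based summary and a `max_lines` budget, every other line is fair game and only the most recent non-TRACKING lines survive, while TRACKING notes are always kept, even if that means going over budget.

```python
def compact(summary_lines: list[str], max_lines: int) -> list[str]:
    """Shrink a working summary, never dropping a TRACKING note.

    Everything else is lossy: keep only the most recent non-TRACKING lines
    that fit in whatever budget the TRACKING notes leave over.
    """
    tracking = {i for i, ln in enumerate(summary_lines) if ln.startswith("TRACKING:")}
    others = [i for i in range(len(summary_lines)) if i not in tracking]
    room = max(max_lines - len(tracking), 0)
    # Guard room == 0: others[-0:] would keep every line.
    keep_others = set(others[-room:]) if room else set()
    # Preserve the original order of whatever survives.
    return [ln for i, ln in enumerate(summary_lines) if i in tracking or i in keep_others]


summary = [
    "Discussed Q3 results for two retailers.",
    "TRACKING: prefers tables over prose for comparisons. Observed once.",
    "Asked for a DCF on the larger one.",
    "Uploaded the latest 10-K.",
]
print(compact(summary, max_lines=2))
# ['TRACKING: prefers tables over prose for comparisons. Observed once.', 'Uploaded the latest 10-K.']
```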


Figure 2. Where a hunch lives. The memory agent weighs every signal. When it’s convinced, it writes a durable memory; when it merely suspects a pattern, it leaves a TRACKING note in its own working summary, a place designed to forget. The summarizer may drop anything except a TRACKING note; notes that keep proving out get promoted, and stale ones simply fade.

A bad memory is worse than a missing one

If forgetting is a feature, then a memory store with no limit is a bug. Each memory agent gets a small budget of memories, small enough that you could read the whole set in one sitting, and the reason isn’t storage. Storage is cheap. The reason is that a cluttered memory is a worse memory. Every memory you keep is something the agent has to weigh every single time it works for you, and past a certain point more memories don’t make the agent more personalized. They make it more distracted. A wrong or stale memory is worse still: it quietly pulls the agent away from what you actually want.

The interesting behavior shows up when the budget is full. We don’t silently delete the oldest memory to make room; that would throw away something that might genuinely matter, without anyone deciding to. Instead, the next attempt to write a memory simply fails, and the failure message is itself a prompt. It tells the agent it’s out of room and lays out its options: consolidate two overlapping memories into one, delete something that’s no longer true, or update an existing memory instead of adding another.
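Here is a minimal sketch of a budgeted store whose create call refuses to evict and instead raises an error that doubles as instructions. The post doesn’t give the budget size, the exact error wording, or the tool names, so `MemoryStore`, its methods, and `budget=2` are all hypothetical.

```python
class MemoryBudgetExceeded(Exception):
    """Raised instead of evicting anything; the message doubles as a prompt."""


class MemoryStore:
    def __init__(self, budget: int) -> None:
        self.budget = budget                 # hypothetical size; the post only says "small"
        self.memories: dict[str, str] = {}
        self._next_id = 1

    def create(self, text: str) -> str:
        if len(self.memories) >= self.budget:
            current = "\n".join(f"- {mid}: {t}" for mid, t in self.memories.items())
            raise MemoryBudgetExceeded(
                f"Memory is full ({self.budget} of {self.budget}); nothing was saved. "
                "To make room, consolidate two overlapping memories into one, "
                "delete a memory that is no longer true, or update an existing "
                "memory instead of adding another.\nCurrent memories:\n" + current
            )
        mid = f"m{self._next_id}"
        self._next_id += 1
        self.memories[mid] = text
        return mid

    def update(self, mid: str, text: str) -> None:
        if mid not in self.memories:
            raise KeyError(mid)
        self.memories[mid] = text

    def delete(self, mid: str) -> None:
        del self.memories[mid]

    def consolidate(self, keep: str, drop: str, merged: str) -> None:
        # Two overlapping memories become one, freeing a slot.
        self.update(keep, merged)
        self.delete(drop)


store = MemoryStore(budget=2)
a = store.create("Prefers tables for comparisons")
b = store.create("Wants peer comparisons laid out as tables")
try:
    store.create("Wants figures reported in USD millions")
except MemoryBudgetExceeded as err:
    print(str(err).splitlines()[0])  # the coaching text the agent reads
    store.consolidate(a, b, "Prefers tables over prose for comparisons")
    store.create("Wants figures reported in USD millions")
print(len(store.memories))  # 2
```

Listing the current memories inside the error gives the agent what it needs to choose a consolidation, so the next step is a curation decision rather than a guess.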

In other words, the limit turns the agent into its own gardener. Hitting the ceiling isn’t an error to route around; it’s the system telling the agent it’s time to tend the garden. Quality over quantity, enforced by making the agent confront the tradeoff itself instead of hiding it behind an automatic eviction rule it never sees.


Figure 3. When the budget is full, the failure is the prompt. The write fails, and the error itself coaches the agent through making room: consolidate, delete, or update. Making room is a curation decision, never a silent eviction.

Remembering why

There’s a question you should be able to ask anything that claims to remember things about you: why do you think that?

Most systems can’t answer it. Once a fact is distilled into a memory, the thread back to where it came from is gone. We built ours the other way around. Every memory the agent writes has to come with its receipts: the specific events that justified it. The call that writes a memory requires the evidence, so a memory without provenance is simply not expressible. The link is kept permanently, and it points at the cause: the exact moments, in the exact conversations, that led the agent to believe this about you.
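In the real system, the evidence requirement lives in the agent’s write call. In Python, the closest analogue to “not expressible” is a type that refuses to be constructed without evidence. The sketch below assumes that framing; `EventRef`, `Memory`, and `why` are hypothetical names, and the event references are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRef:
    conversation_id: str
    message_id: str


@dataclass(frozen=True)
class Memory:
    text: str
    evidence: tuple[EventRef, ...]  # the exact moments that justified it

    def __post_init__(self) -> None:
        # No receipts, no memory: construction fails without evidence.
        if not self.evidence:
            raise ValueError("a memory must cite at least one event as evidence")


def why(memory: Memory) -> list[str]:
    """Walk a memory back to the moments that created it."""
    return [f"{ref.conversation_id}/{ref.message_id}" for ref in memory.evidence]


m = Memory(
    "Prefers tables over prose for comparisons",
    (EventRef("c1", "msg-7"), EventRef("c2", "msg-3")),
)
print(why(m))  # ['c1/msg-7', 'c2/msg-3']

try:
    Memory("Likes charts", evidence=())
except ValueError as err:
    print(err)  # a memory must cite at least one event as evidence
```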

That makes the entire memory set auditable. For any memory, we can walk straight back to the moments that created it, which is how you keep a system like this honest, and how you debug it when it gets something wrong. Every change to a memory leaves a trail of its own, so “What does the agent believe about me, why, and since when?” is always an answerable question.
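Here is a minimal sketch of such a trail, assuming it is kept as an append-only list of revisions, each carrying its own evidence. With that structure, all three parts of the question can be answered: the latest text, the union of evidence, and the time of creation. `Revision`, `AuditTrail`, and `explain` are hypothetical, and the dates are arbitrary example data.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Revision:
    memory_id: str
    action: str                  # "create", "update" or "delete"
    text: Optional[str]          # None once the memory is deleted
    evidence: tuple[str, ...]    # event references behind this change
    at: datetime


class AuditTrail:
    """Hypothetical append-only history of every change to every memory."""

    def __init__(self) -> None:
        self._revisions: list[Revision] = []

    def record(self, revision: Revision) -> None:
        self._revisions.append(revision)

    def explain(self, memory_id: str) -> Optional[dict]:
        """What does the agent believe, why, and since when?"""
        history = [r for r in self._revisions if r.memory_id == memory_id]
        if not history or history[-1].action == "delete":
            return None
        return {
            "believes": history[-1].text,
            "why": sorted({ref for r in history for ref in r.evidence}),
            "since": history[0].at,          # when the memory was first created
            "last_changed": history[-1].at,
        }


trail = AuditTrail()
t0 = datetime(2025, 1, 6, tzinfo=timezone.utc)
t1 = datetime(2025, 2, 3, tzinfo=timezone.utc)
trail.record(Revision("m1", "create", "Prefers tables for comparisons", ("c1/msg-7",), t0))
trail.record(Revision("m1", "update", "Prefers tables over prose for comparisons", ("c2/msg-3",), t1))
print(trail.explain("m1"))
```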


Figure 4. Every memory keeps its receipts. The call that writes a memory requires the exact moments that justified it, and the links are kept for good, so any memory can answer “why do you think that?” A separate trail records every change since.

What we’re really building

It’s tempting to chase perfect recall: an agent that remembers everything you’ve ever done with it. We think that’s the wrong goal. The agent you actually want is the one that remembers the handful of things that matter, forgets the rest without being asked, and can tell you exactly why it remembers what it does.

And because it listens to a log of behavior rather than a transcript of chat, what it can learn from keeps growing: every action on the platform is a signal it can weigh, through exactly the same pipeline.

Forgetting isn’t a limitation we’re engineering around. It’s the feature. The point was never to remember everything. It was so you never have to repeat yourself.
