A memory system that remembers more should not cost more to read. Mine did. Here is the fix, and the number that proves it.
I run Claude Code with a memory plugin called claude-mem. After every working session it distills what happened into small, structured observations — a sentence or two each, tagged by project and type — and stores them in a local database. Next session, it pulls some of that history back into context so the model starts warm instead of cold. It is a genuinely good tool, and I rely on it daily.
Then I looked at what it was actually feeding the model, and found the problem every append-only memory system eventually has.
The thing it actually was
The store only grows. Every session adds observations; nothing ever consolidates them. On my heaviest project I had over sixteen hundred of these notes — dozens of them circling the same architectural decision, each one true, each one partial, none of them aware of the others.
At read time, the plugin's default move is to inject the fifty most recent narratives. So two failure modes stack on top of each other. The cost of priming a session climbs with history — more past work means more tokens spent re-reading it. And the recency cutoff means a decision I made three weeks ago is silently dropped the moment fifty newer notes pile up in front of it. I was paying more over time to remember less of what mattered.
That is the worst possible trade. The longer you work on something, the more it costs to remember, and the more of the important early reasoning falls off the back of the truck. Watching tokens go up in smoke to re-read a filing cabinet I'd already paid to fill.
The decision: stop storing the log, store the understanding
The fix is one idea, and the whole build hangs off it.
Memory cost should scale with the number of things you know about a project — not the number
of times you wrote something down. In the language I actually think in: the read cost should
be O(types), not O(observations).
So instead of one growing pile per project, I keep one evolving digest per (project, type) cell. "How it works." "Gotchas." "Decisions." "Patterns." Eight categories. Each cell holds a single tight summary — a few hundred tokens — that represents everything the system has ever learned in that category, not the last fifty notes.
The mechanism that keeps it current is a fold:
Each run, I gather only the observations newer than a stored watermark, hand the existing digest and those new notes to Claude, and ask for a rewritten digest that absorbs them — merging duplicates, resolving contradictions in favor of newer information, and dropping what's been superseded. Then the watermark advances. Run it again with nothing new and it does nothing. Interrupt it halfway and it resumes from the last committed watermark, because every batch is persisted before the next one starts.
That rewrite is the entire trick. An append log can only get longer. A digest that is rewritten each time can stay the same size while getting smarter — the way your own understanding of a codebase doesn't grow a new lobe every time you learn something; it revises the model you already had.
What it cost to build, honestly
The merge is a real model call, and I made two decisions there worth naming.
First, I route it through claude -p against my Claude subscription rather than a
metered API key — the consolidation should be something a normal Claude Code user can run
without a separate bill. That came with a sharp edge I'll save you: if
ANTHROPIC_API_KEY is set in the environment, the CLI silently prefers it over your
subscription and bills API credits instead. An empty key still wins its slot. You have to
remove it from the child process environment, not blank it. I found that the way everyone finds
it — by watching a "credit balance too low" error stop a job that should have been free.
Second, model choice is a knob. I started on Sonnet, assuming the stronger model would consolidate better. Then my heaviest project — over sixteen hundred observations — ran the subscription's rolling usage window dry partway through its first fold. So I moved that backfill to Haiku, expecting to trade quality for the ability to finish in one pass.
Haiku consolidated tighter, not looser. Every Haiku-built cell came in under the size budget; the Sonnet-built cells on two other projects overran it. And the detail survived — file paths, exact commands, version strings, the specific database index that was missing. The one thing Haiku did worse was resolve recency: a few stale dates lingered next to their replacements where Sonnet would have collapsed them. So the real trade wasn't quality for cost. It was cleaner recency resolution for better size discipline and lower spend — the opposite of what I assumed when I started.
I found that limit, and that surprise, the way I find most things — by running the thing until it broke, on purpose, and reading what the failure had to say. The drain-and-resume design exists because I assumed the first full run would die halfway. It did. It resumed clean.
What it revealed
Seven projects are fully consolidated as I write this. The numbers are token estimates —
character counts divided by four, not exact count_tokens calls — so read them as
order-of-magnitude, not to the digit.
Here is the whole store:
| Observations | Raw tokens | Digest tokens |
|---|---|---|
| 3,136 | ~1,183,000 | ~31,400 |
But the line that matters isn't the total — it's what happens per project as history piles up. The digest size barely moves. Across seven projects ranging from twenty-nine observations to over sixteen hundred, every digest lands between roughly 2,300 and 6,500 tokens. The summary does not grow with the history. It tracks how many kinds of things you know about the project — eight categories — not how much you wrote down.
Which means the compression isn't a fixed ratio. It scales with history. On a near-empty project, twenty-nine observations compress about three-fold — barely worth it, because eight summary cells have a floor cost no matter how little you've recorded. On my heaviest project, sixteen hundred observations — close to six hundred thousand tokens of raw notes — fold into about 4,300 tokens. A hundred and thirty-seven fold.
In plain terms: priming a mature project's entire memory costs a few thousand tokens — a couple percent of a 200K context window — flat, and you get all of the history instead of the last fifty notes of it. Against the plugin's current fifty-narrative default, the digest runs about half the tokens while covering everything, not just the recent slice.
Now the honest part, because numbers this good deserve the caveats stapled to them.
Two of my Sonnet-built cells overran their character budget — the model treats a size limit as a strong suggestion, not a hard stop. The guarantee isn't real until the budget is enforced in code, with a truncate, rather than requested in a prompt. The small-project floor is real too: below roughly a hundred observations, this barely beats storing the raw notes, and on the very smallest project it's close to a wash. And this is a sidecar I built on top of claude-mem, reading its database and writing my own. It is not part of the plugin. Nobody but me runs it yet.
So this is a result with its work shown, not a product announcement. But the shape of it holds, and the shape is the point: an agent's memory should get denser as it accumulates, not just longer. The cost of remembering a project should track how complex the project is — not how long you've been working on it.
The tools for giving agents long-term memory arrived this year. The discipline they still need is an old one. Don't keep every note you ever took. Keep a good summary, and revise it when you learn something.
The store grew. The understanding didn't have to.