An append-only memory store grows forever. The cost of remembering a project should track how complex the project is — not how long you've worked on it. This is the sidecar that makes that true: a fold that rewrites understanding in place instead of stacking another note on the pile.
I run Claude Code with a memory plugin called claude-mem. After every working session it distills what happened into small, structured observations — a sentence or two each, tagged by project and type — and stores them in a local SQLite database with a vector index for recall. Next session, it injects some of that history back into context so the model starts warm instead of cold. It is a genuinely good tool, and I rely on it daily.
Then I looked at what it was actually feeding the model, and found the problem every append-only memory system eventually has. The store only grows. Every session adds observations; nothing ever consolidates them. On my heaviest project I had over sixteen hundred of these notes — dozens of them circling the same architectural decision, each one true, each one partial, none of them aware of the others.
At read time, the plugin's default move is to inject the fifty most recent narratives. So two failure modes stack on top of each other. The cost of priming a session climbs with history — more past work means more tokens spent re-reading it. And the recency cutoff means a decision I made three weeks ago is silently dropped the moment fifty newer notes pile up in front of it. I was paying more over time to remember less of what mattered.
The constraint I set first was that the consolidation layer could not corrupt the data it consolidated. claude-mem's database is the source of truth; my sidecar opens it `readonly:true` and never writes to it. All consolidated state lives in a separate `digests.db`, a clean separation that means a bug in my build can cost me a recompute, but never a byte of the underlying memory.
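The separation can be sketched in a few lines. This is a minimal illustration using Python's standard `sqlite3` module, not the sidecar's actual code; the `digests` table layout and both function names are assumptions made up for the example.

```python
import sqlite3
from pathlib import Path


def open_source_readonly(path: Path) -> sqlite3.Connection:
    """Open the source-of-truth database so that any write attempt fails."""
    # mode=ro is SQLite's URI parameter for a read-only connection.
    return sqlite3.connect(f"{path.resolve().as_uri()}?mode=ro", uri=True)


def open_digests(path: Path) -> sqlite3.Connection:
    """Open (or create) the sidecar's own database for consolidated state."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per (project, type) cell.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS digests ("
        " project TEXT, type TEXT, body TEXT, watermark INTEGER,"
        " PRIMARY KEY (project, type))"
    )
    return conn
```

With this split, an accidental `INSERT` against the source connection raises `sqlite3.OperationalError` instead of touching the data, and deleting `digests.db` loses nothing that cannot be recomputed.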
The system has three moving parts: a read-only clustering pass that decides which (project, type) cells to build, a write path that performs the fold, and a read path that assembles a session-start block and reports the token economics. The merge itself runs through a swappable provider seam, so the model doing the consolidation is a configuration value, not a hardcoded dependency.
The fix is one idea, and the whole build hangs off it. Memory cost should scale with the
number of things you know about a project — not the number of times you wrote something down.
In the language I actually think in: the read cost should be O(types), not
O(observations).
So instead of one growing pile per project, I keep one evolving digest per (project, type) cell. How it works. Gotchas. Decisions. Patterns. Eight categories. Each cell holds a single tight summary that represents everything the system has ever learned in that category — not the last fifty notes. The mechanism that keeps it current is a fold:
Each run, the sidecar gathers only the observations newer than a stored watermark, hands the existing digest and those new notes to Claude, and asks for a rewritten digest that absorbs them — merging duplicates, resolving contradictions in favor of newer information, and dropping what's been superseded. Then the watermark advances. Run it again with nothing new and it does nothing. Interrupt it halfway and it resumes from the last committed watermark, because every batch is persisted before the next one starts.
An append log can only get longer. A digest that is rewritten each time can stay the same size while getting smarter — the way your own understanding of a codebase doesn't grow a new lobe every time you learn something; it revises the model you already had. The store stays append-only and immutable. The understanding does not. New work rewrites the summary in place instead of stacking another note on top of it. Every property that makes this safe to run unattended — incrementality, resumability, idempotence — falls out of one design choice: a monotonic per-cell watermark, persisted after every batch.
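Here is a minimal sketch of the fold for one cell, under stated assumptions: an `observations` table with `id`, `project`, `type`, and `text` columns (hypothetical, not claude-mem's real schema), the integer `id` used as the watermark, and a `merge` callable standing in for the model call.

```python
import sqlite3
from typing import Callable, Sequence

# Hypothetical digest table: one row per (project, type) cell.
DIGESTS_SCHEMA = (
    "CREATE TABLE IF NOT EXISTS digests ("
    " project TEXT, type TEXT, body TEXT, watermark INTEGER,"
    " PRIMARY KEY (project, type))"
)

# Stand-in for the model call: (existing digest, new notes) -> rewritten digest.
Merge = Callable[[str, Sequence[str]], str]


def fold_cell(source: sqlite3.Connection, digests: sqlite3.Connection,
              project: str, type_: str, merge: Merge, batch_size: int = 50) -> int:
    """Fold observations newer than the watermark into the digest; return batches run."""
    row = digests.execute(
        "SELECT body, watermark FROM digests WHERE project = ? AND type = ?",
        (project, type_),
    ).fetchone()
    body, watermark = row if row else ("", 0)
    batches = 0
    while True:
        new = source.execute(
            "SELECT id, text FROM observations"
            " WHERE project = ? AND type = ? AND id > ? ORDER BY id LIMIT ?",
            (project, type_, watermark, batch_size),
        ).fetchall()
        if not new:
            return batches  # nothing newer than the watermark: a no-op
        body = merge(body, [text for _, text in new])
        watermark = new[-1][0]
        # Digest and watermark are committed together, once per batch, so an
        # interrupted run resumes from the last committed watermark.
        with digests:
            digests.execute(
                "INSERT OR REPLACE INTO digests (project, type, body, watermark)"
                " VALUES (?, ?, ?, ?)",
                (project, type_, body, watermark),
            )
        batches += 1
```

Because the watermark only moves forward and is written in the same transaction as the digest, running `fold_cell` twice in a row performs no second merge, which is the idempotence property described above.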
One sharp edge worth naming, because it cost me a job that should have been free: the merge routes through `claude -p` against my Claude subscription rather than a metered API key. But if `ANTHROPIC_API_KEY` is set in the environment, the CLI silently prefers it over the subscription and bills API credits instead. Blanking the variable does not help: an empty key still wins its slot. You have to remove it from the child process environment entirely. I found that the way everyone finds it: by watching a "credit balance too low" error stop a job that should have cost nothing.
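A minimal sketch of that fix, assuming the merge is launched as a subprocess. The helper names `child_env` and `run_merge` are made up for illustration; only `claude -p` and `ANTHROPIC_API_KEY` come from the text.

```python
import os
import subprocess
from typing import Mapping, Optional


def child_env(parent: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment with ANTHROPIC_API_KEY removed entirely."""
    env = dict(os.environ if parent is None else parent)
    # pop() deletes the variable. Assigning "" would leave it set, and per the
    # behaviour described above an empty key still takes precedence.
    env.pop("ANTHROPIC_API_KEY", None)
    return env


def run_merge(prompt: str) -> str:
    """Run one merge through `claude -p` with the scrubbed environment."""
    result = subprocess.run(
        ["claude", "-p", prompt],
        env=child_env(),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits non-zero
    )
    return result.stdout
```

Scrubbing only the child's environment leaves the parent shell untouched, so other tools that legitimately need the key keep working.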
Seven projects are fully consolidated. The numbers below are token estimates (character counts divided by four, not exact `count_tokens` calls), so read them as order-of-magnitude, not to the digit.
| Project | Obs | Raw tokens | Digest tokens | Ratio |
|---|---|---|---|---|
| litt | 1,640 | 587,602 | 4,288 | 137.0× |
| hoa-assistant | 753 | 310,395 | 6,489 | 47.8× |
| dev | 507 | 190,597 | 6,054 | 31.5× |
| claude-mem-consolidation | 127 | 50,824 | 4,557 | 11.2× |
| billing-legal | 46 | 21,437 | 3,555 | 6.0× |
| portfolio | 34 | 11,588 | 2,344 | 4.9× |
| transparent-confidence ◂ floor | 29 | 11,046 | 4,091 | 2.7× |
| Total | 3,136 | 1,183,489 | 31,378 | 37.7× |
The line that matters isn't the total — it's what happens per project as history piles up. The digest size barely moves. Across seven projects ranging from twenty-nine observations to over sixteen hundred, every digest lands between roughly 2,300 and 6,500 tokens. The summary does not grow with the history. It tracks how many kinds of things you know about the project — eight categories — not how much you wrote down.
Which means the compression scales with depth. On a near-empty project, twenty-nine observations compress about three-fold — barely worth it, because eight summary cells have a floor cost no matter how little you've recorded. On my heaviest project, sixteen hundred observations — close to six hundred thousand tokens of raw notes — fold into about 4,300 tokens. A hundred and thirty-seven fold. The technique pays off precisely where you need it: the deeper the history, the harder it compresses.
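The arithmetic behind the table is simple enough to show. This sketch uses the same characters-divided-by-four estimate and recomputes three of the ratios from the table's own figures; the function names are illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by four."""
    return len(text) // 4


def compression_ratio(raw_tokens: int, digest_tokens: int) -> float:
    """Raw tokens per digest token, rounded to one decimal place."""
    return round(raw_tokens / digest_tokens, 1)


# (raw tokens, digest tokens) copied from the table above.
TABLE = {
    "litt": (587_602, 4_288),
    "transparent-confidence": (11_046, 4_091),
    "Total": (1_183_489, 31_378),
}

for name, (raw, digest) in TABLE.items():
    print(f"{name}: {compression_ratio(raw, digest)}x")
# litt: 137.0x
# transparent-confidence: 2.7x
# Total: 37.7x
```

The two digests differ by fewer than two hundred tokens while the raw histories differ by a factor of about fifty, which is why the ratio is driven almost entirely by the numerator.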
There was a surprise in the model choice, too. I started on Sonnet, assuming the stronger model would consolidate better. When my heaviest project drained the subscription's usage window partway through its first fold, I moved that backfill to Haiku, expecting to trade quality for throughput. Haiku consolidated tighter, not looser — every Haiku-built cell came in under budget while two Sonnet-built cells overran it — and the detail survived: file paths, exact commands, version strings, the specific database index that was missing. The one thing Haiku did worse was resolve recency, leaving a few stale dates beside their replacements. So the real trade wasn't quality for cost. It was cleaner recency resolution for better size discipline and lower spend — the opposite of what I assumed when I started.
This is a result with its work shown, not a product announcement. It runs, the numbers hold, and nobody but me runs it yet. Five things stand between this sidecar and something that could be upstreamed into the plugin itself — and I'd close them in roughly this order.
The size budget is currently advisory — it's requested in the merge prompt, and Sonnet treated it as a strong suggestion rather than a ceiling, overrunning on two cells. The guarantee isn't real until it's enforced in code with a post-merge hard truncate. A limit a model can ignore is not a limit.
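One way to enforce the ceiling in code, sketched under the same characters-divided-by-four estimate used for the numbers above. The function name and the choice to cut at a line break are assumptions, not the current implementation.

```python
def enforce_budget(digest: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Clip a digest to the budget, cutting at the last line break before the cap."""
    limit = max_tokens * chars_per_token
    if len(digest) <= limit:
        return digest
    clipped = digest[:limit]
    cut = clipped.rfind("\n")
    # With no line break to cut on, fall back to a hard character cut.
    return clipped[:cut] if cut > 0 else clipped
```

Cutting on a line boundary avoids leaving half a sentence at the end of a cell. Truncation is still lossy, so it is a backstop for the prompt-level budget rather than a replacement for it.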
Right now the sidecar keeps a separate `digests.db` and reads claude-mem's database directly. Upstreaming would mean in-schema digest tables with a proper migration, rather than a parallel file living beside the source of truth.
The merge transport is a hardcoded `claude -p` CLI call. It should route through the plugin's existing provider layer (`CLAUDE_MEM_PROVIDER` / `CLAUDE_MEM_MODEL`) so model and transport are configured once, in one place, consistently with the rest of the system.
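A rough sketch of what that routing could look like. Only the two environment variable names come from the text; the `Provider` signature, the registry, and `resolve_merge` are hypothetical, and the values the plugin actually accepts for these variables are not assumed here.

```python
import os
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical provider signature: (model, existing digest, new notes) -> digest.
Provider = Callable[[str, str, Sequence[str]], str]


def resolve_merge(providers: Mapping[str, Provider],
                  env: Optional[Mapping[str, str]] = None) -> Callable[[str, Sequence[str]], str]:
    """Pick the merge transport and model from configuration, not from code."""
    env = os.environ if env is None else env
    name = env.get("CLAUDE_MEM_PROVIDER", "")
    model = env.get("CLAUDE_MEM_MODEL", "")
    if name not in providers:
        raise KeyError(f"unknown provider {name!r}; expected one of {sorted(providers)}")
    provider = providers[name]
    # Bind the configured model so callers only pass the digest and the notes.
    return lambda digest, notes: provider(model, digest, notes)
```

The fold then depends only on the returned callable, so swapping the CLI call for another transport becomes a configuration change rather than a code change.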
The build assumes Windows-absolute paths throughout. Before anyone else can run it, those need to go through the plugin's path resolvers so it works on macOS and Linux without edits.
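As a stand-in for the plugin's own path resolvers, whose API is not shown here, a portable version can build the path from a base directory instead of a drive-letter string. The `.sidecar` default directory and the function name are hypothetical.

```python
from pathlib import Path
from typing import Optional


def digests_path(base: Optional[Path] = None) -> Path:
    """Resolve digests.db under a base directory, defaulting to the user's home."""
    # Path.home() resolves correctly on Windows, macOS, and Linux.
    root = base if base is not None else Path.home() / ".sidecar"
    return root / "digests.db"
```

Joining with `/` on `Path` objects lets the platform pick the separator, so the same code yields a valid path on all three systems.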
The read path emits its session-start block to stdout today; it isn't yet injected through the SessionStart hook. Until it is, the payoff — priming a mature project on a few thousand tokens of dense digest instead of fifty raw notes — is something I run manually rather than something that happens automatically at the start of every session.
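The read path itself is small. The sketch below assembles a session-start block from a `digests` table with `project`, `type`, and `body` columns and estimates its size; the table layout and the heading format of the block are assumptions, and the hook wiring is deliberately not shown.

```python
import sqlite3


def session_start_block(digests: sqlite3.Connection, project: str) -> str:
    """Concatenate every digest cell for one project into a single block."""
    rows = digests.execute(
        "SELECT type, body FROM digests WHERE project = ? ORDER BY type",
        (project,),
    ).fetchall()
    sections = [f"## {type_}\n{body}" for type_, body in rows]
    return f"# Project memory: {project}\n\n" + "\n\n".join(sections)


def estimated_tokens(block: str) -> int:
    """Same rough estimate as elsewhere: characters divided by four."""
    return len(block) // 4
```

Printing the result of `session_start_block(conn, "litt")` reproduces the current stdout behaviour. The read cost depends on how many cells exist for the project, not on how many observations were folded into them.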
None of these change the shape of the result, and the shape is the point: an agent's memory should get denser as it accumulates, not just longer. The tools for giving agents long-term memory arrived this year. The discipline they still need is an old one. Don't keep every note you ever took. Keep a good summary, and revise it when you learn something.