Persisted Memory — Eric Tetzlaff

The Problem

The longer you work, the more it costs to remember less.

I run Claude Code with a memory plugin called claude-mem. After every working session it distills what happened into small, structured observations — a sentence or two each, tagged by project and type — and stores them in a local SQLite database with a vector index for recall. Next session, it injects some of that history back into context so the model starts warm instead of cold. It is a genuinely good tool, and I rely on it daily.

Then I looked at what it was actually feeding the model, and found the problem every append-only memory system eventually has. The store only grows. Every session adds observations; nothing ever consolidates them. On my heaviest project I had over sixteen hundred of these notes — dozens of them circling the same architectural decision, each one true, each one partial, none of them aware of the others.

At read time, the plugin's default move is to inject the fifty most recent narratives. So two failure modes stack on top of each other. The cost of priming a session climbs with history — more past work means more tokens spent re-reading it. And the recency cutoff means a decision I made three weeks ago is silently dropped the moment fifty newer notes pile up in front of it. I was paying more over time to remember less of what mattered.

The Architecture

A read-only sidecar that never touches the source of truth.

The constraint I set first was that the consolidation layer could not corrupt the data it consolidated. claude-mem's database is the source of truth; my sidecar opens it with readonly: true and never writes to it. All consolidated state lives in a separate digests.db, a clean separation that means a bug in my build can cost me a recompute, but never a byte of the underlying memory.

The system has three moving parts: a read-only clustering pass that decides which (project, type) cells to build, a write path that performs the fold, and a read path that assembles a session-start block and reports the token economics. The merge itself runs through a swappable provider seam, so the model doing the consolidation is a configuration value, not a hardcoded dependency.
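The clustering pass can be pictured as a group-by over (project, type) that keeps only the cells with observations past their watermark. What follows is a minimal sketch under stated assumptions, not the sidecar's code: the `Observation` fields, `cellKey`, and `planCells` are illustrative names, not claude-mem's actual schema.

```typescript
// Hypothetical shapes: field names are assumptions, not claude-mem's schema.
interface Observation {
  project: string;
  type: string;
  epoch: number; // creation time of the observation
  text: string;
}

interface PendingCell {
  project: string;
  type: string;
  pending: number; // observations newer than the cell's watermark
}

// Collision-free key for a (project, type) cell.
function cellKey(project: string, type: string): string {
  return JSON.stringify([project, type]);
}

// Read-only planning pass: decide which cells have anything new to fold.
function planCells(
  observations: readonly Observation[],
  watermarks: ReadonlyMap<string, number>,
): PendingCell[] {
  const cells = new Map<string, PendingCell>();
  for (const obs of observations) {
    const key = cellKey(obs.project, obs.type);
    const watermark = watermarks.get(key) ?? -Infinity;
    if (obs.epoch <= watermark) continue; // already folded into the digest
    const cell = cells.get(key) ?? { project: obs.project, type: obs.type, pending: 0 };
    cell.pending += 1;
    cells.set(key, cell);
  }
  return Array.from(cells.values());
}
```

Because the pass only reads and counts, it can run against the source database without any risk to it; cells that come back with nothing pending are simply absent from the plan.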

8 digest types per project
40 max new observations per batch
~700 token budget per cell

Node.js · Bun.spawn · SQLite (better-sqlite3) · claude -p CLI · Claude Haiku · Claude Sonnet · claude-mem · Watermark · Incrementality

The Key Decision

Store the understanding, not the log.

The fix is one idea, and the whole build hangs off it. Memory cost should scale with the number of things you know about a project — not the number of times you wrote something down. In the language I actually think in: the read cost should be O(types), not O(observations).

So instead of one growing pile per project, I keep one evolving digest per (project, type) cell: how it works, gotchas, decisions, patterns, and so on, eight categories in all. Each cell holds a single tight summary that represents everything the system has ever learned in that category, not just the last fifty notes. The mechanism that keeps it current is a fold:

digest(n) = merge( digest(n−1), { obs : obs.epoch > watermark } )

Each run, the sidecar gathers only the observations newer than a stored watermark, hands the existing digest and those new notes to Claude, and asks for a rewritten digest that absorbs them — merging duplicates, resolving contradictions in favor of newer information, and dropping what's been superseded. Then the watermark advances. Run it again with nothing new and it does nothing. Interrupt it halfway and it resumes from the last committed watermark, because every batch is persisted before the next one starts.
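The loop described above can be sketched as a small in-memory fold. This is an illustration under stated assumptions, not the sidecar's implementation: `Obs`, `DigestCell`, `foldCell`, and the `merge` and `persist` callbacks are hypothetical names, the callbacks are synchronous to keep the sketch short (a real merge call to a model would be asynchronous), and the default batch size of 40 mirrors the per-batch maximum quoted earlier.

```typescript
// Hypothetical shapes for illustration only.
interface Obs {
  epoch: number;
  text: string;
}

interface DigestCell {
  digest: string;
  watermark: number;
}

// The merge step is injected, so any model provider can sit behind it.
type Merge = (digest: string, batch: readonly Obs[]) => string;
type Persist = (cell: DigestCell) => void;

function foldCell(
  cell: DigestCell,
  observations: readonly Obs[],
  merge: Merge,
  persist: Persist,
  batchSize = 40,
): DigestCell {
  // Only observations strictly newer than the watermark are candidates.
  const fresh = observations
    .filter((o) => o.epoch > cell.watermark)
    .sort((a, b) => a.epoch - b.epoch);

  let current: DigestCell = cell;
  for (let i = 0; i < fresh.length; i += batchSize) {
    const batch = fresh.slice(i, i + batchSize);
    const digest = merge(current.digest, batch);
    // Advance the watermark to the newest observation in this batch.
    const watermark = batch.reduce((max, o) => Math.max(max, o.epoch), current.watermark);
    current = { digest, watermark };
    persist(current); // commit before starting the next batch
  }
  return current;
}
```

With nothing newer than the watermark the loop body never runs, so neither `merge` nor `persist` is called. Restarting from the last persisted cell picks up with the first unprocessed batch. One caveat of this simplified version: observations sharing an identical epoch across a batch boundary would need a tiebreaker, such as a row id, to resume cleanly.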

Design Decision — The Fold Is the Whole Trick

An append log can only get longer. A digest that is rewritten each time can stay the same size while getting smarter — the way your own understanding of a codebase doesn't grow a new lobe every time you learn something; it revises the model you already had. The store stays append-only and immutable. The understanding does not. New work rewrites the summary in place instead of stacking another note on top of it. Every property that makes this safe to run unattended — incrementality, resumability, idempotence — falls out of one design choice: a monotonic per-cell watermark, persisted after every batch.

One sharp edge worth naming, because it cost me a job that should have been free: the merge routes through claude -p against my Claude subscription rather than a metered API key. But if ANTHROPIC_API_KEY is set in the environment, the CLI silently prefers it over the subscription and bills API credits instead. An empty key still wins its slot. You have to remove it from the child process environment, not blank it. I found that the way everyone finds it — by watching a "credit balance too low" error stop a job that should have cost nothing.
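A minimal sketch of the fix, assuming the child's environment is built from a plain key-value object; `subscriptionEnv` is a hypothetical helper name. The point is to delete the variable rather than set it to an empty string.

```typescript
type Env = Record<string, string | undefined>;

// Return a copy of `env` with ANTHROPIC_API_KEY removed entirely.
// Setting it to "" is not enough: the variable would still be present.
function subscriptionEnv(env: Env): Env {
  const child: Env = { ...env };
  delete child["ANTHROPIC_API_KEY"];
  return child;
}
```

The result would then be passed as the `env` option when launching the CLI, for example `spawn("claude", ["-p", prompt], { env: subscriptionEnv(process.env) })` with Node's `child_process`, or the equivalent `env` option on `Bun.spawn`. The parent's own environment is left untouched.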

What It Revealed

Compression isn't a fixed ratio. It scales with history.

Seven projects are fully consolidated. The numbers below are token estimates — character counts divided by four, not exact count_tokens calls — so read them as order-of-magnitude, not to the digit.

Project	Obs	Raw tokens	Digest tokens	Ratio
litt	1,640	587,602	4,288	137.0×
hoa-assistant	753	310,395	6,489	47.8×
dev	507	190,597	6,054	31.5×
claude-mem-consolidation	127	50,824	4,557	11.2×
billing-legal	46	21,437	3,555	6.0×
portfolio	34	11,588	2,344	4.9×
transparent-confidence (floor)	29	11,046	4,091	2.7×
Total	3,136	1,183,489	31,378	37.7×

The line that matters isn't the total — it's what happens per project as history piles up. The digest size barely moves. Across seven projects ranging from twenty-nine observations to over sixteen hundred, every digest lands between roughly 2,300 and 6,500 tokens. The summary does not grow with the history. It tracks how many kinds of things you know about the project — eight categories — not how much you wrote down.

Which means the compression scales with depth. On a near-empty project, twenty-nine observations compress about three-fold — barely worth it, because eight summary cells have a floor cost no matter how little you've recorded. On my heaviest project, sixteen hundred observations — close to six hundred thousand tokens of raw notes — fold into about 4,300 tokens. A hundred and thirty-seven fold. The technique pays off precisely where you need it: the deeper the history, the harder it compresses.
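The arithmetic behind those two extremes is short enough to write down. A sketch using the characters-divided-by-four estimate and two rows copied from the table above; `estimateTokens` and `compressionRatio` are illustrative names, and rounding the estimate up is an assumption.

```typescript
// Rough token estimate: characters divided by four (rounded up here).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Raw-to-digest ratio, rounded to one decimal place as in the table.
function compressionRatio(rawTokens: number, digestTokens: number): number {
  return Math.round((rawTokens / digestTokens) * 10) / 10;
}

// Figures copied from the table above.
const heaviestRatio = compressionRatio(587602, 4288); // litt: 137
const lightestRatio = compressionRatio(11046, 4091); // transparent-confidence: 2.7
```

Since the denominator stays in the same few-thousand-token band for every project, the ratio is driven almost entirely by the numerator: it grows roughly in proportion to the raw history.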

There was a surprise in the model choice, too. I started on Sonnet, assuming the stronger model would consolidate better. When my heaviest project drained the subscription's usage window partway through its first fold, I moved that backfill to Haiku, expecting to trade quality for throughput. Haiku consolidated tighter, not looser — every Haiku-built cell came in under budget while two Sonnet-built cells overran it — and the detail survived: file paths, exact commands, version strings, the specific database index that was missing. The one thing Haiku did worse was resolve recency, leaving a few stale dates beside their replacements. So the real trade wasn't quality for cost. It was cleaner recency resolution for better size discipline and lower spend — the opposite of what I assumed when I started.

What I'd Do Differently

The honest gap between a working result and a shippable one.

This is a result with its work shown, not a product announcement. It runs, the numbers hold, and nobody but me runs it yet. Five things stand between this sidecar and something that could be upstreamed into the plugin itself — and I'd close them in roughly this order.

1 · Make the budget real

The size budget is currently advisory — it's requested in the merge prompt, and Sonnet treated it as a strong suggestion rather than a ceiling, overrunning on two cells. The guarantee isn't real until it's enforced in code with a post-merge hard truncate. A limit a model can ignore is not a limit.
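One way the post-merge cap might look, as a sketch rather than the planned implementation: `enforceBudget` is a hypothetical name, the budget is converted to characters with the same divide-by-four estimate, and cutting at the last complete line is an assumption about what a sensible truncation would preserve.

```typescript
// Hard cap applied after the merge returns. Tokens are estimated as chars / 4,
// so a budget of N tokens allows at most N * 4 characters.
function enforceBudget(digest: string, maxTokens: number): string {
  const maxChars = maxTokens * 4;
  if (digest.length <= maxChars) return digest;
  const clipped = digest.slice(0, maxChars);
  // Prefer cutting at the last complete line so no entry is left half-written.
  const lastBreak = clipped.lastIndexOf("\n");
  return lastBreak > 0 ? clipped.slice(0, lastBreak) : clipped;
}
```

Whatever the model returns, the stored digest can then never exceed the cell's budget. A blunt cut like this loses whatever fell past the limit, so a gentler variant could retry the merge with a stricter prompt first and truncate only as a last resort.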

2 · Fold the storage into the schema

Right now the sidecar keeps a separate digests.db and reads claude-mem's database directly. Upstreaming would mean in-schema digest tables with a proper migration, rather than a parallel file living beside the source of truth.

3 · Route the merge through the plugin's provider abstraction

The merge transport is a hardcoded claude -p CLI call. It should route through the plugin's existing provider layer (CLAUDE_MEM_PROVIDER / CLAUDE_MEM_MODEL) so model and transport are configured once, in one place, consistently with the rest of the system.

4 · Resolve paths cross-platform

The build assumes Windows-absolute paths throughout. Before anyone else can run it, those need to go through the plugin's path resolvers so it works on macOS and Linux without edits.

5 · Wire the read side into the hook

The read path emits its session-start block to stdout today; it isn't yet injected through the SessionStart hook. Until it is, the payoff — priming a mature project on a few thousand tokens of dense digest instead of fifty raw notes — is something I run manually rather than something that happens automatically at the start of every session.
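For a sense of what the read path assembles, here is a minimal sketch: one section per non-empty cell, plus an estimated token cost for the whole block. The `CellDigest` shape, the heading layout, and `buildSessionBlock` are assumptions made for illustration, not the sidecar's actual output format.

```typescript
// Hypothetical shapes for illustration only.
interface CellDigest {
  type: string;
  digest: string;
}

interface SessionBlock {
  text: string;
  estimatedTokens: number; // characters / 4, rounded up
}

function buildSessionBlock(project: string, cells: readonly CellDigest[]): SessionBlock {
  const sections = cells
    .filter((c) => c.digest.trim().length > 0) // skip cells with nothing learned yet
    .map((c) => `## ${c.type}\n${c.digest.trim()}`);
  const text = [`# Memory digest: ${project}`, ...sections].join("\n\n");
  return { text, estimatedTokens: Math.ceil(text.length / 4) };
}
```

The size of this block depends on the number of cells, not on how many observations were folded into them, which is why the priming cost stays flat as a project ages.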

None of these change the shape of the result, and the shape is the point: an agent's memory should get denser as it accumulates, not just longer. The tools for giving agents long-term memory arrived this year. The discipline they still need is an old one. Don't keep every note you ever took. Keep a good summary, and revise it when you learn something.

NOTE: This is a personal sidecar built on top of the open-source claude-mem plugin, reading its database and writing its own. It is not part of the plugin and is not affiliated with its maintainers. Token figures are estimates (characters ÷ 4), not exact counts. Public repository forthcoming at the link above.