
Your agent has amnesia

2026-06-15 · 4 min read · by @SaharBarak

Open a fresh session with your coding agent and watch it meet your project like a stranger. It greps the same files it grepped yesterday. It opens the same paper to answer the same question. It walks into the same dead end, backs out, and writes the same conclusion it already wrote last Tuesday: confidently, from scratch, as if no one had ever been here before.

Tools have fixed almost everything around this. Agents can call functions (MCP). They can talk to each other (A2A). What they still can't do is remember: not across sessions, and certainly not across people. So the work gets done, the tokens get spent, and the answer evaporates the moment the context window closes.

The bill nobody itemizes

You don't only pay Google for the search. You pay OpenAI, Anthropic, and every metered endpoint, per token, to re-run inference over work that was already done: sometimes by you, an hour ago; sometimes by a teammate who solved the exact thing last sprint. Multiply one redundant lookup by a team, by a month, and the waste stops being a rounding error.

The galling part: the expensive step already happened. Someone read the paper. Someone debugged the flaky test. Someone grounded the claim. The reasoning existed. It just wasn't written down anywhere the next agent could find it.

Write it down where the next one looks

Folk knowledge solved this a long time ago. You don't re-derive how to bake bread or ford a river. Someone tells you, and you tell the next person. The lore survives by being passed on. Nobody starts from zero because nobody has to.

Folklore is that, for agents. It sits between your agent and the web as a local-first graph of resolved reasoning. Every research-shaped call (WebSearch, WebFetch, an arXiv pull) meets your own graph first. If the answer's already there, it's served from memory in milliseconds. If not, the fetch proceeds and the result is saved, signed by you, so the next time costs nothing.
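To make the memory-first flow concrete, here is a minimal sketch of the idea, not Folklore's actual code. Everything in it is an assumption for illustration: the `LoreCache` class, the `resolve` method, the in-memory dict standing in for the graph, exact-match keys standing in for real retrieval, and an HMAC standing in for whatever signing scheme the tool really uses.

```python
import hashlib
import hmac
from typing import Callable, Dict, Tuple


class LoreCache:
    """Hypothetical read-through memory: check the local store, fetch on a miss."""

    def __init__(self, signing_key: bytes, fetch: Callable[[str], str]) -> None:
        self._key = signing_key
        self._fetch = fetch  # the expensive call (web search, paper pull, ...)
        self._store: Dict[str, Tuple[str, str]] = {}  # query -> (answer, signature)
        self.fetch_count = 0  # how many times we actually went to the web

    def _sign(self, query: str, answer: str) -> str:
        # Assumption: an HMAC over query + answer stands in for a real signature.
        message = f"{query}\n{answer}".encode("utf-8")
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def resolve(self, query: str) -> Tuple[str, bool]:
        """Return (answer, served_from_memory)."""
        hit = self._store.get(query)
        if hit is not None:
            answer, signature = hit
            # Serve from memory only if the stored signature still verifies.
            if hmac.compare_digest(signature, self._sign(query, answer)):
                return answer, True
        # Miss (or failed verification): do the fetch, then save the signed result.
        answer = self._fetch(query)
        self.fetch_count += 1
        self._store[query] = (answer, self._sign(query, answer))
        return answer, False


cache = LoreCache(b"local-secret", fetch=lambda q: f"result for {q}")
first = cache.resolve("flaky test in ci")   # miss: goes to the fetcher
second = cache.resolve("flaky test in ci")  # hit: served from the local store
```

The second call never touches the fetcher, which is the whole point: the expensive step runs once and every later lookup is a dictionary read.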

It clears the bar that buries most memory tools: it's useful to one person, on day one, with no network at all. Then it compounds: when a teammate runs it too, their resolved reasoning becomes your starting point, and yours becomes theirs.

Does it actually retrieve?

Memory is only worth gating the web on if recall is real. Folklore's hybrid retrieval lands at 72.30% NDCG@10 on full BEIR SciFact, CPU-only, within a couple of points of much heavier baselines, at a 137M-parameter budget. The numbers, the failures we hit, and the copy-paste reproduction commands all live in the proof section and in the repo. No model grading a model.
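If NDCG@10 is unfamiliar: it scores the top ten results for a query, rewarding relevant documents more the higher they rank, then normalizes against the best possible ordering. The sketch below is one common formulation (linear gain, log2 discount); the function names are ours, and the benchmark harness may differ in detail. A benchmark score is this value averaged over all queries.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    # enumerate() is 0-based, so position = rank + 1 and the discount is log2(rank + 2).
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg_at_k(ranked_ids: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """NDCG@k for one query.

    ranked_ids: document ids in the order the retriever returned them.
    relevance:  graded relevance judgments for this query (missing ids count as 0).
    """
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids[:k]]
    # The ideal ranking puts the most relevant judged documents first.
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0


# One relevant document, returned in second place instead of first.
score = ndcg_at_k(["doc-x", "doc-a"], {"doc-a": 1.0})
```

A perfect ranking scores 1.0 and a ranking with nothing relevant in the top ten scores 0.0, so the metric punishes both missing the right document and burying it.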

Your agent is brilliant. It just has amnesia. Give it the lore.
