Living Memory

Teaching AI Agents to Remember Like Humans

A neuroscience-inspired memory architecture for persistent, evolving AI agent knowledge

Helm ADE — March 2026


Architecture Overview

        ┌────────────────────────────────────────────────────────┐
        │            CONTEXT WINDOW MANAGEMENT                   │
        │  L0/L1/L2 tiers · progressive shedding · seg rotation │
        └───────────────────────────┬────────────────────────────┘
                                    │
  fast capture                      ▼  keeps sessions running
  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────────┐
  │             │  │             │  │                          │
  │    AGENT    │─▶│  EPISODIC   │  │        MEMORY.md         │
  │   SESSION   │  │   FACTS     │  │  assembled at sess start │
  │             │  │             │  │                          │
  └─────────────┘  └──────┬──────┘  └──────────────────────────┘
        ▲                 │                       ▲
        │                 │  decay over time       │
        │                 ▼                       │
        │  ┌─────────────────────────────────┐    │
        │  │    CONSOLIDATION PIPELINE       │    │
        │  │                                 │    │
        │  │  cluster episodes → extract     │    │
        │  │  patterns · prioritize          │    │
        │  │  corrections · resolve time     │    │
        │  └────────────────┬────────────────┘    │
        │                   │                     │
        │                   ▼                     │
        │  ┌─────────────────────────────────┐    │
        │  │                                 │    │
        │  │      SEMANTIC KNOWLEDGE         │────┘
        │  │   durable, generalized patterns │
        │  │                                 │
        │  └────────────────┬────────────────┘
        │                   │  high confidence
        │                   ▼
        │  ┌─────────────────────────────────┐
        │  │        CRYSTALLIZATION          │
        │  │                                 │
        │  │  CONVENTIONS.md · org settings  │
        └──│  permanent artifacts, approved  │
           │                                 │
           └─────────────────────────────────┘

  ┌──────────────────────────────────────────────────────────┐
  │  POWER-LAW DECAY + REINFORCEMENT                        │
  │                                                          │
  │  · facts decay following t⁻λ curve (not exponential)    │
  │  · reinforcement from retrieval slows decay              │
  │  · cross-session spacing provides stronger durability    │
  │  · nothing deleted — low-confidence facts fade naturally │
  └──────────────────────────────────────────────────────────┘

The Problem: Every Session Starts From Zero

AI coding agents are stateless. Every time you start a new session, the agent knows nothing — not your coding conventions, not the decisions you made yesterday, not the architectural patterns your team agreed on last week. You find yourself repeating the same instructions, correcting the same mistakes, and re-explaining the same context. Session after session.

The tools we have today make this worse, not better. Most AI development environments handle long conversations with "compaction" — a lossy summarization pass that throws away old messages when the context window fills up. The agent's short-term memory gets crushed into a paragraph, and the nuance is gone. Decisions that took twenty minutes to reach get compressed into a sentence that loses the reasoning behind them.

This isn't just annoying. It's a fundamental limitation. An agent that can't remember is an agent that can't learn. And an agent that can't learn will never get better at working with you, no matter how many sessions you run.

Living Memory is our answer to this problem. But rather than inventing a novel architecture from scratch, we looked at the system that's been solving this problem for millions of years: the human brain.

What the Brain Actually Does

When cognitive scientists study how humans form long-term memories, they don't find a simple recording device. They find a surprisingly sophisticated two-stage system that prioritizes useful knowledge over raw recall — and that forgets strategically as a feature, not a bug.

Two Stores, Two Speeds

The dominant model in memory research is Complementary Learning Systems (CLS) theory, proposed by McClelland, McNaughton, and O'Reilly in 1995 and actively refined over the following decades — most notably by Kumaran, Hassabis (DeepMind), and McClelland in 2016, who updated the framework to incorporate schema-consistent rapid learning and connections to deep reinforcement learning. Thirty years later, CLS remains the canonical framework. The core insight: the brain maintains two separate memory systems that work at different speeds and serve different purposes.

The hippocampus acts as fast episodic storage. It captures individual experiences quickly — what happened, when, where. These memories are specific and vivid at first, but they're also fragile. They decay fast if they're not reinforced.

The neocortex acts as slow semantic storage. It doesn't store individual episodes. Instead, it gradually extracts patterns across many experiences and builds generalized knowledge — schemas, conventions, rules of thumb. This knowledge is durable. Once your brain has consolidated a pattern from enough experiences, it persists even after the individual episodes that produced it have faded.

This is why you might not remember the specific meeting where your team decided to use a particular API pattern, but you know the pattern. The episodes faded. The knowledge remained.

The original CLS model characterized neocortical learning as strictly slow. Subsequent work by McClelland (2013) showed the neocortex can actually learn rapidly when new information is consistent with existing schemas — a refinement that matters for our architecture, where reinforcing an already-established convention should consolidate faster than introducing a novel one. The 2016 update (Kumaran, Hassabis & McClelland, "CLS 2.0") further showed that the hippocampus does more generalization than originally thought, and that replay during sleep has more sophisticated structure than simple repetition — insights that informed our consolidation pipeline design.

Forgetting Is a Feature

Hermann Ebbinghaus first documented the forgetting curve in 1885, and Wixted and Ebbesen's 1991 reanalysis showed something important: forgetting follows a power law, not an exponential curve. The practical difference matters.

With exponential decay, old memories disappear at a constant rate regardless of how long they've survived. With power-law decay, memories that have survived a long time become more resistant to further forgetting. A fact you've remembered for a year is harder to forget than a fact you've remembered for a day — not because of some arbitrary threshold, but because survival itself is evidence of importance.

Reinforcement modulates this curve. Every time you recall a memory, you make it more durable. A convention you apply every day decays almost imperceptibly. A one-off decision you never revisit fades within weeks. The brain doesn't need explicit cleanup rules. The forgetting curve handles it naturally.

Spacing Beats Cramming

The spacing effect, documented across 259 of 271 studies in Cepeda et al.'s 2006 meta-analysis, shows that when you reinforce matters as much as how often. Recalling a fact five times across five separate days produces dramatically more durable memories than recalling it five times in a single study session. The brain treats spaced repetitions as independent evidence of importance — each time you recall something in a different context, you're proving it's worth keeping.

The lag effect takes this further: longer intervals between repetitions produce even better retention. A fact you recall after a day, then after a week, then after a month, becomes deeply entrenched. This is the opposite of how most database systems handle access counts — they track total hits without caring about temporal distribution.

Interference: New Learning Suppresses Old

Interference theory, advanced most influentially by Underwood's 1957 work, explains one of the most common causes of real-world forgetting: new learning actively interferes with the retrieval of older, related memories. When you learn that your team is switching from REST to GraphQL, your new knowledge of the GraphQL patterns doesn't just sit alongside your knowledge of REST — it actively makes the REST patterns harder to recall.

This retroactive interference is strongest between semantically similar memories. Two facts about API architecture interfere with each other far more than a fact about API architecture and a fact about CSS styling. This has direct implications for how a memory system should handle project migrations and convention changes — the old way should fade when the new way arrives, not persist as an equal competitor.

Memories Are Mutable

Karim Nader's reconsolidation research, published in 2000, overturned the assumption that memories are fixed once stored. When you retrieve a memory, it temporarily becomes mutable — it re-enters an unstable state where it can be updated, strengthened, or modified based on your current context. If you recall a fact and your current experience contradicts it, the memory gets updated. Every retrieval is an opportunity to refine, not just to read.

This is the opposite of how most database systems work. Databases treat stored records as immutable unless explicitly edited. The brain treats every access as a potential update.


Living Memory: The Architecture

Living Memory translates these neuroscience principles into a concrete system architecture for AI agents. It's not a metaphor — each principle maps to a specific technical mechanism.

Two-Tier Memory

Living Memory maintains two distinct data stores, mirroring the hippocampus and neocortex:

Episodic memory (memory_facts table) stores individual facts extracted from agent sessions. When an agent conversation mentions a decision ("let's use guard let over force unwrap"), a preference ("I prefer functional components"), or a pattern ("this repo's API routes are in /app/api/"), the system extracts it as a discrete episodic fact with a confidence score. These facts are written fast, searched via vector embeddings, and decay over time if not reinforced.

Semantic knowledge (semantic_knowledge table) stores generalized patterns consolidated from many episodes. When the consolidation pipeline detects that 12 correction episodes across 7 sessions all point to the same Swift convention, it produces a single semantic knowledge entry: "This project uses guard let over force unwrap." The semantic entry is durable. The individual episodes can fade.

This separation matters for practical reasons. Session-start context should be built from stable, generalized knowledge — not from a grab-bag of raw observations that might include tentative ideas, one-off remarks, or outdated decisions. Agents start each session with a MEMORY.md file assembled from the semantic layer, which provides the kind of reliable background knowledge that makes an agent feel like it knows you and your project.

Power-Law Decay With Reinforcement

Every episodic fact begins decaying the moment it's created. The decay follows a power-law curve, not a static threshold or an exponential function:

effective_confidence = base_confidence × log₂(access_count + 1) × log₂(distinct_session_count + 1) × t^(-λ)

Where:

The log₂(access_count + 1) term ensures each reinforcement has diminishing returns — the multiplier climbs from 1.0 at one retrieval to 2.0 at three, but going from 100 retrievals to 101 barely moves the needle.

The log₂(distinct_session_count + 1) term is the spacing bonus. A fact retrieved 10 times in a single session gets a spacing bonus of 1.0. The same fact retrieved 10 times across 8 different sessions gets a spacing bonus of 3.17 — making it roughly three times more durable. This directly implements the spacing effect: cross-session reinforcement is substantially more meaningful than same-session repetition, because each separate session represents an independent context in which the fact proved relevant.

The t^(-λ) term applies to everything — nothing is exempt. A fact accessed every day accumulates reinforcement fast enough to offset the decay; a fact accessed once a year does not, and its effective confidence slides steadily downward.
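As a concrete sketch, the scoring formula above could be computed like this — the λ value and field names are illustrative assumptions, not the production schema:

```typescript
// Sketch of the effective-confidence score. LAMBDA and the field names
// are assumptions for illustration; the real schema isn't shown here.
interface EpisodicFact {
  baseConfidence: number;       // confidence at extraction time (0..1)
  accessCount: number;          // total retrievals
  distinctSessionCount: number; // separate sessions that retrieved it
  ageDays: number;              // time since creation
}

const LAMBDA = 0.5; // assumed power-law decay exponent

function effectiveConfidence(f: EpisodicFact): number {
  const reinforcement = Math.log2(f.accessCount + 1);          // diminishing returns per access
  const spacingBonus = Math.log2(f.distinctSessionCount + 1);  // cross-session spacing effect
  const decay = Math.pow(Math.max(f.ageDays, 1), -LAMBDA);     // power-law, not exponential
  return f.baseConfidence * reinforcement * spacingBonus * decay;
}
```

Note how the spacing bonus dominates: ten retrievals spread across eight sessions score roughly 3.17× higher than the same ten retrievals crammed into one.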

Nothing is ever deleted from the database. A fact with near-zero effective confidence is simply never surfaced in ranked results, but it remains retrievable if the right cue appears — a specific search query, a high-similarity vector match. Just like human memory, where the right cue can surface a "forgotten" memory decades later.

This eliminates the need for artificial cleanup rules. No "delete facts older than 90 days." No "prune when the database exceeds N entries." The forgetting curve handles relevance naturally, and reinforcement ensures that important knowledge persists.

Reconsolidation on Retrieval

When an agent searches memory or when MEMORY.md is assembled at session start, the system doesn't just read facts and return them. It checks whether the current session context contradicts any retrieved fact.

If an agent retrieves the fact "this project uses REST for all APIs" but the current session shows the team migrating to GraphQL, the system flags a prediction error. The fact is returned with a marker suggesting it may be outdated, and the agent can choose to update, supersede, or confirm it.

This is lightweight and heuristic-based — no LLM call, just keyword and entity overlap detection. False negatives are acceptable (consolidation will catch them later). False positives are marked as suggestions, not auto-corrections. The goal is to make retrieval an active process, not passive playback.
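A toy version of that heuristic might look like the following — the cue words and thresholds are invented for illustration, not the shipped detector:

```typescript
// Toy retrieval-time contradiction check: pure keyword overlap plus
// change-signal cues, no LLM call. Cue list and thresholds are invented.
const CHANGE_CUES = ["migrating", "switching", "replaced", "instead", "no longer", "deprecated"];

function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().match(/[a-z0-9._-]+/g) ?? []);
}

function maybeOutdated(fact: string, sessionContext: string): boolean {
  const factTokens = tokens(fact);
  const contextTokens = tokens(sessionContext);
  // Same topic: the fact and the live session share substantive tokens.
  const shared = Array.from(factTokens).filter((t) => contextTokens.has(t) && t.length > 3);
  // Change signal: the session talks about replacing or abandoning something.
  const changeSignal = CHANGE_CUES.some((cue) => sessionContext.toLowerCase().includes(cue));
  // True means "flag as possibly outdated" — a suggestion, never an auto-edit.
  return shared.length >= 2 && changeSignal;
}
```

False negatives here are cheap, since consolidation sweeps up missed conflicts later.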

There's a subtler retrieval dynamic at work too. Anderson, Bjork, and Bjork's 1994 research on retrieval-induced forgetting showed that retrieving one memory actively suppresses competing memories in the same category — about a 13% recall reduction for related-but-unpracticed items.

In Living Memory, when context_search retrieves "use guard let" and reinforces it, competing error-handling approaches naturally receive no reinforcement and decay faster. For conventions, this is actually desirable — the dominant pattern gets reinforced through use, and alternatives fade through neglect. It becomes a concern only if genuinely valid alternatives get suppressed. The existing architecture handles this gracefully: alternatives are never deleted, just less prominent. If someone explicitly searches for an alternative approach, that search reinforces it, counteracting the suppression. No special mechanism needed — the power-law decay and reinforcement system produces the right behavior naturally.

Consolidation Pipeline

The consolidation pipeline is a background process (a Supabase Edge Function on a configurable schedule, typically daily) that models the brain's overnight memory consolidation — the process where the hippocampus replays experiences and the neocortex extracts lasting patterns.

It runs in three phases:

Phase 1 — Episodic maintenance. Apply power-law decay (with spacing bonus) to all active episodic facts. Facts that have dropped below a minimum effective confidence threshold are deactivated (removed from standard query results, but not deleted). Deduplicate near-identical facts. Resolve temporal conflicts where newer facts contradict older ones — this explicitly handles retroactive interference, ensuring that when a project migrates from one approach to another, the old approach doesn't persist as an equal competitor to the new one.

Phase 2 — Episodic-to-semantic consolidation. This is the core insight. The pipeline clusters related episodic facts by vector similarity, looking for groups that span multiple sessions and meet a minimum cluster size.

Clusters are processed in priority order: correction-sourced clusters go first. This implements prioritized experience replay — the principle from reinforcement learning (Schaul et al., 2015) that high-surprise, high-error experiences carry stronger learning signals and should be replayed with higher priority. A cluster of five user corrections about error handling consolidates before a cluster of five passive observations about import patterns, because corrections represent direct learning signal.
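The ordering step can be sketched as a simple sort — the sort keys are illustrative:

```typescript
// Sketch of prioritized-replay ordering for consolidation clusters.
// The exact keys and tie-breaks are assumptions for illustration.
interface Cluster {
  gistHint: string;
  correctionCount: number; // episodes sourced from user corrections
  sessionSpan: number;     // distinct sessions represented in the cluster
}

function replayOrder(clusters: Cluster[]): Cluster[] {
  return [...clusters].sort((a, b) => {
    // Correction-sourced clusters carry the strongest learning signal.
    const byCorrections = (b.correctionCount > 0 ? 1 : 0) - (a.correctionCount > 0 ? 1 : 0);
    if (byCorrections !== 0) return byCorrections;
    // Then prefer broader cross-session evidence.
    return b.sessionSpan - a.sessionSpan;
  });
}
```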

Within each cluster, facts are temporally ordered — the most recently reinforced fact's assertion dominates the extracted gist. This prevents stale knowledge from bleeding into consolidated patterns. If a cluster about API patterns contains older "use REST" facts and newer "use GraphQL" facts, the gist reflects the current state, not an average of old and new.

From each qualifying cluster, the pipeline extracts the generalized pattern — the gist — and creates a semantic knowledge entry. Before inserting, it checks for conflicts with existing semantic knowledge. Schema-consistent new knowledge (high similarity, same assertion) merges with the existing entry — this is fast assimilation, the mechanism McClelland identified in 2013 where the neocortex can learn rapidly when new information fits existing schemas. Schema-inconsistent knowledge (moderate similarity, contradictory assertion) triggers accommodation — the newer entry supersedes the older one, just as Piaget described how existing knowledge structures get revised when new evidence contradicts them.
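The assimilate-versus-accommodate decision reduces to a small branch; the similarity thresholds below are assumptions, not the shipped values:

```typescript
// Schema-consistency decision when a newly extracted gist meets an
// existing semantic entry. Thresholds (0.85, 0.60) are illustrative.
type Resolution = "assimilate" | "accommodate" | "insert";

function resolveGist(similarity: number, sameAssertion: boolean): Resolution {
  if (sameAssertion && similarity >= 0.85) return "assimilate";  // merge: fast schema-consistent learning
  if (!sameAssertion && similarity >= 0.60) return "accommodate"; // supersede the older, contradicted entry
  return "insert";                                                // unrelated enough to stand alone
}
```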

For example, if across 8 sessions, the system has captured facts like "user corrected: use guard let not force unwrap," "user said: always use guard let," "builder session: applied guard let pattern," and "correction: force unwrap replaced with guard let" — the pipeline consolidates these into a single semantic entry: "Swift convention: use guard let over force unwrap." The entry tracks which episodic facts contributed to it and how many sessions it spans.

The source episodes continue to decay normally. They're not deleted or frozen — they can still be individually retrieved via search. But the durable representation is now the semantic entry, which has its own (much slower) decay characteristics.

Phase 3 — Crystallization candidates. When a semantic knowledge entry reaches a high confidence threshold and has been reinforced by enough independent sessions, it becomes a candidate for crystallization — promotion into a durable artifact that persists outside the memory system entirely.

Four-Layer Ownership

All knowledge — both episodic and semantic — belongs to one of four ownership layers:

┌─────────────────────────────────────────────────────┐
│ ORG                                                 │
│ "We use Tailwind. Deploy to Vercel."                │
├─────────────────────────────────────────────────────┤
│ PROJECT-ORG                                         │
│ "This repo uses Next.js 16. API routes in /app/api" │
├─────────────────────────────────────────────────────┤
│ USER-PROJECT                                        │
│ "Mark prefers functional components here"           │
├─────────────────────────────────────────────────────┤
│ USER                                                │
│ "Mark likes concise responses. Timezone: Pacific"   │
└─────────────────────────────────────────────────────┘

The most specific layer wins on conflict. If Mark prefers tabs for one project but the org convention is spaces, the agent uses tabs for that project.
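Conflict resolution is then a first-match lookup down a precedence list. The exact ordering of the two middle layers is my assumption here:

```typescript
// Ownership conflict resolution: the most specific layer with an opinion
// wins. The precedence of the middle layers is an assumption.
type Layer = "user-project" | "user" | "project-org" | "org";

const PRECEDENCE: Layer[] = ["user-project", "user", "project-org", "org"];

function resolveSetting(entries: Partial<Record<Layer, string>>): string | undefined {
  for (const layer of PRECEDENCE) {
    const value = entries[layer];
    if (value !== undefined) return value; // first (most specific) match wins
  }
  return undefined;
}
```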

Org and user-scoped knowledge lives in Supabase (the cloud database), not in any single project repository — because it applies across projects. Project-scoped knowledge can be crystallized into files in the project repo (like CONVENTIONS.md) where it's version-controlled alongside the code.

Crystallized Knowledge

Crystallization is the process of promoting high-confidence semantic knowledge out of the memory system and into durable artifacts. The destination depends on the ownership layer:

Ownership Layer   Crystallization Target
Project-Org       CONVENTIONS.md additions (committed to the project repo)
Org               Supabase-stored org conventions (injected into all project sessions)
User-Project      MEMORY.md content (automatic, no approval needed)
User              Supabase-stored user preferences (injected into all sessions)

For project-org and org-scoped knowledge, crystallization requires human approval. The system identifies candidates and surfaces them in the Helm UI — "Based on 14 corrections across 9 sessions, we think this is a project convention: always use guard let over force unwrap. Approve?" The user reviews and accepts or rejects.

This is a deliberate design choice. Today, the user is both the extraction engine and the approval gate — they manually recognize patterns and instruct the agent to update CONVENTIONS.md. Living Memory automates the extraction and leaves only the approval step with the user. The system proposes. The human decides.

User-scoped knowledge doesn't need this gate. If the system has high confidence that a user prefers concise responses (because it's been reinforced across many sessions), that preference flows into MEMORY.md automatically. It's self-correcting — if the preference changes, new episodes will reinforce the new behavior and the old entry will naturally decay.


Context Window Management

Memory persistence is only half the problem. The other half is what happens during a session. Current AI runtimes handle long conversations by compacting — lossy single-pass summarization that permanently destroys context when the window fills up. This is the equivalent of cramming for an exam by ripping pages out of the textbook.

Living Memory replaces compaction with three mechanisms that manage context depth without losing information.

Tiered Content (L0/L1/L2)

Every piece of knowledge in the system — code files, task descriptions, research findings, memory facts — is stored at three depth levels:

L0 is a minimal handle — just enough to identify the item. L1 is a structural overview. L2 is the full content. Agents start with L1 overviews. If they need more detail, they drill into L2. When they're done with a piece of context, they shed back to L0 or remove it entirely. The original content is never lost — it's always available in the database at full fidelity.
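The three depth tiers can be sketched as a type — what each tier holds is paraphrased from the text, and the storage schema itself is assumed:

```typescript
// Sketch of tiered content. Tier contents are paraphrased from the text;
// the actual storage schema is not shown in this document.
type Depth = "L0" | "L1" | "L2";

interface TieredItem {
  id: string;
  l0: string; // minimal handle: just enough to re-identify the item
  l1: string; // structural overview (signatures, headings, outline)
  l2: string; // full-fidelity content
  loadedAt: Depth;
}

// What currently occupies the context window for this item.
function inContext(item: TieredItem): string {
  return item.loadedAt === "L2" ? item.l2 : item.loadedAt === "L1" ? item.l1 : item.l0;
}

// Shedding moves down one tier; the full content stays in the database.
function shed(item: TieredItem): TieredItem {
  return { ...item, loadedAt: item.loadedAt === "L2" ? "L1" : "L0" };
}
```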

For code files, L1 summaries are generated using tree-sitter AST extraction (fast, cheap, no LLM needed) — just the structural outline: function signatures, class definitions, imports, exports. For non-code content, L1 uses LLM summarization.

Progressive Shedding

Helm continuously monitors the agent's context window usage against the model's capacity (e.g., 200K tokens for current Claude models) and escalates through three tiers of response as usage climbs.

The key guarantee: the underlying AI runtime's own compaction never fires. Helm keeps context usage below that threshold 100% of the time. No lossy summarization ever occurs.
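One way the escalation policy could look — the document doesn't give the actual tier boundaries, so these thresholds are invented for the sketch:

```typescript
// Illustrative shedding escalation. The 0.6 / 0.75 / 0.9 thresholds are
// assumptions, not Helm's real tier boundaries.
type Action = "none" | "shed-to-L1" | "shed-to-L0" | "rotate-segment";

function escalate(usedTokens: number, capacity: number): Action {
  const usage = usedTokens / capacity;
  if (usage < 0.6) return "none";
  if (usage < 0.75) return "shed-to-L1"; // tier 1: drop L2 bodies to L1 overviews
  if (usage < 0.9) return "shed-to-L0";  // tier 2: collapse finished context to handles
  return "rotate-segment";               // tier 3: rotate before runtime compaction can fire
}
```

The point of the staircase is that the runtime's own lossy compaction threshold is never reached.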

Session Segment Rotation

When shedding alone isn't enough, Helm performs a segment rotation — functionally starting a new session, but with a continuity bridge that preserves the user's experience of a single uninterrupted conversation.

Before rotating, Helm extracts all facts from the current segment (the "extract before you shed" principle, borrowed from engram). The new segment starts with MEMORY.md (rebuilt from semantic knowledge), pinned context, the recent conversation tail, and a continuity prompt that tells the agent what it was working on.

The user sees a stable session identity. There might be a brief toast notification — "Context refreshed" — but no disruption. The session can run indefinitely.
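A plausible shape for the continuity bridge that seeds the new segment — every field name and the tail length here are assumptions, not the actual payload:

```typescript
// Hypothetical continuity-bridge payload for a segment rotation.
interface SegmentSeed {
  memoryMd: string;         // rebuilt from the semantic layer
  pinned: string[];         // user-pinned context items
  recentTail: string[];     // short tail of the previous conversation
  continuityPrompt: string; // tells the agent what it was doing
}

function buildSeed(memoryMd: string, pinned: string[], transcript: string[], currentTask: string): SegmentSeed {
  return {
    memoryMd,
    pinned,
    recentTail: transcript.slice(-6), // a 6-turn tail is an arbitrary choice
    continuityPrompt: `You are continuing an ongoing session. Current task: ${currentTask}`,
  };
}
```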


Research Persistence

AI coding agents frequently delegate to sub-agents for research — "look up how to implement WebSocket reconnection in Swift," or "find all references to this API pattern." These sub-agents run in their own context windows, which are even more aggressively compacted than the parent session. Hours of research can be compressed into a paragraph.

Living Memory intercepts sub-agent outputs at the infrastructure level (the sub-agents are unmodified) and streams them into the L0/L1/L2 storage system with vector embeddings. The research is fully recoverable via semantic search, reusable across sessions, and available to other agents and team members working in the same org.

Before spawning a new research sub-agent, Helm checks for prior completed research with high vector similarity. If someone already researched WebSocket reconnection patterns last week, the agent gets those results instead of starting from scratch.
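That reuse check is a nearest-neighbor lookup over prior research embeddings; the 0.85 cutoff below is an assumed threshold:

```typescript
// Reuse check before spawning a research sub-agent. The 0.85 similarity
// cutoff is an assumption for illustration.
interface PriorResearch { embedding: number[]; findings: string; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Returns prior findings to reuse, or null to spawn a fresh sub-agent.
function reusableResearch(queryEmbedding: number[], prior: PriorResearch[]): string | null {
  let best: PriorResearch | null = null;
  let bestScore = 0.85; // below this, start from scratch
  for (const p of prior) {
    const score = cosine(queryEmbedding, p.embedding);
    if (score >= bestScore) { best = p; bestScore = score; }
  }
  return best ? best.findings : null;
}
```

In production this comparison would run as a pgvector query rather than in application code; the logic is the same.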


Behavioral Learning

Beyond extracting facts from conversations, Living Memory observes how agents use tools, how users correct agent behavior, and what session outcomes look like. These behavioral observations feed into the same memory system and follow the same lifecycle.

User corrections are the highest-value signal. When a user says "no, don't use that pattern, use this instead," the system captures a structured correction with both the wrong approach and the right one. Corrections start at high confidence (0.9) because they represent direct user instruction.
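A structured correction record could look like this — a plausible shape, not the actual memory_facts schema:

```typescript
// Hypothetical correction record; field names are illustrative.
interface CorrectionFact {
  wrongApproach: string;
  rightApproach: string;
  baseConfidence: number;
  source: "user_correction";
}

function captureCorrection(wrong: string, right: string): CorrectionFact {
  return {
    wrongApproach: wrong,
    rightApproach: right,
    baseConfidence: 0.9, // direct user instruction starts near the top
    source: "user_correction",
  };
}
```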

Tool call patterns capture recurring sequences — when an agent consistently runs tests after making changes, or always checks documentation before writing code, those patterns are extracted at lower confidence and build strength through repetition.

When the consolidation pipeline detects enough related behavioral observations across sessions, it generates a candidate that can become a project skill — a loadable instruction file that future agent sessions pick up automatically. "When modifying Swift views, always check for accessibility modifiers. When running tests, use the --parallel flag." These are presented for human approval before activation, just like convention candidates.

The loop closes: behavior is observed → patterns are extracted → knowledge is consolidated → conventions or skills are proposed → the user approves → agents behave better → new behaviors are observed. Over time, the agents genuinely improve at working with you.


What This Means in Practice

A developer using Living Memory would experience something like this:

Week 1. The system is learning. Episodic facts accumulate as the developer works across several coding sessions. The agent starts to remember recent preferences — "you asked for concise code comments yesterday" — but the memory is still shallow.

Month 1. Consolidation has run 30 times. Semantic knowledge entries are forming. The developer no longer needs to remind the agent about project conventions — they're in MEMORY.md at session start. Research from last week's debugging session is retrievable. Tool corrections have been captured.

Month 3. The system has proposed several convention candidates based on patterns it detected across dozens of sessions. The developer approved most of them, and they're now in CONVENTIONS.md. A couple of generated skills are active, automatically loaded into builder sessions. The agent makes the same kind of mistakes less frequently.

Month 6. Early episodic facts from month 1 have mostly faded — their effective confidence is near zero. But the knowledge they produced is durable. The semantic entries are well-reinforced, the conventions are crystallized in artifacts, and the agent behaves as if it's been working on this project all along.

The developer didn't have to do anything special. They didn't curate a knowledge base. They didn't write a training manual. They just worked, and the system learned.


Design Principles

Several principles guided the design, each rooted in the neuroscience research:

No artificial thresholds. No "delete after 90 days." No "maximum 10,000 facts." The power-law forgetting curve handles relevance naturally. Reinforcement ensures important knowledge persists. Unreinforced knowledge fades on its own.

No permanent deletion. Facts become asymptotically irrelevant but remain in the database. With the right cue — a specific search query, a high-similarity vector match — any fact can be resurfaced. This matches human memory, where "forgotten" memories can be triggered by unexpected cues years later.

All fact types follow the same lifecycle. Observations, decisions, preferences, conventions — they all decay the same way, reinforce the same way, and consolidate the same way. There's no second-class tier. Signal strength is the only differentiator. A heavily reinforced observation that keeps appearing across sessions will consolidate into durable knowledge just as effectively as an explicit decision.

Spacing over repetition. A fact proven relevant across many separate sessions is more durable than a fact referenced many times in one session. The system tracks cross-session reinforcement separately from raw access counts, implementing the spacing effect directly. This means a convention that surfaces naturally across a month of work builds deeper durability than one that gets mentioned twenty times in a single discussion.

New knowledge supersedes old on conflict. When facts contradict each other, the system doesn't average them or keep both equally accessible. It applies interference theory: the more recently reinforced assertion wins, and the older one fades. This is how human memory handles migrations — you don't carry equal access to your old phone number and your new one. The new one takes over, and the old one becomes harder to recall.

Corrections are the strongest signal. User corrections carry the highest confidence because they represent direct prediction errors — moments where the agent's behavior diverged from what the user wanted. The consolidation pipeline prioritizes correction-sourced clusters, consolidating them first and with lower thresholds. This implements prioritized experience replay: the system learns fastest from its mistakes.

The user approves, the system proposes. The system handles pattern recognition, clustering, gist extraction, and candidate generation. The user makes the final call on whether a pattern becomes a permanent convention or skill. This puts the human where they're most effective — quality judgment — and takes them out of where they're least effective — exhaustive pattern detection across hundreds of sessions.

Knowledge artifacts, not instruction artifacts. Memory consolidation produces knowledge: "this project uses guard let." It does not produce agent instructions: "when you see a force unwrap, replace it with guard let." The distinction matters. Knowledge lives in project artifacts (CONVENTIONS.md, MEMORY.md) and organizational stores (Supabase). Instructions live in the agent toolkit (prompt files, skill files, configuration). They're separate systems with separate update mechanisms.


Technical Foundation

Living Memory runs on a straightforward stack: Swift/SwiftUI on the Mac side for extraction and context management, Supabase Postgres with pgvector on the backend for storage and vector search, and Supabase Edge Functions for server-side consolidation.

Fact extraction runs inline with every agent turn using fast heuristic pattern matching — no LLM calls, under 50ms per message. Vector embeddings are generated server-side using OpenAI's text-embedding-3-small model (1536 dimensions), and semantic search uses pgvector's cosine similarity. The consolidation Edge Function runs on a configurable schedule (daily by default) via pg_cron.

The capacity math is comfortable. At roughly 500-1,000 new episodic facts per day for a team of five, the system accumulates around 180-365K vectors per year. Semantic knowledge grows much slower — maybe 10-50 entries per day from consolidation. This is well within pgvector's operational range on a standard Supabase instance, and embedding costs are negligible (around $0.01-0.05 per day at current OpenAI pricing).

All of this runs above the AI runtime layer (opencode), not inside it. No forks, no patches, no modifications to the underlying tooling. Helm manages context through behavioral control — monitoring usage, shedding proactively, and rotating segments when needed — so the runtime's own compaction never triggers.


What We're Not Building

Some things are deliberately out of scope for now.

These are future possibilities, not current commitments.


The Insight

The central insight behind Living Memory is that AI agents don't need a database — they need a memory. Databases are designed for perfect recall: everything is stored at full fidelity, nothing is forgotten unless explicitly deleted, and retrieval returns exact matches. Memory works differently. It's lossy by design. It generalizes. It forgets strategically. It strengthens through use. It updates on access.

Current approaches to AI agent persistence try to give agents databases — append-only logs of everything that happened, keyword search over conversation history, or vector stores that grow without bound. These approaches drown in their own data. The tenth-best fact from six months ago competes equally with the most important convention established yesterday.

Living Memory gives agents a memory. Raw experiences are captured fast and decay fast. Patterns that recur across many experiences are consolidated into durable knowledge. That knowledge is what agents actually use. Reinforcement keeps important things alive. Forgetting removes noise naturally. And when knowledge reaches sufficient confidence, it crystallizes into permanent artifacts that outlast the memory system itself.

This is how humans build expertise. Not by remembering every conversation, but by extracting patterns from many conversations and letting the specifics fade. Living Memory brings that capability to AI agents.


References