Episode 14
Agentic Memory: Stateful RAG and AI Agents (Chapter 16)
Unlock the future of AI agents with agentic memory — a transformative approach that extends Retrieval-Augmented Generation (RAG) by incorporating persistent, evolving memories. In this episode, we explore how stateful intelligence turns stateless LLMs into adaptive, personalized agents capable of learning over time.
In this episode:
- Understand the CoALA framework dividing memory into episodic, semantic, procedural, and working types
- Explore key tools like Mem0, LangMem, Zep, Graphiti, LangChain, and Neo4j for implementing agentic memory
- Dive into practical architectural patterns, memory curation strategies, and trade-offs for real-world AI systems
- Hear from Keith Bourne, author of *Unlocking Data with Generative AI and RAG*, sharing insider insights and code lab highlights
- Discuss latency, accuracy improvements, and engineering challenges in scaling stateful AI agents
- Review real-world applications across finance, healthcare, education, and customer support
Key tools & technologies mentioned:
Mem0, LangMem, Zep, Graphiti, LangChain, Neo4j, Pinecone, Weaviate, Airflow, Temporal
Timestamps:
00:00 - Introduction & Episode Overview
02:15 - What is Agentic Memory and Why It Matters
06:10 - The CoALA Cognitive Architecture Explained
09:30 - Comparing Memory Implementations: Mem0, LangMem, Graphiti
13:00 - Deep Dive: Memory Curation and Background Pipelines
16:00 - Performance Metrics & Real-World Impact
18:30 - Challenges & Open Problems in Agentic Memory
20:00 - Closing Thoughts & Resources
Resources:
- "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition
- Visit Memriq.ai for more AI engineering deep dives and resources
Transcript
MEMRIQ INFERENCE DIGEST - ENGINEERING EDITION
Episode: Agentic Memory: Chapter 16 Deep Dive on Stateful RAG and AI Agents
MORGAN:Welcome to Memriq Inference Digest — Engineering Edition. I’m Morgan, and as always, it’s great to have you with us. This podcast is brought to you by Memriq AI, a content studio focused on building tools and resources for AI practitioners. You can find us at Memriq.ai for more deep dives into AI engineering topics.
CASEY:Today’s discussion is a juicy one. We’re diving into Agentic Memory, a cutting-edge extension of Retrieval-Augmented Generation — or RAG — with stateful intelligence baked in. We’ll be unpacking key ideas from Chapter 16 of *Unlocking Data with Generative AI and RAG* by Keith Bourne.
MORGAN:And if you want to dig even deeper — with detailed diagrams, rigorous explanations, and hands-on code labs that walk you through implementation step-by-step — definitely check out Keith’s book. Search for Keith Bourne on Amazon and grab the 2nd edition.
CASEY:We’re also thrilled to have Keith himself joining us throughout the episode. He’s here to share insider insights, behind-the-scenes thinking, and real-world experience on this thorny topic that’s critical for building next-gen AI agents.
MORGAN:We’ll cover everything from how agentic memory fundamentally transforms stateless LLMs into adaptive, evolving agents, to concrete tools like Mem0, LangMem, Zep, Graphiti, LangChain, and Neo4j — plus plenty of practical architectural patterns and trade-offs.
CASEY:Let’s get started.
JORDAN:So, here’s the kicker: Agentic memory isn’t just about adding memory to LLMs. It’s about fundamentally transforming them from reactive, stateless systems into stateful agents that can *learn* and *evolve* over time.
MORGAN:That’s huge. So instead of losing context after each session, these agents can build on what they learned before — becoming more personalized, more intelligent.
CASEY:But wait — how does that even work? We’re talking about extending RAG beyond static document retrieval to dynamic, multi-modal, temporally aware memory stores. That’s a big leap.
JORDAN:Exactly. The book calls out the CoALA framework — a cognitive architecture, inspired by cognitive science, that divides memory into working, episodic, semantic, and procedural types. This isn’t just slapping on a vector store; it’s a principled way to organize memory operations.
MORGAN:And the results speak for themselves. Mem0 reports a 26% boost in accuracy over OpenAI’s memory baseline, plus a 90% drop in latency and token consumption. Graphiti hits nearly 95% accuracy on deep memory retrieval and slashes latency by 90%.
CASEY:That kind of jump can be a game-changer when you’re building agents for real-time, persistent, personalized applications. This is the frontier of what stateful AI really means in production.
CASEY:If you remember just one thing from today, it’s this: Agentic memory extends RAG by incorporating structured, evolving long-term memories that enable AI agents to maintain context, learn from experience, and personalize interactions persistently over time.
MORGAN:Key tools to keep on your radar are Mem0, LangMem, Zep, Graphiti, LangChain, and Neo4j — each tackling different aspects of multi-modal, temporal, and procedural memory.
CASEY:So, agentic memory is the core advancement that turns stateless LLMs into truly stateful AI agents. Essential knowledge if you’re building scalable, intelligent systems today.
JORDAN:The problem before was clear: LLMs like ChatGPT can understand one session brilliantly, but they can’t remember anything beyond that session. Their context windows are limited — even with recent prompt length extensions to over a million tokens, that’s not the full answer.
MORGAN:Right, and just throwing more tokens at the problem brings latency, cost, and signal-to-noise degradation. The overall quality of the context can actually go down when you load in noisy, irrelevant information.
JORDAN:Exactly. What’s changed is the rise of robust vector databases, advanced knowledge graphs, and temporal data stores. These make it possible to build scalable, persistent memory architectures that support long-term agentic intelligence.
CASEY:And the demand is soaring. Industries want AI assistants that autonomously manage complex tasks over weeks or months — tracking evolving contexts, adapting behavior, personalizing recommendations.
JORDAN:Neo4j’s graph databases and temporal knowledge graphs let you track how information evolves over time. Vector stores handle semantic similarity search with blazing speed. Together, these technologies enable the agentic memory architectures the book dives into.
MORGAN:Plus, infrastructure improvements are driving latency down — for example, Graphiti’s P95 latency clocks in at around 300 milliseconds, which is impressive for deep memory retrieval.
CASEY:So, this isn’t just academic. The tools and infrastructure have finally caught up to make agentic memory practical in real-world systems.
TAYLOR:At its core, agentic memory builds directly on RAG by introducing persistent, structured memory modules that dynamically evolve with agent interactions.
MORGAN:It’s a significant departure from previous stateless retrieval, which treats each query as independent without long-term context.
TAYLOR:The book’s CoALA framework lays out a cognitive architecture dividing memory into four key types: working, episodic, semantic, and procedural.
CASEY:Can you give us a quick refresher on those memory types?
TAYLOR:Sure. Working memory is immediate, short-lived context held in the current LLM window. Episodic memory stores discrete events or experiences — think of it as a diary of past interactions. Semantic memory holds structured, factual knowledge — often represented as knowledge graphs for multi-hop reasoning. Procedural memory encodes skills and behavior patterns, like how to invoke tools or apply policies.
MORGAN:And this memory is scoped too, right? Between personal memories — which are user-specific — and community memories — which are shared knowledge pools.
TAYLOR:Exactly, to balance personalization with privacy and collective intelligence. Architecturally, this means polyglot storage: vector DBs for semantic similarity, knowledge graphs for structured reasoning, and relational or key-value stores for procedural data.
MORGAN:Keith, as the author, what drove you to emphasize this concept so prominently in the book?
KEITH:Great question. I felt that treating memory as a monolithic vector database oversimplifies what real, cognitive memory entails. Drawing on cognitive science with the CoALA framework helped crystallize how different memory types serve distinct roles in agent intelligence. This separation isn’t just academic — it informs architectural patterns that improve efficiency, accuracy, and adaptability. I wanted readers to grasp this upfront because it shapes every practical implementation from there on.
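To make the CoALA split concrete, here’s a minimal sketch of how the four memory types might be represented in an agent’s state. It’s illustrative only; the class and field names are assumptions, not code from the book’s labs.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class EpisodicEntry:
    """A discrete, timestamped event, such as a conversation turn or user action."""
    timestamp: datetime
    content: str
    embedding: Optional[list[float]] = None  # vector used for similarity search

@dataclass
class SemanticFact:
    """Structured factual knowledge, typically a (subject, relation, object) triple."""
    subject: str
    relation: str
    obj: str

@dataclass
class ProceduralRule:
    """A behavior pattern or tool-use policy, often injected into the system prompt."""
    name: str
    instruction: str

@dataclass
class AgentMemory:
    working: list[str] = field(default_factory=list)  # current prompt-window context
    episodic: list[EpisodicEntry] = field(default_factory=list)
    semantic: list[SemanticFact] = field(default_factory=list)
    procedural: list[ProceduralRule] = field(default_factory=list)
```

In production, each of these lists would be backed by the polyglot storage Taylor describes: a vector DB for episodic recall, a knowledge graph for semantic facts, and a relational or key-value store for procedural rules.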
TAYLOR:Now, let’s look at some contenders. Mem0 opts for a unified long-term memory store combining key-value, vector, and graph storage behind the scenes. It prioritizes retrieval performance but doesn’t explicitly separate the cognitive memory types.
CASEY:That sounds simpler to implement, but what do you lose?
TAYLOR:You trade off some cognitive fidelity and fine-grained control. LangMem, on the other hand, explicitly implements CoALA’s memory types with separate namespaces and extraction pipelines. It even supports procedural memory and integrates tightly with LangChain workflows.
MORGAN:Interesting. And what about Zep and Graphiti?
TAYLOR:They focus heavily on temporal knowledge graphs with bi-temporal tracking — capturing both valid time and transaction time. This doubles down on episodic and semantic memory with strong temporal reasoning, but they don’t natively support procedural memory. So you get precision in historical queries but need hybrid integration for behavior adaptation.
CASEY:So Mem0 is your pick for simplicity and speed; LangMem for cognitive completeness and adaptability; Graphiti for temporal reasoning and auditability.
TAYLOR:Exactly. The choice depends on your application needs. If you require procedural memory — say, dynamic agent skills — LangMem is the only production-ready option. For complex temporal queries, Graphiti shines. Mem0 is a great starting point if you want solid long-term retrieval with less engineering overhead.
ALEX:Let’s get into the guts of how agentic memory actually works. We start with working memory — this is the short-term context that lives inside the LLM’s current prompt window. But since context windows are limited, you need to selectively summarize and retain only the most relevant info.
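As a rough illustration of that working-memory curation, here’s a minimal sketch that keeps the newest turns verbatim and folds older turns into a running summary once a token budget is exceeded. The `summarize` and `approx_tokens` helpers are hypothetical stand-ins for an LLM summarization call and a real tokenizer.

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate; a real system would use the model's tokenizer."""
    return len(text.split())

def summarize(turns: list[str]) -> str:
    """Hypothetical stand-in for an LLM-based summarization call."""
    return "Summary of earlier context: " + " | ".join(t[:60] for t in turns)

def curate_working_memory(turns: list[str], token_budget: int = 2000) -> list[str]:
    """Keep the newest turns verbatim and compress the rest into one summary entry."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk backwards so recent turns survive first
        cost = approx_tokens(turn)
        if used + cost > token_budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()
    older = turns[: len(turns) - len(kept)]
    return ([summarize(older)] if older else []) + kept
```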
MORGAN:So, this is a real-time curation problem?
ALEX:Exactly. Then episodic memory records discrete events — for example, conversation snippets or user actions — stored as timestamped entries in a vector DB or graph. When the agent needs to recall past experiences, it uses semantic similarity search plus graph traversal to fetch relevant episodes.
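Here’s a minimal sketch of that episodic store and recall path, with plain numpy cosine similarity standing in for a real vector database and `embed` standing in for an embedding model; both are illustrative assumptions.

```python
from datetime import datetime, timezone
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Placeholder embedding: deterministic per text, unit-normalized.
    A real system would call an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

class EpisodicStore:
    """Timestamped event log with similarity-based recall."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def add(self, content: str) -> None:
        """Record a discrete event as a timestamped, embedded entry."""
        self.entries.append({
            "timestamp": datetime.now(timezone.utc),
            "content": content,
            "embedding": embed(content),
        })

    def recall(self, query: str, k: int = 3) -> list[str]:
        """Return the k entries most similar to the query (cosine similarity)."""
        q = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda e: float(np.dot(e["embedding"], q)),
            reverse=True,
        )
        return [e["content"] for e in ranked[:k]]
```

A production system would swap the in-memory list for Pinecone, Weaviate, or a graph-backed store, and would often combine similarity search with graph traversal as Alex notes.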
CASEY:What about semantic memory?
ALEX:That’s where you store structured factual knowledge — often modeled as knowledge graphs. These enable multi-hop reasoning, where the agent traverses relationships to infer new insights. Neo4j is a great backend here, providing ACID transactions and rich query languages like Cypher.
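As a sketch of what that multi-hop traversal can look like with the official neo4j Python driver, here’s a hypothetical two-hop Cypher query; the connection details, node labels, and relationship types are invented for illustration.

```python
from neo4j import GraphDatabase

# Placeholder connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Hypothetical schema: (:User)-[:PREFERS]->(:Topic)<-[:COVERS]-(:Document)
TWO_HOP_QUERY = """
MATCH (u:User {id: $user_id})-[:PREFERS]->(t:Topic)<-[:COVERS]-(d:Document)
RETURN d.title AS title, t.name AS topic
LIMIT 10
"""

def recommend_documents(user_id: str) -> list[dict]:
    """Traverse user -> topic -> document to surface semantically related knowledge."""
    with driver.session() as session:
        result = session.run(TWO_HOP_QUERY, user_id=user_id)
        return [{"title": r["title"], "topic": r["topic"]} for r in result]
```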
MORGAN:And procedural memory?
ALEX:Procedural memory encodes skills, tool-use policies, or action patterns. It’s often stored as system prompts or code snippets, dynamically updated as the agent adapts. LangMem, in particular, extracts procedural memory into isolated namespaces with background pipelines to keep behavior consistent but flexible.
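Here’s a minimal, library-agnostic sketch of that idea: procedural memory as namespaced prompt snippets that can be revised as the agent adapts. It’s an illustration of the pattern, not LangMem’s actual API.

```python
class ProceduralMemory:
    """Namespaced store of behavior rules that get injected into the system prompt."""

    def __init__(self) -> None:
        self._namespaces: dict[str, dict[str, str]] = {}

    def update(self, namespace: str, name: str, instruction: str) -> None:
        """Add or revise a skill or policy inside an isolated namespace."""
        self._namespaces.setdefault(namespace, {})[name] = instruction

    def render_system_prompt(self, namespace: str, base_prompt: str) -> str:
        """Compose the base prompt with the current rules for this namespace."""
        rules = self._namespaces.get(namespace, {})
        if not rules:
            return base_prompt
        lines = "\n".join(f"- {name}: {text}" for name, text in rules.items())
        return f"{base_prompt}\n\nBehavior rules:\n{lines}"

# Example: the agent picks up a new tool-use policy over time.
pm = ProceduralMemory()
pm.update("support-agent", "escalation", "Escalate billing disputes over $500 to a human.")
print(pm.render_system_prompt("support-agent", "You are a helpful support assistant."))
```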
JORDAN:Alex, the book offers extensive code labs on these patterns. What’s the one key thing you wish readers would internalize?
ALEX:I’d say it’s the importance of memory curation. Without aggressive deduplication, decay, and pruning, your memory quickly becomes bloated and noisy, degrading retrieval quality. Architecting asynchronous background extraction pipelines — like using Airflow or Temporal workflows — is critical to maintain performance without latency spikes during conversations.
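A minimal sketch of the curation pass Alex describes, operating on entries shaped like the episodic records above: exact-duplicate removal plus a recency-decayed relevance score that prunes stale, rarely recalled memories. The half-life, scoring, and threshold are illustrative assumptions; production systems typically add semantic deduplication and consolidation.

```python
from datetime import datetime, timezone

def curate(entries: list[dict], half_life_days: float = 30.0, min_score: float = 0.1) -> list[dict]:
    """Deduplicate by content, then drop entries whose decayed score falls below a threshold."""
    now = datetime.now(timezone.utc)
    seen: set[str] = set()
    kept: list[dict] = []
    for e in entries:
        if e["content"] in seen:  # exact-duplicate removal
            continue
        seen.add(e["content"])
        age_days = (now - e["timestamp"]).total_seconds() / 86400  # expects tz-aware timestamps
        decay = 0.5 ** (age_days / half_life_days)       # exponential decay with age
        score = decay * (1 + e.get("access_count", 0))   # boost frequently recalled memories
        if score >= min_score:
            kept.append(e)
    return kept
```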
KEITH:I’d add that memory isn’t static. Learning how to consolidate, conflict-resolve, and forget appropriately is an ongoing engineering challenge. The book goes deep on these mechanisms with example code you won’t find elsewhere.
ALEX:Now, onto the headline numbers. Mem0 boasts a 26% accuracy improvement over OpenAI’s baseline memory retrieval, coupled with a 90% reduction in latency and token consumption. That’s a huge win for both user experience and infrastructure cost.
MORGAN:90% latency reduction? That’s staggering.
ALEX:Absolutely. Graphiti’s results are equally impressive — 94.8% accuracy on Deep Memory Retrieval benchmark and an 18.5% improvement on LongMemEval tasks, all while cutting latency by 90% with P95 around 300 milliseconds.
CASEY:Those numbers aren’t just metrics; they translate to more responsive, context-aware agents that can handle nuanced, temporally rich queries in production.
ALEX:And LangMem’s procedural memory support lets agents evolve behavior dynamically — an often overlooked but critical feature for real-world applications.
CASEY:Okay, let’s pump the brakes for a moment. The book is refreshingly candid about the practical limitations.
MORGAN:Such as?
CASEY:Working memory quality is a notorious bottleneck — if your immediate context is incomplete or noisy, downstream memory extraction suffers. Then there’s the risk of memory sclerosis — outdated memories that dominate and clog the system — and memory bloat from redundant data.
JORDAN:And procedural memory?
CASEY:That’s still a tough nut. Adapting procedural memory without destabilizing agent behavior requires careful design and ongoing monitoring. Also, just expanding context windows doesn’t solve memory problems — latency and cost skyrocket, and signal-to-noise ratio worsens.
MORGAN:Privacy seems like a major concern too.
CASEY:Absolutely. Personal memory in multi-tenant systems demands strict data isolation and encryption. The book drills into these challenges and suggests namespace isolation and access control as critical engineering patterns.
MORGAN:Keith, what’s the biggest mistake you see people make when implementing agentic memory?
KEITH:Overloading the memory store without proper curation is the top pitfall. Engineers often underestimate how fast memory can degrade if you don’t aggressively prune and consolidate. Also, procedural memory is often shoehorned in without a solid update strategy, leading to unpredictable agent behavior. It’s tempting to think a bigger context window is enough, but that’s a dead end. You need a principled architecture and continuous maintenance workflows.
SAM:Let’s talk applications. Financial advisors can use agentic memory to remember client portfolios, risk tolerance, and market events over time — enabling personalized, evolving recommendations.
MORGAN:Healthcare assistants track symptoms, medication responses, and treatment progress longitudinally — critical for chronic disease management.
SAM:Technical support bots learn from debugging sessions, adapting troubleshooting strategies dynamically. Educational tutors tailor learning paths based on evolving student performance and preferences.
CASEY:And customer support bots leverage community memory for shared best practices while maintaining personal memory for individual context, improving both efficiency and user satisfaction.
MORGAN:These aren’t theoretical — many of these industries are actively deploying systems built on Mem0, LangMem, or graph-backed temporal models like Graphiti.
SAM:It’s a testament to how agentic memory is transitioning from research to production across diverse domains.
SAM:Here’s a scenario: building a healthcare assistant that requires temporal tracking of detailed patient history, personalized treatment recommendations, and adaptive behavior over time.
TAYLOR:Mem0 brings simplicity and strong retrieval performance but lacks explicit temporal reasoning and procedural memory. That could hurt when you need fine-grained historical queries.
CASEY:LangMem’s explicit memory-type separation and procedural memory support offer dynamic workflows but demand more complex infrastructure and operational overhead.
MORGAN:Meanwhile, Zep/Graphiti excels with temporal knowledge graphs, offering precise historical queries and audit trails — crucial for healthcare compliance — but procedural memory support is missing, so behavior adaptation is limited or requires hybrid setups.
SAM:Could a hybrid approach combining LangMem’s procedural capabilities with Graphiti’s temporal episodic and semantic stores be the sweet spot?
TAYLOR:Absolutely. It offers comprehensive coverage of memory types plus temporal reasoning, albeit at the cost of increased system complexity.
CASEY:So, your choice depends on whether you prioritize simplicity, cognitive completeness, or temporal precision — with significant trade-offs in reliability and scalability.
SAM:And those trade-offs must be carefully weighed against domain requirements and infrastructure resources.
SAM:For engineers building agentic memory, start by implementing multi-tiered memory pipelines — working memory for immediate context, episodic and semantic stores for long-term knowledge, and procedural memory for behavior.
MORGAN:Use vector databases like Pinecone or Weaviate for semantic similarity search and Neo4j or TigerGraph for knowledge graph reasoning.
CASEY:Don’t skip memory curation: deduplication, decay, consolidation, and pruning are non-negotiable to maintain retrieval quality.
SAM:And implement asynchronous background extraction pipelines — Airflow or Temporal are great options — to avoid latency spikes during user interactions.
MORGAN:Namespace isolation and encryption are essential for privacy in multi-tenant systems.
CASEY:Monitor your key metrics closely — retrieval precision and recall, latency percentiles, storage growth, conversation coherence, and temporal consistency.
SAM:These patterns collectively form a robust toolbox to build, optimize, and maintain agentic memory in production.
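Pulling a few of these recommendations together, here’s a minimal sketch of an asynchronous background extraction loop, with plain asyncio standing in for an Airflow DAG or Temporal workflow; `extract_memories` and `store` are hypothetical hooks for the LLM extraction step and the memory backend.

```python
import asyncio

async def extract_memories(recent_turns: list[str]) -> list[str]:
    """Hypothetical hook: call an LLM to distill episodic/semantic entries from raw turns."""
    return [t for t in recent_turns if len(t) > 20]

async def store(memories: list[str]) -> None:
    """Hypothetical hook: write extracted memories to the vector DB / knowledge graph."""
    print(f"stored {len(memories)} memories")

async def background_curation(queue: asyncio.Queue, batch_size: int = 8, idle_s: float = 5.0) -> None:
    """Drain buffered turns in batches so extraction never blocks the chat path."""
    buffer: list[str] = []
    while True:
        try:
            buffer.append(await asyncio.wait_for(queue.get(), timeout=idle_s))
        except asyncio.TimeoutError:
            pass  # no new turns for a while; flush whatever is buffered
        if buffer and (len(buffer) >= batch_size or queue.empty()):
            await store(await extract_memories(buffer))
            buffer.clear()
```

The conversation loop simply does `await queue.put(turn)` and moves on, so user-facing latency stays unaffected by the extraction work.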
MORGAN:Quick plug — *Unlocking Data with Generative AI and RAG* by Keith Bourne is packed with rich diagrams, thorough explanations, and hands-on code labs that you won’t find anywhere else. It’s the technical foundation you need if you want to go beyond the highlights we covered today.
MORGAN:A quick shoutout to Memriq AI — an AI consultancy and content studio crafting tools and resources for AI practitioners. This podcast is produced by Memriq AI to help engineers and leaders stay ahead in the fast-evolving AI landscape.
CASEY:Head over to Memriq.ai for more deep dives, practical guides, and research breakdowns — all engineered for the AI/ML community.
SAM:Despite progress, several challenges remain open. Adaptive procedural memory that learns and evolves without destabilizing agents is still a research frontier.
MORGAN:Balancing memory retention and forgetting — to avoid both bloat and sclerosis — is more art than science right now.
SAM:Privacy-preserving memory sharing between personal and community scopes demands advanced anonymization and access control techniques.
CASEY:Scaling memory systems to billions of interactions with low latency and high accuracy poses steep infrastructure demands.
SAM:And we lack standardized, automated evaluation metrics for aspects like conversation coherence, temporal consistency, and multi-hop reasoning.
MORGAN:These open problems mark exciting avenues for innovation that will define the next generation of agentic memory architectures.
MORGAN:For me, agentic memory is the gateway to truly intelligent and personalized AI agents. We’re moving from single-session chatbots to lifelong collaborators.
CASEY:I’ll add — never underestimate the engineering complexity and maintenance demands. Memory is powerful but fragile. Design carefully.
JORDAN:I’m struck by how cognitive science informs practical architecture. The CoALA framework bridges theory and engineering beautifully.
TAYLOR:Choosing the right memory architecture is a critical decision — complexity, temporal reasoning, and procedural needs all factor in deeply.
ALEX:Memory curation is the unsung hero. Without it, performance degrades fast. That’s the lever you need to pull for production readiness.
SAM:Real-world deployments prove the concept — from finance to healthcare, agentic memory is already making an impact.
KEITH:As the author, the one thing I hope you take away is that agentic memory is less about a single technology and more about orchestrating diverse memory types thoughtfully. It’s the foundation for AI agents that don’t just respond but learn, adapt, and evolve with you.
MORGAN:Keith, thanks so much for joining us and giving us the inside scoop on agentic memory.
KEITH:My pleasure. I hope this inspires you to dig into the book and build something amazing.
CASEY:This was a deep and rewarding conversation. Thanks for guiding us through the real engineering challenges behind the hype.
MORGAN:We covered the key concepts today, but remember — the book goes much deeper with detailed diagrams, thorough explanations, and hands-on code labs that let you build these systems yourself. Search for Keith Bourne on Amazon and grab the 2nd edition of *Unlocking Data with Generative AI and RAG.*
MORGAN:Thanks for listening. See you next time on Memriq Inference Digest — Engineering Edition.
