Episode 12
Graph-Based RAG: Hybrid Embeddings & Explainable AI (Chapter 14)
Unlock the power of graph-based Retrieval-Augmented Generation (RAG) in this technical deep dive featuring insights from Chapter 14 of Keith Bourne's "Unlocking Data with Generative AI and RAG." Discover how combining knowledge graphs with LLMs using hybrid embeddings and explicit graph traversal can dramatically improve multi-hop reasoning accuracy and explainability.
In this episode:
- Explore ontology design and graph ingestion workflows using Protégé, RDF, and Neo4j
- Understand the advantages of hybrid embeddings over vector-only approaches
- Learn why Python static dictionaries significantly boost LLM multi-hop reasoning accuracy
- Discuss architecture trade-offs between ontology-based and cyclical graph RAG systems
- Review real-world production considerations, scalability challenges, and tooling best practices
- Hear directly from author Keith Bourne about building explainable and reliable AI pipelines
Key tools and technologies mentioned:
- Protégé for ontology creation
- RDF triples and rdflib for data parsing
- Neo4j graph database with Cypher queries
- Sentence-Transformers (all-MiniLM-L6-v2) for embedding generation
- FAISS for vector similarity search
- LangChain for orchestration
- OpenAI chat models
- python-dotenv for secrets management
Timestamps:
00:00 - Introduction & episode overview
02:30 - Surprising results: Python dicts vs natural language for KG representation
05:45 - Why graph-based RAG matters now: tech readiness & industry demand
08:15 - Architecture walkthrough: from ontology to LLM prompt input
12:00 - Comparing ontology-based vs cyclical graph RAG approaches
15:00 - Under the hood: building the pipeline step-by-step
18:30 - Real-world results, scaling challenges, and practical tips
21:00 - Closing thoughts and next steps
Resources:
- "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition
- Visit Memriq AI at https://Memriq.ai for more AI engineering insights and tools
Transcript
MEMRIQ INFERENCE DIGEST - ENGINEERING EDITION
Episode: Graph-Based RAG: Chapter 14 Deep Dive on Hybrid Embeddings & Explainable AI
MORGAN:Welcome to Memriq Inference Digest — Engineering Edition. I’m Morgan, and today we’re diving deep into a fascinating intersection of knowledge graphs and retrieval-augmented generation—specifically graph-based RAG. This episode is brought to you by Memriq AI, a content studio building tools and resources for AI practitioners—check them out at Memriq.ai.
CASEY:We’re exploring material from Chapter 14 of “Unlocking Data with Generative AI and RAG” by Keith Bourne, who is joining us as today’s special guest. Keith’s book is packed with detailed diagrams, thorough explanations, and hands-on code labs. So while we’re giving you the highlights here, if you want to really get your hands dirty, definitely search for Keith Bourne on Amazon and grab the second edition.
MORGAN:And Keith is here throughout the episode to share insider insights, behind-the-scenes thinking, and real-world experience on graph-based RAG — from ontology design with Protégé to embedding with Sentence-Transformers, graph databases like Neo4j, and integration with LangChain and OpenAI chat models.
CASEY:We’ll cover everything from surprising findings on KG representation formats to the architecture of hybrid embeddings, and the trade-offs with vector-only approaches. We’ll also discuss real-world production considerations, scalability, and pitfalls you’ll want to avoid.
MORGAN:So buckle up—it’s a technical deep dive for AI/ML engineers, data scientists, senior ICs, and infrastructure pros who want to build next-level retrieval and reasoning pipelines with LLMs grounded in structured knowledge.
JORDAN:Here’s a nugget that blew me away: representing knowledge graph relationships as Python static dictionaries—yes, plain old Python dicts—boosted multi-hop reasoning accuracy in LLMs to 67.9%. To put that in context, the same knowledge expressed as natural language hit just 44.6%, and JSON barely scraped 26.1%.
MORGAN:Wait, hold on—that’s more than a 50% relative improvement over natural language? That’s massive.
CASEY:That’s counterintuitive. You’d expect natural language, which is what LLMs are trained on, to do better than a rigid data structure like a Python dictionary.
JORDAN:Exactly. But the author points out that Python dicts let the model perform computational reasoning steps explicitly — like traversing relationships — instead of pattern-matching ambiguous text.
MORGAN:And then there’s the hybrid embeddings — combining textual descriptions with multi-hop graph context — that outperformed pure vector-based retrieval pipelines, especially in semantic search.
CASEY:Plus, by using graph traversal instead of just vector similarity, they achieved precise, explainable multi-hop reasoning. This isn’t just throwing darts in vector space — it’s a guided, structured search.
JORDAN:Right, and when you combine formal ontologies with graph DBs like Neo4j, you get a scalable, production-ready knowledge graph that grounds LLMs in verified domain knowledge. That’s a powerful toolkit for anyone building explainable AI.
MORGAN:That resonated with me. If you’re scaling AI beyond simple retrieval to nuanced, multi-step inference, this approach is a game changer.
CASEY:If you remember just one thing: Graph-based RAG marries knowledge graphs with LLMs using hybrid embeddings and explicit graph traversal to unlock precise, explainable multi-hop reasoning that goes beyond traditional vector search.
MORGAN:Key tools in the mix? Neo4j for the graph database, LangChain for orchestration, Sentence-Transformers for embeddings, FAISS for vector indexing, and OpenAI models for generation.
CASEY:The big takeaway is that structuring your retrieval around the graph and embedding that structure semantically improves both accuracy and trustworthiness.
JORDAN:The timing for graph-based RAG is perfect. Traditional vector-based RAG has had success, but it struggles with precise reasoning, especially when queries require multi-hop inference. You want the system to explain how it got there, but vector search alone can’t provide that traceability.
CASEY:So why couldn’t we do this before?
JORDAN:A few things changed recently. First, embedding techniques improved drastically, allowing us to create hybrid embeddings that fuse textual and structural data. Second, graph databases like Neo4j matured to support large, complex ontologies with efficient querying and indexing. Third, frameworks like LangChain emerged to glue all these pieces together seamlessly.
MORGAN:And regulatory domains like finance and healthcare demand grounded and explainable AI. They can’t risk hallucinated or unverifiable answers. That’s driving adoption of graph-based RAG.
JORDAN:Exactly. The book highlights how semantic search combined with explicit graph traversal addresses these pain points, making multi-hop, explainable reasoning practical at scale.
CASEY:So it’s both a push from technology readiness and a pull from industry needs — a perfect storm for graph RAG to shine.
TAYLOR:At the core, graph-based RAG uses knowledge graphs as structured data sources to provide context to LLMs. Instead of just dumping text into a vector store, it retrieves entities and relationships explicitly modeled in the graph.
MORGAN:How does that differ from classic RAG?
TAYLOR:Classic vector-only RAG performs semantic similarity search over chunks of text or documents. But it treats knowledge as unstructured blobs — no explicit connections. Graph-based RAG encodes topology and multi-hop relationships directly, so retrieval can leverage graph traversals.
TAYLOR:Architecturally, the system ingests an ontology built in Protégé, exports RDF triples, parses and imports them into Neo4j with constraints and navigational anchor nodes. Then it generates hybrid embeddings combining textual node descriptions with connected graph context. Queries embed natural language input, perform semantic search over hybrid embeddings using FAISS, then expand results in Neo4j to retrieve precise subgraphs. Finally, these subgraphs are formatted as Python static dictionaries for the LLM to consume.
MORGAN:And Keith, as the author, what made this concept so important to cover early in the book?
KEITH:Great question, Morgan. The core insight is that LLMs excel at language but struggle with multi-hop reasoning when context isn’t explicitly structured. By combining ontologies with graph databases and hybrid embeddings, you create a retrieval backend that provides both semantic richness and formal structure. This bridges natural language understanding with precise data traversal. Getting that foundation right early empowers readers to grasp the architecture and build robust, explainable AI applications.
TAYLOR:That makes sense. The book really emphasizes building from the ground up — ontology design, graph ingestion, embedding generation — instead of treating RAG as a black box.
KEITH:Absolutely. And the hands-on code labs in the book walk you through every step, so you’re not just reading theory but implementing real pipelines.
TAYLOR:Let’s compare different approaches to graph-based RAG, starting with ontology-based knowledge graphs versus cyclical graphs. Ontology KGs are directed acyclic graphs — DAGs — which provide stable, schema-driven structures ideal for precise lookups. Cyclical graphs support recursive and feedback-driven reasoning but require complex cycle detection and traversal logic.
CASEY:Sounds like ontology-based KGs are easier to implement but less flexible?
TAYLOR:Precisely. Ontologies offer stability and enforce schema constraints, which is great for regulated domains. Cyclical graphs can model dynamic relationships but add complexity and potential performance hits.
MORGAN:What about the data representation formats? We heard Python static dictionaries outperform natural language and JSON for feeding KG data to LLMs.
TAYLOR:That’s a key trade-off. Python dicts enable explicit computational traversal for LLMs, boosting reasoning accuracy to 67.9%, compared to 44.6% for natural language and 26.1% for JSON. But dicts may require more careful formatting and integration.
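[Code sketch] To make the format comparison concrete, here is a hypothetical illustration (not taken from the book's labs; the entities and relationship names are invented) of the same facts rendered in the three representations discussed:

```python
# Hypothetical example: the same knowledge graph facts in the three formats
# compared in the chapter. Entity and relationship names are illustrative.

# 1. Natural language: the model must parse relationships out of prose.
natural_language = (
    "Apple Inc. issues the AAPL common stock, which is regulated by the SEC."
)

# 2. JSON string: structured, but handed to the LLM as serialized text.
json_representation = (
    '{"nodes": ["Apple Inc.", "AAPL", "SEC"], '
    '"edges": [["Apple Inc.", "ISSUES", "AAPL"], ["SEC", "REGULATES", "AAPL"]]}'
)

# 3. Python static dictionary: explicit keys the model can traverse
#    relationship by relationship when reasoning over multiple hops.
knowledge_graph = {
    "AAPL": {"type": "Equity", "issued_by": "Apple Inc.", "regulated_by": "SEC"},
    "Apple Inc.": {"type": "Issuer", "issues": ["AAPL"]},
    "SEC": {"type": "Regulator", "regulates": ["AAPL"]},
}
```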
CASEY:And how does this compare to Microsoft Research’s GraphRAG?
TAYLOR:GraphRAG uses more complex pipelines with global and local graph search modes and supports cyclical graphs. The approach in the book focuses on simpler, ontology-driven Neo4j implementations emphasizing explainability and ease of production deployment.
MORGAN:So when do you pick one over the other?
TAYLOR:Use ontology-based Neo4j when you need stable, explainable, schema-driven graphs — great for compliance-heavy sectors. Opt for cyclical graphs or GraphRAG-style pipelines for dynamic graphs or recursive reasoning needs, but be ready for added engineering complexity. And for data format, Python static dictionaries are best when you want LLMs to do explicit reasoning steps.
CASEY:That’s a practical framework for decision-making.
ALEX:Let’s get our hands dirty and walk through how graph-based RAG actually works under the hood. This is where the magic lives.
ALEX:Step one: ontology design. Using Protégé, you build your domain ontology — classes, properties, relationships — export it as Turtle (.ttl) RDF triples. The book gives a financial domain example with entities like Securities, Issuers, and Regulators.
ALEX:Step two: parse RDF triples with rdflib, then convert to CSVs for nodes, edges, and data properties. Pandas helps clean and transform data. This step ensures you have clean, CSV-formatted data for Neo4j ingestion.
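[Code sketch] A minimal version of this parsing step, assuming an exported file named ontology.ttl; the file names and column layout are illustrative rather than the book's exact lab code:

```python
# Sketch: parse the exported Turtle file with rdflib and flatten it into
# node/edge CSVs for Neo4j. File names and column layouts are illustrative.
import pandas as pd
from rdflib import Graph, URIRef
from rdflib.namespace import RDF, RDFS

g = Graph()
g.parse("ontology.ttl", format="turtle")

nodes, edges = [], []
for subj in set(g.subjects()):
    if isinstance(subj, URIRef):
        # Prefer an rdfs:label if the ontology provides one, else the local name.
        label = g.value(subj, RDFS.label) or subj.split("#")[-1]
        nodes.append({"uri": str(subj), "label": str(label)})

for subj, pred, obj in g:
    # Keep resource-to-resource statements (subClassOf, domain relations, etc.).
    if isinstance(subj, URIRef) and isinstance(obj, URIRef) and pred != RDF.type:
        edges.append({"source": str(subj), "rel": pred.split("#")[-1], "target": str(obj)})

pd.DataFrame(nodes).drop_duplicates().to_csv("nodes.csv", index=False)
pd.DataFrame(edges).drop_duplicates().to_csv("edges.csv", index=False)
```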
ALEX:Step three: import into Neo4j using Cypher queries. Crucially, the book shows how to use MERGE to enforce uniqueness constraints on nodes and relationships, avoiding duplicates. Anchor nodes labeled :Concept help navigate class hierarchies efficiently.
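[Code sketch] A rough sketch of the import step, assuming a local Neo4j instance and the CSVs produced above; the credentials, labels, and property names are placeholders:

```python
# Sketch: load the node/edge CSVs into Neo4j with MERGE so re-runs don't
# create duplicates. Connection details and property names are illustrative.
import pandas as pd
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Uniqueness constraint on the anchor label keeps MERGE idempotent
    # (syntax assumes a recent Neo4j version).
    session.run(
        "CREATE CONSTRAINT concept_uri IF NOT EXISTS "
        "FOR (c:Concept) REQUIRE c.uri IS UNIQUE"
    )
    for _, row in pd.read_csv("nodes.csv").iterrows():
        session.run(
            "MERGE (c:Concept {uri: $uri}) SET c.label = $label",
            uri=row["uri"], label=row["label"],
        )
    for _, row in pd.read_csv("edges.csv").iterrows():
        # Relationship type stored as a property, since Cypher types are static.
        session.run(
            "MATCH (a:Concept {uri: $src}), (b:Concept {uri: $dst}) "
            "MERGE (a)-[r:RELATED {type: $rel}]->(b)",
            src=row["source"], dst=row["target"], rel=row["rel"],
        )

driver.close()
```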
ALEX:Step four: generate hybrid text descriptions for each node. Using Cypher queries, you combine node properties with multi-hop connected context—imagine concatenating textual info from related nodes two or three hops away. This better captures semantic and structural context for embedding.
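[Code sketch] One way this hybrid-description step could look, assuming the :Concept nodes created above; the Cypher shape and the phrasing of the descriptions are illustrative assumptions:

```python
# Sketch: build a "hybrid" text per node by concatenating its own label with
# the labels of neighbors up to two hops away. The Cypher is illustrative.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

HYBRID_QUERY = """
MATCH (c:Concept)
OPTIONAL MATCH (c)-[*1..2]-(neighbor:Concept)
WHERE neighbor <> c
WITH c, collect(DISTINCT neighbor.label) AS context
RETURN c.uri AS uri, c.label AS label, context
"""

hybrid_texts = {}
with driver.session() as session:
    for record in session.run(HYBRID_QUERY):
        # Fuse the node's own label with its multi-hop neighborhood into one string.
        context = ", ".join(x for x in record["context"] if x)
        hybrid_texts[record["uri"]] = f"{record['label']}. Related concepts: {context}"

driver.close()
```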
ALEX:Step five: embed these hybrid texts locally with Sentence-Transformers’ all-MiniLM-L6-v2 model. The embeddings then feed into a FAISS index configured for efficient top-k nearest neighbor search, typically k=5.
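[Code sketch] A minimal embedding-and-indexing sketch using the model named in the chapter; the normalized inner-product index and the tiny stand-in hybrid_texts dictionary are assumptions:

```python
# Sketch: embed the hybrid texts locally and index them in FAISS for top-k search.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Stand-in for the hybrid descriptions built in the previous step.
hybrid_texts = {
    "ex:AAPL": "AAPL. Related concepts: Equity, Apple Inc., SEC",
    "ex:SEC": "SEC. Related concepts: Regulator, AAPL",
}

uris = list(hybrid_texts.keys())
embeddings = model.encode([hybrid_texts[u] for u in uris], normalize_embeddings=True)
embeddings = np.asarray(embeddings, dtype="float32")

# Inner-product search over normalized vectors behaves like cosine similarity.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

def semantic_search(query: str, k: int = 5):
    """Return the top-k (uri, score) pairs for a natural language query."""
    q = np.asarray(model.encode([query], normalize_embeddings=True), dtype="float32")
    scores, idxs = index.search(q, min(k, len(uris)))
    return [(uris[i], float(s)) for i, s in zip(idxs[0], scores[0])]
```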
ALEX:Step six: build retrieval and expansion functions in LangChain. When a user query comes in, you embed it, perform FAISS semantic search over hybrid embeddings, retrieve candidate nodes, then expand subgraphs in Neo4j with Cypher to get related entities and relationships.
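[Code sketch] A possible retrieval-and-expansion helper, reusing the semantic_search function sketched after step five; the Cypher expansion query and helper names are illustrative assumptions:

```python
# Sketch: expand the top-k semantic hits into a precise subgraph via Cypher.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

EXPAND_QUERY = """
MATCH (c:Concept {uri: $uri})-[r]-(other:Concept)
RETURN c.label AS source, r.type AS rel, other.label AS target
"""

def retrieve_subgraph(question: str, k: int = 5):
    """Semantic search over hybrid embeddings, then graph expansion in Neo4j."""
    hits = semantic_search(question, k=k)  # FAISS helper sketched after step five
    triples = []
    with driver.session() as session:
        for uri, _score in hits:
            for record in session.run(EXPAND_QUERY, uri=uri):
                triples.append((record["source"], record["rel"], record["target"]))
    return triples
```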
ALEX:Step seven: format the retrieved subgraph as Python static dictionaries — mapping node properties and relationships explicitly. This representation allows the LLM to perform computational reasoning—traversing dict keys and values rather than guessing from text patterns.
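[Code sketch] A small helper showing how retrieved triples might be folded into the static-dictionary format; the triples below are invented examples:

```python
# Sketch: turn retrieved (source, relation, target) triples into the static
# dictionary format the chapter found most effective for LLM reasoning.
from collections import defaultdict

def subgraph_to_dict(triples):
    """Map each entity to its relationships, ready to drop into a prompt."""
    kg = defaultdict(dict)
    for source, rel, target in triples:
        kg[source].setdefault(rel, []).append(target)
    return dict(kg)

# Illustrative triples; in the pipeline they come from the Neo4j expansion step.
example = subgraph_to_dict([
    ("AAPL", "ISSUED_BY", "Apple Inc."),
    ("AAPL", "REGULATED_BY", "SEC"),
    ("SEC", "REGULATES", "AAPL"),
])
# {'AAPL': {'ISSUED_BY': ['Apple Inc.'], 'REGULATED_BY': ['SEC']},
#  'SEC': {'REGULATES': ['AAPL']}}
```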
ALEX:Step eight: compose LangChain RunnableLambda chains connecting context generation, prompt templates, and OpenAI chat models. Environment variables managed with python-dotenv keep credentials secure. This pipeline orchestrates the end-to-end flow from query to answer.
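[Code sketch] A hedged end-to-end wiring example; the import paths assume a recent LangChain release, the model name is a placeholder, and retrieve_subgraph / subgraph_to_dict refer to the helpers sketched above:

```python
# Sketch: wire the retrieval helpers into a LangChain chain. Import paths
# assume the current split-package layout; adjust to your installed version.
from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI

load_dotenv()  # loads OPENAI_API_KEY (and Neo4j credentials) from a .env file

prompt = ChatPromptTemplate.from_template(
    "Answer using only this knowledge graph context:\n{context}\n\nQuestion: {question}"
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model name is a placeholder

def build_context(question: str) -> dict:
    # retrieve_subgraph / subgraph_to_dict: helpers sketched in the steps above.
    kg_dict = subgraph_to_dict(retrieve_subgraph(question))
    return {"context": repr(kg_dict), "question": question}

chain = RunnableLambda(build_context) | prompt | llm | StrOutputParser()

answer = chain.invoke("Which equities are regulated by the SEC, and who issues them?")
```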
MORGAN:Alex, that’s a comprehensive walkthrough. Keith, the book has extensive code labs on this—what’s the one thing you want readers to really internalize?
KEITH:Thanks, Alex. The biggest takeaway is to appreciate how each step builds on the last. Ontology design isn’t just formalism — it sets the foundation for precise graph construction. Hybrid embeddings fuse textual and structural knowledge, which pure vector searches miss. And representing knowledge as Python static dictionaries unlocks computational reasoning that transforms LLM outputs from guesswork to trustworthy inference. The labs guide you through these layers so you can replicate and adapt the pipeline confidently.
ALEX:That clarity and modularity are what make the approach so practical for production.
ALEX:On to the results, which are pretty exciting. Importing a financial ontology into Neo4j yielded 17 nodes and 4 relationships in the demo. Anchor nodes enabled efficient class-based retrieval.
ALEX:Hybrid embeddings captured multi-hop context, improving semantic search recall and relevance — the system consistently retrieved precise subgraphs that matched complex competency questions.
ALEX:The real kicker is the reasoning accuracy: representing KG data as Python static dictionaries boosted LLM multi-hop reasoning accuracy to 67.9%. That’s roughly a 50% relative improvement over natural language’s 44.6%, and more than double JSON’s 26.1%.
MORGAN:That’s huge! Getting this lift in accuracy means the RAG system is much more reliable when answering complex queries like “Which equities are regulated by the SEC and who are their issuers?”
CASEY:But those numbers come from a relatively small graph. Wonder how it scales…
ALEX:That’s fair. The book discusses scaling challenges, but this is a solid proof-of-concept demonstrating measurable gains. Latency and throughput are manageable for medium-scale graphs with efficient indexing and Cypher queries.
MORGAN:Bottom line: these improvements translate into grounded, explainable, and trustworthy AI outputs — a huge win for domains demanding auditability and compliance.
CASEY:Speaking of scaling and challenges, let’s pump the brakes a bit. Ontology-based KGs are static and schema-driven, which limits flexibility for dynamic or cyclical relationships. Many real-world domains don’t fit neatly into acyclic graphs.
MORGAN:So cycle detection and traversal become necessary, adding engineering overhead.
CASEY:Exactly. Then there’s the hybrid embeddings. Constructing multi-hop textual descriptions can introduce noise or dilute the semantic signal. It takes careful tuning to balance context depth versus embedding quality.
CASEY:Also, FAISS vector stores here run in-memory on CPU. For very large graphs or high query rates, this could become a bottleneck — requiring distributed or GPU-accelerated solutions.
MORGAN:What about the Python static dictionaries?
CASEY:They improve reasoning but might struggle with dynamic or real-time graph updates without additional engineering to keep them in sync. Plus, the approach depends heavily on accurate ontology design and consistent data ingestion pipelines — any gaps there impact downstream results.
KEITH:Casey, you’re spot on. In the book, I emphasized these limitations to temper expectations. One big mistake I see is overestimating how plug-and-play this is. Ontology design is hard work. Also, teams sometimes neglect cycle detection or underestimate embedding noise. Careful engineering and ongoing maintenance are critical.
CASEY:That honesty is refreshing. These aren’t silver bullets — they’re advanced tools requiring expertise.
SAM:Let’s talk deployment. Graph-based RAG really shines in regulated and complex domains. For example, financial services use it to answer multi-hop queries about instruments, issuers, and regulatory authorities with grounded provenance.
SAM:Healthcare applications leverage graph RAG for explainable clinical decision support, combining ontologies with patient data to provide traceable recommendations.
MORGAN:What about enterprise knowledge management?
SAM:Spot on. Enterprises integrate heterogeneous data sources — ontologies, tables, docs — into Neo4j graphs, enabling AI assistants to answer nuanced natural language queries grounded in structured knowledge.
CASEY:And semantic search engines combine vector and graph retrieval to boost precision and reduce hallucinations.
SAM:Exactly. Agentic AI systems also benefit — they need traceable, multi-step inference capabilities that pure vector search struggles to provide.
MORGAN:So the approach is production-ready and already deployed in high-stakes environments.
SAM:Yes, but with the caveats Casey mentioned — it needs careful engineering and domain expertise.
SAM:Here’s a scenario: you’re building a RAG system for a dynamic knowledge base with frequent updates and some cyclical relations. Which approach do you pick?
MORGAN:I’d argue for vector-only RAG. It’s fast, scalable, and flexible. You can update embeddings incrementally and avoid the complexity of cycle detection.
TAYLOR:But you lose precise reasoning and explainability. For regulated domains, ontology-based graph RAG with Neo4j offers stable schema and traceability. It’s worth the complexity.
CASEY:What about cyclical graphs? You could implement cycle detection and recursive traversal to handle dynamic relationships. Microsoft GraphRAG does this with global and local graph search modes.
MORGAN:True, but that pipeline is complex and may increase latency. For many use cases, the simpler ontology-driven Neo4j approach is easier to maintain.
SAM:So to sum up: use vector-only RAG for flexible, high-throughput scenarios with less need for explicit reasoning. Use ontology-based Neo4j graph RAG for compliance and explainability. Consider cyclical graphs and GraphRAG-style pipelines only if recursive reasoning is indispensable and you have engineering bandwidth.
MORGAN:And don’t forget data representation — Python static dictionaries when you want to boost reasoning accuracy.
SAM:Let’s round out with some practical tips. Start your pipeline by designing your ontology in Protégé and export it as Turtle format. Then parse RDF triples using rdflib and pandas to create clean CSVs for Neo4j ingestion.
ALEX:Use Cypher MERGE operations to enforce uniqueness constraints, avoiding graph duplication. Anchor nodes labeled :Concept help with efficient querying of class hierarchies.
SAM:Generate hybrid text descriptions with Cypher queries that combine multi-hop relationships — this is critical for embedding quality.
TAYLOR:Embed these texts locally with Sentence-Transformers all-MiniLM-L6-v2, then index with FAISS for fast semantic search.
SAM:Implement semantic search and graph expansion helpers in LangChain, combining prompt templates, RunnableLambda chains, and OpenAI chat models for end-to-end pipelines.
CASEY:And don’t hardcode secrets — use python-dotenv to manage credentials securely.
MORGAN:Avoid overloading your embeddings with too much graph context to prevent noise. Tune your top-k retrieval size carefully.
SAM:Above all, iterate on your ontology and ingestion pipeline — quality input makes all the difference.
MORGAN:Quick plug — Keith’s “Unlocking Data with Generative AI and RAG” goes far beyond what we covered today. Diagrams, deep explanations, and complete, hands-on code labs guide you through building graph-based RAG pipelines step-by-step. Worth grabbing if you want to build production-grade systems.
MORGAN:Memriq AI is an AI consultancy and content studio building tools and resources for AI practitioners.
CASEY:This podcast is produced by Memriq AI to help engineers and leaders stay current with the rapidly evolving AI landscape.
MORGAN:Head over to Memriq.ai for more AI deep-dives, practical guides, and cutting-edge research breakdowns.
SAM:Looking ahead, scaling graph-based RAG to knowledge graphs with millions of nodes and edges remains an open challenge. How do you efficiently update hybrid embeddings and vector indexes as ontologies evolve?
MORGAN:Handling dynamic or cyclical relationships without sacrificing query performance or reasoning correctness is another big hurdle.
SAM:Integrating heterogeneous data beyond ontologies—like unstructured text streams—while maintaining explainability is still an active research area.
CASEY:Improving semantic search coverage for rare or domain-specific terminology is critical too, especially in specialized domains.
SAM:Automating ontology evolution and synchronization with graph databases is needed to keep pipelines maintainable and up-to-date.
KEITH:Another emerging challenge is extending Python static dictionary representations to support dynamic, real-time KG updates without breaking reasoning pipelines.
MORGAN:And balancing latency with accuracy in production-grade RAG pipelines will continue to demand innovative engineering solutions.
MORGAN:My takeaway? Graph-based RAG is a leap forward in making AI outputs grounded and trustworthy — that jump from 44.6% to 67.9% multi-hop accuracy is no joke.
CASEY:I’m reminded that no tool is magic. Ontology design and embedding construction require craftsmanship, or your system falls short.
JORDAN:The storytelling through graph traversals unlocks richer, explainable narratives behind AI answers — that’s powerful.
TAYLOR:Architecturally, combining formal ontologies with hybrid embeddings and vector search gives you the best of structure and semantics.
ALEX:I love how Python static dictionaries let LLMs do real reasoning steps — turning retrieval into computation, not guesswork.
SAM:The practical toolbox Keith shares is a blueprint for anyone serious about building advanced RAG pipelines.
KEITH:As the author, the one thing I hope you take away is that graph-based RAG isn’t just a new technique — it’s a paradigm shift in how we ground LLMs in verified knowledge, making AI systems more reliable, explainable, and ultimately more useful.
MORGAN:Keith, thanks for giving us the inside scoop today.
KEITH:My pleasure — I hope this inspires you to dig into the book and build something amazing.
CASEY:And thanks to all our listeners for sticking with us through this deep dive.
MORGAN:Remember, we covered key concepts here, but the book goes much deeper — detailed diagrams, thorough explanations, and hands-on code labs to build this stuff yourself. Search for Keith Bourne on Amazon and grab the second edition of Unlocking Data with Generative AI and RAG.
MORGAN:Thanks for listening. See you next time on Memriq Inference Digest.
