Episode 1

Why Your AI Is Failing: The NLU Paradigm Shift CTOs Can’t Ignore

Are your AI initiatives stalling in production? This episode uncovers the critical architectural shift brought by the Natural Language Understanding (NLU) layer and why treating AI as just another feature is setting CTOs up for failure. Learn how rethinking your entire stack—from closed-world deterministic workflows to open-world AI-driven orchestration—is essential to unlock real business value.

In this episode:

- Understand the fundamental difference between traditional deterministic web apps and AI-powered conversational interfaces

- Explore the pivotal role of the NLU layer as the "brain" that dynamically interprets, prioritizes, and routes user intents

- Discover why adding an orchestrator component bridges the gap between probabilistic AI reasoning and deterministic backend execution

- Dive into multi-intent handling, partial understanding, and strategies for graceful fallback and out-of-scope requests

- Compare architectural approaches and learn best practices for building production-grade AI chatbots

- Hear about real-world deployments and open challenges facing AI/ML engineers and infrastructure teams

Key tools & technologies mentioned:

- Large Language Models (LLMs)

- Structured function calling APIs

- Conversational AI orchestrators

- 99-intents fallback pattern

- Semantic caching and episodic memory

Timestamps:

00:00 – Introduction & Why This Matters

03:30 – The NLU Paradigm Shift Explained

07:45 – The Orchestrator: Bridging AI and Backend

11:20 – Handling Multi-Intent & Partial Understanding

14:10 – Turning Fallbacks into Opportunities

16:50 – Architectural Comparisons & Best Practices

19:30 – Real-World Deployments & Open Problems

22:15 – Final Takeaways & Closing

Transcript

MEMRIQ INFERENCE DIGEST - ENGINEERING EDITION

Episode: Why Your AI Is Failing: The NLU Paradigm Shift CTOs Can’t Ignore

Total Duration:

============================================================

MORGAN:

Welcome to the Memriq Inference Digest - Engineering Edition. I'm Morgan, and this podcast is brought to you by Memriq AI, a content studio building tools and resources for AI practitioners. Whether you're engineering AI systems or managing infrastructure, we dive deep into what's really going on under the hood. Check out Memriq.ai to explore more.

CASEY:

Today, we're tackling the impact of the Natural Language Understanding layer — or NLU — when moving from the predictable world of deterministic web applications to the dynamic realm of AI-driven chatbots. It's a seismic architectural shift that's both exciting and challenging.

MORGAN:

And if you want to go even deeper — with detailed diagrams, explanations, and hands-on code labs — search for Keith Bourne on Amazon and grab the second edition of his book on RAG and AI agents. It provides significant insight into the architectures that support these concepts.

CASEY:

This episode is part of Memriq's broader effort to help practitioners get more out of their AI systems. Not understanding the NLU layer and its architectural implications is a key reason so many AI initiatives fail in production. We're covering these topics to help you avoid the mistakes others are making.

MORGAN:

That's a message worth repeating — these aren't theoretical concerns. They're the difference between AI projects that deliver value and those that stall out. Let's jump in.

JORDAN:

Here's a hard truth that CTOs need to hear — if you're still architecting your AI features as tools bolted onto a traditional web application, you're setting yourself up to fail. And this is exactly why so many enterprise AI initiatives are falling flat.

MORGAN:

That's a strong statement. What's going wrong?

JORDAN:

Most technology leaders are treating AI like they treated every other new technology — as a feature to add, a component to integrate, another API to call. But conversational AI with an NLU layer isn't a feature. It fundamentally rewires how your entire application stack operates.

CASEY:

So they're applying the old playbook to a new game?

JORDAN:

Exactly. In a traditional web app, you control everything. Users click buttons you designed, fill forms you specified, navigate paths you predetermined. The UI constrains what's possible. But the moment you introduce a conversational interface with natural language understanding, you've handed the steering wheel to your users — and most architectures simply aren't built for that.

MORGAN:

And that's causing failures?

JORDAN:

Massively. Patrick Chan, formerly a Google engineer and now CTO of Gentoro, describes how the NLU layer acts as the "brain" of these systems — it parses user input into intent and parameters, decides what needs to happen, then invokes functions to fulfill requests. That's not a helper function. That's a core system actor making decisions that your traditional architecture never anticipated.

CASEY:

So the architecture itself is the bottleneck?

JORDAN:

It's worse than a bottleneck — it's a fundamental mismatch. Dr. Rosario De Chiara, a blockchain technology lead who writes extensively on LLM-first design, puts it bluntly: the LLM becomes "a decision-maker, not a helper." Navigation through tasks becomes emergent rather than predetermined. If your architecture still assumes predetermined paths, you're fighting against the very nature of what you've built.

MORGAN:

This is the wake-up call, then. CTOs need to stop thinking about AI integration and start thinking about AI-centric architecture.

JORDAN:

That's the paradigm shift. And everything we'll discuss today flows from understanding that distinction.

TAYLOR:

Let's break down what this paradigm shift actually means for technical leaders. When you introduce an NLU layer, you're not adding a component — you're replacing the core interaction model of your application.

MORGAN:

Walk us through the key concepts.

TAYLOR:

In a traditional web application, interactions are deterministic. Ivan Westerhof, Chief Automation Officer at Scale Fast AI, explains it simply: "A website has a predefined scope. The user's actions are finite." Click button A, get result B. Every time. The UI itself enforces the boundaries of what's possible.

CASEY:

And that predictability is comfortable for architects.

TAYLOR:

Very comfortable. You can enumerate every possible path, test every scenario, guarantee outcomes. But here's what changes with NLU — as Westerhof puts it, "the user defines their own journey and experience." You've given them a text box and said, "Tell me what you want." That's not a constrained input space. That's infinite possibility.

MORGAN:

So the NLU layer has to interpret that infinite space?

TAYLOR:

Interpret it, classify it, extract parameters from it, and then route it to the appropriate backend function — all probabilistically. Mahesh Kumar, CMO of Acceldata, points out that the same query phrased differently might yield varied responses. That's fundamentally different from deterministic logic where identical inputs always produce identical outputs.

CASEY:

How should CTOs be thinking about this architecturally?

TAYLOR:

They need to understand that they're building a hybrid system now. The NLU layer provides probabilistic reasoning — it handles ambiguity, infers intent, plans multi-turn conversations. But execution remains deterministic — your APIs have defined inputs, outputs, and business logic. The challenge is orchestrating between these two fundamentally different paradigms.

MORGAN:

What does that orchestration look like in practice?

TAYLOR:

You need an orchestrator component that receives natural language, uses the LLM to figure out intent, decides which function to invoke, handles results, manages dialog state, and formulates responses. That orchestrator is the critical new component — it's the bridge between probabilistic understanding and deterministic execution.
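To make the orchestrator Taylor describes concrete, here is a minimal sketch in Python. The `nlu_parse` stub, the tool table, and the 0.7 confidence cutoff are hypothetical stand-ins for a real LLM call, real backend APIs, and whatever threshold fits your risk tolerance.

```python
# Minimal orchestrator sketch. `nlu_parse` and the tool functions are
# hypothetical stand-ins for your LLM call and backend APIs.
from dataclasses import dataclass, field

@dataclass
class DialogState:
    history: list = field(default_factory=list)   # prior user turns
    pending: list = field(default_factory=list)   # intents not yet fulfilled

def nlu_parse(utterance: str, state: DialogState) -> dict:
    """Placeholder for the probabilistic NLU step (an LLM call in practice).
    Returns an intent name, extracted parameters, and a confidence score."""
    return {"intent": "order_pizza", "params": {"size": "large"}, "confidence": 0.92}

TOOLS = {
    "order_pizza": lambda size: {"ok": True, "order_id": "A123", "size": size},
}

def orchestrate(utterance: str, state: DialogState) -> str:
    parsed = nlu_parse(utterance, state)
    state.history.append(utterance)

    # Route the probabilistic output into deterministic execution.
    tool = TOOLS.get(parsed["intent"])
    if tool is None or parsed["confidence"] < 0.7:
        return "I'm not sure I understood that. Could you tell me more?"

    result = tool(**parsed["params"])            # deterministic backend call
    if not result.get("ok"):
        return f"I couldn't complete that: {result.get('reason', 'unknown issue')}"
    return f"Done. Your order {result['order_id']} ({result['size']}) is placed."

print(orchestrate("I'd like a large pizza", DialogState()))
```

The point of the sketch is the shape, not the details: probabilistic interpretation on one side, deterministic tool execution on the other, with the orchestrator translating between them and owning dialog state.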

CASEY:

And this is what's missing from most current implementations?

TAYLOR:

Exactly. Many organizations are calling an LLM API from within their existing architecture and calling it "AI-powered." But without a proper orchestration layer that can handle the probabilistic-to-deterministic translation, manage context across turns, and gracefully handle the inevitable mismatches — they're building on sand.

MORGAN:

So the orchestrator is the key architectural addition?

TAYLOR:

It's the linchpin. And it needs to be designed from the ground up with the understanding that users will ask for things you never anticipated, in ways you never expected, and the system needs to handle that gracefully rather than catastrophically.

CASEY:

To sum it up in one breath: The NLU layer replaces rigid UI workflows with a flexible, probabilistic interface that interprets natural language and dynamically invokes backend functions — and this requires rethinking your entire architecture, not just adding an AI component.

JORDAN:

So why is this architectural shift happening now? In the past, web apps were deterministic — fixed UI elements, predefined flows, limited input scope. The NLU layer was either very simple or nonexistent.

MORGAN:

Right, if you wanted to order a pizza online, you clicked buttons for size, toppings, payment — all tightly constrained.

JORDAN:

But recent advances in LLMs have unlocked the capacity to interpret open-ended natural language queries at scale. Now users want to say, "Order me a large pepperoni with extra cheese, and add a gluten-free crust," in a single utterance instead of navigating menus.

CASEY:

And enterprises are taking notice — integrating AI-powered agents to automate complex workflows that traditional UI can't handle flexibly.

JORDAN:

The problem is old deterministic architectures can't handle this level of ambiguity or multi-intent complexity. They break when the user strays from the script. Dr. De Chiara captures this perfectly: "LLM-first systems are inherently uncertain — not just because users are unpredictable, but because the UI itself can invent new flows."

MORGAN:

The conversational interface and the AI co-create the interaction?

JORDAN:

Exactly. Which makes conventional QA and testing approaches inadequate. You can no longer enumerate all possible paths. Hence, new design paradigms are needed — probabilistic AI at the core, orchestrating function calls dynamically, with guardrails and fallbacks built into the architecture itself.

MORGAN:

So, AI/ML engineers and infrastructure teams must adopt new patterns to handle this flexible, unpredictable input reliably and at scale.

TAYLOR:

Here's how I see the fundamental shift. Previously, your architecture was UI-driven: fixed inputs, state machines, hard-coded flows. This is what we call closed-world design — all user actions are predefined, and the system only handles anticipated scenarios. The NLU layer flips that to open-world design, where the user's request, not your predefined flows, determines what happens next.

MORGAN:

That's a significant mental model shift.

TAYLOR:

It is. The NLU layer becomes the "brain" that parses free-form user input, recognizes intents and parameters, and orchestrates backend function calls dynamically. This NLU orchestrator doesn't just parse text — it decides which API to call, when, and how to handle multi-step workflows.

CASEY:

What does this architecture actually require?

TAYLOR:

Let me walk through the essential components. First, intent prioritization logic — when a user expresses multiple intents or ambiguous requests, the system needs rules for what to handle first. Second, robust context switching capability — users jump between topics, and your system needs to track where they are and where they've been.

MORGAN:

What about memory?

TAYLOR:

Critical. You need conversation memory and dialog state management. The system must remember what's been said, what's been accomplished, and what's pending. Without this, every turn feels like starting over.

CASEY:

What about the backend integration?

TAYLOR:

This is where safe-fail tool design comes in. Every API and function the orchestrator calls should be designed to return clear signals when something can't be fulfilled — not crash, not return ambiguous errors. The orchestrator needs actionable information to formulate helpful responses.

MORGAN:

And governance?

TAYLOR:

Multiple layers. Confidence thresholds to catch low-certainty interpretations before they trigger actions. Policy engines to enforce business rules — maybe certain actions require approval, or certain requests are off-limits. Content filters and oversight hooks for compliance-sensitive domains. And comprehensive logging of every decision, every function call, every fallback — because you need visibility into what this probabilistic system is actually doing.
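As a rough illustration of those governance layers, here is one way a confidence threshold and a small policy table might gate tool execution before anything runs. The intent names, threshold value, and rules are assumptions made for the example.

```python
# Hedged sketch of a governance gate in front of tool execution.
# Thresholds, policy rules, and intent names here are illustrative choices.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orchestrator.governance")

CONFIDENCE_THRESHOLD = 0.75
REQUIRES_APPROVAL = {"issue_refund", "delete_account"}   # hypothetical high-stakes intents
BLOCKED = {"export_all_customer_data"}                   # off-limits regardless of confidence

def governance_gate(intent: str, params: dict, confidence: float) -> tuple[bool, str]:
    """Return (allowed, reason). Every decision is logged for auditability."""
    if intent in BLOCKED:
        log.info("blocked intent=%s", intent)
        return False, "This action isn't permitted through the assistant."
    if confidence < CONFIDENCE_THRESHOLD:
        log.info("low-confidence intent=%s conf=%.2f", intent, confidence)
        return False, "I want to confirm I understood you correctly before acting."
    if intent in REQUIRES_APPROVAL:
        log.info("approval-required intent=%s params=%s", intent, params)
        return False, "I've queued this for human approval; you'll hear back shortly."
    log.info("allowed intent=%s conf=%.2f", intent, confidence)
    return True, ""

print(governance_gate("issue_refund", {"amount": 50}, 0.9))
```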

CASEY:

That's a substantial checklist.

TAYLOR:

It is. And that's why bolting an LLM onto existing architecture fails. You're not adding a feature — you're building a fundamentally different kind of system that requires all these components working together.

SAM:

Here's a concept that trips up a lot of technical leaders, and it comes straight from the fundamental difference between deterministic and conversational interfaces. I call it the "User Perspective versus Reality" problem.

MORGAN:

What's the core tension?

SAM:

In your organization, you know your products and services deeply. You understand the capabilities, the limitations, the edge cases. That's your "reality." In a deterministic web application, you present that reality to users through carefully designed UI — buttons, menus, workflows. Users can't ask for something that doesn't exist because there's no button for it.

CASEY:

The UI constrains the conversation.

SAM:

Exactly. But the moment you introduce a conversational AI interface, your carefully controlled "reality" gives way to the user's perspective. And here's the thing — user knowledge exists on an enormous spectrum, from complete misunderstanding of your products to potentially knowing more than your customer service representatives.

MORGAN:

That's a wide range to design for.

SAM:

It's massive. A user might ask for a feature that doesn't exist, use terminology you don't recognize, combine requests in ways you never anticipated, or reference capabilities they assume you have based on a competitor's offering. In a traditional UI, none of these scenarios would surface. In a conversational interface, they're daily occurrences.

CASEY:

So the architecture has to account for this entire spectrum?

SAM:

That's the key insight. An AI-centric application accounts for that entire spectrum of user understanding — from confusion to expertise — rather than just the deterministic functional paths you could previously dictate. Your system needs to gracefully handle users who don't understand what you offer, users who misunderstand what you offer, users who understand perfectly, and users who know edge cases you haven't documented.

MORGAN:

How does this manifest in practice?

SAM:

Consider this scenario: a user asks "Can I use feature X on product B?" In your reality, only product A supports feature X. In a traditional app, there would be no way to even ask this question — the feature X button simply wouldn't appear on product B's interface. But in a chat interface, users will absolutely ask. And your architecture needs a strategy for this.

CASEY:

Not just an error message, but an actual strategy?

SAM:

Right. And we'll talk about how to turn these moments into opportunities rather than dead ends. But the architectural implication is clear — you need robust out-of-scope detection, graceful fallback handling, and dialog management that can redirect without frustrating users.

SAM:

Building on that user perspective challenge, there's another layer of complexity that most architectures handle poorly — multi-intent queries and partial understanding.

MORGAN:

What do you mean by multi-intent?

SAM:

Users rarely ask single, clean questions. They combine requests. "Can I use feature X on product A? What about product B?" That's two related queries in one utterance. Or consider: "Book me a flight to London next Tuesday, and reserve a hotel nearby, oh and I'll need a rental car too." That's three distinct intents bundled together.

CASEY:

And the system needs to handle all of them?

SAM:

It needs to recognize all of them, prioritize them, execute them in a sensible order, and track which ones succeeded and which ones failed. If the flight books but the hotel doesn't have availability, the system can't just say "Done!" — it needs to report partial success and ask how to proceed on the hotel.

MORGAN:

That's significantly more complex than handling single requests.

SAM:

It is. And then there's context switching. A user asks about weather in London for their trip, then suddenly asks "What about flight delays at Heathrow?" A rigid system might fail because "flight delays" wasn't part of the original travel booking flow. But a context-aware system recognizes this as a related but new intent and either handles it — if it has that capability — or gracefully redirects.

CASEY:

How does the architecture support this?

SAM:

Dialog state tracking becomes essential. The system maintains a model of the conversation — what topics have been discussed, what intents are pending, what information has been collected. Intent prioritization logic decides what to tackle first when multiple requests arrive. And the orchestrator needs the flexibility to spawn multiple function calls from a single user turn, then synthesize the results.
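Here is an illustrative sketch of that flow: multiple intents extracted from one turn, executed in priority order, with per-intent results tracked so the reply can report partial success. The intent names, priorities, and tool stubs are made up for the example.

```python
# Illustrative multi-intent handling with per-intent result tracking.
PRIORITY = {"book_flight": 0, "book_hotel": 1, "book_car": 2}

def book_flight(dest, date): return {"ok": True, "ref": "FL-881"}
def book_hotel(dest, date):  return {"ok": False, "reason": "no availability"}
def book_car(dest, date):    return {"ok": True, "ref": "CAR-042"}

TOOLS = {"book_flight": book_flight, "book_hotel": book_hotel, "book_car": book_car}

def handle_turn(intents: list[dict]) -> str:
    """`intents` is what the NLU layer extracted from one utterance,
    e.g. [{"intent": "book_flight", "params": {...}}, ...]."""
    results = []
    for item in sorted(intents, key=lambda i: PRIORITY.get(i["intent"], 99)):
        outcome = TOOLS[item["intent"]](**item["params"])
        results.append((item["intent"], outcome))

    done = [name for name, r in results if r["ok"]]
    failed = [(name, r["reason"]) for name, r in results if not r["ok"]]

    reply = f"Completed: {', '.join(done)}." if done else "I couldn't complete anything yet."
    if failed:
        reply += " Still open: " + "; ".join(f"{n} ({why})" for n, why in failed) + ". How would you like to proceed?"
    return reply

print(handle_turn([
    {"intent": "book_hotel", "params": {"dest": "London", "date": "Tue"}},
    {"intent": "book_flight", "params": {"dest": "London", "date": "Tue"}},
    {"intent": "book_car", "params": {"dest": "London", "date": "Tue"}},
]))
```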

MORGAN:

What about when the system only partially understands a request?

SAM:

This is where it gets interesting. The user says something, and the NLU layer extracts some intents and parameters with high confidence, but others are ambiguous. A well-designed system doesn't just fail or guess — it acknowledges what it understood, acts on the confident parts, and asks clarifying questions about the uncertain parts.

CASEY:

So it's transparent about its understanding?

SAM:

Exactly. "I can book your flight to London for next Tuesday. For the hotel, did you want something near the airport or in central London?" The system demonstrates competence on what it grasped while surfacing ambiguity constructively. This requires the architecture to support partial execution and graceful clarification flows — not all-or-nothing processing.

MORGAN:

That's a very different interaction model than traditional applications.

SAM:

It is. And it demands architectural support for parallel intent processing, partial completion states, and dynamic clarification dialogs. Most current implementations don't have this sophistication — they either try to handle everything at once and fail, or they force users into one-intent-at-a-time interactions that feel painfully rigid.

SAM:

Now here's where I want to challenge the defensive mindset most teams bring to fallback handling. When a user asks for something you can't provide, most systems return some variation of "Sorry, that's not available." That's a wasted opportunity.

MORGAN:

Wasted how?

SAM:

Consider the cost of getting that user to this moment of interaction. The marketing spend, the product design, the email campaigns, the paid advertisements, PR efforts, direct sales outreach, new customer incentives, follow-up nurturing — all of that represents significant investment in bringing this person to your application.

CASEY:

And they're asking for something.

SAM:

They're 95% of the way to finding a solution. They've articulated a need. They've engaged with your system. And if their request doesn't match exactly what you offer, but you have something close — something that could fulfill their underlying need — you have a golden opportunity to present it.

MORGAN:

So fallbacks become sales opportunities?

SAM:

Exactly. If you have similar products with somewhat similar functionality — not exact matches, but in the range of fulfilling user needs — your fallback handler shouldn't say "Sorry, this isn't available." It should say, "Product B doesn't support feature X, but Product A does — and it also includes these additional capabilities that might interest you."

CASEY:

That's a fundamentally different architectural approach to fallbacks.

SAM:

It requires your fallback logic to be product-aware, to understand relationships between offerings, and to have enough context to make relevant suggestions. This isn't just error handling — it's intelligent redirection. And it dramatically changes the ROI calculation on your AI investment.

MORGAN:

The system needs to know not just what failed, but what alternatives exist.

SAM:

And present them persuasively. This is why the NLU layer and orchestrator need deep integration with your product catalog and business logic — not just your APIs. You're building a system that can recover gracefully and add value even when the initial request can't be fulfilled.

TAYLOR:

Let's compare some leading approaches. On one end, you have traditional deterministic web workflows — rigid, predictable, easy to test, but limited in handling varied user input.

MORGAN:

That's a big problem for reliability.

TAYLOR:

Exactly. Now, when you add structured function calling, the LLM outputs structured calls like order_pizza(size=large), improving precision. Adding an orchestrator on top of this manages these calls, maintains dialog state, and handles fallback intents.
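For readers who want to see what "structured" means in practice, here is one common shape for a tool definition plus a validation step before execution. Several function-calling APIs accept a JSON-schema layout like this, but check your provider's exact format; the schema and field names here are assumptions.

```python
# One way to declare a structured-calling tool and validate a model's call
# before executing it. The schema shape is illustrative, not provider-specific.
ORDER_PIZZA_TOOL = {
    "name": "order_pizza",
    "description": "Place a pizza order",
    "parameters": {
        "type": "object",
        "properties": {
            "size": {"type": "string", "enum": ["small", "medium", "large"]},
            "toppings": {"type": "array", "items": {"type": "string"}},
            "crust": {"type": "string", "enum": ["regular", "thin", "gluten-free"]},
        },
        "required": ["size"],
    },
}

def validate_call(call: dict) -> dict:
    """Check the model's emitted arguments against the schema before executing."""
    schema = ORDER_PIZZA_TOOL["parameters"]
    args = call.get("arguments", {})
    missing = [k for k in schema["required"] if k not in args]
    if missing:
        return {"ok": False, "reason": f"missing parameters: {missing}"}
    size = args.get("size")
    if size not in schema["properties"]["size"]["enum"]:
        return {"ok": False, "reason": f"unsupported size: {size}"}
    return {"ok": True, "args": args}

# A model response of the form {"name": "order_pizza", "arguments": {...}}
print(validate_call({"name": "order_pizza",
                     "arguments": {"size": "large", "toppings": ["pepperoni"]}}))
```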

CASEY:

What about different frameworks and approaches?

TAYLOR:

The Deepset team builds flexible agent frameworks focused on retrieval-augmented generation and multi-step tool usage. They emphasize that this isn't binary — it's a spectrum from fully deterministic to fully agentic, and most production systems sit somewhere in between. That approach works well for complex multi-turn workflows but requires more engineering to build robust orchestrators.

MORGAN:

And Sunil Ramlochan's work at PromptEngineering.org?

TAYLOR:

His Context-Aware Conversational AI Framework focuses on intent recognition, context management, and graceful handling when queries fall outside service scope. He emphasizes providing alternative recommendations rather than dead-end responses — which ties directly to the opportunity framing we just discussed.

CASEY:

So decision criteria for CTOs?

TAYLOR:

Use deterministic workflows when user inputs are limited and predictable — like basic forms or menus. Use LLM chatbots without structured outputs for exploratory prototypes or low-stakes assistance where consistency isn't critical.

MORGAN:

And structured function calling plus orchestrators?

TAYLOR:

That's your go-to for production-grade AI chatbots needing precise backend action, multi-intent handling, and graceful fallbacks — think enterprise assistants or commerce bots.

CASEY:

And hybrid approaches blending AI flexibility with deterministic guardrails are increasingly common for compliance-sensitive domains.

TAYLOR:

Right. As Mahesh Kumar notes, "embedding an LLM in a business process means redefining how tasks are routed, governed, and interpreted" — not eliminating deterministic logic, but thoughtfully combining both paradigms.

ALEX:

Now, let's get technical. How does the NLU layer work under the hood in an AI-driven chatbot architecture?

MORGAN:

So the orchestrator is the glue between the NLU's probabilistic outputs and deterministic backend APIs.

ALEX:

Exactly. The NLU layer turns a request like a travel booking into structured function calls, and the orchestrator receives those calls and manages their lifecycle: sending requests to airline and hotel booking APIs, handling asynchronous responses, and maintaining dialog state to track successful bookings or errors.

MORGAN:

What about the division of responsibility we've been discussing?

ALEX:

This is crucial to understand. The AI handles understanding and phrasing — interpreting what the user wants and formulating natural responses. The tools handle execution and business rules — actually performing actions and enforcing constraints. This division is what allows the system to cover a wide space of user requests while maintaining reliability.

CASEY:

Why does that division matter so much?

ALEX:

Because it lets each component do what it's best at. The LLM excels at handling linguistic variation — a hundred different ways to ask for a pizza all get correctly mapped to the same function call. But the LLM shouldn't be deciding business logic like pricing or inventory availability. That stays in deterministic code where it can be tested, audited, and guaranteed consistent.

MORGAN:

What about supporting components?

ALEX:

Semantic caching can store responses to similar queries, reducing latency for common requests. Episodic memory — records of past orchestration decisions — helps the system learn which paths worked for similar queries. These components add determinism back into the probabilistic system where it's beneficial.
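As a toy illustration of semantic caching, the sketch below matches an incoming query against cached ones and returns a stored response on a close match. A real system would use an embedding model and a vector store; the bag-of-words similarity here only keeps the example self-contained.

```python
# Toy semantic cache. Real systems use embeddings and a vector store;
# the bag-of-words cosine similarity here just keeps the sketch runnable.
import math
from collections import Counter

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    common = set(a) & set(b)
    num = sum(a[t] * b[t] for t in common)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.entries: list[tuple[Counter, str]] = []
        self.threshold = threshold

    def get(self, query: str) -> str | None:
        qv = _vec(query)
        best = max(self.entries, key=lambda e: _cosine(qv, e[0]), default=None)
        if best and _cosine(qv, best[0]) >= self.threshold:
            return best[1]          # cache hit: skip the LLM call entirely
        return None

    def put(self, query: str, response: str) -> None:
        self.entries.append((_vec(query), response))

cache = SemanticCache()
cache.put("what are your opening hours", "We're open 9am-9pm daily.")
print(cache.get("what are your opening hours today"))  # close enough: hit
```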

CASEY:

And monitoring?

ALEX:

Logging every LLM output, function call, and fallback invocation is critical. Engineers track metrics like function call accuracy, fallback rates, and confidence scores to iteratively improve prompts, API design, and fallback policies.

MORGAN:

And API design itself matters?

ALEX:

Crucially. Each tool or API call should be designed to fail safely. If given a request it can't fulfill, it should return a clear signal or message — not crash or return ambiguous errors. The LLM can then use that signal to formulate a helpful reply. This safe-fail design is what enables graceful degradation across the entire system.

MORGAN:

Fascinating — this NLU-orchestrator-API triad is the backbone of modern chatbot architecture.

SAM:

Let's talk about one of the most challenging aspects of NLU-driven systems — handling requests that fall outside what your system can do. This is fundamentally different from traditional applications.

MORGAN:

How so?

SAM:

In a GUI, if a feature isn't available, the user simply doesn't see a button for it. Problem avoided. But in a conversational system, research literature emphasizes that because "user input utterances are arbitrary, not all queries can be answered." Users will absolutely ask for things you don't support, in ways you never anticipated. And detecting these out-of-scope requests reliably is essential — but difficult.

CASEY:

What goes wrong when systems don't handle this well?

SAM:

Two failure modes. False positives — misclassifying a valid request as out-of-scope, rejecting something you could have handled. That frustrates users. False negatives — failing to recognize an unsupported request and either hallucinating an answer or executing the wrong function. That causes real problems.

MORGAN:

So what's the solution framework?

SAM:

Multiple layers working together. First, fallback and refusal behaviors need to be built into the architecture explicitly. And here's a key point from the research — this logic should be implemented in the tool layer rather than leaving the LLM to guess the response. The function for "do X" should check the product parameter against a list of supported products and return a polite, structured error if not found. The LLM's job is to route correctly; the deterministic code handles the actual validation.

CASEY:

So the LLM still understands the request?

SAM:

Right. The LLM might correctly parse "I want feature X on product B" and decide to call do_X(product=B). But the function itself knows product B doesn't support X and returns a structured "not supported" result with suggestions. The LLM then uses that to formulate a helpful response. This keeps the probabilistic and deterministic responsibilities cleanly separated.
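A minimal sketch of that safe-fail tool, using the hypothetical feature X and products A and B from the example: the function never throws for an unsupported product; it returns a structured signal plus a suggestion the LLM can build a helpful reply around.

```python
# Safe-fail tool sketch for the "feature X on product B" scenario.
# The product catalog, feature names, and alternatives table are made up.
SUPPORTED = {
    "feature_x": {"product_a"},     # only product A supports feature X
}
ALTERNATIVES = {
    ("feature_x", "product_b"): "product_a",
}

def do_x(product: str) -> dict:
    """Never raises for an unsupported product; returns a structured signal
    the LLM can turn into a helpful, product-aware reply."""
    if product in SUPPORTED["feature_x"]:
        return {"ok": True, "detail": f"feature_x enabled on {product}"}
    suggestion = ALTERNATIVES.get(("feature_x", product))
    return {
        "ok": False,
        "code": "not_supported",
        "message": f"{product} does not support feature_x",
        "suggestion": suggestion,   # lets the response propose product A instead
    }

print(do_x("product_b"))
```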

MORGAN:

What about the 99-intents pattern we've mentioned?

SAM:

This is Ivan Westerhof's concept — deliberately creating intents whose purpose is to attract out-of-scope questions and define the correct next steps. Rather than one generic "I don't understand" fallback, you design specific intents to catch likely categories of unsupported queries.

CASEY:

Can you give examples?

SAM:

A 99-intent for competitor product questions might respond: "I can only help with our products, but here's how our offering compares to what you mentioned." A 99-intent for feature requests might say: "That feature isn't available yet, but I can add your interest to our product feedback." A 99-intent for questions outside your domain entirely might offer to connect to a human agent.
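A rough sketch of how those category-specific fallback intents might be routed; the intent names and canned replies simply mirror the examples above and are not a prescribed taxonomy.

```python
# Routing sketch for "99-intents": fallback intents that catch categories
# of out-of-scope requests and return a constructive next step.
FALLBACK_HANDLERS = {
    "oos_competitor_product": lambda q: (
        "I can only help with our products, but here's how our offering "
        "compares to what you mentioned."
    ),
    "oos_missing_feature": lambda q: (
        "That feature isn't available yet, but I can log your interest "
        "as product feedback."
    ),
    "oos_out_of_domain": lambda q: (
        "That's outside what I can help with. Would you like me to connect "
        "you to a human agent?"
    ),
}

def respond(intent: str, query: str, in_scope_handler) -> str:
    # Fallback intents get purpose-built handlers instead of a generic apology.
    if intent in FALLBACK_HANDLERS:
        return FALLBACK_HANDLERS[intent](query)
    return in_scope_handler(query)

print(respond("oos_missing_feature", "Can you export to XYZ format?", lambda q: "..."))
```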

MORGAN:

So you're designing for failure cases proactively.

SAM:

Exactly. And Westerhof emphasizes an important point — don't just say "please rephrase" when the issue isn't phrasing. If the user clearly asked for something you don't support, asking them to rephrase is frustrating and unhelpful. Acknowledge the limitation directly and offer a path forward.

CASEY:

What about graceful degradation more broadly?

SAM:

The system should always leave the user better off than a dead end. Sunil Ramlochan's best-practice guide emphasizes that when queries fall outside service scope, the bot should "provide alternative recommendations to users when necessary." This connects back to the golden opportunity framing — every out-of-scope request is a chance to redirect constructively, not just apologize.

MORGAN:

So the solutions stack is: safe-fail tool design, 99-intents for category-specific handling, and constructive redirection as the default behavior.

SAM:

Right. And all of this needs architectural support — your NLU training data needs out-of-scope examples, your orchestrator needs routing logic for fallback intents, and your response generation needs to be helpful rather than dismissive.

ALEX:

Let's talk outcomes. Deployments using structured function calling with orchestrators report significant reductions in misclassified intents compared to free-text LLM outputs — often in the 30-50% range.

MORGAN:

That's a meaningful improvement for accuracy.

ALEX:

It is. Plus, fallback invocation rates tend to drop, meaning the system handles edge cases more gracefully, improving user satisfaction.

CASEY:

What about latency?

ALEX:

Latency varies. The LLM inference plus orchestration adds some overhead — typically 500 to 1500 milliseconds per user turn — but smart caching and asynchronous calls mitigate this. Semantic caching can dramatically reduce latency for common query patterns. It's an acceptable trade-off for richer capabilities.

MORGAN:

And error rates?

ALEX:

Function execution errors decrease due to structured calls and better API design. Hybrid AI-deterministic architectures maintain compliance and trust, which is critical in regulated domains.

CASEY:

Iterative monitoring and prompt tuning also improve these metrics over time?

ALEX:

Absolutely. The Deepset Team notes that building AI products tends to be iterative — teams often start with a simple, constrained version and gradually allow more AI autonomy as they learn user needs and model behaviors. Early on, a more deterministic approach might prevail, and over time the system can move toward more agency in areas where flexibility clearly benefits the user.

MORGAN:

So the payoff is better UX, broader capabilities, and operational reliability, balanced against some complexity and latency costs.

CASEY:

Time for some skepticism. As promising as this looks, probabilistic NLU introduces unpredictability that complicates exhaustive testing.

MORGAN:

Because you can't anticipate every natural language variation?

CASEY:

Exactly. Research literature emphasizes this challenge — because user input utterances are arbitrary, not all queries can be answered, and queries that don't fall into any supported intent are defined as out-of-scope. Detecting these reliably is essential but difficult.

ALEX:

Out-of-scope detection remains a headache — false positives can frustrate users by rejecting valid requests, while false negatives risk executing wrong functions or generating nonsense responses.

CASEY:

And structured function calling requires careful schema design. If the API doesn't handle unexpected parameters gracefully, the whole system can fail. Patrick Chan points out the mapping challenge — users speak in high-level concepts, APIs expect low-level parameters, and mismatches cause failures.

ALEX:

Hybrid architectures add complexity — more components to monitor and maintain, increasing operational overhead. Not to mention the need for continuous human-in-the-loop interventions in compliance-critical environments.

MORGAN:

What about user expectations?

CASEY:

They often overshoot system capabilities. Without clear fallback messaging — and this is where the 99-intents pattern helps — users get confused or frustrated when the bot refuses a request or just asks them to rephrase.

SAM:

And as Sunil Ramlochan emphasizes, when queries fall outside service scope, the bot should provide alternative recommendations rather than dead ends. That requires more sophisticated response generation than most teams initially build.

MORGAN:

So while this approach unlocks new capabilities, it demands rigorous engineering discipline and ongoing maintenance.

SAM:

Let's ground this in real-world deployments.

MORGAN:

So these architectures aren't just theoretical — they're powering critical, production-grade systems across sectors.

SAM:

Here's a challenge scenario — a user requests a multi-product booking that partially exceeds system capabilities.

MORGAN:

Approach one: pure deterministic UI with fixed workflows. The user is blocked from unsupported features, leading to frustration.

CASEY:

Approach two: LLM chatbot with free-text input but no structured output handling. It tries to parse and execute the request but risks hallucinating actions or making inconsistent API calls, causing failures.

ALEX:

Approach three: LLM with structured function calling plus orchestrator, 99-intents pattern, and multi-intent handling. It dynamically parses all intents, issues precise API calls for what it can handle, detects unsupported features via specialized fallback intents, maintains dialog state, and reports partial success while suggesting alternatives for what couldn't be fulfilled.

SAM:

The trade-off? Approach three is more complex to implement and monitor but dramatically improves user success rates and reduces error frequency.

CASEY:

But it demands sophisticated fallback handling, confidence scoring, partial completion logic, and product-aware redirection.

MORGAN:

So, the key question — do you prioritize simplicity and predictability or flexibility and richer capabilities?

SAM:

For mission-critical systems with diverse user needs, approach three is increasingly the only viable option. The investment in architectural complexity pays off in user satisfaction and conversion.

SAM:

Let's talk actionable patterns. First, implement structured function calling to convert LLM outputs into precise backend calls. This is foundational for reliable execution.

ALEX:

Design an orchestrator service to manage calls, maintain dialog state, and handle fallback logic. This is the critical bridge between probabilistic understanding and deterministic execution.

SAM:

Build your APIs and tools to fail safely — return clear, structured signals when requests can't be fulfilled, not crashes or ambiguous errors. The orchestrator needs actionable information.

CASEY:

Implement the 99-intents pattern — create specialized intents to catch categories of out-of-scope queries and route them to helpful handlers rather than generic error messages.

TAYLOR:

Design your fallbacks as opportunities, not dead ends. When users ask for unsupported features, redirect them to alternatives that can fulfill their underlying needs.

MORGAN:

Architect for multi-intent queries from the start. Users will combine requests, and your system needs intent prioritization, parallel processing capability, and partial completion handling.

SAM:

Build robust dialog state management. Track what's been discussed, what's pending, what's succeeded, and what's failed. Context switching should feel natural, not jarring.

TAYLOR:

Apply confidence thresholds and policy engines to enforce business rules and safety — for example, rejecting function calls below a confidence score or requiring human review for high-stakes actions.

ALEX:

Add semantic caching and episodic memory to bring determinism back where it helps — reducing latency for common queries and learning from past orchestration decisions.

SAM:

Monitoring is essential. Log every LLM output, function invocation, fallback, and error to enable iterative improvements. Track fallback invocation rates — they reveal blind spots.
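As a small illustration, the counters below track two of the metrics mentioned, fallback rate and function-call success rate. In production these would feed a real metrics backend; the class and field names are placeholders.

```python
# Minimal metrics sketch for fallback rate and function-call success rate.
from collections import defaultdict

class TurnMetrics:
    def __init__(self):
        self.counts = defaultdict(int)

    def record(self, *, fallback: bool, call_ok: bool | None) -> None:
        self.counts["turns"] += 1
        if fallback:
            self.counts["fallbacks"] += 1
        if call_ok is not None:
            self.counts["calls"] += 1
            if call_ok:
                self.counts["calls_ok"] += 1

    def summary(self) -> dict:
        turns = self.counts["turns"] or 1
        calls = self.counts["calls"] or 1
        return {
            "fallback_rate": self.counts["fallbacks"] / turns,
            "call_success_rate": self.counts["calls_ok"] / calls,
        }

m = TurnMetrics()
m.record(fallback=False, call_ok=True)
m.record(fallback=True, call_ok=None)
print(m.summary())   # {'fallback_rate': 0.5, 'call_success_rate': 1.0}
```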

ALEX:

Start small with simple intents and build up complexity. The Deepset Team recommends beginning with more deterministic approaches and gradually allowing more AI autonomy as you learn. Avoid overly broad API schemas that increase failure surfaces.

MORGAN:

And test extensively — simulate edge cases, multi-intent queries, and out-of-scope requests to stress your fallback mechanisms and partial understanding flows.

MORGAN:

Quick plug — if you want a comprehensive guide on RAG and AI agents with practical code examples and diagrams that illuminate these architectural concepts, grab Keith Bourne's second edition on Amazon. It's a goldmine for engineers wanting to build or optimize AI-driven systems.

MORGAN:

Memriq AI is an AI consultancy and content studio building tools and resources for AI practitioners.

CASEY:

This podcast is produced by Memriq AI to help engineers and leaders stay current with the rapidly evolving AI landscape.

MORGAN:

Head to Memriq.ai for more AI deep-dives, practical guides, and cutting-edge research breakdowns.

SAM:

Despite progress, several open problems remain. Out-of-scope intent detection still struggles with false positives and negatives, causing user confusion or erroneous actions.

TAYLOR:

Balancing AI autonomy with deterministic governance without introducing excessive complexity remains a tricky design challenge. As Mahesh Kumar notes, we're still learning how to redefine task routing and governance for AI-embedded processes.

ALEX:

Handling multi-intent queries and seamless context switching in real-time remains an active research and engineering frontier. Dr. De Chiara points to the challenge that the UI itself can invent new flows — and we're still developing patterns to manage this emergent behavior.

CASEY:

Designing APIs and chatbot tools that fail safely and provide clear, actionable signals to the orchestrator is underexplored. Patrick Chan's work on user-aligned functions is a step forward, but more standardization is needed.

MORGAN:

We also lack widely adopted standards and best practices for AI plus tool orchestration architectures, making engineering more ad hoc than it should be.

SAM:

And scaling monitoring and feedback loops to continuously improve system behavior at production scale is a major operational hurdle.

JORDAN:

Most critically, many CTOs still haven't recognized that this is a paradigm shift, not a feature addition. Until leadership understands they need to rearchitect rather than integrate, AI initiatives will continue to underperform.

MORGAN:

My key takeaway — CTOs, this is your wake-up call. If you're still treating AI as a bolt-on feature, you're architecting for failure. The NLU layer fundamentally changes how your application stack operates.

JORDAN:

The shift from closed-world to open-world design — from predefined user paths to emergent, co-created interactions — demands rethinking architecture from the ground up. You're building an evolving conversation engine, not adding a feature.

CASEY:

Don't underestimate the complexity of multi-intent handling and partial understanding. Users combine requests, switch contexts, and express themselves ambiguously. Your architecture needs to handle all of it gracefully.

TAYLOR:

The solutions framework matters — safe-fail APIs, 99-intents for proactive fallback handling, and constructive redirection as the default. These aren't nice-to-haves; they're requirements for production reliability.

SAM:

Turn your fallbacks into opportunities. When users ask for something you can't provide, don't give them a dead end — give them an alternative that fulfills their underlying need. That's where ROI lives.

ALEX:

Under the hood, the division of responsibility is key — AI handles understanding and phrasing, tools handle execution and business rules. Keep that separation clean, and your system stays reliable at scale.

JORDAN:

And finally, continuous iteration — monitoring, feedback, and human-in-the-loop — is essential to tame the inherent uncertainty of natural language. This is a living system, not a static deployment.

MORGAN:

Thanks for tuning into Memriq Inference Digest - Engineering Edition. This has been a deep dive into the transformative impact of the NLU layer in AI chatbot architectures — and why it demands a fundamental rethink from technical leadership.

CASEY:

Remember, while the tech is exciting, engineering discipline, architectural clarity, and realistic expectations are key to success.

MORGAN:

See you next time for more AI engineering insights. Cheers!

CASEY:

Goodbye, and keep building smart!

About the Podcast

The Memriq AI Inference Brief – Engineering Edition
RAG pipelines, agent memory, knowledge graphs — the technical details that matter. Let's dig in.

About your host

Memriq AI

Keith Bourne (LinkedIn handle – keithbourne) is a Staff LLM Data Scientist at Magnifi by TIFIN (magnifi.com), founder of Memriq AI, and host of The Memriq Inference Brief—a weekly podcast exploring RAG, AI agents, and memory systems for both technical leaders and practitioners. He has over a decade of experience building production machine learning and AI systems, working across diverse projects at companies ranging from startups to Fortune 50 enterprises. With an MBA from Babson College and a master's in applied data science from the University of Michigan, Keith has developed sophisticated generative AI platforms from the ground up using advanced RAG techniques, agentic architectures, and foundational model fine-tuning. He is the author of Unlocking Data with Generative AI and RAG (2nd edition, Packt Publishing)—many podcast episodes connect directly to chapters in the book.