Episode 2
Agent Engineering Unpacked: New Discipline or Just Hype?
Is agent engineering the next big AI discipline or a repackaged buzzword? In this episode, we cut through the hype to explore what agent engineering really means for business leaders navigating AI adoption. From market growth and real-world impact to the critical role of AI memory and the evolving tool landscape, we provide a clear-eyed view to help you make strategic decisions.
In this episode:
- The paradox of booming agent engineering markets despite high AI failure rates
- Why agent engineering is emerging now and what business problems it solves
- The essential role of AI memory systems and knowledge graphs for real impact
- Comparing agent engineering frameworks and when to hire agent engineers vs ML engineers
- Real-world success stories and measurable business payoffs
- Risks, challenges, and open problems leaders must manage
Key tools and technologies mentioned: LangChain, LangMem, Mem0, Zep, Memobase, Microsoft AutoGen, Semantic Kernel, CrewAI, OpenAI GPT-4, Anthropic Claude, Google Gemini, Pinecone, Weaviate, Chroma, DeepEval, LangSmith
Timestamps:
00:00 – Introduction & Why Agent Engineering Matters
03:45 – Market Overview & The Paradox of AI Agent Performance
07:30 – Why Now: Technology and Talent Trends Driving Adoption
11:15 – The Big Picture: Managing AI Unpredictability
14:00 – The Memory Imperative: Transforming AI Agents
17:00 – Knowledge Graphs & Domain Expertise
19:30 – Framework Landscape & When to Hire Agent Engineers
22:45 – How Agent Engineering Works: A Simplified View
26:00 – Real-World Payoffs & Business Impact
29:15 – Reality Check: Risks and Limitations
32:30 – Agent Engineering in the Wild: Industry Use Cases
35:00 – Tech Battle: Agent Engineers vs ML Engineers
38:00 – Toolbox for Leaders: Strategic Considerations
41:00 – Book Spotlight & Sponsor Message
43:00 – Open Problems & Future Outlook
45:00 – Final Words & Closing Remarks
Resources:
- "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition
- This podcast is brought to you by Memriq.ai - AI consultancy and content studio building tools and resources for AI practitioners.
Thanks for tuning into Memriq Inference Digest - Leadership Edition. Stay curious and keep building!
Transcript
MEMRIQ INFERENCE DIGEST - LEADERSHIP EDITION
Episode: Agent Engineering Unpacked: New Discipline or Just Hype?
============================================================
MORGAN:Welcome to Memriq Inference Digest - Leadership Edition, the podcast where we unpack the latest in AI with a sharp focus on what it means for leaders like you. This show is brought to you by Memriq AI, a content studio dedicated to building tools and resources for AI practitioners. Check them out at Memriq.ai for some truly insightful deep-dives.
CASEY:Today, we're diving into a hot topic: agent engineering — is it truly a new discipline shaping the future of AI, or just a fancy rebranding of existing hype? It's a question that's on the lips of many VPs, heads of product, and founders trying to navigate the AI landscape without getting lost in the technical weeds.
MORGAN:And if you want to go deeper — like, really deep — with clear diagrams, thorough explanations, and hands-on labs, you should definitely check out the second edition of Keith Bourne's book "Unlocking Data with Generative AI and RAG." Just search his name on Amazon. It's a fantastic resource for understanding this space beyond the headlines.
CASEY:This episode builds on our recent deep-dive into the NLU layer—that's the Natural Language Understanding component that fundamentally changes how users interact with your applications. If you caught that episode, you know we talked about the paradigm shift from predictable, button-click interfaces to open-ended conversational systems. That shift is exactly why agent engineering has emerged as a discipline.
MORGAN:Right—we discussed how the NLU layer acts as a "brain" that interprets natural language and decides what actions to take. Agent engineering is essentially the craft of building that brain and everything it connects to.
CASEY:Over the next 45 minutes, we'll explore why agent engineering is gaining so much attention, what it actually means for your business, the complete landscape of tools and frameworks—not just one vendor's solution—and the critical role of AI memory systems that most leaders overlook. Plus, we'll push back on the hype and discuss the risks. Ready?
MORGAN:Let's crack on!
JORDAN:Here's something that might surprise you — despite AI agents currently failing nearly 70% of real-world tasks according to a Carnegie Mellon benchmark, the market for agent engineering is exploding. We're looking at projections soaring from $5.4 billion in 2024 all the way up to an eye-watering $199 billion by 2034.
MORGAN:Wait, hold on — they're failing 70% of tasks, but the market is booming? That's quite the paradox.
CASEY:Exactly — it's like investing billions in jet engines that stall most of the time but somehow promise to revolutionize air travel. There's massive confidence, but also huge risk.
JORDAN:And investors are putting their money where their mouths are. LangChain recently raised $125 million to build a whole platform around agent engineering. That's not small change.
MORGAN:Right — and companies like Sierra are already showing that AI agents can double or even triple resolution rates compared to traditional chatbots. Klarna's AI assistant handles 2.3 million customer conversations a month — that's the equivalent of 700 full-time human agents — and it's driving a $40 million profit improvement for them.
CASEY:The talent market reflects this too. Agent engineers at well-funded startups are commanding $240,000 to $360,000 annually—some of the highest compensation in tech outside of executive roles. That's what happens when demand massively outstrips supply for a skillset that barely existed two years ago.
MORGAN:So on one hand, the tech is far from perfect. On the other, early deployments are delivering impressive business value. That's the tension leaders need to understand.
JORDAN:Exactly. If you don't get this balance, you could either miss out on a game-changing advantage or overinvest in something that doesn't yet deliver.
CASEY:Here's the short and sweet of it — Agent engineering is the craft of building autonomous AI systems that perceive their environment, reason through complex information, and take actions to achieve goals in unpredictable settings.
MORGAN:So it's not just about AI spitting out answers, but AI actually doing things — making decisions and interacting with tools.
CASEY:Spot on. It's a blend of product thinking, software engineering, and data science working together to wrangle the inherent unpredictability of AI behavior.
MORGAN:And in plain terms, "unpredictability" here means AI doesn't always give the same answer or make the same move each time — it's like managing a team that sometimes improvises.
CASEY:Exactly. Forecasts say by 2028, AI agents will autonomously make 15% of enterprise decisions — up from basically zero this year. Gartner predicts 33% of enterprise software will include agentic AI by that same date. That's a massive shift in how companies operate.
MORGAN:If you remember nothing else, remember this: Agent engineering is about creating AI that acts, not just talks, and that's transforming business decision-making.
JORDAN:Let's set the stage — before 2023, AI was mostly stuck in predictable boxes: rule-based automation and predictive models that couldn't take independent action. Think of them like calculators — reliable, but can't decide what operation to perform on their own.
CASEY:So the AI we had was great at specific tasks, but lacked the flexibility and autonomy needed for complex workflows.
JORDAN:That all changed with the release of function calling for GPT-4 in June 2023, which allowed AI models to interact reliably with external tools — like plugging a Swiss army knife into an AI brain. Suddenly, the impressive demos became production-ready agents.
MORGAN:And that's huge — function calling means the AI can not only generate text but also trigger actions, access databases, or even control other software, all guided by its understanding of the task.
JORDAN:Add to that rapid advances in foundation models like GPT-4, Claude 3 and 3.5, and Gemini, bigger "context windows" — which you can think of as the AI's working memory — and significant API cost reductions. These factors combined have lowered the economic barriers to deploying agents widely.
CASEY:To put numbers on it: $3.8 billion was raised by AI agent startups just in 2024, tool use accuracy for Anthropic's Claude 3 is over 90%, and API costs have dropped by 75% between 2024 and 2025. That's why the market is charging ahead.
JORDAN:Plus, job listings show over 6,200 AI agent engineer roles open right now — enterprises are actively hiring to build these systems.
MORGAN:Clearly, it's not just hype — the tech, the market, and the talent are aligning to make agent engineering a strategic priority.
TAYLOR:Let's unpack what agent engineering really means under the hood — but without getting lost in technical jargon. At its core, agent engineering is about managing AI's uncertainty and unpredictability when it interacts with the real world — think of it like managing a team of creative but unpredictable employees.
CASEY:Unlike traditional software where you get the same output every time, AI agents produce varying responses — meaning their answers can differ given the same input, much like how a person might answer differently on different days.
TAYLOR:Exactly. This connects directly to what we covered in the NLU episode—the shift from "closed-world design" where users click predefined buttons, to "open-world design" where users can say anything in natural language. That's the fundamental architectural change agent engineers are dealing with.
MORGAN:So the NLU layer we discussed is actually the gateway that creates this unpredictability?
TAYLOR:Precisely. And engineers need to design systems that can handle this variability gracefully. Key architectural approaches include cyclic reasoning loops — this is like the agent continually reassessing its situation and adjusting its plan — and tool selection optimization, which means choosing the right external tools for the task at hand.
MORGAN:And you mentioned "human-in-the-loop" — that sounds important.
TAYLOR:It is. It means involving human oversight in AI decision-making, especially for sensitive or complex cases. It's like having a manager review critical decisions before they're finalized. This balance ensures reliability while scaling AI autonomy.
CASEY:So the discipline blends product strategy, engineering rigor, and data science to create AI agents that are adaptable but still trustworthy.
TAYLOR:Yes, and that's fundamentally different from traditional software or machine learning engineering. The focus is on orchestration — coordinating multiple AI models, tools, and workflows — rather than building single, predictable systems.
MORGAN:So agent engineering is the evolution of AI from a tool that informs to a system that acts autonomously in uncertain environments.
SAM:Now here's something most leadership discussions about AI agents completely overlook: memory architecture. And it might be the most important factor determining whether your agent investments succeed or fail.
MORGAN:Memory? That sounds very technical. Why should leaders care?
SAM:Think about it this way: imagine hiring a brilliant executive assistant who forgets everything about you, your preferences, your past conversations, and your instructions every single time you interact with them. That's what most AI agents do today—they're goldfish. Brilliant goldfish that can analyze and respond, but goldfish nonetheless.
CASEY:So memory is what transforms agents from one-shot responders to actual assistants that learn?
SAM:Exactly. Memory transforms AI agents from simple, reactive tools into dynamic, adaptive assistants. Without it, agents must rely entirely on what's provided in a single session, limiting their ability to improve over time or personalize their responses.
MORGAN:What types of memory are we talking about?
SAM:There's a framework from Princeton researchers called CoALA—Cognitive Architectures for Language Agents—that breaks this down beautifully. It distinguishes procedural memory, which is knowing how to do things; semantic memory, which is facts about the world; and episodic memory, which is records of past experiences. Think of it like how human memory works, but for AI.
CASEY:Procedural memory sounds like the agent actually learning to do its job better?
SAM:Precisely. LangChain has developed something called LangMem specifically for this. It identifies patterns in how the agent performs and automatically updates the agent's operating instructions to reinforce effective behaviors. It's like the agent is continuously training itself based on real-world feedback.
MORGAN:So the agent gets better at its job over time without someone manually reprogramming it?
SAM:That's the promise. Research from late 2024 showed that agents with robust long-term memory actually improve the more they remember—they learn from mistakes and get better at their tasks.
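To make Sam's "goldfish versus assistant" distinction concrete, here is a minimal sketch of session-spanning memory in plain Python. The class, method, and file names are hypothetical illustrations — this is not the LangMem or Mem0 API, just the core idea of persisting facts per user so the next session can recall them.

```python
import json
from pathlib import Path

class SimpleMemoryStore:
    """Toy long-term memory: persists facts per user across sessions."""

    def __init__(self, path="agent_memory.json"):
        self.path = Path(path)
        # Reload whatever earlier sessions already stored.
        self.facts = json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, user_id, fact):
        """Append a fact for this user and persist immediately."""
        self.facts.setdefault(user_id, []).append(fact)
        self.path.write_text(json.dumps(self.facts))

    def recall(self, user_id):
        """Facts returned here would be prepended to the agent's prompt
        at session start — that is what stops the 'goldfish' behavior."""
        return self.facts.get(user_id, [])

store = SimpleMemoryStore()
store.remember("cust_42", "Prefers email over phone follow-ups")
print(store.recall("cust_42"))
```

Production memory layers add summarization, relevance ranking, and expiry on top of this basic store, but the leadership question stays the same: where do these facts live, and who decides what gets remembered?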
CASEY:What should leaders know about the vendor landscape here?
SAM:It's maturing rapidly. Beyond LangMem, there's Mem0—which recently benchmarked 26% higher accuracy than OpenAI's memory solution with 91% faster performance. There's Zep with its knowledge graph approach—great for understanding relationships and timing. Memobase launched in January as an open-source option.
MORGAN:So when evaluating agent projects, leaders should be asking: "What's our memory strategy?"
SAM:Absolutely. If your team can't answer that question clearly, that's a red flag.
ALEX:Building on the memory discussion, let's talk about how agents acquire genuine domain expertise—because that's often what separates agents that deliver real value from expensive chatbots.
CASEY:Domain expertise? You mean the agent actually understanding your industry?
ALEX:Yes. Most AI agents can access information, but they don't truly understand how concepts in your domain relate to each other. That's where ontology-based knowledge graphs come in.
MORGAN:Can you explain that in business terms?
ALEX:Sure. Think of an ontology as your company's "operating manual" for understanding a domain. If you're in financial services, it defines what a "growth stock" is, how it relates to "equity securities," which relates to "financial instruments," and so on. It captures the relationships and rules that experts in your field know intuitively.
CASEY:So the agent isn't just searching for keywords—it's actually reasoning through relationships?
ALEX:Exactly. Without this, an agent might find documents that mention "tech stocks" but won't understand that a specific company is a tech stock if the document doesn't explicitly say so. With an ontology, the agent can make that inference.
MORGAN:This sounds like a significant upfront investment.
ALEX:It can be, but the ROI is compelling. Organizations are reporting 300-320% returns on knowledge graph implementations. And here's the key insight: ontologies derived from existing databases—your CRM, your ERP, your product catalogs—perform comparably to those built from scratch, at a fraction of the cost.
CASEY:So companies can leverage their existing data infrastructure?
ALEX:Exactly. The best approach is often to start with what you have. Your structured databases already encode a lot of domain knowledge implicitly—you just need to make it explicit so agents can use it.
MORGAN:What questions should leaders ask their teams about this?
ALEX:First: "How does our agent understand our domain, beyond what's in the prompt?" If the answer is just "it reads our documents," that's a limited approach. Second: "Can our agent reason about relationships, or just retrieve information?" The difference is substantial in complex business scenarios.
TAYLOR:Let's compare agent engineering with traditional roles, and then look at the full landscape of tools—because LangChain isn't the only option, and leaders should understand the choices.
CASEY:Good point—we've heard a lot about LangChain. What else is out there?
TAYLOR:The framework landscape has exploded, so let me frame the major options in business terms.
MORGAN:So companies have real choices depending on their existing infrastructure?
TAYLOR:Exactly. If you're a Microsoft shop, Semantic Kernel makes integration much smoother. If you need multiple agents working together—say, one for research, one for writing, one for review—CrewAI makes that intuitive. If you want maximum control and flexibility, LangGraph gives you that.
CASEY:What about the AI models themselves—the actual "brains"?
TAYLOR:You have meaningful choices there too. OpenAI's GPT-4 remains strong for general reasoning. Anthropic's Claude 3.5 and 3.7 models are often preferred for complex tool use—they have over 90% accuracy on tool invocation. Google's Gemini offers competitive pricing and good performance. Mistral provides strong options for enterprises concerned about data sovereignty.
MORGAN:And vector databases we keep hearing about?
TAYLOR:For leaders, think of vector databases as the "memory banks" that let agents quickly find relevant information. Pinecone is the managed cloud option with strong enterprise support. Weaviate is open-source with more flexibility. Chroma is lightweight for smaller deployments. The choice depends on scale, existing infrastructure, and whether you want managed versus self-hosted.
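Under the hood, Pinecone, Weaviate, and Chroma all do a version of the same core operation: rank stored vectors by similarity to a query vector. A stdlib-only sketch of that operation, with hand-made 3-dimensional vectors standing in for the high-dimensional embeddings a real model would produce:

```python
import math

def cosine(a, b):
    """Cosine similarity: how closely two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# In production these vectors come from an embedding model and live in a
# vector database; here they are tiny hand-made stand-ins.
documents = {
    "refund policy":   [0.9, 0.1, 0.0],
    "shipping times":  [0.1, 0.9, 0.1],
    "api rate limits": [0.0, 0.2, 0.9],
}

def nearest(query_vec, k=1):
    """Return the k document keys most similar to the query vector."""
    ranked = sorted(documents, key=lambda d: cosine(query_vec, documents[d]),
                    reverse=True)
    return ranked[:k]

print(nearest([0.8, 0.2, 0.1]))  # ['refund policy']
```

The managed products add indexing for millions of vectors, filtering, and replication — but evaluating them starts with understanding that this similarity ranking is the service being bought.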
CASEY:So when should a company hire an agent engineer versus an ML engineer or software engineer?
TAYLOR:Use agent engineers when your product requires multi-step workflows, real-time tool integration, and handling unpredictable user inputs—like customer service automation or complex business process automation. ML engineers are better when the focus is on prediction accuracy with proprietary data—fraud detection, recommendation engines, forecasting models.
MORGAN:Any metrics on team composition?
TAYLOR:Industry experts recommend roughly four AI or agent engineers to every ML engineer, reflecting the complexity of agent systems. And importantly, prompt engineering roles have declined 80-90% since 2022—it's become a skill within agent engineering rather than a standalone role.
CASEY:Interesting. So agent engineering is a broader, more integrated discipline that encompasses what used to be separate specialties.
ALEX:Let's get into how agent engineering actually works — but I'll keep this accessible. Imagine building an AI assistant that can autonomously handle customer service tickets end-to-end.
MORGAN:Sounds straightforward, but I know it's anything but.
ALEX:Right. The agent needs to do four things well. First, perceive—understand what the customer is actually asking, even if they phrase it oddly or provide incomplete information. This is where the NLU layer we discussed in our previous episode comes in. Second, remember—retain context from earlier in the conversation and from past interactions. Third, reason—decide which step or tool comes next. And fourth, act—execute the decision: issue the refund, update the record, send the reply.
CASEY:So it's perceive, remember, reason, act—in a loop?
ALEX:Exactly. And the key insight is that this isn't a linear process—it's cyclical. The agent might act, observe that something unexpected happened, reason about why, and take a different action. Just like a human employee would.
MORGAN:What makes this reliable enough for business use?
ALEX:Three things. Checkpointing—saving progress so if something fails, the agent can resume rather than starting over. Human-in-the-loop controls—for high-stakes decisions, humans can review before the agent acts. And rigorous evaluation—constantly testing whether the agent is getting better or worse at its job.
CASEY:Evaluation sounds critical.
ALEX:It's the difference between demo agents and production agents. Tools like DeepEval run what we call "needle in a haystack" tests—hiding specific facts in large datasets to see if the AI finds them correctly. If your agent starts missing needles, you know something's wrong before customers do.
MORGAN:So it's impressive how much engineering goes into managing uncertainty rather than eliminating it.
ALEX:That's the magic — turning unpredictability from a bug into a feature, but with guardrails.
ALEX:Now, what does all this engineering buy you? Klarna's AI assistant reduced customer service resolution times by 82%, handling 2.3 million conversations monthly, which translates to a $40 million profit improvement.
MORGAN:Wow, 82% faster? That's a massive operational win.
ALEX:Absolutely. Decagon's agents autonomously handle 70% of 60,000 monthly tickets, delivering ten times higher ticket deflection rates than expected — meaning fewer human agents needed and huge cost savings.
CASEY:That kind of deflection is a game-changer for scaling customer support.
ALEX:TELUS processes over 100 billion tokens monthly with AI agents, boosting engineering productivity by 30%. Microsoft reported 353% ROI for Copilot users in small and medium businesses. Equinix achieved 68% employee request deflection with 43% fully autonomous resolution.
MORGAN:Those aren't just numbers — they're competitive advantages that can reshape markets.
ALEX:Exactly. And a recent Accenture survey shows enterprises using AI agents grow revenue 2.5 times faster and improve productivity by 2.4 times compared to their peers.
CASEY:So while agent engineering is complex and imperfect, the payoff is real and substantial when done right.
ALEX:The companies succeeding treat this as rigorous engineering with realistic expectations—not as magic transformation. That mindset difference matters.
CASEY:But let's dial back the enthusiasm for a moment. We've seen AI agents failing 70% of real-world tasks in benchmarks. That's a huge concern for enterprise reliability.
MORGAN:That's true — a failure rate that high means many agents aren't ready for critical roles yet.
CASEY:Plus, Gartner predicts 40% of agentic AI projects will be canceled by 2027 due to cost overruns and unclear value. There's a real risk of overhyping and over-specializing.
MORGAN:So companies might invest heavily only to see projects stall or fail?
CASEY:Exactly. And the role definition problem is real—Pave's data shows 83% of AI/ML employees still have "Machine Learning" in their title, while only 3% have "Artificial Intelligence." The market hasn't standardized what these roles even mean.
MORGAN:What about safety and security?
CASEY:Agents accessing multiple tools and external data pose real risks — data leaks, unauthorized actions, compliance issues. And explainability is limited, meaning it's hard to audit why an agent made a specific decision. For regulated industries, this is a significant hurdle.
MORGAN:Sounds like a balancing act between innovation and caution.
CASEY:Precisely. Leaders need to manage expectations carefully, avoid chasing every shiny new project, and invest in robust evaluation and oversight frameworks. The winners will be those who combine genuine capability building with healthy skepticism.
SAM:Let's look at where agent engineering is already making waves. Customer service leads the charge — Sierra, founded by former Google and Salesforce executives, deploys agents for clients like Sonos and SiriusXM that double to triple resolution rates. Klarna and Decagon are automating multi-step workflows, from troubleshooting to refunds, dramatically reducing human workloads.
MORGAN:That really aligns with the payoff metrics Alex mentioned.
SAM:In financial services, firms use agents to modernize legacy systems and assist analysts. McKinsey QuantumBlack helped a large bank tackle 400 pieces of legacy software—a $600 million plus project—using AI agent "squads" supervised by humans who shifted from doing the work to overseeing the agents.
CASEY:That's a fascinating shift—from doer to supervisor.
SAM:Exactly. That's the pattern we're seeing: humans moving from execution to oversight. Telecommunications uses agents for tech support, handling complex diagnostics across networks. Healthcare employs AI agents for diagnostic assistance and patient triage, although safety concerns mean human oversight remains critical.
MORGAN:So adoption is broad, but with varying levels of autonomy depending on risk.
SAM:Right. Job growth reflects this — agent engineer roles have grown 143% since 2024 across industries, a strong signal of broad adoption and strategic importance.
CASEY:These real-world examples show agent engineering isn't theoretical — it's impacting business outcomes today.
SAM:Let's throw a scenario into the ring: a company wants to automate customer support tickets with 60-80% autonomous resolution within six months. Who wins — agent engineering or traditional ML engineering?
TAYLOR:I'm Team Agent Engineer here. For complex, multi-step workflows with unpredictable inputs, agents shine. They can orchestrate multiple tools, escalate to humans when needed, and iterate quickly — typical timeline is 3 to 6 months to production. They're built to handle the "open-world" nature of customer conversations we discussed in the NLU episode.
CASEY:But those agents come with higher per-query costs and potentially more unpredictable behavior. That could be a dealbreaker for some enterprises.
ALEX:I'd back ML engineers for accuracy when you have proprietary, well-labeled data. They require longer training cycles — 6 to 9 months — but deliver highly precise handling for well-defined query types. Better for compliance-heavy industries where you need to explain exactly why a decision was made.
MORGAN:So if speed and flexibility matter more, agent engineering wins; if accuracy and auditability are paramount, ML engineering is better?
SAM:Exactly. The agent approach typically achieves 60-80% autonomous resolution; traditional ML approaches tend toward 30-40% but with higher precision. It's really about fit to the use case and business priorities.
CASEY:Leaders should weigh factors like iteration speed, cost, data availability, regulatory requirements, and risk tolerance before choosing.
SAM:That's the takeaway — no one-size-fits-all.
SAM:For leaders wondering where to start, here's a practical framework.
MORGAN:What about memory and knowledge—we discussed those earlier?
SAM:Critical day-one decisions. Your team should be able to answer: "How will our agent remember context across conversations?" and "How will it understand our specific domain?" If those answers are vague, slow down and get clarity before investing further.
CASEY:What about managing tool sprawl?
SAM:Good point. Research shows reducing the number of integrated tools can improve accuracy by 40% and reduce execution time by 70%. Start with the essential tools for your use case; resist the urge to connect everything at once. You can always add more later.
ALEX:Also, invest in evaluation from day one. Tools like DeepEval and LangSmith help you understand whether your agent is actually getting better or just getting deployed. The companies that struggle are often those that shipped without rigorous testing infrastructure.
MORGAN:Any team structure advice?
SAM:Build what experts call "T-shaped" teams—people with deep agent engineering skills combined with broad AI and software knowledge. Pure specialization is risky in a field evolving this fast. The recommended ratio is roughly four AI/agent engineers to every ML engineer.
CASEY:And start small?
SAM:Absolutely. Pick one high-value workflow, define clear success metrics, prove value, then expand. The companies failing are often those trying to "transform everything" at once.
MORGAN:Before we move on, a quick plug — if you want the full picture beyond what we cover here, Keith Bourne's second edition of "Unlocking Data with Generative AI and RAG" is a treasure trove. It's being updated for the latest frameworks and covers the RAG and agent architectures we've been discussing. Well worth adding to your leadership reading list—or assigning to your technical teams.
MORGAN:This episode is brought to you by Memriq AI — an AI consultancy and content studio building tools and resources for AI practitioners.
CASEY:We produce this podcast to help engineers and leaders stay ahead in the fast-evolving AI landscape. For more AI deep-dives, practical guides, and research breakdowns, head over to Memriq.ai.
MORGAN:They're a great partner to keep your AI strategy sharp.
SAM:Let's talk about the big open challenges still facing agent engineering. Reliability tops the list — with current failure rates around 70%, enterprises can't blindly trust agents for mission-critical tasks yet.
CASEY:That's a showstopper in regulated industries or where customer trust is paramount.
SAM:Then there's the lack of industry-standard benchmarks to evaluate agent quality in production. Without agreed metrics, measuring progress and comparing solutions is tough—every vendor claims their approach is best, but apples-to-apples comparison is nearly impossible.
MORGAN:What about security?
SAM:Agents' access to external tools and data creates vulnerabilities. Without robust safeguards, there's risk of data leaks or unauthorized actions. This is especially concerning as agents get more autonomous—they can do more damage faster if something goes wrong.
ALEX:And cost optimization remains a challenge — API usage isn't cheap at scale. Balancing performance with expense requires sophisticated engineering that many teams underestimate.
CASEY:Explainability is another hurdle — it's often unclear why an agent made a specific decision, making audit and compliance difficult. For regulated industries, this alone can be a blocker.
SAM:There's also active debate about whether "agent engineer" will persist as a distinct role or eventually merge back into broader AI engineering. The technology is evolving so fast that today's specialization might be tomorrow's basic skill.
MORGAN:So lots of open questions for leaders to monitor.
SAM:All of these require strategic investment and oversight to mature agent engineering into a dependable enterprise capability. The good news: the companies figuring this out now will have significant competitive advantages.
MORGAN:My key takeaway? Agent engineering is a new frontier for AI — it's messy, uncertain, but packed with strategic potential. Leaders should embrace it thoughtfully, not reactively.
CASEY:I'd say, don't get swept away by hype. Manage expectations, focus on rigorous evaluation, and remember that 70% failure rates and $40 million profit improvements can both be true simultaneously. The discipline is real; the hype is overblown.
JORDAN:For me, the human-in-the-loop approach is critical — it's the bridge between AI autonomy and business trust. The successful deployments we're seeing treat this as human-on-the-loop supervision, not human replacement.
TAYLOR:Understanding the full landscape of frameworks and tools—not just one vendor's pitch—will help you make better strategic decisions. The choices matter more than most leaders realize.
ALEX:I'm excited by how memory systems and knowledge graphs can give agents genuine domain expertise—but these are day-one architectural decisions, not bolt-on features. Ask your teams about memory strategy early.
SAM:And finally, keep your eye on the open problems. Invest in reliability, security, and explainability now to avoid costly surprises later. The leaders who build those foundations will outpace those chasing features.
MORGAN:That wraps up this episode of Memriq Inference Digest - Leadership Edition. Thanks to Casey, Jordan, Taylor, Alex, and Sam for their insights.
CASEY:If you're exploring agent engineering, remember: it's a journey with real risks and big rewards. And if you haven't caught our NLU layer episode, go back and listen—understanding that paradigm shift is foundational to everything we've discussed today.
MORGAN:Thanks for listening. We'll see you next time — and until then, keep leading with insight.
