Overview

1 What makes conversational AI work?

This chapter sets the stage for building conversational AI that users actually want to use. It defines conversational AI and the main solution types—question answering, process-oriented workflows, and routing—and explains why so many assistants fail: they misread intent, impose unnecessary complexity, or trigger immediate opt-outs. The chapter introduces a simple, universal flow for successful systems: understand what the user wants, gather only the information needed, and deliver the result quickly and ethically. It emphasizes user-centered design, the interplay between intent models, dialogue, and APIs, and the importance of designing for the channel and context to reduce friction and personalize help.

The chapter then introduces generative AI as a complementary tool to classic techniques. Large language models can bolster intent understanding, simplify and improve copy, power retrieval-augmented answers, and accelerate builder workflows such as data augmentation and dialogue drafting. Because LLMs can be biased or hallucinate, the chapter stresses practical guardrails: choosing appropriate models and training data, adding contextual prompts, pre- and post-filtering for unsafe content, and keeping humans in the loop when risk is high. It also highlights the need to experiment with models and parameters for each task, optimizing for both performance and safety.
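
As a small illustration of what "adding contextual prompts" can look like in practice, the sketch below assembles a prompt that combines user context and trusted reference text with an instruction to stay within that context. The function name, profile fields, and example text are illustrative assumptions, not the chapter's implementation, and the result would be sent to whichever LLM you use.

```python
# Hypothetical sketch: grounding an LLM prompt in user context and trusted
# reference text so the model is guided toward relevant, safer answers.

def build_contextual_prompt(user_question: str, user_profile: dict, reference_text: str) -> str:
    """Assemble a prompt that constrains the model to known context."""
    return (
        "You are a customer-support assistant. Answer ONLY from the context below.\n"
        "If the answer is not in the context, say you don't know and offer a human agent.\n\n"
        f"Customer profile: plan={user_profile.get('plan')}, region={user_profile.get('region')}\n\n"
        f"Context:\n{reference_text}\n\n"
        f"Customer question: {user_question}\nAnswer:"
    )

prompt = build_contextual_prompt(
    user_question="How do I reset my password?",
    user_profile={"plan": "business", "region": "US"},
    reference_text="Password resets are self-service; reset links expire after 15 minutes.",
)
print(prompt)  # pass this to the LLM endpoint of your choice
```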

Finally, the chapter advocates a disciplined, continuous improvement cycle: measure, identify a problem tied to business outcomes, implement targeted fixes, deploy, and repeat. Small, incremental changes are preferred over large, risky overhauls because they deliver value sooner, are easier to diagnose, and create more learning opportunities. Success depends on improving the full chain—engagement, understanding, and fulfillment—and communicating progress in business terms. Teams should tie technical work to metrics like containment, average handle time, time to resolution, and customer satisfaction, ensuring stakeholders see clear, compounding value from ongoing enhancements.

Figures

  • A painful chat experience with a process-oriented bot that puts cognitive burden on the user. The AI has not provided any value in three conversational turns.
  • A delightful experience that uses context and reasonable assumptions to complete the user's goal quickly. The context could be loaded from a login process (chat) or from the caller's phone number (voice).
  • Flow diagram for conversational AI. In many use cases “additional information” includes user profile data.
  • Conversational AI logical architecture, annotated with a password-reset example.
  • It takes a dream team with diverse skills to build an enterprise-ready conversational AI.
  • Adding context in the prompt is an important way to guide a large language model.
  • Impact of changing one LLM parameter (repetition penalty).
  • Cumulative success in a process depends on success in each of the individual steps; visually, it looks like a funnel that narrows after each step.
  • A continuous improvement lifecycle for conversational AI.
  • Large changes, like retraining all intents, take a long time and have less predictable outcomes.
  • Many small changes, like retraining one intent at a time, have a smaller “blast zone” for each change, bringing quicker value and more learning.
  • The area above the dotted line is additional business value over a “big bang” change. Working code in production delivers value!

Summary

  • Conversational AI must be built with the user experience in mind. Good conversational AI helps users complete their tasks quickly. Bad conversational AI frustrates users.
  • There are thousands of generative AI models. Large language models are a subtype of generative AI that is especially good at generating text.
  • LLMs can perform many tasks with impressive performance but also have significant risks including hallucination. It takes thoughtful guidance and guardrails to use LLMs effectively and responsibly.
  • LLM technology can supplement conversational AI. LLMs can respond to users directly and also assist you in building your conversational AI.
  • Continuous improvement is possible and necessary for effective conversational AI.
  • Iterative improvement delivers higher business value with lower risk.

FAQ

What are the main types of conversational AI?
Three common types: 1) Question-answering (FAQ bots that reply directly), 2) Process-oriented or transactional assistants (guide users through multi-step tasks and often call APIs), and 3) Routing agents (triage and hand off to the right bot or human). Many real systems combine all three.

What are the top failure modes to watch for?
The big three are: 1) The bot doesn’t understand user intent, 2) The flow puts too much complexity on the user, and 3) Users immediately opt out. Fixes include improving intent training with representative data, simplifying and personalizing dialogue with context, and writing concise, engaging copy.

How does a conversational AI work at a high level?
It follows three steps: 1) Figure out what the user wants (NLU/intent recognition), 2) Gather needed information (dialogue management, state, and API orchestration), and 3) Fulfill the request (execute a transaction, answer, or route to a human).
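
A toy sketch of that three-step loop under simplifying assumptions: the intent names, required slots, and fulfillment function below are placeholders for a trained NLU model, a dialogue manager, and real APIs.

```python
# Illustrative three-step loop: 1) understand intent, 2) gather required info,
# 3) fulfill the request. All names here are placeholder assumptions.

REQUIRED_SLOTS = {"reset_password": ["user_id"], "check_balance": ["account_id"]}

def classify_intent(utterance: str) -> str:
    """Stand-in for a real NLU model or LLM-based intent classifier."""
    return "reset_password" if "password" in utterance.lower() else "check_balance"

def fulfill(intent: str, slots: dict) -> str:
    """Stand-in for the API/orchestration layer that executes the request."""
    return f"Done: executed {intent} with {slots}"

def handle_turn(utterance: str, session: dict) -> str:
    intent = session.setdefault("intent", classify_intent(utterance))   # step 1
    missing = [s for s in REQUIRED_SLOTS[intent] if s not in session["slots"]]
    if missing:
        return f"Please provide your {missing[0]}."                     # step 2
    return fulfill(intent, session["slots"])                            # step 3

session = {"slots": {"user_id": "u123"}}  # context, e.g. loaded at login
print(handle_turn("I forgot my password", session))
```
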
What skills and components are needed to build an effective assistant?
You need a cross‑functional team (design, data science, development, product/compliance). Core components include an intent classifier, well-defined APIs, and a conversation flow that collects the right info for fulfillment while respecting channel constraints and security.

When should I use generative AI versus classic techniques?
Use classic intent/flow orchestration for reliable, procedural tasks; add generative AI to enhance understanding, generate answers from your content (RAG), summarize, and improve copy or training data. They work best together: generative AI augments, not replaces, classic approaches.

How do I reduce hallucinations and keep generative AI safe?
Apply layered guardrails: pick suitable models and training data, pre-filter inputs for hate/abuse/profanity, provide clear contextual prompts (and retrieved documents), post-filter outputs, and, for higher risk, keep a human in the loop. Monitor before, during, and after deployment.
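
The sketch below shows the layered idea with a hypothetical wrapper: a keyword pre-filter on input, a post-filter on output, and escalation to a human when a check fails. The term list and the generate() stub are placeholders, not a production safety system.

```python
# Hypothetical layered guardrails around an LLM call: pre-filter the user input,
# post-filter the model output, and hand off to a human when a check fails.

BLOCKED_TERMS = {"blocked_term_1", "blocked_term_2"}  # placeholder lexicon

def passes_filter(text: str) -> bool:
    """Very simple keyword screen; real systems use trained safety classifiers."""
    lowered = text.lower()
    return bool(text.strip()) and not any(term in lowered for term in BLOCKED_TERMS)

def generate(prompt: str) -> str:
    """Stand-in for an actual LLM call."""
    return "You can reset your password from the account settings page."

def safe_answer(user_text: str) -> str:
    if not passes_filter(user_text):
        return "I can't help with that. Let me connect you with an agent."
    draft = generate(user_text)
    if not passes_filter(draft):
        return "Let me connect you with a human agent for this one."
    return draft

print(safe_answer("How do I reset my password?"))
```
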
What is retrieval‑augmented generation (RAG) and why use it?
RAG retrieves relevant documents from your trusted sources and uses them as context for an LLM to generate answers. It grounds responses in your content, improving relevance and reducing hallucinations for question‑answering use cases.
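
A minimal sketch of the RAG pattern under simplifying assumptions: word-overlap scoring stands in for an embedding-based retriever, and the assembled prompt would then be passed to your LLM.

```python
# Toy retrieval-augmented generation: pick the most relevant document by word
# overlap, then build a grounded prompt. Real systems use embeddings and a
# vector store; the documents here are illustrative.

DOCS = [
    "Password resets: use the 'Forgot password' link; reset emails expire in 15 minutes.",
    "Billing: invoices are issued on the first business day of each month.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    q_words = set(question.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_rag_prompt(question: str) -> str:
    context = "\n".join(retrieve(question, DOCS))
    return (
        "Answer using only the context below. If it is not covered, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

print(build_rag_prompt("How long does a password reset link last?"))  # send to your LLM
```
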
How do I choose the right LLM and parameters?
Match the model to the task (generation, classification, extraction, summarization, RAG) and experiment across prompts and parameters (e.g., repetition penalty, temperature). Don’t generalize from a single test; evaluate cost, latency, and quality on representative inputs.
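
One way to run those experiments is a small grid sweep over parameters against representative test inputs, as sketched below; the call_llm() stub and the keyword-based scoring are placeholders for your actual SDK and evaluation method.

```python
# Sketch of a prompt/parameter experiment grid. Temperature and repetition
# penalty are example parameters; swap in whatever your model exposes.

import itertools

def call_llm(prompt: str, temperature: float, repetition_penalty: float) -> str:
    """Stand-in for a real SDK call with the given parameters."""
    return f"(t={temperature}, rp={repetition_penalty}) You can reset your password on the account page."

def score(response: str, expected_keywords: list[str]) -> float:
    """Crude quality proxy: fraction of expected keywords present."""
    return sum(kw in response.lower() for kw in expected_keywords) / len(expected_keywords)

test_cases = [("How do I reset my password?", ["reset", "password"])]

for temperature, rep_pen in itertools.product([0.2, 0.7], [1.0, 1.2]):
    avg = sum(score(call_llm(p, temperature, rep_pen), kws) for p, kws in test_cases) / len(test_cases)
    print(f"temperature={temperature} repetition_penalty={rep_pen} avg_score={avg:.2f}")
```
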
What does a continuous improvement cycle look like?
Measure a baseline, identify a problem tied to a business metric, implement a targeted change, deploy, and repeat. Favor many small, reversible changes over big‑bang rewrites: they deliver value sooner, reduce risk, and create more learning opportunities.

Which business metrics should I use to show value to stakeholders?
Link technical work to outcomes like containment, average handle time, human touches (for routing), time to resolution, NPS, and compliance. Communicate in business terms (impact on cost and satisfaction), not just technical metrics like intent F1 scores.
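
For instance, containment and average handle time can be computed directly from conversation logs, as in the sketch below; the log fields shown are assumptions about what your platform records.

```python
# Illustrative metric calculations from (assumed) conversation log fields:
# containment = share of conversations resolved without a human handoff.

conversations = [
    {"escalated_to_human": False, "handle_time_seconds": 95},
    {"escalated_to_human": True,  "handle_time_seconds": 410},
    {"escalated_to_human": False, "handle_time_seconds": 120},
]

containment_rate = sum(not c["escalated_to_human"] for c in conversations) / len(conversations)
avg_handle_time = sum(c["handle_time_seconds"] for c in conversations) / len(conversations)

print(f"Containment: {containment_rate:.0%}")              # 67%
print(f"Average handle time: {avg_handle_time:.0f} sec")   # 208 sec
```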
