Invented by Laura J. Kleiman, Patrick M. Peterson, and Rick Marvel
Customer service is always changing. Companies want to help people faster and better. One way they do this is with Interactive Voice Response (IVR) systems, the phone menus that talk to you when you call a bank or store. But making these systems work well is hard and time-consuming. This article explains a new way to make IVR systems smarter and easier to manage, based on a recent patent application. We’ll explore why this matters, how previous systems worked, and what makes this new idea special.
Background and Market Context
When you pick up your phone and call customer service, you often hear a computer voice. This is the IVR system. It asks you to press buttons or say words, then tries to help you or send you to the right person. IVR systems are everywhere—banks, airlines, phone companies, and many more use them. These systems help companies answer lots of calls without needing a real person every time. This saves money and time.
But IVR systems are only helpful if they work well. If callers get stuck, confused, or annoyed, they may hang up or become unhappy customers. That’s why companies want their IVR to be quick, clear, and friendly. To make IVRs better, companies often record calls and have people listen to them. These people try to find out which parts of the IVR work and which parts don’t. They listen for the prompts—the messages the IVR says to callers—and see how people respond.
Doing this by hand is slow and expensive. Imagine having to listen to thousands of phone calls, just to mark where the IVR says “Please enter your account number” or “Your call may be recorded.” For a typical IVR with hundreds of different prompts, this can take weeks. If the company changes the IVR, the whole process starts again from scratch. This makes it hard for companies to quickly improve their systems.
Because of these challenges, there is a growing need for tools that can automatically find and mark these prompts in call recordings. If technology can do this job, human analysts can spend more time finding ways to help customers and less time doing boring, repetitive work. This is why the market is eager for smart, computer-assisted tools that make IVR analysis faster and better.
Scientific Rationale and Prior Art
Let’s look at how things worked before this new invention. Traditionally, human analysts had to listen to hours and hours of calls. They would carefully try to spot every prompt the IVR said, cut out a short snippet (about 800 milliseconds), and save it for future use. These snippets would be stored in a database. Later, software could use these snippets to search other calls for the same prompt. But before the software could do any of that, a lot of manual work was needed.
There have been some attempts to speed things up. Some systems use audio signal processing to try to detect known prompts. If a prompt is already in the database, the software can find it in new calls. But if the IVR changes, or if there are new prompts, humans still have to listen to more calls and cut new snippets. This is a big bottleneck, especially for companies with large or changing IVRs.
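For a concrete picture of "finding a known prompt in a new call," here is a minimal sketch. It assumes the stored snippet and the call are mono recordings at the same sample rate, given as lists of float samples, and it uses normalized cross-correlation, a common template-matching technique we chose for illustration; the article does not say which method such systems use. The function name best_match and the 0.9 threshold are hypothetical.

```python
import math
from typing import Optional, Sequence

def _centered(xs: Sequence[float]) -> tuple[list[float], float]:
    """Subtract the mean; return (deviations, L2 norm of the deviations)."""
    mean = sum(xs) / len(xs)
    dev = [x - mean for x in xs]
    return dev, math.sqrt(sum(d * d for d in dev))

def best_match(call: Sequence[float], snippet: Sequence[float],
               threshold: float = 0.9) -> Optional[tuple[int, float]]:
    """Slide `snippet` over `call` and return (offset, score) of the window with
    the highest normalized cross-correlation, or None if none reaches `threshold`."""
    n = len(snippet)
    if n == 0 or n > len(call):
        return None
    s_dev, s_norm = _centered(snippet)
    if s_norm == 0:
        return None  # a constant snippet has no shape to match against
    best: Optional[tuple[int, float]] = None
    for offset in range(len(call) - n + 1):
        w_dev, w_norm = _centered(call[offset:offset + n])
        if w_norm == 0:
            continue  # perfectly flat (silent) window
        score = sum(a * b for a, b in zip(s_dev, w_dev)) / (s_norm * w_norm)
        if best is None or score > best[1]:
            best = (offset, score)
    if best is None or best[1] < threshold:
        return None
    return best
```

If a call is silence with the snippet pasted in at sample 10, best_match returns offset 10 with a score of about 1.0. The double loop is quadratic and only shows the idea; production code would typically compute the correlation with FFTs.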
Another approach has been using text transcripts. With speech-to-text software, call recordings can be turned into written words. Analysts can then look for common phrases. This can be quicker than working with audio, but it’s not always accurate. Speech-to-text software can make mistakes, especially if the audio is noisy or the speaker has an accent. Also, even if the words are right, sometimes the same prompt is said in slightly different ways. This makes matching tricky.
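To see why near-matches make text comparison tricky, Python's standard-library difflib.SequenceMatcher gives a character-level similarity ratio (2 × matching characters ÷ total characters). This is an illustration we chose, not a method from the article; the helper names are hypothetical.

```python
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so casing and spacing don't count."""
    return " ".join(text.lower().split())

def text_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] between two transcripts."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

if __name__ == "__main__":
    reference = "Please enter your account number"
    asr_slip = "please enter you account number"   # speech-to-text dropped one letter
    reworded = "Enter your account number now"     # same meaning, different wording
    print(round(text_similarity(reference, asr_slip), 2))
    print(round(text_similarity(reference, reworded), 2))
```

The transcription slip scores about 0.98, while the reworded prompt drops to about 0.82, even though a person would call both "the same prompt." Any single character-level cutoff has to trade off merging truly different prompts against splitting rewordings.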
Some software tries to group similar prompts together using text similarity checks or machine learning. For example, if “Please enter your account number” and “Enter your account number now” mean the same thing, a good system should put them together. Word embeddings—mathematical representations of words—have helped with this, letting computers find phrases with the same meaning even if the words are not exactly the same. But again, this only works well if you already have a set of prompts to look for.
To sum up, older systems have three main problems:
First, they depend too much on human effort to find and tag prompts. Second, they get confused when prompts are new or said in different ways. Third, they struggle to keep up when the IVR changes often. This means companies still spend weeks setting up their IVR analysis tools, and customers wait longer for improvements.
Invention Description and Key Innovations
The patent we are looking at brings a fresh approach to finding IVR prompts. Instead of relying on humans to find every prompt, this method lets a computer do most of the work. Here’s how it works, step by step, in simple words.
First, the computer gets a big batch of recorded calls from the IVR system. It then breaks each call into pieces called “utterances.” An utterance is a stretch of speech between silences. For example, the IVR might say “Please enter your account number,” then there’s a pause—that’s one utterance. The caller might respond, and after a pause, the IVR might say something else—that’s another utterance.
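Splitting a call at silences can be sketched with a simple energy threshold. This minimal version assumes mono float samples in [-1, 1]; the frame length (160 samples, which is 20 ms at the 8 kHz rate common in telephony), the energy threshold, and the minimum pause length are hypothetical defaults, not values from the patent. Real systems often use a dedicated voice activity detector instead.

```python
from typing import Optional, Sequence

def frame_energies(samples: Sequence[float], frame_len: int) -> list[float]:
    """Mean squared amplitude of each non-overlapping frame; a trailing partial frame is dropped."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def split_utterances(samples: Sequence[float], frame_len: int = 160,
                     energy_threshold: float = 1e-4,
                     min_silence_frames: int = 15) -> list[tuple[int, int]]:
    """Return (start_sample, end_sample) spans of speech separated by pauses
    lasting at least `min_silence_frames` quiet frames."""
    spans: list[tuple[int, int]] = []
    start: Optional[int] = None   # first voiced frame of the current utterance
    last_voiced = 0               # most recent voiced frame
    silent_run = 0                # consecutive quiet frames since last_voiced
    for i, energy in enumerate(frame_energies(samples, frame_len)):
        if energy >= energy_threshold:
            if start is None:
                start = i
            last_voiced = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:  # pause is long enough: close utterance
                spans.append((start * frame_len, (last_voiced + 1) * frame_len))
                start, silent_run = None, 0
    if start is not None:  # speech ran to the end of the recording
        spans.append((start * frame_len, (last_voiced + 1) * frame_len))
    return spans
```

Short gaps (a breath, a brief hesitation) stay inside one utterance because the pause must last min_silence_frames before the span is closed.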
Once the system has all these utterances, it needs to figure out which ones are likely to be IVR prompts. Here’s the clever part: computers (like IVRs) are much more repetitive than humans. If a certain audio snippet pops up in many different calls, it’s probably a pre-recorded IVR prompt, not a human speaking. Humans rarely say exactly the same thing in exactly the same way, but computers do.
So the system compares each utterance to a growing set of known prompt candidates. If the utterance matches something already in the set—maybe it sounds almost the same, or the words are very similar—it adds one to a counter for that prompt candidate. If it doesn’t match anything, it creates a new prompt candidate and starts counting. This process continues for all utterances in all calls.
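The counting step can be sketched as a single-pass ("leader") clustering loop. In this minimal version the similarity function is passed in, so the same loop could score audio or text; keeping the first utterance as each candidate's representative, the 0.9 match threshold, and all names here are our assumptions, not details confirmed by the article.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Candidate:
    exemplar: Any                                     # utterance that created the candidate
    count: int = 0                                    # utterances matched so far
    call_ids: set[str] = field(default_factory=set)   # distinct calls it appeared in

def build_candidates(calls: dict[str, list[Any]],
                     similarity: Callable[[Any, Any], float],
                     match_threshold: float = 0.9) -> list[Candidate]:
    """Visit every utterance once: add it to the most similar existing candidate
    if the score reaches match_threshold, otherwise start a new candidate."""
    candidates: list[Candidate] = []
    for call_id, utterances in calls.items():
        for utt in utterances:
            best, best_score = None, match_threshold
            for cand in candidates:
                score = similarity(utt, cand.exemplar)
                if score >= best_score:
                    best, best_score = cand, score
            if best is None:              # nothing close enough: new candidate
                best = Candidate(exemplar=utt)
                candidates.append(best)
            best.count += 1
            best.call_ids.add(call_id)
    return candidates
```

The sketch records both the raw count and the set of calls, because a threshold such as "found in more than 5% of calls" is about calls, and a prompt repeated inside one call should not inflate it. Comparing each utterance to every candidate is simple but slow for large batches.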
After checking all the calls, the system looks at the counts. If a certain prompt candidate shows up a lot—above a set threshold—it gets marked as a “good” prompt. For example, if a prompt is found in more than 5% of calls, it’s likely an IVR prompt. The system can then cut an audio snippet (say, 800 milliseconds from the start of the utterance) and save it for future use.
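Selecting "good" prompts and cutting their snippets can be sketched as follows. The 5% rule and the 800 ms snippet come from the article; the 8 kHz sample rate, the PromptCandidate type, and the function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class PromptCandidate:
    label: str
    call_ids: set[str] = field(default_factory=set)  # distinct calls where it was heard

def select_good_prompts(candidates: list[PromptCandidate], total_calls: int,
                        min_call_fraction: float = 0.05) -> list[PromptCandidate]:
    """Keep candidates heard in more than `min_call_fraction` of all calls."""
    if total_calls <= 0:
        return []
    return [c for c in candidates if len(c.call_ids) / total_calls > min_call_fraction]

def cut_snippet(samples: list[float], utterance_start: int,
                sample_rate: int = 8000, snippet_ms: int = 800) -> list[float]:
    """Return the first `snippet_ms` milliseconds of an utterance
    (shorter if the recording ends first)."""
    n = sample_rate * snippet_ms // 1000   # 6400 samples for 800 ms at 8 kHz
    return samples[utterance_start:utterance_start + n]
```

With 100 calls, a candidate heard in 6 of them passes, while one heard in exactly 5 does not, because the rule is "more than" 5%.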
This process can be done both with audio (comparing sound waves) and with text (comparing transcripts). For text, the system uses word embeddings and cosine similarity to group utterances with similar meanings. This way, even if the prompt is said in slightly different ways, or if the transcript isn’t perfect, the system can still group them together.
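The text path can be sketched by averaging word vectors and comparing them with cosine similarity. The tiny three-dimensional embedding table below is invented purely for illustration; a real system would load pretrained vectors (for example word2vec or GloVe) with hundreds of dimensions, and the article does not say how its sentence vectors are built.

```python
import math
from typing import Optional

Vector = tuple[float, ...]

# Hypothetical toy embeddings: dimension 0 ~ "input action",
# 1 ~ "account details", 2 ~ "call recording".
TOY_EMBEDDINGS: dict[str, Vector] = {
    "please": (0.2, 0.0, 0.0), "enter": (1.0, 0.0, 0.0), "type": (0.9, 0.1, 0.0),
    "your": (0.1, 0.1, 0.1), "account": (0.0, 1.0, 0.0), "number": (0.2, 0.8, 0.0),
    "now": (0.1, 0.0, 0.0), "call": (0.0, 0.1, 0.8), "may": (0.0, 0.0, 0.2),
    "be": (0.0, 0.0, 0.1), "recorded": (0.0, 0.0, 1.0),
}

def sentence_vector(text: str, table: dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of known words (naive tokenization); None if no word is known."""
    words = (w.strip(".,!?") for w in text.lower().split())
    vecs = [table[w] for w in words if w in table]
    if not vecs:
        return None
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0])))

def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v; 0.0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def transcript_similarity(a: str, b: str,
                          table: dict[str, Vector] = TOY_EMBEDDINGS) -> float:
    va, vb = sentence_vector(a, table), sentence_vector(b, table)
    return cosine(va, vb) if va is not None and vb is not None else 0.0
```

With this table, "Please enter your account number" and "Type your account number now" score above 0.99, while "Your call may be recorded" scores about 0.14 against the first. A function like this could serve as the match test in the counting step the article describes.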
The system can also tag call recordings with the exact places where prompts appear, making it much easier for human analysts to review calls later. If needed, humans can double-check or adjust the results, but most of the heavy lifting is done by the computer.
What makes this invention stand out is how it blends automation with smart grouping. It doesn’t just look for exact matches—it uses patterns, frequency, and meaning to find prompts even when things change. It works even if you don’t have a list of prompts to start with. This cuts down setup time from weeks to days, lets companies respond faster when they update their IVRs, and frees up analysts to focus on improving the customer experience, not tagging calls.
From a technical view, the invention offers flexibility. It can work with audio, text, or both. It uses silence detection to find utterances, but it can also pick up on changes in tone or voice type. It doesn’t just match sound—it looks for clusters of similar prompts and can handle small changes in how prompts are said. It can even create a spreadsheet of all the good prompts, with links to the exact spots in the calls where they appear.
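The spreadsheet output can be sketched with Python's standard csv module, one row per prompt occurrence. The column names, the "recordings/" base path, and the .wav file naming are hypothetical; the link uses the W3C Media Fragments temporal syntax (#t=start,end), which some media players honor and others ignore.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class Occurrence:
    prompt_id: str
    transcript: str
    call_id: str
    start_s: float   # seconds from the start of the recording
    end_s: float

def prompts_to_csv(occurrences: list[Occurrence], base_url: str = "recordings/") -> str:
    """Render one CSV row per prompt occurrence, sorted by prompt, call, and time,
    with a link pointing at the time range inside the recording."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["prompt_id", "transcript", "call_id", "start_s", "end_s", "link"])
    for o in sorted(occurrences, key=lambda o: (o.prompt_id, o.call_id, o.start_s)):
        link = f"{base_url}{o.call_id}.wav#t={o.start_s:.2f},{o.end_s:.2f}"
        writer.writerow([o.prompt_id, o.transcript, o.call_id,
                         f"{o.start_s:.2f}", f"{o.end_s:.2f}", link])
    return buf.getvalue()
```

Because the link contains a comma, csv.writer quotes that field automatically, so spreadsheet programs still read it as a single cell.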
This new way of detecting prompts is not locked to one kind of computer or software. It can be run on regular computers, in the cloud, using virtual machines, or even as a containerized microservice. This means it fits easily into today’s flexible, modern IT setups.
Conclusion
IVR systems are a key part of how companies help their customers, but making these systems work well has always been hard. The old ways of finding and marking prompts were slow and needed too much human work. The new method described in this patent changes that. It lets a computer scan lots of calls, find repeating prompts using audio and text analysis, and build a library of prompts quickly. This means faster setup, easier updates, and better experiences for both companies and callers. As companies look for smarter customer service tools, inventions like this are paving the way for quicker, more efficient, and more human-friendly IVR systems.
To read the full application, visit https://ppubs.uspto.gov/pubwebapp/ and search for publication number 20250218434.




