Control by voice is becoming more commonplace in commercial environments. The next step is not just to recognise what is said, but to analyse how it is said.
Tim Kridel catches up with Rana Gujral, the CEO of Behavioral Signals, an AI-driven technology developer working in the field of natural language processing (NLP) and focused on prediction of actions based on analysis of human reactions.
TK: How does NLP determine a person’s emotional state, especially in applications that involve a wide variety of people and personalities? For example, when some people are upset or frustrated, they talk low and slow, while other people raise their voice/pitch and speak faster. (This has been a challenge for NLP for call centre virtual assistants.) So if the NLP is in, for example, interactive digital signage or a conference room control system, how can it tell when some users are getting what they want and which ones are about to give up?
RG: The context of the interaction is what really makes the difference. By context here we mean things like who the speaker is, how they started the conversation, what their history with the call centre is, and so on. The more the system knows about the speaker and the situation, the better it can decipher the emotions and behaviours exhibited. So, it is important to identify a speaker’s neutral (or emotion-balanced) state based on all of these things and then process emotions as cases of divergence from that state. There is also a global “neutral” state, of course, determined from data across multiple speakers, which gives the system a reference point for comparing speakers: identifying, for example, that one speaker is relatively angrier than others.
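To make the idea concrete, here is a minimal sketch of treating emotion as divergence from a per-speaker neutral baseline. It is not Behavioral Signals’ actual pipeline; the features (pitch, energy, speaking rate) and all of the numbers are illustrative assumptions.

```python
import numpy as np

def neutral_baseline(turn_features):
    """Estimate a speaker's 'neutral' state as the mean and spread of
    acoustic features observed over their earlier turns."""
    return np.mean(turn_features, axis=0), np.std(turn_features, axis=0)

def divergence_from_neutral(current, baseline_mean, baseline_std):
    """Score the current turn as a z-scored deviation from the speaker's
    own neutral state; larger magnitudes suggest stronger arousal."""
    return (current - baseline_mean) / (baseline_std + 1e-8)

# Per-speaker baseline built from earlier turns in the call
# (columns: pitch in Hz, energy in dB, speaking rate in words/sec)
speaker_turns = np.array([[180.0, 62.0, 4.1],
                          [175.0, 60.0, 4.0],
                          [182.0, 63.0, 4.2]])
mean, std = neutral_baseline(speaker_turns)

# A new turn that is higher-pitched, louder and faster than this speaker's norm
current_turn = np.array([210.0, 70.0, 5.5])
print(divergence_from_neutral(current_turn, mean, std))
```

The same scoring can be run against a baseline pooled from many speakers to obtain the global reference Gujral mentions, which is how a system can flag one speaker as relatively angrier than others.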
TK: Do accents, slang and other attributes affect NLP’s ability to analyse a person’s state of mind?
RG: Yes, accents could affect the system’s ability to robustly analyse someone’s state of mind. It comes down to how well the context of the interaction has been modelled. It would be very difficult for a generically-trained emotion-recognition system to properly account for the idiosyncratic properties of speakers in a specific region, or of a specific culture, for example. Adapting the models through some form of transfer learning is typically a requirement in such cases.
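One common way to do the kind of adaptation Gujral describes is to take a pretrained emotion model and fine-tune only part of it on accent- or region-specific recordings. The sketch below assumes a hypothetical pretrained encoder and a labelled regional dataset; it illustrates transfer learning in general, not Behavioral Signals’ implementation.

```python
import torch
import torch.nn as nn

class EmotionModel(nn.Module):
    """Hypothetical speech-emotion model: a pretrained acoustic encoder
    followed by a small classification head."""
    def __init__(self, encoder, n_emotions=4):
        super().__init__()
        self.encoder = encoder                  # pretrained on generic speech
        self.head = nn.Linear(256, n_emotions)  # assumes 256-dim embeddings

    def forward(self, audio_features):
        return self.head(self.encoder(audio_features))

def adapt_to_region(model, regional_loader, epochs=3, lr=1e-4):
    """Freeze the generic encoder and fine-tune only the classification
    head on region- or accent-specific recordings."""
    for p in model.encoder.parameters():
        p.requires_grad = False
    optimiser = torch.optim.Adam(model.head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for features, labels in regional_loader:
            optimiser.zero_grad()
            loss = loss_fn(model(features), labels)
            loss.backward()
            optimiser.step()
    return model
```

In practice, the amount of regional data determines how much of the network is unfrozen; with very little data, adapting only the final layer as above is the safer choice.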
TK: In this interview you talk about how conferencing vendors could use the API to refine their technology, such as to provide better user experiences. That was more than a year ago, so I’m wondering if you can share any updated information.
RG: There are many compelling applications. I’d like to highlight a couple of them:
- Adjusting the video layout in teleconferences that involve a group of people, based on the way participants speak and their overall behaviour, to ensure better engagement of all participants.
- Providing real-time feedback to participants in business conversations, making each participant aware of their emotions and potentially helping them avoid situations in which, for example, they may overreact (a toy sketch of both ideas follows this list).
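To illustrate how both of these could work, here is a toy sketch in which behavioural scores drive layout changes and private nudges. It is not a vendor API; the participant names, score ranges and thresholds are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class ParticipantSignal:
    name: str
    talk_ratio: float   # share of total speaking time, 0..1
    arousal: float      # behavioural "activation" score, 0..1

def layout_and_nudges(participants, quiet_threshold=0.1, hot_threshold=0.8):
    """Promote under-engaged participants in the video layout and
    privately nudge anyone whose arousal score suggests they may be
    about to overreact."""
    promoted = [p.name for p in participants if p.talk_ratio < quiet_threshold]
    nudged = [p.name for p in participants if p.arousal > hot_threshold]
    return {"enlarge_tiles": promoted, "private_feedback": nudged}

meeting = [
    ParticipantSignal("Alice", talk_ratio=0.55, arousal=0.85),
    ParticipantSignal("Bob",   talk_ratio=0.05, arousal=0.30),
    ParticipantSignal("Chen",  talk_ratio=0.40, arousal=0.40),
]
print(layout_and_nudges(meeting))
# {'enlarge_tiles': ['Bob'], 'private_feedback': ['Alice']}
```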
TK: Are any vendors giving their enterprise clients access to tools for collecting and analysing employee and/or customer speech interactions? In other words, just as vendors can use those insights to refine their products, maybe those enterprises could use them to refine workstyles, customer-facing interactions, etc.?
RG: Certainly, such tools are increasingly being adopted by the call centres of large enterprises, helping them draw significant insights from customer interactions. In fact, now that an increasing number of interactions take place virtually, those interactions can be analysed in much the same way as phone calls. As a result, these tools will likely be employed to refine individual workstyles and improve the overall working environment.
TK: Gartner predicts that “by 2025, 75% of conversations at work will be recorded and analysed, enabling the discovery of added organisational value or risk.” What are some challenges to achieving that? For example, in a recent feature I explored how GDPR and other security/privacy concerns affect enterprise use of smart speakers and other devices that are continually listening – even if it’s just for wake words. Wouldn’t those regulations and concerns limit what vendors and/or enterprises can collect?
RG: Companies often look at regulations as problematic, and GDPR definitely has its detractors. A common belief is that stricter regulations place companies at a disadvantage compared with those operating in countries with fewer restrictions, and certainly more oversight brings an added overhead of compliance. However, these concerns are misplaced. Trust in a digital world can only be built through stricter guidelines: when consumers feel more secure, they are more likely to share information, which in turn gives companies more data to work with. Hence GDPR helps build trust and is really good for everyone. Applying GDPR sets a higher standard, which in turn becomes a competitive advantage.