The success of speech recognition in commercial applications comes down to trust

30/03/2021

Joel Chimoindes, VP Europe at distributor Maverick AV Solutions, speaks to Tim Kridel about the opportunities for speech recognition in pro AV applications.

TK: What’s driving interest in and adoption of speech recognition for pro AV applications? And which ones (e.g., interactive digital signage for wayfinding, conference room control systems)? For example, is Covid-19 a recent driver because speech interaction eliminates the need for touching a display and thus the risk of the virus being spread by the screen? And what were some drivers before the pandemic that will continue long afterward?

JC: The rapid adoption and improvement of voice recognition technology in the consumer space is driving demand for its integration in the workplace and in interactive signage applications. The events space has been leading innovation in this area, with live subtitling and translation already emerging at international conferences. At Microsoft Inspire in 2019, Microsoft demonstrated a language-translating HoloLens hologram. That was an inflexion point that revealed the possibilities of the technology.

The last 12 months have been transformative for the integration of voice control into our everyday lives, with applications now linked to many of our work processes, including meeting transcription and diary scheduling.

The requirement for touchless technology experiences has certainly increased during the pandemic, but the real driver is the growing sophistication of the applications behind the voice technology. AI is driving the collection and processing of the data behind each request and helping to turn it into meaningful commands. The accuracy of the results and the speed at which they are obtained are the real game-changers.

TK: What do AV integrators and their clients need to consider when deciding where and how to implement speech recognition? Are there any use cases where it makes more sense than others? For example, is it a better fit for the workplace because employees can be alerted that speech is now an option? Or is it an equally good fit for public places such as airports and hospitals, where people might not expect to be able to talk with wayfinding signage?

JC: The proliferation of voice control in our homes means users are increasingly comfortable using it in all kinds of environments. ‘Cortana, start my meeting’ will become commonplace in Teams-integrated meeting spaces, but it will equally speed up interactions with all kinds of digital signage. For example, users will be able to ask wayfinding displays for directions rather than navigating touchscreens and virtual maps. Voice could also have a place in the evolution of the retail experience, with shopping becoming more personalised and concierge-like. Imagine using voice control to ask for a different size to be brought to your changing room.

The three major considerations when implementing voice are security, transparency and integration. How will users expect their data to be stored and used, and what implications does that have for the organisation’s data security? Are you being clear about how data will be used? Can you access the up-to-date information the system needs to work correctly?

TK: Natural language processing technology keeps getting more sophisticated. How does that enable a wider range of use cases? For example, is speech recognition now able to accommodate a wide range of accents and non-standard words/phrases/slang?

JC: Natural language processing is improving rapidly, but it’s not quite there yet. For example, the average meeting transcript can contain a huge number of errors. In that application, this isn’t necessarily a problem, because readers naturally fill in the gaps. However, it is a problem in business-critical applications, where a small error in a transcript or command could be a huge issue.

Therefore, while accuracy is still improving, integrators need to provide non-voice-activated options in every case to ensure accessibility. Each application should also be carefully considered in light of how a misinterpreted command could affect the user experience of your service or brand.

TK: Gartner predicts that “by 2025, 75% of conversations at work will be recorded and analysed, enabling the discovery of added organizational value or risk.” Are there opportunities to use speech recognition systems for that kind of analysis? If so, how can organizations balance that with security/privacy? Maybe one concern is a system in a conference room constantly listening for wake words and getting hacked so it eavesdrops.

JC: How data is gathered and stored is absolutely critical to the deployment of voice recognition in businesses. Transparency with the people taking part in a recorded conversation will be one area of consideration; data security will be another.

A robust security policy and infrastructure will be vital to AV integration projects as voice becomes integral to our devices. From an organisational point of view, the workforce’s trust in the system will be critical to ensuring it doesn’t undermine creativity, collaboration and productivity.

However, the opportunities are really exciting. Organisations will be able to predict and evidence trends more accurately, personalise experiences and streamline administrative tasks. In the short term, voice will reduce friction between people and technology, starting meetings more quickly and creating a more human interaction with digital signage than has ever been achieved.