The reasons behind bad videocall quality might not be what you think

Video collaboration usage keeps growing. So does the need for insights into what users are really experiencing. Tim Kridel investigates why that’s easier said than done.

Microsoft Teams notched 270 million users in early 2022, up from 145 million in 2021. That soaring adoption is just one example of why the global video collaboration services market is on track to grow at a compound annual growth rate (CAGR) of 12.5% through 2030, according to Grand View Research.

Those numbers highlight some challenges and opportunities. The more that people rely on Teams, Zoom and other video collaboration services for work, school and telehealth, the more important it is to ensure that they have a consistently great user experience. If they don’t, two things can happen.

First, they might stop using the service that their employer pays for, thus undermining its ROI. That can also mean less business for its vendors and integrators if the company scales back spending on rooms and other infrastructure.

Employees also might start using another service — one their employer hasn’t vetted for security or cost. That can lead to privacy breaches, such as unauthorised people eavesdropping on confidential meetings. This scenario has been playing out for years with cloud storage, where employees get frustrated by the company-provided application and start using their personal Dropbox or iCloud.

The good news is that these challenges create opportunities for vendors and integrators to provide clients with tools and managed services for troubleshooting and optimising user experiences. These go beyond traditional IT tools, which focus on network performance, and beyond the tools built into video collaboration platforms.  

“We've got video collaboration platforms and the various analytics and data that come from those,” says Faye Bennett, whose eponymous consulting firm advises vendors, integrators and end users. “Then we've got IT monitoring platforms, which have been used for donkey’s years. Now we've got this new wave of AV monitoring and management solutions.”


Knowing where to look

One drawback to relying solely on traditional IT tools is that they don’t detect many of the conditions that undermine user experiences with video collaboration. Based on how they evaluate a network’s performance, everything could appear fine in terms of bandwidth, jitter, packet loss and latency — even though the videoconferences are plagued with tiling, freezing and other gremlins.
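As a concrete example of what those network-side KPIs measure, interarrival jitter is typically smoothed the way RFC 3550 (the RTP specification) describes. A minimal Python sketch, with hypothetical timestamps in milliseconds:

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Smoothed interarrival jitter per RFC 3550, section 6.4.1:
    J += (|D| - J) / 16, where D is the change in one-way transit
    time between consecutive packets."""
    jitter = 0.0
    prev_transit = None
    for sent, received in zip(send_times_ms, recv_times_ms):
        transit = received - sent
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter

# Perfectly even delivery: every packet takes 5 ms, so jitter is zero.
print(interarrival_jitter([0, 20, 40], [5, 25, 45]))  # 0.0
# One packet delayed by an extra 5 ms produces a small nonzero jitter.
print(interarrival_jitter([0, 20, 40], [5, 30, 45]))
```

Traditional IT tools report exactly this kind of number, which is why a call can look healthy on paper: a muted or misconfigured mic produces zero network jitter.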

Part of the reason is that they’re not designed to capture information about conditions at the endpoints. So if attendees complain that the audio keeps dropping out, network key performance indicators (KPIs) such as packet loss will all look fine, because the culprit is actually a mic that’s configured incorrectly or placed in the wrong spot.

Another reason is that audio and video have network requirements very different from those of email, database queries and other types of enterprise traffic. So although AV has spent the past decade migrating off bespoke networks and onto corporate LANs and WANs, IT tools haven’t fully evolved to accommodate many of AV’s unique requirements. Meanwhile, the tools built into video collaboration platforms also may not paint a complete picture of user experiences because, for example, they don’t have access to network diagnostic data.

Yet another challenge is that most meetings span multiple networks: from the corporate LAN to a service provider’s network, across a peering point, and then on to other service providers’ networks. Getting visibility into each of those hops isn’t always possible, so there are blind spots.

“The main thing missing from all of these communication tools like Zoom and Teams is the ability to understand what's going on in the infrastructure: whether there is an impact to the service that is happening within the infrastructure or whether it's actually happening at the source point,” says Hakan Emregul, Accedian director of solutions engineering and strategic partnerships. “What happens when I start having problems in my network? How do they identify that? The major gap in all of these is the inability to see what's really happening in the localised infrastructure versus the end-to-end service connectivity.”

Some vendors, such as Accedian, which makes network management and troubleshooting tools for enterprises and telecom operators, recognise that gap and are closing it. In a recent blog post, the company explained the nuances that differentiate quality of experience (QoE) from quality of service (QoS):

“QoE focuses not on the efficiency of the data transported over the network, but on the information within the data sent. This requires a greater scrutiny of network imperfections that may be negligible to certain applications, but will greatly affect an end user’s satisfaction in others. For example, in a cloud-based CRM system, a 5% packet loss is nothing — but on a VOIP call, even a 0.5% packet loss can reduce data throughput by over 30%. This can result in a pretty bad experience for the users on the call.”
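The outsized effect of small loss rates can be illustrated with the classic Mathis model, which bounds steady-state TCP throughput by segment size, round-trip time and loss rate. This is a back-of-envelope approximation, not Accedian’s own calculation, and the quoted VoIP figure additionally depends on codec behaviour:

```python
from math import sqrt

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. approximation for the upper bound on steady-state
    TCP throughput: rate <= (MSS / RTT) * (C / sqrt(p)), with C ~= 1.22."""
    return (mss_bytes * 8 / rtt_s) * 1.22 / sqrt(loss_rate)

# 1460-byte segments over a 50 ms round trip (illustrative numbers):
low_loss = mathis_throughput_bps(1460, 0.05, 0.0001)  # ~28.5 Mbps at 0.01% loss
high_loss = mathis_throughput_bps(1460, 0.05, 0.005)  # ~4.0 Mbps at 0.5% loss
print(low_loss / high_loss)  # throughput scales with 1/sqrt(p)
```

The model is for TCP; real-time media usually rides UDP, where loss shows up directly as audio dropouts and frozen video rather than reduced throughput, which is why even sub-1% loss is noticeable on calls.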

Service-level agreements (SLAs) ensure that at least some of the service provider networks are meeting QoS requirements. Enterprises also can make changes in their networks to help rule out some problems when troubleshooting.  

“For example, in Teams you can set differentiated services code point (DSCP) markers for QoS, which basically separates out things like audio, video and any content sharing,” says Don Lambresa, Project Audio Visual CEO. “That needs to be taken into consideration because that can increase the quality of the meeting. We also find that a lot of people need to get their network ready in terms of the firewall side of things.”
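In practice, Teams endpoints are usually marked via Windows Group Policy or PowerShell’s New-NetQosPolicy rather than in application code, but the underlying mechanics are simple: the DSCP value occupies the top six bits of the IP TOS byte. A minimal Python sketch at the socket level, assuming the class values from Microsoft’s published QoS guidance (worth verifying against current documentation):

```python
import socket

# DSCP classes Microsoft documents for Teams media (assumed values;
# check current Microsoft QoS guidance before deploying):
DSCP_AUDIO = 46    # EF (Expedited Forwarding)
DSCP_VIDEO = 34    # AF41
DSCP_SHARING = 18  # AF21

def mark_socket(sock, dscp):
    """Set the DSCP field on outgoing packets; DSCP sits in the top
    six bits of the IP TOS byte, so shift left by two."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
mark_socket(s, DSCP_AUDIO)
```

Switches and routers along the path must also be configured to honour these markings; DSCP values are routinely stripped or remapped at network boundaries, which is one reason end-to-end QoS is hard to guarantee.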

Project Audio Visual uses Crestron’s XiO Cloud platform to ferret out room and endpoint problems that network-focused tools can miss.

“That's been around for quite a while, and it's probably the most comprehensive so far,” Lambresa says. “It allows you to do things like [remote] into the room and see the touch panel. So in a Teams environment, for example, we could see whether the panel was logged out or whether the panel had been activated correctly in terms of what the client was pushing button-wise to start that feature.

“Then we also have [dashboards] showing some of the statistics within the Teams admin console. So we could dig right down into some of the calls to see that the problem was coming from somebody remotely because their network connection wasn't great. We can feed that back to the client to say: ‘You've had reports of issues within that meeting. We can see that audio and video quality is poor incoming from an external participant,’ or ‘We can see it's the room that's causing the problem.’”


Distributed workforces create additional factors

With the rise of remote work, employees often assume that their home network is the culprit: perhaps their kids have it clogged up with streaming and multiplayer gaming, or their ISP is bogged down. It turns out that bandwidth often isn’t the bottleneck.

“If you measure the throughput required by a Zoom call, it's incredibly small,” says Scott Sumner, CMO of Kadiska, whose platform helps monitor QoE in hybrid environments. “You could be sharing with many people, full-screen HD, and you're looking at like 350 kbps. The compression technology and the transcoding are absolutely incredible. So the chance of it being a bandwidth issue is very small.”

Diagrams from the Kadiska dashboard showing loss (left) and latency (right) within the Microsoft cloud network, and where they originate.

A bigger factor is how the sessions wind their way through the internet.

“It's more about the number of hops, the latency, the packet loss and getting to those critical media servers that are handling the streaming calls,” Sumner says.

The QoE impact varies somewhat by service. For example, as Sumner’s colleague explained in a blog post, in a Teams meeting, everyone connects to the same Microsoft media server, which is chosen based on its proximity to the first attendee. That can lead to scenarios where people from Australia and Europe are all connected to a US server.

“This will obviously add network latency and can lead to poor voice and video quality for these US-based users,” the blog post said. “Microsoft recommends keeping the network latency below 100 milliseconds to guarantee proper streaming traffic performance. To achieve this, Microsoft recommends connecting as quickly as possible to its backbone. So the ISP you are connected to should ideally peer directly with the Microsoft network.”
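One quick way to sanity-check that 100-millisecond guidance from a given location is to time a TCP handshake toward a Microsoft endpoint. A rough Python sketch; the handshake time is only a coarse proxy for media-path latency, not the measurement Teams itself performs, and the hostname to test against varies by tenant:

```python
import socket
import time

def tcp_connect_latency_ms(host, port=443, timeout=3.0):
    """Rough round-trip estimate: the wall-clock time taken to
    complete a TCP handshake with the given host and port."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # Connection closes immediately; we only wanted the timing.
    return (time.perf_counter() - start) * 1000.0
```

Run against the media relay hostname your tenant actually uses; values consistently above 100 ms suggest the local ISP is not peering closely with Microsoft’s backbone.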

An obvious challenge is that some remote employees might not have multiple ISPs to choose from, or none of the available ISPs may have peering agreements with Microsoft. This could be a business opportunity for AV vendors or integrators: curating a portfolio of partner ISPs capable of meeting the peering requirements, SLAs and other factors that affect QoE.


Problems can be opportunities

Some integrators are already turning the QoE challenge into a business opportunity by developing their own management and troubleshooting platforms. One example is AVI-SPL’s Symphony, which ingests data from multiple vendors’ devices and management tools to provide a holistic view of QoE. Without that consolidation, staff have to swivel-chair between siloed tools and manually correlate their data, which prolongs the time users spend enduring a poor experience. Symphony also can be configured to automate tasks, freeing staff to focus on more complex work.

“Teams does not prioritise video over audio until you hit nine participants,” says Laurie Berg, AVI-SPL vice president of product operations for Symphony. “We do some of our monitoring with a bot, and if there are not nine people in that call, that bot will show up and [keep] the audio calls from taking up a square. That's still real estate on my screen, making it difficult for me to see what I need to see.”

QoE tools such as Symphony not only help resolve problems faster, but also provide valuable historical data. That data can feed continual-improvement strategies that head off QoE problems and enable informed business decisions. For example, it can make it easier to identify which device models suffer higher-than-average problem rates and thus should be avoided when adding or refreshing rooms.

“A customer of ours was having issues with a particular technology,” Berg says. “[They could just] deal with the ticket as a siloed event, rebooting the device and fixing the issue. But they started looking at the tickets in a historical standpoint: ‘Why is it that this model is causing this across the board for the last month?’ That type of analytics is how you make decisions 6, 12, 18 months down the line.”

Finally, integrators could use experience-monitoring platforms such as Accedian’s Skylight to create managed services where they remotely monitor and troubleshoot their clients’ video collaboration services. Or they could provide portals where clients can access QoE dashboards to see for themselves.

“They could give their customers a basic performance portal so they could see their general [status],” Emregul says. “They also can say, ‘If you want more information to be able to troubleshoot, I can provide that for a premium.’”

Image credits: Tada Images | Andrey_Popov
