AI-enabled SoCs are making our smart home devices engaging and more secure
Aug 31, 2020
By Gaurav Arora
Synaptics has always been a pioneer in the area of human-machine interface (HMI), starting with the original PC TouchPad. How we interact with our devices and other electronic systems has evolved considerably since then, and consumers are now comfortable using integrated audio, vision, and video capabilities, such as voice and gestures, to control and engage with all sorts of products.
This is especially true in the market for home entertainment systems, which are emerging as a hub of sorts that combines elements of traditional TVs, a PC-like platform for accessing the internet, media streaming devices, smart speakers and soundbars, video conference systems, and even exercise aids and home security platforms.
The interface is the most visible change in this category of system. New-generation devices use AI-based techniques to embed more intelligence and contextual awareness, enhancing the user experience. Set-top boxes that recognize users’ programming preferences, exercise apps that can interpret body motion, and more natural voice and motion commands are all becoming more widely available in the ubiquitous device that anchors our living room.
But the interface is only one aspect of this evolution, as consumers demand better performance, faster response times, and greater privacy from the products that entertain and inform them. Underlying this evolution is the progressive move to edge computing, which helps reduce the latency, privacy, and security challenges of always having to fetch data or content from the cloud. Edge computing addresses many of these concerns, but it requires more efficient, security-minded approaches that still meet challenging consumer price points.
At the same time, users also want higher-performing (i.e., higher-resolution) visual experiences as we move more broadly to 4K and 8K displays. This not only taxes edge-computing resources but also strains traditional content delivery methods, requiring novel scaling and compression schemes.
With the addition of camera sensors in home devices, users expect smooth tracking of people as they walk around their living room while on a video call, so that the people they are conversing with get a more immersive view. To top it off, the conversation can be made more interactive and engaging by adding augmented-reality effects to the video stream. Audio is not left behind either: users expect better audio quality and multi-channel sound enhancement on products like video soundbars.
Synaptics’ AI-enabled solutions for audio, vision, and video processing are a big part of how device makers are addressing this transformation. In addition to our traditional expertise in HMI, we are applying machine learning and neural networks to enhance privacy and to address the visual and audio data-processing challenges in this new class of systems.
From a security standpoint, the progression of ways we interact with our devices has led to a multitude of security and privacy risks that turn our ‘smart’ devices into windows into our homes for annoying marketers or bad actors. This is largely due to the lack of local computing performance and memory in today’s home devices, which forces them to upload the data they collect to the cloud for interpretation. The additional data collected by more microphones and cameras integrated into the device exacerbates the situation, generating information far too personal to expose outside the home. On top of that, without heavy compression (which can erase exactly the subtle cues the intuitive human interface needs), there is not even enough upstream bandwidth to push all this data back to the cloud. All of this means the local device must have the capacity to analyze the data at the edge, in the device itself, protecting the user and also delivering a more real-time user experience.
In parallel, the overwhelming adoption of 4K Ultra-High Definition displays is causing a crisis in downstream bandwidth as well. Content providers offer only a small amount of 4K content, and system operators simply don’t have the bandwidth to provide endless high-quality 4K streams, so they have been encoding at 720p or 1080p rather than at 4K. This produces a decent viewing experience, but fine detail cannot be reproduced when the decoded stream is upscaled to 4K. AI offers a solution: a machine-learning model can classify portions of the image, infer the original appearance, and generate appropriate additional pixels that reproduce the fine details.
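To get a rough sense of how much of a 4K frame has to be generated during upscaling, the short sketch below compares pixel counts. It assumes the common raster sizes for each label (1280×720, 1920×1080, and 3840×2160), which the text above does not spell out, and the function names are purely illustrative. It says nothing about how any particular upscaler works; it only shows the arithmetic.

```python
# Assumed raster sizes for the resolutions named in the text.
RESOLUTIONS = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}


def pixel_count(name):
    """Total pixels per frame for a named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def synthesized_fraction(source, target="4K"):
    """Fraction of target pixels that have no one-to-one source pixel."""
    return 1 - pixel_count(source) / pixel_count(target)


for source in ("720p", "1080p"):
    scale = pixel_count("4K") // pixel_count(source)
    print(f"{source} -> 4K: {scale}x pixels, "
          f"{synthesized_fraction(source):.1%} must be generated")
```

Under these assumptions, a 4K frame has four times the pixels of a 1080p frame and nine times those of a 720p frame, so most of what the viewer sees on a 4K display is produced by the upscaler rather than carried in the stream.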
The converging forces of multi-modal user interfaces, increased privacy and security, and image enhancement all drive toward the same conclusion: the need for a powerful, flexible, and secure inference acceleration engine in the SoC at the heart of our home entertainment systems.
In Electronic Design magazine, we take a deeper dive into how our VS680 SoC can address these challenges of new human interfaces, enhanced security, and real-time image enhancement. Such a solution is key to driving the evolution of home entertainment systems, such as emerging set-top boxes, streamers, and soundbars, into the nerve center for a multi-modal interface between humans and their smart homes.
See the article in Electronic Design here.