On April 9, 2019, VoiceFirst Events hosted The Voice of the Car Summit in San Francisco. Voice is an integral part of the CloudCar strategy, so I attended to learn about the latest trends and discover new technologies that could help expand our infotainment platform.
The agenda offered up a rapid-fire series of short talks by industry experts and technology suppliers about the increasing role of voice in the vehicle, and how the complexity of driving and expectations of users are shaping both the technology and the industry. Noticeably absent from the speaker lineup were any representatives from auto OEMs, with Harman being the only Tier-1 to speak. Also missing was any substantive discussion on potential impacts coming from autonomous vehicles or mobility. Regardless, there was plenty to talk about and the agenda was packed. I’ll cover what I felt were a few of the highlights and major themes of the day.
Keynote speaker Bret Kinsella from Voicebot.ai kicked off the event with an overview of voice trends and market data. According to Kinsella, nearly twice as many U.S. adults have used a voice assistant in the car as through a smart speaker. That’s a surprising stat considering the prevalence of voice-enabled devices in the home. However, he points out that voice arrived in the vehicle back in 2004, years before it reached the home, giving it much longer to build up a user base. User access to voice assistants in the car is split almost equally across three primary sources: the assistant supplied by the OEM, Bluetooth-connected handsets, and the general-purpose assistants delivered through either Apple CarPlay or Android Auto.
It’s not surprising, then, that users increasingly expect voice to be a feature in their cars. Today, 60 percent of new car buyers consider the in-car voice assistant an influencing factor in their purchase decisions. These days, voice is used primarily to control car features, access navigation or initiate phone calls. Kinsella points out that voice enables much more than just these activities, making the market ripe for new in-vehicle applications and use cases. That was a great lead-in for a number of presentations that showcased applications of voice technology in new and creative ways for active drivers. Below is a wrap-up of interesting voice applications.
Voice technology apps in the industry
Have you ever been bored by one-way audio programming on a long trip? Niko Vuori at Drivetime creates voice-based, interactive live games that engage drivers as they compete against other players. It not only addresses boredom, but Vuori points out it can actually improve driver safety. When delivered at a pace consumable by drivers, it has been shown to increase alertness, similar to the way a conversation with a passenger can help keep a driver awake. Further, interacting through voice reduces the tendency to engage in other, more distracting activities such as fumbling with a smartphone. It makes sense.
Ashley-Marie Cashion of what3words is trying to transform the number one use for voice in the car: navigation. Today’s paradigm requires a user to speak a full mailing address, which, as we all know, can easily be misinterpreted by the ASR. what3words has come up with a novel alternative that divides the entire world into a grid of three-meter-square tiles. Each tile is assigned a random three-word label that is easy for a voice-operated navigation system to understand and more precise to pinpoint as a destination. Think about being able to locate the parking garage entrance rather than just the block its address is associated with, or navigating to a location that has no address at all.
It’s an idea that is clearly catching on, and Cashion claims it has already been adopted by over 1,000 companies, including map providers, voice assistants and auto OEMs. Soon, you too may be asking your car to direct you to “banana-monkey-cage.”
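The core idea can be sketched in a few lines of Python. This is a toy illustration, not what3words’ actual algorithm: the tile math is a rough equirectangular approximation, and the word assignment hashes the tile index over a tiny made-up word list, whereas the real system uses a proprietary mapping designed so that nearby tiles get dissimilar labels.

```python
import hashlib

# Hypothetical stand-in for what3words' real dictionary of thousands of words.
WORDS = ["banana", "monkey", "cage", "apple", "river", "stone", "cloud", "lamp"]

TILE_METERS = 3.0
METERS_PER_DEGREE = 111_320  # rough value; real longitude spacing shrinks with latitude

def latlon_to_tile(lat: float, lon: float) -> tuple[int, int]:
    """Quantize a coordinate onto a grid of roughly 3 m x 3 m tiles."""
    row = int((lat + 90.0) * METERS_PER_DEGREE / TILE_METERS)
    col = int((lon + 180.0) * METERS_PER_DEGREE / TILE_METERS)
    return row, col

def tile_to_words(row: int, col: int) -> str:
    """Deterministically derive a three-word label from the tile index."""
    digest = hashlib.sha256(f"{row}:{col}".encode()).digest()
    picks = [WORDS[digest[i] % len(WORDS)] for i in range(3)]
    return ".".join(picks)

label = tile_to_words(*latlon_to_tile(37.7749, -122.4194))
```

Hashing gives every tile a stable label without storing a lookup table, which is enough to show why a short word triple is easier for an ASR to confirm than a spoken street address.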
Just adding voice to an application or a vehicle doesn’t guarantee a good user experience. How voice is implemented is critical for making users actually want to use it. A lot of research and development has been done over the years to improve how voice assistants engage with humans, and vice-versa. Most of the key voice technology suppliers were at the conference. Each presented their perspectives on how this should be done.
Ariane Walker from Amazon was first up. She backed up her claims that Alexa has broad adoption as a voice platform with numbers: over 100 million Alexa-based devices are in the market today, supporting over 80,000 different skills. Walker then cited a survey of over 5,000 users that showed most want to be able to access the voice technology they are already familiar with everywhere they go. Clearly, Amazon wants to extend Alexa’s existing momentum into the vehicle.
Adam Enfield from Nuance described their goal to make the voice experience as humanlike as possible. Their Talk First technology seeks to achieve this by eliminating the need for a wake word or a push-to-talk button. Their studies show that users prefer this approach, even with the perceived privacy risk of a system that is always listening. Nuance is also investigating ways to engage all facets of human communication, including non-verbal cues such as gestures and eye gaze.
Google’s Andrew Ku detailed how they segment the driver’s entire journey into three stages: before, during and after the trip. Before starting, users tend to engage both voice and screen (called intermodal) as they plan their trip. During the drive, user interaction is, or at least should be, voice-heavy. At the end of the drive and before the user exits the car, Google has found that users tend to focus more on the screen. According to Andrew, a successful user experience needs to intentionally account for all phases of the journey.
Katie McMahon from SoundHound closed out the day by revealing how their architecture performs both the ASR and NLU functions simultaneously. McMahon claims this approach yields the best results for speed and accuracy. She also demonstrated how it enables them to process complex, multi-parameter search requests. McMahon then showed how you can issue additional requests to filter the results down to a small set of options that would be consumable by a driver. The conversational nature of the requests and the fast results make for an impressive demo.
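The multi-parameter, then refine-by-follow-up pattern McMahon demonstrated can be sketched with a toy search over local data. This is purely illustrative and assumes nothing about SoundHound’s actual API; the records and field names are made up.

```python
# Hypothetical restaurant records; in a real assistant these would come from a search backend.
RESTAURANTS = [
    {"name": "Taqueria Sol", "cuisine": "mexican", "rating": 4.5, "open_late": True},
    {"name": "Pasta Luna", "cuisine": "italian", "rating": 4.2, "open_late": False},
    {"name": "El Rancho", "cuisine": "mexican", "rating": 3.8, "open_late": True},
]

def search(results, **criteria):
    """Apply several constraints at once, as a multi-parameter voice query would."""
    def matches(record):
        for key, want in criteria.items():
            have = record.get(key)
            if isinstance(want, tuple):  # ("min", value) expresses a range constraint
                op, val = want
                if op == "min" and not have >= val:
                    return False
            elif have != want:
                return False
        return True
    return [r for r in results if matches(r)]

# "Show me Mexican restaurants" ...
first_pass = search(RESTAURANTS, cuisine="mexican")
# ... "with at least four stars that are open late" narrows the prior result set.
refined = search(first_pass, rating=("min", 4.0), open_late=True)
```

The key point is that each follow-up utterance filters the previous result set rather than starting a new query, which is what keeps the interaction conversational and the final option list short enough for a driver.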
What lies ahead with voice-first in the vehicle
At the end of a summit like this, it is helpful to step back and look for overarching themes that summarize the day as a whole. There were many, but two in particular stood out for me.
Voice-first is not just a catchphrase. It is becoming a priority for an industry that is moving rapidly to make it a reality. Several presentations showed automobile commercials that emphasize the power of voice throughout the ad while refraining from even showing the car until the very end. This shows OEMs are acknowledging that voice doesn’t just contribute to the driving experience; it defines it.
At CloudCar, voice-first has always been a foundational principle. Our infotainment platform offers OEMs the ability to openly create and deliver rich user experiences enabled through complex, multi-intent commands. The extensible architecture delivers media and information content from multiple domains, with the flexibility to add new and unique voice-enabled services in the future.
A second clear theme was the question of how the industry will strike a balance between the competing desires for customized versus general-purpose voice assistants. On one side are consumers, who naturally want the voice assistant they are already familiar with available in their vehicles. This offers consistency and continuity across their digital devices and eliminates the need to learn new dialogs. It’s a compelling vision that Amazon and Google are ready and willing to evangelize and fulfill.
On the other side are the OEMs with different goals. They want to maintain ownership of their data and provide branded and differentiated user experiences to their customers. Both Nuance and SoundHound enable car manufacturers to achieve these goals and proudly featured that as an advantage in their presentations.
The CloudCar architecture is voice agnostic, able to support Alexa, Google Assistant, SoundHound, Nuance, Cortana, a custom assistant or any hybrid combination. It also handles the challenging task of arbitration that determines which assistant or service to route each command to. Above this sits a machine learning layer that continuously analyzes the user’s behavior to provide a more personalized experience and intelligent, context-aware recommendations.
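To make the arbitration idea concrete, here is a minimal sketch of routing utterances to different assistants. The handler names and keyword table are hypothetical; a production arbitrator like the one described would score NLU confidence from each assistant rather than match keywords, and the actual CloudCar logic is not public.

```python
from typing import Callable

# Hypothetical per-assistant handlers; each would wrap a real assistant or service.
def handle_navigation(utterance: str) -> str:
    return f"nav assistant handles: {utterance}"

def handle_media(utterance: str) -> str:
    return f"media assistant handles: {utterance}"

def handle_general(utterance: str) -> str:
    return f"general assistant handles: {utterance}"

# Simple keyword-based routing table standing in for confidence-based arbitration.
ROUTES: list[tuple[tuple[str, ...], Callable[[str], str]]] = [
    (("navigate", "directions", "drive to"), handle_navigation),
    (("play", "music", "podcast"), handle_media),
]

def arbitrate(utterance: str) -> str:
    """Pick the first matching domain handler; fall back to the general-purpose assistant."""
    text = utterance.lower()
    for keywords, handler in ROUTES:
        if any(k in text for k in keywords):
            return handler(utterance)
    return handle_general(utterance)
```

Keeping arbitration in a layer above the individual assistants is what makes the hybrid "any assistant, or several" configurations described above possible, since handlers can be added or swapped without touching the routing logic.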
While voice-first is driving the design of the user experience in the modern vehicle, the CloudCar platform gives OEMs the freedom to craft solutions that are truly customer-first. To learn more about the CloudCar solution, visit cloudcar.com. We will definitely attend this event again next year. Hope to see you there, too.