Hidden audio commands can be used to take control of popular voice-activated devices, researchers say
Concealed voice commands can be used to take control of smartwatches and smartphones, according to researchers, who are keen to highlight the security risks inherent in increasingly popular voice-only devices such as Amazon Echo, the Apple Watch and Android Wear.
A group of researchers from the University of California, Berkeley and Georgetown University demonstrated that malicious commands can be disguised so as to go unnoticed by humans, but can still be recognised by voice-activated devices, which could carry out their instructions.
The researchers demonstrated the use of two types of commands – one is distorted but still understandable by humans, sounding like the voice of a Dalek from Doctor Who, while the other is more heavily garbled and only works with devices whose voice systems are known in detail to the attacker.
The first type of command was verified to work with most currently available voice-activated devices, such as those from Apple or Google, the researchers said.
Such a command could be played within earshot of the device in a public area, or embedded in a popular online video so that the user does not notice it, they said.
“Depending upon the device, attacks could lead to information leakage (e.g., posting the user’s location on Twitter), cause denial of service (e.g., activating airplane mode), or serve as a stepping stone for further attacks (e.g., opening a web page hosting drive-by malware),” they wrote in a paper.
Broad range of targets
Previous research has shown that hidden voice commands can be crafted so that devices recognise them while humans do not notice them, but the new paper is the first to demonstrate that such commands can be constructed even with very little knowledge of the target speech recognition system, the researchers said.
“Our attacks demonstrate that these attacks are possible against currently-deployed systems, and that when knowledge of the speech recognition model is assumed more sophisticated attacks are possible which become much more difficult for humans to understand,” they wrote.
They said the attacks work well against Google Now’s speech recognition system, while Apple’s Siri seemed to be more selective about recognising distorted speech.
The researchers also demonstrated a screening system that they said flagged nearly 70 percent of the hidden voice commands as malicious.
Users can protect themselves by setting up devices to require a fingerprint or password before accepting voice commands, but such protections make voice activation significantly more difficult to use, they said.
“Active defenses, such as audio CAPTCHAs, have the advantage that they require users to affirm voice commands before they become effected,” they wrote. “Unfortunately, active defenses also incur large usability costs, and the current generation of audio-based reverse Turing tests seem easily defeatable.”
They said filters that slightly degrade audio recognition quality were more promising, and “can be tuned to permit normal audio while effectively eliminating hidden voice commands”.
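The idea behind such a filter can be illustrated with a minimal sketch. The article does not say which filter the researchers used, so the moving average below, the window size, the 16 kHz sample rate and the two test tones are all assumptions chosen purely for illustration. It shows how a slight degradation can leave low-frequency content nearly intact while strongly attenuating high-frequency content; it is not a reconstruction of the researchers' defence.

```python
import math

def moving_average(samples, window=5):
    """Smooth a signal with a centred moving average (a crude low-pass filter).

    Illustrative stand-in only: the filter type and window are assumptions.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(samples)
    out = []
    for i in range(n):
        lo = max(0, i - half)          # shrink the window at the edges
        hi = min(n, i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out

def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone(freq_hz, sample_rate=16000, duration_s=0.05):
    """Generate a pure sine tone (hypothetical test input)."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

low = tone(200)     # stands in for ordinary low-frequency speech energy
high = tone(6000)   # stands in for high-frequency content

# Fraction of each tone's level that survives the filter.
low_kept = rms(moving_average(low)) / rms(low)
high_kept = rms(moving_average(high)) / rms(high)

print(f"low-frequency level kept:  {low_kept:.2f}")
print(f"high-frequency level kept: {high_kept:.2f}")
```

In practice the tuning the researchers describe would be done against real speech and real hidden commands; the two sine tones here only make the trade-off visible.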
The research is to be presented at the Usenix Security Symposium in August.