Listening to Keystrokes Recorded by a Nearby Phone with 95% Accuracy


As deep learning advances, microphones become ubiquitous, and more online services are accessed through personal devices, the threat that acoustic side-channel attacks pose to keyboards is growing.

A team of researchers from the UK has trained a deep learning model that infers what is being typed from the sound of the keystrokes alone. The model classified keystrokes recorded by a nearby phone with 95% accuracy. When the keystrokes were instead recorded over a Zoom call, the accuracy was 93%.

The researchers note that wireless keyboards emit electromagnetic (EM) signals that can be detected and interpreted. However, a more widespread emission, one that is both abundant and simpler to pick up, is the sound of the keystrokes themselves, so they based their attack on keystroke acoustics. They focused on laptops for two reasons. First, laptops are more portable than desktop computers and are therefore more often used in public areas where typing can be overheard. Second, laptops are non-modular: identical laptop models come with the same type of keyboard and therefore emit similar keyboard sounds.

The study introduced self-attention transformer layers to keyboard attacks for the first time. The researchers then assessed the effectiveness of the new attack in two real-world scenarios: on laptop keystrokes picked up by the attacker's microphone (a mobile device) in the same room, and on laptop keystrokes typed during a Zoom call.
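To give a sense of what a self-attention layer computes, here is a minimal single-head sketch of scaled dot-product self-attention in NumPy. It is an illustration only, not the researchers' model: the input shape, the random projection matrices, and the names `self_attention`, `frames`, `w_q`, `w_k`, and `w_v` are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x:             (seq_len, d_model) array, e.g. one feature vector per time frame.
    w_q, w_k, w_v: (d_model, d_head) projection matrices (random here, learned in practice).
    Returns the attended output (seq_len, d_head) and the attention weights (seq_len, seq_len).
    """
    q = x @ w_q                                  # queries
    k = x @ w_k                                  # keys
    v = x @ w_v                                  # values
    scores = q @ k.T / np.sqrt(k.shape[-1])      # similarity of every frame to every other frame
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ v, weights

# Hypothetical input: 8 time frames of a keystroke sound, 16 features each.
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 16))
w_q, w_k, w_v = (rng.normal(size=(16, 4)) for _ in range(3))

out, attn = self_attention(frames, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Each output row is a weighted mix of all frames, so the layer can relate distant parts of the same keystroke sound to one another.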

For the setup, the team recorded keystrokes with an iPhone microphone and trained the model on those recordings. This surprisingly straightforward approach highlights how easily passwords and other sensitive data could be compromised, even without specialized equipment.

A MacBook Pro and an iPhone 13 mini were used for the experiments. For the phone-recorded dataset, the iPhone was positioned 17 cm away from the laptop on a folded micro-fiber cloth to minimize desk vibrations. For the second dataset, which the researchers referred to as the ‘Zoom-recorded data,’ they captured the laptop's keystrokes using the built-in recording function of the Zoom video-conferencing application.

The results were impressive. When trained on keystrokes recorded by a nearby phone, the model achieved an accuracy of 95%. When trained on keystrokes recorded through the video-conferencing software Zoom, it achieved an accuracy of 93%. The researchers emphasize that these results demonstrate the practicality of acoustic side-channel attacks using off-the-shelf equipment and algorithms.

In the future, the researchers plan to develop more robust techniques for extracting individual keystrokes from a single recording. This is crucial because all acoustic side-channel attack (ASCA) methods rely on accurately isolating keystrokes before they can be classified. They also suggest that smart speakers could be used to record keystrokes for classification, as these devices are always on and are present in many homes.
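To illustrate what isolating keystrokes from a recording can involve, the sketch below marks keystroke onsets wherever the short-time energy of the signal exceeds a threshold. This is a simple technique assumed for illustration, not necessarily the method the researchers use; the function name `isolate_keystrokes`, its parameters and their default values, and the synthetic recording are all hypothetical.

```python
import numpy as np

def isolate_keystrokes(signal, sample_rate, window_s=0.01,
                       threshold_ratio=0.3, min_gap_s=0.1):
    """Return the start sample of each likely keystroke in a mono recording.

    The signal is cut into short windows; a window whose energy exceeds
    threshold_ratio * (maximum window energy) counts as part of a keystroke.
    Loud windows closer than min_gap_s to the previous onset are merged into it.
    """
    window = max(1, int(round(window_s * sample_rate)))
    n_windows = len(signal) // window
    frames = signal[: n_windows * window].reshape(n_windows, window)
    energy = (frames ** 2).sum(axis=1)           # short-time energy per window
    threshold = threshold_ratio * energy.max()
    min_gap = int(round(min_gap_s * sample_rate))

    onsets = []
    for i in np.flatnonzero(energy > threshold):
        start = int(i) * window
        if not onsets or start - onsets[-1] >= min_gap:
            onsets.append(start)
    return onsets

# Synthetic stand-in for a recording: faint noise plus three short tone bursts.
sample_rate = 8000
rng = np.random.default_rng(0)
recording = 0.01 * rng.normal(size=sample_rate)   # one second of background noise
burst = np.sin(2 * np.pi * 1000 * np.arange(160) / sample_rate)
for start in (800, 3200, 5600):                   # burst positions in samples
    recording[start:start + 160] += burst

onsets = isolate_keystrokes(recording, sample_rate)
print(onsets)  # [800, 3200, 5600]
```

A fixed threshold like this works on clean, evenly loud typing but breaks down with background noise or overlapping keystrokes, which is one reason more robust isolation matters.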