A team of researchers from British universities has developed a new deep learning model that can steal data from keyboard keystrokes recorded using a microphone with an accuracy of 95%. This is a significant threat to data security, as it has the potential to leak sensitive information such as passwords, discussions, messages, or other confidential information to malicious third parties.
The Attack Model
The first step of the attack is to record keystrokes on the target’s keyboard, as that data is required for training the prediction algorithm. This can be achieved via a nearby microphone or the target’s phone that might have been infected by malware that has access to its microphone. Alternatively, keystrokes can be recorded through a Zoom call where a rogue meeting participant makes correlations between messages typed by the target and their sound recording.
The researchers gathered training data by pressing 36 keys on a modern MacBook Pro 25 times each and recording the sound produced by each press. Then, they produced waveforms and spectrograms from the recordings that visualize identifiable differences for each key and performed specific data processing steps to augment the signals that can be used for identifying keystrokes.

CoAtNet Classifier
The spectrogram images were used to train ‘CoAtNet,’ which is an image classifier, while the process required some experimentation with epoch, learning rate, and data splitting parameters until the best prediction accuracy results could be achieved.

In their experiments, the researchers used the same laptop, whose keyboard has been used in all Apple laptops for the past two years, an iPhone 13 mini placed 17cm away from the target, and Zoom. The CoANet classifier achieved 95% accuracy from the smartphone recordings and 93% from those captured through Zoom. Skype produced a lower but still usable 91.7% accuracy.
Defense Measures
For users who are overly worried about acoustic side-channel attacks, the paper suggests that they may try altering typing styles or using randomized passwords. Other potential defense measures include using software to reproduce keystroke sounds, white noise, or software-based keystroke audio filters.
Remember, the attack model proved highly effective even against a very silent keyboard, so adding sound dampeners on mechanical keyboards or switching to membrane-based keyboards is unlikely to help. Ultimately, employing biometric authentication where feasible, and utilizing password managers to circumvent the need to input sensitive information manually, also serve as mitigating factors.
Conclusion
Acoustic attacks have become much simpler due to the abundance of microphone-bearing devices that can achieve high-quality audio captures. This, combined with the rapid advancements in machine learning, makes sound-based side-channel attacks feasible and a lot more dangerous than previously anticipated. Therefore, it is crucial to take necessary precautions to protect sensitive information from being stolen by attackers.