Krisp – AI for voice quality improvements
A simple 'helper' application which works to improve the quality of your existing Internet communications application.

Chris Koehncke
Applying AI models to real-time media is hard, but there are promising developments. A few months ago, I experimented with Krisp.ai, a product from 2Hz, an SF-based start-up that is looking to license its AI-driven audio SDK to communications applications.
The Krisp SDK promises to improve audio quality “on the fly” by removing background noise and upscaling existing audio to HD. Additional claimed benefits are the ability to handle packet loss and to dynamically adjust audio levels. This article focuses on background noise elimination.
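Krisp's noise removal relies on a trained neural network (DNN). For readers new to the idea, here is a much simpler, classic technique for comparison: an energy-based noise gate, which mutes short frames whose loudness falls below a threshold. This is a swapped-in illustration, not Krisp's method; the function names, the 160-sample frame size (20 ms at 8 kHz) and the 0.02 threshold are all assumptions chosen for the example.

```python
import math
from typing import List


def frame_rms(samples: List[float]) -> float:
    """Root-mean-square energy of one frame (samples as floats in [-1.0, 1.0])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_gate(samples: List[float], frame_size: int = 160,
               threshold: float = 0.02) -> List[float]:
    """Mute every frame whose RMS energy is below `threshold`.

    Hypothetical defaults: 160 samples is 20 ms at an 8 kHz sampling rate;
    neither value comes from Krisp.
    """
    out: List[float] = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if frame_rms(frame) < threshold:
            out.extend([0.0] * len(frame))  # quiet frame: treat as noise and mute it
        else:
            out.extend(frame)               # loud frame: assume speech and keep it
    return out


# Usage: 20 ms of faint hiss followed by 20 ms of a louder 200 Hz "voice" tone at 8 kHz.
rate = 8000
hiss = [0.01 * (1 if i % 2 else -1) for i in range(160)]
voice = [0.5 * math.sin(2 * math.pi * 200 * i / rate) for i in range(160)]
gated = noise_gate(hiss + voice)
print(all(s == 0.0 for s in gated[:160]))  # True: the hiss frame is muted
print(gated[160:] == voice)                # True: the voice frame is kept
```

A gate like this cannot remove noise that overlaps speech, and it can cut off quiet speech; that is the kind of gap a learned model such as krispnet aims to close.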
2Hz has built its own deep neural network (DNN), which it calls krispnet, trained on 10k speakers, 20k different background noises and 2.5k hours of audio. Is that enough? In the world of AI, more training data is better, but it’s a good start.
The Krisp.ai application (available now for Mac and Windows) uses the krispnet SDK. Krisp creates a new virtual sound input/output device on your computer, and you manually configure each communications application to use it so that your audio passes through Krisp first. Setup is straightforward.
Krisp offers a free, fully featured trial of the application. After 14 days, however, the free application only works on the ‘speaker’ (inbound audio) and NOT on the microphone.
I set out to ‘test’ Krisp in my open-office environment, full of noisy co-workers (sadly typical of many SF tech offices these days). My tests are not scientific, but they are real enough to form an opinion of the Krisp app.
Let’s start with a baseline. Here is a brief unmodified recording I made (using Audacity) on a MacBook Pro with the internal microphone at 44100 Hz (high quality).
As noted in a previous post on microphones, the Mac has a great internal mic. In the baseline sample, my voice is clear and high quality. However, the background noise is equally HD and competes with my voice.
Below is a 44100 Hz sample with Krisp ENABLED.
The background noise is mostly gone in the Krisp-enabled example. On the downside, Krisp slightly altered my voice and introduced some new artifacts, which give my voice a more mechanical, somewhat less natural sound. Nonetheless, in my opinion it’s an improvement over the unprocessed recording. The Krisp app did not cause any noticeable increase in CPU utilization.
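The comparison above is by ear. One rough way to put a number on “background noise competing with my voice” is the signal-to-noise ratio in decibels, SNR = 10 · log10(P_speech / P_noise), estimated from a stretch of speech and a stretch of noise-only audio taken from the same recording. This is a generic sketch, not a measurement from these tests; the function names and the synthetic sample values are hypothetical, and the estimate is crude because the speech stretch also contains background noise.

```python
import math
from typing import List


def mean_power(samples: List[float]) -> float:
    """Average power of a segment: the mean of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech: List[float], noise: List[float]) -> float:
    """Rough SNR in dB from a speech segment and a noise-only segment."""
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        return math.inf  # perfectly silent background
    return 10 * math.log10(mean_power(speech) / noise_power)


# Usage with synthetic values: speech at amplitude 0.5, noise at amplitude 0.05.
speech = [0.5, -0.5] * 100
noise = [0.05, -0.05] * 100
print(round(snr_db(speech, noise), 1))  # 20.0 (power ratio of 100)
```

A higher value means the voice stands out more from the background, so a clip with the background removed would be expected to score higher than the baseline.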
Now let’s step the quality down a bit. Below are two additional samples at only an 8 kHz sampling rate, closer to what an Internet communications app is likely to use. Krisp did an even better job here.
At the lower sampling rate AND with background noise, Krisp shines: as you can hear, there is almost no background noise. While the audio quality in neither sample is HD, in the Krisp-enabled example you don’t strain to pick out my voice (as the background has been eliminated).
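A quick way to see why neither 8 kHz sample can be HD: by the Nyquist sampling theorem, audio sampled at a rate f_s can only represent frequencies up to f_s / 2. Applied to the two rates used in these tests:

```latex
f_{\max} = \frac{f_s}{2}, \qquad
\frac{44\,100\ \text{Hz}}{2} = 22\,050\ \text{Hz}, \qquad
\frac{8\,000\ \text{Hz}}{2} = 4\,000\ \text{Hz}
```

So the 8 kHz clips keep only content below 4 kHz (roughly telephone bandwidth) and carry about 5.5 times fewer samples per second (44100 / 8000 ≈ 5.51).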
Note: Krisp purports to work on inbound audio as well; however, I was unable to notice any remarkable difference with it enabled. Since the free version only processes inbound audio after the trial, this raises the question of whether Krisp is of any value past the trial period unless you upgrade to the paid version. Krisp worked well with Zoom in my testing.
The paid version of Krisp is $20 per month, which seems high for a ‘feature’-type product ($120 a year if you pay annually, effectively $10 a month). Krisp is keen to get trial use and offers a referral link which extends your free usage period by 2 months. You can use mine https://ref.krisp.ai/u/u456100422
AI for real-time media processing is still new. The Krisp app is promising, with the microphone element working well for those of us in noisy environments. Krisp is usable today and, with additional model enhancements, is likely to improve in the future.