Audio watermarks are meant to ensure that material can be traced back to its source, regardless of how it was redistributed. Beyond that, embedding metadata in the sound itself, in a way that is imperceptible to the listener but still easily detected by a computer, is useful for various other reasons.
In the digital world we live in, content constantly gets redistributed, reused, reformatted and transcoded. All of that makes it hard for traditional metadata to survive, so we’ve had to find new, ingenious ways to create a robust watermark that survives these changes and lets us trace the content back to its original owner or creator.
On that note, we’ve decided to introduce you to audio watermarking, explain a few things about it, draw some comparisons and talk about where and why one might want to use it.
1. What’s Audio Watermarking?
We’ll try to explain it without getting too technical. Audio watermarking is the process of embedding a unique sound pattern that is imperceptible to the human ear but easily detected and identified by a computer. The pattern is embedded in the audio signal itself, not in the metadata of the digital file that carries it. That means that if someone were to compress, say, a FLAC file into an MP3, a well-designed watermark should survive the conversion.
You might wonder how one can embed a sound within an audio track so that it is undetectable by the human ear, yet easily picked up by a regular microphone and recognized by software. To answer that question, we have to dig a little deeper, go through a brief history of audio watermarking and see how it evolved over time.
2. What’s Ultrasound Watermarking?
As you might’ve guessed, one way to hide a sound within an audio track without the listener hearing it is to place it at frequencies the human ear can’t pick up. In most instances, humans can detect sound in the range between 20 Hz and 20 kHz, meaning anything below or above that range won’t be heard. In practice, most people don’t notice anything quiet above roughly 16 kHz, so that part of the spectrum is often loosely called the ultrasound range, even though ultrasound strictly begins at about 20 kHz.
Because of that, watermarks in the early days were placed in the ultrasound range. Most microphones could pick up these sounds, we couldn’t hear them, and all was well. The problem is that lossy compression works by removing the “excess” parts of the audio spectrum that listeners are unlikely to miss. This type of watermarking therefore wasn’t robust enough, as it was easily removed without affecting the perceived audio quality.
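To make that weakness concrete, here is a minimal Python sketch. It is not taken from any real watermarking product: it assumes a 44.1 kHz sample rate, uses a 440 Hz tone as a stand-in for the music and a hypothetical 18 kHz tone as the watermark, and imitates a codec's band-limiting with a plain 16 kHz low-pass filter. Real lossy encoders are far more sophisticated, and every name and value below is our own illustrative assumption.

```python
import math

SAMPLE_RATE = 44_100   # Hz; assumed for this sketch
HOST_HZ = 440          # stand-in for the audible programme material
MARK_HZ = 18_000       # hypothetical watermark carrier above ~16 kHz


def tone(freq_hz, n_samples, amplitude):
    """Generate a sine wave at freq_hz."""
    step = 2 * math.pi * freq_hz / SAMPLE_RATE
    return [amplitude * math.sin(step * i) for i in range(n_samples)]


def tone_level(samples, freq_hz):
    """Estimate the amplitude of the component at freq_hz (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq_hz / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    return 2 * math.sqrt(max(power, 0.0)) / len(samples)


def low_pass(samples, cutoff_hz, taps=101):
    """Windowed-sinc FIR low-pass; a crude stand-in for codec band-limiting."""
    mid = (taps - 1) / 2
    fc = cutoff_hz / SAMPLE_RATE
    kernel = []
    for n in range(taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))  # Hamming
        kernel.append(sinc * window)
    gain = sum(kernel)
    kernel = [k / gain for k in kernel]  # unity gain at 0 Hz
    # Keep only the samples where the kernel fully overlaps the input.
    return [
        sum(k * samples[i + j] for j, k in enumerate(kernel))
        for i in range(len(samples) - taps + 1)
    ]


n = SAMPLE_RATE // 10  # 0.1 s of audio
host = tone(HOST_HZ, n, 0.5)
mark = tone(MARK_HZ, n, 0.01)
marked = [h + m for h, m in zip(host, mark)]
band_limited = low_pass(marked, cutoff_hz=16_000)

print("watermark level before:", tone_level(marked, MARK_HZ))
print("watermark level after: ", tone_level(band_limited, MARK_HZ))
print("host level after:      ", tone_level(band_limited, HOST_HZ))
```

Running it should show the 18 kHz component dropping to a small fraction of its original level while the 440 Hz tone passes through almost unchanged. In other words, the watermark disappears and the listener notices nothing.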
3. What’s Spread Spectrum Watermarking?
Spread spectrum watermarking, as you can probably tell by the name, spreads the watermark over a broad range of frequencies. Since ultrasound watermarks occupied too narrow a band and were easily removed, the next step was to find a way to hide the watermark across the whole spectrum. As you might guess, this meant the watermark had to be hidden much more carefully, otherwise listeners would easily pick up on it, and that’s unacceptable.
This was avoided by adding low-amplitude noise across the full spectrum. The general idea was that if the original track was complex enough, listeners wouldn’t hear the low-level noise. However, some did hear it. Additionally, low-level noise is particularly sensitive to pitch shifting, more precisely to the Doppler effect, because detection depends on the embedded pattern lining up with the reference the detector expects. That rendered this method somewhat useless in day-to-day applications.
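The idea can be sketched in a few lines of Python. This is a textbook-style illustration under our own assumptions, not any vendor's implementation: the watermark is a pseudo-random sequence of +1 and -1 values generated from a secret key, added to the host at low amplitude and later detected by correlating the signal with the same sequence. The key, the strength, the three-tone "music" and the crude 1% speed change standing in for a pitch shift are all hypothetical.

```python
import math
import random

STRENGTH = 0.02  # assumed watermark amplitude, far below the host level


def demo_host(n_samples, sample_rate=44_100):
    """Deterministic stand-in for music: a mix of three sine tones."""
    parts = ((220, 0.3), (1_000, 0.2), (3_500, 0.1))
    return [
        sum(a * math.sin(2 * math.pi * f * i / sample_rate) for f, a in parts)
        for i in range(n_samples)
    ]


def pn_sequence(key, length):
    """Pseudo-random +1/-1 sequence derived from the secret key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(host, key):
    """Add the key's noise pattern to the host at low amplitude."""
    pn = pn_sequence(key, len(host))
    return [h + STRENGTH * p for h, p in zip(host, pn)]


def score(signal, key):
    """Average correlation between the signal and the key's noise pattern."""
    pn = pn_sequence(key, len(signal))
    return sum(s * p for s, p in zip(signal, pn)) / len(signal)


def detect(signal, key):
    """Declare the watermark present if the correlation is at least half strength."""
    return score(signal, key) > STRENGTH / 2


def speed_up(signal, factor):
    """Resample by linear interpolation; a crude stand-in for a pitch shift."""
    out = []
    pos = 0.0
    while pos < len(signal) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1 - frac) * signal[i] + frac * signal[i + 1])
        pos += factor
    return out


KEY = 1234  # hypothetical secret shared by embedder and detector
host = demo_host(44_100)
marked = embed(host, KEY)

print("marked, right key:", detect(marked, KEY))
print("unmarked host:    ", detect(host, KEY))
print("marked, wrong key:", detect(marked, KEY + 1))
print("marked, 1% faster:", detect(speed_up(marked, 1.01), KEY))
```

Only the first check is expected to report a watermark. After the 1% speed change, the embedded noise no longer lines up sample for sample with the detector's reference sequence, so the correlation collapses, which is the sensitivity described above.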
4. What’s Echo-Modulation?
An echo is a reflection of sound, and what you may not know is that every sound echoes off every object or surface around us; we just don’t hear it. We’ve evolved in such a way that we don’t hear the short echoes, only the long ones. That’s why we can hear an echo in a large, empty room or from the bottom of a well: the sound reflects off a single, far-away surface.
Hiding the audio watermark in short echoes is a relatively new technique, used only by Intrasonics at the moment. It involves complex calculations of the natural, short echoes of the original sound and hiding the metadata or watermark in those echoes. Unlike ultrasound or spread spectrum watermarking, this method is a lot less sensitive to pitch shifts or the Doppler effect, so it’s much better suited to day-to-day applications. This could also be the solution to the second screen problem, but we’ll get to that in just a second.
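The details of Intrasonics' method aren't covered here, so the Python sketch below shows only the general, textbook idea of hiding bits in short echoes, and should not be read as their implementation. Each bit is encoded by adding a faint echo with one of two delays, and decoded by checking at which delay the signal resembles a delayed copy of itself more strongly. The delays, echo gain, segment length and the noise-like host are all our own assumptions; published echo-hiding schemes often use cepstral analysis rather than the plain autocorrelation used here.

```python
import random

DELAY_ZERO = 50    # echo delay in samples that encodes a 0 bit (assumed)
DELAY_ONE = 80     # echo delay in samples that encodes a 1 bit (assumed)
ECHO_GAIN = 0.3    # echo amplitude relative to the original (assumed)
SEGMENT = 4096     # samples of audio used to carry each bit (assumed)


def embed_bits(host, bits):
    """Add a short echo to each segment; the echo delay carries the bit."""
    out = list(host)
    for k, bit in enumerate(bits):
        delay = DELAY_ONE if bit else DELAY_ZERO
        start = k * SEGMENT
        for i in range(max(start, delay), start + SEGMENT):
            out[i] = host[i] + ECHO_GAIN * host[i - delay]
    return out


def autocorr(segment, lag):
    """How strongly the segment resembles itself shifted by `lag` samples."""
    return sum(segment[i] * segment[i - lag] for i in range(lag, len(segment)))


def decode_bits(signal, n_bits):
    """Recover each bit by comparing self-similarity at the two echo delays."""
    bits = []
    for k in range(n_bits):
        seg = signal[k * SEGMENT:(k + 1) * SEGMENT]
        is_one = autocorr(seg, DELAY_ONE) > autocorr(seg, DELAY_ZERO)
        bits.append(1 if is_one else 0)
    return bits


payload = [1, 0, 1, 1, 0, 0, 1, 0]
rng = random.Random(0)
# Noise-like host: a simplification that keeps the host's own
# self-similarity from masking the echo.
host = [rng.uniform(-1.0, 1.0) for _ in range(len(payload) * SEGMENT)]

marked = embed_bits(host, payload)
print("recovered:", decode_bits(marked, len(payload)))
```

With real music the host has strong self-similarity of its own, which is one reason practical systems need considerably more elaborate analysis than this comparison of two lags.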
5. What Are Day-To-Day Applications of Audio Watermarking?
The first and most obvious day-to-day application is embedding owner and creator metadata within the audio track. Essentially, this means embedding the song’s identity into the track itself so that the source and the creator can be easily identified if needed. This is mostly done to combat piracy. By reading the embedded digital audio watermark, we can tell whether an audio file was pirated or obtained from a legal source.
Additionally, this is what stops people from using copyrighted material in their own content, for instance a copyrighted audio track in a YouTube video. An algorithm would pick up on the watermark, which would result in either a copyright strike from YouTube or financial compensation for the use of the original material.
Audio watermarking is also used for measuring audiences. Watermarking technology can distinguish between different versions of the same content in the same way it can differentiate the original from a pirated copy. With watermarks, it’s easy to tell whether the content is being enjoyed through regular TV, on-demand TV or streaming, as each of them carries a different watermark. The same applies to commercials: watermarking technology can distinguish between the same commercial running on two different platforms or channels, which makes marketing easier.
Earlier, we mentioned something called the “second screen problem”. Other than creating the perfect audio watermark, this is arguably the biggest problem the industry is facing. The second screen problem occurs when noise or other outside sounds interfere with watermark identification. For instance, if watermarked content is being streamed in a noisy environment, that noise could interfere with the microphone’s ability to pick up the watermark, which would render it undetectable.
Conclusion
As you can see, the field of digital audio watermarking is very exciting and complex. It’s far from perfect, but it is constantly evolving and we can only wait and see what the future has in store for us and how this technology will change our lives for the better.