Digital Audio Sampling

On the video side, we deal with image sizes, aspect ratios, zoom settings, f-stops and frame rates. The audio side has its numbers too, but they’re a bit easier to wrap your head around.

Isn’t All Audio Digital Today?

Back in olden times, audio was recorded to analog tape. For the end user, this wasn’t much more difficult than our digital recorders today, but there was much more going on behind the scenes. For example, back in the day, a typical recording studio hired an engineer to maintain the audio equipment. Tape heads had to be aligned, recording bias adjusted and levels calibrated before every session. If the studio or producer changed the brand or grade of recording tape, a whole new group of settings was necessary. Then there were noise reduction systems that introduced their own calibrations and artifacts into the mix. To top it all off, the recording engineer had to know how to record to tape a certain way so it would sound correct on playback. This pain and suffering still goes on today – especially in high-end studios where they want that elusive “analog sound” on their modern recordings.

Kinda makes our job look simple, doesn’t it? For a simple digital audio recording, we plug an audio interface into our computer, launch a recording application, check the meters and hit Record. Basic level adjustments are all we have to deal with until post production. When recording audio with your camcorder, it’s even simpler. Just plug in a mic and hit Record – the camera does the rest for you. Of course, as with analog, there’s a lot going on under the surface in the digital world too, but the microprocessors are in charge. We just let them do their thing.

As audio enters a digital recorder or camcorder, it is digitized into digital “words” that are written to tape, hard disk or memory card. On playback, the digital stream is decoded and turned back into an analog signal that plays through your speakers or headphones. Simple, right? Well…

Digital Audio Sampling Rates

As the audio is digitized, it takes on two specific characteristics: sampling rate and bit depth. Sampling rate is how many times per second the audio is measured, or sampled, to convert it into a digital file. The more samples per second, the higher the frequencies the recording can capture. The number of samples required to create a theoretically perfect copy is governed by the Nyquist-Shannon sampling theorem. The math inside this little gem could make your head spin, so we’ll paraphrase: the sampling rate must be at least twice the highest frequency recorded. For example, if you’re recording percussion, cymbals and the like, recorded frequencies and their harmonics could easily reach 20,000Hz. This means the sampling rate required to recreate an accurate copy is at least 40,000Hz. If you’ve paid attention, you know that the CD audio specification requires a 44,100Hz sampling rate and DV video uses a 48,000Hz sampling rate. Both leave some wiggle room at the top end of the scale. Math nerds, feel free to look all this up on Wikipedia. That should keep you busy for a while.
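
For readers who would rather see the arithmetic than read the theorem, here is a minimal Python sketch of the paraphrased rule. The function names are made up for illustration, and the code uses only the simplified "twice the highest frequency" version described above, not the full theorem.

```python
# Simplified Nyquist arithmetic. Function names are illustrative only.

def min_sampling_rate(highest_frequency_hz):
    """Lowest sampling rate (Hz) that satisfies the 'at least twice' rule."""
    return 2 * highest_frequency_hz


def highest_recordable_frequency(sampling_rate_hz):
    """Top of the audio spectrum (Hz) a given sampling rate can represent."""
    return sampling_rate_hz / 2


# Cymbals and their harmonics reaching 20,000Hz need at least this rate.
print("Minimum rate for 20,000Hz:", min_sampling_rate(20_000), "Hz")

# How high the CD and DV sampling rates can reach.
for rate in (44_100, 48_000):
    top = highest_recordable_frequency(rate)
    print(rate, "Hz sampling reaches up to", top, "Hz")
```

Run as written, this shows how much room each rate leaves above the 20,000Hz cymbal example.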

The DV audio specification actually calls for two sampling rates: 48kHz and 32kHz. The 48kHz (usually referred to as 16-bit) setting is the standard for two-channel or stereo recording. The 32kHz (usually referred to as 12-bit) version can record stereo as well, but it is also possible to record four simultaneous channels of audio on certain cameras. The Canon XL2 is the first camera that comes to mind in this category. A 32kHz sampling rate means the top of the recorded audio spectrum is 16,000Hz, which is fine for dialog and other duties. If 44,100Hz and higher sampling rates are referred to as CD quality, the 32,000Hz setting would be FM radio quality. It’s fine for casual use, but not recommended for critical applications. Oddly, many camcorders default to the 32kHz audio setting although they cannot actually record four channels.
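
One way to see the trade-off between the two settings is to run the numbers. The sketch below uses only the figures quoted above, taking the four-channel case for the 12-bit mode. The dictionary layout and function names are assumptions for illustration, and nothing here models how a camera actually packs audio onto tape.

```python
# The two DV audio modes as described in the text.
# The dictionary keys and function names are illustrative, not from any spec.
DV_MODES = {
    "16-bit": {"sampling_rate_hz": 48_000, "bit_depth": 16, "channels": 2},
    "12-bit": {"sampling_rate_hz": 32_000, "bit_depth": 12, "channels": 4},
}


def top_frequency_hz(mode):
    """Top of the recorded audio spectrum: half the sampling rate."""
    return mode["sampling_rate_hz"] // 2


def total_bits_per_second(mode):
    """Audio bits captured each second across all channels."""
    return mode["sampling_rate_hz"] * mode["bit_depth"] * mode["channels"]


for name, mode in DV_MODES.items():
    print(name, "mode:",
          top_frequency_hz(mode), "Hz top end,",
          total_bits_per_second(mode), "bits per second")
```

With these figures, both modes work out to the same number of audio bits per second. The 12-bit mode simply spreads that budget across more channels, at a lower quality per channel.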

Bit Depth

Bit depth is more complicated to explain, so we’ll start with an analogy. In digital imaging, there are three common bit depths: 8-bit, 16-bit and 24-bit. An 8-bit image has only 256 colors available. While you might never notice these limits in a cartoon image, an 8-bit sunset would look pretty strange. Since the colors are limited, you will see a great deal of banding in the gradient from the horizon up to the sky. A 16-bit image has a little more range – 65,536 colors to be exact. This provides enough options to make a convincing image, especially on small screens like your cell phone or a handheld game. But check your digital sunset, and you’ll still see some banding, just not as drastic as the 8-bit version. In digital imaging, 24-bit is often referred to as True color, since it offers 16,777,216 color possibilities. A 24-bit image covers the majority of colors seen by the human eye. Digital cameras – both still and video – produce 24-bit images, and we’re all quite pleased with the results.

Staying with the digital photo analogy, let’s think about audio. Each digital audio sample is essentially a snapshot of the audio level at that moment in time. You can imagine that an 8-bit sample, with only 256 possible values or audio “colors,” might be a little grainy, and you’d be right. On the other hand, a 16-bit sample has 65,536 possible values and produces a very accurate version of the audio. That’s why it’s the standard bit depth for DV, DVD and audio CD. 24-bit recordings have the benefit of over 16.7 million audio “colors” per sample. While you won’t use them on standard DVDs or music CDs, the Blu-ray Disc format supports 24-bit audio in a variety of formats.
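
To put numbers on the analogy, here is a rough Python sketch that counts the values available at each bit depth and rounds one sample to the nearest value. The function names and the 0.3 test sample are assumptions for illustration. The dynamic range figure uses the common rule of thumb of about 6dB per bit, which is not from this article, and real converters handle rounding with more care than this.

```python
import math


def levels(bit_depth):
    """Number of distinct values (audio "colors") a sample can take."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth):
    """Theoretical dynamic range in dB (rule of thumb, about 6dB per bit)."""
    return 20 * math.log10(levels(bit_depth))


def quantize(sample, bit_depth):
    """Round a sample in the range -1.0 to 1.0 to the nearest available step."""
    max_code = 2 ** (bit_depth - 1) - 1   # largest positive signed integer code
    code = round(sample * max_code)       # nearest whole-number code
    return code / max_code                # back to the -1.0 to 1.0 scale


sample = 0.3  # arbitrary test value
for bits in (8, 16, 24):
    error = abs(quantize(sample, bits) - sample)
    print(f"{bits}-bit: {levels(bits):,} levels, "
          f"about {dynamic_range_db(bits):.1f} dB, "
          f"rounding error {error:.2e}")
```

The rounding error is the audio equivalent of the banding in the sunset: the fewer values available, the further each sample can land from the real thing.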

Next, take the bit depth and multiply it by the sampling rate. The result is how much data the recorder captures each second for each channel, which gives you an idea of the quality potential for your recording. A typical DV or DVD audio soundtrack has a bit depth of 16 and a sampling rate of 48kHz. So 48,000 times a second, each channel of audio is digitized with a 16-bit depth, or 768,000 bits every second. This results in a very clean recording that accurately reproduces the original source.
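
The multiplication above is easy to sketch in Python, and it doubles as a quick storage estimate. This assumes uncompressed audio and counts a megabyte as 1,000,000 bytes; the function names are hypothetical.

```python
def bits_per_second(sampling_rate_hz, bit_depth, channels=1):
    """Bit depth times sampling rate, for one or more channels."""
    return sampling_rate_hz * bit_depth * channels


def megabytes_per_minute(sampling_rate_hz, bit_depth, channels=1):
    """Storage needed for one minute of uncompressed audio (1MB = 1,000,000 bytes)."""
    bits = bits_per_second(sampling_rate_hz, bit_depth, channels) * 60
    return bits / 8 / 1_000_000


# A typical DV or DVD soundtrack: 16-bit at 48kHz.
print("One channel:", bits_per_second(48_000, 16), "bits per second")
print("Stereo:", bits_per_second(48_000, 16, channels=2), "bits per second")
print("Stereo storage:", megabytes_per_minute(48_000, 16, channels=2), "MB per minute")
```

Swap in other sampling rates and bit depths to compare settings before a shoot or recording session.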


But in the End…

Ultimately, you won’t have to worry about sampling rates or bit depth very much. Your equipment will take care of most of it for you automatically. However, now that you know what the numbers mean, it will be easy for you to do a quick visual check whenever you shoot video or record audio. If the camcorder says it’s recording at 32kHz, change it to 48kHz before the shoot. Setting up a voiceover session is easier if you know to use the 16-bit, 48kHz settings on your audio interface. Plus, you can always post your Facebook status saying you’re pondering the Nyquist Theorem, and your friends will think you’re really smart.

SIDEBAR

In the Studio

In professional recording studios, engineers commonly record audio at 96/24. This means a 24-bit depth at a 96kHz sampling rate. Why so high? The 24-bit depth allows amazing dynamic range and “color” possibilities. Engineers and producers like the 96kHz sampling rate because it gives the recording more breathing room or “air.” Of course, the final product is converted down to 16-bit at 44,100Hz for CD releases. You won’t hear the difference on your iPod, but, in the studio, the improvement is obvious. This makes better original recordings and, hopefully, a level of quality that will stand the test of time.

Contributing Editor Hal Robertson is a digital media producer and technology consultant.