The conversion of an analog audio signal or voltage into a digital representation is known as quantization. (Strictly speaking, “quantization” names the amplitude-measurement step, while the time-slicing step is called “sampling”; together they make up analog-to-digital conversion.) The continuous, real-world audio signal, representable as a smooth waveform with positive and negative pressure levels, is recorded as a series of periodic snapshots known as “samples”. The rate at which these amplitude snapshots occur is called the “sampling rate”. Each sample is, like a frame of video, a picture of the signal at that moment. Specifically, it is a picture of its amplitude. That, in the end, is all the recording system cares about: “what is the amplitude?”. The succession of these amplitude measurements (“samples”, shown below as dotted lines) results in a digital approximation of the original audio signal.
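To make the idea concrete, here is a minimal Python sketch that takes periodic amplitude snapshots of a continuous signal. The 440 Hz sine wave and the handful of samples are illustrative choices, not from any particular recording:

```python
import math

# A 440 Hz sine wave stands in for the incoming analog signal.
# (Illustrative values; any continuous signal would do.)
def analog_signal(t):
    return math.sin(2 * math.pi * 440 * t)

sampling_rate = 44_100            # snapshots per second
num_samples = 8                   # just a handful, for display

# Each sample is one amplitude "snapshot" taken at a regular interval.
samples = [analog_signal(n / sampling_rate) for n in range(num_samples)]

for n, amp in enumerate(samples):
    print(f"sample {n}: amplitude {amp:+.4f}")
```

Each printed value is one “sample”: nothing but the signal’s amplitude at that instant.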
Digital Audio: sampling rate basics
The frequencies and notes we hear in a recorded piece of music are merely the result of these changing amplitudes over time.
The difference between the actual incoming audio signal (grey line) and the quantized digital signal (red line) is called the “quantization error”. The difference looks terrible at the moment, but we’ll get the original smooth signal back a little later on.
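A rough sketch of where that error comes from: snap a smooth signal’s amplitudes to a small number of representable levels and measure the gap. The deliberately coarse 4-bit grid and the 50 Hz test tone here are made up for illustration:

```python
import math

levels = 16                       # a deliberately coarse 4-bit grid
step = 2.0 / levels               # amplitudes span -1.0 .. +1.0

def quantize(x):
    # Snap the true amplitude to the nearest representable level.
    return round(x / step) * step

errors = []
for n in range(100):
    t = n / 1000.0
    true_amp = math.sin(2 * math.pi * 50 * t)   # the "grey line"
    digital_amp = quantize(true_amp)            # the "red line"
    errors.append(abs(true_amp - digital_amp))

# Rounding to the nearest level keeps the error within half a step.
print(f"max quantization error: {max(errors):.4f} (step/2 = {step / 2})")
```

More levels (a higher bit depth) mean a smaller step and a smaller worst-case error.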
For CD-quality sound the rate is 44,100 samples per second, sometimes written as 44.1k (kilohertz). This sampling rate is one of many but, as part of the original CD-quality standard, it is certainly the most commonly used, even today when CDs are all but obsolete. The reason for this number and not something higher or lower is a compromise between two things: 1) the desire to have enough resolution to record all of the sounds humans care about, and 2) the need to keep file sizes small enough to fit on a standard CD. Raw 16-bit stereo audio at 44.1k uses around 10 MB per minute, and since a CD can hold around 750 MB, this leaves room for about 75 minutes of music, enough to store a standard double-sided album.
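The arithmetic behind those figures is easy to check. A quick Python sketch, assuming 16-bit (2-byte) stereo samples and treating a megabyte as 2^20 bytes; both are assumptions, since the text doesn’t pin them down:

```python
# Back-of-the-envelope check of the CD storage figures above.
sampling_rate = 44_100            # samples per second
bytes_per_sample = 2              # 16 bits
channels = 2                      # stereo (assumed)

bytes_per_minute = sampling_rate * bytes_per_sample * channels * 60
mb_per_minute = bytes_per_minute / 2**20        # "MB" as 2**20 bytes here

cd_capacity_mb = 750                            # rough figure from the text
minutes_of_audio = cd_capacity_mb * 2**20 / bytes_per_minute

print(f"raw audio: {mb_per_minute:.1f} MB per minute")
print(f"a {cd_capacity_mb} MB disc holds about {minutes_of_audio:.0f} minutes")
```

The result lands right around the 10 MB per minute and roughly 75 minutes quoted above.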
But is 44.1k enough? As we discussed in class, this question demands we know a little about human hearing. The range of our hearing includes frequencies up to around 20,000 Hertz (less for some of us, depending on age and/or the number of really loud, hearing-destroying concerts we’ve attended). Whatever sampling rate we choose, the system must take samples fast enough to represent the signals we humans care about. Since every cycle of a waveform has both a positive and a negative pressure, a crest and a trough, a top and a bottom, we must dedicate a minimum of two samples to each cycle of a wave. Therefore, the highest frequency a digital system can represent is half of the sampling rate. This is the so-called “Nyquist frequency”: the highest frequency a digital conversion can represent at a given sampling rate. In the case of 44.1k, the highest frequency we can accurately represent is 22,050 Hertz. According to our initial understanding of human hearing, this seems to be enough: we can capture frequencies up to 20k and even a little beyond. This is just the beginning of the story, as we’ll see in the next section on bit depth.
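The rule is simple enough to express in one line of code. A small sketch; the 48k and 96k rates are included just for comparison:

```python
# The Nyquist frequency is half the sampling rate: the highest
# frequency the system can represent with two samples per cycle.
def nyquist(sampling_rate):
    return sampling_rate / 2

for rate in (44_100, 48_000, 96_000):
    print(f"{rate} Hz sampling -> highest representable frequency {nyquist(rate):.0f} Hz")
```

For the CD rate of 44,100 Hz, this gives the 22,050 Hz ceiling discussed above.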