Executive Summary on How Digital Audio works

ratherdashing

Kablamminator
I started writing this in the "modellers topped out" thread to address some misunderstandings about how digital audio works, and decided that it deserves its own thread. Here you go:

--------

Here is the executive summary of how digital audio works:

Think of a sine wave on a chart (if you don't know what a sine wave is, think of an S turned on its side). This is a sound wave in the air, which is what our ears hear. Now imagine if under the curve of the wave, you drew straight vertical lines that went from the bottom of the page up to the curve. Finally, draw a straight line between the tops of each of the lines. You will notice that the "connect the dots" line is pretty close to, but not exactly the same as, the original line.

This is how sound is represented digitally. The computer has a list of numbers, each representing one of these lines, and when they are played through the D/A converter (which essentially "connects the dots") the result is an audio signal. The lines are referred to as samples. Software that processes audio is essentially just changing the value of those samples.
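If you like code better than prose, here is a rough Python sketch of the "list of numbers" idea (the tone, rate, and length are just example values I picked for illustration):

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Measure the height of a sine wave at evenly spaced instants --
    each measurement is one 'sample' (one vertical line under the curve)."""
    n_samples = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate_hz)
            for k in range(n_samples)]

# A 440 Hz tone sampled at 8000 Hz for 10 ms becomes just 80 numbers
samples = sample_sine(440, 8000, 0.01)
print(len(samples))  # 80
```

Playing the file back is just the reverse: the D/A converter turns each number back into a voltage and "connects the dots."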

Digital audio quality is measured using two values: sample rate, and sample size.

The sample rate is the number of samples per second, measured in Hertz. The more samples there are, the closer the "connect the dots" line will be to the real audio wave. This is why higher sample rates are preferable. CD audio has a sample rate of 44.1 kHz, or 44,100 samples per second.

Sample size is exactly that: the size of each sample. As I said, a sample is simply a number; the sample size determines how big that number is allowed to be. The bigger the number, the more precise the sample will be. Imagine you had to spell your name on the top of a desk using building blocks. If you used a small number of really large blocks, it would be difficult to make your name readable. With lots of small blocks, it would look much better. Now apply this analogy to the lines under the sine wave: if the lines were made of only a few big blocks, they would not accurately represent the wave. This would be a small sample size.

One thing that confuses a lot of people is that sample size is measured in bits. For example, CD audio has a 16 bit sample size. This does NOT mean there are only 16 levels of sound possible in a sample. 16 bits refers to the length of the number in binary, and a 16 bit binary number can hold 2^16 = 65,536 different values. This means the sample can have over sixty-five thousand possible levels.

Combining sample rate and sample size, we can get an idea of just how much data is used to process the audio. Using the CD example again (16 bit 44.1 kHz), we know that there are 44,100 samples, each with up to 65,536 possible values, per second of audio. That's a lot of precision! Yes, it is still not a perfect curve, but it is so close our ears can't tell.
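To put numbers on that (assuming stereo, since CDs have two channels):

```python
sample_rate = 44_100      # samples per second (44.1 kHz)
bits_per_sample = 16      # sample size
channels = 2              # CD audio is stereo

print(2 ** bits_per_sample)                      # 65536 possible values per sample
print(sample_rate * bits_per_sample * channels)  # 1411200 bits of data per second
```

That 1,411,200 bits per second is the familiar "1,411 kbps" figure quoted for uncompressed CD audio.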

I hope that is a good explanation.
 
Re: Executive Summary on How Digital Audio works

Furthermore, when converted back to analog, the slices are reassembled, then fed through a smoothing filter that resonates at the sample rate, thereby smoothing out the resulting steps.

Where the rot sets in, is with a 44.1k sample rate, a 22.05k frequency wave gets sampled exactly twice. If the 22k wave happens to be in phase with the sampling clock, it gets sampled at the top and bottom of the wave. If it's 90° out of phase, it gets sampled as the wave is crossing 0, and disappears. The in phase wave fares little better, as the output smoothing filter renders it a pure sine.
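You can actually watch this happen with a few lines of Python (a rough sketch; the phase offsets and six-sample printout are just for illustration):

```python
import math

sample_rate = 44_100
freq = 22_050                        # exactly half the sample rate

def first_samples(phase_deg, n=6):
    """The first n sample values of a 22.05 kHz tone at a given phase offset."""
    phase = math.radians(phase_deg)
    return [round(math.sin(2 * math.pi * freq * k / sample_rate + phase), 6)
            for k in range(n)]

print(first_samples(90))  # hits the peaks: alternates 1.0, -1.0, 1.0, ...
print(first_samples(0))   # hits the zero crossings: all 0.0 -- the tone vanishes
```

So right at half the sample rate, what you capture depends entirely on where the sampling clock lands relative to the wave.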

22k is right at the top of the audible range, and frequencies lower than that suffer as well, ie. 11k gets sampled 4 times per cycle, and it only starts to get decent below 6.5k. So you can see how instruments like cymbals, wind instruments, and distorted guitars that have complex high frequency waveforms suffer.

Now keep in mind air also smooths and rounds off high frequencies, as do speakers and our ears, but still, I think the medium should be as faithful as possible.

----

Also, keep in mind that only a full volume signal gets the benefit of all 16 bits of resolution. A signal at half amplitude gets sliced into 8 bits, and so on, until the subtle background stuff is getting chopped up pretty coarsely.
 
Re: Executive Summary on How Digital Audio works

Interesting reading. Thanks.
 
Re: Executive Summary on How Digital Audio works

22k is right at the top of the audible range, and frequencies lower than that suffer as well, ie. 11k gets sampled 4 times per cycle, and it only starts to get decent below 6.5k. So you can see how instruments like cymbals, wind instruments, and distorted guitars that have complex high frequency waveforms suffer.

This is where the proverbial "can of worms" comes into play. The most significant part of digital audio is dependent on the quality of the analog filters. So, we need to talk about filters for a moment.

Contrary to popular belief, filters add content. They don't take it away. Let's use a good coffee filter for our analogy. A good filter will remove what we don't want (acid, oils, grounds) that masks what we do want: flavor. So, by adding a filter, we get more of what we want.

Same with digital audio filters. By removing all possible combinations that can't exist in our 20 Hz - 20 kHz frequency response, we're left with what should be the proper audio signal. Should be . . . that's the trick.
 
Re: Executive Summary on How Digital Audio works

another problem is that you have only accounted for half of the information, namely, the amplitude

time-variant signals have phase information as well as amplitude information ... analog-to-digital converters only 'convert' amplitude information .. when the digital signal is recreated by a digital-to-analog converter, the phase information is essentially 'invented' with no relationship to the original source signal that was converted and processed

in human hearing, phase information is used to locate sound sources in 3-space ... it is the reason you can tell if something is 'behind and to the left' or 'behind and to the right' .. think of it as sonic spatial perception in a way that is similar to the use of both eyes to have depth perception

not only do a/d converters distort phase information, but they distort it differently based on frequency ...

interesting stuff

t4d
 
Re: Executive Summary on How Digital Audio works

nice job guys! THANKS!
 
Re: Executive Summary on How Digital Audio works

Interesting reading guys!

Only one problem. I've never met an executive who'd read that much - you have to draw them a picture. :)
 
Re: Executive Summary on How Digital Audio works

there are problems in this post, i'll try to get to them one by one

Furthermore, when converted back to analog, the slices are reassembled, then fed through a smoothing filter that resonates at the sample rate, thereby smoothing out the resulting steps.

what does it mean to say a filter resonates at a sample frequency? ... i have never heard that term ... the output of a D/A goes through a low pass filter that removes higher frequency artifacts and passes signals in the audio range

Where the rot sets in, is with a 44.1k sample rate, a 22.05k frequency wave gets sampled exactly twice. If the 22k wave happens to be in phase with the sampling clock, it gets sampled at the top and bottom of the wave. If it's 90° out of phase, it gets sampled as the wave is crossing 0, and disappears. The in phase wave fares little better, as the output smoothing filter renders it a pure sine.

this is inaccurate because it is incomplete ... first, 22KHz is inaudible so anything happening up there is irrelevant ... even if a 22KHz signal is present in the composite input waveform, only its (much weaker) subharmonics are audible - and they are sampled with fidelity

22k is right at the top of the audible range, and frequencies lower than that suffer as well, ie. 11k gets sampled 4 times per cycle, and it only starts to get decent below 6.5k. So you can see how instruments like cymbals, wind instruments, and distorted guitars that have complex high frequency waveforms suffer.

this is mathematically inaccurate .. nyquist's theorem proves that 100% of the amplitude information is preserved at 2 samples per cycle ... oversampling is only used for error correction in hardware/software processing - not to increase sonic properties .. in fact, the extra information is ultimately thrown away before output

Also, keep in mind that only a full volume signal gets the benefit of all 16 bits of resolution. A signal at half amplitude gets sliced into 8 bits, and so on, until the subtle background stuff is getting chopped up pretty coarsely.

misleading if not outright false - all signals irrespective of amplitude are sampled with full granularity - each of the 65k+ increments .. at full volume, you get 16 '1s' in the output register, but the granularity / resolution is equivalent .... the 'rounding error' is 1/65k no matter which increment corresponds to max amplitude of the input signal .. now, this illustrates why it is best practice to ensure that the largest amplitude of a signal to be sampled be properly 'leveled' so as to 'hit the last bit' to extract maximum dynamic range and minimal 'rounding error' .. but small signals have the same resolution as big ones
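fwiw, here is a quick python sketch of that rounding-error point (the -1.0 .. +1.0 scale is just my assumption for illustration):

```python
STEP = 2.0 / 65536                  # one quantization increment across -1.0 .. +1.0

def quantize(x):
    """snap a value to the nearest of the 65,536 levels"""
    return round(x / STEP) * STEP

for x in (0.9000001, 0.0003001):    # a loud sample and a quiet sample
    err = abs(quantize(x) - x)
    print(err <= STEP / 2)          # True .. same worst-case error either way
```

the absolute rounding error is bounded by half a step for loud and quiet samples alike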
 
Re: Executive Summary on How Digital Audio works

I'm not feeling too good today, so take my angry foaming-at-the-mouth with a grain of salt, BUT!

If you're going to make hash of everything I say, PLEASE put forth the effort to educate me. Don't make ME go back and look up all the information I've long since forgotten the location of, if you yourself know what I learned was wrong.

If what you really meant was "I'm not sure about this, please clarify" then say so.

All right, here we go.

1). I was led to believe that the newly reconstructed analog waveform is fed through a filter (capacitor/inductor network?) that resonates at the sample rate, therefore (for example) rendering the 22k square wave a pure sine. I could be way off, and technology has probably changed as well.

2). I never said 22k was audible. But...go hang out with any group of audiophile studio wizards and you'll find that matter is very much in question. I said 11k was audible.

3). 100% of the amplitude information can be preserved at 2 samples per cycle, but that assumes a sine wave, right? What about waveform complexity?

4). What you saw was false. What I attempted to communicate is not false. Any level signal is chopped into the same size pieces, but quieter signals have fewer pieces to work with, by nature of their quietness.


Anyway, I have no particular emotional attachment to incorrect knowledge. Teach me and I'll get smarter. Put me down and I'll get ticked.
 
Re: Executive Summary on How Digital Audio works

1). I was led to believe that the newly reconstructed analog waveform is fed through a filter (capacitor/inductor network?) that resonates at the sample rate, therefore (for example) rendering the 22k square wave a pure sine. I could be way off, and technology has probably changed as well.

The filter doesn't change a square wave into a sine wave. The audible signal is not broken into a square wave in digital sampling, just a bunch of points that fit the approximate shape of a curve. I think that what you're talking about is the algorithm that's used to plot the curve between each of the points that are used to reconstruct the analog wave from the digital data set. Either way, there isn't anything that resonates at the sample rate. At least not in any of the circuits that I've built/debugged.

2). I never said 22k was audible. But...go hang out with any group of audiophile studio wizards and you'll find that matter is very much in question. I said 11k was audible.

It has been argued that frequencies above 22k can be heard/perceived (just like frequencies below 20 Hz can be felt). I'm a little on the fence on this one so I can't really comment. Maybe someone else can help here?

3). 100% of the amplitude information can be preserved at 2 samples per cycle, but that assumes a sine wave, right? What about waveform complexity?

100% of the amplitude information can be preserved with a single sample. Amplitude is just a number defining the loudness of the sample (from 0 to 2^16 - 1 in 16 bit audio). Sampling frequency only affects the recorded frequency (oversampling above 44.1 kHz allows a truer recorded waveform shape in higher frequencies). The Nyquist theorem that was quoted states that "Exact reconstruction of a continuous-time baseband signal from its samples is possible if the signal is bandlimited and the sampling frequency is greater than twice the signal bandwidth." Which means that you need a sample rate MORE THAN twice the highest frequency in order to get a faithful signal. With 44.1 kHz, you get a good reproduction all the way up to around 22 kHz.



4). What you saw was false. What I attempted to communicate is not false. Any level signal is chopped into the same size pieces, but quieter signals have fewer pieces to work with, by nature of their quietness.

I think you may be a little confused here. Any 16 bit recording ALWAYS has 2^16 levels to work with no matter what the volume of the signal is. Quieter signals simply receive a lower number than louder signals. While theoretically this means that quieter digital signals have less dynamic range, the noise floor of analog equipment generally limits it to far less of a dynamic range than a 16 bit signal. As long as a digital signal has frequency information, it doesn't matter what the amplitude is set to.
 
Re: Executive Summary on How Digital Audio works

you seem to be having a problem understanding the nature of what you call 'wave form complexity' ... wave form complexity is nothing but stacked sine waves of various amplitude (and phase) ... at any given point in time (e.g. a sample) the waveform simply 'is' whatever (composite) amplitude it is (the sum of all its frequency components' amplitudes at that moment) ... i am not sufficiently close enough in time to my undergrad education on this to be able to give a full explication of fourier analysis, so you can either look it up or trust me when i say that all of the components get sampled together and reconstructed together .. everything less than 1/2 the sampling rate can be recreated in amplitude ... anything above half the sampling rate cannot be heard and is irrelevant (IMO)
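here is a rough python sketch of the 'stacked sine waves' idea ... build a composite from two sines, then pull each component's amplitude back out with a one-bin DFT (frequencies chosen so they land on exact bins):

```python
import cmath, math

rate, n = 8000, 800          # 0.1 s of samples, so DFT bins are 10 Hz apart
composite = [math.sin(2 * math.pi * 440 * k / rate)
             + 0.5 * math.sin(2 * math.pi * 880 * k / rate)
             for k in range(n)]

def component_amplitude(x, freq, rate):
    """amplitude of one frequency component, via a single DFT bin"""
    n = len(x)
    b = round(freq * n / rate)
    acc = sum(x[t] * cmath.exp(-2j * cmath.pi * b * t / n) for t in range(n))
    return 2 * abs(acc) / n

print(round(component_amplitude(composite, 440, rate), 3))  # 1.0
print(round(component_amplitude(composite, 880, rate), 3))  # 0.5
```

the 'complex' wave is just the sum at every instant, and both components come back out intact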

now, we can go on about filtering linearity across the frequency range .. but i am not interested in that ... suffice to say that the output of the smoothing filter can still be a square wave if all the constituent frequencies that made up that square wave are less than the filter roll off freq

as for all the bits about quieter signals and louder signals ... it is simply a fact that the 'y-axis' is cut into 65K+ little levels ... at any sampling point, the signal falls between two of those levels ... it is either rounded up or rounded down (quantization error) .. this error is no more and no less for the rounding between the 1st and 2nd quantization levels than it is for the 64,000th and 64,001st quantization levels ... and i don't have the math in front of me, but it is roughly 6 dB per bit of dynamic range .. so 16 bits has a theoretical instantaneous dynamic range of 96 dB .. so if we cut 96 dB into 65K pieces, you can see how small the error is .. now, we can argue the obvious point that 1/65K is a bigger percentage of a small thing than it is of a big thing .. but you can have that argument by yourself, especially if you think that you can hear that difference .. iirc, under ideal lab conditions, the human ear has typically 1 dB of granularity (i.e. the smallest increase or decrease that can be recognized)
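for the record, the dynamic range figures fall straight out of the math, roughly 6 dB per bit:

```python
import math

def dynamic_range_db(bits):
    """ratio of full scale to one quantization step, in dB"""
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(1), 2))   # 6.02 dB per bit
print(round(dynamic_range_db(16), 1))  # 96.3 dB for 16-bit audio
```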

as for 'put me down and i get ticked' - i apologize if you took my corrections as a personal attack ... and i hope you feel better
 
Re: Executive Summary on How Digital Audio works

another problem is that you have only accounted for half of the information, namely, the amplitude

time-variant signals have phase information as well as amplitude information ... analog-to-digital converters only 'convert' amplitude information .. when the digital signal is recreated by a digital-to-analog converter, the phase information is essentially 'invented' with no relationship to the original source signal that was converted and processed

No, digitizing a signal keeps phase information -- the only thing that is lost is precision.

That's not to say that transforms later performed on the signal, such as an FFT, can't lose phase information. For example, if you were writing a digital compressor, all you would care about is squashing the amplitude. In that case, there would be no reason to perform an FFT, and phase information would remain.
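To make that concrete, here is a toy sketch of such a compressor (a hard limiter, not any real product): it squashes amplitude sample by sample and never moves a zero crossing, so phase survives.

```python
def hard_limit(samples, threshold=0.5):
    """Clamp each sample's amplitude; signs (hence zero crossings) are untouched."""
    return [max(-threshold, min(threshold, s)) for s in samples]

wave = [0.0, 0.9, 0.3, -0.9, -0.3]
print(hard_limit(wave))  # [0.0, 0.5, 0.3, -0.5, -0.3]
```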
 
Re: Executive Summary on How Digital Audio works

No, digitizing a signal keeps phase information -- the only thing that is lost is precision.

That's not to say that transforms later performed on the signal, such as an FFT, can't lose phase information. For example, if you were writing a digital compressor, all you would care about is squashing the amplitude. In that case, there would be no reason to perform an FFT, and phase information would remain.

hmmm, don't all a/d/a converters have a phase noise spec? and isn't the loss of 'precision' (good term!) actually a phase discontinuity? which for the practical purpose of audio recreation for human ear consumption, is essentially phase destruction? ... in realistic terms, it isn't as if we all of a sudden hear the piccolos a couple inches further to the left .. it all comes at us pretty damn squashed and full-frontal, no?

i am pretty sure that a lot of high end math guys have consumed a metric buttload of dollars working on digital phased array radars to try to wring this kind of error out of guidance, tracking, telemetry, etc. radars ... i think it gave the Qualcomm guys fits working up digital cellular too (although they also had code spread error to deal with for CDMA) ..

interesting stuff
t4d
 
Re: Executive Summary on How Digital Audio works

hmmm, don't all a/d converters have a phase noise spec?

Not sure what a phase noise spec is, but to test that phase information is kept, I made a quick sound file in good ol' Sound Edit 16.

Created two sine waves (440 Hz). Panned one to the left, one to the right. The two are in phase.

Created two more sine waves one second later in the file. Manually shifted one so that it was approximately 180 degrees out of phase. I then saved it as an AIFF and played it in QuickTime.

The first sound (in phase) sounded like it was coming from between my speakers. The second pair sounded spread out (I think if played through headphones, two sounds 180 degrees out of phase should sound like they are originating from inside the skull). Either way, they sound different. This is OSX running a Presonus Firebox to a pair of studio monitors.

So, there you have it: D/A converters preserve phase. I guess to really test this, I should mike up the speakers and record into my Powerbook to see if it holds for A/D converters. My hunch is "yes", since A/D converters should convert that analog amplitude to digital amplitude, and nothing else.
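For anyone who wants to repeat the experiment without Sound Edit 16, here is roughly the same test file built in plain Python (writing a WAV rather than an AIFF; the file name and tone values are just my choices):

```python
import math, struct, wave

RATE = 44_100

def tone(freq, secs, phase_deg=0.0):
    """One channel's worth of sine samples at a given phase offset."""
    ph = math.radians(phase_deg)
    return [math.sin(2 * math.pi * freq * k / RATE + ph)
            for k in range(int(RATE * secs))]

left = tone(440, 1.0) + tone(440, 1.0)                  # always in phase
right = tone(440, 1.0) + tone(440, 1.0, phase_deg=180)  # flipped in second half

with wave.open("phase_test.wav", "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)  # 16-bit samples
    w.setframerate(RATE)
    w.writeframes(b"".join(struct.pack("<hh", int(l * 32767), int(r * 32767))
                           for l, r in zip(left, right)))
```

The first second should image between the speakers; in the second second the channels are 180 degrees apart.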
 
Re: Executive Summary on How Digital Audio works

nice work!

i think we are talking past each other .. i am not arguing that phase information is not 'kept' - i know it is ... i am asserting that it is altered beyond utility for its purposes .. functionally destroyed .. i know that phase information is 'present', but i contend that it is not related to the input in a continuous manner

and at some point, shouldn't those two 'out of phase signals' you created cancel each other out entirely so that you hear nothing? ... in fact, if you can't get them to cancel, i would assert that it proves the point i am trying to make ... the phase distortion introduced by the processing inhibits cancellation
 
Re: Executive Summary on How Digital Audio works

OK . . . as to the whole phase thing: In my understanding phase simply refers to where the wave begins with respect to the zero point. A sine wave has negative and positive peaks. If you take two sine waves that are the same but start at slightly different points then you'll get phase cancellation. This phase cancellation happens naturally when sound bounces off of different objects in a room and interferes with other waves, and is part of the natural reverb sound of a room. This is also the way that a phaser pedal works: it adds together different waves but offsets them by a certain amount.

Are you talking about clock jitter? Because if an a/d or d/a converter doesn't have perfect clocks then there will be an anomaly that gives a strange phased sound to some of the higher frequencies . . .
 
Re: Executive Summary on How Digital Audio works

... Quieter signals simply receive a lower number than louder signals. ...

... it is simply a fact that the 'y-axis' is cut into 65K+ little levels ... at any sampling point, the signal falls between two of those levels ...

Exactly. Exactly what I thought I said! :smack: I see I don't speak your language. I see why I do so poorly on forums. I'm over here babbling Swahili while everyone corrects me only to say the exact same thing I'm trying to!! Do I have rocks in my keyboard? Do I sound like a drunken idiot savant in the idiot stages? AARR!

(Wait- was that a hanging preposition? Why do I try to type when I'm depressed?)

as for 'put me down and i get ticked' - i apologize if you took my corrections as a personal attack ... and i hope you feel better

Thanks.
 
Re: Executive Summary on How Digital Audio works

nice work!

i think we are talking past each other .. i am not arguing that phase information is not 'kept' - i know it is ... i am asserting that it is altered beyond utility for its purposes .. functionally destroyed .. i know that phase information is 'present', but i contend that it is not related to the input in a continuous manner

and at some point, shouldn't those two 'out of phase signals' you created cancel each other out entirely so that you hear nothing? ... in fact, if you can't get them to cancel, i would assert that it proves the point i am trying to make ... the phase distortion introduced by the processing inhibits cancellation

If you have two signals 180 degrees out of phase they will cancel out completely. This happens because the positive side of the sine wave on one will cancel out the negative side of the other. The only problem with jury-rigging an experiment to prove this is that any system has a noise floor. The noise floor will be left behind when you move the position of your sine wave over manually.
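In the purely digital domain (no noise floor at all), the cancellation is easy to verify:

```python
import math

RATE, FREQ, N = 8000, 440, 1000
a = [math.sin(2 * math.pi * FREQ * k / RATE) for k in range(N)]
b = [math.sin(2 * math.pi * FREQ * k / RATE + math.pi) for k in range(N)]  # 180 deg

mixed = [x + y for x, y in zip(a, b)]
print(max(abs(s) for s in mixed))  # ~0: the two waves wipe each other out
```

Anything left over in a real-world test is the noise floor and converter imperfections, not the sampled signal itself.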
 