Mixing Sound, Part 1: Universal Observations


by Tomlinson Holman

Starting a series on mixing, we look at observations common to many music mixers working for the first time in surround as opposed to 2-channel stereo. These observations have accumulated over the last few years as one mixer after another explained their early surround experience at meetings I’ve attended. A possibly surprising feature was how much commonality there was in surround mixing technique among the various mixers, despite their never having talked to one another.

1. Surround mixing generally uses less program equalization than stereo. The reason for this seems to be that dense mixing of many elements leads to sonic clutter in stereo, resulting in the masking of one part of the program by another. Once the sources are more spread out spatially, they are more easily distinguished, so it is not as much trouble to get them to be heard.

One of the interesting differences between pure music mixing and mixing for picture is that music mixes are more likely to be heard over and over than audio-for-picture ones. One of the attractions of music listening is following the various streams on different occasions, or even shifting focus among them during a single listening. Perceptual psychologists call this auditory streaming. First we might hear the melody and lead singer, the next time the background vocals, then concentrate on the lead instrument part, etc. This is one of the reasons for wanting to listen over and over. It is the ability to “hear into the mix” that matters in this context.

With sources more spread out, they can be better heard. This was first established technically by the U.S. Army in command-and-control headquarters during World War II. By spatially spreading out a bell, buzzer, klaxon, horn, etc., listeners could distinguish more quickly which one was alerting them than when the sources were concentrated in one spot. In this “multi-point mono” approach, spreading separate monaural sources out in space makes them easier to distinguish.

Since one of the primary ways to distinguish one sound from another is by its timbre, stereo mixing often involves elaborate equalization just to make the various instruments audible. Such equalization may not represent the timbre of the source well, but rather emphasize it so that it can be heard in context against the others in the mix. Equalization for multichannel thus becomes both easier and better able to concentrate on rendering the timbre of the source in a manner that is closer to its natural acoustic timbre.

There are still reasons to equalize. One of the main ones is microphone placement. Since any microphone placement captures the timbre of a natural source only along one or a few axes, not the whole radiated output of the source, one main reason to equalize is to get the source to sound like itself! This may seem strange, but let’s take an exaggerated case and you’ll see what I mean.

Let’s say we have to mic Yo-Yo Ma playing the cello, but there can be no microphone in front of the instrument because we’re photographing the performance and it is undesirable to interrupt the view. We might wind up with a boundary-layer microphone on the floor in front of the cellist. The microphone has very flat response on a large enough baffle, which the floor certainly is, but it sits off the axis along which we would choose to capture the timbre correctly. Here it is easy to agree that, although we’ve used a flat microphone, we have not captured the instrument properly, and so program equalization needs to be done.

One way to determine the equalization in this case is analytically. We can put up a microphone where we’d like it in rehearsal (hopefully of a solo portion of the program to avoid clutter from other sources) and capture the long-term spectrum at that point in space. Comparing it to the long-term spectrum of the mic on the floor (and getting the sign of the difference right!) would result in the required equalization to turn the timbre captured at the undesired microphone position into one much more like the reference position — and possibly even indistinguishable from the “right” position. I’ve done this for voice using the various microphone techniques normally used, like boom mics vs. lavalieres for instance, and we’ve gotten a much more interchangeable sound quality from the less desirable positions. Unfortunately it took a ten-band parametric equalizer to make the match. I say unfortunately because, although it worked well, it also showed how up against it a dialog mixer is when faced with a console with a typical 4-band parametric equalizer attempting to match a boom mic and a lavaliere by ear.
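To make the procedure concrete, here is a minimal sketch of the spectral comparison, assuming the reference-position and floor-mic recordings of the same rehearsal passage are available as audio arrays; the function name, the variable names, and the use of Welch averaging are my own illustration, not a feature of any particular console or workstation.

```python
# Minimal sketch of the analytical EQ-matching idea described above.
# Assumed inputs (not from the article): 'reference' is the recording from the
# preferred microphone position, 'actual' is the floor/boundary mic, both mono
# NumPy arrays captured at sample rate fs during the same passage.
import numpy as np
from scipy.signal import welch

def match_eq_curve(reference, actual, fs, nperseg=8192):
    """Long-term spectrum of each recording; the dB difference
    (reference minus actual) is the correction to apply to 'actual'."""
    freqs, p_ref = welch(reference, fs=fs, nperseg=nperseg)
    _, p_act = welch(actual, fs=fs, nperseg=nperseg)
    eq_db = 10.0 * np.log10((p_ref + 1e-20) / (p_act + 1e-20))
    return freqs, eq_db
```

The resulting curve then has to be approximated on whatever parametric equalizer is at hand, and, as noted above, a faithful match may take many more bands than the usual four.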

So in multichannel, compared to stereo, equalization can be reserved more for getting things to sound good than for getting them to be distinguished in the mix. Generally this means voice tracks need to be less peaky for multichannel than for stereo, among other things.

There is another reason at work for voice equalization. Voices, especially lead vocals, are usually panned to the center in stereo. Panning a source to the center of a stereo soundfield leads to a “darker” timbre, due to interaural crosstalk: the sound from the left loudspeaker that reaches the right ear, and vice versa. Since the flight time from the left speaker to the right ear is slightly longer (typically only about 200 microseconds longer) than from the right speaker to the right ear, for sound panned to the center of the stereo image, constructive and destructive interference occurs at each ear because of that time delay. The most salient feature of this effect is a strong dip centered in the 2 to 3 kHz region.
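As a rough check on that figure, treat the crosstalk path as nothing more than a delayed copy of the direct sound (ignoring head shadowing); the first destructive null then falls where the delay equals half a period:

```latex
\[
  f_{\text{dip}} \;\approx\; \frac{1}{2\,\Delta t}
  \;=\; \frac{1}{2 \times 200\ \mu\text{s}}
  \;=\; 2.5\ \text{kHz}
\]
```

which lands squarely in the presence region just described.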

I have a theory that goes like this: People have put up their favorite studio microphones in front of a soloist, panned it to the center of a multichannel system, and added the same equalization they have used before for stereo. They have found the sound honky (peaked up in the presence region around 2 to 3 kHz), said, “Ugh, this doesn’t work,” rejected the center channel, and gone back to stereo. The reason they had this experience is long familiarity with phantom-image stereo: through their choice of microphone and equalization, they had already compensated for the crosstalk-induced dip that stereo produces. Survival of the fittest has caused microphone and EQ choices for 2-channel stereo to evolve a peak in just the region where the listening conditions induce a dip! If you start with a flatter microphone and EQ, you may well find that you like the multichannel center better.

2. Compression is less needed in multichannel sound than in stereo. The reasons are much the same as for equalization — compression is used to keep one auditory stream at a more constant level so that it can be distinguished from other sources in the mix. This need is not so pressing in multichannel as it is in stereo. Again, there may still be some need to compress, but not in such an exaggerated way. I find, in mixing dialog for documentaries in particular, that a small amount of compression is a good thing: it makes the sound more like the source and easier to listen to.

My theory for this starts with the fact that the microphone is capturing sound at one point in space (mono dialog recording, as is standard). When we listen naturally, we hear at two points in space. When I take a measurement microphone into a space and measure the detailed frequency response from a source, I see a lot of peak-to-peak variation in level caused by standing waves and reflections — even up to very high frequencies. Yet, when I spatially average the sound, even over an area only a head size in diameter, the peak-to-peak variation is reduced considerably (also reducing the corresponding need for sound system equalization). As the recorded voice moves through its various formants it produces varying spectra, and the single-point pickup turns the strong peak-to-peak variation of the room response into a stronger-than-heard peak-to-peak variation in level. Thus, in my book, employing some compression actually makes the source sound “more like itself” than without it, because our two ears average the soundfield spatially and the microphone does not.
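The effect is easy to demonstrate with a toy model. The sketch below (the geometry and numbers are assumptions for illustration, not measurements) builds a response from a direct path plus a single reflection, then compares the peak-to-peak ripple at one fixed point with the ripple after averaging power over a head-sized spread of positions.

```python
# Toy model: direct sound plus one reflection forms a comb filter at any
# single point, but the notch frequencies shift with position, so averaging
# power over a head-sized region shrinks the ripple. All numbers are assumed.
import numpy as np

c = 343.0                               # speed of sound, m/s
f = np.linspace(100.0, 8000.0, 2000)    # analysis band, Hz

def power_response(direct_m, reflected_m, gain=0.7):
    """|H(f)|^2 for a direct path plus one attenuated reflection."""
    h = (np.exp(-2j * np.pi * f * direct_m / c)
         + gain * np.exp(-2j * np.pi * f * reflected_m / c))
    return np.abs(h) ** 2

# Single-point measurement: 3.0 m direct, 3.4 m via the reflection.
single_db = 10.0 * np.log10(power_response(3.00, 3.40))

# Head-sized average: shift the point about +/- 9 cm along the source axis,
# so the direct path lengthens while the reflection (assumed to arrive from
# roughly the opposite direction) shortens, then average the power responses.
offsets = np.linspace(-0.09, 0.09, 25)
avg_db = 10.0 * np.log10(
    np.mean([power_response(3.00 + o, 3.40 - o) for o in offsets], axis=0))

print("single-point ripple:       %.1f dB peak-to-peak" % np.ptp(single_db))
print("spatially averaged ripple: %.1f dB peak-to-peak" % np.ptp(avg_db))
```

The single point shows the full comb-filter ripple; the spatial average comes out markedly flatter, which is the sense in which two ears hear a smoother response than one microphone records.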

The amount of compression I’m talking about for documentary dialog is in the range of 6–8 dB maximum. Greater compression leads to audible artifacts; if you hear background pumping or other problems, you need to back off on the compression.
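For anyone who wants to see how threshold and ratio translate into that ceiling, here is a static-curve sketch; the numbers are illustrative assumptions, not a recommended setting.

```python
# Static compression curve only (no attack/release), with assumed settings
# chosen so that gain reduction tops out near the 6-8 dB ceiling suggested
# above for documentary dialog.
def gain_reduction_db(level_db, threshold_db=-20.0, ratio=3.0):
    """dB of downward gain reduction applied to signal above the threshold."""
    over = max(0.0, level_db - threshold_db)
    return over * (ratio - 1.0) / ratio

# A dialog peak hitting -8 dBFS is pulled down by 8 dB, right at the ceiling,
# while material at or below the -20 dBFS threshold passes untouched.
print(gain_reduction_db(-8.0))    # 8.0
print(gain_reduction_db(-20.0))   # 0.0
```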

I must say that I don’t have much experience with pop music vocals, but what I hear often seems to be a “radio mix,” aimed at intelligibility in less-than-ideal conditions by employing far too much compression (the radio station is going to compress it again anyway), and multichannel doesn’t play on the radio in any case (at least not yet!).

So I’ve given a rationale for some compression being right, whether that be by riding gain on the classical soprano or putting the source track through a compressor — but not too much.

Goldilocks comes to mind: the porridge (equalization and compression) should not be too cool and not too hot, but just right.

Surround Professional Magazine