Probably the most often heard complaint about movies played on home theaters is that “they’ve buried the dialog,” or “I have to turn the volume up and down.” Mixes for theatrical release are made under highly standardized conditions, yet a visit to this year’s Academy sound “bake off,” played under these calibrated conditions, showed the levels ranging from pleasant (Monsters Inc., Amelie), through satisfyingly loud (Black Hawk Down, Pearl Harbor), to very loud (Lord of the Rings), a level that most theaters would not tolerate and would turn down, to ridiculously loud (Fast and Furious), a fingers-in-the-ears-the-whole-time level at which no theater anywhere ever played the film.
Even those that are pleasant to pleasantly loud in the theater are perceived as much louder at home when played at the same physical sound pressure level. This is due to a psychoacoustic scaling function: we come to “expect” certain sound levels in certain spaces, and exceeding them even by a dB or two is perceived as louder. As explained before in these pages, dialog normalization with AC-3 on DVD-V is usually set so the absolute level is “turned down” by 4 dB, and Home THX gear turns it back up by 2 dB, for a net loss of 2 dB, just about the right amount to get the same perceived level in your living room. I guess two “wrongs” sometimes do make a right.
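The level arithmetic in that last step can be sketched in a few lines. This is only an illustration: it assumes the AC-3 decoder rule that attenuation equals 31 minus the dialnorm value, and it assumes a dialnorm setting of 27 (which is what yields the 4 dB cut; the specific value is not stated above). The function names are hypothetical.

```python
def dialnorm_attenuation_db(dialnorm: int) -> int:
    """Attenuation (dB) an AC-3 decoder applies for a dialnorm value of 1..31."""
    if not 1 <= dialnorm <= 31:
        raise ValueError("dialnorm must be between 1 and 31")
    return 31 - dialnorm


def net_level_change_db(dialnorm: int, playback_offset_db: float) -> float:
    """Decoder attenuation (negative) plus any make-up gain added downstream."""
    return -dialnorm_attenuation_db(dialnorm) + playback_offset_db


# Assumed dialnorm of 27 gives the 4 dB cut; the 2 dB boost is the figure from the text.
print(net_level_change_db(dialnorm=27, playback_offset_db=2.0))  # -2.0
```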
So now that we’ve got the same apparent sound pressure level in the home room as the theater, why do you have to turn it up and down? Why doesn’t dynamic range scale? And why is dialog less easily perceived in the “midfield” environment of home than the distant “farfield” of the theater?
These questions turn out to be interrelated. Let’s take the last one first, as it illuminates the others. All the text books will tell you that in a large space you are in the farfield, defined as the area where the level stays the same as you move around because the reverberant soundfield dominates. Conventional wisdom says that you are in the nearfield at home; after all, the room is much smaller in volume, and thus doesn’t have nearly the long reverb time of the large space.
In this case, the text books and conventional wisdom have it all wrong. Large spaces dealt with in books on architectural acoustics are based on the large and difficult spaces that acousticians deal with, such as churches, auditoriums, concert halls, train stations, airports, and the like. While these are really good examples since they are difficult, they do not tell us the conditions of today’s cinemas. Why? Because cinemas are much deader for their room volume than other spaces; are of intermediate, not enormous size; and use directional screen loudspeakers. With my recent work in measuring everything possible about modern cinemas, I can separate the direct sound from the reverberant and find the dominance of the direct sound level, and I’m here to tell you that it is quite dominant, exactly the opposite of the cases in the text books.
In a set of test cases I ran some years ago, for the best seats the listener was seated at between 50 and 70 percent of the critical distance. (The critical distance is that distance from a source along its axis where the level of the direct sound and reverberant sound from the source are equal.)
Now let’s look at home listening. We’ve got a living room of reasonable volume, let’s say 3000 cu. ft. There’s a rug on the floor, and some furnishings, but otherwise the space is “modern,” that is, without heavy drapery or other absorption. It’s from the “acres of white gypsum board” wall school of architecture, because that looks “clean” to the architect’s eye. We may have tamed the most egregious flutter echoes with some book cases, but it’s pretty acoustically (and visually!) bright in there.
Our loudspeakers are good home theater ones. They’re direct radiators (for the fronts at least) with drivers ranging from a woofer down to a 1-inch tweeter. Most of the text books do not cover this case, but, if they did, the authors would be in for a shock. The combination of reverberation time and speaker directivity means that we are now sitting at about 1.8 times the critical distance!
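For readers who want to try the arithmetic on their own room, here is a rough sketch using the common Sabine-based approximation for critical distance, d_c = 0.057 * sqrt(Q * V / RT60), with V in cubic meters and RT60 in seconds. Only the 3000 cu. ft. volume comes from the room described above. The reverberation time, directivity factor Q, and listening distance are illustrative assumptions, chosen because they happen to land near the figure quoted; they are not the measured values behind it.

```python
import math

CUBIC_FEET_TO_CUBIC_METERS = 0.0283168


def critical_distance_m(volume_m3: float, rt60_s: float, directivity_q: float) -> float:
    """Sabine-based approximation: d_c = 0.057 * sqrt(Q * V / RT60), in meters."""
    return 0.057 * math.sqrt(directivity_q * volume_m3 / rt60_s)


room_volume_m3 = 3000 * CUBIC_FEET_TO_CUBIC_METERS  # 3000 cu. ft. living room from the text
assumed_rt60_s = 0.5                # assumption: a fairly live living room
assumed_q = 5.0                     # assumption: mid-band directivity of a direct radiator
assumed_listening_distance_m = 3.0  # assumption: couch about 3 m from the speakers

d_c = critical_distance_m(room_volume_m3, assumed_rt60_s, assumed_q)
ratio = assumed_listening_distance_m / d_c
print(f"critical distance: {d_c:.2f} m; listener at {ratio:.1f} x critical distance")
```

Plugging in a longer reverberation time or a less directional speaker shrinks the critical distance, which pushes the same couch even deeper into the reverberant field.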
Who would’ve thought it? In the theater, our listening is direct field dominated (we may be at 1/2 the critical distance), and, at home, we are reverberant field dominated (we’re at nearly twice the critical distance)!
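To put those two positions in decibel terms, here is a minimal sketch under the idealized textbook model: direct sound falls 6 dB per doubling of distance, the reverberant level is the same everywhere, and the two are equal at the critical distance. Real rooms depart from this model, so treat the results as ballpark figures only. The function name is hypothetical.

```python
import math


def direct_to_reverberant_db(distance_over_critical: float) -> float:
    """Direct-to-reverberant ratio (dB) at a distance given in multiples of d_c."""
    return -20.0 * math.log10(distance_over_critical)


for label, multiple in (("cinema, 0.5 x d_c", 0.5), ("home, 1.8 x d_c", 1.8)):
    print(f"{label}: {direct_to_reverberant_db(multiple):+.1f} dB")
# cinema, 0.5 x d_c: +6.0 dB
# home, 1.8 x d_c: -5.1 dB
```

On this simple model the direct sound sits about 6 dB above the reverberation in the cinema seat and about 5 dB below it on the couch, a swing of roughly 11 dB between the two rooms.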
Adding one more complication to the mix is that this explanation applies across most of the frequency range, but, for the home at least, it reverses course at high frequencies. There, the speakers have become more directional, and the reverberation time has fallen, thus the soundfield “reaches out” to you to make the top octave direct sound dominant. Although I found this independently in the home theater context, I later found out that John Eargle had found this “reach out and touch you” factor true for the top octave of control rooms, too, and published it in “Equalizing the Monitoring Environment,” J.A.E.S., Vol. 21, Number 2, pp. 103 (1973).
By the way, the complications of this, combined with X Curve equalization in the large room, and more or less “flat” loudspeakers at home, are what justify the “re-equalization” curve of Home THX, or, alternatively, explain why material you mixed in the dub stage does not translate to the home theater, sounding brighter there.
The foregoing is at least part of an explanation of why dialog is harder to understand at home: it stands out less from the reverberation than it did in the theater. While we don’t think of our home rooms as particularly reverberant because the reverb times are lower than many large spaces simply due to their smaller volume, proportionally speaking, a home theater is a much more live environment than a cinema. Some of the most dedicated audiophiles’ 2-channel stereo rooms are also some of the most reverberant when examined, because stereo lacks so much in the envelopment department that it has to be gotten artificially by adding it to the listening space. This is fatal for mix engineers.
OK, maybe we’ve answered why dialog is not as intelligible, causing us to have to “turn it up” to hear it, and then back down for the louder effects, but this is compounded by another factor. We are listening at home, at night, and we don’t want to disturb anyone, and yet we want to catch the dialog. Our listening “regime” is different from that intended, and the mixes are transferred directly from the original film masters into the home environment. Little else can be done for several reasons: the original sound mixers who are in the best position to make aesthetic judgments about the material are unavailable, having moved on; the monitoring conditions of transfer are very different from either dubbing stage or the home listening room and you wouldn’t want to make judgments that are supposed to translate under these conditions; and so forth.
This problem of wide dynamic range mixes played back at home was anticipated in the ATSC process to determine standards for digital television, and the result was DRC, or Dynamic Range Control. Made a part of AC-3, it was supposed to solve this problem by permitting the program producer to choose a compression that worked, that could be optionally applied at playback. The amount of compression is transmitted as metadata, either for broadcast television or on packaged media. One problem with this is the “one size fits all” approach. Just how much compression should you, the program producer, apply, when listeners are all over the map in their needs? Perhaps you ought to try out the program at home, at night, with the baby asleep, and see what you find.
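To make the “one size fits all” point concrete, here is a toy sketch of the idea: the producer’s side computes a gain for each block of audio, that gain rides along as metadata, and the listener’s decoder either applies it or ignores it. The compression curve below (a 5 dB null band around dialog level and a 2:1 ratio outside it) is a made-up illustration, not an actual AC-3 profile, and every name in it is hypothetical.

```python
def drc_gain_db(level_rel_dialog_db: float,
                null_band_db: float = 5.0,
                ratio: float = 2.0) -> float:
    """Gain (dB) a hypothetical profile would transmit for one block of audio."""
    if level_rel_dialog_db > null_band_db:
        excess = level_rel_dialog_db - null_band_db
        return -(excess - excess / ratio)   # cut the loud passages
    if level_rel_dialog_db < -null_band_db:
        deficit = -null_band_db - level_rel_dialog_db
        return deficit - deficit / ratio    # boost the quiet passages
    return 0.0                              # leave levels near dialog alone


def playback_level_db(level_rel_dialog_db: float, drc_on: bool) -> float:
    """Level after the decoder applies (or ignores) the transmitted gain."""
    gain = drc_gain_db(level_rel_dialog_db) if drc_on else 0.0
    return level_rel_dialog_db + gain


# Block levels in dB relative to dialog: whispers, dialog, explosions.
program = [-30.0, -10.0, 0.0, 10.0, 20.0]
for drc_on in (False, True):
    levels = [playback_level_db(level, drc_on) for level in program]
    print(f"DRC {'on' if drc_on else 'off'}: range {max(levels) - min(levels):.1f} dB")
# DRC off: range 50.0 dB
# DRC on: range 30.0 dB
```

Whatever curve the producer picks, every listener who flips the switch gets that same curve, which is exactly the dilemma: the 30 dB that suits one household may still be too much, or too little, for the next.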
So the first question to ask the complainant about dynamic range is “have you engaged DRC?” or words to that effect. It might be called a “night” switch, or DRC, but it should be present in the decoder. Of course, DRC doesn’t apply to 2-channel Lt/Rt mixdowns coming from settop boxes and hi-fi tracks of VHS, so that’s another matter.
A paper of mine related to these topics was published in The Proceedings of the AES 12th International Conference: Perception of Reproduced Sound, Copenhagen, Denmark, 1993 June 28-30, called “Translating the Experience of Film and Television Sound from Room to Room.” This paper includes the hard numbers that lie behind the work described above.