Just listen for a moment.
What do you hear?
Maybe you’re in a coffee shop, surrounded by the bustle of other customers, the busywork of baristas, the sound of the city just outside. Maybe you’re in your room, a dog barking somewhere outside, cars passing, music playing in the background, maybe even the television. (Which, frankly, is just rude. I expect your undivided attention!) Maybe you’re alone in the library. It’s quiet. But is it really? Distant footsteps among the stacks. The hum of the air conditioning…
Unless you’re reading this in a sensory deprivation chamber, you are surrounded by sound. The soundscape around us shapes our understanding of the world, becoming its own meaningful context for every other sense perception. Most of the time, it barely registers; we don’t attend to it unless we are listening for something in particular. But take it away and we feel lost, vulnerable, disoriented.
Not surprisingly, sound provides an equally meaningful context for cinema. Or at least, it shouldn’t be surprising. But then again, it wasn’t until 1927 that Sam Warner figured out how to marry sound and image in The Jazz Singer, the first film with synchronized dialogue. Before that, no one much cared that cinema was a purely visual medium. And as Sam toiled away at the new technology, most of the other movie moguls in Hollywood assumed it was a passing fad. That no one really wanted to hear the actors talking.
In the century or so since they were all proven wrong, sound has become co-expressive with cinematography, that is, it shapes how we see what’s on screen, just as the images we see influence how we perceive the sounds.
Just listen to how French filmmaker Agnès Varda has used sound and image together over the last half century:
And like cinematography, sound recording and reproduction have increased in sophistication and technical complexity, developing their own important contribution to cinematic language along the way. So much so that when we talk about the use of sound in cinema we talk about it in terms of sound design, a detailed plan for the immersive effects of a motion picture’s soundscape that begins in pre-production before a single frame is shot and extends to the very end of post-production, often the final element in the entire process.
Before we get to how that soundscape is shaped in the post-production process, let’s look at how (and what) sound is recorded during production. The production sound department is made up of several specialists dedicated to recording clean sound on set as the camera rolls. They include the on-set location sound recordist or location sound mixer, who oversees the recording of on-set sound and mixes the various sources in real time during production; boom operators, who hold microphones on long poles to pick up dialogue as close to actors as possible without being seen on camera (it helps if they are very tall and relatively strong; those poles get heavy after a while); and assistant sound technicians, responsible for organizing the equipment and generally assisting the sound mixer.
And just like the camera department, the sound department has its own set of specialized equipment to make their work possible. Obviously, there are microphones involved. But sound recordists can be as particular about their microphones – what brand, type and technology – as cinematographers are about their cameras. Microphones can be omni-directional or directional, cardioid or super-cardioid, mono or stereo, and each one will pick up sounds in a distinctly different way. You can use a shotgun mic on a boom pole, connected by a shielded cable, to target a sound source from a reasonable distance. Or you can use a tiny lavalier mic taped to the collar of an actor that sends an audio signal wirelessly to the recorder. Or you can use all of the above in an endless number of configurations all feeding into the same field mixer for the recordist to monitor and record.
Now you may be wondering, isn’t there a microphone right there on the camera? Why not just use that and save all that headache?
First of all, if you asked that out loud, every sound recordist in the universe just collectively screamed in agony. Second, the reason they’re all so upset is that cameras are designed to record an image, not sound. And while they may have a relatively cheap omni-directional microphone built-in, or even inputs for higher-quality microphones, nothing can replace the trained ears of a location sound mixer precisely controlling the various streams of audio into equipment designed to do just that. Which is why, even now, most cinema uses dual-system recording, that is, recording sound separate from image during production.
Dual-system recording allows for a more precise control over the location sound, but it also comes with its own problem: synchronization. If the sound is recorded separately from the image, how do you sync them up when you’re ready to edit? Glad you asked. Ever seen one of these:
We have lots of names for it – clapper, sticks, sound marker – but the most common is slate, based on the fact that in the early days it was made out of slate, the same stuff they use to make chalkboards. It serves two purposes. The first is to visually mark the beginning of each take with the key details of the production as well as the scene, shot, and take number. This comes in handy for the editor as they are combing through all of the footage in post-production. The second is to set a mark for sound synchronization. A crew member, usually the second camera assistant, holds the slate in front of the camera and near a microphone and verbally counts off the scene, shot and take number, then SLAPS the slate closed. In post-production, the editors, usually an assistant editor (’cause, let’s face it, this is tedious work), can line up the exact frame where the slate closes with the exact moment the SLAP is recorded on the microphone. After that, the rest of the shot is synchronized.
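Under the hood, that sync-up is just arithmetic: find the loud transient of the SLAP in the audio, find the video frame where the slate closes, and shift the audio so the two line up. Here is a toy sketch of the calculation in Python (the function names, the naive peak detection, and all the numbers are my own illustrative assumptions, not how any particular editing software works):

```python
# Toy slate-sync sketch: locate the SLAP transient in the audio,
# then compute how far to shift the audio so it lands on the
# video frame where the slate closes.

def find_slap(audio, threshold=0.9):
    """Return the index of the first sample whose amplitude exceeds
    the threshold, i.e. the sharp transient of the slate closing."""
    for i, sample in enumerate(audio):
        if abs(sample) >= threshold:
            return i
    return None

def audio_offset_seconds(slap_sample, sample_rate, slate_frame, frame_rate):
    """Seconds to shift the audio so the SLAP lands on the slate frame."""
    slap_time = slap_sample / sample_rate    # when the SLAP was recorded
    slate_time = slate_frame / frame_rate    # when the slate closes on screen
    return slate_time - slap_time

# Hypothetical numbers: a quiet recording with one loud transient,
# 48 kHz audio against 24 fps picture.
audio = [0.01, 0.02, 0.01, 0.95, 0.4, 0.05]
slap = find_slap(audio)
offset = audio_offset_seconds(slap, sample_rate=48000,
                              slate_frame=24, frame_rate=24)
```

In practice editing software detects the clap (or uses timecode) automatically, but the underlying idea is exactly this one subtraction.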
In fact, this whole process, repeated for every take during production, is a kind of call-and-response ritual:
1st Assistant Director: “Quiet on the set! Roll sound!”
Sound mixer: “Sound speed!”
1st AD: “Roll camera!”
2nd Assistant Camera: “Scene 1 Apple Take 1” SLAP!
Cinematographer: “Hold for focus. Camera set!”
Director: “And… ACTION!”
Every. Single. Time. And note that the 2nd AC mentions the scene number, 1, the shot, Apple (for shot “A” of scene 1), and the take number, 1.
But wait… sound speed? That’s another of those little anachronisms of cinema. For much of cinema sound history, sound was recorded onto magnetic tape on a clunky reel-to-reel recorder. It would take a moment for the recorder to get up to “speed” once the recordist hit record, so everyone would have to wait until they called out “sound speed!” We use digital recording these days with no lag time at all, but the ritual never changed.
Sometimes, 2nd ACs can have a lot of fun with this little ritual. Check out Geraldine Brezca’s spin on the tradition throughout Quentin Tarantino’s Inglourious Basterds (2009):
Now that we have a sense of how things get recorded on set during production, we should probably cover what gets recorded. The answer: not much. Or at least a lot less than you might think. In fact, the focus of on-set recording is really just clean dialogue. That’s it. Everything else – background sounds, birds chirping, music on a radio, even footsteps – is almost always recorded after production. The main job of location sound recordists is to isolate dialogue and shut out every other sound.
Why? Because sound editors, the folks who take over from the recordists during post-production, want to control everything. Remember how nothing is on screen by accident? The same goes for sound. Clean dialogue has to match the performance we see on screen, but everything else can be shaped to serve the story by layering in one sound at a time.
There is one exception. Another little ritual everyone gets used to on a set. At the end of a scene, when all of the shots are done, the location sound recordist will whisper to the 1st AD, and the 1st AD will call out: “Hold for room tone!” And then everyone stops in their tracks and holds still, remaining completely silent for at least 60 seconds.
But what is room tone? Every space, interior or exterior, has its own unique, underlying ambient sound. What we sometimes call a sound floor. During production, as the actors deliver their lines, the microphones pick up this sound floor along with the dialogue. But in post-production, as the editors pick and choose the takes they want to use, there will inevitably be gaps in the audio, moments of dead air. Room tone recordings can be used to fill in those gaps and match the sound floor of the recorded dialogue.
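As a toy illustration of the idea (the sample values and function name are hypothetical, not any real audio API), filling dead air amounts to looping the room tone recording across every gap the editors leave in the dialogue track:

```python
from itertools import cycle

def fill_with_room_tone(dialogue, room_tone):
    """Replace dead-air samples (here marked as None) with samples
    looped from the room tone recording, so the sound floor stays
    continuous across the gaps between dialogue takes."""
    tone = cycle(room_tone)  # loop the tone if the gaps outlast it
    return [next(tone) if sample is None else sample
            for sample in dialogue]

# Hypothetical track: two gaps of dead air between dialogue samples.
filled = fill_with_room_tone([0.2, 0.3, None, None, 0.1], [0.01, 0.02])
```

Real dialogue editors do this with crossfades at the edges of each patch so the seams are inaudible, but the principle is the same: the room tone is the putty that hides the joins.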
Of course, as I mentioned, it can be a bit awkward. But it can also be kind of beautiful in its own way:
Room tone is just another example of how sound editors control every aspect of the sound in the cinematic experience.
In the last chapter, we focused on editing the visual elements in a motion picture. How the shots fit together to create a narrative flow and communicate with the audience. As it turns out, sound requires a similar approach in post-production and is often even more “invisible” than the techniques of picture editing. (In fact, if there are any sound editors reading this book, they probably noticed that picture editing got a whole chapter and all they get is this one crummy section. Typical.)
But sound editing is much more than simply joining up the sounds that already exist. It involves creating all of the sounds that weren’t recorded on set to make up the rich soundscape of the finished motion picture. In that sense, it is literally more “creative” than picture editing! (How’s that, sound editors? Feel better now?)
One important bit of post-production sound creation has to do with dialogue. Sometimes, because of distracting ambient sounds or a poorly placed microphone, an actor’s dialogue for that perfect take is unusable. (C’mon, location sound recordist, you had one job!) In that case, sound editors bring in the actors to perform ADR, short for Automated Dialogue Replacement (sometimes also referred to as Additional Dialogue Recording, or “looping”). They simply play the scene in a repeating “loop” as the actors record the lines over and over until it matches the performance on screen. Then the sound editors adjust the quality of the recording to match the setting of the scene.
But what about all those other sounds that weren’t recorded on set? The birds chirping, the cars passing, even those footsteps? Those too have to be created and gathered together in post-production and layered into the sound design. Many of these sounds already exist in extensive sound libraries, pre-recorded by sound technicians and made available for editors. But many of them must be created to match exactly what the audience will see on screen. That’s where foley artists come in.
Foley artists are a special breed of technician, part sound recordist and part performance artist. Their job is to fill in the missing sounds in a given scene. By any means necessary:
Foley artists have to get creative when it comes to imitating common (and not-so-common) sounds. But sound editors must go beyond recreating the most obvious sounds associated with a scene. Every rustle of clothing, a hand on a cup, brushing a hair behind an ear. It’s these tiny details, most of which we would never notice unless they weren’t there, that help create continuity in the final edit.
Yes, there’s that word again: continuity. Editing picture for continuity means creating a narrative flow that keeps the audience engaged with the story. Editing sound for continuity has the same goal but relies on different techniques. For example, if we see someone walking on gravel, but hear them walking on a hardwood floor, that break with continuity – or in this case, logic – will take us out of the narrative. The soundscape must match the cinematography to maintain continuity. And since so much of the sound we hear in cinema is created and added in post-production, that requires an incredible attention to detail.
But there are other ways editors can use sound to support the principle of narrative continuity, and not always by matching exactly what we see on screen. For example, a sound bridge can be used to help transition from one shot to another by overlapping the sound of each shot. This can be done in anticipation of the next shot by bringing up the audio before we cut to it on screen, known as a J-cut, or by continuing the audio of the previous shot into the first few seconds of the next, known as an L-cut. This technique is most noticeable in transitions between radically different scenes, but editors use it constantly in more subtle ways, especially in dialogue-heavy scenes. Here are some quick examples:
And just like picture editing, sound editing can also work against audience expectations, leaning into discontinuity with the use of asynchronous sounds that seem related to what we’re seeing on screen but are otherwise out of sync. These are sound tricks, intended to either directly contrast what we see on screen, or to provide just enough disorientation to set us on edge. Here’s one famous example of asynchronous sound from Alfred Hitchcock’s The 39 Steps (1935):
The woman opening the train compartment door discovers a dead body, but instead of hearing her scream, we hear the train whistle. In this case we get an asynchronous sound combined with a J-cut.
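In timeline terms, the J-cut and L-cut described above are just offsets between where the picture cuts and where the audio cuts. A minimal sketch (the frame numbers and function name are hypothetical):

```python
# Model a J-cut or L-cut as an offset, in frames, between the
# picture cut and the audio cut on an editing timeline.

def audio_cut_frame(picture_cut, overlap, kind):
    """Return the frame where the audio actually switches shots.
    A J-cut brings the next shot's audio in early; an L-cut lets
    the previous shot's audio run late."""
    if kind == "J":
        return picture_cut - overlap   # audio leads the picture
    if kind == "L":
        return picture_cut + overlap   # audio trails the picture
    raise ValueError("kind must be 'J' or 'L'")

# Hypothetical edit: picture cuts at frame 100, with a 12-frame overlap.
j = audio_cut_frame(100, 12, "J")   # next shot's audio starts early
l = audio_cut_frame(100, 12, "L")   # previous shot's audio lingers
```

The mnemonic is the letter shape: on a timeline with audio drawn below picture, the early audio of a J-cut hangs left like a “J,” and the lingering audio of an L-cut trails right like an “L.”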
Production sound recording and sound editing are all part of the overall sound design of cinema, and there are lots of moving parts to track throughout the process. Take a look at how one filmmaker, David Fincher (along with Christopher Nolan, George Lucas, and a few others), uses all of these elements of sound design to embrace the idea of sound as co-expressive with the moving image:
Once all of the sound editing is done and matched up with the image, the whole process moves to the sound mixer to finalize the project. And if you’ve ever wondered why there are two Academy Awards for sound, one for sound editing and one for sound mixing, this is why. (Or maybe you’ve never wondered that because that’s when you decided to grab a snack. I mean, who pays attention to Best Sound Mixing?) Sound mixers must take all of the various sound elements brought together by the editors, including the music composed for the score (more on that later), and balance them perfectly so the audience hears exactly what the filmmakers want them to hear from shot to shot and scene to scene.
This is a very delicate process. On the one hand, the sound mix can be objectively calibrated according to a precise decibel level, or degree of loudness, for each layer of sound. Dialogue within a certain acceptable range of loudness, music in its range, sound effects in theirs. Basic math. On the other hand, the mix can and should be a subjective process, actual humans in a room making adjustments based on the feel of each shot and scene. Most of the time, it’s both. And when it’s done well, the audience will feel immersed in each scene, hearing every line of dialogue clearly even when there are car crashes, explosions and a driving musical score.
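The “basic math” half of that process can be sketched numerically: each layer gets a gain in decibels, the gain is converted to a linear amplitude multiplier (gain = 10^(dB/20)), and the scaled layers are summed. A simplified illustration (all levels and names hypothetical; a real mix works on thousands of automated gain changes per reel):

```python
def db_to_gain(db):
    """Convert a decibel adjustment to a linear amplitude multiplier.
    0 dB leaves a layer untouched; -20 dB scales it to one tenth."""
    return 10 ** (db / 20)

def mix(layers):
    """Sum equal-length tracks, each scaled by its gain in dB.
    layers: list of (samples, db) pairs."""
    out = [0.0] * len(layers[0][0])
    for samples, db in layers:
        gain = db_to_gain(db)
        for i, s in enumerate(samples):
            out[i] += gain * s
    return out

# Hypothetical two-layer mix: dialogue at full level, music pulled
# 20 dB underneath it so every line stays intelligible.
dialogue = [1.0, 0.5]
music = [1.0, 1.0]
mixed = mix([(dialogue, 0), (music, -20)])
```

The subjective half is deciding what those dB numbers should be, shot by shot, and that’s where the humans in the room earn their Academy Award.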
For example, check out this deconstruction of the sound design from a single scene from The Bourne Identity (2002):
Sound mixing is one of those technical aspects of filmmaking that has evolved over the decades, especially as the technology for sound recording and reproduction has changed in more recent years. Starting with the birth of cinema sound in 1927, movie houses had to be rigged for sound reproduction. Which usually meant a couple of massive, low-quality speakers. But by 1940, sound mixers were already experimenting with the concept of surround sound and the ability to move the various channels of sound around a theater through multiple speakers to match the action on screen.
As the century rolled on, newer, high-fidelity sound reproduction found its way into theaters, allowing for more sophisticated surround sound systems, and consequently, more work for sound mixers to create an immersive experience for audiences. George Lucas introduced THX in 1983, a theatrical standard for sound reproduction in theaters to coincide with the release of Return of the Jedi. In 1987, a French engineer pioneered 5.1 surround sound, which standardized splitting the audio into 6 distinct channels: two in the front, two in the rear, one in the center and one just for low bass sound. And as recently as 2012, Dolby introduced Dolby Atmos, a new surround sound technology that adds height to the available options for sound mixers. Now sound can appear to be coming from in front, behind, below or above audiences, creating a 3-D aural experience.
And every element in the final sound track has to be calibrated and assigned by the sound mixer. Check out how complex the process was for the sound mixers on Ford v Ferrari (2019):
Finding the right mix of sound is critical for any cinematic experience, but one element that many filmmakers (and audiences) neglect is the use of silence. The absence of sound can be just as powerful, if not more powerful than the many layers of sound in the final track. Silence can punctuate an emotional moment or put us in the headspace of a character in a way that visuals alone simply cannot.
Check out how skillfully Martin Scorsese uses silence throughout his films:
Of course, in most of these examples silence refers to the lack of dialogue, or a dampening of the ambient sound. Rarely is a filmmaker brave enough to remove all sound completely from a soundtrack. Dead air has a very different quality to it than simply lowering the volume on the mix. But a few brave souls have given it a try. Here’s French New Wave experimental filmmaker Jean-Luc Godard playing an aural joke in Bande à part (1964):
It’s not actually a full minute of dead air – it’s more like 36 seconds – but it feels like an hour.
Compare that to this scene from the more recent film Gravity (2013):
That was also 36 seconds. Perhaps a little wink from the director Alfonso Cuarón to the French master Godard. But both are startling examples of the rare attempt to completely remove all sound to great effect.
One of the most recognizable elements in the sound of cinema is, of course, music. And its importance actually pre-dates the synchronization of sound in 1927. Musical accompaniment was almost always part of the theatrical experience in the silent era, and films were often shipped to theaters with a written score to be performed during the screening. Predictably, the first “talking picture” was a musical and had more singing than actual talking.
As the use of sound in cinema has become more and more sophisticated over the last century, music has remained central to how filmmakers communicate effectively (and sometimes not so effectively) with an audience. At its best, music can draw us into a cinematic experience, immersing us in a series of authentic, emotional moments. At its worst, it can ruin the experience altogether, telling us how to feel from scene to scene with an annoying persistence.
But before we try to sort out the best from the worst, let’s clarify some technical details about how and what type of music is used in cinema. First, we need to distinguish between diegetic and non-diegetic music. If the music we hear is also heard by the characters on screen, that is, it is part of the world of the film or tv series, then it is diegetic music. If the music is not a part of the world of the film or tv series, and only the audience can hear it, then it is non-diegetic music. Too abstract? Okay, if a song is playing on a radio in a scene, and the characters are dancing to it, then it is diegetic. But if scary, high-pitched violins start playing as the Final Girl considers going down into the basement to see if the killer is down there (and we all know the killer is down there because those damn violins are playing even though she can’t hear them!), then it is non-diegetic.
Diegetic versus non-diegetic sound is a critical concept in the analysis of cinema, and crafty filmmakers can play with our expectations once we know the difference (even if we didn’t know the terms before now). For example, non-diegetic music can communicate one emotion for the audience, while diegetic music communicates something entirely different for the characters on screen. Think about the movie JAWS (1975). Even if you haven’t seen it, you know those two, deep notes – da dum… da dum – that start out slow then build and build, letting us know the shark is about to attack. Meanwhile, the kids in the water are listening to pop music, completely oblivious to the fact that one of them is about to be eaten alive!
And this concept applies to more than just music. Titles, for example, are a non-diegetic element of mise-en-scene. The audience can see them, but the characters can’t.
Second, we need to distinguish between a score written by a composer, and what we could call a soundtrack of popular music used throughout that same motion picture. The use of popular music in film has a long history, and many of the early musicals in the 1930s, 40s and 50s were designed around popular songs of the day. These days, most films or tv series have a music supervisor who is responsible for identifying and acquiring the rights for any popular or pre-existing music the filmmakers want to use in the final edit. Sometimes those songs are diegetic – that is, they are played on screen for the characters to hear and respond to – and sometimes they are non-diegetic – that is, they are just for the audience to put us in a certain mood or frame of mind. Either way, they are almost always added in post-production after filming is complete. Even if they are meant to be diegetic, playing the actual song during filming would make editing between takes of dialogue impossible. The actors have to just pretend they are listening to the song in the scene. Which is fine, since pretending is what they do for a living.
But the type of music that gets the most attention in formal analysis is the score, the original composition written and recorded for a specific motion picture. A film score, unlike popular music, is always non-diegetic. It’s just for us in the audience. If the kids in the water could hear the theme from JAWS they’d get out of the damn water and we wouldn’t have a movie to watch. It is also always recorded after the final edit of the picture is complete. That’s because the score must be timed to the rhythm of the finished film, each note tied to a moment on screen to achieve the desired effect. Changes in the edit will require changes in the score to match.
It is in the score that a film can take full advantage of music’s expressive, emotional range. But it’s also where filmmakers can go terribly wrong. Music in film should be co-expressive with the moving image, working in concert to tell the story (pun intended, see what I did there?). The most forgettable scores simply mirror the action on screen. Instead of adding another dimension, what we see is what we hear. Far worse is a score that does little more than tell us what to feel and when to feel it. The musical equivalent of a big APPLAUSE sign.
These tendencies in cinematic music are what led philosopher and music critic Theodor Adorno to complain that the standard approach to film scores was to simply “interpret the meaning of the action for the less intelligent members of the audience.” Ouch. But, in a way, he’s not wrong. Not about the less intelligent bit. But about how filmmakers assume a lack of intelligence, or maybe awareness, of the power of music in cinema. Take the Marvel Cinematic Universe for example. You all know the theme to JAWS. You probably also know the musical theme for Star Wars, Raiders of the Lost Ark, maybe even Harry Potter. But can you hum a single tune from any Marvel movie? Weird, right? Check this out:
The best cinema scores can do so much more than simply mirror the action or tell us how to feel. They can set a tone, play with tempo, subvert expectations. Music designed for cinema with the same care and thematic awareness as the cinematography, mise-en-scene or editing, can transform our experience without us even realizing how and why it is happening.
Take composer Hans Zimmer for example. Zimmer has composed scores for more than 150 films, working with dozens of filmmakers. And he understands how music can support and enhance a narrative theme, creating a cohesive whole. In his work with Christopher Nolan, The Dark Knight (2008), Inception (2010), Interstellar (2014), his compositions explore the recurring theme of time:
Musical scores can also emphasize a moment or signal an important character. Composers use recurring themes, or motifs, as a kind of signature (or even a brand) for a film or tv series. The most famous of these are the ones you can probably hum to yourself right now, again like Star Wars, Raiders of the Lost Ark, maybe even Harry Potter. Composers can use this same concept for a specific character as well, known as a leitmotif. Think of those two ominous notes we associate with the shark in JAWS. That’s a leitmotif. Or the triumphant horns we hear every time Indiana Jones shows up in Raiders. That’s a leitmotif.
Oh, and all those movies I mentioned just now? They all have the same composer. His name is John Williams. And he’s a legend:
Video and Image Attributions:
Traditional Wooden Slate. Public Domain Image.
How The Sound Effects In ‘A Quiet Place’ Were Made | Movies Insider by Insider. Standard YouTube License.
‘Ford v Ferrari’ Sound Editors Explain Mixing Sound for Film | Vanity Fair by Vanity Fair. Standard YouTube License.
Jaws (1975) – Get out of the Water Scene (2/10) | Movieclips by Movieclips. Standard YouTube License.
John Williams and the universal language of film music by Dan Golding – Video Essays. Standard YouTube License.
If you want to see more videos like this one, check out InDepth Sound Design’s YouTube channel (it’s pretty cool): https://www.youtube.com/channel/UCIaYa00v3fMxuE5vIWJoY3w