🎬 Sound Editing Process

Let’s start with a very high-level look at how the audible portion of a video is crafted. In further sections we’ll explore the “how” in greater depth, but for now, this page should serve to explain the bones of the process.

The production sound you’ve captured (everything you got on set and bring to editorial) makes up a surprisingly small percentage of the total film’s soundscape. The work sound designers and foley artists perform is hugely underrated, and it’s often one of the things documentary or non-fiction films spend too little time considering. As with everything else in this course, designing the sound, rather than just lazily taking what you recorded, is key to success. Intentional vs. incidental. Sound familiar? Here’s a brief look at the elements commonly present in good sound design.

  • Production Dialog: Sync checking and dialog cleanup are handled by a dedicated “production dialog editor” because dialog is likely one of the most important elements of the soundtrack. Human speech is the most common element in any type of video, whether corporate or narrative, so we spend the bulk of this page, and a large amount of time in post, on this part of the process.
  • ADR: Sometimes the dialog you got on set wasn’t great. The acronym means different things to different people (usually along the lines of Automatic/Automated Dialog Recording/Replacement) but all it means is dubbing over dialog in post when production dialog doesn’t work. Maybe a loud waterfall overpowered the speaker, maybe clothing rustle ruined a lav mic take, maybe a boom operator fell asleep. For whatever reason, you can’t save what was recorded on set so you’re capturing in post. It should be said that some productions intentionally neglect production sound in favor of copious amounts of post dubbing (ever seen a Bollywood film?).
  • Foley: Artificially wonderful, often massively exaggerated, effect sounds. These are a blast to create as the sound itself is what matters, not what’s making the sound.
  • Music: This is responsible for a large part of human emotional response to a film. Timing, conform, and final sync are handled by the “music editor” on a larger film.

The Process

So here’s a general process for working with sound:

1. Track organization: Get individual actors and their individual mics all on dedicated tracks. This makes things much easier organizationally, as you’ll want to be able to apply some effects to an entire track’s audio. Put the mono (single channel) elements above the stereo (two channel) ones; that’s why music and sound effects come at the bottom of the list, as they’re more likely to be stereo files. Each of these bullets could be multiple tracks. For example, “Dialog” could be 8 different tracks of dialog. That’s partially why this organization is so important. Video editors often like working with mixed-down files where all production audio is represented on a single track. When it comes to audio, though, you’ll want the ability to dive into every channel recorded. For a given take, you may need to grab one line from a boom mic and another from a lav mic and seamlessly blend the two. For that reason you need access to the individual “iso” (isolated) channels.

Track Order From Top To Bottom
Dialog (DX or “VOX” in the music world)
Sound effects (SFX)
Music (MX)
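To make that layout concrete, here’s a tiny Python sketch that sorts tracks into the top-to-bottom order above. The function and track names are purely illustrative (no NLE exposes this API); it just encodes the DX / SFX / MX convention from the list:

```python
# Illustrative only: order (name, kind) track tuples into the
# top-to-bottom layout described above: dialog, then effects, then music.
TRACK_ORDER = {"DX": 0, "SFX": 1, "MX": 2}

def sort_tracks(tracks):
    """Return tracks ordered DX (top), SFX (middle), MX (bottom)."""
    return sorted(tracks, key=lambda t: TRACK_ORDER[t[1]])
```

Feeding it a jumbled list like `[("Score", "MX"), ("Boom", "DX"), ("Doors", "SFX")]` puts the boom dialog track on top and the stereo score at the bottom.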

With your timeline laid out and labeled in a logical manner, the first step in pretty much every production is to get the dialog in the right place. Most of the next few steps focus on that.

2. Normalize the audio to get levels in the ballpark. This is often done during editorial so the editor has everything roughly loud and consistent enough to edit. As you know, modern recording generally leaves ample headroom, which means you’ll likely need to boost levels. Resolve lets you adjust clip gain before the track-level automation, which means you can get your individual clips within a timeline all to the same average value to facilitate editing. Just right click an audio clip in the timeline or media pool and choose “Normalize Audio Levels”. In the dialog box that follows, you have options. Perhaps the simplest is to leave “Sample Peak Program” as your mode selection and set the target level to which you want to “normalize” your clips (say, -2dBFS). In this mode, Resolve looks at the clip’s loudest point and adjusts the volume of the whole clip so that loudest point hits your target. This is a handy way to get all your clips close to the same volume.

Consider this, however: what if someone coughs right into their microphone in the course of the recording? Resolve will see the corresponding peak in the waveform and base its normalization value on that loudest peak, which is not representative of your dialog. A better way to work is often to select ITU-R BS.1770-4 as your normalization mode. This uses “loudness” rather than peak values to normalize and is a better way of getting the dialog to sound similar in average loudness. The ideal approach would be to fix the problematic audio peak (a simple keyframe curve dip would be fine), bounce the audio clip, then normalize the new clip, but I find this a bit cumbersome when I just want to get into the edit and deal with audio editing later. Unfortunately you do have to bounce the “dipped” clip to a new file, or normalization will not see the edit you’ve made and you’ll get the same result where normalization considers the offending peak.
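To see why a single cough throws off sample-peak normalization, here’s the arithmetic in a small Python sketch. The function names are mine, not Resolve’s code; it assumes float samples in the -1.0 to 1.0 range:

```python
import math

def peak_dbfs(samples):
    """Peak level of float samples (-1.0..1.0) in dBFS."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def sample_peak_normalize_gain(samples, target_dbfs=-2.0):
    """Gain (dB) that moves the single loudest sample to the target --
    the same idea as a "Sample Peak Program" normalization mode."""
    return target_dbfs - peak_dbfs(samples)
```

A clip whose dialog peaks at 0.25 gets roughly +10 dB of boost toward a -2 dBFS target, but add one full-scale cough sample and the computed gain collapses to -2 dB, leaving the actual dialog far too quiet.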

If you select multiple clips, take note of the “Relative” vs. “Independent” setting. If you choose “Relative”, Resolve will consider all the clips you selected and apply one normalization value to the whole group. This means if you have three clips from the same interview, but one clip is substantially louder, that loud clip will cause normalization to improperly pull the level of all the clips down, since it’s normalizing the audio based on the peaks. “Independent” is the better option if you want each of the selected clips analyzed and then normalized on its own, and it’s the option I typically use. -2dBFS is a pretty loud normalization, assuming you’re working with a dialog track as the primary audio source for your program. This will get the audio’s peaks right up near clipping and resolve any massive discrepancy in levels between your various audio clips, making them easy to edit. In the mixing phase, however, you’ll likely be targeting a different level for your dialog.
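The difference between the two modes can also be sketched in Python (again, illustrative names only, not Resolve’s API): “relative” computes one shared gain from the loudest clip in the selection, while “independent” computes a gain per clip:

```python
import math

def peak_dbfs(samples):
    """Peak level of float samples (-1.0..1.0) in dBFS."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def normalize_gains(clips, target_dbfs=-2.0, mode="independent"):
    """Per-clip gains (dB). "relative" preserves the balance between
    clips with one shared gain based on the loudest clip;
    "independent" normalizes each clip on its own."""
    if mode == "relative":
        shared = target_dbfs - max(peak_dbfs(c) for c in clips)
        return [shared] * len(clips)
    return [target_dbfs - peak_dbfs(c) for c in clips]
```

With a loud clip and a quiet clip in the selection, “relative” hands both the same (smaller) boost, while “independent” pushes the quiet clip up further so the two end up matched.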

3. “Franken-edit” the audio for the best possible dialog. You’ll often need to chop up the audio based on performance or technical needs, and what sounds like a fluid sentence in the final product is actually a hodgepodge of several different takes (and possibly several different microphones). Make sure each clip has a smooth transition or crossfade at its head and tail to avoid pops or distracting sounds. As with other audio edits, make sure there’s constant room tone underneath your dialog. It will hide a multitude of sins.
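A crossfade at the join is just two complementary gain ramps; an equal-power (cosine/sine) curve is a common choice because it avoids a loudness dip in the middle of the fade. A minimal sketch with hypothetical names, assuming both takes are lists of float samples:

```python
import math

def equal_power_crossfade(tail, head):
    """Blend the tail of one take into the head of the next with an
    equal-power (cosine/sine) curve to avoid a dip or pop at the join."""
    n = min(len(tail), len(head))
    out = []
    for i in range(n):
        t = (i / (n - 1)) if n > 1 else 1.0
        out.append(tail[i] * math.cos(t * math.pi / 2)
                   + head[i] * math.sin(t * math.pi / 2))
    return out
```

The blend starts entirely on the outgoing take and ends entirely on the incoming one, with both contributing mid-fade.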

4. Clip Automation: First, per clip, use automation (rubber bands) to visually “compress” clips with excessive dynamic range. Again, this step can often be done in the editing workflow of any NLE. This is similar to the compression you’ll apply later, but it’s a way to manually even out loud and soft parts across the dialog.
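Under the hood, a rubber band is just a set of gain keyframes interpolated over time. A rough Python sketch of that interpolation (the function is illustrative, not any NLE’s internals):

```python
def gain_at(keyframes, t):
    """Linearly interpolate clip gain (dB) between (time, gain_db)
    keyframes -- the "rubber band" an NLE draws over a clip."""
    keyframes = sorted(keyframes)
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    for (t0, g0), (t1, g1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            return g0 + (g1 - g0) * (t - t0) / (t1 - t0)
    return keyframes[-1][1]  # hold the last keyframe's value
```

Two keyframes ramping from 0 dB to -6 dB over two seconds give -3 dB at the one-second mark, exactly the straight line you’d draw on the clip.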

5. Levels: This is a wise stage to start thinking more carefully about levels. Pick the loudness standard you’re working to under Project Settings > Fairlight (-15 LUFS is a good average for web content, but do note that as of 2020 web delivery is yet to be standardized), then meter your mix to see how close you are to that standard. Reset the loudness meter, start the meter, play back your audio, and watch the measure of integrated loudness. If the meter reads -8.9, that indicates I’m 8.9 dB below my target loudness. So I need to bring my overall level up around 9dB, but what do I do if my peaks are clipping at -2dB? Obviously -2 plus 9 puts us beyond 0, our limit for digital audio. It’s time to talk about applying effects, in this case to affect the “dynamics” of the audio.
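The “how far am I from target” arithmetic is simple even though a real BS.1770 meter adds K-weighting and gating on top of it. A crude, un-weighted RMS sketch (hypothetical names, float samples in the -1..1 range):

```python
import math

def rms_dbfs(samples):
    """Un-weighted RMS level in dBFS. A real integrated-loudness meter
    (ITU-R BS.1770-4) adds K-weighting and gating, but the distance-to-
    target arithmetic is the same."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

def gain_to_target_db(samples, target_db=-15.0):
    """Overall gain (dB) needed to land the program on its target."""
    return target_db - rms_dbfs(samples)
```

A mix sitting at -20 needs +5 dB to hit a -15 target; whether you can apply that boost without clipping is exactly the problem compression solves.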

6. Effects are visited more thoroughly on the audio effects page, but with regard to the last issue, it’s worth mentioning compression while we discuss dialog editing. A compressor allows you to take the loud stuff (and only the loud stuff) down in volume. Because of that, we can compress the peaks of the audio and then raise its overall level without risk of those peaks clipping. Again, there’s more info on compression on the effects page, but for now you can see why it’s a powerful tool. Common tools for editing dialog at this point include noise reduction, used to help separate speech from noise, and EQ, which will get your audio sounding well-balanced and potentially reduce interfering sounds at unwanted frequencies.
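The core of a compressor is a static gain curve: below the threshold nothing happens; above it, the output rises only 1 dB for every `ratio` dB of input. A minimal sketch (an illustrative function; real compressors add attack, release, and knee behavior):

```python
def compressor_gain_db(level_db, threshold_db=-10.0, ratio=4.0):
    """Static compressor curve: signals under the threshold pass
    unchanged; above it, output rises at 1/ratio the input rate.
    Returns the gain change in dB (negative = reduction)."""
    if level_db <= threshold_db:
        return 0.0
    out_db = threshold_db + (level_db - threshold_db) / ratio
    return out_db - level_db
```

With a -10 dB threshold and 4:1 ratio, a -2 dB peak is cut by 6 dB, which is exactly what frees up the headroom to raise the whole track afterward (makeup gain).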

7. Music editing requires that you pull down or “duck” the level of the music when the dialog arrives overtop. This can be done manually with clip automation, with track automation, or by “side chaining” a compressor so it watches the dialog track and dips the music whenever dialog appears.
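A naive version of that side-chain behavior can be sketched in a few lines, assuming float samples. A real side-chained compressor adds attack and release smoothing so the dip isn’t abrupt, but the logic (watch the dialog, dip the music) is this:

```python
def duck_music(music, dialog, threshold=0.05, duck_db=-12.0):
    """Naive side-chain ducking: wherever the dialog signal exceeds the
    threshold, attenuate the corresponding music sample by duck_db."""
    duck = 10 ** (duck_db / 20)
    return [m * duck if abs(d) > threshold else m
            for m, d in zip(music, dialog)]
```

Music passes untouched through silence on the dialog track and drops by 12 dB the moment dialog appears.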

8. Sound Effects, Foley and ADR should be added and treated with whatever effects are necessary to get them to the right level and make them feel like they sit within the same sound space (e.g. how does reverb compare between the sound effect and the existing mix?). Reverb can be used to match a “dry” effect sound with the appropriate “echo” of the production environment or indicate the sound source’s proximity. EQ can help match production microphones or help a sound “sit” where it needs to in conjunction with other sounds. Realize that the sound effects you hear in a theater are hyper-real and not what was captured from a dialog mic on set. Even if production audio from set (PFX) is used, it’s usually layered with much more exciting content across a wider frequency range for audible interest (and devoid of the room tone present in the production effect).
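The simplest possible “room” is a feedback comb filter: repeat the sound at a fixed delay with decaying level. Real reverbs are far more elaborate, but this sketch (illustrative names, float samples) shows how a dry effect picks up a tail so it can sit in a reverberant scene:

```python
def comb_reverb(dry, delay_samples, decay=0.5):
    """Feedback comb filter: the crudest possible room. Each pass
    through the delay line repeats the sound at reduced level, giving
    a dry effect a decaying echo tail."""
    out = list(dry) + [0.0] * (delay_samples * 4)  # room for the tail
    for i in range(delay_samples, len(out)):
        out[i] += out[i - delay_samples] * decay
    return out
```

A single impulse comes back as a train of echoes, each one `decay` times quieter than the last.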

9. Final Mix: All of these elements combine in what’s called a “mix”. On a bigger show, the mixing process is generally divided among two or three mixers, split between dialog, music and effects. Again, you might not have the manpower to divide all these responsibilities, but knowing they exist is key to making sure you give each phase of the process its due diligence. Getting all the levels correct in relation to one another is typically done in this “mixing” phase. It’s much easier to work with the various elements of Music (MX), Dialog (DX), and Sound Effects (SFX) if they are grouped together into what we call “stems”.

10. Mastering: Afterward, your sound will be sweetened and tailored to the specifics of its distribution medium (e.g. is this going to a theater, to the web, etc.?). Mastering engineers have all manner of tricks up their sleeves for bringing out the best in your audio. Though many audio engineers would be shy to admit it, a simple “Ultramaximizer” plugin from Waves doesn’t do too bad a job of the sweetening part if handled carefully. I’d describe it as “HDR” for audio in that it sort of takes everything and makes it hyperreal, lifting subtle textures and frequencies to the forefront to make the sound more lively. You can easily push it too far, but for what’s basically a one-slider master it’s not bad. Again, I’d be crucified for saying it, but this is probably as close to easy mastering for the masses as I can suggest. It can also be set as an end-of-chain limiter so nothing in the mix is allowed to exceed whatever limit you set (-0.5 dB, for example).
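The end-of-chain limiter contract, stripped to its simplest (and crudest) form, is a hard ceiling. Actual mastering limiters use look-ahead and release curves rather than flat clipping, but the guarantee they make is the one this sketch enforces:

```python
def brickwall_limit(samples, ceiling_db=-0.5):
    """Hard ceiling: no output sample may exceed ceiling_db. Real
    limiters reduce gain smoothly ahead of the peak instead of
    clipping flat like this."""
    c = 10 ** (ceiling_db / 20)
    return [max(-c, min(c, s)) for s in samples]
```

Anything already under the ceiling passes through untouched; anything over it is pinned to the ceiling on either side of zero.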

That said, it’s not just peak audio levels we’re concerned with. Modern distribution platforms will watch your audio’s loudness as an average. Bouncing your mix to a new track and running the whole thing through the integrated loudness meter (yes, you’ll have to wait for the whole thing to play as of Resolve 16 unless you invest in third-party software) is a great way to check the integrated loudness of the entire mix. See the audio levels section for more details there.