I’m still of the opinion that modern in-camera preamps are good enough that you should often just record into the camera, but when you can’t, you have three options for syncing sound to picture in a dual-system setup.
If you have to sync, timecode is the ideal solution. Every recording device carries a matching clock whose value is written into each file’s metadata; all the software has to do in post is match up the clocks.
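As a sketch of what the software is doing: once the clocks are jammed, sync reduces to simple timecode arithmetic. The start timecodes and the 24 fps frame rate below are made-up values for illustration.

```python
FPS = 24  # assuming a 24 fps, non-drop-frame project

def frames(tc, fps=FPS):
    """Convert HH:MM:SS:FF timecode to an absolute frame count."""
    h, m, s, f = (int(x) for x in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f

# Hypothetical start timecodes read from each clip's metadata.
video_start = "14:32:10:12"
audio_start = "14:32:08:00"

# Slip the audio clip by this many frames and the clocks line up.
print(frames(video_start) - frames(audio_start))  # → 60
```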
If you don’t have timecode, your next best option is usually to have the software analyze the contents of the audio tracks and look for matching audio “scratch tracks” in the video files. These can be on-camera audio captured by low-quality internal camera mics, or a downmix sent from the sound department to the camera. However it happens, the camera’s recorded clips must contain some sort of audio for this to work. It’s not a perfect solution, often producing false positives and requiring some manual sync, but it can work well. Every modern NLE now has this feature, but PluralEyes still does it best, and if you’re syncing a lot it’s worth buying a license.
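Under the hood, waveform sync is essentially cross-correlation: slide the camera’s scratch audio along the recorder’s track and keep the offset where the two agree most. Here is a toy, brute-force sketch with made-up sample values; real tools work on actual sample data and use FFT-based correlation for speed.

```python
def best_offset(reference, scratch):
    """Slide `scratch` along `reference` and return the offset (in samples)
    with the highest correlation score. Brute force, for illustration only."""
    best, best_score = 0, float("-inf")
    for offset in range(len(reference) - len(scratch) + 1):
        score = sum(r * s for r, s in zip(reference[offset:], scratch))
        if score > best_score:
            best, best_score = offset, score
    return best

# Toy example: the "camera" scratch audio is the recorder's audio
# delayed by 5 samples.
recorder = [0, 0, 0, 0, 0, 1, -2, 3, -1, 2, 0, 0]
camera = [1, -2, 3, -1]
print(best_offset(recorder, camera))  # → 5
```

When two offsets score nearly the same, you get exactly the false positives mentioned above, which is why a manual pass is still sometimes needed.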
The most tedious option, by far, is manual sync. Painful as it can be, it’s often necessary, even on professional shoots. If a clapper or slate was used during production, this becomes easier: all you have to do is match the closing of the sticks on the slate with its accompanying spike in the audio waveform. This is one reason a 2nd assistant camera will often call the word ‘marker’. It makes a nice aural cue that the correct waveform spike (sometimes there’s more than one) is the one following the ‘marker’.
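Spotting the slate spike programmatically is just peak detection. A minimal sketch with made-up sample values, assuming the clap is the loudest transient in the region after the ‘marker’ call:

```python
def find_clap(samples, fraction=0.9):
    """Return the index of the first sample within `fraction` of the loudest
    peak — a crude stand-in for spotting the slate spike in a waveform."""
    peak = max(abs(s) for s in samples)
    return next(i for i, s in enumerate(samples) if abs(s) >= fraction * peak)

# Quiet room tone, the 2nd AC calling 'marker', then the clap itself.
track = [0.01, -0.02, 0.01, 0.15, 0.12, -0.1, 0.02, 0.98, -0.9, 0.4, 0.05]
print(find_clap(track))  # → 7
```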
In the metadata section, we’ll look at how syncing this audio has added important video metadata as well.
In addition to syncing production sound, it’s good practice to make sure all the audio in the project is converted to 48kHz at either 16-bit or 24-bit. Anything less can cause issues, and .wav and .aiff files play nicer than .mp3, so convert the latter if you can’t get higher-quality source material.
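Python’s standard-library `wave` module is an easy way to sanity-check (or generate) PCM files against that spec. Here we write a one-second 1kHz test tone at 48kHz/16-bit and read the header back to confirm; the filename is an arbitrary choice for the example.

```python
import math
import struct
import wave

RATE, BITS = 48_000, 16  # the delivery spec discussed above

# Write one second of a 1 kHz sine as a 48 kHz / 16-bit mono WAV.
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(BITS // 8)
    w.setframerate(RATE)
    w.writeframes(b"".join(
        struct.pack("<h", int(16000 * math.sin(2 * math.pi * 1000 * n / RATE)))
        for n in range(RATE)))

# Read the header back to audit sample rate and bit depth before import.
with wave.open("tone.wav", "rb") as w:
    print(w.getframerate(), w.getsampwidth() * 8)  # → 48000 16
```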
Understanding what happens later on with your audio will help clarify some of the process of the initial sound sync.
We can’t talk about syncing without talking about conforming back to the original audio files. If you’re working in a small team, or by yourself, this will be another necessary part of the job after the edit is complete.
You’ll usually sync dailies to a mix track from the sound recorder: a mixdown of all the individual “iso” tracks, used only for dailies and sometimes editorial. This means your various boom mics, lav mics, plant mics, etc. all end up on one audio track, and you can’t separate them individually. For the editor who wants maximum control, however, it’s useful to sync to the full sound files with all of their individual channels. Modern NLEs can handle the large number of tracks coming from production sound mixers, and display them in convenient ways. I think it’s best to leave that information available to the editor rather than simply syncing a mixdown. Resolve is excellent at reading and displaying iXML, a metadata format that carries track names from the audio field recorder. While those names show up great in Resolve, only in Resolve 16 has Blackmagic finally allowed synced audio filenames to export in an Avid AAF. This has been a weak point in using Resolve to deliver files for editorial in the past.
The typical “industry-standard” workflow has been to export both an AAF file and the sound recorder’s original associated files from the picture-locked edit. It’s helpful to include a reference render of the edit as well, to make sure there aren’t issues in conforming the mixdown audio back to the original audio. On a bigger shoot, it’s usually the dialog editor who gets the fun job of ensuring all the original “split” sound files are correctly imported in place of the mixdown files. Even if you edit with the original sound files, it’s helpful to give a sound editor more than just a media-managed project of the final edit with handles; the sound editor will want the original files from the field recorder to pull things like room tone, ambience, and potentially alternate takes. Exporting timelines for the audio conform is very similar to exporting them for the color conform: in both cases you have to do a bit of cleanup first. Make sure your tracks are explicitly mono or stereo, and that you don’t have muted/disabled clips. Automation is usually okay, but check for audio effects that are unique to the NLE and not available in the sound department’s DAW (digital audio workstation). As of 2019, that DAW is primarily Pro Tools.
Keeping your audio timeline organized is important, and it’s fairly common for the track order to follow the pattern below:

- Dialog
- Sound effects
- Music

Put the mono elements above the stereo ones; that’s why music and sound effects come at the bottom of the list, as they’re the most likely to come in stereo. Each of these bullets could represent multiple tracks. For example, “Dialog” could be 8 different tracks of dialog. That’s partially why this organization is so important.
All of that being said, you can see why I get excited about a program that does editorial, sound and color all in one application. These conforming issues don’t exist in Resolve because it’s one program for everything.
We can’t conclude our sync discussion without mentioning the 2-pop. Though it’s increasingly forgotten in modern workflows, the famous 2-pop is still a useful tool for verifying sync before a reel of a motion picture or a broadcast event begins. It’s a 1kHz tone, played for a single frame, two seconds before the program’s first frame; its short duration makes it sound like a blip or ‘pop’. Visually, it’s usually accompanied by the SMPTE countdown leader (that analog clock countdown you’ve seen before).
A typical TV program starts at one hour, or 01:00:00:00. You’ll hear this called the “First Frame Of Action” (FFOA). This means the 2-pop will happen precisely at 00:59:58:00. In the film world, FFOA is usually 01:00:08:00 (due to the duration of the SMPTE countdown leader), so the 2-pop occurs at 01:00:06:00.
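The arithmetic is easy to script. A small sketch, assuming a 24 fps non-drop-frame timeline (drop-frame timecode complicates the frame counting and is ignored here):

```python
FPS = 24  # assuming a 24 fps, non-drop-frame timeline

def tc_to_frames(tc, fps=FPS):
    """Convert HH:MM:SS:FF timecode to an absolute frame count."""
    h, m, s, f = (int(x) for x in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f

def frames_to_tc(total, fps=FPS):
    """Convert an absolute frame count back to HH:MM:SS:FF."""
    f, s = total % fps, total // fps
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}:{f:02d}"

def two_pop(ffoa_tc, fps=FPS):
    """The 2-pop lands exactly two seconds before first frame of action."""
    return frames_to_tc(tc_to_frames(ffoa_tc, fps) - 2 * fps, fps)

print(two_pop("01:00:00:00"))  # TV:   → 00:59:58:00
print(two_pop("01:00:08:00"))  # film: → 01:00:06:00
```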
It’s good practice to place a 2-pop at the end of a sequence as well; if sync has drifted over the program’s duration, it’s an easy way to catch it.