Take a look at YouTube’s recommended upload settings. Do you know what all this means?
Resolution is the size of an image, sometimes called its “raster”. The most fundamental element of a digital image is called a “picture element” or “pixel”. When you hear people talk about “HD” or “8K”, they’re talking about resolution: how many pixels make up a given frame.
Resolution matters much less than most people think. The difference between standard definition “SD” and high definition “HD” was very noticeable. Beyond HD the returns diminish. Steve Yedlin (Star Wars DP) has one of the better explanations of where resolution fits in the image quality spectrum.
A progressive image refers to nothing more than showing the entire video frame at once.
Early video signals were limited by the bandwidth available for television distribution, which is why interlaced video was introduced. To keep motion smooth and reduce flicker, the image was essentially divided in half: only every other line is displayed at any given time, in an interlaced “field”. These alternating fields look funny when paused, but when played back they actually increase temporal resolution.
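To make the idea of fields concrete, here is a minimal sketch that splits a frame into its two fields and weaves them back together. It assumes a frame is just a list of scan lines; the function names `split_fields` and `weave` are made up for illustration and do not come from any particular library.

```python
def split_fields(frame_rows):
    """Split a progressive frame (a list of scan lines) into two fields."""
    top_field = frame_rows[0::2]     # lines 0, 2, 4, ...
    bottom_field = frame_rows[1::2]  # lines 1, 3, 5, ...
    return top_field, bottom_field


def weave(top_field, bottom_field):
    """Interleave two fields back into one full frame."""
    rows = [None] * (len(top_field) + len(bottom_field))
    rows[0::2] = top_field
    rows[1::2] = bottom_field
    return rows


frame = ["line0", "line1", "line2", "line3", "line4", "line5"]
top, bottom = split_fields(frame)
restored = weave(top, bottom)
```

Weaving only restores a clean frame when both fields were captured at the same instant. With true interlaced footage the two fields come from different moments in time, which is why a paused frame shows combing on anything that moves.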
The “aspect ratio” refers to the width vs height of the overall image.
But the pixels themselves can also have an aspect ratio if they are “non square pixels”. If the software you’re using doesn’t recognize the format as a non-square-pixel format your image will appear skewed. Many modern formats don’t have this issue, but it’s a good one to be aware of.
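As a rough illustration of why non-square pixels matter, the sketch below scales the stored width by the pixel aspect ratio to get the size the image should be displayed at. The function name `display_size` is hypothetical, and the example values (HDV-style 1440x1080 storage with 4:3 pixels) are an assumption chosen because the arithmetic comes out cleanly.

```python
from fractions import Fraction


def display_size(stored_width, stored_height, pixel_aspect_ratio):
    """Return the (width, height) at which the image should be displayed.

    pixel_aspect_ratio is the width of one pixel divided by its height.
    Square pixels have a ratio of 1.
    """
    display_width = round(stored_width * pixel_aspect_ratio)
    return display_width, stored_height


# Assumed example: 1440x1080 stored with pixels that are 4/3 as wide as tall.
width, height = display_size(1440, 1080, Fraction(4, 3))
display_aspect = Fraction(width, height)  # overall image aspect ratio
```

Software that ignores the pixel aspect ratio would show this footage at 1440x1080, which is a 4:3 picture, so everything would look squeezed horizontally.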
Frame rate refers to how quickly frames of video are captured in succession, and it’s usually measured in “frames per second” or FPS. Common frame rates include:
Remember to set the frame rate of your project at the beginning. Most modern NLE software allows you to set a frame rate on a per-timeline basis, but it’s something you need to consciously choose.
Any clips that do not conform to the timeline frame rate will be played back with skipped or duplicated frames to try to match the timeline frame rate.
24 fps to 30 fps is easy; you’re essentially duplicating one frame out of every 4, which isn’t all that noticeable. It’s essentially the process of ‘pulldown’ common to telecine operations of the past, where film material was converted for broadcast (more on that below). You’re adding additional frames, which works especially well when going to 60i.
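Here is a small sketch of that duplication. It assumes the simplest possible conform, where each timeline frame shows the most recent source frame. Real NLEs may pick frames differently or blend them, and `conform` is a hypothetical name, not a function from any editing software.

```python
def conform(source_fps, timeline_fps, timeline_frame_count):
    """Return the source frame index shown on each timeline frame.

    Uses integer math so that each timeline frame picks the latest
    source frame that has already started.
    """
    return [
        (i * source_fps) // timeline_fps
        for i in range(timeline_frame_count)
    ]


# 24 fps source on a 30 fps timeline: 5 timeline frames cover 4 source frames.
mapping = conform(24, 30, 10)
# mapping is [0, 0, 1, 2, 3, 4, 4, 5, 6, 7]
```

Source frames 0 and 4 each appear twice, so one frame in every group of four is repeated to fill the fifth slot.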
It’s quite tough to convert from 30 fps to 24 fps, so do try to avoid this. Because 30 is not a whole multiple of 24, you can’t simply keep every Nth frame: one frame in every five has to be discarded, so motion advances in uneven steps. This is visually noticeable, and smoothing it out requires some form of motion vector analysis or “optical flow” to blend frames together. This can result in some weird artifacting. Most of the other common conversions are possible with less potential for harming the image.
This is a great reference on what happens when putting high frame rate footage into a lower frame rate timeline.
This is the process of converting 24 fps footage to 60i using a 2:3 pulldown. See how 5 frames are made from 4 frames, but the 3rd and 4th frames are hybrid frames with fields from 2 different source frames? The opposite direction, 60i to 24p, is called a ‘reverse pulldown’.
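The cadence can be sketched in a few lines. This is a simplified model that only tracks which source frame each field comes from. It assumes the pattern starts on a "2" frame, and `pulldown_2_3` is a name made up for this example.

```python
def pulldown_2_3(source_frames):
    """Spread progressive frames across interlaced frames using a 2:3 cadence.

    Returns a list of (first_field_source, second_field_source) pairs,
    one pair per interlaced output frame.
    """
    cadence = [2, 3]  # fields taken from alternating source frames
    fields = []
    for index, frame in enumerate(source_frames):
        fields.extend([frame] * cadence[index % 2])
    # Pair consecutive fields into interlaced frames.
    # A trailing unpaired field (partial cadence) is dropped.
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


interlaced = pulldown_2_3(["A", "B", "C", "D"])
# interlaced is [("A","A"), ("B","B"), ("B","C"), ("C","D"), ("D","D")]

# Positions (counting from 1) of frames built from two different sources.
hybrid = [n for n, (f1, f2) in enumerate(interlaced, start=1) if f1 != f2]
```

Four source frames become ten fields, which pair up into five interlaced frames. The third and fourth are the hybrids, which is why a reverse pulldown has to identify the cadence before it can rebuild the original four frames.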
You can see that the increased “temporal” resolution of 60i makes for a better conversion to 24p.
So you can see that there’s already a form of resolution reduction inherent to how the camera captures different colors of light. This is a sort of “chroma subsampling”. However, after the electrical charges from photons striking the sensor are converted to digital values and demosaicing is performed, there’s another form of color compression, and that’s what most people mean when they refer to chroma subsampling.
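To get a feel for how much data this second kind of subsampling saves, here is a rough sketch that counts samples in the conventional reference block of 4 pixels by 2 rows. It assumes the common J:a:b notation (such as 4:2:0), which is not spelled out above, and `samples_per_block` is a hypothetical helper.

```python
def samples_per_block(j, a, b):
    """Count luma and chroma samples in a J-wide, 2-row reference block.

    j: block width in pixels (luma samples per row)
    a: chroma samples per channel in the first row
    b: chroma samples per channel in the second row
    """
    luma = 2 * j           # every pixel keeps its own brightness sample
    chroma = 2 * (a + b)   # two color channels (Cb and Cr)
    return luma + chroma


full = samples_per_block(4, 4, 4)       # 4:4:4, no subsampling
half_horizontal = samples_per_block(4, 2, 2)  # 4:2:2
quarter = samples_per_block(4, 2, 0)    # 4:2:0
```

Under these assumptions 4:4:4 stores 24 samples per block, 4:2:2 stores 16, and 4:2:0 stores 12, so 4:2:0 carries half the raw data of unsubsampled video while keeping brightness detail at full resolution.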
Frame.io has a great visual explanation here.
“Edit Lists” are an MP4 feature that originated in Apple’s QuickTime format; they are atoms that basically allow you to pick portions of the video file for playback. You could, for example, loop the middle two seconds of a 15-second video. Most people never use them, but YouTube’s warning to avoid them should make sense now.
“Atoms” (or “boxes” in the ISO spec) are data within the video file container that hold specific information about the video file’s parameters. These descriptive atoms differ from the actual media data (the individual frames of video or samples of audio) itself.
Though the “moov” atom’s location should be determined at the compression and muxing stage, software does exist to move it after compression has happened. This hierarchical structure, in which the atoms containing media data are separate from the atoms describing that data, is part of what makes editing easy with the QuickTime format. In fact, the descriptive bits and the media bits don’t even have to reside in the same .mov file. Media can be ‘redescribed’ by changing the descriptive atoms rather than having to rewrite the media file.
For example, the “moov” atom, sometimes called the “movie atom”, includes information on video length, track count, timescale and compression. Perhaps most importantly, it’s also the index with information about where the actual media data to be played is stored. Within the “moov” atom sits a “trak” sub-atom for each of the movie’s tracks, and within each “trak” atom sits an “mdia” atom with even more specific details. The “moov” atom is crucial for playback of the entire clip: without it, the end user will not be able to scrub the playhead or jump to a location in the clip. For this reason, in some web streaming situations it’s crucial to load it first, so you’ll see parameters in encoding software for “progressive download,” sometimes called “fast start,” or “use streaming mode.” “Muxing” is the term used for merging your video track, audio tracks, and subtitles all into one container.
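If you’re curious where the “moov” atom sits in a file, the sketch below walks the top-level atoms and checks whether “moov” comes before the media data (“mdat”). It relies on the standard atom header layout, a 4-byte big-endian size followed by a 4-byte type. The helper names are invented for this example, and the sample files are synthetic bytes rather than real video.

```python
import struct


def list_top_level_atoms(data):
    """Return (type, offset, size) for each top-level atom in a byte string."""
    atoms = []
    offset = 0
    while offset + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, offset)
        header_length = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_length = 16
        elif size == 0:
            # A size of 0 means the atom runs to the end of the file.
            size = len(data) - offset
        if size < header_length:
            break  # malformed atom, stop rather than loop forever
        atoms.append((kind.decode("latin-1"), offset, size))
        offset += size
    return atoms


def is_fast_start(atoms):
    """True when the 'moov' atom appears before the 'mdat' atom."""
    names = [name for name, _, _ in atoms]
    if "moov" not in names or "mdat" not in names:
        return False
    return names.index("moov") < names.index("mdat")


def make_atom(kind, payload=b""):
    """Build a synthetic atom for demonstration purposes only."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


fast_file = make_atom(b"ftyp", b"isom") + make_atom(b"moov", b"\0" * 4) + make_atom(b"mdat", b"\0" * 16)
slow_file = make_atom(b"ftyp", b"isom") + make_atom(b"mdat", b"\0" * 16) + make_atom(b"moov", b"\0" * 4)
fast_atoms = list_top_level_atoms(fast_file)
slow_atoms = list_top_level_atoms(slow_file)
```

To try this on a real file you would read the bytes from disk first, though for large files a version that seeks between headers instead of loading everything into memory would be a better fit.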