Take a look at YouTube’s recommended upload settings. Do you know what all this means?
The size of an image is sometimes called its “raster”. The most fundamental element of a digital image is a “picture element” or “pixel”. When you hear people talk about “HD” or “8K”, they’re talking about resolution, or how many pixels make up a given frame.
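To put rough numbers on it, here’s a small sketch showing how quickly pixel counts grow with resolution. The frame sizes are common standards (SD dimensions vary by region, so 720x480 here is just the NTSC figure); the code is purely illustrative.

```python
# Rough sketch: total pixels per frame at a few common resolutions.
resolutions = {
    "SD (480)":   (720, 480),
    "HD (1080p)": (1920, 1080),
    "UHD (4K)":   (3840, 2160),
    "UHD (8K)":   (7680, 4320),
}

for name, (width, height) in resolutions.items():
    pixels = width * height
    print(f"{name:11s} {width}x{height} = {pixels:,} pixels per frame")
```

Going from HD to 8K is a 16x jump in pixel count, which is part of why storage and processing costs climb far faster than the perceived gain in quality.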
Resolution matters much less than most people think. The difference between standard definition “SD” and high definition “HD” was very noticeable. Beyond HD the returns diminish. Steve Yedlin (Star Wars DP) has one of the better explanations of where resolution fits in the image quality spectrum.
A progressive image refers to nothing more than showing the entire video frame at once.
Early video signals were limited by the distribution bandwidth available for television broadcast, and so interlaced video was introduced. To keep motion smooth and reduce flicker, the image was essentially divided in half and displayed in alternating line pairs: every other line is displayed at any given time in an interlaced “field”. These alternating fields look funny when paused, but in playback they actually increase temporal resolution.
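Here’s a minimal sketch of the idea, using a tiny hypothetical 8-line frame just to show how the lines of a progressive image get split into two alternating fields:

```python
# A minimal sketch of interlacing: a progressive frame is split into two fields,
# one holding the even-numbered lines and one holding the odd-numbered lines.
# Each "line" here is just a label standing in for a full row of pixels.
frame = [f"line {n}" for n in range(8)]  # a tiny 8-line "frame"

field_1 = frame[0::2]  # even lines (one field)
field_2 = frame[1::2]  # odd lines (the other field)

print("Field 1 (displayed first):", field_1)
print("Field 2 (displayed next): ", field_2)
# On playback the two fields alternate in time, which is why motion captured
# between fields makes a paused interlaced frame look "combed".
```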
The “aspect ratio” refers to the width vs height of the overall image.
But the pixels themselves can also have an aspect ratio if they are “non square pixels”. If the software you’re using doesn’t recognize the format as a non-square-pixel format your image will appear skewed. Many modern formats don’t have this issue, but it’s a good one to be aware of.
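A quick sketch of the math: the aspect ratio you actually see is the stored frame’s width-to-height ratio multiplied by the pixel aspect ratio (PAR). The HDV example below is a real-world case of non-square pixels; the helper function is just for illustration.

```python
# Display aspect ratio = (stored width x pixel aspect ratio) / height.
def display_aspect_ratio(width, height, par=1.0):
    """Return the aspect ratio the image should be *shown* at."""
    return (width * par) / height

# Square pixels: 1920x1080 displays at 16:9 as-is.
print(round(display_aspect_ratio(1920, 1080), 3))           # 1.778

# HDV stores 1440x1080 with a 4:3 pixel aspect ratio, which also displays at 16:9.
print(round(display_aspect_ratio(1440, 1080, par=4/3), 3))   # 1.778

# Software that ignores the PAR would show the HDV frame squeezed to 4:3 instead.
print(round(display_aspect_ratio(1440, 1080), 3))            # 1.333
```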
Frame rate refers to how quickly frames of video are captured in succession, and it’s usually measured in “frames per second” or FPS. Common frame rates include:
Remember to set the frame rate of your project at the beginning. Most modern NLE software allows you to set a frame rate on a per-timeline basis, but it’s something you need to consciously choose.
Any clips that do not conform to the timeline frame rate will be played back with skipped or duplicated frames to try to match the timeline frame rate.
24fps to 30fps is easy; you’re essentially duplicating a frame every 4 frames, which isn’t all that noticeable. It’s similar to the “pulldown” process common to telecine operations of the past, where film material was converted for broadcast (more on that below). You’re adding frames, which works especially well when going to 60i.
It’s quite tough to convert from 30 fps to 24 fps, so try to avoid it. Because you can’t evenly drop frames, the resulting playback is asymmetrical, dropping 1 frame, then 2 frames, then 1 frame, then 2 frames, etc. This is visually noticeable and requires some form of motion vector analysis or “optical flow” to blend frames together, which can result in some weird artifacting. Most of the other common conversions are possible with less potential for harming the image.
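As a rough illustration of both directions, here’s a sketch of the simplest possible conform: for each timeline frame, grab whichever source frame is nearest in time. Real NLEs are smarter than this (pulldown, optical flow), but the duplicate/drop patterns it produces are the basic problem.

```python
# Naive conform: for each frame of the timeline, pick the source frame that is
# closest in time. Duplicates appear when going up in frame rate; skips appear
# when going down.
def conform(source_fps, timeline_fps, seconds=1):
    timeline_frames = int(timeline_fps * seconds)
    mapping = []
    for t in range(timeline_frames):
        time_sec = t / timeline_fps
        source_index = round(time_sec * source_fps)
        mapping.append(source_index)
    return mapping

print("24 -> 30:", conform(24, 30))  # some source frames appear twice (duplicates)
print("30 -> 24:", conform(30, 24))  # source frames get skipped in an uneven rhythm
```

In the 24-to-30 case every source frame appears at least once and some appear twice; in the 30-to-24 case frames are skipped unevenly, which is the stutter described above.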
This is a great reference on what happens when putting high frame rate footage into a lower frame rate timeline.
This is the process of converting 24 fps footage to 60i using a 2:3 pulldown. See how 5 frames are made from 4 frames, but the 3rd and 4th frames are hybrid frames with fields from 2 different source frames? The opposite direction, 60i to 24p, is called a ‘reverse pulldown’.
You can see that the increased “temporal” resolution of 60i makes for a better conversion to 24p.
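The cadence itself is easy to sketch out. This snippet just spells out the 2:3 field pattern described above, with letters standing in for film frames:

```python
# 2:3 pulldown: four film frames (A, B, C, D) are spread across ten interlaced
# fields, i.e. five 60i frames.
film_frames = ["A", "B", "C", "D"]
cadence = [2, 3, 2, 3]  # how many fields each film frame contributes

fields = []
for frame, count in zip(film_frames, cadence):
    fields.extend([frame] * count)

# Pair the fields back up into interlaced frames.
interlaced = [fields[i] + fields[i + 1] for i in range(0, len(fields), 2)]
print(interlaced)  # ['AA', 'BB', 'BC', 'CD', 'DD'] -- 'BC' and 'CD' are the hybrid frames
```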
So you can see that there’s already a form of resolution reduction inherent to how the camera captures different colors of light; this is sort of a “chroma subsampling” in itself. However, after the electrical charges from photons striking the sensor are converted to digital values and demosaicing is performed, there’s another form of color compression, and that’s what most people mean when they reference chroma subsampling.
Frame.io has a great visual explanation here.
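A back-of-the-envelope sketch of why this matters for data rates, assuming an HD frame and comparing full 4:4:4 color against 4:2:0, where each chroma plane is stored at half resolution in both dimensions. The numbers are illustrative only.

```python
# Luma (Y) stays at full resolution; in 4:2:0 each chroma plane (Cb, Cr) is
# stored at half resolution horizontally and vertically.
width, height = 1920, 1080

luma_samples = width * height
chroma_samples_444 = 2 * width * height                # full-res Cb + Cr
chroma_samples_420 = 2 * (width // 2) * (height // 2)  # quarter-res Cb + Cr

full = luma_samples + chroma_samples_444
subsampled = luma_samples + chroma_samples_420

print(f"4:4:4 samples per frame: {full:,}")
print(f"4:2:0 samples per frame: {subsampled:,}")
print(f"4:2:0 is {subsampled / full:.0%} of the 4:4:4 data")  # ~50%
```

That’s half the raw samples gone before the codec even starts doing its real work.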
“Edit Lists” are atoms that originated with Apple’s QuickTime format; they basically allow you to pick portions of the video file for playback. You could, for example, loop just the middle two seconds of a 15 second video. This isn’t something most people use, but YouTube’s warning to avoid them should make sense now.
“Atoms” (or “boxes” in the ISO spec) are data within the video file container that hold specific information about the video file’s parameters. These descriptive atoms differ from the actual media data (the individual frames of video or samples of audio) itself.
Though the atom’s location should be determined at the compression and muxing stage, software does exist to move the “moov” atom after compression has happened. This hierarchical structure, with the atoms describing the media kept separate from the atoms containing it, is part of what makes editing easy with the QuickTime format. In fact, the descriptive bits and the media bits don’t even have to reside in the same .mov file. Media can be ‘redescribed’ by changing the description atoms rather than rewriting the media data itself.
For example, the “moov” atom, sometimes called the “movie atom”, includes information on video length, track count, timescale, and compression. Perhaps most importantly, it’s also the index describing where the actual media data to be played is stored. Within the “moov” atom sits a “trak” sub-atom for each of the movie’s tracks, and within each “trak” atom sits an “mdia” atom with even more specifics. That moov atom is crucial for playback of the entire clip, and without it the end user won’t be able to scrub the playhead or jump to a location in the clip. For this reason, in some web streaming situations it’s crucial to load it first, which is why you’ll see parameters in encoding software for “progressive download,” sometimes called “fast start” or “use streaming mode.” “Muxing,” by the way, is the term for merging your video track, audio tracks, and subtitles into one container.
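If you want to see this structure for yourself, the top level of the container is simple enough to walk by hand. Here’s a minimal sketch (standard Python only; the file path is hypothetical) that lists each top-level atom’s type and size. Each atom starts with a 4-byte big-endian size followed by a 4-byte type code.

```python
# Walk the top-level atoms/boxes of an MP4 or QuickTime file.
# A size of 1 means a 64-bit size follows; 0 means "runs to end of file".
# This only lists the atoms; it doesn't parse their contents.
import struct

def list_top_level_atoms(path):
    atoms = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, atom_type = struct.unpack(">I4s", header)
            header_len = 8
            if size == 1:  # extended 64-bit size
                size = struct.unpack(">Q", f.read(8))[0]
                header_len = 16
            atoms.append((atom_type.decode("ascii", "replace"), size))
            if size == 0:  # atom runs to end of file
                break
            f.seek(size - header_len, 1)  # skip this atom's payload
    return atoms

# Hypothetical usage:
# print(list_top_level_atoms("example.mp4"))
```

If “moov” shows up before “mdat”, the file is already arranged for fast start / progressive download; if it comes after, a player has to read to the end of the file before it can seek.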
The data rate (or bitrate) of a video is simply a measure of how much data it takes up over time, usually measured per second. Uncompressed video is very inefficient and therefore rarely used. Most video formats employ some form of compression, even if it’s not immediately apparent. A common bitrate for highly compressed footage would be an HD YouTube video, which is compressed to around 8Mbps (8 megabits per second). Something like a Sony Venice shooting in 6K produces files at over 2Gbps (2 gigabits per second).
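The arithmetic is worth internalizing because it dictates storage. A quick sketch using the two example bitrates above:

```python
# Convert a bitrate (in megabits per second) into storage per minute (in gigabytes).
def gigabytes_per_minute(megabits_per_second):
    bits = megabits_per_second * 1_000_000 * 60   # bits in one minute
    return bits / 8 / 1_000_000_000               # -> gigabytes

print(f"8 Mbps YouTube HD:   {gigabytes_per_minute(8):.2f} GB per minute")
print(f"2 Gbps camera files: {gigabytes_per_minute(2000):.1f} GB per minute")
```

Roughly 0.06 GB per minute versus 15 GB per minute, which is why delivery codecs and camera-original codecs are such different animals.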
VBR means “variable bit rate” or that the bit rate changes over time. This is useful since some complex scenes have a lot of motion and could require a lot more information than other scenes. Think of a thousand tree leaves swaying in the wind. Every frame is different and the amount of detail is complex. Contrast that with a static shot of the sky. There is little detail in the frame and little change over time so the data rate required can be much lower. The alternative to VBR is CBR or “Constant Bit Rate” where the bitrate is steady through the entirety of the clip.
To minimize file sizes, cameras compress/encode an image upon capture and playback devices must decompress or decode the image. This compressor/decompressor pairing is abbreviated as “codec”. Choosing a codec is a very important part of any workflow. Some codecs are efficient for storing video files but very demanding to play back, and may not leave much room to manipulate the image in post. Codecs employ two main types of compression.
Compression within the frame. This is considered intra-frame compression since the compression doesn’t cross between frames.
Compression across various frames over time. This is called inter-frame compression since the compressor will look at “groups of pictures” and compress them together for increased efficiency.
The “I” frame will contain the entire image, but successive frames (called “B” and “P” frames) will contain only the parts of the image that change over time.
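Here’s a sketch of what a group of pictures looks like conceptually. The 15-frame I/B/P pattern below is just a common illustrative cadence, not any particular codec’s default.

```python
# A "group of pictures" (GOP): only the I-frame is a complete image; P-frames
# reference earlier frames, and B-frames reference frames on both sides.
gop_pattern = "IBBPBBPBBPBBPBB"

for index, frame_type in enumerate(gop_pattern):
    if frame_type == "I":
        note = "full image (intra-coded)"
    elif frame_type == "P":
        note = "stores changes relative to previous frames"
    else:
        note = "stores changes relative to previous AND future frames"
    print(f"frame {index:2d}: {frame_type} - {note}")
```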
It’s important to realize that the codec of a video file is not the same as its container. H.264 is a compression standard, but it could live within a .mov container, a .mp4 container, an .mxf container, etc. The container is basically a standardized way of describing how the content is stored, both the media itself and its metadata.
That video was a bit in-depth, but the key takeaway is that compression compounds in severity with every generation. This is best demonstrated by this multi-generational compression demonstration on YouTube, where one creator uploaded the same video 1,000 times.
From Frame.io:
Codecs are a core technical consideration for every step of post-production, so it is critical that everyone who touches your workflow understands the basics.
A codec is the set of rules that tells computers and electronic equipment how to handle your media files, most notably digital video footage. The term codec is a shortening of the words compressor-decompressor or coder-decoder. As the name implies, codecs make video files smaller for storage, and then turn the compressed data back into a usable image when you need to use it again.
[Note: Codecs are not the same as containers. Containers are the file types that actually store digital video data, and can be used in conjunction with many different codecs. Imagine containers as the box where you put data, and codecs as the instructions for packing and unpacking the box. H.264, DNxHD, and ProRes are codecs, while .movs, .mxfs are containers.]
We need codecs because uncompressed video files are gigantic. A single minute of uncompressed 4K footage measures in the dozens of gigabytes. So, you will probably never have the option of working with uncompressed footage (video without a codec) all the way through your workflow. It’s just too large and too complex to work with in most cases.
Codecs solve this problem by reducing the size of your footage and making it easier to work with across the post-production pipeline. But just because the image is compressed doesn’t mean it will look worse. While not all codecs are created equal, there are many high-end codecs you can choose from that still deliver stunning image quality. The right codec will make your footage much easier to manage, and you may not even be able to tell a difference from the original.
That said, you will be able to tell a difference if you choose the wrong codec for certain post-production processes. Using a codec with too much compression might make your desired color-correction unachievable, or make your VFX work look unrealistic. On the other hand, codecs with too little compression might make your video uneditable on normal computer hardware, and cause issues when transferring data between teams or facilities. Different codecs are good in different contexts, but there is no single codec that works for every use case.
Choosing codecs is one of the most important technical decisions you will make for your project. To a large extent, codecs determine what you can do with your footage in post and how complex your workflow will be. It is usually much safer, faster, and cheaper to pick codecs that you know will fit neatly into a pre-planned/pre-existing workflow, than it is to improvise a new workflow around codecs that you have not used before. The last thing you want is to find out some or all of your footage is unusable because the codec isn’t compatible with your software or hardware. As with everything in post-production, test all your codec choices before the project begins.
Of course, to pick codecs effectively, you need to understand how they impact the visual qualities of video. While the mathematics and computer science behind codecs are quite complicated, there are only a few fundamental concepts you need to know to choose the right codecs for your production.”
Popular Codecs include
This term refers to a codec designed for post production. Camera originals would be converted to this format for editorial. It’s generally intraframe for quick scrubbing with minimal system taxation and can handle multiple generations of encoding without significant quality loss.
CABAC: “Context-adaptive binary arithmetic coding is a form of entropy encoding used in the H.264/MPEG-4 AVC and High Efficiency Video Coding (HEVC) standards. It is a lossless compression technique, although the video coding standards in which it is used are typically for lossy compression applications.”
Open/Closed GOP: “A closed GOP is a group of pictures in which the last pictures do not need data from the next GOP for bidirectional coding. Closed GOP is used to make a splice point in a bit stream.”
Bit depth refers to the number of binary digits (bits) used to store a color value. The higher the number, the more granular the values that can be stored.
Bit depth is not dynamic range. In other words, your blackest black and whitest white do not change with bit depth. The “range” of colors doesn’t even necessarily change. What bit depth affects is the number of subtle gradations between the darkest and brightest points of any given color.
Imagine it like a staircase. Increasing bit depth doesn’t move the first floor any lower or the second floor any higher, it simply adds more steps between the levels.
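The staircase analogy is easy to put into numbers: each extra bit doubles the number of steps between black and white.

```python
# Code values per channel at common bit depths; black and white stay put,
# only the number of steps between them changes.
for bits in (8, 10, 12):
    steps = 2 ** bits
    print(f"{bits}-bit: {steps:,} code values per channel "
          f"({steps ** 3:,} total colors for RGB)")
```

Going from 8-bit to 10-bit quadruples the number of code values per channel, which is why 10-bit footage holds up so much better under aggressive grading.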
As always, Frame.io has a great explanation.