Standards Conversion Part 1

In the post “video frames – introduction”, we looked at the two dominant frame rates used throughout the world today – 29.97 and 25 frames per second. Both formats trace back to decisions made by engineers in the USA and Europe at the dawn of television in the 1930s, and the need to maintain backwards compatibility has ensured these rates are still dominant today.

If we were re-designing television without the need for backwards compatibility, we would do it very differently. However, we are saddled with 29.97 and 25, and despite many opportunities to change the frame rates, for reasons beyond engineering logic we are stuck with them for the foreseeable future.

To enable European viewers to watch American programming, and vice versa, each format must be converted to the format the viewer’s television works in. Some multi-standard televisions can detect and switch between 29.97 and 25, but mainstream televisions work in only one format. Broadcast workflows generally work in only one format too, and set-top boxes suffer the same restriction.

Image size differs between the two formats: America uses a system based on 525 lines, and Europe 625 lines. The horizontal line timings differ too: an American line lasts 63.6 microseconds, and a European line 64 microseconds. The differences may seem subtle, but they have huge effects and ramifications for conversion.
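
As a quick check, the line timings follow directly from the frame rates and line counts: the line period is the reciprocal of (frame rate × total lines per frame). A minimal sketch using Python’s fractions module (the 63.6 figure above is the rounded value):

```python
from fractions import Fraction

NTSC_RATE = Fraction(30000, 1001)   # 29.97 frames per second, exactly
PAL_RATE = Fraction(25)             # 25 frames per second

# Line period = 1 / (frame rate x total lines per frame), in microseconds.
ntsc_line_us = 1_000_000 / (NTSC_RATE * 525)
pal_line_us = 1_000_000 / (PAL_RATE * 625)

print(f"525-line system: {float(ntsc_line_us):.3f} microseconds per line")  # 63.556
print(f"625-line system: {float(pal_line_us):.3f} microseconds per line")   # 64.000
```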

Horizontal and vertical scaling is a relatively straightforward process for modern computers, and up- and down-conversion algorithms are easily designed and implemented.
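
To illustrate, here is a minimal sketch of vertical scaling by two-tap linear interpolation, assuming NumPy and a single-channel frame (the function name is just for illustration). Production converters use longer polyphase filters, but the principle is the same:

```python
import numpy as np

def scale_vertically(frame: np.ndarray, out_lines: int) -> np.ndarray:
    """Resample `frame` (lines x pixels) to `out_lines` lines."""
    in_lines = frame.shape[0]
    # Position of each output line, expressed in input-line coordinates.
    positions = np.linspace(0, in_lines - 1, out_lines)
    base = np.floor(positions).astype(int)
    nxt = np.minimum(base + 1, in_lines - 1)
    weight = (positions - base)[:, np.newaxis]
    # Blend the two nearest input lines for every output line.
    return (1 - weight) * frame[base] + weight * frame[nxt]

# Up-convert a 525-line raster to 625 lines (active-line counts differ in
# practice; full rasters are used here for simplicity).
sd_frame = np.random.rand(525, 720)
converted = scale_vertically(sd_frame, 625)
print(converted.shape)  # (625, 720)
```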

Converting between frame, and hence field, rates is an enormous task, and the quality of conversion varies depending on the vendor used. The main challenge to overcome is that there is no easy integer relationship between the two rates. Technically, the two frame rates do coincide periodically, but because the US rate is exactly 30000/1001 frames per second, the alignment only repeats every 40.04 seconds, that is, every 1200 US frames or 1001 European frames.
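
The alignment period can be verified with exact rational arithmetic. A short sketch, again using Python’s fractions module (math.lcm needs Python 3.9 or later):

```python
import math
from fractions import Fraction

ntsc = Fraction(30000, 1001)   # 29.97 fps, exactly
pal = Fraction(25)             # 25 fps

# Frame boundaries coincide at the least common multiple of the two frame
# periods: lcm(p/q, r/s) = lcm(p, r) / gcd(q, s).
ntsc_period, pal_period = 1 / ntsc, 1 / pal
t = Fraction(math.lcm(ntsc_period.numerator, pal_period.numerator),
             math.gcd(ntsc_period.denominator, pal_period.denominator))

print(float(t))           # 40.04 seconds
print(t * ntsc, t * pal)  # 1200 US frames, 1001 European frames
```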

If the images had no motion, then the conversion would be easier. The challenges start to emerge when there is motion in the broadcast. The human visual system is very good at detecting jitter and discontinuities in motion, a throwback to our ancestors’ need to detect predators lurking in the dark. Consequently, when converting between rates we must determine the motion and content between frame samples.

Motion estimation is one solution to determining the content of frames at intermediate points in time. Algorithms analyze consecutive frames to identify the motion of objects, and then determine where each object should lie in the next frame of the format being converted to. Intuitively, the more frames we analyze, the more accurate our motion estimation, and hence our construction of the new frame, will be.
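
One of the simplest estimators is exhaustive block matching: each block in the current frame is compared against a search window in the previous frame, and the displacement with the smallest sum of absolute differences (SAD) wins. A hypothetical sketch, assuming NumPy and single-channel frames; real converters typically use hierarchical search or phase correlation, but the principle is the same:

```python
import numpy as np

def estimate_motion(prev: np.ndarray, curr: np.ndarray,
                    block: int = 16, search: int = 8) -> np.ndarray:
    """Return one (dy, dx) motion vector per block of `curr`."""
    h, w = curr.shape
    vectors = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            target = curr[y:y + block, x:x + block]
            best_sad, best = np.inf, (0, 0)
            # Exhaustive search of the window in the previous frame.
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    py, px = y + dy, x + dx
                    if py < 0 or px < 0 or py + block > h or px + block > w:
                        continue
                    candidate = prev[py:py + block, px:px + block]
                    sad = np.abs(candidate - target).sum()
                    if sad < best_sad:
                        best_sad, best = sad, (dy, dx)
            vectors[by, bx] = best
    return vectors
```

Scaling the resulting vectors by the fractional time of the target frame gives the positions at which content should be rendered in the new frame.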

Real challenges in computing power and input/output start to emerge as we increase the number of frames being analyzed. At high-definition rates, a video frame can easily be 60 Mbits, so one second of video is around 1.8 Gbits. Processing many such images in real time causes havoc with computing systems, not to mention the latency and potential lip-sync errors introduced.
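
For context, one plausible reading of those figures assumes a 1080-line frame with three components at 10 bits each (no chroma subsampling):

```python
pixels = 1920 * 1080
bits_per_frame = pixels * 3 * 10   # three components, 10 bits per component
print(bits_per_frame / 1e6)        # ~62 Mbit per frame
print(bits_per_frame * 30 / 1e9)   # ~1.9 Gbit for one second at ~30 fps
```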

Converting between the two dominant frame rates of 29.97 and 25 frames per second is a difficult and complicated task, and the quality of the result is usually related to the cost of the solution.