
MPEG: Movie magic

What's your favourite film? We call films "films" because of what they are made of. Moving pictures revolutionized entertainment, and the secret was the way sequences of pictures could be printed onto rolls of film. The future of movies and TV, though, isn't film; it's streams of digits. Both are going digital. So what's your favourite MPEG, then?

Let's get moving


Photos went digital using something called JPEG. It is just an agreed way of converting single pictures into streams of 1s and 0s, and it plays some magic tricks on our eyes. Once you understand how the JPEG magic trick is done, it's easy to see how movies might go digital. A movie is just a stream of still images: frames, each slightly different from the one before. So a movie could simply be a series of JPEG images. It turns out, though, that we can do better than the obvious.

Movies are tougher than photos because there is so much data: so many images for just a few seconds of film. When we show them in quick succession, one after the other, another of our brain's little tricks, called 'persistence of vision', blends the individual frames into continuous motion. We 'see the movie'. So how can we trick our brains again and avoid sending every single frame (or every block of every frame) one by one? The answer comes from watching lots and lots of movies!
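To get a feel for just how much "so much data" is, here is a back-of-the-envelope sketch in Python. The picture size, colour depth and frame rate are assumptions chosen for illustration (a 1920 by 1080 picture, 3 bytes per pixel, 25 frames per second), not figures from this article, and `raw_bytes_per_second` is a made-up helper name.

```python
# Assumed example figures (not from the article): an HD-sized picture,
# 3 bytes (24 bits of colour) per pixel, 25 frames every second.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3
FRAMES_PER_SECOND = 25


def raw_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Bytes needed to send every pixel of every frame with no compression at all."""
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * fps


per_second = raw_bytes_per_second(WIDTH, HEIGHT, BYTES_PER_PIXEL, FRAMES_PER_SECOND)
print(per_second)  # 155520000 bytes: about 155 megabytes every second

# A 90-minute film at that rate, in terabytes (rounded to 2 decimal places)
print(round(per_second * 60 * 90 / 1e12, 2))  # 0.84
```

That is a lot of 1s and 0s, which is why it pays to avoid sending every frame in full.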

All change please


Watch a movie or TV show. Things move on the screen but, just as importantly, some things stay the same. The blocks making up the tree in the background or the studio set are the same from frame to frame. So if something is the same from frame to frame, why bother sending it over and over again? This is the idea behind MPEG, the movie format (MPEG stands for Moving Picture Experts Group): send only the block data you need.

We do still have to send some frames in full, and for those we use JPEG-style compression, as it has already reduced the data in each block using mind tricks. But first we look through the movie and work out which frames (sets of blocks) need to be sent in complete (well, JPEG-compressed) form and which actually just contain bits of other frames. For those, we send instructions on how to build them from the other frames. If the tree is static, send it only once, then send instructions telling the other frames to add in the tree we already have.
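The "only send what changed" idea can be sketched in a few lines of Python. This is a toy, not how MPEG actually codes anything: a frame is assumed to be a grid (a list of rows) of brightness numbers, the 2 by 2 block size is hypothetical, and `changed_blocks` and `rebuild` are made-up helper names. The sender compares each block with the same block in the previous frame and keeps only the ones that differ; the receiver starts from the frame it already has and pastes those in.

```python
BLOCK = 2  # hypothetical block size, chosen small so the example is easy to read


def get_block(frame, row, col, size=BLOCK):
    """Copy out the size x size block whose top-left corner is (row, col)."""
    return [line[col:col + size] for line in frame[row:row + size]]


def changed_blocks(previous, current, size=BLOCK):
    """Sender: return {(row, col): block} for only the blocks that changed."""
    updates = {}
    for row in range(0, len(current), size):
        for col in range(0, len(current[0]), size):
            block = get_block(current, row, col, size)
            if block != get_block(previous, row, col, size):
                updates[(row, col)] = block
    return updates


def rebuild(previous, updates):
    """Receiver: start from the frame we already have and paste in the changes."""
    frame = [line[:] for line in previous]
    for (row, col), block in updates.items():
        for dr, line in enumerate(block):
            frame[row + dr][col:col + len(line)] = line
    return frame


# Only the top-right block changes (a light is switched on, say).
frame1 = [[10, 10, 50, 50],
          [10, 10, 50, 50],
          [90, 90, 90, 90],
          [90, 90, 90, 90]]
frame2 = [[10, 10, 200, 200],
          [10, 10, 200, 200],
          [90, 90, 90, 90],
          [90, 90, 90, 90]]

updates = changed_blocks(frame1, frame2)
print(updates)                            # {(0, 2): [[200, 200], [200, 200]]}
print(rebuild(frame1, updates) == frame2)  # True
```

Only one of the four blocks has to be sent; the other three are reused from the frame the receiver already has.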

The frames we send in full are called I frames (I for Intra frame). The frames built from I frames by moving blocks around are called P frames (P for Predicted). P frames must come after an I frame in the movie, as they are built from I frame data (or from an earlier P frame, which was itself built from I frame data). We send only the information on how to move the blocks, but if something new appears in the frame, say a ball flies in or a door opens, then we also need to send the data for the new block and instructions on where it goes. Since we are mostly sending movement instructions to shift existing blocks around, and only occasionally a new block, the data is much less than we would need if we had to send everything.
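Working out how a block moved is usually done by searching. The sketch below shows one simple way, a full "block matching" search, assuming frames are grids of brightness numbers and using made-up helper names (`sad`, `find_motion`): it tries every nearby position in the earlier frame and keeps the one whose pixels differ least from the block being predicted. Real encoders use larger blocks (MPEG-1 and MPEG-2 use 16 by 16 "macroblocks") and faster, cleverer searches.

```python
def get_block(frame, row, col, size):
    """Copy out the size x size block whose top-left corner is (row, col)."""
    return [line[col:col + size] for line in frame[row:row + size]]


def sad(a, b):
    """Sum of absolute differences between two blocks: 0 means identical."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def find_motion(reference, current, row, col, size=2, search=2):
    """Find where the block at (row, col) in `current` came from in `reference`.

    Tries every offset up to `search` pixels away in each direction and
    returns (dy, dx, error) for the closest match."""
    target = get_block(current, row, col, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = row + dy, col + dx
            # Skip candidate positions that would hang off the edge of the picture
            if r < 0 or c < 0 or r + size > len(reference) or c + size > len(reference[0]):
                continue
            error = sad(get_block(reference, r, c, size), target)
            if best is None or error < best[2]:
                best = (dy, dx, error)
    return best


# A bright 2x2 square moves two pixels to the right between frames.
reference = [[0] * 6 for _ in range(6)]
current = [[0] * 6 for _ in range(6)]
for r in (1, 2):
    for c in (1, 2):
        reference[r][c] = 200
        current[r][c + 2] = 200

print(find_motion(reference, current, 1, 3))  # (0, -2, 0): copy from 2 pixels to the left
```

The P frame then only needs "copy this block from 2 pixels to the left" rather than the block's pixels. If even the best match has a large error, that is the "something new appeared" case, and the encoder would send the block's own data instead.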

The story so far


So far, we have built I frames by compressing some of the images in the original movie, and used the information in those I frames to build P frames by sending instructions for shifting the I frame data around. Both reduce the amount of data we have to send. Is there anything else we can do to make the data sent even smaller? Yes, there is. Having sent the data for the I frames and P frames, we can create B frames in between. B frames (Bi-directional frames) are the cheapest of all in terms of data. We build a B frame by taking an I frame and a later P frame (built from the I frame data) and using them both to work out what's going on in between. A B frame looks at the I and P frames on either side of it and at how their information can be used to create the picture it is supposed to be. It takes blocks from the I frame and the P frame and moves them around as needed for the action to flow seamlessly between them. Only as a last resort does a B frame need to contain any new information.

So, in order of decreasing data, we have I frames, compressed with the JPEG tricks; then P frames, which need less data because they move existing blocks around; and finally B frames, which need the least because they make use of the data in the frames on either side. The amount of data used for a movie can be changed by choosing the proportions of the different types of frame. MPEG relies on clever computation that exploits the fact that movies tend to contain sequences of frames that don't change much, and uses this to cut down the amount of data needed.
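One way to picture what an encoder does for each block of a B frame is to try three guesses and keep the best: copy from the earlier frame, copy from the later frame, or blend the two. The sketch below assumes the matching blocks in the earlier and later frames have already been found (for example by searching for the best-matching block, as for P frames), and its names are made up for illustration. The integer average with halves rounded up is one reasonable choice, not necessarily the exact rule a particular codec uses.

```python
def sad(a, b):
    """Sum of absolute differences between two blocks: 0 means identical."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def average(a, b):
    """Blend two blocks pixel by pixel (integer average, halves rounded up)."""
    return [[(x + y + 1) // 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def choose_b_prediction(past_block, future_block, target):
    """Return (mode, error) for whichever guess is closest to the real block."""
    candidates = {
        "forward": past_block,                      # copy from the earlier frame
        "backward": future_block,                   # copy from the later frame
        "both": average(past_block, future_block),  # blend of the two
    }
    mode = min(candidates, key=lambda name: sad(candidates[name], target))
    return mode, sad(candidates[mode], target)


# A patch that fades from dark (100) to bright (200): halfway through it is 150.
past = [[100, 100], [100, 100]]
future = [[200, 200], [200, 200]]
print(choose_b_prediction(past, future, [[150, 150], [150, 150]]))  # ('both', 0)
print(choose_b_prediction(past, future, [[200, 200], [200, 200]]))  # ('backward', 0)
```

When a guess matches perfectly, only the choice of guess (and where to copy from) needs sending, not the pixels, which is why B frames can be so cheap.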

Order, Order

I, P and B frames are instructions, not pictures. So first, for a movie, we need to process all the frames to find the best way to crush the data down. A P frame follows an I frame in time and uses its data, but a B frame in between needs both the I and the P data to work. This means the order in which the frames are sent isn't the same as the order in which the viewer sees them. The I and P frames on either side have to go first, and the instructions for the B frames in the gap between them follow. So when you are watching an MPEG movie, or digital TV, which uses similar techniques, your computer is actually doing some time travel. It stores the I and P frames until it gets the instructions for the B frames, then slots the newly created B frames in between the I and P frames to show to you.
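The reshuffling can be sketched as two small Python functions, assuming each frame is labelled with its type and its position on screen (labels like `"B2"` and the function names are made up for illustration). The sender moves each I or P frame ahead of the B frames that sit just before it on screen; the receiver holds each I or P frame back until those B frames have been shown. For simplicity the sketch assumes the sequence ends with an I or P frame.

```python
def transmission_order(display):
    """Send each I/P frame before the B frames that sit just before it on screen."""
    sent, waiting_b = [], []
    for frame in display:
        if frame.startswith("B"):
            waiting_b.append(frame)     # can't build this until the next I/P arrives
        else:
            sent.append(frame)          # send the I/P frame first...
            sent.extend(waiting_b)      # ...then the B frames that needed it
            waiting_b = []
    return sent + waiting_b             # simplification: leftover B frames go last


def display_order(received):
    """Receiver's 'time travel': hold each I/P frame until its B frames are shown."""
    shown, held = [], None
    for frame in received:
        if frame.startswith("B"):
            shown.append(frame)         # B frames go on screen as soon as they are built
        else:
            if held is not None:
                shown.append(held)      # the previously held I/P frame is now due
            held = frame
    if held is not None:
        shown.append(held)
    return shown


on_screen = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
sent = transmission_order(on_screen)
print(sent)                 # ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
print(display_order(sent))  # ['I1', 'B2', 'B3', 'P4', 'B5', 'B6', 'P7']
```

The receiver gets P4 before B2 and B3 even though it shows it after them: the "time travel" in action.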

The next time you're watching a film, think about how MPEG is using tricks in both space and time to create the illusion of movement. What you see is all just 11001100011. Oh, but don't forget to enjoy the movie.