Each scene consists of a single background drawing, with a number of characters and foreground objects. Each is drawn separately on a clear sheet of acetate called a cel. Each frame of a scene is then photographed with the cels stacked in the appropriate order. The skier cel can be moved for each exposure, or more likely a sequence of cels will represent the movement of the skier. A great deal of drawing effort is saved: only one background has been drawn, and only one flag for the foreground. The saving over drawing each frame in its entirety is immense.

This deconstruction of a video sequence into different components or objects is the basis of MPEG-4 video coding, and is the big difference between MPEG-4 and earlier standards. Rather than being restricted to coding a single two-dimensional rectangular video image, MPEG-4 allows both two-dimensional and three-dimensional objects to be mixed in a synchronized presentation. MPEG-4 breaks away from the cinematic representation and moves toward the virtual reality world of video games. This paradigm shift makes MPEG-4 a natural vehicle for rich media presentations. Immersive and interactive presentations can combine synthetic three-dimensional objects with two-dimensional stills and conventional video images.

An object has shape, texture, and motion. The object texture is equivalent to the information that the earlier MPEG standards intraframe encoded in blocks using the discrete cosine transform. Consider the downhill racer again. The skier is moving, but the background (called a sprite) is essentially a still image. If the skier could be separated from the background, the background could be sent once, and the skier alone could be transmitted as a moving image. This could save a large amount of data. A sprite is defined as a large video object, possibly panoramic, that is persistent over time. The media player can crop and spatially transform (warp) the sprite, for example as a camera pans around the foreground objects.

Rendering a number of video objects together in the player is relatively easy. Separating video objects from a two-dimensional scene within the encoder is nontrivial. This process is called video segmentation and is a subject of ongoing research. The concept is a natural process for the human brain; we look at a scene and immediately decompose it into a number of objects. So, although the idea of object coding may seem foreign to a video engineer, to the neurophysiologist it must seem like the obvious way to encode visual images. The scanned raster with which we are familiar is an engineering convenience.

In practice, object coding is not used for rectangular video scenes. Objects are used to combine video, audio, and graphics objects in interactive content, much like the DVD. A rectangular video image is encoded in much the
Video compression 93
MPEG-4 broke from the rectangular, two-dimensional bit map of previous coding and adopted object coding.

Video objects

Consider the cartoon as an analogy: cartoons are produced by cel animation. For example, consider a downhill skier. The animators deconstruct the storyboard image into components: the background and the characters.

92 The Technology of Video and Audio Streaming

[Figure 5.6: Cel animation for animated cartoons. Labels: background cel, skier cel, cel movement, flag cel, animation camera.]
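The cel analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: single characters stand in for drawings, `TRANSPARENT` stands in for clear acetate, and the names `composite` and `place` are hypothetical helpers rather than anything defined by MPEG-4.

```python
TRANSPARENT = None  # clear acetate: lets the cels underneath show through


def composite(cels):
    """Stack cels bottom to top; each cel is a grid of pixels or TRANSPARENT."""
    height, width = len(cels[0]), len(cels[0][0])
    frame = [[TRANSPARENT] * width for _ in range(height)]
    for cel in cels:  # later cels sit on top of earlier ones
        for y in range(height):
            for x in range(width):
                if cel[y][x] is not TRANSPARENT:
                    frame[y][x] = cel[y][x]
    return frame


def place(symbol, x, y, width=6, height=3):
    """Return a clear cel with one painted pixel (a stand-in for a drawing)."""
    cel = [[TRANSPARENT] * width for _ in range(height)]
    cel[y][x] = symbol
    return cel


background = [["."] * 6 for _ in range(3)]  # drawn once
flag = place("F", 5, 0)                      # drawn once

# Only the skier cel changes from one exposure to the next.
frames = [composite([background, place("S", x, 1), flag]) for x in range(3)]
rows = ["".join(row) for row in frames[2]]
print("\n".join(rows))
```

The background and flag are created once and reused for every frame, which is the saving the analogy is meant to convey.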
[Figure 5.5: I, P, and B frames within a group of pictures in a video sequence. I-frame: spatial compression only. P-frame: forward predicted from previous I- or P-frames. B-frame: predicted from previous or next frames (forward and backward).]

So that the decoder can make a backward prediction from a future frame, the natural frame order is resequenced. Each B-frame is transmitted after both the previous and the next pictures that it references.

MPEG-4 natural video encoding

MPEG-4 is the first MPEG system that supports streaming as part of the standard. The ability to stream is not related to the method of compression, but instead to the way that the video sequence is time-referenced, so that the media server can control the delivery rate to the player.

The requirements for the standard were for a flexible multimedia encoding format designed to support a very wide range of bit rates, from 5 kbit/s up to 50 Mbit/s. This is sufficiently flexible to cover low bit rate wireless data through to HDTV applications. Version 1 of MPEG-4 supports the following formats and bit rates:

Bit rates: typically between 5 kbit/s and 10 Mbit/s
Scan formats: interlaced as well as progressive video
Resolutions: typically from sub-QCIF to beyond HDTV
Motion prediction

The temporal compression is arranged in short sequences of frames, each called a group of pictures (GOP). MPEG defines three types of frame within the group:

Intraframe or I-frame: coded spatially, solely from information contained within the frame.
Predicted frame or P-frame: coded from previous I- or P-frame pictures. The decoder uses motion vectors to predict the content from the previous frames.
Bidirectional frame or B-frame: uses past and future I and P pictures as a reference, effectively interpolating an intermediate picture.

I-frames provide reference points for random access to a stream. The number of pictures between I-frames is set by the encoder, and can be varied to suit the subject material. A typical P-frame carries one-third of the data of an I-frame, and a B-frame half that of a P-frame.

[Figure 5.4: MPEG picture hierarchy: video sequence, group of pictures, picture, slice, macroblock (Y 16 × 16 pixels, C 8 × 8), block (8 × 8 pixels).]
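The typical frame-size ratios quoted above give a feel for what a GOP saves. The sketch below works the arithmetic for one GOP; the 12-frame pattern `IBBPBBPBBPBB` is an assumed example (the text does not specify a pattern), and the names `RELATIVE_SIZE` and `gop_cost` are hypothetical.

```python
from fractions import Fraction

# Sizes relative to an I-frame, using the typical ratios quoted in the text:
# a P-frame is one-third of an I-frame, a B-frame is half of a P-frame.
RELATIVE_SIZE = {
    "I": Fraction(1),
    "P": Fraction(1, 3),
    "B": Fraction(1, 6),
}


def gop_cost(pattern):
    """Total data for a GOP, measured in I-frame units."""
    return sum(RELATIVE_SIZE[frame_type] for frame_type in pattern)


gop = "IBBPBBPBBPBB"          # assumed pattern: 1 I-frame, 3 P-frames, 8 B-frames
total = gop_cost(gop)         # 1 + 3/3 + 8/6 = 10/3 I-frame units
ratio = total / len(gop)      # compared with coding all 12 frames as I-frames

print(f"GOP cost: {total} I-frame units for {len(gop)} frames")
print(f"Relative to I-frame-only coding: {float(ratio):.3f}")
```

Under these assumptions the GOP needs 10/3 I-frame units instead of 12, a little over a quarter of the data of I-frame-only coding. Real ratios vary with the subject material.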
separately develop new codecs was a waste of resources, so a joint video team (JVT) was set up to develop the Advanced Video Coding (AVC) standard. The fruition of this work was a new high-performance codec to rival the best proprietary codecs. It is designated H.264 by the ITU, and MPEG-4 Part 10 by the MPEG organization.

Much like MPEG-1 and MPEG-2, the standard defines the syntax of an encoded video bitstream with a reference decoder. The design of the encoder is left to each manufacturer. The advantage of this approach is that consumer devices can incorporate a compliant decoder that will play the output of any encoder. The manufacturers are free to differentiate their products by price and performance.

Table 5.2 Summary of Compression Formats

Compression format | ISO/IEC number | Issue date | Target bandwidth (bit/s) | Typical resolution (pixels) | Application
H.261 | - | 1988-1990 | 384 k to 2 M | 176 × 144 or 352 × 288 | Video conferencing, low delay
H.263 | - | 1992 | 28.8 k to 768 k | 128 × 96 to 720 × 480 | Video conferencing
MPEG-1 | 11172 | 1993 | 400 k to 1.5 M | 352 × 288 | CD-ROM
MPEG-2, MP@ML | 13818 | 1994 | 1.5 M to 15 M | 720 × 480 | Broadcast television, DVD
MPEG-4 | 14496 | 1998 | 28.8 k to 500 k | 176 × 144 or 352 × 288 | Fixed and mobile web
AVC, H.264 | 14496-10 | 2002 | - | - | General purpose

MPEG compression

MPEG compression divides a video sequence into groups of pictures. Each picture in the group is then divided into slices of macroblocks. A macroblock comprises four luminance blocks and one each of the U and V color blocks. The block is the basic unit for the spatial compression. The concept of the slice was first introduced in the MPEG-1 standard. It divides the picture so that, if a fatal data error occurs within one slice, the rest of the frame can still be decoded.
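The macroblock structure described above can be checked with some simple arithmetic. The sketch below assumes 8 × 8 blocks and a 16 × 16 luminance macroblock, and rounds picture dimensions up to whole macroblocks; the function names are hypothetical.

```python
BLOCK = 8        # block edge in samples: the basic unit for spatial compression
MACROBLOCK = 16  # luminance macroblock edge (assumed 16 x 16)


def macroblock_samples():
    """Samples carried by one macroblock: four luminance blocks plus U and V."""
    luma = 4 * BLOCK * BLOCK    # four 8 x 8 luminance blocks
    chroma = 2 * BLOCK * BLOCK  # one 8 x 8 U block and one 8 x 8 V block
    return luma + chroma


def macroblocks_per_picture(width, height):
    """Return (columns, rows, total), rounding up to whole macroblocks (assumption)."""
    cols = -(-width // MACROBLOCK)   # ceiling division
    rows = -(-height // MACROBLOCK)
    return cols, rows, cols * rows


samples = macroblock_samples()
cif = macroblocks_per_picture(352, 288)
qcif = macroblocks_per_picture(176, 144)

print("samples per macroblock:", samples)
print("CIF  (352 x 288):", cif)
print("QCIF (176 x 144):", qcif)
```

A CIF picture therefore splits into 22 columns by 18 rows of macroblocks, and a QCIF picture into 11 by 9.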
for PAL and NTSC systems (352 × 288 at 25 fps for PAL, and 352 × 240 at 30 fps for NTSC). It uses progressive scanning. The video compression, like H.261, uses the discrete cosine transform with variable-length coding. The motion prediction was improved over H.261 with subpixel motion vectors and the introduction of bidirectionally predicted (B) frames. It was designed for storage-based applications like the CD-ROM, at data rates up to 1.5 Mbit/s, and does not support streaming.

H.263

H.263 is a development of H.261 aimed at low bit rate applications. It dates from 1992. H.261 did not have a data rate low enough for operation at 28 kbit/s, so it could not be used for videophone applications on analog phone circuits. For lower data rates a thumbnail-size picture can be coded; the H.263 standard supports SQCIF, QCIF, CIF, 4CIF, and 16CIF resolutions. H.263 is now the baseline standard for MPEG-4 natural video coding.

MPEG-2

A higher-resolution, high-quality system for broadcast television, MPEG-2 is intended to replace analog composite systems (NTSC, PAL) with digital transmission. It is also used for DVD encoding. Its primary applications use channel bandwidths greater than 4 Mbit/s. The main profile at main level (MP@ML) provides a standard-definition television frame rate and resolution at data rates up to 15 Mbit/s. The standard was extended to support high-definition television bit rates (up to 80 Mbit/s) and an I-frame-only studio profile (50 Mbit/s). MPEG-3 was to be a separate high-definition standard but was dropped in favor of extensions to MPEG-2 and MPEG-4.

MPEG-4

MPEG-1 and MPEG-2 were developed for specific applications: MPEG-1 for multimedia CD-ROM presentations and MPEG-2 for broadcast television. The spawning of so many potential multimedia applications, from hand-held wireless devices to high-definition home theaters, led to demands for a much more flexible coding platform. The support of the very low bit rates used for some streaming is one example of the new demands.

AVC (Advanced Video Coding, H.264)

H.263 is over ten years old and, in many applications, no longer delivers the performance expected of a video codec. The MPEG and ITU realized that to
Compression codecs

Compression codecs fall into three families:

International standards
Proprietary formats
Open standards

International standards often use patented technology. A licensing authority controls the collection of the appropriate fees on behalf of the patent holders. Proprietary codecs collect revenues through a variety of methods. Open standards are usually under the umbrella of the open source community, and free for all to use.

Evolution of international standards

The video codecs we use today come from two backgrounds: the first is the telecommunications industry and the second is multimedia. These are some of the most used codecs.

H.261 videoconferencing

This was the original video codec, and was the starting point for the MPEG-1 standard. H.261 was formulated under the auspices of the ITU for videophones and videoconferencing over ISDN lines. The videophone is now becoming a viable product, but videoconferencing has long been a key communication medium for business. One of the demands of both applications was real-time operation, so the codecs had to have a short processing delay. The standard defines screen sizes of the Common Intermediate Format (CIF), 352 × 288 pixels, and Quarter CIF (QCIF), 176 × 144 pixels. It uses progressive scan and 4:2:0 sampling. Data rates from 64 kbit/s up to 2 Mbit/s are supported by the standard. The compression is DCT-based with variable-length coding. As an option, intermediate predicted (P) frames could be used with integer-pixel motion-compensated prediction.

MPEG-1

This was the first successful standard developed by the multimedia community for audio-visual coding. The standard has long been used for video presentations on CD-ROMs. The normal resolution is the Source (or Standard) Input Format (SIF). Unlike the Common Intermediate Format of H.261, the spatial resolution differs
differences from one frame to the next, but there are many areas of the picture that do not change. This redundancy of information from one frame to the next can be exploited to lower the data rate. The basis of the compression is to transmit only the difference between frames, a technique called frame differencing. The player stores the entire picture in a frame store, and then reconstructs the picture from the previous frame and the difference information.

Since most of the difference between frames is from moving objects, there is further potential to reduce the data. The player already has the information to reconstruct the object; it is just in the wrong position. If a motion vector could be sent to say "this block has moved from position A to position B", then the block would not have to be retransmitted.

Of course it is not that simple. We are looking for a coding gain, where the total data rate is reduced with no apparent change to the picture quality. Motion vectors are additional data that have to be transmitted to the player. As an example, a scene with considerable subject movement coded to MPEG-2 at a rate of 6 Mbit/s can comprise 2 Mbit/s of motion vectors. That represents one-third of the total data. Another problem is the reveal: as an object moves, what is revealed in its previous location?

Motion estimation

To generate motion vectors the encoder has to estimate the movement of picture elements. There are many ways to do this, some more complex than others. Since codecs like MPEG define the decoder rather than the encoder, the method used for vector generation is left to the encoder designer. In a typical encoder, the motion prediction compares a previous frame with the current frame, and then estimates how blocks of the picture have moved. A local decoder generates the previous frames, rather than the encoder using the original frames from before intraframe compression. This is so that the predictor uses the same information that the player's decoder has available for computation. The motion predictor makes trial moves of a block of pixels to establish the best match for the position of any moving elements. This match is then used to create a motion vector. Block matching is not the only technique for motion estimation, but it is one of the simplest to implement.

One consequence of the use of motion vectors is that the aggregate data rate will vary with the amount of motion in a scene. For a given image quality, a stationary scene may need 4 Mbit/s, and a scene with rapid movement may need 6 Mbit/s. A constant quality requires a variable bit rate. This may conflict with the distribution medium; channels like telco circuits and satellite transponders have a fixed bandwidth.
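The trial-move search described under motion estimation can be sketched as an exhaustive block match. This is a minimal illustration, not how any particular encoder works: it assumes the sum of absolute differences (SAD) as the matching cost, a small search window, and tiny synthetic frames; `sad`, `best_motion_vector`, and `make_frame` are hypothetical names.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a reference block and a current block."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(ref[ry + dy][rx + dx] - cur[cy + dy][cx + dx])
    return total


def best_motion_vector(ref, cur, cx, cy, size=4, search=3):
    """Try every offset within +/- search pixels and keep the best match.

    The vector (mx, my) points from the block's position in the current
    frame to the matching position in the reference (previous) frame.
    """
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(ref, cur, cx, cy, cx, cy, size)
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            rx, ry = cx + mx, cy + my
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # trial block would fall outside the picture
            cost = sad(ref, cur, rx, ry, cx, cy, size)
            if cost < best_cost:
                best, best_cost = (mx, my), cost
    return best, best_cost


def make_frame(obj_x, obj_y, width=16, height=16, size=4):
    """Synthetic frame: flat background with one textured square object."""
    frame = [[0] * width for _ in range(height)]
    for dy in range(size):
        for dx in range(size):
            frame[obj_y + dy][obj_x + dx] = 100 + 10 * dy + dx
    return frame


previous = make_frame(4, 5)  # object at (4, 5) in the previous frame
current = make_frame(6, 6)   # object has moved to (6, 6)
vector, cost = best_motion_vector(previous, current, cx=6, cy=6)
print("motion vector:", vector, "cost:", cost)
```

A full search like this is simple but expensive; practical encoders typically restrict or order the trial moves, which is one of the ways encoder designs differ.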
Each 8 × 8 block of pixels is transformed to another 8 × 8 block of transform coefficients. Vertical edges in the picture transform as a horizontal frequency, and horizontal edges as vertical frequency coefficients. At this stage there has been no compression of the data. The goal is to gain a representation of the video signal in a form where perceptual and statistical redundancies can be used to reduce the data rate.

The coefficients are normalized and quantized. The quantization process allows the high-energy, low-frequency coefficients to be coded with a greater number of bits, while using fewer for the high-frequency coefficients. So as we go from top left to bottom right, the coefficients tend toward zero.

[Figure 5.3: Spatial frequencies. Labels: DC, low, mid, and high frequencies; vertical, diagonal, and horizontal edges.]

Variable-length coding (VLC)

Scanning the coefficients in a zigzag produces long runs of zeros. The coefficients are then converted to a series of run-amplitude pairs. Each pair indicates the number of zero-value coefficients and the amplitude of the nonzero coefficient that follows. These run-amplitude pairs are then coded with a variable-length code. This uses shorter codes for commonly occurring pairs and longer codes for less common pairs, a form of entropy coding. The runs of zeros can be run-length encoded to reduce the overall data rate.

Temporal or interframe compression

Video is a sequence of similar images, with step changes at scene boundaries. In many sequences there is virtually no change from one frame to the next. In scenes with subject motion, or where the camera is moving, there will be
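The zigzag scan and run-amplitude conversion described under variable-length coding can be sketched as follows. The quantized block is made-up sample data, the `"EOB"` end-of-block marker is an assumption used here to stand for the trailing zeros, and the function names are hypothetical. Real codecs also treat the DC coefficient separately, which this sketch ignores.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies each anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run from bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def run_amplitude_pairs(quantized):
    """Convert a quantized block to (zero run, amplitude) pairs."""
    pairs = []
    run = 0
    for r, c in zigzag_order(len(quantized)):
        value = quantized[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # assumed marker: everything after this point is zero
    return pairs


# Made-up quantized coefficients: energy concentrated near the top left.
block = [[0] * 8 for _ in range(8)]
block[0][0] = 26
block[0][1] = -3
block[1][1] = 1
block[0][3] = 2

pairs = run_amplitude_pairs(block)
print(pairs)
```

The 64 coefficients collapse to four pairs and a marker. A variable-length code would then give the most common pairs the shortest codewords.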
Discrete cosine transform

This has formed the basis of all the popular video compression codecs. The picture is divided into regular blocks, and then the transform is applied block by block. The original coding schemes chose a basic block of 8 × 8, that is, 64 pixels. MPEG-4 introduced variable block sizes.

[Figure 5.2: Discrete cosine transform. An 8 × 8 block of pixels (X, Y) passes through the DCT to give 8 × 8 frequency coefficients (horizontal and vertical frequency), then through normalization and quantization to give 8 × 8 quantized coefficients, read by zigzag scanning from start to end.]
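To make the block transform concrete, here is a direct (and deliberately slow) two-dimensional DCT of one 8 × 8 block in plain Python. It is a sketch under assumptions: it uses the orthonormal DCT-II scaling, which is one common convention, and the sample blocks and the name `dct_2d` are made up for illustration. Production codecs use fast integer approximations instead.

```python
import math

N = 8  # block size used by the original coding schemes


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):        # vertical frequency index
        for u in range(N):    # horizontal frequency index
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[v][u] = scale(u) * scale(v) * total
    return coeffs


# A flat grey block: all the energy lands in the DC coefficient.
flat_block = [[128] * N for _ in range(N)]
flat_coeffs = dct_2d(flat_block)

# A vertical edge (dark left half, bright right half): energy appears only
# in the top row of coefficients, that is, as horizontal frequencies.
edge_block = [[0] * 4 + [255] * 4 for _ in range(N)]
edge_coeffs = dct_2d(edge_block)

print("flat block DC term:", round(flat_coeffs[0][0], 3))
print("edge block top row:", [round(c, 1) for c in edge_coeffs[0]])
```

The transform itself does not reduce the data: 64 pixels become 64 coefficients. The gain comes later, when many of those coefficients quantize to zero.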