These techniques have been utilized by some proprietary streaming codecs, again to improve video quality within the constraints of the average channel bandwidth. The codecs use the server and player buffers to average the data, so that a continuous bandwidth is transmitted across the IP connection. Figure 8.8 shows two streams, one constant and one variable, used across the same fixed bandwidth channel. Note that, during the action scenes, the VBR channel has an increased bit rate. The CBR channel will have a lower quality during these scenes. Again, like DVD encoding, two-pass encodes give better results with VBR. The first pass analyzes the entire video file to find the best encode rates. The second pass is the true encode and uses the data gathered from the first pass. Obviously this technique can be used only when encoding a video file, rather than a live stream.

172 The Technology of Video and Audio Streaming

Figure 8.7 Variable bit buffering. [Block diagram labels: video, encoder (compress), VBR file, streaming server, buffer, fixed bandwidth circuit, buffer, media player (decompress), video.]

Figure 8.8 Variable and constant bit rates. [Plot of bit rate against time for the VBR and CBR streams, marking the action scenes, the lower mean bit rate, and the quality gap.]
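The two-pass idea can be illustrated with a small sketch. It assumes the first pass has already produced a relative complexity score for each scene (the scoring method, the function name `allocate_vbr`, and the simple proportional rule are all illustrative assumptions, not the method of any particular codec). The second pass then shares out the bit budget so that complex scenes receive more bits while the mean stays at the target rate.

```python
def allocate_vbr(complexities, mean_rate_kbps, max_rate_kbps):
    """Share a bit budget across scenes in proportion to their complexity.

    complexities   -- hypothetical per-scene scores gathered by a first pass
    mean_rate_kbps -- target mean bit rate over all scenes
    max_rate_kbps  -- peak rate the channel or player buffer can sustain
    """
    total = sum(complexities)
    count = len(complexities)
    # Proportional share: the unclamped rates average exactly mean_rate_kbps.
    rates = [mean_rate_kbps * count * c / total for c in complexities]
    # Clamp to the peak rate. In this simple sketch the excess is dropped,
    # so the achieved mean can fall below the target when clamping occurs.
    return [min(rate, max_rate_kbps) for rate in rates]


if __name__ == "__main__":
    # Two quiet scenes and one action scene, 300 kbit/s mean target.
    print(allocate_vbr([1, 1, 2], 300, 1000))
```

A production encoder would redistribute the bits removed by clamping and would work on much finer units than whole scenes; the sketch only shows why the analysis pass has to see the entire file first.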
viewer to select the correct streaming file. First, this is an unnecessary complication for the viewer, and second, it does not take into account network congestion. The leading codec suppliers developed multiple bit-rate codecs to allow several streams of different bit rates to be bundled into one file. The streaming server then negotiates with the end user's media player to determine the optimum rate to use for the stream delivery. No user interaction is necessary. A typical example is SureStream from RealNetworks. With SureStream technology, up to eight different bit rates can be encoded. The Helix Server automatically selects the best bit rate to serve for the network conditions. Intelligent Streaming in Microsoft Windows Media is a similar feature. All the viewer sees during network congestion is a gradual lowering of quality, rather than the freezes typical of a fixed-rate system.

Variable bit rate (VBR)

Streaming codecs by default use constant bit-rate (CBR) encoding. DVDs and digital satellite television both use variable bit-rate encoding. The DVD has a maximum file size; so, to give the maximum playing time, a low bit rate is indicated. The MPEG-2 compression used differs from an analog recording like VHS. The VHS tape has a fixed video quality, whereas at a fixed bit rate, the MPEG video will have a variable quality, dependent on the entropy of the original content. To avoid visible MPEG artifacts, the DVD encoding rate increases when there is rapid subject movement. During scenes with motion the bit rate increases from a typical value of 3 Mbit/s up to a maximum of 8 Mbit/s. The end result is a constant video quality for the viewer. The bit rate is a trade-off between artifacts and the file size. To achieve the optimum encode within the limit of the maximum file size of the DVD, two passes are used. The first analyzes the material; the second encodes using the parameters from the first.
The process can be automatic, or for the best quality transfers the compressionist will adjust the encode rates manually on a scene-by-scene basis. Similar variable bit rates can be used for multiplexed digital satellite television channels. The technique of statistical multiplexing allows channels to aggregate bit rates up to the total that a transponder can carry. If one channel needs more bits, it can borrow from the other channels. This technique is more successful the more channels there are in a multiplex. Statistical multiplexing can be used to add additional channels to a transponder, or to give better quality output when compared with an equal number of fixed bit rate channels. Video encoding 171
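The borrowing of bits between channels in a statistical multiplex can be sketched as follows. This is a minimal illustration that assumes each channel reports an instantaneous bit-rate demand and that an oversubscribed multiplex is scaled back proportionally; the function name `statmux_allocate` and the proportional rule are assumptions for illustration, and real multiplexers use more elaborate quality-driven controllers.

```python
def statmux_allocate(demands_mbps, transponder_mbps):
    """Divide a transponder's capacity among channels by current demand.

    demands_mbps     -- bit rate each channel would like at this instant
    transponder_mbps -- total bit rate the transponder can carry
    """
    total_demand = sum(demands_mbps)
    if total_demand <= transponder_mbps:
        # Everything fits: each channel gets what it asked for.
        return list(demands_mbps)
    # Oversubscribed: scale every channel back by the same factor so the
    # aggregate exactly fills the transponder.
    scale = transponder_mbps / total_demand
    return [demand * scale for demand in demands_mbps]


if __name__ == "__main__":
    # One busy channel takes bits that the two quiet channels do not need.
    print(statmux_allocate([2, 3, 8], 36))
    # Too much simultaneous action: every channel is cut back.
    print(statmux_allocate([10, 20, 30], 30))
```

With more channels in the multiplex, it is less likely that all of them peak together, which is why the technique works better as the channel count grows.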
The functions of the card include the following:

Decoding composite analog NTSC and PAL into a YUV format
Demodulating the color component of S-video to U/V
Analog-to-digital conversion
Decoding DV (IEEE-1394) to uncompressed YUV
Color pixel format conversion to an AVI format
Interfacing to the PCI bus

The cards usually have a number of other facilities; these vary with different models and their manufacturers:

Spatial scaling
De-interlace filter
VCR control port
Video preview output

The video preview facility may use Microsoft DirectDraw for DMA (direct memory access) to the workstation video driver. This saves loading the CPU, because a real-time preview at 30 fps would take considerable processor resources. To achieve optimal system performance it may be necessary to use a computer with fast or multiple processors, especially for live events. The performance of the disk drives should not be forgotten. For archiving or file conversion, use a high-performance disk drive, for example a 10,000-rpm SCSI drive. This can help to offload performance requirements from the main CPU, freeing up more processing power for the encoder.

Encoding sessions

From the previous description of encoding and compression, it is clear that there are a huge number of possible parameter sets for encoding. It would not be productive to set up all these parameters from scratch for each encoding session. Instead, templates or profiles can be set up to suit typical content. Most codecs come with a number of profiles already prepared. For example, a template might be stored to encode a Beta SP tape to MPEG-4/AVC for delivery over a corporate network at 500 kbit/s.

Encoding enhancements

Multiple bit-rate encoding

Early deployments of streaming media often posted links to the media file at perhaps three different rates: 30, 100, and 300 kbit/s. So the onus was on the
Encoding hardware and software

Streaming media usually starts with a program recorded on videotape. (I will consider the special case of live streaming in a different chapter.) The output of the VCR is connected to a video and audio capture board that is fitted to the encoding workstation. The board converts the video to a digital format, usually an Audio-Video Interleaved format (AVI). Apart from the video interfacing, the board has several functions that can ease the processing load on the workstation CPU.

Figure 8.6 Video card block diagram. [Block labels: analog sources (NTSC/PAL, S-VHS Y/C) with decode, demodulation, and ADC; digital sources (601, DV) with select and DV decompress; de-interlace; scaling; color format conversion (YUV to AVI); PCI interface; digital file. Note in figure: these processes can also be performed in software by the main CPU.]
Interframe compression

The next method to compress video is to remove information that does not change from one frame to the next, and to transmit information only in the areas where the picture has changed. This is referred to as temporal or interframe compression. This technique is one of those used by the MPEG-1, MPEG-2, and MPEG-4 standards.

Compression classes

The different algorithms are classified into families:

1. Lossless
2. Lossy
   Naturally lossy
   Unnaturally lossy

If all the original information is preserved, the codec is called lossless. A typical example for basic file compression is PKZIP. To achieve the high levels of compression demanded by streaming codecs, the luxury of the lossless codecs is not possible; the data reduction is insufficient. The goal with compression is to avoid artifacts that are perceived as unnatural. The fine detail in an image can be degraded gently without losing understanding of the objects in a scene. As an example, we can watch a 70-mm print of a movie or a VHS transfer, and in both cases still enjoy the experience, even though the latter is a pale representation of the former. If too much compression is applied, and the artifacts interfere with perception of the image, then the compression has become unnaturally lossy.

Packetization

This usually is implemented within the compression codec. There are two facets to the packetization. The first is to format the compressed video file into IP packets to allow transmission over a standard IP circuit. The second is to wrap control data around the compressed video and audio data, so that the streaming server can control the real-time streaming of the media data.

Streaming wrapper

The packetization of the media is followed by a packet builder. This generates a Real-time Transport Protocol (RTP) hint track (metadata) that instructs the server how to stream the file. This is covered in more detail in Chapter 11. Note that there are proprietary streaming formats that do not use RTP.
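As a rough illustration of the packetization step, the sketch below splits a block of compressed media into chunks and prefixes each with the 12-byte fixed RTP header (version, marker, payload type, sequence number, timestamp, and source identifier). The payload size of 1,400 bytes, the dynamic payload type 96, the source identifier, and the use of the marker bit on the last chunk are assumptions chosen for the example; real payload formats define their own rules, and this is not how any particular server builds its hint track.

```python
import struct

RTP_VERSION = 2


def build_rtp_packet(payload, seq, timestamp, ssrc, payload_type=96, marker=False):
    """Prefix a payload with the 12-byte fixed RTP header."""
    byte0 = RTP_VERSION << 6  # no padding, no extension, zero CSRC entries
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",  # network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes
        byte0,
        byte1,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def packetize(data, max_payload=1400, ssrc=0x12345678, timestamp=0):
    """Split one unit of compressed media into RTP packets (illustrative values)."""
    chunks = [data[i:i + max_payload] for i in range(0, len(data), max_payload)]
    packets = []
    for seq, chunk in enumerate(chunks):
        is_last = seq == len(chunks) - 1  # marker flags the final chunk here
        packets.append(build_rtp_packet(chunk, seq, timestamp, ssrc, marker=is_last))
    return packets


if __name__ == "__main__":
    for packet in packetize(b"\x00" * 3000):
        print(len(packet), packet[:2].hex())
```

The sequence number lets the player detect lost or reordered packets, and the timestamp lets it schedule playback, which is the control data that the text describes as being wrapped around the media.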
bits per pixel averages to 24 bits for 4:4:4, 16 for 4:2:2, and 12 for 4:1:1 and 4:2:0. More detail on these formats can be found in Chapter 4.

Result of scaling

The typical scaled video data rate at a size and frame rate used with analog modems is 1.15 Mbit/s. Compression to a rate suitable for delivery below 56 kbit/s will require a further 30:1 reduction. To reduce the rate even further, some form of image compression has to be employed.

Compression

Compression removes information that is perceptually redundant; that is, information that does not add to the perception of a scene. Compression is a trade-off between the level of artifacts that it causes and the saving in bandwidth. These trade-offs sometimes can be seen on satellite television. If too many channels are squeezed into one transponder, fast-moving objects within a scene can become blocky and soft. Like scaling, compression of video splits into spatial (or intraframe) compression and temporal (or interframe) compression.

Intraframe compression

Single frames can be compressed with spatial, or intraframe, compression. This can be a simple lossless system like run-length encoding, or a lossy system where the original data cannot wholly be reconstructed. A typical example of a lossy system is JPEG, a popular codec for continuous-tone still images.

Table 8.6 Sampling Formats

Format         Nomenclature   Sampling
601, packed    -              4:2:2
AVI, planar    IYUV/I420      4:2:0 or 4:1:1
               YV12           4:2:0 or 4:1:1
               YVU9           16:1:1
AVI, packed    YUY2           4:2:2
               UYVY           4:2:2
               YVYU           4:2:2
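The average bits per pixel quoted for each sampling format can be checked with a short calculation. The sketch assumes the conventional reading of the J:a:b notation over a block J pixels wide and two rows high, where a is the number of chroma samples in the first row and b the number in the second row; the function name and the 8-bit sample depth are assumptions for the example.

```python
def bits_per_pixel(j, a, b, bits_per_sample=8):
    """Average bits per pixel for J:a:b chroma subsampling.

    The reference block is j pixels wide and 2 rows high, so it holds
    2 * j luminance samples. Each of the two color-difference channels
    contributes a samples in the first row and b in the second.
    """
    luma_samples = 2 * j
    chroma_samples = 2 * (a + b)  # two channels, a + b samples each
    total_bits = (luma_samples + chroma_samples) * bits_per_sample
    return total_bits / (2 * j)


if __name__ == "__main__":
    for name, (j, a, b) in {
        "4:4:4": (4, 4, 4),
        "4:2:2": (4, 2, 2),
        "4:1:1": (4, 1, 1),
        "4:2:0": (4, 2, 0),
    }.items():
        print(name, bits_per_pixel(j, a, b))
```

The results agree with the figures in the text: 4:1:1 and 4:2:0 discard the same amount of color information, one horizontally and the other split between horizontal and vertical, so both average 12 bits per pixel.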
It can be reduced to 16 bits (5:6:5, giving 65,536 colors), 15 bits (5:5:5, giving 32,768 colors), or to 8 bits (giving 256 colors). Note that the designation indicates the bit depths for R:G:B. Eight-bit coding uses a predefined palette of colors (indexed). For very low bit rate channels, the reduction in color resolution can be acceptable, but continuous-tone objects will exhibit a posterization effect.

Subsampling

The alternative to reducing the bit depth is to exploit the perceptual redundancy of human vision. This allows the color resolution to be reduced relative to the luminance resolution, without apparent loss of picture resolution. This is achieved by subsampling the color information.

YUV format

The red, green, and blue signals from the camera are matrixed to give a luminance signal (Y) and two color-difference signals: (Blue - Y), or U, and (Red - Y), or V. This is called YUV. It allows compatibility with legacy monochrome systems, and allows the color resolution to be reduced to save on channel bandwidth and reduce program storage requirements. YUV coding is the standard for interconnecting video equipment, whether as the 601 digital format, in IEEE-1394 compressed digital format, or encoded as a composite analog signal (NTSC and PAL). Raw RGB occasionally is found in graphics equipment, and is the standard for display drivers (VGA). Early composite digital video systems chose to sample analog video at four times the NTSC color subcarrier. For digital component sampling standards, a rate of 13.5 MHz (4 × 3.375 MHz) was adopted. Since the human visual system primarily uses the luminance information to resolve scene detail, the color information can be sampled at a lower rate with no apparent loss of detail. This is a form of perceptual compression. The luminance sampling rate is referred to as 4 after the multiplier used. The color sampling rate may be a half (2) or one quarter (1) depending on the performance required.
So a YUV sample is referred to by the ratios of the sampling rates of the three different channels: 4:2:2 or 4:1:1. YUV formats are divided into two groups: packed and planar. In the packed format, the Y, U, and V components are stored in a single array. The 601 standard uses a single array. With the AVI formats two pixels are stored as a single unsigned integer value in one macro-pixel. In the planar format, the Y, U, and V pixels are encoded as three separate arrays. It should be noted that 4:2:2 sampling reduces the bit rate to two-thirds of that of RGB24, and 4:1:1 reduces it to one-half. The number of
The QCIF frame is the ideal size to place a video window within a web page. There is still room left for the menus and textual content of interactive applications. The larger frame sizes are more useful where video is the primary content and will occupy most or all of the display.

Temporal scaling

This is the process of dropping frames below the normal video rate of 30 frames per second (fps). For certain subject material the frame rate can be reduced to one-half or even less without serious degradation in picture quality. The deciding factor is the amount of motion in the picture. A talking head can be viewed at rates as low as one-fifth of television rates; of course, a fast-moving subject like a sportscast would not be satisfactory at these low speeds. This sounds like a big problem for streaming video, but much of the demand for low bit rate content is for applications like distance learning. Here the typical program uses a talking head, just the kind of material that can stand the highest temporal compression. RealVideo will encode at a variable frame rate. The user sets the maximum rate, and then the codec automatically adjusts the frame rate based on the clip size, the target delivery bit rate, and the emphasis set for smoothness or visual clarity. One scene may be 7 fps, for example, while another is 10. A maximum set to 15 fps means the frame rate could vary anywhere between 15 and 0.25 fps.

Color resolution

The camera and display are both RGB devices. The usual coding is to take an 8-bit sample each of the red, green, and blue values at each sample point or pixel. This gives a total of 24 bits, but it is often padded for ease of processing to give a 32-bit word. A 24-bit pixel can display about 16.7 million colors. To reduce the bit rate of the streaming video, the bit depth of a sample can be reduced, albeit at a reduction in the number of colors displayed.
Table 8.5 AVI Formats

AVI format   Bit depth (R G B)   Padding   Bytes per pixel
RGB32        8  8  8             8         4
RGB24        8  8  8             -         3
RGB16        5  6  5             -         2
RGB15        5  5  5             1         2
RGB8         Indexed palette     -         1
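The reduction from 24-bit color to the 16-bit 5:6:5 layout in Table 8.5 can be sketched as a pair of bit-packing helpers. The bit order assumed here (red in the top five bits, then green, then blue) is a common convention but is an assumption, as are the function names; the expansion back to 8 bits replicates the high bits into the low bits, which is one of several possible choices.

```python
def pack_rgb565(r, g, b):
    """Reduce 8-bit R, G, B values to one 16-bit 5:6:5 word.

    The low-order bits of each component are discarded, which is the
    source of the posterization seen on continuous-tone material.
    """
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def unpack_rgb565(value):
    """Expand a 16-bit 5:6:5 word back to approximate 8-bit R, G, B."""
    r5 = (value >> 11) & 0x1F
    g6 = (value >> 5) & 0x3F
    b5 = value & 0x1F
    # Copy the top bits into the vacated low bits so that full scale
    # (31 or 63) maps back to 255 rather than 248 or 252.
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))


if __name__ == "__main__":
    word = pack_rgb565(200, 100, 50)
    print(hex(word), unpack_rgb565(word))
```

The round trip does not return the original values exactly, which shows in miniature why the bit-depth reduction is lossy.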
samples to one-quarter or one-eighth of a full-size television resolution picture. Table 8.4 shows common streaming resolutions compared to television. Figure 8.5 shows these frame sizes in relation to an SVGA monitor display for NTSC territories. Note that the BT.601 standard has been scaled to compensate for the nonsquare pixels.

Table 8.4 Common Image Formats

Image format      Resolution    Frame rate       YUV     Application
HDTV, 1080i       1920 × 1080   30 interlaced    4:2:2   Broadcast television
HDTV, 720p        1280 × 720    30 progressive   4:2:2   Broadcast television
CCIR-601 (NTSC)   720 × 486     30 interlaced    4:2:2   Broadcast television, DVD
CCIR-601 (PAL)    720 × 576     25 interlaced    4:2:2   Broadcast television, DVD
SIF (NTSC)        352 × 240     30 progressive   4:2:0   Streaming video, CD-ROM, MPEG-1
SIF (PAL)         352 × 288     25 progressive   4:2:0   Streaming video, CD-ROM, MPEG-1
CIF               352 × 288     30 progressive   4:2:0   Video conferencing, streaming video
QCIF              176 × 144     30 progressive   4:2:0   Video phone, streaming video

Note: Source Input Format (SIF); Common Intermediate Format (CIF); Quarter Common Intermediate Format (QCIF).

Figure 8.5 Display resolutions. [Frame outlines for QSIF, SIF, 601, SVGA, and HDTV 720p, plotted as width against height in pixels.]
control Play, Fast Forward, Rewind, and Stop and gives status feedback plus a readout of the tape time code. Frame-accurate control allows the exact in and out points of the clip to be set in a record list. This avoids the need to trim the clip later in the encoding process.

Lowering the bit rate

A standard resolution television frame is 720 × 483 pixels (720 × 575 for 625/50), RGB sampled at 13.5 MHz, and has an 8-bit resolution per sample. At a rate of 30 frames per second this is a total data rate of 248 Mbit/s, and that is without any synchronization or control data. To stream video to a 56 kbit/s modem the bit rate has to be reduced by a factor of over 4,000. How is this achieved while still producing a recognizable picture? There are two methods used to achieve this. The first is to scale down the video data. This can achieve a reduction of over 130:1, clearly a long way off the target ratio of 4,000:1. To reach this target a further reduction of 30:1 is required. This is achieved using sophisticated compression algorithms to remove redundant data, while retaining a viewable picture.

Scaling

The simplest way to lower data rate is to scale down the original video size. Three techniques are used to scale down the signal; the first and most apparent is to use a smaller video frame size than a television picture. Since video is often shown as a window in a web page, it is perfectly acceptable to reduce the picture to one-quarter size (320 × 240 pixels) or even smaller. This is called spatial scaling. It produces an instant reduction of data rate to one that is more easily processed by desktop workstations. The second scaling is temporal. Broadcast television uses 25 (PAL, SECAM) or 30 (NTSC) frames per second to give good portrayal of motion. Film uses 24 frames per second. Video material that originally was shot on film is converted to one of the two television rates during the telecine process. The third method of scaling down is to decrease the color resolution.
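The arithmetic behind these reduction ratios can be reproduced with a one-line rate formula. In the sketch below, the full-size figures come from the text, while the scaled example (a 176 × 144 window at 7.5 fps with 12 bits per pixel) is an assumption picked for illustration, so its ratios will not match the rounded values quoted above exactly.

```python
def raw_bit_rate(width, height, fps, bits_per_pixel):
    """Uncompressed video bit rate in bits per second."""
    return width * height * fps * bits_per_pixel


MODEM_RATE = 56_000  # bit/s, the analog modem target from the text

# Full-size frame from the text: 720 x 483 pixels, 30 fps, 24-bit RGB.
full_rate = raw_bit_rate(720, 483, 30, 24)

# Hypothetical scaled stream: smaller window, fewer frames, subsampled color.
scaled_rate = raw_bit_rate(176, 144, 7.5, 12)

if __name__ == "__main__":
    print(f"full size:        {full_rate / 1e6:.1f} Mbit/s")
    print(f"overall target:   {full_rate / MODEM_RATE:.0f}:1")
    print(f"scaled:           {scaled_rate / 1e6:.2f} Mbit/s")
    print(f"from scaling:     {full_rate / scaled_rate:.0f}:1")
    print(f"left to compress: {scaled_rate / MODEM_RATE:.0f}:1")
```

The product of the scaling ratio and the remaining compression ratio always equals the overall target, so any reduction not won by scaling has to be made up by the compression codec.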
Dedicated hardware processors in the video card can be used for the scaling, or it can be a straight software process using the main CPU. For live encoding it is best to use the video card and relieve the main processor load.

Spatial scaling

A smaller window can be used to display video; typical window sizes have resolutions of 320 × 240 or 176 × 144 pixels. This radically reduces the number of