Youth Training Camp | "Introduction to Web Multimedia" Notes

Published on 2022-01-30 14:30 · 1331 words · 7 min read


This article systematically introduces the history and core concepts of Web multimedia technology: from Flash plugins in the PC era to HTML5 and Media Source Extensions (MSE) in the mobile internet era. It focuses on the basic parameters of images and video (resolution, depth, frame rate, bitrate, etc.), video frame types (I/P/B frames), timestamps (DTS/PTS), and GOP structure, and explains how video encoding achieves compression by removing spatial, temporal, coding, and visual redundancy. It also covers HTML5 multimedia elements, the MSE API for streaming playback, streaming protocols such as HLS, and their application to VOD, live streaming, cloud gaming, and other scenarios.

This article has been machine-translated from Chinese. The translation may contain inaccuracies or awkward phrasing. If in doubt, please refer to the original Chinese version.

Web Multimedia History

  • PC era: Flash and other playback plugins; rich clients.
  • Mobile internet era: Flash and similar plugins were gradually phased out; HTML5 emerged, but the video formats it supported were limited.
  • Media Source Extensions (MSE) era: supports more video formats and gives pages programmatic control over streaming playback.

Fundamental Knowledge

Encoding Formats

Image Basic Concepts

  • Image resolution: determines the amount of pixel data that makes up an image, i.e., the number of pixels in the horizontal and vertical directions.
  • Image depth: the number of bits needed to store each pixel. Image depth determines the number of colors or grayscale levels each pixel can represent.
    • For example, a color image whose pixels use the R, G, B components with 8 bits per component has a pixel depth of 24 bits and can represent 2^24 = 16,777,216 colors;
    • while a grayscale image using 8 bits per pixel has a pixel depth of 8 bits, giving at most 2^8 = 256 gray levels.
  • Image resolution and image depth together determine the size of an image.

Video Basic Concepts

  • Resolution: the image resolution of each frame.
  • Frame rate: the number of frames displayed per unit of time (e.g., 30 FPS).
  • Bitrate: the amount of data transmitted per unit of time, usually measured in kbps (kilobits per second).
  • Resolution, frame rate, and bitrate together determine the size of a video (see the example below).
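For example (with made-up numbers): a video encoded at 5,000 kbps that plays for 60 seconds takes roughly 5,000 × 60 / 8 = 37,500 KB ≈ 37 MB; at a fixed bitrate, resolution and frame rate mainly affect how good the picture looks rather than the file size.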

Types of Video Frames

I-frame, P-frame, B-frame

I-frame (Intra-coded frame): a self-contained frame that carries complete image information and can be decoded on its own, without referencing any other frame.

P-frame (Predictive-coded frame): encoded with reference to the preceding I-frame or P-frame, storing only what has changed.

B-frame (Bi-directional predictive-coded frame): depends on both preceding and following frames, storing the difference between the current frame and its neighbors.

(figure: frames are encoded with reference to one another in sequence, 1 → 2 → 3 → …)

DTS (Decode Time Stamp): Determines when the bitstream should be sent to the decoder for decoding.

PTS (Presentation Time Stamp): Determines when the decoded video frame should be displayed

When no B-frames exist, the DTS order and the PTS order are the same.
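A typical example: if four frames are displayed in the order I1 B2 B3 P4, both B-frames need the later P-frame as a reference, so the decoder must receive P4 before them:

Display (PTS) order: I1  B2  B3  P4
Decode (DTS) order:  I1  P4  B2  B3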

GOP (Group of Picture)

The interval between two I-frames, typically 2~4 seconds


The more I-frames a video contains, the larger the encoded file will be.

Why Do We Need Encoding?

Video resolution: 1920 x 1080

Then the size of one uncompressed frame is 1920 x 1080 x 24 / 8 = 6,220,800 bytes (about 5.9 MB).

For a 90-minute video at 30 FPS, the raw data would be 6,220,800 x 30 x 90 x 60 ≈ 1,007,769,600,000 bytes, close to 1 TB. Way too large!

Not to mention higher frame rates like 60 FPS…

What does encoding compress away?

  • Spatial redundancy: neighboring pixels within a single frame are usually very similar, so a region can be predicted from its neighbors.

  • Temporal redundancy: between consecutive frames, often only a small part of the picture changes (e.g., only the ball's position changes while everything else stays the same).

  • Coding redundancy: some symbols occur far more often than others. For an image with only two colors, blue can be represented by 1 and white by 0, and variable-length codes such as Huffman coding give shorter codes to the more frequent symbols.

  • Visual redundancy: fine detail that the human visual system can barely perceive can be discarded.

Encoding Data Processing Flow


  • Prediction removes spatial and temporal redundancy: intra prediction uses neighboring pixels within a frame, inter prediction uses neighboring frames.
  • Transform further removes spatial redundancy by converting prediction residuals into the frequency domain.
  • Quantization removes visual redundancy: detail the visual system can barely perceive is thrown away.
  • Entropy coding removes coding redundancy: frequently occurring symbols get shorter codes.
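A toy JavaScript sketch of this flow (grossly simplified: real codecs use block-based prediction, DCT-style transforms, and Huffman/arithmetic coding; this only illustrates why each step shrinks the data):

const pixels = [100, 101, 102, 102, 103, 150, 150, 150, 151, 152];

// 1. Prediction: store each pixel as the difference from the previous one;
//    residuals are small because neighboring pixels are similar.
const residuals = pixels.map((p, i) => (i === 0 ? p : p - pixels[i - 1]));

// 2. Quantization (the lossy step): divide by a step size and round,
//    discarding detail the eye barely notices.
const step = 2;
const quantized = residuals.map(r => Math.round(r / step));

// 3. Entropy-style coding: run-length encoding stands in for Huffman/arithmetic
//    coding here; repeated symbols collapse into [value, count] pairs.
const encoded = [];
for (const q of quantized) {
  const last = encoded[encoded.length - 1];
  if (last && last[0] === q) {
    last[1] += 1;
  } else {
    encoded.push([q, 1]);
  }
}

console.log({ residuals, quantized, encoded });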

Container Formats

The video encoding described above produces only the raw video stream itself.

A container format is a file format that packages audio, video, image, and subtitle information (plus the metadata needed to play them) into a single file.


Multimedia Elements and Extended APIs

video & audio

The <video> tag is used to embed a media player in HTML or XHTML documents, supporting video playback within documents.

<!DOCTYPE html>
<html>
<body>
    <video src="./video.mp4" muted autoplay controls width="600" height="300"></video>
    <video muted autoplay controls width="600" height="300">
        <source src="./video.mp4">
    </video>
</body>
</html>

The <audio> element is used to embed audio content in documents.

<!DOCTYPE html>
<html>
<body>
    <audio src="./audio.mp3" muted autoplay controls></audio>
    <audio muted autoplay controls>
        <source src="./audio.mp3">
    </audio>
</body>
</html>
| Method | Description |
| --- | --- |
| play() | Starts playing the audio/video (asynchronous) |
| pause() | Pauses the currently playing audio/video |
| load() | Reloads the audio/video element |
| canPlayType() | Checks whether the browser can play the specified audio/video type |
| addTextTrack() | Adds a new text track to the audio/video |

| Property | Description |
| --- | --- |
| autoplay | Sets or returns whether the audio/video starts playing automatically after loading |
| controls | Sets or returns whether the audio/video displays controls (e.g., play/pause) |
| currentTime | Sets or returns the current playback position in the audio/video (in seconds) |
| duration | Returns the length of the current audio/video (in seconds) |
| src | Sets or returns the current source of the audio/video element |
| volume | Sets or returns the volume of the audio/video |
| buffered | Returns a TimeRanges object representing the buffered parts of the audio/video |
| playbackRate | Sets or returns the playback speed of the audio/video |
| error | Returns a MediaError object representing the error state of the audio/video |
| readyState | Returns the current ready state of the audio/video |

| Event | Description |
| --- | --- |
| loadedmetadata | Fired when the browser has loaded the audio/video metadata |
| canplay | Fired when the browser can start playing the audio/video |
| play | Fired when the audio/video has started playing or is no longer paused |
| playing | Fired when playback resumes after having been paused or stalled for buffering |
| pause | Fired when the audio/video has been paused |
| timeupdate | Fired when the current playback position has changed |
| seeking | Fired when the user starts moving/skipping to a new position in the audio/video |
| seeked | Fired when the user has moved/skipped to a new position in the audio/video |
| waiting | Fired when the video stops because it needs to buffer the next frame |
| ended | Fired when the current playlist has ended |
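A minimal usage sketch of the API above, assuming the page already contains one of the <video> elements from the earlier example (the MP4 path is just a placeholder):

const video = document.querySelector('video');

// canPlayType() returns "", "maybe", or "probably"
if (video.canPlayType('video/mp4')) {
  video.src = './video.mp4';
}

// timeupdate fires repeatedly as the playback position advances
video.addEventListener('timeupdate', () => {
  console.log(`progress: ${video.currentTime.toFixed(1)} / ${video.duration} s`);
});

// play() returns a Promise; autoplay policies may reject it
video.play().catch(err => console.warn('Playback was blocked:', err));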

Limitations

  • The audio and video elements cannot directly play HLS, FLV, and other such streaming formats
  • Video resource requests and loading cannot be controlled through code, making the following features impossible:
    • Segment loading (saving bandwidth)
    • Seamless quality switching
    • Precise preloading

MSE (Extended API)

Media Source Extensions API

  • Plugin-free streaming media playback on the web

  • Supports playback of HLS, FLV, MP4, and other video formats

  • Enables video segment loading, seamless quality switching, adaptive bitrate, precise preloading, etc.

  • Basically supported by mainstream browsers, except Safari on iOS

image.png

  1. Create a MediaSource instance
  2. Create a URL pointing to the MediaSource and assign it to the video element
  3. Listen for the sourceopen event
  4. Create a SourceBuffer
  5. Append media data to the SourceBuffer
  6. Listen for the updateend event (a minimal sketch of these steps follows below)
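A minimal sketch of these six steps in JavaScript (the codec string and segment URL are placeholders; a real player would also queue appends, evict old buffer ranges, and handle errors):

const mime = 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"'; // placeholder codec string
const video = document.querySelector('video');

if ('MediaSource' in window && MediaSource.isTypeSupported(mime)) {
  // 1. Create a MediaSource instance
  const mediaSource = new MediaSource();

  // 2. Create a URL pointing to the MediaSource and hand it to the video element
  video.src = URL.createObjectURL(mediaSource);

  // 3. Listen for the sourceopen event
  mediaSource.addEventListener('sourceopen', async () => {
    // 4. Create a SourceBuffer for the given MIME type
    const sourceBuffer = mediaSource.addSourceBuffer(mime);

    // 6. Listen for updateend, fired after each appendBuffer() finishes
    sourceBuffer.addEventListener('updateend', () => {
      if (!sourceBuffer.updating && mediaSource.readyState === 'open') {
        mediaSource.endOfStream(); // no more segments in this tiny demo
      }
    });

    // 5. Fetch a (fragmented MP4) segment and append it to the SourceBuffer
    const segment = await fetch('./segment.mp4').then(res => res.arrayBuffer());
    sourceBuffer.appendBuffer(segment);
  });
}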


  • Player playback flow


Streaming Protocols


HLS stands for HTTP Live Streaming, an HTTP-based media streaming protocol proposed by Apple for real-time audio and video streaming transmission. Currently, the HLS protocol is widely used in video-on-demand and live streaming.
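As a reference point, an HLS media playlist (.m3u8) is just a plain-text index of media segments; the segment names below are made up:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:9.8,
segment0.ts
#EXTINF:9.9,
segment1.ts
#EXTINF:9.7,
segment2.ts
#EXT-X-ENDLIST

The player fetches this playlist over ordinary HTTP, then downloads and plays the listed segments in order; for live streams the playlist is refreshed periodically as new segments are appended.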

Application Scenarios


  • VOD/Live streaming -> Video upload -> Video transcoding
  • Images -> Supporting new image formats
  • Cloud gaming -> no need to download a bulky client; the game runs on remote servers and the rendered video stream is sent back to the player (very strict latency requirements)

Summary and Reflections

This lesson introduced the basic concepts of Web multimedia technology, such as encoding formats, container formats, multimedia elements, and streaming protocols, and described various application scenarios for Web multimedia.

Most of the content cited in this article comes from Teacher Liu Liguo’s class and MDN

If you enjoyed this post, feel free to leave a comment~
