This post discusses the complexities of recording WebRTC calls and, more broadly, of recording web pages. It distinguishes three types of web capture: extracting text and images, taking screenshots at regular intervals or after specific events, and capturing a full A/V stream of everything the browser outputs. The focus is on the last of these: headless web page recording that captures the browser's complete audio and video output.
The post also considers what to record in a web app, arguing that the final recording should show what a passive participant would see, with no UI controls. It then examines the challenges of running remote screen capture jobs on commodity servers without GPUs, where all rendering falls to the CPU and performance suffers as a result.
The proposed solution is layered web page recording: identifying the different types of content within a web page and optimizing the rendering path for each type separately, so that recording performs reasonably on commodity servers. Daily's Video Component System (VCS) is designed for this kind of layered acceleration. Developers build React-based applications from its built-in components, each of which is guaranteed to take the appropriate acceleration path.
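To make the layered idea more concrete, here is a minimal TypeScript sketch of the classification step, assuming a recorder that can tag each piece of page content by kind. The layer kinds, the render-path names, and the `planLayer` and `planComposition` functions are all hypothetical and purely illustrative; they are not part of VCS or any Daily API.

```typescript
// Hypothetical content categories a layered recorder might distinguish.
type LayerKind = "video" | "image" | "text" | "graphics";

// Hypothetical rendering paths; the names are illustrative, not a real API.
type RenderPath =
  | "video-composite"
  | "cached-bitmap"
  | "text-raster"
  | "vector-raster";

interface Layer {
  id: string;
  kind: LayerKind;
  // True if the layer's pixels change on every output frame.
  changesEveryFrame: boolean;
}

interface RenderPlan {
  layerId: string;
  path: RenderPath;
  // Whether the rasterized result can be reused across frames.
  cacheable: boolean;
}

// Assign each layer its own rendering path instead of repainting
// the whole page as a single surface on every frame.
function planLayer(layer: Layer): RenderPlan {
  switch (layer.kind) {
    case "video":
      // Live video is routed to its own compositing path and never cached.
      return { layerId: layer.id, path: "video-composite", cacheable: false };
    case "image":
      return { layerId: layer.id, path: "cached-bitmap", cacheable: !layer.changesEveryFrame };
    case "text":
      return { layerId: layer.id, path: "text-raster", cacheable: !layer.changesEveryFrame };
    case "graphics":
      return { layerId: layer.id, path: "vector-raster", cacheable: !layer.changesEveryFrame };
    default: {
      // Compile-time exhaustiveness check: adding a new LayerKind without
      // handling it here becomes a type error.
      const unreachable: never = layer.kind;
      throw new Error(`Unknown layer kind: ${unreachable}`);
    }
  }
}

function planComposition(layers: Layer[]): RenderPlan[] {
  return layers.map(planLayer);
}

// Usage: a camera feed, a static logo, and a live caption.
const demo = planComposition([
  { id: "camera", kind: "video", changesEveryFrame: true },
  { id: "logo", kind: "image", changesEveryFrame: false },
  { id: "caption", kind: "text", changesEveryFrame: true },
]);
for (const p of demo) {
  console.log(`${p.layerId}: ${p.path}${p.cacheable ? " (cached)" : ""}`);
}
```

The design choice the sketch illustrates is that static layers can be rasterized once and reused, so on a CPU-only server the per-frame cost is concentrated in the content that actually changes, such as live video.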