Is Video Chat Blur Background using AI?

Kevin Wong
3 min read · Dec 8, 2023

--

Blurred video chat background (DALL·E 3 generated)

Have you ever wondered how the blurred background or virtual background works in video conferencing tools such as Zoom, Teams, or Google Meet? It’s a feature we often take for granted in the era of remote work. Did you know it is made possible by artificial intelligence (AI)?

How does it work?

Applying a blurred background to a live video feed involves a few key steps:

  1. Capture video from the camera.
  2. Use a computer vision model to locate the subject (in this case, the face and body).
  3. Replace the area around the subject with a blurred overlay or another image.
  4. Redraw the combined output to a canvas to produce the final visual output.
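The four steps above can be sketched in browser JavaScript. This is a minimal illustration, not the demo’s actual code: the `segmenter` argument stands in for a loaded body-segmentation model, and the compositing relies on standard canvas operations.

```javascript
// Sketch of the four-step pipeline. `segmenter` is assumed to be a loaded
// body-segmentation model; `video` and `canvas` are elements on the page.

// Step 1: capture video from the camera.
async function startCamera(video) {
  video.srcObject = await navigator.mediaDevices.getUserMedia({ video: true });
  await video.play();
}

// Steps 2-4: segment the frame, blur around the subject, redraw to canvas.
async function renderFrame(segmenter, video, canvas) {
  const ctx = canvas.getContext('2d');

  // Step 2: locate the subject with the computer vision model.
  const people = await segmenter.segmentPeople(video);
  const mask = await people[0].mask.toCanvasImageSource();

  // Step 3: keep the sharp subject where the mask marks a person...
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  ctx.globalCompositeOperation = 'destination-in';
  ctx.drawImage(mask, 0, 0, canvas.width, canvas.height);

  // ...then fill everything behind it with a blurred copy of the frame.
  ctx.globalCompositeOperation = 'destination-over';
  ctx.filter = 'blur(10px)';
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  ctx.filter = 'none';
  ctx.globalCompositeOperation = 'source-over';

  // Step 4: schedule the next frame to keep the output live.
  requestAnimationFrame(() => renderFrame(segmenter, video, canvas));
}
```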

Create the solution from scratch

In this article, we walk through each of these steps to arrive at a working solution:

  • Get a live webcam video feed.
  • Use a convolutional neural network (CNN) to identify the subject. Many models can do this; this article uses TensorFlow with its body-segmentation models: https://github.com/tensorflow/tfjs-models/tree/master/body-segmentation
  • Use HTML canvas features to combine the subject’s mask with the video.

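Loading such a model takes only a few lines. The sketch below follows the body-segmentation README (the MediaPipeSelfieSegmentation model with the tfjs runtime); the wrapper function name is mine.

```javascript
// Create a segmenter using TensorFlow's body-segmentation package.
// Model and config values are the library's documented options.
async function createSegmenter() {
  const bodySegmentation = await import('@tensorflow-models/body-segmentation');
  const model = bodySegmentation.SupportedModels.MediaPipeSelfieSegmentation;
  return bodySegmentation.createSegmenter(model, {
    runtime: 'tfjs',      // run fully in the browser via TF.js
    modelType: 'general', // 'landscape' is a lighter alternative
  });
}
```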
Demo

I have put together a Codesandbox demo to illustrate how this all comes together:

  1. Use react-webcam to retrieve a live feed from a webcam
  2. Use a TensorFlow body-segmentation model to identify the subject
  3. Use a canvas to apply the blur effect to everything outside the subject area
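For step 3, the body-segmentation package also ships a helper, drawBokehEffect, that blurs everything outside the mask in one call, so the manual compositing can be skipped. A per-frame loop might look like this (the function name and parameter values are illustrative):

```javascript
// Blur the background of each frame with the library's drawBokehEffect.
// `bodySegmentation` is the imported @tensorflow-models/body-segmentation
// module, passed in so the sketch stays self-contained.
async function blurLoop(bodySegmentation, segmenter, video, canvas) {
  const segmentation = await segmenter.segmentPeople(video);
  await bodySegmentation.drawBokehEffect(
    canvas, video, segmentation,
    0.5,  // foregroundThreshold: scores above this count as subject
    10,   // backgroundBlurAmount: blur radius in pixels
    3,    // edgeBlurAmount: softens the subject/background boundary
    false // flipHorizontal
  );
  requestAnimationFrame(() => blurLoop(bodySegmentation, segmenter, video, canvas));
}
```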

Live Demo on Codesandbox

https://codesandbox.io/p/github/pragmaticgeek/blurry-cam-demo

(Be sure to open the preview in another tab; as of writing, there are glitches running the demo in the Codesandbox preview panel.)

GitHub link

The solution

The demo running on a sample video from Videvo (https://www.videvo.net/video/scientist-woman-and-blood-sample-analysis/4834026/):

In the top left, the original video source is displayed; in the top right, the mask generated by TensorFlow’s body-segmentation model (indicated by the red portion). Each source frame is analyzed by the body-segmentation CNN model, trained through machine learning (ML), to identify the subject area. The lower part shows the mask applied with a blur effect on the area surrounding the red portion, which constructs the final output.

Notably, all the processing happens on the browser side using the WebGL 2.0 spec, eliminating the need for any backend services and minimizing latency; this makes it particularly efficient in browsers with GPU acceleration. The WebGPU API could speed up the segmentation further, but as of writing browser support is limited, so WebGL was used to back the TensorFlow processing instead.
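Backend selection is explicit in TF.js: `tf.setBackend` picks the execution backend, and `navigator.gpu` reveals whether the browser exposes WebGPU. A sketch of the fallback described above (the function name is mine):

```javascript
// Prefer WebGPU where the browser supports it; otherwise fall back to WebGL.
async function pickBackend() {
  const tf = await import('@tensorflow/tfjs-core');
  if (typeof navigator !== 'undefined' && navigator.gpu) {
    await import('@tensorflow/tfjs-backend-webgpu');
    await tf.setBackend('webgpu');
  } else {
    await import('@tensorflow/tfjs-backend-webgl');
    await tf.setBackend('webgl');
  }
  await tf.ready(); // resolves once the backend has initialized
  return tf.getBackend();
}
```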

Conclusion

Yes, computer vision models and machine learning (ML) play a crucial role, with libraries like TensorFlow used to separate subjects from their surroundings. Various CNN models, such as BlazePose and MoveNet, can be applied with differing effectiveness, which I plan to explore in another write-up. The next time you use a video conferencing tool like Zoom, remember that ML and AI are at work, masking out your background.

Written by Kevin Wong

Software Engineer and Technology Enthusiast based out of Vancouver, British Columbia. (https://www.linkedin.com/in/kevinkswong/)
