Facial Sentiment Analysis as user input

Kevin Wong
4 min read · Dec 13, 2023


AI facial sentiment analysis — DALL-E 3 generated

Looking ahead to a future with AI, it’s intriguing to consider how generative AI could dynamically analyze and respond to our emotions in real time. This includes interpreting facial expressions, speech nuances (such as pitch and tone), spoken words, and physical actions. With trained models, generative AI could potentially predict and understand individuals better than they understand themselves.

While exploring TensorFlow and computer vision ML models to implement a blurred-background video feed, I experimented with various TensorFlow models. One particularly fascinating project is face-api.js, which simplifies building AI applications that identify facial features, using only TensorFlow.js, TensorFlow’s in-browser engine. These solutions run entirely within the web browser, so no part of the analysis depends on backend requests.

Let’s look under the hood:

face-api.js is a library that uses several pre-trained models, each handling a specific facial recognition task.

The face-api.js GitHub repository describes the project as follows:

JavaScript API for face detection and face recognition in the browser implemented on top of the tensorflow.js core API

The API allows for:

  1. Detecting and tracking faces in a given image
  2. Detecting face landmarks (where the eyes are, where the nose is, where the corners of the mouth are, etc.)
  3. Recognizing familiar faces (think of crime movies where CCTV tracks a subject’s face on video and the system raises an alert once it identifies the person of interest)
  4. Estimating gender and age, and detecting distinctive features of a face (for example, when a wrinkle is detected, what does that mean? Or if there’s a scar, what is it?)
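Recognizing familiar faces (item 3) usually works by turning each face into a numeric descriptor and comparing descriptors by distance. As I understand it, face-api.js computes a 128-value descriptor per face, and its FaceMatcher compares descriptors by Euclidean distance with a default threshold of 0.6. The TypeScript below is a minimal, library-free sketch of that matching step under those assumptions; `matchFace`, `LabeledDescriptor`, and the short 3-value descriptors in the usage note are hypothetical, chosen only for illustration.

```typescript
// Hypothetical type: a face descriptor (face-api.js produces a
// 128-value Float32Array per face; any length works here).
type Descriptor = Float32Array | number[];

// Hypothetical type: a known person's name paired with a reference descriptor.
interface LabeledDescriptor {
  label: string;
  descriptor: Descriptor;
}

// Euclidean distance between two descriptors of equal length.
function euclideanDistance(a: Descriptor, b: Descriptor): number {
  if (a.length !== b.length) {
    throw new Error("Descriptors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Hypothetical helper: return the closest known face, or "unknown" when even
// the closest one is not within the threshold (0.6 is assumed to mirror
// face-api.js's FaceMatcher default).
function matchFace(
  query: Descriptor,
  known: LabeledDescriptor[],
  threshold = 0.6
): { label: string; distance: number } {
  let best = { label: "unknown", distance: Number.POSITIVE_INFINITY };
  for (const k of known) {
    const distance = euclideanDistance(query, k.descriptor);
    if (distance < best.distance) {
      best = { label: k.label, distance };
    }
  }
  return best.distance < threshold
    ? best
    : { label: "unknown", distance: best.distance };
}
```

For example, `matchFace([0.1, 0, 0], [{ label: "alice", descriptor: [0, 0, 0] }])` returns the label "alice" with a distance of about 0.1, while a descriptor far from every known face comes back as "unknown". A lower threshold makes matching stricter, at the cost of missing the same person under different lighting or angles.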

The models

The following face detection models locate the faces that the features above then analyze:

  • SSD Mobilenet V1
  • Tiny Face Detector
  • Multi-Task Cascaded Convolutional Neural Networks (MTCNN)

SSD Mobilenet V1

SSD MobileNet V1 is a computer vision model for object detection, commonly used in image and video analysis. In simple terms, it can identify and locate multiple objects within an image or video frame. It combines MobileNet, a lightweight convolutional neural network (CNN) for image classification, with the Single Shot MultiBox Detector (SSD) architecture, which predicts object locations and classes in a single pass over the image. This lets it recognize and pinpoint objects efficiently in real time, making it suitable for tasks such as autonomous vehicles, surveillance, and augmented reality. In face-api.js, this detector favors accuracy over inference speed.
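SSD-style detectors typically emit many overlapping candidate boxes for the same face, each with a confidence score, and a post-processing step called non-maximum suppression (NMS) prunes them to one box per face. The sketch below is a generic greedy NMS, not face-api.js’s own code. The parameter names `minConfidence` and `maxResults` echo options of face-api.js’s SsdMobilenetv1Options, while `iouThreshold` and all default values here are assumptions.

```typescript
// Axis-aligned box in pixel coordinates with a detection confidence.
interface Box {
  x: number; // left edge
  y: number; // top edge
  width: number;
  height: number;
  score: number;
}

// Intersection over union: overlap area divided by the combined area.
function iou(a: Box, b: Box): number {
  const left = Math.max(a.x, b.x);
  const top = Math.max(a.y, b.y);
  const right = Math.min(a.x + a.width, b.x + b.width);
  const bottom = Math.min(a.y + a.height, b.y + b.height);
  const inter = Math.max(0, right - left) * Math.max(0, bottom - top);
  const union = a.width * a.height + b.width * b.height - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy NMS: drop low-confidence candidates, visit the rest from highest to
// lowest score, and keep a box only if it does not overlap any kept box by
// more than iouThreshold.
function nonMaxSuppression(
  boxes: Box[],
  minConfidence = 0.5,
  iouThreshold = 0.5,
  maxResults = 100
): Box[] {
  const candidates = boxes
    .filter((b) => b.score >= minConfidence)
    .sort((a, b) => b.score - a.score);
  const kept: Box[] = [];
  for (const box of candidates) {
    if (kept.length >= maxResults) break;
    if (kept.every((k) => iou(k, box) <= iouThreshold)) {
      kept.push(box);
    }
  }
  return kept;
}
```

With these defaults, two boxes overlapping by more than half (IoU above 0.5) collapse into the higher-scoring one. Raising `iouThreshold` keeps more boxes for tightly packed faces, at the risk of duplicate detections.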

Tiny Face Detector

The Tiny Face Detector is a lightweight computer vision model designed to locate faces in images quickly. The “Tiny” refers to the model itself rather than to the faces: it is much smaller, faster, and less resource-hungry than SSD MobileNet V1, which makes it well suited to real-time processing on devices with limited resources. The trade-off is that it is somewhat less accurate at detecting small or distant faces, so SSD MobileNet V1 can be the better choice for crowded scenes.
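The size-versus-accuracy trade-off comes down to simple arithmetic. face-api.js’s TinyFaceDetectorOptions takes an `inputSize` that must be divisible by 32 (smaller is faster but less precise). Assuming the frame is scaled so that its longer side equals `inputSize`, the sketch below estimates how many pixels a face occupies at detection time; both helpers are hypothetical.

```typescript
// Hypothetical helper: round a requested size to the nearest positive
// multiple of 32, since inputSize must be divisible by 32.
function toValidInputSize(requested: number): number {
  return Math.max(32, Math.round(requested / 32) * 32);
}

// Hypothetical helper: approximate face width in pixels after the frame is
// scaled so that its longer side equals inputSize (an assumption about how
// the input is prepared).
function scaledFaceWidth(
  faceWidthPx: number,
  frameWidth: number,
  frameHeight: number,
  inputSize: number
): number {
  const scale = inputSize / Math.max(frameWidth, frameHeight);
  return faceWidthPx * scale;
}
```

For example, a face 40 pixels wide in a 1280 x 720 webcam frame shrinks to about 5 pixels at an `inputSize` of 160 but about 10 pixels at 320, which is why a larger input size helps with small or distant faces.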

MTCNN

MTCNN, or Multi-task Cascaded Convolutional Networks, is a deep learning model for face detection. In simple terms, it identifies faces in images by passing them through three cascaded networks that gradually refine the predictions: the first stage proposes candidate face regions, and the next two stages reject false positives and tighten the bounding boxes. Alongside each face, it also predicts five facial landmarks (the two eyes, the nose, and the two corners of the mouth). MTCNN is widely used in facial recognition, video analysis, and image processing to automatically detect and extract faces from pictures or video frames.
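The first MTCNN stage scans the image with a fixed 12 x 12 window, so faces of different sizes are found by running it over an image pyramid: progressively smaller copies of the frame. Below is a minimal sketch of how those scales are commonly computed. The defaults (minimum face size of 20 pixels, scale factor of 0.709, at most 10 scales) are typical MTCNN settings and should be treated as assumptions rather than face-api.js’s exact behavior.

```typescript
// Compute the image-pyramid scales for MTCNN's first stage, which slides a
// fixed 12x12 window over each downscaled copy of the frame.
function pyramidScales(
  width: number,
  height: number,
  minFaceSize = 20,
  scaleFactor = 0.709,
  maxNumScales = 10
): number[] {
  const windowSize = 12;
  // Start at the scale that maps a face of minFaceSize pixels onto the window.
  let scale = windowSize / minFaceSize;
  let minSide = Math.min(width, height) * scale;
  const scales: number[] = [];
  // Keep shrinking until the image is smaller than the window or the cap is hit.
  while (minSide >= windowSize && scales.length < maxNumScales) {
    scales.push(scale);
    scale *= scaleFactor;
    minSide *= scaleFactor;
  }
  return scales;
}
```

For a 24 x 24 image with a 12-pixel minimum face size, this yields three scales (1, 0.709, and about 0.503) before the downscaled image becomes smaller than the window. Raising the minimum face size skips the largest scales, which is the usual way to make MTCNN faster.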

face-api.js has a live demo playground where you can explore what each model is capable of:

https://justadudewhohacks.github.io/face-api.js/

Pre-trained models

If you are looking for pre-trained models to use with face-api.js, they are included in the project’s repository at:

Live Demo

I had an idea of my own: analyze my own sentiment through the webcam in a CodeSandbox application that I built from scratch.

https://codesandbox.io/p/github/pragmaticgeek/face-sentiment-analysis

In the demo, the app takes the webcam video feed and applies face-api.js’s face detection and face landmark detection APIs to detect a face. Once it identifies a face, it maps the facial landmarks to a sentiment to determine the emotion.
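Raw per-frame predictions from a webcam tend to flicker from one emotion to another. Here is a minimal sketch of one way to stabilize them, assuming per-frame scores shaped like the output of face-api.js’s face expression model (one probability for each of seven labels: neutral, happy, sad, angry, fearful, disgusted, surprised). `SentimentSmoother`, `dominantExpression`, and the smoothing weight `alpha = 0.3` are hypothetical names and values for illustration, not the demo’s actual code.

```typescript
// The seven expression labels output by face-api.js's expression model.
const EXPRESSIONS = [
  "neutral", "happy", "sad", "angry", "fearful", "disgusted", "surprised",
] as const;
type Expression = (typeof EXPRESSIONS)[number];
// Hypothetical shape: one probability per expression for a single frame.
type ExpressionScores = Record<Expression, number>;

// Hypothetical helper: exponential moving average over per-frame scores, so
// a single noisy frame does not flip the displayed sentiment.
class SentimentSmoother {
  private readonly alpha: number; // weight given to the newest frame
  private state: ExpressionScores | null = null;

  constructor(alpha = 0.3) {
    this.alpha = alpha;
  }

  update(frame: ExpressionScores): ExpressionScores {
    const prev = this.state;
    const next: ExpressionScores = { ...frame };
    if (prev !== null) {
      for (const e of EXPRESSIONS) {
        next[e] = this.alpha * frame[e] + (1 - this.alpha) * prev[e];
      }
    }
    this.state = next;
    return { ...next };
  }
}

// Hypothetical helper: pick the expression with the highest score
// (ties go to the label listed first).
function dominantExpression(scores: ExpressionScores): Expression {
  let best: Expression = EXPRESSIONS[0];
  for (const e of EXPRESSIONS) {
    if (scores[e] > scores[best]) best = e;
  }
  return best;
}
```

Fed one "happy" frame followed by one "sad" frame, the smoothed label stays "happy" (0.7 versus 0.3); only after a second "sad" frame does it switch. A larger `alpha` reacts faster but flickers more.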

Parting Thoughts

Facial sentiment analysis is a very interesting technology for the future of user input to computer systems. With improved models and the right ideas, it could accomplish amazing things. Because the browser can run all the models itself, it opens the door to better web applications and new forms of user input.

For example, I worked on a freelance project where I helped a psychologist build an application to track his patients’ moods. What if we used this technology to enhance a mood tracker with daily video recordings, instead of having patients write a journal and track their own mood? Depressed patients might find it easier to monitor their psychological state this way.

What if we trained the model to be a lie detector? For example, authorities could analyze a recorded feed to judge whether a subject is lying based on their facial expressions.

What about a poker-playing bot? It could analyze facial sentiment, voice sentiment, and tracked actions to predict how to play a poker hand.

What about product design loops? Sit a user in front of a camera, run user testing on product prototypes, and apply sentiment analysis not just to their user experience (UX) journey on the screen, but also to capture their emotions while using your prototype. This would give a richer analysis for validating user testing assumptions.

As I continue to brainstorm the potential applications of facial sentiment analysis, it’s like a constant stream of light bulb moments illuminating the possibilities. I am genuinely excited about the future prospects of utilizing facial sentiment as a unique form of user input. The opportunities ahead are truly boundless.


Kevin Wong

Software Engineer and Technology Enthusiast based out of Vancouver, British Columbia. (https://www.linkedin.com/in/kevinkswong/)