Creating a Synthetic Dataset with Stable Diffusion

Project Overview

In this project, I focused on generating a synthetic dataset for emotion recognition using Stable Diffusion and the diffusers library from Hugging Face. This initiative is part of the “Computer Vision in 30 Days Coding Challenge” by Felipe (Computer Vision Engineer on YouTube). The goal was to create images of people displaying various emotions, which I plan to use later, together with MediaPipe Face Detection, to train an emotion classifier.

Having spent over eight years creating synthetic data in the form of CGI for film and TV as a VFX artist, the idea of generating visually plausible imagery has always been close to my heart. The integration of machine learning and diffusion technologies has profoundly impacted the VFX industry, speeding up workflows and enhancing our capabilities. By creating synthetic data with a focus on realism, I see an exciting opportunity to bridge my VFX background with computer vision.

The Process

Using Google Colab for GPU support, I leveraged the diffusers library to create synthetic images of human faces displaying three core emotions: happy, sad, and surprised. By experimenting with both prompts and negative prompts, I refined the generation process to produce photorealistic imagery suitable for training machine learning models.
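
For reference, the generation step looks roughly like this. It is a minimal sketch of the diffusers API, assuming the publicly available runwayml/stable-diffusion-v1-5 checkpoint; the exact checkpoint, prompt wording, and sampler settings I used may differ.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained Stable Diffusion checkpoint (the model ID here is
# an assumption; any SD v1.x checkpoint on the Hugging Face Hub works).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # Colab GPU runtime

# Illustrative prompt/negative-prompt pair for one emotion; careful
# wording here is what drives the realism of the output.
prompt = (
    "photorealistic portrait photo of a person with a happy expression, "
    "natural lighting"
)
negative_prompt = "cartoon, illustration, deformed face, blurry, low quality"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]

image.save("happy_000.png")
```

Wrapping this call in a loop over emotions and random seeds is enough to build up hundreds of candidate images per class.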

Challenges and Solutions

While generating images for the happy and sad emotions was relatively straightforward, surprised expressions proved to be a challenge for Stable Diffusion. Approximately 40% of the surprised images lacked the realism required for a balanced and useful dataset. To address this imbalance, I curated the dataset by removing a similar percentage of images from the other categories, resulting in a final dataset of 152 images per emotion.
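
The subsampling step after manual curation can be expressed in a few lines. The sketch below is hypothetical (the folder names and layout are my illustration, not the project's actual script): it trims every emotion folder down to the size of the smallest class so the dataset stays balanced.

```python
import os
import random
import shutil

emotions = ["happy", "sad", "surprised"]
source_root = "generated"   # curated images, one subfolder per emotion
target_root = "dataset"     # balanced output

# The smallest class determines the per-class target
# (152 images per emotion in this project).
counts = {e: len(os.listdir(os.path.join(source_root, e))) for e in emotions}
target = min(counts.values())

random.seed(42)  # reproducible selection
for emotion in emotions:
    files = sorted(os.listdir(os.path.join(source_root, emotion)))
    os.makedirs(os.path.join(target_root, emotion), exist_ok=True)
    for name in random.sample(files, target):
        shutil.copy(
            os.path.join(source_root, emotion, name),
            os.path.join(target_root, emotion, name),
        )
```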

Key Learnings

  • Prompt Engineering Matters: Carefully crafting prompts and negative prompts significantly impacted the quality of generated images. I spent time refining these parameters to achieve more realistic results.

  • Balancing the Dataset: Achieving a balanced dataset required both careful generation and curation, especially when faced with challenges in generating convincing images for specific emotions.

  • Bridging Skills: My experience in creating synthetic CGI as a VFX artist translated well into generating synthetic datasets for machine learning, highlighting the interconnected nature of these domains.

GitHub: https://github.com/BrandtBrandtBrandt/stable-diffusion-synthetic-data-images.git

What’s Next?

In a future step, I will use this dataset to train an emotion recognition classifier and continue exploring the intersection of VFX, synthetic data generation, and machine learning.

Acknowledgment

This project was part of the Computer Vision in 30 Days Coding Challenge by Felipe, also known as the Computer Vision Engineer on YouTube.
