Removal of unwanted objects in images: The use of semantic segmentation

Published on July 1, 2023 | Alborz Sabet

Semantic segmentation is one of the key tasks in computer vision that has been revolutionized by deep learning. At its core, semantic segmentation involves classifying each pixel in an image into a predefined category. This pixel-wise classification is particularly useful for detailed image editing tasks, such as object removal. Modern semantic segmentation models are typically built using Convolutional Neural Networks (CNNs). These networks are trained on large datasets with annotated images where each pixel is labeled with a class. The CNN learns to recognize patterns and textures associated with different objects, allowing it to accurately predict the class of each pixel in new, unseen images.
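
To make this concrete, here is a minimal sketch that runs a pretrained DeepLabV3 model from torchvision on a single image and takes the argmax over class scores to obtain the per-pixel segmentation map (the file name photo.jpg is a placeholder):

```python
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50
from PIL import Image

# Pretrained weights; older torchvision versions use pretrained=True instead.
model = deeplabv3_resnet50(weights="DEFAULT")
model.eval()

# Standard ImageNet normalization expected by the pretrained backbone.
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("photo.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)       # shape: [1, 3, H, W]

with torch.no_grad():
    logits = model(batch)["out"]             # shape: [1, num_classes, H, W]

# Pixel-wise classification: the argmax over the class dimension yields
# one label per pixel -- the segmentation map.
segmentation_map = logits.argmax(dim=1)[0]   # shape: [H, W]
```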

Architectures and Models

Several architectures have been influential in advancing semantic segmentation:

  • Fully Convolutional Networks (FCNs): By replacing the fully connected layers of a classification network with convolutional layers, FCNs can accept input of any size and output a dense segmentation map.
  • U-Net: Designed for biomedical image segmentation, U-Net architecture features a contracting path to capture context and a symmetric expanding path for precise localization.
  • DeepLab: DeepLab models use atrous (dilated) convolutions to capture multi-scale context by applying multiple parallel filters with different dilation rates (a minimal sketch of this idea follows the list).
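
To illustrate the atrous idea, below is a minimal PyTorch sketch of an ASPP-style module; the channel sizes are arbitrary, and the dilation rates (6, 12, 18) follow values commonly used in the DeepLab papers:

```python
import torch
import torch.nn as nn

class MiniASPP(nn.Module):
    """Toy Atrous Spatial Pyramid Pooling: parallel 3x3 convolutions with
    different dilation rates see different amounts of context while the
    spatial resolution stays unchanged."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            # padding == dilation keeps the output the same size as the input
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        features = [branch(x) for branch in self.branches]
        return self.project(torch.cat(features, dim=1))

x = torch.randn(1, 256, 32, 32)
print(MiniASPP(256, 64)(x).shape)  # torch.Size([1, 64, 32, 32])
```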

Object Removal with Semantic Segmentation

In the context of object removal, these models work as follows:

  1. Segmentation Map Generation: When an image is fed into the model, a segmentation map is generated, assigning a label to each pixel.
  2. Target Object Isolation: The target object to be removed is isolated based on the segmentation map. This could be done automatically by the model if trained for specific object classes or manually by the user.
  3. Masking and Inpainting: The isolated object is then masked, and inpainting techniques are applied. Inpainting uses information from the surrounding pixels to fill in the gap left by the removed object, maintaining the coherence of the background (see the sketch after this list).
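
Putting steps 2 and 3 together, the sketch below isolates one class from a segmentation map and fills the hole with OpenCV's built-in Telea inpainting. The file names are placeholders, and it assumes the segmentation map from the earlier sketch was saved as a NumPy array; class 15 is "person" in the Pascal VOC label set used by the torchvision weights:

```python
import cv2
import numpy as np

TARGET_CLASS = 15  # "person" in the Pascal VOC label set (assumption)

image = cv2.imread("photo.jpg")
# Per-pixel class labels (H x W), e.g. saved from the segmentation sketch.
segmentation_map = np.load("segmentation_map.npy")

# Step 2: isolate the target object as a binary mask.
mask = (segmentation_map == TARGET_CLASS).astype(np.uint8) * 255

# Dilate the mask slightly so inpainting also covers the object's soft
# edges, which segmentation boundaries often miss.
mask = cv2.dilate(mask, np.ones((7, 7), np.uint8))

# Step 3: fill the hole from surrounding pixels (Telea's method; OpenCV
# also ships cv2.INPAINT_NS as an alternative).
result = cv2.inpaint(image, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
cv2.imwrite("photo_object_removed.jpg", result)
```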

[Figure: the same image before and after AI object removal]

Inpainting Techniques

Inpainting can be handled by traditional algorithms such as PatchMatch or Telea's method, but deep learning offers more advanced solutions:

  1. Generative Adversarial Networks (GANs): GANs can generate realistic textures and patterns that blend seamlessly with the surrounding area.
  2. Partial Convolutions: Proposed by NVIDIA, this technique uses convolutional layers that condition their output only on valid (unmasked) pixels and update the mask after each layer, making it well suited to the irregular holes left after object removal (a simplified sketch follows this list).
  3. Adobe Firefly: A cutting-edge tool from Adobe, Firefly utilizes advanced machine learning algorithms for content-aware fill tasks. It excels at understanding the context of an image, enabling it to fill gaps left by object removal with textures and elements that are visually coherent with the surrounding area. Firefly's strength lies in its ability to adapt to various image styles and complexities, offering a powerful solution for both amateur and professional photo editing.
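
For the curious, here is a simplified, illustrative PyTorch sketch of the partial-convolution idea; NVIDIA's actual implementation handles biases and per-channel masks more carefully:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    """Simplified partial convolution: the convolution sees only valid
    (unmasked) pixels, the output is renormalized by how many valid pixels
    fell under the kernel, and the mask shrinks after every layer."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=padding, bias=False)
        # Fixed all-ones kernel used to count valid pixels per window.
        self.register_buffer("ones",
                             torch.ones(1, 1, kernel_size, kernel_size))
        self.padding = padding

    def forward(self, x, mask):
        # mask: 1 for valid pixels, 0 inside the hole, shape [N, 1, H, W].
        valid = F.conv2d(mask, self.ones, padding=self.padding)
        out = self.conv(x * mask)
        # Renormalize by the fraction of valid pixels under each window.
        out = out * (self.ones.numel() / valid.clamp(min=1.0)) * (valid > 0)
        return out, (valid > 0).float()  # the hole shrinks layer by layer

x = torch.randn(1, 3, 64, 64)
mask = torch.ones(1, 1, 64, 64)
mask[:, :, 20:40, 20:40] = 0           # a square hole to be inpainted
out, new_mask = PartialConv2d(3, 16)(x, mask)
```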

Training and Challenges

Training semantic segmentation models requires careful consideration:

  • Data Quality: High-quality data with pixel-accurate labels is essential.
  • Class Imbalance: Some classes appear far more often than others, which can bias the model toward them; a common remedy is to weight the loss by inverse class frequency (see the sketch after this list).
  • Boundary Precision: Achieving high precision around the boundaries of objects is a challenging task, often addressed by combining semantic segmentation with edge detection.
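
As an example of mitigating class imbalance, the sketch below weights PyTorch's cross-entropy loss by inverse class frequency; the per-class pixel counts are made up for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical pixel counts per class over the training set: the first
# class dominates, the last is rare, so an unweighted loss would be biased.
pixel_counts = torch.tensor([5_000_000.0, 800_000.0, 20_000.0])

# Inverse-frequency weights, normalized so the loss scale stays comparable.
weights = 1.0 / pixel_counts
weights = weights / weights.sum() * len(weights)

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(2, 3, 64, 64)           # [batch, classes, H, W]
targets = torch.randint(0, 3, (2, 64, 64))   # per-pixel class labels
loss = criterion(logits, targets)
```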

By leveraging these advanced deep learning techniques, software tools can perform object removal with astonishing accuracy and speed, saving countless hours of manual editing and allowing professionals to focus on the creative aspects of image manipulation.

Adobe Firefly - Object removal with generative fill

Before: ropes getting in the way of the frame
After: ropes removed from the image frame

One intriguing aspect of using advanced tools like Adobe Firefly for object removal is the phenomenon of 'hallucinated' content during the inpainting process. When filling in the gaps left by removed objects, these algorithms don't just replicate surrounding textures and patterns; they often generate entirely new content that wasn't in the original image. This 'hallucination' happens as the model predicts what could plausibly exist in the removed object's place, based on the context and structure of the surrounding area. This step is particularly fascinating, as it's where the model's creative capabilities come into play, often leading to surprisingly realistic and seamless results. The following images showcase this process, where an intermediate step reveals a unique approach to reconstructing the image before finalizing the removal seamlessly.

Before
Intermediate step
After