Abstract

We present an approach to synthesize highly photorealistic images of 3D object models, which we use to train a convolutional neural network for detecting the objects in real images. The proposed approach has three key ingredients: (1) 3D object models are rendered in 3D models of complete scenes with realistic materials and lighting, (2) plausible geometric configurations of objects and cameras in a scene are generated using physics simulation, and (3) high photorealism of the synthesized images is achieved by physically based rendering. When trained on images synthesized by the proposed approach, the Faster R-CNN object detector achieves a 24% absolute improvement in mAP@.75IoU on the Rutgers APC dataset and an 11% improvement on LineMod-Occluded, compared to a baseline where the training images are synthesized by rendering object models on top of random photographs. This work is a step towards effectively training object detectors without capturing or annotating any real images.
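To illustrate ingredient (2), the sketch below shows one common way to obtain physically plausible object configurations: object meshes are dropped onto a support surface in a physics simulation and left to settle, and their resting 6D poses are then handed to the renderer. This is a minimal sketch using PyBullet; the mesh file names, masses, and scene parameters are illustrative placeholders and do not describe the authors' exact pipeline.

```python
# Minimal sketch: drop object meshes into a scene with PyBullet and record
# their resting poses. Mesh paths and parameters are illustrative placeholders.
import random
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                                    # headless physics simulation
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")                               # support surface (e.g. a table top)

object_meshes = ["obj_000001.obj", "obj_000002.obj"]   # hypothetical file names
bodies = []
for mesh in object_meshes:
    col = p.createCollisionShape(p.GEOM_MESH, fileName=mesh)
    vis = p.createVisualShape(p.GEOM_MESH, fileName=mesh)
    # Spawn each object slightly above the surface with a random initial pose.
    pos = [random.uniform(-0.2, 0.2), random.uniform(-0.2, 0.2), random.uniform(0.3, 0.6)]
    orn = p.getQuaternionFromEuler([random.uniform(0.0, 3.14) for _ in range(3)])
    body = p.createMultiBody(baseMass=0.2,
                             baseCollisionShapeIndex=col,
                             baseVisualShapeIndex=vis,
                             basePosition=pos,
                             baseOrientation=orn)
    bodies.append(body)

# Let the objects fall and settle into a physically plausible configuration.
for _ in range(1000):
    p.stepSimulation()

# The resting 6D poses (position + quaternion) can then be passed to the renderer.
for body in bodies:
    pos, orn = p.getBasePositionAndOrientation(body)
    print(body, pos, orn)

p.disconnect()
```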

Dataset

By using the dataset, you accept these license terms.

Examples of PBR images

Below are examples of high-quality PBR images of the LineMod objects in Scenes 1–5 (top five rows), and images of the Rutgers APC objects in Scene 6 (bottom row). The images were automatically annotated with 2D bounding boxes, masks and 6D poses of the visible object instances.
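The annotation format is not detailed on this page; the snippet below is only a sketch of how such annotations might be consumed, assuming a hypothetical JSON file with one entry per image and per-instance bounding boxes. The file name and all field names are assumptions, not the dataset's actual layout.

```python
# Sketch: load per-image annotations and draw the 2D bounding boxes.
# "annotations.json" and its fields are hypothetical; consult the dataset
# documentation for the actual annotation format.
import json
from PIL import Image, ImageDraw

with open("annotations.json") as f:
    annotations = json.load(f)

for entry in annotations:                    # assumed: one entry per rendered image
    image = Image.open(entry["image_path"]).convert("RGB")
    draw = ImageDraw.Draw(image)
    for inst in entry["instances"]:          # assumed: visible object instances
        x, y, w, h = inst["bbox"]            # assumed: [x, y, width, height]
        draw.rectangle([x, y, x + w, y + h], outline="red", width=2)
        # Each instance may also carry a segmentation mask and a 6D pose (R, t).
    image.save(entry["image_path"].replace(".jpg", "_vis.jpg"))
```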