SIGNeRF

Scene Integrated Generation for Neural Radiance Fields

SIGNeRF generatively edits NeRF scenes in a controlled and fast manner.

Overview

We propose SIGNeRF, a novel approach for fast and controllable NeRF scene editing and scene-integrated object generation. We introduce a new generative update strategy that ensures 3D consistency across the edited images, without requiring iterative optimization. We find that depth-conditioned diffusion models inherently possess the capability to generate 3D consistent views by requesting a grid of images instead of single views. Based on these insights, we introduce a multi-view reference sheet of modified images. Our method updates an image collection consistently based on the reference sheet and refines the original NeRF with the newly generated image set in one go. By exploiting the depth conditioning mechanism of the image diffusion model, we gain fine control over the spatial location of the edit and enforce shape guidance by a selected region or an external mesh.
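To make the grid mechanism concrete, here is a minimal sketch (a simplified illustration, not the SIGNeRF implementation) of tiling multi-view renders into a single image and splitting the result back into views, so that one pass of the diffusion model sees all views at once:

# Minimal sketch of the image-grid idea; plain NumPy, no diffusion model.
import numpy as np

def make_grid(images, rows, cols):
    """Tile a list of (H, W, 3) arrays into one (rows*H, cols*W, 3) grid."""
    h, w, c = images[0].shape
    grid = np.zeros((rows * h, cols * w, c), dtype=images[0].dtype)
    for i, img in enumerate(images):
        r, col = divmod(i, cols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = img
    return grid

def split_grid(grid, rows, cols):
    """Inverse of make_grid: recover the individual per-view images."""
    h, w = grid.shape[0] // rows, grid.shape[1] // cols
    return [grid[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(rows) for c in range(cols)]

Because the tiles form a single image, the diffusion model denoises them jointly, which is what yields the cross-view consistency exploited here.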

Pipeline

We leverage the strengths of ControlNet, a depth-conditioned image diffusion model, to edit an existing NeRF scene in a few simple steps within a single forward pass. We start with an original NeRF scene (0) and select an editing method / region (1). For proxy object selection, we place a mesh object into the scene (2), thereby controlling the precise location and shape of the edit. We position reference cameras in the scene (3), render the corresponding color, depth, and mask images, and arrange them into image grids (4). These grids are used to generate the reference sheet with conditioned image diffusion (5). To propagate the edits to the entire image set, for each camera a color, depth, and mask image are rendered and placed into the empty slot of the fixed reference sheet, and a new edited image consistent with the reference sheet is generated by leveraging an inpainting mask; this step is repeated for all cameras (6). Finally, the NeRF is fine-tuned on the edited images (7). For a more detailed explanation, we recommend watching our explanation video.
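The propagation loop (steps 3 to 7) can be sketched as follows. This is a hedged illustration under assumed interfaces: render_rgbd_mask and diffusion_inpaint are hypothetical stand-ins for a NeRF renderer and a depth-conditioned inpainting diffusion model, not the released SIGNeRF API.

# Hedged sketch of the reference-sheet propagation loop; all callables
# passed in are hypothetical placeholders, not the released SIGNeRF code.
from typing import Callable, List, Tuple
import numpy as np

def propagate_edits(
    cameras: list,                # all training cameras
    sheet_rgb: np.ndarray,        # fixed reference sheet, (R*H, C*W, 3)
    sheet_depth: np.ndarray,      # matching depth sheet, (R*H, C*W)
    empty_slot: Tuple[int, int],  # (row, col) slot kept free in the sheet
    tile_hw: Tuple[int, int],     # (H, W) of a single tile
    render_rgbd_mask: Callable,   # cam -> (color, depth, edit_mask)
    diffusion_inpaint: Callable,  # (rgb, depth, mask, prompt) -> rgb
    prompt: str,
) -> List[np.ndarray]:
    """Inpaint each camera's view inside the fixed reference sheet so the
    edit stays consistent with the already generated reference views."""
    h, w = tile_hw
    r0, c0 = empty_slot[0] * h, empty_slot[1] * w
    edited = []
    for cam in cameras:
        color, depth, mask = render_rgbd_mask(cam)
        rgb = sheet_rgb.copy()
        dep = sheet_depth.copy()
        rgb[r0:r0 + h, c0:c0 + w] = color   # drop render into the empty slot
        dep[r0:r0 + h, c0:c0 + w] = depth
        # Only the edit region inside the empty slot is inpainted; the
        # surrounding reference tiles condition the generation.
        inpaint_mask = np.zeros(rgb.shape[:2], dtype=bool)
        inpaint_mask[r0:r0 + h, c0:c0 + w] = mask
        out = diffusion_inpaint(rgb, dep, inpaint_mask, prompt)
        edited.append(out[r0:r0 + h, c0:c0 + w].copy())
    return edited  # the NeRF is then fine-tuned on these images

The key design point is that the reference sheet stays fixed while only the empty slot is inpainted, so every per-camera edit is conditioned on the same set of already-edited reference views.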

Results

SIGNeRF can either generate new objects into an existing NeRF scene or edit objects that already exist within it. We achieve this in a controllable manner through either proxy object placement or shape selection; in both cases, a text prompt specifies the generative outcome. We show results for both below. For more details and comparisons to other methods, we refer to our Results & Comparisons video.

Object Generation

SIGNeRF can generate new objects into an existing NeRF scene. For object generation, a proxy object is used, enabling us to precisely pick the location where the object should appear and to guide the generation with its shape.

Original Scene

Edited Scene — "A brown bunny"

Original Scene

Edited Scene — "A brown cow"

Object Editing

SIGNeRF also allows editing objects that already exist within the NeRF scene. For object editing, a bounding box is used to specify the parts of the scene that should be changed. This makes it possible to edit only sections of scene objects (e.g. the body of the person is selected but not the face).

Edited Scene — "A grizzly bear"

Original Scene

Iron Man Plushy

Mouse Tiger

Golden Mouse

Original Scene

Sport Outfit

Pirate Outfit

Batman Outfit

Citation

@inproceedings{signerf,
  author    = {Dihlmann, Jan-Niklas and Engelhardt, Andreas and Lensch, Hendrik P.A.},
  title     = {SIGNeRF: Scene Integrated Generation for Neural Radiance Fields},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
  pages     = {6679-6688}
}

More Information

Open Positions

Interested in pursuing a PhD in computer graphics?

Never miss an update

Join us on Twitter / X for the latest updates from our research group and more.

Acknowledgements

This work has been partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy - EXC number 2064/1 - project number 390727645 and SFB 1233 - project number 276693517. It was supported by the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A, and Cyber Valley.