TL;DR:

We investigate transferring the appearance from a source Neural Radiance Field (NeRF) to a target 3D geometry in a semantically meaningful and multiview-consistent way by leveraging semantic correspondence from ViT features.

Abstract:

A Neural Radiance Field (NeRF) encodes the specific relation of 3D geometry and appearance of a scene. We here ask whether we can transfer the appearance from a source NeRF onto a target 3D geometry in a semantically meaningful way, such that the resulting new NeRF retains the target geometry but has an appearance that is an analogy to the source NeRF. To this end, we generalize classic image analogies from 2D images to NeRFs. We leverage correspondence transfer along semantic affinity, driven by semantic features from large, pre-trained 2D image models, to achieve multi-view consistent appearance transfer. Our method allows exploring the mix-and-match product space of 3D geometry and appearance. We show that our method outperforms traditional stylization-based methods and that a large majority of users prefer it over several typical baselines.

Method Overview:
To create a NeRF Analogy, we perform the following three steps:
  1. We first render a set of images from both the source and target NeRFs. We then precompute DiNO-ViT features on these images, which serve as dense, semantic image descriptors (see the feature-extraction sketch after this list).
  2. In the second step, we sample random pixels (with their positions and normals) on the target and compute their correspondence to the source images by comparing the cosine similarity of their DiNO features. We then query the source appearance at the point of maximum similarity, allowing us to create training tuples that combine source and target information (see the correspondence sketch below).
  3. Finally, we transfer the appearance from the source to the target by optimizing the target NeRF's appearance to match the source appearance in the training tuples, and refine the results with an additional edge loss (see the loss sketch below).
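
As a rough illustration of step 1, the sketch below extracts dense per-patch features from a rendered image with a pre-trained DiNO ViT. The specific backbone (ViT-S/8 from torch.hub), the use of the last transformer block, and the input normalization are assumptions made for this sketch, not necessarily the exact configuration used in the paper.

import torch

# Pre-trained DiNO ViT-S/8 from torch.hub (backbone choice is an assumption).
dino = torch.hub.load('facebookresearch/dino:main', 'dino_vits8')
dino.eval()

@torch.no_grad()
def dense_dino_features(image):
    """image: (1, 3, H, W), ImageNet-normalized, with H and W divisible by 8.
    Returns a (1, H/8, W/8, C) grid of per-patch descriptors."""
    tokens = dino.get_intermediate_layers(image, n=1)[0]  # (1, 1 + P, C), last block
    patch_tokens = tokens[:, 1:, :]                       # drop the [CLS] token
    h, w = image.shape[-2] // 8, image.shape[-1] // 8     # 8x8 patch stride for ViT-S/8
    return patch_tokens.reshape(1, h, w, -1)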
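Step 2 is essentially a nearest-neighbor lookup in feature space. A minimal sketch, assuming the source and target features and the source colors have already been flattened into per-pixel tensors (all names below are hypothetical):

import torch
import torch.nn.functional as F

def build_training_tuples(tgt_feats, tgt_positions, tgt_normals, src_feats, src_rgb):
    """Match every sampled target pixel to its most similar source pixel via
    cosine similarity of DiNO features, then gather the source appearance there.

    tgt_feats:     (N, C)  features of N sampled target pixels
    tgt_positions: (N, 3)  corresponding 3D positions on the target geometry
    tgt_normals:   (N, 3)  corresponding surface normals
    src_feats:     (M, C)  features of M source pixels (all source views, flattened)
    src_rgb:       (M, 3)  source colors at those pixels
    """
    # Cosine similarity = dot product of L2-normalized features.
    tgt_n = F.normalize(tgt_feats, dim=-1)
    src_n = F.normalize(src_feats, dim=-1)
    sim = tgt_n @ src_n.T                      # (N, M) similarity matrix

    best_sim, best_idx = sim.max(dim=-1)       # best source match per target pixel
    matched_rgb = src_rgb[best_idx]            # query source appearance at the maximum

    # Training tuples: target geometry (position, normal) plus transferred source color.
    return tgt_positions, tgt_normals, matched_rgb, best_sim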
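For step 3, the target NeRF's appearance is optimized against these training tuples. The sketch below shows a plausible photometric term plus a simple finite-difference edge term on rendered views; the exact form of the paper's edge loss is not reproduced here, so treat this version as an assumption.

import torch.nn.functional as F

def photometric_loss(pred_rgb, matched_rgb):
    # The re-optimized appearance should reproduce the matched source colors.
    return F.l1_loss(pred_rgb, matched_rgb)

def edge_loss(rendered, reference):
    """Hypothetical edge term: compare horizontal/vertical finite differences of a
    rendered view against a reference rendering of the target; shapes (B, 3, H, W)."""
    def grads(img):
        return img[..., :, 1:] - img[..., :, :-1], img[..., 1:, :] - img[..., :-1, :]
    (dx_r, dy_r), (dx_t, dy_t) = grads(rendered), grads(reference)
    return F.l1_loss(dx_r, dx_t) + F.l1_loss(dy_r, dy_t)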

Results:

In the video captions, DIA is short for Deep Image Analogies, ST for Style Transfer, WCT for the Whitening and Coloring Transform introduced in "Universal Style Transfer via Feature Transforms", SNeRF for the method by Nguyen-Phuoc et al., and Ours for our NeRF Analogy.

[Result videos: six examples, each showing the Target (Shape), Ours, and the Source (Appearance).]


Citation
If you find our work useful and use parts or ideas of our paper or code, please cite:

@inproceedings{fischer2024nerf,
  title={NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs},
  author={Fischer, Michael and Li, Zhengqin and Nguyen-Phuoc, Thu and Bozic, Aljaz and Dong, Zhao and Marshall, Carl and Ritschel, Tobias},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4640--4650},
  year={2024}
}

Acknowledgements
This research was conducted during an internship at Meta Reality Labs Research. We extend our gratitude to the anonymous reviewers for their insightful feedback and to Meta Reality Labs for their continuous support. Additionally, we are thankful for the support provided by the Rabin Ezra Scholarship Trust.