SAMa: Material-aware 3D Selection and Segmentation
arXiv preprint
Michael Fischer1,2,
Iliyan Georgiev1,
Thibault Groueix1,
Vladimir G. Kim1,
Tobias Ritschel2,
Valentin Deschaintre1
1Adobe Research,
2University College London
- Paper
- Video (coming soon)
- Supplemental
We fine-tune SAM2 for material selection in 3D representations. We exploit the fact that, as a video model, it is multiview-consistent by design, and we leverage this property to create a 3D-consistent material-similarity representation in the form of a point cloud, which can be queried efficiently from novel views in just a few milliseconds. Our method works on arbitrary 3D assets (NeRFs, 3D Gaussians, meshes) and requires no pre-processing, leading to a click-to-selection time of around 2 seconds.
Abstract:
Decomposing 3D assets into material parts is a common task for artists and creators, yet remains a highly manual process. In this work, we introduce Select Any Material (SAMa), a material selection approach for various 3D representations. Building on the recently introduced SAM2 video selection model, we extend its capabilities to the material domain. We leverage the model's cross-view consistency to create a 3D-consistent intermediate material-similarity representation in the form of a point cloud from a sparse set of views. Nearest-neighbor lookups in this similarity cloud allow us to efficiently reconstruct accurate continuous selection masks over objects' surfaces that can be inspected from any view. Our method is multiview-consistent by design, alleviating the need for contrastive learning or feature-field pre-processing, and performs optimization-free selection in seconds. Our approach works on arbitrary 3D representations and outperforms several strong baselines in terms of selection accuracy and multiview consistency. It enables several compelling applications, such as replacing the diffuse-textured materials on a text-to-3D output with PBR materials, or selecting and editing materials on NeRFs and 3D Gaussians.
Method Overview: Once a user clicks on a material, we perform the following steps:
- Render a sparse set of RGB and depth images of the object (or use pre-rendered images and cached depth) and process these with SAMa, conditioned on the user's click.
- Back-project the resulting per-pixel similarity values to 3D to obtain a 3D material-similarity point cloud (see the back-projection sketch after this list).
- Thanks to SAMa's multiview-consistent predictions, we need no pre-processing or 3D consolidation and can directly infer the selection result for any novel view via kNN lookups into this point cloud (see the kNN-lookup sketch below). This is very efficient and takes only a few milliseconds (see the GUI video below).
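The back-projection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy, a pinhole camera with intrinsics `K` and a camera-to-world matrix, a placeholder `predict_fn` standing in for SAMa inference conditioned on the user's click, and hypothetical view records carrying `rgb`, `depth`, `K`, and `cam_to_world`.

```python
import numpy as np


def backproject(depth, values, K, cam_to_world):
    """Lift per-pixel values of one view to 3D world-space points via its depth map."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))       # pixel grid
    valid = depth > 0                                     # ignore background pixels
    x = (u[valid] - K[0, 2]) * depth[valid] / K[0, 0]     # pinhole unprojection
    y = (v[valid] - K[1, 2]) * depth[valid] / K[1, 1]
    pts_cam = np.stack([x, y, depth[valid], np.ones_like(x)], axis=-1)
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    return pts_world, values[valid]


def build_similarity_cloud(views, click, predict_fn):
    """Run the material-similarity predictor on a sparse set of views (step 1)
    and back-project the per-pixel similarities into one point cloud (step 2)."""
    points, sims = [], []
    for view in views:
        sim_map = predict_fn(view.rgb, click)             # placeholder for SAMa inference
        p, s = backproject(view.depth, sim_map, view.K, view.cam_to_world)
        points.append(p)
        sims.append(s)
    return np.concatenate(points), np.concatenate(sims)
```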
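Continuing the sketch above, a novel view is then queried by back-projecting its pixels and averaging the similarities of their nearest neighbors in the cloud. Again, this is an assumed illustration using SciPy's `cKDTree`, not the released code; the neighbor count `k` is an arbitrary choice here.

```python
from scipy.spatial import cKDTree


def query_novel_view(points, sims, depth, K, cam_to_world, k=4):
    """Infer a continuous selection mask for a novel view (step 3) via kNN lookups."""
    tree = cKDTree(points)                                # build once, reuse for many views
    pts, _ = backproject(depth, np.zeros_like(depth), K, cam_to_world)
    _, idx = tree.query(pts, k=k)                         # k nearest similarity-cloud points
    mask = np.zeros(depth.shape, dtype=np.float32)
    mask[depth > 0] = sims[idx].mean(axis=1)              # average neighbor similarities
    return mask
```

In practice the tree would be cached once per click, so each additional view only costs the kNN query, consistent with the few-millisecond lookup time quoted above.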
Results:
Citation
If you find our work useful and use parts or ideas of our paper or code, please cite:
@article{fischer2024sama,
title={SAMa: Material-aware 3D Selection and Segmentation},
author={Fischer, Michael and Georgiev, Iliyan and Groueix, Thibault and Kim, Vladimir G and Ritschel, Tobias and Deschaintre, Valentin},
journal={arXiv preprint arXiv:2411.19322},
year={2024}
}
This research was conducted during an internship at Adobe Research London. I am honored to be a recipient of the Rabin Ezra Scholarship and deeply appreciate the support of the trust.