on: Dec. 3, 2018
in: CVPR
✨Oral

DeepVoxels: Learning Persistent 3D Feature Embeddings

Vincent Sitzmann
Justus Thies
Felix Heide
Matthias Niessner
Gordon Wetzstein
Michael Zollhoefer

@inproceedings{sitzmann2019deepvoxels,
    author = {Sitzmann, Vincent
              and Thies, Justus
              and Heide, Felix
              and Nie{\ss}ner, Matthias
              and Wetzstein, Gordon
              and Zollh{\"o}fer, Michael},
    title = {DeepVoxels: Learning Persistent 3D Feature Embeddings},
    booktitle = {Proc. Computer Vision and Pattern Recognition (CVPR), IEEE},
    year = {2019}
}


Deep generative models today allow us to perform highly realistic image synthesis. While each generated image is of high quality, a major challenge is generating a series of coherent views of the same scene. This requires the network to have a latent space representation that fundamentally understands the 3D layout of the scene; for example, how would the same chair look from a different viewpoint?

Unfortunately, this is challenging for existing models that are based on a series of 2D convolution kernels. Instead of parameterizing 3D transformations, they explain the training data in a higher-dimensional feature space, which leads to poor generalization to novel views at test time, as in the output of Pix2Pix trained on images of the cube above.
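
To make the contrast concrete, the following is a minimal sketch of what "parameterizing 3D transformations" means in the simplest case. It is not the DeepVoxels method or its code. It assumes a toy scene (the eight corners of a unit cube), a pinhole camera, and hypothetical helper names (`rotate_y`, `project`, `render_view`) chosen only for illustration. With an explicit 3D representation, a novel view is just a rigid transform followed by a projection, so every view is consistent with the same underlying geometry by construction.

```python
import math

def rotate_y(point, angle_rad):
    """Rotate a 3D point about the y-axis by angle_rad (illustrative helper)."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)

def project(point, focal=1.0, camera_distance=4.0):
    """Pinhole projection for a camera placed camera_distance in front of
    the origin and looking along +z (assumed toy camera model)."""
    x, y, z = point
    depth = z + camera_distance  # distance from the camera along its view axis
    return (focal * x / depth, focal * y / depth)

# Toy scene: the eight corners of a unit cube centered at the origin.
cube = [(x, y, z)
        for x in (-0.5, 0.5)
        for y in (-0.5, 0.5)
        for z in (-0.5, 0.5)]

def render_view(points, angle_deg):
    """Project the scene after rotating it by angle_deg about the y-axis.
    Rotating the scene is equivalent to orbiting the camera the other way."""
    angle = math.radians(angle_deg)
    return [project(rotate_y(p, angle)) for p in points]

view_0 = render_view(cube, 0.0)    # a view seen during "training"
view_30 = render_view(cube, 30.0)  # a novel view, obtained with no learning
```

A purely 2D convolutional generator has no such built-in transform, so it must learn the mapping from viewpoint to image from examples alone.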