Scene Representation Group
Welcome to MIT CSAIL's Scene Representation Group, led by Vincent Sitzmann!
Recent Publications
- Unifying 3D Representation and Control of Diverse Robots with a Single Camera (arXiv, July 10, 2024)
- Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion (NeurIPS, July 2, 2024)
- Neural Isometries: Taming Transformations for Equivariant ML (NeurIPS, June 15, 2024)
- FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent (arXiv, April 23, 2024)
- pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction (CVPR, Dec. 20, 2023) ✨ Best Paper Runner-Up
- Variational Barycentric Coordinates (SIGGRAPH Asia, Oct. 1, 2023) ✨ Journal Track
Our goal is to enable artificial intelligence to perceive and interact with our world the way we humans do.
From a single picture, we reconstruct mental representations of the underlying 3D scene that contain information about geometry, materials, lighting, appearance, and more. This process, dubbed neural scene representation, allows us to understand, navigate, plan, and interact with our environment in everyday life. Humans learn this skill with little data or supervision, mostly through self-play and observation.
We want to build models that acquire these skills computationally. How can we learn to infer properties of 3D scenes from images alone? How can we quickly acquire new concepts, such as learning about a new object and how to use it, from only a single demonstration? And how can we use such representations to plan our actions?
To achieve this, our research focuses on three core areas: neural rendering, representation learning, and computer vision. Our models make strong inferences about 3D scenes from images alone, which is useful for downstream applications across vision, graphics, and robotics.