Massachusetts Institute of Technology

Scene Representation Group

Welcome to MIT CSAIL's Scene Representation Group, led by Vincent Sitzmann!

Our goal is to enable artificial intelligence to perceive and interact with our world the way we humans do.

From a single picture, we reconstruct mental representations of the underlying 3D scene that contain information on geometry, materials, lighting, appearance, and more. This process, dubbed neural scene representation, allows us to understand, navigate, plan, and interact with our environment in our everyday lives. Humans learn this skill from little data or supervision, mostly through self-play and observation.

We want to build models that achieve these skills computationally. How can we learn to infer properties of 3D scenes just from images? How can we quickly acquire new concepts, such as learning about a new object and how to use it, from only a single demonstration? And how can we use such representations for planning our actions?

To achieve this, our research focuses on three core areas: neural rendering, representation learning, and computer vision. Our models make strong inferences about 3D scenes from images alone, and these inferences are useful for downstream applications across vision, graphics, and robotics.