
Machine Learning for Inverse Graphics 6.S980

From a single picture, humans reconstruct a mental representation of the underlying 3D scene that is incredibly rich in information: shape, appearance, physical properties, purpose, how things would feel, smell, or sound, and more. These mental representations allow us to understand, navigate, and interact with our environment in our everyday lives. We learn to do this with little supervision, mainly by interacting with and observing the world around us.

Emerging neural scene representation methods aim to replicate this capability: trained in a self-supervised manner, they reconstruct rich representations of 3D scenes that can then be used in downstream tasks across computer vision, robotics, and graphics.

This course covers fundamental and advanced techniques in this field at the intersection of computer vision, computer graphics, and geometric deep learning. It will lay the foundations of how cameras see the world, how we can represent 3D scenes for artificial intelligence, how we can learn to reconstruct these representations from only a single image, how we can guarantee certain kinds of generalization, and how we can train these models in a self-supervised way.

What you will learn

  • Computer vision & computer graphics fundamentals (pinhole camera model, camera pose, projective geometry, light fields, multi-view geometry).
  • Volumetric scene representations for deep learning: Neural fields & voxel grids.
  • Differentiable rendering of 3D representations and light fields.
  • Inference algorithms for deep-learning-based 3D reconstruction: convolutional neural networks, auto-decoding.
  • Basics of geometric deep learning: Representation theory, groups, group actions, equivariance, equivariant neural network architectures.
  • Self-supervised learning of scene representations via 3D-aware auto-encoding.
  • Applications of neural scene representations in graphics, robotics, vision, and scientific discovery.

For more details, see the preliminary Syllabus below.

Prerequisites

No computer vision or graphics-specific background is required. We do, however, expect you to:

  • have taken a machine learning class with a focus on deep learning
  • be comfortable with picking up new mathematics as needed ("mathematical maturity")
  • have a solid knowledge of:
    • linear algebra,
    • multivariate calculus,
    • probability theory, and
    • programming with vectors and matrices, e.g. in NumPy, PyTorch, or JAX.

Level: advanced undergraduate or graduate student. Note that this class will not count for your qualification exams; it is a graduate-level seminar!

Prospective Grading Policy

Grading will be split between five module-specific problem sets and a final project:

  • Psets: 60%
  • Final project: 40%

Schedule

6.S980 will be held as 1.5-hour lectures in room 4-231 on Tuesdays and Thursdays:

  • Tuesdays, 2:30–4:00pm
  • Thursdays, 2:30–4:00pm

Closer to the start of the course, we will post an iCal calendar subscription URL here with lecture dates and deadlines.

Syllabus

6.S980 will be held for the first time in fall term 2022. We will add relevant dates, such as lecture dates and homework deadlines, as they become available; dates will likely shift around until the beginning of the term.

Module 0

Introduction

  • Learning goals
  • How should we think about the environment we're in?
  • Computer Vision as Inverse Graphics
  • Different ways of defining 3D

Module 1: Fundamentals of Image Formation

Image Formation

  • Pinhole camera model (sketched in code after this list)
  • Light Fields
  • Projective image formation
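
To make the pinhole camera model concrete, here is a minimal NumPy sketch of perspective projection; the intrinsics K and the world-to-camera pose (R, t) below are arbitrary example values.

```python
import numpy as np

# Example intrinsics: focal lengths fx, fy and principal point cx, cy (in pixels).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Example world-to-camera pose: rotation R and translation t.
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])

def project(points_world):
    """Project Nx3 world points to Nx2 pixel coordinates with a pinhole camera."""
    points_cam = points_world @ R.T + t   # world frame -> camera frame
    uvw = points_cam @ K.T                # apply intrinsics (homogeneous pixel coords)
    return uvw[:, :2] / uvw[:, 2:3]       # perspective divide

points = np.array([[0.0, 0.0, 0.0], [0.1, -0.2, 0.5]])
print(project(points))                    # the world origin lands on the principal point
```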

Multi-View Geometry

  • Camera poses
  • How 3D is encoded in multi-view images (see the triangulation sketch after this list)
  • Multi-view stereo
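
As a small illustration of how 3D is encoded in multi-view images, the sketch below triangulates a single point from two calibrated views with the standard linear (DLT) method; the projection matrices here are toy placeholders.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one 3D point from two 3x4 projection
    matrices and the point's pixel coordinates (u, v) in each view."""
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # right singular vector with the smallest singular value
    return X[:3] / X[3]        # de-homogenize

# Toy cameras: identity intrinsics, second camera translated along x (baseline 1).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0, 1.0])
uv1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
uv2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate(P1, P2, uv1, uv2))       # recovers approximately [0.5, 0.2, 4.0]
```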

Module 2: 3D Scene Representations & Neural Rendering

Neural Networks Review

  • Neural Networks as Function Spaces
  • Kernel View of neural networks
  • Positional Encodings & activations and their impact on kernel functions (sketched in code after this list)
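
A minimal PyTorch sketch of a Fourier-feature positional encoding, assuming NeRF-style frequencies 2^0, ..., 2^(L-1); the name num_freqs and its default value are illustrative choices.

```python
import math
import torch

def positional_encoding(x, num_freqs=6):
    """Map coordinates x of shape (..., D) to Fourier features (..., 2 * num_freqs * D).

    Higher frequencies let a downstream MLP represent sharper detail; the choice
    of frequencies shapes the effective kernel of the network.
    """
    freqs = 2.0 ** torch.arange(num_freqs, dtype=x.dtype)    # 1, 2, 4, ...
    angles = x[..., None] * freqs * math.pi                  # (..., D, num_freqs)
    features = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return features.flatten(start_dim=-2)

coords = torch.rand(4, 3)                  # e.g. four points in 3D
print(positional_encoding(coords).shape)   # torch.Size([4, 36])
```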

Scene Representations

  • Discrete Representations: Meshes, Voxel Grids, Point Clouds
  • Continuous Representations: Fourier Series, Neural Fields (a minimal neural field is sketched after this list)
  • Hybrid Representations
  • How to parameterize geometry
  • Pros and cons of different representations: run time, memory usage, etc.
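
To make the notion of a neural field concrete, here is a minimal sketch: an MLP mapping a 3D coordinate to an RGB color and a density. Layer widths and output choices are arbitrary examples; in practice the coordinates would first be passed through a positional encoding like the one above.

```python
import torch
import torch.nn as nn

class NeuralField(nn.Module):
    """Minimal coordinate MLP: 3D point -> (RGB color, density)."""

    def __init__(self, in_dim=3, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                # 3 color channels + 1 density
        )

    def forward(self, points):
        out = self.net(points)
        rgb = torch.sigmoid(out[..., :3])        # colors in [0, 1]
        sigma = torch.relu(out[..., 3:])         # non-negative density
        return rgb, sigma

field = NeuralField()
points = torch.rand(1024, 3)                     # sampled 3D points
rgb, sigma = field(points)
print(rgb.shape, sigma.shape)                    # (1024, 3) and (1024, 1)
```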

Rendering, Differentiable Rendering, and 3D Reconstruction

  • Volume Rendering (sketched in code after this list)
  • Sphere-Tracing
  • Light Field Rendering
  • Solving the multi-view stereo problem with differentiable rendering
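
A minimal sketch of volume rendering: given per-sample colors and densities along each ray (e.g. from a neural field), alpha compositing produces a pixel color. The uniform depth sampling here is purely for illustration.

```python
import torch

def volume_render(rgb, sigma, t_vals):
    """Composite per-sample colors along rays into pixel colors.

    rgb:    (num_rays, num_samples, 3) colors
    sigma:  (num_rays, num_samples)    densities
    t_vals: (num_rays, num_samples)    sample depths along each ray
    """
    deltas = t_vals[:, 1:] - t_vals[:, :-1]
    deltas = torch.cat([deltas, torch.full_like(deltas[:, :1], 1e10)], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * deltas)                   # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = alpha * trans
    return (weights[..., None] * rgb).sum(dim=1)               # (num_rays, 3)

rgb = torch.rand(8, 64, 3)
sigma = torch.rand(8, 64)
t_vals = torch.linspace(2.0, 6.0, 64).expand(8, 64)
print(volume_render(rgb, sigma, t_vals).shape)                 # torch.Size([8, 3])
```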

Module 3: Representation Learning, Latent Variable Models, and Auto-encoding

Deep Learning Review

  • Network types:
    • MLPs
    • Convnets
    • Transformers
  • 2D / 3D / 4D network architectures
  • Inductive biases & weight sharing

Image Representation Learning

  • (Masked) auto-encoding
  • Contrastive learning (a minimal contrastive loss is sketched after this list)
  • DINO & co.
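
As a reference point for the contrastive-learning item, here is a minimal InfoNCE-style loss over a batch of paired embeddings (two augmented views of the same images); the temperature is an arbitrary example value.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Contrastive loss for a batch of embedding pairs.

    z1, z2: (batch, dim) embeddings of two views of the same images.
    Matching rows are positives; all other rows serve as negatives.
    """
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature                 # (batch, batch) similarities
    targets = torch.arange(z1.shape[0])              # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
print(info_nce(z1, z2))
```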

Neural Networks as Prior-Based Inference Algorithms

  • Generative models
  • 3D reconstruction as inference in a latent variable model
  • Self-supervised Scene Representation Learning
  • Inference via optimization
  • Auto-decoding (sketched in code after this list)
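
A minimal sketch of auto-decoding, i.e. inference via optimization: rather than running an encoder, a per-scene latent code is optimized through a fixed decoder to explain the observations. The decoder and the observation below are untrained, made-up placeholders.

```python
import torch
import torch.nn as nn

# Placeholder decoder: latent code -> predicted observation (e.g. pixels).
decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 3 * 32 * 32))
for p in decoder.parameters():
    p.requires_grad_(False)                      # decoder weights stay fixed

observation = torch.rand(1, 3 * 32 * 32)         # placeholder target image

# Auto-decoding: optimize the latent code itself to explain the observation.
latent = torch.zeros(1, 64, requires_grad=True)
optimizer = torch.optim.Adam([latent], lr=1e-2)

for step in range(200):
    optimizer.zero_grad()
    loss = ((decoder(latent) - observation) ** 2).mean()
    loss.backward()
    optimizer.step()

print(loss.item())
```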

Conditional 3D Representations

  • Local conditioning
  • Global conditioning (contrasted with local conditioning in the sketch after this list)
  • Object-centric conditioning
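
To contrast global and local conditioning, a minimal sketch: a global latent code is concatenated to every query coordinate, while local conditioning samples a pixel-aligned feature at each query's projected image location. Shapes, names, and the random placeholder projections are illustrative.

```python
import torch
import torch.nn.functional as F

num_points = 100
coords_3d = torch.rand(1, num_points, 3)            # query points for a neural field

# Global conditioning: a single latent code describes the whole scene and is
# concatenated to every query coordinate.
global_code = torch.randn(1, 64)
global_input = torch.cat(
    [coords_3d, global_code[:, None, :].expand(-1, num_points, -1)], dim=-1
)                                                    # (1, num_points, 3 + 64)

# Local conditioning: each query point gets a pixel-aligned feature sampled at
# its projected 2D image location (random placeholder projections in [-1, 1]).
feature_map = torch.randn(1, 64, 32, 32)             # e.g. CNN features of the input image
projections = torch.rand(1, num_points, 1, 2) * 2 - 1
local_features = F.grid_sample(feature_map, projections, align_corners=True)
local_features = local_features[:, :, :, 0].permute(0, 2, 1)   # (1, num_points, 64)
local_input = torch.cat([coords_3d, local_features], dim=-1)   # (1, num_points, 3 + 64)

print(global_input.shape, local_input.shape)
```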

Module 4: Geometric Deep Learning

Representation Theory & Symmetries

  • The problem of generalization
  • High-level intro to Representation Theory:
    • Groups
    • Representations
    • Group actions
    • Equivariance (a numerical check is sketched after this list)
    • Invariance
  • Important symmetry groups:
    • Rotation
    • Translation
    • Scale
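
A quick numerical illustration of equivariance and invariance for the translation group: a convolution with circular padding commutes exactly with shifting the input, and global average pooling on top of it is shift-invariant. The shift amounts and tolerances are arbitrary example values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Circular padding makes the convolution exactly shift-equivariant (no boundary effects).
conv = nn.Conv2d(1, 8, kernel_size=3, padding=1, padding_mode="circular", bias=False)
x = torch.randn(1, 1, 16, 16)
shifted = torch.roll(x, shifts=(3, 5), dims=(-2, -1))        # translate the image

# Equivariance: convolving the shifted image == shifting the convolved image.
print(torch.allclose(conv(shifted),
                     torch.roll(conv(x), shifts=(3, 5), dims=(-2, -1)), atol=1e-5))

# Invariance: global average pooling discards the shift entirely.
print(torch.allclose(conv(shifted).mean(dim=(-2, -1)),
                     conv(x).mean(dim=(-2, -1)), atol=1e-5))
```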

Building NNs with Symmetries

  • Building NNs with symmetries:
    • MLPs
    • Convolutions
    • Transformers
  • Group Equivariant Neural Networks (a simple group-averaging variant is sketched after this list)
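
One simple construction, sketched below: averaging a network's outputs over the four 90-degree rotations of the input makes it invariant to that group (group averaging). The backbone here is an arbitrary placeholder; full group-equivariant architectures are covered in lecture.

```python
import torch
import torch.nn as nn

class C4InvariantClassifier(nn.Module):
    """Wrap any image network and average its outputs over 90-degree rotations."""

    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone

    def forward(self, x):
        outputs = [self.backbone(torch.rot90(x, k, dims=(-2, -1))) for k in range(4)]
        return torch.stack(outputs).mean(dim=0)

backbone = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # placeholder network
model = C4InvariantClassifier(backbone)
x = torch.randn(2, 1, 28, 28)
rotated = torch.rot90(x, 1, dims=(-2, -1))
print(torch.allclose(model(x), model(rotated), atol=1e-5))       # True: outputs are invariant
```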

Guest Lecture

Module 5: Motion and Objectness

Motion and its Modeling

  • Optical Flow (backward warping sketched after this list)
  • Scene Flow
  • Algorithms for estimating optical flow
  • Algorithms for estimating scene flow
  • Modeling motion as part of a scene representation, canonical spaces
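
To connect optical flow to image synthesis, here is a minimal sketch of backward warping: a frame is resampled at flow-displaced coordinates, a building block of many (self-supervised) flow losses. Shapes and the zero-flow sanity check are illustrative.

```python
import torch
import torch.nn.functional as F

def backward_warp(image, flow):
    """Warp image (B, C, H, W) with flow (B, 2, H, W) given in pixels, so that
    output[y, x] is image bilinearly sampled at (x + flow_x, y + flow_y)."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=image.dtype),
                            torch.arange(w, dtype=image.dtype), indexing="ij")
    grid_x = xs[None] + flow[:, 0]                     # sampling x coordinate per pixel
    grid_y = ys[None] + flow[:, 1]                     # sampling y coordinate per pixel
    # Normalize to [-1, 1] as expected by grid_sample.
    grid = torch.stack([2 * grid_x / (w - 1) - 1, 2 * grid_y / (h - 1) - 1], dim=-1)
    return F.grid_sample(image, grid, align_corners=True)

frame2 = torch.rand(1, 3, 64, 64)
flow = torch.zeros(1, 2, 64, 64)                       # zero flow: warping is the identity
print(torch.allclose(backward_warp(frame2, flow), frame2, atol=1e-5))
```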

Neural ODEs
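
As a minimal sketch of the idea (not a full treatment): the dynamics dx/dt = f(x, t) are given by a small network and integrated with fixed-step Euler steps; practical implementations use adaptive solvers. All sizes and names below are illustrative.

```python
import torch
import torch.nn as nn

class VectorField(nn.Module):
    """Learned dynamics dx/dt = f(x, t)."""

    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, x, t):
        t_column = t.expand(x.shape[0], 1)             # broadcast time to the batch
        return self.net(torch.cat([x, t_column], dim=-1))

def odeint_euler(f, x0, t0=0.0, t1=1.0, steps=100):
    """Integrate dx/dt = f(x, t) from t0 to t1 with fixed-step Euler."""
    x, dt = x0, (t1 - t0) / steps
    for i in range(steps):
        t = torch.tensor([[t0 + i * dt]])
        x = x + dt * f(x, t)
    return x

f = VectorField()
x0 = torch.randn(8, 2)
print(odeint_euler(f, x0).shape)                       # torch.Size([8, 2])
```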

Objectness

  • Motion as a cue for objectness
  • Relevant architectures:
    • Object-centric representations
    • Capsule Networks
    • Slot attention
    • Unsupervised object discovery with object radiance fields / light fields

Guest lecture

Module 6: Applications

Robotics

  • Guest lecture

Vision

  • Guest lecture

Scientific discovery (Cryo-EM)

  • Guest Lecture

AR/VR and graphics applications

  • Guest Lecture