Massachusetts Institute of Technology

Machine Learning for Inverse Graphics 6.S980

From a single picture, humans reconstruct a mental representation of the underlying 3D scene that is incredibly rich in information: shape, appearance, physical properties, purpose, how things would feel, smell, sound, and so on. These mental representations allow us to understand, navigate, and interact with our environment in our everyday lives. We learn this with little supervision, mainly by interacting with and observing the world around us.

Emerging neural scene representations aim to build models that replicate this behavior: trained in a self-supervised manner, these models reconstruct rich representations of 3D scenes that can then be used in downstream tasks in computer vision, robotics, and graphics.

This course covers fundamental and advanced techniques in this field at the intersection of computer vision, computer graphics, and geometric deep learning. It will lay the foundations of how cameras see the world, how we can represent 3D scenes for artificial intelligence, how we can learn to reconstruct these representations from only a single image, how we can guarantee certain kinds of generalization, and how we can train these models in a self-supervised way.

What you will learn

  • Computer vision & computer graphics fundamentals (pinhole camera model, camera pose, projective geometry, light fields, multi-view geometry).
  • Volumetric scene representations for deep learning: Neural fields & voxel grids.
  • Differentiable rendering in 3D representations and light fields.
  • Inference algorithms for deep-learning-based 3D reconstruction: convolutional neural networks, auto-decoding.
  • Basics of geometric deep learning: Representation theory, groups, group actions, equivariance, equivariant neural network architectures.
  • Self-supervised learning of scene representations via 3D-aware auto-encoding.
  • Applications of neural scene representations in graphics, robotics, vision, and scientific discovery.

For more details, see the preliminary Syllabus.


No computer-vision- or graphics-specific background is required. However, we do expect you to:

  • have taken a machine learning class with a focus on deep learning
  • be comfortable with picking up new mathematics as needed ("mathematical maturity")
  • have a solid knowledge of:
    • linear algebra,
    • multivariate calculus,
    • probability theory, and
    • programming with vectors and matrices, e.g., in NumPy, PyTorch, or JAX.

Level: advanced undergraduate or graduate student. Note that this class is a graduate-level seminar and will not count toward your qualification exams.

Prospective Grading Policy

Grading will be split between five module-specific problem sets and a final project:

  • Problem sets: 60%
  • Final project: 40%


6.S980 will be held as 1.5-hour lectures in room 4-231 on Tuesdays and Thursdays:

  • Tuesdays, 2:30–4:00pm
  • Thursdays, 2:30–4:00pm

Closer to the start of the course, we will post an iCal calendar subscription URL here with lecture dates and deadlines.


6.S980 will first be held in fall term 2022. We will add relevant dates, such as lecture dates and homework deadlines, as they become available. Dates will likely shift until the beginning of fall term.

Module 0


  • Learning goals
  • How should we think about the environment we're in?
  • Computer Vision as Inverse Graphics
  • Different ways of defining 3D

Module 1: Fundamentals of Image Formation

Image Formation

  • Pinhole camera model
  • Light Fields
  • Projective image formation

Multi-View Geometry

  • Camera poses
  • How 3D is encoded in multi-view images
  • Multi-view stereo
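To make the pinhole camera model and camera poses above concrete, here is a minimal NumPy sketch of projecting a world point into pixel coordinates. The intrinsics (f, cx, cy) and the world-to-camera pose (R, t) are illustrative values chosen for this sketch, not anything prescribed by the course:

```python
import numpy as np

# Illustrative intrinsics: focal length f, principal point (cx, cy).
f, cx, cy = 500.0, 320.0, 240.0
K = np.array([[f,   0.0, cx],
              [0.0, f,   cy],
              [0.0, 0.0, 1.0]])

# Camera pose as a world-to-camera rigid transform [R | t].
R = np.eye(3)                      # identity rotation for simplicity
t = np.array([0.0, 0.0, 2.0])      # camera looks at the origin from 2 units away

def project(X_world):
    """Project a 3D world point to pixel coordinates via the pinhole model."""
    X_cam = R @ X_world + t        # world frame -> camera frame
    x = K @ X_cam                  # perspective projection (homogeneous pixels)
    return x[:2] / x[2]            # dehomogenize

print(project(np.array([0.0, 0.0, 0.0])))  # origin -> principal point (320, 240)
```

A point on the optical axis lands exactly on the principal point, which is a quick sanity check for any projection code.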

Module 2: 3D Scene Representations & Neural Rendering

Neural Networks Review

  • Neural Networks as Function Spaces
  • Kernel View of neural networks
  • Positional Encodings & activations and their impact on kernel functions
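Positional encodings of the kind discussed here can be sketched in a few lines; the sin/cos frequency scheme below follows the common NeRF-style formulation, and `num_freqs` is an arbitrary illustrative choice:

```python
import numpy as np

def positional_encoding(x, num_freqs=4):
    """Map each coordinate of x to [sin(2^k * pi * x), cos(2^k * pi * x)]
    for k = 0..num_freqs-1, lifting low-dimensional inputs to a
    higher-frequency feature space."""
    freqs = 2.0 ** np.arange(num_freqs) * np.pi   # (num_freqs,)
    angles = x[..., None] * freqs                 # (..., D, num_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)         # (..., 2 * D * num_freqs)

x = np.array([0.5, -0.25, 1.0])       # a 3D coordinate
print(positional_encoding(x).shape)   # (24,): 3 dims * 2 functions * 4 freqs
```

The point of the kernel view is that this feature map changes which functions an MLP can fit easily; without it, coordinate MLPs are biased toward low-frequency outputs.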

Scene Representations

  • Discrete Representations: Meshes, Voxel Grids, Point Clouds
  • Continuous Representations: Fourier Series, Neural Fields
  • Hybrid Representations
  • How to parameterize geometry
  • Pros and cons of different representations: runtime, memory usage, etc.
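One way to see the discrete-vs-continuous trade-off: a voxel grid stores values at a fixed resolution (memory grows cubically), while a field can be queried at any coordinate. In the sketch below, an analytic signed distance function of a unit sphere stands in for what a learned neural field would represent; all numbers are illustrative:

```python
import numpy as np

# Discrete representation: a voxel grid stores values at a fixed resolution.
grid = np.zeros((32, 32, 32))                 # memory grows as O(resolution^3)

# Continuous representation: a field maps any coordinate to a value.
# An analytic SDF of a unit sphere stands in for a learned neural field.
def sdf_sphere(x):
    return np.linalg.norm(x) - 1.0            # negative inside, positive outside

print(sdf_sphere(np.array([0.0, 0.0, 0.0])))  # -1.0: the center is inside
print(sdf_sphere(np.array([2.0, 0.0, 0.0])))  #  1.0: one unit outside the surface
print(grid.nbytes)                            # 262144 bytes already at 32^3 float64
```

The SDF's zero level set is the surface itself, which is one common way to parameterize geometry continuously.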

Rendering, Differentiable Rendering, and 3D Reconstruction

  • Volume Rendering
  • Sphere-Tracing
  • Light Field Rendering
  • Solving the multi-view stereo problem with differentiable rendering
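The volume rendering listed above is typically implemented as alpha compositing along a ray under the standard emission-absorption model. A minimal sketch with toy densities and colors (not course-provided code):

```python
import numpy as np

def volume_render(densities, colors, deltas):
    """Composite samples along one ray (emission-absorption quadrature).
    densities: (N,) nonnegative; colors: (N, 3); deltas: (N,) segment lengths."""
    alphas = 1.0 - np.exp(-densities * deltas)                       # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]   # transmittance
    weights = trans * alphas                                         # contribution per sample
    return (weights[:, None] * colors).sum(axis=0)                   # rendered RGB

# Toy ray: a nearly opaque red sample between two empty samples.
densities = np.array([0.0, 50.0, 0.0])
colors = np.array([[0.0, 0.0, 1.0],   # blue  (never hit: zero density)
                   [1.0, 0.0, 0.0],   # red   (opaque surface)
                   [0.0, 1.0, 0.0]])  # green (occluded by the red sample)
deltas = np.full(3, 0.1)
print(volume_render(densities, colors, deltas))  # ~ [0.99, 0, 0]
```

Because every operation is differentiable, gradients flow from rendered pixels back to the densities and colors, which is exactly what makes differentiable rendering usable for 3D reconstruction.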

Module 3: Representation Learning, Latent Variable Models, and Auto-encoding

Deep Learning Review

  • Network types:
    • MLPs
    • Convnets
    • Transformers
  • 2D / 3D / 4D network architectures
  • Inductive biases & weight sharing

Image Representation Learning

  • (masked) Auto-encoding
  • Contrastive learning
  • DINO and related methods

Neural Networks as Prior-Based Inference Algorithms

  • Generative models
  • 3D reconstruction as inference in a latent variable model
  • Self-supervised Scene Representation Learning
  • Inference via optimization
  • Auto-decoding

Conditional 3D Representations

  • Local conditioning
  • Global conditioning
  • Object-centric conditioning

Module 4: Geometric Deep Learning

Representation Theory & Symmetries

  • The problem of generalization
  • High-level intro to Representation Theory:
    • Groups
    • Representations
    • Group actions
    • Equivariance
    • Invariance
  • Important symmetry groups:
    • Rotation
    • Translation
    • Scale

Building NNs with Symmetries

  • Building NNs with symmetries:
    • MLPs
    • Convolutions
    • Transformers
  • Group Equivariant Neural Networks
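Equivariance can be checked numerically: a layer is translation-equivariant if translating the input and then applying the layer gives the same result as applying the layer and then translating the output. The sketch below verifies this for a circular 1D convolution (the weight-sharing pattern behind conv layers); the signal and kernel are arbitrary toy values:

```python
import numpy as np

def circ_conv(signal, kernel):
    """Circular 1D convolution -- the weight-sharing pattern of a conv layer."""
    n = len(signal)
    return np.array([sum(kernel[j] * signal[(i - j) % n]
                         for j in range(len(kernel))) for i in range(n)])

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0])
k = np.array([0.5, 0.25, 0.25])

shift = lambda v, s: np.roll(v, s)   # cyclic translation by s positions

# Equivariance: conv(shift(x)) == shift(conv(x)) for the cyclic group.
lhs = circ_conv(shift(x, 2), k)
rhs = shift(circ_conv(x, k), 2)
print(np.allclose(lhs, rhs))  # True
```

This is the cyclic-group special case of the general statement that convolution commutes with the group action; group-equivariant networks extend the same construction to rotations and other symmetry groups.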

Guest Lecture

Module 5: Motion and Objectness

Motion and its Modeling

  • Optical Flow
  • Scene Flow
  • Algorithms for estimating optical flow
  • Algorithms for estimating scene flow
  • Modeling motion as part of a scene representation, canonical spaces

Neural ODEs


  • Motion as a cue for objectness
  • Relevant architectures:
    • Object-centric representations
    • Capsule Networks
    • Slot attention
    • Unsupervised object discovery with object radiance / light fields

Guest lecture

Module 6: Applications

  • Guest lecture
  • Guest lecture

Scientific discovery (Cryo-EM)

  • Guest Lecture

AR/VR and graphics applications

  • Guest Lecture