Massachusetts Institute of Technology

Course Contents

This course dives into advanced concepts in computer vision. The first focus is geometry in computer vision: image formation, representation theory for vision, classic multi-view geometry, multi-view geometry in the age of deep learning, differentiable rendering, neural scene representations, correspondence estimation, optical flow computation, and point tracking.

Next, we explore generative modeling and representation learning, including image and video generation, guidance in diffusion models, and conditional probabilistic models, as well as representation learning via contrastive and masking-based methods.

Finally, we will explore the intersection of robotics and computer vision with "vision for embodied agents", investigating the role of vision for decision-making, planning and control.

Prerequisites

This class is an advanced graduate-level class. You must have working knowledge of the following topics, i.e., be able to work with them in NumPy / SciPy / PyTorch. These basics will not be covered in class, and TAs will not be able to help you with them.

Deep Learning: Proficiency in Python, NumPy, and PyTorch, vectorized programming, and training deep neural networks. Convolutional neural networks, transformers, MLPs, backpropagation.

Linear Algebra: Vector spaces, matrix-matrix and matrix-vector products, change of basis, inner products and norms, eigenvalues and eigenvectors, singular value decomposition, Fourier transform, convolution.
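
As a self-check for the linear algebra prerequisite, you should be comfortable verifying identities like the SVD factorization directly in NumPy. A minimal sketch (toy random matrix, illustrative only):

```python
# Verify the SVD identity A = U @ diag(S) @ Vt on a toy matrix.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

U, S, Vt = np.linalg.svd(A, full_matrices=False)
A_rec = U @ np.diag(S) @ Vt

assert np.allclose(A, A_rec)
# Singular values come back sorted in descending order.
assert np.all(np.diff(S) <= 0)
```

If snippets like this do not read as routine, brush up before enrolling.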

Schedule

6.8300 will be held as lectures in room 26-100:

Tuesday  – 
Thursday  – 

Collaboration Policy

Psets should be written up individually and should reflect your own individual work. However, you may discuss with your peers, TAs, and instructors.

You should not copy or share complete solutions or ask others whether your answer is correct (in person or via Piazza/Canvas).

If you work with anyone on the pset (other than TAs and instructors), list their names at the top of the pset.

AI Assistants Policy

Our policy for using ChatGPT and other AI assistants is identical to our policy for using human assistants.

Just like you can come to office hours and ask a human questions (about the lecture material, clarifications about pset questions, tips for getting started, etc), you are very welcome to do the same with AI assistants.

But: just like you are not allowed to ask an expert friend to do your homework for you, you also should not ask an expert AI.

If it is ever unclear, just imagine the AI as a human and apply the same norm as you would with a human.

If you work with any AI on a pset, briefly describe which AI and how you used it at the top of the pset (a few sentences is enough).

Grading Policy

Grading will be split between module-specific problem sets and a final project:

65% Problem Sets
6 problem sets
35% Final Project
Proposal (5%) + Final Report & Video/Presentation (30%)

The final project will be a research project on perception of your choice.

You will run experiments and do analysis to explore your research question. You will then write up your research in the format of a blog post, which will include an explanation of the background material, the new investigations, and the results you found.

You are encouraged to include plots, animations, and interactive graphics to make your findings clear. Some examples of nice research blog posts are here: [1] [2] [3] [4].

The final project will be graded for clarity and insight as well as novelty and depth of the experiments and analysis. Detailed guidance will be given later in the semester.

We encourage you to discuss the ideas in your problem sets with other students and AI tools, but we expect you to code up solutions individually. For paper presentations and final projects students may group up in teams of two to three. You may use up to 5 late days total for the problem sets (for exceptional situations, contact the course staff).

Syllabus

Module 0: Introduction to Computer Vision

Introduction to Vision

  • Historical perspective on vision: problems identified so far
  • Impact of deep learning: dataset-driven solutions
  • Unsolved challenges: OOD generalization, learning from limited data, world models

Module 1: Geometry, 3D and 4D

What is an Image: Pinhole Cameras & Projective Geometry

  • Image as a 2D signal
  • Image as measurements of a 3D light field
  • Pinhole camera and perspective projection
  • Camera motion and poses
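
The pinhole model above reduces to a few lines of linear algebra: transform a world point into the camera frame with a pose [R | t], apply the intrinsics K, and divide by depth. A minimal sketch with illustrative intrinsics and an identity pose (not values from the course notes):

```python
# Perspective projection with a pinhole camera (toy intrinsics/pose).
import numpy as np

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])  # focal lengths and principal point
R = np.eye(3)                          # world-to-camera rotation (identity here)
t = np.zeros(3)                        # world-to-camera translation

def project(X_world):
    """Project 3D points (N, 3) to pixel coordinates (N, 2)."""
    X_cam = X_world @ R.T + t          # world -> camera frame
    x = X_cam @ K.T                    # apply intrinsics
    return x[:, :2] / x[:, 2:3]        # perspective divide

pts = np.array([[0.0, 0.0, 2.0],      # a point on the optical axis
                [0.1, 0.0, 2.0]])
uv = project(pts)
# The on-axis point lands at the principal point (320, 240).
```

Note how doubling a point's depth halves its offset from the principal point, which is exactly the 1/z foreshortening of perspective projection.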

Linear Image Processing & Transformations

  • Images as functions: continuous vs discrete
  • Function spaces and Fourier transform overview
  • Image filtering: gradients, Laplacians, convolutions
  • Multi-scale processing: Laplacian and multi-scale pyramids
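
Gradient and Laplacian filtering, as listed above, can be sketched with a naive "valid" 2D convolution; on a linear intensity ramp the gradient is constant and the Laplacian vanishes, a useful sanity check. The toy image and kernels here are illustrative (scipy.ndimage.convolve would do the same job):

```python
# Image gradients and a discrete Laplacian via a naive 2D convolution.
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 'valid' 2D convolution: flip the kernel, slide, sum."""
    kh, kw = kernel.shape
    k = kernel[::-1, ::-1]
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)  # toy "image": a linear ramp

dx = np.array([[-1.0, 0.0, 1.0]])        # horizontal gradient filter
lap = np.array([[0.0,  1.0, 0.0],
                [1.0, -4.0, 1.0],
                [0.0,  1.0, 0.0]])       # discrete Laplacian

gx = conv2d_valid(img, dx)
L = conv2d_valid(img, lap)
# A linear ramp has constant horizontal gradient and zero Laplacian.
```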

Representation Theory in Vision

  • Representations of groups and spaces
  • Lie groups and exponential maps
  • Equivariance and invariance
  • Shift-equivariance in CNNs and Fourier transform connections
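
The shift-equivariance claim above can be checked numerically: circular convolution commutes with circular shifts, which is what makes CNN feature maps "move with" the input. A 1D sketch using the convolution theorem (toy signal and filter):

```python
# Shift-equivariance: circular convolution commutes with circular shifts.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)          # 1D "signal"
k = rng.standard_normal(5)           # 1D filter

def circ_conv(x, k):
    """Circular convolution via the FFT (convolution theorem)."""
    K = np.fft.fft(np.pad(k, (0, len(x) - len(k))))
    return np.real(np.fft.ifft(np.fft.fft(x) * K))

shift = 3
# conv(shift(x)) == shift(conv(x))
lhs = circ_conv(np.roll(x, shift), k)
rhs = np.roll(circ_conv(x, k), shift)
assert np.allclose(lhs, rhs)
```

The same identity, read in the Fourier domain, says a shift only changes the phase of each frequency, which the pointwise multiplication preserves.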

No Class (Monday Schedule)

holiday

Geometric Deep Learning and Vision

  • Overview of geometric deep learning principles
  • Challenges of applying geometric techniques to vision tasks
  • Potential research directions

Correspondence, Optical Flow, and Scene Flow

  • Single images vs dynamic measurements
  • Sparse Correspondence and Invariant Descriptors
  • SIFT and SuperGlue
  • Scene Flow

Correspondence, Optical Flow, and Scene Flow 2

  • Dense Correspondence
  • Optical flow equation
  • RAFT and point tracking methods
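
The optical flow equation mentioned above is the brightness-constancy constraint I_x u + I_y v + I_t = 0, one equation per pixel; solving it by least squares over a patch is the classic Lucas-Kanade step. A minimal sketch with fabricated gradients and a known flow (illustrative only, not any particular method's implementation):

```python
# Brightness constancy solved by least squares over a patch (Lucas-Kanade style).
import numpy as np

rng = np.random.default_rng(0)
# Fabricate per-pixel spatial gradients for a 5x5 patch and a known flow.
Ix = rng.standard_normal(25)
Iy = rng.standard_normal(25)
true_uv = np.array([0.5, -0.2])
# Temporal gradient consistent with brightness constancy: It = -(Ix u + Iy v).
It = -(Ix * true_uv[0] + Iy * true_uv[1])

A = np.stack([Ix, Iy], axis=1)       # (25, 2) system A @ [u, v] = -It
uv, *_ = np.linalg.lstsq(A, -It, rcond=None)
# With noiseless data the least-squares solve recovers the true flow.
```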

Multi-View Geometry

  • Triangulation and epipolar geometry
  • Eight-point algorithm and bundle adjustment
  • Depth prediction and self-supervised approaches
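
Triangulation, the first bullet above, can be sketched with the direct linear transform (DLT): each camera contributes two linear constraints on the homogeneous 3D point, and the SVD gives the null vector. Camera matrices and the test point below are toy values:

```python
# Two-view triangulation via the direct linear transform (DLT).
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation from two 3x4 projection matrices and pixel coords."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                        # null vector = homogeneous 3D point
    return X[:3] / X[3]

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])              # camera at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # 1-unit baseline

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

X_true = np.array([0.2, 0.1, 4.0])
X_hat = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
# With noiseless correspondences the DLT recovers the point exactly.
```

With noisy correspondences the same system is solved in a least-squares sense, and bundle adjustment then refines points and cameras jointly.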

Guest Lecture: Deep Learning for 3D Reconstruction

guest lecture
  • Guest Lecture on Recent Techniques of Deep Learning for 3D Reconstruction

Data Structures and Signal Parameterizations

  • Efficient representations of signals
  • Grid-based and adaptive data structures
  • Applications in vision tasks

Differentiable Rendering & Novel View Synthesis

  • Sphere tracing and volume rendering
  • Differentiable rendering techniques
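
The volume rendering bullet above boils down to a quadrature along each ray: convert densities to per-segment opacities, composite transmittance front to back, and take a weighted sum of colors. A NeRF-style sketch with toy densities and colors standing in for a trained field:

```python
# Volume-rendering quadrature along a single ray (toy densities/colors).
import numpy as np

def render_ray(sigmas, colors, deltas):
    """sigmas: (N,) densities, colors: (N, 3), deltas: (N,) segment lengths."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                         # opacity per segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # transmittance
    weights = trans * alphas
    return weights @ colors, weights

sigmas = np.array([0.0, 0.0, 50.0, 0.0])   # one dense segment (a "surface")
colors = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 1.0, 1.0]])
deltas = np.full(4, 0.25)

rgb, weights = render_ray(sigmas, colors, deltas)
# Nearly all weight falls on the dense segment, so the ray renders ~blue.
```

Because every step is differentiable, gradients of a photometric loss flow back into the densities and colors, which is what makes the representation trainable from images.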

Differentiable Rendering & Novel View Synthesis 2

  • Gaussian splatting
  • Advanced differentiable rendering methods

Prior-Based 3D Reconstruction and Novel View Synthesis

  • Global inference techniques
  • Light field inference and generative models

Student Holiday: Spring Break

holiday

Student Holiday: Spring Break

holiday

Open Problems in Geometry, 3D, and 4D

  • Multi-view generative models
  • Open research directions

Module 2: Unsupervised Representation Learning and Generative Modeling

Introduction to Representation Learning and Generative Modeling

  • Generative modeling: density estimation, uncertainty modeling
  • Representation learning: task-relevant encoding
  • Surrogate tasks: compression, denoising, imputation

Latent Variable Models and VAEs

  • Latent variable models: unconditional and conditional priors
  • VAEs and generative query networks
  • Comparative analysis of latent spaces

Diffusion Models

  • Optimal Denoiser Perspective on Diffusion
  • Spectral Perspective on Diffusion
  • Generalization in Diffusion Models
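
The denoiser perspective above rests on the DDPM forward process: x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps, with the network trained to regress eps from (x_t, t). A minimal sketch with an illustrative linear noise schedule (not the course's exact hyperparameters):

```python
# DDPM forward process and the denoising identity it is built on.
import numpy as np

rng = np.random.default_rng(0)
T = 100
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule (illustrative)
abar = np.cumprod(1.0 - betas)       # cumulative signal fraction abar_t

x0 = rng.standard_normal(8)          # a toy "clean image"
t = 60
eps = rng.standard_normal(8)
x_t = np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

# Given eps (what the network predicts), x0 is recovered exactly:
x0_hat = (x_t - np.sqrt(1.0 - abar[t]) * eps) / np.sqrt(abar[t])
assert np.allclose(x0_hat, x0)
```

This inversion is why an eps-prediction network and an x0-prediction network are equivalent parameterizations of the same denoiser.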

Diffusion Models 2

  • Guidance
  • Score Distillation Sampling
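
Classifier-free guidance, one common form of the guidance above, combines two forward passes per step: the unconditional prediction plus a scaled difference toward the conditional one. A minimal sketch where the two arrays stand in for network outputs:

```python
# Classifier-free guidance on stand-in noise predictions.
import numpy as np

def cfg(eps_uncond, eps_cond, w):
    """Guided noise prediction with guidance scale w (w=1 -> conditional)."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_uncond = np.zeros(4)   # stand-in for the unconditional forward pass
eps_cond = np.ones(4)      # stand-in for the conditional forward pass

assert np.allclose(cfg(eps_uncond, eps_cond, 1.0), eps_cond)
# w > 1 extrapolates past the conditional prediction, sharpening conditioning.
assert np.allclose(cfg(eps_uncond, eps_cond, 7.5), 7.5 * np.ones(4))
```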

Sequence Generative Models

  • Auto-regressive and full-sequence models
  • Compounding errors and stability

Bridging Domain Gaps

  • Neural scene representation and rendering
  • Domain gap challenges in vision

Non-Generative Representation Learning

  • Alternative representation learning techniques
  • Applications in computer vision

Open Problems in Representation Learning

  • What are objects and how to learn them?
  • Discovering geometry in representations

Guest Lecture: Self-Supervised Learning for Vision

guest lecture
  • Guest Lecture

Module 3: Vision for Embodied Agents

Introduction to Robotic Perception

  • Definition and challenges of embodied agents
  • Intersection with vision

Sequence Generative Modeling for Decision-Making

  • Diffusion-based planning and policy models

Vision for Inverse Kinematics and State Estimation

  • Inverse kinematics and state estimation models
  • Applications in robotics
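
Inverse kinematics, listed above, is the kind of estimation loop that vision-based state estimates feed into. A sketch of damped-least-squares IK for a planar two-link arm; link lengths, target, and damping are toy values, not from the course:

```python
# Damped-least-squares inverse kinematics for a planar 2-link arm.
import numpy as np

L1, L2 = 1.0, 1.0  # link lengths (illustrative)

def fk(q):
    """Forward kinematics: joint angles (2,) -> end-effector position (2,)."""
    return np.array([L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
                     L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])])

def jacobian(q):
    """Analytic Jacobian of fk with respect to the joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

target = np.array([1.2, 0.8])   # reachable target (illustrative)
q = np.array([0.3, 0.3])        # initial joint angles
for _ in range(100):
    err = target - fk(q)
    J = jacobian(q)
    # Damping keeps the solve well-posed near singular configurations.
    q = q + np.linalg.solve(J.T @ J + 1e-3 * np.eye(2), J.T @ err)
```

In a vision-driven pipeline, `target` would come from a perception module estimating the goal pose rather than being hard-coded.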