Course Contents
This course dives into advanced concepts in computer vision. The first focus is geometry in computer vision, including image formation, representation theory for vision, classic multi-view geometry, multi-view geometry in the age of deep learning, differentiable rendering, neural scene representations, correspondence estimation, optical flow computation, and point tracking.
Next, we explore generative modeling and representation learning, including image and video generation, guidance in diffusion models, and conditional probabilistic models, as well as representation learning in the form of contrastive and masking-based methods.
Finally, we will explore the intersection of robotics and computer vision with "vision for embodied agents", investigating the role of vision in decision-making, planning, and control.
This is an advanced graduate-level class. You must have a working knowledge of the following topics, i.e., be able to work with them in NumPy, SciPy, and PyTorch. These basics will not be reviewed in class, and TAs will not be able to help you with them.
Deep Learning: Proficiency in Python, NumPy, and PyTorch; vectorized programming; training deep neural networks; convolutional neural networks, transformers, MLPs, and backpropagation.
Linear Algebra: Vector spaces, matrix-matrix products, matrix-vector products, change of basis, inner products and norms, eigenvalues and eigenvectors, singular value decomposition, Fourier transform, convolution.
6.8300 lectures are held in room 26-100 on Tuesdays and Thursdays.
Collaboration Policy
Problem sets (psets) must be written up individually and should reflect your own work. However, you may discuss them with your peers, TAs, and instructors.
You should not copy or share complete solutions, or ask others whether your answer is correct (in person or via Piazza/Canvas).
If you work with anyone on the pset (other than TAs and instructors), list their names at the top of the pset.
AI Assistants Policy
Our policy for using ChatGPT and other AI assistants is identical to our policy for using human assistants.
Just as you can come to office hours and ask a human questions (about the lecture material, clarifications about pset questions, tips for getting started, etc.), you are very welcome to do the same with AI assistants.
But just as you are not allowed to ask an expert friend to do your homework for you, you should not ask an expert AI to do it either.
If it is ever unclear, just imagine the AI as a human and apply the same norm as you would with a human.
If you work with any AI on a pset, briefly describe which AI and how you used it at the top of the pset (a few sentences is enough).
Grading Policy
Grading will be split between four module-specific problem sets and a final project:
65% | Problem sets (6 problem sets)
35% | Final project: proposal (5%) + final report and video/presentation (30%)
The final project will be a research project on a perception topic of your choice. You will run experiments and analyses to explore your research question, then write up your research as a blog post that explains the background material, your new investigations, and the results you found. You are encouraged to include plots, animations, and interactive graphics to make your findings clear. Some examples of good research blog posts are here: [1] [2] [3] [4]. The final project will be graded on clarity and insight as well as on the novelty and depth of the experiments and analysis. Detailed guidance will be given later in the semester.
We encourage you to discuss the ideas in your problem sets with other students and AI tools, but we expect you to code up your solutions individually. For paper presentations and final projects, students may form teams of two to three. You may use up to 5 late days in total for the problem sets (in exceptional situations, contact the course staff).
Module 0: Introduction to Computer Vision
Introduction to Vision
Module 1: Geometry, 3D and 4D
What is an Image: Pinhole Cameras & Projective Geometry
Linear Image Processing & Transformations
Representation Theory in Vision
No Class (Monday Schedule)
Geometric Deep Learning and Vision
Correspondence, Optical Flow, and Scene Flow
Correspondence, Optical Flow, and Scene Flow 2
Multi-View Geometry
Guest Lecture: Deep Learning for 3D Reconstruction
Data Structures and Signal Parameterizations
Differentiable Rendering & Novel View Synthesis
Differentiable Rendering & Novel View Synthesis 2
Prior-Based 3D Reconstruction and Novel View Synthesis
Student Holiday: Spring Break
Student Holiday: Spring Break
Open Problems in Geometry, 3D, and 4D
Module 2: Unsupervised Representation Learning and Generative Modeling
Introduction to Representation Learning and Generative Modeling
Latent Variable Models and VAEs
Diffusion Models
Diffusion Models 2
Sequence Generative Models
Bridging Domain Gaps
Non-Generative Representation Learning
Open Problems in Representation Learning
Guest Lecture: Self-Supervised Learning for Vision
Module 3: Vision for Embodied Agents
Introduction to Robotic Perception
Sequence Generative Modeling for Decision-Making
Vision for Inverse Kinematics and State Estimation |