AI Research Experiences

Harvard CS197

Learn to do applied deep learning research

In this course, you will learn the practical skills required for applied deep learning work, including hands-on experience with model development. You will learn the technical writing skills required for applied AI research, including experience composing different elements of a full research paper.

Instructed by Professor Pranav Rajpurkar.

Lecture Notes

You Complete My Sandwiches

Exciting Advances with AI Language Models

Lecture 1 notes

  • Interact with language models to test their capabilities using zero-shot and few-shot learning.

  • Learn to build simple apps with GPT-3’s text completion and Codex’s code generation abilities.

  • Learn how language models can have a pernicious tendency to reflect societal biases.

The Zen of Python

Software Engineering Fundamentals

Lecture 2 notes

  • Edit Python codebases effectively using the VSCode editor.

  • Use git and conda comfortably in your coding workflow.

  • Debug without print statements using breakpoints and logpoints.

  • Use linting to find errors and improve Python style.

Shoulders of Giants

Reading AI Research Papers

Lecture 3 notes

  • Conduct a literature search to identify papers relevant to a topic of interest.

  • Read a machine learning research paper and summarize its contributions.

  • Summarize previous works in an area.

In-Tune with Jazz Hands

Fine-tuning a Language Model using Hugging Face

Lecture 4 notes

  • Load up and process a natural language processing dataset using the datasets library.

  • Tokenize a text sequence, and understand the steps used in tokenization.

  • Construct a dataset and training step for causal language modeling.
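The label construction at the heart of causal language modeling can be sketched without the Hugging Face libraries at all. The example below uses hypothetical token ids; in practice, the Transformers causal LM models perform this shift internally when you pass labels equal to the input ids.

```python
# Toy illustration of causal language modeling targets:
# the label for position i is the token at position i + 1.

def make_causal_lm_example(token_ids):
    """Given a tokenized sequence, build (input, label) pairs for next-token prediction."""
    inputs = token_ids[:-1]   # all tokens except the last
    labels = token_ids[1:]    # the same sequence shifted left by one
    return inputs, labels

# Hypothetical token ids for a short sentence
tokens = [464, 3797, 3332, 319, 262, 2603]
inputs, labels = make_causal_lm_example(tokens)
print(inputs)  # [464, 3797, 3332, 319, 262]
print(labels)  # [3797, 3332, 319, 262, 2603]
```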

Lightning McTorch

Fine-tuning a Vision Transformer using Lightning

Lecture 5 notes

  • Interact with code to explore data loading and tokenization of images for Vision Transformers.

  • Parse code for PyTorch architecture and modules for building a Vision Transformer.

  • Get acquainted with an example training workflow with PyTorch Lightning.
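The patch-based image loading mentioned above reduces to simple arithmetic. Here is a minimal pure-Python sketch, assuming the standard ViT-Base settings of a 224x224 RGB image and 16x16 patches:

```python
# How a Vision Transformer turns an image into a token sequence:
# the image is cut into non-overlapping P x P patches, and each patch
# is flattened into a vector before being projected to the model dimension.

def vit_sequence_shape(height, width, channels, patch_size):
    """Return (number of patches, flattened patch dimension)."""
    assert height % patch_size == 0 and width % patch_size == 0
    num_patches = (height // patch_size) * (width // patch_size)
    patch_dim = channels * patch_size * patch_size
    return num_patches, patch_dim

# Standard ViT-Base settings: 224x224 RGB image, 16x16 patches
num_patches, patch_dim = vit_sequence_shape(224, 224, 3, 16)
print(num_patches, patch_dim)  # 196 768
```

The resulting sequence of 196 patch tokens (plus a class token) is what the Transformer encoder actually processes.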

Moonwalking with PyTorch

Solidifying PyTorch Fundamentals

Lectures 6+7 notes

  • Perform Tensor operations in PyTorch.

  • Understand the backward and forward passes of a neural network in the context of Autograd.

  • Detect common issues in PyTorch training code.
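The forward and backward passes that Autograd automates can be traced by hand on a toy scalar function. A minimal pure-Python sketch (not PyTorch) of the chain rule for y = (w·x + b)²:

```python
# A hand-computed forward and backward pass for y = (w * x + b)**2,
# mirroring what Autograd does when you call loss.backward().

def forward(x, w, b):
    z = w * x + b      # linear step
    y = z * z          # squared output (our "loss")
    return y, z

def backward(x, z):
    # Chain rule: dy/dz = 2z, dz/dw = x, dz/db = 1
    dy_dz = 2 * z
    dw = dy_dz * x     # dy/dw
    db = dy_dz * 1     # dy/db
    return dw, db

y, z = forward(x=3.0, w=2.0, b=1.0)   # z = 7.0, y = 49.0
dw, db = backward(x=3.0, z=z)
print(y, dw, db)  # 49.0 42.0 14.0
```

PyTorch records exactly this kind of computation graph during the forward pass and replays the chain rule in reverse during the backward pass.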

Experiment Organization Sparks Joy

Organizing Model Training with Weights & Biases and Hydra

Lectures 8+9 notes

  • Manage experiment logging and tracking through Weights & Biases.

  • Perform hyperparameter search with Weights & Biases Sweeps.

  • Manage complex configurations using Hydra.

I Dreamed a Dream

A Framework for Generating Research Ideas

Lectures 10+11 notes

  • Identify gaps in a research paper, including in the research question, experimental setup, and findings.

  • Generate ideas to build on a research paper, thinking about the elements of the task of interest, the evaluation strategy, and the proposed method.

  • Iterate on your ideas to improve their quality.

Today Was a Fairytale

Structuring a Research Paper

Lectures 12+13 notes

  • Deconstruct the elements of a research paper and their sequence.

  • Take notes on the global and local structure of research paper writing.

Deep Learning on Cloud Nine

AWS EC2 for Deep Learning: Setup, Optimization, and Hands-on Training with CheXzero

Lectures 14+15 notes

  • Understand how to set up and connect to an AWS EC2 instance for deep learning.

  • Learn how to modify deep learning code for use with GPUs.

  • Gain hands-on experience running the model training process using a real codebase.

Make Your Dreams Come Tuned

Fine-Tuning Your Stable Diffusion Model

Lectures 16+17 notes

  • Create and fine-tune Stable Diffusion models using a Dreambooth template notebook.

  • Use AWS to accelerate the training of Stable Diffusion models with GPUs.

  • Work with unfamiliar codebases and use new tools, including Dreambooth, Colab, Accelerate, and Gradio, without necessarily needing a deep understanding of them.

Research Productivity Power-Ups

Tips to Manage Your Time and Efforts

Lecture 18 notes

  • Learn how to use update meetings and working sessions to stay aligned and make progress on a project.

  • Understand how to use various tools and techniques to improve team communication and project organization.

  • Learn strategies for organizing your efforts on a project, considering the stage of the project and the various tasks involved.

The AI Ninja

Making Progress and Impact in AI Research

Lecture 19 notes

  • Learn how to make steady progress in research, including managing your relationship with your advisor and the skills to develop.

  • Gain a deeper understanding of how to increase the impact of your work.

Bejeweled

Tips for Creating High-Quality Slides

Lecture 20 notes

  • Apply key principles of the assertion-evidence approach for creating effective slides for talks.

  • Identify common pitfalls in typical slide presentations and strategies for avoiding them.

  • Apply the techniques learned in this lecture to real-world examples of research talk slides to improve their effectiveness.

Model Showdown

Statistical Testing to Compare Model Performances

Lecture 21 notes

  • Understand the different statistical tests that can be used to compare machine learning models, including McNemar's test, the paired t-test, and the bootstrap method.

  • Be able to implement these statistical tests in Python to evaluate the performance of two models on the same test set.

  • Be able to select an appropriate test for a given research question, including tests for statistical superiority, non-inferiority, and equivalence.
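As a concrete starting point, McNemar's test is short enough to sketch in pure Python. The counts below are hypothetical; in practice you might use a library implementation such as the one in statsmodels.

```python
# McNemar's test (with continuity correction) for comparing two classifiers
# evaluated on the same test set. It uses only the "discordant" examples:
# b = model A right / model B wrong, c = model A wrong / model B right.
import math

def mcnemar_statistic(b, c):
    """Chi-squared statistic with continuity correction; 1 degree of freedom."""
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

def mcnemar_p_value(b, c):
    """Approximate p-value from the chi-squared(1) survival function."""
    stat = mcnemar_statistic(b, c)
    # For 1 degree of freedom, the chi-squared survival function
    # equals erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical counts: A correct where B is wrong 30 times, the reverse 12 times
stat = mcnemar_statistic(30, 12)
print(round(stat, 3))  # 6.881
print(mcnemar_p_value(30, 12))
```

A small p-value here suggests the two models' error patterns differ; note the test says nothing about which differences matter in practice.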

FAQs

  • CS 197 is a course in applied deep learning research. In this course, you will learn the practical skills required for applied deep learning work, including hands-on experience with method development, model training at scale, error analysis, and model deployment. You will learn the technical writing skills required for applied AI research, including experience composing different elements of a full research paper. Through structured assignments, you will tackle a scoped-out research project in a small team from conception to co-authoring a manuscript.

  • After the course, you should be comfortable with the practical skills required for applied deep learning engineering and research work, including hands-on experience with method development, model training at scale, and deploying your model. Some more concrete learning objectives include:

    Develop strong conceptual background in deep learning

    • Read and (mostly) understand new papers in ML research

    • Understand theory behind important model architectures and algorithms, including Transformers

    • Know when to use certain evaluation metrics and statistical tests

    Gain skills needed to execute deep learning projects

    • Write your own or build upon modular research code

    • Use standard tooling for ML projects, including conda environments, remote clusters, Weights & Biases, etc.

    • Implement and perform data loading, model training, and results logging in Python/PyTorch

    • Solve common ML problems, e.g. how to preprocess data, handle class imbalance, fine-tune models for downstream tasks

    Present your research effectively

    • Release clear, accessible, and well-documented code on GitHub

    • Write a high-quality technical research report, with clear sections and figures

    • Deliver an engaging research talk

  • There is an application to enroll in CS197. The applications are now closed.

    Diversity and inclusion

    CS 197 welcomes a diversity of thoughts, perspectives, and experiences. The CS 197 teaching staff respects our students’ identities, including but not limited to race, gender, class, sexuality, socioeconomic status, religion, and ability, and we strive to create a learning environment where every student feels welcome and valued. We can only accomplish this goal with your help. If something is said in class (by anyone) or you come across instructional material that made you uncomfortable, please talk to the instructors about it (even if anonymously).

  • If you’re not at Harvard, you can follow this course. We will be sharing course materials online. Sign up at the end of the webpage to get updates.

Course Instruction

The course, in its first offering at Harvard and to the public, has been designed by Professor Pranav Rajpurkar with Elaine Liu & Xiaoli Yang; several members and friends of the Rajpurkar Lab have contributed to an early draft of course materials, including Lucy He, Julie Chen, Vish Rao, Jon Williams, Ryan Chi, Nathan Chi, Mark Endo, Chenwei Wu, Kathy Yu, Ryan Han, Oishi Banerjee, Sameer Khanna, Zahra Shakeri, Julian Acosta, Ethan Chi, Vignav Ramesh, Priya Khandelwal. The Teaching Fellow for the course in Fall 2022 is Katherine Tian.