CS291K - Schedule

Date | Topic | Comments

Mar 30 Course Introduction
Blog: The Bitter Lesson, Rich Sutton

Apr 1 Quiz
Note: Apr 8, HFH 1132, Talk: Adaptive Inference in Large Language Models
Apr 6 Topic: Transformers
Paper: Attention Is All You Need
Basic Concepts
Apr 8 Topic: GPT, ChatGPT
Paper: GPT: Improving Language Understanding by Generative Pre-Training
Paper: GPT-3: Language Models are Few-Shot Learners
Paper: InstructGPT: Training Language Models to Follow Instructions with Human Feedback

Apr 13 Topic: Retrieval-Augmented Generation
Paper: Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Paper: The Faiss library

Apr 15 Topic: Chain of Thought
Paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Paper: Self-Consistency Improves Chain of Thought Reasoning in Language Models
Paper: Tree of Thoughts: Deliberate Problem Solving with Large Language Models
 
Apr 20 Topic: Prompt Engineering and LLM Agents
Blog: Prompt Engineering
Paper: ReAct: Synergizing Reasoning and Acting in Language Models
Paper: Reflexion: Language Agents with Verbal Reinforcement Learning
 
Apr 24 Topic: More LLM Agents
Paper: Toolformer: Language Models Can Teach Themselves to Use Tools
Paper: AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Paper: OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
Practice: OpenAI Tool Calling, OpenClaw, NanoClaw
Project Proposal Due
Apr 27 Topic: Multimodality
Paper: CLIP: Learning Transferable Visual Models From Natural Language Supervision
Paper: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Paper: Visual Instruction Tuning

Apr 29 Topic: Long Context / KV Cache
Paper: Longformer: The Long-Document Transformer
Paper: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Paper: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

May 4 Topic: Fine-Tuning
Paper: Adapters: Parameter-Efficient Transfer Learning for NLP
Paper: LoRA: Low-Rank Adaptation of Large Language Models
Paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model

May 6 Topic: Make it smaller
Paper: Fast Inference from Transformers via Speculative Decoding
Paper: EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test (please also discuss the DeepSeek-V3 Technical Report, MTP part)

May 11 Topic: Mixture of Experts
Paper: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Paper: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Paper: DeepSeek-V3 Technical Report (MoE part)
Paper review or System Play due
May 13 Topic: Reasoning
Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper: Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
 
May 18 Topic: Misc.
Paper: A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
Paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

May 20 Exam
May 25 Holiday

May 27 Project Presentation
Jun 1 Project Presentation
Jun 3 Project Presentation
Jun 10 Project Report Due