CS291A - Schedule


 

Mar 31 Course Introduction
The Bitter Lesson, Rich Sutton
 
Apr 4 Transformers / Quiz: Basic Concepts
Apr 7

Topic: GPT, ChatGPT (Weizhi)

Paper: GPT: Improving Language Understanding by Generative Pre-Training (Weizhi)

Paper: GPT-3: Language Models are Few-Shot Learners (Weizhi)

Paper: InstructGPT: Training Language Models to Follow Instructions with Human Feedback (Weizhi)

 
Apr 11

Topic: Retrieval-Augmented Generation

Paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Xuan)

Paper: From Local to Global: A Graph RAG Approach to Query-Focused Summarization (Xuan)

Paper: SimCSE: Simple Contrastive Learning of Sentence Embeddings (Xuan)

Practice: Try LangChain to build a question-answering chatbot (see the sketch below)
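
A minimal sketch of what the LangChain exercise could look like, assuming the classic LangChain API (import paths differ across releases, e.g., langchain_community / langchain_openai), an OpenAI API key in the environment, and a placeholder document notes.txt; the file name, model choice, and chunking parameters are illustrative, not part of the assignment.

```python
# RAG-style QA chatbot sketch using classic LangChain APIs.
# Assumes: pip install langchain openai faiss-cpu, and OPENAI_API_KEY is set.
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# 1. Load the source text and split it into overlapping chunks.
docs = TextLoader("notes.txt").load()          # "notes.txt" is a placeholder corpus
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# 2. Embed the chunks and index them in an in-memory FAISS vector store.
index = FAISS.from_documents(chunks, OpenAIEmbeddings())

# 3. Combine a retriever and a chat model into a question-answering chain.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=index.as_retriever(search_kwargs={"k": 4}),
)

# 4. Ask a question; the top-k retrieved chunks are stuffed into the prompt.
print(qa.run("What are the main topics covered in these notes?"))
```

The default "stuff" chain concatenates the retrieved chunks directly into the prompt; for longer contexts LangChain also provides map_reduce and refine chain types.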

Apr 14 Topic: Intelligent Agents
Paper: Reflexion: Language Agents with Verbal Reinforcement Learning
Paper: OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
Survey: Agent frameworks: AutoGen, CrewAI, Swarm, and Mica (our own) (Sirui)
 
Apr 18 Topic: Chain of Thought
Paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Paper: Self-Consistency Improves Chain of Thought Reasoning in Language Models
Paper: Tree of Thoughts: Deliberate Problem Solving with Large Language Models
 
Apr 21 Topic: Prompt Engineering and Actionable LLMs
Blog:  Prompt Engineering
Paper: ReAct: Synergizing Reasoning and Acting in Language Models
Paper: Toolformer: Language Models Can Teach Themselves to Use Tools
Practice: Read tutorials, e.g., this one, and try GPT-4 function calling (see the sketch below)
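
A minimal sketch of GPT-4 function calling with the OpenAI Python SDK (v1-style client); the get_weather tool, its JSON schema, and the example query are illustrative assumptions, not taken from the linked tutorial.

```python
# Function-calling round trip: the model asks for a tool call, we run it,
# then return the result so the model can compose the final answer.
# Assumes: pip install openai, and OPENAI_API_KEY is set in the environment.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    """Hypothetical local tool; a real app would call a weather API here."""
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 21})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Santa Barbara?"}]
response = client.chat.completions.create(model="gpt-4", messages=messages, tools=tools)
msg = response.choices[0].message

if msg.tool_calls:                      # the model chose to call our function
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    messages.append(msg)                # keep the assistant's tool-call turn
    messages.append({"role": "tool", "tool_call_id": call.id,
                     "content": get_weather(**args)})
    final = client.chat.completions.create(model="gpt-4", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)
```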
 
Apr 25

Topic: Multimodality

Paper: CLIP: Learning Transferable Visual Models From Natural Language Supervision

Paper: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

Paper: Visual Instruction Tuning

Project Proposal Due
May 2

Topic: Long Context

Paper: Longformer: The Long-Document Transformer

Paper: Big Bird: Transformers for Longer Sequences
Paper: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
 
May 5

Topic: Prefix and Adapter Tuning
Paper: Adapters: Parameter-Efficient Transfer Learning for NLP

Paper: Prefix-Tuning: Optimizing Continuous Prompts for Generation

Paper: LoRA: Low-Rank Adaptation of Large Language Models

 
May 9

Topic: Make it smaller

Paper: DeepSeek-V3 Technical Report (knowledge distillation part)

Paper: Fast Inference from Transformers via Speculative Decoding

Paper: FlexiDepth: Dynamic Layer-skipping in Pre-trained LLMs (brand new, Xuan)

 
May 12 Topic: Mixture of Experts
Paper: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Paper: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Paper: DeepSeek-V3 Technical Report (MoE part)
Paper review or System Play due
May 19 Topic: Misc.
Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper: A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
 
May 23 Exam  
May 26

Holiday

May 30 Project Presentation
Jun 2 Project Presentation
 
Jun 6 Project Presentation