CS291K Special Topics in Foundation Models - Schedule

Date	Topic	Comments
Mar 30	Course Introduction The Bitter Lesson, Rich Sutton
Apr 1	Quiz + April 8, HFH 1132, Talk: Adaptive Inference in Large Language Models
Apr 6	Topic: Transformers Paper: Attention Is All You Need	Basic Concepts
Apr 8	Topic: GPT, ChatGPT Paper: GPT: Improving Language Understanding by Generative Pre-Training Paper: GPT-3: Language Models are Few-Shot Learners Paper: InstructGPT: Training Language Models to Follow Instructions with Human Feedback
Apr 13	Topic: Retrieval-Augmented Generation Paper: Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models (Gurusha Juneja) Paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks Paper: The Faiss library (Sohaib)
Apr 15	Topic: Chain of Thought Paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Gen Tamada) Paper: Self-Consistency Improves Chain of Thought Reasoning in Language Models (Wren Vandervelde) Paper: Tree of Thoughts: Deliberate Problem Solving with Large Language Models (Javin Zipkin)
Apr 20	Topic: Prompt Engineering and LLM Agents Blog: Prompt Engineering Paper: ReAct: Synergizing Reasoning and Acting in Language Models (Zhaotian Weng) Paper: Reflexion: Language Agents with Verbal Reinforcement Learning (Sterling Hsu)
Apr 22	Topic: More LLM Agents Paper: Toolformer: Language Models Can Teach Themselves to Use Tools (Nik Belle) Paper: AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (Julia Novick) Paper: OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning (Dakota Bames) Practice: OpenAI Tool Calling, OpenClaw, NanoClaw
Apr 27	Topic: Multimodality Paper: CLIP: Learning Transferable Visual Models From Natural Language Supervision (Ruijie Zhang) Paper: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (Klaus Zhang) Paper: Visual Instruction Tuning (Chuhan Li)	Project Proposal Due
Apr 29	Topic: Long Context / KV Cache Paper: Longformer: The Long-Document Transformer (Joyce Chen) Paper: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (Naomi Rehman) Paper: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (Xinghan Yang)
May 4	Topic: Fine Tuning Paper: Adapters: Parameter-Efficient Transfer Learning for NLP (Runyu Han) Paper: LoRA: Low-Rank Adaptation of Large Language Models (Mihir Srivastava) Paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Yupeng Su)
May 6	Topic: Make it smaller Paper: Fast Inference from Transformers via Speculative Decoding (Olivia Chen) Paper: EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test (please also discuss DeepSeek-V3 Technical Report (MTP part)) (Surya Gunukula)
May 11	Topic: Mixture of Experts Paper: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (Jingtao) Paper: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (Jingsi Hu) Paper: DeepSeek-V3 Technical Report (MoE part) (Yue)	Paper review or System Play due
May 13	Topic: Reasoning Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (Chengzhi Liu) Paper: Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning (Zhiyuan Wang)
May 18	Topic: Misc. Paper: A Time Series is Worth 64 Words: Long-term Forecasting with Transformers (Quinn Koster) Paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Edison Chen)
May 20	Exam
May 25	Holiday
May 27	Project Presentation
Jun 1	Project Presentation
Jun 3	Project Presentation
Jun 10	Project Report Due