DATA8005: Advanced Natural Language Processing

Course Information

Instructor

Lecture

Course Description

Natural language processing (NLP) is the study of human language from a computational perspective. This is an introductory graduate-level course on NLP aimed at students who are interested in doing cutting-edge research in the field. We will cover recent developments in core techniques and modern advances in NLP, especially in the era of large language models (LLMs). We will also survey recent NLP research topics, including language grounding, agents, multimodality, interactivity, and interoperability. Through a final project, students will gain the skills and experience needed to understand, design, implement, and test large language models. Through paper readings and discussions, students will also learn about cutting-edge research topics and how to conduct NLP research. We may also host invited speakers.

Prerequisites

Students are required to have prior knowledge of undergraduate linear algebra, probability and statistics, and machine learning or deep learning. Familiarity with Python programming is required. Prior coursework in introductory natural language processing is recommended.

Course Materials

There is no required textbook for this course; Natural Language Processing by Jacob Eisenstein is recommended if you would like to read more about NLP. Readings from papers, blogs, tutorials, and book chapters will be posted on the course website.

Grading

Course Schedule

Date Topic Material Event Due
Week 1
Sep 6
Canceled due to bad weather
Week 2
Sep 13
Introduction (Tao Yu)
[slides]
Readings
Week 3
Sep 20
Introduction to LLMs (Tao Yu)
[slides]
Readings, Others
Event: Registration out
Week 4
Sep 27
Introduction to LLMs (Tao Yu)
[slides]
The Llama 3 Herd of Models
[slides]
Readings
Week 5
Oct 4
LM post-training 1: SFT, instruction tuning (Sihui Ji, Tianzhe Chu)
[slides]
LM data and evaluation (Jianrui Wu, Tianle Li)
[slides]
Readings
Due: Project registration
Week 6
Oct 11
No class
Week 7
Oct 18
No class
Week 8
Oct 25
LM safety, bias, and privacy (Yifeng Lin, Pinglu Gong, Fengyi Xu)
[slides]
LM post-training 2: alignment, RLHF/DPO (Runhui Huang, Yiyang Wang)
[slides]
Readings
Due: Project proposal
Week 9
Nov 1
Efficient LM adaptation (Sidi Yang, Yatai Ji, Jing Xiong)
[slides]
Efficient LM training (Qi Guicheng, Shen Che, Zijian Ye)
[slides]
Readings
Week 10
Nov 8
Multimodal LMs 1 (Chenming Zhu, Pei Zhou, Yi Zhang)
[slides]
Multimodal LMs 2 (Mengzhao Chen, Tianshuo Yang, Chengqi Duan)
[slides]
Readings
Week 11
Nov 15
LLM/VLMs + Robotics 1 (Feng Chen, Ruizhe Liu)
[slides]
LLM/VLMs + Robotics 2 (Yi Chen, Lu Qiu)
[slides]
Readings, Others
Week 12
Nov 22
LLM/VLMs as Agents (Xinyuan Wang, Bowen Wang)
[slides]
Agents in the digital and physical world (Tao Yu)
Readings
Week 13
Nov 29
Embodied AI (guest lecture: Yanchao Yang)
Week 14
Dec 6
No class
Week 15
Dec 13
No class