I am an Assistant Professor in the Computer Science Department and an HKU-IDS Scholar at The University of Hong Kong, where I co-direct the HKU NLP Lab. I am the recipient of the 2021 Amazon Research Award. I spent one year in the UW NLP Group working with Noah Smith, Luke Zettlemoyer, and Mari Ostendorf. Previously, I completed my Ph.D. in Computer Science at Yale University, advised by Dragomir Radev. Before coming to Yale, I earned my master's at Columbia University, advised by Owen Rambow and Kathleen McKeown. Throughout my graduate studies, I spent several summers doing internships in industry, including at Salesforce Research and Microsoft Research.
My main research interest is in Natural Language Processing. The goal of my research is to design and build conversational natural language interfaces (NLIs) that can help humans explore and reason over data in any application (e.g., relational databases and mobile apps) in a robust and trusted manner.
Most recent publications on Google Scholar.
* indicates equal contribution.
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Hongjin Su*, Weijia Shi*, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A Smith, Luke Zettlemoyer, Tao Yu
Preprint, 2022
Coder Reviewer Reranking for Code Generation
Tianyi Zhang, Tao Yu, Tatsunori B Hashimoto, Mike Lewis, Wen-tau Yih, Daniel Fried, Sida I Wang
Preprint, 2022
DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
Yuhang Lai*, Chengxi Li*, Yiming Wang*, Tianyi Zhang*, Ruiqi Zhong*, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, Tao Yu
Preprint, 2022
Binding Language Models in Symbolic Languages
Zhoujun Cheng*, Tianbao Xie*, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith, Tao Yu
ICLR, 2023
Selective Annotation Makes Language Models Better Few-Shot Learners
Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR, 2023
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
Tianbao Xie*, Chen Henry Wu*, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu
EMNLP 2022. Long Paper
In-Context Learning for Few-Shot Dialogue State Tracking
Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A. Smith, Mari Ostendorf
Findings of EMNLP 2022. Long Paper
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Jiacheng Ye*, Jiahui Gao*, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong
EMNLP 2022. Long Paper
ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback
Jiacheng Ye, Jiahui Gao, Zhiyong Wu, Jiangtao Feng, Tao Yu, and Lingpeng Kong
Findings of EMNLP 2022. Long Paper
Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play
Qi Liu, Zihuiwen Ye, Tao Yu, Phil Blunsom, Linfeng Song
Findings of EMNLP 2022. Long Paper
NL2INTERFACE: Interactive Visualization Interface Generation from Natural Language Queries
Yiru Chen, Ryan Li, Austin Mac, Tianbao Xie, Tao Yu, Eugene Wu
IEEE Visualization Conference NLVIZ Workshop, 2022
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
with the BIG-bench team (442 authors)
Preprint, 2022
FOLIO: Natural Language Reasoning with First-Order Logic
with Simeng Han, Rui Zhang, Alexander R Fabbri, Xi Victoria Lin, Caiming Xiong, Dragomir Radev and many authors
Preprint, 2022
DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization
Ziming Mao*, Chen Henry Wu*, Ansong Ni, Yusen Zhang, Rui Zhang, Tao Yu, Budhaditya Deb, Chenguang Zhu, Ahmed H Awadallah, Dragomir Radev
ACL 2022. Long Paper
An Exploratory Study on Long Dialogue Summarization: What Works and What's Next
Yusen Zhang*, Ansong Ni*, Tao Yu, Rui Zhang, Chenguang Zhu, Budhaditya Deb, Asli Celikyilmaz, Ahmed Hassan Awadallah, Dragomir Radev
Findings of EMNLP 2021. Short Paper
SummerTime: Text Summarization Toolkit for Non-experts
Ansong Ni, Zhangir Azerbayev, Mutethia Mutuma, Troy Feng, Yusen Zhang, Tao Yu, Ahmed Hassan Awadallah, Dragomir Radev
EMNLP 2021. Demo Track
Testing Cross-Database Semantic Parsers Using Canonical Utterances
Heather Lent, Semih Yavuz, Tao Yu, Tong Niu, Yingbo Zhou, Dragomir Radev, Xi Victoria Lin
EMNLP 2021 Workshop: Evaluation & Comparison of NLP Systems. Best Paper Award
Logic-Consistency Text Generation from Semantic Parses
Chang Shu, Yusen Zhang, Xiangyu Dong, Peng Shi, Tao Yu, Rui Zhang
Findings of ACL 2021. Long Paper
QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization
Ming Zhong*, Da Yin*, Tao Yu, Ahmad Zaidi, Mutethia Mutuma, Rahul Jha, Ahmed Hassan Awadallah, Asli Celikyilmaz, Yang Liu, Xipeng Qiu and Dragomir Radev
NAACL 2021. Long Paper
DART: Open-Domain Structured Data Record to Text Generation
with Linyong Nan, Dragomir Radev, Rui Zhang, Neha Verma, Xi Victoria Lin, Caiming Xiong, Richard Socher and many authors.
NAACL 2021. Long Paper
SCoRe: Pre-Training for Context Representation in Conversational Semantic Parsing
Tao Yu, Rui Zhang, Alex Polozov, Christopher Meek, Ahmed Hassan Awadallah
ICLR 2021. Long Paper
GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing
Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir Radev, Richard Socher, Caiming Xiong
ICLR 2021. Long Paper
Semantic Evaluation for Text-to-SQL with Distilled Test Suites
Ruiqi Zhong, Tao Yu, Dan Klein
EMNLP 2020. Long Paper
Did You Ask a Good Question? A Cross-Domain Question Intention Classification Benchmark for Text-to-SQL
Yusen Zhang, Xiangyu Dong, Shuaichen Chang, Tao Yu, Peng Shi, Rui Zhang
EMNLP 2020 Workshop on Interactive and Executable Semantic Parsing. Short Paper
CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
Tao Yu, Rui Zhang, He Yang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter Lasecki, Dragomir Radev
EMNLP 2019. Long Paper
Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions
Rui Zhang, Tao Yu, He Yang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, Dragomir Radev
EMNLP 2019. Long Paper
SParC: Cross-Domain Semantic Parsing in Context
Tao Yu, Rui Zhang, Michihiro Yasunaga, Yi Chern Tan, Xi Victoria Lin, Suyi Li, Heyang Er, Irene Li, Bo Pang, Tao Chen, Emily Ji, Shreya Dixit, David Proctor, Sungrok Shim, Jonathan Kraft, Vincent Zhang, Caiming Xiong, Richard Socher and Dragomir Radev
ACL 2019. Long Paper
Twitter Sentiment in New York City Parks as Measure of Well-being
Richard A Plunz, Yijia Zhou, Maria Isabel Carrasco Vintimilla, Kathleen McKeown, Tao Yu, Laura Uguccioni, Maria Paola Sutto
Landscape and Urban Planning 2019. Long Paper
Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang and Dragomir Radev
EMNLP 2018. Long Paper
SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task
Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li and Dragomir Radev
EMNLP 2018. Long Paper
TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation
Tao Yu, Zifan Li, Zilin Zhang, Rui Zhang, Dragomir Radev
NAACL 2018. Short Paper
Cross-lingual Sentiment Transfer with Limited Resources
Mohammad Sadegh Rasooli, Noura Farra, Axinia Radeva, Tao Yu, and Kathleen McKeown
Machine Translation 2017. Long Paper
The Columbia-GWU System at the 2016 TAC KBP BeSt Evaluation
Owen Rambow, Tao Yu, Axinia Radeva, Sardar Hamidian, Alexander R. Fabbri, Debanjan Ghosh, Christopher Hidey, Tianrui Peng, Mona Diab, Kathleen McKeown, Smaranda Muresan
NIST TAC KBP Workshop, 2016. Long Paper
Tianbao Xie, Ph.D. student, 2022
Hongjin Su, Ph.D. student, 2022
Yiheng Xu, Ph.D. student, 2022, co-advised with Lingpeng Kong
Jiacheng Ye, Ph.D. student, 2022, co-advised with Lingpeng Kong
Zhoujun Cheng, Intern, 2022, SJTU BS/MS
Chen Henry Wu, Intern, 2022, THU BS → CMU PhD
Ming Zhong, Summer Intern, 2020, Fudan MS → UIUC PhD
Da Yin, Summer Intern, 2020, PKU BS → UCLA PhD
Yusen Zhang, Summer Intern, 2020, Emory MS → PSU PhD
Michihiro Yasunaga, Project Student, 2018-19, Yale BS → Stanford PhD
Building Natural Language Interfaces with Large Language Models, Nov. 2022
Amazon NLP Talks
Few-shot In-context Learning with Large Language Models, Jun. 2022
AllState Tech Talks
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models, Feb. 2022
ServiceNow Research (Prev. ElementAI)
SCoRe: Pre-Training for Context Representation in Conversational Semantic Parsing, Apr. 2021
NLP Reading Group, Google Research
Learning to Build Conversational Natural Language Interfaces, Jan - Mar. 2021
The University of Hong Kong (CS)
The National University of Singapore (CS)
The University of Wisconsin-Madison (CS)
Simon Fraser University (CS)
The University of Minnesota Twin Cities (CSE)
SParC: Cross-Domain Semantic Parsing in Context, Sep. 2019
Microsoft Research AI Breakthroughs Workshop, Redmond
Organizing Committee
ACL 2023
SUKI: Structured and Unstructured Knowledge Integration Workshop@NAACL 2022
IntEx-SemPar: Interactive and Executable Semantic Parsing Workshop@EMNLP 2020
Program Committee/Reviewer
ACL Rolling Review
ACL: 2020, 2021, 2022
EMNLP: 2019, 2020, 2021, 2022
ICLR: 2022
NeurIPS: 2022
NAACL: 2019, 2021
COLING: 2020, 2022
AACL-IJCNLP: 2020
Full Resume in PDF.
I did a cycling tour (~2 weeks) at the top of the world, Tibet (avg elevation: ~4500 meters). I am also a student pilot. I enjoy hiking, travelling, and cooking. I ski and skate, and I am learning tennis.
I am from Ningdu (a less developed but beautiful county) in Jiangxi Province, China. I’ve lived in (stayed for over 3 months) more than 15 cities, including Zhongshan, Beijing, Shanghai, Salt Lake City, New York City, San Francisco, New Haven, Columbus, Honolulu, and San Diego. I've also visited over 60 cities around the world.