Tao Yu (余涛)

Assistant Professor
Department of Computer Science
Musketeers Foundation Institute of Data Science
The University of Hong Kong
Postdoc, CSE, University of Washington

tao.yu.nlp [AT] gmail.com

Bio

I am an Assistant Professor in the Computer Science Department and an HKU-IDS Scholar at The University of Hong Kong, where I co-direct the HKU NLP Lab. I am the recipient of the 2021 Amazon Research Award. I spent one year in the UW NLP Group working with Noah Smith, Luke Zettlemoyer, and Mari Ostendorf. Previously, I completed my Ph.D. in Computer Science at Yale University, advised by Dragomir Radev. Before coming to Yale, I received my master's at Columbia University, advised by Owen Rambow and Kathleen McKeown. Throughout my graduate studies, I spent several summers interning in industry, including at Salesforce Research and Microsoft Research.

My main research interest is in Natural Language Processing. The goal of my research is to design and build conversational natural language interfaces (NLIs) that can help humans explore and reason over data in any application (e.g., relational databases and mobile apps) in a robust and trusted manner.

Publications

Most recent publications on Google Scholar.
* indicates equal contribution.

One Embedder, Any Task: Instruction-Finetuned Text Embeddings

Hongjin Su*, Weijia Shi*, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A Smith, Luke Zettlemoyer, Tao Yu

Preprint, 2022

Coder Reviewer Reranking for Code Generation

Tianyi Zhang, Tao Yu, Tatsunori B Hashimoto, Mike Lewis, Wen-tau Yih, Daniel Fried, Sida I Wang

Preprint, 2022

DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

Yuhang Lai*, Chengxi Li*, Yiming Wang*, Tianyi Zhang*, Ruiqi Zhong*, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, Tao Yu

Preprint, 2022

Binding Language Models in Symbolic Languages

Zhoujun Cheng*, Tianbao Xie*, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith, Tao Yu

ICLR, 2023

Selective Annotation Makes Language Models Better Few-Shot Learners

Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu

ICLR, 2023

UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models

Tianbao Xie*, Chen Henry Wu*, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu

EMNLP 2022. Long Paper

In-Context Learning for Few-Shot Dialogue State Tracking

Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A. Smith, Mari Ostendorf

Findings of EMNLP 2022. Long Paper

ZeroGen: Efficient Zero-shot Learning via Dataset Generation

Jiacheng Ye*, Jiahui Gao*, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong

EMNLP 2022. Long Paper

ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback

Jiacheng Ye, Jiahui Gao, Zhiyong Wu, Jiangtao Feng, Tao Yu, and Lingpeng Kong

Findings of EMNLP 2022. Long Paper

Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play

Qi Liu, Zihuiwen Ye, Tao Yu, Phil Blunsom, Linfeng Song

Findings of EMNLP 2022. Long Paper

NL2INTERFACE: Interactive Visualization Interface Generation from Natural Language Queries

Yiru Chen, Ryan Li, Austin Mac, Tianbao Xie, Tao Yu, Eugene Wu

IEEE Visualization Conference NLVIZ Workshop, 2022

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

with the BIG-bench team (442 authors)

Preprint, 2022

FOLIO: Natural Language Reasoning with First-Order Logic

with Simeng Han, Rui Zhang, Alexander R Fabbri, Xi Victoria Lin, Caiming Xiong, Dragomir Radev and many authors

Preprint, 2022

DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization

Ziming Mao*, Chen Henry Wu*, Ansong Ni, Yusen Zhang, Rui Zhang, Tao Yu, Budhaditya Deb, Chenguang Zhu, Ahmed H Awadallah, Dragomir Radev

ACL 2022. Long Paper

An Exploratory Study on Long Dialogue Summarization: What Works and What's Next

Yusen Zhang*, Ansong Ni*, Tao Yu, Rui Zhang, Chenguang Zhu, Budhaditya Deb, Asli Celikyilmaz, Ahmed Hassan Awadallah, Dragomir Radev

Findings of EMNLP 2021. Short Paper

SummerTime: Text Summarization Toolkit for Non-experts

Ansong Ni, Zhangir Azerbayev, Mutethia Mutuma, Troy Feng, Yusen Zhang, Tao Yu, Ahmed Hassan Awadallah, Dragomir Radev

EMNLP 2021. Demo Track

Testing Cross-Database Semantic Parsers Using Canonical Utterances

Heather Lent, Semih Yavuz, Tao Yu, Tong Niu, Yingbo Zhou, Dragomir Radev, Xi Victoria Lin

EMNLP 2021 Workshop: Evaluation & Comparison of NLP Systems. Best Paper Award

Logic-Consistency Text Generation from Semantic Parses

Chang Shu, Yusen Zhang, Xiangyu Dong, Peng Shi, Tao Yu, Rui Zhang

Findings of ACL 2021. Long Paper

QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization

Ming Zhong*, Da Yin*, Tao Yu, Ahmad Zaidi, Mutethia Mutuma, Rahul Jha, Ahmed Hassan Awadallah, Asli Celikyilmaz, Yang Liu, Xipeng Qiu and Dragomir Radev

NAACL 2021. Long Paper

DART: Open-Domain Structured Data Record to Text Generation

with Linyong Nan, Dragomir Radev, Rui Zhang, Neha Verma, Xi Victoria Lin, Caiming Xiong, Richard Socher and many authors.

NAACL 2021. Long Paper

SCoRe: Pre-Training for Context Representation in Conversational Semantic Parsing

Tao Yu, Rui Zhang, Alex Polozov, Christopher Meek, Ahmed Hassan Awadallah

ICLR 2021. Long Paper

GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing

Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir Radev, Richard Socher, Caiming Xiong

ICLR 2021. Long Paper

Semantic Evaluation for Text-to-SQL with Distilled Test Suites

Ruiqi Zhong, Tao Yu, Dan Klein

EMNLP 2020. Long Paper

Did You Ask a Good Question? A Cross-Domain Question Intention Classification Benchmark for Text-to-SQL

Yusen Zhang, Xiangyu Dong, Shuaichen Chang, Tao Yu, Peng Shi, Rui Zhang

EMNLP 2020 Workshop on Interactive and Executable Semantic Parsing. Short Paper

CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases

Tao Yu, Rui Zhang, He Yang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter Lasecki, Dragomir Radev

EMNLP 2019. Long Paper

Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions

Rui Zhang, Tao Yu, He Yang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, Dragomir Radev

EMNLP 2019. Long Paper

SParC: Cross-Domain Semantic Parsing in Context

Tao Yu, Rui Zhang, Michihiro Yasunaga, Yi Chern Tan, Xi Victoria Lin, Suyi Li, Heyang Er, Irene Li, Bo Pang, Tao Chen, Emily Ji, Shreya Dixit, David Proctor, Sungrok Shim, Jonathan Kraft, Vincent Zhang, Caiming Xiong, Richard Socher and Dragomir Radev

ACL 2019. Long Paper

Twitter Sentiment in New York City Parks as Measure of Well-being

Richard A Plunz, Yijia Zhou, Maria Isabel Carrasco Vintimilla, Kathleen McKeown, Tao Yu, Laura Uguccioni, Maria Paola Sutto

Landscape and Urban Planning 2019. Long Paper

Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang and Dragomir Radev

EMNLP 2018. Long Paper

SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task

Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li and Dragomir Radev

EMNLP 2018. Long Paper

TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation

Tao Yu, Zifan Li, Zilin Zhang, Rui Zhang, Dragomir Radev

NAACL 2018. Short Paper

Cross-lingual Sentiment Transfer with Limited Resources

Mohammad Sadegh Rasooli, Noura Farra, Axinia Radeva, Tao Yu, and Kathleen McKeown

Machine Translation 2017. Long Paper

The Columbia-GWU System at the 2016 TAC KBP BeSt Evaluation

Owen Rambow, Tao Yu, Axinia Radeva, Sardar Hamidian, Alexander R. Fabbri, Debanjan Ghosh, Christopher Hidey, Tianrui Peng, Mona Diab, Kathleen McKeown, Smaranda Muresan

NIST TAC KBP Workshop, 2016. Long Paper

Students

Tianbao Xie, Ph.D. student, 2022

Hongjin Su, Ph.D. student, 2022

Yiheng Xu, Ph.D. student, 2022, co-advised with Lingpeng Kong

Jiacheng Ye, Ph.D. student, 2022, co-advised with Lingpeng Kong

Zhoujun Cheng, Intern, 2022, SJTU BS/MS

Chen Henry Wu, Intern, 2022, THU BS → CMU PhD

Ming Zhong, Summer Intern, 2020, Fudan MS → UIUC PhD

Da Yin, Summer Intern, 2020, PKU BS → UCLA PhD

Yusen Zhang, Summer Intern, 2020, Emory MS → PSU PhD

Michihiro Yasunaga, Project Student, 2018-19, Yale BS → Stanford PhD

Talks and Presentations

Learning to Build Conversational Natural Language Interfaces, Jan - Mar. 2021
The University of Hong Kong (CS)
The National University of Singapore (CS)
The University of Wisconsin-Madison (CS)
Simon Fraser University (CS)
The University of Minnesota Twin Cities (CSE)

SParC: Cross-Domain Semantic Parsing in Context, Sep. 2019
Microsoft Research AI Breakthroughs Workshop, Redmond

Service

Organizing Committee
ACL 2023
SUKI: Structured and Unstructured Knowledge Integration Workshop@NAACL 2022
IntEx-SemPar: Interactive and Executable Semantic Parsing Workshop@EMNLP 2020

Program Committee/Reviewer
ACL Rolling Review
ACL: 2020, 2021, 2022
EMNLP: 2019, 2020, 2021, 2022
ICLR: 2022
NeurIPS: 2022
NAACL: 2019, 2021
COLING: 2020, 2022
AACL-IJCNLP: 2020

Resume

Full Resume in PDF.

Misc.

I did a cycling tour (~2 weeks) at the top of the world, Tibet (avg elevation: ~4500 meters). I am also a student pilot. I enjoy hiking, travelling, and cooking. I ski and skate, and I am learning tennis.

I am from Ningdu (a less developed but beautiful county) in Jiangxi Province, China. I’ve lived in more than 15 cities (staying over 3 months in each), including Zhongshan, Beijing, Shanghai, Salt Lake City, New York City, San Francisco, New Haven, Columbus, Honolulu, and San Diego. I've also visited over 60 cities around the world.

Acknowledgement

This website uses the design and template by Martin Saveski.