Publications

2025

[10] Zhaofeng Wu, Xinyan Velocity Yu, Dani Yogatama, Jiasen Lu, Yoon Kim
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
International Conference on Learning Representations (ICLR), 2025. [Paper]

[9] Zora Zhiruo Wang, Akari Asai, Xinyan Velocity Yu, Frank F. Xu, Yiqing Xie, Graham Neubig, Daniel Fried
CodeRAG-Bench: Can Retrieval Augment Code Generation?
Findings of Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2025. [Paper]

2024

[8] Li Du, Afra Amini, Lucas Torroba Hennigen, Xinyan Velocity Yu, Jason Eisner, Holden Lee, Ryan Cotterell
Principled Gradient-based Markov Chain Monte Carlo for Text Generation
International Conference on Machine Learning (ICML), 2024.
[Paper]

[7] Ting-Rui Chiang, Xinyan Velocity Yu, Joshua Robinson, Ollie Liu, Isabelle Lee, Dani Yogatama
On Retrieval Augmentation and the Limitations of Language Model Training
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
[Paper]

[6] Akari Asai, Sneha Kudugunta, Xinyan Velocity Yu, Terra Blevins, Hila Gonen, Machel Reid, Yulia Tsvetkov, Sebastian Ruder, Hannaneh Hajishirzi
BUFFET: Benchmarking Large Language Models for Cross-lingual Few-shot Transfer
Oral presentation at Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024. [Paper] / [Website]

2023

[5] Xinyan Velocity Yu, Sewon Min, Luke Zettlemoyer, and Hannaneh Hajishirzi
CREPE: Open-Domain Question Answering with False Presuppositions
Oral presentation at Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
[Paper] / [Code]

[4] Swarnadeep Saha, Xinyan Velocity Yu, Mohit Bansal, Ramakanth Pasunuru, and Asli Celikyilmaz
MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation
Findings of Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
[Paper]

[3] Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Ronan Le Bras, Akari Asai, Xinyan Yu, Dragomir Radev, Noah A. Smith, Yejin Choi, Kentaro Inui
RealTime QA: What’s the Answer Right Now?
Conference on Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2023.
[Paper] / [Website]

2022

[2] Xinyan Velocity Yu*, Akari Asai*, Trina Chatterjee, Junjie Hu, and Eunsol Choi
Beyond Counting Datasets: Investigating Multilingual Dataset Construction and Necessary Resources
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
[Paper] / [Website]

2021

[1] Akari Asai, Xinyan Yu, Jungo Kasai, and Hannaneh Hajishirzi
One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval
Conference on Neural Information Processing Systems (NeurIPS), 2021.
[Paper] / [Code]

Preprints

[P1] Ting-Rui Chiang, Joshua Robinson, Xinyan Velocity Yu, Dani Yogatama
LocateBench: Evaluating the Locating Ability of Vision Language Models
[Paper]