Audrey Huang

I am a Computer Science PhD student at UIUC, where I am fortunate to be advised by Nan Jiang. I work on reinforcement learning and, more broadly, interactive decision making.

I'm thankful for incredible summers at Microsoft Research working with Akshay Krishnamurthy and Dylan Foster; at Google Research with Mohammad Ghavamzadeh and Marek Petrik; and at Adobe Research.

[google scholar]     [email]


Research

I care about developing principled and implementable algorithms with provable guarantees. Current and previous research threads include:

  • Online finetuning (e.g., of large language models)
  • Imitation learning
  • Tractable online exploration
  • Offline RL and evaluation

I believe that theoretical insights into fundamental questions will lead to real-world algorithmic improvements, and vice versa.


Selected Papers
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
(Preprint, 2024) Audrey Huang*, Wenhao Zhan, Tengyang Xie, Jason D. Lee, Wen Sun, Akshay Krishnamurthy, Dylan J. Foster.
A one-line change to DPO derived from chi-squared regularization provably mitigates overoptimization.

Non-adaptive Online Finetuning for Offline Reinforcement Learning
(RLC, 2024) Audrey Huang*, Mohammad Ghavamzadeh, Nan Jiang, Marek Petrik.
Given an offline dataset, how should online data be collected in order to maximize policy improvement?

Reinforcement Learning in Low-Rank MDPs with Density Features
(ICML 2023) Audrey Huang*, Jinglin Chen, Nan Jiang.
Offline and online RL using occupancy functions is sample-efficient in low-rank MDPs. A clean inductive error analysis prevents exponential error amplification.

Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions
(NeurIPS 2022) Audrey Huang*, Nan Jiang.
Regularization is key for accurate offline value and density-ratio estimation from general function approximators.

Offline Reinforcement Learning with Realizability and Single-policy Concentrability
(COLT 2022) Wenhao Zhan, Baihe Huang, Audrey Huang*, Nan Jiang, Jason D. Lee.
With proper regularization, offline RL is sample-efficient given only realizable function classes and data with single-policy coverage.