__ _______
__ / /__ ___ _ ___ ___ / ___/ / ___ ___ ___ _
/ // / _ `/ ' \/ -_|_-< / /__/ _ \/ -_) _ \/ _ `/
\___/\_,_/_/_/_/\__/___/ \___/_//_/\__/_//_/\_, /
/___/
I am a Master's student in Computer Science at Stanford University. Previously, I obtained Bachelor's degrees in Computer Science and Business Administration from UC Berkeley, where I was a Regents' and Chancellor's Scholar.
I am interested in data curation and data pipelines for reinforcement learning approaches to reasoning problems across language, vision, and physical domains.
Methods for leader-follower (Stackelberg) games with follower-agnostic guarantees.
How well can LLMs follow structural rules like "write exactly 20 words" or "end each sentence with a rhyme"?
Can LLMs update their beliefs and reason counterfactually when solving murder mysteries?
What happens when you let an LLM plan the strategy and RL execute the actions?