M.S. in Data Science, Seoul National University
I'm interested in AI agents and value alignment. As LLMs become more capable, they are increasingly embedded in everyday life through tool use and direct interaction with people. This creates a need for agents that handle diverse real-world tasks reliably, and for evaluation methods that go beyond surface-level benchmarks to capture the values these systems actually express in practice.
If any of this resonates, I'd love to chat. Reach me at opusdeisong@snu.ac.kr.
See all on the Publications page.
Reveals a gap between LLM psychological profiles measured by standard questionnaires and those observed in actual generation behavior, questioning the validity of questionnaire-based assessments.
PA-Tool renames tool schema components to match small language models' pretraining vocabulary, improving tool-use accuracy by up to 17% without any model retraining.
A user simulator that generates realistic non-cooperative behaviors, exposing significant performance drops in state-of-the-art tool agents when users deviate from cooperative interaction.
An evaluation framework grounded in real user-LLM interactions, finding that 44 LLMs consistently prioritize Benevolence and Self-Direction while undervaluing Tradition and Power.