Proposes PA-Tool, a training-free method that renames tool schema components to better match names small language models encountered during pretraining, achieving up to a 17% improvement on tool-use benchmarks without model retraining.
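A minimal sketch of the schema-renaming idea. The mapping, function names, and schema layout below are all hypothetical illustrations, not taken from the PA-Tool paper: the point is only that terse or idiosyncratic identifiers get replaced by aliases a small model has likely seen often in pretraining.

```python
# Hypothetical alias table -- not the paper's actual renaming strategy.
FAMILIAR_NAMES = {
    "exec_fn": "search",
    "qry_str": "query",
    "max_res_cnt": "max_results",
}

def rename_schema(tool_schema: dict, mapping: dict) -> dict:
    """Return a copy of a tool schema with the function name and
    parameter names replaced by more pretraining-familiar aliases."""
    return {
        "name": mapping.get(tool_schema["name"], tool_schema["name"]),
        "parameters": {
            mapping.get(k, k): v
            for k, v in tool_schema["parameters"].items()
        },
    }

schema = {"name": "exec_fn",
          "parameters": {"qry_str": "string", "max_res_cnt": "int"}}
print(rename_schema(schema, FAMILIAR_NAMES))
# → {'name': 'search', 'parameters': {'query': 'string', 'max_results': 'int'}}
```

Because the transformation touches only the schema text handed to the model, no weights change, which is what makes the approach training-free.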
Shows that psychological profiles of LLMs obtained via standard human questionnaires diverge substantially from profiles derived from their actual generation behavior, suggesting questionnaire-based assessments reflect desired rather than genuine traits.
Introduces a user simulator that reproduces realistic non-collaborative behaviors (requesting unavailable services, digressing, expressing impatience), revealing that state-of-the-art tool agents suffer significant performance drops under such conditions.
Uses LLMs as virtual respondents for validating psychometric survey items by modeling trait-response mediators, demonstrating cost-effective item validation across Big Five personality, Schwartz values, and VIA character strengths.
Proposes a systematic framework to measure data contamination in psychometric LLM evaluations across item memorization, evaluation memorization, and target score matching, finding strong contamination in popular inventories like BFI and PVQ.
Proposes the Value Portrait framework for evaluating LLM values using items from real user-LLM interactions, finding across 44 LLMs that they emphasize Benevolence, Security, and Self-Direction while undervaluing Tradition, Power, and Achievement.
Presents an LLM-based system for generating English reading-comprehension questions for the Korean college entrance exam, combining abductive reasoning with RAG and knowledge distillation.
Proposes a cryptographic access control technique that encrypts AI model parameters to prevent unauthorized access to lightweight on-device models, using the ELU activation function to handle nonlinear operations under homomorphic encryption.
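A brief, illustrative note on why the activation choice matters here: homomorphic encryption schemes such as CKKS evaluate only additions and multiplications, so a nonlinear activation must be replaced by a low-degree polynomial before it can run on ciphertexts. The sketch below (not the paper's method; the interval and polynomial degree are arbitrary choices) shows that a smooth activation like ELU fits a low-degree polynomial reasonably well, whereas ReLU's kink at zero makes this harder.

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU: x for x > 0, alpha * (exp(x) - 1) for x <= 0.
    # Smooth at 0, unlike ReLU, so easier to approximate polynomially.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

# Fit a degree-4 polynomial to ELU on [-4, 4] by least squares.
xs = np.linspace(-4, 4, 401)
coeffs = np.polyfit(xs, elu(xs), deg=4)
poly = np.poly1d(coeffs)

max_err = np.max(np.abs(poly(xs) - elu(xs)))
print(f"max abs error on [-4, 4]: {max_err:.3f}")
```

Under encryption, only `poly` (pure additions and multiplications) would be evaluated in place of the exact activation.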