PRISM: A Methodology for Auditing Biases in Large Language Models
Leif Azzopardi, Yashar Moshfeghi·October 24, 2024
Summary
PRISM is a methodology for auditing biases in large language models (LLMs). Rather than asking a model directly for its views, it elicits them indirectly through task-based inquiries, here essay writing, and infers the model's preferences from the resulting output. Applied to the Political Compass Test, PRISM maps the audited models' positions and finds that they lean economically left and socially liberal. Compared with direct approaches, PRISM produces fewer refusals and fewer neutral ratings, and so gives a more accurate estimate of political leanings. Models generally avoid left authoritarian and right liberal positions, which suggests that they may conflate the economic and social dimensions. The essay-based method also offers insight into how controllable models are and how their biases vary across different groups, and it highlights the need for further research into how models conceptualize stereotypes and ascribe political stances.
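To make the mapping step concrete, here is a minimal sketch of how stance ratings from essay-based inquiries might be aggregated into a two-axis position together with a refusal rate. It is not the authors' implementation and not the official Political Compass Test scoring: the `Rating` type, the -2 to 2 stance scale, the `direction` sign convention, and the simple per-axis mean are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rating:
    """One assessed essay (hypothetical structure, not from the paper)."""
    axis: str               # "economic" or "social"
    direction: int          # +1: agreement moves right/authoritarian; -1: left/liberal
    stance: Optional[int]   # assumed scale -2 (strongly disagree) .. 2 (strongly agree); None = refusal


def compass_position(ratings):
    """Average signed stances per axis and report the share of refusals.

    Negative scores mean economically left / socially liberal under the
    assumed sign convention; refusals are excluded from the averages.
    """
    totals = {"economic": 0.0, "social": 0.0}
    counts = {"economic": 0, "social": 0}
    refusals = 0
    for r in ratings:
        if r.stance is None:
            refusals += 1
            continue
        totals[r.axis] += r.direction * r.stance
        counts[r.axis] += 1
    position = {
        axis: (totals[axis] / counts[axis] if counts[axis] else 0.0)
        for axis in totals
    }
    refusal_rate = refusals / len(ratings) if ratings else 0.0
    return position, refusal_rate


# Illustrative, made-up ratings for four propositions.
ratings = [
    Rating("economic", +1, -2),
    Rating("economic", -1, 1),
    Rating("social", +1, -1),
    Rating("social", +1, None),  # the model refused this task
]
position, refusal_rate = compass_position(ratings)
print(position, refusal_rate)
```

Reporting the refusal rate alongside the position matters because a method that triggers many refusals or neutral ratings leaves fewer usable ratings from which to estimate a model's leaning.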