AI Research Circle: Persona Vectors

Monday · 2025-12-08 · 7:30 PM - 8:45 PM

research-papers · interpretability · alignment

We examined how linear directions in a model's activation space correspond to specific character traits such as sycophancy, a propensity to hallucinate, and moral reasoning. The group discussed what "personality" means for a language model and what that implies for alignment, safety, and developer tools. Paper: "Persona Vectors" (Chen et al., 2025).
