AI Research Circle [members and +1s]

Name: AI Research Circle [members and +1s]
Start: 2025-12-08
Location: San Francisco

Monday · 2025-12-08 · 7:30 PM - 8:45 PM

research-papers · interpretability · alignment

We examined how linear directions in a model's activation space correspond to specific character traits like sycophancy, hallucination propensity, and moral reasoning. The group discussed what personality means for language models and the implications for alignment, safety, and developer tools. Paper: "Persona Vectors" (Chen et al., 2025).
