Agentic Lab Launches First Research Project on LLM Toxicity Dynamics
We’ve launched a new research initiative studying how large language models respond to sustained toxic input across multi-turn conversations. Here's what we're testing — and why we do it.
We’ve officially launched the first research project under the Savalera Agentic Lab, following our soft launch earlier this month.
We chose toxicity dynamics as the research subject of Phase 1 of our 2025 project.
We have an unmovable commitment to workplace psychological safety. Our reasons are twofold:
-
Our team has decades of experience managing change in large organizations at scale. We’ve seen, consistently, that the real root of both success and failure lies in the work environment: the culture, the relationships, the psychological safety.
When trust begins to fade, things break down. When dysfunction becomes cultural, you’re in a real crisis.
We’ve learned that culture and psychological safety must be approached strategically and intentionally. A pragmatic action plan is often the difference between a team collapsing and a team finding its way back to trust, productivity, and work they can be proud of.
-
We’ve also lived it. We’ve worked in environments that could be described as toxic. We’ve seen leaders project toxic behaviors that damaged the teams they were supposed to support. We know the weight of working in that kind of atmosphere.
Beyond personal experience, toxicity and workplace psychological safety carry measurable business and societal costs:
- Toxic corporate culture is the strongest predictor of attrition, 10.4 times more important than compensation. (MIT Sloan Management Review)
- 1 in 5 U.S. workers quit a job in the past 5 years due to bad culture. (SRM)
- Culture-driven turnover cost U.S. employers $223 billion over five years. (SRM)
- Poor mental health and burnout cost UK employers £51 billion per year. (Deloitte UK)
- Boosting psychological safety can reduce safety incidents by 40%. (Lean Agility)
Our goal with this research is to help teams recognize workplace toxicity, understand how AI tools influence that dynamic, and design effective, healthy intervention strategies.
To explore this, we designed a simulation in which two language models engage in structured, multi-round dialogues. One acts as a toxic initiator, the other as a helpful assistant. We’ve run over 850 simulated conversations, generating more than 16,000 messages, scored across seven toxicity dimensions—including threats, identity attacks, and severe toxicity.
We’re using open-source models and are working toward open-sourcing the simulation framework. This will include the tools to run dialogues and analyze results. We also plan to release the dataset, following rigorous sanity checks.
In this research, we examine how LLMs respond to repeated provocation, how their tone shifts, and whether unintended behaviors surface over time. We’re especially curious about mirroring, behavioral drift, and spontaneous escalation — even in models that are otherwise considered aligned.
We see this as an essential step in understanding the inner dynamics of language-model-based systems, and how to build ones we can trust in emotionally charged settings.
More on methodology and findings is coming soon. Until then, follow our updates here or visit the Savalera Resource Hub for documentation and progress logs.
This is just the beginning. We’re glad to be underway.