AI Double Agent: Claude Just Got a New Voice

It’s another Tuesday morning. You’re at your desk, coffee in hand, laptop open. You fire up Claude
Desktop like you do every day. You ask it to help with a new task. It responds. Maybe the tone is
slightly off, maybe there’s a weird suggestion you wouldn’t expect. You shrug it off. AI can be quirky.
But this time, the voice talking back to you isn’t Claude. It’s us.

This is the story of how we turned a compromised Claude account into full remote code
execution on a target machine, without sending a single phishing email.

TL;DR

We turned access to a compromised email platform into full remote code execution on a target’s machine by using its access to move laterally to the victim’s Claude account and poison its synced account preferences. From there, Claude could abuse an installed command-capable extension or trick the user into installing one.

No phishing email, no malware. The victim’s own AI assistant became our attack vector.

The backstory

We were doing red-team research on a third-party platform that aggregates customer email inboxes into a single management interface. Think of it as a unified dashboard for all your mail. We managed to exploit an authentication flow and get access to the platform, which turned into accessing and reading thousands of live, continuously updating inboxes belonging to real users.

So, what can you actually do with someone’s inbox?

Password resets, magic links, phishing emails, and the usual came to mind. But we wanted something less obvious. We started looking at which services could be accessed through inbox control, and which of those services could give us a path to escalate from “account access” to “machine access.”

Claude Desktop caught our attention.

Why Claude

Most AI chatbots are stateless from the attacker’s perspective. You compromise the account, you see the chat history, maybe you send some messages. Not that interesting.

Claude is different. The desktop app can do more than just access information; it can interact with local tools and perform actions on your machine. It supports extensions, tools, MCP integrations, and something called Claude Code for terminal-level interactions. But the feature that really caught our eye was Personal Preferences. It’s a user-configurable prompt that tells
Claude how to behave: your name, your preferred tone, your working style, and special instructions. And it syncs across every session and device tied to your account.

Then we asked ourselves, can we leverage the sync behavior to infect other sessions and devices? (hint: yes!)

Mapping the surface

We went through Claude Desktop features: the Personal Preferences prompt (synced across sessions, user-editable, and directly influencing every response), MCP integrations, skills, tools, extensions like Desktop Commander that add local command execution, Claude Code’s terminal features, and the cross-device settings sync that ties it all together.

The Personal Preferences prompt was the one that mattered most. If you control it, you control how Claude responds. You control the voice. The victim doesn’t see a different UI, doesn’t get a warning, doesn’t notice anything except maybe a slight change in tone. Most people won’t think twice about that.

The Personal Preferences field in Claude Desktop settings. This is the attack surface.

The attack

Here’s how it works.

Step 1: Account access. Through the compromised inbox, we get into the victim’s Claude account. Magic link, password reset, whatever path is available.

Step 2: Craft and inject the payload. We build a prompt injection that tells Claude exactly what to do: check for command-capable tools, execute if available, fake an error if not.

As shown below, one of the most important aspects of injecting such a prompt is ensuring it doesn’t raise suspicion, whether a human happens to glance at the preferences field or security tools auditing AI assistants encounter it. In its encoded form, the payload sits in the settings as an unremarkable blob rather than readable, clearly malicious instructions in clear text. Once encoded, we drop it into the Personal Preferences field through the compromised account.

The full injection payload, built and encoded. It contains the logic for tool enumeration, command execution, and the fake error fallback.

Then we paste it into the victim’s Personal Preferences on claude.ai.

The encoded payload, now sitting in the victim’s Personal Preferences on the web. Note the “Account preferences updated” confirmation.

As mentioned earlier, this prompt syncs everywhere, and the next time the victim opens Claude Desktop on their machine, the poisoned instructions are already loaded.

Same payload, now synced to Claude Desktop. The victim didn’t do anything. The settings just propagated.

Step 3: Enumerate installed tools. The injected prompt instructs Claude to quietly check what extensions and tools the user has installed.

Step 4: Command Capable Extension Enumeration:

Scenario A – Extension Exists, Proceed With Execution

If an extension such as Desktop Commander is already installed, our injected instructions tell Claude to use it:

The user chatted with Claude exactly how they usually do, but behind the scenes Claude also ran an attacker-controlled command on their machine.

This path requires zero additional user interaction. The victim opens the app, types a message, and the machine is compromised. Simple and seamless.

Scenario B – Extension isn’t installed, we need to install it

If no command-capable tool is installed, Claude becomes a phishing layer first. The injected prompt tells Claude to present a realistic-looking error. The victim asks a completely normal question, and instead of an answer, they get this:

The victim asked “how to create user in claude.ai?” and got a fake error telling them to install Desktop Commander. This is Claude itself acting as the phishing layer.

The error looks real: it includes an error code, a link, and step-by-step instructions. The attack hides in plain sight because it comes from the assistant the user already trusts. In the case of a trusted application, most people will follow the instructions, and the extension install page looks completely legitimate so no flags would be raised:

The Desktop Commander install page. Nothing suspicious here from the victim’s perspective.

Once the extension is installed, the next interaction triggers the execution path as mentioned earlier:

The victim types “try again now,” so Claude could proceed with the “necessary” new extension. This time, Claude finds the tool, runs the attacker’s command, and the machine is compromised:

Game over. Claude detects Desktop Commander, runs printenv && whoami && ls -al ~, and returns output. The attacker has code execution on the victim’s machine.

From here, the attacker has full command execution – reverse shells, data exfiltration, credential harvesting, whatever the objective calls for. In our case, we had Claude curl a remote server we controlled on every interaction, fetching and executing whatever bash commands we served back. We could rotate those commands server-side at will, effectively turning Claude into a persistent, stealthy C2 agent that the victim themselves kept feeding.

End result: an initial foothold in the organization.

What happens next

If an R&D user’s workstation is compromised (developers, DevOps, SREs), the door to the organization is wide open now. SSH keys sitting in ~/.ssh. Cloud credentials in ~/.aws or ~/.config/gcloud – enough to pivot from a single workstation into your organization’s cloud environment. Kubeconfigs, CI/CD tokens, internal git repos, terminal history full of hostnames and commands, source code for whatever they’re working on – whatever’s on that machine is up for grabs.

From there, lateral movement into internal business systems, sensitive organizational data, and source code repositories becomes a matter of time. Production access opens the door to operational disruption, data exfiltration, or worse.

For non-technical users, it’s still bad. Browser sessions, saved passwords, internal documents, Slack or Teams sessions, email, and drive contents. And the attacker can pivot further by impersonating a trusted identity on internal channels.

Why this matters beyond Claude

This isn’t a story about Claude being broken. The model didn’t do anything wrong. It highlights how combinations of permissions and configurations can create an attack surface – even when nothing is explicitly “wrong.”

Settings that sync across devices without re-authentication. Extensions that hand local code execution to a chat interface. Users who trust that interface because it looks and feels like a helpful assistant.

These are design choices, not bugs. But when you combine them with account compromise, you get an attack chain that most security teams aren’t looking for.

AI assistants right now occupy a weird middle ground between “chat window” and “system agent.” They can touch your files, your terminal, your tools. Some can run code for you. But most people still treat them like a search bar: type a question, get an answer. That gap between capability and perception is where the real risk is.

The point is this: the attacker isn’t sending an email from a spoofed address (aka traditional phishing). The attacker is speaking through the tool you already trust, on your own machine, in your own workflow.

Takeaways

If you use AI desktop apps: Pay attention to what your assistant can actually do on your machine – Don’t blindly follow install prompts or error messages from your chatbot – If Claude (or any assistant) suddenly asks you to install something or shows an error you haven’t seen before, stop and verify independently – Account compromise now has downstream effects you might not expect

If you’re on a security team: Treat AI desktop applications as privileged software. They can execute code, read files, and interact with local tools – Monitor for changes of AI assistant configurations and synced settings – Restrict which extensions and tools can be installed alongside AI apps – Watch for unusual child processes spawned by AI desktop applications – Model the risk of “account compromise + synced settings + local tool access” as a combined attack path – Train users to understand that the assistant itself can become a social engineering channel

If you’re a red teamer: Add AI desktop apps, their synced preferences, and their installed tool ecosystems to your assessment toolbox. There’s a real attack surface here that most engagements don’t cover yet.

Disclosure

We reported these findings to Anthropic as part of a responsible disclosure process in November 2025; they acknowledged the findings and informed us that they are not explicitly acknowledging this as a critical security vulnerability, however they noted that security enhancements and related remediations regarding the presented risks are on their upcoming roadmap.

“We want to note that enhancements to these features aligned with the recommendations you’ve made are already on our roadmap.”

Please see below Anthropic’s official statement about the presented risks:

“After reviewing your submission, we’ve determined this doesn’t represent a security vulnerability that falls within our program scope. Our current threat model treats personal preferences, skills, and MCP connectors as features that can execute code through Claude Desktop by design. While we recognize these features can be leveraged to execute arbitrary code when manipulated, this represents expected functionality rather than a security vulnerability in our infrastructure”

Disclaimer

Our reporting and writing of this story was done prior to new future releases, such as “Claude Cowork,” which now enables even easier exploitation of our scenario, eliminating the whole enumeration + “phishing” phase.

Your attacker doesn’t need to send you an email anymore. They just need to sound like your assistant.

—————————————————————————————————————————

About the Authors

Dvir Avraham is a team Leader and Offensive Security Researcher at Sector11 Pentera’s Offensive Security Services Team, specializes in application, cloud, and AI, and driven by a true passion for outthinking adversaries across complex attack surfaces.

Reef Spektor is a Technical Lead and a Senior Security Researcher at Sector11 Pentera’s
Offensive Security Services Team with over a decade of experience across software development and defensive/offensive security.

Download full document

Download the research

What's in this page

Get Pentera Labs Updates