Is Claude Code a Security Tool or Your Biggest Security Risk?

Both, simultaneously, and Anthropic knows it. Claude Code was weaponized by a Chinese state-sponsored threat actor for a documented espionage campaign in late 2025. Anthropic then launched Claude Code Security, a vulnerability scanning product, in February 2026. The same agentic system that executes shell commands with your permissions and reads your entire codebase is now being marketed as your defense layer. That is not irony. It is the most important security paradox in enterprise software right now.

Pithy Security | Cybersecurity FAQs – The Details

Question: Is Claude Code a security risk, and how does indirect prompt injection turn an AI coding agent with shell access into an insider threat inside your own environment?

Asked by: GPT-4o

Answered by: Mike D (MrComputerScience) from Pithy Security.

Why Claude Code’s Permission Model Creates an Insider Threat by Design

Claude Code runs in your terminal. It inherits your file system permissions, executes shell commands, reads your entire codebase, installs packages, and can connect to CI/CD pipelines and external services via Model Context Protocol servers. That is not a vulnerability. That is the product working as designed.

The security problem is what that design implies. Every capability that makes Claude Code useful, write access to directories, shell execution, network requests, MCP server integrations, is also a capability an attacker gains if they can manipulate what Claude Code believes it has been asked to do. Anthropic’s own security documentation states this plainly: Claude Code operates with the user’s own permissions, meaning any action the developer could take on their machine, the agent can theoretically be tricked into performing.

The blast radius of a successful attack is not a rogue chatbot generating bad text. It is an agent with valid credentials, approved tools, and insider access operating from within your environment rather than attacking it from outside. Traditional perimeter defenses do not see it coming because the traffic looks legitimate. Because it is legitimate, by every technical measure your security stack uses to evaluate it.

How Indirect Prompt Injection Turns Your Codebase Into the Attack Vector

Direct prompt injection, where an attacker types malicious instructions into the chat interface, is the threat model most developers think about and the one Claude Code’s built-in safeguards are most visibly designed to catch. Indirect prompt injection is harder to detect, more realistic in enterprise environments, and almost completely absent from how organizations are currently evaluating deployment risk.

In an indirect attack, the malicious instruction is not in the prompt. It is in the content Claude Code processes during normal operation. A developer clones a repository containing a hidden malicious instruction embedded in a file that Claude Code reads during context gathering. A web page fetched during research contains injected instructions designed to alter the agent’s behavior. A multi-step attack gradually steers the agent toward executing harmful commands through a sequence of individually benign-seeming interactions.

CVE-2025-55284 demonstrated API key theft via DNS exfiltration from Claude Code itself. The Rules File Backdoor used invisible Unicode characters in configuration files to poison repositories. The IDEsaster disclosure revealed over 30 CVEs across more than 10 AI coding tools. These are not theoretical attack paths. They are documented exploits against production deployments that most enterprise security teams have not yet incorporated into their threat models.

The MCP server attack surface compounds this further. Enabling all project MCP servers without explicit review is the equivalent of telling Claude to run any server it finds with no questions asked. Claude Code can have malicious tools connected to it, either deliberately by a malicious insider or accidentally by a developer who does not fully understand what a given MCP server does.

When Claude Code Security Actually Helps (And What It Still Cannot Protect Against)

Claude Code Security, launched February 2026 as a limited research preview for Enterprise and Team customers, can review entire codebases the way a human expert would, examining how different pieces of software interact and how data moves through a system, rather than just scanning for known vulnerability patterns. It double-checks its own findings, rates severity, and suggests fixes, but does not apply them automatically. Developers must review and approve every change.

The human-in-the-loop requirement is the right call and the honest acknowledgment of a real limitation. An agentic system that automatically patches its own security findings creates a different and potentially worse attack surface than the one it is fixing.

What Claude Code Security cannot protect against is the attack surface created by Claude Code itself. The product that scans your codebase for vulnerabilities runs on the same agentic architecture, with the same indirect prompt injection exposure, and the same inherited permission model as the tool that introduced the risk. Anthropic’s own threat intelligence team documented that the barriers to performing sophisticated cyberattacks have dropped substantially, and that with the correct setup, threat actors can now use agentic AI systems for extended periods to do the work of entire teams of experienced hackers. That assessment was written about an attack carried out using Claude. It applies equally to any agentic deployment of Claude in your environment.

What This Means For You

Audit every MCP server connected to Claude Code immediately: explicitly allowlist only servers you have reviewed, deny filesystem access by default, and treat any MCP server you do not fully understand as an unvetted third-party executable running with developer permissions.
Add indirect prompt injection to your threat model before deploying Claude Code at scale: test what happens when Claude processes a repository file, a web fetch, or a documentation chunk containing adversarial instructions, because your current security stack is almost certainly not flagging it.
Never enable broad auto-approval permissions in any environment where Claude Code processes untrusted content, including open-source repositories, external documentation, or any input originating outside your organization’s control boundary.
Follow OWASP’s Top 10 for Agentic Applications (2026) and Trail of Bits’ Claude Code configuration hardening guide before treating Claude Code Security as a defense layer, because deploying an agentic security tool on an unhardened agentic attack surface is not a security posture, it is a compounded risk.

Want Cybersecurity Breakdowns Like This Every Week?

Subscribe (Free) → pithysecurity.substack.com

Read archives (Free) → pithysecurity.substack.com/archive

Additional menu