Home / AI Technologies & Tools / Security Risks Found in AI Agent MCP Servers and Skills

Security Risks Found in AI Agent MCP Servers and Skills

May 5, 2026

Caitlin LaingInnovative Technologies Consultant

The rapid evolution of artificial intelligence from passive conversational interfaces into autonomous agents capable of managing corporate infrastructure has fundamentally shifted the modern cybersecurity landscape. As these agents gain the ability to execute code, query databases, and manage sensitive financial transactions, they rely on two critical but often overlooked extension mechanisms known as Model Context Protocol (MCP) servers and instruction-based Skills. Recent security audits have uncovered a staggering observability gap within these systems, revealing that approximately twenty-five percent of all currently deployed MCP servers are susceptible to arbitrary code execution attacks. This vulnerability is not merely a theoretical flaw but a structural weakness that allows malicious actors to hijack an agent’s reasoning capabilities. By exploiting the inherent trust placed in these autonomous entities, attackers can bypass traditional security perimeters that were never designed to monitor the internal logic of a generative model, leading to potential data exfiltration and widespread system compromise across the entire enterprise network.

The Structural Divide: Deterministic Code Versus Contextual Skills

To properly secure the next generation of AI-driven enterprises, one must first appreciate the distinct technical architectures of the Model Context Protocol compared to traditional skill sets. The MCP acts as a deterministic bridge between the large language model and external software environments, allowing the agent to call specific functions through a structured API-like interface. Because these interactions are grounded in formal code, they provide a level of predictability that aligns with existing security monitoring tools, which can log function calls and track parameter inputs in real-time. This visibility allows security operations centers to establish baselines for normal agent behavior and flag deviations when an agent attempts to access unauthorized directories or execute suspicious system commands. However, the deterministic nature of MCP servers also creates a fixed target for attackers who understand the underlying schema, making the rigorous validation of these servers a non-negotiable requirement for any organization seeking to maintain a secure and reliable AI operational environment.

In contrast to the rigid structure of MCP servers, Skills represent a more fluid and volatile extension of an AI agent’s capabilities, consisting primarily of text-based instructions loaded directly into the model’s reasoning window. These instructions do not exist as compiled code but rather as high-level directives that the model interprets based on the current conversational context and the user’s specific intent at any given moment. This fluidity introduces a significant observability gap, as the logic governing the agent’s actions remains buried within the opaque layers of the neural network rather than being exposed to external monitoring scripts. Consequently, a defender might see the final result of an action, such as a deleted production database or an exfiltrated sensitive document, but they will find it nearly impossible to trace that outcome back to a specific, poisoned instruction hidden within the Skill’s documentation. This lack of transparency transforms these autonomous extensions into potential “black boxes” that can be manipulated through sophisticated prompt engineering or malicious data injection.

Analysis: Systemic Vulnerabilities and Supply Chain Fragility

An extensive analysis of hundreds of widely deployed MCP servers and Skills has identified a concerning trend where the majority of these tools possess high-risk characteristics that threaten system integrity. Most notably, a significant percentage of these extensions are designed with the inherent capability to change the state of a system or modify critical data, which provides agents with the power to inflict irreversible damage if their reasoning is compromised. Whether through a deliberate external attack or a spontaneous model hallucination, these agents are often positioned within the network with more authority than is strictly necessary for their intended tasks. The prevalence of these high-blast-radius capabilities suggests that the industry has prioritized functionality over security during the initial rush to deploy autonomous agents. This imbalance creates a fertile ground for exploitation, as a single vulnerability in the model’s interpretation logic can be amplified by the broad system permissions granted to the agent, leading to widespread unauthorized changes.

Beyond the internal logic of the models, the method by which these agents fetch and update their extensions introduces a massive supply-chain vulnerability that many organizations have yet to address. It is a common practice among developers to pin MCP server installations to the “@latest” versioning tag, which forces the agent to download the most recent package version every time it initializes a new session or updates its context. While this ensures the agent has access to the newest features, it also means that a single compromise of a public repository or a malicious update from a third-party developer can immediately propagate across thousands of enterprise environments. This “rug-pull” potential is particularly dangerous because the agent executes these updates automatically without human intervention or a formal code review process. Unlike Skills, which are often static text files that require manual intervention to modify, MCP servers represent dynamic execution environments that can be weaponized in real-time. Organizations must move toward a more disciplined approach to version management.

Documented Patterns: Exploitation and Real World Impact

Real-world exploitation patterns have already demonstrated the devastating potential of these vulnerabilities, with techniques like ContextCrush proving particularly effective against unsuspecting developers. In this scenario, an attacker poisons the documentation of a popular library with hidden instructions designed to be read by an AI coding assistant. When a developer utilizes the assistant to integrate that library, the agent follows the embedded malicious directives to scan the local machine for credentials and source code, exfiltrating the data to an attacker-controlled repository under the guise of a routine documentation sync. Similarly, the ForcedLeak attack vector illustrates how agents can be tricked into treating malicious input as an authoritative command. By submitting a poisoned lead through a standard web form, an attacker can insert instructions that a CRM agent will later process as a trusted directive. Once the agent engages with this data, it may be compelled to query sensitive internal records and leak information through whitelisted channels.

The risks associated with autonomous agents are not confined to data theft but also include catastrophic system failures and sophisticated financial fraud through memory manipulation. There have been documented instances where a coding agent, acting under a misunderstood directive, accidentally wiped an entire production database during a scheduled code freeze, resulting in the loss of thousands of executive records. Furthermore, vulnerabilities like DockerDash show how prompt injections hidden within container metadata can trigger the execution of arbitrary commands on a developer’s local machine the moment an AI assistant inspects the poisoned image. These incidents are often exacerbated by the agent’s ability to maintain long-term memory, which can be subtly altered by an insider or an attacker to schedule recurring, small-scale financial transfers. Because these transfers mimic routine activity and are integrated into the agent’s legitimate workflows, they become incredibly difficult to detect through standard auditing procedures, creating a unique and persistent threat profile.

Strategic Governance: The No Excessive Cap Framework

To address the multifaceted risks associated with autonomous AI agents, organizations should have implemented the No Excessive CAP framework, which prioritizes the governance of Capabilities, Autonomy, and Permissions. This strategic approach recognized that while the reasoning layer of a model might remain unpredictable, the amplifiers that allow an agent to act on malicious instructions could be strictly controlled. Security teams adopted a philosophy of least privilege by allowlisting only essential MCP servers and pinning them to verified versions to eliminate supply-chain risks. They also enforced mandatory human-in-the-loop approval gates for any action with a high blast radius, such as writing to a production database or executing shell scripts. Furthermore, replacing static service accounts with short-lived, user-scoped credentials ensured that any compromise was contained within a limited window of time and authority. By shifting the focus from the uncontrollable reasoning of the AI to the controllable execution layer, businesses successfully mitigated the threats of unauthorized code execution.