As enterprise applications increasingly integrate task-specific AI agents, a stark reality is emerging from the rapid pace of innovation: a mere 6% of organizations have an advanced AI security strategy in place. This chasm between adoption and protection is creating a fertile ground for new and unpredictable threats. The coming years are expected to see the first major lawsuits holding executives personally liable for the actions of rogue AI, shifting the consequences of lax governance from a corporate fine to a personal career risk. The core of the problem lies in a massive visibility gap. Security leaders are grappling with the fact that they often have no clear picture of how, where, when, or through which workflows Large Language Models (LLMs) are being used or modified within their own organizations. This lack of insight turns AI security into a high-stakes guessing game and renders effective incident response nearly impossible. The call for more rigorous transparency, particularly at the level of a model’s Bill of Materials (BOM), is growing louder, as traceability becomes the cornerstone of any viable AI defense strategy.
1. The Pervasive Threat of Shadow AI
The findings from recent industry surveys should serve as a wake-up call for every Chief Information Security Officer, revealing that 62% of security practitioners have no reliable method to determine where LLMs are in use across their enterprise. This phenomenon, known as “Shadow AI,” has quickly become the new enterprise blind spot, allowing unsanctioned and unmonitored AI tools to proliferate across departments. Traditional security tools, which were designed for the predictable nature of static code, are fundamentally unequipped to deal with adaptive, learning models that can evolve on a daily basis. This leaves organizations vulnerable to a host of lethal attack methods, including prompt injection, the exploitation of vulnerable LLM code, and jailbreaking techniques that adversaries use to exfiltrate sensitive data. Despite significant investments in cybersecurity, many organizations are blind to these intrusions, which are often cloaked in sophisticated “living-off-the-land” techniques that bypass legacy perimeter defenses entirely. The challenge is no longer just about securing known assets but about discovering and governing a constantly shifting landscape of intelligent agents operating in the shadows.
The financial ramifications of failing to address this visibility gap are both significant and escalating. According to recent data breach reports, 13% of organizations have already reported breaches involving AI models or applications. Of those breached, an alarming 97% lacked adequate AI access controls, demonstrating a systemic failure in basic governance. Breaches stemming from shadow AI or other unauthorized uses of artificial intelligence are particularly costly, adding an average of $670,000 to the total cost compared to their baseline counterparts. This premium exists because when security teams are unaware of which models are running in which environments, their ability to scope the impact of an incident, contain the damage, and eradicate the threat is severely hampered. Incident response devolves into a frantic search for unknown assets, extending the breach lifecycle and amplifying the financial and reputational damage. The inability to answer the simple question, “What is running where?” paralyzes the response effort at the most critical moment.
2. Why Standard Security Frameworks Fall Short
For several years, government mandates like Executive Order 14028 and OMB Memorandum M-22-18 have pushed for the adoption of Software Bills of Materials (SBOMs) for federal vendors, establishing a baseline for supply chain transparency. However, these traditional SBOMs are insufficient for the unique challenges posed by AI. While software dependencies are typically resolved at build time and remain fixed, an AI model’s dependencies are far more dynamic. They often resolve at runtime, fetching weights and other components from external endpoints during initialization. Furthermore, models continuously mutate through processes like retraining, drift correction, and feedback loops. Advanced techniques such as Low-Rank Adaptation (LoRA) can modify model weights directly in production without any version control, making it virtually impossible to track which version of a model is actually running. This fundamental difference means that a standard software SBOM fails to capture the most critical model-specific risks, a fact explicitly acknowledged by NIST’s AI Risk Management Framework, which calls for dedicated AI-BOMs.
The danger is not just theoretical; it is embedded within the very files used to store and distribute AI models. When models are saved in the common pickle format, loading them is akin to opening an email attachment that can execute arbitrary code on a machine. These files are not inert data artifacts but are serialized Python bytecode that must be deserialized and executed to load. During this process, any callable function embedded within the data stream, such as os.system() commands, network connections, or even reverse shells, will be executed. These files are often trusted by default within production systems, creating a massive, overlooked attack vector. While a safer alternative format called SafeTensors exists, which stores only numerical tensor data without executable code, its adoption has been slow. Migrating to SafeTensors requires a significant engineering effort, including rewriting load functions, revalidating model accuracy, and potentially abandoning legacy models where the original training code is lost. This technical debt and friction are holding back a critical security upgrade, leaving countless organizations exposed.
3. The Evolving Landscape of AI-Specific Bills of Materials
In response to the clear shortcomings of traditional security measures, new standards are emerging to provide deeper visibility into the AI supply chain. Both CycloneDX 1.6 and SPDX 3.0, released in 2024, introduced support for Machine Learning Bills of Materials (ML-BOMs). These frameworks are designed to complement, not replace, existing documentation like Model Cards and Datasheets for Datasets, which tend to focus more on performance attributes and training data ethics. ML-BOMs, in contrast, prioritize the critical task of documenting supply chain provenance, detailing a model’s architecture, training data sources, base model lineage, and framework dependencies. Despite the availability of these standards and the growing recognition of their importance, adoption continues to lag alarmingly. A recent survey found that 48% of security professionals admit their organizations are already falling behind on standard software SBOM requirements, and ML-BOM adoption is significantly lower. The tooling to create these critical documents exists, but what is conspicuously missing in many organizations is the operational urgency to implement them.
It is crucial for security leaders to understand that AI-BOMs are primarily tools for forensics and incident response, not firewalls for prevention. When researchers at ReversingLabs discovered compromised models using “nullifAI” evasion techniques, an organization with documented provenance could have immediately identified if they had downloaded the malicious models. This knowledge is invaluable for a swift and targeted response but is practically useless for preventing the initial download. Budgeting and strategic planning must account for this distinction. The ML-BOM tooling ecosystem is maturing rapidly, with vendors shipping solutions, but it is not yet as automated as the tools for software SBOMs. Generating a complete software inventory with tools like Syft or Trivy can take minutes, whereas creating a comprehensive ML-BOM may still require manual processes to fill in the gaps. Furthermore, an AI-BOM will not stop threats like model poisoning, which occurs during training before an organization even acquires the model, nor will it block prompt injection attacks that exploit a model’s functionality. Prevention requires a layered defense with runtime security, including input validation, prompt firewalls, and output filtering.
4. A Proactive Seven-Step Plan for Visibility
The critical difference between an AI supply chain incident that takes hours to contain versus one that takes weeks comes down to preparation. The first and most foundational step is to commit to building a comprehensive model inventory. This process begins with discovering all AI assets by surveying ML platform teams, scanning cloud spend for services like SageMaker, Vertex AI, and Bedrock, and reviewing network logs for downloads from hubs like Hugging Face. A simple spreadsheet tracking the model name, owner, data classification, deployment location, source, and last verification date is a powerful starting point. This inventory directly addresses the core principle that you cannot secure what you cannot see. Parallel to this effort, organizations must go all-in on managing shadow AI. This requires proactively surveying every department—including accounting, finance, and consulting—to identify sophisticated, unsanctioned AI applications that may have API keys and direct access to proprietary company data. The 62% visibility gap exists primarily because no one is asking the right questions. The goal is not just to shut down unsanctioned use but to redirect it to secure, approved, and centrally managed applications and platforms.
With a baseline of visibility established, the next phase involves implementing robust internal policies and processes. A critical control is to require human approval for all production models, establishing a human-in-the-middle workflow. Every model that touches customer data must have a named owner, a documented purpose, and a clear audit trail showing who approved its deployment. Simultaneously, organizations should mandate the use of safer model formats. A policy change to require SafeTensors for all new deployments costs nothing and immediately reduces risk by eliminating the possibility of code execution on load. Existing models using the insecure pickle format should be grandfathered in with documented risk acceptance and firm sunset timelines. To build on this, security teams should pilot ML-BOMs, starting with the top 20% of highest-risk models—those handling sensitive data or making critical business decisions. Using standards like CycloneDX 1.6 or SPDX 3.0, they can begin documenting architecture, data sources, and dependencies. Even an incomplete provenance record is infinitely better than none when an incident occurs.
The final steps extend this rigor to external interactions and procurement, embedding security into the organization’s muscle memory. Every model pulled from a public repository must be treated as a significant supply chain decision, subject to the same scrutiny enterprises learned to apply to software packages after major incidents like leftpad and colors.js. This includes verifying cryptographic hashes before loading the model, caching approved models in an internal repository to control distribution, and blocking unnecessary runtime network access for model execution environments. This internal discipline must be mirrored in external relationships. During the next vendor contract renewal cycle, organizations should add specific clauses for AI governance. These should require vendors to provide SBOMs for their models, document training data provenance, maintain clear model versioning, and adhere to strict incident notification Service-Level Agreements (SLAs). It is also essential to ask whether your organization’s data will be used to train their future models. Requesting these terms costs nothing and shifts the burden of transparency onto the supplier.
5. The Inevitable Reckoning and the Path Forward
The drive to secure the AI supply chain has moved beyond a technical best practice and is rapidly becoming a boardroom priority, propelled by significant regulatory and financial pressures. The EU AI Act’s prohibitions are already in effect, with potential fines reaching as high as €35 million or 7% of global annual revenue. The EU Cyber Resilience Act, with its own SBOM requirements, has also begun its implementation phase, with full AI Act compliance mandated by August 2027. Cyber insurance carriers are watching these developments closely. Given the substantial financial premium associated with shadow AI breaches and the emerging threat of personal executive liability, it is expected that documented proof of AI governance will soon become a standard requirement for policy underwriting, much as ransomware readiness became table stakes for coverage in previous years. These external forces are creating a powerful incentive for organizations to formalize their AI security posture before it is mandated by regulators or insurers.
The challenges in standardizing this new domain were underscored by the SEI Carnegie Mellon SBOM Harmonization Plugfest, which found significant variance in component counts across 243 SBOMs generated by 21 different tools for the exact same software. If such inconsistencies exist for relatively static software, the stakes for dynamic AI models with embedded dependencies and executable payloads are exponentially higher. The first major poisoned model incident that results in a seven-figure cost for response and fines made the case that should have been obvious all along: the AI supply chain is the new soft target. Software SBOMs became mandatory only after attackers repeatedly proved that the supply chain was the path of least resistance. AI supply chains are even more dynamic, far less visible, and exponentially harder to contain. Ultimately, the only organizations that successfully scaled AI safely were the ones that built visibility and governance into their processes from the beginning, long before a breach forced the issue upon them.
