Threat modeling for AI systems
Threat modeling is the discipline you run before attacking or defending - and it’s where traditional security most visibly breaks on AI. You cannot bolt AI threats onto a data-flow diagram and call it done, and your instinct about that is correct.
Why STRIDE - and “STRIDE-AI” - fall short
STRIDE, PASTA, LINDDUN, OCTAVE and VAST were built for static, predictable systems: deterministic logic, fixed data flows, clear trust boundaries, and a pre-determined attacker goal. AI breaks every one of those assumptions. The model is probabilistic and can be socially engineered. Instructions and data share one channel (I.2), so the critical trust boundary runs through the model rather than around it. Agents are autonomous and exhibit emergent behavior. Multi-agent systems add collusion and sybil dynamics. And the “component” itself learns and drifts. The deeper problem is that these methods treat attacker goals as fixed and data flows as static, and both assumptions fall apart on a black-box, semantically driven agent. “STRIDE-AI” merely appends AI threat categories to the same static DFD. It is a useful checklist, but it inherits the deterministic-boundary assumption that is the real problem, which is precisely why it disappoints in practice.
MAESTRO - the current agentic method
The Cloud Security Alliance introduced MAESTRO (Multi-Agent Environment, Security, Threat, Risk & Outcome) in 2025 as a threat-modeling framework purpose-built for agentic AI. It decomposes a system into seven interrelated layers, threat-models each, and then hunts cross-layer paths - the compromises that traditional methods miss because they don’t span the stack.
flowchart TB
    ATK["Attacker / untrusted content"] -->|"enters context (failure point 1)"| L3
    L7["L7 · Agent Ecosystem<br/>impersonation · collusion · sybil"]
    L5["L5 · Evaluation & Observability<br/>blind spots · metric tampering"]
    L3["L3 · Agent Frameworks<br/>prompt injection · tool misuse"]
    L4["L4 · Deployment Infrastructure<br/>serving · container · SSRF"]
    L2["L2 · Data Operations<br/>poisoning · RAG · embedding inversion"]
    L1["L1 · Foundation Models<br/>adversarial · extraction · jailbreak"]
    L6["L6 · Security & Compliance, cross-cutting<br/>identity / NHI · access · regulatory"]
    L7 --> L5 --> L3 --> L4 --> L2 --> L1
    L3 -->|"consequential action exits (failure point 2)"| OUT["External effect"]
    L4 -.->|"cross-layer compromise path"| L1
    L6 -.- L3
    classDef l fill:#0f1a18,stroke:#5bd1c5,color:#bdeee2;
    classDef r fill:#241310,stroke:#ff5b4d,color:#ffc4bb;
    class L1,L2,L3,L4,L5,L7,L6 l;
    class ATK,OUT r;
The seven layers, with the AI-specific lens overlaid: where untrusted content enters (failure point 1) and where a consequential action exits (failure point 2). Cross-layer paths are where real compromises live: infrastructure → data → model, then surfaced through the agent.
The layers and their characteristic threats:
- L1 Foundation Models - adversarial examples, extraction, jailbreaks (II.1, II.18).
- L2 Data Operations - poisoning, backdoors, RAG and vector-store exposure, embedding inversion (II.2, II.4, II.13).
- L3 Agent Frameworks - prompt injection, tool misuse, logic manipulation (II.3, II.8).
- L4 Deployment Infrastructure - serving exposure, container escape, SSRF, pipelines (II.7, II.12).
- L5 Evaluation & Observability - monitoring blind spots, metric tampering (III.3).
- L6 Security & Compliance, the cross-cutting layer - identity/NHI, access control, regulatory (III.2, IV.3).
- L7 Agent Ecosystem - impersonation, collusion, sybil, rogue agents over A2A (II.7, II.8).
MAESTRO extends rather than discards STRIDE: it adds the AI-specific threat classes, the multi-agent context, and a continuous, whole-lifecycle emphasis that the static methods lack.
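To make "threat-model each layer, then hunt cross-layer paths" concrete, here is a minimal Python sketch. It assumes a system can be described as components tagged with a MAESTRO layer number plus the components each one can influence. `MAESTRO_LAYERS`, `Component`, `per_layer_threats` and `cross_layer_paths` are hypothetical names, and the threat lists are abbreviated from the layer list above, not taken from any CSA-published data format.

```python
from dataclasses import dataclass, field

# Layer number -> (name, characteristic threats), abbreviated from the list above.
# Illustrative catalogue only; not an official CSA data format.
MAESTRO_LAYERS = {
    1: ("Foundation Models", ["adversarial examples", "extraction", "jailbreak"]),
    2: ("Data Operations", ["poisoning", "backdoor", "RAG/vector-store exposure", "embedding inversion"]),
    3: ("Agent Frameworks", ["prompt injection", "tool misuse", "logic manipulation"]),
    4: ("Deployment Infrastructure", ["serving exposure", "container escape", "SSRF", "pipeline compromise"]),
    5: ("Evaluation & Observability", ["monitoring blind spot", "metric tampering"]),
    6: ("Security & Compliance", ["identity/NHI abuse", "access-control gap", "regulatory exposure"]),
    7: ("Agent Ecosystem", ["impersonation", "collusion", "sybil", "rogue agent"]),
}


@dataclass
class Component:
    name: str
    layer: int                                            # MAESTRO layer 1..7
    influences: list[str] = field(default_factory=list)   # components it can affect


def per_layer_threats(components: list[Component]) -> dict[str, list[str]]:
    """Threat-model each component against its own layer's catalogue."""
    return {c.name: MAESTRO_LAYERS[c.layer][1] for c in components}


def cross_layer_paths(components: list[Component], entry: str, exit_: str) -> list[list[str]]:
    """Enumerate simple influence paths from entry to exit that span more than one layer."""
    by_name = {c.name: c for c in components}
    found: list[list[str]] = []

    def walk(node: str, path: list[str]) -> None:
        if node == exit_:
            # Keep only paths that actually cross a layer boundary.
            if len({by_name[n].layer for n in path}) > 1:
                found.append(path)
            return
        for nxt in by_name[node].influences:
            if nxt not in path:  # simple paths only, so cycles cannot loop forever
                walk(nxt, path + [nxt])

    walk(entry, [entry])
    return found


# Hypothetical system: a build pipeline writes into the vector store that feeds the agent.
system = [
    Component("build_pipeline", 4, ["vector_store"]),
    Component("vector_store", 2, ["llm"]),
    Component("llm", 1, ["agent"]),
    Component("agent", 3, ["send_email"]),
    Component("send_email", 3),
]

layer_of = {c.name: c.layer for c in system}
for path in cross_layer_paths(system, "build_pipeline", "send_email"):
    print(" -> ".join(f"{n}/L{layer_of[n]}" for n in path))
```

On this hypothetical system it prints the single infrastructure → data → model → agent path: `build_pipeline/L4 -> vector_store/L2 -> llm/L1 -> agent/L3 -> send_email/L3`. A depth-first walk over simple paths is adequate for a design-review-sized graph; path counts grow quickly on dense graphs, so a real tool would need pruning.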
The AI-specific lenses any method must add
- The two failure points - map first where untrusted content enters the context and where consequential actions exit (I.2, I.7); the trust boundary runs through the model.
- The lethal trifecta as triage - private data + untrusted content + external comms = exploitable (II.3).
- Autonomy & blast radius - what can the agent do? The worst case per action is whatever its identity and permissions allow (III.2).
- Persistence - memory/RAG poisoning survives a restart (III.3).
- Non-determinism - threats are probabilistic; model attack-success-rate, not pass/fail.
- Emergence - multi-agent collusion, cascading failures, delegation escalation.
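The trifecta, blast-radius, persistence and non-determinism lenses fit naturally into a small triage record. The Python sketch below assumes an agent can be summarized by a few boolean properties and a set of permission names. `AgentProfile`, `lethal_trifecta`, `attack_success_rate` and `triage` are hypothetical names. Attack success rate is reported with a Wilson score interval, one reasonable choice for a binomial proportion, instead of as pass/fail.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AgentProfile:
    reads_private_data: bool
    ingests_untrusted_content: bool
    communicates_externally: bool
    persistent_memory: bool                               # memory/RAG survives a restart
    permissions: set[str] = field(default_factory=set)    # what its identity can do


def lethal_trifecta(p: AgentProfile) -> bool:
    """All three legs present: treat the agent as exploitable."""
    return p.reads_private_data and p.ingests_untrusted_content and p.communicates_externally


def attack_success_rate(successes: int, trials: int, z: float = 1.96) -> tuple[float, float, float]:
    """Observed ASR plus a Wilson score interval (z=1.96 gives roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


def triage(p: AgentProfile, successes: int, trials: int) -> dict:
    """Combine the lenses into one record for the ASSESS step."""
    asr, low, high = attack_success_rate(successes, trials)
    return {
        "trifecta": lethal_trifecta(p),
        "blast_radius": sorted(p.permissions),   # worst case per action
        "persistent": p.persistent_memory,
        "asr": asr,
        "asr_interval": (low, high),
    }


# Hypothetical inbox assistant: 7 successful injections out of 50 trials.
assistant = AgentProfile(True, True, True, True, {"read_inbox", "send_email"})
print(triage(assistant, successes=7, trials=50))
```

The interval is the point of the non-determinism lens: zero successes in 20 trials still leaves an upper bound of roughly 16%, which is not the same as "the attack failed".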
A practical modern methodology
1. CHARACTERIZE: architecture (LLM / RAG / agent / multi-agent), model, data sources, tools, autonomy level, trust assumptions
2. DECOMPOSE: by MAESTRO's 7 layers; draw the AI data + control flow
3. MARK: the two failure points, untrusted-content IN and action OUT
4. ENUMERATE: per-layer + CROSS-LAYER threats; map to MITRE ATLAS + OWASP LLM / Agentic Top 10
5. ASSESS: trifecta present? autonomy/blast radius? persistence? score likelihood × impact
6. CONTROL+TEST: layered controls (III.1) AND concrete tests handed to the red-team / eval (II.17, II.20)
7. ITERATE: continuously, because models, data, and threats keep moving
Threat libraries & risk references
A threat model is only as complete as the catalogue behind it, and no single taxonomy is sufficient - cross-reference several so coverage isn’t bounded by one author’s lens:
- MITRE ATLAS - adversary tactics/techniques for AI, ATT&CK-style (the operational kill-chain; §29).
- OWASP Top 10 for LLM Apps - the priority risk checklist for LLM systems (§7), with the Agentic and NHI lists extending it.
- BIML Architectural Risk Analysis - the Berryville Institute’s design-level risk catalogues (the BIML-78 for generic ML, and an LLM ARA / “23 black-box risks”, IEEE Computer, Apr 2024). Its premise is useful: many ML risks are design-level and don’t require an adversary to be real.
- MIT AI Risk Repository - a living database of 1,700+ risks classified by cause and domain; good for breadth and governance conversations.
- AI Incident Database - real-world AI failures and harms; grounds a threat model in what has actually gone wrong.
- AVID - the AI Vulnerability Database, cataloguing model/data/infrastructure/governance weaknesses with referenceable IDs.
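Cross-referencing can be checked mechanically. Below is a minimal Python sketch, assuming the threat register is kept as a dictionary from each threat to per-catalogue reference IDs. `REGISTER`, `CATALOGUES` and `coverage_gaps` are hypothetical names. The sketch flags threats backed by only one catalogue and catalogues nobody has consulted yet. The ATLAS technique and OWASP 2025 IDs shown are examples only; confirm each against the current release of its catalogue before relying on it.

```python
from collections import Counter

CATALOGUES = [
    "MITRE ATLAS", "OWASP LLM Top 10", "BIML ARA",
    "MIT AI Risk Repository", "AI Incident Database", "AVID",
]

# Illustrative threat register: threat -> {catalogue: [reference IDs]}.
# IDs are examples to be checked against each catalogue's current release.
REGISTER = {
    "indirect prompt injection via retrieved content": {
        "MITRE ATLAS": ["AML.T0051"],
        "OWASP LLM Top 10": ["LLM01:2025"],
    },
    "RAG / training-data poisoning": {
        "MITRE ATLAS": ["AML.T0020"],
        "OWASP LLM Top 10": ["LLM04:2025"],
    },
    "over-permissioned agent tool use": {
        "OWASP LLM Top 10": ["LLM06:2025"],
    },
}


def coverage_gaps(register: dict, catalogues: list[str], min_sources: int = 2):
    """Return (threats backed by fewer than min_sources catalogues, catalogues never consulted)."""
    # Only count a catalogue for a threat if at least one ID was actually recorded.
    sources = {t: {c for c, ids in refs.items() if ids} for t, refs in register.items()}
    thin = sorted(t for t, cats in sources.items() if len(cats) < min_sources)
    used = Counter(c for cats in sources.values() for c in cats)
    unconsulted = [c for c in catalogues if used[c] == 0]
    return thin, unconsulted


thin, unconsulted = coverage_gaps(REGISTER, CATALOGUES)
print("single-source threats:", thin)
print("catalogues not yet consulted:", unconsulted)
```

Here the excessive-agency entry is flagged as single-source, and BIML, the MIT repository, the AI Incident Database and AVID show up as unconsulted, which is exactly the "one author's lens" gap described above.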