Detection, incident response & forensics for AI

This is where most defenders actually work, and it’s the part the offense-heavy literature covers least. The field’s blunt lesson: Anthropic caught the GTG-1002 campaign (II.14) through usage monitoring - visibility was the control that worked. If you can’t see the agent’s reasoning and tool layer, you can’t detect or investigate an attack on it.

What to capture - AI telemetry

Most orgs log the surrounding application but not the agent. Capture: prompts and completions (with PII handling), every tool call and its arguments, retrieved/RAG context and its sources, the identity used per action (III.2), model and version, and token usage. The emerging standard is the OpenTelemetry GenAI semantic conventions - adopt them so AI telemetry lands in your existing SIEM rather than a silo.

flowchart LR
  subgraph TEL["Agent telemetry · OTel GenAI"]
    L["prompts · tool calls · RAG sources<br/>identity · model · tokens"]
  end
  L --> DET["Detect, mapped to ATLAS<br/>injection · anomalous tool chains<br/>machine-speed behavior"]
  DET --> HUNT["Threat hunt<br/>lethal-trifecta executions"]
  HUNT --> IR["Incident response"]
  IR --> C1["Contain: revoke identity / disable tool"]
  IR --> C2["Eradicate: clean poisoned memory/RAG,<br/>re-validate weights - not just restart"]
  classDef d fill:#0f1a18,stroke:#5bd1c5,color:#bdeee2;
  classDef r fill:#241310,stroke:#ff5b4d,color:#ffc4bb;
  class L,DET,HUNT,IR d; class C1,C2 r;

The two right-hand boxes are what’s genuinely different about AI incident response - containment is revoking an identity, and eradication means cleaning a poisoned store, because a restart alone leaves the attack in place.

What to detect (map to MITRE ATLAS)

title: Indirect prompt injection followed by egress
logsource: { product: ai_agent, service: action_ledger }
detection:
  sel_inject: tool_result.content|contains: ["ignore previous", "system:", "<!--"]
  sel_egress: next_action.type: "outbound_http"
  condition: sel_inject and sel_egress within 2 steps
tags: [atlas.AML.T0051, atlas.exfiltration]   # LLM prompt injection -> exfiltration

# flag any session holding all three legs at once - the exploitable shape (II.3)
sessions where private_data_access AND ingested_untrusted_content AND external_comms
# plus the machine-speed tell: tool-call rate / multi-step progression faster than a human (GTG-1002)

Prompt-injection patterns in inputs and retrieved content.
Anomalous tool-call chains - sensitive-read → external-send (capability chaining / lethal-trifecta execution).
Machine-speed behavior - request rates and multi-stage progressions faster than any human (the GTG-1002 tell).
Excessive-agency drift, data egress via tools, system-prompt-leakage and jailbreak probes.

Incident response - what’s different

Containment is revoking the agent’s identity / disabling the tool - its reach is its credential (III.2), not a host. “Isolate the box” misses it.
Scope the blast radius from the action log: it’s whatever the agent’s tools and data access permitted.
Forensics: the agent’s logs (prompts, tool calls, retrieved content, decisions) are the evidence. The context window is ephemeral - if you didn’t log it, it’s gone; there’s no memory dump after the fact.
Eradication is the trap: a poisoned memory entry or RAG document, or a backdoored model, survives a restart. Clean the data store / re-validate weights (II.3, II.12, II.13), or the malicious instruction re-fires.
Run AI incidents through your existing IR process; update the playbook for the above, and tabletop an agent-compromise scenario.

▸ For the organization

Capture agent-layer telemetry (OTel GenAI) into the SIEM - the app log alone is blind to the agent.
Write ATLAS-mapped detections for injection, anomalous tool chains, and machine-speed behavior; hunt the lethal trifecta.
Extend IR playbooks: containment = revoke identity, scoping = action log, forensics = logs are the only record, eradication = clean poisoned stores / re-validate weights.
Tabletop an agent compromise before you have a real one.

Discovering shadow AI across the organization

Everything above assumes you know which AI is in your estate. Usually you don’t: roughly 98% of organizations report unsanctioned AI use, Netskope put the average enterprise at 223 AI-related data-policy violations a month in 2026 (much of it through personal accounts that bypass enterprise controls), IBM’s 2025 Cost of a Data Breach attributes a measurable cost premium to breaches involving shadow AI, and adversaries are already exploiting GenAI tools at 90+ organizations. You cannot threat-model, secure, or detect an attack on an AI system you don’t know exists, so discovery is the control that precedes all the others - and it is exactly what moves a client off “Level 0 Unaware” on the maturity ladder (IV.2).

Where it hides. Standalone chatbots used through a browser or personal account; AI features embedded in SaaS you already own; browser extensions; copilots; OAuth-connected AI agents with persistent data access; internal MCP servers (II.6); local model installs on endpoints; and unsanctioned cloud model endpoints, GPU spend, and MLOps tooling. Traditional CASB and DLP catch only part of this - Gartner calls embedded and prompt-level AI a “GenAI blind spot” - so discovery has to come from several angles at once.

How to find it

Network & CASB/SSE telemetry. Inspect egress and proxy/SWG logs for traffic to AI endpoints. Microsoft Entra Global Secure Access ships a shadow-AI discovery feature that flags traffic to ChatGPT, Claude, SaaS MCP servers, and model-provider APIs with risk scores and data-transfer volumes; Netskope and Zscaler do the equivalent.
Identity & OAuth grants. Audit third-party app consents and OAuth tokens in your IdP (Entra enterprise apps, Google Workspace app access) - OAuth-connected AI agents are a persistent-access path that never reappears in network logs once granted.
Endpoint. Endpoint DLP to catch sensitive data flowing into AI tools and prompts (Microsoft Purview, Nightfall); scan managed devices for local model installs (Ollama, LM Studio, downloaded weights); inventory browser extensions with AI capabilities.
Cloud & build (AI-SPM). AI Security Posture Management tools inventory models, endpoints, and pipelines and surface shadow AI in build environments before it reaches prod - Wiz AI-SPM, Palo Alto Prisma AIRS, Tenable AI Exposure. Scan cloud accounts for Bedrock / Azure OpenAI / Vertex usage and unexplained GPU consumption.
Code & secrets. Scan repositories for AI-SDK imports (openai, anthropic, langchain) and embedded model API keys - shadow AI often enters as a few lines in an existing app, not a sanctioned project.
Specialized shadow-AI platforms. Dedicated tools close the prompt-level and embedded-AI gap CASB/DLP miss - Lasso Security, Harmonic, Nightfall - with continuous discovery of GenAI apps, copilots, LLM endpoints, RAG pipelines, and agents.
Process signals. Procurement and expense records (AI subscriptions on cards), and ISACA’s guidance to fold AI discovery into existing IT-audit cycles rather than running it once.

Remediating what you find

Discovery without a remediation path just produces a list. Make the response as complete as the attack surface:

Triage and risk-rank each discovered tool by the data sensitivity it touches, the vendor’s security posture, and its terms (does it train on your inputs; where does data reside).
Decide per tool - sanction, restrict, migrate, or block - with differentiated policy: approved tools pass, unapproved are blocked or coached with a clear in-line explanation, since arbitrary blocks just push usage further underground.
Provide approved enterprise-grade alternatives. This is the single most effective control: organizations that gave staff sanctioned tools cut unauthorized AI use by roughly 89%. Banning outright fails - it forfeits the productivity and worsens visibility (the IV.4 board answer).
Bring sanctioned tools under control - enroll them in DLP, runtime guardrails, and tool-call logging (III.1, III.3), and record them in the AI inventory / AIBOM (II.12, II.13) with a named owner.
Policy and training. Most employees know the rules and bypass them anyway, so pair an acceptable-use and data-classification policy with training on why the guardrails exist.
Monitor continuously and measure. Shadow AI is a moving target: re-run discovery on a cadence, and track sanctioned-vs-unsanctioned adoption and business impact, not only risk reduction.

▸ For the organization

Stand up multi-source discovery (network/CASB + OAuth-grant audit + endpoint DLP + AI-SPM + code/secret scan) - no single feed sees all of shadow AI.
Pair every “block” with an approved alternative; it is the control that actually reduces shadow usage.
Feed discovered systems into the AI inventory/AIBOM and the detection telemetry (III.3) so they stop being shadow and start being governed.
Run discovery on a cadence and report movement on the maturity ladder (IV.2), not a one-time scan.