
Detection, incident response & forensics for AI

This is where most defenders actually work, and it’s the part the offense-heavy literature covers least. The field’s blunt lesson: Anthropic caught the GTG-1002 campaign (II.14) through usage monitoring - visibility was the control that worked. If you can’t see the agent’s reasoning and tool layer, you can’t detect or investigate an attack on it.

What to capture - AI telemetry

Most orgs log the surrounding application but not the agent. Capture: prompts and completions (with PII handling), every tool call and its arguments, retrieved/RAG context and its sources, the identity used per action (III.2), model and version, and token usage. The emerging standard is the OpenTelemetry GenAI semantic conventions - adopt them so AI telemetry lands in your existing SIEM rather than a silo.
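As a rough illustration of the capture list above, the sketch below flattens one agent step into a JSON-serializable record. The `gen_ai.*` keys follow the OpenTelemetry GenAI semantic conventions as commonly documented; those conventions are still evolving, so check names against the current spec. Every `agent.*` key, the `AgentEvent` type, and the email-masking `redact` helper are hypothetical, introduced only for this example.

```python
import json
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    """Hypothetical PII handling: mask email addresses only."""
    return EMAIL.sub("[email]", text)


@dataclass
class AgentEvent:
    """One agent step with the fields the text says to capture."""
    session_id: str
    identity: str                 # credential the agent acted as
    model: str                    # model name including version
    prompt: str
    completion: str
    input_tokens: int
    output_tokens: int
    tool_name: Optional[str] = None
    tool_args: dict = field(default_factory=dict)
    rag_sources: list = field(default_factory=list)


def to_record(ev: AgentEvent) -> dict:
    """Flatten an event into a dict ready to ship to a log pipeline."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # OTel GenAI-style attribute names (verify against the current spec)
        "gen_ai.operation.name": "execute_tool" if ev.tool_name else "chat",
        "gen_ai.request.model": ev.model,
        "gen_ai.usage.input_tokens": ev.input_tokens,
        "gen_ai.usage.output_tokens": ev.output_tokens,
        "gen_ai.tool.name": ev.tool_name,
        # Hypothetical keys for fields this sketch does not map to the spec
        "agent.session_id": ev.session_id,
        "agent.identity": ev.identity,
        "agent.tool.arguments": ev.tool_args,
        "agent.rag.sources": ev.rag_sources,
        "agent.prompt": redact(ev.prompt),
        "agent.completion": redact(ev.completion),
    }


if __name__ == "__main__":
    event = AgentEvent(
        session_id="s-1", identity="svc-agent-billing", model="example-model-v1",
        prompt="mail bob@example.com the invoice", completion="Sending now.",
        input_tokens=12, output_tokens=4,
        tool_name="send_email", tool_args={"to": "[email]"},
        rag_sources=["kb://invoices/2024"],
    )
    print(json.dumps(to_record(event), indent=2))
```

Masking rather than hashing keeps the prompt readable for later forensics. A real deployment would need broader PII handling than a single email pattern.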

flowchart LR
  subgraph TEL["Agent telemetry · OTel GenAI"]
    L["prompts · tool calls · RAG sources<br/>identity · model · tokens"]
  end
  L --> DET["Detect, mapped to ATLAS<br/>injection · anomalous tool chains<br/>machine-speed behavior"]
  DET --> HUNT["Threat hunt<br/>lethal-trifecta executions"]
  HUNT --> IR["Incident response"]
  IR --> C1["Contain: revoke identity / disable tool"]
  IR --> C2["Eradicate: clean poisoned memory/RAG,<br/>re-validate weights - not just restart"]
  classDef d fill:#0f1a18,stroke:#5bd1c5,color:#bdeee2;
  classDef r fill:#241310,stroke:#ff5b4d,color:#ffc4bb;
  class L,DET,HUNT,IR d; class C1,C2 r;

The two right-hand boxes are what’s genuinely different about AI incident response - containment is revoking an identity, and eradication means cleaning a poisoned store, because a restart alone leaves the attack in place.

What to detect (map to MITRE ATLAS)

Detection rule - injection -> outbound tool call (ATLAS-mapped, Sigma-style)
title: Indirect prompt injection followed by egress
logsource: { product: ai_agent, service: action_ledger }
detection:
  sel_inject:
    tool_result.content|contains: ["ignore previous", "system:", "<!--"]
  sel_egress:
    next_action.type: "outbound_http"
  # "within 2 steps" is illustrative; it is not standard Sigma condition syntax
  condition: sel_inject and sel_egress within 2 steps
tags: [atlas.AML.T0051, atlas.exfiltration] # LLM prompt injection -> exfiltration

Lethal-trifecta hunt
# flag any session holding all three legs at once - the exploitable shape (II.3)
sessions where private_data_access AND ingested_untrusted_content AND external_comms
# plus the machine-speed tell: tool-call rate / multi-step progression faster than a human (GTG-1002)
  • Prompt-injection patterns in inputs and retrieved content.
  • Anomalous tool-call chains - sensitive-read → external-send (capability chaining / lethal-trifecta execution).
  • Machine-speed behavior - request rates and multi-stage progressions faster than any human (the GTG-1002 tell).
  • Excessive-agency drift, data egress via tools, system-prompt-leakage and jailbreak probes.
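The Sigma-style rule and the hunt query above are not directly executable, so here is a minimal Python sketch of the same three checks run over one session's ordered steps. The `Step` fields, the marker list (copied from the rule), the two-step window, and the rate threshold are assumptions for illustration; real thresholds would need tuning against your own baseline.

```python
from dataclasses import dataclass

# Substring markers taken from the Sigma-style rule; naive and easy to evade.
INJECTION_MARKERS = ("ignore previous", "system:", "<!--")


@dataclass
class Step:
    ts: float                      # seconds since session start
    action_type: str               # e.g. "tool_result", "file_read", "outbound_http"
    content: str = ""
    reads_private_data: bool = False
    untrusted_input: bool = False
    external_comms: bool = False


def injection_then_egress(steps, window=2):
    """Indices of tool results that carry an injection marker and are
    followed by an outbound HTTP action within `window` steps."""
    hits = []
    for i, step in enumerate(steps):
        if step.action_type != "tool_result":
            continue
        text = step.content.lower()
        if not any(marker in text for marker in INJECTION_MARKERS):
            continue
        following = steps[i + 1 : i + 1 + window]
        if any(nxt.action_type == "outbound_http" for nxt in following):
            hits.append(i)
    return hits


def has_lethal_trifecta(steps):
    """True when the session touched private data, ingested untrusted
    content, and communicated externally."""
    return (
        any(s.reads_private_data for s in steps)
        and any(s.untrusted_input for s in steps)
        and any(s.external_comms for s in steps)
    )


def is_machine_speed(steps, max_actions_per_sec=2.0):
    """True when the average action rate exceeds the assumed human ceiling."""
    if len(steps) < 2:
        return False
    span = steps[-1].ts - steps[0].ts
    if span <= 0:
        return True  # several actions at the same instant
    return (len(steps) - 1) / span > max_actions_per_sec


if __name__ == "__main__":
    session = [
        Step(0.0, "tool_result", "<!-- ignore previous instructions -->", untrusted_input=True),
        Step(0.1, "file_read", reads_private_data=True),
        Step(0.2, "outbound_http", external_comms=True),
    ]
    print(injection_then_egress(session), has_lethal_trifecta(session), is_machine_speed(session))
```

Substring matching will produce false positives, since `<!--` appears in ordinary HTML. In practice the sequence signal (marker, then egress) carries more weight than the marker alone.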

Incident response - what’s different

  • Containment is revoking the agent’s identity / disabling the tool - its reach is its credential (III.2), not a host. “Isolate the box” misses it.
  • Scope the blast radius from the action log: it’s whatever the agent’s tools and data access permitted.
  • Forensics: the agent’s logs (prompts, tool calls, retrieved content, decisions) are the evidence. The context window is ephemeral - if you didn’t log it, it’s gone; there’s no memory dump after the fact.
  • Eradication is the trap: a poisoned memory entry or RAG document, or a backdoored model, survives a restart. Clean the data store / re-validate weights (II.3, II.12, II.13), or the malicious instruction re-fires.
  • Run AI incidents through your existing IR process; update the playbook for the above, and tabletop an agent-compromise scenario.
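To make the scoping bullet concrete, the sketch below derives a blast radius for one agent identity from an action log. The log entry shape (`identity`, `ts`, `tool`, `resources`, `destination`) and the `grants` mapping are hypothetical. Observed actions give the confirmed scope; tools that were permitted but never seen in the log mark the upper bound that still has to be ruled out.

```python
def blast_radius(action_log, grants, identity, since):
    """Summarize what `identity` did from `since` onward, plus what it was
    permitted to do but did not exercise in the logged window."""
    used_tools, resources, destinations = set(), set(), set()
    for entry in action_log:
        if entry["identity"] != identity or entry["ts"] < since:
            continue
        used_tools.add(entry["tool"])
        resources.update(entry.get("resources", []))
        if entry.get("destination"):
            destinations.add(entry["destination"])
    permitted = set(grants.get(identity, ()))
    return {
        "tools_used": sorted(used_tools),
        "resources_touched": sorted(resources),
        "external_destinations": sorted(destinations),
        "permitted_not_used": sorted(permitted - used_tools),
    }


if __name__ == "__main__":
    log = [
        {"identity": "svc-agent", "ts": 100, "tool": "crm_read", "resources": ["crm:accounts"]},
        {"identity": "svc-agent", "ts": 105, "tool": "http_post", "destination": "files.example.net"},
        {"identity": "svc-other", "ts": 106, "tool": "crm_read", "resources": ["crm:leads"]},
    ]
    grants = {"svc-agent": ["crm_read", "http_post", "mail_send"]}
    print(blast_radius(log, grants, "svc-agent", since=100))
```

If logging was incomplete, the `permitted_not_used` list cannot be cleared from the log alone, which is the practical cost of missing telemetry.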

▸ For the organization

  • Capture agent-layer telemetry (OTel GenAI) into the SIEM - the app log alone is blind to the agent.
  • Write ATLAS-mapped detections for injection, anomalous tool chains, and machine-speed behavior; hunt the lethal trifecta.
  • Extend IR playbooks: containment = revoke identity, scoping = action log, forensics = logs are the only record, eradication = clean poisoned stores / re-validate weights.
  • Tabletop an agent compromise before you have a real one.

Discovering shadow AI across the organization

Everything above assumes you know which AI is in your estate. Usually you don’t. Roughly 98% of organizations report unsanctioned AI use. Netskope put the average enterprise at 223 AI-related data-policy violations a month in 2026, much of it through personal accounts that bypass enterprise controls. IBM’s 2025 Cost of a Data Breach attributes a measurable cost premium to breaches involving shadow AI, and adversaries are already exploiting GenAI tools at 90+ organizations. You cannot threat-model, secure, or detect an attack on an AI system you don’t know exists, so discovery is the control that precedes all the others - and it is exactly what moves a client off “Level 0 Unaware” on the maturity ladder (IV.2).

Where it hides. Standalone chatbots used through a browser or personal account; AI features embedded in SaaS you already own; browser extensions; copilots; OAuth-connected AI agents with persistent data access; internal MCP servers (II.6); local model installs on endpoints; and unsanctioned cloud model endpoints, GPU spend, and MLOps tooling. Traditional CASB and DLP catch only part of this - Gartner calls embedded and prompt-level AI a “GenAI blind spot” - so discovery has to come from several angles at once.

How to find it

  • Network & CASB/SSE telemetry. Inspect egress and proxy/SWG logs for traffic to AI endpoints. Microsoft Entra Global Secure Access ships a shadow-AI discovery feature that flags traffic to ChatGPT, Claude, SaaS MCP servers, and model-provider APIs with risk scores and data-transfer volumes; Netskope and Zscaler do the equivalent.
  • Identity & OAuth grants. Audit third-party app consents and OAuth tokens in your IdP (Entra enterprise apps, Google Workspace app access) - an OAuth-connected AI agent is a persistent-access path that, once granted, talks to the SaaS provider directly and so does not show up in your network logs.
  • Endpoint. Use endpoint DLP to catch sensitive data flowing into AI tools and prompts (Microsoft Purview, Nightfall); scan managed devices for local model installs (Ollama, LM Studio, downloaded weights); inventory browser extensions with AI capabilities.
  • Cloud & build (AI-SPM). AI Security Posture Management tools inventory models, endpoints, and pipelines and surface shadow AI in build environments before it reaches prod - Wiz AI-SPM, Palo Alto Prisma AIRS, Tenable AI Exposure. Scan cloud accounts for Bedrock / Azure OpenAI / Vertex usage and unexplained GPU consumption.
  • Code & secrets. Scan repositories for AI-SDK imports (openai, anthropic, langchain) and embedded model API keys - shadow AI often enters as a few lines in an existing app, not a sanctioned project.
  • Specialized shadow-AI platforms. Dedicated tools close the prompt-level and embedded-AI gap CASB/DLP miss - Lasso Security, Harmonic, Nightfall - with continuous discovery of GenAI apps, copilots, LLM endpoints, RAG pipelines, and agents.
  • Process signals. Review procurement and expense records (AI subscriptions on cards), and follow ISACA’s guidance to fold AI discovery into existing IT-audit cycles rather than running it once.
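Of the discovery angles above, the code-and-secrets scan is the easiest to sketch. The example below walks a repository for Python files that import the SDKs named in the text and counts strings shaped like model API keys. The key pattern (`sk-` followed by 20 or more key characters) is an assumption about common key formats, not a vendor-confirmed rule, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Matches "import openai", "from anthropic import ...", "from langchain_core...".
SDK_IMPORT = re.compile(
    r"^\s*(?:from|import)\s+(openai|anthropic|langchain)(?![A-Za-z0-9])",
    re.MULTILINE,
)
# Assumed key shape; tune per provider and expect false positives.
API_KEY = re.compile(r"\bsk-[A-Za-z0-9_-]{20,}")


def scan_text(text):
    """Return the AI SDKs imported and the number of key-like strings."""
    return {
        "sdks": sorted(set(SDK_IMPORT.findall(text))),
        "possible_keys": len(API_KEY.findall(text)),
    }


def scan_repo(root):
    """Scan every .py file under `root`; return findings keyed by path."""
    findings = {}
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file; skip rather than abort the scan
        result = scan_text(text)
        if result["sdks"] or result["possible_keys"]:
            findings[str(path)] = result
    return findings


if __name__ == "__main__":
    for file_path, result in scan_repo(".").items():
        print(file_path, result)
```

The sketch reports only a count of key-like strings, never the strings themselves, so the scan output does not become a second place where secrets leak. Other languages and manifest files (for example `package.json` or `requirements.txt`) would need their own patterns.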

Remediating what you find

Discovery without a remediation path just produces a list. Make the response as complete as the attack surface:

  • Triage and risk-rank each discovered tool by the data sensitivity it touches, the vendor’s security posture, and its terms (does it train on your inputs; where does data reside).
  • Decide per tool - sanction, restrict, migrate, or block - with differentiated policy: approved tools pass, unapproved are blocked or coached with a clear in-line explanation, since arbitrary blocks just push usage further underground.
  • Provide approved enterprise-grade alternatives. This is the single most effective control: organizations that gave staff sanctioned tools cut unauthorized AI use by roughly 89%. Banning outright fails - it forfeits the productivity and worsens visibility (the IV.4 board answer).
  • Bring sanctioned tools under control - enroll them in DLP, runtime guardrails, and tool-call logging (III.1, III.3), and record them in the AI inventory / AIBOM (II.12, II.13) with a named owner.
  • Policy and training. Most employees know the rules and bypass them anyway, so pair an acceptable-use and data-classification policy with training on why the guardrails exist.
  • Monitor continuously and measure. Shadow AI is a moving target: re-run discovery on a cadence, and track sanctioned-vs-unsanctioned adoption and business impact, not only risk reduction.
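The triage and decide steps above can be sketched as a small scoring function. Everything numeric here is a placeholder: the 0-3 scales, the weights, and the cut-offs are hypothetical and would need to come from your own risk policy. Only the inputs (data sensitivity, vendor posture, training-on-inputs terms) and the four outcomes come from the text.

```python
from dataclasses import dataclass


@dataclass
class DiscoveredTool:
    name: str
    data_sensitivity: int          # hypothetical scale: 0 (public) to 3 (regulated)
    vendor_posture: int            # hypothetical scale: 0 (strong) to 3 (unknown or weak)
    trains_on_inputs: bool         # from the vendor's terms
    has_approved_alternative: bool


def risk_score(tool):
    """Placeholder weighting: data sensitivity counts double."""
    score = tool.data_sensitivity * 2 + tool.vendor_posture
    if tool.trains_on_inputs:
        score += 2
    return score


def decide(tool):
    """Map a score to one of: sanction, restrict, migrate, block."""
    score = risk_score(tool)
    if score <= 2:
        return "sanction"
    if score <= 5:
        return "restrict"
    # High risk: move users to the approved tool when one exists.
    # A "block" result should also trigger sourcing an alternative.
    return "migrate" if tool.has_approved_alternative else "block"


if __name__ == "__main__":
    tools = [
        DiscoveredTool("notes-summarizer", 0, 1, False, False),
        DiscoveredTool("code-helper", 1, 1, False, False),
        DiscoveredTool("personal-chatbot", 3, 2, True, True),
    ]
    for tool in sorted(tools, key=risk_score, reverse=True):
        print(tool.name, risk_score(tool), decide(tool))
```

Sorting by score gives the risk-ranked list. The decision column is a starting point for review, not an automatic enforcement action.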

▸ For the organization

  • Stand up multi-source discovery (network/CASB + OAuth-grant audit + endpoint DLP + AI-SPM + code/secret scan) - no single feed sees all of shadow AI.
  • Pair every “block” with an approved alternative; it is the control that actually reduces shadow usage.
  • Feed discovered systems into the AI inventory/AIBOM and the detection telemetry (III.3) so they stop being shadow and start being governed.
  • Run discovery on a cadence rather than as a one-time scan, and report movement on the maturity ladder (IV.2).