
The AI attack surface & secure lifecycle

Before the specific attacks, fix the two maps you’ll reuse throughout. The first is the attack surface. It has four regions, and every later section lives in one of them: data (training, fine-tune, RAG corpora - II.2, II.13), model (weights and inference behavior - II.1), application (prompts, tools, agent logic, and protocols - II.3, II.5-II.10), and infrastructure (serving, vector stores, pipelines, cloud - II.7, II.11, II.12, II.13). Google’s SAIF maps cleanly onto these four areas, which is why it crosswalks well to everything else.

Enumerating an AI attack surface (concrete checklist)
[ ] Which features are model-backed? (search, summarize, chat, autocomplete)
[ ] What model/version + guardrail sits behind each? (fingerprint, II.17 Ch2)
[ ] What can the model reach? tools, RAG corpus, memory, other agents (MCP/A2A)
[ ] Which actions are irreversible / outbound? (email, payments, code exec)
[ ] Where does untrusted content enter? (user, web fetch, files, tool results)
# the answers are the map you attack (II.17) and defend (III.1)
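One way to keep the checklist answers in a form you can query is a small record per model-backed feature. The sketch below is illustrative only: the class name `FeatureSurface`, its field names, and the `priority_paths` pairing heuristic are assumptions made for this example, not part of any framework named in the text.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureSurface:
    """One model-backed feature, with one field per checklist question (hypothetical schema)."""
    feature: str                                                   # e.g. search, summarize, chat
    model_version: str = "unknown"                                 # fill in after fingerprinting
    guardrail: str = "unknown"
    reach: list[str] = field(default_factory=list)                 # tools, RAG corpus, memory, agents
    irreversible_actions: list[str] = field(default_factory=list)  # email, payments, code exec
    untrusted_inputs: list[str] = field(default_factory=list)      # user, web fetch, files, tool results

    def unanswered(self) -> list[str]:
        """Names of checklist fields that are still 'unknown' or empty."""
        return [name for name, value in vars(self).items() if value in ("unknown", [])]

    def priority_paths(self) -> list[tuple[str, str]]:
        """Pair every untrusted input with every irreversible action.

        Assumption for this sketch: each pair is a candidate path worth
        reviewing first, not a confirmed vulnerability.
        """
        return [(src, act) for src in self.untrusted_inputs for act in self.irreversible_actions]


# Example entry with made-up values.
summarizer = FeatureSurface(
    feature="summarize",
    reach=["RAG corpus", "email tool"],
    irreversible_actions=["email"],
    untrusted_inputs=["user", "web fetch"],
)
print(summarizer.unanswered())      # fields still to be answered
print(summarizer.priority_paths())  # (untrusted input, irreversible action) pairs
```

Keeping unanswered fields explicit makes the gaps in the map visible, which matters for both the attacker's and the defender's reading of it.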

The lifecycle is the second map: data collection → training/fine-tuning → evaluation → deployment → monitoring → retirement. Attacks attach at each stage (poisoning at training, extraction and injection at inference, drift and abuse in production), and so do controls. Thinking in lifecycle stages is what turns a list of attacks into a defensible program - it tells you where a given control belongs.
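The stage-to-attack mapping described above can be sketched as a lookup table. This is a minimal illustration under stated assumptions: the text places extraction and injection "at inference" and drift and abuse "in production", and the sketch files those under the deployment and monitoring stages respectively. The control names in the usage example are hypothetical placeholders.

```python
from enum import Enum


class Stage(Enum):
    DATA_COLLECTION = "data collection"
    TRAINING = "training/fine-tuning"
    EVALUATION = "evaluation"
    DEPLOYMENT = "deployment"
    MONITORING = "monitoring"
    RETIREMENT = "retirement"


# Attacks named in the text, keyed by stage.
# Assumption: "inference" is filed under DEPLOYMENT, "production" under MONITORING.
ATTACKS_BY_STAGE = {
    Stage.TRAINING: ["poisoning"],
    Stage.DEPLOYMENT: ["extraction", "injection"],
    Stage.MONITORING: ["drift", "abuse"],
}


def stages_for(attack: str) -> list[Stage]:
    """Return the lifecycle stages where the given attack attaches."""
    return [s for s in Stage if attack in ATTACKS_BY_STAGE.get(s, [])]


def uncovered_stages(controls: dict[Stage, list[str]]) -> list[Stage]:
    """Return stages that have a listed attack but no control assigned."""
    return [s for s in Stage if ATTACKS_BY_STAGE.get(s) and not controls.get(s)]


# Hypothetical control inventory: only the training stage has a control so far.
controls = {Stage.TRAINING: ["dataset provenance check"]}
for stage in uncovered_stages(controls):
    print(f"no control yet for {stage.value}: {ATTACKS_BY_STAGE[stage]}")
```

Indexing controls by stage is what lets a gap show up as a missing entry instead of going unnoticed in a flat list of attacks.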