Glossary
The ~60 core terms the rest of the playbook uses without stopping to define.
Adversarial example (evasion) — An input perturbed, often imperceptibly, to make a model misclassify or misbehave. (II.1)
Agent — An LLM given tools, memory, and a loop so it can take actions, not just answer. The intelligence is the model; the agency is the loop. (II.5, Part II)
Agent Card — The JSON descriptor (commonly /.well-known/agent-card.json) by which an A2A agent advertises identity and capabilities; a spoofing target. (II.7)
Agentic loop / orchestration — The control logic running an agent: selects tools, executes calls, manages memory, decides when to stop. Where guardrails and least-privilege are enforced. (II.8)
AIBOM — AI Bill of Materials - an inventory of the models, datasets, adapters, and components in an AI system, for provenance. (II.12, II.13)
Alignment — Training (RLHF/DPO) that makes a model helpful, honest, harmless. A behavioral layer, not a security boundary.
API — Application programming interface - a defined way for one program to call another over a network. The model API is one instance; agents reach tools and data through ordinary APIs too. (II.5)
Attention — The transformer mechanism that weighs how much each token relates to every other token in view.
A2A (Agent-to-Agent) — Open standard for agents delegating tasks to each other, across orgs. (II.7)
Backdoor — Hidden behavior triggered by a specific input, implanted via poisoned data or tampered weights.
Base model — The model straight out of pre-training, before instruction-tuning or alignment.
Capability threshold — A predefined level of dangerous capability that, once an eval shows a model crossing it, triggers stronger controls before release. (II.16)
Configuration / architecture review — Static assessment of how a system is set up against a baseline; finds misconfiguration, not novel exploits. Distinct from a pentest or red team. (II.21)
Confused deputy — A component that acts on requests using its own privileges rather than the caller’s; an over-privileged MCP server is the classic case. (II.6)
Context window — The fixed span of tokens a model can see at once - its working memory; the security-critical asset. (I.7)
Context-CIA — The CIA triad reframed around the context window: read the prompt or another tenant’s data (C), inject acted-on instructions (I), exhaust the loop (A). (I.7)
Data exhaust — Leftover AI data stores - forgotten vector DBs, prompt logs from abandoned or shadow projects - left unmanaged and exposed. (II.13)
Data lake / warehouse — Central stores of raw or structured data (S3, Snowflake, Databricks, BigQuery) that feed training and RAG. (II.13)
Data poisoning — Corrupting training, fine-tuning, or RAG data so the model learns an attacker-chosen behavior; attaches at training time, unlike injection. (II.2, II.13)
DPO — Direct Preference Optimization - a post-training method to align a model from preference data.
Embedding — A vector representing the meaning of text/an image; nearby vectors mean similar content. (II.4)
Evaluation (eval) / benchmark — A repeatable test set scoring a model or system on a capability or safety dimension as a rate; the unit of frontier-safety and guardrail measurement. (II.16, II.21)
Excessive agency — An agent given more capability, autonomy, or privilege than its task needs, enlarging the blast radius of any hijack (OWASP LLM06). (II.8)
Fine-tuning — Further training of a base model on narrower data to specialize it.
Foundation / frontier model — A large general model trained at scale; “frontier” = the most capable current generation. (II.16)
Function calling / tool use — The model emitting a structured request that your code executes, then feeds back. (II.5)
Guardrail — A runtime filter or policy that screens model inputs or outputs; a control to be measured, not a guarantee or a security boundary. (III.1)
Guardrails effectiveness assessment — Bounded, metric-driven evaluation of how reliably a guardrail enforces its policy (catch rate, false positives, coverage); control validation, not red teaming. (II.21)
Hallucination — Confident output that is fabricated or wrong - a primary cause of OWASP LLM09:2025 Misinformation.
Inference — Running a trained, frozen model to produce output; happens on every request.
Ingestion / ETL — The pipeline that pulls, transforms, chunks and embeds source data into the index; the poisoning entry point. (II.13)
Jailbreak — An input that bypasses a model’s safety alignment. Targets the model’s policy, where injection targets the app’s control flow. (II.3)
Lethal trifecta — The three conditions dangerous together: access to private data, exposure to untrusted content, and the ability to act externally. The core agent-risk lens. (II.8)
LoRA — Low-Rank Adaptation - lightweight fine-tuning that produces a small “adapter” file.
MCP (Model Context Protocol) — Open standard connecting an agent to tools and data, over JSON-RPC (stdio or HTTP). Three roles, not machines: host (the app), client (the connector), server (exposes tools; can be local or remote). (II.6)
Membership inference — Determining whether a specific record was in a model’s training data; a privacy attack on the training set. (II.2)
Memory — Anything an agent carries beyond one call: short-term is the context window, long-term is a persistent store (often a vector DB) that can be poisoned across sessions. (II.8)
Model — The trained artifact - a file of weights that maps inputs to outputs.
Model extraction (model theft) — Reconstructing a model’s behavior or weights through access, e.g. heavy querying to distill a clone. (II.2)
Model inversion — Reconstructing sensitive training inputs from a model’s outputs or parameters. (II.2)
Multimodal — A model that handles more than text - images, audio, video. Each modality is an added injection surface. (II.4)
Neural network — Layers of weighted connections whose weights are learned from data.
Non-human identity (NHI) — The identity an agent, service, or workload authenticates with, as opposed to a human; needs least privilege, short-lived audience-bound credentials, and an action log. (III.2)
Parameters / weights — The billions of numbers learned during training; functionally, “the model” itself.
Penetration test — Scoped, hands-on testing of a defined target against defined objectives; narrower than red teaming but still dynamic and adversarial. (II.17)
Pre-training — The first, largest training stage on broad web-scale data → a base model. (II.2)
Prompt — The text input to a model: system prompt + user input + any appended content.
Prompt injection — Malicious instructions in input or ingested content that hijack the model. Direct comes from the user; indirect arrives via content the model reads. (II.3)
RAG — Retrieval-Augmented Generation - fetching documents at inference and feeding them to the model instead of retraining. (II.3)
Red teaming (AI) — Adversarial, goal-driven testing - achieve a harmful outcome by any path; unbounded scope, qualitative deliverable. Contrast pentest and config review. (II.17)
RLHF — Reinforcement Learning from Human Feedback - alignment using human preference signals.
Role-aware retrieval — RAG retrieval that re-checks the requesting user’s permissions against document metadata, preventing permission stripping. (II.13)
SFT — Supervised Fine-Tuning - post-training on curated instruction/response examples.
Supply chain (AI) — Risk in pulled-in components: pretrained models (unsafe deserialization), poisoned datasets, malicious or typosquatted packages and MCP servers, slopsquatting. (II.12)
System prompt — Hidden instructions setting a model’s role and rules; leakable (OWASP LLM07). Steers behavior, does not enforce it - not a security boundary.
Temperature — A sampling setting controlling how random/creative output is. Why a jailbreak or guardrail result is a rate, not a guarantee.
Token / tokenization — The subword units text is split into; models read and write tokens, not words.
Tool — A named function the model can ask to invoke; the model only requests it, the host runs it and returns the result. As dangerous as its privileges. (II.5)
Tool poisoning — Hidden instructions placed in a tool’s description or schema, which the model reads as trusted; fires merely by the tool being connected. A rug pull swaps a clean description for a poisoned one later. (II.6)
Training — Learning weights from data; expensive, done once per model version.
Transformer — The dominant LLM architecture, built around attention.
Trust boundary — The line between zones of differing trust; in AI the decisive one is the path from untrusted content in to privileged action out. (I.7)
Vector database — A store of embeddings for similarity search; powers RAG. Often weakly authenticated and internet-exposed - a top data-layer risk. (II.13)
Vector space / vector store — The geometric space of embeddings; a database of them powers RAG. (II.4)