Reference library

The primary-source spine behind the playbook: the papers, specs, standards, and vendor advisories its claims rest on, grouped by topic. Each entry carries a short ID that the body text cites inline.

Adversarial ML, privacy & LLM canon

a1 Goodfellow - FGSM - arXiv:1412.6572
a2 Madry - PGD · Gu - BadNets - arXiv:1706.06083 · 1708.06733
c Carlini - Extracting Training Data from LLMs - USENIX Security 2021; arXiv:2012.07805
p Carlini - Poisoning Web-Scale Training Datasets is Practical - arXiv:2302.10149
pp Zhang - Persistent Pre-training Poisoning of LLMs - arXiv:2410.13722
sc Scaling Trends for Data Poisoning in LLMs - arXiv:2408.02946
nk Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples - arXiv:2510.07192 (Anthropic / UK AISI / Alan Turing Institute, Oct 2025: ~250 docs backdoor 600M-13B models regardless of scale)
g Greshake - Indirect Prompt Injection - arXiv:2302.12173
z Zou - Universal Transferable Attacks (GCG) - arXiv:2307.15043
sa Hubinger - Sleeper Agents (Anthropic) - arXiv:2401.05566
w Willison - lethal trifecta - simonwillison.net
sk Prompt Injection Attacks on Agentic Coding Assistants - vulnerabilities in skills, tools and protocol ecosystems - arXiv:2601.17548
nc Nasr, Carlini et al. - Scalable Extraction of Training Data from (Production) LLMs - arXiv:2311.17035 (the “divergence” attack that pulled verbatim training data incl. PII from ChatGPT)
el EchoLeak - zero-click indirect injection in M365 Copilot (CVE-2025-32711, CVSS 9.3; “LLM Scope Violation”) - Aim Labs; patched Jun 2025

Multimodal attacks

ipi Nagaraja - Image-based Prompt Injection (IPI) - arXiv:2603.03637
uv Universal Adversarial Attack on Aligned Multimodal LLMs - arXiv:2502.07987
ci CSA - Image Prompt Injection in Multimodal LLMs - CSA Labs, Mar 2026
mm Seeing the Threat - VLM adversarial-attack study - arXiv:2505.21967

Agent protocols (MCP / A2A)

m MCP Specification (Authorization), 2025-11-25 - OAuth 2.1 RS; RFC 9728/8707
ms MCPShield · MCPSecBench - arXiv:2604.05969 · 2508.13220
cm Comparative Threat Model - MCP/A2A/Agora/ANP - arXiv:2602.11327
a A2A Protocol Specification - a2a-protocol.org
me Securing an A2A Application (MAESTRO) - arXiv:2504.16902
s Survey of Agent Interoperability Protocols - arXiv:2505.02279
ad AgentDojo - dynamic env for prompt-injection attacks/defenses on LLM agents · InjecAgent - indirect-injection benchmark for tool-using agents - arXiv:2406.13352 · 2403.02691
mcparch MCP architecture & specification - primitives, transports, lifecycle - modelcontextprotocol.io (primary source)
mcpsa When MCP Servers Attack - taxonomy, feasibility, mitigation · MCP threat modeling & tool-poisoning injection - arXiv:2509.24272 · 2603.22489
u42 Unit 42 - new prompt-injection vectors through MCP sampling - Palo Alto Networks, 2026
mcpver MCP - protocol versioning & the current stable revision - modelcontextprotocol.io (primary source; check before citing any spec version)
mcpdep MCP - deprecated features registry · Feature lifecycle policy - 12-month window, 90-day expedited floor - modelcontextprotocol.io
mcpext MCP extensions - reverse-DNS identifiers, opt-in, independent versioning · Extension client support matrix - modelcontextprotocol.io
mcpcli MCP client concepts - Roots, Sampling, Elicitation; roots are not a security boundary - modelcontextprotocol.io
mcpcbp MCP client best practices - programmatic tool calling, progressive discovery, per-call authorization - modelcontextprotocol.io
mcpchg MCP changelog 2025-11-25 · draft changelog - the SEP-to-change map - modelcontextprotocol.io
mcpsdk MCP SDK tiers - conformance & security-response commitments · SDK list by tier · conformance suite - modelcontextprotocol.io / GitHub
mcpta MCP Tool Annotations Interest Group - six competing SEPs open - modelcontextprotocol.io, Apr 2026
ciscmcp CIS Controls v8.1 - Model Context Protocol Companion Guide v1.0 - Center for Internet Security, Apr 2026. 82pp; interprets all 18 Controls for MCP, four deployment patterns with trust-boundary diagrams, and an appendix mapping ~20 MCP CVEs to Safeguards. License CC BY-NC-ND 4.0. Companion guides also exist for AI/LLM and AI agents
owaspast OWASP Agentic Skills Top 10 (AST01-AST10) - OWASP GenAI Security Project. Working draft, in community review at time of writing - the risk set covers agent “skill” bundles (SKILL.md instructions plus code and resources). Cite as emerging, not settled; re-check for a published release before relying on the item numbering
mcpsbp MCP Security Best Practices, 2026-07-28 revision - modelcontextprotocol.io. Named attacks with normative mitigations: confused deputy, token passthrough, SSRF, state handle hijacking, local server compromise, OAuth URL injection, stdio proxy escalation, mix-up attacks and localhost redirect impersonation, plus scope minimization
mcp2607 MCP 2026-07-28 stable release - GitHub, 28 Jul 2026. “This release marks the stable release of the 2026-07-28 revision.” The versioning page still named 2025-11-25 as current on 29 Jul 2026, so cite both
mcp52869 CVE-2026-52869 - MCP Python SDK routes sessions by id without verifying the principal - NVD, 15 Jul 2026. CVSS 7.1, fixed 1.27.2. The empirical case behind the spec’s State Handle Hijacking section

Browser / computer-use agents

bc Agentic AI security 2026 - infrastructure-level injection, CoSAI surface map - Adversa AI / CoSAI, May 2026

Coding agents & Codex

ca OpenAI - Introducing upgrades to Codex - openai.com
cs OpenAI - Codex agent approvals & security - developers.openai.com
cy OpenAI - GPT-5.3-Codex system card / cyber safeguards - deploymentsafety.openai.com, Feb 2026

Offensive AI & frontier safety

x Anthropic - Disrupting AI-orchestrated espionage (GTG-1002) - Nov 2025
rs Anthropic - Responsible Scaling Policy - RSP v3.4, 2026
lp OpenAI - Preparedness Framework v2 · DeepMind - Frontier Safety Framework v3.1 - Apr 2025 · Apr 2026
mt METR - Common Elements of Frontier AI Safety Policies - metr.org, Mar 2025
af Affordance analysis of the OpenAI Preparedness Framework (critique) - arXiv:2509.24394
cot Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety - arXiv:2507.11473 (multi-org: OpenAI, Anthropic, GDM, et al., Jul 2025)

Threat modeling

tm CSA - MAESTRO agentic AI threat-modeling framework (7 layers) - cloudsecurityalliance.org, 2025
ta Securing Agentic AI - MAESTRO applied to a monitoring agent - arXiv:2508.10043

Singapore AI testing & accreditation

sm Project Moonshot - LLM benchmarking & red-teaming toolkit - AI Verify Foundation
sta Singapore AI Tester Accreditation Programme - The Edge Singapore, 2026
si ISO/IEC 42119-8 (proposed/draft) - benchmarking & red-teaming methodology - IMDA, Apr 2026
isk IMDA - Starter Kit for Testing LLM-Based Applications - imda.gov.sg, 2025

High-harm capability evaluation

hu Measuring Harmful Capability Uplift (human-centered evals) - MIT, arXiv:2603.26676, Mar 2026
hq Quantifying CBRN Risk in Frontier Models (WMDP/FORTRESS/VCT) - arXiv:2510.21133
hf Nova Premier under the Frontier Model Safety Framework (audited CBRN evals) - arXiv:2507.06260
he Epoch AI - Do biorisk evals actually measure the risk? - epoch.ai, 2025
hj LLMs Outperform Experts on Challenging Biology Benchmarks (VCT, LAB-Bench, GPQA-Bio, WMDP) - Justen, arXiv:2505.06108, 2025
hs LLM Novice Uplift on Dual-Use, In Silico Biology Tasks (uplift study design) - Scale AI, arXiv:2602.23329, 2026

Jailbreaks & guardrail bypasses

js JailbreakRadar - Assessment of Jailbreak Attacks · Guarding the Guardrails (taxonomy) - ACL 2025 · arXiv:2510.13893
jb Robustness of LLM Safety Guardrails vs Adversarial Attacks - arXiv:2511.22047
jx Anthropic - Many-shot Jailbreaking - anthropic.com
jm Microsoft - Skeleton Key & Crescendo - microsoft.com, Jun 2024
jh HiddenLayer - Policy Puppetry universal bypass - hiddenlayer.com, Apr 2025
jr Repello - AI jailbreak techniques & safeguards - repello.ai, 2026

Standards, verification & maturity

om OWASP MCP Top 10 - protocol-level risk taxonomy (beta) - owasp.org, 2025-2026
sv OWASP AISVS - AI Security Verification Standard - owasp.org
vss OWASP AIVSS - AI Vulnerability Scoring System (v0.8) - aivss.owasp.org, 2026
aima OWASP AI Maturity Assessment (AIMA) - owasp.org
st AI Red Teaming 2026 field guide - garak, PyRIT, methodology - 2026
sn CSA note - NIST COSAiS & AI agent security standards - CSA, Mar 2026
grt OWASP GenAI Red Teaming Guide - model, implementation, infrastructure & runtime testing - OWASP, Jan 2025
nah NIST - Strengthening AI Agent Hijacking Evaluations (the ASR discipline: adaptive attacks, repeat trials) - NIST, Jan 2025
msrt Microsoft - Lessons from Red Teaming 100 Generative AI Products - Microsoft AI Red Team, arXiv:2501.07238, Jan 2025
jbk Wei, Haghtalab & Steinhardt - Jailbroken: How Does LLM Safety Training Fail? (the two failure modes) - arXiv:2307.02483, 2023

Frameworks, Singapore & EU

mg IMDA - Model AI Governance Framework for Agentic AI (v1.5, 20 May 2026) - imda.gov.sg, launched WEF Davos 22 Jan 2026
o OWASP Top 10 for LLM Apps (2025) · Agentic Top 10 (Dec 2025) - genai.owasp.org
sf Google SAIF · CoSAI · MITRE ATLAS · NIST AI RMF - frameworks
sa2 CSA Singapore Advisory AD-2026-004 - Frontier AI Risks - csa.gov.sg, 15 Apr 2026
s2 CSA Singapore Guidelines & Companion Guide · Securing Agentic AI Addendum · EU AI Act - csa.gov.sg · EU

Defenses & mitigations

sl Defending Against Indirect Prompt Injection with Spotlighting - Hines et al., Microsoft, 2024
cc Constitutional Classifiers - defending against universal jailbreaks - Anthropic, 2025; arXiv:2501.18837
cb Improving Alignment and Robustness with Circuit Breakers - Zou et al., NeurIPS 2024
cl Defeating Prompt Injections by Design (CaMeL) - Debenedetti et al., Google DeepMind, 2025

Identity, detection & response

ni OWASP - Agentic Top 10 ↔ Non-Human Identities Top 10 cross-map - genai.owasp.org, Dec 2025
ot OpenTelemetry - GenAI semantic conventions - opentelemetry.io
so Survey of Agentic AI & Cybersecurity (defensive use) - arXiv:2601.05293
oauth OAuth token-binding & delegation RFCs for NHI: DPoP 9449 · mTLS-bound 8705 · token exchange 8693 · introspection 7662 - IETF
spiffe Workload identity & policy-as-code: SPIFFE/SPIRE · OPA · Cedar - per-agent identity + runtime authorization

Data-layer security

do Orca Security - Exposed Vector Databases - orca.security, 2026
de 2026 AI Security Predictions - “Breach-by-Exhaust” - BigDATAwire, Dec 2025

ML supply chain & model-file security

pk ReversingLabs - nullifAI: malicious ML models evading picklescan - reversinglabs.com, Feb 2025
jf JFrog - Malicious Hugging Face models with silent backdoor - jfrog.com, 2024
sft safetensors - safe, code-free model serialization - Hugging Face
msc Protect AI - ModelScan · Trail of Bits - Fickling - model-file scanners
kcve CVE-2025-1550 - Keras .keras-archive (config.json) arbitrary code execution on model load - NVD (the model-file-is-code RCE class, beyond pickle)
oms OpenSSF Model Signing (OMS) v1.0 · sigstore/model-transparency - OpenSSF/Google/NVIDIA, Apr 2025
ss SLSA - Supply-chain Levels for Software Artifacts · Sigstore - slsa.dev · sigstore.dev
mb CycloneDX ML-BOM (v1.7) · OWASP SCVS - OWASP, Oct 2025
mcd Mitchell - Model Cards for Model Reporting - FAT* 2019; arXiv:1810.03993

MLSecOps & guardrails

lg Protect AI - LLM Guard · NVIDIA NeMo Guardrails · Guardrails AI - open-source runtime guardrails
lf Meta - LlamaFirewall (PromptGuard 2, CodeShield) - ai.meta.com
pr PoisonedRAG - knowledge-corruption attacks on RAG - USENIX Security 2025; arXiv:2402.07867

AI threat libraries & emerging threats

atl MITRE ATLAS - matrix, Navigator & case studies (16 tactics, ~84 techniques, ~56 sub-techniques, v2026.06 June 2026; monthly calendar-versioned cadence) - atlas.mitre.org, 2026
bi BIML - Architectural Risk Analysis of ML / LLMs (BIML-78; 23 black-box risks) - berryvilleiml.com; IEEE Computer, Apr 2024
mr MIT AI Risk Repository (1,700+ risks) - airisk.mit.edu
aid AI Incident Database - incidentdatabase.ai
av AVID - AI Vulnerability Database - avidml.org
w2 Cohen, Bitton & Nassi - Morris II: zero-click GenAI worms - arXiv:2403.02817, 2024

MCP server hardening

mh MCP - Security Best Practices (confused deputy, no token passthrough, no sessions used for authentication) - modelcontextprotocol.io
cw CoSAI WS4 - Secure Design Patterns for Agentic Systems: MCP security - cosai-oasis, 2026
nsamcp NSA - MCP: Security Design Considerations for AI-Driven Automation - NSA/AISC, May 2026
mcpcs OWASP MCP Security Cheat Sheet - OWASP Cheat Sheet Series
awsmcp CVE-2026-16584 - AWS API MCP Server skips policy checks after an init failure - NVD, 23 Jul 2026. CWE-455, 0.2.13 to 1.3.46, fixed 1.3.47. The 2026 reference case for a guardrail failing open for a whole process lifetime
mcprb CVE-2025-66414 (TypeScript SDK, fixed 1.24.0) and CVE-2025-66416 (Python SDK, fixed 1.23.0) - NVD, both CVSS 8.1. DNS-rebinding protection present but off by default; the two SDKs sit on different version lines
ccwf CVE-2026-54316 - Claude Code pre-approved huggingface.co as a bare hostname for WebFetch - NVD, 23 Jun 2026. 0.2.54 to 2.1.162, fixed 2.1.163. A domain-level allowlist is not an exfiltration control when the domain is attacker-writable

Shadow AI discovery & governance

sd Shadow AI - scale & risk (98% unsanctioned use; Netskope 223 violations/mo; CrowdStrike 2026) - Vectra AI, 2026
se Microsoft Entra - Shadow AI discovery in Global Secure Access - learn.microsoft.com, 2026
sp Tenable - Shadow AI & AI-SPM (discover shadow AI in build before prod) - tenable.com, 2025
st2 Shadow-AI detection tooling landscape (Lasso, Nightfall, et al.) - Netwrix, 2026
sg Shadow AI governance - approved alternatives cut unsanctioned use ~89% - Forcepoint, 2026

AI governance, risk & maturity standards

ir ISO/IEC 23894:2023 - AI guidance on risk management - ISO/IEC; the risk process, built on ISO 31000
im ISO/IEC 42001:2023 - AI management system (AIMS) - ISO/IEC; PDCA + Annex A controls
ng NIST AI RMF 1.0 (Govern/Map/Measure/Manage) & AI 600-1 GenAI Profile - NIST, 2023-2024
nm NIST AI 100-2 E2025 - Adversarial ML: a taxonomy & terminology of attacks and mitigations - NIST (the canonical attack-naming scheme; 2025 edition)
ec EC-Council Global Services - ADG (Adopt · Defend · Govern) framework - aigovernance.eccouncil.org, 2026