Reference library
Primary sources first; verify versions against the live source. Inline markers throughout use the short IDs below.
Adversarial ML, privacy & LLM canon
[a1] Goodfellow - FGSM — arXiv:1412.6572
[a2] Madry - PGD · Gu - BadNets — arXiv:1706.06083 · 1708.06733
[c] Carlini - Extracting Training Data from LLMs — USENIX Security; arXiv:2012.07805
[p] Carlini - Poisoning Web-Scale Training Datasets is Practical — arXiv:2302.10149
[pp] Zhang - Persistent Pre-training Poisoning of LLMs — arXiv:2410.13722
[sc] Scaling Trends for Data Poisoning in LLMs — arXiv:2408.02946
[g] Greshake - Indirect Prompt Injection — arXiv:2302.12173
[z] Zou - Universal Transferable Attacks (GCG) — arXiv:2307.15043
[sa] Hubinger - Sleeper Agents (Anthropic) — arXiv:2401.05566
[w] Willison - lethal trifecta — simonwillison.net
[sk] SoK - Prompt Injection on Agentic Coding Assistants — arXiv:2601.17548
Multimodal attacks
[ipi] Nagaraja - Image-based Prompt Injection (IPI) — arXiv:2603.03637
[uv] Universal Adversarial Attack on Aligned Multimodal LLMs — arXiv:2502.07987
[ci] CSA - Image Prompt Injection in Multimodal LLMs — CSA Labs, Mar 2026
[mm] Seeing the Threat - VLM adversarial-attack survey — arXiv:2505.21967
Agent protocols (MCP / A2A)
[m] MCP Specification (Authorization), 2025-11-25 — OAuth 2.1 RS; RFC 9728/8707
[ms] MCPShield · MCPSecBench — arXiv:2604.05969 · 2508.13220
[cm] Comparative Threat Model - MCP/A2A/Agora/ANP — arXiv:2602.11327
[a] A2A Protocol Specification — a2a-protocol.org
[me] Securing an A2A Application (MAESTRO) — arXiv:2504.16902
[s] Survey of Agent Interoperability Protocols — arXiv:2505.02279
Browser / computer-use agents
[bc] Agentic AI security 2026 - infrastructure-level injection, CoSAI surface map — Adversa AI / CoSAI, May 2026
Coding agents & Codex
[ca] OpenAI - Introducing upgrades to Codex — openai.com
[cs] OpenAI - Codex agent approvals & security — developers.openai.com
[cy] OpenAI - GPT-5.3-Codex system card / cyber safeguards — deploymentsafety.openai.com, Feb 2026
Offensive AI & frontier safety
[x] Anthropic - Disrupting AI-orchestrated espionage (GTG-1002) — Nov 2025
[rs] Anthropic - Responsible Scaling Policy — RSP v3.3, 2026
[lp] OpenAI - Preparedness Framework v2 · DeepMind - Frontier Safety Framework v3.1 — Apr 2025 · Apr 2026
[mt] METR - Common Elements of Frontier AI Safety Policies — metr.org, Mar 2025
[af] Affordance analysis of the OpenAI Preparedness Framework (critique) — arXiv:2509.24394
Threat modeling
[tm] CSA - MAESTRO agentic AI threat-modeling framework (7 layers) — cloudsecurityalliance.org, 2025
[ta] Securing Agentic AI - MAESTRO applied to a monitoring agent — arXiv:2508.10043
Singapore AI testing & accreditation
[sm] Project Moonshot - LLM benchmarking & red-teaming toolkit — AI Verify Foundation
[sa] Singapore AI Tester Accreditation Programme — The Edge Singapore, 2026
[si] ISO/IEC 42119-8 (proposed/draft) - benchmarking & red-teaming methodology — IMDA, Apr 2026
[sk] IMDA - Starter Kit for Testing LLM-Based Applications — imda.gov.sg, 2025
High-harm capability evaluation
[hu] Measuring Harmful Capability Uplift (human-centered evals) — MIT, arXiv:2603.26676, Jan 2026
[hq] Quantifying CBRN Risk in Frontier Models (WMDP/FORTRESS/VCT) — arXiv:2510.21133
[hf] Nova Premier under the Frontier Model Safety Framework (audited CBRN evals) — arXiv:2507.06260
[he] Epoch AI - Do biorisk evals actually measure the risk? — epoch.ai, 2025
[hj] LLMs Outperform Experts on Challenging Biology Benchmarks (VCT, LAB-Bench, GPQA-Bio, WMDP) — Justen, arXiv:2505.06108, 2025
[hs] LLM Novice Uplift on Dual-Use, In Silico Biology Tasks (uplift study design) — Scale AI, arXiv:2602.23329, 2026
Jailbreaks & guardrail bypasses
[js] JailbreakRadar - Assessment of Jailbreak Attacks · Guarding the Guardrails (taxonomy) — ACL 2025 · arXiv:2510.13893
[jb] Robustness of LLM Safety Guardrails vs Adversarial Attacks — arXiv:2511.22047
[jx] Anthropic - Many-shot Jailbreaking — anthropic.com
[jm] Microsoft - Skeleton Key & Crescendo — microsoft.com, Jun 2024
[jh] HiddenLayer - Policy Puppetry universal bypass — hiddenlayer.com, Apr 2025
[jr] Repello - AI jailbreak techniques & safeguards — repello.ai, 2026
Standards, verification & maturity
[om] OWASP MCP Top 10 - protocol-level risk taxonomy (beta) — owasp.org, 2025-2026
[sv] OWASP AISVS - AI Security Verification Standard — owasp.org
[sc] OWASP AIVSS - AI Vulnerability Scoring System (v0.8) — aivss.owasp.org, 2026
[sa] OWASP AI Maturity Assessment (AIMA) — owasp.org
[st] AI Red Teaming 2026 field guide - garak, PyRIT, methodology — 2026
[sn] CSA note - NIST COSAiS & AI agent security standards — CSA, Mar 2026
Frameworks, Singapore & EU
[mg] IMDA - Model AI Governance Framework for Agentic AI (v1.5, 20 May 2026) — imda.gov.sg, launched WEF Davos 22 Jan 2026
[o] OWASP Top 10 for LLM Apps (2025) · Agentic Top 10 (Dec 2025) — genai.owasp.org
[sf] Google SAIF · CoSAI · MITRE ATLAS · NIST AI RMF — frameworks
[sa2] CSA Advisory AD-2026-004 - Frontier AI Risks — csa.gov.sg, 15 Apr 2026
[s2] CSA Guidelines & Companion Guide · Securing Agentic AI Addendum · EU AI Act — csa.gov.sg · EU
Defenses & mitigations
[sl] Defending Against Indirect Prompt Injection with Spotlighting — Hines et al., Microsoft, 2024
[cc] Constitutional Classifiers - defending against universal jailbreaks — Anthropic, 2025 (arXiv:2501.18837)
[cb] Improving Alignment and Robustness with Circuit Breakers — Zou et al., NeurIPS 2024
[cl] Defeating Prompt Injections by Design (CaMeL) — Debenedetti et al., Google DeepMind, 2025
Identity, detection & response
[ni] OWASP - Agentic Top 10 ↔ Non-Human Identities Top 10 cross-map — genai.owasp.org, Dec 2025
[ot] OpenTelemetry - GenAI semantic conventions — opentelemetry.io
[so] Survey of Agentic AI & Cybersecurity (defensive use) — arXiv:2601.05293
Data-layer security
[do] Orca Security - Exposed Vector Databases — orca.security, 2026
[de] 2026 AI Security Predictions - “Breach-by-Exhaust” — BigDATAwire, Dec 2025
ML supply chain & model-file security
[pk] ReversingLabs - nullifAI: malicious ML models evading picklescan — reversinglabs.com, Feb 2025
[jf] JFrog - Malicious Hugging Face models with silent backdoor — jfrog.com, 2024
[sft] safetensors - safe, code-free model serialization — Hugging Face
[msc] Protect AI - ModelScan · Trail of Bits - Fickling — model-file scanners
[oms] OpenSSF Model Signing (OMS) v1.0 · sigstore/model-transparency — OpenSSF/Google/NVIDIA, Apr 2025
[ss] SLSA - Supply-chain Levels for Software Artifacts · Sigstore — slsa.dev · sigstore.dev
[mb] CycloneDX ML-BOM (v1.7) · OWASP SCVS — OWASP, Oct 2025
[mcd] Mitchell - Model Cards for Model Reporting — FAT* 2019; arXiv:1810.03993
MLSecOps & guardrails
[lg] Protect AI - LLM Guard · NVIDIA NeMo Guardrails · Guardrails AI — open-source runtime guardrails
[lf] Meta - LlamaFirewall (PromptGuard 2, CodeShield) — ai.meta.com
[pr] PoisonedRAG - knowledge-corruption attacks on RAG — USENIX Security 2025; arXiv:2402.07867
AI threat libraries & emerging threats
[atl] MITRE ATLAS - matrix, Navigator & case studies (v5.4.0+, monthly cadence) — atlas.mitre.org, 2026
[bi] BIML - Architectural Risk Analysis of ML / LLMs (BIML-78; 23 black-box risks) — berryvilleiml.com; IEEE Computer, Apr 2024
[mr] MIT AI Risk Repository (1,700+ risks) — airisk.mit.edu
[aid] AI Incident Database — incidentdatabase.ai
[av] AVID - AI Vulnerability Database — avidml.org
[w2] Cohen, Bitton & Nassi - Morris II: zero-click GenAI worms — ACM CCS 2025; arXiv:2403.02817
MCP server hardening
[mh] MCP - Security Best Practices (confused deputy, no token passthrough, no session auth) — modelcontextprotocol.io
[cw] CoSAI WS4 - Secure Design Patterns for Agentic Systems: MCP security — cosai-oasis, 2026
Shadow AI discovery & governance
[sd] Shadow AI - scale & risk (98% unsanctioned use; Netskope 223 violations/mo; CrowdStrike 2026) — Vectra AI, 2026
[se] Microsoft Entra - Shadow AI discovery in Global Secure Access — learn.microsoft.com, 2026
[sp] Tenable - Shadow AI & AI-SPM (discover shadow AI in build before prod) — tenable.com, 2025
[st2] Shadow-AI detection tooling landscape (Lasso, Nightfall, et al.) — Netwrix, 2026
[sg] Shadow AI governance - approved alternatives cut unsanctioned use ~89% — Forcepoint, 2026
AI governance, risk & maturity standards
[ir] ISO/IEC 23894:2023 - AI guidance on risk management — ISO/IEC; the risk process, built on ISO 31000
[im] ISO/IEC 42001:2023 - AI management system (AIMS) — ISO/IEC; PDCA + Annex A controls
[ng] NIST AI RMF 1.0 (Govern/Map/Measure/Manage) & AI 600-1 GenAI Profile — NIST, 2023-2024
[ec] EC-Council Global Services - ADG (Adopt · Defend · Govern) framework — aigovernance.eccouncil.org, 2026