Skip to content

Security vs safety & the threat landscape

Safety concerns unintended harms from a system working as designed (bias, hallucination, harmful content). Security concerns harms from an adversary acting against the system or wielding it (evasion, theft, poisoning, injection, weaponization). This playbook is about security. Three structural properties break traditional appsec:

  • Instructions and data share one channel. No prepared-statement equivalent exists; the model cannot reliably separate a developer’s instruction from text it read. Root of prompt injection.
  • The trust boundary now includes weights and data. A model is a binary trained on data you may not control; both can carry backdoors no code review finds.
  • Behavior is probabilistic and emergent. Defenses degrade under adaptive pressure; offensive capabilities appear with scale rather than being coded.

Who attacks AI, and how the surface widens

The actor set is the familiar one - nation-states (see GTG-1002, II.14), financially-motivated criminals, insiders, hacktivists, and researchers - but AI hands each of them new leverage: cheaper sophisticated tooling, machine-speed execution, and a new social-engineering medium. Synthetic media belongs in the landscape: deepfaked voice and video already enable high-value fraud and impersonation, and detection is unreliable, so the defensive answer is shifting toward provenance - content-authenticity standards like C2PA / Content Credentials that cryptographically sign an asset’s origin and edit history. Treat “is this media real?” as an identity/verification problem, not a detection problem - provenance attests an asset’s origin and edit history, not that its content is truthful, and coverage is still far from universal.