Model APIs & the tool-use loop
An AI model API is a stateless HTTPS endpoint: you POST messages, the model returns a completion. The security-relevant evolution is tool use (function calling): you declare tools (name, description, JSON-schema args) and the model emits a structured call your code executes, feeding the result back. This loop turns a chatbot into an agent - the moment output becomes action.
sequenceDiagram autonumber participant App as Client App participant API as Model API participant Tool as External Tool / API App->>API: messages + tool definitions API-->>App: tool_use request (name, args) App->>Tool: execute call (real credentials) Tool-->>App: result data App->>API: tool_result appended to context API-->>App: final answer (or another tool_use) Note over App,API: Untrusted tool output re-enters the same channel as trusted instructions
Each return trip is a chance for attacker-controlled content (a page, file, email) to enter the model’s context and be read as an instruction.
Classic API hygiene - still mandatory
# the AI feature is still a web API - test authz, IDOR/BOLA, injection on its paramsPOST /v1/chat { "session_id": "../victim-tenant/42", "prompt": "summarize my data" }# BOLA: swap an object/tenant id to read another user context or RAG corpus# also probe: unauthenticated /v1/embeddings, verbose errors leaking model/version, no rate-limit- Key management. Hardcoded keys leak via git history, client bundles, decompiled mobile binaries, container logs. Use a secrets manager, separate keys per environment, rotate, and front shared provider keys with an identity-aware gateway issuing per-agent virtual keys.
- Token-aware rate limiting. An agent chains 10-20 calls per task in bursts that look like a DDoS, and an 8k-token completion costs ~100× a metadata lookup yet ticks the same “one request.” Limit by tokens/cost per identity with hard spend caps. (LLM10.)
- Monitoring. Calls from unexpected geographies, off-hours spikes, sudden volume - treat as possible key compromise.