Cloud, from scratch
I.4 gave you the one-paragraph map; this is the actual course, written for someone who isn’t a cloud person but has to discuss it confidently. Read it once and you can hold your own in any AI-system assessment conversation. The throughline: in the cloud you rent capability instead of owning machines, and everything is an API call gated by an identity - which is exactly why cloud and AI security are the same conversation.
First, the intuition - what the cloud really is and why everyone uses it
Forget the diagrams for a moment. The cloud is renting computing instead of buying it. Instead of an organization buying servers, racking them in a room, powering and cooling and patching them, it rents exactly what it needs from a provider’s enormous data centres and pays by the hour or by usage. Need a hundred GPUs for a training run this afternoon and zero tomorrow? You rent them for the afternoon. That elasticity - plus not owning the hardware headache - is the whole reason the world moved.
A useful analogy: owning servers is owning a car (you buy it, maintain it, it sits idle most of the day); the cloud is ride-hailing (you summon exactly the capacity you need, when you need it, and it’s someone else’s job to keep the fleet running). For a security tester, the consequence is profound: there is no perimeter you can walk around and no server room you can lock. Everything is reached through APIs and consoles over the internet, and the only things standing between an attacker and a resource are its configuration and its identity controls. That’s why, in the cloud, misconfiguration is the breach - there’s no firewall-and-moat to fall back on. Hold that intuition; the rest of this course is just detail hung on it.
1 · What “the cloud” actually is
Someone else owns the data centre, the servers, the power, and the network; you rent slices of it on demand and pay for what you use. You never see the hardware - you interact with everything through web consoles, command-line tools, and APIs. The three giants: AWS (the largest, broadest service catalogue), Microsoft Azure (deepest enterprise/Microsoft integration; hosts OpenAI models via the Azure OpenAI service, though OpenAI also offers its own direct API), and Google Cloud / GCP (strongest AI/ML and data analytics). You’ll meet all three; the concepts below are identical across them, only the names differ.
2 · The five things you rent
| Building block | What it is | AWS / Azure / GCP name |
|---|---|---|
| Compute | Virtual machines / GPUs that run your code or model | EC2 / Virtual Machines / Compute Engine |
| Storage | Object storage for files, data, model weights | S3 / Blob Storage / Cloud Storage |
| Database | Managed relational & NoSQL stores | RDS·DynamoDB / Azure SQL·Cosmos DB / Cloud SQL·Firestore |
| Networking | Private virtual networks, load balancers, the perimeter | VPC / VNet / VPC |
| Identity (IAM) | Who/what can do what - the control plane for everything | IAM / Entra ID + Azure RBAC / Cloud IAM |
These building blocks are also layered by how much you manage: IaaS (you rent raw VMs and run everything on them), PaaS (you deploy code/models onto a managed platform), SaaS (you just use a finished app). Two newer layers matter for AI: serverless (functions that run on demand, with no server to manage, e.g. AWS Lambda) and containers/Kubernetes (packaged apps that scale - where most model-serving lives).
3 · IAM - the one concept to truly understand
If you learn one thing, learn this. Identity and Access Management decides which identity (a user, or a non-human workload like an app or agent) can perform which action on which resource. Its vocabulary: principals (the identity), roles/policies (what they’re allowed), credentials (keys or tokens proving identity), and the principle of least privilege (grant only what’s needed). Almost every cloud breach - and almost every AI-agent breach - is fundamentally an IAM failure: an over-permissive role, a leaked long-lived key, or a workload with far more access than its task requires. For agents this is the non-human-identity problem in III.2, and it’s why IAM is the spine of the threat model (I.9).
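To make "which identity can perform which action on which resource" concrete, here is a minimal sketch of a policy check. It is a toy model, not any provider's real evaluator: the `Statement` shape, the action strings, and the resource paths are all hypothetical, and the default-deny, explicit-deny-wins ordering is a simplification in the style of AWS policy evaluation.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Statement:
    effect: str       # "Allow" or "Deny"
    actions: tuple    # action patterns, e.g. ("storage:GetObject",)
    resources: tuple  # resource patterns, e.g. ("bucket/rag-corpus/*",)


def is_allowed(statements, action, resource):
    """Default deny; any matching Deny overrides every Allow."""
    allowed = False
    for s in statements:
        matches = (
            any(fnmatchcase(action, p) for p in s.actions)
            and any(fnmatchcase(resource, p) for p in s.resources)
        )
        if not matches:
            continue
        if s.effect == "Deny":
            return False
        allowed = True
    return allowed


# Hypothetical roles for a RAG agent that only needs to read its corpus.
least_privilege = [
    Statement("Allow", ("storage:GetObject",), ("bucket/rag-corpus/*",)),
]
over_permissive = [
    Statement("Allow", ("*",), ("*",)),  # the classic IAM failure
]
```

With `least_privilege`, the agent can read `bucket/rag-corpus/doc1.pdf` and nothing else. With `over_permissive`, the same agent (or anyone who hijacks it through a prompt) can also call `iam:CreateAccessKey` on `user/admin`. That gap is what you are looking for in a test.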
4 · The shared responsibility model
The single most-asked cloud-security question, so know it cold: the provider secures the cloud itself (hardware, the data centre, the core infrastructure - “security of the cloud”); you secure what you put in it (your data, your access config, your code, your IAM - “security in the cloud”). The exact line shifts with the service model - with SaaS the provider owns more, with IaaS you own more - but your data and your identity config are always yours. Most cloud incidents are customer-side misconfigurations, not provider failures.
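One way to keep the shifting line straight is to treat the stack as an ordered list and record where the customer's responsibility starts for each service model. The sketch below is illustrative only: the layer names and the exact cut-off points are assumptions made for the example, and real providers draw the line per service.

```python
# Stack from the bottom (provider side) to the top (customer side).
# Layer names are illustrative, not a provider's official matrix.
LAYERS = [
    "physical data centre",
    "hardware",
    "hypervisor",
    "operating system",
    "application runtime",
    "application code",
    "identity and access config",
    "data",
]

# First layer the customer is responsible for, per service model (assumed).
CUSTOMER_STARTS_AT = {
    "IaaS": "operating system",
    "PaaS": "application code",
    "SaaS": "identity and access config",
}


def customer_owned(model):
    """Return the layers the customer must secure under a service model."""
    start = LAYERS.index(CUSTOMER_STARTS_AT[model])
    return LAYERS[start:]


def provider_owned(model):
    """Return the layers the provider secures under a service model."""
    start = LAYERS.index(CUSTOMER_STARTS_AT[model])
    return LAYERS[:start]
```

Whatever the model, `customer_owned(...)` always ends with identity config and data, which matches the rule above: those two never move to the provider's side.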
5 · Where AI lives - the cloud AI stack in three layers
Providers package AI at three heights, and knowing which one a client uses tells you the attack surface immediately:
- Foundation-model APIs (top, easiest). Call a hosted model, manage nothing. Amazon Bedrock (multi-model marketplace - Anthropic, Meta, and Amazon’s own Titan models), Azure OpenAI Service (OpenAI models via Azure), Google Vertex AI (Gemini + others). The surface here is the connections (keys, prompts, data, tools), not the model.
- ML platforms (middle). Build, train, deploy your own models: SageMaker (AWS), Azure ML, Vertex AI. Add MLOps pipelines, feature stores, and the supply-chain surface of II.12.
- Raw infrastructure (bottom). Rent GPUs/TPUs and run your own serving stack (vLLM, Triton). Custom AI chips now matter: AWS Trainium, Google TPU, Azure Maia.
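The "which layer tells you the surface" idea can be sketched as a simple lookup. The service names and surfaces below paraphrase the three bullets above; the bottom-layer surface is an inference from "run your own serving stack" rather than something stated outright, and the dictionary is a scoping aid, not a complete catalogue.

```python
# Layer -> known services and the attack surface to ask about first.
STACK = {
    "foundation-model API": {
        "services": {"amazon bedrock", "azure openai service", "vertex ai"},
        "surface": ["API keys", "prompts", "data", "tools"],
    },
    "ML platform": {
        "services": {"sagemaker", "azure ml", "vertex ai"},
        "surface": ["MLOps pipelines", "feature stores", "model supply chain"],
    },
    "raw infrastructure": {
        "services": {"vllm", "triton"},
        # Assumed summary: everything in a self-run stack is the client's.
        "surface": ["self-managed serving stack", "GPU/TPU hosts"],
    },
}


def layers_for(service):
    """Return every layer a named service can sit in (may be more than one)."""
    key = service.strip().lower()
    return [layer for layer, info in STACK.items() if key in info["services"]]


def surface_for(service):
    """Return the combined first-look attack surface for a named service."""
    surface = []
    for layer in layers_for(service):
        surface.extend(STACK[layer]["surface"])
    return surface
```

Note that `layers_for("Vertex AI")` returns two layers, because the same product name covers both hosted models and the build-your-own platform. When a client says "we use Vertex", the follow-up question is which of the two they mean.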
2026 additions you should name-drop: provider guardrails (Bedrock Guardrails, Azure AI Content Safety, Vertex AI safety controls) and emerging agent runtimes (Amazon Bedrock AgentCore for deploying/governing agents at scale). These are where the agentic-security conversation (II.5-II.10) meets the cloud.
flowchart TB
subgraph YOURS["YOU secure - security IN the cloud"]
APP["Your AI app + agent logic + prompts"]
DATA2["Your data, RAG corpus, vector DB"]
CFG["Your IAM config, keys, network rules"]
end
subgraph SVC["Service layer (responsibility shifts by model)"]
FM["Foundation-model API · Bedrock / Azure OpenAI / Vertex"]
ML["ML platform · SageMaker / Azure ML / Vertex"]
end
subgraph PROV["PROVIDER secures - security OF the cloud"]
INFRA["Physical data centre · hardware · core network · hypervisor"]
end
APP --> FM --> INFRA
DATA2 --> ML --> INFRA
CFG -. gates everything .- APP
classDef y fill:#1d1708,stroke:#e4a23f,color:#f0d8a8;
classDef s fill:#0f1a18,stroke:#5bd1c5,color:#bdeee2;
classDef p fill:#11151a,stroke:#8fb9ff,color:#cdd9f5;
class APP,DATA2,CFG y; class FM,ML s; class INFRA p;
The amber layer is always your responsibility - your data, your config, your identity. That’s where you focus a test, because that’s where the incidents are.
5b · Hybrid & multi-cloud - the real-world shape
Almost no large organization - and certainly no Singapore government agency - runs on one clean cloud. Real-world estates are hybrid and multi-cloud, and the seams between environments are where much of the risk lives.
- Hybrid cloud - on-premises data centres connected to public cloud. Common when data residency, legacy systems, or sovereignty rules keep some workloads on-prem while new AI/analytics run in the cloud. The connection (VPN or dedicated link - AWS Direct Connect, Azure ExpressRoute, GCP Interconnect) is itself a trust boundary and an attack path.
- Multi-cloud - using two or more providers at once (e.g. core systems on Azure via the Microsoft relationship, AI/data on GCP, something else on AWS). Driven by best-of-breed choices, resilience, and avoiding lock-in.
- Sovereign / government cloud - providers run isolated regions for government data residency and compliance; in Singapore, agencies consume commercial cloud through GovTech’s central arrangements under the Government on Commercial Cloud (GCC) model. Expect strict residency, segregation, and audit requirements.
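A quick way to find the seams in a hybrid or multi-cloud estate is to list every connection whose two ends sit in different environments. The sketch below assumes you already have an inventory of workloads and the connections between them; the workload names and environments are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    environment: str  # e.g. "on-prem", "azure", "gcp", "aws"


def find_seams(workloads, connections):
    """Return connections that cross an environment boundary.

    Each result is (source, target, source_env, target_env); every one is
    a trust boundary worth reviewing.
    """
    env = {w.name: w.environment for w in workloads}
    return [
        (a, b, env[a], env[b])
        for a, b in connections
        if env[a] != env[b]
    ]


# Hypothetical estate: legacy data on-prem, core app on Azure, AI on GCP.
estate = [
    Workload("legacy-db", "on-prem"),
    Workload("core-app", "azure"),
    Workload("rag-pipeline", "gcp"),
    Workload("vector-db", "gcp"),
]
links = [
    ("core-app", "legacy-db"),      # crosses the VPN / dedicated link
    ("rag-pipeline", "core-app"),   # crosses providers
    ("rag-pipeline", "vector-db"),  # stays inside one cloud
]
```

Here `find_seams(estate, links)` returns the first two links and skips the third. For each seam, ask what identity crosses it, how that identity is authenticated on the far side, and what it can reach once there.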
6 · The vocabulary that makes you sound fluent
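- Region / availability zone - a region is the geographic location of resources (matters for data residency / PDPA); an availability zone is an isolated group of data centres within a region, used for resilience.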
- VPC / subnet / security group - your private network and its firewall rules.
- Public vs private endpoint - whether a service is reachable from the internet (the exposed-endpoint risk in II.7).
- Managed service - the provider runs it; you configure and consume it.
- Infrastructure as Code (IaC) - Terraform/CloudFormation defining infra as files (so misconfig is reviewable and repeatable).
- Secrets manager - the right place for keys/tokens (never in prompts, code, or agent memory - III.2).
- Egress - outbound traffic; restricting it limits what an SSRF bug can reach and closes off exfil paths (II.7, II.17 Ch9).
- Zero trust - never trust by network location; verify every request’s identity (Google BeyondCorp is the canonical example).
- Hybrid / multi-cloud - on-prem + cloud, or several providers at once; the seams between them are prime attack surface.
- Identity federation - one identity provider trusted across environments (SSO/SAML/OIDC); a single high-value target.
- Landing zone - a pre-configured, governed baseline account/subscription structure an org rolls out for consistent security across the estate.
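Of the terms above, egress restriction is the easiest to show in a few lines. The sketch below is an application-level allowlist check for outbound URLs an agent or tool wants to fetch. It is a complement to network-layer controls (security groups, egress proxies), not a replacement, and it does not address DNS rebinding. The host `api.example.com` is a placeholder.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts this workload may call out to.
ALLOWED_HOSTS = frozenset({"api.example.com"})


def egress_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Allow only HTTPS requests to exact, pre-approved hostnames."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lower-cased by urlsplit
    except ValueError:
        return False  # malformed URL
    if parts.scheme != "https" or host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        pass  # not an IP literal, so fall through to the hostname check
    else:
        # IP literals are never allowed; this covers the link-local
        # metadata address 169.254.169.254 that SSRF attacks aim for.
        return False
    return host in allowed_hosts
```

The check is an exact match on purpose: `api.example.com.evil.test` is rejected, as is any raw IP address, so a prompt-injected "fetch this URL" cannot be pointed at the instance metadata endpoint or an attacker's server.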