☁️ Microsoft Azure – Hyper-Dense Cloud Guide
Core building blocks, landing zones, security, networking, compute, data, DevOps, observability, governance, cost, and reference architectures.
Azure Overview
Regions, global services, subscriptions, resource groups, ARM control plane.
Tags: Core, Fundamentals, ARM
Identity & Access
Microsoft Entra ID, RBAC, PIM, managed identities, app registrations.
Tags: IAM, RBAC, PIM
Landing Zone
Management groups, policies, network hub/spoke, shared services, guardrails.
Tags: Platform, LZ, Guardrails
IaC: Bicep / ARM / Terraform
Repeatable deployments, modules, parameterization, CI/CD and drift control.
Tags: IaC, Bicep, Terraform
Reference Architectures
3-tier, microservices on AKS, serverless, event-driven, data platform.
Tags: Architecture, Patterns, Design
Integration & Messaging
Service Bus, Event Grid, Event Hubs, Logic Apps, API Management.
Tags: Messaging, APIM, Serverless
Virtual Network (VNet)
Subnets, NSG, UDR, peering, private endpoints, DNS design.
Tags: VNet, NSG, Private Link
Hybrid Connectivity
VPN Gateway, ExpressRoute, BGP, routing, on-prem integration.
Tags: Hybrid, ExpressRoute, Routing
Load Balancing & Ingress
Azure LB, Application Gateway/WAF, Front Door, Traffic Manager.
Tags: L7, WAF, Global
Network Security
Azure Firewall, DDoS Protection, Bastion, segmentation, zero trust network.
Tags: Firewall, DDoS, Bastion
DNS & Private Resolution
Azure DNS, Private DNS zones, split-horizon, resolver patterns.
Tags: DNS, Private DNS, Resolver
Edge & Global Routing
Front Door, CDN, global anycast, caching, multi-region design.
Tags: Edge, CDN, Global
Virtual Machines
VM sizing, availability sets/zones, disks, images, patching, backup.
Tags: IaaS, Zones, Disks
AKS (Kubernetes)
Clusters, node pools, ingress, autoscaling, network policy, upgrades.
Tags: AKS, K8s, Autoscale
App Service
Web Apps, deployment slots, scaling, VNet integration, managed identity.
Tags: PaaS, Slots, Scale
Functions (Serverless)
Triggers, durable functions, event-driven apps, cold start strategy.
Tags: Serverless, Events, Durable
Containers
ACR, Container Apps, ACI, image scanning, supply chain basics.
Tags: Containers, ACR, Supply Chain
Windows & Linux Ops
SSH/RDP via Bastion, patching, extensions, monitoring agents, hardening.
Tags: Ops, Bastion, Hardening
Storage
Blob, Files, Queues, Tables, ADLS Gen2, tiers, lifecycle, replication.
Tags: Storage, ADLS, Lifecycle
Azure SQL
SQL Database, Managed Instance, HA/DR, performance, security, backup.
Tags: SQL, MI, PaaS DB
Azure Database for PostgreSQL
Flexible Server, HA, read replicas, private network access, tuning.
Tags: Postgres, HA, Tuning
Cosmos DB
Partition keys, RU/s, consistency models, multi-region, TTL, change feed.
Tags: NoSQL, RU/s, Partition
Analytics & Data Platform
Synapse, Databricks, Data Factory, Lakehouse, governance patterns.
Tags: Analytics, ETL, Lakehouse
Cache & Redis
Azure Cache for Redis, clustering, persistence trade-offs, session store.
Tags: Cache, Redis, Latency
Security Baseline
Zero trust, identity-first, network segmentation, encryption, logging.
Tags: Zero Trust, Baseline, Logging
Defender for Cloud
CSPM, recommendations, attack paths, workload protection plans.
Tags: CSPM, CWPP, Posture
Key Vault
Secrets, keys, certificates, HSM, rotation, managed identity access.
Tags: Secrets, HSM, MI
Microsoft Sentinel
SIEM/SOAR, connectors, analytics rules, incident response workflow.
Tags: SIEM, SOAR, KQL
Private Link Patterns
Private endpoints, service endpoints, DNS, hub/spoke resolution.
Tags: Private Link, DNS, Hub/Spoke
Compliance & Data Protection
Policies, encryption at rest/in transit, retention, backup, immutability.
Tags: Compliance, Retention, Backup
Azure DevOps
Repos, Pipelines, Boards, Artifacts, release strategies and gates.
Tags: CI/CD, Pipelines, Gates
GitHub + Azure
Actions, OIDC federation, environments, secrets, deployments.
Tags: GitHub, OIDC, Supply Chain
Artifacts & Registries
ACR, image signing, SBOM, vulnerability scanning, promotion flows.
Tags: ACR, SBOM, Scan
Release Patterns
Blue/green, canary, ring deployments, feature flags, rollback playbooks.
Tags: Release, Canary, Rollbacks
IaC in CI/CD
Plan/apply, environments, policy checks, drift detection, approvals.
Tags: IaC, Policy, Drift
Platform Engineering
Golden paths, templates, internal developer platform, guardrails.
Tags: IDP, Golden Path, Templates
Azure Monitor
Metrics, logs, alerts, action groups, dashboards, workbooks.
Tags: Metrics, Alerts, Workbooks
Log Analytics + KQL
Central logging, KQL queries, retention, ingestion cost control.
Tags: Logs, KQL, Cost
Application Insights
APM, tracing, dependencies, sampling, live metrics, SLA metrics.
Tags: APM, Tracing, Sampling
SRE Playbooks
SLI/SLO, incident response, runbooks, dashboards, error budgets.
Tags: SRE, SLO, Runbooks
Backup & DR
Recovery Services Vault, VM backup, DB backup, multi-region strategy.
Tags: Backup, DR, RPO/RTO
Automation
Automation accounts, runbooks, update management, self-healing loops.
Tags: Automation, Runbooks, Self-heal
Governance Core
Management groups, subscriptions, RBAC model, naming/tagging standards.
Tags: Governance, MG, Tags
Azure Policy
Deny/append/audit, initiatives, remediation tasks, continuous compliance.
Tags: Policy, Deny, Remediate
Blueprints / Baselines
Standardized platform baselines, security packs, repeatable landing zones.
Tags: Baseline, Standards, Repeat
Data Governance
Purview basics, classification, lineage, access patterns and auditing.
Tags: Purview, Lineage, Audit
Identity Governance
PIM, access reviews, conditional access, privileged access workflows.
Tags: Privileged, Access Review, CA
Multi-Tenant / Enterprise
Tenants, cross-subscription patterns, shared services, delegated admin.
Tags: Enterprise, Tenant, Delegation
Cost Management
Budgets, tags, chargeback, reservations, savings plans, anomaly detection.
Tags: FinOps, Budgets, Chargeback
Cost Optimization Playbook
Rightsizing, autoscaling, storage tiers, log ingestion control, egress.
Tags: Optimize, Scale, Logs
SLA, HA, DR Economics
Multi-zone, multi-region trade-offs, RPO/RTO cost model, tiered resilience.
Tags: HA, DR, Trade-offs
Egress & Network Costs
Outbound data, CDN, private endpoints, architecture decisions that save money.
Tags: Egress, CDN, Design
Logging Costs (Real)
Retention, sampling, ingestion filters, archive strategies, SRE vs budget.
Tags: Logs, Retention, Sampling
Cheat-sheet Azure
Core commands, architecture templates, security defaults, must-have checklists.
Tags: cheat, checklists, quickstart
Scopes and governance
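Every management operation in Azure goes through Azure Resource Manager (ARM), and every resource lives at exactly one place in a strict scope tree. Design the scope model first: governance and security only stay consistent when policies and role assignments can be inherited down that tree.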
Tenant
  -> Management Groups
    -> Subscriptions
      -> Resource Groups
        -> Resources
Resource groups
- Lifecycle boundary (delete RG -> delete everything).
- RBAC and policy scope boundary.
- Tagging and cost rollups.
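Because the scope tree is strictly nested, a role assignment or policy made at one scope applies to everything beneath it. The following is a minimal Python sketch of that inheritance check, assuming ARM-style resource IDs. The subscription GUID, the resource names, and the helper `applies_to` are all hypothetical, and management group scopes are left out because their IDs are not path prefixes of subscription IDs.

```python
def applies_to(assignment_scope: str, resource_id: str) -> bool:
    """True if an assignment made at `assignment_scope` is inherited by `resource_id`."""
    scope = assignment_scope.rstrip("/").lower()
    target = resource_id.rstrip("/").lower()
    # A scope covers itself and any ID nested beneath it in the path.
    return target == scope or target.startswith(scope + "/")

SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"  # hypothetical subscription
RG = f"{SUB}/resourceGroups/rg-demo"
STORAGE = f"{RG}/providers/Microsoft.Storage/storageAccounts/stgdemo"

print(applies_to(SUB, STORAGE))                               # True: inherited from subscription
print(applies_to(RG, STORAGE))                                # True: inherited from resource group
print(applies_to(f"{SUB}/resourceGroups/rg-other", STORAGE))  # False: sibling resource group
```

The practical consequence is that a broad assignment high in the tree is hard to undo lower down, which is why the scope model deserves design time before the first deployment.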
# az quick start
az account show
az group create -n rg-demo -l westeurope
Regions, zones, and global services
| Concept | What it is | Design impact |
|---|---|---|
| Region | geographic area with datacenters | latency + compliance |
| Availability Zone | physically separate datacenter groups within one region | HA within region |
| Paired regions | two regions that Azure pairs for replication and sequenced updates | DR strategy |
| Global services | Front Door, Entra ID, etc. | global entry points |
Core production rules
- Identity-first: use Entra ID, managed identities, least privilege.
- Private-by-default networking: private endpoints for data services when possible.
- Logs by design: central workspace, retention policy, alert routing.
- IaC: everything repeatable, reviewed, and approved.
- Cost: tags, budgets, and alerting from day one.
RBAC model
| Role type | Example | Use |
|---|---|---|
| Built-in | Reader, Contributor | fast baseline |
| Specialized | Key Vault Secrets User | least privilege |
| Custom | fine-grained actions | platform teams |
PIM (Privileged Identity Management)
- Just-in-time elevation for privileged roles.
- Approval workflows + time-bound assignments.
- Audit trail for privileged actions.
PIM workflow
request -> approve -> activate -> expire
log -> review -> enforce
Managed identities (the default for apps)
- System-assigned: lifecycle tied to resource.
- User-assigned: reusable identity for multiple resources.
- Use MI + RBAC to access Key Vault, Storage, SQL, etc.
# az example
az webapp identity assign -g rg-demo -n app-demo
# then grant RBAC to the identity on the target resource
Landing zone outcome
Goal: repeatable enterprise-ready platform
Management Groups:
  Platform (shared services)
  LandingZones (apps)
  Sandbox (experimentation)
  Decommissioned (quarantine)
| Layer | What you put there | Why |
|---|---|---|
| Platform | hub network, identity baseline, central logs | control + visibility |
| App subscriptions | workloads | blast radius control |
| Sandbox | R&D, PoC | safe experimentation |
Hub/spoke network skeleton
Hub VNet:
- Azure Firewall
- VPN/ExpressRoute Gateway
- Shared DNS/Resolvers
- Bastion
- Private DNS zones (optional)
Spoke VNets:
- Workloads (AKS/App Service/VM)
- Private endpoints to data services
Guardrails (policy + security + standards)
- Policy initiatives: allowed locations, SKU restrictions, enforce tags.
- Deny public endpoints for data services (when possible).
- Enforce diagnostics settings to central Log Analytics.
- Key Vault required for secrets, block plaintext secrets in app config.
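The decision logic behind these guardrails can be sketched in a few lines of Python. This is not Azure Policy's rule language, only an illustration of what a deny-style check evaluates. The allow-list, the tag standard, and the resource dictionaries are hypothetical.

```python
ALLOWED_LOCATIONS = {"westeurope", "northeurope"}   # hypothetical allow-list
REQUIRED_TAGS = {"env", "owner", "costcenter"}      # hypothetical tag standard

def evaluate(resource: dict) -> list[str]:
    """Return the list of guardrail violations for one resource description."""
    violations = []
    if resource.get("location") not in ALLOWED_LOCATIONS:
        violations.append(f"location '{resource.get('location')}' not allowed")
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        violations.append("missing tags: " + ", ".join(sorted(missing)))
    if resource.get("publicNetworkAccess") == "Enabled":
        violations.append("public network access enabled")
    return violations

good = {"location": "westeurope", "publicNetworkAccess": "Disabled",
        "tags": {"env": "prod", "owner": "platform", "costcenter": "cc123"}}
bad = {"location": "eastus", "publicNetworkAccess": "Enabled", "tags": {"env": "dev"}}

print(evaluate(good))   # no violations
print(evaluate(bad))    # three violations: location, tags, public access
```

A deployment with a non-empty violation list would be blocked under a deny effect or merely reported under an audit effect.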
Operations foundations
- Central monitoring workspace(s) + retention strategy.
- Action groups (email, webhook, ITSM) + incident workflow.
- Backup and DR baseline per tier.
- Patch cadence and emergency changes.
Bicep essentials
// bicep idea (Bicep comments use //, not #)
param location string = resourceGroup().location
// storage account names: globally unique, 3-24 lowercase letters and digits
resource stg 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'stg${uniqueString(resourceGroup().id)}'
  location: location
  sku: { name: 'Standard_LRS' }
  kind: 'StorageV2'
}
Terraform essentials
# terraform idea
# a nested block needs the multi-line form; one-line blocks take a single argument only
provider "azurerm" {
  features {}
}
resource "azurerm_resource_group" "rg" {
  name     = "rg-demo"
  location = "westeurope"
}
- State storage: secure, versioned, locked.
- Plan/apply with approvals.
- Drift detection: scheduled plans or policy audits.
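Drift detection boils down to comparing declared state with deployed state. In practice a scheduled `terraform plan` does this against real state; the sketch below only illustrates the comparison with plain dictionaries. The function `detect_drift` and the resource maps are hypothetical.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Compare two flat {resource_name: properties} maps and classify the differences."""
    return {
        "missing": sorted(desired.keys() - actual.keys()),     # declared but not deployed
        "unmanaged": sorted(actual.keys() - desired.keys()),   # deployed but not declared
        "changed": sorted(
            name for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]                   # properties diverged
        ),
    }

desired = {"rg-demo": {"location": "westeurope"}, "stgdemo": {"sku": "Standard_LRS"}}
actual = {
    "rg-demo": {"location": "westeurope"},
    "stgdemo": {"sku": "Standard_GRS"},       # changed by hand in the portal
    "vm-adhoc": {"size": "Standard_B2s"},     # created outside IaC
}

report = detect_drift(desired, actual)
print(report)
```

Any non-empty category is a signal to either update the repository or revert the manual change through an approved apply.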
Pipeline rules
| Stage | Gate | Why |
|---|---|---|
| Validate | lint + format + policy check | prevent bad patterns |
| Plan | diff review | human control |
| Apply | approval + change window | safe production |
| Post | smoke tests + alerts | verify outcome |
3-tier baseline (private-first)
Internet
  -> Front Door / App Gateway (WAF)
    -> App Service / AKS (private)
      -> Data (SQL / Postgres / Storage) via Private Endpoints
Central:
  Log Analytics + Key Vault + Firewall + DNS
Microservices on AKS
- Ingress controller + WAF at edge.
- Separate node pools for system and workloads.
- Managed identity + workload identity for pods.
- Service-to-service auth (mTLS / JWT) + network policy.
- Observability: distributed tracing mandatory.
Event-driven serverless
Event sources:
- Event Grid (react to changes)
- Event Hubs (stream)
- Service Bus (commands)
Compute:
- Functions / Logic Apps
State:
- Storage / Cosmos / SQL
Ops:
- Monitor + alerts + dead-letter queues
Data platform (lakehouse-ish)
- ADLS Gen2 as the central lake storage layer.
- Ingestion via Data Factory / event pipelines.
- Compute via Databricks / Synapse (depending on requirements).
- Governance via Purview + RBAC + data classification.
API Management (APIM)
- Central auth, rate limiting, quotas, IP filtering.
- Versioning + revision workflow.
- Policies: transform headers, validate JWT, caching.
- Developer portal + subscription keys (if needed).
Service Bus (commands, workflows)
| Feature | Use | Notes |
|---|---|---|
| Queues | point-to-point | dead-letter essential |
| Topics | pub/sub | filters per subscription |
| Sessions | ordering | stateful consumers |
Event Grid vs Event Hubs
| Service | Best for | Gotchas |
|---|---|---|
| Event Grid | reactive events, fan-out | delivery/retry semantics |
| Event Hubs | high-throughput streaming | partitioning + consumer groups |
Logic Apps
- Fast orchestration for integrations.
- Connectors for SaaS systems.
- Use for glue and automation; keep core business logic in code when it matters.
Subnet design principles
- Separate subnets for ingress, compute, and private endpoints.
- Reserve IP space for future growth (avoid painful renumbering).
- Centralize egress through firewall when required.
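Address planning is easy to check before anything is deployed. Here is a minimal sketch using Python's standard `ipaddress` module to carve a VNet range into per-purpose subnets and show how much space stays free for growth. The `10.20.0.0/16` range and the /24 subnet size are assumptions for illustration; the subnet names follow this section's naming style.

```python
import ipaddress

vnet = ipaddress.ip_network("10.20.0.0/16")   # hypothetical spoke address space
blocks = vnet.subnets(new_prefix=24)          # generator of consecutive /24 blocks

layout = {}
for name in ["snet-ingress", "snet-app", "snet-data-pe", "snet-management"]:
    layout[name] = next(blocks)

for name, net in layout.items():
    # Azure reserves 5 addresses in every subnet (the first four and the last one).
    print(f"{name:16} {net}  usable={net.num_addresses - 5}")

allocated = sum(net.num_addresses for net in layout.values())
print(f"unallocated for growth: {vnet.num_addresses - allocated} addresses")
```

Keeping the plan in code like this makes overlaps with on-premises or peered ranges something a review can catch early.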
Typical layout
- snet-ingress
- snet-app
- snet-data-pe
- snet-management
NSG + UDR (routes)
| Control | What it does | Common mistake |
|---|---|---|
| NSG | allow/deny L3/L4 | too broad inbound rules |
| UDR | force routing path | breaking private endpoint DNS/routing |
Private endpoints (Private Link)
- Private IP mapped to a PaaS service.
- DNS is mandatory: Private DNS zone + links.
- Prefer private endpoints for storage and databases in production.
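The reason DNS is mandatory: clients keep using the service's public name, which resolves through a `privatelink` alias, and the private DNS zone must answer that alias with the endpoint's private IP. A small Python sketch of the name mapping follows. The zone names are the standard ones for Blob, Azure SQL, and PostgreSQL; the resource names `stgdemo` and `sqldemo` and the helper `private_fqdn` are hypothetical.

```python
PRIVATELINK_ZONES = {
    "blob.core.windows.net": "privatelink.blob.core.windows.net",
    "database.windows.net": "privatelink.database.windows.net",
    "postgres.database.azure.com": "privatelink.postgres.database.azure.com",
}

def private_fqdn(public_fqdn: str) -> str:
    """Map a public service FQDN to the record name expected in its private DNS zone."""
    host, _, suffix = public_fqdn.partition(".")
    zone = PRIVATELINK_ZONES.get(suffix)
    if zone is None:
        raise ValueError(f"no private DNS zone known for suffix '{suffix}'")
    return f"{host}.{zone}"

print(private_fqdn("stgdemo.blob.core.windows.net"))
print(private_fqdn("sqldemo.database.windows.net"))
```

If the zone is not linked to the VNet that the client resolves from, the alias falls through to the public IP and the private endpoint is silently bypassed or blocked.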
VPN Gateway
- Fast to start, internet-based.
- Good for small/medium needs and non-critical workloads.
- Plan IP ranges and routing early.
ExpressRoute
- Private connectivity, enterprise-grade.
- Better reliability and predictable latency.
- Requires provider integration and more governance.
| Service | Layer | Best for | Key features |
|---|---|---|---|
| Azure Load Balancer | L4 | TCP/UDP distribution | zones, HA ports |
| Application Gateway | L7 | HTTP ingress | WAF, TLS offload |
| Front Door | L7 (global) | global HTTP(S) entry | anycast, caching, WAF |
| Traffic Manager | DNS | geo failover | routing policies |
Azure Firewall
- Central egress control and logging.
- Threat intelligence modes (when enabled).
- Application rules + network rules + DNAT.
Bastion
- SSH/RDP without public IP exposure.
- Fits zero-trust ops model.
- Combine with JIT/PIM for admin access.
Private endpoint DNS pattern
Private DNS zones:
privatelink.blob.core.windows.net
privatelink.database.windows.net
privatelink.postgres.database.azure.com
Links:
zone -> hub vnet
zone -> spoke vnet(s)
Global design decisions
- Use Front Door for global entry + WAF + caching when appropriate.
- Separate static and dynamic traffic; cache static aggressively.
- Multi-region active/active needs data strategy, not only routing.
Compute sizing approach
- Start from CPU, memory, and IO needs, not from "standard sizes".
- Use metrics: CPU %, memory pressure, disk queue, network throughput.
- Plan right-sizing reviews monthly (FinOps loop).
Disk strategy
| Disk | Best for | Notes |
|---|---|---|
| Premium SSD | prod workloads | IOPS/throughput tiers |
| Standard SSD | balanced cost | general workloads |
| Ultra Disk | very high IO | special cases |
Availability
- Prefer Availability Zones when supported.
- Backups: Recovery Services Vault, test restores regularly.
- DR: define RPO/RTO, validate runbooks, measure recovery time.
Cluster foundation
- Separate system and user node pools.
- Define ingress strategy (App Gateway ingress controller or NGINX with WAF at edge).
- Autoscaling: cluster autoscaler + HPA.
- Private cluster when required.
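For the pod-level half of autoscaling, the Horizontal Pod Autoscaler's core calculation is simple enough to check by hand. The sketch below implements only that core formula in Python; the real controller additionally applies a tolerance band, min/max replica bounds, and stabilization windows, which are omitted here.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """HPA core formula: ceil(current_replicas * current_metric / target_metric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 pods averaging 90% CPU against a 60% target: scale out
print(desired_replicas(4, 90, 60))   # 6
# 4 pods averaging 20% CPU against a 60% target: scale in
print(desired_replicas(4, 20, 60))   # 2
```

The cluster autoscaler then adds or removes nodes when the resulting pods cannot be scheduled or nodes sit underused, so the two mechanisms work at different layers.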
AKS security essentials
- Use managed identity / workload identity for pods.
- Enable network policies and restrict east-west traffic.
- Image scanning and signed images (supply chain controls).
- Secrets: prefer Key Vault integration patterns.
Operations
- Upgrade strategy: staged upgrades, maintenance windows.
- Observability: metrics + logs + tracing; alert on saturation.
- Backup: workload state (databases) + cluster configs (GitOps).
Why App Service
- Fast PaaS for web APIs and sites.
- Built-in scaling, TLS, deployment slots.
- Managed identity integration for secretless access.
Operational must-haves
- Use deployment slots for safe releases.
- Enable diagnostics to Log Analytics.
- Integrate with VNet if data is private.
- Set autoscale based on real metrics.
Serverless reliability checklist
- Define idempotency keys for event handlers.
- Use dead-letter queues and alert on them.
- Control retries to avoid amplification.
- Measure cold starts and select plan accordingly.
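The first three items fit together in one handler pattern. The following Python sketch is an in-memory illustration, not a Functions binding: the set stands in for a durable idempotency store, the list for a dead-letter queue, and `MAX_ATTEMPTS` is a hypothetical retry budget.

```python
processed: set[str] = set()     # stands in for a durable store of idempotency keys
dead_letter: list[dict] = []    # stands in for a dead-letter queue
MAX_ATTEMPTS = 3                # hypothetical retry budget

def handle(event: dict, action) -> str:
    """Process an event at most once; park it after MAX_ATTEMPTS failed tries."""
    key = event["id"]
    if key in processed:
        return "duplicate-skipped"          # redelivery of an already handled event
    last_error = ""
    for _ in range(MAX_ATTEMPTS):
        try:
            action(event)
            processed.add(key)              # record success only after the side effect
            return "processed"
        except Exception as exc:            # bounded retries avoid amplification
            last_error = str(exc)
    dead_letter.append({"event": event, "error": last_error, "attempts": MAX_ATTEMPTS})
    return "dead-lettered"

def flaky(event):
    raise RuntimeError("downstream unavailable")

print(handle({"id": "evt-1"}, lambda e: None))   # processed
print(handle({"id": "evt-1"}, lambda e: None))   # duplicate-skipped
print(handle({"id": "evt-2"}, flaky))            # dead-lettered
```

In a real system the key store must be shared and durable, and the dead-letter count is what the alert should watch.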
| Service | Use | Strength | Limit |
|---|---|---|---|
| ACR | image registry | RBAC, private, integration | needs governance |
| Container Apps | serverless containers | scale-to-zero, simple | complex networking cases |
| ACI | run containers quickly | fast | not a full platform |
Ops baseline
- No public SSH/RDP. Use Bastion + JIT/PIM.
- Patch: define cadence and emergency patch process.
- Monitor: agent strategy (VM insights) + alerts.
- Hardening: CIS-like baseline, disable unused services.
| Service | Use | Notes |
|---|---|---|
| Blob | objects | hot/cool/archive tiers + lifecycle |
| ADLS Gen2 | data lake | hierarchical namespace |
| Files | SMB shares | lift-and-shift + enterprise shares |
Replication strategy
- LRS/ZRS: within region durability.
- GRS/GZRS: cross-region resilience.
- Choose based on RPO/RTO and cost.
Storage security
- Prefer private endpoints for production.
- Use RBAC and managed identity, avoid account keys.
- Enable soft delete and immutability if needed.
| Offering | Best for | Notes |
|---|---|---|
| SQL Database | modern apps | serverless/hyperscale options |
| Managed Instance | near lift-and-shift | more compatibility |
Core design points
- Choose private access for production workloads.
- Use read replicas for scale-out reads.
- Define backup retention and test restores.
- Monitor query latency and connection saturation.
Tuning loop
1) capture slow queries
2) EXPLAIN ANALYZE
3) index / rewrite
4) verify and measure p95/p99
Partition key is the architecture
- Choose a key that distributes writes and reads evenly.
- Avoid hot partitions.
- Model queries first, then pick the key.
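A quick simulation shows why key cardinality matters. The Python sketch below hashes partition key values into a fixed number of buckets and reports the share of traffic on the busiest one. SHA-256 and the 8-bucket count are stand-ins for illustration and not how Cosmos DB actually hashes or splits partitions; the key values are hypothetical.

```python
import hashlib
from collections import Counter

def partition_of(key: str, partitions: int = 8) -> int:
    """Stable hash of the partition key value, reduced to a bucket index."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % partitions

def skew(keys: list[str]) -> float:
    """Share of all requests that land on the busiest bucket."""
    counts = Counter(partition_of(k) for k in keys)
    return max(counts.values()) / len(keys)

# 10,000 writes keyed by country (3 distinct values) vs. by user id (10,000 distinct values)
by_country = [["DE", "FR", "US"][i % 3] for i in range(10_000)]
by_user = [f"user-{i}" for i in range(10_000)]

print(f"country key: busiest bucket takes {skew(by_country):.0%} of writes")
print(f"user key:    busiest bucket takes {skew(by_user):.0%} of writes")
```

With only three distinct values, at least a third of all writes must hit one bucket no matter how many buckets exist, which is exactly the hot-partition pattern to avoid.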
RU/s capacity planning
- RU/s sets both your throughput ceiling and your bill.
- Measure real request units per operation.
- Use autoscale only when you understand workload patterns.
Consistency models
| Level | Trade-off | Typical use |
|---|---|---|
| Strong | latency/cost higher | critical reads |
| Session | balanced | user-facing apps |
| Eventual | fast/cheap | analytics-ish reads |
Data platform design map
Ingest
-> ADLS Gen2 (raw/bronze)
Transform
-> Databricks/Synapse (silver/gold)
Serve
-> SQL / BI / APIs
Govern
-> Purview + RBAC + audit
Design checklist
- Cache only what you can invalidate or tolerate stale.
- Define TTLs and eviction policy.
- Measure hit ratio and latency impact.
- Prefer Microsoft Entra ID (managed identity) authentication over access keys where the cache tier and client library support it.
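TTLs and hit ratio are easiest to reason about with a toy model. The class below is a local, in-process stand-in written in Python, not a Redis client; `TTLCache` and its injectable clock are hypothetical and exist only to show expiry and measurement together.

```python
import time

class TTLCache:
    """Tiny in-process cache with per-entry expiry and hit/miss counters."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be shown without sleeping
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        entry = self.store.get(key)
        if entry is not None and entry[1] > self.clock():
            self.hits += 1
            return entry[0]
        self.misses += 1            # absent or expired: reload from the source of truth
        value = loader(key)
        self.store[key] = (value, self.clock() + self.ttl)
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

now = [0.0]                                  # fake clock for the demo
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])
cache.get("user:1", lambda k: "alice")       # miss, loads the value
cache.get("user:1", lambda k: "alice")       # hit
now[0] = 31.0
cache.get("user:1", lambda k: "alice")       # expired, counts as a miss
print(f"hit ratio: {cache.hit_ratio():.2f}")
```

A low hit ratio with a short TTL usually means the cache adds latency and cost without relieving the backing store.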
Zero trust pillars
- Verify explicitly (identity, device, location, risk).
- Use least privilege (RBAC + PIM).
- Assume breach (segmentation + logging + response).
Baseline checklist
| Area | Baseline | Minimum |
|---|---|---|
| Identity | PIM, MFA, conditional access | no standing admin |
| Network | private endpoints + firewall egress | no public DB |
| Secrets | Key Vault + managed identity | no secrets in code |
| Logging | central workspace + alerts | audit trail |
Common failures
- Public endpoints left open, no WAF rules.
- Broad Contributor assignments in production.
- No DNS plan for private endpoints.
- Logging disabled to save cost, then blind incidents.
What it gives you
- Security posture management: recommendations and secure score.
- Attack path analysis and misconfiguration detection.
- Workload protections (plans) for key services.
Core practices
- Use managed identities for access.
- Prefer RBAC model consistently across the platform.
- Enable soft delete and purge protection in production.
Rotation strategy
- Define rotation ownership per secret.
- Automate rotation where possible.
- Alert on near-expiry and failed rotations.
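A near-expiry alert is essentially a date comparison over an inventory. This Python sketch assumes the inventory is already available as a list of dictionaries; in practice it would be read from the vault. The secret names, owners, and the 30-day warning window are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def expiring(secrets: list[dict], now: datetime, warn_days: int = 30) -> list[str]:
    """Names of secrets that are expired or expire within `warn_days`, soonest first."""
    threshold = now + timedelta(days=warn_days)
    due = [s for s in secrets if s["expires"] <= threshold]
    return [s["name"] for s in sorted(due, key=lambda s: s["expires"])]

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
inventory = [   # hypothetical inventory with an owner recorded per secret
    {"name": "db-password", "owner": "payments", "expires": now + timedelta(days=10)},
    {"name": "api-cert", "owner": "platform", "expires": now + timedelta(days=200)},
    {"name": "legacy-key", "owner": "platform", "expires": now - timedelta(days=2)},
]

print(expiring(inventory, now))   # ['legacy-key', 'db-password']
```

Routing each hit to the recorded owner is what turns the check into an actionable alert instead of a shared inbox nobody reads.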
Sentinel workflow
Connectors -> Log Analytics -> Analytics rules -> Incidents
-> Triage -> Investigation (KQL) -> Response playbooks -> Postmortem
Design checklist
- Private endpoints per service; avoid mixing unrelated workloads in one subnet.
- Private DNS zones linked to VNets (hub + spokes).
- Route design: avoid breaking service dependencies with forced tunneling.
| Control | Implementation | Verification |
|---|---|---|
| Encryption | at rest + in transit | policy + audits |
| Retention | logs + backups | test restores + reports |
| Immutability | storage immutability where needed | tamper checks |
Pipeline baseline
Build
- tests
- security scans
- artifact publish (immutable)
Deploy
- environment approvals
- IaC plan/apply (guardrails)
- smoke tests
Operate
- SLO dashboards + alerts
OIDC federation concept
GitHub Actions -> OIDC token -> Microsoft Entra ID -> short-lived credentials
Benefits:
- no long-lived secrets
- better auditability
Supply chain checklist
- Immutable tags (promote by digest).
- Vulnerability scans + policy gates.
- SBOM generation and storage.
- Signed images and verification at deploy time.
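Promoting by digest works because a digest is derived from the content itself, so a tag can move but a digest cannot. A minimal Python sketch of the idea follows, using a byte string as a stand-in for an image manifest. The registry name, the repository, and the helper functions are hypothetical, and this covers only digest pinning, not signature verification.

```python
import hashlib

def digest_of(content: bytes) -> str:
    """Content digest in the `sha256:<hex>` form used by OCI registries."""
    return "sha256:" + hashlib.sha256(content).hexdigest()

def verify(content: bytes, pinned_digest: str) -> bool:
    """Deploy-time check: the content must match the digest that was promoted."""
    return digest_of(content) == pinned_digest

artifact = b"example image manifest bytes"           # hypothetical stand-in for a manifest
pinned = digest_of(artifact)                         # recorded when the build is promoted
reference = f"myregistry.azurecr.io/app@{pinned}"    # hypothetical registry and repository

print(reference)
print(verify(artifact, pinned))                  # True: unchanged content
print(verify(artifact + b" tampered", pinned))   # False: any change alters the digest
```

Each environment then deploys the same `@sha256:` reference, so what was scanned and approved is provably what runs.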
| Pattern | Best for | Key requirement |
|---|---|---|
| Blue/Green | safe cutover | traffic switch + fast rollback |
| Canary | risk reduction | metrics-based promotion |
| Rings | enterprise rollouts | progressive exposure |
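Metrics-based promotion for a canary can be expressed as a small decision function. The Python sketch below is one possible gate, not a feature of any particular deployment tool. The thresholds, the minimum sample size, and the metric names are hypothetical and would be tuned per service.

```python
def canary_decision(baseline: dict, canary: dict,
                    max_error_delta: float = 0.005,
                    max_latency_ratio: float = 1.2) -> str:
    """Promote only if the canary is not meaningfully worse than the baseline."""
    if canary["requests"] < 500:                  # hypothetical minimum sample size
        return "wait"
    error_delta = canary["error_rate"] - baseline["error_rate"]
    latency_ratio = canary["p95_ms"] / baseline["p95_ms"]
    if error_delta > max_error_delta or latency_ratio > max_latency_ratio:
        return "rollback"
    return "promote"

baseline = {"requests": 20_000, "error_rate": 0.002, "p95_ms": 180}

print(canary_decision(baseline, {"requests": 1_000, "error_rate": 0.003, "p95_ms": 190}))  # promote
print(canary_decision(baseline, {"requests": 1_000, "error_rate": 0.020, "p95_ms": 185}))  # rollback
print(canary_decision(baseline, {"requests": 120, "error_rate": 0.0, "p95_ms": 150}))      # wait
```

The explicit "wait" outcome matters: promoting on too little traffic is the most common way a canary gives false confidence.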
Drift control model
Desired state: IaC repo
Actual state: Azure resources
Detect drift:
- scheduled plan
- policy compliance
- config audits
Fix drift:
- apply with approvals
- incident if unexpected changes
Golden path idea
- Pre-approved architecture templates (App Service + SQL + Key Vault + Monitor).
- Reusable pipelines and IaC modules.
- Guardrails enforced by policy, not tribal knowledge.
Monitoring system components
Signals:
metrics (fast)
logs (deep)
traces (request path)
Actions:
alerts -> action groups -> incident workflow
Visualization:
dashboards + workbooks
KQL mindset
KQL structure
table
| where ...
| summarize ...
| order by ...
APM essentials
- Define a correlation id across services.
- Instrument dependencies and external calls.
- Use sampling to control ingestion while preserving signal.
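Sampling preserves signal only if every service makes the same keep-or-drop decision for a given request. One common way to get that is to derive the decision from the correlation id. The Python sketch below illustrates this generic technique; it is not Application Insights' own sampling algorithm, and the `keep` helper and the 10% rate are hypothetical.

```python
import hashlib
import uuid

def keep(correlation_id: str, sample_rate: float) -> bool:
    """Deterministic sampling: the same correlation id always gets the same decision."""
    digest = hashlib.sha256(correlation_id.encode()).digest()
    value = int.from_bytes(digest[:8], "big")     # uniform integer in [0, 2**64)
    return value < sample_rate * 2**64

ids = [str(uuid.uuid4()) for _ in range(10_000)]
kept = [cid for cid in ids if keep(cid, 0.10)]
print(f"kept {len(kept)} of {len(ids)} traces")

# Any service that sees the same id reaches the same verdict, so kept traces stay complete.
assert all(keep(cid, 0.10) for cid in kept)
```

Random per-service sampling at the same rate would instead keep fragments of most traces and complete copies of very few.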
Incident lifecycle
Detect -> Triage -> Mitigate -> Recover -> Postmortem
Artifacts:
timeline, root cause, action items, guardrails
| Tier | RPO/RTO target | Typical design |
|---|---|---|
| Tier 0 | minutes | multi-zone + cross-region |
| Tier 1 | hours | zone + backups + tested restore |
| Tier 2 | day | backups + manual failover |
Self-healing approach
- Detect (alert rule) -> decide (runbook) -> act (automation) -> verify (metrics).
- Always add safety checks and rate limiting.
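The detect, decide, act, verify loop with a rate limit can be sketched as a small Python class. This is an illustration of the control flow only, not an Azure Automation runbook. `Remediator`, its parameters, and the target name are hypothetical.

```python
import time

class Remediator:
    """Runs a remediation action, but never more than `max_actions` times per window."""

    def __init__(self, action, verify, max_actions: int = 3,
                 window_seconds: float = 3600, clock=time.monotonic):
        self.action, self.verify = action, verify
        self.max_actions, self.window = max_actions, window_seconds
        self.clock = clock
        self.history = []           # timestamps of recent remediation attempts

    def on_alert(self, target: str) -> str:
        now = self.clock()
        self.history = [t for t in self.history if now - t < self.window]
        if len(self.history) >= self.max_actions:
            return "escalate"       # safety check: stop looping and page a human
        self.history.append(now)
        self.action(target)
        return "healed" if self.verify(target) else "escalate"

restarts = []
bot = Remediator(action=restarts.append, verify=lambda t: True, max_actions=2)
print([bot.on_alert("app-demo") for _ in range(3)])   # ['healed', 'healed', 'escalate']
```

Without the rate limit, a remediation that masks a deeper fault can restart a service indefinitely while the real problem goes unnoticed.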
Naming + tagging standards
Example tags
env=prod
owner=platform
app=payments
costcenter=cc123
data_class=confidential
| Effect | Meaning | Use |
|---|---|---|
| Deny | block deployment | critical guardrails |
| Audit | report noncompliance | visibility |
| Append/Modify | add or change properties on the resource | tags, default settings |
| DeployIfNotExists | deploy a missing related resource | diagnostic settings, enforce baseline |
Baseline components
- Policy initiatives (security + compliance + tagging).
- Network architecture templates (hub/spoke).
- Central logging and alert routing.
- Identity governance (PIM, CA, access reviews).
Governance loop
Discover -> Classify -> Control access -> Audit -> Improve
Must-have controls
- Conditional access baseline (MFA, risk-based controls).
- PIM for privileged roles, no standing admin.
- Access reviews for sensitive groups and apps.
Enterprise design rules
- Separate duties: platform subscription vs app subscriptions.
- Shared services: DNS, firewall, logging, identity baseline.
- Delegated admin patterns for ops teams.
FinOps basics
- Budgets per subscription/resource group.
- Enforced tags for chargeback/showback.
- Reservations/savings plans for steady workloads.
- Anomaly alerts for sudden spikes.
| Area | Action | Verification |
|---|---|---|
| Compute | rightsize + autoscale | CPU/mem utilization review |
| Storage | lifecycle to cool/archive | access patterns |
| Logs | retention + sampling | signal vs spend |
| Network | reduce egress | traffic analytics |
Tiered resilience model
Tier 0: multi-zone + multi-region + automation (expensive)
Tier 1: multi-zone + backups + fast restore (balanced)
Tier 2: single zone + backups (cheap)
Cost-sensitive architecture rules
- Cache static content at the edge when possible.
- Keep chatty microservices in the same region/network domain.
- Be careful with cross-region data movement.
Practical logging strategy
- Keep high-value logs in hot retention, archive the rest.
- Use sampling for high-volume telemetry.
- Define retention by system criticality and compliance.
- Alerting: focus on SLO violations and security events.
Platform checklist
Landing zone
- management groups
- policy initiatives (deny risky configs)
- central logging workspace
- hub/spoke network + DNS
- key vault baseline
- budgets + tags
Security checklist
Security
- PIM + MFA + conditional access
- managed identities
- private endpoints for data
- WAF at edge
- Defender for Cloud posture backlog
- Sentinel detection + response playbooks
az CLI basics
az login
az account show
az group create -n rg-demo -l westeurope
az resource list -g rg-demo -o table
az monitor metrics list --resource <resource-id>
az policy assignment list -o table
Ops checklist
Ops
- SLO dashboards
- alert routing + on-call
- runbooks (self-heal)
- backup restore drills
- patch cadence
- postmortem discipline