Project Oxygen & Ideo-Lab Dashboard 2026

☁️ Microsoft Azure – Hyper-Dense Cloud Guide

Core building blocks, landing zones, security, networking, compute, data, DevOps, observability, governance, cost, and reference architectures.

1.1 Azure Overview
Regions, global services, subscriptions, resource groups, ARM control plane.
Tags: Core, Fundamentals, ARM

1.2 Identity & Access
Microsoft Entra ID, RBAC, PIM, managed identities, app registrations.
Tags: IAM, RBAC, PIM

1.3 Landing Zone
Management groups, policies, network hub/spoke, shared services, guardrails.
Tags: Platform, LZ, Guardrails

1.4 IaC: Bicep / ARM / Terraform
Repeatable deployments, modules, parameterization, CI/CD and drift control.
Tags: IaC, Bicep, Terraform

1.5 Reference Architectures
3-tier, microservices on AKS, serverless, event-driven, data platform.
Tags: Architecture, Patterns, Design

1.6 Integration & Messaging
Service Bus, Event Grid, Event Hubs, Logic Apps, API Management.
Tags: Messaging, APIM, Serverless

2.1 Virtual Network (VNet)
Subnets, NSG, UDR, peering, private endpoints, DNS design.
Tags: VNet, NSG, Private Link

2.2 Hybrid Connectivity
VPN Gateway, ExpressRoute, BGP, routing, on-prem integration.
Tags: Hybrid, ExpressRoute, Routing

2.3 Load Balancing & Ingress
Azure LB, Application Gateway/WAF, Front Door, Traffic Manager.
Tags: L7, WAF, Global

2.4 Network Security
Azure Firewall, DDoS Protection, Bastion, segmentation, zero trust network.
Tags: Firewall, DDoS, Bastion

2.5 DNS & Private Resolution
Azure DNS, Private DNS zones, split-horizon, resolver patterns.
Tags: DNS, Private DNS, Resolver

2.6 Edge & Global Routing
Front Door, CDN, global anycast, caching, multi-region design.
Tags: Edge, CDN, Global

3.1 Virtual Machines
VM sizing, availability sets/zones, disks, images, patching, backup.
Tags: IaaS, Zones, Disks

3.2 AKS (Kubernetes)
Clusters, node pools, ingress, autoscaling, network policy, upgrades.
Tags: AKS, K8s, Autoscale

3.3 App Service
Web Apps, deployment slots, scaling, VNet integration, managed identity.
Tags: PaaS, Slots, Scale

3.4 Functions (Serverless)
Triggers, durable functions, event-driven apps, cold start strategy.
Tags: Serverless, Events, Durable

3.5 Containers
ACR, Container Apps, ACI, image scanning, supply chain basics.
Tags: Containers, ACR, Supply Chain

3.6 Windows & Linux Ops
SSH/RDP via Bastion, patching, extensions, monitoring agents, hardening.
Tags: Ops, Bastion, Hardening

4.1 Storage
Blob, Files, Queues, Tables, ADLS Gen2, tiers, lifecycle, replication.
Tags: Storage, ADLS, Lifecycle

4.2 Azure SQL
SQL Database, Managed Instance, HA/DR, performance, security, backup.
Tags: SQL, MI, PaaS DB

4.3 Azure Database for PostgreSQL
Flexible Server, HA, read replicas, network private access, tuning.
Tags: Postgres, HA, Tuning

4.4 Cosmos DB
Partition keys, RU/s, consistency models, multi-region, TTL, change feed.
Tags: NoSQL, RU/s, Partition

4.5 Analytics & Data Platform
Synapse, Databricks, Data Factory, Lakehouse, governance patterns.
Tags: Analytics, ETL, Lakehouse

4.6 Cache & Redis
Azure Cache for Redis, clustering, persistence trade-offs, session store.
Tags: Cache, Redis, Latency

5.1 Security Baseline
Zero trust, identity-first, network segmentation, encryption, logging.
Tags: Zero Trust, Baseline, Logging

5.2 Defender for Cloud
CSPM, recommendations, attack paths, workload protection plans.
Tags: CSPM, CWPP, Posture

5.3 Key Vault
Secrets, keys, certificates, HSM, rotation, managed identity access.
Tags: Secrets, HSM, MI

5.4 Microsoft Sentinel
SIEM/SOAR, connectors, analytics rules, incident response workflow.
Tags: SIEM, SOAR, KQL

5.5 Private Link Patterns
Private endpoints, service endpoints, DNS, hub/spoke resolution.
Tags: Private Link, DNS, Hub/Spoke

5.6 Compliance & Data Protection
Policies, encryption at rest/in transit, retention, backup, immutability.
Tags: Compliance, Retention, Backup

6.1 Azure DevOps
Repos, Pipelines, Boards, Artifacts, release strategies and gates.
Tags: CI/CD, Pipelines, Gates

6.2 GitHub + Azure
Actions, OIDC federation, environments, secrets, deployments.
Tags: GitHub, OIDC, Supply Chain

6.3 Artifacts & Registries
ACR, image signing, SBOM, vulnerability scanning, promotion flows.
Tags: ACR, SBOM, Scan

6.4 Release Patterns
Blue/green, canary, ring deployments, feature flags, rollback playbooks.
Tags: Release, Canary, Rollbacks

6.5 IaC in CI/CD
Plan/apply, environments, policy checks, drift detection, approvals.
Tags: IaC, Policy, Drift

6.6 Platform Engineering
Golden paths, templates, internal developer platform, guardrails.
Tags: IDP, Golden Path, Templates

7.1 Azure Monitor
Metrics, logs, alerts, action groups, dashboards, workbooks.
Tags: Metrics, Alerts, Workbooks

7.2 Log Analytics + KQL
Central logging, KQL queries, retention, ingestion cost control.
Tags: Logs, KQL, Cost

7.3 Application Insights
APM, tracing, dependencies, sampling, live metrics, SLA metrics.
Tags: APM, Tracing, Sampling

7.4 SRE Playbooks
SLI/SLO, incident response, runbooks, dashboards, error budgets.
Tags: SRE, SLO, Runbooks

7.5 Backup & DR
Recovery Services Vault, VM backup, DB backup, multi-region strategy.
Tags: Backup, DR, RPO/RTO

7.6 Automation
Automation accounts, runbooks, update management, self-healing loops.
Tags: Automation, Runbooks, Self-heal

8.1 Governance Core
Management groups, subscriptions, RBAC model, naming/tagging standards.
Tags: Governance, MG, Tags

8.2 Azure Policy
Deny/append/audit, initiatives, remediation tasks, continuous compliance.
Tags: Policy, Deny, Remediate

8.3 Blueprints / Baselines
Standardized platform baselines, security packs, repeatable landing zones.
Tags: Baseline, Standards, Repeat

8.4 Data Governance
Purview basics, classification, lineage, access patterns and auditing.
Tags: Purview, Lineage, Audit

8.5 Identity Governance
PIM, access reviews, conditional access, privileged access workflows.
Tags: Privileged, Access Review, CA

8.6 Multi-Tenant / Enterprise
Tenants, cross-subscription patterns, shared services, delegated admin.
Tags: Enterprise, Tenant, Delegation

9.1 Cost Management
Budgets, tags, chargeback, reservations, savings plans, anomaly detection.
Tags: FinOps, Budgets, Chargeback

9.2 Cost Optimization Playbook
Rightsizing, autoscaling, storage tiers, log ingestion control, egress.
Tags: Optimize, Scale, Logs

9.3 SLA, HA, DR Economics
Multi-zone, multi-region trade-offs, RPO/RTO cost model, tiered resilience.
Tags: HA, DR, Trade-offs

9.4 Egress & Network Costs
Outbound data, CDN, private endpoints, architecture decisions that save money.
Tags: Egress, CDN, Design

9.5 Logging Costs (Real)
Retention, sampling, ingestion filters, archive strategies, SRE vs budget.
Tags: Logs, Retention, Sampling

9.6 Azure Cheat-sheet
Core commands, architecture templates, security defaults, must-have checklists.
Tags: Cheat, Checklists, Quickstart

1.1 Azure Overview (Control Plane, Scope Model, Regions)
Scopes and governance

Azure is managed through Azure Resource Manager (ARM), the control plane that every deployment goes through. Everything is deployed as resources under a strict scope tree, so governance and security become consistent when you design the scope model first.

Tenant
  -> Management Groups
    -> Subscriptions
      -> Resource Groups
        -> Resources
Rule: build a platform layer (guardrails) before application subscriptions.
Resource groups
  • Lifecycle boundary (deleting an RG deletes everything in it).
  • RBAC and policy scope boundary.
  • Tagging and cost rollups.
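Assignments flow down the scope tree. A role or policy assigned at a subscription also applies to every resource group and resource beneath it. The sketch below models that inheritance with Azure resource ID prefixes. It assumes scopes are compared case-insensitively, segment by segment. It also leaves out management groups, because they do not appear in resource IDs. The subscription ID and resource names are hypothetical.

```python
def scope_contains(parent: str, child: str) -> bool:
    """True if `child` is `parent` itself or lies beneath it in the scope tree."""
    # Resource IDs are case-insensitive; compare whole path segments so that
    # ".../rg-demo" does not accidentally match ".../rg-demo2".
    p = [s.lower() for s in parent.strip("/").split("/")]
    c = [s.lower() for s in child.strip("/").split("/")]
    return c[: len(p)] == p


def effective_assignments(resource_id: str, assignments: dict) -> list:
    """Collect roles assigned at the resource's own scope or at any ancestor scope."""
    roles = []
    for scope, scope_roles in assignments.items():
        if scope_contains(scope, resource_id):
            roles.extend(scope_roles)
    return roles


# Hypothetical subscription, resource group, and storage account.
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
RG = SUB + "/resourceGroups/rg-demo"
STG = RG + "/providers/Microsoft.Storage/storageAccounts/stgdemo"

assignments = {
    SUB: ["Reader"],
    RG: ["Contributor"],
    SUB + "/resourceGroups/rg-demo2": ["Owner"],  # sibling RG: must not apply
}
print(effective_assignments(STG, assignments))  # ['Reader', 'Contributor']
```

This is why the scope model has to come first. Whatever you assign high in the tree silently applies to everything created underneath it later.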
# az quick start
    az login
    az account show
    az group create -n rg-demo -l westeurope
Regions, zones, and global services
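Concept | What it is | Design impact
Region | geographic area containing one or more datacenters | latency + compliance
Availability Zone | physically separate datacenter(s) within a region, with independent power, cooling, and networking | HA within a region
Paired regions | Azure-defined pairing of two regions | DR strategy
Global services | Front Door, Entra ID, etc. | global entry points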
Core production rules
  • Identity-first: use Entra ID, managed identities, least privilege.
  • Private by default: private endpoints for data services when possible.
  • Logs by design: central workspace, retention policy, alert routing.
  • IaC: everything repeatable, reviewed, and approved.
  • Cost: tags, budgets, and alerting from day one.
1.2 Identity & Access (Entra ID, RBAC, PIM, Managed Identity)
RBAC model
Role type | Example | Use
Built-in (general) | Reader, Contributor | fast baseline
Built-in (service-specific) | Key Vault Secrets User | least privilege
Custom | fine-grained actions | platform teams
Rule: assign roles at the highest safe scope, but avoid broad Contributor in production.
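A role definition is a list of allowed `Actions` minus its `NotActions`, and both lists may use `*` wildcards. The sketch below evaluates a requested operation against a simplified definition. The Contributor entry is trimmed to a subset of its real `NotActions`, for illustration only. The sketch also ignores data actions, deny assignments, and conditions.

```python
from fnmatch import fnmatchcase


def _matches(operation: str, patterns: list) -> bool:
    # Operation strings are case-insensitive; '*' matches any run of characters.
    op = operation.lower()
    return any(fnmatchcase(op, p.lower()) for p in patterns)


def is_allowed(operation: str, actions: list, not_actions: list) -> bool:
    """Allowed if the operation matches Actions and matches no NotActions entry."""
    return _matches(operation, actions) and not _matches(operation, not_actions)


# Trimmed subset of the built-in Contributor definition (the real one has more NotActions).
CONTRIBUTOR = {
    "actions": ["*"],
    "not_actions": [
        "Microsoft.Authorization/*/Delete",
        "Microsoft.Authorization/*/Write",
    ],
}

print(is_allowed("Microsoft.Storage/storageAccounts/write", **CONTRIBUTOR))        # True
print(is_allowed("Microsoft.Authorization/roleAssignments/write", **CONTRIBUTOR))  # False
```

The output shows the shape of the risk. Contributor cannot grant access to others, but it can change almost every resource in scope. That is why broad Contributor assignments are avoided in production.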
PIM (Privileged Identity Management)
  • Just-in-time elevation for privileged roles.
  • Approval workflows + time-bound assignments.
  • Audit trail for privileged actions.
PIM workflow
    request -> approve -> activate -> expire
    log -> review -> enforce
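The request -> approve -> activate -> expire flow is a small state machine with a time bound. The sketch below models it. The class, its fields, and the approver name are hypothetical, and this is not the PIM API. It shows that an activation only works inside its window and that each transition leaves a log entry.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Activation:
    """Hypothetical model of one just-in-time role activation (not the PIM API)."""
    role: str
    duration: timedelta
    state: str = "requested"
    activated_at: Optional[datetime] = None
    log: List[str] = field(default_factory=list)

    def approve(self, approver: str) -> None:
        if self.state != "requested":
            raise ValueError(f"cannot approve from state {self.state}")
        self.state = "approved"
        self.log.append(f"approved by {approver}")

    def activate(self, now: datetime) -> None:
        if self.state != "approved":
            raise ValueError(f"cannot activate from state {self.state}")
        self.state = "active"
        self.activated_at = now
        self.log.append(f"activated at {now.isoformat()}")

    def is_active(self, now: datetime) -> bool:
        """Check (and record) expiry: the elevation lapses once the time bound passes."""
        if self.state == "active" and now >= self.activated_at + self.duration:
            self.state = "expired"
            self.log.append(f"expired at {now.isoformat()}")
        return self.state == "active"


t0 = datetime(2026, 1, 1, 9, 0, tzinfo=timezone.utc)
grant = Activation("Owner", timedelta(hours=1))
grant.approve("security-lead")
grant.activate(t0)
print(grant.is_active(t0 + timedelta(minutes=30)))  # True
print(grant.is_active(t0 + timedelta(hours=2)))     # False
```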
Managed identities (the default for apps)
  • System-assigned: lifecycle tied to the resource; deleted with it.
  • User-assigned: standalone identity, reusable across multiple resources.
  • Use MI + RBAC to access Key Vault, Storage, SQL, etc.
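# az example
    az webapp identity assign -g rg-demo -n app-demo
    # then grant RBAC to the identity (its principalId) on the target resource, e.g. Key Vault:
    az role assignment create --assignee-object-id <principalId> \
        --assignee-principal-type ServicePrincipal \
        --role "Key Vault Secrets User" --scope <key-vault-resource-id>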
1.3 Landing Zone (Management Groups, Hub/Spoke, Guardrails)
Landing zone outcome
Goal: repeatable enterprise-ready platform

    Management Groups:
      - Platform (shared services)
      - LandingZones (apps)
      - Sandbox (experimentation)
      - Decommissioned (quarantine)
Layer | What you put there | Why
Platform | hub network, identity baseline, central logs | control + visibility
App subscriptions | workloads | blast radius control
Sandbox | R&D, PoC | safe experimentation
Hub/spoke network skeleton
Hub VNet:
    - Azure Firewall
    - VPN/ExpressRoute Gateway
    - Shared DNS/Resolvers
    - Bastion
    - Private DNS zones (optional)

Spoke VNets:
    - Workloads (AKS/App Service/VM)
    - Private endpoints to data services
Rule: centralize egress control and DNS resolution early, or you will fight it later.
Guardrails (policy + security + standards)
  • Policy initiatives: allowed locations, SKU restrictions, enforce tags.
  • Deny public endpoints for data services (when possible).
  • Enforce diagnostics settings to central Log Analytics.
  • Key Vault required for secrets; block plaintext secrets in app config.
Operations foundations
  • Central monitoring workspace(s) + retention strategy.
  • Action groups (email, webhook, ITSM) + incident workflow.
  • Backup and DR baseline per tier.
  • Patch cadence and emergency changes.
1.4 IaC (Bicep / ARM / Terraform) + CI/CD
Bicep essentials
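// bicep idea (Bicep comments use //, not #)
param location string = resourceGroup().location

resource stg 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  // 'stg' + 13-char uniqueString hash: lowercase, within the 3-24 character limit
  name: 'stg${uniqueString(resourceGroup().id)}'
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    minimumTlsVersion: 'TLS1_2'    // secure defaults matching the guardrails in 1.3
    allowBlobPublicAccess: false
  }
}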
Rule: build modules for network, identity, monitoring, and reuse them across subscriptions.
Terraform essentials
# terraform idea
provider "azurerm" {
  features {} # required by the azurerm provider, even when empty
}

resource "azurerm_resource_group" "rg" {
  name     = "rg-demo"
  location = "westeurope"
}
  • State storage: secure, versioned, locked.
  • Plan/apply with approvals.
  • Drift detection: scheduled plans or policy audits.
Pipeline rules
Stage | Gate | Why
Validate | lint + format + policy check | prevent bad patterns
Plan | diff review | human control
Apply | approval + change window | safe production
Post | smoke tests + alerts | verify outcome
1.5 Reference Architectures (Patterns that scale)
3-tier baseline (private-first)
Internet
    -> Front Door / App Gateway (WAF)
    -> App Service / AKS (private)
    -> Data (SQL / Postgres / Storage) via Private Endpoints
Central:
    Log Analytics + Key Vault + Firewall + DNS
Rule: prefer private endpoints for data services and limit public exposure to edge only.
Microservices on AKS
  • Ingress controller + WAF at edge.
  • Separate node pools for system and workloads.
  • Managed identity + workload identity for pods.
  • Service-to-service auth (mTLS / JWT) + network policy.
  • Observability: distributed tracing mandatory.
Event-driven serverless
Event sources:
    - Event Grid (react to changes)
    - Event Hubs (stream)
    - Service Bus (commands)
Compute:
    - Functions / Logic Apps
State:
    - Storage / Cosmos / SQL
Ops:
    - Monitor + alerts + dead-letter queues
Data platform (lakehouse-ish)
  • ADLS Gen2 as the central lake storage layer.
  • Ingestion via Data Factory / event pipelines.
  • Compute via Databricks / Synapse (depending on requirements).
  • Governance via Purview + RBAC + data classification.
1.6 Integration & Messaging (APIM, Service Bus, Event Grid, Event Hubs, Logic Apps)
API Management (APIM)
  • Central auth, rate limiting, quotas, IP filtering.
  • Versioning + revision workflow.
  • Policies: transform headers, validate JWT, caching.
  • Developer portal + subscription keys (if needed).
Service Bus (commands, workflows)
Feature | Use | Notes
Queues | point-to-point | dead-letter essential
Topics | pub/sub | filters per subscription
Sessions | ordering | stateful consumers
Event Grid vs Event Hubs
Service | Best for | Gotchas
Event Grid | reactive events, fan-out | delivery/retry semantics
Event Hubs | high-throughput streaming | partitioning + consumer groups
Logic Apps
  • Fast orchestration for integrations.
  • Connectors for SaaS systems.
  • Use for glue and automation; keep core business logic in code when it matters.
2.1 VNet (Subnets, NSG, UDR, Peering, Private Endpoints)
Subnet design principles
  • Separate subnets for ingress, compute, and private endpoints.
  • Reserve IP space for future growth (avoid painful renumbering).
  • Centralize egress through firewall when required.
Typical layout
    - snet-ingress
    - snet-app
    - snet-data-pe
    - snet-management
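The sketch below carves the layout above out of a single VNet range using Python's standard `ipaddress` module. The 10.10.0.0/16 range and the subnet sizes are assumptions for illustration. The one Azure-specific fact it relies on is that Azure reserves five addresses in every subnet: the first four and the last.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves the first four and the last address


def plan_subnets(vnet_cidr: str, requests: list) -> dict:
    """First-fit allocation inside one VNet range, largest subnets first."""
    vnet = ipaddress.ip_network(vnet_cidr)
    plan, taken = {}, []
    # A smaller prefix length means a bigger subnet, so sort ascending by prefix.
    for name, prefix in sorted(requests, key=lambda r: r[1]):
        for candidate in vnet.subnets(new_prefix=prefix):
            if not any(candidate.overlaps(t) for t in taken):
                plan[name] = str(candidate)
                taken.append(candidate)
                break
        else:
            raise ValueError(f"no room left for {name} (/{prefix})")
    return plan


def usable_ips(cidr: str) -> int:
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED_PER_SUBNET


# Hypothetical sizes; everything above 10.10.6.64 stays free for growth.
requests = [("snet-ingress", 24), ("snet-app", 22),
            ("snet-data-pe", 24), ("snet-management", 26)]
plan = plan_subnets("10.10.0.0/16", requests)
for name, cidr in plan.items():
    print(f"{name:16} {cidr:15} usable={usable_ips(cidr)}")
```

Allocating the largest blocks first keeps the address space contiguous. The unused tail of the /16 is the growth reserve that prevents painful renumbering later.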
NSG + UDR (routes)
Control | What it does | Common mistake
NSG | allow/deny at L3/L4 | overly broad inbound rules
UDR | force routing path | breaking private endpoint DNS/routing
Private endpoints (Private Link)
  • Private IP mapped to a PaaS service.
  • DNS is mandatory: Private DNS zone + links.
  • Prefer private endpoints for storage and databases in production.
Rule: private endpoint without correct DNS resolution equals downtime.
2.2 Hybrid Connectivity (VPN Gateway, ExpressRoute, Routing)
VPN Gateway
  • Fast to set up; runs over the public internet (IPsec-encrypted tunnels).
  • Good for small/medium needs and non-critical workloads.
  • Plan IP ranges and routing early.
ExpressRoute
  • Private connectivity, enterprise-grade.
  • Better reliability and predictable latency.
  • Requires provider integration and more governance.
Rule: treat routing as a first-class system. Document BGP, advertised prefixes, and failover behavior.
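Overlapping address space between on-premises networks and Azure VNets is a common reason why routes fail or get shadowed once connectivity goes live. The sketch below is a minimal pre-flight check. It assumes you can list the on-prem prefixes and the VNet ranges. The names and ranges shown are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlaps(ranges: dict) -> list:
    """Return every pair of named ranges whose address spaces overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        # Only compare like with like: IPv4 with IPv4, IPv6 with IPv6.
        if nets[a].version == nets[b].version and nets[a].overlaps(nets[b])
    ]


ranges = {
    "onprem-dc1": "10.0.0.0/12",      # covers 10.0.0.0 - 10.15.255.255
    "hub-vnet": "10.20.0.0/16",
    "spoke-payments": "10.10.0.0/16",
    "spoke-analytics": "10.30.0.0/16",
}
print(find_overlaps(ranges))  # [('onprem-dc1', 'spoke-payments')]
```

Run the same check on every proposed prefix before it is advertised over BGP. Fixing an overlap on paper is far cheaper than renumbering a live spoke.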
2.3 Load Balancing & Ingress (LB, App Gateway/WAF, Front Door)
Service | Layer | Best for | Key features
Azure Load Balancer | L4 | TCP/UDP distribution | zones, HA ports
Application Gateway | L7 | HTTP ingress | WAF, TLS offload
Front Door | Global L7 | global entry | anycast, caching, WAF
Traffic Manager | DNS | geo failover | routing policies
Rule: keep public entry points minimal (edge), then go private inside.
2.4 Network Security (Firewall, DDoS, Bastion, Segmentation)
Azure Firewall
  • Central egress control and logging.
  • Threat intelligence-based filtering (alert-only or alert-and-deny mode).
  • Application rules + network rules + DNAT.
Bastion
  • SSH/RDP without public IP exposure.
  • Fits zero-trust ops model.
  • Combine with JIT/PIM for admin access.
Rule: DDoS + WAF + logging + incident workflow is a single system, not separate checkboxes.
2.5 DNS & Private Resolution (Azure DNS, Private DNS, Resolver)
Private endpoint DNS pattern
Private DNS zones:
    privatelink.blob.core.windows.net
    privatelink.database.windows.net
    privatelink.postgres.database.azure.com
Links:
    zone -> hub vnet
    zone -> spoke vnet(s)
Rule: document name resolution end-to-end (clients -> resolvers -> zones -> targets).
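When a service has a private endpoint, its public name (for example `<account>.blob.core.windows.net`) is CNAMEd to a `privatelink` name. That name has to resolve to the private IP through the linked private DNS zone. The sketch below does two things. It maps a service FQDN to the zone it depends on, and it checks that resolution returns only private addresses. The suffix table covers only the three zones listed above. The resolution helper uses the host's own resolver, so run it from inside the VNet to test the real path.

```python
import ipaddress
import socket
from typing import Optional

# Public service suffix -> private DNS zone (only the three zones listed above).
PRIVATELINK_ZONES = {
    "blob.core.windows.net": "privatelink.blob.core.windows.net",
    "database.windows.net": "privatelink.database.windows.net",
    "postgres.database.azure.com": "privatelink.postgres.database.azure.com",
}


def zone_for(fqdn: str) -> Optional[str]:
    """Private DNS zone that must be linked for this FQDN to resolve privately."""
    host = fqdn.lower().rstrip(".")
    for suffix, zone in PRIVATELINK_ZONES.items():
        if host.endswith("." + suffix):
            return zone
    return None


def resolves_privately(fqdn: str) -> bool:
    """Resolve with the local resolver; every answer must be a private address.

    Raises socket.gaierror if the name does not resolve at all.
    """
    infos = socket.getaddrinfo(fqdn, 443, proto=socket.IPPROTO_TCP)
    addrs = {info[4][0] for info in infos}
    return bool(addrs) and all(ipaddress.ip_address(a).is_private for a in addrs)


print(zone_for("stgdemo.blob.core.windows.net"))  # privatelink.blob.core.windows.net
```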
2.6 Edge & Global Routing (Front Door, CDN, Caching)
Global design decisions
  • Use Front Door for global entry + WAF + caching when appropriate.
  • Separate static and dynamic traffic; cache static aggressively.
  • Multi-region active/active needs data strategy, not only routing.
Rule: global routing without data consistency strategy is a half architecture.
3.1 Virtual Machines (Sizing, Disks, Availability, Backup)
Compute sizing approach
  • Start from CPU, memory, and IO needs, not from "standard sizes".
  • Use metrics: CPU %, memory pressure, disk queue, network throughput.
  • Plan right-sizing reviews monthly (FinOps loop).
Disk strategy
Disk | Best for | Notes
Premium SSD | prod workloads | IOPS/throughput tiers
Standard SSD | balanced cost | general workloads
Ultra Disk | very high IO | special cases
Availability
  • Prefer Availability Zones when supported.
  • Backups: Recovery Services Vault, test restores regularly.
  • DR: define RPO/RTO, validate runbooks, measure recovery time.
3.2 AKS (Kubernetes on Azure) – Cluster design, security, upgrades
Cluster foundation
  • Separate system and user node pools.
  • Define ingress strategy (App Gateway ingress controller or NGINX with WAF at edge).
  • Autoscaling: cluster autoscaler + HPA.
  • Private cluster when required.
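The Kubernetes documentation gives the Horizontal Pod Autoscaler's core rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. No scaling happens when the ratio is within a tolerance, 0.1 by default. The sketch below computes that value and adds the min/max clamping an HPA spec applies. It ignores stabilization windows and pod readiness handling.

```python
import math


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: no scaling action
    wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(4, metric=90, target=60))  # 6 (scale out)
print(desired_replicas(4, metric=63, target=60))  # 4 (within tolerance)
print(desired_replicas(4, metric=10, target=60))  # 1 (scale in, clamped at min)
```

The HPA only adds pods. The cluster autoscaler adds nodes when those new pods cannot be scheduled, which is why the two are configured together.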
AKS security essentials
  • Use managed identity / workload identity for pods.
  • Enable network policies and restrict east-west traffic.
  • Image scanning and signed images (supply chain controls).
  • Secrets: prefer Key Vault integration patterns.
Operations
  • Upgrade strategy: staged upgrades, maintenance windows.
  • Observability: metrics + logs + tracing; alert on saturation.
  • Backup: workload state (databases) + cluster configs (GitOps).
3.3 App Service (Web Apps) – Slots, scaling, identity, VNet
Why App Service
  • Fast PaaS for web APIs and sites.
  • Built-in scaling, TLS, deployment slots.
  • Managed identity integration for secretless access.
Operational must-haves
  • Use deployment slots for safe releases.
  • Enable diagnostics to Log Analytics.
  • Integrate with VNet if data is private.
  • Set autoscale based on real metrics.
3.4 Functions (Serverless) – Events, Durable, reliability patterns
Serverless reliability checklist
  • Define idempotency keys for event handlers.
  • Use dead-letter queues and alert on them.
  • Control retries to avoid amplification.
  • Measure cold starts and select plan accordingly.
Rule: in event-driven systems, your error handling is part of the architecture.
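The sketch below combines the first three checklist items. A hypothetical `message_id` field serves as the idempotency key, so redeliveries become no-ops. A bounded retry budget uses exponential backoff, and events that exhaust it go to a dead-letter list. In a real Function, the trigger's own retry and dead-letter settings (for example Service Bus max delivery count) play these roles. The in-memory set and list stand in for durable storage.

```python
import time
from typing import Callable


class IdempotentProcessor:
    """Illustrative event processor; state lives in memory, not durable storage."""

    def __init__(self, handler: Callable[[dict], None], max_attempts: int = 3,
                 base_delay: float = 0.01) -> None:
        self.handler = handler
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.processed = set()   # idempotency keys already handled
        self.dead_letters = []   # events that exhausted their retries

    def process(self, event: dict) -> str:
        key = event["message_id"]  # hypothetical idempotency key field
        if key in self.processed:
            return "duplicate"     # redelivery of handled work: no side effects
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.handler(event)
                self.processed.add(key)
                return "ok"
            except Exception:
                if attempt < self.max_attempts:
                    # Exponential backoff plus a fixed attempt cap limits retry amplification.
                    time.sleep(self.base_delay * 2 ** (attempt - 1))
        self.dead_letters.append(event)  # alert when this grows
        return "dead-lettered"


calls = []


def handler(event: dict) -> None:
    calls.append(event["message_id"])
    if event.get("poison"):
        raise RuntimeError("cannot process")


proc = IdempotentProcessor(handler)
print(proc.process({"message_id": "m1"}))                  # ok
print(proc.process({"message_id": "m1"}))                  # duplicate
print(proc.process({"message_id": "m2", "poison": True}))  # dead-lettered
```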
3.5 Containers (ACR, Container Apps, ACI) – Supply chain basics
Service | Use | Strength | Limit
ACR | image registry | RBAC, private, integration | needs governance
Container Apps | serverless containers | scale-to-zero, simple | complex networking cases
ACI | run containers quickly | fast | not a full platform
Rule: secure the pipeline (scan, sign, SBOM) before scaling container adoption.
3.6 Windows & Linux Ops (Access, patching, extensions, hardening)
Ops baseline
  • No public SSH/RDP. Use Bastion + JIT/PIM.
  • Patch: define cadence and emergency patch process.
  • Monitor: agent strategy (VM insights) + alerts.
  • Hardening: CIS-like baseline, disable unused services.
4.1 Storage (Blob, ADLS Gen2, Files, lifecycle, replication)
Service | Use | Notes
Blob | objects | hot/cool/archive tiers + lifecycle
ADLS Gen2 | data lake | hierarchical namespace
Files | SMB shares | lift-and-shift + enterprise shares
Replication strategy
  • LRS/ZRS: durability within a region (ZRS spreads copies across zones).
  • GRS/GZRS: asynchronous replication to the paired region for cross-region resilience.
  • Choose based on RPO/RTO and cost.
Storage security
  • Prefer private endpoints for production.
  • Use RBAC and managed identity, avoid account keys.
  • Enable soft delete and immutability if needed.
4.2 Azure SQL (DB vs Managed Instance) – HA/DR, security, performance
Offering | Best for | Notes
SQL Database | modern apps | serverless/hyperscale options
Managed Instance | near lift-and-shift | more compatibility
Rule: database performance work is still: measure -> query -> plan -> index -> validate.
4.3 Azure Database for PostgreSQL (Flexible Server) – HA, replicas, private access
Core design points
  • Choose private access for production workloads.
  • Use read replicas for scale-out reads.
  • Define backup retention and test restores.
  • Monitor query latency and connection saturation.
Tuning loop
    1) capture slow queries
    2) EXPLAIN ANALYZE
    3) index / rewrite
    4) verify and measure p95/p99
4.4 Cosmos DB – Partition keys, RU/s, consistency, multi-region
Partition key is the architecture
  • Choose a key that distributes writes and reads evenly.
  • Avoid hot partitions.
  • Model queries first, then pick the key.
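Before committing to a key, replay a sample of real operations and check how they would spread. The sketch below groups a hypothetical write log by candidate key and reports the share taken by the busiest key value. It does not model Cosmos DB's hashing or physical partition layout, only how key values are distributed.

```python
from collections import Counter


def hottest_share(events: list, key: str) -> tuple:
    """Busiest value of `key` and the fraction of all operations it receives."""
    counts = Counter(str(e[key]) for e in events)
    value, hits = counts.most_common(1)[0]
    return value, hits / len(events)


# Hypothetical write log: one large tenant dominates; user IDs are all distinct.
events = (
    [{"tenantId": "t1", "userId": f"u{i}"} for i in range(80)]
    + [{"tenantId": f"t{i}", "userId": f"v{i}"} for i in range(2, 22)]
)
print(hottest_share(events, "tenantId"))  # ('t1', 0.8)
print(hottest_share(events, "userId"))    # ('u0', 0.01)
```

Provisioned RU/s are spread across physical partitions. A single key value that receives 80% of writes therefore throttles long before the container's total throughput is used.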
RU/s capacity planning
  • RU/s is cost and performance.
  • Measure real request units per operation.
  • Use autoscale only when you understand workload patterns.
Consistency models
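Level | Trade-off | Typical use
Strong | highest latency/cost | critical reads
Session | balanced | user-facing apps
Eventual | fastest/cheapest | analytics-ish reads
(Bounded staleness sits between Strong and Session; Consistent prefix sits between Session and Eventual.)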
4.5 Analytics (Synapse, Databricks, Data Factory) – Lakehouse patterns
Data platform design map
Ingest
    -> ADLS Gen2 (raw/bronze)
Transform
    -> Databricks/Synapse (silver/gold)
Serve
    -> SQL / BI / APIs
Govern
    -> Purview + RBAC + audit
Rule: separate storage (lake) from compute (clusters) for cost control and scaling.
4.6 Cache & Redis – Latency, clustering, persistence trade-offs
Design checklist
  • Cache only what you can invalidate or tolerate stale.
  • Define TTLs and eviction policy.
  • Measure hit ratio and latency impact.
  • Prefer managed identity where applicable (service-to-service auth patterns vary).
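The checklist maps directly onto the cache-aside pattern. The sketch below uses an in-process dictionary as a stand-in for Redis so it runs anywhere. It shows TTL expiry, explicit invalidation after a write, and hit-ratio tracking. With a real client you would replace the dictionary with GET and SET-with-expiry calls. The `load` function here is a hypothetical database read.

```python
import time
from typing import Any, Callable


class CacheAside:
    def __init__(self, load: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.load = load
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        entry = self.store.get(key)
        if entry is not None and entry[0] > self.clock():
            self.hits += 1
            return entry[1]
        self.misses += 1                 # absent or expired
        value = self.load(key)           # read from the source of truth
        self.store[key] = (self.clock() + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after writing the source of truth so readers do not see stale data."""
        self.store.pop(key, None)

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


now = [0.0]                              # fake clock for a deterministic demo
db = {"user:1": "Ada"}
cache = CacheAside(db.__getitem__, ttl_seconds=60, clock=lambda: now[0])
cache.get("user:1")                      # miss
cache.get("user:1")                      # hit
now[0] = 61.0
cache.get("user:1")                      # expired -> miss
print(cache.hit_ratio)                   # 0.3333333333333333
```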
5.1 Security Baseline (Zero Trust, encryption, logging, segmentation)
Zero trust pillars
  • Verify explicitly (identity, device, location, risk).
  • Use least privilege (RBAC + PIM).
  • Assume breach (segmentation + logging + response).
Baseline checklist
Area | Baseline | Minimum
Identity | PIM, MFA, conditional access | no standing admin
Network | private endpoints + firewall egress | no public DB
Secrets | Key Vault + managed identity | no secrets in code
Logging | central workspace + alerts | audit trail
Common failures
  • Public endpoints left open, no WAF rules.
  • Broad Contributor assignments in production.
  • No DNS plan for private endpoints.
  • Logging disabled to save cost, then blind incidents.
5.2 Defender for Cloud (CSPM + workload protections)
What it gives you
  • Security posture management: recommendations and secure score.
  • Attack path analysis and misconfiguration detection.
  • Workload protections (plans) for key services.
Rule: treat recommendations as an engineering backlog (triage, owners, SLA).
5.3 Key Vault (Secrets, keys, certs) + Rotation + Managed Identity
Core practices
  • Use managed identities for access.
  • Prefer the Azure RBAC permission model (over vault access policies) consistently across the platform.
  • Enable soft delete and purge protection in production.
Rotation strategy
  • Define rotation ownership per secret.
  • Automate rotation where possible.
  • Alert on near-expiry and failed rotations.
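Near-expiry alerting comes down to comparing each secret's expiry attribute with a warning window. The sketch below works over plain records, and the secret names and 30-day window are hypothetical. Against a real vault, you would feed it the `expires_on` values from the secret properties, for example those listed by the Azure SDK's `SecretClient.list_properties_of_secrets()`.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def expiring_soon(secrets: Dict[str, Optional[datetime]], now: datetime,
                  window: timedelta = timedelta(days=30)) -> Dict[str, List[str]]:
    """Classify secrets as expired, expiring within `window`, or missing an expiry."""
    report = {"expired": [], "expiring": [], "no_expiry": []}
    for name, expires_on in sorted(secrets.items()):
        if expires_on is None:
            report["no_expiry"].append(name)   # no rotation deadline at all
        elif expires_on <= now:
            report["expired"].append(name)
        elif expires_on - now <= window:
            report["expiring"].append(name)
    return report


now = datetime(2026, 3, 1, tzinfo=timezone.utc)
secrets = {
    "sql-admin-password": datetime(2026, 3, 10, tzinfo=timezone.utc),
    "api-key-partner": datetime(2026, 2, 20, tzinfo=timezone.utc),
    "storage-sas": datetime(2026, 9, 1, tzinfo=timezone.utc),
    "legacy-token": None,
}
print(expiring_soon(secrets, now))
```

Treating a missing expiry as its own finding matters here: a secret with no expiry date never triggers a rotation alert.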
5.4 Microsoft Sentinel (SIEM/SOAR) – Analytics, incidents, response
Sentinel workflow
Connectors -> Log Analytics -> Analytics rules -> Incidents
    -> Triage -> Investigation (KQL) -> Response playbooks -> Postmortem
Rule: detection without response playbooks is only half security.
5.6 Compliance & Data Protection (Encryption, retention, backup, immutability)
Control | Implementation | Verification
Encryption | at rest + in transit | policy + audits
Retention | logs + backups | test restores + reports
Immutability | storage immutability where needed | tamper checks
6.1 Azure DevOps (Repos, Pipelines, Boards, Artifacts)
Pipeline baseline
Build
    - tests
    - security scans
    - artifact publish (immutable)
Deploy
    - environment approvals
    - IaC plan/apply (guardrails)
    - smoke tests
Operate
    - SLO dashboards + alerts
Rule: every deployment produces evidence (who, what, when, diff, result).
6.2 GitHub + Azure (Actions, OIDC, environments, secrets)
OIDC federation concept
GitHub Actions -> OIDC token -> Microsoft Entra ID (federated credential) -> short-lived access token
Benefits:
    - no long-lived secrets
    - better auditability
Rule: prefer short-lived credentials and policy-based access for CI/CD.
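On the Azure side, a federated identity credential trusts a token only if its issuer, subject, and audience match exactly. The sketch below mirrors that check on an already-decoded claim set. It does not verify the token signature, which Entra ID does, and the organization, repository, and environment names are hypothetical. The issuer `https://token.actions.githubusercontent.com` and audience `api://AzureADTokenExchange` are the standard values for GitHub Actions.

```python
from dataclasses import dataclass

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"
ENTRA_AUDIENCE = "api://AzureADTokenExchange"


@dataclass(frozen=True)
class FederatedCredential:
    issuer: str
    subject: str
    audience: str


def trusted(claims: dict, cred: FederatedCredential) -> bool:
    """Exact match on iss, sub and aud (aud may be a string or a list)."""
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return (claims.get("iss") == cred.issuer
            and claims.get("sub") == cred.subject
            and cred.audience in audiences)


# Hypothetical repo: only jobs running in the 'production' environment may deploy.
prod = FederatedCredential(GITHUB_ISSUER,
                           "repo:contoso/payments:environment:production",
                           ENTRA_AUDIENCE)
from_prod_job = {"iss": GITHUB_ISSUER, "aud": ENTRA_AUDIENCE,
                 "sub": "repo:contoso/payments:environment:production"}
from_feature_branch = {"iss": GITHUB_ISSUER, "aud": ENTRA_AUDIENCE,
                       "sub": "repo:contoso/payments:ref:refs/heads/feature-x"}
print(trusted(from_prod_job, prod), trusted(from_feature_branch, prod))  # True False
```

Binding the credential to a GitHub environment, which has its own approval rules, gives a tighter boundary than trusting every branch in the repository.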
6.3 Artifacts & Registries (ACR, signing, SBOM, scanning)
Supply chain checklist
  • Immutable tags (promote by digest).
  • Vulnerability scans + policy gates.
  • SBOM generation and storage.
  • Signed images and verification at deploy time.
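Promoting by digest means an image's identity is the SHA-256 of its manifest, not a mutable tag. The toy registry below is hypothetical: real registries such as ACR compute digests server-side and can lock images against changes. It refuses to repoint an existing tag and promotes between environments by digest, so the bytes tested are the bytes deployed.

```python
import hashlib


def digest(manifest: bytes) -> str:
    """Content address of an image manifest."""
    return "sha256:" + hashlib.sha256(manifest).hexdigest()


class ImmutableRegistry:
    """Toy registry (hypothetical): tags may be added but never repointed."""

    def __init__(self) -> None:
        self.manifests = {}  # digest -> manifest bytes
        self.tags = {}       # tag -> digest

    def push(self, tag: str, manifest: bytes) -> str:
        d = digest(manifest)
        if tag in self.tags and self.tags[tag] != d:
            raise PermissionError(f"tag {tag!r} is immutable")
        self.manifests[d] = manifest
        self.tags[tag] = d
        return d

    def promote(self, d: str, target_tag: str) -> None:
        """Point a new tag at an existing digest: the promoted bytes are identical."""
        if d not in self.manifests:
            raise KeyError(d)
        self.push(target_tag, self.manifests[d])


reg = ImmutableRegistry()
built = reg.push("payments:1.4.2", b'{"layers": ["sha256:aaa"]}')
reg.promote(built, "payments:1.4.2-prod")
print(reg.tags["payments:1.4.2-prod"] == built)  # True
```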
6.4 Release Patterns (Blue/Green, Canary, Rings, Rollback)
Pattern | Best for | Key requirement
Blue/Green | safe cutover | traffic switch + fast rollback
Canary | risk reduction | metrics-based promotion
Rings | enterprise rollouts | progressive exposure
Rule: a release strategy is useless without SLO monitoring and rollback automation.
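Metrics-based promotion can be as simple as comparing the canary's error rate with the stable baseline over the same window. The function below is a minimal version of that decision. The 0.5 percentage-point margin and the 500-request minimum sample are hypothetical thresholds; in practice you would derive them from your SLO.

```python
def canary_decision(base_errors: int, base_total: int,
                    canary_errors: int, canary_total: int,
                    max_delta: float = 0.005, min_requests: int = 500) -> str:
    """Return 'wait', 'promote' or 'rollback' for one evaluation window."""
    if canary_total < min_requests:
        return "wait"  # not enough traffic to judge yet
    base_rate = base_errors / base_total if base_total else 0.0
    canary_rate = canary_errors / canary_total
    return "rollback" if canary_rate - base_rate > max_delta else "promote"


print(canary_decision(40, 20_000, 3, 1_000))   # promote  (0.3% vs 0.2%)
print(canary_decision(40, 20_000, 12, 1_000))  # rollback (1.2% vs 0.2%)
print(canary_decision(40, 20_000, 0, 100))     # wait
```

This check runs at each traffic step. A "rollback" result should trigger the rollback automation directly, not wait for someone to notice a dashboard.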
6.5 IaC in CI/CD (Policy checks, drift, approvals)
Drift control model
Desired state: IaC repo
Actual state: Azure resources

Detect drift:
    - scheduled plan
    - policy compliance
    - config audits

Fix drift:
    - apply with approvals
    - incident if unexpected changes
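At its core, drift detection is a diff between the declared and the observed properties of each resource. The sketch below works over flattened property dictionaries, and the resource names and properties are hypothetical. `terraform plan` and Bicep what-if (`az deployment group what-if`) do the real comparison against provider schemas, including defaults this toy ignores.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Compare desired vs actual resources, keyed by resource name or id."""
    report = {"missing": [], "unmanaged": [], "changed": []}
    for rid in sorted(desired.keys() - actual.keys()):
        report["missing"].append(rid)             # declared but not deployed
    for rid in sorted(actual.keys() - desired.keys()):
        report["unmanaged"].append(rid)           # deployed outside IaC
    for rid in sorted(desired.keys() & actual.keys()):
        for prop, want in sorted(desired[rid].items()):
            have = actual[rid].get(prop)
            if have != want:
                report["changed"].append((rid, prop, want, have))
    return report


desired = {
    "stgdemo": {"minimumTlsVersion": "TLS1_2", "allowBlobPublicAccess": False},
    "kv-demo": {"enablePurgeProtection": True},
}
actual = {
    "stgdemo": {"minimumTlsVersion": "TLS1_2", "allowBlobPublicAccess": True},
    "vm-hotfix": {"size": "Standard_D4s_v5"},
}
print(detect_drift(desired, actual))
```

Each "changed" or "unmanaged" entry maps to one of the two fixes above. Either re-apply through the approved pipeline, or open an incident if nobody expected the change.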
6.6 Platform Engineering (Golden paths, templates, IDP)
Golden path idea
  • Pre-approved architecture templates (App Service + SQL + Key Vault + Monitor).
  • Reusable pipelines and IaC modules.
  • Guardrails enforced by policy, not tribal knowledge.
Rule: reduce cognitive load for developers while increasing safety for operations.
7.1 Azure Monitor (Metrics, Alerts, Workbooks)
Monitoring system components
Signals:
    metrics (fast)
    logs (deep)
    traces (request path)
Actions:
    alerts -> action groups -> incident workflow
Visualization:
    dashboards + workbooks
Rule: alerts must be actionable, owned, and measured (noise is failure).
7.2 Log Analytics + KQL (Retention, ingestion, cost controls)
KQL mindset
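KQL structure: table | where ... | summarize ... | order by ...
    // billable ingestion per table, last 30 days (Usage.Quantity is in MB)
    Usage
    | where TimeGenerated > ago(30d)
    | where IsBillable == true
    | summarize IngestedGB = sum(Quantity) / 1000 by DataType
    | order by IngestedGB desc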
Rule: logging without retention and ingestion strategy becomes a cost incident.
7.3 Application Insights (APM, tracing, sampling)
APM essentials
  • Define a correlation id across services.
  • Instrument dependencies and external calls.
  • Use sampling to control ingestion while preserving signal.
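Sampling preserves signal when the keep-or-drop decision is made per operation rather than per record. That way a sampled-in request keeps all of its dependency calls and traces. Application Insights' built-in sampling keeps related telemetry together using the operation ID. The sketch below illustrates the idea with SHA-256 as an assumed hash; this is not the SDK's actual algorithm. It uses a hypothetical 25% rate.

```python
import hashlib


def keep(operation_id: str, sampling_percent: float) -> bool:
    """Deterministic per-operation decision: same id -> same answer on every service."""
    h = int.from_bytes(hashlib.sha256(operation_id.encode()).digest()[:8], "big")
    bucket = h % 10_000 / 100  # 0.00 .. 99.99
    return bucket < sampling_percent


telemetry = [
    {"operation_id": f"op-{i}", "type": t}
    for i in range(1_000)
    for t in ("request", "dependency", "trace")
]
kept = [item for item in telemetry if keep(item["operation_id"], 25.0)]
ops_kept = {item["operation_id"] for item in kept}
# Every kept operation arrives complete (3 items each); roughly a quarter survive.
print(len(kept) == 3 * len(ops_kept), len(ops_kept) / 1_000)
```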
7.4 SRE Playbooks (SLI/SLO, incidents, runbooks)
Incident lifecycle
Detect -> Triage -> Mitigate -> Recover -> Postmortem
Artifacts:
    timeline, root cause, action items, guardrails
Rule: define SLOs and error budgets before debating alert thresholds.
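An error budget is the complement of the SLO applied to the traffic in the SLO window. Burn rate measures how fast that budget is being spent relative to an even spend. The sketch below uses exact fractions to avoid floating-point surprises. The 99.9% target and the request counts are hypothetical.

```python
from fractions import Fraction


def error_budget(slo: str, total_requests: int) -> int:
    """Number of failed requests the SLO allows over the window."""
    return int((1 - Fraction(slo)) * total_requests)


def burn_rate(slo: str, window_errors: int, window_requests: int) -> float:
    """1.0 = spending the budget exactly evenly; above 1.0 = burning it faster."""
    allowed_ratio = 1 - Fraction(slo)
    observed_ratio = Fraction(window_errors, window_requests)
    return float(observed_ratio / allowed_ratio)


print(error_budget("0.999", 10_000_000))  # 10000 failures allowed in the window
print(burn_rate("0.999", 72, 6_000))      # 12.0: burning 12x faster than sustainable
```

Alerting on burn rate rather than raw error counts gives thresholds that follow directly from the SLO. That is the point of defining SLOs before debating thresholds.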
7.5 Backup & DR (RPO/RTO, runbooks, restore tests)
Tier | RPO/RTO target | Typical design
Tier 0 | minutes | multi-zone + cross-region
Tier 1 | hours | multi-zone + backups + tested restore
Tier 2 | a day | backups + manual failover
Rule: DR is real only if you run restore drills and measure time.
7.6 Automation (Runbooks, self-healing, patch workflows)
Self-healing approach
  • Detect (alert rule) -> decide (runbook) -> act (automation) -> verify (metrics).
  • Always add safety checks and rate limiting.
Rule: automation without guardrails becomes an outage accelerator.
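The sketch below runs the detect -> decide -> act -> verify loop with two safety checks. First, it caps actions per time window, so a flapping alert cannot keep restarting a resource. Second, it verifies the result and escalates to a human instead of retrying forever. `restart` and `healthy` are hypothetical callbacks you would wire to a runbook and a metric query.

```python
import time
from collections import deque
from typing import Callable


class SelfHealer:
    def __init__(self, restart: Callable[[str], None], healthy: Callable[[str], bool],
                 max_actions: int = 3, window_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.restart, self.healthy, self.clock = restart, healthy, clock
        self.max_actions, self.window = max_actions, window_seconds
        self.recent = deque()  # timestamps of recent automated actions

    def on_alert(self, resource: str) -> str:
        now = self.clock()
        while self.recent and now - self.recent[0] > self.window:
            self.recent.popleft()  # forget actions outside the window
        if len(self.recent) >= self.max_actions:
            return "escalate: rate limit reached"  # hand over to a human
        self.recent.append(now)
        self.restart(resource)  # act
        return "healed" if self.healthy(resource) else "escalate: still unhealthy"


t = [0.0]
restarts = []
healer = SelfHealer(restarts.append, lambda r: True, max_actions=2, clock=lambda: t[0])
first = [healer.on_alert("vm-web-01") for _ in range(3)]
print(first)  # ['healed', 'healed', 'escalate: rate limit reached']
t[0] = 4000.0
print(healer.on_alert("vm-web-01"))  # healed: the one-hour window has rolled over
```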
8.1 Governance Core (MG, subscriptions, naming, tagging)
Naming + tagging standards
Example tags
    env=prod
    owner=platform
    app=payments
    costcenter=cc123
    data_class=confidential
Rule: tags power cost management and incident ownership. Enforce them via policy.
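Azure Policy enforces required tags at deployment time. Running the same check in CI makes a missing tag fail the pull request rather than the deployment. The sketch below assumes the five keys above are mandatory and that `env` must take one of a hypothetical set of allowed values.

```python
REQUIRED_TAGS = {"env", "owner", "app", "costcenter", "data_class"}
ALLOWED_ENV = {"dev", "test", "prod"}  # hypothetical allowed values


def tag_violations(tags: dict) -> list:
    """List missing required tags and invalid values, in a stable order."""
    problems = [f"missing tag: {k}" for k in sorted(REQUIRED_TAGS - tags.keys())]
    if "env" in tags and tags["env"] not in ALLOWED_ENV:
        problems.append(f"invalid env: {tags['env']}")
    return problems


print(tag_violations({"env": "prod", "owner": "platform", "app": "payments",
                      "costcenter": "cc123", "data_class": "confidential"}))  # []
print(tag_violations({"env": "production", "app": "payments"}))
```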
8.2 Azure Policy (Deny, Audit, Append, Remediation)
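Effect | Meaning | Use
Deny | block non-compliant create/update | critical guardrails
Audit | report non-compliance | visibility
Append/Modify | add or change properties on create/update | tags, settings
DeployIfNotExists | deploy a missing related resource (remediation task) | diagnostic settings, enforce baseline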
8.3 Baselines (Blueprint-like approach, guardrails as code)
Baseline components
  • Policy initiatives (security + compliance + tagging).
  • Network architecture templates (hub/spoke).
  • Central logging and alert routing.
  • Identity governance (PIM, CA, access reviews).
8.4 Data Governance (Purview basics, lineage, classification)
Governance loop
Discover -> Classify -> Control access -> Audit -> Improve
Rule: without classification, access control is guesswork.
8.5 Identity Governance (PIM, access reviews, conditional access)
Must-have controls
  • Conditional access baseline (MFA, risk-based controls).
  • PIM for privileged roles, no standing admin.
  • Access reviews for sensitive groups and apps.
8.6 Enterprise (Tenants, shared services, delegation)
Enterprise design rules
  • Separate duties: platform subscription vs app subscriptions.
  • Shared services: DNS, firewall, logging, identity baseline.
  • Delegated admin patterns for ops teams.
9.1 Cost Management (Budgets, tags, reservations, anomaly detection)
FinOps basics
  • Budgets per subscription/resource group.
  • Enforced tags for chargeback/showback.
  • Reservations/savings plans for steady workloads.
  • Anomaly alerts for sudden spikes.
Rule: cost is an engineering metric (like latency). Make it visible.
9.2 Cost Optimization Playbook (Rightsize, autoscale, storage tiers, logs)
Area | Action | Verification
Compute | rightsize + autoscale | CPU/memory utilization review
Storage | lifecycle to cool/archive | access patterns
Logs | retention + sampling | signal vs spend
Network | reduce egress | traffic analytics
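For the compute row, judge utilization by a high percentile over a representative period, not by the average, and only downsize when there is comfortable headroom. The sketch below uses a nearest-rank p95 with hypothetical 40% and 80% thresholds. The sample values are made up; real ones would come from Azure Monitor metrics.

```python
import math


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def rightsize_hint(cpu_percent: list, low: float = 40.0, high: float = 80.0) -> str:
    p95 = percentile(cpu_percent, 95)
    if p95 < low:
        return f"downsize candidate (p95 {p95:.0f}%)"
    if p95 > high:
        return f"upsize or scale out (p95 {p95:.0f}%)"
    return f"keep (p95 {p95:.0f}%)"


idle_vm = [12, 15, 18, 22, 25, 19, 17, 30, 28, 21] * 3  # hypothetical hourly CPU %
busy_vm = [55, 60, 85, 90, 70] * 4
print(rightsize_hint(idle_vm))  # downsize candidate (p95 30%)
print(rightsize_hint(busy_vm))  # upsize or scale out (p95 90%)
```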
9.3 SLA, HA, DR Economics (Tiered resilience)
Tiered resilience model
Tier 0: multi-zone + multi-region + automation (expensive)
Tier 1: multi-zone + backups + fast restore (balanced)
Tier 2: single zone + backups (cheap)
Rule: align resilience tier with business impact, not fear.
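The economics follow from simple availability arithmetic. Availabilities of serial dependencies multiply, while redundant copies are down only when all of them fail. The sketch below uses hypothetical SLA figures; check the current SLA for each service you actually use. The parallel formula assumes independent failures, which separate regions approximate and shared dependencies break.

```python
from math import prod


def serial(*availabilities: float) -> float:
    """Every component must be up, so availabilities multiply."""
    return prod(availabilities)


def parallel(availability: float, copies: int) -> float:
    """Any single copy suffices: 1 - P(all copies down), assuming independent failures."""
    return 1 - (1 - availability) ** copies


def downtime_minutes_per_month(availability: float) -> float:
    return (1 - availability) * 30 * 24 * 60  # 30-day month


# Hypothetical figures: app tier 99.95% and database 99.99%, both in one region.
one_region = serial(0.9995, 0.9999)
two_regions = parallel(one_region, 2)  # active/active, ignoring failover time and data sync
for label, a in [("one region", one_region), ("two regions", two_regions)]:
    print(f"{label}: {a:.4%}, about {downtime_minutes_per_month(a):.1f} min/month down")
```

The gap between the two lines is what Tier 0 pays for. It only materializes if data replication and failover actually work, which is why the rule ties the tier to business impact.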
9.4 Egress & Network Costs (CDN, design decisions)
Cost-sensitive architecture rules
  • Cache static content at the edge when possible.
  • Keep chatty microservices in the same region/network domain.
  • Be careful with cross-region data movement.
9.5 Logging Costs (Retention, sampling, archiving)
Practical logging strategy
  • Keep high-value logs in hot retention, archive the rest.
  • Use sampling for high-volume telemetry.
  • Define retention by system criticality and compliance.
  • Alerting: focus on SLO violations and security events.
Rule: logging is a product. Define what questions it must answer.
9.6 Azure Cheat-sheet (Core checklists + quick commands)
Platform checklist
Landing zone
    - management groups
    - policy initiatives (deny risky configs)
    - central logging workspace
    - hub/spoke network + DNS
    - Key Vault baseline
    - budgets + tags
Security checklist
Security
    - PIM + MFA + conditional access
    - managed identities
    - private endpoints for data
    - WAF at edge
    - Defender for Cloud posture backlog
    - Sentinel detection + response playbooks
az CLI basics
    az login
    az account show
    az group create -n rg-demo -l westeurope
    az resource list -g rg-demo -o table
    az monitor metrics list --resource
    az policy assignment list -o table
Ops checklist
Ops
    - SLO dashboards
    - alert routing + on-call
    - runbooks (self-heal)
    - backup restore drills
    - patch cadence
    - postmortem discipline