Search the choice, symptom, architecture fork, or risky change.
Make the Azure decision before you build, expose, scale, or change.
AzureGlossary turns similar-sounding Azure options into decision briefs with use-when guidance, common traps, related terms, command paths, and safe-first next steps.
Use Compare when the question is not “what is this?” but “which choice is safer for this job?”
See the concrete use case for each side, the trap people usually miss, and the evidence to collect.
Jump to related terms, runbooks, or command searches.
Collect discovery output before you modify Azure.
Pick the kind of Azure decision you are making.
97 decision briefs are available, now including certification-friendly Azure tradeoffs. Start broad, then open the decision brief that matches your job.
ACR admin user vs managed identity pull
Should image pulls use a broad registry credential or a scoped Azure identity?
ACR admin user enables broad registry username/password access; managed identity pull uses Azure identity and RBAC for secretless image access.
Use managed identity pull when an Azure workload can authenticate to ACR with scoped role assignment and no shared password.
Use the ACR admin user only for narrow break-glass or legacy scenarios where identity-based pull is not available.
- Managed identity can be scoped and audited through role assignments, and the platform manages its credential rotation.
- ACR admin user is a registry-wide credential and can become a shared secret.
- Kubernetes, Container Apps, App Service, and VM image pulls each have different identity wiring steps.
Leaving the ACR admin user enabled because it fixed a pull failure once, then spreading the same credential across environments.
- Check current registry auth settings and role assignments before enabling or disabling the admin user.
- Verify the workload identity has AcrPull at the right scope.
- Test image pull with a non-production revision, pod, or deployment before production rollout.
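The AcrPull scope check above can be rehearsed offline against exported role assignments. The sketch below is a hypothetical helper, not an Azure SDK or CLI call; it assumes you already have each assignment's role name and scope, and it only models subscription, resource-group, and resource scopes (management groups are ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssignment:
    role: str   # role definition name, e.g. "AcrPull"
    scope: str  # Azure resource ID where the assignment was made


def _norm(resource_id: str) -> str:
    # Azure resource IDs compare case-insensitively.
    return resource_id.lower().rstrip("/")


def covers(scope: str, resource_id: str) -> bool:
    """True if an assignment made at `scope` is inherited by `resource_id`."""
    s, r = _norm(scope), _norm(resource_id)
    # The trailing "/" stops ".../rg-ap" from matching ".../rg-app/...".
    return r == s or r.startswith(s + "/")


def review_acr_pull(assignments, registry_id: str) -> str:
    """Classify an identity's AcrPull coverage: "missing", "exact", or "broad"."""
    matches = [a for a in assignments
               if a.role == "AcrPull" and covers(a.scope, registry_id)]
    if not matches:
        return "missing"
    if any(_norm(a.scope) != _norm(registry_id) for a in matches):
        return "broad"  # inherited from a resource group or subscription
    return "exact"
```

A "broad" result means the pull works, but the same identity can also pull from every other registry under that scope, which deserves a deliberate review.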
ACR tag vs digest
Do you need a human-friendly image label or an immutable reference to exact image content?
A tag is a mutable human-friendly reference; a digest is a content-addressed identifier for an image manifest.
Use a tag when humans and pipelines need a readable release label such as latest, prod, or v1.2.3.
Use a digest when the deployment must prove the exact image bytes that are running.
- Tags can be moved to point at different image content.
- Digests identify immutable image content and are better for evidence, rollback, and supply-chain checks.
- Production investigations should record both the tag used and the resolved digest.
Deploying by a mutable tag and later being unable to prove what image actually ran during an incident.
- Resolve the image digest before rollout and capture it in release evidence.
- Check running pods, revisions, or web app containers for the resolved image reference.
- Avoid using latest for production unless the pipeline records immutable provenance elsewhere.
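To make the tag/digest difference concrete, the sketch below models a registry in memory. It assumes an OCI-style digest, meaning the `sha256:` prefix plus the hex SHA-256 of the exact manifest bytes; `ToyRegistry` is a hypothetical class for illustration, not an ACR API.

```python
import hashlib


def manifest_digest(manifest: bytes) -> str:
    """Content-addressed digest: algorithm prefix plus hex hash of the manifest bytes."""
    return "sha256:" + hashlib.sha256(manifest).hexdigest()


class ToyRegistry:
    """In-memory model: digests are immutable keys, tags are movable pointers."""

    def __init__(self):
        self.manifests = {}  # digest -> manifest bytes
        self.tags = {}       # tag -> digest

    def push(self, manifest: bytes, tag: str) -> str:
        digest = manifest_digest(manifest)
        self.manifests[digest] = manifest
        self.tags[tag] = digest  # silently repoints an existing tag
        return digest

    def resolve(self, ref: str) -> bytes:
        """Resolve "sha256:..." directly; treat anything else as a tag."""
        digest = ref if ref.startswith("sha256:") else self.tags[ref]
        return self.manifests[digest]


reg = ToyRegistry()
first = reg.push(b'{"layers": ["v1"]}', "prod")
second = reg.push(b'{"layers": ["v2"]}', "prod")
```

Re-pushing `prod` moves the tag, but `first` still resolves to the original manifest, which is why release evidence should record the digest and not only the tag.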
Action group vs Alert rule
Are you defining the condition that detects a problem, or the notification/action path after detection?
An alert rule decides when a signal should fire; an action group decides who or what gets notified or triggered.
Use an alert rule to define scope, signal, condition, threshold, evaluation frequency, and severity.
Use an action group to define recipients, webhooks, automation, ITSM, or other actions triggered by one or more alerts.
- Alert rules detect; action groups notify or trigger response.
- One action group can be reused by many alert rules.
- Bad action groups create noise even when alert logic is correct.
Troubleshooting alert noise only by changing thresholds, while the real problem is a shared action group or receiver configuration.
- List alert rules and their linked action groups.
- Confirm receivers, escalation path, and suppression rules before enabling.
- Test notification routing with non-production scope where possible.
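The split between detection and response can be sketched as two objects. `ActionGroup`, `AlertRule`, and `notifications` are hypothetical names, and the rule condition is simplified to a single greater-than threshold on one signal; real alert rules also carry scope, aggregation, evaluation frequency, and severity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionGroup:
    name: str
    receivers: tuple  # e.g. email addresses or webhook names


@dataclass(frozen=True)
class AlertRule:
    name: str
    signal: str
    threshold: float
    action_group: ActionGroup


def notifications(rules, signals: dict) -> Counter:
    """Evaluate each rule (detection), then fan out to its action group (response).

    Returns receiver -> notification count, showing how one shared action
    group concentrates noise on the same receivers.
    """
    sent = Counter()
    for rule in rules:
        value = signals.get(rule.signal)
        if value is not None and value > rule.threshold:  # the rule decides *when*
            for receiver in rule.action_group.receivers:  # the group decides *who*
                sent[receiver] += 1
    return sent
```

When every rule fans out through the same group, two firing rules already send each receiver two notifications; changing receivers or suppression on the group changes noise for all linked rules at once.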
AI Foundry project vs Azure AI services resource
Are you organizing an AI development workspace or controlling the deployed service endpoint that apps call?
AI Foundry projects organize development assets and collaboration; Azure AI services resources expose service endpoints, keys, identity, networking, and billing boundaries.
Use an AI Foundry project when the exam scenario is about building, evaluating, or coordinating model, prompt, agent, and app assets.
Use an Azure AI services resource when the scenario is about endpoint access, keys, network rules, identity, region, quota, or billing.
- A project is a workspace-level organizing concept; a service resource is the deployable Azure control-plane boundary.
- Resources carry keys, endpoints, managed identity, private endpoints, and diagnostic settings.
- Projects help learners reason about app lifecycle, but production security still lives at the resource, identity, and network layers.
Assuming a project setting secures all deployed endpoints. Exam scenarios often separate development workspace choices from resource-level access and network controls.
- Identify whether the question is asking about development organization or runtime endpoint control.
- Check service resource keys, identity, network rules, and deployments before release.
- Open the AI services runbook before changing keys, networking, or model deployments.
AI Search indexer vs manual ingestion
Should the search service crawl/pull from a data source or should your application push indexed documents?
Indexers pull content from supported data sources; manual ingestion pushes prepared documents into an index from app or pipeline code.
Use an indexer when the source is supported and scheduled pull/indexing behavior fits the data refresh pattern.
Use manual ingestion when the content requires custom transforms, event-driven updates, unsupported sources, or strict app-controlled indexing.
- Indexers depend on data-source credentials, skillsets, schedules, and indexer status.
- Manual ingestion shifts transformation, retries, and error handling into app/pipeline code.
- Exam questions often reveal the answer through data-source support, refresh cadence, or required transformations.
Debugging query relevance while ignoring indexer failures or stale source-to-index refresh status.
- Check indexer status and error messages before changing index schema.
- Confirm whether the source is supported by an indexer.
- Use manual ingestion when transformations or event timing require app ownership.
AKS system vs user node pool
Is the node pool meant to keep AKS system components healthy, or to run application workloads?
System node pools host critical cluster components; user node pools are meant for application workloads and specialized capacity.
Use the system node pool for critical system pods and add-ons, such as CoreDNS and metrics-server, that the cluster needs to operate.
Use user node pools for application workloads, scaling profiles, GPU/spot nodes, isolation, or workload-specific sizing.
- System pools protect cluster operations; user pools carry app workload capacity.
- Scaling or tainting the wrong pool can break DNS, ingress, monitoring, or critical add-ons.
- User pools let teams separate cost, autoscale behavior, OS image, SKU, and workload placement.
Running most application pods on the system pool, then making node pool changes that affect cluster reliability.
- List node pools, modes, taints, labels, and current pod placement before changes.
- Check system add-on health and pending pods before scaling or upgrading pools.
- Plan workload movement and surge capacity before changing production node pools.
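AKS guidance suggests tainting a dedicated system pool with `CriticalAddonsOnly=true:NoSchedule` so that only pods tolerating that taint land there. The sketch below is a simplified model of the Kubernetes taint/toleration check, useful for testing placement assumptions before changing pools; it ignores empty-key tolerations, `tolerationSeconds`, node affinity, and resource requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.key != taint.key:
        return False
    if tol.effect and tol.effect != taint.effect:
        return False
    return tol.operator == "Exists" or tol.value == taint.value


def can_schedule(tolerations, taints) -> bool:
    """Simplified rule: every hard (NoSchedule/NoExecute) taint must be tolerated."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


system_pool = [Taint("CriticalAddonsOnly", "true", "NoSchedule")]
user_pool = []
```

An application pod with no tolerations is kept off the tainted system pool but schedules freely on the user pool, while a system add-on that tolerates `CriticalAddonsOnly` can use either.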
AKS Workload Identity vs Managed Identity
Does a Kubernetes workload need its own Azure identity, or is the identity attached at the Azure resource level enough?
AKS Workload Identity federates Kubernetes service accounts to Entra identities; managed identity is the Azure identity used by resources or cluster components.
Use AKS Workload Identity when individual pods or service accounts need scoped Azure access without node-level secrets.
Use managed identity directly when the Azure resource itself can hold the identity and pod-level federation is not needed.
- Workload Identity maps Kubernetes service accounts to Azure identity through federation.
- Managed identity is the Azure identity object, but AKS workloads still need a safe way to assume it.
- The blast radius depends on service account binding, federated credential setup, and assigned Azure roles.
Granting broad identity permissions to the cluster or node path instead of the specific workload that needs access.
- List service accounts, federated credentials, and role assignments before enabling access.
- Use the smallest Azure scope and role that satisfies the workload.
- Test token acquisition and resource access from a non-production pod before rollout.
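The federation check can be modeled as an exact match on three token claims. The sketch assumes the conventional subject format `system:serviceaccount:<namespace>:<service-account>` and the default audience `api://AzureADTokenExchange`; the issuer URL and names used with it are placeholders. It models only claim matching, not signature, expiry, or role checks, and it is not an Entra API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FederatedCredential:
    issuer: str   # the cluster's OIDC issuer URL
    subject: str  # system:serviceaccount:<namespace>:<service-account>
    audiences: tuple = ("api://AzureADTokenExchange",)


def expected_subject(namespace: str, service_account: str) -> str:
    return f"system:serviceaccount:{namespace}:{service_account}"


def token_accepted(claims: dict, credentials) -> bool:
    """Model of the exchange check: iss, sub, and aud must match one credential exactly."""
    aud = claims.get("aud")
    auds = aud if isinstance(aud, list) else [aud]
    return any(
        claims.get("iss") == c.issuer
        and claims.get("sub") == c.subject
        and any(a in c.audiences for a in auds)
        for c in credentials
    )


issuer = "https://oidc.example.invalid/tenant/cluster/"  # placeholder issuer
cred = FederatedCredential(issuer, expected_subject("payments", "payments-api"))
```

A pod running under the namespace's `default` service account fails the match, which is the intended behavior: access follows the specific service account, not the node or cluster.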
API key vs managed identity for AI service access
Should the app call the AI service with a stored key or Azure identity?
API keys are shared secrets for service access; managed identity uses Azure identity and RBAC where supported for secretless access.
Use API keys when the service path or SDK requires key-based access and key storage/rotation is controlled.
Use managed identity when the app and service support identity-based access and you can scope roles without storing secrets.
- Keys are simple but can leak and must be rotated.
- Managed identity reduces secret handling but requires role assignment and supported service integration.
- Exam scenarios often mention Key Vault, app settings, identity, or least privilege as clues.
Putting an AI service key directly in app settings or code because it made the first demo work.
- Check whether the target AI service supports identity-based access for the operation.
- If keys are used, store them in Key Vault and rotate them.
- List role assignments and app identity before changing access.
API Management vs Application Gateway
Do you need API product management and policies, or Layer 7 ingress and WAF routing?
API Management governs APIs with products, subscriptions, operations, and policies; Application Gateway routes web traffic and can apply WAF controls.
Use API Management when the job is publishing APIs, applying API policies, managing subscriptions, versioning, and developer access.
Use Application Gateway when the job is regional HTTP routing, TLS termination, backend pools, path rules, and WAF protection.
- API Management understands APIs, operations, products, subscriptions, policies, and developer portals.
- Application Gateway understands listeners, routing rules, probes, backend pools, and WAF at the application ingress layer.
- They can be used together: gateway for ingress/WAF, APIM for API governance.
Expecting Application Gateway to replace API lifecycle management, or using APIM as a generic WAF/load balancer.
- Draw the request path from client to backend before choosing.
- Check TLS, private networking, WAF, authentication, API policies, and subscription-key needs.
- Review existing routes, products, policies, and backend health before changing exposure.
App Service plan vs Functions hosting plan
Are you hosting a long-running web app/API, or event-driven functions with plan-specific scaling behavior?
App Service plans allocate web app compute; Functions hosting plans determine event-driven scaling, cold-start behavior, networking, and runtime behavior.
Use an App Service plan when hosting web apps, APIs, and containers that need App Service features, scaling, and predictable web hosting behavior.
Use a Functions hosting plan when the workload is event-driven and plan choice affects scale, cold start, networking, and execution limits.
- Compute capacity in an App Service plan is shared by every app in that plan.
- Functions plans include Consumption, Premium, and Dedicated choices with different scaling and networking behavior.
- A Function App can run on an App Service plan, but that is not the same as the Functions Consumption or Premium model.
Treating all Functions deployments as serverless consumption, then missing Premium plan, VNet, and cold-start requirements.
- Review runtime, scaling, networking, and cost requirements.
- List app settings and deployment source before moving plans.
- Use deployment runbooks before production slot or scale changes.
App Service slot swap vs direct production deployment
Can the release be warmed, tested, and swapped, or is a direct production deployment acceptable?
Slot swap stages a release in a deployment slot and swaps traffic; direct production deployment changes the live app path immediately.
Use a deployment slot when you need warm-up, validation, sticky settings, quick rollback, or lower-risk production cutover.
Use direct production deployment only for low-risk changes where rollback, warm-up, and configuration drift are already controlled elsewhere.
- Slot swap changes routing between slots; direct deployment changes the active production app in place.
- Slots need attention to sticky app settings, connection strings, health checks, and warm-up behavior.
- Direct deployment is simpler but leaves less room to catch startup, dependency, or configuration failures before users see them.
Swapping without checking slot-specific settings, then sending production traffic to an app with staging secrets, wrong endpoints, or cold startup failures.
- Compare app settings, connection strings, runtime stack, and health checks between slots.
- Warm and test the target slot before swap; record the rollback path.
- Use Activity Log and deployment logs to capture who changed what and when.
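The slot-setting comparison above can be sketched as a pure function. It assumes the App Service rule that settings marked as deployment slot settings (sticky) stay with their slot while other settings travel with the swapped content; the setting names are hypothetical, and platform settings with special swap handling are not modeled.

```python
def settings_after_swap(source: dict, target: dict, sticky: set) -> dict:
    """App settings the target slot (e.g. production) runs with after a swap.

    Non-sticky settings travel with the swapped content from the source slot;
    sticky (deployment slot) settings stay with the target slot.
    """
    result = {k: v for k, v in source.items() if k not in sticky}
    result.update({k: v for k, v in target.items() if k in sticky})
    return result


def swap_risks(source: dict, target: dict, sticky: set) -> dict:
    """Keys whose target-slot value changes after the swap: key -> (before, after)."""
    after = settings_after_swap(source, target, sticky)
    return {
        k: (target.get(k), after.get(k))
        for k in sorted(set(target) | set(after))
        if target.get(k) != after.get(k)
    }


staging = {"DB_HOST": "db-staging", "FEATURE_X": "on", "APP_VERSION": "2.0"}
prod = {"DB_HOST": "db-prod", "FEATURE_X": "off", "APP_VERSION": "1.9"}
```

Running `swap_risks(staging, prod, sticky=set())` lists every production value that staging would replace, including `DB_HOST`, which should almost certainly have been sticky.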
App Service vs Azure Functions
Is this a continuously hosted web/API app, or event-driven code that should run on triggers?
App Service is application hosting; Azure Functions is event-driven serverless execution with triggers and bindings.
Use App Service for web apps, APIs, containers, or long-running app processes that need a stable hosting surface.
Use Azure Functions when work is trigger-driven, event-based, scheduled, or naturally split into small units of execution.
- App Service centers on sites, plans, slots, runtime stacks, and always-on application hosting.
- Functions centers on triggers, bindings, scale plans, host settings, and execution duration.
- App Service usually gives simpler web-app control; Functions can reduce idle cost but adds trigger, cold-start, and plan-choice concerns.
Choosing Functions for a normal web app because it sounds cheaper, then fighting cold starts, long execution, and trigger complexity.
- Identify request pattern, execution duration, state needs, and deployment workflow first.
- Check runtime stack, scaling plan, VNet/private access, and observability needs before choosing.
- If moving between platforms, map secrets, managed identity, custom domains, and CI/CD separately.
App Service vs Static Web Apps
Do you need a server-side app platform, or mostly static front-end hosting with managed build and routing?
App Service hosts dynamic web apps and APIs on App Service plans; Static Web Apps focuses on static front ends, preview environments, and integrated backend APIs.
Use App Service when the app needs server-side runtime control, custom containers, background processes, slots, or richer networking options.
Use Static Web Apps for static front ends, JAMstack-style sites, branch environments, and simple integrated API/back-end patterns.
- App Service hosts dynamic server workloads; Static Web Apps optimizes static assets, routes, auth, and preview environments.
- App Service exposes more plan, runtime, networking, and deployment-slot controls.
- Static Web Apps can simplify front-end delivery but may not fit complex server-side hosting or custom network requirements.
Putting a dynamic app into Static Web Apps and later discovering that runtime, private networking, or deployment needs belong in App Service.
- Confirm whether the workload serves static files, dynamic server code, APIs, or containers.
- Check custom domain, auth, API, and environment requirements before migrating.
- Review deployment tokens, GitHub workflow, and backend links before production changes.
Application Gateway vs Front Door
Is the entry point regional and VNet-oriented, or global edge-facing?
Application Gateway is regional HTTP load balancing; Front Door is global edge routing with endpoints, origins, routes, caching, and WAF policies.
Use Application Gateway for regional Layer 7 load balancing, private backend integration, and WAF close to the VNet/application region.
Use Front Door for global edge entry, anycast routing, acceleration, global failover, and edge WAF behavior.
- Application Gateway is regional; Front Door is a global edge service.
- Application Gateway often sits near private backends; Front Door fronts global public entry and origin routing.
- DNS, TLS, WAF policy, health probes, and origin exposure differ significantly.
Choosing based only on WAF without deciding where traffic should enter: global edge or regional network boundary.
- Map client locations, origin regions, private/public origin requirements, and failover needs.
- Inspect current listeners, origins/backends, probes, TLS, and WAF policies.
- Test health probe and failover behavior before changing production routing.
Application Security Group vs Network Security Group
Are you grouping application NICs for rule targeting, or defining the actual network allow/deny rules?
An Application Security Group groups VM NICs for rule targeting; an NSG is the actual traffic-filtering control applied to subnets or NICs.
Use an Application Security Group to group VM NICs by application role so NSG rules can target them cleanly.
Use a Network Security Group to define inbound and outbound security rules on subnets or NICs.
- ASGs are labels/targets; NSGs are the rule enforcement point.
- NSGs attach to subnets or NICs and evaluate priority-based rules.
- ASGs can make NSG rules more readable but do not filter traffic by themselves.
Creating ASGs and assuming traffic is filtered without confirming NSG rules reference them.
- List NIC ASG membership and NSG associations before changing rules.
- Check effective security rules on the VM/NIC to see what is actually applied.
- Review rule priority, direction, source, destination, and port before allowing exposure.
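The priority rule above can be modeled in a few lines. This sketch is a simplified, hypothetical evaluator: it treats traffic as inbound from the internet, so of the default rules only `DenyAllInBound` (priority 65500) applies, and it matches only destination port and destination ASG, ignoring source, protocol, and subnet-versus-NIC ordering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int            # lower numbers are evaluated first
    name: str
    port: Optional[int]      # None means any port
    dest_asg: Optional[str]  # None means any destination
    access: str              # "Allow" or "Deny"


DENY_ALL_INBOUND = Rule(65500, "DenyAllInBound", None, None, "Deny")


def evaluate_inbound(rules, nic_asgs: set, port: int):
    """Return (access, rule name) for the first matching rule by priority."""
    for rule in sorted([*rules, DENY_ALL_INBOUND], key=lambda r: r.priority):
        port_ok = rule.port is None or rule.port == port
        # An ASG is only a target label: it matters only when a rule references it.
        dest_ok = rule.dest_asg is None or rule.dest_asg in nic_asgs
        if port_ok and dest_ok:
            return rule.access, rule.name  # first match wins; evaluation stops
    raise RuntimeError("unreachable: DenyAllInBound matches everything")


rules = [
    Rule(100, "allow-https-web", 443, "asg-web", "Allow"),
    Rule(200, "deny-https-all", 443, None, "Deny"),
]
```

A NIC in `asg-db` falls through to the explicit deny because no allow rule references its ASG; ASG membership alone never opens or closes a port.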
Availability zone vs Availability set
Do you need zone-level regional resiliency, or only VM fault/update-domain separation for a single workload tier?
Availability zones separate resources across physically separate datacenter zones in a region; availability sets spread VMs across fault and update domains within a single datacenter.
Use availability zones when supported services need regional-zone separation and the architecture can handle zone-aware networking, storage, and failover behavior.
Use availability sets to group VMs across fault and update domains where zone placement is not required or not supported by the deployment pattern.
- Zones are a regional resiliency feature; availability sets are a VM placement feature.
- Zones usually affect network, storage, and cross-zone latency design; availability sets mostly affect VM fault/update-domain placement.
- Newer architectures often prefer zones where service and region support are available; availability sets still appear in VM-focused scenarios.
Assuming an availability set protects against a full zone failure. It does not provide the same regional-zone isolation as availability zones.
- Check whether the region and target service support zones.
- List VM, disk, public IP, load balancer, and app dependencies before choosing a resiliency model.
- Open related backup and VM runbooks before changing production placement.
Azure Advisor recommendation vs Policy compliance
Are you looking at optimization guidance, or checking compliance against assigned rules?
Advisor recommendations suggest improvements; Policy compliance reports whether resources meet assigned governance rules.
Use Advisor recommendations to find cost, reliability, performance, operational, and security improvement opportunities.
Use Policy compliance when resources must be evaluated against required organizational rules and effects.
- Advisor is recommendation-driven; Policy is assignment/evaluation-driven.
- Advisor findings are not the same as policy non-compliance.
- Policy can deny, audit, append, modify, or deploy settings depending on effect.
Treating Advisor cleanup as mandatory compliance remediation without checking business owner, exemption, and policy requirements.
- Separate recommendation triage from compliance remediation.
- Check resource owner, tag, and impact before acting.
- Use related runbooks to collect evidence before changes.
Azure AI Content Safety vs Prompt Shields
Are you filtering harmful content or defending the prompt/tool flow from malicious instructions?
Content Safety classifies harmful content; Prompt Shields help detect prompt-injection or jailbreak-style attacks against AI systems.
Use Content Safety when the scenario is detecting or filtering harmful user/model content categories.
Use Prompt Shields when the scenario is about prompt injection, jailbreaks, or malicious instructions embedded in user prompts or input documents.
- Content Safety focuses on content category risk; Prompt Shields focus on attack patterns against instructions and grounding.
- Production systems may need both, plus human review and logging.
- Prompt-injection risk becomes more important when agents, tools, or retrieval are involved.
Assuming content moderation alone protects agent tools or system prompts from malicious retrieved text.
- Classify the threat: harmful content, prompt injection, data leakage, or unsafe tool action.
- Log and review blocked/flagged cases before loosening controls.
- Test with adversarial prompts and retrieved content, not only normal happy-path prompts.
Azure AI Search vs Azure OpenAI
Are you trying to retrieve grounded content, generate model output, or combine both in a RAG flow?
AI Search retrieves and ranks grounded content; Azure OpenAI generates, summarizes, reasons over, or transforms content through deployed models.
Use Azure AI Search when the hard problem is indexing, filtering, ranking, or retrieving approved content from your own corpus.
Use Azure OpenAI when the hard problem is generation, summarization, extraction, reasoning, or transformation through a deployed model.
- AI Search owns indexes, analyzers, vector fields, data sources, indexers, and retrieval ranking.
- Azure OpenAI owns model deployments, prompt behavior, token usage, and generation quality.
- Many production designs use both: Search retrieves evidence, then OpenAI produces the answer grounded on that evidence.
Treating Azure OpenAI as a knowledge base. Without retrieval, permissions, and grounding, model output may not reflect your approved data.
- List the data sources, indexes, and model deployments involved before choosing the pattern.
- Test a real query with expected citations or source IDs, not only a demo prompt.
- Check indexing cost, token cost, access controls, and private-network requirements before production rollout.
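The retrieve-then-generate split can be sketched without calling either service. Below, `retrieve` stands in for an AI Search query using naive term overlap, and `grounded_prompt` builds the input a model deployment would receive; both are hypothetical helpers, and no Azure OpenAI call is made. `cited_ids` supports the citation check from the steps above.

```python
import re


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: dict, k: int = 2) -> list:
    """Toy retrieval step: rank document IDs by terms shared with the query."""
    q = tokens(query)
    scored = sorted(((len(q & tokens(text)), doc_id) for doc_id, text in docs.items()),
                    key=lambda s: (-s[0], s[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def grounded_prompt(query: str, docs: dict, ids: list) -> str:
    """Generation input: instructions plus only the retrieved, citable passages."""
    sources = "\n".join(f"[{i}] {docs[i]}" for i in ids)
    return f"Answer using only these sources and cite their IDs.\n{sources}\nQuestion: {query}"


def cited_ids(answer: str) -> set:
    """Source IDs cited as [id] in a generated answer."""
    return set(re.findall(r"\[([^\]]+)\]", answer))


docs = {
    "kb-1": "Reset a VPN gateway from the portal",
    "kb-2": "Rotate storage account keys",
    "kb-3": "VPN gateway SKUs and throughput",
}
```

A release test can then assert that every ID the model cites is in the retrieved set; an answer citing anything else is not grounded on the evidence it was given.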
Azure AI Vision OCR vs Document Intelligence
Are you reading text from images or extracting structured fields from documents?
AI Vision OCR reads text from images; Document Intelligence is built for structured document extraction and document-specific models.
Use AI Vision OCR when the scenario is image text extraction or visual analysis without document-specific field extraction.
Use Document Intelligence when the scenario is forms, invoices, layouts, tables, fields, or document-processing workflows.
- Vision OCR is image-centric; Document Intelligence is document-centric.
- Document Intelligence adds layout, prebuilt models, and custom extraction choices.
- The right answer often depends on whether the exam scenario asks for fields/tables or raw text.
Using generic OCR for document workflows that require field confidence, tables, or model-specific extraction.
- Identify whether the input is an image, document, form, or invoice.
- Check whether raw text is enough or structured fields are required.
- Review endpoint, key, region, and network access before deployment.
Azure budget vs Cost alert
Are you setting planned spend guardrails, or detecting unusual cost movement after it happens?
An Azure budget tracks planned spend and threshold notifications; cost alerts or anomaly signals help detect unexpected changes in actual spend.
Use budgets to define expected spend boundaries and threshold notifications for a scope.
Use cost alerts, anomaly review, Advisor, or cost analysis when you need to investigate unexpected cost changes and drivers.
- Budgets are proactive guardrails; cost anomaly review is investigative.
- Budgets need correct scope and recipients.
- Cost investigation usually needs tags, resource ownership, and usage evidence.
Creating a budget without validating scope, contact path, or the resources that actually drive spend.
- Check budget scope, thresholds, and action groups.
- Review month-to-date cost by service, resource group, and tag.
- Use cost runbooks before deleting or resizing resources.
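Budget thresholds are percentages of the budget amount, evaluated against actual or forecasted cost. The sketch below is illustrative: `run_rate_forecast` is a naive linear projection, not Azure's forecast model, and the amounts are made up.

```python
def run_rate_forecast(month_to_date: float, days_elapsed: int, days_in_month: int) -> float:
    """Naive linear projection of month-end spend (Azure uses its own forecast model)."""
    return month_to_date / days_elapsed * days_in_month


def crossed(amount: float, thresholds_pct, spend: float) -> list:
    """Budget thresholds, as percent of `amount`, that `spend` has reached."""
    return [p for p in sorted(thresholds_pct) if spend >= amount * p / 100]


budget, thresholds = 1000, [80, 50, 100]
actual = 450                                # month-to-date on day 10 of 30
forecast = run_rate_forecast(actual, 10, 30)
```

Spend of 450 against a 1,000 budget crosses no actual-cost threshold, yet the run-rate projection crosses all three, which is the gap forecast-based alerts are meant to close. Neither kind of alert stops spending on its own.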
Azure Firewall vs Network Security Group
Do you need centralized stateful network policy, or subnet/NIC-level traffic filtering?
Azure Firewall is a managed network security service for centralized inspection and policy; an NSG filters traffic at subnet or network-interface scope.
Use Azure Firewall for centralized inspection, rule collections, threat intelligence, routing control, and shared network security policy.
Use Network Security Groups for subnet or NIC allow/deny rules close to workloads.
- Azure Firewall is a managed firewall appliance/service in the network path.
- NSGs are distributed rule sets applied to subnets or NICs.
- Firewall policy often works with route tables; NSG behavior is inspected through effective security rules.
Adding firewall rules while traffic still bypasses the firewall because routes or subnet associations were not checked.
- Map routes, subnets, next hop, and NSG rules before editing firewall policy.
- Check firewall logs and rule collection priority before adding broad allows.
- Use effective routes and effective NSG output to prove the actual path.
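Proving the actual path usually comes down to route selection. Azure picks the route with the longest matching prefix and, when prefixes tie, prefers user-defined routes over BGP routes over system routes. The sketch below models that selection with the standard `ipaddress` module; the `source` labels follow the values shown in effective routes output, and the firewall address 10.0.1.4 is a placeholder.

```python
import ipaddress
from dataclasses import dataclass

# Lower value wins when two routes have the same prefix length.
SOURCE_PREFERENCE = {"User": 0, "VirtualNetworkGateway": 1, "Default": 2}


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str  # e.g. "VirtualAppliance 10.0.1.4", "VnetLocal", "Internet"
    source: str    # "User", "VirtualNetworkGateway" (BGP), or "Default" (system)


def next_hop(routes, destination: str) -> str:
    """Longest matching prefix wins; ties go to user, then BGP, then system routes."""
    ip = ipaddress.ip_address(destination)
    matches = [r for r in routes if ip in ipaddress.ip_network(r.prefix)]
    if not matches:
        raise ValueError(f"no route matches {destination}")
    best = max(matches, key=lambda r: (ipaddress.ip_network(r.prefix).prefixlen,
                                       -SOURCE_PREFERENCE[r.source]))
    return best.next_hop


routes = [
    Route("0.0.0.0/0", "Internet", "Default"),
    Route("10.0.0.0/16", "VnetLocal", "Default"),
    Route("0.0.0.0/0", "VirtualAppliance 10.0.1.4", "User"),
]
```

With only a user-defined `0.0.0.0/0` route, traffic to 10.0.2.5 still follows the more specific system `10.0.0.0/16` route and never reaches the firewall; adding a more specific user route changes that.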
Azure Monitor metrics vs logs
Can a numeric time-series signal answer the question, or do you need queryable event/detail evidence?
Metrics are numeric time-series signals optimized for fast alerting and charts; logs are queryable records that provide richer event and diagnostic context.
Use Azure Monitor metrics for fast numeric signals such as CPU, requests, latency, failures, or capacity counters.
Use logs when you need detailed records, correlation, joins, text search, or custom KQL investigation.
- Metrics are lightweight, structured time series and usually better for fast alerting.
- Logs are richer and queryable but depend on ingestion, retention, schema, and cost.
- Good observability uses metrics for symptoms and logs for explanation.
Sending everything to logs and creating expensive noisy alerts for questions that a simple metric could answer.
- Define the question before choosing signal type: symptom, cause, audit, or evidence.
- Check available metrics, diagnostic settings, tables, and retention before enabling new ingestion.
- Estimate alert noise and ingestion cost before production rollout.
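Metric alerts evaluate an aggregation over a window, so the aggregation choice matters as much as the threshold. The sketch below is a simplified model with per-minute samples and made-up values; real metric alerts also have evaluation frequency, dimensions, and optional dynamic thresholds.

```python
from statistics import mean


def metric_alert_fires(samples, window: int, threshold: float, aggregation=mean) -> list:
    """Evaluate a metric-style rule over each trailing `window` of samples.

    Returns one boolean per evaluation: aggregation(window) > threshold.
    """
    return [aggregation(samples[i - window:i]) > threshold
            for i in range(window, len(samples) + 1)]


cpu = [20, 22, 95, 21, 20, 23]  # percent CPU, one sample per minute
```

A single 95% minute never pushes a 5-minute average above 80, but a maximum aggregation fires on every evaluation that includes it. Explaining why that spike happened is a job for logs.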
Azure Policy vs Management Locks
Do you need to govern what configurations are allowed, or prevent writes/deletes at a scope?
Policy evaluates compliance and can deny or remediate; locks directly restrict delete or update operations at a scope.
Use Azure Policy to audit, deny, append, or remediate resources based on compliance rules.
Use management locks when the immediate requirement is to prevent accidental delete or write operations at a scope.
- Policy evaluates resource configuration and can enforce governance at scale.
- Locks block operations regardless of why the operation was requested.
- Locks can interrupt legitimate automation; policy can block deployments if definitions or exemptions are wrong.
Using a lock as a governance program or using policy to protect a critical resource from accidental deletion when a lock is the direct control.
- Check current policy assignments, exemptions, and locks before changing either control.
- Confirm the target scope and whether automation relies on write/delete operations.
- Record compliance or lock evidence before applying enforcement.
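The difference between lock levels can be modeled as a lookup: `CanNotDelete` blocks deletes, `ReadOnly` blocks writes and deletes, and both are inherited by child resources and apply to every caller regardless of role. The sketch is hypothetical and models operations only as read, write, or delete; in practice some read-like operations are POST requests (listing storage account keys, for example), so a `ReadOnly` lock can block them too.

```python
BLOCKED_BY_LEVEL = {
    "CanNotDelete": {"delete"},
    "ReadOnly": {"write", "delete"},
}


def _covers(scope: str, resource_id: str) -> bool:
    # Locks at a scope are inherited by every child resource.
    s, r = scope.lower().rstrip("/"), resource_id.lower().rstrip("/")
    return r == s or r.startswith(s + "/")


def blocking_locks(operation: str, resource_id: str, locks) -> list:
    """Return the (name, level, scope) locks that block `operation` on `resource_id`.

    The caller's RBAC role is deliberately absent: locks apply to everyone.
    """
    return [(name, level, scope) for name, level, scope in locks
            if _covers(scope, resource_id) and operation in BLOCKED_BY_LEVEL[level]]


rg = "/subscriptions/0000/resourceGroups/rg-prod"
sa = rg + "/providers/Microsoft.Storage/storageAccounts/stprod"
```

A `CanNotDelete` lock on the resource group lets automation keep updating the storage account, while a `ReadOnly` lock on the account blocks those same writes.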
Azure RBAC vs Azure Policy
Is the problem who can act, or what configurations are allowed once someone acts?
RBAC controls who can act; Azure Policy controls what resource states are allowed or remediated.
Use Azure RBAC to control which identities can perform actions at management or data scopes.
Use Azure Policy to control, audit, or remediate resource configuration regardless of which allowed identity deployed it.
- RBAC answers who can perform an operation.
- Policy answers whether the resulting resource state is allowed or compliant.
- Strong governance usually uses both: least-privilege RBAC plus policy guardrails.
Trying to fix broad Owner access with policy alone, or trying to enforce tagging and regions with RBAC alone.
- List role assignments and policy assignments at the same scope before changing access.
- Identify whether the failure is authorization or policy denial.
- Use exemptions and custom roles carefully, with evidence and owner approval.
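The two gates can be modeled in order: authorization first, then policy evaluation of the requested state. The error strings mirror the codes Azure returns for each case (`AuthorizationFailed` and `RequestDisallowedByPolicy`); everything else is a hypothetical simplification, including treating Owner and Contributor as the only write roles and modeling only an allowed-locations rule and a required-tag rule.

```python
WRITE_ROLES = {"Owner", "Contributor"}  # simplification: real RBAC checks role actions


def in_scope(assignment_scope: str, scope: str) -> bool:
    a, s = assignment_scope.lower().rstrip("/"), scope.lower().rstrip("/")
    return s == a or s.startswith(a + "/")


def deploy_decision(principal, scope, resource, assignments, allowed_locations, required_tags):
    """Return which gate a deployment fails, or "Succeeded"."""
    # Gate 1, RBAC: who may act at this scope.
    if not any(p == principal and role in WRITE_ROLES and in_scope(a_scope, scope)
               for p, role, a_scope in assignments):
        return "AuthorizationFailed"
    # Gate 2, Policy: what state is allowed, regardless of who submitted it.
    if resource["location"] not in allowed_locations:
        return "RequestDisallowedByPolicy"
    if not set(required_tags) <= set(resource.get("tags", {})):
        return "RequestDisallowedByPolicy"
    return "Succeeded"


rg = "/subscriptions/0000/resourceGroups/rg-app"
assignments = [
    ("deploy-sp", "Contributor", rg),
    ("reader-user", "Reader", rg),
    ("admin", "Owner", "/subscriptions/0000"),
]
```

An Owner at subscription scope still gets `RequestDisallowedByPolicy` for a disallowed region, which is why broad access and missing guardrails need separate fixes.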
Azure Resource Graph vs Log Analytics KQL
Are you trying to find Azure resources and configuration or analyze collected logs and telemetry?
Resource Graph queries Azure resource inventory/control-plane state; Log Analytics queries collected telemetry, logs, and table data.
Use Resource Graph when the question is inventory, configuration, exposure, ownership, policy, or resource relationships across subscriptions.
Use Log Analytics KQL when the question is runtime events, logs, metrics-derived tables, errors, alerts, or historical telemetry in a workspace.
- Resource Graph is read-only resource inventory; Log Analytics depends on data collection and workspace retention.
- Resource Graph is excellent before changes; Log Analytics is excellent after activity has happened.
- Exam scenarios often reveal the answer by saying “resources configured as” versus “logs/events show.”
Expecting Resource Graph to contain app logs or expecting Log Analytics to know every resource that lacks diagnostic settings.
- Classify the question as inventory/configuration or telemetry/events.
- Use Resource Graph before risky changes to find affected resources.
- Use Log Analytics after diagnostics are enabled and data has been collected.
Azure SQL Database vs SQL Managed Instance
Do you need a managed database surface, or SQL Server instance-level compatibility?
Azure SQL Database is database-scoped PaaS; SQL Managed Instance preserves more instance-level SQL Server compatibility.
Use Azure SQL Database when the unit of management is a database or elastic pool and platform-managed simplicity is the priority.
Use SQL Managed Instance when the workload needs broader SQL Server compatibility, instance-level features, or easier lift-and-shift behavior.
- Azure SQL Database optimizes database-level PaaS management and scaling.
- Managed Instance keeps more instance-level capabilities and network isolation patterns.
- Compatibility, cross-database behavior, SQL Agent needs, networking, and migration path should drive the choice.
Choosing Azure SQL Database for a lift-and-shift workload that quietly depends on instance-level SQL Server features.
- Inventory SQL Server features, cross-database dependencies, jobs, linked servers, and network requirements.
- Check restore, failover, backup, and maintenance expectations before choosing.
- Run a migration compatibility assessment before committing the target service.
Azure SQL vs Cosmos DB
Is the workload relational with SQL-style constraints, or globally distributed NoSQL with partition-first access patterns?
Azure SQL is a managed relational database; Cosmos DB is a globally distributed NoSQL, multi-model database with tunable consistency and throughput.
Use Azure SQL when relational modeling, joins, transactions, SQL compatibility, and structured reporting are central.
Use Cosmos DB when the workload needs globally distributed, low-latency NoSQL access with a well-designed partition key.
- Azure SQL favors relational schema, query consistency, and mature SQL ecosystem behavior.
- Cosmos DB favors partitioned NoSQL scale, multiple APIs, global distribution, and RU-based throughput planning.
- The Cosmos DB partition key and access pattern are architecture decisions, not afterthoughts.
Choosing Cosmos DB for scale without proving the partition key, query pattern, and RU cost model.
- Map entities, query patterns, transaction boundaries, and data growth before choosing.
- Estimate throughput, indexing, region, and consistency requirements.
- Prototype the highest-volume query path and measure latency and cost.
Backup vault vs Recovery Services vault
Which vault type matches the workload you need to protect and the recovery operation you must prove?
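Backup vault serves workloads on Azure Backup's newer data-protection model, such as Azure Disks, Azure Blobs, and Azure Database for PostgreSQL; Recovery Services vault serves established workloads such as Azure VMs and Azure Files, as well as Azure Site Recovery.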
Use Backup vault for workloads supported by Azure Backup's newer data-protection model and policy flow.
Use Recovery Services vault where the workload, migration state, or recovery feature set still depends on that vault model.
- Vault choice is workload-driven; not every protected item can move freely between vault types.
- Backup policies, recovery points, soft delete, and cross-region behavior must be checked per vault.
- Operational teams need to know which vault owns the protected item before testing restore.
Creating a vault because the name sounds right, then discovering the workload or restore flow is supported by the other vault type.
- List protected items, policies, and recent backup jobs before changing vault setup.
- Confirm restore requirements with the application or data owner, not only the infrastructure team.
- Run a non-destructive restore-readiness review before any production recovery action.
Bicep vs ARM Template
Do you want cleaner infrastructure authoring, or raw ARM JSON compatibility for generated or legacy templates?
Bicep is a domain-specific language that compiles to ARM template JSON; ARM templates are the JSON deployment representation.
Use Bicep when humans will author and maintain the Azure infrastructure code.
Use ARM templates directly when you already receive generated JSON, need exact low-level representation, or maintain legacy deployments.
- Bicep is a higher-level authoring language that compiles to ARM JSON.
- ARM templates are the underlying deployment format, but they are more verbose and harder to review by hand.
- Both deploy through ARM, so what-if, scope, parameters, and deployment mode still matter.
Assuming Bicep makes a deployment safe. The same wrong scope, parameters, or complete-mode deployment can still create or delete resources.
- Run what-if at the correct scope before applying either format.
- Review parameters, deployment mode, and target scope with the resource owner.
- Check deployment history and operation errors before rerunning a failed deployment.
Blob Storage vs Azure Files
Does the workload need object storage, or a managed file share protocol?
Blob Storage is object storage; Azure Files provides managed SMB/NFS file shares.
Use Blob Storage for object storage, HTTP/REST access, data lakes, backups, media, logs, and application objects.
Use Azure Files when applications or users need SMB/NFS-style shared file access with directory semantics.
- Blob is object storage; Azure Files is managed file share storage.
- Azure Files has share quota, protocol, identity, mount, and snapshot considerations.
- Blob has containers, tiers, lifecycle, versioning, immutability, and SAS/public access considerations.
Using Azure Files as a general object store or Blob Storage as a drop-in replacement for a mounted file share.
- Confirm required access protocol, permissions, performance, backup, and restore behavior.
- Check public access, SAS/key usage, private endpoints, and lifecycle rules before exposing data.
- Test client access from the actual network and identity path.
Blob Storage vs Data Lake Storage Gen2
Do you need general object storage, or hierarchical namespace and analytics-oriented lake behavior?
Blob Storage is object storage; Data Lake Storage Gen2 adds a hierarchical namespace and POSIX-like access control lists (ACLs) for analytics workloads.
Use Blob Storage for general object workloads where containers, blobs, tiers, and lifecycle policy are the main concerns.
Use Data Lake Storage Gen2 when analytics workloads need hierarchical namespace, directory semantics, and lake-oriented access patterns.
- Data Lake Storage Gen2 is built on storage accounts with hierarchical namespace enabled.
- Hierarchical namespace changes directory operations, ACL behavior, and analytics integration.
- The choice affects access control, tooling, migration, and analytics performance patterns.
Enabling or choosing lake behavior without checking how existing apps, permissions, and tools interact with hierarchical namespace.
- Identify analytics engines, access methods, ACL needs, and migration expectations.
- Check storage account hierarchical namespace, private access, and identity model.
- Test representative read/write/list operations before migrating production data.
Container Apps vs AKS
Do you need a managed container app platform, or full Kubernetes cluster control?
Container Apps hides most orchestration and infrastructure; AKS gives Kubernetes-level control for platform teams that need it.
Use Container Apps when you want revisions, ingress, scale rules, and managed container hosting without operating a Kubernetes cluster.
Use AKS when you need Kubernetes APIs, node pools, custom networking, cluster add-ons, or deep platform control.
- Container Apps abstracts much of the cluster and gives app-level revisions and scaling.
- AKS exposes the Kubernetes control surface and requires cluster, node, network, and upgrade ownership.
- The right choice depends on operational ownership as much as application architecture.
Choosing AKS for every container workload before confirming the team is ready to operate Kubernetes safely.
- Identify required ingress, private networking, secrets, registry access, scale rules, and observability.
- Check whether the team needs Kubernetes-native APIs or just managed container deployment.
- Compare production change workflow: Container Apps revision rollout vs AKS manifests, ingress, and node upgrades.
Container Apps vs Container Instances
Is this a managed app with ingress, revisions, and scale rules, or a simple container run?
Container Apps is an application platform with revisions, ingress, Dapr, jobs, and scale rules; Container Instances is simpler container execution.
Use Container Apps for long-running services, APIs, event-driven apps, revisions, traffic splitting, and managed scaling.
Use Container Instances for simple container groups, short-lived jobs, quick tests, or cases where app-platform features are unnecessary.
- Container Apps provides app platform behavior such as revisions, ingress, environment, and scale rules.
- Container Instances is lower-level and simpler for direct container execution.
- Secrets, registry auth, networking, and logs are managed differently across the two services.
Building a production service on Container Instances and then recreating app-platform features manually.
- Confirm whether the workload needs ingress, revision rollback, autoscaling, or service discovery.
- Check image pull, secrets, environment variables, and networking before choosing.
- For production services, document rollout and rollback behavior before deployment.
Control Plane vs Data Plane
Are you managing the Azure resource itself, or accessing the workload/data inside that resource?
The control plane manages resource configuration; the data plane accesses or operates service data.
Think control plane when the action creates, configures, scales, secures, or deletes Azure resources.
Think data plane when the action reads, writes, processes, queries, or serves the data/application inside the resource.
- Control-plane permissions usually flow through Azure Resource Manager and RBAC at scope.
- Data-plane permissions may use service-specific roles, keys, tokens, connection strings, or network rules.
- A user can have control-plane access without data-plane access, or the reverse, depending on service design.
Granting Contributor and assuming that automatically allows reading secrets, blobs, databases, or messages.
- Identify whether the requested action is resource management or data access.
- Check both Azure RBAC and service-specific access controls before changing permissions.
- Collect the exact error message because authorization failures often reveal the missing plane.
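Azure role definitions list control-plane operations under Actions and data-plane operations under DataActions, which is why a broad management role can still lack data access. The sketch below shows that split using two hypothetical, heavily simplified role definitions (real definitions also have NotActions, NotDataActions, and assignable scopes); the `grants` helper is illustrative, not an Azure API.

```python
from fnmatch import fnmatchcase

# Hypothetical, simplified role definitions. Real Azure role definitions also
# carry NotActions/NotDataActions and assignable scopes; read the actual
# definition (for example with `az role definition list --name "<role>"`).
ROLES = {
    "Contributor (simplified)": {
        "actions": ["*"],    # every control-plane operation
        "data_actions": [],  # no data-plane operations
    },
    "Storage Blob Data Reader (simplified)": {
        "actions": ["Microsoft.Storage/storageAccounts/blobServices/containers/read"],
        "data_actions": ["Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read"],
    },
}

BLOB_READ = "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read"


def grants(role_name, operation, plane):
    """Return True if the role grants `operation` on the 'control' or 'data' plane."""
    if plane not in ("control", "data"):
        raise ValueError("plane must be 'control' or 'data'")
    key = "actions" if plane == "control" else "data_actions"
    op = operation.lower()  # operation strings compare case-insensitively
    # '*' acts as a wildcard; fnmatchcase on lowercased strings mimics that matching.
    return any(fnmatchcase(op, pattern.lower()) for pattern in ROLES[role_name][key])


for name in ROLES:
    print(f"{name}: data-plane blob read = {grants(name, BLOB_READ, 'data')}")
```

One caveat worth checking in real tenants: Contributor's wildcard includes the control-plane action that lists storage account keys, so it can still reach blob data through shared keys unless shared key access is disabled on the account.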
Conversational Language Understanding vs custom text classification
Is the app interpreting user intent in conversation or classifying text/documents into categories?
CLU predicts conversational intents/entities; custom text classification assigns documents or text into categories.
Use CLU when the app needs intents, entities, utterances, and conversational routing.
Use custom text classification when the app needs to label text or documents into categories without conversational intent routing.
- CLU is dialog/intent oriented; classification is category-label oriented.
- CLU design depends on utterances, intents, and entities.
- Classification design depends on labeled examples and category boundaries.
Using a conversational intent model for every NLP problem even when the requirement is document classification.
- Read the requirement for intents, entities, categories, and expected output.
- Choose sample data that mirrors real user utterances or documents.
- Check endpoint, deployment, and confidence thresholds before production.
Cosmos DB database vs container
Are you defining a namespace and throughput boundary, or the item collection with partitioning and indexing?
A Cosmos DB database is a logical namespace; a container is the unit where items, partitioning, indexing, and throughput are usually applied.
Use the Cosmos DB database as the logical grouping and, when configured, a shared throughput boundary for containers.
Use the container as the place where items live, partition keys are enforced, indexing is configured, and most query behavior is shaped.
- Databases organize containers and can hold shared throughput in some designs.
- Containers hold items and define partition key, indexing policy, TTL, and throughput if not shared.
- Most performance and cost problems show up at the container and partition-key level.
Treating the database as the main scale boundary and ignoring the container partition key that actually governs item distribution.
- List databases, containers, throughput mode, and partition keys before schema changes.
- Check hot partitions, indexing policy, and RU consumption at the container level.
- Do not change partitioning strategy without a migration plan.
Cosmos DB logical vs physical partition
Are you reasoning about your partition key value, or the backend partitions Azure uses to serve data?
Logical partitions group items by partition key value; physical partitions are internal storage and throughput partitions managed by Cosmos DB.
Use logical partition thinking when designing item distribution, partition key values, and query patterns.
Use physical partition thinking when investigating throughput distribution, hot partitions, storage growth, and backend scaling behavior.
- Logical partitions are based on your partition key values.
- Physical partitions are Azure-managed infrastructure that host logical partitions.
- Hot logical partitions can cause throttling even when total provisioned throughput looks sufficient.
Looking only at total RU/s and missing that one partition key value is carrying most of the traffic.
- Inspect partition key design, high-volume keys, RU consumption, and throttling evidence.
- Review queries that do not include the partition key.
- Do not change partition key strategy without a data migration plan.
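Provisioned throughput is divided evenly across a container's physical partitions, so one busy partition can throttle while the container as a whole has headroom. This is a minimal arithmetic sketch, assuming you already have per-key peak RU/s (for example from diagnostic logs) and know which physical partition hosts each key; the demand figures, key names, and placement are hypothetical, since Cosmos DB assigns keys to physical partitions internally.

```python
def find_hot_partitions(total_rus, physical_partitions, key_demand, key_to_partition):
    """Return (per-partition RU/s budget, {partition: demand} for partitions over budget)."""
    # Provisioned throughput is split evenly across physical partitions.
    budget = total_rus / physical_partitions
    load = {}
    for key, rus in key_demand.items():
        partition = key_to_partition[key]
        load[partition] = load.get(partition, 0) + rus
    hot = {p: demand for p, demand in load.items() if demand > budget}
    return budget, hot


# Hypothetical peak demand per partition key value (RU/s) and key placement.
demand = {"tenant-a": 9_000, "tenant-d": 2_500, "tenant-b": 1_500, "tenant-c": 1_500}
placement = {"tenant-a": "p0", "tenant-d": "p0", "tenant-b": "p1", "tenant-c": "p1"}

budget, hot = find_hot_partitions(20_000, 2, demand, placement)
print(f"total demand {sum(demand.values())} of 20000 RU/s provisioned")
print(f"per-partition budget {budget}, over budget: {hot}")
```

Total demand here is 14,500 RU/s against 20,000 provisioned, yet partition `p0` needs 11,500 against a 10,000 budget, which is the pattern the trap above describes.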
Cosmos DB manual vs autoscale throughput
Is demand predictable enough for fixed RU/s, or variable enough to pay for autoscale headroom?
Manual throughput fixes provisioned RU/s; autoscale adjusts RU/s within a configured range as workload demand changes.
Use manual throughput when workload demand is steady, understood, and managed against a fixed RU/s target.
Use autoscale throughput when traffic varies and you need the service to scale within a configured maximum.
- Manual throughput gives predictable capacity and cost when usage is stable.
- Autoscale can absorb bursts but requires careful max RU/s selection to avoid surprise spend.
- Both modes still depend on partition key quality and query efficiency.
Switching to autoscale to hide inefficient queries or hot partitions instead of fixing the access pattern.
- Review RU consumption, throttling, hot partitions, and peak windows before switching modes.
- Set autoscale max based on measured demand, not guesswork.
- Capture before/after metrics after changing throughput mode.
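Autoscale scales between 10% of the configured maximum and the maximum, and each hour is billed on the highest RU/s it scaled to. The sketch compares both modes in RU/s-hours expressed at the manual rate, assuming a single-region-write account where autoscale RU/s cost 1.5 times the manual rate (verify against current pricing for your account type). The hourly peak profiles are made up for illustration.

```python
AUTOSCALE_MULTIPLIER = 1.5  # assumption: autoscale rate vs manual rate; verify for your account type


def manual_units(provisioned_rus, hours):
    """Billed RU/s-hours for fixed manual throughput."""
    return provisioned_rus * hours


def autoscale_units(max_rus, hourly_peaks):
    """Billed RU/s-hours (in manual-rate units) for autoscale with a given max."""
    floor = max_rus / 10  # autoscale never scales below 10% of the configured max
    total = 0.0
    for peak in hourly_peaks:
        # Each hour bills the highest RU/s reached, clamped to [10% of max, max].
        # Demand above max is throttled, not billed.
        billed = min(max(peak, floor), max_rus)
        total += billed * AUTOSCALE_MULTIPLIER
    return total


bursty = [1_000] * 20 + [10_000] * 4  # 20 quiet hours, 4 busy hours
steady = [8_000] * 24

print("bursty:", manual_units(10_000, 24), "vs", autoscale_units(10_000, bursty))
print("steady:", manual_units(8_000, 24), "vs", autoscale_units(8_000, steady))
```

With these inputs autoscale costs less for the bursty day (90,000 vs 240,000 units) and more for the steady day (288,000 vs 192,000), matching the use-when guidance above.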
Cosmos DB serverless vs Provisioned throughput
Is the workload intermittent and unpredictable, or does it need predictable reserved throughput and scale controls?
Serverless charges for request units consumed; provisioned throughput reserves RU/s capacity at database or container scope.
Use serverless when usage is intermittent, smaller, or early-stage and you want consumption-based billing without managing RU/s.
Use provisioned throughput when workloads need predictable RU/s, autoscale/manual controls, or production capacity planning.
- Serverless removes RU/s planning but has its own limits (for example, a serverless account runs in a single Azure region) and a different cost shape.
- Provisioned throughput requires partition and RU planning.
- Autoscale is still provisioned throughput, billed hourly on the highest RU/s it scaled to; it is not serverless per-request billing.
Choosing serverless for a workload that needs predictable sustained throughput, then being surprised by limits or cost shape.
- Estimate workload request patterns before choosing mode.
- Review partition key and hot partition risk.
- Use Cosmos runbooks before scaling or changing backup/failover settings.
Data Factory vs Synapse Pipelines
Do you need standalone data integration orchestration, or orchestration inside a Synapse analytics workspace?
Data Factory focuses on data integration; Synapse pipelines bring similar orchestration into an analytics workspace.
Use Data Factory when the primary job is data movement, transformation orchestration, integration runtime, and pipeline scheduling across services.
Use Synapse Pipelines when orchestration is part of a Synapse workspace with SQL pools, Spark, and analytics development in the same environment.
- Data Factory is the standalone integration service and often fits cross-platform orchestration.
- Synapse Pipelines share workspace context with Synapse SQL, Spark, linked services, and workspace security.
- Integration runtime, private endpoints, triggers, and rerun behavior still need safe review in either service.
Choosing based on UI similarity and overlooking workspace governance, runtime networking, and who owns reruns after failures.
- List pipelines, triggers, linked services, and integration runtimes before migration or rerun.
- Confirm whether the pipeline depends on Synapse workspace assets.
- Capture failed activity output before changing triggers or rerunning production pipelines.
Dedicated SQL Pool vs Serverless SQL Pool
Do you need provisioned warehouse capacity, or on-demand SQL over lake data?
Dedicated SQL pools reserve data warehouse capacity; serverless SQL pools query lake data on demand.
Use a dedicated SQL pool when workloads need provisioned MPP warehouse capacity, predictable performance, and controlled loading/query patterns.
Use a serverless SQL pool for exploratory or on-demand querying over files in the lake without provisioning a warehouse.
- Dedicated pools have allocated capacity and require pause/resume, scale, distribution, and workload management decisions.
- Serverless pools charge by data processed per query and depend heavily on file layout, format, and query efficiency.
- Dedicated is a data warehouse choice; serverless is often an exploration, external table, or lightweight query choice.
Running broad serverless queries over poorly partitioned data and being surprised by performance or scan cost.
- Confirm workload pattern, concurrency, data size, and performance objective.
- Review pool state, SKU, pause/resume behavior, and query history before changes.
- Test representative queries against real file layout before committing.
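Because serverless SQL pools bill by data processed, file layout directly shapes cost. This back-of-envelope sketch estimates billed volume with and without folder pruning, assuming data processed roughly equals data read, rounded up to whole MB with the 10 MB per-query minimum; the folder sizes and query counts are hypothetical.

```python
import math

MIN_MB_PER_QUERY = 10  # serverless SQL pool bills at least 10 MB of data processed per query


def tb_processed(gb_read_per_query, queries):
    """Approximate billed TB for a batch of identical queries (binary units)."""
    mb_per_query = max(math.ceil(gb_read_per_query * 1024), MIN_MB_PER_QUERY)
    return mb_per_query * queries / 1024 / 1024


DAILY_FOLDER_GB = 2  # hypothetical: one Parquet folder per day, 2 GB each
DAYS_OF_DATA = 365
QUERIES = 100 * 30   # 100 queries a day for a month

full_scan = tb_processed(DAILY_FOLDER_GB * DAYS_OF_DATA, QUERIES)  # no folder pruning
one_day = tb_processed(DAILY_FOLDER_GB, QUERIES)  # query filters on the date folder

print(f"full scan: {full_scan:.1f} TB, pruned to one day: {one_day:.2f} TB")
```

The 365x gap only materializes if queries actually prune, for example by filtering on folder names with `filepath()` in `OPENROWSET` queries, which is why testing against the real layout matters.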
Delegated vs Application Permission
Should the app act as the signed-in user, or act on its own without a user present?
Delegated permission acts with a signed-in user; application permission lets an app act as itself without a user.
Use delegated permissions when the app should operate in the context of a signed-in user and respect that user's access.
Use application permissions for daemon, service, or automation scenarios where the app acts as itself.
- Delegated permission combines app consent with the user's identity and effective access.
- Application permission can operate without a user and often has broader tenant or resource impact.
- Admin consent, token claims, and audit evidence differ between the two models.
Granting application permission to solve a user-flow issue, creating an app-only identity with more access than intended.
- Confirm whether a user is present and whether the app should inherit user restrictions.
- Review app registration API permissions and granted consent before changing scopes.
- Collect sign-in/audit logs after consent changes.
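Access tokens from the Microsoft identity platform show which model is in play: delegated tokens carry consented scopes in the `scp` claim, while app-only tokens carry application permissions in the `roles` claim and have no `scp`. The sketch decodes a token payload for inspection only (it does not validate the signature) and classifies it; the demo tokens are unsigned and made up.

```python
import base64
import json


def decode_claims(token):
    """Decode a JWT payload WITHOUT verifying its signature; for inspection only."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # JWTs drop base64url padding; restore it
    return json.loads(base64.urlsafe_b64decode(payload))


def permission_model(claims):
    """Classify an access token as delegated or app-only from its claims."""
    if "scp" in claims:    # delegated tokens carry the consented scopes in scp
        return "delegated"
    if "roles" in claims:  # app-only tokens carry application permissions in roles
        return "application"
    return "unknown"


def _b64url(obj):
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Made-up, unsigned demo tokens; never accept unsigned tokens in real code.
user_token = f"{_b64url({'alg': 'none'})}.{_b64url({'scp': 'User.Read'})}."
app_token = f"{_b64url({'alg': 'none'})}.{_b64url({'roles': ['User.Read.All']})}."

print(permission_model(decode_claims(user_token)))  # delegated
print(permission_model(decode_claims(app_token)))   # application
```

`scp` is checked first because a delegated token can also include `roles` when the signed-in user holds app roles.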
Diagnostic setting vs Activity Log
Are you collecting resource telemetry, or investigating control-plane changes in Azure?
Activity Log records subscription-level control-plane events; diagnostic settings route selected platform logs and metrics to destinations like Log Analytics, Storage, or Event Hubs.
Use diagnostic settings to route supported resource logs and metrics to a destination such as Log Analytics, Event Hubs, or Storage.
Use Activity Log when the question is who changed Azure resources, when, and whether the control-plane operation succeeded.
- Diagnostic settings are configured per resource or scope to export telemetry categories.
- Activity Log records Azure control-plane operations at subscription scope.
- Diagnostic settings can increase ingestion cost; Activity Log does not record data-plane operations such as blob reads or database queries.
Looking for application or data-plane logs in Activity Log, or enabling every diagnostic category without cost review.
- Identify whether the evidence needed is control-plane change, resource log, metric, or application telemetry.
- List current diagnostic settings and destinations before enabling more categories.
- Capture Activity Log around the incident window before changing resources.
Diagnostic settings vs data collection rule
Are you routing Azure platform logs from a resource or configuring an agent/data collection pipeline?
Diagnostic settings route platform logs and metrics from resources; data collection rules define how monitoring agents and collection pipelines gather telemetry.
Use diagnostic settings when the scenario is sending resource platform logs/metrics to Log Analytics, Event Hubs, or Storage.
Use a data collection rule when the scenario is Azure Monitor Agent, VM/guest telemetry, collection transforms, or DCR associations.
- Diagnostic settings are attached to Azure resources and categories.
- DCRs are agent/pipeline configuration objects with associations.
- Both can affect cost and data volume, so exam scenarios often include destination and collection scope clues.
Enabling all log categories everywhere without estimating ingestion cost or retention impact.
- List supported diagnostic categories first.
- Check Log Analytics retention and table cost before turning on verbose logs.
- Verify DCR associations when troubleshooting missing VM telemetry.
Document Intelligence layout model vs prebuilt model
Do you need generic document structure or fields from a known document type?
Layout extracts structure from documents; prebuilt models target known document types and common fields.
Use layout when the task is extracting text, tables, selection marks, and structure across varied documents.
Use a prebuilt model when the document matches a supported type such as an invoice, receipt, ID document, or tax form.
- Layout is broader and structure-oriented; prebuilt models are targeted and field-oriented.
- Prebuilt models reduce custom training when the document type matches.
- Layout is often a step before deciding whether custom extraction is needed.
Training a custom model before testing whether layout or a prebuilt model already solves the extraction need.
- Identify the document type and fields required.
- Try layout/prebuilt extraction with sample documents.
- Use custom extraction only when the generic or prebuilt path misses required fields.
DTU vs vCore
Do you want a simplified blended sizing model, or more transparent compute, memory, and licensing control?
DTU is a bundled performance model; vCore separates compute, memory, and storage choices more explicitly.
Use DTU when a simple bundled performance model is enough and the workload is not asking for detailed hardware or license control.
Use vCore when you need clearer CPU/memory/storage sizing, hardware family choice, reserved capacity, or Azure Hybrid Benefit planning.
- DTU bundles CPU, memory, and IO into a single abstract performance unit.
- vCore exposes compute and storage choices more directly and aligns better with SQL Server licensing conversations.
- Migration between models should be tested with workload metrics, not only pricing tables.
Changing purchasing model to reduce cost without checking CPU, IO, storage, and query wait evidence.
- Review current database metrics, wait stats, storage, and service objective.
- Estimate cost with production workload peaks and licensing assumptions.
- Test the target tier before changing a production database.
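Microsoft's DTU-to-vCore migration guidance offers a rule of thumb for a first sizing guess: roughly 100 Standard-tier DTUs per vCore (General Purpose) and 125 Premium-tier DTUs per vCore (Business Critical). The sketch encodes that rule as an assumption to verify against current docs; it is a starting point for the metrics-based testing above, not a sizing answer.

```python
import math

# Rule-of-thumb starting points from the DTU-to-vCore migration guidance
# (assumption: verify against current docs and your own metrics).
DTUS_PER_VCORE = {"standard": 100, "premium": 125}
TARGET_TIER = {"standard": "General Purpose", "premium": "Business Critical"}


def starting_vcores(dtu_tier, dtus):
    """Return (vCore service tier, minimum vCores) as a first sizing guess."""
    tier = dtu_tier.lower()
    return TARGET_TIER[tier], math.ceil(dtus / DTUS_PER_VCORE[tier])


print(starting_vcores("Standard", 400))  # ('General Purpose', 4)
print(starting_vcores("Premium", 250))   # ('Business Critical', 2)
```

The smallest orderable vCore size for a given hardware generation may be larger than this estimate, so round up to an available service objective before testing.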
Event Grid vs Event Hubs
Are you routing discrete events to handlers, or ingesting a high-volume event stream?
Event Grid routes discrete events; Event Hubs ingests high-volume streams.
Use Event Grid when producers emit discrete events that subscribers should react to through filtering and delivery semantics.
Use Event Hubs when the workload ingests telemetry, logs, or ordered streams at high throughput for consumers to process.
- Event Grid is event routing with subscriptions, filters, retry, and dead-letter behavior.
- Event Hubs is streaming ingestion with partitions, consumer groups, throughput, and capture options.
- Event Grid pushes events to handlers; Event Hubs is read by consumers from a stream.
Using Event Grid as a telemetry stream or Event Hubs as a simple notification router without checking delivery and consumer semantics.
- Define event volume, ordering needs, subscriber count, retry behavior, and dead-letter requirements.
- Inspect existing subscriptions, filters, consumer groups, and throughput before changes.
- Test failure handling with a non-production event before changing production routing.
Event Hubs vs Service Bus
Is the workload a telemetry/event stream, or brokered business messaging?
Event Hubs is high-throughput streaming; Service Bus is enterprise brokered messaging for commands, workflows, and business transactions.
Use Event Hubs for high-throughput event ingestion where consumers read from partitions and handle stream processing.
Use Service Bus when messages need broker features such as queues/topics, locks, sessions, dead-lettering, and business process reliability.
- Event Hubs optimizes stream ingestion and multiple consumer groups.
- Service Bus optimizes commands, work items, pub/sub messages, and delivery guarantees.
- Ordering, replay, lock duration, sessions, and dead-letter handling are key decision points.
Choosing Event Hubs for work queues that need per-message settlement, retries, sessions, or dead-letter operations.
- Clarify whether consumers process streams or individual business messages.
- Check ordering, retry, dead-letter, replay, and throughput requirements.
- Inspect current lag, dead-letter count, and consumer behavior before migration.
Event Hubs vs Service Bus vs Event Grid
Do you need stream ingestion, brokered messaging, or event routing?
Event Hubs is for high-throughput event streams, Service Bus is for brokered enterprise messaging, and Event Grid is for reactive event routing.
Use Event Hubs when the core need is high-volume stream ingestion and consumer-group processing.
Use Service Bus for durable business messaging or Event Grid for event notification and routing to handlers.
- Event Hubs is the stream; Service Bus is the broker; Event Grid is the event router.
- Consumer behavior differs: stream readers, message receivers, or event handlers.
- Retry, ordering, dead-letter, filtering, and replay expectations should decide the service.
Picking the service with the familiar name instead of writing down delivery semantics and failure behavior.
- Classify the workload as telemetry stream, work queue, pub/sub message, or event notification.
- Document ordering, replay, dead-letter, retry, and fan-out needs.
- Run a small failure-mode test before changing integration routes.
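Writing the requirements down first is the core advice here. The sketch turns that into a small, hypothetical rubric: each named requirement points at the service whose core model provides it, and the services are ranked by how many requirements they cover. The requirement names and mapping are illustrative choices, not an Azure taxonomy.

```python
from collections import Counter

# Hypothetical rubric: each requirement points at the service whose core model provides it.
REQUIREMENT_TO_SERVICE = {
    "high_throughput_ingestion": "Event Hubs",
    "replay_from_offset": "Event Hubs",
    "multiple_consumer_groups": "Event Hubs",
    "per_message_lock_and_settle": "Service Bus",
    "sessions_for_ordered_work": "Service Bus",
    "queue_or_topic_work_items": "Service Bus",
    "push_to_handlers": "Event Grid",
    "event_type_filtering": "Event Grid",
    "react_to_azure_resource_events": "Event Grid",
}


def rank_services(requirements):
    """Return [(service, votes), ...] ordered by how many requirements each service covers."""
    unknown = set(requirements) - set(REQUIREMENT_TO_SERVICE)
    if unknown:
        raise ValueError(f"unmapped requirements: {sorted(unknown)}")
    return Counter(REQUIREMENT_TO_SERVICE[r] for r in requirements).most_common()


print(rank_services(["per_message_lock_and_settle", "sessions_for_ordered_work", "push_to_handlers"]))
# [('Service Bus', 2), ('Event Grid', 1)]
```

A mixed result can be legitimate: many designs combine services, for example Event Grid notifying a handler that enqueues work on Service Bus.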
ExpressRoute vs Site-to-site VPN
Do you need private provider-backed hybrid connectivity, or is encrypted internet VPN connectivity enough?
ExpressRoute provides private connectivity through a connectivity provider; site-to-site VPN uses encrypted tunnels over the public internet.
Use ExpressRoute when requirements call for private connectivity, predictable performance, provider integration, or enterprise hybrid network architecture.
Use site-to-site VPN when encrypted connectivity over the internet is acceptable, the environment is smaller, or VPN is needed as a backup path.
- ExpressRoute is private provider connectivity, not encrypted internet tunneling by default.
- VPN is usually faster to set up but offers less predictable latency and throughput than a private circuit.
- Many enterprise designs use both: ExpressRoute primary, VPN backup.
Treating ExpressRoute as automatically encrypted end-to-end or ignoring VPN backup and route failover behavior.
- Document latency, bandwidth, encryption, and failover requirements.
- Check route propagation and gateway SKU choices before deployment.
- Use network discovery before changing hybrid routing.
Foundry agent vs chat completion
Does the solution need tool-using, stateful task orchestration or a direct model response to a prompt?
Agents coordinate tools, memory, orchestration, and multi-step tasks; chat completions produce model responses for direct prompt-and-response interactions.
Use an agent when the scenario requires tools, actions, retrieval, multi-turn task state, or workflow-like behavior.
Use chat completion when the scenario is a direct prompt, system message, grounding payload, and model response without managed agent orchestration.
- Agents add orchestration concepts that must be secured and monitored beyond the model call.
- Chat completion remains simpler and easier to reason about for stateless prompt/response flows.
- Agent scenarios usually require more attention to tool permissions, data boundaries, and human approval points.
Using an agent for every generative AI scenario. On exams, the simpler model call is often enough when no tool orchestration is required.
- List tools, data sources, and actions the agent can call.
- Separate model output risk from tool-execution risk.
- Check identity and approval boundaries before allowing an agent to change data or call APIs.
Front Door WAF vs Application Gateway WAF
Should web protection happen at the global edge or at a regional application gateway?
Front Door WAF protects global edge traffic; Application Gateway WAF protects regional application gateway traffic inside a regional design.
Use Front Door WAF when traffic enters through global edge routing and needs edge-level policy before reaching origins.
Use Application Gateway WAF when traffic is regional, VNet-oriented, or protected at the application gateway layer.
- Front Door WAF applies at the global edge in front of Front Door routes.
- Application Gateway WAF applies regionally on Application Gateway listeners.
- Policy mode, exclusions, managed rules, custom rules, and origin/back-end exposure must be checked in context.
Applying WAF in one layer while the public origin or alternate route bypasses that WAF entirely.
- Trace DNS and client path to confirm which WAF actually receives traffic.
- Review WAF mode, managed rule sets, exclusions, and custom rules before switching to prevention.
- Check origin exposure so users cannot bypass the protected entry point.
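Front Door adds an `X-Azure-FDID` header carrying the profile's Front Door ID, so an origin can reject traffic that did not arrive through your profile. This is a minimal origin-side check, assuming headers arrive as a plain dict; the expected ID below is a hypothetical placeholder.

```python
EXPECTED_FDID = "00000000-0000-0000-0000-000000000000"  # hypothetical placeholder for your profile's Front Door ID


def came_through_front_door(headers, expected_fdid=EXPECTED_FDID):
    """Return True only if the request carries this profile's X-Azure-FDID header."""
    for name, value in headers.items():
        if name.lower() == "x-azure-fdid":  # HTTP header names are case-insensitive
            return value.strip().lower() == expected_fdid.lower()
    return False  # header missing: the request did not come through Front Door


print(came_through_front_door({"X-Azure-FDID": EXPECTED_FDID}))  # True
print(came_through_front_door({"Host": "origin.example.net"}))   # False
```

Compare the value, not just the header's presence, because other Front Door profiles send their own IDs. Pair the check with network restrictions, such as allowing only the `AzureFrontDoor.Backend` service tag, so the WAF cannot be bypassed at the network layer.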
Functions Consumption vs Premium
Can the function tolerate cold start and pay-per-execution scaling, or does it need warm capacity and stronger network/runtime control?
Consumption is pure serverless usage billing; Premium adds prewarmed capacity and advanced networking capabilities.
Use Consumption when executions are bursty, short, event-driven, and cold start is acceptable.
Use Premium when the app needs pre-warmed instances, VNet integration, higher scale control, or workloads that do not fit basic Consumption behavior.
- Consumption emphasizes automatic scale and pay-per-execution economics.
- Premium adds always-ready capacity, richer networking, and stronger runtime control at a baseline cost.
- Cold start, execution duration, trigger pressure, and dependency latency should drive the choice.
Optimizing for apparent idle cost while ignoring cold-start, VNet, and throughput requirements that force a later plan migration.
- Check trigger volume, execution duration, memory use, and dependency latency.
- Inspect current plan, app settings, and scale behavior before changing hosting plan.
- Test cold-start and peak-load behavior with representative events.
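The checks above can be turned into a quick fit test. This sketch assumes the classic Consumption plan, with a 10-minute maximum function timeout and no VNet integration; confirm both for your OS, plan, and runtime version. The `FunctionAppProfile` fields are hypothetical and should be filled from measured data.

```python
from dataclasses import dataclass

# Assumption: classic Consumption plan limits; confirm for your OS, plan, and runtime version.
CONSUMPTION_MAX_TIMEOUT_MINUTES = 10


@dataclass
class FunctionAppProfile:
    longest_execution_minutes: float  # measured, not estimated
    needs_vnet_integration: bool      # e.g. must reach private endpoints
    cold_start_acceptable: bool       # can callers tolerate a slow first request?


def consumption_blockers(profile):
    """List reasons the classic Consumption plan may not fit this app."""
    blockers = []
    if profile.longest_execution_minutes > CONSUMPTION_MAX_TIMEOUT_MINUTES:
        blockers.append("executions exceed the Consumption maximum timeout")
    if profile.needs_vnet_integration:
        blockers.append("classic Consumption does not offer VNet integration")
    if not profile.cold_start_acceptable:
        blockers.append("cold start is not acceptable; consider prewarmed Premium instances")
    return blockers


api = FunctionAppProfile(longest_execution_minutes=2, needs_vnet_integration=True, cold_start_acceptable=False)
print(consumption_blockers(api))
```

An empty list only means no obvious blocker was found; trigger volume and dependency latency still need the load test above.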
Generalized vs specialized VM image
Are you creating a reusable clean VM image, or preserving one machine exactly as-is?
Generalized images remove machine identity for reusable provisioning; specialized images preserve state from the source VM.
Use a generalized image when you want reusable VM instances without carrying the source machine identity.
Use a specialized image when you intentionally need to preserve OS state, machine identity, users, and installed configuration.
- Generalized images are prepared to create new machines from a clean identity state.
- Specialized images keep the source VM state and are closer to cloning that machine.
- The wrong image type can cause duplicate machine identities, domain-join issues, or VMs that do not provision as expected.
Capturing a specialized image when the team expected a reusable base image for scaled deployments.
- Confirm whether the VM should be sysprepped/generalized before capture.
- Record extensions, disks, network assumptions, and identity dependencies.
- Test a new VM from the image before using it in production scale sets or deployments.
Group-based RBAC vs Direct user role assignment
Should access be managed through membership and lifecycle processes, or assigned directly to one person?
Group-based RBAC assigns roles to an Entra group; direct assignment grants a role to an individual user at scope.
Use group-based RBAC when access should follow a team, job role, approval workflow, or lifecycle-managed membership.
Use direct user assignment only for exceptional, temporary, or highly specific cases where a group model is not appropriate.
- Group-based access is easier to review and remove at scale.
- Direct assignments accumulate quickly and are harder to govern.
- Privileged groups still need owner, membership, and PIM review.
Cleaning up direct assignments without checking inherited scope or group membership, which can leave hidden access paths behind.
- List role assignments at the target and parent scopes.
- Separate direct assignments from group-based assignments.
- Review owners, eligible assignments, and privileged roles before cleanup.
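Separating direct assignments from group-based ones, and at-scope assignments from inherited ones, is mechanical once the assignments are listed. The sketch below is a minimal illustration that assumes records shaped like `az role assignment list --all` JSON output, using the `principalName`, `principalType`, `roleDefinitionName`, and `scope` fields. The sample data and helper names are hypothetical. Scope matching is by string prefix, which only holds from subscription scope downward.

```python
from collections import defaultdict

# Hypothetical sample data shaped like `az role assignment list --all` output.
ASSIGNMENTS = [
    {"principalName": "alice@contoso.com", "principalType": "User",
     "roleDefinitionName": "Contributor",
     "scope": "/subscriptions/0000/resourceGroups/rg-app"},
    {"principalName": "app-operators", "principalType": "Group",
     "roleDefinitionName": "Reader", "scope": "/subscriptions/0000"},
    {"principalName": "bob@contoso.com", "principalType": "User",
     "roleDefinitionName": "Owner", "scope": "/subscriptions/0000"},
]


def _norm(scope: str) -> str:
    # Azure scope strings are case-insensitive; compare lowercased, without a trailing slash.
    return scope.rstrip("/").lower()


def reaches(assignment_scope: str, target_scope: str) -> bool:
    """True if an assignment at assignment_scope applies at target_scope (same or parent).

    Prefix matching only works from subscription scope downward; management-group
    inheritance needs the hierarchy itself.
    """
    a, t = _norm(assignment_scope), _norm(target_scope)
    return t == a or t.startswith(a + "/")


def partition(assignments, target_scope):
    """Bucket assignments that reach target_scope by (principalType, 'at-scope' | 'inherited')."""
    buckets = defaultdict(list)
    for item in assignments:
        if not reaches(item["scope"], target_scope):
            continue
        where = "at-scope" if _norm(item["scope"]) == _norm(target_scope) else "inherited"
        buckets[(item["principalType"], where)].append(
            f'{item["principalName"]} ({item["roleDefinitionName"]})')
    return dict(buckets)
```

Run against the `rg-app` resource group, this reports Bob's Owner assignment as a direct user assignment inherited from the subscription. A cleanup that only looks at the resource group would miss that access path.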
Hot vs Cool vs Archive Blob Tiers
How often will the data be read, and how quickly must it be restored?
Access tiers trade storage cost against access cost and retrieval latency, so the right tier depends on how often data is read or written.
Use Hot for data that is actively read, updated, or needed quickly.
Use Cool or Archive when data is rarely accessed and lower storage cost matters more than retrieval speed or transaction cost.
- Hot usually has the highest storage cost and the lowest access cost, which suits frequent access.
- Cool lowers storage cost for infrequent access, but it raises per-access cost and adds an early deletion charge for data that is deleted or moved out of the tier too soon.
- Archive is for long-term retention and requires rehydration before normal reads.
Moving data to Archive to save money without checking restore-time expectations or downstream jobs that still read it.
- Review access patterns, lifecycle rules, retention requirements, and restore-time objectives.
- Check current blob tier distribution and last modified/access evidence before lifecycle changes.
- Test rehydration or restore expectations with the data owner before applying broad policies.
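Early deletion is worth checking before lifecycle rules move data between tiers. Cool has a 30-day minimum storage period and Archive has a 180-day one. A blob that is deleted or moved out of the tier early is still billed for the remaining days, so a blob deleted after 45 days in Archive is charged for the other 135. The sketch below is minimal and hypothetical: it covers only the three tiers named here and leaves out pricing.

```python
# Minimum days a blob is billed for in each tier (the early deletion period).
MIN_STORAGE_DAYS = {"Hot": 0, "Cool": 30, "Archive": 180}


def early_deletion_days(tier: str, days_in_tier: int) -> int:
    """Days still charged if a blob is deleted or moved out of `tier` after `days_in_tier` days."""
    return max(0, MIN_STORAGE_DAYS[tier] - days_in_tier)


def flag_risky_moves(blobs, planned_changes):
    """Return the names of blobs whose planned tier change or deletion would incur early deletion.

    `blobs` maps a blob name to (current tier, days in that tier); both are hypothetical inputs.
    """
    return sorted(
        name for name, (tier, days) in blobs.items()
        if name in planned_changes and early_deletion_days(tier, days) > 0
    )
```

Running this over a lifecycle rule's candidate set before enabling the rule shows which moves would cost extra. Only the data owner can say whether those blobs are still read.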
Incremental vs Complete Deployment Mode
Should the deployment update only declared resources, or enforce that the template is the full desired state at that scope?
Incremental mode updates declared resources without deleting omitted ones; complete mode can remove resources not declared at the target scope.
Use incremental mode for most deployments where resources not in the template should remain untouched.
Use complete mode only when the template deliberately represents the entire target scope and deletion impact is approved.
- Incremental mode creates or updates resources in the template without deleting unrelated resources.
- Complete mode can remove resources at the deployment scope that are not represented in the template.
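- Scope is everything: complete mode acts on the whole target resource group, and only root-level templates can use it (nested and linked templates deploy incrementally), so confirm exactly which group the deployment targets.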
Running complete mode against a shared resource group and deleting resources that were never meant to be managed by that template.
- Run what-if and inspect delete operations before any complete-mode deployment.
- Confirm the target scope and ownership of every resource that would be removed.
- Keep a rollback and recovery plan for deleted resources before applying complete mode.
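The difference between the two modes comes down to a set difference at the target resource group. The sketch below is a simplified mental model. The function name and resource IDs are hypothetical, and real complete-mode behavior has type-specific nuances, so the output of `az deployment group what-if` remains the authoritative preview.

```python
def planned_deletions(mode: str, existing_ids, template_ids):
    """Approximate which resource IDs a resource-group deployment would delete.

    Incremental mode never deletes omitted resources; complete mode deletes
    resources in the group that the template does not declare.
    """
    mode = mode.lower()
    if mode == "incremental":
        return []
    if mode != "complete":
        raise ValueError(f"unknown deployment mode: {mode}")
    declared = {rid.lower() for rid in template_ids}  # resource IDs compare case-insensitively
    return sorted(rid for rid in existing_ids if rid.lower() not in declared)


RG = "/subscriptions/0000/resourceGroups/rg-shared/providers"
existing = [
    f"{RG}/Microsoft.Network/virtualNetworks/vnet-app",
    f"{RG}/Microsoft.Storage/storageAccounts/stapp",
    f"{RG}/Microsoft.KeyVault/vaults/kv-platform",  # owned by another team
]
template = [
    f"{RG}/Microsoft.Network/virtualNetworks/vnet-app",
    f"{RG}/Microsoft.Storage/storageAccounts/stapp",
]
```

In this example complete mode would remove `kv-platform` simply because the template never mentions it. That is the shared-resource-group trap described above.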
Integrated vectorization vs custom embedding pipeline
Should Azure AI Search own vectorization or should the application/data pipeline produce embeddings explicitly?
Integrated vectorization lets Azure AI Search handle the chunking and vectorization flow; custom pipelines give teams deeper control over preprocessing, embeddings, and storage.
Use integrated vectorization when the goal is simpler AI Search indexing with managed vectorization steps and less custom pipeline code.
Use a custom embedding pipeline when you need exact preprocessing control, external data movement, custom scheduling, or nonstandard chunking/evaluation.
- Integrated vectorization reduces custom code but still needs service identity, indexer, skillset, and data-source permissions.
- Custom pipelines can fit mature data engineering flows but create more moving parts to secure and monitor.
- Both patterns need evaluation for chunk quality, relevance, cost, and stale index behavior.
Choosing vectorization mechanics before defining retrieval quality, permissions, refresh cadence, and cost limits.
- Check data source, index, vector fields, and skillset permissions.
- Measure retrieval quality with expected queries and citations.
- Review indexer failures and service scale before production.
Key Vault key vs Secret vs Certificate
Are you storing a cryptographic key, an application secret, or certificate material that has lifecycle and renewal needs?
Key Vault keys are cryptographic keys, secrets are sensitive values, and certificates manage certificate material and lifecycle.
Use keys when the workload needs cryptographic operations such as encryption, signing, wrapping, or key rotation.
Use secrets for sensitive values and certificates for certificate objects that need issuance, renewal, or binding workflows.
- Keys, secrets, and certificates have different permissions and operations.
- Creating a certificate also creates an addressable key and secret with the same name, and those backing objects must be understood.
- Rotation and access review differ by object type.
Granting broad secret access when the workload only needs a certificate binding or key operation.
- List vault access model and object permissions before changing access.
- Separate app runtime access from human operator access.
- Collect evidence before rotation or purge-protection changes.
Key Vault RBAC vs Access Policy
Should Key Vault data access be governed through Azure RBAC or vault-local access policies?
Key Vault RBAC uses Azure role assignments; access policies use the older vault-level policy model for data-plane permissions.
Use Key Vault RBAC when you want centralized role assignment, scope inheritance, and Azure RBAC governance for vault data operations.
Use access policies when the vault or migration state still depends on the older vault-local permission model.
- RBAC uses Azure role assignments and scope inheritance.
- Access policies are configured directly on the vault and can be easier to inspect locally but less aligned with central RBAC governance.
- Switching models can break secret, key, or certificate access if assignments are not prepared.
Changing the vault access model before proving every application identity has equivalent data-plane access.
- List current vault access model, access policies, and role assignments before changes.
- Map each application identity to required secret/key/certificate operations.
- Test read-only secret access with the target identity before switching models.
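Mapping each identity to its required operations amounts to a coverage check: every operation an identity uses today should still be granted after the switch. The sketch below assumes you have already turned both the current access policies and the planned RBAC role assignments into the same per-object operation vocabulary (for example `get`, `list`, `sign`). That mapping step is not shown, and the identities are hypothetical.

```python
def access_gaps(required, planned):
    """Return {identity: {object_type: [missing operations]}} not covered by planned grants."""
    gaps = {}
    for identity, needs in required.items():
        have = planned.get(identity, {})
        missing = {}
        for object_type, ops in needs.items():
            lacking = set(ops) - set(have.get(object_type, ()))
            if lacking:
                missing[object_type] = sorted(lacking)
        if missing:
            gaps[identity] = missing
    return gaps


# Operations each (hypothetical) identity uses today, in access-policy terms.
required = {
    "app-web": {"secrets": {"get", "list"}},
    "app-signer": {"keys": {"sign", "verify"}, "certificates": {"get"}},
}
# What the planned RBAC role assignments would cover, mapped to the same terms.
planned = {
    "app-web": {"secrets": {"get", "list"}},
    "app-signer": {"keys": {"sign", "verify"}},
}
```

An empty result is the precondition for switching models. In this example `app-signer` would lose certificate `get` access.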
Load Balancer vs Application Gateway
Do you need Layer 4 load distribution, or HTTP-aware Layer 7 routing?
Load Balancer operates at the transport layer (Layer 4); Application Gateway handles HTTP routing, listeners, backend settings, and WAF integration.
Use Load Balancer for TCP/UDP load distribution, inbound NAT, and network-level balancing without HTTP routing logic.
Use Application Gateway for HTTP/S routing, TLS termination, path/host rules, cookies, probes, and WAF options.
- Load Balancer works at Layer 4 and does not understand HTTP paths or headers.
- Application Gateway works at Layer 7 and can route based on HTTP properties.
- Health probes, backend pools, SNAT/outbound behavior, and TLS ownership differ.
Using Load Balancer for an HTTP app that needs host/path routing or WAF, then trying to recreate Layer 7 behavior elsewhere.
- Identify protocol, TLS termination point, routing rules, and health probe requirements.
- Inspect current backend health and probe configuration before changing traffic flow.
- Check whether outbound/SNAT behavior is part of the design.
Log Analytics workspace vs Application Insights
Are you centralizing operational logs, or instrumenting application behavior and user-facing performance?
Log Analytics workspace is the central query and retention store for logs; Application Insights focuses on application performance, failures, dependencies, availability, and user behavior telemetry.
Use Log Analytics workspace as the query and retention home for operational logs across resources and services.
Use Application Insights when the focus is application telemetry such as requests, dependencies, exceptions, availability, and traces.
- Log Analytics is the broader workspace and KQL query surface for operational data.
- Application Insights adds application performance monitoring concepts and SDK/agent telemetry.
- Many App Insights resources are workspace-based, so workspace retention and cost still matter.
Treating Application Insights as just another log bucket and missing sampling, dependency tracking, availability tests, and app-level signals.
- Map which apps and resources send data to which workspace.
- Check ingestion, retention, sampling, and table cost before expanding telemetry.
- Use App Insights for application symptoms and workspace queries for cross-resource evidence.
Logic Apps Standard vs Consumption
Do workflows need single-tenant runtime control, or simple multi-tenant pay-per-action execution?
Standard runs workflows in a single-tenant model; Consumption runs workflows in a multi-tenant serverless model.
Use Logic Apps Standard when workflows need stronger runtime isolation, local project structure, VNet options, or multiple workflows in one app.
Use Logic Apps Consumption for simpler serverless workflows where per-action billing and multi-tenant managed runtime fit.
- Standard runs on a single-tenant app model with hosting and networking choices.
- Consumption is simpler to start and bills per action/trigger execution.
- Connector availability, networking, deployment model, and cost behavior should be checked before migration.
Migrating to Standard for control without planning hosting cost, deployment workflow, or connector differences.
- List triggers, actions, connectors, integration accounts, and network dependencies.
- Compare execution volume and hosting cost before switching plan type.
- Test workflow runs and connector authentication in a non-production environment.
LRS vs ZRS vs GRS
Do you need local, zone-level, or regional redundancy for the data and recovery objective?
LRS keeps replicas within one physical location, ZRS spreads data across zones, and GRS copies data to a paired secondary region.
Use LRS when local durability is enough and the workload does not require zone or regional redundancy.
Use ZRS or GRS when availability-zone resilience or cross-region disaster recovery is part of the requirement.
- LRS keeps three copies within a single datacenter in the primary region; ZRS spreads copies across availability zones where the region supports them.
- GRS adds cross-region replication for regional disaster recovery scenarios.
- Redundancy choice affects cost, failover behavior, read access options, and compliance expectations.
Choosing the cheapest redundancy tier without confirming recovery-time and recovery-point expectations with the data owner.
- Confirm RTO/RPO, region requirements, read-access needs, and compliance constraints.
- Check current account redundancy and failover readiness before changing production storage.
- Review backup, soft delete, versioning, and restore evidence separately from redundancy.
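The choice can be framed as "the cheapest option that meets the stated resilience requirements." The sketch below is a minimal model under stated assumptions. The cost ordering follows typical list pricing and should be confirmed against current prices. GRS replicates to the secondary asynchronously, so recovery point is not zero, and using the secondary needs a failover or a read-access variant. Combined zone-plus-region needs fall outside these three options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redundancy:
    name: str
    copies: int
    zone_outage_available: bool   # stays available if one availability zone fails
    secondary_region_copy: bool   # keeps an asynchronous copy in the paired region


# Ordered cheapest first (typical pricing order; confirm against current price lists).
OPTIONS = [
    Redundancy("LRS", 3, False, False),  # three copies in one datacenter
    Redundancy("ZRS", 3, True, False),   # three copies across availability zones
    Redundancy("GRS", 6, False, True),   # LRS in primary + async LRS copy in paired region
]


def cheapest_fit(need_zone_resilience: bool, need_region_dr: bool) -> Redundancy:
    """Return the first (cheapest) option that satisfies both requirements."""
    for opt in OPTIONS:
        if need_zone_resilience and not opt.zone_outage_available:
            continue
        if need_region_dr and not opt.secondary_region_copy:
            continue
        return opt
    raise ValueError("none of LRS/ZRS/GRS covers both; review geo-zone-redundant options")
```

Making the two requirements explicit inputs forces the RTO/RPO conversation with the data owner before anyone picks the cheapest tier.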
Managed Identity vs Service Principal
Can Azure manage the identity lifecycle for the workload, or do you need a standalone app identity with credentials?
Managed identity is lifecycle-managed by Azure for a resource; a service principal is an application identity that often requires explicit credential and permission management.
Use managed identity for Azure-hosted workloads that can authenticate without storing client secrets or certificates.
Use a service principal when the identity must exist outside a single Azure resource lifecycle, work across environments, or integrate with systems that need app credentials.
- Managed identity removes credential storage and is tied to Azure resource or user-assigned identity lifecycle.
- Service principals require explicit credential/certificate rotation and ownership tracking.
- Both still need least-privilege role assignments at the correct scope.
Creating service principals with long-lived secrets for Azure workloads that could use managed identity instead.
- List existing identities, credentials, owners, and role assignments before changing authentication.
- Prefer managed identity when the workload platform supports it and scope is clear.
- If service principal remains required, document rotation owner and expiry monitoring.
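Expiry monitoring for remaining service principal credentials can start as a simple report over an inventory. The sketch below is a minimal, hypothetical example. It assumes each record carries an app name, an accountable owner, and an ISO 8601 `endDateTime` with an explicit UTC offset (Microsoft Graph credential objects expose a similar end timestamp). The inventory itself is invented.

```python
from datetime import datetime, timedelta, timezone


def credential_report(credentials, now, within_days=30):
    """List (app, owner, status) for credentials expired or expiring within `within_days`."""
    horizon = now + timedelta(days=within_days)
    report = []
    for cred in sorted(credentials, key=lambda c: datetime.fromisoformat(c["endDateTime"])):
        end = datetime.fromisoformat(cred["endDateTime"])
        if end <= now:
            report.append((cred["app"], cred["owner"], "expired"))
        elif end <= horizon:
            report.append((cred["app"], cred["owner"], "expiring"))
    return report


# Hypothetical inventory; "owner" is whoever is accountable for rotation.
CREDS = [
    {"app": "sp-reporting", "owner": "data", "endDateTime": "2026-01-01T00:00:00+00:00"},
    {"app": "sp-partner-sync", "owner": "integration", "endDateTime": "2025-02-01T00:00:00+00:00"},
    {"app": "sp-billing-export", "owner": "finops", "endDateTime": "2025-01-10T00:00:00+00:00"},
]
```

Calling `credential_report(CREDS, datetime(2025, 1, 15, tzinfo=timezone.utc))` lists the expired billing export first and then the partner sync credential that expires within the window. A credential that appears repeatedly is a good candidate for moving to managed identity.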
Management group vs Subscription
Is the decision about organizing governance across subscriptions, or operating resources inside one subscription boundary?
Management groups organize and govern subscriptions; subscriptions are billing, quota, and resource-management containers.
Use management groups to apply governance, policy, and RBAC structure across multiple subscriptions.
Use subscriptions as the billing, quota, access, and resource-management boundary for workloads or environments.
- Management groups are hierarchy and governance containers, not resource deployment locations.
- Subscriptions contain resource groups and resources and carry billing/quota/provider registration concerns.
- Inheritance from management groups can surprise teams working only at subscription level.
Troubleshooting a policy or role assignment only inside the subscription while the effective assignment is inherited from a management group.
- Map the management group hierarchy and subscription placement before changing governance.
- Check inherited policy and role assignments before creating local exceptions.
- Record the exact scope string for every assignment or exemption.
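Effective policy and role assignments at a subscription are the union of the assignments at that subscription and at every management group above it. The sketch below walks a hypothetical hierarchy to show why a subscription-only view misses inherited assignments. All scope names and assignments are invented for illustration.

```python
# Hypothetical hierarchy: child scope -> parent scope (None at the tenant root).
PARENT = {
    "sub-prod": "mg-landing-zones",
    "mg-landing-zones": "mg-contoso",
    "mg-contoso": "tenant-root",
    "tenant-root": None,
}

# Hypothetical assignments recorded at each scope.
SCOPE_ASSIGNMENTS = {
    "tenant-root": ["policy: allowed-locations"],
    "mg-landing-zones": ["policy: deny-public-ip", "role: Reader -> sec-auditors"],
    "sub-prod": ["role: Contributor -> app-team"],
}


def ancestry(scope):
    """Return the scope followed by every ancestor up to the root."""
    chain = []
    while scope is not None:
        chain.append(scope)
        scope = PARENT[scope]
    return chain


def effective_assignments(scope):
    """(origin scope, assignment) pairs that apply at `scope`, nearest scope first."""
    return [(origin, a) for origin in ancestry(scope) for a in SCOPE_ASSIGNMENTS.get(origin, [])]
```

For `sub-prod`, only one of the four effective assignments is defined at the subscription. The deny policy a team might be troubleshooting comes from `mg-landing-zones`.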
Metric alert vs Log alert
Can a metric threshold detect the condition, or does the alert require a KQL query?
Metric alerts evaluate Azure platform metrics; log alerts run queries over Log Analytics data and can express richer conditions at higher query and ingestion complexity.
Use a metric alert for fast threshold alerts on platform metrics with predictable dimensions.
Use a log alert when the condition requires KQL, joins, text matching, custom tables, or correlated evidence.
- Metric alerts are usually simpler, faster, and less dependent on log ingestion delays.
- Log alerts are more flexible but depend on table schema, ingestion latency, query quality, and cost.
- Alert noise is a design problem: severity, frequency, window, dimensions, and action groups all matter.
Writing a log alert for a simple metric condition, then debugging ingestion delay and noisy KQL instead of using the metric directly.
- Check whether the needed signal exists as a metric with useful dimensions.
- If using logs, test the KQL over the incident window before creating the rule.
- Review action groups, frequency, severity, and suppression strategy before enabling.
Model catalog vs model deployment
Is the scenario about choosing a model or running an application against a configured model endpoint?
A model catalog helps users discover/select models; a deployment is the configured runtime endpoint/capacity that applications call.
Use the model catalog when the task is discovering, comparing, or selecting foundation models and capabilities.
Use a model deployment when the task is runtime access, quota, capacity, monitoring, endpoint names, or application integration.
- Catalog selection is design-time; deployments are runtime resources with cost, quotas, and access patterns.
- Applications usually need deployment names/endpoints rather than catalog entries.
- Operational questions focus on metrics, throttling, capacity, and deployment health.
Assuming selecting a model in a catalog means the application is ready to call it in production.
- Identify model, deployment name, region, quota, and endpoint.
- Check tokens, throttling, metrics, and content safety before release.
- Keep official model availability and pricing checks outside the static exam notes.
NAT Gateway vs Load Balancer outbound rules
Should outbound SNAT be handled at the subnet level, or through load-balancer-tied outbound rules?
NAT Gateway provides managed outbound SNAT for subnets; Load Balancer outbound rules can provide outbound connectivity tied to a load balancer frontend.
Use NAT Gateway when a subnet needs predictable outbound internet SNAT with dedicated public IPs or prefixes.
Use Load Balancer outbound rules when outbound behavior is intentionally tied to a Standard Load Balancer backend pool design.
- NAT Gateway attaches to subnets and becomes the preferred outbound path for resources in those subnets.
- Load Balancer outbound rules depend on backend pool membership and load balancer configuration.
- SNAT exhaustion, public IP ownership, routes, and subnet association drive troubleshooting.
Adding NAT Gateway without checking existing load balancer outbound rules, public IP dependencies, and route tables.
- List subnet NAT Gateway associations, load balancer outbound rules, public IPs, and route tables.
- Check SNAT metrics or connection failure evidence before changing egress.
- Test outbound path from a representative VM or workload after the change.
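SNAT capacity for NAT Gateway scales with its public IPs. Each public IP provides 64,512 SNAT ports, and one NAT Gateway supports up to 16 public IPs. The sketch below is a rough, conservative estimate. It assumes the worst case, where every concurrent flow targets the same destination IP and port, because NAT Gateway can reuse ports across distinct destinations. The function name is hypothetical.

```python
import math

PORTS_PER_PUBLIC_IP = 64_512   # SNAT ports each NAT Gateway public IP provides
MAX_PUBLIC_IPS = 16            # public IPs (addresses or prefix members) per NAT Gateway


def public_ips_needed(peak_flows_to_one_destination: int) -> int:
    """Conservative public IP count for NAT Gateway egress to a single destination."""
    ips = max(1, math.ceil(peak_flows_to_one_destination / PORTS_PER_PUBLIC_IP))
    if ips > MAX_PUBLIC_IPS:
        raise ValueError(f"needs {ips} IPs; one NAT Gateway supports at most {MAX_PUBLIC_IPS}")
    return ips
```

Comparing this estimate with observed SNAT metrics helps tell whether failures come from port exhaustion or from a routing or association problem.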
Online endpoint vs Batch endpoint
Does inference need to respond now, or can it run asynchronously over a batch of inputs?
Online endpoints serve real-time inference; batch endpoints process asynchronous inference jobs over larger inputs.
Use an online endpoint for real-time scoring where callers expect a request/response API and latency matters.
Use a batch endpoint when jobs can run asynchronously over files, datasets, or large input sets and callers can wait for output.
- Online endpoints need traffic routing, authentication, scale, and live availability planning.
- Batch endpoints need job scheduling, input/output storage, retry behavior, and throughput planning.
- Online cost is tied to serving capacity; batch cost is tied to job execution and data movement.
Using an online endpoint for large offline jobs, then fighting timeout, cost, and scaling issues that batch inference is designed to absorb.
- Confirm latency, throughput, authentication, and failure-retry expectations with the consuming app owner.
- Inspect existing endpoint deployments and traffic before changing production inference routes.
- Run a representative test payload and record latency, error rate, and cost before committing.
Owner vs Contributor vs Reader role
Does the identity need to manage access, change resources, or only view configuration?
Owner can manage access and resources; Contributor can manage resources but not access; Reader can view resources without making changes.
Use Owner only when the identity must manage role assignments or fully administer the scope.
Use Contributor or Reader when the identity should modify resources without granting access, or only inspect state.
- Owner can manage access; Contributor can change resources but not grant access; Reader can view configuration.
- Broad Owner at subscription or management-group scope is high-risk.
- Least privilege often means custom roles or narrower scope, not just choosing among the three common roles.
Granting Owner because a deployment failed, when the real missing permission is narrower and temporary.
- List existing assignments, inherited scope, and recent role changes before granting anything.
- Choose the narrowest scope and role that satisfies the task.
- Set review or removal evidence for temporary elevated access.
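The difference between the three roles comes from their `Actions` and `NotActions` wildcard lists: an action is allowed when it matches an `Actions` pattern and no `NotActions` pattern. The sketch below uses simplified excerpts of the built-in definitions. The real Contributor `NotActions` list is longer, so check `az role definition list --name Contributor` for the current version.

```python
from fnmatch import fnmatchcase

# Simplified excerpts of built-in role definitions (not the complete lists).
ROLES = {
    "Owner": {"actions": ["*"], "not_actions": []},
    "Contributor": {"actions": ["*"], "not_actions": [
        "Microsoft.Authorization/*/Delete",
        "Microsoft.Authorization/*/Write",
        "Microsoft.Authorization/elevateAccess/Action",
    ]},
    "Reader": {"actions": ["*/read"], "not_actions": []},
}


def _matches(pattern: str, action: str) -> bool:
    # Azure action strings compare case-insensitively; '*' may span '/' segments.
    return fnmatchcase(action.lower(), pattern.lower())


def allows(role: str, action: str) -> bool:
    """True if the role's Actions match the action and no NotActions pattern excludes it."""
    definition = ROLES[role]
    granted = any(_matches(p, action) for p in definition["actions"])
    excluded = any(_matches(p, action) for p in definition["not_actions"])
    return granted and not excluded
```

Under this model Contributor can write virtual machines but cannot write `Microsoft.Authorization/roleAssignments`. When a failed deployment needs only one specific action, a narrower custom role is often a better fit than Owner.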
Policy definition vs Policy initiative
Are you enforcing one governance rule, or grouping several rules into a compliance baseline?
A policy definition describes one rule; a policy initiative groups multiple policy definitions into a larger compliance objective.
Use a policy definition for one specific rule, effect, and parameterized evaluation.
Use a policy initiative when multiple related definitions should be assigned, tracked, and reported together.
- Definitions are individual rules; initiatives are grouped rule sets.
- Initiatives simplify assignment and compliance reporting for baselines.
- Parameters can exist at both definition and initiative assignment layers.
Assigning many separate policy definitions when an initiative would make compliance state and exceptions easier to manage.
- Review existing initiatives before creating a new standalone rule.
- Check assignment scope, parameters, exemptions, and enforcement mode.
- Use policy compliance runbooks before remediating resources.
Practice assessment vs scenario lab
Are you trying to measure readiness or build the experience needed to reason through scenario questions?
A practice assessment checks exam-style recall and reasoning; a scenario lab teaches how Azure decisions behave in a real portal/CLI context.
Use practice assessments to find weak objective areas and get used to exam wording.
Use scenario labs when you need hands-on context for why a service, command, or configuration choice is correct.
- Practice questions expose gaps; labs create durable mental models.
- Labs should be read-only first and low-cost where possible.
- The strongest study loop is Learn objective → Compare decision → command/resource discovery → runbook/lab → practice question.
Only memorizing answers without building the operational context behind scenario questions.
- Use Microsoft Learn as the official objective source.
- Map every missed question to a concept, Compare card, and lab action.
- Use Resource Graph and read-only CLI commands before running any cost-impacting lab.
Prebuilt vs Custom Document Intelligence model
Can a prebuilt extraction model handle the document type, or do your forms require training on your own examples?
Prebuilt models cover common document types; custom models are trained on examples for organization-specific document extraction.
Use a prebuilt Document Intelligence model for common document types where the built-in fields match what you need.
Use a custom model when layouts, field names, language, or business rules are specific to your organization.
- Prebuilt models reduce training effort but limit field control.
- Custom models require labeled examples and validation but can fit your document shape.
- Custom accuracy depends on representative samples, versioning, and ongoing review after document templates change.
Training a custom model before proving that a prebuilt model fails on real samples, or trusting a prebuilt model without testing edge cases.
- Run a sample set through the prebuilt model and record fields, confidence scores, and misses.
- If custom is needed, separate training, validation, and holdout samples before production use.
- Check data privacy, region, and cost before sending production documents through any model.
Private DNS zone vs Public DNS zone
Should this name resolve only inside Azure/private networks, or should it be publicly resolvable?
Private DNS zones resolve names inside linked virtual networks; public DNS zones publish names for public resolution on the internet.
Use Private DNS zones for private endpoints, internal service names, and VNet-linked name resolution that should not be public.
Use public DNS zones when internet clients must resolve the name through public DNS.
- Private DNS depends on VNet links and private records.
- Public DNS is externally resolvable and must be managed with public exposure in mind.
- Private endpoint failures are often DNS failures, not endpoint failures.
Creating a private endpoint but leaving clients resolving the old public name or missing the required private DNS zone link.
- Check which resolver path clients use.
- List private DNS zone links and records before cutover.
- Validate CNAME/A records from the network where the app actually runs.
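The resolution path explains why private endpoint failures are often DNS failures. The public name CNAMEs to a `privatelink` name. A client in a VNet linked to the matching private zone gets the private IP, and any other client gets the public answer. The sketch below is a minimal model with hypothetical names and IPs. It assumes the default behavior where a linked zone is authoritative, so a missing record returns no answer instead of falling back to public DNS.

```python
# Hypothetical public DNS answers for a storage account that has a private endpoint.
PUBLIC_DNS = {
    "stapp.blob.core.windows.net": ("CNAME", "stapp.privatelink.blob.core.windows.net"),
    "stapp.privatelink.blob.core.windows.net": ("A", "20.60.0.10"),  # illustrative public IP
}

# Private DNS zone with the private endpoint's A record, linked to one VNet.
PRIVATE_ZONES = {
    "privatelink.blob.core.windows.net": {
        "records": {"stapp": "10.1.2.5"},
        "linked_vnets": {"vnet-app"},
    },
}


def resolve(name: str, client_vnet: str, max_hops: int = 8):
    """Follow CNAMEs roughly as Azure-provided DNS would for a client in client_vnet."""
    for _ in range(max_hops):
        for zone, data in PRIVATE_ZONES.items():
            if client_vnet in data["linked_vnets"] and name.endswith("." + zone):
                # A linked zone answers authoritatively; None models NXDOMAIN.
                return data["records"].get(name[: -len(zone) - 1])
        record_type, value = PUBLIC_DNS[name]
        if record_type == "A":
            return value
        name = value  # follow the CNAME
    raise RuntimeError("CNAME chain too long")
```

A client in `vnet-app` reaches `10.1.2.5`. A client in an unlinked VNet still gets the public IP, which is the missing-zone-link trap described above.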
Private endpoint vs managed identity for AI security
Is the scenario about network reachability or identity-based authorization?
Private endpoints restrict network path; managed identity controls who or what can authenticate and authorize requests.
Use private endpoints when the requirement is to remove public network exposure and route service access over private network paths.
Use managed identity when the requirement is to avoid secrets and grant scoped identity-based access.
- Private endpoint answers “where can traffic come from?”
- Managed identity answers “who is allowed to call or manage this?”
- Strong designs often need both, plus DNS, diagnostics, and least-privilege roles.
Adding a private endpoint and assuming identity/role assignments are no longer needed.
- Separate network controls from identity controls in the scenario.
- Validate private DNS resolution before disabling public access.
- Review role assignments and keys before release.
Private Endpoint vs Service Endpoint
Do you need a private IP connection to a specific resource, or VNet-restricted access to a service endpoint?
Private Endpoint gives a private IP for a specific resource; Service Endpoint extends VNet identity to a public service endpoint without creating a private IP.
Use Private Endpoint when the service should be reached through a private IP in your VNet and public exposure should be minimized.
Use Service Endpoint when the service can remain on its public endpoint but access should be restricted to selected virtual networks/subnets.
- Private Endpoint creates a network interface with a private IP for a specific resource.
- Service Endpoint extends subnet identity to supported PaaS services while the service endpoint remains public-facing.
- Private DNS, approval state, network policies, and public network access settings are common failure points.
Creating a private endpoint and forgetting private DNS, causing clients to keep resolving the public endpoint.
- Check public network access, private endpoint connection state, DNS zone links, and client resolution before changes.
- Map which subnet and workload must reach the service.
- Test name resolution and connection from the consuming network before disabling public access.
Prompt flow vs code-first orchestration
Do you need a managed AI workflow/evaluation surface or application code that owns the orchestration path?
Prompt flow helps design, evaluate, and iterate AI workflows; code-first orchestration gives app teams full control over runtime behavior, deployment, and integration logic.
Use prompt flow when the focus is prompt-chain design, evaluation, traceability, and fast iteration with AI workflow assets.
Use code-first orchestration when the app must own branching, retries, authorization, deployment, telemetry, and integration behavior directly.
- Prompt flow helps with experimentation and evaluation; code owns production-grade control and app-specific lifecycle.
- Code-first paths may integrate better with existing CI/CD and testing standards.
- Prompt flow can clarify model and prompt behavior before teams harden the production implementation.
Treating prompt flow as a complete production architecture without checking identity, observability, deployment, and rollback requirements.
- Map the flow steps to the app runtime path.
- Check how secrets, identities, and telemetry are handled.
- Use a deployment runbook before promoting prompt or code changes.
RAG with vector search vs fine-tuning
Is the problem missing knowledge/evidence or does the model need different learned behavior?
RAG grounds model responses with retrieved content; fine-tuning changes model behavior for patterns, style, or task adaptation.
Use RAG when answers must cite or use fresh, permissioned, domain content without retraining a model.
Use fine-tuning when the model needs to adapt behavior, format, or task patterns that prompts and retrieval cannot solve.
- RAG keeps knowledge in data sources and indexes; fine-tuning changes the model artifact or deployment behavior.
- RAG needs retrieval quality, chunking, indexing, and access control.
- Fine-tuning needs training data quality, evaluation, cost, deployment governance, and rollback planning.
Fine-tuning a model just because the model does not know private company documents. That is usually a retrieval/grounding problem first.
- Identify whether the exam scenario asks for current facts, citations, or private data.
- Test retrieval precision and permission filtering before changing model behavior.
- Compare token, index, and training costs before production.
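Testing retrieval precision with permission filtering can start before any model change. The sketch below filters chunks by the caller's groups first and then ranks the remaining chunks by cosine similarity. It is a toy illustration: the 3-dimensional vectors stand in for real embeddings, and the chunk IDs, groups, and helper names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy 3-dimensional "embeddings"; real ones come from an embedding model.
CHUNKS = [
    {"id": "hr-leave-policy", "vec": [0.9, 0.1, 0.0], "groups": {"hr"}},
    {"id": "eng-oncall-runbook", "vec": [0.8, 0.2, 0.1], "groups": {"eng"}},
    {"id": "public-faq", "vec": [0.1, 0.9, 0.2], "groups": {"everyone"}},
]


def retrieve(query_vec, user_groups, k=2):
    """Filter by permission first, then rank the remaining chunks by similarity."""
    allowed = [c for c in CHUNKS if c["groups"] & user_groups]
    allowed.sort(key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["id"] for c in allowed[:k]]


def precision_at_k(retrieved, relevant):
    """Fraction of retrieved chunk IDs that are in the relevant set."""
    return len(set(retrieved) & set(relevant)) / len(retrieved) if retrieved else 0.0
```

The HR chunk is the closest match to the query `[1, 0, 0]`, but an `eng` user never sees it. Fixing that kind of behavior is a retrieval and permissions job, not a fine-tuning job.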
Recovery point vs Snapshot
Are you using managed backup restore history, or a direct point-in-time copy of a storage/disk object?
A recovery point is managed by backup/recovery tooling; a snapshot is a point-in-time copy of a disk, blob, or file share object.
Use recovery points when restore needs to follow backup policy, retention, vault protection, and workload-aware recovery workflows.
Use snapshots for storage or disk point-in-time copy scenarios where lifecycle, consistency, and retention are managed separately.
- Recovery points are tied to backup policy and vault behavior.
- Snapshots may not provide the same application consistency or retention governance.
- Both can create cost and data-exposure concerns if left unmanaged.
Assuming a snapshot is a full backup strategy without checking retention, consistency, restore process, or ownership.
- Check protected items, backup policy, and recent job status.
- List snapshots and confirm age, owner, and delete safety.
- Test restore path before relying on it for exam scenarios or production.
Resource group vs Resource
Are you managing a lifecycle boundary for related assets, or one specific Azure object?
A resource group is a management container; a resource is the individual service instance being managed.
Use a resource group as the management, deployment, tagging, RBAC, and lifecycle container for related resources.
Use resource-level operations when the change should affect only one Azure resource and not the whole group.
- Resource-group actions can affect every contained resource.
- Resource-level actions are narrower but may still affect dependencies.
- RBAC, locks, policy, tags, and deployment history can exist at both levels.
Deleting or locking at resource-group level when the request was really about one resource.
- Inventory all resources in the group before any group-level change.
- Check locks, policy, role assignments, and dependencies at both group and resource scope.
- Use resource-level action when blast radius should be narrow.
Resource lock vs Azure Policy deny
Should the control protect existing resources from accidental changes or prevent noncompliant configurations from being created?
Resource locks block delete or write actions at a scope; Policy deny prevents noncompliant resource create/update requests based on rules.
Use a resource lock when the goal is to protect a critical resource or scope from accidental delete/write actions.
Use Policy deny when the goal is to enforce configuration rules before resources are created or updated.
- Locks are blunt protection on a resource/scope; policy deny is rule-based governance.
- Locks can surprise operators during legitimate maintenance.
- Policy deny needs assignment scope, parameters, exemptions, and compliance review.
Using locks as a substitute for governance policy or using deny policy without testing exemptions and deployment impact.
- Identify whether the risk is accidental change or noncompliant configuration.
- Review lock inheritance and policy assignments before remediation.
- Test policy in audit mode when possible before deny.
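The two controls answer different questions. A lock blocks operations by type: `CanNotDelete` blocks deletes, and `ReadOnly` blocks writes and deletes. A deny policy inspects the configuration in a create or update request. The sketch below is a minimal model with a hypothetical deny rule. It ignores edge cases, such as a `ReadOnly` lock also blocking some POST operations like listing storage account keys.

```python
# Operations each lock level blocks (management-plane view).
LOCK_BLOCKS = {"CanNotDelete": {"delete"}, "ReadOnly": {"write", "delete"}}


def violates_policy(resource: dict) -> bool:
    """Hypothetical deny rule: storage accounts must not allow public blob access."""
    return (resource.get("type") == "Microsoft.Storage/storageAccounts"
            and resource.get("allowBlobPublicAccess", False))


def blockers(operation: str, inherited_locks, resource=None):
    """Explain what would stop `operation` ('read', 'write', or 'delete')."""
    reasons = [f"lock:{level}" for level in inherited_locks
               if operation in LOCK_BLOCKS[level]]
    # Deny policies evaluate create/update requests against the requested configuration.
    if operation == "write" and resource is not None and violates_policy(resource):
        reasons.append("policy:deny")
    return reasons
```

A compliant update to a resource under `CanNotDelete` goes through, while a noncompliant one is stopped by policy and not by the lock.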
Route table vs Network Security Group
Are you troubleshooting path selection, or traffic allow/deny rules?
Route tables influence where packets go; Network Security Groups decide whether that traffic is allowed or denied.
Use route table analysis when traffic is taking the wrong next hop, missing a firewall, or bypassing expected routing.
Use NSG analysis when traffic follows the expected path but is unexpectedly allowed or blocked by security rules.
- Routing and filtering are separate layers.
- Effective routes and effective NSG rules should both be checked for VM traffic.
- A firewall path can fail because of either routing or rules.
Changing NSG rules when the real problem is a route table or next-hop issue.
- Check effective routes and effective NSG rules from the NIC/subnet perspective.
- Verify route association and NSG association at subnet/NIC scope.
- Use Network Watcher before changing production networking.
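The two layers can be modeled in Python as separate questions. Route selection picks the most specific (longest) matching prefix. NSG rules are evaluated in ascending priority order, and the first match wins. The sketch is deliberately simplified. It ignores tie-breaking between user-defined, BGP, and system routes with the same prefix, and it does not model Azure's default NSG rules such as AllowVnetInBound and DenyAllInBound. The route and rule data are hypothetical.

```python
import ipaddress


def next_hop(routes, destination):
    """Pick the route with the longest matching prefix for the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r["prefix"])]
    if not matches:
        return None
    best = max(matches, key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen)
    return best["next_hop"]


def nsg_decision(rules, destination, port):
    """Evaluate custom rules by ascending priority; the first match decides."""
    dest = ipaddress.ip_address(destination)
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if dest in ipaddress.ip_network(rule["destination"]) and port in rule["ports"]:
            return rule["access"]
    return "falls through to default rules"  # default rules not modeled here


routes = [
    {"prefix": "0.0.0.0/0", "next_hop": "Internet"},
    {"prefix": "10.0.0.0/16", "next_hop": "VnetLocal"},
    {"prefix": "10.1.0.0/16", "next_hop": "VirtualAppliance"},  # e.g. a firewall
]
rules = [
    {"priority": 100, "destination": "10.1.0.0/16", "ports": {443}, "access": "Allow"},
    {"priority": 200, "destination": "10.1.0.0/16", "ports": range(0, 65536), "access": "Deny"},
]

print(next_hop(routes, "10.1.2.5"), nsg_decision(rules, "10.1.2.5", 443))
print(next_hop(routes, "10.1.2.5"), nsg_decision(rules, "10.1.2.5", 22))
```

Changing an NSG rule never changes the answer from next_hop, which is why a wrong next hop cannot be fixed with security rules.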
Search indexer vs Skillset
Are you moving data into a search index, enriching it during indexing, or both?
An indexer reads and loads data; a skillset enriches content during indexing before it lands in the search index.
Use an indexer when the job is connecting to a supported data source, detecting changes, and loading documents into an index.
Use a skillset when content needs enrichment such as extraction, chunking, OCR, or AI transformation before it lands in the index.
- An indexer controls data source reads, schedules, field mappings, and indexing runs.
- A skillset controls enrichment steps that transform or add fields during indexing.
- A failed indexer run may be caused by data access, mapping, or skill execution, so inspect both when troubleshooting.
Adding a skillset to fix a data-loading problem. If the indexer cannot reach or map the source, enrichment will not solve it.
- List the data source, index, indexer, and skillset before editing any one of them.
- Run or inspect indexer status in a non-production window if reindexing can change visible search results.
- Capture failing field names, skill errors, and document counts before changing mappings.
Service Bus Queue vs Topic
Should each message go to one consumer path, or be published to multiple subscriptions?
A queue delivers each message to one of its competing consumers (point-to-point); a topic publishes each message to every subscription whose filter matches.
Use a Service Bus queue for point-to-point work where one receiver should process each message.
Use a Service Bus topic when publishers should send once and multiple subscriptions receive filtered copies.
- Queues model competing consumers for one work stream.
- Topics add subscriptions and filters for pub/sub fan-out.
- Dead-letter handling, sessions, duplicate detection, and lock duration still matter in both.
Building fan-out by adding multiple consumers to one queue, then discovering only one consumer gets each message.
- Confirm whether the business event has one owner or multiple independent subscribers.
- Inspect subscription filters and dead-letter counts before routing changes.
- Test with duplicate, failed, and delayed messages before production rollout.
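The delivery difference behind the common trap can be simulated in plain Python. With a queue, each message reaches exactly one receiver. With a topic, each matching subscription gets its own copy. Round-robin assignment is only an illustrative stand-in: real Service Bus receivers compete for messages, so the split is not deterministic. The message shape and filter lambdas are hypothetical stand-ins for subscription rules.

```python
from itertools import cycle


def queue_delivery(messages, consumers):
    """Competing consumers: each message goes to exactly one receiver."""
    received = {c: [] for c in consumers}
    for msg, consumer in zip(messages, cycle(consumers)):
        received[consumer].append(msg["id"])
    return received


def topic_delivery(messages, subscriptions):
    """Fan-out: every subscription whose filter matches gets its own copy."""
    received = {name: [] for name in subscriptions}
    for msg in messages:
        for name, matches in subscriptions.items():
            if matches(msg):
                received[name].append(msg["id"])
    return received


messages = [
    {"id": 1, "type": "order"},
    {"id": 2, "type": "refund"},
    {"id": 3, "type": "order"},
]

print(queue_delivery(messages, ["billing", "shipping"]))
print(topic_delivery(messages, {
    "billing": lambda m: True,                  # wants every event
    "shipping": lambda m: m["type"] == "order",  # filtered subscription
}))
```

With the queue, billing never sees message 2. With the topic, billing sees all three messages and shipping sees only the orders.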
Service principal secret vs federated credential
Should automation store a secret or use workload federation to avoid long-lived credentials?
A client secret is a stored credential; a federated credential lets a workload exchange a token from a trusted external identity provider for a Microsoft Entra token, without a long-lived secret.
Use a service principal secret only when federation is unavailable and rotation, storage, and access controls are strong.
Use a federated credential when CI/CD or workload identity can trust an external identity provider without storing a client secret.
- Secrets must be stored, rotated, and protected.
- Federation reduces secret sprawl but requires correct issuer, subject, and audience configuration.
- Exam scenarios often reward eliminating stored secrets when a supported federation path exists.
Treating a secret in a pipeline variable as equivalent to identity-based federation.
- Inventory app credentials before rotation.
- Confirm federated issuer/subject/audience values.
- Use least-privilege role assignments at the narrowest useful scope.
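Most federation failures come down to a claim mismatch, which the Python sketch below checks. It compares a token's iss, sub, and aud claims against a configured federated credential. It does not validate signatures or expiry. The issuer and audience shown are the values commonly used for GitHub Actions. The repository name contoso/app is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FederatedCredential:
    issuer: str
    subject: str
    audiences: tuple


def token_mismatches(credential, claims):
    """List which claims fail to match; an empty list means the claims line up."""
    problems = []
    if claims.get("iss") != credential.issuer:
        problems.append("issuer")
    if claims.get("sub") != credential.subject:
        problems.append("subject")
    aud = claims.get("aud")
    auds = aud if isinstance(aud, list) else [aud]  # JWT aud may be a string or a list
    if not any(a in credential.audiences for a in auds):
        problems.append("audience")
    return problems


cred = FederatedCredential(
    issuer="https://token.actions.githubusercontent.com",
    subject="repo:contoso/app:ref:refs/heads/main",  # hypothetical repo and branch
    audiences=("api://AzureADTokenExchange",),
)

main_run = {"iss": cred.issuer, "sub": "repo:contoso/app:ref:refs/heads/main",
            "aud": "api://AzureADTokenExchange"}
pr_run = dict(main_run, sub="repo:contoso/app:pull_request")

print(token_mismatches(cred, main_run))
print(token_mismatches(cred, pr_run))  # a pull-request run has a different subject
```

A pipeline that works on main but fails on pull requests usually points to this subject mismatch, not to a missing role assignment.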
Service SAS vs User Delegation SAS
Should access be signed with storage account authority, or delegated from an Entra-authenticated user?
A service SAS is signed with storage account credentials; a user delegation SAS is secured with Microsoft Entra credentials.
Use a service SAS when you intentionally need a SAS signed with account key authority for supported storage resources.
Use a user delegation SAS for Blob access when Entra-based delegation is available and stronger identity governance is preferred.
- Service SAS relies on storage account key material and can be broad if not scoped carefully.
- User delegation SAS is tied to Entra authentication and is preferred for many Blob scenarios.
- Both still require tight expiry, permissions, IP/protocol restrictions, and evidence of who issued the token.
Issuing a long-lived broad SAS because it is quick, then losing track of who can access the data.
- Confirm resource type, required permissions, expiry, IP range, and protocol before issuing SAS.
- Prefer user delegation where supported and operationally practical.
- Record the issuing identity, scope, expiry, and revocation/rotation path.
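Recording the issuing identity, scope, and expiry can start from the SAS URL itself, as this Python sketch shows. It reads standard SAS query parameters: sr (signed resource), sp (permissions), se (expiry), sip (IP range), spr (protocol), and skoid (signing object ID, present on user delegation SAS). A token without skoid is treated here as key-signed. The URL, GUIDs, and signature are placeholders.

```python
from urllib.parse import parse_qs, urlsplit


def describe_sas(url):
    """Extract the fields worth recording for any SAS that gets issued."""
    q = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    return {
        "kind": "user delegation" if "skoid" in q else "key-signed",
        "issuer_object_id": q.get("skoid"),  # Entra object that requested the delegation key
        "resource": q.get("sr"),
        "permissions": q.get("sp"),
        "expiry": q.get("se"),
        "ip_range": q.get("sip"),
        "protocol": q.get("spr"),
    }


url = ("https://contoso.blob.core.windows.net/reports/q1.csv"
       "?sv=2022-11-02&sr=b&sp=r&st=2025-01-01T08%3A00%3A00Z&se=2025-01-01T09%3A00%3A00Z"
       "&spr=https&skoid=00000000-0000-0000-0000-000000000001"
       "&sktid=00000000-0000-0000-0000-000000000002&sig=placeholder")

record = describe_sas(url)
print(record["kind"], record["issuer_object_id"], record["expiry"])
```

A key-signed token carries no issuer identity in the URL, so its owner has to be recorded separately at issue time.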
Speech to text vs Speech translation
Does the app need transcription only, or transcription plus translation?
Speech to text transcribes spoken audio; speech translation transcribes and translates speech across languages.
Use speech to text when the output should preserve the source language as text.
Use speech translation when the scenario requires translated text or multilingual speech workflows.
- Speech to text is the base transcription workload.
- Speech translation adds language translation requirements and related quality considerations.
- Both require attention to language, audio quality, region, latency, and privacy requirements.
Choosing translation when the scenario only asks for transcription or captions in the original language.
- Identify source language, target language, and latency requirement.
- Test representative audio, accents, and noise conditions.
- Check endpoint and key handling before release.
Storage account key vs SAS
Does the consumer need full account-level key access, or a scoped, time-bound token?
Storage account keys grant broad shared-key access; SAS tokens delegate scoped access for a time, permission set, and resource scope. Also covers: SAS vs storage account access key.
Use storage account keys only when account-level shared-key access is explicitly required and tightly controlled.
Use SAS when the need can be scoped to specific resources, permissions, time windows, IP ranges, or protocols.
- Account keys can grant broad access across the storage account.
- SAS can narrow access but must be generated and governed carefully.
- Disabling shared key, rotating keys, or revoking SAS can break dependent apps if not inventoried first.
Sharing an account key for a narrow upload/download task that should have used a scoped SAS or managed identity.
- Inventory apps using account keys, connection strings, and SAS before rotation or disabling shared key.
- Prefer the narrowest access method that satisfies the consumer.
- Capture expiry, permissions, and owner for every SAS issued.
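This Python sketch shows why an account key is broader than any single SAS. Shared-key and key-signed SAS signatures are Base64(HMAC-SHA256(decoded key, string-to-sign)), so anyone holding the key can mint a valid signature for any string-to-sign. Regenerating the key also invalidates every signature made with it. The string-to-sign below is a simplified, hypothetical stand-in, not the real versioned SAS format, and the keys are demo values.

```python
import base64
import hashlib
import hmac


def sign(account_key_b64, string_to_sign):
    """Base64(HMAC-SHA256(decoded key, UTF-8 string-to-sign))."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(current_key_b64, string_to_sign, presented_sig):
    """A signature only verifies while the key that produced it is still active."""
    return hmac.compare_digest(sign(current_key_b64, string_to_sign), presented_sig)


# Demo keys and a simplified string-to-sign (NOT the real SAS format).
key1 = base64.b64encode(b"demo-account-key-1").decode()
key1_regenerated = base64.b64encode(b"demo-account-key-1-regenerated").decode()
to_sign = "r\n2025-01-01T00:00:00Z\n2025-01-01T01:00:00Z\n/blob/contoso/reports/q1.csv"

sas_sig = sign(key1, to_sign)
print(verify(key1, to_sign, sas_sig))              # key still active
print(verify(key1_regenerated, to_sign, sas_sig))  # regeneration invalidates it
```

The same property explains the rotation warning above: every app holding a SAS signed with the old key breaks at the moment of regeneration.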
Storage Queue vs Service Bus Queue
Is this a simple storage-backed queue, or enterprise messaging with broker features?
Storage Queue is simple queue storage; Service Bus Queue is an enterprise broker with sessions, dead-lettering, and advanced messaging. Also covers: Queue Storage vs Service Bus.
Use Storage Queue for simple asynchronous work items where basic queue semantics are enough.
Use Service Bus Queue when the workload needs richer broker features such as sessions, dead-lettering, duplicate detection, transactions, or advanced retry behavior.
- Storage Queue is simpler and tied to storage account patterns.
- Service Bus Queue is built for messaging semantics and operational controls.
- Poison-message handling, ordering, lock duration, and delivery guarantees are the deciding factors.
Starting with Storage Queue for a business process that later needs dead-letter inspection, sessions, or stronger delivery semantics.
- Define ordering, retry, duplicate, poison-message, and dead-letter requirements.
- Inspect queue length, age, failures, and consumer behavior before migration.
- Test failure and replay behavior before production cutover.
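A handler that always fails shows the poison-message difference, simulated here in Python. Service Bus dead-letters a message once its delivery count exceeds MaxDeliveryCount, which defaults to 10. Storage Queue only exposes a dequeue count, so any poison handling is up to application code. The loop bounds, the poison threshold, and the result strings are illustrative assumptions, not SDK behavior.

```python
def run_service_bus(handler, body, max_delivery_count=10):
    """Broker-managed: after max_delivery_count failed deliveries, the broker dead-letters."""
    for delivery_count in range(1, max_delivery_count + 1):
        try:
            handler(body)
            return "completed", delivery_count
        except Exception:
            pass  # abandon or lock expiry: the broker redelivers and counts it
    return "dead-lettered by broker", max_delivery_count


def run_storage_queue(handler, body, poison_threshold=None, max_reads=20):
    """App-managed: only code that checks dequeue_count stops the retries."""
    for dequeue_count in range(1, max_reads + 1):
        if poison_threshold is not None and dequeue_count > poison_threshold:
            return "moved to poison queue by app code", dequeue_count
        try:
            handler(body)
            return "completed", dequeue_count
        except Exception:
            pass  # message becomes visible again after the visibility timeout
    return "still in queue, no broker dead-letter", max_reads


def always_fails(body):
    raise ValueError("bad payload")


print(run_service_bus(always_fails, "order-42"))
print(run_storage_queue(always_fails, "order-42"))
print(run_storage_queue(always_fails, "order-42", poison_threshold=5))
```

Without the threshold, the Storage Queue message keeps reappearing until its time-to-live expires. That is the gap teams discover when a business process outgrows basic queue semantics.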
System-assigned vs User-assigned managed identity
Should the identity live and die with one resource, or be reusable across resources?
A system-assigned identity is tied to one Azure resource lifecycle; a user-assigned identity is a standalone reusable identity that can be attached to multiple resources.
Use a system-assigned managed identity when one Azure resource needs its own identity tied to that resource lifecycle.
Use a user-assigned managed identity when multiple resources need the same identity or when identity lifecycle must be managed separately.
- System-assigned identity is created and deleted with the resource.
- User-assigned identity is a standalone Azure resource that can be attached to multiple workloads.
- Role assignments and cleanup differ; deleting a workload may not remove user-assigned identity access.
Using a user-assigned identity for convenience and forgetting that its role assignments survive after workloads are deleted.
- List attached resources and role assignments for the identity before changing it.
- Choose system-assigned for simple one-resource access; user-assigned for reuse or stable identity needs.
- Audit stale user-assigned identities and remove unused role assignments.
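This Python sketch models what a workload deletion leaves behind. The system-assigned principal goes away with the resource. Its role assignment record may linger as an orphan, but nothing can use it. The user-assigned identity survives, along with its usable role assignments. All names are hypothetical. The two role names are real Azure built-in roles, used only as labels.

```python
workloads = {
    "func-a": {"system_identity": "func-a-sys", "user_identities": ["uami-shared"]},
}
identities = {"func-a-sys", "uami-shared"}  # principals that currently exist
user_assigned = {"uami-shared"}             # standalone identity resources
role_assignments = [
    {"principal": "func-a-sys", "role": "Storage Blob Data Reader", "scope": "st-reports"},
    {"principal": "uami-shared", "role": "Key Vault Secrets User", "scope": "kv-prod"},
]


def delete_workload(name):
    workload = workloads.pop(name)
    if workload.get("system_identity"):
        # System-assigned identity is deleted with the resource; any leftover
        # role assignment record has no principal that can use it.
        identities.discard(workload["system_identity"])
    # User-assigned identities are separate resources and are NOT deleted here.


def usable_access():
    """Role assignments whose principal still exists can still grant access."""
    return [ra for ra in role_assignments if ra["principal"] in identities]


def unattached_user_assigned():
    """User-assigned identities that still exist but no workload references."""
    attached = {uid for w in workloads.values() for uid in w.get("user_identities", [])}
    return sorted(uid for uid in user_assigned if uid in identities and uid not in attached)


delete_workload("func-a")
print(usable_access())
print(unattached_user_assigned())
```

After the delete, Key Vault access through uami-shared is still live even though no workload uses it. The stale-identity audit above is meant to catch exactly this.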
Tenant vs subscription
Is the exam scenario asking about identity/directory scope or resource/billing/deployment scope?
A tenant is the Microsoft Entra identity boundary; a subscription is an Azure billing and resource-management boundary.
Use tenant when the scenario involves users, groups, app registrations, identities, consent, and directory-wide controls.
Use subscription when the scenario involves resources, quotas, policies, role assignments, deployments, and billing/cost scope.
- Tenant and subscription are related but not interchangeable.
- RBAC role assignments can be scoped under subscriptions, but identities come from a tenant.
- Many mistakes come from changing the wrong account, tenant, or subscription in CLI context.
Answering a subscription question with a tenant control, or running commands in the wrong subscription after switching directories.
- Run az account show before lab commands.
- Confirm tenant ID and subscription ID separately.
- Use runbooks for RBAC and subscription-scope changes.
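One way to make "confirm tenant ID and subscription ID separately" routine is a small Python pre-flight check over the JSON that az account show prints. In that output, id is the subscription ID and tenantId is the directory. The sample below uses placeholder GUIDs and only the fields the check reads. The current_context_json helper assumes an installed, logged-in Azure CLI and is not called in the example.

```python
import json
import subprocess


def current_context_json():
    """Read the active CLI context (requires an installed, logged-in Azure CLI)."""
    return subprocess.run(
        ["az", "account", "show", "--output", "json"],
        capture_output=True, text=True, check=True,
    ).stdout


def context_problems(account_json, expected_tenant, expected_subscription):
    """Check tenant and subscription separately; they are different boundaries."""
    ctx = json.loads(account_json)
    problems = []
    if ctx.get("tenantId") != expected_tenant:
        problems.append(f"tenant {ctx.get('tenantId')} != expected {expected_tenant}")
    if ctx.get("id") != expected_subscription:
        problems.append(f"subscription {ctx.get('id')} != expected {expected_subscription}")
    return problems


TENANT = "22222222-2222-2222-2222-222222222222"
LAB_SUB = "11111111-1111-1111-1111-111111111111"

# Placeholder output shaped like `az account show`.
sample = json.dumps({"id": LAB_SUB, "name": "lab-subscription", "tenantId": TENANT})
print(context_problems(sample, TENANT, LAB_SUB))
print(context_problems(sample, TENANT, "33333333-3333-3333-3333-333333333333"))
```

The second call models the common mistake: the right tenant but the wrong subscription, which a tenant-only check would miss.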
User delegation SAS vs Account SAS
Can the SAS be based on Entra delegation, or must it be signed with account key authority?
User delegation SAS uses Microsoft Entra credentials for Blob access; account SAS relies on storage account keys and can authorize broader account operations.
Use user delegation SAS for Blob scenarios where Entra-authenticated delegation is supported and preferred.
Use account SAS only when the target resource type or operation needs account-key signing and the risk is accepted.
- User delegation SAS avoids direct account-key use and ties issuance to Entra authorization.
- Account SAS can cover broader service/resource types but inherits account-key risk.
- Both need narrow permissions, short expiry, and owner-recorded evidence.
Using account SAS by habit even when user delegation SAS would provide a better identity-governed path.
- Confirm the storage service, operation, and client support before choosing SAS type.
- Use short expiry and minimal permissions; record who issued the token and why.
- Review shared-key settings, key rotation impact, and active SAS dependencies.
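This Python sketch flags the review points for a SAS from its query string. Account SAS carries ss (signed services) and srt (signed resource types). A user delegation SAS carries skoid. It also flags lifetimes over a limit and permissions outside an allowed set. The 8-hour limit, the read-only default, and the assumption that st and se are full UTC timestamps are illustrative choices, not Azure requirements. URLs and signatures are placeholders.

```python
from datetime import datetime, timedelta
from urllib.parse import parse_qs, urlsplit

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumes full UTC timestamps in st/se


def review_sas(url, max_lifetime=timedelta(hours=8), allowed_permissions="r"):
    q = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    findings = []
    if "ss" in q or "srt" in q:
        findings.append("account SAS: account-key signed, can span services and resource types")
    elif "skoid" not in q:
        findings.append("service SAS: account-key signed")
    if "st" in q and "se" in q:
        lifetime = datetime.strptime(q["se"], TIME_FORMAT) - datetime.strptime(q["st"], TIME_FORMAT)
        if lifetime > max_lifetime:
            findings.append(f"lifetime {lifetime} exceeds {max_lifetime}")
    extra = set(q.get("sp", "")) - set(allowed_permissions)
    if extra:
        findings.append("permissions beyond allowed set: " + "".join(sorted(extra)))
    return findings


account_url = ("https://contoso.blob.core.windows.net/?sv=2022-11-02&ss=bf&srt=sco&sp=rwdl"
               "&st=2025-01-01T00%3A00%3A00Z&se=2025-12-31T00%3A00%3A00Z&sig=placeholder")
delegation_url = ("https://contoso.blob.core.windows.net/reports/q1.csv?sv=2022-11-02&sr=b&sp=r"
                  "&st=2025-01-01T08%3A00%3A00Z&se=2025-01-01T09%3A00%3A00Z"
                  "&skoid=00000000-0000-0000-0000-000000000001"
                  "&sktid=00000000-0000-0000-0000-000000000002&sig=placeholder")

for finding in review_sas(account_url):
    print(finding)
print(review_sas(delegation_url))
```

The account SAS trips all three checks, while the short-lived, read-only user delegation token passes cleanly.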
Vector Search vs Semantic Ranker
Do users need similarity over embeddings, language-aware reranking of text results, or a hybrid of both?
Vector search finds embedding similarity; semantic ranker reranks text results with deeper language understanding. Also covers: Semantic ranking vs Vector search; Vector search vs semantic ranker.
Use vector search when meaning similarity matters, such as finding related passages even when users do not use exact keywords.
Use semantic ranker when text results need language-aware reranking, captions, or answer-style improvements after retrieval.
- Vector search depends on embeddings, vector fields, chunking strategy, and the distance or similarity metric.
- Semantic ranker works over text results and semantic configuration; it does not replace an embedding strategy.
- Hybrid retrieval can combine keyword, vector, and semantic ranking, but each layer adds cost and tuning complexity.
Turning on every relevance feature at once and then not knowing whether vector quality, keyword matching, or semantic ranking caused the result.
- Create a small relevance test set with expected documents before changing ranking behavior.
- Check index fields, vector dimensions, semantic configuration, and query mode before production changes.
- Measure quality and latency separately for keyword, vector, semantic, and hybrid queries.
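A small relevance test set can be scored per retrieval mode, as in this Python sketch. It ranks documents by cosine similarity over toy 3-dimensional vectors, which stand in for real embeddings, and compares recall@k against a hypothetical keyword ranking. Semantic ranker is a service-side reranking step and is not modeled here. The point is only that each mode gets its own score before modes are combined.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_ranking(query_vec, doc_vecs):
    """Order document IDs by cosine similarity to the query vector."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def recall_at_k(ranked, expected, k):
    """Fraction of expected documents that appear in the top k."""
    return len(set(ranked[:k]) & set(expected)) / len(expected)


# Toy embeddings (hypothetical) for three help articles.
doc_vecs = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-faq": [0.1, 0.9, 0.1],
    "account-recovery": [0.8, 0.2, 0.1],
}
query_vec = [1.0, 0.1, 0.0]  # stand-in embedding for "can't sign in"
expected = {"reset-password", "account-recovery"}

keyword_ranked = ["billing-faq", "reset-password", "account-recovery"]  # hypothetical
vector_ranked = vector_ranking(query_vec, doc_vecs)

print("keyword recall@2:", recall_at_k(keyword_ranked, expected, 2))
print("vector recall@2:", recall_at_k(vector_ranked, expected, 2))
```

Keeping a separate score per mode makes it possible to tell which layer moved the result when hybrid or semantic ranking is switched on later.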
Virtual Machine Scale Set vs Availability set
Do you need a managed scalable fleet, or fixed VM placement for availability of individually managed VMs?
A VM Scale Set manages a group of VM instances as a scalable fleet; an availability set spreads individual VMs across fault and update domains.
Use VM Scale Sets when you need autoscale, uniform/flexible orchestration, rolling upgrades, or a fleet of similar VM instances.
Use availability sets when you have individually managed VMs that need fault/update-domain placement without fleet scaling.
- Scale sets are about fleet management and scaling.
- Availability sets are about VM placement across fault/update domains.
- Scale sets can integrate with load balancers and autoscale rules; availability sets do not provide that fleet lifecycle by themselves.
Using an availability set when the real requirement is autoscale and rolling instance management.
- Clarify whether the workload needs instance individuality or fleet behavior.
- Review load balancing, image, upgrade, and autoscale requirements.
- Check VM, disk, and network dependencies before migration.
VNet peering vs VPN Gateway
Are you connecting Azure VNets directly, or do you need encrypted tunnel connectivity across on-premises, branches, or separate network boundaries?
VNet peering connects Azure virtual networks through Microsoft backbone routing; VPN Gateway connects networks through encrypted tunnels, often across on-premises or separate environments.
Use VNet peering for low-latency private connectivity between Azure virtual networks when address spaces do not overlap and transitive routing is not assumed.
Use VPN Gateway when you need IPsec/IKE tunnel connectivity, hybrid connectivity, point-to-site users, or network boundaries that require gateway mediation.
- Peering is not a VPN tunnel and does not automatically provide transitive routing.
- VPN Gateway introduces gateway SKUs, tunnel configuration, routing, and availability design.
- Peering is usually simpler for same-cloud VNet-to-VNet connectivity; gateways are required for many hybrid scenarios.
Building a hub-and-spoke design with peering but forgetting that spoke-to-spoke traffic does not transit the hub unless routing and gateway transit are explicitly designed.
- Inventory address spaces and route tables before connecting networks.
- Check DNS resolution and private endpoint dependencies before cutover.
- Use Resource Graph and network runbooks before changing routes.
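This Python sketch runs two pre-checks before connecting networks. First it looks for overlapping address spaces, which prevent peering. Then it checks that two spokes peered only to a hub have no direct peering, because peering is not transitive. The VNet names and ranges are hypothetical. The sketch does not model gateway transit, user-defined routes, or network virtual appliances, which are what a design would add to make spoke-to-spoke traffic flow.

```python
import ipaddress
from itertools import combinations


def overlapping(vnets):
    """Pairs of VNets whose address spaces overlap (these cannot be peered)."""
    found = []
    for (a, a_spaces), (b, b_spaces) in combinations(vnets.items(), 2):
        if any(ipaddress.ip_network(x).overlaps(ipaddress.ip_network(y))
               for x in a_spaces for y in b_spaces):
            found.append((a, b))
    return found


def directly_peered(peerings, src, dst):
    """Peering is not transitive: only a direct peering connects two VNets."""
    return frozenset((src, dst)) in peerings


vnets = {
    "hub": ["10.0.0.0/16"],
    "spoke1": ["10.1.0.0/16"],
    "spoke2": ["10.2.0.0/16"],
    "spoke3": ["10.1.128.0/17"],  # collides with spoke1
}
peerings = {frozenset(("hub", "spoke1")), frozenset(("hub", "spoke2"))}

print(overlapping(vnets))
print(directly_peered(peerings, "spoke1", "spoke2"))  # both reach the hub, not each other
```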
These cards help learners explain why one Azure option fits a scenario better than another, then jump into related terms, safe command paths, and Guided Azure Runbooks.
Decision support, not another definition page.
Glossary pages explain terms. Runbooks show safe operational steps. Commands map CLI risk. Compare helps you choose between Azure options before the wrong design, security, cost, or deployment choice becomes expensive.
Methodology explains sourcing, limitations, update rules, and why AzureGlossary links back to Microsoft Learn instead of pretending to replace it.
Services stays separate so Compare remains useful first, not sales-led. Use it when you want a founder-led review applied to your Azure environment.
AzureGlossary gives the plain-English map: what the choice means, what it touches, and which safe checks to run before implementation details.