
AKS network policy

AKS network policy is the way teams control which pods are allowed to talk to other pods, namespaces, or network destinations inside an AKS cluster. By default, pods in an AKS cluster can send and receive traffic without restriction, so workloads can reach more than they should. Network policy lets you move toward least-privilege traffic rules, such as allowing an API to receive traffic only from a front-end namespace or allowing egress only to DNS and a specific backend. It is microsegmentation for Kubernetes workloads, enforced through the cluster’s networking policy engine.

Aliases
Azure Kubernetes Service network policy, aks network policy
Difficulty
Intermediate
CLI mappings
5
Last verified
2026-05-30T03:55:00Z

Microsoft Learn

An AKS network policy is a Kubernetes traffic-control policy used to restrict pod communication in Azure Kubernetes Service. Microsoft Learn explains that AKS can enforce policies through supported engines such as Cilium, Azure Network Policy Manager, or Calico, depending on the cluster networking configuration.

Microsoft Learn: Secure traffic between pods with network policies in Azure Kubernetes Service (AKS), 2026-05-30T03:55:00Z

Technical context

Technically, AKS network policy uses Kubernetes NetworkPolicy objects and the selected AKS networking policy implementation to enforce label, namespace, port, ingress, and egress rules. It sits below application authentication and above the underlying virtual network controls. Network security groups protect node and subnet traffic, but network policies are better suited for pod-to-pod segmentation. In current AKS designs, the available policy engine depends on cluster networking choices such as Azure CNI powered by Cilium, Azure Network Policy Manager, or Calico. Operators must test DNS, probes, service mesh, ingress, and dependency traffic before enforcement.
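As a minimal sketch of the label, namespace, and port rules described above, the following shell snippet writes a NetworkPolicy manifest that lets pods in a front-end namespace reach API pods on one port. The namespace names (frontend, api), the app: api label, port 8080, and the file name are hypothetical. The kubernetes.io/metadata.name label is the one Kubernetes sets automatically on each namespace.

```shell
#!/usr/bin/env bash
# Write an example ingress policy to a local file. Nothing is applied to a cluster.
# Assumed names: namespaces "frontend" and "api", pod label app=api, TCP port 8080.
cat > allow-frontend-to-api.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: api
spec:
  # Select the API pods this policy protects.
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Only pods in the front-end namespace may connect.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: frontend
      ports:
        - protocol: TCP
          port: 8080
EOF
echo "wrote allow-frontend-to-api.yaml"
```

Once any policy selects a pod for ingress, ingress traffic that no policy allows is dropped for that pod, so this single manifest also blocks other callers of the selected API pods.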

Why it matters

AKS network policy matters because a compromised or buggy pod should not have free movement across every namespace. Modern clusters often run many services, jobs, agents, and platform components together. Without segmentation, one vulnerable workload can scan internal services, call sensitive APIs, or reach databases it should never touch. Network policy creates an enforceable traffic contract that supports zero trust, compliance, and safer multi-team clusters. It also makes dependency assumptions visible. When teams write policies carefully, they learn which services actually need to communicate. When they skip policy, security reviews depend on diagrams and trust rather than enforceable cluster behavior.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In AKS cluster configuration, az aks show reveals the network plugin, network policy option, and dataplane choices that determine which policy engine enforces pod traffic.

Signal 02

In Kubernetes manifests and kubectl output, NetworkPolicy resources show pod selectors, namespace selectors, ports, ingress rules, egress rules, and the namespaces where segmentation applies.

Signal 03

In incident traces, blocked calls appear as timeouts, failed DNS resolution, refused backend connections, missing health probes, or service mesh errors after a policy rollout.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Apply default-deny ingress in a payments namespace while allowing only approved front-end and job namespaces to call specific ports.
  • Restrict egress from application pods so compromised containers cannot scan internal services or reach arbitrary internet destinations.
  • Segment shared AKS clusters by team or environment when namespaces host workloads with different sensitivity levels.
  • Prove compliance segmentation by exporting NetworkPolicy manifests, namespace labels, and test results for regulated workloads.
  • Safely introduce Cilium, Azure NPM, or Calico policy enforcement by testing DNS, ingress, probes, and backend dependencies first.
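The default-deny scenario in the first bullet can be sketched as a manifest with an empty pod selector and no ingress rules. The payments namespace name comes from the bullet above; the policy name and file name are hypothetical.

```shell
#!/usr/bin/env bash
# Write a default-deny ingress policy for the "payments" namespace to a local file.
# An empty podSelector selects every pod in the namespace; listing Ingress under
# policyTypes with no ingress rules means no inbound traffic is allowed.
cat > payments-default-deny-ingress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF
echo "wrote payments-default-deny-ingress.yaml"
```

NetworkPolicy rules are additive, so approved front-end and job namespaces would be opened by separate allow policies in the same namespace rather than by editing this baseline.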

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Education platform contains tenant-to-tenant traffic

An online learning provider used network policy to protect shared AKS namespaces.

Scenario

An online education platform hosted course APIs, grading workers, and analytics services on shared AKS clusters. A penetration test showed that a compromised worker pod could reach APIs belonging to other tenant groups.

Business/Technical Objectives
  • Block cross-tenant namespace traffic by default.
  • Allow only documented API, DNS, ingress, and monitoring paths.
  • Prove segmentation to university security reviewers.
  • Avoid breaking grading jobs during exam weeks.
Solution Using AKS network policy

Platform engineers inventoried service dependencies with application owners, then added namespace labels for tenant group, environment, and shared platform services. They introduced default-deny ingress and egress policies in a staging cluster first. Allow rules permitted front-end namespaces to call course APIs on approved ports, grading workers to reach queues and databases through controlled endpoints, DNS to kube-system, and telemetry agents to monitoring destinations. Azure CLI confirmed the AKS policy engine, while kubectl exports captured NetworkPolicy resources and labels for review. Policies were deployed through GitOps with automated connectivity tests that checked allowed calls and expected denials before promotion.

Results & Business Impact
  • Penetration testers could no longer reach unrelated tenant APIs from compromised worker pods.
  • Exam-week grading job failures stayed at zero during the staged rollout.
  • Security evidence collection dropped from manual screenshots to a 20-minute manifest and test export.
  • The platform team identified and removed 14 undocumented service-to-service paths.
Key Takeaway for Glossary Readers

AKS network policy turns namespace boundaries into enforceable traffic rules instead of relying on application teams to behave perfectly.

Case study 02

Factory telemetry cluster stops noisy lateral scans

A manufacturer used egress controls to reduce risk from third-party collector pods.

Scenario

An industrial equipment manufacturer accepted telemetry from plant-floor gateways through AKS collector services. Security monitoring found third-party collector pods attempting discovery traffic toward internal APIs they did not need.

Business/Technical Objectives
  • Limit collector pod egress to ingestion, DNS, and approved update endpoints.
  • Keep high-volume telemetry ingestion latency within existing limits.
  • Give plant security teams proof that collectors could not scan internal services.
  • Avoid changing NSG rules shared by other cluster workloads.
Solution Using AKS network policy

The AKS team mapped collector traffic by namespace, label, port, and destination service. Instead of changing subnet NSGs, they deployed Kubernetes NetworkPolicy resources that selected only collector pods. Egress rules allowed DNS, the ingestion API, and a controlled package update endpoint, while all other egress was denied. A test harness generated telemetry at production-like volume and compared latency before and after enforcement. Engineers captured az aks show output for the policy engine, kubectl policy descriptions, and packet-level connectivity test results for plant security. The rollout was phased by plant group so operators could roll back one namespace without affecting the rest of the cluster.
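An egress policy of the kind described here might look like the sketch below. It is not the manufacturer's actual manifest. The telemetry and ingestion namespaces, the app labels, and port 8443 are hypothetical, and the k8s-app: kube-dns label is assumed to match the cluster's CoreDNS pods, which should be confirmed with kubectl get pods --namespace kube-system --show-labels. The package update endpoint is left out because it would need an ipBlock rule with a CIDR that the source does not give.

```shell
#!/usr/bin/env bash
# Write an example egress policy for collector pods to a local file.
# Assumed names: namespace "telemetry", label app=collector, ingestion API in
# namespace "ingestion" with label app=ingestion-api on TCP 8443.
cat > collector-egress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: collector-egress
  namespace: telemetry
spec:
  podSelector:
    matchLabels:
      app: collector
  policyTypes:
    - Egress
  egress:
    # DNS lookups to the cluster DNS pods in kube-system.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Telemetry delivery to the ingestion API only.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingestion
          podSelector:
            matchLabels:
              app: ingestion-api
      ports:
        - protocol: TCP
          port: 8443
EOF
echo "wrote collector-egress.yaml"
```

Because the policy selects only pods labeled app=collector, other workloads in the namespace keep their current egress behavior, which matches the goal of leaving shared NSG rules untouched.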

Results & Business Impact
  • Unauthorized collector egress attempts dropped by 96 percent in the first week.
  • P95 telemetry ingestion latency changed by less than 3 milliseconds during load tests.
  • Plant security reviews used exported policy and test evidence instead of broad network diagrams.
  • No shared NSG changes were required, avoiding risk to unrelated services on the same subnet.
Key Takeaway for Glossary Readers

AKS network policy provides pod-level control when subnet-level rules are too broad for mixed workload clusters.

Case study 03

Payroll SaaS prevents policy rollout from breaking DNS

A payroll provider used staged network policy tests to avoid a self-inflicted outage.

Scenario

A payroll SaaS provider wanted default-deny network policy after a security audit, but an early prototype blocked DNS and readiness probes, causing pods to restart during a payroll preview window.

Business/Technical Objectives
  • Implement default-deny policies without disrupting payroll preview workloads.
  • Document required DNS, ingress, webhook, and monitoring allowances.
  • Reduce policy troubleshooting time for on-call engineers.
  • Make policy changes reviewable through the same release process as application manifests.
Solution Using AKS network policy

Engineers rebuilt the policy plan around dependency tests. They added explicit allow rules for CoreDNS, ingress controller health probes, admission webhooks, metrics scraping, and approved service calls before enforcing default deny. Each namespace had a policy readme explaining selectors and allowed paths. The GitOps pipeline ran kubectl diff, applied policies to a canary namespace, launched a short-lived test pod, and executed DNS, API, and denied-traffic checks. Azure Monitor alerts watched for new connection timeouts after rollout. On-call runbooks included commands to list matching labels, describe policies, and temporarily roll back the namespace policy manifest through Git.
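The test-pod step of such a pipeline could be approximated with the shell functions below. This is a sketch under assumptions, not the provider's pipeline: the function names, the busybox:1.36 image, and the 5-second timeout are all hypothetical choices, and a non-zero exit from the probe is simply treated as "blocked".

```shell
#!/usr/bin/env bash
# verdict EXPECTATION EXIT_CODE
# Prints PASS when the probe outcome matches the expectation, FAIL otherwise.
verdict() {
  local expectation="$1" exit_code="$2"
  if [ "$expectation" = "allowed" ] && [ "$exit_code" -eq 0 ]; then
    echo "PASS"
  elif [ "$expectation" = "denied" ] && [ "$exit_code" -ne 0 ]; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}

# probe NAMESPACE URL
# Starts a short-lived pod and attempts one HTTP request with a 5-second timeout.
# With --restart=Never and --attach, kubectl returns the container's exit code.
probe() {
  local namespace="$1" url="$2"
  kubectl run "netcheck-$RANDOM" --namespace "$namespace" \
    --image=busybox:1.36 --restart=Never --rm --attach --quiet \
    --command -- wget -q -T 5 -O /dev/null "$url"
}

# check_path EXPECTATION NAMESPACE URL
# Runs one probe and prints a PASS/FAIL line for the pipeline log.
check_path() {
  local expectation="$1" namespace="$2" url="$3" rc=0
  probe "$namespace" "$url" >/dev/null 2>&1 || rc=$?
  echo "$(verdict "$expectation" "$rc") $expectation $namespace -> $url"
}

# Example calls (hypothetical service URL), left commented so nothing runs on load:
# check_path allowed frontend http://api.api.svc.cluster.local:8080/healthz
# check_path denied  batch    http://api.api.svc.cluster.local:8080/healthz
```

A denied check can also pass for unrelated reasons, such as an image pull failure, so it is worth pairing each denied path with an allowed path from the same namespace. The probe pod carries no application labels, so policies that match on pod labels would need the same labels added to the test pod.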

Results & Business Impact
  • Network-policy-caused pod restarts dropped from 37 during prototype testing to zero in production rollout.
  • On-call diagnosis of blocked traffic fell from over an hour to about 12 minutes using the runbook.
  • All 18 production namespaces adopted a documented default-deny baseline within six weeks.
  • The audit team accepted automated connectivity evidence for segmentation controls.
Key Takeaway for Glossary Readers

AKS network policy succeeds when teams test platform dependencies first, not after a default-deny rule breaks production.

Why use Azure CLI for this?

I use Azure CLI and kubectl for AKS network policy because policy troubleshooting is about live state, not intentions. Azure CLI tells me which network plugin and policy engine the cluster was created with; kubectl shows the actual NetworkPolicy objects, namespace labels, pod labels, endpoints, and events. That combination is essential before blaming the firewall, DNS, ingress, or application code. CLI also lets teams automate preflight checks, export policy evidence, compare namespaces, and test changes in lower environments. The portal can show workload health, but it cannot replace command-line inspection when a selector typo silently blocks production traffic.

CLI use cases

  • Use az aks show to confirm the cluster network plugin, network policy setting, and dataplane before designing policies.
  • Use kubectl get and describe networkpolicy to inspect live selectors, ports, and rule direction in each namespace.
  • Compare pod and namespace labels with policy selectors to find why traffic is unexpectedly allowed or denied.
  • Run temporary connectivity tests from controlled pods to validate allowed and denied paths after a rollout.
  • Export policy manifests and test evidence for security review without relying on portal screenshots.
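For the label-comparison use case, one possible approach is to turn a policy's matchLabels into a label selector and list the pods it actually selects. The function names below are hypothetical, and the sketch reads only podSelector.matchLabels, so policies that use matchExpressions would need a manual check.

```shell
#!/usr/bin/env bash
# policy_selector NAMESPACE POLICY
# Prints the policy's podSelector.matchLabels as a kubectl selector (key=value,...).
# An empty result means the policy selects every pod in the namespace.
policy_selector() {
  local namespace="$1" policy="$2" raw
  raw=$(kubectl get networkpolicy "$policy" --namespace "$namespace" \
    --output go-template='{{range $k, $v := .spec.podSelector.matchLabels}}{{$k}}={{$v}},{{end}}') || return 1
  echo "${raw%,}"   # drop the trailing comma left by the template
}

# pods_selected_by NAMESPACE POLICY
# Lists the pods in the namespace that the policy's matchLabels currently select.
pods_selected_by() {
  local namespace="$1" policy="$2" selector
  selector=$(policy_selector "$namespace" "$policy") || return 1
  kubectl get pods --namespace "$namespace" --selector "$selector" --show-labels
}

# Example call (hypothetical names), left commented so nothing runs on load:
# pods_selected_by api allow-frontend-to-api
```

If the pod list is empty or misses the workload you expected, the policy selector and the pod labels have drifted apart, which is the usual cause of traffic being unexpectedly allowed or denied.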

Before you run CLI

  • Confirm the tenant, subscription, resource group, cluster, namespace, and kubeconfig context before inspecting or applying policies.
  • Check whether the cluster uses Cilium, Azure NPM, or Calico because troubleshooting commands and behavior can differ.
  • Start with read-only kubectl commands, then apply policies only through reviewed manifests or approved pipeline steps.
  • Map DNS, health probes, ingress controllers, service mesh components, monitoring agents, and backend dependencies before enforcing default deny.
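The first two checks in this list can be gathered into one read-only helper, sketched below. The function name is hypothetical. The queried fields (networkPlugin, networkPluginMode, networkPolicy, networkDataplane) are the names that appear under networkProfile in recent az aks show output; older CLI or API versions may return null for some of them.

```shell
#!/usr/bin/env bash
# show_targets CLUSTER RESOURCE_GROUP
# Prints the Azure account, the kubectl context, and the cluster network profile
# so the operator can confirm the target before touching any policy.
show_targets() {
  local cluster="$1" resource_group="$2"
  echo "Azure subscription and tenant:"
  az account show \
    --query '{subscription:name, subscriptionId:id, tenantId:tenantId}' \
    --output json
  echo "kubectl context:"
  kubectl config current-context
  echo "Cluster networking profile:"
  az aks show --name "$cluster" --resource-group "$resource_group" \
    --query 'networkProfile.{plugin:networkPlugin, pluginMode:networkPluginMode, policy:networkPolicy, dataplane:networkDataplane}' \
    --output json
}

# Example call with placeholder names, left commented so nothing runs on load:
# show_targets <cluster-name> <resource-group>
```

All three commands are read-only, so the helper is safe to run before every change window.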

What output tells you

  • Cluster output identifies the network policy capability and whether your planned policy style matches the deployed networking configuration.
  • NetworkPolicy output shows which pods are selected, which directions are controlled, and which ports or namespaces are allowed.
  • Pod and namespace label output explains why a policy did or did not match the workload you expected.
  • Connectivity test output proves whether the live cluster behavior matches the intended traffic contract after policy enforcement.

Mapped Azure CLI commands

AKS network policy inspection

direct
az aks show --name <cluster-name> --resource-group <resource-group> --query networkProfile --output json
kubectl get networkpolicy --all-namespaces
kubectl describe networkpolicy <policy-name> --namespace <namespace>
kubectl get pods --namespace <namespace> --show-labels
kubectl apply --filename network-policy.yaml
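Before running the apply command above, a preview step can show what would change. The sketch below wraps kubectl diff and a server-side dry run; the function names are hypothetical, and the exit-code mapping follows the kubectl diff convention of 0 for no differences, 1 for differences found, and greater than 1 for an error.

```shell
#!/usr/bin/env bash
# diff_meaning EXIT_CODE
# Translates a kubectl diff exit code into a short label.
diff_meaning() {
  case "$1" in
    0) echo "no-change" ;;
    1) echo "changes-pending" ;;
    *) echo "error" ;;
  esac
}

# preview_policy MANIFEST
# Shows the pending diff, then validates the manifest against the API server
# without persisting anything.
preview_policy() {
  local manifest="$1" rc=0
  kubectl diff --filename "$manifest" || rc=$?
  echo "diff result: $(diff_meaning "$rc")"
  [ "$rc" -le 1 ] || return "$rc"
  kubectl apply --dry-run=server --filename "$manifest"
}

# Example call, left commented so nothing runs on load:
# preview_policy network-policy.yaml
```

A dry run confirms that the manifest is valid, not that the resulting traffic rules are correct, so connectivity tests are still needed after the real apply.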

Architecture context

Architecturally, AKS network policy belongs in the cluster security and traffic design, alongside namespace strategy, ingress, service mesh, private endpoints, NSGs, route tables, DNS, and workload identity. I usually recommend starting with a dependency map, then applying default-deny rules to sensitive namespaces before expanding coverage. Policies should be owned with the application manifests or platform baseline, not edited manually in emergencies. The engine choice matters because Cilium, Azure NPM, and Calico have different capabilities, performance characteristics, and lifecycle considerations. Good architecture separates node-level network controls from pod-level segmentation and includes tests proving that allowed paths still work after policy rollout.

Security

Security impact is direct because AKS network policy controls east-west traffic between workloads. It reduces lateral movement, limits accidental access to internal APIs, and supports compliance requirements around segmentation. However, policy is only as strong as labels, namespace governance, and coverage. A default-deny policy with careless allow rules can still expose sensitive services. Missing egress controls may leave workloads able to reach unexpected destinations. Pair network policy with Kubernetes RBAC, image security, secrets management, workload identity, ingress controls, and Azure network boundaries. Monitor policy changes because a single broad selector can reopen paths that security teams believed were closed.

Cost

AKS network policy does not usually appear as a separate billable line item, but it influences cost through engineering time, node sizing, troubleshooting, and security risk. Some policy engines can add dataplane overhead or require networking choices that affect cluster design. Poorly tested policies create incident cost when teams spend hours tracing blocked DNS or backend calls. Missing policies create security cost if lateral movement causes an incident. Cost-conscious teams invest in reusable policy templates, automated tests, and clear namespace ownership. That effort is cheaper than responding to avoidable breaches or production outages caused by misunderstood service dependencies.

Reliability

Reliability risk is real because a correct-looking policy can block DNS, health probes, metrics agents, webhook calls, service mesh sidecars, or backend APIs. Roll out policies gradually, starting in nonproduction and using namespace-level default-deny with explicit allow rules. Keep rollback commands ready, and verify service readiness after each policy change. Reliable teams maintain dependency tests that exercise allowed traffic and confirm denied traffic. They also watch for selector drift when labels change during deployments. The goal is not maximum restriction on day one; it is enforceable segmentation that does not surprise applications, platform components, or incident responders.

Performance

Performance impact depends on the policy engine, rule volume, label design, and traffic pattern. Network policy enforcement adds dataplane work, but the bigger performance issue is often operational: blocked or delayed calls caused by missing egress, DNS, or probe allowances. Complex selectors and large rule sets can make troubleshooting slower and sometimes affect packet processing differently across engines. Measure latency and throughput for critical services before and after policy changes, especially in high-connection microservice environments. Keep policies specific but understandable. A policy that nobody can reason about will slow incident response even if raw packet performance remains acceptable.

Operations

Operators inspect AKS network policy with az aks show, kubectl get networkpolicy, kubectl describe, pod labels, namespace labels, endpoint lists, DNS tests, and application traces. Routine work includes confirming the cluster policy engine, reviewing default-deny coverage, testing allowed paths, documenting exceptions, and troubleshooting blocked traffic. During incidents, operators should identify the source pod, destination pod or service, namespace labels, ports, protocols, and policy selectors. Good teams keep network policy files in Git, review them with application releases, and export evidence for security assessments. They avoid manual one-off changes that drift away from source-controlled manifests. Keep evidence current after each release window.
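Evidence export for security assessments can be scripted along the lines below. The function name, the output file names, and the UTC timestamp format are hypothetical; the two kubectl commands are read-only.

```shell
#!/usr/bin/env bash
# export_evidence OUTPUT_DIR
# Saves all NetworkPolicy objects and namespace labels to timestamped files
# and prints the output directory on success.
export_evidence() {
  local out_dir="$1" stamp
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  mkdir -p "$out_dir" || return 1
  kubectl get networkpolicy --all-namespaces --output yaml \
    > "$out_dir/networkpolicies-$stamp.yaml" || return 1
  kubectl get namespaces --show-labels \
    > "$out_dir/namespace-labels-$stamp.txt" || return 1
  echo "$out_dir"
}

# Example call (hypothetical directory), left commented so nothing runs on load:
# export_evidence ./policy-evidence
```

Comparing the exported YAML with the manifests in Git is one way to spot manual changes that have drifted from source control.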

Common mistakes

  • Writing a default-deny policy before allowing DNS, ingress health probes, webhook calls, or monitoring traffic.
  • Using labels that deployment pipelines later change, causing policies to stop matching the intended pods.
  • Assuming NSGs provide pod-level segmentation when they mainly operate at subnet or node boundaries.
  • Deploying policies to production without testing the specific AKS network policy engine in lower environments.
  • Creating broad namespace selectors that reopen traffic paths the policy was meant to close.