
Streaming unit

A streaming unit is the capacity dial for an Azure Stream Analytics job. When the job needs more compute and memory to keep up with input events, you adjust streaming units rather than managing servers. More SUs can help a job process higher volume, but they are not magic. The query must be able to use the capacity, the inputs need useful partitioning, and the outputs must keep up. Operators treat SUs as a cost and performance lever that must be measured, not guessed.

Aliases: Azure Stream Analytics streaming unit, Stream Analytics SU, SUs, streaming units
Difficulty: fundamentals
CLI mappings: 5
Last verified: 2026-05-26T19:53:46Z

Microsoft Learn

Microsoft Learn defines Streaming Units, or SUs, as the compute resources allocated to run a Stream Analytics job. More SUs provide more CPU and memory, and Azure uses them to abstract the hardware required to process streaming queries in a timely way.

Microsoft Learn: Azure Stream Analytics streaming units explained (2026-05-26T19:53:46Z)

Technical context

In Azure architecture, streaming units belong to the Stream Analytics job runtime and scale model. They influence compute and memory available to the query, interact with streaming nodes, input partitions, query parallelization, compatibility level, and output bottlenecks. They are configured on jobs or allocated through dedicated Stream Analytics clusters depending on the hosting model. Operators observe their effect through SU utilization, CPU and memory metrics, watermark delay, backlogged input events, and output throughput. They sit between control-plane configuration and data-plane processing behavior.

Why it matters

Streaming units matter because they are where streaming performance, reliability, and cost meet. Too few SUs can create backlog, watermark delay, and missed operational timelines. Too many SUs can waste money or hide a query design that does not parallelize. A job with poor partition alignment, expensive joins, JavaScript functions, or slow outputs may not improve just because capacity increases. Engineers need to understand SUs before responding to incidents, planning load tests, or approving FinOps recommendations. The right SU setting is based on measured workload, query shape, partition strategy, and business latency target, not a default left untouched since deployment.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Stream Analytics Scale blade, operators see the current streaming-unit allocation and choose manual or autoscale capacity settings for a job during capacity reviews.

Signal 02

In Azure Monitor, SU utilization appears with watermark delay, backlogged input events, CPU, memory, and output metrics during performance investigations and incident-response capacity triage.

Signal 03

In Azure CLI job show output, the SKU or streaming unit field confirms the capacity currently assigned before or after a scaling change, which is useful during audits.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Increase capacity for a known seasonal event stream when measured SU utilization and watermark delay approach agreed thresholds.
  • Reduce unnecessary cost after a temporary incident scale-up once traffic, backlog, and watermark delay return to baseline.
  • Validate whether a slow job is capacity-bound or blocked by query shape, hot partitions, reference data, or output throttling.
  • Plan dedicated Stream Analytics cluster capacity by matching multiple jobs to expected SU demand and isolation requirements.
  • Set autoscale or runbook thresholds so operators scale based on evidence rather than portal guesswork during incidents.
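
The threshold idea in the last item can be sketched in a few lines of Python. This is a minimal illustration, not a recommended policy: the function name, the 30% scale-down threshold, and the way samples are summarized are assumptions made for the example, while the 80% utilization and two-minute delay figures simply echo the case studies on this page.

```python
from statistics import mean


def recommend_scaling(su_utilization_pct, watermark_delay_s,
                      high_util=80.0, low_util=30.0, max_delay_s=120.0):
    """Return 'scale-up', 'scale-down', or 'hold' from recent metric samples.

    All thresholds are illustrative defaults, not Azure guidance.
    """
    if not su_utilization_pct or not watermark_delay_s:
        return "hold"  # no evidence, so no change
    avg_util = mean(su_utilization_pct)
    worst_delay = max(watermark_delay_s)
    # Scale up only when both signals agree that the job is falling behind.
    if avg_util >= high_util and worst_delay >= max_delay_s:
        return "scale-up"
    # Scale down only when utilization is low and delay is well inside target.
    if avg_util <= low_util and worst_delay < max_delay_s / 2:
        return "scale-down"
    return "hold"


print(recommend_scaling([85, 90, 88], [100, 150, 200]))
print(recommend_scaling([85, 90], [10, 20]))
```

Requiring two signals to agree before scaling up is a deliberate choice in this sketch: high utilization with a healthy watermark delay returns "hold", which leaves room to check the query and outputs first.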

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Smart-meter program scales during evening demand peaks

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A municipal utility processed smart-meter readings every few minutes for outage and demand dashboards. During heat waves, the Stream Analytics job built backlog after 6 p.m., and operators could not tell whether the fix was more SUs or query redesign.

Business/Technical Objectives
  • Keep watermark delay under two minutes during evening demand peaks.
  • Avoid permanent over-provisioning after heat-wave traffic subsided.
  • Document when operators should scale versus tune the query.
  • Preserve downstream SQL Database stability during higher throughput.
Solution Using Streaming unit

Cloud engineers used Azure Monitor to compare SU utilization, watermark delay, Backlogged Input Events, and SQL output throttling. CLI captured the job configuration before a controlled SU increase from 6 to 12 during a test peak. The team also aligned Event Hubs partitions with the query partition key and reduced unnecessary columns written to SQL. A runbook defined scale-up thresholds, post-peak scale-down checks, and validation commands. Because the output database became the next bottleneck during testing, DBAs adjusted write capacity and indexing before the higher SU setting was allowed in production.

Results & Business Impact
  • Peak watermark delay dropped from 7.9 minutes to 83 seconds.
  • Post-peak scale-down saved an estimated 26% versus leaving the higher SU count all month.
  • SQL output throttling fell by 71% after schema trimming and index review.
  • Operators resolved the next heat-wave backlog without engineering escalation.
Key Takeaway for Glossary Readers

Streaming units are effective when scaling is paired with partition, query, and output validation rather than used as a blind knob.

Case study 02

Fintech fraud team stops over-scaling a serial query

Scenario

A fintech fraud platform streamed card events through Stream Analytics. The team repeatedly doubled streaming units when alerts lagged, but SU utilization stayed uneven and Watermark Delay kept rising during merchant promotion bursts.

Business/Technical Objectives
  • Determine why extra SUs did not reduce fraud-alert delay.
  • Cut unnecessary capacity spend without weakening detection coverage.
  • Make the query use parallel capacity where possible.
  • Give incident commanders a metric-based scaling decision tree.
Solution Using Streaming unit

Engineers exported the job and transformation with Azure CLI and reviewed the Stream Analytics query alongside partition metrics. They found a nonpartitioned aggregation step and a reference-data join that forced much of the workload through a narrow path. Instead of increasing SUs again, the team split merchant-level enrichment from cardholder velocity scoring, aligned Event Hubs partition keys with the scalable step, and kept a smaller SU increase only where utilization proved capacity-bound. The runbook now requires comparing SU utilization, partition backlog, output events, and query parallelization before any emergency scaling request is approved.

Results & Business Impact
  • Fraud-alert 95th percentile delay fell from 96 seconds to 28 seconds.
  • Average allocated SUs dropped by 33% compared with the previous emergency setting.
  • Promotion-day escalations for late alerts fell from seven per month to one.
  • The decision tree prevented three proposed scale-ups where the root cause was output or query design.
Key Takeaway for Glossary Readers

More streaming units do not fix a query that cannot use them; capacity work must start with topology and metrics.
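
A metric-based decision tree like the one the fraud team adopted could look roughly like the Python sketch below. The field names, thresholds, and the order of the checks are assumptions chosen for illustration; the case study does not publish the team's actual rules.

```python
from dataclasses import dataclass


@dataclass
class JobSignals:
    """Hypothetical summary of metrics gathered before a scaling request."""
    su_utilization_pct: float
    watermark_delay_s: float
    output_throttled: bool
    query_fully_partitioned: bool
    hottest_partition_share: float  # fraction of backlog on the busiest partition


def triage(s, delay_target_s=120.0, high_util=80.0, hot_share=0.5):
    """Walk the checks in order and return the first matching recommendation."""
    if s.watermark_delay_s <= delay_target_s:
        return "healthy: no scaling needed"
    if s.output_throttled:
        return "fix output: downstream sink is throttling"
    if s.hottest_partition_share >= hot_share:
        return "fix partitioning: one input partition is hot"
    if not s.query_fully_partitioned:
        return "tune query: a nonpartitioned step limits parallelism"
    if s.su_utilization_pct >= high_util:
        return "scale up: job looks capacity-bound"
    return "investigate: delay is high but no rule matched"


print(triage(JobSignals(60.0, 300.0, False, False, 0.2)))
```

Scaling up is the last branch on purpose, so it is reached only after output, partition, and query causes have been ruled out.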

Case study 03

Transit agency plans dedicated capacity for arrival predictions

Scenario

A transit agency ran several Stream Analytics jobs for bus GPS, train telemetry, and platform occupancy. One noisy bus route job consumed attention during rush hour, making planners consider a dedicated Stream Analytics cluster without understanding SU demand per workload.

Business/Technical Objectives
  • Map each streaming job to normal and peak SU demand.
  • Isolate critical arrival-prediction workloads from experimental analytics.
  • Reduce rush-hour incident noise for operations staff.
  • Create a cost model for dedicated versus multitenant capacity.
Solution Using Streaming unit

The architecture team inventoried jobs with Azure CLI, exported each job’s SU allocation, state, input and output count, and tags, then correlated those settings with Azure Monitor metrics. They identified two production jobs that justified higher baseline SUs and three experimental jobs that could be stopped or reduced outside office hours. For the dedicated-cluster proposal, they modeled assigned SUs, peak overlap, failover assumptions, and owner tags. The team also added alerts at 80% sustained utilization and required post-change validation of watermark delay and output throughput after every SU adjustment.

Results & Business Impact
  • Rush-hour arrival prediction delay improved from 4.2 minutes to 58 seconds.
  • Three experimental jobs were rescheduled or downscaled, offsetting 21% of the production capacity increase.
  • The dedicated-cluster business case used measured SU demand instead of rough estimates.
  • Operations alerts fell by 46% after noisy nonproduction jobs stopped competing for attention.
Key Takeaway for Glossary Readers

Streaming-unit planning turns real-time analytics from reactive scaling into an owned capacity model with cost and reliability trade-offs.
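
The "peak overlap" part of that capacity model can be illustrated with a small Python sketch. The job names and hourly SU figures below are invented for the example, and the function is a hypothetical helper, not part of any Azure tooling. It compares the highest combined demand in any single period with the sum of each job's individual peak.

```python
def cluster_su_demand(job_profiles):
    """job_profiles maps job name -> list of SU demand per period (equal lengths).

    Returns (overlapping_peak, sum_of_peaks).
    """
    # Demand that actually coincides in the same period.
    per_period = [sum(period) for period in zip(*job_profiles.values())]
    overlapping_peak = max(per_period, default=0)
    # Upper bound if every job peaked at the same moment.
    sum_of_peaks = sum(max(profile) for profile in job_profiles.values())
    return overlapping_peak, sum_of_peaks


# Made-up demand for three periods: off-peak, morning rush, evening rush.
profiles = {
    "bus-gps": [3, 12, 6],
    "train-telemetry": [6, 6, 12],
    "platform-occupancy": [3, 3, 3],
}
print(cluster_su_demand(profiles))
```

When the overlapping peak is well below the sum of peaks, sizing shared capacity from individual peaks would overstate demand. Failover headroom is left out of this sketch and would need to be added separately.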

Why use Azure CLI for this?

CLI is useful for streaming units because capacity decisions need repeatable measurement, not a hunch from one chart. An engineer can list job SU settings, pull utilization and watermark metrics, update capacity through an approved pipeline, and document the before-and-after evidence. It also helps compare dozens of jobs to find over-provisioned workloads, under-sized jobs with backlog, or clusters that hide shared capacity pressure. During incidents, CLI makes scaling actions clear, time-stamped, and reversible. That is much safer than ad hoc portal clicks when a job is billing by allocated compute, and it leaves a clean audit trail for cost review.

CLI use cases

  • Show a job and record its current streaming units, state, compatibility level, and resource ID before changing capacity.
  • Update streaming units during an approved release or incident response while capturing before-and-after JSON evidence.
  • List jobs in a resource group to find stopped, idle, or over-provisioned streaming workloads for FinOps review.
  • Compare SU settings across dev, test, and production to catch drift before a load test or migration.
  • List jobs on a dedicated cluster and verify whether cluster capacity planning matches assigned streaming units.
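
As a rough illustration of the drift check, the Python sketch below compares the SU count each job is supposed to have with what was actually observed. It assumes the observed values have already been extracted from CLI output into a plain dictionary; the job names and numbers are hypothetical.

```python
def find_su_drift(expected, actual):
    """Return {job: (expected_su, actual_su)} for every job that differs.

    An actual value of None means the job was not found in the listing.
    """
    drift = {}
    for job, want in expected.items():
        have = actual.get(job)
        if have != want:
            drift[job] = (want, have)
    return drift


expected = {"meters-prod": 12, "meters-test": 6, "meters-dev": 3}
actual = {"meters-prod": 12, "meters-test": 12}
print(find_su_drift(expected, actual))
```

Comparing against declared values rather than against production avoids assuming that every environment should run the same SU count.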

Before you run CLI

  • Confirm tenant, subscription, resource group, job name, region, and whether the job runs in multitenant capacity or a dedicated cluster.
  • Check whether the command is read-only or cost-impacting; updating SUs can change billing and production throughput immediately.
  • Collect baseline metrics for SU utilization, watermark delay, input backlog, output events, and runtime errors before scaling.
  • Verify that the query can parallelize and downstream outputs have capacity, otherwise extra SUs may not improve results.

What output tells you

  • Job output shows current capacity, state, compatibility level, identity, and timestamps needed to verify a scaling change.
  • List output helps identify jobs that are stopped, idle, missing tags, or configured with unexpected streaming units.
  • Cluster job listings show which workloads consume isolated Stream Analytics cluster capacity and may compete for planning attention.
  • After update, command output confirms the accepted setting, while metrics prove whether capacity actually reduced delay or backlog.
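
The distinction in the last item, between a setting being accepted and a change being effective, can be expressed as a small check. This Python sketch is illustrative only: the function name and the 20% minimum-improvement threshold are assumptions, and the sample delays (474 seconds, which is 7.9 minutes, down to 83 seconds) are borrowed from the smart-meter case study.

```python
def verify_scale_change(requested_su, reported_su,
                        delay_before_s, delay_after_s, min_improvement=0.2):
    """Classify a scaling change from reported config and watermark delay."""
    if reported_su != requested_su:
        return "not applied"
    if delay_before_s <= 0:
        return "applied, no baseline delay to compare"
    # Relative reduction in watermark delay after the change.
    improvement = (delay_before_s - delay_after_s) / delay_before_s
    if improvement >= min_improvement:
        return "applied and effective"
    return "applied but ineffective"


print(verify_scale_change(12, 12, 474.0, 83.0))
print(verify_scale_change(12, 12, 100.0, 95.0))
```

An "applied but ineffective" result is a cue to look at query shape, partitions, or outputs instead of adding more units.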

Mapped Azure CLI commands

Streaming unit capacity commands

scales
az stream-analytics job show --name <job-name> --resource-group <resource-group>
az stream-analytics job | discover | Analytics
az stream-analytics transformation update --job-name <job-name> --resource-group <resource-group> --name <transformation-name> --streaming-units <count>
az stream-analytics transformation | configure | Analytics
az stream-analytics job list --resource-group <resource-group> --output table
az stream-analytics job | discover | Analytics
az monitor metrics list --resource <stream-analytics-job-resource-id> --metric ResourceUtilization OutputWatermarkDelaySeconds
az monitor metrics | discover | Analytics
az stream-analytics cluster list-streaming-job --cluster-name <cluster-name> --resource-group <resource-group>
az stream-analytics cluster | discover | Analytics
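
For before-and-after evidence, the read-only show command can be wrapped in a small script. The Python sketch below assumes the Azure CLI is installed, signed in, and has the Stream Analytics commands available; the helper names are hypothetical, and the shape of the returned JSON is not assumed beyond it being valid JSON.

```python
import json
import subprocess


def job_show_command(job_name, resource_group):
    """Build the argument list for the read-only job show command."""
    return ["az", "stream-analytics", "job", "show",
            "--name", job_name,
            "--resource-group", resource_group,
            "--output", "json"]


def capture_snapshot(job_name, resource_group, runner=subprocess.run):
    """Run the command and return the parsed JSON as evidence of current state.

    `runner` is injectable so the function can be exercised without Azure access.
    """
    result = runner(job_show_command(job_name, resource_group),
                    capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


print(" ".join(job_show_command("<job-name>", "<resource-group>")))
```

Saving one snapshot before and one after a change, alongside the metrics, gives reviewers the configuration evidence described above.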

Architecture context

Architecturally, streaming units are the runtime capacity boundary for a Stream Analytics job. I review them with the input partition count, query stages, output destinations, and failure budget. More capacity helps only when the query can parallelize and the downstream path can receive more data. For dedicated clusters, SUs also relate to isolated capacity planning across multiple jobs. The design should define normal SUs, burst plan, autoscale or manual change process, alert thresholds, and rollback. Without that, SUs become a panic knob during incidents instead of a managed part of the streaming architecture. I define that decision path before launch.

Security

Security impact is mostly indirect. Streaming units do not grant access or expose a network endpoint, but scaling can increase the speed at which sensitive data is read, transformed, and written downstream. Faster processing can amplify a bad query or misrouted output. Permissions to change SUs should be limited to operators who understand cost, data flow, and incident procedures. Scaling actions should be logged and tied to change records. Security teams should also watch that emergency scale changes do not bypass normal review of identity, private endpoints, diagnostic logging, or sensitive output destinations.

Cost

Cost impact is direct because Stream Analytics jobs are billed based on allocated streaming units over time, and dedicated clusters have their own capacity model. Increasing SUs can be the right business decision during peak periods, but leaving them high after an incident creates ongoing waste. Idle or forgotten jobs with allocated SUs are classic FinOps findings. Cost reviews should combine SU allocation, utilization, run schedule, output volume, diagnostic retention, and business value. Sometimes the cheapest fix is query tuning, partition alignment, or scheduled stop-start behavior for nonproduction jobs rather than permanent scale-up. Set clear review dates for every increase.
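
The effect of scaling back down after a peak can be estimated with simple SU-hour arithmetic. The Python sketch below assumes cost is proportional to allocated SU-hours, which simplifies real pricing, and the schedule is invented for illustration; it does not reproduce the savings figures quoted in the case studies.

```python
def su_hours(schedule):
    """schedule is a list of (streaming_units, hours) segments."""
    return sum(units * hours for units, hours in schedule)


def saving_fraction(always_high, scheduled):
    """Fraction of SU-hours saved by the scheduled plan versus staying high."""
    return 1 - su_hours(scheduled) / su_hours(always_high)


# Hypothetical 30-day month (720 hours).
always_high = [(12, 720)]            # 12 SUs left in place all month
scheduled = [(12, 180), (6, 540)]    # 12 SUs for 6 peak hours a day, 6 SUs otherwise

print(su_hours(always_high), su_hours(scheduled))
print(round(saving_fraction(always_high, scheduled), 3))
```

In this made-up schedule the plan uses 5,400 SU-hours instead of 8,640, a 37.5% reduction in allocated capacity.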

Reliability

Reliability impact is direct because insufficient streaming units can prevent a job from keeping up with input volume. The symptoms include backlogged input events, rising watermark delay, high SU utilization, and delayed output. However, reliability is not solved by SUs alone. If an output database throttles, a query is serial, or one input partition is hot, extra units may have limited effect. Reliable operations define alert thresholds, test peak load, understand maximum useful SUs for the query, and document when to scale, tune, split the job, or fix the downstream bottleneck. Peak load tests should show which of these responses actually applies.

Performance

Performance impact is direct, but bounded by query and topology. More SUs provide more CPU and memory, which can improve throughput and reduce watermark delay when the job can parallelize. They do not fix every bottleneck. Nonpartitioned query steps, high-cardinality state, JavaScript functions, large joins, output throttling, and hot input partitions can limit gains. Operators should watch SU utilization, CPU, memory, backlogged input, watermark delay, and output events before and after scaling. The best performance work combines capacity adjustment with query simplification, partition-aware design, and downstream throughput validation. A controlled load test should confirm whether extra capacity changes user-visible delay.

Operations

Operators inspect streaming units during onboarding, load testing, incident response, cost review, and seasonal traffic planning. They compare current SU allocation with utilization, watermark delay, input backlog, output errors, query changes, and downstream capacity. Scaling should be done through a runbook or deployment pipeline when possible, with a clear baseline and rollback. Operators also check whether autoscale rules apply, whether a dedicated cluster has available capacity, and whether the query can use more units. Good operations treat SU changes as production changes, not casual slider adjustments. They should record why a scale change was made and when it will be reviewed.
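
Recording why a change was made and when it will be reviewed can be as simple as a small structured record. The Python sketch below is one possible shape; the class name, fields, and seven-day default review window are assumptions for illustration, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SuChangeRecord:
    """Hypothetical change record for a streaming-unit adjustment."""
    job_name: str
    old_su: int
    new_su: int
    reason: str
    changed_on: date
    review_after_days: int = 7

    @property
    def review_due(self):
        return self.changed_on + timedelta(days=self.review_after_days)

    def is_overdue(self, today):
        # Only increases are flagged, since they carry ongoing cost.
        return self.new_su > self.old_su and today > self.review_due


record = SuChangeRecord("meters", 6, 12, "heat-wave backlog", date(2026, 5, 1))
print(record.review_due, record.is_overdue(date(2026, 5, 9)))
```

A periodic job could scan such records and list overdue increases for the next cost review.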

Common mistakes

  • Scaling up before checking whether the query can use more parallel capacity.
  • Leaving emergency streaming-unit increases in place after the incident ends.
  • Treating watermark delay as proof of insufficient SUs without checking sparse inputs, late-event policy, or output throttling.
  • Ignoring cost ownership tags, so nobody reviews continuously running jobs with high capacity.
  • Assuming development and production need the same SU count despite different partitions, event volume, and output targets.