
Stream Analytics input

A Stream Analytics input tells the job where to read data from. It can be a live stream, such as Event Hubs, IoT Hub, Kafka, Blob Storage, or Data Lake Storage, or reference data used for lookups. The input has a name that the query uses, plus settings for source connection, serialization, authentication, timestamp behavior, and partitioning. If the input is wrong, the job may run but process no useful events or produce misleading results.

Aliases
Stream Analytics input, ASA input, stream input, reference input for Stream Analytics
Difficulty
fundamentals
CLI mappings
5
Last verified
2026-05-26T16:27:57Z

Microsoft Learn

A Stream Analytics input is a named connection between a Stream Analytics job and a data source. Inputs can represent live event streams or reference data, are used by name in the query, and define source type, serialization, authentication, and related connection settings.

Microsoft Learn: Understand inputs for Azure Stream Analytics (2026-05-26T16:27:57Z)

Technical context

In Azure architecture, a Stream Analytics input is the data-source boundary for a streaming job. It connects the job to event producers or reference stores and defines how records are read and interpreted. Inputs interact with Event Hubs consumer groups, IoT Hub routing, Blob or Data Lake paths, Kafka endpoints, serialization formats, managed identity or keys, timestamps, partition keys, and diagnostics. The input is a control-plane configuration object, but its health and correctness show up in data-plane metrics and job output.

Why it matters

Stream Analytics input matters because every result depends on what the job actually reads. A query can be perfect and still fail if the input points to the wrong event hub, uses the default consumer group, has the wrong serialization, cannot reach a private endpoint, or assigns event time incorrectly. Inputs also control replay and freshness expectations because retention, start mode, late events, and reference data updates all shape the records available to the job. Clear input design prevents silent data gaps, duplicate reads, bad joins, and frustrating incidents where teams blame the query while the source configuration is the real problem.
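A minimal Python sketch of the first check an operator might automate: reading the JSON that az stream-analytics input show returns and flagging a default consumer group or an unexpected serialization type. It assumes the output nests settings under properties.datasource and properties.serialization, as the Stream Analytics REST API does; the datasource type strings, the expected format, and the sample input are assumptions, not values taken from this entry. An unset consumer group is treated as $Default, which is what Event Hubs and IoT Hub inputs fall back to.

```python
import json

# Hypothetical sample of `az stream-analytics input show` output.
# ASSUMPTION: settings are nested as properties.datasource and properties.serialization,
# mirroring the Stream Analytics REST API input resource.
SAMPLE_INPUT = json.loads("""
{
  "name": "telemetry",
  "properties": {
    "type": "Stream",
    "datasource": {
      "type": "Microsoft.ServiceBus/EventHub",
      "properties": {
        "serviceBusNamespace": "contoso-ns",
        "eventHubName": "telemetry",
        "authenticationMode": "Msi"
      }
    },
    "serialization": {"type": "Csv", "properties": {"fieldDelimiter": ",", "encoding": "UTF8"}}
  }
}
""")

# ASSUMPTION: datasource type strings used for Event Hubs and IoT Hub inputs.
HUB_TYPES = {"Microsoft.ServiceBus/EventHub", "Microsoft.EventHub/EventHub", "Microsoft.Devices/IotHubs"}


def lint_input(definition: dict, expected_serialization: str = "Json") -> list:
    """Return findings for a default consumer group or an unexpected serialization type."""
    findings = []
    props = definition.get("properties", {})
    source = props.get("datasource", {})
    if source.get("type") in HUB_TYPES:
        # An unset consumer group means the job reads through $Default.
        group = source.get("properties", {}).get("consumerGroupName") or "$Default"
        if group == "$Default":
            findings.append("uses the $Default consumer group")
    actual = props.get("serialization", {}).get("type")
    if actual != expected_serialization:
        findings.append(f"serialization is {actual}, expected {expected_serialization}")
    return findings


print(lint_input(SAMPLE_INPUT))
# prints ['uses the $Default consumer group', 'serialization is Csv, expected Json']
```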

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Stream Analytics Inputs blade, each input alias shows its source type, serialization, and authentication settings; because the query references inputs by alias, this view is a quick check during deployment validation.

Signal 02

In CLI input show output, operators review datasource properties, serialization, consumer group or path settings, and identity-sensitive fields for drift before releases and incident reviews.

Signal 03

In metrics and diagnostic logs, input events, deserialization errors, late arrivals, and connection failures quickly reveal during incidents whether the job is reading useful data.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Connect an Event Hubs stream with a dedicated consumer group so a production job does not compete with other readers.
  • Add reference data from Blob Storage, Data Lake Storage, or SQL so live events can be enriched during processing.
  • Validate serialization and timestamp behavior before a schema migration changes what the Stream Analytics query sees.
  • Move an input to private connectivity while proving the job can still reach the source from the approved network path.
  • Separate production, replay, and test inputs so historical events can be reprocessed without disrupting the live stream.
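The first and last situations come down to making sure no two readers share a consumer group. The sketch below assumes you have already built a hypothetical inventory of every reader of a hub (for example from input show output plus known diagnostic tools) and reports any namespace, hub, and consumer-group combination used more than once. All names are illustrative.

```python
from collections import defaultdict

# Hypothetical inventory: one entry per reader of an event hub
# (Stream Analytics inputs, test jobs, diagnostic tools).
READERS = [
    {"reader": "prod-job/telemetry", "namespace": "contoso-ns", "hub": "telemetry", "consumer_group": "asa-prod"},
    {"reader": "test-job/telemetry", "namespace": "contoso-ns", "hub": "telemetry", "consumer_group": "asa-prod"},
    {"reader": "replay-job/telemetry", "namespace": "contoso-ns", "hub": "telemetry", "consumer_group": "asa-replay"},
]


def find_shared_groups(readers: list) -> dict:
    """Group readers by (namespace, hub, consumer group); return combinations used by more than one reader."""
    by_group = defaultdict(list)
    for r in readers:
        by_group[(r["namespace"], r["hub"], r["consumer_group"])].append(r["reader"])
    return {key: names for key, names in by_group.items() if len(names) > 1}


for (ns, hub, group), names in find_shared_groups(READERS).items():
    print(f"{ns}/{hub} consumer group {group} is shared by: {', '.join(names)}")
# prints contoso-ns/telemetry consumer group asa-prod is shared by: prod-job/telemetry, test-job/telemetry
```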

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Cold-chain logistics fixes silent sensor gaps

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A pharmaceutical logistics provider monitored vaccine shipment temperatures through IoT devices. Some shipments appeared healthy in dashboards even though source devices had stopped sending usable events for several hours.

Business/Technical Objectives
  • Identify missing or malformed temperature events before compliance reports closed.
  • Use a dedicated input path for production Stream Analytics processing.
  • Preserve replay ability for disputed shipment intervals.
  • Reduce false confidence in stale dashboard data.
Solution Using Stream Analytics input

The operations team redesigned the Stream Analytics input from IoT Hub with a dedicated consumer group, explicit JSON serialization, and event-time handling based on the device timestamp. They added input metrics and diagnostic alerts for late events, deserialization failures, and unexpectedly low event counts by shipment lane. CLI input show output became part of the validation checklist so operators could confirm the hub, consumer group, and serialization before each release. The query routed stale or malformed device records to a review output instead of treating missing values as normal temperatures. Retention settings were documented so disputed intervals could be replayed through a test job.
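The routing rule in this solution lived in the Stream Analytics query; the Python sketch below only mirrors its logic so the decision is easy to see. The record fields and the 30-minute staleness threshold are hypothetical, chosen for illustration rather than taken from the case study.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: hypothetical record shape {"lane": str, "temperature": number, "deviceTime": ISO-8601 with offset}.
# ASSUMPTION: 30 minutes is an illustrative staleness threshold.
STALE_AFTER = timedelta(minutes=30)


def route(record: dict, now: datetime) -> str:
    """Return "review" for malformed or stale readings and "main" for usable ones."""
    temp = record.get("temperature")
    # Missing or non-numeric temperatures must not pass as normal readings (bool is excluded explicitly).
    if isinstance(temp, bool) or not isinstance(temp, (int, float)):
        return "review"
    try:
        device_time = datetime.fromisoformat(record["deviceTime"])
    except (KeyError, TypeError, ValueError):
        return "review"
    # Timestamps without an offset cannot be compared safely with an aware "now".
    if device_time.tzinfo is None or now - device_time > STALE_AFTER:
        return "review"
    return "main"


now = datetime(2026, 5, 26, 12, 0, tzinfo=timezone.utc)
print(route({"lane": "A", "temperature": 4.2, "deviceTime": "2026-05-26T11:50:00+00:00"}, now))  # main
print(route({"lane": "A", "temperature": None, "deviceTime": "2026-05-26T11:50:00+00:00"}, now))  # review
print(route({"lane": "A", "temperature": 4.0, "deviceTime": "2026-05-26T09:00:00+00:00"}, now))  # review
```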

Results & Business Impact
  • Undetected sensor gaps over one hour fell from 14 per month to 2 per month.
  • Compliance report corrections dropped by 61 percent in the first quarter.
  • Input validation time before releases decreased from 90 minutes to 18 minutes.
  • Replay evidence resolved three shipment disputes without manual spreadsheet reconstruction.
Key Takeaway for Glossary Readers

A Stream Analytics input is a data-quality boundary, not just a source picker in the portal.

Case study 02

Agritech company enriches irrigation streams with reference data


Scenario

An agritech company streamed soil-moisture readings from farms but needed crop type, irrigation zone, and water-rights data to make useful recommendations. The live stream alone did not contain enough context.

Business/Technical Objectives
  • Join sensor events with slowly changing farm reference data.
  • Reduce water recommendations sent to the wrong irrigation zone.
  • Keep reference-data updates visible without stopping the live stream.
  • Separate source problems from query and output problems.
Solution Using Stream Analytics input

Engineers configured two Stream Analytics inputs: an Event Hubs stream for live sensor readings and a Blob Storage reference input for farm-zone metadata. The query joined moisture events to crop, soil, and water-rights records before writing recommendations to Azure SQL Database. Reference blobs followed a documented naming pattern so updates were picked up predictably. CLI input list and input show checks verified that both aliases, serialization formats, and storage paths matched the deployment template. Diagnostic logs alerted when reference data failed to load or when live events arrived without a matching zone record.
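The join-and-count idea behind the unmatched-event figures can be sketched in a few lines of Python. This is an illustration only: the zone IDs and fields are hypothetical, and the in-memory dict stands in for the Blob Storage reference input that the job actually read.

```python
# Hypothetical reference rows keyed by zone ID; an in-memory stand-in for the reference input.
ZONES = {
    "zone-1": {"crop": "maize", "waterRights": "WR-17"},
    "zone-2": {"crop": "barley", "waterRights": "WR-22"},
}


def enrich(events: list, zones: dict):
    """Join each event to its zone row; return (enriched, unmatched)."""
    enriched, unmatched = [], []
    for event in events:
        ref = zones.get(event.get("zoneId"))
        if ref is None:
            unmatched.append(event)  # candidate for a diagnostic alert, not a silent drop
        else:
            enriched.append({**event, **ref})
    return enriched, unmatched


events = [
    {"sensorId": "s1", "zoneId": "zone-1", "moisture": 0.21},
    {"sensorId": "s2", "zoneId": "zone-2", "moisture": 0.35},
    {"sensorId": "s3", "zoneId": "zone-9", "moisture": 0.18},
]
enriched, unmatched = enrich(events, ZONES)
print(f"unmatched rate: {len(unmatched) / len(events):.1%}")  # unmatched rate: 33.3%
```

Tracking that rate over time is what turns "live events arrived without a matching zone record" into a measurable signal.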

Results & Business Impact
  • Incorrect irrigation-zone recommendations dropped by 43 percent after reference joins were added.
  • Reference-data update verification fell from two hours of portal checks to 15 minutes.
  • Unmatched sensor events were reduced from 8 percent to 1.5 percent.
  • Water advisory freshness stayed under five minutes during peak morning irrigation windows.
Key Takeaway for Glossary Readers

Stream Analytics inputs let a job combine live streams and reference context when real-time decisions need more than raw events.

Case study 03

Mining operator validates private Kafka input


Scenario

A mining operator streamed equipment vibration data from an on-premises Kafka cluster into Azure. After a network redesign, the Stream Analytics job ran but stopped receiving several high-priority equipment feeds.

Business/Technical Objectives
  • Restore private Kafka input connectivity without exposing the source publicly.
  • Confirm the job read the correct topics and serialization format.
  • Detect feed loss before maintenance alerts went stale.
  • Create a repeatable test for future network changes.
Solution Using Stream Analytics input

The platform team reviewed the Stream Analytics input definition, Kafka topic settings, credentials, and network route from the approved private path. CLI input test was added to the maintenance checklist, while input show output captured topic and serialization settings for the change record. Engineers discovered that two topics had been moved to a new broker address that was not reachable from the configured network path. DNS and firewall rules were corrected, and the job was restarted with the expected start mode. Metrics were updated to alert on low input events per equipment class rather than only total job health.
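Alerting per equipment class rather than on total job health can be sketched as a simple floor check. The class names and per-window minimums below are hypothetical; the point is that a healthy total can hide a feed that dropped to zero.

```python
# Hypothetical minimum expected events per equipment class in one alert window.
MIN_EVENTS = {"crusher": 500, "conveyor": 300, "pump": 100}


def low_feeds(counts: dict, minimums: dict) -> list:
    """Return classes whose observed count is below the minimum; absent classes count as zero."""
    return sorted(cls for cls, floor in minimums.items() if counts.get(cls, 0) < floor)


observed = {"crusher": 620, "conveyor": 12}  # pump feed missing entirely
print(low_feeds(observed, MIN_EVENTS))  # ['conveyor', 'pump']
```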

Results & Business Impact
  • Lost vibration feeds were restored in 38 minutes instead of the previous four-hour escalation path.
  • Private connectivity was preserved with no temporary public endpoint exception.
  • Stale maintenance alerts fell by 52 percent over the next month.
  • Network-change validation became a scripted 10-minute check before mine-site cutovers.
Key Takeaway for Glossary Readers

Testing and inspecting the Stream Analytics input is the fastest way to distinguish source reachability problems from query problems.

Why use Azure CLI for this?

I use Azure CLI for Stream Analytics inputs because input mistakes are common and often invisible until output looks wrong. The portal shows inputs one at a time, but CLI lets me list all inputs, show the exact JSON, test connectivity, and compare source settings between environments. That evidence matters when two jobs use similar aliases but different consumer groups or serialization. CLI also helps during incidents: I can check whether the input exists, whether the job can reach the source, and whether a deployment changed the input definition. In production, repeatable input inspection beats screenshots and tribal memory every time.

CLI use cases

  • List all inputs under a Stream Analytics job and confirm aliases match the transformation query.
  • Show an input definition to inspect source type, serialization, authentication, and consumer group or path settings.
  • Test whether an input datasource is reachable and usable before starting a job or approving a source migration.
  • Update an input through deployment automation after schema, network, or authentication changes are reviewed.
  • Export input definitions across environments to detect drift in aliases, serialization, or source configuration.
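For the drift use case, one approach is to save input show output from two environments and compare them field by field. This Python sketch assumes the JSON shape used by the REST API; the ignored paths are assumptions to tune, because resource IDs and namespace names are expected to differ between environments. The sample definitions are hypothetical.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {"a.b.c": value} for field-by-field comparison."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# ASSUMPTION: fields expected to differ between environments; adjust to your naming conventions.
IGNORED = {"id", "etag", "properties.etag", "properties.datasource.properties.serviceBusNamespace"}


def diff_inputs(a: dict, b: dict, ignored: set = IGNORED) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every differing field outside the ignored set."""
    fa, fb = flatten(a), flatten(b)
    keys = (fa.keys() | fb.keys()) - ignored
    return {k: (fa.get(k), fb.get(k)) for k in sorted(keys) if fa.get(k) != fb.get(k)}


staging = {"id": "stg", "name": "telemetry", "properties": {"serialization": {"type": "Json"},
           "datasource": {"properties": {"serviceBusNamespace": "stg-ns", "consumerGroupName": "asa-job"}}}}
production = {"id": "prd", "name": "telemetry", "properties": {"serialization": {"type": "Csv"},
              "datasource": {"properties": {"serviceBusNamespace": "prd-ns", "consumerGroupName": "asa-job"}}}}
print(diff_inputs(staging, production))  # {'properties.serialization.type': ('Json', 'Csv')}
```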

Before you run CLI

  • Confirm tenant, subscription, resource group, job name, input name, and source resource before testing or changing an input.
  • Know whether the input is a live stream or reference data because replay, freshness, and failure behavior differ significantly.
  • Use read-only list and show commands before update or delete operations, which can break live event processing immediately.
  • Prepare source permissions, network access, serialization format, consumer group, path pattern, and the CLI output format you want to capture before running input test commands.

What output tells you

  • Input list output shows the aliases available to the query and helps catch missing or unexpectedly named sources.
  • Datasource properties identify the Event Hubs, IoT Hub, Storage, Data Lake, Kafka, or reference source the job reads.
  • Serialization fields show whether JSON, CSV, Avro, or another expected format is configured for incoming records.
  • Test output indicates whether the Stream Analytics service can reach and use the source with the provided settings.
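Input list output is most useful when compared with the names the query actually reads. The heuristic below is a rough sketch, not a Stream Analytics query parser: it picks up the name after each FROM or JOIN (including bracketed names), expects WITH-step names to be passed in separately, and does not understand comments or string literals. The query and aliases are hypothetical.

```python
import re

# Rough heuristic: the identifier (or [bracketed name]) following FROM or JOIN.
SOURCE = re.compile(r"\b(?:FROM|JOIN)\s+(\[[^\]]+\]|[A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def referenced_sources(query: str) -> set:
    return {m.group(1).strip("[]") for m in SOURCE.finditer(query)}


def compare_aliases(query: str, input_aliases: set, step_names: set = frozenset()) -> dict:
    """Compare names read by the query with aliases from `az stream-analytics input list`.

    step_names lists WITH-step names, which also appear after FROM but are not inputs.
    """
    referenced = referenced_sources(query) - set(step_names)
    return {
        "missing_inputs": sorted(referenced - input_aliases),
        "unused_inputs": sorted(input_aliases - referenced),
    }


QUERY = """
SELECT t.deviceId, t.moisture, r.crop
INTO [sql-out]
FROM telemetry t TIMESTAMP BY deviceTime
JOIN [zone-reference] r ON t.zoneId = r.zoneId
"""
print(compare_aliases(QUERY, {"telemetry", "zone_ref"}))
# prints {'missing_inputs': ['zone-reference'], 'unused_inputs': ['zone_ref']}
```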

Mapped Azure CLI commands

Stream Analytics input inventory and connectivity checks

az stream-analytics input list --job-name <job-name> --resource-group <resource-group> --output table
az stream-analytics input show --job-name <job-name> --resource-group <resource-group> --input-name <input-name>
az stream-analytics input test --job-name <job-name> --resource-group <resource-group> --input-name <input-name>
az stream-analytics input update --job-name <job-name> --resource-group <resource-group> --input-name <input-name> --properties @input.json
az stream-analytics job show --job-name <job-name> --resource-group <resource-group> --expand "inputs,transformation"
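The update command above reads its payload from input.json. A minimal sketch for generating that file for an Event Hubs stream input read with managed identity follows; the property names are assumed to match the Stream Analytics REST API input schema, and the namespace, hub, and consumer group are hypothetical.

```python
import json


def event_hub_input(namespace: str, hub: str, consumer_group: str) -> dict:
    """Build a stream-input properties payload for an Event Hubs source read with managed identity.

    ASSUMPTION: property names follow the Stream Analytics REST API input schema;
    verify them against the API version your CLI extension targets.
    """
    return {
        "type": "Stream",
        "datasource": {
            "type": "Microsoft.ServiceBus/EventHub",
            "properties": {
                "serviceBusNamespace": namespace,
                "eventHubName": hub,
                "consumerGroupName": consumer_group,  # dedicated group, never $Default in production
                "authenticationMode": "Msi",
            },
        },
        "serialization": {"type": "Json", "properties": {"encoding": "UTF8"}},
    }


if __name__ == "__main__":
    # Hypothetical names; redirect stdout to input.json before running the update command.
    print(json.dumps(event_hub_input("contoso-ns", "telemetry", "asa-prod"), indent=2))
```

Redirect the output to the file the command expects, for example python build_input.py > input.json (the script name is hypothetical), and compare input show output before and after the update.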

Architecture context

A Stream Analytics input is where the event contract enters the streaming application. Architects should document the source system, input alias, event schema, serialization format, timestamp choice, partition key, authentication model, network path, and retention assumptions. Stream inputs and reference inputs have different reliability and freshness patterns, so they should not be treated as interchangeable sources. The input design must align with the transformation query, output destinations, replay plan, and monitoring. For high-volume jobs, partitioning and consumer-group choices are especially important because they decide whether the job can read efficiently without colliding with other consumers. Good diagrams show each input and its operational owner.
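One way to make that documentation checkable is a small record per input. The dataclass below is a hypothetical structure whose fields follow the list in the paragraph above, plus an owner field; it rejects kinds other than stream or reference and reports fields left blank. The sample values are illustrative.

```python
from dataclasses import dataclass, fields


@dataclass
class InputContract:
    """Hypothetical documentation record for one Stream Analytics input."""
    source_system: str
    input_alias: str
    input_kind: str  # "stream" or "reference"; they have different freshness and replay behavior
    event_schema: str
    serialization: str
    timestamp_rule: str
    partition_key: str
    auth_model: str
    network_path: str
    retention: str
    owner: str

    def __post_init__(self):
        if self.input_kind not in ("stream", "reference"):
            raise ValueError(f"input_kind must be 'stream' or 'reference', got {self.input_kind!r}")

    def gaps(self) -> list:
        """Return the names of fields that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


contract = InputContract(
    source_system="IoT Hub contoso-devices", input_alias="telemetry", input_kind="stream",
    event_schema="telemetry-v3", serialization="Json", timestamp_rule="TIMESTAMP BY deviceTime",
    partition_key="", auth_model="managed identity", network_path="private endpoint",
    retention="1 day", owner="",
)
print(contract.gaps())  # ['partition_key', 'owner']
```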

Security

Security impact is direct because an input grants the job a path to read data. The source may contain customer telemetry, operational events, payment signals, or regulated reference records. Authentication should prefer managed identity where supported and least-privilege access to the specific source. Shared keys, connection strings, and Kafka credentials need secure storage and rotation. Network exposure also matters: private endpoints, firewall rules, trusted services settings, and DNS must match the data classification. Operators should verify who can modify input definitions because changing an input can redirect a job to a different data source or bypass intended controls without changing the query text.
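Auditing which inputs still rely on keys or connection strings can start from the same input show output. The sketch assumes the auth mode sits at properties.datasource.properties.authenticationMode, with "Msi" meaning managed identity; sources that configure authentication differently will show the field as missing and need a manual check. The inputs are hypothetical.

```python
# ASSUMPTION: each definition is shaped like `az stream-analytics input show` output,
# with the auth mode at properties.datasource.properties.authenticationMode.
def auth_findings(inputs: list) -> list:
    """Flag inputs that do not use managed identity so their keys or credentials get reviewed."""
    findings = []
    for definition in inputs:
        source_props = definition.get("properties", {}).get("datasource", {}).get("properties", {})
        mode = source_props.get("authenticationMode")
        if mode != "Msi":
            # None means the field is absent; that source may configure auth differently, so check by hand.
            findings.append(f"{definition.get('name')}: authenticationMode={mode!r}")
    return findings


inputs = [
    {"name": "telemetry", "properties": {"datasource": {"properties": {"authenticationMode": "Msi"}}}},
    {"name": "zones", "properties": {"datasource": {"properties": {"authenticationMode": "ConnectionString"}}}},
]
print(auth_findings(inputs))  # ["zones: authenticationMode='ConnectionString'"]
```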

Cost

The input itself is not usually the main bill, but it drives cost through event volume, source throughput, replay, and downstream processing. A noisy input can force higher streaming units, increase output writes, and raise diagnostic log volume. Reading from Event Hubs, Storage, Data Lake Storage, Kafka, SQL reference data, or IoT Hub may also affect those services' costs. Shared consumer groups and poorly filtered inputs can make several jobs process the same high-volume stream unnecessarily. FinOps reviews should connect event rate, input partition count, retention, replay frequency, and output volume so teams understand how much each input drives spending across the whole pipeline.

Reliability

Reliability starts at the input. A Stream Analytics job can stop producing useful output when the source throttles, a consumer group is shared, a path pattern misses new blobs, reference data is stale, or event timestamps fall outside the processing window. Reliable designs use dedicated consumer groups, tested serialization, documented timestamp behavior, and source monitoring. Operators should know how much retention exists for replay and what happens if the job restarts from now, a custom time, or the last output time. Input test commands, diagnostic logs, input event metrics, and backlog signals should be checked before changing the query or scaling the job.
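Before restarting from a custom time, it is worth checking that the source still holds events that old. This sketch approximates retention as a fixed window ending now; the retention value and timestamps are hypothetical, and real retention behavior depends on the source service.

```python
from datetime import datetime, timedelta, timezone


def replay_gap(custom_start: datetime, now: datetime, retention: timedelta) -> timedelta:
    """Return how much of a custom-time replay falls before the oldest retained event (zero if none)."""
    oldest_retained = now - retention
    return max(oldest_retained - custom_start, timedelta(0))


now = datetime(2026, 5, 26, 12, 0, tzinfo=timezone.utc)
retention = timedelta(days=1)  # hypothetical source retention
print(replay_gap(datetime(2026, 5, 25, 18, 0, tzinfo=timezone.utc), now, retention))  # 0:00:00
print(replay_gap(datetime(2026, 5, 24, 12, 0, tzinfo=timezone.utc), now, retention))  # 1 day, 0:00:00
```

A nonzero result means part of the requested interval can no longer be replayed from that source.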

Performance

Performance is directly tied to input design. Partitioned sources such as Event Hubs, IoT Hub, Blob Storage, and Data Lake Storage can support parallel processing when configured correctly. The wrong consumer group, low partition count, bad timestamp selection, heavy reference joins, or inefficient serialization can create backlog even when streaming units look adequate. Operators should monitor input events, late input events, deserialization errors, watermark delay, and source-side throttling. For high-throughput workloads, align partition keys with query strategy, test input configuration with representative traffic, and make sure output capacity is not confused with input read performance during scale reviews and tuning.
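A hedged sketch of evaluating one metrics window: the key names are illustrative placeholders to map onto the metrics you export (such as Input Events, Input Deserialization Errors, Late Input Events, Watermark Delay, and Backlogged Input Events), and the thresholds are assumptions, not recommended values.

```python
# Illustrative snapshot keys and limits for one time window; map them to the metrics you actually export.
THRESHOLDS = {
    "deserialization_errors": 0,
    "late_input_events": 100,
    "watermark_delay_seconds": 60,
    "backlogged_input_events": 0,
}


def input_pressure(snapshot: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return the names of signals over their limit, listing a zero-input window first."""
    alerts = [name for name, limit in thresholds.items() if snapshot.get(name, 0) > limit]
    if snapshot.get("input_events", 0) == 0:
        alerts.insert(0, "no_input_events")
    return alerts


snapshot = {
    "input_events": 120_000,
    "deserialization_errors": 0,
    "late_input_events": 40,
    "watermark_delay_seconds": 240,
    "backlogged_input_events": 5_000,
}
print(input_pressure(snapshot))  # ['watermark_delay_seconds', 'backlogged_input_events']
```

Rising watermark delay together with backlog while streaming units look adequate is the pattern the paragraph above warns about, so partition and consumer-group settings deserve a look before scaling.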

Operations

Operators inspect inputs during deployment, source migration, schema change, incident response, and replay planning. They list job inputs, show exact definitions, test source reachability, check serialization, validate aliases used in the query, and compare consumer groups or storage paths across environments. Operational runbooks should include the source resource ID, input alias, authentication method, network path, timestamp rule, partition strategy, and expected event rate. When output is stale, operators should first ask whether input events are arriving, whether the job can read them, and whether the query references the correct alias. Those checks save hours of unnecessary query debugging during incidents.
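The stale-output questions above form an ordered checklist. This sketch encodes that order; its three arguments are hypothetical stand-ins for an event count observed at the source, the result of az stream-analytics input test, and whether the query references the alias.

```python
def first_failing_check(source_events: int, input_test_ok: bool, alias_in_query: bool) -> str:
    """Apply the stale-output checks in order and return the first one that fails."""
    if source_events == 0:
        return "no events arriving at the source"
    if not input_test_ok:
        return "job cannot read the source (input test failed)"
    if not alias_in_query:
        return "query does not reference the input alias"
    return "input checks passed; investigate query logic or outputs"


print(first_failing_check(1200, True, False))  # query does not reference the input alias
```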

Common mistakes

  • Using the default Event Hubs consumer group for production and then colliding with test jobs or diagnostic readers.
  • Changing a source schema without updating serialization assumptions or query field references.
  • Pointing the input alias to the right service type but the wrong namespace, hub, container, or path pattern.
  • Assuming a job failure is caused by SQL query logic when input connectivity or deserialization is failing first.
  • Forgetting that reference data freshness and stream input replay use different operational models.