Databases Data platform field-manual-complete

PostgreSQL Query Store

PostgreSQL Query Store is a built-in performance history feature for Azure Database for PostgreSQL flexible server. Instead of guessing why the database became slow, teams can look at recorded query behavior over time: which queries ran often, which took longest, which waited on locks or I/O, and when the pattern changed. It is not a tuning magic button. It is evidence. Developers, DBAs, and operators use it to separate bad SQL, missing indexes, workload spikes, and application changes from general database health complaints.

Aliases
PostgreSQL Query Store, Azure PostgreSQL Query Store, pg_qs, Query Store for PostgreSQL flexible server
Difficulty
intermediate
CLI mappings
5
Last verified
2026-05-20T00:00:00Z

Microsoft Learn

Query Store in Azure Database for PostgreSQL flexible server tracks query performance over time. It captures query text, runtime statistics, and wait information in the azure_sys database so operators can find long-running, resource-intensive, or changing queries and analyze workload patterns by time window.

Microsoft Learn: Query Store in Azure Database for PostgreSQL flexible server2026-05-20T00:00:00Z

Technical context

In Azure architecture, Query Store sits in the observability and database operations layer. It is enabled through server parameters such as pg_qs.query_capture_mode and stores captured data in the azure_sys database. Query Performance Insight, troubleshooting guides, and diagnostic workflows depend on that captured runtime and wait data. It is a server-level capability, not something enabled per database. Because capture has overhead, it should be planned with compute tier, retention, query capture mode, Log Analytics integration, privacy expectations, and incident investigation workflows.

Why it matters

Query Store matters because performance incidents need facts, not opinions. Without historical query data, a team may scale compute, blame the network, or chase recent deployments without knowing which statements actually consumed time, I/O, or lock waits. Query Store lets engineers compare windows, detect regressions, find top resource consumers, and confirm whether tuning changes improved behavior. It also improves communication between developers and operators because everyone can discuss the same query fingerprints and time ranges. Used carefully, it prevents blind scaling, shortens incidents, and turns PostgreSQL performance work into a repeatable evidence-based process after every major release during root-cause reviews.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In Server parameters, pg_qs.query_capture_mode and pgms_wait_sampling.query_capture_mode show whether query runtime and wait capture are disabled, top-level, or all-query focused.

Signal 02

In Query Performance Insight, charts surface top queries, time ranges, durations, executions, and waits after Query Store has captured enough representative workload activity during investigations.

Signal 03

Inside the azure_sys database, query_store.qs_view and query_store.pgms_wait_sampling_view expose recorded runtime and wait statistics for deeper PostgreSQL analysis.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Find the exact query that became slower after a release instead of scaling compute blindly.
  • Compare query duration across time windows to prove whether an index or vacuum change helped.
  • Identify lock, I/O, or memory waits that explain application latency during peak traffic.
  • Supply developers with query fingerprints and evidence from production without relying on anecdotes.
  • Build a performance review process that ties database tuning decisions to measured workload behavior.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Gaming event latency regression

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A multiplayer gaming studio saw lobby creation slow during a seasonal event. Application traces showed API latency, but teams disagreed whether PostgreSQL, cache misses, or matchmaking code was responsible.

Business/Technical Objectives
  • Identify which PostgreSQL statements changed behavior during the event.
  • Avoid scaling the database before proving the bottleneck.
  • Give developers query evidence tied to event time windows.
  • Verify that remediation improved the same workload pattern.
Solution Using PostgreSQL Query Store

The operations team enabled PostgreSQL Query Store in top-level capture mode before the next event rehearsal and turned on wait sampling for deeper evidence. During the rehearsal, they compared Query Store windows before and after the lobby release. The top offender was a matchmaking eligibility query that ran far more often and waited on buffer I/O. Developers discovered that a new filter prevented an existing partial index from being used. The team added a targeted index in staging, replayed the rehearsal traffic, and reviewed the same Query Store query fingerprint afterward. Azure CLI recorded the Query Store parameter values and server SKU so the change record showed capture scope and overhead considerations.

Results & Business Impact
  • Median lobby creation time dropped from 1.8 seconds to 420 milliseconds.
  • The team avoided an unplanned four-vCore scale-up during the event window.
  • Query evidence shortened the developer handoff from half a day to one hour.
  • The same Query Store fingerprint confirmed the index change reduced I/O waits.
Key Takeaway for Glossary Readers

Query Store is valuable because it turns a vague performance argument into a specific query, time window, and remediation target.

Case study 02

Freight dispatch reporting contention

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A freight logistics platform used PostgreSQL flexible server for dispatch, driver assignments, and customer reporting. Dispatchers complained that afternoon planning screens froze whenever finance reporting jobs ran.

Business/Technical Objectives
  • Separate reporting workload impact from normal transaction growth.
  • Find the queries causing lock or I/O waits during dispatch hours.
  • Create evidence for rescheduling reports or adding a replica.
  • Reduce dispatcher screen latency without overprovisioning compute.
Solution Using PostgreSQL Query Store

The DBA enabled PostgreSQL Query Store and wait sampling, then captured data across two business days. Query Performance Insight showed long-running aggregation queries overlapping with dispatch updates. Query Store wait views showed buffer I/O waits and lock waits on tables used by route assignment screens. Instead of scaling immediately, the team moved finance extracts to an off-peak schedule, added one covering index, and adjusted the report to read smaller date ranges. Operators documented the before-and-after Query Store windows and paired them with Azure Monitor CPU and IOPS charts. A later architecture review used the same evidence to justify a read replica only for month-end analytics.

Results & Business Impact
  • Afternoon dispatch p95 screen latency fell from 6.4 seconds to 1.7 seconds.
  • Daily report runtime dropped by 38 percent after index and query changes.
  • The team deferred a compute scale-up for two quarters.
  • Month-end replica planning used measured Query Store workload data instead of estimates.
Key Takeaway for Glossary Readers

Query Store helps operations teams decide whether the right fix is SQL tuning, workload scheduling, replicas, or more compute.

Case study 03

Insurance quote engine release review

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

An insurance software vendor updated its quote engine to support new commercial policy rules. The release touched PostgreSQL queries that joined rating tables, address history, and underwriting exceptions.

Business/Technical Objectives
  • Detect query regressions before rolling the release to all tenants.
  • Compare staging and canary production query behavior with the same metrics.
  • Give developers concrete evidence for query rewrites.
  • Keep performance review consistent across monthly product releases.
Solution Using PostgreSQL Query Store

The platform team made PostgreSQL Query Store part of the canary checklist. Before release, operators used CLI to verify pg_qs.query_capture_mode and wait sampling settings on staging and the canary server. During canary, they collected Query Store windows for quote creation, renewal edits, and policy document generation. One query showed a large increase in average duration because a new optional clause changed join selectivity. Developers rewrote the query and added statistics refresh steps to the migration pipeline. After redeploying the canary, the team compared the same Query Store query IDs and attached the improvement evidence to the rollout approval.

Results & Business Impact
  • The team caught a 4.5x quote-query slowdown before full tenant rollout.
  • Canary approval time dropped from two days of debate to one evidence review meeting.
  • Support tickets after the release were 35 percent lower than the prior rules update.
  • Query Store checks became a standard release-control requirement for database-impacting changes.
Key Takeaway for Glossary Readers

Query Store makes performance regression review repeatable because release teams can compare the same workload evidence before and after deployment.

Why use Azure CLI for this?

As an Azure engineer with ten years of database incident work, I use CLI around Query Store because server parameters and diagnostic evidence need repeatability. The portal is useful for visual analysis, but CLI lets me show whether capture is enabled, set pg_qs.query_capture_mode consistently, export server metadata, and compare staging against production. I also use CLI output to prove whether a server is on burstable compute before recommending capture. Query Store is evidence-driven by nature, so its configuration should be managed with evidence too, not with untracked portal clicks during pressure-filled incidents and noisy release bridges.

CLI use cases

  • Show pg_qs.query_capture_mode to confirm whether Query Store is disabled, top, or all.
  • Set Query Store capture mode after a change ticket approves the expected overhead and scope.
  • Show pgms_wait_sampling.query_capture_mode before relying on wait charts in Query Performance Insight.
  • List server parameters and export Query Store-related values for environment drift comparison.
  • Show server SKU and tier before enabling Query Store on a small or burstable workload.

Before you run CLI

  • Confirm tenant, subscription, resource group, server name, database owner, compute tier, and performance investigation goal.
  • Check whether the server is burstable or under heavy load before enabling capture that may add overhead.
  • Get permission to change server parameters, because Query Store settings apply globally to all databases on the server.
  • Choose JSON output when collecting parameter values for evidence, and table output for quick operator review.
  • Coordinate with developers and privacy owners before collecting query text from sensitive workloads.

What output tells you

  • Parameter output shows the current Query Store capture mode, allowed values, source, and whether a restart is required.
  • Wait sampling parameters tell you whether wait-based troubleshooting views and portal charts can contain useful data.
  • Server SKU output helps judge whether Query Store overhead is appropriate for the workload tier.
  • State values indicate whether a parameter change is still applying or the server is ready for observation.
  • Exported parameter lists reveal drift between production, staging, and recovery environments.

Mapped Azure CLI commands

PostgreSQL Query Store CLI Commands

direct
az postgres flexible-server parameter show --resource-group <resource-group> --server-name <server-name> --name pg_qs.query_capture_mode --output json
az postgres flexible-server parameterdiscoverDatabases
az postgres flexible-server parameter set --resource-group <resource-group> --server-name <server-name> --name pg_qs.query_capture_mode --value top
az postgres flexible-server parameterconfigureDatabases
az postgres flexible-server parameter show --resource-group <resource-group> --server-name <server-name> --name pgms_wait_sampling.query_capture_mode --output json
az postgres flexible-server parameterdiscoverDatabases
az postgres flexible-server parameter set --resource-group <resource-group> --server-name <server-name> --name pgms_wait_sampling.query_capture_mode --value all
az postgres flexible-server parameterconfigureDatabases
az postgres flexible-server show --name <server-name> --resource-group <resource-group> --query "{name:name,sku:sku,state:state,version:version}" --output json
az postgres flexible-serverdiscoverDatabases

Architecture context

As an Azure architect, I place Query Store inside the performance observability design for every serious PostgreSQL workload, but I do not enable it casually on every server. The workload tier, capture mode, storage behavior, privacy constraints, and diagnostic pipeline must match the support model. For production, I want Query Store aligned with Log Analytics, Query Performance Insight, alerting, and change-management records. For burstable or very small servers, I review overhead and choose a safer capture posture. The architecture question is not simply whether Query Store is on; it is how the team will use the captured evidence during release regressions, lock contention, capacity planning, and tuning reviews.

Security

Security impact is indirect but important because Query Store can capture query text and workload patterns. Those records may reveal table names, business processes, tenant identifiers, or application behavior even when they do not expose raw secrets. Access to azure_sys views, Query Performance Insight, diagnostic exports, and Log Analytics workspaces should be limited to people who need performance evidence. Operators should avoid pasting full query text into public tickets and should follow data-handling rules for regulated workloads. Security reviewers should also verify that enabling Query Store does not bypass normal database role separation or observability access controls already approved.

Cost

Cost impact is indirect. Query Store has no separate feature charge, but it can increase database workload overhead, monitoring data, and operator analysis time. The bigger cost effect is decision quality. Good Query Store evidence can prevent unnecessary vCore scaling, reduce incident duration, and focus index work on statements that matter. Poorly managed capture can add noise, overwhelm small servers, or send extra diagnostic data to Log Analytics. FinOps teams should use Query Store to compare query cost before and after tuning, identify reports that justify replicas, and decide whether performance problems require schema work, workload scheduling, or more compute.

Reliability

Reliability impact is indirect. Query Store does not make a server more available, but it helps teams resolve performance degradations before they become outages. Historical runtime and wait data can show whether a release introduced lock waits, whether storage I/O became the bottleneck, or whether a reporting query began starving transactional work. The caution is that capture itself has overhead and should not be enabled thoughtlessly on weak compute tiers or during extreme load. Reliable use means choosing capture mode deliberately, watching server health after enabling it, and ensuring support engineers know how to query and interpret the data during incidents.

Performance

Performance impact is both diagnostic and operational. Query Store helps identify slow, frequent, and wait-heavy queries, but capture has overhead because runtime and wait statistics must be recorded. Microsoft recommends avoiding Query Store on burstable tier servers when the impact is not acceptable. The practical value is that operators can distinguish slow SQL from infrastructure problems. They can compare average duration, execution count, wait events, and time windows before changing indexes or compute. For best results, capture only what is useful, allow enough time for data to persist, and use Query Store evidence alongside application traces during focused tuning reviews.

Operations

Operators use Query Store by enabling the relevant server parameters, waiting for data to accumulate, reviewing Query Performance Insight, and querying azure_sys views when deeper evidence is needed. They compare time windows, identify top queries, correlate waits with deployments, and hand actionable examples to developers. Operational runbooks should document capture mode, wait sampling, retention settings, workspace exports, privacy handling, and escalation paths. During incidents, Query Store evidence should be paired with metrics such as CPU, IOPS, memory, active connections, locks, and application latency. After remediation, operators should verify the same query patterns improved instead of declaring victory after a scale-up.

Common mistakes

  • Expecting Query Store charts to fill immediately before enough workload data has been captured.
  • Enabling all-query capture on a small or burstable server without reviewing overhead.
  • Using Query Store findings without correlating them to deployment times, metrics, locks, and application traces.
  • Ignoring privacy controls around query text, role IDs, and workload patterns visible in performance data.
  • Assuming Query Store tunes queries automatically instead of treating it as evidence for human tuning decisions.