Synapse notebook

A Synapse notebook is an interactive workbench for data engineers, analysts, and data scientists. It lets you mix code, explanation, charts, and parameters in one workspace artifact, then run that work on a Synapse Spark pool. The useful part is not just experimentation; notebooks can become repeatable processing steps. A team might profile raw files, clean bad records, train a small model, or build a curated table, then call the same notebook from a Synapse pipeline with production parameters.

Aliases
Azure Synapse notebook, Synapse Spark notebook, Synapse Analytics notebook, notebook artifact in Synapse
Difficulty
fundamentals
CLI mappings
8
Last verified
2026-05-27T07:24:06Z

Microsoft Learn

A Synapse notebook is a web authoring surface in Azure Synapse Analytics for live code, visualizations, and explanatory text. It is commonly used with Apache Spark for data preparation, exploration, machine learning, and repeatable transformations that can run interactively or from pipelines.

Microsoft Learn: Create, develop, and maintain Synapse notebooks (2026-05-27T07:24:06Z)

Technical context

Inside Synapse, a notebook is a workspace artifact that usually executes against an Apache Spark pool. It sits above the storage and compute boundary: code reads from ADLS Gen2, linked services, SQL pools, or other sources, while Spark sessions provide execution capacity. Notebooks can be authored in Synapse Studio, stored with Git integration, parameterized, and invoked by pipeline notebook activities. Identity matters because interactive runs often use the user context, while pipeline runs may rely on workspace managed identity or linked service credentials.
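The parameter pattern described above can be sketched as two cells, assuming a PySpark notebook. In Synapse Studio you mark one cell as the parameters cell, and base parameters passed by a pipeline Notebook activity override its defaults by variable name. The variable names (`run_date`, `environment`) and the `validate_parameters` helper are hypothetical. The validation is plain Python, so it behaves the same in an interactive session and in a pipeline run.

```python
# --- Parameters cell (toggled as "Parameters" in Synapse Studio) ---
# Defaults for interactive runs. A pipeline Notebook activity overrides these
# by name through its base parameters. Names are hypothetical examples.
run_date = "2026-05-01"
environment = "dev"

# --- Validation cell: fail fast on bad pipeline input ---
from datetime import date

ALLOWED_ENVIRONMENTS = {"dev", "test", "prod"}


def validate_parameters(run_date: str, environment: str) -> date:
    """Return the parsed run date, or raise ValueError with a clear message."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(
            f"environment must be one of {sorted(ALLOWED_ENVIRONMENTS)}, got {environment!r}"
        )
    try:
        return date.fromisoformat(run_date)
    except ValueError as exc:
        raise ValueError(f"run_date must be YYYY-MM-DD, got {run_date!r}") from exc


parsed_run_date = validate_parameters(run_date, environment)
```

A clear error raised here fails the notebook run early. That is easier to diagnose than a Spark job that quietly reads the wrong folder.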

Why it matters

Synapse notebooks matter because they are where exploratory analytics often becomes production data engineering. A notebook that starts as a quick investigation can later feed a finance dashboard, customer model, or data quality gate. If the notebook is poorly parameterized, tied to one user's permissions, or dependent on an oversized Spark pool, the business inherits fragile logic and surprise cost. When managed well, notebooks shorten the path from data discovery to governed automation. They also preserve the reasoning behind transformations, which helps reviewers understand why code exists instead of only seeing a black-box pipeline step. For migration planning, exported notebooks reveal hidden dependencies before a workspace is rebuilt, and the recorded reasoning prevents accidental rewrites when ownership moves between research and operations.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In Synapse Studio, the Develop hub lists notebooks as artifacts in folders and shows their language cells, parameters, Spark pool selection, published state, Git markers, and owners, which reviewers check before release approval.

Signal 02

In a Synapse pipeline, a Notebook activity shows the notebook name, Spark pool, base parameters, timeout, retry policy, and run output for troubleshooting during incident triage.

Signal 03

In CLI or deployment logs, az synapse notebook commands record imported files, workspace names, folder paths, Spark pool options, and artifact changes, which reviewers compare across environments during promotion.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Turn exploratory Spark data cleansing into a parameterized production step without rewriting it as a separate application.
  • Run repeatable feature engineering or model-preparation logic inside a Synapse pipeline with date, path, and environment parameters.
  • Debug a failed data transformation by rerunning the exact notebook version against a smaller input slice.
  • Document data-quality assumptions beside the code that enforces them, so reviewers understand both logic and intent.
  • Move notebooks between dev, test, and production workspaces through CLI export/import instead of manual copy-and-paste.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Port congestion model becomes a controlled production notebook

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A marine logistics planner used ad hoc Spark notebooks to estimate berth congestion from vessel pings, customs holds, and weather feeds. Executives began using screenshots from those experiments for daily routing decisions.

Business/Technical Objectives
  • Convert the analysis into a scheduled notebook run that finishes by 05:30 UTC each morning.
  • Keep input parameters visible for port, date window, and confidence threshold.
  • Reduce manual analyst preparation time from two hours to under twenty minutes.
  • Preserve a reviewed notebook version for audit and incident reconstruction.
Solution Using Synapse notebook

The data engineering team refactored the Synapse notebook into clear parameter cells for port code, forecast date, and output folder. They moved hard-coded storage paths into pipeline parameters, attached the notebook to a medium Spark pool, and stored the artifact in Git. A Synapse pipeline called the notebook after raw AIS and weather files arrived in ADLS Gen2. Azure CLI exported the approved notebook before release, imported it into production, and listed the Spark pool to confirm runtime capacity. Operators added checks for empty source partitions and wrote forecast outputs to a dated curated path.

Results & Business Impact
  • Morning congestion reports were ready by 05:18 UTC on 92% of business days, up from 47% before automation.
  • Analyst preparation time dropped from about 126 minutes to 14 minutes per day.
  • Manual path errors fell to zero during the first seven-week shipping peak.
  • Audit review time for the forecast logic fell from three days to four hours because the exact notebook export was attached.
Key Takeaway for Glossary Readers

A Synapse notebook becomes valuable operational software when parameters, versioning, identity, and run evidence are treated as part of the design.

Case study 02

Climate research lab stops losing reproducibility in exploratory Spark work

Scenario

A nonprofit climate lab processed satellite tiles in Synapse notebooks, but grant reviewers could not reproduce several published regional estimates because each scientist kept slightly different cell edits.

Business/Technical Objectives
  • Standardize notebook versions for repeatable watershed calculations.
  • Record exact input folders, package versions, and Spark pool settings for every published run.
  • Cut rerun failures caused by missing permissions or stale code by at least half.
  • Let researchers keep exploratory notebooks without mixing them with approved production artifacts.
Solution Using Synapse notebook

The lab split notebooks into experimental and published folders inside Synapse Studio. Approved notebooks used parameter cells for region, year, and cloud-cover threshold, then wrote results to immutable dated folders in ADLS Gen2. Git integration captured peer review, while Azure CLI exported release candidates and compared production workspace artifacts against approved files. Operators checked Synapse roles and storage ACLs so pipeline runs used the workspace identity instead of a scientist's personal access. Spark pool configuration was documented with each release note, and failures wrote a small diagnostic summary before exiting.
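One way to capture the objective of recording input folders, package versions, and Spark settings for every published run is a small manifest written beside the results. The sketch below uses only the Python standard library. The notebook name, parameters, and settings are hypothetical. In a Spark notebook the settings dictionary could come from `dict(spark.sparkContext.getConf().getAll())`, but here it is passed in directly so the sketch runs anywhere.

```python
import json
from datetime import datetime, timezone
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_run_manifest(notebook, parameters, input_folders, spark_settings, packages):
    """Collect the facts a reviewer needs to reproduce one published run."""
    return {
        "notebook": notebook,
        "parameters": dict(parameters),
        "input_folders": sorted(input_folders),
        "spark_settings": dict(sorted(spark_settings.items())),
        "packages": package_versions(packages),
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical values for illustration only.
manifest = build_run_manifest(
    notebook="watershed_estimate",
    parameters={"region": "upper-basin", "year": 2025, "cloud_cover_max": 0.2},
    input_folders=["raw/tiles/2025/upper-basin"],
    spark_settings={"spark.executor.memory": "28g", "spark.executor.cores": "4"},
    packages=["numpy"],
)
print(json.dumps(manifest, indent=2))
```

Writing the manifest as JSON next to the immutable output folder keeps the evidence with the result it describes.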

Results & Business Impact
  • Reproducibility checks passed for 18 of 19 published regions, compared with 11 of 19 before the cleanup.
  • Notebook permission failures fell by 68% after moving production runs to the workspace identity.
  • Grant-review evidence packs took one afternoon instead of almost two weeks to assemble.
  • Compute spend dropped 22% because exploratory notebooks no longer ran on the production-sized Spark pool by default.
Key Takeaway for Glossary Readers

Synapse notebooks support scientific agility, but reproducibility appears only when artifact promotion and execution context are deliberately controlled.

Case study 03

Game economy tuning moves from fragile notebooks to pipeline-ready scoring

Scenario

A mobile game studio balanced in-game pricing with notebooks that joined telemetry, purchase events, and campaign calendars. Weekend promotions sometimes launched with stale scoring because the analyst notebook had not been rerun.

Business/Technical Objectives
  • Run economy scoring automatically before each promotion window.
  • Expose promotion ID, player cohort, and experiment arm as notebook parameters.
  • Detect missing telemetry partitions before publishing pricing recommendations.
  • Keep manual experiments separate from the scoring notebook used by live operations.
Solution Using Synapse notebook

Engineers converted the analysts' scoring notebook into a production Synapse notebook with strict parameter validation and a small preflight cell that counted expected telemetry partitions. The notebook was invoked by a Synapse pipeline after ingestion completed and used Spark to aggregate cohort behavior into a curated Delta table. Azure CLI exported the notebook before each release, imported the reviewed version into production, and listed notebooks to ensure the live pipeline referenced the controlled artifact. Failed preflight checks stopped the pipeline before pricing recommendations reached the promotion dashboard.
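A preflight cell like the one described can be sketched in plain Python, assuming telemetry lands in hourly folders named `date=YYYY-MM-DD/hour=HH`. That layout is hypothetical. In a real notebook the list of found partitions might be built from a storage listing such as `mssparkutils.fs.ls`. Raising an exception fails the notebook run, so pipeline activities chained on success do not execute.

```python
from datetime import datetime, timedelta


def expected_partitions(window_start: datetime, hours: int) -> set:
    """Hourly partition folder names expected for the scoring window."""
    return {
        (window_start + timedelta(hours=h)).strftime("date=%Y-%m-%d/hour=%H")
        for h in range(hours)
    }


def preflight(found_partitions, window_start: datetime, hours: int) -> None:
    """Raise before scoring if any expected telemetry partition is missing."""
    missing = sorted(expected_partitions(window_start, hours) - set(found_partitions))
    if missing:
        # An uncaught exception fails the notebook run, which fails the
        # pipeline's Notebook activity before any publishing step.
        raise RuntimeError(f"{len(missing)} telemetry partition(s) missing, first: {missing[:3]}")


# Hypothetical window that crosses midnight.
window = datetime(2026, 5, 1, 22)
found = ["date=2026-05-01/hour=22", "date=2026-05-01/hour=23", "date=2026-05-02/hour=00"]
preflight(found, window, hours=3)  # passes: all three expected hours are present
```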

Results & Business Impact
  • Promotion scoring ran automatically for 31 consecutive campaigns, eliminating six recurring weekend manual handoffs.
  • Stale recommendation incidents dropped from four per quarter to none in the next quarter.
  • Telemetry missing-partition detection caught three ingestion gaps before business users saw the dashboard.
  • The live-ops team reduced emergency analyst calls by 74% during holiday events.
Key Takeaway for Glossary Readers

A well-parameterized Synapse notebook can preserve data-science flexibility while giving operations the guardrails needed for live business decisions.

Why use Azure CLI for this?

After ten years running Azure environments, I use Azure CLI for Synapse notebooks because portal clicks do not give enough evidence for repeatable releases. A notebook is code, configuration, identity, and workspace state combined. CLI commands let me export notebook definitions, compare names across environments, confirm the Spark pool exists, and script imports during deployment. They also reduce the common mistake of editing production notebooks by hand. Even when debugging starts in Synapse Studio, CLI output gives a clean audit trail for change tickets, incident reviews, environment drift checks, and release approvals. That repeatability is what keeps emergency fixes from becoming untracked production edits.
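Comparing names across environments can start with the JSON that `az synapse notebook list --workspace-name <workspace-name> --output json` prints, saved once per workspace. The sketch below assumes each array item has a `name` field. It detects drift by name only, so content differences still need an export comparison.

```python
import json


def notebook_names(list_output: str) -> set:
    """Names from the JSON array printed by `az synapse notebook list`."""
    return {item["name"] for item in json.loads(list_output)}


def drift_report(source_output: str, target_output: str) -> dict:
    """Compare two workspaces by notebook name only."""
    source, target = notebook_names(source_output), notebook_names(target_output)
    return {
        "missing_in_target": sorted(source - target),
        "extra_in_target": sorted(target - source),
    }


# Hypothetical list output, trimmed to the "name" field for brevity.
dev_json = '[{"name": "score_promotions"}, {"name": "preflight_checks"}, {"name": "scratch_eda"}]'
prod_json = '[{"name": "score_promotions"}, {"name": "legacy_export"}]'
print(drift_report(dev_json, prod_json))
```

Extra notebooks in production are often the hand-edited or orphaned artifacts this section warns about.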

CLI use cases

  • Export notebooks before a release and compare the files against the Git branch approved for deployment.
  • Import or update a notebook from a pipeline artifact so dev, test, and production workspaces stay consistent.
  • List notebooks in each workspace and cross-check pipeline definitions to find experimental notebooks that production pipelines still call.
  • Check the Spark pool attached to notebook workloads before approving capacity, package, or timeout changes.
  • Delete retired notebook artifacts only after confirming no pipeline activity or runbook references them.
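The first use case, comparing an export against the approved Git copy, can be sketched by comparing cell types and sources while ignoring cell outputs. The file produced by `az synapse notebook export` is assumed to be notebook JSON with `cells` and `metadata` keys. Metadata is compared separately because it may carry default Spark pool or session settings that a cell-only check would miss.

```python
import json


def cell_sources(notebook: dict) -> list:
    """(cell_type, source) pairs; outputs and execution counts are ignored."""
    pairs = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # notebook JSON may store source as a list of lines
            source = "".join(source)
        pairs.append((cell.get("cell_type"), source.rstrip()))
    return pairs


def compare_notebooks(exported_text: str, approved_text: str) -> dict:
    exported_nb, approved_nb = json.loads(exported_text), json.loads(approved_text)
    return {
        "cells_match": cell_sources(exported_nb) == cell_sources(approved_nb),
        # Reported separately: metadata may hold Spark pool or session defaults.
        "metadata_match": exported_nb.get("metadata", {}) == approved_nb.get("metadata", {}),
    }


# Hypothetical content: same code, stored in two different source formats.
meta = {"kernelspec": {"name": "synapse_pyspark"}}
approved = json.dumps({"cells": [{"cell_type": "code", "source": ["x = 1\n", "print(x)"]}], "metadata": meta})
exported = json.dumps({"cells": [{"cell_type": "code", "source": "x = 1\nprint(x)\n", "outputs": []}], "metadata": meta})
print(compare_notebooks(exported, approved))
```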

Before you run CLI

  • Confirm tenant and subscription with az account show, because notebook names often repeat across dev, test, and production workspaces.
  • Use the exact workspace name and folder path, and treat import, set, and delete commands as production code changes.
  • Check that the caller has Synapse artifact permissions and that the target Spark pool exists in the same workspace.
  • Use @file syntax for notebook content so JSON and notebook metadata are not corrupted by shell quoting.
  • Export the current notebook before overwriting it, and store the export in the change record for rollback.
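The checklist above can be scripted so the order never changes: pin the subscription, show the signed-in context, back up the current notebook, then overwrite it with `@file` content. The subscription, workspace, notebook, and folder values are hypothetical placeholders. Passing each command as an argument list, rather than one shell string, sidesteps the quoting problems mentioned above. The default is a dry run that only prints the commands.

```python
import shlex
import subprocess


def safe_update_commands(subscription, workspace, notebook, new_file, backup_folder):
    """Checklist order: pin subscription, confirm context, back up, then overwrite."""
    return [
        ["az", "account", "set", "--subscription", subscription],
        ["az", "account", "show", "--query", "{name:name, id:id, tenant:tenantId}", "--output", "json"],
        ["az", "synapse", "notebook", "export", "--workspace-name", workspace,
         "--name", notebook, "--output-folder", backup_folder],
        # "@" tells the CLI to read the file content; argv lists avoid shell quoting.
        ["az", "synapse", "notebook", "set", "--workspace-name", workspace,
         "--name", notebook, "--file", f"@{new_file}"],
    ]


def run(commands, dry_run=True):
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # On Windows the executable may need to be "az.cmd".
            subprocess.run(command, check=True)  # stop at the first failing step


# Hypothetical names for illustration only.
commands = safe_update_commands("<subscription-id>", "syn-prod-example", "score_promotions",
                                "release/score_promotions.ipynb", "backups/2026-05-27")
run(commands)  # dry run: prints the commands without executing them
```

Review the `az account show` output from a dry run or a first step before letting the export and set commands run unattended.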

What output tells you

  • Notebook list output shows which artifacts exist, their names, and folder placement, which is the first drift signal between workspaces.
  • Show or export output exposes the notebook definition, parameters, language metadata, and references that should match source control.
  • Spark pool output tells you whether the runtime capacity required by the notebook is available and in the expected resource group.
  • Mutation output confirms the service accepted the artifact, but it does not prove the notebook logic ran successfully.
  • Delete output should be paired with pipeline search evidence, because the CLI cannot infer every downstream business dependency.
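Spark pool output can be turned into explicit pass/fail checks before a release. The sketch assumes `az synapse spark pool show` JSON exposes `provisioningState`, `nodeSize`, `nodeCount`, and an `autoScale` object with `enabled` and `maxNodeCount`. Those field names match typical output, but they should be verified against your CLI version. The expected node size and minimum node count are hypothetical requirements of the notebook.

```python
import json


def check_spark_pool(show_output: str, expected_node_size: str, min_nodes: int) -> list:
    """Return readable problems found in `az synapse spark pool show` output."""
    pool = json.loads(show_output)
    problems = []
    if pool.get("provisioningState") != "Succeeded":
        problems.append(f"provisioningState is {pool.get('provisioningState')!r}")
    if pool.get("nodeSize") != expected_node_size:
        problems.append(f"nodeSize is {pool.get('nodeSize')!r}, expected {expected_node_size!r}")
    autoscale = pool.get("autoScale") or {}
    max_nodes = autoscale.get("maxNodeCount") if autoscale.get("enabled") else pool.get("nodeCount")
    if max_nodes is None or max_nodes < min_nodes:
        problems.append(f"pool can reach {max_nodes} nodes, notebook needs {min_nodes}")
    return problems


# Hypothetical output trimmed to the fields this check reads.
sample = json.dumps({
    "name": "sparkmedium", "provisioningState": "Succeeded", "nodeSize": "Medium",
    "nodeCount": 3, "autoScale": {"enabled": True, "minNodeCount": 3, "maxNodeCount": 10},
})
print(check_spark_pool(sample, expected_node_size="Medium", min_nodes=5))
```

An empty list means the pool looks ready. It still does not prove the notebook logic will succeed, for the same reason mutation output does not.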

Mapped Azure CLI commands

Synapse notebook artifact operations

direct
az synapse notebook list --workspace-name <workspace-name>
az synapse notebook show --workspace-name <workspace-name> --name <notebook-name>
az synapse notebook export --workspace-name <workspace-name> --name <notebook-name> --output-folder <folder>
az synapse notebook import --workspace-name <workspace-name> --name <notebook-name> --file @<notebook.ipynb> --folder-path <folder-path>
az synapse notebook set --workspace-name <workspace-name> --name <notebook-name> --file @<notebook.ipynb>
az synapse notebook delete --workspace-name <workspace-name> --name <notebook-name>

Spark pool checks for notebook runtime

supporting
az synapse spark pool list --workspace-name <workspace-name> --resource-group <resource-group>
az synapse spark pool show --workspace-name <workspace-name> --resource-group <resource-group> --name <spark-pool-name>

Architecture context

Architecturally, a Synapse notebook belongs in the development and transformation layer of a Synapse workspace. It is not the storage system, the Spark pool, or the pipeline by itself; it is the executable artifact that binds those pieces together. A mature design treats notebooks as versioned assets with parameters, controlled dependencies, clear input and output paths, and predictable Spark pool selection. For production, I expect notebooks to be called by pipelines, protected by Git and workspace permissions, and monitored through Spark application logs, pipeline run history, and storage-level results. The notebook should be portable between dev, test, and prod without hard-coded account names.
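Portability without hard-coded account names usually means the notebook only builds paths from parameters that the pipeline supplies per environment. The sketch below builds an ADLS Gen2 `abfss://` URI and rejects values that cannot be valid storage account names (3 to 24 lowercase letters and digits). The account, container, and dataset names are hypothetical.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
STORAGE_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")


def curated_path(storage_account: str, container: str, dataset: str, run_date: str) -> str:
    """Build an ADLS Gen2 abfss:// URI from parameters instead of hard-coded names."""
    if not STORAGE_ACCOUNT_RE.fullmatch(storage_account):
        raise ValueError(f"invalid storage account name: {storage_account!r}")
    return (
        f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
        f"{dataset}/run_date={run_date}"
    )


# Same notebook code; the account arrives per environment as a pipeline parameter.
print(curated_path("stcurateddev01", "curated", "forecasts", "2026-05-01"))
print(curated_path("stcuratedprod01", "curated", "forecasts", "2026-05-01"))
```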

Security

Notebook security is about who can read code, who can run code, and what that code can reach. Interactive notebooks may expose storage paths, secret lookup logic, package sources, or sample data. Avoid embedding keys or passwords; use managed identity, Key Vault-backed secret patterns, and least-privilege data access. Remember that a notebook run can move data, write files, or call external endpoints, so workspace roles and storage permissions both matter. For regulated workloads, review outputs, shared notebooks, Git repository access, and Spark package dependencies because leakage often happens through results or logs, not only through source data. Exported notebooks should be handled like source code. Review access after every new linked service is added.
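Before a notebook is published or exported to a wider audience, a quick heuristic scan can flag cells that appear to embed credentials. The patterns below are illustrative assumptions, not a complete rule set, and a dedicated secret scanner should remain the real gate. The preferred pattern keeps secrets in Key Vault and reads them at run time, for example with `mssparkutils.credentials.getSecret`, which this scan correctly ignores. The notebook content is hypothetical.

```python
import json
import re

# Heuristic patterns only; a dedicated secret scanner should be the real gate.
SECRET_PATTERNS = {
    "storage account key": re.compile(r"AccountKey=[A-Za-z0-9+/=]{20,}"),
    "SAS token signature": re.compile(r"[?&]sig=[A-Za-z0-9%+/=]{10,}"),
    "inline password": re.compile(r"""(?i)password\s*=\s*["'][^"']+["']"""),
}


def scan_notebook(ipynb_text: str) -> list:
    """Return (cell_index, finding) pairs for cells that look like they hold secrets."""
    findings = []
    for index, cell in enumerate(json.loads(ipynb_text).get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(source):
                findings.append((index, label))
    return findings


# Hypothetical notebook content for illustration.
example = json.dumps({"cells": [
    {"cell_type": "code", "source": 'password = "hunter2"'},
    {"cell_type": "code", "source": 'conn = "DefaultEndpointsProtocol=https;AccountName=stexample;AccountKey=abcdefghijklmnopqrstuvwxyz0123=="'},
    {"cell_type": "code", "source": 'key = mssparkutils.credentials.getSecret("kv-example", "storage-key")'},
]})
print(scan_notebook(example))
```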

Cost

The notebook itself is not the major cost driver, but every run consumes Spark capacity, storage I/O, logging, and sometimes downstream SQL or data movement cost. Small exploratory notebooks can become expensive when scheduled too often or attached to oversized pools. Long-running sessions, wide shuffles, repeated reads of large raw files, and unnecessary package installation add avoidable spend. FinOps review should connect each production notebook to a business outcome, Spark pool configuration, run frequency, and data volume. Cost improves when teams parameterize date ranges, reuse curated data, stop idle sessions, and separate development pools from production pools. Owner tags on Spark pools, which are Azure resources unlike notebook artifacts, make chargeback reviews clearer.
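A rough vCore-hour estimate makes the oversized-pool problem concrete. The sketch uses the vCores per node of the Synapse memory-optimized node sizes and assumes every node is allocated for the whole run, so it is an upper bound that ignores autoscale and session startup. The pool sizes and run schedule are hypothetical, and the price per vCore-hour is left as an input rather than a quoted rate.

```python
# vCores per node for Synapse Spark pool memory-optimized node sizes.
VCORES_PER_NODE = {"Small": 4, "Medium": 8, "Large": 16, "XLarge": 32, "XXLarge": 64}


def monthly_vcore_hours(node_size: str, nodes: int, minutes_per_run: float, runs_per_month: int) -> float:
    """Upper-bound estimate: assumes every node is allocated for the whole run."""
    return VCORES_PER_NODE[node_size] * nodes * (minutes_per_run / 60) * runs_per_month


def monthly_cost(vcore_hours: float, price_per_vcore_hour: float) -> float:
    # Price is an input: use the current rate for your region and agreement.
    return vcore_hours * price_per_vcore_hour


# Same 30-minute daily job on two hypothetical pool choices.
oversized = monthly_vcore_hours("Large", nodes=10, minutes_per_run=30, runs_per_month=30)
right_sized = monthly_vcore_hours("Medium", nodes=4, minutes_per_run=30, runs_per_month=30)
print(oversized, right_sized)
```

In this hypothetical case the right-sized pool uses one fifth of the vCore-hours, which is the kind of comparison a FinOps review can attach to each production notebook.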

Reliability

A notebook can be reliable only if its runtime assumptions are explicit. Spark pool size, package versions, input paths, schema expectations, retry behavior, and output commit logic should be documented or parameterized. Production notebooks should fail clearly when source files are missing, permissions change, or schemas drift. Avoid relying on an open interactive session as proof that a scheduled run will work, because pipeline execution can use a different identity and pool configuration. Good reliability practice includes idempotent writes, staging paths where appropriate, notebook activity timeouts, and a rollback path for downstream tables or files after library or runtime upgrades.
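Idempotent writes can be illustrated with a local-filesystem sketch: write to a temporary file in the target folder, then swap it into place so a rerun replaces the previous output instead of appending to it. This is an analogy only. On ADLS Gen2 from Spark, the comparable patterns are writing to a staging path before promotion or using a transactional format such as Delta Lake with overwrite. The paths and records are hypothetical.

```python
import json
import os
import tempfile


def write_output_atomically(final_path: str, records: list) -> None:
    """Write to a temp file in the same folder, then swap it into place.

    Rerunning with the same final_path replaces the previous output, and on a
    local filesystem readers never see a half-written file.
    """
    directory = os.path.dirname(final_path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(records, handle)
        os.replace(temp_path, final_path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


# Hypothetical dated output path; the second call simulates a rerun.
with tempfile.TemporaryDirectory() as scratch:
    target = os.path.join(scratch, "curated", "run_date=2026-05-01", "forecast.json")
    write_output_atomically(target, [{"port": "example", "score": 0.7}])
    write_output_atomically(target, [{"port": "example", "score": 0.8}])
```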

Performance

Notebook performance depends on Spark pool startup, executor sizing, partitioning, file layout, caching choices, package initialization, and the quality of the code. A notebook that samples one day of data may fail badly when a pipeline passes a full month. Watch for small-file problems, skewed joins, unbounded collects to the driver, repeated full scans, and visualizations that pull too much data into the browser. Performance tuning is not only faster code; it includes choosing the right pool, writing partitioned outputs, measuring stages, and validating that parameterized production runs behave like tested development runs. Before increasing pool size, profile shuffle volume and driver memory on baseline runs that use production-sized inputs.
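Two of the checks above reduce to simple arithmetic: choosing an output file count that avoids small files, and measuring how far the largest partition exceeds the median. The 256 MB target file size is a common heuristic, not a Synapse requirement, and the partition row counts are hypothetical. In PySpark the file count could feed `df.repartition(n)` before a write, with the byte estimate taken from a previous production-sized run.

```python
import math
from statistics import median

TARGET_FILE_BYTES = 256 * 1024 * 1024  # heuristic target; tune for downstream readers


def output_file_count(total_bytes: int, target_file_bytes: int = TARGET_FILE_BYTES) -> int:
    """Number of output files so each lands near the target size."""
    return max(1, math.ceil(total_bytes / target_file_bytes))


def skew_ratio(partition_row_counts: list) -> float:
    """Largest partition relative to the median; high values suggest a skewed key."""
    mid = median(partition_row_counts)
    return max(partition_row_counts) / mid if mid else float("inf")


print(output_file_count(10 * 1024**3))  # 10 GiB of output
print(skew_ratio([120_000, 118_500, 121_200, 1_450_000]))
```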

Operations

Operators inspect Synapse notebooks by checking the artifact definition, the selected Spark pool, recent run history, dependency packages, parameters, and linked pipeline activities. In real environments, notebooks should be exported or stored through Git so changes are reviewable. During incidents, compare the notebook version that ran, the pipeline parameters, Spark application logs, and output files. Operators also watch for notebooks that silently grew from experiments into business-critical jobs. Those need ownership, scheduling rules, monitoring, data quality checks, and cleanup of old outputs. A notebook with no owner is operational debt during incidents and audits, so record the owner and expected row counts for each production notebook, and have reviewers confirm owner, schedule, and rollback plan before it goes live.
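Recording expected row counts pays off when the notebook checks them on every run. The sketch fails the run when the actual count drifts outside a tolerance band around the expected value. The 10% tolerance and the notebook name are hypothetical, and in Spark the actual count would typically come from `df.count()` on the output.

```python
def check_row_count(notebook: str, actual: int, expected: int, tolerance: float = 0.10) -> None:
    """Fail the run when the row count drifts outside expected +/- tolerance."""
    low, high = expected * (1 - tolerance), expected * (1 + tolerance)
    if not low <= actual <= high:
        raise ValueError(
            f"{notebook}: {actual} rows is outside the expected range "
            f"{int(low)}-{int(high)} (expected about {expected})"
        )


# Hypothetical production notebook and counts.
check_row_count("port_congestion_forecast", actual=48_210, expected=50_000)
```

The error message names the notebook and the expected range, which gives the incident responder the same evidence the reviewer approved.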

Common mistakes

  • Treating a successful interactive notebook run as proof that the same code will work from a pipeline identity.
  • Hard-coding storage account names, date ranges, pool names, or database names instead of passing parameters per environment.
  • Overwriting a production notebook with a local export that contains stale metadata or a different default Spark configuration.
  • Keeping secrets, tokens, or sample customer data inside cells and then publishing the notebook to a wider workspace audience.
  • Deleting notebooks because they look unused without checking pipeline activities, triggers, run history, and Git references.
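Part of the pre-deletion check can be automated by searching pipeline definitions, for example the JSON from `az synapse pipeline list --workspace-name <workspace-name>`, for notebook references. The sketch assumes a Notebook activity points at its notebook with an object like `{"referenceName": "...", "type": "NotebookReference"}`, and it walks the whole document so nested activities inside ForEach or If blocks are found too. It will not catch notebook names built from dynamic expressions, and it says nothing about triggers, runbooks, run history, or unpublished Git branches. The pipeline content is hypothetical.

```python
import json


def _walk(node):
    """Yield every dict in a nested JSON structure."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def pipelines_referencing(pipeline_list_output: str, notebook_name: str) -> list:
    """Names of pipelines whose JSON contains a NotebookReference to notebook_name."""
    hits = set()
    for pipeline in json.loads(pipeline_list_output):
        for node in _walk(pipeline):
            if node.get("type") == "NotebookReference" and node.get("referenceName") == notebook_name:
                hits.add(pipeline.get("name"))
                break
    return sorted(hits)


# Hypothetical pipeline list output, including a nested ForEach activity.
notebook_ref = {"referenceName": "score_promotions", "type": "NotebookReference"}
sample = json.dumps([
    {"name": "daily_scoring", "activities": [
        {"name": "Score", "type": "SynapseNotebook", "typeProperties": {"notebook": notebook_ref}}]},
    {"name": "backfill", "activities": [
        {"name": "ForEachDay", "type": "ForEach", "typeProperties": {"activities": [
            {"name": "Score", "type": "SynapseNotebook", "typeProperties": {"notebook": notebook_ref}}]}}]},
    {"name": "ingest", "activities": [{"name": "Copy", "type": "Copy"}]},
])
print(pipelines_referencing(sample, "score_promotions"))
```

An empty result is necessary but not sufficient evidence for deletion; the other checks in the list still apply.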