
NetApp ONTAP Ansible Playbook Reference: Enterprise Automation Examples

This is a working reference for automating NetApp ONTAP with the netapp.ontap Ansible collection — fourteen worked examples covering the jobs a storage engineer actually does, from creating an SVM to replicating a volume across clusters. It is the companion to our three Ansible field guides: installing the control node, the seven core playbooks, and securing credentials with Ansible Vault. Where those teach concepts, this one is the lookup table: find the job, copy the playbook.

How to read each example

Every entry shows the same playbook two ways. The amber block is the lab original — exactly as it runs in a NetApp training workshop, quirks and all. The blue block is the WUC-cleaned version — the same result rewritten for production: credentials via module_defaults pulling from a vault-encrypted global.vars, REST-native parameters, fully-qualified module names, and least-privilege defaults. The blue block has a copy button; the amber one does not. A short terminal run follows, then a note on exactly what changed between the two versions and why. Every playbook here validates as parseable YAML.

Modules current as of the netapp.ontap collection 23.x against ONTAP 9.12+ over REST. Variable names (PRI_CLU, PRI_SVM, …) come from the shared global.vars file documented in the Vault walkthrough.

How Ansible automates ONTAP: the REST architecture

Modern ONTAP automation should use REST-based workflows. This guide focuses exclusively on supported ONTAP REST API automation through the NetApp Ansible Collection (netapp.ontap) — every example here drives the ONTAP REST API, and every cleaned playbook sets use_rest: always so there is no ambiguity about which interface runs.

The mechanics are worth understanding before the examples, because they explain why these playbooks are safe to run against production. Ansible is agentless: nothing is installed on the ONTAP cluster. The netapp.ontap modules execute on an Ansible control node and make authenticated HTTPS calls to the cluster management LIF’s REST endpoint (/api/...). Each module is a thin, idempotent wrapper around a set of REST calls: it first reads current state with a GET, compares it to the state your playbook declares, and issues a POST or PATCH only if reality differs. That read-compare-act cycle is the whole engine.
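
The read half of that cycle can be observed on its own. The sketch below is a minimal read-only play, assuming the same global.vars variables used throughout this guide. It asks the cluster for its SVMs through na_ontap_rest_info, which only issues GET calls. The registered variable name svm_state and the choice of fields are illustrative assumptions, not something the workshop files define.

```yaml
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Read current SVM state (GET only, nothing is changed)
    netapp.ontap.na_ontap_rest_info:
      gather_subset:
        - svm/svms
      fields:
        - name
        - state
    register: svm_state            # illustrative variable name

  - name: Show what the cluster reported
    ansible.builtin.debug:
      var: svm_state.ontap_info
```

Because this play never declares a desired state, it should report changed=0 on every run; it is the "read" step without the "compare" and "act" steps that the other modules add.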

Figure 01 · Ansible control node to ONTAP cluster over REST

[Figure 01: an Ansible control node (netapp.ontap collection, playbooks as YAML in Git, vault-encrypted global.vars, run with ansible-playbook … --ask-vault-pass) sends HTTPS REST calls (GET / POST / PATCH to /api) to the cluster management LIF on port 443 of the ONTAP cluster, where nothing is installed. Node 1 and node 2 each host SVMs, volumes, and LUNs / buckets.]
One control node, authenticated REST over HTTPS, nothing on the cluster. The control node is where you install Ansible, store playbooks in Git, and keep the vault-encrypted credentials.

Idempotency and desired-state: why these playbooks are safe to re-run

Two properties make Ansible suitable for production storage, and both follow directly from the read-compare-act cycle above. Idempotency means running a playbook twice produces the same result as running it once: the first run creates what is missing and reports changed; the second run finds reality already matching the declaration and reports changed=0, touching nothing. Desired-state management is the consequence — your playbook is not a script of imperative commands (“create this, then that”) but a description of how the cluster should look, and Ansible’s job is to make reality match that description regardless of the starting point.

The operational payoff is real. A changed=0 run is a free compliance audit: schedule any of these playbooks nightly, and a run that suddenly reports changed=1 is drift detected and already corrected. Someone resized a volume by hand, and the next scheduled run quietly put it back and logged that it did. This is why every cleaned playbook in this reference declares state: present rather than scripting a sequence of commands: you are declaring desired state, and the cluster gains a standing enforcement mechanism no hand-run CLI procedure can match.
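
If you want the audit without the automatic correction, one option is to run the declaration in check mode and report the result. The following is a sketch under assumptions: it relies on the module honouring Ansible check mode (worth confirming for each module you audit), and the variable name svm_audit is illustrative.

```yaml
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Compare the declared SVM against the cluster without changing it
    netapp.ontap.na_ontap_svm:
      state: present
      name: "{{ PRI_SVM }}"
    check_mode: true               # read and compare only, skip the act step
    register: svm_audit            # illustrative variable name

  - name: Report whether the cluster matches the declaration
    ansible.builtin.debug:
      msg: "{{ 'drift detected' if svm_audit.changed else 'in desired state' }}"
```

Dropping check_mode: true turns the same play back into the enforcing version described above, so the audit and the correction share one declaration.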

Module quick-reference table

The netapp.ontap collection ships well over a hundred modules. These are the ones that carry most production automation — the working set behind every example in this guide and the day-2 operations around them.

| Module | Task | Typical production use case |
| --- | --- | --- |
| na_ontap_svm | Create / manage a storage VM | Onboard a tenant, department, or customer; declare which protocols it may serve |
| na_ontap_aggregate | Create / manage an aggregate | Provision the physical capacity pool that volumes are carved from |
| na_ontap_volume | Create / resize / manage a volume | The unit of capacity for NAS shares, SAN LUNs, and S3 buckets; bulk volume provisioning |
| na_ontap_interface | Create / manage a LIF | Management, NAS data, and block data interfaces; one module, role set by service policy |
| na_ontap_broadcast_domain | Manage a broadcast domain | Group ports into a failover-capable Layer-2 domain before placing LIFs |
| na_ontap_vlan | Create / manage a VLAN interface | Tagged network segmentation for multi-tenant or multi-subnet data traffic |
| na_ontap_cifs | Publish / manage an SMB share | Expose a path as a Windows file share after the CIFS server is joined to AD |
| na_ontap_export_policy | Manage an NFS export policy | The named rule set that decides which client networks may mount an NFS volume |
| na_ontap_snapshot_policy | Manage a snapshot schedule policy | Standardize local data protection (hourly/daily/weekly retention) across volumes |
| na_ontap_snapshot | Take / manage an individual snapshot | Application-consistent point-in-time copies, e.g. before a change window or upgrade |
| na_ontap_snapmirror | Manage a SnapMirror relationship | Cross-cluster replication for disaster recovery and migration; orchestrate failover |
| na_ontap_rest_info | Read cluster state over REST | Read-only inventory, performance metrics, drift detection, and dynamic inventory sourcing |

Every one of these is idempotent and REST-driven. The examples that follow build the most common combinations of them, in dependency order; the enterprise scenarios further down show how teams compose them at scale.

10

SVM foundation

Create an SVM and declare its protocols

Job: create the storage tenant — the first object in every workflow, since volumes, LIFs, and shares all live inside an SVM. Module: na_ontap_svm (one task). File: create_pri_svm.yml

The lab original — exactly as it runs in the workshop

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname:       "{{ PRI_CLU }}"
      username:       "{{ PRI_CLU_USER }}"
      password:       "{{ PRI_CLU_PASS }}"
      https:          true
      validate_certs: false
      use_rest:       Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
  - name: Create primary SVM
    na_ontap_svm:
      state: present
      name:  "{{ PRI_SVM }}"
      services:
        nfs:
          allowed: true
          enabled: true
#        s3:
#          allowed: true
#          enabled: true
      comment: Created with Ansible
      <<: *input

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Create the primary SVM with the protocols it will serve
    netapp.ontap.na_ontap_svm:
      state: present
      name: "{{ PRI_SVM }}"
      comment: "Created with Ansible"
      services:
        nfs:
          allowed: true
          enabled: true
        # add only the protocols this SVM should ever serve, e.g.:
        # s3:
        #   allowed: true
        #   enabled: true

The run

[root@centos1 ansible-workshop]# ansible-playbook create_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create primary SVM] ******************************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost    : ok=1    changed=1    unreachable=0    failed=0    skipped=0

What to read in this one. The services block is the SVM’s protocol contract, and the two keys mean different things: allowed permits the protocol to exist on this SVM at all, enabled turns its service on. Declaring only what you need — NFS here, S3 commented out until wanted — is least privilege at the tenant level: a protocol that is not allowed cannot be misconfigured into serving data. This is the object every other entry in this reference depends on; in dependency order it always runs first.

Original → cleaned: what changed and why

This file is mostly clean already — it uses the modern services block and use_rest: Always. The cleaned version changes only the credential pattern (module_defaults instead of the &input anchor) and tidies the commented-out blocks: the original carries two parallel commented sections (a legacy allowed_protocols form and a certificate line) that are dead weight; the cleaned version keeps a single commented S3 stanza as the documented extension point. Note the design choice worth keeping from the original: protocols are declared at creation, not bolted on later — the SVM’s capability surface is defined in one reviewable place.

One module, every workflow. The workshop reuses this same create_pri_svm.yml across its NAS, S3, and SAN tracks — the only thing that changes is the services block (NFS here, or S3, or iSCSI as in 60-01). That is the whole point of declaring protocols at creation: one playbook, one module, and the tenant’s purpose set by which protocols you allow. The variants also carry a commented certificate: "server" line — a placeholder for the server certificate an S3 server (50-01) later requires, a reminder that the SVM and its certificate are born together.
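
As a sketch of that reuse, here is how the same play might look for a SAN tenant, with only the services block changed to iSCSI. It assumes the same global.vars variables as the listings above and is not the workshop's own SAN file.

```yaml
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Create the primary SVM as an iSCSI-only tenant
    netapp.ontap.na_ontap_svm:
      state: present
      name: "{{ PRI_SVM }}"
      comment: "Created with Ansible"
      services:
        iscsi:
          allowed: true            # the protocol may exist on this SVM
          enabled: true            # and its service is turned on
```

Everything outside the services block is unchanged from the NFS version, which is what makes the file easy to review when a tenant's purpose changes.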

Add a management LIF to an SVM

Job: give an existing SVM its own management interface, so storage admins can manage the tenant directly instead of through the cluster LIF. Module: na_ontap_interface (one task). File: 22_create_mgmtlif_pri_svm.yml

The lab original — exactly as it runs in the workshop

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname:       "{{ PRI_CLU }}"
      username:       "{{ PRI_CLU_USER }}"
      password:       "{{ PRI_CLU_PASS }}"
      https:          true
      validate_certs: false
      use_rest:       Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
  - name: Create management interface on primary SVM
    na_ontap_interface:
      state:           present
      interface_name:  "{{ PRI_SVM }}"
      vserver:         "{{ PRI_SVM }}"
      address:         "{{ PRI_SVM_IP }}"
      netmask:         "{{ PRI_SVM_NETMASK }}"
      home_node:       "{{ PRI_CLU_NODE1 }}"
      home_port:       "{{ PRI_MGMT_PORT }}"
      <<: *input
      service_policy:  "default-management"
      firewall_policy: mgmt
      role:            data

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Create the SVM management LIF
    netapp.ontap.na_ontap_interface:
      state: present
      vserver: "{{ PRI_SVM }}"
      interface_name: "{{ PRI_SVM }}_mgmt"
      address: "{{ PRI_SVM_IP }}"
      netmask: "{{ PRI_SVM_NETMASK }}"
      home_node: "{{ PRI_CLU_NODE1 }}"
      home_port: "{{ PRI_MGMT_PORT }}"
      service_policy: default-management

The run

[root@centos1 ansible-workshop]# ansible-playbook 22_create_mgmtlif_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create management interface on primary SVM] ******************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost    : ok=1    changed=1    unreachable=0    failed=0    skipped=0

What to read in this one. An SVM management LIF is what lets a storage tenant be administered independently of the cluster — useful for delegated administration and for tools that connect per-SVM. The four placement parameters work together: home_node + home_port say where the interface lives, address + netmask say what it answers on. The service_policy is what makes it a management interface rather than a data one — default-management is the built-in policy that permits management traffic and nothing else, which is the least-privilege default you want on an admin LIF.

The same na_ontap_interface module creates data LIFs too — the difference is the service_policy: a data LIF for a protocol either takes the SVM default (no policy line) or names a protocol data policy like default-data-iscsi (see 60-01) or default-data-blocks. One module, three interface roles — management, NAS/object data, and block data — distinguished entirely by the service policy you attach.
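
A NAS data LIF, for comparison, might look like the sketch below. The variables PRI_DATA_IP and PRI_DATA_PORT and the interface name are hypothetical (they are not taken from the workshop's global.vars), and default-data-files is assumed to be the built-in NAS data policy on your ONTAP release.

```yaml
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Create a NAS data LIF on the SVM
    netapp.ontap.na_ontap_interface:
      state: present
      vserver: "{{ PRI_SVM }}"
      interface_name: "{{ PRI_SVM }}_nas_01"   # hypothetical name
      address: "{{ PRI_DATA_IP }}"             # hypothetical variable
      netmask: "{{ PRI_SVM_NETMASK }}"
      home_node: "{{ PRI_CLU_NODE1 }}"
      home_port: "{{ PRI_DATA_PORT }}"         # hypothetical variable
      service_policy: default-data-files       # NAS data, not management
```

Swapping the service_policy value for default-data-iscsi or default-data-blocks is the only change needed to turn this into a block data LIF.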

Original → cleaned: what changed and why

This file is worth citing for one contradiction it carries: it sets service_policy: default-management (the REST-native field that defines an interface’s role) and the older firewall_policy: mgmt + role: data fields on the same task. Under REST the modern service_policy is authoritative and the older pair is redundant — harmless, but confusing to the next reader, and role: data on a management interface flatly contradicts the management service policy. The cleaned version keeps only service_policy, which is the single source of truth: an interface’s role is whatever its service policy permits. Two smaller fixes: the interface is named {{ PRI_SVM }}_mgmt rather than reusing the bare SVM name (an interface named identically to its SVM invites confusion in network interface show), and credentials move to module_defaults.

Configure DNS on an SVM

Job: point the SVM at a DNS server and domain, the quiet prerequisite that makes the Active Directory join (and any hostname-based access) work. Module: na_ontap_dns (one task). File: create_dns_pri_svm.yml. Depends on: the SVM (10-01). Required by: the CIFS server join (30-01).

The lab original — exactly as it runs in the workshop

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname:       "{{ PRI_CLU }}"
      username:       "{{ PRI_CLU_USER }}"
      password:       "{{ PRI_CLU_PASS }}"
      https:          true
      validate_certs: false
      use_rest:       Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
  - name: Create DNS on primary SVM
    na_ontap_dns:
      state:       present
      vserver:     "{{ PRI_SVM }}"
      domains:     "{{ PRI_DOMAIN }}"
      nameservers: "{{ PRI_DNS1 }}"
      <<: *input

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Configure DNS on the SVM
    netapp.ontap.na_ontap_dns:
      state: present
      vserver: "{{ PRI_SVM }}"
      domains:
        - "{{ PRI_DOMAIN }}"
      nameservers:
        - "{{ PRI_DNS1 }}"
        - "{{ PRI_DNS2 }}"          # list both - resilience if one resolver is down

The run

[root@centos1 ansible-workshop]# ansible-playbook create_dns_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create DNS on primary SVM] **********************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost    : ok=1    changed=1    unreachable=0    failed=0    skipped=0

What to read in this one. DNS is configured per SVM, not just at the cluster — because a data SVM joining Active Directory resolves the domain from its own network, using its own resolver. domains is the search domain, nameservers is the resolver to query. This task looks trivial and is — until you skip it, at which point the CIFS server join (30-01) fails with a domain-not-found error that sends people debugging the join when the real fault is one layer below. Run DNS first; the join just works.

Original → cleaned: what changed and why

The fix here is resilience, expressed through YAML types. Both domains and nameservers are lists — the original passes a single scalar to each (one domain, one resolver), which works but leaves the SVM with a single point of DNS failure: if PRI_DNS1 is down, name resolution stops and SMB access with it. The cleaned version writes them as proper YAML lists and adds PRI_DNS2 (already in global.vars) as a second resolver. One resolver is a lab; two is production. Credentials move to module_defaults as elsewhere.

Add a network route to an SVM

Job: give the SVM a route so its LIFs can reach clients on other subnets — the networking piece that turns a reachable interface into a reachable service. Module: na_ontap_net_routes (one task). File: 36_create_route_pri_svm.yml. Depends on: the SVM (10-01) and at least one data LIF.

The lab original — exactly as it runs in the workshop

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname:       "{{ PRI_CLU }}"
      username:       "{{ PRI_CLU_USER }}"
      password:       "{{ PRI_CLU_PASS }}"
      https:          true
      validate_certs: false
      use_rest:       Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
  - name: Create route on primary SVM
    na_ontap_net_routes:
      state:       present
      vserver:     "{{ PRI_SVM }}"
      destination: "192.168.0.0/24"
      gateway:     "{{ PRI_CLU_DEFAULT_GW }}"
      <<: *input

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Add the default route for the SVM
    netapp.ontap.na_ontap_net_routes:
      state: present
      vserver: "{{ PRI_SVM }}"
      destination: "0.0.0.0/0"          # default route - reach any subnet
      gateway: "{{ PRI_CLU_DEFAULT_GW }}"

The run

[root@centos1 ansible-workshop]# ansible-playbook 36_create_route_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create route on primary SVM] ********************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost    : ok=1    changed=1    unreachable=0    failed=0    skipped=0

What to read in this one. Like DNS, routing on a data SVM is per-SVM — an SVM has its own routing table, separate from the cluster’s. A LIF (10-02) gives the SVM an address on its local subnet; a route is what lets it answer clients that live on other subnets. destination is the network to reach and gateway is the next hop toward it. Without a route to a client’s subnet, that client can reach the LIF only if it happens to share the subnet — the subtle cause of “works for some clients, times out for others.”

Original → cleaned: what changed and why

One meaningful choice. The original scopes the route to a single subnet (destination: 192.168.0.0/24), which reaches exactly that network and nothing else. The cleaned version uses the default route 0.0.0.0/0 — the catch-all that sends any otherwise-unmatched traffic to the gateway, which is what most data SVMs actually want so clients on any routed subnet can connect. Use a specific destination only when you deliberately want to limit which networks the SVM can reach (a valid hardening choice); use the default route when the SVM should serve broadly. The original’s slightly mismatched indentation on gateway is also tidied — harmless to YAML, but consistent indentation is the difference between a file that reviews cleanly and one that invites a second look. Credentials move to module_defaults.
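
For the hardening case, a sketch of the scoped alternative: one route per permitted client network, driven by a list. The client_subnets variable and its second entry are hypothetical placeholders, so replace them with the networks your SVM should actually reach.

```yaml
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  vars:
    # hypothetical list - the only networks this SVM is allowed to reach
    client_subnets:
      - "192.168.0.0/24"
      - "192.168.10.0/24"

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Add one route per permitted client subnet
    netapp.ontap.na_ontap_net_routes:
      state: present
      vserver: "{{ PRI_SVM }}"
      destination: "{{ item }}"
      gateway: "{{ PRI_CLU_DEFAULT_GW }}"
    loop: "{{ client_subnets }}"
```

Keeping the permitted networks in a single list means a reviewer can see the SVM's whole reach in one place, and adding a subnet is a one-line change.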

20

Volumes

Create a NAS volume and mount it into the namespace

Job: carve capacity from an aggregate and mount it at a junction path, so NAS clients can reach it — the layer between the SVM and any share or export. Module: na_ontap_volume (one task). File: create_nfsvol_pri_svm.yml. Depends on: the SVM (10-01).

The lab original — exactly as it runs in the workshop

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname:       "{{ PRI_CLU }}"
      username:       "{{ PRI_CLU_USER }}"
      password:       "{{ PRI_CLU_PASS }}"
      https:          true
      validate_certs: false
      use_rest:       Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
  - name: Create volume on primary SVM
    na_ontap_volume:
      state:                 present
      name:                  "{{ PRI_SVM }}_nfs_01"
      vserver:               "{{ PRI_SVM }}"
      size:                  "{{ VOL_SIZE }}"
      size_unit:             mb
      aggregate_name:        "{{ PRI_AGGR }}"
      comment:               Created with Ansible
#      space_guarantee:       volume
      policy:                default
      junction_path:         "/{{ PRI_SVM }}_nfs_01"
      volume_security_style: unix
      <<: *input

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Create the NFS volume and mount it in the namespace
    netapp.ontap.na_ontap_volume:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: "{{ PRI_SVM }}_nfs_01"
      aggregate_name: "{{ PRI_AGGR }}"
      size: "{{ VOL_SIZE }}"
      size_unit: gb                     # GiB - see note on units below
      junction_path: "/{{ PRI_SVM }}_nfs_01"
      volume_security_style: unix
      space_guarantee: none
      export_policy: default
      comment: "NFS volume - managed by Ansible"

The run

[root@centos1 ansible-workshop]# ansible-playbook create_nfsvol_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create volume on primary SVM] *******************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost    : ok=1    changed=1    unreachable=0    failed=0    skipped=0

What to read in this one. Four decisions define a NAS volume. aggregate_name is which physical pool backs it. size + size_unit is how big (two fields, always set both). junction_path is what makes it reachable — an unmounted volume exists but no client can see it, the silent cause of “the export is empty” tickets. And volume_security_style: unix matches the NFS consumers coming next — UNIX mode bits govern permissions, where an SMB volume would use ntfs. With the volume mounted, an export policy is the next layer.

Original → cleaned: what changed and why

Three fixes and a recurring one. Size unit: the original uses size_unit: mb, so with VOL_SIZE at 20 it creates a 20 MB volume, almost certainly not what was intended; the cleaned version uses gb, the unit a NAS volume is normally sized in. This is the kind of unit slip that passes every syntax check and only surfaces when the share fills in a day. Space guarantee: the original comments out space_guarantee entirely (defaulting to the cluster setting); the cleaned version states none explicitly, so thin provisioning is intentional rather than implicit. Export policy: renamed from the bare policy: default to the explicit export_policy: default, the unambiguous modern parameter. Credentials move to module_defaults as everywhere else.

The SMB variant: one parameter different. The workshop ships a near-identical 26_create_cifsvol_pri_svm.yml for SMB. It uses the same module and the same structure, with volume_security_style: ntfs instead of unix so Windows ACLs govern the files. That single field is the whole difference between a NAS volume destined for an NFS export and one destined for an SMB share. The SMB file also sets space_guarantee: volume rather than none. This is the thick option, which reserves the full size from the aggregate up front. Thick guarantees the space can never be over-committed; thin (none) lets you over-provision and monitor. Choose deliberately: thick for workloads that must never hit a space error, thin for density with the monitoring to back it. Set volume_security_style to match the protocol and space_guarantee to match the risk tolerance; everything else about the two volumes is identical.
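
Put together, the SMB variant described above might look like the sketch below. It is reconstructed from that description rather than copied from the workshop file. The volume name and junction path are assumed from the share path used later in this guide, and gb is used for the reason given in the unit note.

```yaml
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Create the SMB volume and mount it in the namespace
    netapp.ontap.na_ontap_volume:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: "{{ PRI_SVM }}_cifs_01"            # assumed from the later share path
      aggregate_name: "{{ PRI_AGGR }}"
      size: "{{ VOL_SIZE }}"
      size_unit: gb
      junction_path: "/{{ PRI_SVM }}_cifs_01"
      volume_security_style: ntfs              # Windows ACLs govern the files
      space_guarantee: volume                  # thick: reserve the full size up front
      comment: "SMB volume - managed by Ansible"
```

Only two lines differ in meaning from the NFS volume, which keeps a side-by-side review of the two files short.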

30

NAS (SMB & NFS)

Join an SVM to Active Directory (create a CIFS server)

Job: give the SVM an SMB identity by joining it to Active Directory, the prerequisite every SMB share depends on and the step most quick-starts skip. Module: na_ontap_cifs_server (one task). File: create_cifs_pri_svm.yml. Depends on: the SVM (10-01) with CIFS allowed, and DNS configured on that SVM (10-03).

The lab original — exactly as it runs in the workshop

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname:       "{{ PRI_CLU }}"
      username:       "{{ PRI_CLU_USER }}"
      password:       "{{ PRI_CLU_PASS }}"
      https:          true
      validate_certs: false
      use_rest:       Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
  - name: Create CIFS server on primary SVM
    na_ontap_cifs_server:
      state:           present
      name:            "{{ PRI_SVM }}"
      vserver:         "{{ PRI_SVM }}"
      domain:          "{{ PRI_AD_DOMAIN }}"
      admin_user_name: "{{ PRI_AD_USER }}"
      admin_password:  "{{ PRI_AD_PASS }}"
      <<: *input

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Join the SVM to Active Directory as a CIFS server
    netapp.ontap.na_ontap_cifs_server:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: "{{ PRI_SVM }}_smb"          # the AD computer object + UNC name
      domain: "{{ PRI_AD_DOMAIN }}"
      admin_user_name: "{{ PRI_AD_USER }}"
      admin_password: "{{ PRI_AD_PASS }}"
      service_state: started
      # ou: "OU=Storage,DC=demo,DC=netapp,DC=com"   # place the object precisely

The run

[root@centos1 ansible-workshop]# ansible-playbook create_cifs_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create CIFS server on primary SVM] **************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost    : ok=1    changed=1    unreachable=0    failed=0    skipped=0

What to read in this one. This single task performs a real Active Directory domain join: name becomes the computer object created in AD and the server half of the UNC path (\\name\share), domain is the AD domain to join, and admin_user_name / admin_password are the join credentials — an account with rights to create computer objects in the target OU. Those credentials are why global.vars is vault-encrypted: they are domain credentials, not just storage ones. The join happens from the SVM’s network, so the SVM’s DNS must resolve the domain — the most common cause of a failed join, which is why DNS (10-03) runs first.

Original → cleaned: what changed and why

Two naming and operational refinements. Server name: the original sets the CIFS server name equal to the SVM name — legal, but it makes the AD computer object indistinguishable from the SVM in logs and in vserver cifs show; the cleaned version uses {{ PRI_SVM }}_smb so the SMB identity is its own recognizable thing. Explicit start + OU: the cleaned version adds service_state: started (don’t assume the default) and shows a commented ou: parameter — in a real domain you place the computer object in a specific OU rather than the default Computers container, which is often a security or GPO requirement. The join credentials and domain stay as the original had them; they are already coming from vaulted variables, which is correct.

Publish an SMB share

Job: expose a path on the SVM as a Windows file share — the last NAS step, where storage becomes a drive someone can map. Module: na_ontap_cifs (one task). File: create_cifsshare_pri_svm.yml. Depends on: the volume (20-01, ntfs variant) and the CIFS server (30-01).

The lab original — exactly as it runs in the workshop

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname:       "{{ PRI_CLU }}"
      username:       "{{ PRI_CLU_USER }}"
      password:       "{{ PRI_CLU_PASS }}"
      https:          true
      validate_certs: false
      use_rest:       Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
  - name: Create CIFS share on primary SVM
    na_ontap_cifs:
      state:      present
      share_name: "share_01"
      vserver:    "{{ PRI_SVM }}"
      path:       "/{{ PRI_SVM }}_cifs_01/cifs_01/"
      <<: *input

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Publish the SMB share
    netapp.ontap.na_ontap_cifs:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: share_01
      path: "/{{ PRI_SVM }}_cifs_01/cifs_01"
      comment: "Project share - managed by Ansible"

The run

[root@centos1 ansible-workshop]# ansible-playbook create_cifsshare_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create CIFS share on primary SVM] ***************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost    : ok=1    changed=1    unreachable=0    failed=0    skipped=0

# from a domain-joined Windows client, the share now maps:
C:\> net use Z: \\nas_svm_smb\share_01
The command completed successfully.

What to read in this one. A share is just a name mapped to a path inside the SVM namespace — clients connect to \\server\share_01 and land at the path you specify. The path arithmetic is the part that must line up: the volume mounted at /{{ PRI_SVM }}_cifs_01 (the ntfs volume from 20-01), then the qtree cifs_01 inside it, so the share path is junction + qtree. Point it one segment wrong and the share either fails or publishes the wrong directory. Who can do what on the share is governed separately by NTFS ACLs on the files plus share-level ACLs (na_ontap_cifs_acl); this task only publishes the name.
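Because this task only publishes the name, the share-level ACL is a separate step. A minimal sketch, assuming the na_ontap_cifs_acl module from the same collection and a hypothetical smb_share_group variable (not part of the original global.vars) that holds an AD group name: it grants that group change access, then removes the Everyone entry that ONTAP places on a new share by default. Check the parameter names against your installed collection version before relying on it.

```yaml
  tasks:
  - name: Grant the project group change access on the share
    netapp.ontap.na_ontap_cifs_acl:
      state: present
      vserver: "{{ PRI_SVM }}"
      share_name: share_01
      user_or_group: "{{ smb_share_group }}"   # hypothetical variable, e.g. DOMAIN\project_rw
      permission: change

  - name: Remove the default Everyone entry from the share ACL
    netapp.ontap.na_ontap_cifs_acl:
      state: absent
      vserver: "{{ PRI_SVM }}"
      share_name: share_01
      user_or_group: Everyone
```

The grant runs before the removal so the share is never left with an empty ACL between the two tasks.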

Original → cleaned: what changed and why

Two small correctness fixes. Trailing slash: the original’s path: "/..._cifs_01/cifs_01/" carries a trailing slash; ONTAP tolerates it, but share paths are canonically written without one, and a stray slash occasionally trips path-equality checks on re-runs. The cleaned version drops it. Parameter name: the original uses share_name; the module’s current canonical parameter is name (with share_name kept as an alias) — the cleaned version uses name to match the rest of the collection, and adds a comment so the share is self-documenting in vserver cifs share show. Credentials move to module_defaults.

Add an NFS export policy rule

Job: grant NFS clients access by adding a rule to an export policy — the NFS equivalent of publishing a share, and the step that decides who may mount. Module: na_ontap_export_policy_rule (one task). File: create_export_policy_rule.yml. Depends on: the volume (20-01) attached to this policy, plus the NFS service enabled on the SVM.

The lab original — exactly as it runs in the workshop

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname:       "{{ PRI_CLU }}"
      username:       "{{ PRI_CLU_USER }}"
      password:       "{{ PRI_CLU_PASS }}"
      https:          true
      validate_certs: false
      use_rest:       Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
  - name: Create export policy rule for svm.
    na_ontap_export_policy_rule:
      state:               present
      name:                default
      vserver:             "{{ PRI_SVM }}"
      client_match:        192.168.0.0/24
      ro_rule:             any
      rw_rule:             any
      protocol:            nfs3
      super_user_security: any
      anonymous_user_id:   65534
      <<: *input

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Add the project network to the export policy, read-write
    netapp.ontap.na_ontap_export_policy_rule:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: default
      client_match: "{{ nfs_client_network }}"   # one CIDR from vars, not the world
      protocol: nfs
      ro_rule: sys                 # require AUTH_SYS, not "any"
      rw_rule: sys
      super_user_security: none    # squash root from clients
      allow_suid: false

The run

[root@centos1 ansible-workshop]# ansible-playbook create_export_policy_rule.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create export policy rule for svm.] *************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost    : ok=1    changed=1    unreachable=0    failed=0    skipped=0

# from a client in the allowed network, the export now mounts:
$ sudo mount -t nfs nas_svm:/nas_svm_nfs_01 /mnt/nfs
$ df -h /mnt/nfs
Filesystem            Size  Used Avail Use% Mounted on
nas_svm:/nas_svm_nfs_01  19G  256K   19G   1% /mnt/nfs

What to read in this one. NFS access control is unlike SMB’s: there is no per-user authentication at mount time by default. Instead an export policy is a named, ordered list of rules, and each rule matches client machines by address (client_match) and grants them read-only (ro_rule) and read-write (rw_rule) access. A volume points at one policy; this task adds a rule to that policy. The empty-policy trap is the classic NFS incident: a volume attached to a policy with no matching rule mounts nowhere, because ONTAP’s default for “no rule matched” is deny.

Original → cleaned: what changed and why

This is a security entry: the lab rule is wide open and the cleaned one is least privilege. The original sets ro_rule: any / rw_rule: any with a hard-coded 192.168.0.0/24. Here any means “no authentication required,” which reads in an audit as “anyone on this subnet has unauthenticated read-write.” The cleaned version requires AUTH_SYS (ro_rule: sys, rw_rule: sys), squashes root from clients (super_user_security: none, so a root user on a workstation becomes the anonymous user on the export, and owning a laptop is not owning the data), disables suid, and pulls the network from an nfs_client_network variable instead of a literal. protocol: nfs3 also becomes the broader nfs unless you specifically need to pin v3. The original’s anonymous_user_id: 65534 (the “nobody” uid) is fine and kept implicitly by the squash. Same module, same policy; the difference is entirely in how much it trusts the client.
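A guard task can keep the least-privilege intent from eroding when someone later edits the variable. The sketch below assumes nfs_client_network is defined in global.vars (the name comes from the cleaned playbook; the example value in the comment is hypothetical). It uses only ansible.builtin.assert, so it makes no call to the cluster, and it belongs before the export policy task.

```yaml
  # global.vars (hypothetical value):
  # nfs_client_network: "10.20.30.0/24"

  tasks:
  - name: Refuse to run with a missing or world-open client match
    ansible.builtin.assert:
      that:
        - nfs_client_network is defined
        - nfs_client_network | length > 0
        - nfs_client_network not in ['0.0.0.0/0', '::/0']
      fail_msg: "nfs_client_network must name one specific client network"
```

If any condition is false the play stops before the rule is written, which is usually the safer failure mode for an access-control change.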

50

S3 / object storage

Stand up an S3 server on an SVM

Job: turn an existing SVM into an S3 endpoint, so object clients (backup tools, cloud-native apps) can talk to the cluster as they would to AWS S3. Module: na_ontap_s3_services (one task). File: create_s3_server.yml.

The lab original — exactly as it runs in the workshop

#!/usr/bin/env ansible-playbook
#
# Procedure to get the certificate name:
# 1. System Manager > Cluster > Settings > Certificates > Client/Server
# 2. Find the cert for the SVM you created (e.g. nas_svm)
# 3. Note it: e.g. nas_svm_97BBD25E15519CA
#
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname:       "{{ PRI_CLU }}"
      username:       "{{ PRI_CLU_USER }}"
      password:       "{{ PRI_CLU_PASS }}"
      https:          true
      validate_certs: false
      use_rest:       Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
  - name: Create an s3 service on primary SVM
    na_ontap_s3_services:
      state:            present
      name:             "s3_server"
      vserver:          "{{ PRI_SVM }}"
      comment:          enabled
      enabled:          true
      certificate_name: nas_svm_97BBD25E15519CA
      <<: *input
#    register: result
#  - name: print variables
#    ansible.builtin.debug:
#      msg: Access Key {{ result.access_key }} and Secret Key {{ result.secret_key }}

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Create the S3 server on the SVM
    netapp.ontap.na_ontap_s3_services:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: s3_server
      enabled: true
      comment: "S3 endpoint - managed by Ansible"
      certificate_name: "{{ s3_certificate_name }}"   # from vars, not hard-coded
    register: s3_service

  - name: Show the root-user keys ONCE - store them in your secrets manager now
    ansible.builtin.debug:
      msg:
        - "access_key: {{ s3_service.access_key | default('(unchanged - keys only issued at creation)') }}"
        - "secret_key: {{ s3_service.secret_key | default('(unchanged - keys only issued at creation)') }}"

The run

[root@centos1 ansible-workshop]# ansible-playbook create_s3_server.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create an s3 service on primary SVM] *************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost    : ok=1    changed=1    unreachable=0    failed=0    skipped=0

What to read in this one. An S3 server is per-SVM, and it needs a server certificate for HTTPS. That certificate is the one prerequisite this task cannot invent: it is created with the SVM, and you supply its name here. The lab header documents exactly where to find it (System Manager → Cluster → Settings → Certificates), because the name is cluster-generated and unguessable. Once the server is enabled, the SVM answers S3 over its data LIF, which this task also assumes already exists.

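Companion: the S3 data LIF. S3 clients reach the server over a data interface, created with the same na_ontap_interface module as the management LIF in 10-02. The example sets no service_policy and lets the LIF take the SVM’s default. ONTAP’s S3 documentation calls for a LIF service policy that includes the data-s3-server service, so confirm the resulting policy with network interface show -fields service-policy and set service_policy explicitly if your release requires it. Run this alongside the server: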

  tasks:
  - name: Create the S3 data LIF clients connect to
    netapp.ontap.na_ontap_interface:
      state: present
      vserver: "{{ PRI_SVM }}"
      interface_name: "{{ PRI_SVM }}_s3_01"
      address: "{{ PRI_SVM_S3_IP }}"
      netmask: "{{ PRI_SVM_S3_NETMASK }}"
      home_node: "{{ PRI_CLU_NODE1 }}"
      home_port: "{{ PRI_DATA_PORT }}"
      # module_defaults supplies the connection block; no service_policy
      # needed - a data LIF inherits the SVM default. For two-path
      # redundancy, add a second LIF on PRI_CLU_NODE2, as the SAN
      # example (60-01) does for iSCSI.

Policies, groups, and buckets follow in 50-02 through 50-04 below.

Original → cleaned: what changed and why

Two changes matter here. First, certificate_name is hard-coded to a specific cluster’s generated string. That is fine in a lab, but it makes the file non-portable; the cleaned version reads it from an s3_certificate_name variable so the playbook moves between clusters unchanged. Second, the original’s commented-out register + debug block is the most important part of the whole file: ONTAP issues the S3 root user’s access and secret keys once, at service creation, and never again. The cleaned version restores that capture with a default() guard so re-runs print a calm placeholder instead of failing. Lose those keys and you regenerate, never recover. comment: enabled in the original is also just a stray value (the comment field set to the word “enabled”); the cleaned version gives it a real description.
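Printing the keys with debug leaves the secret in terminal scrollback and in any job log. One alternative, sketched under the assumption that the registered result exposes access_key and secret_key exactly as the playbook above reads them, is to write them to a file only the control-node user can read and to suppress task logging. The destination path is hypothetical, and a real secrets manager remains the better home.

```yaml
  # Appended to the cleaned playbook's tasks, after the task that registers s3_service.
  - name: Save the S3 root-user keys on first creation only
    ansible.builtin.copy:
      dest: "{{ playbook_dir }}/s3_root_keys.txt"   # hypothetical path - move to a secrets manager
      mode: "0600"
      content: |
        access_key: {{ s3_service.access_key }}
        secret_key: {{ s3_service.secret_key }}
    when: s3_service.access_key is defined           # skipped on re-runs, when no keys are returned
    no_log: true                                      # keep the secret out of console and log output
```

The when guard plays the same role as the default() filter in the debug task: on a re-run the task is skipped rather than failing on a missing key.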

The enabled flag is a staging control. The workshop ships this file in two forms — one with enabled: true (shown above) and one with enabled: false — because the same task can provision the S3 server without turning it on. That is a deliberate production pattern: create the endpoint during a change window, validate certificate and LIF reachability, then flip enabled: true in a separate run to take it live. Because the module is idempotent, that second run changes only the one field. Same task, two roles — deploy and activate.
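The two-file arrangement can also be folded into one playbook with a variable. A minimal sketch, assuming a hypothetical s3_enabled variable that defaults to false and is overridden on the command line for the activation run:

```yaml
  tasks:
  - name: Create the S3 server (disabled unless s3_enabled is set)
    netapp.ontap.na_ontap_s3_services:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: s3_server
      enabled: "{{ s3_enabled | default(false) | bool }}"   # s3_enabled is a hypothetical variable
      comment: "S3 endpoint - managed by Ansible"
      certificate_name: "{{ s3_certificate_name }}"
```

The deploy run is the plain ansible-playbook create_s3_server.yml --ask-vault-pass; the activation run adds -e s3_enabled=true. The bool filter is there because values passed as -e key=value arrive as strings.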

Create a reusable S3 access policy

Job: define a named, standalone permission set that groups can attach by name — the reusable alternative to the built-in FullAccess and to inline bucket policies. Module: na_ontap_s3_policies (one task). File: create_s3_policy.yml. Used by: the group in 50-03.

The lab original — exactly as it runs in the workshop

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname:       "{{ PRI_CLU }}"
      username:       "{{ PRI_CLU_USER }}"
      password:       "{{ PRI_CLU_PASS }}"
      https:          true
      validate_certs: false
      use_rest:       Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
  - name: Create an s3 policies
    na_ontap_s3_policies:
      state:   present
      name:    "{{ S3_Policy }}"
      vserver: "{{ PRI_SVM }}"
      comment: Created with Ansible
      statements:
        - sid: 1
          resources:
          - "*"
          actions:
            - "*"
          effect: allow
      <<: *input

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Create a scoped, reusable S3 access policy
    netapp.ontap.na_ontap_s3_policies:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: "{{ S3_Policy }}"
      comment: "Read/write to the project bucket - managed by Ansible"
      statements:
        - sid: AllowProjectBucketReadWrite
          effect: allow
          resources:
            - "{{ S3_Bucket }}"
            - "{{ S3_Bucket }}/*"
          actions:
            - GetObject
            - PutObject
            - DeleteObject
            - ListBucket

The run

[root@centos1 ansible-workshop]# ansible-playbook create_s3_policy.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create an s3 policies] ***************************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost    : ok=1    changed=1    unreachable=0    failed=0    skipped=0

What to read in this one. A standalone policy is the reusable middle layer of the S3 model: define the permission set once, then attach it by name to any group (as in 50-03) instead of repeating inline policy on every bucket. The statement grammar is identical to a bucket policy (resources, actions, effect, optional principals); the difference is only where it lives: a bucket policy is attached to one bucket, a named policy is reusable across groups. This is how you replace the built-in FullAccess that 50-03 flags.

Original → cleaned: what changed and why

This is the entry where the original is a deliberate anti-example, and the cleaned version is the lesson. The lab file grants actions: ["*"] on resources: ["*"], meaning every action on every resource, which is FullAccess rewritten by hand and the exact thing a custom policy exists to avoid. It runs, it is valid, and it is wrong for production. The cleaned version scopes both lists: specific actions on the project bucket and its objects only, with a descriptive sid instead of 1. The original’s commented-out lines (the per-bucket resources and a sm_s3_user principal, not reproduced in the listing above) show the author knew the scoped form; the cleaned version simply uncomments that intent. If you take one habit from this reference, take this one: a wildcard policy is a finding, a scoped policy is a control.

Create an S3 group that binds a user to a policy

Job: grant an S3 user access by placing them in a group that carries an access policy — the group is where identity meets permission. Module: na_ontap_s3_groups (one task). File: create_s3_group.yml. Depends on: the S3 server (50-01) and an S3 user already existing.

The lab original — exactly as it runs in the workshop

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname:       "{{ PRI_CLU }}"
      username:       "{{ PRI_CLU_USER }}"
      password:       "{{ PRI_CLU_PASS }}"
      https:          true
      validate_certs: false
      use_rest:       Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
  - name: Create a S3 Group
    na_ontap_s3_groups:
      state:    present
      name:     "{{ S3_Group }}"
      vserver:  "{{ PRI_SVM }}"
      users:
        - name: "{{ S3_User }}"
      policies:
        - name: FullAccess
      comment:  Created with Ansible
      <<: *input

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Create the S3 group and bind the user to a policy
    netapp.ontap.na_ontap_s3_groups:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: "{{ S3_Group }}"
      comment: "S3 access group - managed by Ansible"
      users:
        - name: "{{ S3_User }}"
      policies:
        - name: FullAccess          # least privilege: prefer a scoped custom
                                     # policy over FullAccess in production

The run

[root@centos1 ansible-workshop]# ansible-playbook create_s3_group.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create a S3 Group] *******************************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost    : ok=1    changed=1    unreachable=0    failed=0    skipped=0

What to read in this one. ONTAP’s S3 authorization model mirrors AWS IAM: a user is an identity with keys, a policy is a set of permissions, and a group is the join between them — you attach policies to a group and add users to it, rather than granting permissions to users directly. Both users and policies are lists, so one group can hold many members and carry several policies; the module reconciles the group to exactly the membership you declare, which is what makes re-runs safe. FullAccess is a built-in policy — convenient for a lab, and the one line to revisit before production.

Original → cleaned: what changed and why

This file is already well-formed — the cleaned version only swaps the &input anchor for module_defaults and gives the comment a real description. The substantive note is a security one, flagged inline: policies: [FullAccess] grants the group unrestricted access to every bucket on the SVM. That is fine in the workshop, but in production the group should carry the scoped custom policy from 50-02 instead — read/write to named buckets only. The grammar is identical; you simply replace FullAccess with the name of the policy you defined in 50-02.
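Spelled out, the production form of the task looks like the sketch below. It assumes the S3_Policy variable holds the policy name used in 50-02 and that the policy already exists on the SVM, so 50-02 has to run first.

```yaml
  tasks:
  - name: Create the S3 group and bind the user to the scoped policy
    netapp.ontap.na_ontap_s3_groups:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: "{{ S3_Group }}"
      comment: "S3 access group - managed by Ansible"
      users:
        - name: "{{ S3_User }}"
      policies:
        - name: "{{ S3_Policy }}"   # scoped custom policy from 50-02, not FullAccess
```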

Create an S3 bucket with a least-privilege access policy

Job: create the object container itself and attach a policy that says exactly who can do what to it — the last step that makes the S3 endpoint usable. Module: na_ontap_s3_buckets (one task). File: create_s3_bucket.yml. Depends on: the S3 server (50-01) and the user named in the policy.

The lab original — exactly as it runs in the workshop

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname:       "{{ PRI_CLU }}"
      username:       "{{ PRI_CLU_USER }}"
      password:       "{{ PRI_CLU_PASS }}"
      https:          true
      validate_certs: false
      use_rest:       Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
  - name: Create a S3 bucket
    na_ontap_s3_buckets:
      state:      present
      name:       "s3-bucket"
      vserver:    "{{ PRI_SVM }}"
      aggregates: "{{ PRI_AGGR }}"
      size:       102005473280
      policy:
        statements:
          - sid: 1
            resources:
              - s3-bucket
              - s3-bucket/*
            actions:
              - GetObject
              - PutObject
              - DeleteObject
              - ListBucket
            effect: allow
            principals:
              - s3_user
      comment:    "Container for S3 objects"
      <<: *input

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Create the S3 bucket with a scoped access policy
    netapp.ontap.na_ontap_s3_buckets:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: "{{ S3_Bucket }}"
      aggregates:
        - "{{ PRI_AGGR }}"
      size: 95   # GiB - see note on size units below
      size_unit: gb
      comment: "Container for S3 objects - managed by Ansible"
      policy:
        statements:
          - sid: AllowAppReadWrite
            effect: allow
            principals:
              - "{{ S3_User }}"
            resources:
              - "{{ S3_Bucket }}"
              - "{{ S3_Bucket }}/*"
            actions:
              - GetObject
              - PutObject
              - DeleteObject
              - ListBucket

The run

[root@centos1 ansible-workshop]# ansible-playbook create_s3_bucket.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create a S3 bucket] ******************************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost    : ok=1    changed=1    unreachable=0    failed=0    skipped=0

What to read in this one. The bucket policy is the same statement grammar as an AWS S3 policy: a principals list (who), an actions list (what they may do), a resources list (on what), and an effect of allow or deny. The two resource lines are both required and mean different things — s3-bucket is the bucket itself (needed for ListBucket), s3-bucket/* is the objects inside it (needed for Get/Put/Delete). Name only the principals that should have access and only the actions they need; this policy is already close to least privilege, which is exactly right for a bucket.
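The grammar extends naturally to more than one statement. As an illustration only, the fragment below adds a read-only statement for a second principal. S3_Audit_User is a hypothetical variable that does not appear in the original global.vars, and the user it names would have to exist on the SVM already. The fragment stands in for the policy block of the cleaned task.

```yaml
      policy:
        statements:
          - sid: AllowAppReadWrite
            effect: allow
            principals:
              - "{{ S3_User }}"
            resources:
              - "{{ S3_Bucket }}"
              - "{{ S3_Bucket }}/*"
            actions:
              - GetObject
              - PutObject
              - DeleteObject
              - ListBucket
          - sid: AllowAuditReadOnly
            effect: allow
            principals:
              - "{{ S3_Audit_User }}"   # hypothetical second user
            resources:
              - "{{ S3_Bucket }}"       # the bucket itself, for ListBucket
              - "{{ S3_Bucket }}/*"     # the objects inside it, for GetObject
            actions:
              - GetObject
              - ListBucket
```

Keeping each audience in its own statement, with its own sid, means an audit can read off who has write access without parsing a combined action list.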

Original → cleaned: what changed and why

Three improvements, one of which matters at 2 a.m. Size units: the original’s size: 102005473280 is a raw byte count. It is correct (exactly 95 GiB) but unreviewable; a typo of one digit is a 10× mistake nobody catches in review. The cleaned version uses size: 95 + size_unit: gb, which a human can verify at a glance. Descriptive sid: the original’s sid: 1 (a numbered statement; the workshop file carries a descriptive FullAccessTos3-user commented out just above it, not reproduced in the listing here) becomes a named AllowAppReadWrite, because statement IDs should describe intent so a policy audit reads like English. Hard-coded names: s3-bucket and s3_user are literals in the original; the cleaned version uses the {{ S3_Bucket }} and {{ S3_User }} variables already defined in global.vars, so the file is portable and the bucket name stays consistent between its definition and its policy. aggregates is also written as a proper YAML list, which the module expects.
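The byte count can be checked against the GiB figure with plain Jinja arithmetic, with no cluster call involved. A small sketch; bucket_size_gib and lab_size_bytes are hypothetical task-level variables introduced only for this check.

```yaml
  tasks:
  - name: Check that the reviewed GiB figure matches the raw byte count
    vars:
      bucket_size_gib: 95              # hypothetical variable
      lab_size_bytes: 102005473280     # value from the lab original
    ansible.builtin.assert:
      that:
        - (bucket_size_gib * 1024 ** 3) == lab_size_bytes
```

Since 1024 ** 3 is 1073741824 and 95 times that is 102005473280, the assertion passes; change either number by one digit and it fails, which is the review a raw byte count never gets.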

60

SAN / block

Provision a complete iSCSI SAN service: SVM, LIFs, igroup, volume, LUN, map

Job: from nothing to a Windows host seeing a disk. One playbook builds the SAN tenant and every layer inside it, in dependency order. Modules: na_ontap_svm, na_ontap_iscsi, na_ontap_interface ×2, na_ontap_igroup, na_ontap_volume, na_ontap_lun, na_ontap_lun_map. File: create_san_all.yml.

The lab original — exactly as it runs in the workshop

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname:       "{{ PRI_CLU }}"
      username:       "{{ PRI_CLU_USER }}"
      password:       "{{ PRI_CLU_PASS }}"
      https:          true
      validate_certs: false
#      use_rest:       Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
  - name: Create primary SVM
    na_ontap_svm:
      state:   present
      name:    "{{ PRI_SVM }}"
      comment: Created with Ansible
      <<: *input
  - name: Enable iSCSI on primary SVM
    na_ontap_iscsi:
      state:         present
      vserver:       "{{ PRI_SVM }}"
      service_state: started
      <<: *input
  - name: Create iSCSI interface for node01
    na_ontap_interface:
      state:           present
      interface_name:  "{{ PRI_SVM }}_iscsi_01"
      vserver:         "{{ PRI_SVM }}"
      address:         "{{ PRI_ISCSI_IP }}"
      netmask:         "{{ PRI_SVM_NETMASK }}"
      home_node:       "{{ PRI_CLU_NODE1 }}"
      home_port:       "{{ PRI_DATA_PORT }}"
      protocols:       iscsi
      <<: *input
      firewall_policy: data
      role:            data
  - name: Create iSCSI interface for node02
    na_ontap_interface:
      state:           present
      interface_name:  "{{ PRI_SVM }}_iscsi_02"
      vserver:         "{{ PRI_SVM }}"
      address:         "{{ SEC_ISCSI_IP }}"
      netmask:         "{{ PRI_SVM_NETMASK }}"
      home_node:       "{{ PRI_CLU_NODE2 }}"
      home_port:       "{{ PRI_DATA_PORT }}"
      protocols:       iscsi
      <<: *input
      firewall_policy: data
      role:            data
  - name: Create iGroup
    na_ontap_igroup:
       state:                present
       name:                 "{{ IGROUP_NAME }}"
       vserver:              "{{ PRI_SVM }}"
       initiator_group_type: iscsi
       ostype:               windows
       initiator:            "{{ WIN_IQN }}"
       hostname:             "{{ PRI_CLU }}"
       username:             "{{ PRI_CLU_USER }}"
       password:             "{{ PRI_CLU_PASS }}"
       https:                true
       validate_certs:       false
    with_items: "{{ igroups }}"
    when: igroups != None
  - name: Create volume
    na_ontap_volume:
       state:           present
       name:            "{{ VOL_NAME }}"
       aggregate_name:  "{{ PRI_AGGR }}"
       size:            "{{ VOL_SIZE }}"
       size_unit:       gb
       space_guarantee: none
       vserver:         "{{ PRI_SVM }}"
       hostname:        "{{ PRI_CLU }}"
       username:        "{{ PRI_CLU_USER }}"
       password:        "{{ PRI_CLU_PASS }}"
       https:           true
       validate_certs:  false
    with_items: "{{ luns }}"
  - name: Lun Create
    na_ontap_lun:
       state:          present
       name:           "{{ LUN_NAME }}"
       flexvol_name:   "{{ VOL_NAME }}"
       vserver:        "{{ PRI_SVM }}"
       size:           "{{ LUN_SIZE }}"
       size_unit:      gb
       ostype:         windows
       space_reserve:  false
       hostname:       "{{ PRI_CLU }}"
       username:       "{{ PRI_CLU_USER }}"
       password:       "{{ PRI_CLU_PASS }}"
       https:          true
       validate_certs: false
    with_items: "{{ luns }}"
    when: luns != None
  - name: Create LUN mapping
    na_ontap_lun_map:
       state:                present
       initiator_group_name: "{{ IGROUP_NAME }}"
       path:                 "/vol/{{ VOL_NAME }}/{{ LUN_NAME }}"
       vserver:              "{{ PRI_SVM }}"
       hostname:             "{{ PRI_CLU }}"
       username:             "{{ PRI_CLU_USER }}"
       password:             "{{ PRI_CLU_PASS }}"
       https:                true
       validate_certs:       false
    with_items: "{{ luns }}"
    when: luns != None

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false        # lab certificate - set true in production
      use_rest: always

  tasks:
  - name: Create primary SAN SVM
    netapp.ontap.na_ontap_svm:
      state: present
      name: "{{ PRI_SVM }}"
      comment: "Created with Ansible"
      services:
        iscsi:
          allowed: true

  - name: Enable iSCSI service on the SVM
    netapp.ontap.na_ontap_iscsi:
      state: present
      vserver: "{{ PRI_SVM }}"
      service_state: started

  - name: Create iSCSI data LIF on node01
    netapp.ontap.na_ontap_interface:
      state: present
      vserver: "{{ PRI_SVM }}"
      interface_name: "{{ PRI_SVM }}_iscsi_01"
      address: "{{ PRI_ISCSI_IP }}"
      netmask: "{{ PRI_SVM_NETMASK }}"
      home_node: "{{ PRI_CLU_NODE1 }}"
      home_port: "{{ PRI_DATA_PORT }}"
      service_policy: default-data-iscsi

  - name: Create iSCSI data LIF on node02
    netapp.ontap.na_ontap_interface:
      state: present
      vserver: "{{ PRI_SVM }}"
      interface_name: "{{ PRI_SVM }}_iscsi_02"
      address: "{{ SEC_ISCSI_IP }}"
      netmask: "{{ PRI_SVM_NETMASK }}"
      home_node: "{{ PRI_CLU_NODE2 }}"
      home_port: "{{ PRI_DATA_PORT }}"
      service_policy: default-data-iscsi

  - name: Create initiator group for the Windows host
    netapp.ontap.na_ontap_igroup:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: "{{ IGROUP_NAME }}"
      initiator_group_type: iscsi
      os_type: windows
      initiator_names:
        - "{{ WIN_IQN }}"

  - name: Create the volume that will hold the LUN
    netapp.ontap.na_ontap_volume:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: "{{ VOL_NAME }}"
      aggregate_name: "{{ PRI_AGGR }}"
      size: "{{ VOL_SIZE }}"
      size_unit: gb
      space_guarantee: none

  - name: Create the LUN inside the volume
    netapp.ontap.na_ontap_lun:
      state: present
      vserver: "{{ PRI_SVM }}"
      flexvol_name: "{{ VOL_NAME }}"
      name: "{{ LUN_NAME }}"
      size: "{{ LUN_SIZE }}"
      size_unit: gb
      os_type: windows
      space_reserve: false

  - name: Map the LUN to the initiator group
    netapp.ontap.na_ontap_lun_map:
      state: present
      vserver: "{{ PRI_SVM }}"
      path: "/vol/{{ VOL_NAME }}/{{ LUN_NAME }}"
      initiator_group_name: "{{ IGROUP_NAME }}"

The run

[root@centos1 ansible-workshop]# ansible-playbook create_san_all.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create primary SVM] ******************************************************
changed: [localhost]

TASK [Enable iSCSI on primary SVM] *********************************************
changed: [localhost]

TASK [Create iSCSI interface for node01] ***************************************
changed: [localhost]

TASK [Create iSCSI interface for node02] ***************************************
changed: [localhost]

TASK [Create iGroup] ***********************************************************
changed: [localhost] => (item=igroup1)

TASK [Create volume] ***********************************************************
changed: [localhost] => (item=lun1)

TASK [Lun Create] **************************************************************
changed: [localhost] => (item=lun1)

TASK [Create LUN mapping] ******************************************************
changed: [localhost] => (item=lun1)

PLAY RECAP *********************************************************************
localhost    : ok=8    changed=8    unreachable=0    failed=0    skipped=0

What to read in this one. The task order is the dependency stack — tenant → protocol service → network reachability → access control → capacity → LUN → map; shuffle it and ONTAP refuses the forward references. Two LIFs, one per node, is the SAN minimum for path redundancy: Windows MPIO sees both and survives a node takeover. os_type: windows appears on both the igroup and the LUN deliberately (it controls SCSI geometry and alignment), and space_guarantee: none on the volume paired with space_reserve: false on the LUN is the thin-provisioned default — monitor real usage, as in the performance playbook. After the run, the Windows host needs only an iSCSI target portal pointed at either LIF IP, then a rescan.

Original → cleaned: what changed and why

Three differences, each a habit worth keeping. Credentials: the original’s first four tasks use the &input anchor but the last four repeat hostname/username/password longhand — the cleaned version uses module_defaults for the whole collection, so no task can drift or forget them. LIF parameters: the original’s role: data + firewall_policy: data are older interface fields, which is why its use_rest line is commented out; the REST-native replacement is a single service_policy: default-data-iscsi, and with it the playbook runs cleanly with use_rest: always. Accidental loops: with_items: "{{ igroups }}" with when: != None iterates a single string — harmless here, but for one resource write one task and reserve loop: for real lists. The cleaned version produces the identical cluster state; it just ages without breaking.

70

Data protection

Set up cross-cluster SnapMirror replication

Job: replicate a volume from one cluster to another for disaster recovery — peer the clusters, peer the SVMs, create the destination, and establish the SnapMirror relationship. Modules: na_ontap_cluster_peer, na_ontap_vserver_peer, na_ontap_volume (type DP), na_ontap_snapmirror. File: snapmirror-create.yml.

The lab original — exactly as it runs in the workshop

---
- hosts: localhost
  name: Snapmirror Create
  gather_facts: false
  vars:
    src_ontap:   192.168.0.101
    src_lif:     192.168.0.120,192.168.0.121
    src_vserver: svm1
    src_volume:  vol1
    dst_ontap:   192.168.0.102
    dst_name:    cluster2
    dst_lif:     192.168.0.122
    dst_aggr:    aggr1_cluster2_01_data
    dst_vserver: svm2
    dst_volume:  vol1_sm
    username:    admin            # !! plaintext credentials in the playbook
    password:    Netapp1!         # !! - see the security note below
    passphrase:  Netapp123        # !!
  tasks:
  - name: Create cluster peer
    na_ontap_cluster_peer:
      state: present
      source_intercluster_lifs: "{{ src_lif }}"
      dest_intercluster_lifs:   "{{ dst_lif }}"
      passphrase:    "{{ passphrase }}"
      hostname:      "{{ src_ontap }}"
      dest_hostname: "{{ dst_ontap }}"
      username:      "{{ username }}"
      password:      "{{ password }}"
      https: true
      validate_certs: false
  - name: sleep for 20 seconds and continue with play
    wait_for:
      timeout: 20
    delegate_to: localhost
  - name: Source vserver peer create
    na_ontap_vserver_peer:
      state: present
      peer_vserver: "{{ dst_vserver }}"
      peer_cluster: "{{ dst_name }}"
      vserver:      "{{ src_vserver }}"
      applications: snapmirror
      hostname:      "{{ src_ontap }}"
      dest_hostname: "{{ dst_ontap }}"
      username: "{{ username }}"
      password: "{{ password }}"
      https: true
      validate_certs: false
  - name: sleep for 20 seconds and continue with play
    wait_for:
      timeout: 20
    delegate_to: localhost
  - name: Validate destination FlexVol
    na_ontap_volume:
      state: present
      name:           "{{ dst_volume }}"
      aggregate_name: "{{ dst_aggr }}"
      size: 1
      size_unit: gb
      type: DP
      vserver:  "{{ dst_vserver }}"
      hostname: "{{ dst_ontap }}"
      username: "{{ username }}"
      password: "{{ password }}"
      https: true
      validate_certs: false
  - name: Create SnapMirror
    na_ontap_snapmirror:
      state: present
      source_volume:       "{{ src_volume }}"
      destination_volume:  "{{ dst_volume }}"
      source_vserver:      "{{ src_vserver }}"
      destination_vserver: "{{ dst_vserver }}"
      hostname: "{{ dst_ontap }}"
      username: "{{ username }}"
      password: "{{ password }}"
      https: true
      validate_certs: false

The WUC-cleaned version — what we would run in production

#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars            # ALL credentials live here, vault-encrypted

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      username: "{{ PRI_CLU_USER }}"   # same admin on both clusters here;
      password: "{{ PRI_CLU_PASS }}"   # use SEC_CLU_* if they differ
      https: true
      validate_certs: false            # lab certificate - true in production
      use_rest: always

  tasks:
  - name: Peer the two clusters
    netapp.ontap.na_ontap_cluster_peer:
      state: present
      hostname: "{{ PRI_CLU }}"
      dest_hostname: "{{ SEC_CLU }}"
      source_intercluster_lifs: "{{ PRI_CLU_IC1_IP }},{{ PRI_CLU_IC2_IP }}"
      dest_intercluster_lifs: "{{ SEC_CLU_IC1_IP }}"
      passphrase: "{{ peer_passphrase }}"   # vaulted, not literal

  - name: Peer the source and destination SVMs for snapmirror
    netapp.ontap.na_ontap_vserver_peer:
      state: present
      hostname: "{{ PRI_CLU }}"
      dest_hostname: "{{ SEC_CLU }}"
      vserver: "{{ PRI_SVM }}"
      peer_vserver: "{{ SEC_SVM }}"
      peer_cluster: "{{ SEC_CLU_NAME }}"
      applications: snapmirror

  - name: Create the destination DP volume
    netapp.ontap.na_ontap_volume:
      state: present
      hostname: "{{ SEC_CLU }}"
      vserver: "{{ SEC_SVM }}"
      name: "{{ dst_volume }}"
      aggregate_name: "{{ SEC_AGGR }}"
      size: 1
      size_unit: gb
      type: DP                 # data-protection destination - no junction path

  - name: Establish and initialize the SnapMirror relationship
    netapp.ontap.na_ontap_snapmirror:
      state: present
      hostname: "{{ SEC_CLU }}"        # snapmirror is driven from the destination
      source_endpoint:
        path: "{{ PRI_SVM }}:{{ src_volume }}"
      destination_endpoint:
        path: "{{ SEC_SVM }}:{{ dst_volume }}"
      policy: MirrorAllSnapshots
      initialize: true

The run

[root@centos1 ansible-workshop]# ansible-playbook snapmirror-create.yml --ask-vault-pass
Vault password:

PLAY [Snapmirror Create] *******************************************************

TASK [Create cluster peer] *****************************************************
changed: [localhost]

TASK [sleep for 20 seconds and continue with play] ****************************
ok: [localhost]

TASK [Source vserver peer create] *********************************************
changed: [localhost]

TASK [sleep for 20 seconds and continue with play] ****************************
ok: [localhost]

TASK [Validate destination FlexVol] *******************************************
changed: [localhost]

TASK [Create SnapMirror] *******************************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost    : ok=6    changed=4    unreachable=0    failed=0    skipped=0

What to read in this one. SnapMirror has a four-step dependency chain and the order is mandatory: cluster peer (the two clusters authenticate to each other with a shared passphrase) → SVM peer (the two SVMs authorize SnapMirror traffic between them) → destination volume (created as type: DP, a read-only data-protection volume that takes no junction path because it exists only to receive replicated data) → the relationship itself, which is driven from the destination cluster (note that the na_ontap_snapmirror task targets the destination as its hostname in both versions: dst_ontap in the original, SEC_CLU in the cleaned one). The relationship pulls from source to destination; the destination is in charge. initialize: true performs the first full baseline transfer.

Original → cleaned: what changed and why — read this one

The security fix is the headline. The lab original declares username: admin, password: Netapp1!, and a cluster-peer passphrase: Netapp123 as plaintext literals in the playbook’s own vars block: committed to the repository, visible in every clone, exactly the failure Ansible Vault exists to prevent. The cleaned version moves all three into the vault-encrypted global.vars (the passphrase as peer_passphrase) and supplies the shared credentials once via module_defaults. Nothing secret remains in the playbook.

Two correctness notes. First, the wait_for: timeout “sleep” tasks are a fragile timing hack. Peering usually propagates in seconds, so a fixed 20-second sleep wastes time when the peer is ready sooner and still fails when it is not. The production pattern is a retries/until loop that polls the peer state, though a short sleep is tolerable for a lab. Second, the modern na_ontap_snapmirror uses source_endpoint/destination_endpoint path syntax rather than the older flat source_volume/destination_volume fields; both work, but the endpoint form is current. The dependency order and the destination-driven model are correct in the original and kept exactly.
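The retries/until pattern described above could look like the following minimal sketch. It assumes the read-only na_ontap_rest_info module with the cluster/peers subset, and that each returned record carries name and status.state as in the ONTAP REST cluster-peer object. The peer_info variable name, the retry count, and the delay are illustrative, so inspect the registered structure with -vvv before relying on the exact keys.

```yaml
  # Replaces the fixed 20-second sleep after the cluster-peer task.
  # Credentials come from module_defaults, as in the cleaned playbook.
  - name: Wait until the cluster peer reports available
    netapp.ontap.na_ontap_rest_info:
      hostname: "{{ PRI_CLU }}"
      gather_subset:
        - cluster/peers
      fields:
        - name
        - status
    register: peer_info            # hypothetical variable name
    until: >-
      peer_info.ontap_info is defined and
      (peer_info.ontap_info['cluster/peers'].records | default([])
       | selectattr('name', 'equalto', SEC_CLU_NAME)
       | selectattr('status.state', 'equalto', 'available')
       | list | length) > 0
    retries: 12                    # assumption: give up after 12 re-polls
    delay: 5                       # assumption: 5 seconds between polls
```

The task finishes as soon as the peer is available and fails with a clear message if it never gets there, which is the behaviour the fixed sleep cannot offer.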

Enterprise automation scenarios

The fourteen examples above are building blocks. What follows is how storage teams actually compose them in production — the patterns that turn a folder of playbooks into an operating model. Each scenario names the modules involved and the principle that makes it scale.

Automated storage provisioning at scale

The problem. A project needs forty volumes with consistent naming, size, and export policy. Created by hand in System Manager, that is forty opportunities for a typo and no record of intent. The pattern: declare the volumes as data and let one task loop over them — the list is the documentation, and the diff when you add the forty-first volume is your change record.

  tasks:
  - name: Provision the project volumes from a declared list
    netapp.ontap.na_ontap_volume:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: "{{ item.name }}"
      aggregate_name: "{{ item.aggr | default(PRI_AGGR) }}"
      size: "{{ item.size }}"
      size_unit: gb
      junction_path: "/{{ item.name }}"
      export_policy: "{{ item.policy | default('default') }}"
      space_guarantee: none
      comment: "Project volume - managed by Ansible"
    loop:
      - { name: proj_data_01, size: 500 }
      - { name: proj_data_02, size: 500 }
      - { name: proj_logs_01, size: 100, policy: restricted }
      # ... extend the list; one line per volume, reviewed in Git

Scaling further, the list moves out of the playbook entirely — into a group_vars file, a CSV the team edits, or a CMDB query — and the same task provisions any number of volumes. The module’s idempotency means re-running after adding entries touches only the new volumes. Modules: na_ontap_volume, often preceded by na_ontap_aggregate when capacity pools are provisioned in the same run.
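As a minimal sketch of moving the list out of the playbook, the same entries can live in their own variables file and the task can loop over the variable. The file name project_volumes.yml and the variable name project_volumes are assumptions for illustration; a group_vars file would hold the same structure.

```yaml
# project_volumes.yml (hypothetical file, loaded through vars_files or group_vars)
project_volumes:
  - { name: proj_data_01, size: 500 }
  - { name: proj_data_02, size: 500 }
  - { name: proj_logs_01, size: 100, policy: restricted }

# In the playbook, the task body stays exactly as shown above; only the
# loop source changes from an inline list to the declared variable:
#
#   loop: "{{ project_volumes }}"
#   loop_control:
#     label: "{{ item.name }}"     # keeps the run output to one short line per volume
```

Adding the forty-first volume is then a one-line change to the data file, and the playbook itself never appears in the diff.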

VMware datastore deployment

The problem. Standing up vSphere datastores means provisioning storage that ESXi can consume — an NFS export the hosts mount, or an iSCSI LUN they claim — and doing it identically across a cluster of hosts. The pattern: the NAS and SAN examples in this guide are the storage half; the playbook provisions the volume and the access layer (export policy scoped to the ESXi management network for NFS, or a LUN mapped to an igroup of the hosts’ IQNs for iSCSI/VMFS), and the vSphere side is automated separately with the community.vmware collection. The two run in one pipeline: ONTAP provisions, vCenter mounts. Modules: na_ontap_volume, na_ontap_export_policy (NFS datastores) or na_ontap_lun + na_ontap_igroup + na_ontap_lun_map (VMFS), with the ESXi management subnet as the client_match or the hosts’ IQNs as the igroup initiators.
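The NFS half of that pattern might be sketched as follows. The policy name esxi_datastores, the volume name and size, and the ESXI_MGMT_CIDR variable are assumptions rather than entries from the shared global.vars. The rule grants superuser access because ESXi mounts NFS datastores as root. Credentials are assumed to come from module_defaults, as in the cleaned playbooks.

```yaml
  tasks:
  - name: Create an export policy dedicated to the ESXi hosts
    netapp.ontap.na_ontap_export_policy:
      state: present
      hostname: "{{ PRI_CLU }}"
      vserver: "{{ PRI_SVM }}"
      name: esxi_datastores                  # hypothetical policy name

  - name: Allow only the ESXi management subnet, with root access
    netapp.ontap.na_ontap_export_policy_rule:
      state: present
      hostname: "{{ PRI_CLU }}"
      vserver: "{{ PRI_SVM }}"
      name: esxi_datastores
      client_match: "{{ ESXI_MGMT_CIDR }}"   # hypothetical variable, e.g. 192.0.2.0/24
      protocol: nfs
      ro_rule: sys
      rw_rule: sys
      super_user_security: sys

  - name: Provision the datastore volume behind that policy
    netapp.ontap.na_ontap_volume:
      state: present
      hostname: "{{ PRI_CLU }}"
      vserver: "{{ PRI_SVM }}"
      name: ds_nfs_01                        # hypothetical datastore volume
      aggregate_name: "{{ PRI_AGGR }}"
      size: 2048                             # assumption: 2 TB expressed in gb
      size_unit: gb
      junction_path: /ds_nfs_01
      export_policy: esxi_datastores
      space_guarantee: none
```

The vSphere half (mounting the export on each host) would follow in the same pipeline through the community.vmware collection and is not shown here.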

Multi-tenant SVM automation

The problem. Onboarding a new customer or department means a repeatable bundle: an SVM, its network identity, DNS, a route, the protocols they bought, and a starter volume — identical every time, auditable, and deletable as a unit. The pattern: the section-10 examples are exactly this bundle. Wrap them in a role (or a single playbook driven by per-tenant variables) and onboarding becomes one command with one vars file; the tenant’s entire definition lives in version control.

Figure 02 · Multi-tenant SVM provisioning from one parameterized playbook

[Diagram: onboard_svm.yml, one parameterized playbook (SVM + LIF + DNS + route + vol), reads one vars file per tenant (tenant_acme.vars, tenant_globex.vars, tenant_initech.vars, each holding names, IPs, and protocols) and builds three isolated tenant SVMs on the ONTAP cluster (acme, globex, initech), each with its own LIF, DNS, and volume.]
The SVM is ONTAP’s unit of multi-tenancy. One playbook plus one vars file per tenant yields fully isolated, identically-configured tenants — and the same playbook with state: absent offboards one cleanly.
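A minimal sketch of the one-playbook, one-vars-file-per-tenant shape follows. Every key in the tenant file and the tenant extra variable are assumptions for illustration. Only the SVM and the starter volume are shown; the LIF, DNS, and route tasks from the earlier examples would slot in between them.

```yaml
# --- tenant_acme.vars (hypothetical contents; key names are illustrative) ---
tenant_svm: acme
tenant_volume: acme_vol01
tenant_volume_gb: 100

---
# --- onboard_svm.yml (excerpt) ---
# Run as: ansible-playbook onboard_svm.yml -e tenant=acme --ask-vault-pass
- hosts: localhost
  gather_facts: false

  vars_files:
    - global.vars                    # vault-encrypted credentials
    - "tenant_{{ tenant }}.vars"     # per-tenant definition chosen at run time

  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: true           # false only against a lab self-signed certificate
      use_rest: always

  tasks:
  - name: Create the tenant SVM
    netapp.ontap.na_ontap_svm:
      state: present
      name: "{{ tenant_svm }}"

  - name: Create the tenant's starter volume
    netapp.ontap.na_ontap_volume:
      state: present
      vserver: "{{ tenant_svm }}"
      name: "{{ tenant_volume }}"
      aggregate_name: "{{ PRI_AGGR }}"
      size: "{{ tenant_volume_gb }}"
      size_unit: gb
      junction_path: "/{{ tenant_volume }}"
```

Offboarding with state: absent needs the tasks in reverse order, since an SVM cannot be removed while it still owns volumes.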

Disaster recovery automation

The problem. DR is only real if it is tested, and manual SnapMirror failover is too error-prone to test often. The pattern: the SnapMirror example (70-01) establishes replication as code; the same collection orchestrates the failover and the periodic test. A DR runbook becomes a playbook: quiesce and break the relationship to activate the destination, or in a test, clone the destination and verify the data without disturbing replication. Because the relationship is declared, drift in the protection topology is detectable on a schedule.

Figure 03 · SnapMirror disaster-recovery workflow

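[Diagram: the source cluster holds vol1 (read-write, production); the destination cluster at the DR site holds vol1_sm (type DP, read-only). Replication runs on a schedule from source to destination, with a resync on recovery. On disaster, the playbook (1) quiesces the relationship, (2) breaks it so the destination becomes read-write, and (3) redirects clients to DR. The failover is testable at any time by cloning the destination.]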
Replication, failover, and resync are all declarative. Codifying the failover is what makes DR testable often enough to trust.
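The two branches of that runbook might be sketched as below. The dr_mode variable and the _drtest suffix are assumptions for illustration. The sketch relies on the relationship_state option of na_ontap_snapmirror and on na_ontap_volume_clone; our understanding is that the snapmirror module quiesces before it breaks, but confirm that with a --check run against a non-production relationship first.

```yaml
  tasks:
  - name: Failover - break the relationship so the destination becomes read-write
    netapp.ontap.na_ontap_snapmirror:
      state: present
      hostname: "{{ SEC_CLU }}"            # still driven from the destination
      source_endpoint:
        path: "{{ PRI_SVM }}:{{ src_volume }}"
      destination_endpoint:
        path: "{{ SEC_SVM }}:{{ dst_volume }}"
      relationship_state: broken
    when: dr_mode == 'failover'            # hypothetical switch, e.g. -e dr_mode=failover

  - name: DR test - clone the destination without disturbing replication
    netapp.ontap.na_ontap_volume_clone:
      state: present
      hostname: "{{ SEC_CLU }}"
      vserver: "{{ SEC_SVM }}"
      parent_volume: "{{ dst_volume }}"
      name: "{{ dst_volume }}_drtest"      # hypothetical clone name
      junction_path: "/{{ dst_volume }}_drtest"
    when: dr_mode == 'test'
```

A scheduled test run would follow the clone with whatever data validation the application needs and then remove the clone with state: absent.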

Storage-as-Code

The principle. Everything above shares one idea: the cluster’s configuration lives in Git, not in an administrator’s memory or a wiki page that drifts. Playbooks, inventory, and the (vault-encrypted) variables are version-controlled; changes go through pull requests and review; a bad change rolls back with a revert instead of an archaeology session. The repository becomes the most accurate description of the estate that exists — and because the playbooks are idempotent, that description is enforceable, not just documentary. This is the difference between automation and Storage-as-Code: automation runs commands faster, Storage-as-Code makes the desired state auditable and self-correcting.

Day-2 operations

The principle. Provisioning is day one; the value compounds on day two. Routine administration — growing a volume, rotating snapshots, standardizing a snapshot policy across a fleet, adding an export rule, reading performance — all become small, reviewable, repeatable playbooks. A snapshot policy applied through na_ontap_snapshot_policy is identical on every volume because the same code created it; a capacity increase is a one-line diff with an audit trail. The read-only na_ontap_rest_info turns into a nightly reporting and drift-detection job. Day-2 is where the discipline pays for itself, because the work that used to be a hundred manual clicks per week becomes a scheduled pipeline nobody has to remember.
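Two of those day-2 jobs, sketched under assumptions: the policy name std_daily_weekly and its retention counts are illustrative, and the volume is the proj_data_01 example from the provisioning scenario grown from 500 to 750 GB. Credentials are assumed to come from module_defaults.

```yaml
  tasks:
  - name: Standardize the snapshot policy on the SVM
    netapp.ontap.na_ontap_snapshot_policy:
      state: present
      hostname: "{{ PRI_CLU }}"
      vserver: "{{ PRI_SVM }}"
      name: std_daily_weekly         # hypothetical policy name
      enabled: true
      schedule: [daily, weekly]      # built-in ONTAP cron schedules
      count: [7, 4]                  # assumption: keep 7 dailies and 4 weeklies

  - name: Grow a project volume
    netapp.ontap.na_ontap_volume:
      state: present
      hostname: "{{ PRI_CLU }}"
      vserver: "{{ PRI_SVM }}"
      name: proj_data_01
      size: 750                      # was 500; this line is the whole change
      size_unit: gb
```

In review, the capacity increase shows up as a single changed number, which is the audit trail the paragraph above describes.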

ONTAP automation in CI/CD pipelines

The natural home for these playbooks is a pipeline, not an engineer’s laptop. Committing a change to the storage repository triggers the same sequence every enterprise CI/CD system supports: lint and syntax-check the YAML, run the playbook in --check mode against the cluster to preview changes without making them, require an approval, then apply. The vault password comes from the pipeline’s secret store (never the repo), and every run is logged with who triggered it and what changed.

Figure 04 · CI/CD pipeline driving ONTAP automation

[Diagram: git push to the storage repo → CI/CD runner (GitHub Actions / GitLab / Jenkins) runs (1) yamllint + syntax check, (2) ansible-playbook --check, (3) approval gate, (4) apply, with the vault password supplied from the pipeline secret store → Ansible control node (netapp.ontap) → ONTAP cluster, desired state applied and logged.]
The same flow works in GitHub Actions, GitLab CI/CD, Jenkins, or a dedicated automation platform. The --check stage and approval gate are what make applying storage changes from a pipeline safe.

Concretely, the apply stage runs ansible-playbook site.yml --vault-password-file "$VAULT_PASS_FILE", where the runner writes the vault password to a short-lived file from its secret store and deletes it afterward. The --check stage runs the identical command with --check --diff and posts the would-change output to the merge request, so reviewers approve a concrete plan, not a hope. This is the production form of the discipline every cleaned playbook in this guide is written for.
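One way to express those stages, sketched here for GitHub Actions. The workflow path, the ANSIBLE_VAULT_PASSWORD secret name, the production environment name, and the choice of a self-hosted runner are all assumptions. The sketch also assumes ansible, yamllint, and the pinned netapp.ontap collection are already installed on the runner, and it leaves out posting the plan to the merge request.

```yaml
# .github/workflows/storage.yml (hypothetical path)
name: ontap-storage
on:
  pull_request:
  push:
    branches: [main]

jobs:
  plan:
    runs-on: self-hosted               # assumption: a runner that can reach the cluster management LIF
    steps:
      - uses: actions/checkout@v4
      - name: Lint, syntax-check, and preview
        env:
          VAULT_PASSWORD: ${{ secrets.ANSIBLE_VAULT_PASSWORD }}   # hypothetical secret name
        run: |
          VAULT_PASS_FILE="$(mktemp)"
          trap 'rm -f "$VAULT_PASS_FILE"' EXIT     # delete the password file however the step ends
          printf '%s' "$VAULT_PASSWORD" > "$VAULT_PASS_FILE"
          yamllint .
          ansible-playbook site.yml --syntax-check --vault-password-file "$VAULT_PASS_FILE"
          ansible-playbook site.yml --check --diff --vault-password-file "$VAULT_PASS_FILE"

  apply:
    needs: plan
    if: github.event_name == 'push'    # apply only after the change has merged to main
    runs-on: self-hosted
    environment: production            # approval gate: required reviewers are set on this environment
    steps:
      - uses: actions/checkout@v4
      - name: Apply
        env:
          VAULT_PASSWORD: ${{ secrets.ANSIBLE_VAULT_PASSWORD }}
        run: |
          VAULT_PASS_FILE="$(mktemp)"
          trap 'rm -f "$VAULT_PASS_FILE"' EXIT
          printf '%s' "$VAULT_PASSWORD" > "$VAULT_PASS_FILE"
          ansible-playbook site.yml --vault-password-file "$VAULT_PASS_FILE"
```

The same four stages map directly onto GitLab CI/CD stages or a Jenkins pipeline; only the syntax for the secret and the approval gate changes.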

Troubleshooting ONTAP Ansible automation

Nearly every failure in the first month of running these playbooks falls into one of five buckets. Each announces itself with a recognizable message; match the symptom and apply the fix.

Authentication and RBAC failures

| Symptom | Root cause | Resolution |
| --- | --- | --- |
| 401 Unauthorized / Invalid credentials | Wrong username or password, or the wrong cluster hostname | Verify the vaulted PRI_CLU_USER / PRI_CLU_PASS against a manual login; confirm hostname is the cluster management LIF, not a node or data LIF. |
| 403 Forbidden on a specific resource | The account authenticates but its role lacks rights to that API | The automation account needs a role with access to the relevant REST endpoints. Use a dedicated admin-scoped account, or a custom role granting only the API paths your playbooks touch: least privilege, but sufficient. |
| User is not authorized for http application | The account exists but the http application is not enabled for it | The REST API requires the user to have the http application enabled (ontapi matters only for modules that still use ZAPI). Confirm with security login show; the automation account must be permitted to authenticate with the http application. |

The durable fix is a purpose-built automation account: a dedicated cluster user, http-enabled, with a role scoped to what the playbooks actually do, and its password in the vault. Never automate as the same interactive admin a human logs in with — you want the audit log to distinguish “the pipeline did this” from “a person did this.”
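Creating that account can itself be a playbook task. The sketch below uses the collection's na_ontap_user module; the account name svc_ansible, the AUTOMATION_PASS variable, and the ansible_automation role are assumptions, and the custom role is presumed to exist already with only the API paths your playbooks need.

```yaml
  tasks:
  - name: Create a dedicated, http-only automation account
    netapp.ontap.na_ontap_user:
      state: present
      hostname: "{{ PRI_CLU }}"
      name: svc_ansible                        # hypothetical account name
      applications: http                       # REST only; no interactive ssh login
      authentication_method: password
      set_password: "{{ AUTOMATION_PASS }}"    # hypothetical vaulted variable
      role_name: ansible_automation            # hypothetical least-privilege role, created beforehand
    no_log: true                               # keep the password out of verbose output
```

This one task has to run as an existing administrator; every later playbook then authenticates as the automation account, so the audit log separates pipeline changes from human ones.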

TLS and certificate issues

The single most common line to get wrong is validate_certs. Every lab original in this guide sets validate_certs: false because lab clusters ship with self-signed certificates and verification would fail. That setting is a lab convenience and a production liability: with verification off, anything that can intercept the HTTPS session can impersonate the cluster and harvest the admin credential the playbook sends. The production fix is not to keep disabling it — it is to install a CA-signed (or trusted internal-CA) certificate on the cluster management LIF and set validate_certs: true. If you must run against a self-signed certificate temporarily, treat validate_certs: false the way you treat any other security exception: scoped to one environment, documented, and never copied into the production vars file. Symptoms to expect: SSL: CERTIFICATE_VERIFY_FAILED means verification is on and the certificate is not trusted — the right answer is to fix the trust, not to disable the check.

Connectivity problems

Before suspecting Ansible, prove the layer beneath it. A module that hangs or times out is almost always a network problem, not a code problem. Check, in order:

  1. DNS: can the control node resolve the cluster management name? (nslookup cluster1.demo.netapp.com)
  2. Reachability: does curl -k https://cluster1.demo.netapp.com/api/cluster return JSON, or hang? A hang is a firewall or routing problem between the control node and the management LIF on port 443.
  3. The management LIF itself: is it up, and is the cluster healthy?

The control node needs HTTPS (443) to the cluster management LIF specifically; reaching a node management or data LIF is not the same thing. This is the same layer-isolation discipline that applies to the per-SVM DNS and routing examples earlier: fix the network below before debugging the automation above.
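The same checks can run as a small preflight play that uses only ansible.builtin modules, so it works even when the netapp.ontap collection is the thing in doubt. This is a sketch: the cluster_fqdn variable is illustrative, getent assumes a Linux control node, and a 401 is accepted alongside 200 because an unauthenticated request that gets any HTTP answer has already proved reachability.

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    cluster_fqdn: cluster1.demo.netapp.com   # hypothetical variable name

  tasks:
  - name: DNS - the control node can resolve the management name
    ansible.builtin.command: "getent hosts {{ cluster_fqdn }}"
    changed_when: false

  - name: Reachability - TCP 443 opens within 10 seconds
    ansible.builtin.wait_for:
      host: "{{ cluster_fqdn }}"
      port: 443
      timeout: 10

  - name: REST endpoint - the management LIF answers HTTPS
    ansible.builtin.uri:
      url: "https://{{ cluster_fqdn }}/api/cluster"
      validate_certs: false                  # lab certificate; true once the cluster has a trusted one
      status_code: [200, 401]
      timeout: 10
```

Whichever task fails first names the layer to fix before any ONTAP module is worth running.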

Common REST API errors

| Message pattern | Root cause | Resolution |
| --- | --- | --- |
| job reported error ... duplicate entry / already exists | A non-idempotent parameter, or creating something that exists with different immutable attributes | Most modules are idempotent; this usually means an immutable field (an aggregate, a security style) differs from the existing object. Read current state with na_ontap_rest_info and reconcile, rather than forcing. |
| svm not found / aggregate not found | A dependency does not exist, or a name differs between vars and cluster | Check the dependency order (SVM before its volumes, CIFS server before its shares) and verify the name in global.vars matches the cluster exactly. Names are the most common drift between lab and production. |
| missing required arguments | A required parameter is unset, often an empty variable | A variable referenced in the playbook is empty or undefined in global.vars. Run with --check first; it surfaces undefined-variable errors before any change is attempted. |
| ModuleNotFoundError before any API call | The collection or a Python library is missing from Ansible’s environment | Install the collection (ansible-galaxy collection install netapp.ontap) and netapp-lib into the same environment Ansible runs from. See the install guide’s storage extras. |

Playbook debugging technique

Three tools resolve almost everything the tables above do not. Verbosity: re-run with -vvv to see the exact REST calls, request bodies, and responses — the response JSON usually names the real problem precisely. Add no_log: true to credential-bearing tasks first, so verbose output does not echo secrets into logs. Check and diff: --check --diff previews what would change without changing it — the safest first run against any cluster and the fastest way to catch undefined variables and bad parameter names. Read the truth: when a module’s behavior is surprising, query actual cluster state with na_ontap_rest_info and a targeted gather_subset — comparing what Ansible thinks exists against what the cluster reports resolves most “it should have worked” cases. The pattern is always the same: isolate the layer (network, then auth, then the module’s view of state) before changing the playbook.
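The "read the truth" step might look like this minimal sketch. The volume name is the proj_data_01 example from the provisioning scenario, the vol_truth variable name is illustrative, and credentials are assumed to come from module_defaults; the field list is only a starting point.

```yaml
  tasks:
  - name: Ask the cluster what it actually has for this volume
    netapp.ontap.na_ontap_rest_info:
      hostname: "{{ PRI_CLU }}"
      gather_subset:
        - storage/volumes
      fields:
        - name
        - svm
        - size
        - state
      parameters:
        name: proj_data_01           # query filter: only this volume
    register: vol_truth              # hypothetical variable name

  - name: Show the cluster's answer next to what the playbook declares
    ansible.builtin.debug:
      var: vol_truth.ontap_info
```

If the reported size, SVM, or state differs from the playbook's declaration, the discrepancy is usually the whole explanation for the surprising module behaviour.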

Production lessons from the field

Vendor documentation tells you what the modules do. This is what running them across enterprise estates teaches — the things that are obvious only in hindsight.

Lessons learned. The playbook is never the hard part; the operating model around it is. Teams that succeed treat inventory and playbooks as production code (reviewed, versioned, pipeline-applied) from day one, not as scripts that graduate to that later. The ones that struggle started with Ansible as “a faster way to type CLI commands” and never made the leap to desired-state thinking. The single highest-leverage habit is wiring a nightly --check run early: it converts your playbooks from provisioning tools into a continuous drift-detection system, and it surfaces the gap between documented and actual state before an auditor does.

Common mistakes. Five recur across nearly every estate. Disabling certificate validation in production and forgetting it was ever temporary. Hard-coding credentials in playbooks or vars files instead of the vault — the single most common security finding. Specifying sizes in raw bytes where a one-digit typo is a 10× error no reviewer catches (use size + size_unit, always). Granting wildcard permissions — FullAccess S3 policies, any NFS rules — because the lab example did. And mixing imperative habits into declarative code: writing a task per object instead of looping a declared list, or assuming order where idempotency makes it irrelevant.

Scaling considerations. What works for one cluster needs structure for fifty. Move per-cluster facts into group_vars and host_vars rather than a single global.vars; the playbooks stay identical and only the variable scope changes. Use dynamic inventory sourced from na_ontap_rest_info or a CMDB so the fleet is discovered, not hand-listed. Pin the collection version in requirements.yml so every control node and pipeline runner behaves identically — an unpinned collection is how “works on my machine” enters storage automation. And separate vault passwords per environment, so a compromised lab credential cannot decrypt production.
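Pinning the collection is a few lines of configuration. In this sketch the version number is illustrative only; pin whichever exact release you have tested against your ONTAP version.

```yaml
# requirements.yml
# Install with: ansible-galaxy collection install -r requirements.yml
collections:
  - name: netapp.ontap
    version: "23.0.0"    # assumption: replace with the exact release you have validated
```

Committing this file alongside the playbooks means every control node and pipeline runner resolves the same module code, and upgrading the collection becomes a reviewed one-line change.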

Security best practices. A dedicated, http-enabled automation account with a least-privilege role, never the interactive admin. All secrets in Ansible Vault, the vault password in a secrets manager or pipeline store, never in Git. validate_certs: true against a real certificate in production. no_log: true on every task that handles a credential. Scoped permissions everywhere the lab used wildcards — named S3 policies, CIDR-scoped NFS rules, igroups treated like firewall rules. The recurring theme across this entire reference is that lab-convenient defaults are audit findings; production automation closes each one deliberately.

Operational runbooks. The endgame is that routine storage work stops being interactive. A volume-growth request is a one-line pull request, not a console session. A new-tenant onboarding is a vars file and a pipeline run. A DR test is a scheduled playbook that clones the destination and validates it. A quarterly access review reads live state with na_ontap_rest_info and diffs it against the declared policies. Each of these is a runbook that used to live in a human’s head or a stale wiki; as code, it is executable, reviewable, and the same every time it runs. That is the operating model WUC builds into the estates we run — and the reason these fourteen examples are written for production from the first line, not retrofitted to it later.

The recurring lesson: lab-convenient is audit-findable

Read these fourteen examples together and one pattern repeats across every protocol. The lab originals are written for a closed training environment, so they reach for the convenient default — and almost every convenient default is a finding in a production audit. Four show it plainly. The S3 policy in 50-02 grants actions: ["*"] on resources: ["*"] — full access by another name. The NFS rule in 30-03 sets ro_rule: any / rw_rule: any — unauthenticated read-write to anyone on the subnet. The SnapMirror playbook in 70-01 hard-codes the cluster password and peering passphrase as plaintext in the playbook itself. And nearly every original disables certificate validation and repeats credentials in the clear.

The cleaned versions apply the same three corrections every time: scope the permission (named actions and resources, specific client networks, least-privilege policies), encrypt the secret (everything sensitive in a vault-encrypted global.vars, supplied once through module_defaults), and state intent explicitly (descriptive policy IDs, real comments, units a reviewer can verify at a glance). None of this changes what the playbook builds; all of it changes whether the playbook survives review. That is the whole difference between a script that works in a lab and one that runs in production — and it is the discipline WUC builds into every estate we automate.

Using this reference

Find the job in the build-order list, copy the blue block, change the variables in your global.vars, and run it with ansible-playbook <file>.yml --ask-vault-pass. The examples are deliberately ordered by dependency — an SVM before its volumes, a volume before its shares, a CIFS server before an SMB share — so reading top to bottom is also a working build sequence for a complete storage service. Every cleaned playbook is idempotent: run it twice and the second run reports changed=0, which is how you turn any of these into a scheduled compliance check. For the concepts underneath — how the modules reach the cluster, what makes them safe to re-run, and how Vault protects the credentials — the three field guides linked throughout are the place to start.

Automating a NetApp estate beyond the lab?

These fourteen playbooks are the building blocks; an automated estate is the operating model around them — vault discipline, least-privilege service accounts, change-controlled pipelines, and drift enforcement that runs on a schedule. WUC engineers build and run both, across NetApp ONTAP, Cisco fabrics, and multi-OEM infrastructure, as an automation consultant, maintenance provider, and managed services partner.


References

  1. Ansible project. netapp.ontap collection documentation. The authoritative reference for every module used across these fourteen examples.
  2. Ansible Galaxy. netapp.ontap role: na_ontap_nas_create. NetApp’s prebuilt role packaging the volume-to-share NAS flow — the consume-rather-than-compose option once these building blocks are familiar.
  3. NetApp. ONTAP Automation Documentation. The REST API foundation every module here drives.
  4. WUC Technologies. How to Install Ansible, NetApp ONTAP Ansible Playbooks, and Encrypting Ansible Variables with Ansible Vault. The three field guides this reference accompanies.
About WUC Engineering
Infrastructure engineers at WUC Technologies running Ansible against multi-OEM estates — NetApp ONTAP storage, Cisco Catalyst and MDS fabrics, and the server platforms between them — under SLA-backed maintenance and managed services engagements. Authorized Dell & Cisco partner.
