NetApp ONTAP Ansible Playbook Reference: Enterprise Automation Examples
This is a working reference for automating NetApp ONTAP with the netapp.ontap Ansible collection — fourteen worked examples covering the jobs a storage engineer actually does, from creating an SVM to replicating a volume across clusters. It is the companion to our three Ansible field guides: installing the control node, the seven core playbooks, and securing credentials with Ansible Vault. Where those teach concepts, this one is the lookup table: find the job, copy the playbook.
Every entry shows the same playbook two ways. The first block is the lab original: exactly as it runs in a NetApp training workshop, quirks and all. The second block is the WUC-cleaned version: the same result rewritten for production, with credentials via module_defaults pulling from a vault-encrypted global.vars, REST-native parameters, fully-qualified module names, and least-privilege defaults. A short terminal run follows, then a note on exactly what changed between the two versions and why. Every playbook here validates as parseable YAML.
Modules current as of the netapp.ontap collection 23.x against ONTAP 9.12+ over REST. Variable names (PRI_CLU, PRI_SVM, …) come from the shared global.vars file documented in the Vault walkthrough.
How Ansible automates ONTAP: the REST architecture
Modern ONTAP automation should use REST-based workflows. This guide focuses exclusively on supported ONTAP REST API automation through the NetApp Ansible Collection (netapp.ontap) — every example here drives the ONTAP REST API, and every cleaned playbook sets use_rest: always so there is no ambiguity about which interface runs.
The mechanics are worth understanding before the examples, because they explain why these playbooks are safe to run against production. Ansible is agentless: nothing is installed on the ONTAP cluster. The netapp.ontap modules execute on an Ansible control node and make authenticated HTTPS calls to the cluster management LIF’s REST endpoint (/api/...). Each module is a thin, idempotent wrapper around a set of REST calls: it first reads current state with a GET, compares it to the state your playbook declares, and issues a POST or PATCH only if reality differs. That read-compare-act cycle is the whole engine.
Figure 01 · Ansible control node to ONTAP cluster over REST
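The read step of that cycle can be run on its own. The play below is a minimal, read-only sketch (not one of the workshop files) that uses na_ontap_rest_info to fetch the SVMs the cluster currently has, which corresponds to a GET against /api/svm/svms. It assumes the same global.vars variables the other playbooks use; the two fields requested are only an illustration.

```yaml
# Read-only sketch: the "read" step of read-compare-act, performed by hand.
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false   # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Read the SVMs that currently exist on the cluster
      netapp.ontap.na_ontap_rest_info:
        gather_subset:
          - svm/svms
        fields:               # illustrative choice of fields
          - name
          - state
      register: svm_info

    - name: Show what the cluster reported
      ansible.builtin.debug:
        var: svm_info.ontap_info
```

Nothing here can change the cluster, so it is a safe first play to run when checking connectivity and credentials.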
Idempotency and desired-state: why these playbooks are safe to re-run
Two properties make Ansible suitable for production storage, and both follow directly from the read-compare-act cycle above. Idempotency means running a playbook twice produces the same result as running it once: the first run creates what is missing and reports changed; the second run finds reality already matching the declaration and reports changed=0, touching nothing. Desired-state management is the consequence — your playbook is not a script of imperative commands (“create this, then that”) but a description of how the cluster should look, and Ansible’s job is to make reality match that description regardless of the starting point.
The operational payoff is real. A changed=0 run is a free compliance audit: schedule any of these playbooks nightly, and a run that suddenly reports changed=1 is drift detected and already corrected. Someone resized a volume by hand, and the next scheduled run quietly put it back and logged that it did. This is why every cleaned playbook in this reference declares state: present: you are describing desired state, not issuing commands, and the cluster gains a standing enforcement mechanism no hand-run CLI procedure can match.
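Some teams prefer the scheduled run to report drift rather than correct it. One way to sketch that, assuming the NFS volume from the Volumes section already exists, is to run the same declaration in Ansible's check mode and fail if the module says it would have changed something. This is an illustrative variant, not a workshop file, and the task names and messages are placeholders.

```yaml
# Audit-only sketch: check mode performs the read and compare steps but never acts.
- hosts: localhost
  gather_facts: false
  check_mode: true            # report what would change, change nothing
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false   # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Ensure the NFS volume still has its declared size
      netapp.ontap.na_ontap_volume:
        state: present
        vserver: "{{ PRI_SVM }}"
        name: "{{ PRI_SVM }}_nfs_01"
        size: "{{ VOL_SIZE }}"
        size_unit: gb
      register: vol_audit

    - name: Fail the audit if the volume has drifted from the declaration
      ansible.builtin.assert:
        that:
          - vol_audit is not changed
        fail_msg: "Drift detected: {{ PRI_SVM }}_nfs_01 no longer matches the playbook"
        success_msg: "No drift: the volume matches its declared state"
```

Dropping check_mode: true turns the same play back into the self-correcting form described above.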
Module quick-reference table
The netapp.ontap collection ships well over a hundred modules. These are the ones that carry most production automation: the working set behind many of the examples in this guide and the day-2 operations around them.
| Module | Task | Typical production use case |
|---|---|---|
| na_ontap_svm | Create / manage a storage VM | Onboard a tenant, department, or customer; declare which protocols it may serve |
| na_ontap_aggregate | Create / manage an aggregate | Provision the physical capacity pool that volumes are carved from |
| na_ontap_volume | Create / resize / manage a volume | The unit of capacity for NAS shares, SAN LUNs, and S3 buckets; bulk volume provisioning |
| na_ontap_interface | Create / manage a LIF | Management, NAS data, and block data interfaces; one module, role set by service policy |
| na_ontap_broadcast_domain | Manage a broadcast domain | Group ports into a failover-capable Layer-2 domain before placing LIFs |
| na_ontap_vlan | Create / manage a VLAN interface | Tagged network segmentation for multi-tenant or multi-subnet data traffic |
| na_ontap_cifs | Publish / manage an SMB share | Expose a path as a Windows file share after the CIFS server is joined to AD |
| na_ontap_export_policy | Manage an NFS export policy | The named rule set that decides which client networks may mount an NFS volume |
| na_ontap_snapshot_policy | Manage a snapshot schedule policy | Standardize local data protection (hourly/daily/weekly retention) across volumes |
| na_ontap_snapshot | Take / manage an individual snapshot | Application-consistent point-in-time copies, e.g. before a change window or upgrade |
| na_ontap_snapmirror | Manage a SnapMirror relationship | Cross-cluster replication for disaster recovery and migration; orchestrate failover |
| na_ontap_rest_info | Read cluster state over REST | Read-only inventory, performance metrics, drift detection, and dynamic inventory sourcing |
Every one of these is idempotent and REST-driven. The examples that follow build the most common combinations of them, in dependency order; the enterprise scenarios further down show how teams compose them at scale.
SVM foundation
Create an SVM and declare its protocols
Job: create the storage tenant — the first object in every workflow, since volumes, LIFs, and shares all live inside an SVM. Module: na_ontap_svm (one task). File: create_pri_svm.yml
The lab original — exactly as it runs in the workshop
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false
      use_rest: Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
    - name: Create primary SVM
      na_ontap_svm:
        state: present
        name: "{{ PRI_SVM }}"
        services:
          nfs:
            allowed: true
            enabled: true
          # s3:
          #   allowed: true
          #   enabled: true
        comment: Created with Ansible
        <<: *input
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false  # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Create the primary SVM with the protocols it will serve
      netapp.ontap.na_ontap_svm:
        state: present
        name: "{{ PRI_SVM }}"
        comment: "Created with Ansible"
        services:
          nfs:
            allowed: true
            enabled: true
          # add only the protocols this SVM should ever serve, e.g.:
          # s3:
          #   allowed: true
          #   enabled: true
The run
[root@centos1 ansible-workshop]# ansible-playbook create_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create primary SVM] ******************************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost : ok=1 changed=1 unreachable=0 failed=0 skipped=0
What to read in this one. The services block is the SVM’s protocol contract, and the two keys mean different things: allowed permits the protocol to exist on this SVM at all, enabled turns its service on. Declaring only what you need — NFS here, S3 commented out until wanted — is least privilege at the tenant level: a protocol that is not allowed cannot be misconfigured into serving data. This is the object every other entry in this reference depends on; in dependency order it always runs first.
This file is mostly clean already — it uses the modern services block and use_rest: Always. The cleaned version changes only the credential pattern (module_defaults instead of the &input anchor) and tidies the commented-out blocks: the original carries two parallel commented sections (a legacy allowed_protocols form and a certificate line) that are dead weight; the cleaned version keeps a single commented S3 stanza as the documented extension point. Note the design choice worth keeping from the original: protocols are declared at creation, not bolted on later — the SVM’s capability surface is defined in one reviewable place.
One module, every workflow. The workshop reuses this same create_pri_svm.yml across its NAS, S3, and SAN tracks — the only thing that changes is the services block (NFS here, or S3, or iSCSI as in 60-01). That is the whole point of declaring protocols at creation: one playbook, one module, and the tenant’s purpose set by which protocols you allow. The variants also carry a commented certificate: "server" line — a placeholder for the server certificate an S3 server (50-01) later requires, a reminder that the SVM and its certificate are born together.
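As an illustration of how the services block sets the tenant's purpose, here is a sketch of a NAS variant that serves NFS and also permits CIFS, so that a later Active Directory join is allowed on this SVM. It is an assumption about how such a variant could look, not a workshop file; the cifs entry carries only allowed because the SMB service itself is brought up by the CIFS server join.

```yaml
# Sketch of a NAS-track variant: NFS served now, CIFS permitted for a later AD join.
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false   # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Create the primary SVM for NFS and SMB
      netapp.ontap.na_ontap_svm:
        state: present
        name: "{{ PRI_SVM }}"
        comment: "Created with Ansible"
        services:
          nfs:
            allowed: true     # NFS may exist on this SVM
            enabled: true     # and its service is switched on
          cifs:
            allowed: true     # SMB may exist; the CIFS server join enables it
```

Any protocol left out of the block stays outside the SVM's contract, which keeps the least-privilege property described above.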
Add a management LIF to an SVM
Job: give an existing SVM its own management interface, so storage admins can manage the tenant directly instead of through the cluster LIF. Module: na_ontap_interface (one task). File: 22_create_mgmtlif_pri_svm.yml
The lab original — exactly as it runs in the workshop
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false
      use_rest: Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
    - name: Create management interface on primary SVM
      na_ontap_interface:
        state: present
        interface_name: "{{ PRI_SVM }}"
        vserver: "{{ PRI_SVM }}"
        address: "{{ PRI_SVM_IP }}"
        netmask: "{{ PRI_SVM_NETMASK }}"
        home_node: "{{ PRI_CLU_NODE1 }}"
        home_port: "{{ PRI_MGMT_PORT }}"
        <<: *input
        service_policy: "default-management"
        firewall_policy: mgmt
        role: data
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false  # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Create the SVM management LIF
      netapp.ontap.na_ontap_interface:
        state: present
        vserver: "{{ PRI_SVM }}"
        interface_name: "{{ PRI_SVM }}_mgmt"
        address: "{{ PRI_SVM_IP }}"
        netmask: "{{ PRI_SVM_NETMASK }}"
        home_node: "{{ PRI_CLU_NODE1 }}"
        home_port: "{{ PRI_MGMT_PORT }}"
        service_policy: default-management
The run
[root@centos1 ansible-workshop]# ansible-playbook 22_create_mgmtlif_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create management interface on primary SVM] ******************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost : ok=1 changed=1 unreachable=0 failed=0 skipped=0
What to read in this one. An SVM management LIF is what lets a storage tenant be administered independently of the cluster — useful for delegated administration and for tools that connect per-SVM. The four placement parameters work together: home_node + home_port say where the interface lives, address + netmask say what it answers on. The service_policy is what makes it a management interface rather than a data one — default-management is the built-in policy that permits management traffic and nothing else, which is the least-privilege default you want on an admin LIF.
The same na_ontap_interface module creates data LIFs too — the difference is the service_policy: a data LIF for a protocol either takes the SVM default (no policy line) or names a protocol data policy like default-data-iscsi (see 60-01) or default-data-blocks. One module, three interface roles — management, NAS/object data, and block data — distinguished entirely by the service policy you attach.
This file is worth citing for one contradiction it carries: it sets service_policy: default-management (the REST-native field that defines an interface’s role) and the older firewall_policy: mgmt + role: data fields on the same task. Under REST the modern service_policy is authoritative and the older pair is redundant — harmless, but confusing to the next reader, and role: data on a management interface flatly contradicts the management service policy. The cleaned version keeps only service_policy, which is the single source of truth: an interface’s role is whatever its service policy permits. Two smaller fixes: the interface is named {{ PRI_SVM }}_mgmt rather than reusing the bare SVM name (an interface named identically to its SVM invites confusion in network interface show), and credentials move to module_defaults.
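To make the "one module, several roles" point concrete, here is a sketch of a NAS data LIF built with the same module. It assumes the built-in default-data-files service policy is the right fit for NFS and SMB traffic; PRI_DATA_IP and PRI_DATA_PORT are hypothetical variables that are not part of the documented global.vars, and the interface name is only a suggestion.

```yaml
# Sketch of a NAS data LIF: same module as the management LIF, different service policy.
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false   # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Create a NAS data LIF on the SVM
      netapp.ontap.na_ontap_interface:
        state: present
        vserver: "{{ PRI_SVM }}"
        interface_name: "{{ PRI_SVM }}_data_01"
        address: "{{ PRI_DATA_IP }}"        # hypothetical variable
        netmask: "{{ PRI_SVM_NETMASK }}"
        home_node: "{{ PRI_CLU_NODE1 }}"
        home_port: "{{ PRI_DATA_PORT }}"    # hypothetical variable
        service_policy: default-data-files  # NAS data traffic, no management access
```

Compared with the management LIF, only the name, the address, the port, and the service policy change.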
Configure DNS on an SVM
Job: point the SVM at a DNS server and domain, the quiet prerequisite that makes the Active Directory join (and any hostname-based access) work. Module: na_ontap_dns (one task). File: create_dns_pri_svm.yml. Depends on: the SVM (10-01). Required by: the CIFS server join (30-01).
The lab original — exactly as it runs in the workshop
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false
      use_rest: Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
    - name: Create DNS on primary SVM
      na_ontap_dns:
        state: present
        vserver: "{{ PRI_SVM }}"
        domains: "{{ PRI_DOMAIN }}"
        nameservers: "{{ PRI_DNS1 }}"
        <<: *input
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false  # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Configure DNS on the SVM
      netapp.ontap.na_ontap_dns:
        state: present
        vserver: "{{ PRI_SVM }}"
        domains:
          - "{{ PRI_DOMAIN }}"
        nameservers:
          - "{{ PRI_DNS1 }}"
          - "{{ PRI_DNS2 }}"  # list both - resilience if one resolver is down
The run
[root@centos1 ansible-workshop]# ansible-playbook create_dns_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create DNS on primary SVM] **********************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost : ok=1 changed=1 unreachable=0 failed=0 skipped=0
What to read in this one. DNS is configured per SVM, not just at the cluster — because a data SVM joining Active Directory resolves the domain from its own network, using its own resolver. domains is the search domain, nameservers is the resolver to query. This task looks trivial and is — until you skip it, at which point the CIFS server join (30-01) fails with a domain-not-found error that sends people debugging the join when the real fault is one layer below. Run DNS first; the join just works.
The fix here is resilience, expressed through YAML types. Both domains and nameservers are lists — the original passes a single scalar to each (one domain, one resolver), which works but leaves the SVM with a single point of DNS failure: if PRI_DNS1 is down, name resolution stops and SMB access with it. The cleaned version writes them as proper YAML lists and adds PRI_DNS2 (already in global.vars) as a second resolver. One resolver is a lab; two is production. Credentials move to module_defaults as elsewhere.
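The "two resolvers" rule can also be enforced before the module runs. The sketch below is one possible guard, not part of the workshop: it collects the resolvers in a hypothetical helper variable, svm_nameservers, and refuses to continue unless the list holds at least two distinct entries.

```yaml
# Sketch: refuse to configure SVM DNS with a single point of failure.
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  vars:
    svm_nameservers:          # hypothetical helper variable
      - "{{ PRI_DNS1 }}"
      - "{{ PRI_DNS2 }}"
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false   # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Require at least two distinct resolvers
      ansible.builtin.assert:
        that:
          - svm_nameservers | unique | length >= 2
        fail_msg: "SVM DNS needs two distinct nameservers; check PRI_DNS1 and PRI_DNS2"

    - name: Configure DNS on the SVM
      netapp.ontap.na_ontap_dns:
        state: present
        vserver: "{{ PRI_SVM }}"
        domains:
          - "{{ PRI_DOMAIN }}"
        nameservers: "{{ svm_nameservers }}"
```

The guard also catches the case where both variables were accidentally set to the same address.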
Add a network route to an SVM
Job: give the SVM a route so its LIFs can reach clients on other subnets — the networking piece that turns a reachable interface into a reachable service. Module: na_ontap_net_routes (one task). File: 36_create_route_pri_svm.yml. Depends on: the SVM (10-01) and at least one data LIF.
The lab original — exactly as it runs in the workshop
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false
      use_rest: Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
    - name: Create route on primary SVM
      na_ontap_net_routes:
        state: present
        vserver: "{{ PRI_SVM }}"
        destination: "192.168.0.0/24"
        gateway: "{{ PRI_CLU_DEFAULT_GW }}"
        <<: *input
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false  # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Add the default route for the SVM
      netapp.ontap.na_ontap_net_routes:
        state: present
        vserver: "{{ PRI_SVM }}"
        destination: "0.0.0.0/0"  # default route - reach any subnet
        gateway: "{{ PRI_CLU_DEFAULT_GW }}"
The run
[root@centos1 ansible-workshop]# ansible-playbook 36_create_route_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create route on primary SVM] ********************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost : ok=1 changed=1 unreachable=0 failed=0 skipped=0
What to read in this one. Like DNS, routing on a data SVM is per-SVM — an SVM has its own routing table, separate from the cluster’s. A LIF (10-02) gives the SVM an address on its local subnet; a route is what lets it answer clients that live on other subnets. destination is the network to reach and gateway is the next hop toward it. Without a route to a client’s subnet, that client can reach the LIF only if it happens to share the subnet — the subtle cause of “works for some clients, times out for others.”
One meaningful choice. The original scopes the route to a single subnet (destination: 192.168.0.0/24), which reaches exactly that network and nothing else. The cleaned version uses the default route 0.0.0.0/0 — the catch-all that sends any otherwise-unmatched traffic to the gateway, which is what most data SVMs actually want so clients on any routed subnet can connect. Use a specific destination only when you deliberately want to limit which networks the SVM can reach (a valid hardening choice); use the default route when the SVM should serve broadly. The original’s slightly mismatched indentation on gateway is also tidied — harmless to YAML, but consistent indentation is the difference between a file that reviews cleanly and one that invites a second look. Credentials move to module_defaults.
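For the hardening case, where the SVM should reach only named networks, a loop over a list keeps the declaration in one reviewable place. This is a sketch under assumptions: svm_route_destinations is a hypothetical variable, the two networks in it are placeholders, and the same gateway is assumed to serve both.

```yaml
# Sketch: one route per permitted client network instead of a default route.
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  vars:
    svm_route_destinations:   # hypothetical list - replace with your client networks
      - "192.168.0.0/24"
      - "10.20.0.0/16"
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false   # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Add a route for each network the SVM is allowed to reach
      netapp.ontap.na_ontap_net_routes:
        state: present
        vserver: "{{ PRI_SVM }}"
        destination: "{{ item }}"
        gateway: "{{ PRI_CLU_DEFAULT_GW }}"
      loop: "{{ svm_route_destinations }}"
```

Adding a network then becomes a one-line change to the list, and a re-run reports changed only for the new route.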
Volumes
Create a NAS volume and mount it into the namespace
Job: carve capacity from an aggregate and mount it at a junction path, so NAS clients can reach it — the layer between the SVM and any share or export. Module: na_ontap_volume (one task). File: create_nfsvol_pri_svm.yml. Depends on: the SVM (10-01).
The lab original — exactly as it runs in the workshop
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false
      use_rest: Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
    - name: Create volume on primary SVM
      na_ontap_volume:
        state: present
        name: "{{ PRI_SVM }}_nfs_01"
        vserver: "{{ PRI_SVM }}"
        size: "{{ VOL_SIZE }}"
        size_unit: mb
        aggregate_name: "{{ PRI_AGGR }}"
        comment: Created with Ansible
        # space_guarantee: volume
        policy: default
        junction_path: "/{{ PRI_SVM }}_nfs_01"
        volume_security_style: unix
        <<: *input
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false  # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Create the NFS volume and mount it in the namespace
      netapp.ontap.na_ontap_volume:
        state: present
        vserver: "{{ PRI_SVM }}"
        name: "{{ PRI_SVM }}_nfs_01"
        aggregate_name: "{{ PRI_AGGR }}"
        size: "{{ VOL_SIZE }}"
        size_unit: gb  # GiB - see note on units below
        junction_path: "/{{ PRI_SVM }}_nfs_01"
        volume_security_style: unix
        space_guarantee: none
        export_policy: default
        comment: "NFS volume - managed by Ansible"
The run
[root@centos1 ansible-workshop]# ansible-playbook create_nfsvol_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create volume on primary SVM] *******************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost : ok=1 changed=1 unreachable=0 failed=0 skipped=0
What to read in this one. Four decisions define a NAS volume. aggregate_name is which physical pool backs it. size + size_unit is how big (two fields, always set both). junction_path is what makes it reachable — an unmounted volume exists but no client can see it, the silent cause of “the export is empty” tickets. And volume_security_style: unix matches the NFS consumers coming next — UNIX mode bits govern permissions, where an SMB volume would use ntfs. With the volume mounted, an export policy is the next layer.
Two substantive fixes, a rename, and a recurring one. Size unit: the original uses size_unit: mb, so with VOL_SIZE at 20 it creates a 20 MB volume, almost certainly not what was intended; the cleaned version uses gb, which is what a NAS volume actually wants. This is the kind of unit slip that passes every syntax check and only surfaces when the share fills in a day. Space guarantee: the original comments out space_guarantee entirely, leaving the choice to ONTAP’s default; the cleaned version states none explicitly, so thin provisioning is intentional rather than implicit. Export policy: renamed from the bare policy: default to the explicit export_policy: default, the unambiguous modern parameter. Credentials move to module_defaults as everywhere else.
The SMB variant: one parameter different. The workshop ships a near-identical 26_create_cifsvol_pri_svm.yml for SMB: same module, same structure, with volume_security_style: ntfs instead of unix so Windows ACLs govern the files. That single field is the functional difference between a NAS volume destined for an NFS export and one destined for an SMB share. The SMB file also sets space_guarantee: volume rather than none. This is the thick option, which reserves the full size from the aggregate up front. Thick guarantees the space can never be over-committed; thin (none) lets you over-provision and monitor. Choose deliberately: thick for workloads that must never hit a space error, thin for density with the monitoring to back it. Set volume_security_style to match the protocol and space_guarantee to match the risk tolerance; everything else about the two volumes is identical.
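Put together, the SMB variant might look like the sketch below. It is reconstructed from the description above rather than copied from 26_create_cifsvol_pri_svm.yml; the volume name and junction path are taken from the share path used in the SMB share entry, and the comment text is a placeholder.

```yaml
# Sketch of the SMB volume: NTFS security style, thick-provisioned.
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false   # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Create the SMB volume and mount it in the namespace
      netapp.ontap.na_ontap_volume:
        state: present
        vserver: "{{ PRI_SVM }}"
        name: "{{ PRI_SVM }}_cifs_01"
        aggregate_name: "{{ PRI_AGGR }}"
        size: "{{ VOL_SIZE }}"
        size_unit: gb
        junction_path: "/{{ PRI_SVM }}_cifs_01"
        volume_security_style: ntfs   # Windows ACLs govern the files
        space_guarantee: volume       # thick: reserve the full size up front
        comment: "SMB volume - managed by Ansible"
```

Switching space_guarantee to none gives the thin-provisioned form if density matters more than a hard space reservation.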
NAS (SMB & NFS)
Join an SVM to Active Directory (create a CIFS server)
Job: give the SVM an SMB identity by joining it to Active Directory, the prerequisite every SMB share depends on and the step most quick-starts skip. Module: na_ontap_cifs_server (one task). File: create_cifs_pri_svm.yml. Depends on: the SVM (10-01) with CIFS allowed, and SVM DNS (10-03).
The lab original — exactly as it runs in the workshop
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false
      use_rest: Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
    - name: Create CIFS server on primary SVM
      na_ontap_cifs_server:
        state: present
        name: "{{ PRI_SVM }}"
        vserver: "{{ PRI_SVM }}"
        domain: "{{ PRI_AD_DOMAIN }}"
        admin_user_name: "{{ PRI_AD_USER }}"
        admin_password: "{{ PRI_AD_PASS }}"
        <<: *input
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false  # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Join the SVM to Active Directory as a CIFS server
      netapp.ontap.na_ontap_cifs_server:
        state: present
        vserver: "{{ PRI_SVM }}"
        name: "{{ PRI_SVM }}_smb"  # the AD computer object + UNC name
        domain: "{{ PRI_AD_DOMAIN }}"
        admin_user_name: "{{ PRI_AD_USER }}"
        admin_password: "{{ PRI_AD_PASS }}"
        service_state: started
        # ou: "OU=Storage,DC=demo,DC=netapp,DC=com"  # place the object precisely
The run
[root@centos1 ansible-workshop]# ansible-playbook create_cifs_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create CIFS server on primary SVM] **************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost : ok=1 changed=1 unreachable=0 failed=0 skipped=0
What to read in this one. This single task performs a real Active Directory domain join: name becomes the computer object created in AD and the server half of the UNC path (\\name\share), domain is the AD domain to join, and admin_user_name / admin_password are the join credentials — an account with rights to create computer objects in the target OU. Those credentials are why global.vars is vault-encrypted: they are domain credentials, not just storage ones. The join happens from the SVM’s network, so the SVM’s DNS must resolve the domain — the most common cause of a failed join, which is why DNS (10-03) runs first.
Two naming and operational refinements. Server name: the original sets the CIFS server name equal to the SVM name — legal, but it makes the AD computer object indistinguishable from the SVM in logs and in vserver cifs show; the cleaned version uses {{ PRI_SVM }}_smb so the SMB identity is its own recognizable thing. Explicit start + OU: the cleaned version adds service_state: started (don’t assume the default) and shows a commented ou: parameter — in a real domain you place the computer object in a specific OU rather than the default Computers container, which is often a security or GPO requirement. The join credentials and domain stay as the original had them; they are already coming from vaulted variables, which is correct.
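Appending _smb to the SVM name has one side effect worth guarding against: a CIFS server name is a NetBIOS name, and NetBIOS names are limited to 15 characters. The sketch below is an optional pre-flight check, not part of the workshop; cifs_server_name is a hypothetical helper variable introduced only for the check.

```yaml
# Sketch: verify the derived CIFS server name before attempting the AD join.
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  vars:
    cifs_server_name: "{{ PRI_SVM }}_smb"   # hypothetical helper variable
  tasks:
    - name: Check the CIFS server name fits the 15-character NetBIOS limit
      ansible.builtin.assert:
        that:
          - cifs_server_name | length <= 15
        fail_msg: >-
          '{{ cifs_server_name }}' is {{ cifs_server_name | length }} characters;
          NetBIOS names allow at most 15
```

With the nas_svm name seen in the run output elsewhere in this guide, nas_svm_smb is 11 characters and passes; a longer SVM name would need a shorter suffix.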
Publish an SMB share
Job: expose a path on the SVM as a Windows file share — the last NAS step, where storage becomes a drive someone can map. Module: na_ontap_cifs (one task). File: create_cifsshare_pri_svm.yml. Depends on: the volume (20-01, ntfs variant) and the CIFS server (30-01).
The lab original — exactly as it runs in the workshop
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false
      use_rest: Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
    - name: Create CIFS share on primary SVM
      na_ontap_cifs:
        state: present
        share_name: "share_01"
        vserver: "{{ PRI_SVM }}"
        path: "/{{ PRI_SVM }}_cifs_01/cifs_01/"
        <<: *input
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false  # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Publish the SMB share
      netapp.ontap.na_ontap_cifs:
        state: present
        vserver: "{{ PRI_SVM }}"
        name: share_01
        path: "/{{ PRI_SVM }}_cifs_01/cifs_01"
        comment: "Project share - managed by Ansible"
The run
[root@centos1 ansible-workshop]# ansible-playbook create_cifsshare_pri_svm.yml --ask-vault-pass
Vault password:

PLAY [localhost] ***************************************************************

TASK [Create CIFS share on primary SVM] ***************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost : ok=1 changed=1 unreachable=0 failed=0 skipped=0

# from a domain-joined Windows client, the share now maps:
C:\> net use Z: \\nas_svm_smb\share_01
The command completed successfully.
What to read in this one. A share is just a name mapped to a path inside the SVM namespace — clients connect to \\server\share_01 and land at the path you specify. The path arithmetic is the part that must line up: the volume mounted at /{{ PRI_SVM }}_cifs_01 (the ntfs volume from 20-01), then the qtree cifs_01 inside it, so the share path is junction + qtree. Point it one segment wrong and the share either fails or publishes the wrong directory. Who can do what on the share is governed separately by NTFS ACLs on the files plus share-level ACLs (na_ontap_cifs_acl); this task only publishes the name.
Two small correctness fixes. Trailing slash: the original’s path: "/..._cifs_01/cifs_01/" carries a trailing slash; ONTAP tolerates it, but share paths are canonically written without one, and a stray slash occasionally trips path-equality checks on re-runs. The cleaned version drops it. Parameter name: the original uses share_name; the module’s current canonical parameter is name (with share_name kept as an alias) — the cleaned version uses name to match the rest of the collection, and adds a comment so the share is self-documenting in vserver cifs share show. Credentials move to module_defaults.
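The share path assumes a qtree named cifs_01 already exists inside the volume. None of the playbooks shown here creates it, so the sketch below fills that gap with na_ontap_qtree. It is an assumption about how the workshop environment could be prepared, not a workshop file; the ntfs security style is chosen to match the SMB volume.

```yaml
# Sketch: create the qtree that the share path (junction + qtree) points at.
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false   # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Create the cifs_01 qtree inside the SMB volume
      netapp.ontap.na_ontap_qtree:
        state: present
        vserver: "{{ PRI_SVM }}"
        flexvol_name: "{{ PRI_SVM }}_cifs_01"   # volume mounted at /{{ PRI_SVM }}_cifs_01
        name: cifs_01                           # second segment of the share path
        security_style: ntfs
```

Run in dependency order, this sits between the volume and the share: volume, then qtree, then share.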
Add an NFS export policy rule
Job: grant NFS clients access by adding a rule to an export policy — the NFS equivalent of publishing a share, and the step that decides who may mount. Module: na_ontap_export_policy_rule (one task). File: create_export_policy_rule.yml. Depends on: the volume (20-01) attached to this policy, plus the NFS service enabled on the SVM.
The lab original — exactly as it runs in the workshop
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false
      use_rest: Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
    - name: Create export policy rule for svm.
      na_ontap_export_policy_rule:
        state: present
        name: default
        vserver: "{{ PRI_SVM }}"
        client_match: 192.168.0.0/24
        ro_rule: any
        rw_rule: any
        protocol: nfs3
        super_user_security: any
        anonymous_user_id: 65534
        <<: *input
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false  # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Add the project network to the export policy, read-write
      netapp.ontap.na_ontap_export_policy_rule:
        state: present
        vserver: "{{ PRI_SVM }}"
        name: default
        client_match: "{{ nfs_client_network }}"  # one CIDR from vars, not the world
        protocol: nfs
        ro_rule: sys  # require AUTH_SYS, not "any"
        rw_rule: sys
        super_user_security: none  # squash root from clients
        allow_suid: false
The run
[root@centos1 ansible-workshop]# ansible-playbook create_export_policy_rule.yml --ask-vault-pass
Vault password:
PLAY [localhost] ***************************************************************
TASK [Create export policy rule for svm.] *************************************
changed: [localhost]
PLAY RECAP *********************************************************************
localhost : ok=1 changed=1 unreachable=0 failed=0 skipped=0
# from a client in the allowed network, the export now mounts:
$ sudo mount -t nfs nas_svm:/nas_svm_nfs_01 /mnt/nfs
$ df -h /mnt/nfs
Filesystem               Size  Used Avail Use% Mounted on
nas_svm:/nas_svm_nfs_01   19G  256K   19G   1% /mnt/nfs
What to read in this one. NFS access control is unlike SMB’s: there is no per-user authentication at mount time by default. Instead an export policy is a named, ordered list of rules, and each rule matches client machines by address (client_match) and grants them read-only (ro_rule) and read-write (rw_rule) access. A volume points at one policy; this task adds a rule to that policy. The empty-policy trap is the classic NFS incident: a volume attached to a policy with no matching rule mounts nowhere, because ONTAP’s default for “no rule matched” is deny.
This is a security entry: the lab rule is wide open and the cleaned one is least privilege. The original sets ro_rule: any / rw_rule: any with a hard-coded 192.168.0.0/24. any accepts every security flavor, including unauthenticated AUTH_NONE, which reads in an audit as “anyone on this subnet has unauthenticated read-write.” The cleaned version requires AUTH_SYS (ro_rule: sys, rw_rule: sys), squashes root from clients (super_user_security: none, so a root user on a workstation becomes the anonymous user on the export and owning a laptop is not owning the data), disables suid, and pulls the network from an nfs_client_network variable instead of a literal. protocol: nfs3 also becomes the broader nfs unless you specifically need to pin v3. The original’s anonymous_user_id: 65534 (the “nobody” uid) is dropped from the cleaned task because 65534 is also ONTAP’s default anonymous uid, so a squashed root still maps to it. Same module, same policy; the difference is entirely in how much it trusts the client.
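The cleaned rule reads its client network from nfs_client_network, a variable the lab files do not define. The following is a minimal sketch of a guard task, assuming you add that variable to global.vars yourself. The variable name comes from the cleaned playbook; the example value and the guard itself are illustrative and not part of the workshop. Placed before the rule task, it stops the play if the variable is missing or would open the export to every address.

```yaml
# Illustrative guard, placed before the export-policy-rule task.
# Assumes global.vars contains a line such as:
#   nfs_client_network: 192.168.0.0/24   # hypothetical value
- name: Refuse to publish an export without a scoped client network
  ansible.builtin.assert:
    that:
      - nfs_client_network is defined
      - nfs_client_network | length > 0
      - nfs_client_network != '0.0.0.0/0'
    fail_msg: "Set nfs_client_network to one CIDR block before running this playbook."
```

The guard is deliberately cheap: it does not validate CIDR syntax, it only catches the two mistakes that would silently widen access.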
S3 / object storage
Stand up an S3 server on an SVM
Job: turn an existing SVM into an S3 endpoint, so object clients (backup tools, cloud-native apps) can talk to the cluster as they would to AWS S3. Module: na_ontap_s3_services (one task). File: create_s3_server.yml.
The lab original — exactly as it runs in the workshop
#!/usr/bin/env ansible-playbook
#
# Procedure to get the certificate name:
# 1. System Manager > Cluster > Settings > Certificates > Client/Server
# 2. Find the cert for the SVM you created (e.g. nas_svm)
# 3. Note it: e.g. nas_svm_97BBD25E15519CA
#
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false
      use_rest: Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
    - name: Create an s3 service on primary SVM
      na_ontap_s3_services:
        state: present
        name: "s3_server"
        vserver: "{{ PRI_SVM }}"
        comment: enabled
        enabled: true
        certificate_name: nas_svm_97BBD25E15519CA
        <<: *input
      # register: result
    # - name: print variables
    #   ansible.builtin.debug:
    #     msg: Access Key {{ result.access_key }} and Secret Key {{ result.secret_key }}
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false   # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Create the S3 server on the SVM
      netapp.ontap.na_ontap_s3_services:
        state: present
        vserver: "{{ PRI_SVM }}"
        name: s3_server
        enabled: true
        comment: "S3 endpoint - managed by Ansible"
        certificate_name: "{{ s3_certificate_name }}"   # from vars, not hard-coded
      register: s3_service
    - name: Show the root-user keys ONCE - store them in your secrets manager now
      ansible.builtin.debug:
        msg:
          - "access_key: {{ s3_service.access_key | default('(unchanged - keys only issued at creation)') }}"
          - "secret_key: {{ s3_service.secret_key | default('(unchanged - keys only issued at creation)') }}"
The run
[root@centos1 ansible-workshop]# ansible-playbook create_s3_server.yml --ask-vault-pass
Vault password:
PLAY [localhost] ***************************************************************
TASK [Create an s3 service on primary SVM] *************************************
changed: [localhost]
PLAY RECAP *********************************************************************
localhost : ok=1 changed=1 unreachable=0 failed=0 skipped=0
What to read in this one. An S3 server is per-SVM, and it needs a server certificate for HTTPS. That certificate is the one prerequisite this task cannot invent: it is created with the SVM, and you supply its name here. The lab header documents exactly where to find it (System Manager → Cluster → Settings → Certificates), because the name is cluster-generated and unguessable. Once the server is enabled, the SVM answers S3 over its data LIF, which is the other piece of plumbing this task assumes already exists.
Companion: the S3 data LIF. S3 clients reach the server over a data interface, created with the same na_ontap_interface module as the management LIF in 10-02 — but with no service_policy, because a plain data LIF takes the SVM’s default. Run this alongside the server:
tasks:
  - name: Create the S3 data LIF clients connect to
    netapp.ontap.na_ontap_interface:
      state: present
      vserver: "{{ PRI_SVM }}"
      interface_name: "{{ PRI_SVM }}_s3_01"
      address: "{{ PRI_SVM_S3_IP }}"
      netmask: "{{ PRI_SVM_S3_NETMASK }}"
      home_node: "{{ PRI_CLU_NODE1 }}"
      home_port: "{{ PRI_DATA_PORT }}"
      # module_defaults supplies the connection block; no service_policy
      # needed - a data LIF inherits the SVM default. For two-path
      # redundancy, add a second LIF on PRI_CLU_NODE2, as the SAN
      # example (60-01) does for iSCSI.
Then creating users and buckets follows — covered in 50-02 through 50-04 below.
Two changes are worth calling out. First, certificate_name is hard-coded to a specific cluster’s generated string. That is fine in a lab, but it makes the file non-portable; the cleaned version reads it from an s3_certificate_name variable so the playbook moves between clusters unchanged. Second, the original’s commented-out register + debug block is the most important part of the whole file: ONTAP issues the S3 root user’s access and secret keys once, at service creation, and never again. The cleaned version restores that capture with a default() guard so re-runs print a calm placeholder instead of failing. Lose those keys and you regenerate, never recover. comment: enabled in the original is also just a stray value (the comment field set to the word “enabled”); the cleaned version gives it a real description.
The enabled flag is a staging control. The workshop ships this file in two forms — one with enabled: true (shown above) and one with enabled: false — because the same task can provision the S3 server without turning it on. That is a deliberate production pattern: create the endpoint during a change window, validate certificate and LIF reachability, then flip enabled: true in a separate run to take it live. Because the module is idempotent, that second run changes only the one field. Same task, two roles — deploy and activate.
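The deploy-then-activate pattern can also live in a single file. This is a sketch rather than the workshop's own approach: s3_enabled is a hypothetical variable introduced here for illustration, and every module parameter is taken from the cleaned playbook above.

```yaml
# Sketch: one task, two runs. s3_enabled is a hypothetical variable;
# when it is not supplied, the server is created but left disabled.
- name: Create the S3 server (staged off unless s3_enabled is set)
  netapp.ontap.na_ontap_s3_services:
    state: present
    vserver: "{{ PRI_SVM }}"
    name: s3_server
    enabled: "{{ s3_enabled | default(false) | bool }}"
    comment: "S3 endpoint - managed by Ansible"
    certificate_name: "{{ s3_certificate_name }}"
```

The first run would be a plain ansible-playbook create_s3_server.yml --ask-vault-pass; the activation run adds -e s3_enabled=true. The bool filter is there because extra variables passed as key=value arrive as strings.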
Create a reusable S3 access policy
Job: define a named, standalone permission set that groups can attach by name — the reusable alternative to the built-in FullAccess and to inline bucket policies. Module: na_ontap_s3_policies (one task). File: create_s3_policy.yml. Used by: the group in 50-03.
The lab original — exactly as it runs in the workshop
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false
      use_rest: Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
    - name: Create an s3 policies
      na_ontap_s3_policies:
        state: present
        name: "{{ S3_Policy }}"
        vserver: "{{ PRI_SVM }}"
        comment: Created with Ansible
        statements:
          - sid: 1
            resources:
              - "*"
            actions:
              - "*"
            effect: allow
        <<: *input
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false   # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Create a scoped, reusable S3 access policy
      netapp.ontap.na_ontap_s3_policies:
        state: present
        vserver: "{{ PRI_SVM }}"
        name: "{{ S3_Policy }}"
        comment: "Read/write to the project bucket - managed by Ansible"
        statements:
          - sid: AllowProjectBucketReadWrite
            effect: allow
            resources:
              - "{{ S3_Bucket }}"
              - "{{ S3_Bucket }}/*"
            actions:
              - GetObject
              - PutObject
              - DeleteObject
              - ListBucket
The run
[root@centos1 ansible-workshop]# ansible-playbook create_s3_policy.yml --ask-vault-pass
Vault password:
PLAY [localhost] ***************************************************************
TASK [Create an s3 policies] ***************************************************
changed: [localhost]
PLAY RECAP *********************************************************************
localhost : ok=1 changed=1 unreachable=0 failed=0 skipped=0
What to read in this one. A standalone policy is the reusable middle layer of the S3 model: define the permission set once, then attach it by name to any group (as in 50-03) instead of repeating inline policy on every bucket. The statement grammar is identical to a bucket policy (resources, actions, effect, optional principals); the difference is only where it lives: a bucket policy is attached to one bucket, a named policy is reusable across groups. This is how you replace the built-in FullAccess that 50-03 flags.
This is the entry where the original is a deliberate anti-example, and the cleaned version is the lesson. The lab file grants actions: ["*"] on resources: ["*"] — every action on every resource, which is FullAccess rewritten by hand and the exact thing a custom policy exists to avoid. It runs, it is valid, and it is wrong for production. The cleaned version scopes both lists: specific actions on the project bucket and its objects only, with a descriptive sid instead of 1. The original’s commented-out lines (the per-bucket resources and a sm_s3_user principal) show the author knew the scoped form — the cleaned version simply uncomments that intent. If you take one habit from this reference, take this one: a wildcard policy is a finding, a scoped policy is a control.
Create an S3 group that binds a user to a policy
Job: grant an S3 user access by placing them in a group that carries an access policy — the group is where identity meets permission. Module: na_ontap_s3_groups (one task). File: create_s3_group.yml. Depends on: the S3 server (50-01) and an S3 user already existing.
The lab original — exactly as it runs in the workshop
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false
      use_rest: Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
    - name: Create a S3 Group
      na_ontap_s3_groups:
        state: present
        name: "{{ S3_Group }}"
        vserver: "{{ PRI_SVM }}"
        users:
          - name: "{{ S3_User }}"
        policies:
          - name: FullAccess
        comment: Created with Ansible
        <<: *input
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false   # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Create the S3 group and bind the user to a policy
      netapp.ontap.na_ontap_s3_groups:
        state: present
        vserver: "{{ PRI_SVM }}"
        name: "{{ S3_Group }}"
        comment: "S3 access group - managed by Ansible"
        users:
          - name: "{{ S3_User }}"
        policies:
          - name: FullAccess   # least privilege: prefer a scoped custom
                               # policy over FullAccess in production
The run
[root@centos1 ansible-workshop]# ansible-playbook create_s3_group.yml --ask-vault-pass
Vault password:
PLAY [localhost] ***************************************************************
TASK [Create a S3 Group] *******************************************************
changed: [localhost]
PLAY RECAP *********************************************************************
localhost : ok=1 changed=1 unreachable=0 failed=0 skipped=0
What to read in this one. ONTAP’s S3 authorization model mirrors AWS IAM: a user is an identity with keys, a policy is a set of permissions, and a group is the join between them — you attach policies to a group and add users to it, rather than granting permissions to users directly. Both users and policies are lists, so one group can hold many members and carry several policies; the module reconciles the group to exactly the membership you declare, which is what makes re-runs safe. FullAccess is a built-in policy — convenient for a lab, and the one line to revisit before production.
This file is already well-formed — the cleaned version only swaps the &input anchor for module_defaults and gives the comment a real description. The substantive note is a security one, flagged inline: policies: [FullAccess] grants the group unrestricted access to every bucket on the SVM. That is fine in the workshop, but in production the group should carry the scoped custom policy from 50-02 instead — read/write to named buckets only. The grammar is identical; you simply replace FullAccess with the name of the policy you defined in 50-02.
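For production, the swap described above might look like the following sketch. It assumes the custom policy from 50-02 already exists on the same SVM under the name held in S3_Policy; everything else is unchanged from the cleaned task.

```yaml
# Sketch: same group task, with the scoped policy from 50-02 in place of FullAccess.
# Assumes the policy named by S3_Policy was created first on this SVM.
- name: Create the S3 group and bind the user to the scoped policy
  netapp.ontap.na_ontap_s3_groups:
    state: present
    vserver: "{{ PRI_SVM }}"
    name: "{{ S3_Group }}"
    comment: "S3 access group - managed by Ansible"
    users:
      - name: "{{ S3_User }}"
    policies:
      - name: "{{ S3_Policy }}"   # custom policy defined in 50-02
```

Because the policy is referenced by name, run order matters: the policy playbook has to run before this one.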
Create an S3 bucket with a least-privilege access policy
Job: create the object container itself and attach a policy that says exactly who can do what to it — the last step that makes the S3 endpoint usable. Module: na_ontap_s3_buckets (one task). File: create_s3_bucket.yml. Depends on: the S3 server (50-01) and the user named in the policy.
The lab original — exactly as it runs in the workshop
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false
      use_rest: Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
    - name: Create a S3 bucket
      na_ontap_s3_buckets:
        state: present
        name: "s3-bucket"
        vserver: "{{ PRI_SVM }}"
        aggregates: "{{ PRI_AGGR }}"
        size: 102005473280
        policy:
          statements:
            - sid: 1
              resources:
                - s3-bucket
                - s3-bucket/*
              actions:
                - GetObject
                - PutObject
                - DeleteObject
                - ListBucket
              effect: allow
              principals:
                - s3_user
        comment: "Container for S3 objects"
        <<: *input
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false   # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Create the S3 bucket with a scoped access policy
      netapp.ontap.na_ontap_s3_buckets:
        state: present
        vserver: "{{ PRI_SVM }}"
        name: "{{ S3_Bucket }}"
        aggregates:
          - "{{ PRI_AGGR }}"
        size: 95   # GiB - see note on size units below
        size_unit: gb
        comment: "Container for S3 objects - managed by Ansible"
        policy:
          statements:
            - sid: AllowAppReadWrite
              effect: allow
              principals:
                - "{{ S3_User }}"
              resources:
                - "{{ S3_Bucket }}"
                - "{{ S3_Bucket }}/*"
              actions:
                - GetObject
                - PutObject
                - DeleteObject
                - ListBucket
The run
[root@centos1 ansible-workshop]# ansible-playbook create_s3_bucket.yml --ask-vault-pass
Vault password:
PLAY [localhost] ***************************************************************
TASK [Create a S3 bucket] ******************************************************
changed: [localhost]
PLAY RECAP *********************************************************************
localhost : ok=1 changed=1 unreachable=0 failed=0 skipped=0
What to read in this one. The bucket policy is the same statement grammar as an AWS S3 policy: a principals list (who), an actions list (what they may do), a resources list (on what), and an effect of allow or deny. The two resource lines are both required and mean different things — s3-bucket is the bucket itself (needed for ListBucket), s3-bucket/* is the objects inside it (needed for Get/Put/Delete). Name only the principals that should have access and only the actions they need; this policy is already close to least privilege, which is exactly right for a bucket.
Three improvements and one tidy-up, one of which matters at 2 a.m. Size units: the original’s size: 102005473280 is a raw byte count. It is correct (exactly 95 GiB, that is 95 × 1024³ bytes) but unreviewable: one extra or missing digit is a 10× mistake nobody catches in review. The cleaned version uses size: 95 + size_unit: gb, which a human can verify at a glance. Descriptive sid: the original’s sid: 1 (a numbered statement, with the descriptive FullAccessTos3-user commented out just above it) becomes a named AllowAppReadWrite, because statement IDs should describe intent so a policy audit reads like English. Hard-coded names: s3-bucket and s3_user are literals in the original; the cleaned version uses the {{ S3_Bucket }} and {{ S3_User }} variables already defined in global.vars, so the file is portable and the bucket name stays consistent between its definition and its policy. The tidy-up: aggregates is written as a proper YAML list, which the module expects.
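The size arithmetic is easy to check inside Ansible itself. The play-level fragment below is a sketch: s3_bucket_size_gib and s3_bucket_size_bytes are hypothetical variable names introduced here, and the assert simply confirms that 95 GiB and the lab's raw byte count are the same number.

```yaml
# Sketch: derive the byte count from a reviewable GiB figure and verify it.
# Both variable names are hypothetical and not part of global.vars.
vars:
  s3_bucket_size_gib: 95
  s3_bucket_size_bytes: "{{ s3_bucket_size_gib * (1024 ** 3) }}"
tasks:
  - name: Check that the readable size matches the lab's raw byte count
    ansible.builtin.assert:
      that:
        - s3_bucket_size_bytes | int == 102005473280
```

If the collection version you run expects the bucket size as a plain byte count, the computed s3_bucket_size_bytes variable could be passed to size directly, which keeps the reviewable number in one place.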
SAN / block
Provision a complete iSCSI SAN service: SVM, LIFs, igroup, volume, LUN, map
Job: from nothing to a Windows host seeing a disk — one playbook builds the SAN tenant and every layer inside it, in dependency order. Modules: na_ontap_svm, na_ontap_iscsi, na_ontap_interface ×2, na_ontap_igroup, na_ontap_volume, na_ontap_lun, na_ontap_lun_map. File: create_san_all.yml
The lab original — exactly as it runs in the workshop
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars:
    input: &input
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false
      # use_rest: Always
  vars_files:
    - global.vars
  collections:
    - netapp.ontap
  tasks:
    - name: Create primary SVM
      na_ontap_svm:
        state: present
        name: "{{ PRI_SVM }}"
        comment: Created with Ansible
        <<: *input
    - name: Enable iSCSI on primary SVM
      na_ontap_iscsi:
        state: present
        vserver: "{{ PRI_SVM }}"
        service_state: started
        <<: *input
    - name: Create iSCSI interface for node01
      na_ontap_interface:
        state: present
        interface_name: "{{ PRI_SVM }}_iscsi_01"
        vserver: "{{ PRI_SVM }}"
        address: "{{ PRI_ISCSI_IP }}"
        netmask: "{{ PRI_SVM_NETMASK }}"
        home_node: "{{ PRI_CLU_NODE1 }}"
        home_port: "{{ PRI_DATA_PORT }}"
        protocols: iscsi
        <<: *input
        firewall_policy: data
        role: data
    - name: Create iSCSI interface for node02
      na_ontap_interface:
        state: present
        interface_name: "{{ PRI_SVM }}_iscsi_02"
        vserver: "{{ PRI_SVM }}"
        address: "{{ SEC_ISCSI_IP }}"
        netmask: "{{ PRI_SVM_NETMASK }}"
        home_node: "{{ PRI_CLU_NODE2 }}"
        home_port: "{{ PRI_DATA_PORT }}"
        protocols: iscsi
        <<: *input
        firewall_policy: data
        role: data
    - name: Create iGroup
      na_ontap_igroup:
        state: present
        name: "{{ IGROUP_NAME }}"
        vserver: "{{ PRI_SVM }}"
        initiator_group_type: iscsi
        ostype: windows
        initiator: "{{ WIN_IQN }}"
        hostname: "{{ PRI_CLU }}"
        username: "{{ PRI_CLU_USER }}"
        password: "{{ PRI_CLU_PASS }}"
        https: true
        validate_certs: false
      with_items: "{{ igroups }}"
      when: igroups != None
    - name: Create volume
      na_ontap_volume:
        state: present
        name: "{{ VOL_NAME }}"
        aggregate_name: "{{ PRI_AGGR }}"
        size: "{{ VOL_SIZE }}"
        size_unit: gb
        space_guarantee: none
        vserver: "{{ PRI_SVM }}"
        hostname: "{{ PRI_CLU }}"
        username: "{{ PRI_CLU_USER }}"
        password: "{{ PRI_CLU_PASS }}"
        https: true
        validate_certs: false
      with_items: "{{ luns }}"
    - name: Lun Create
      na_ontap_lun:
        state: present
        name: "{{ LUN_NAME }}"
        flexvol_name: "{{ VOL_NAME }}"
        vserver: "{{ PRI_SVM }}"
        size: "{{ LUN_SIZE }}"
        size_unit: gb
        ostype: windows
        space_reserve: false
        hostname: "{{ PRI_CLU }}"
        username: "{{ PRI_CLU_USER }}"
        password: "{{ PRI_CLU_PASS }}"
        https: true
        validate_certs: false
      with_items: "{{ luns }}"
      when: luns != None
    - name: Create LUN mapping
      na_ontap_lun_map:
        state: present
        initiator_group_name: "{{ IGROUP_NAME }}"
        path: "/vol/{{ VOL_NAME }}/{{ LUN_NAME }}"
        vserver: "{{ PRI_SVM }}"
        hostname: "{{ PRI_CLU }}"
        username: "{{ PRI_CLU_USER }}"
        password: "{{ PRI_CLU_PASS }}"
        https: true
        validate_certs: false
      with_items: "{{ luns }}"
      when: luns != None
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: false   # lab certificate - set true in production
      use_rest: always
  tasks:
    - name: Create primary SAN SVM
      netapp.ontap.na_ontap_svm:
        state: present
        name: "{{ PRI_SVM }}"
        comment: "Created with Ansible"
        services:
          iscsi:
            allowed: true
    - name: Enable iSCSI service on the SVM
      netapp.ontap.na_ontap_iscsi:
        state: present
        vserver: "{{ PRI_SVM }}"
        service_state: started
    - name: Create iSCSI data LIF on node01
      netapp.ontap.na_ontap_interface:
        state: present
        vserver: "{{ PRI_SVM }}"
        interface_name: "{{ PRI_SVM }}_iscsi_01"
        address: "{{ PRI_ISCSI_IP }}"
        netmask: "{{ PRI_SVM_NETMASK }}"
        home_node: "{{ PRI_CLU_NODE1 }}"
        home_port: "{{ PRI_DATA_PORT }}"
        service_policy: default-data-iscsi
    - name: Create iSCSI data LIF on node02
      netapp.ontap.na_ontap_interface:
        state: present
        vserver: "{{ PRI_SVM }}"
        interface_name: "{{ PRI_SVM }}_iscsi_02"
        address: "{{ SEC_ISCSI_IP }}"
        netmask: "{{ PRI_SVM_NETMASK }}"
        home_node: "{{ PRI_CLU_NODE2 }}"
        home_port: "{{ PRI_DATA_PORT }}"
        service_policy: default-data-iscsi
    - name: Create initiator group for the Windows host
      netapp.ontap.na_ontap_igroup:
        state: present
        vserver: "{{ PRI_SVM }}"
        name: "{{ IGROUP_NAME }}"
        initiator_group_type: iscsi   # same parameter name the lab original uses
        os_type: windows
        initiator_names:
          - "{{ WIN_IQN }}"
    - name: Create the volume that will hold the LUN
      netapp.ontap.na_ontap_volume:
        state: present
        vserver: "{{ PRI_SVM }}"
        name: "{{ VOL_NAME }}"
        aggregate_name: "{{ PRI_AGGR }}"
        size: "{{ VOL_SIZE }}"
        size_unit: gb
        space_guarantee: none
    - name: Create the LUN inside the volume
      netapp.ontap.na_ontap_lun:
        state: present
        vserver: "{{ PRI_SVM }}"
        flexvol_name: "{{ VOL_NAME }}"
        name: "{{ LUN_NAME }}"
        size: "{{ LUN_SIZE }}"
        size_unit: gb
        os_type: windows
        space_reserve: false
    - name: Map the LUN to the initiator group
      netapp.ontap.na_ontap_lun_map:
        state: present
        vserver: "{{ PRI_SVM }}"
        path: "/vol/{{ VOL_NAME }}/{{ LUN_NAME }}"
        initiator_group_name: "{{ IGROUP_NAME }}"
The run
[root@centos1 ansible-workshop]# ansible-playbook create_san_all.yml --ask-vault-pass
Vault password:
PLAY [localhost] ***************************************************************
TASK [Create primary SVM] ******************************************************
changed: [localhost]
TASK [Enable iSCSI on primary SVM] *********************************************
changed: [localhost]
TASK [Create iSCSI interface for node01] ***************************************
changed: [localhost]
TASK [Create iSCSI interface for node02] ***************************************
changed: [localhost]
TASK [Create iGroup] ***********************************************************
changed: [localhost] => (item=igroup1)
TASK [Create volume] ***********************************************************
changed: [localhost] => (item=lun1)
TASK [Lun Create] **************************************************************
changed: [localhost] => (item=lun1)
TASK [Create LUN mapping] ******************************************************
changed: [localhost] => (item=lun1)
PLAY RECAP *********************************************************************
localhost : ok=8 changed=8 unreachable=0 failed=0 skipped=0
What to read in this one. The task order is the dependency stack — tenant → protocol service → network reachability → access control → capacity → LUN → map; shuffle it and ONTAP refuses the forward references. Two LIFs, one per node, is the SAN minimum for path redundancy: Windows MPIO sees both and survives a node takeover. os_type: windows appears on both the igroup and the LUN deliberately (it controls SCSI geometry and alignment), and space_guarantee: none on the volume paired with space_reserve: false on the LUN is the thin-provisioned default — monitor real usage, as in the performance playbook. After the run, the Windows host needs only an iSCSI target portal pointed at either LIF IP, then a rescan.
Three differences, each a habit worth keeping. Credentials: the original’s first four tasks use the &input anchor but the last four repeat hostname/username/password longhand — the cleaned version uses module_defaults for the whole collection, so no task can drift or forget them. LIF parameters: the original’s role: data + firewall_policy: data are older interface fields, which is why its use_rest line is commented out; the REST-native replacement is a single service_policy: default-data-iscsi, and with it the playbook runs cleanly with use_rest: always. Accidental loops: with_items: "{{ igroups }}" with when: != None iterates a single string — harmless here, but for one resource write one task and reserve loop: for real lists. The cleaned version produces the identical cluster state; it just ages without breaking.
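When there genuinely are several LUNs, the loop belongs on a real list. The following is a sketch of that case, not part of the workshop: it assumes luns is redefined as a list of dictionaries (the name and size keys are illustrative), while the module parameters are the ones used in the cleaned playbook.

```yaml
# Sketch: luns is assumed to be a list of dictionaries in global.vars, e.g.
#   luns:
#     - { name: lun1, size: 10 }
#     - { name: lun2, size: 20 }
# The entry keys (name, size) are hypothetical.
- name: Create one LUN per entry in the list
  netapp.ontap.na_ontap_lun:
    state: present
    vserver: "{{ PRI_SVM }}"
    flexvol_name: "{{ VOL_NAME }}"
    name: "{{ item.name }}"
    size: "{{ item.size }}"
    size_unit: gb
    os_type: windows
    space_reserve: false
  loop: "{{ luns | default([], true) }}"   # empty or undefined list means zero iterations
  loop_control:
    label: "{{ item.name }}"
```

The default([], true) filter replaces the original's when: luns != None guard: an undefined or empty variable becomes an empty list, so the task is skipped without a separate condition.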
Data protection
Set up cross-cluster SnapMirror replication
Job: replicate a volume from one cluster to another for disaster recovery — peer the clusters, peer the SVMs, create the destination, and establish the SnapMirror relationship. Modules: na_ontap_cluster_peer, na_ontap_vserver_peer, na_ontap_volume (type DP), na_ontap_snapmirror. File: snapmirror-create.yml.
The lab original — exactly as it runs in the workshop
---
- hosts: localhost
  name: Snapmirror Create
  gather_facts: false
  vars:
    src_ontap: 192.168.0.101
    src_lif: 192.168.0.120,192.168.0.121
    src_vserver: svm1
    src_volume: vol1
    dst_ontap: 192.168.0.102
    dst_name: cluster2
    dst_lif: 192.168.0.122
    dst_aggr: aggr1_cluster2_01_data
    dst_vserver: svm2
    dst_volume: vol1_sm
    username: admin          # !! plaintext credentials in the playbook
    password: Netapp1!       # !! - see the security note below
    passphrase: Netapp123    # !!
  tasks:
    - name: Create cluster peer
      na_ontap_cluster_peer:
        state: present
        source_intercluster_lifs: "{{ src_lif }}"
        dest_intercluster_lifs: "{{ dst_lif }}"
        passphrase: "{{ passphrase }}"
        hostname: "{{ src_ontap }}"
        dest_hostname: "{{ dst_ontap }}"
        username: "{{ username }}"
        password: "{{ password }}"
        https: true
        validate_certs: false
    - name: sleep for 20 seconds and continue with play
      wait_for:
        timeout: 20
      delegate_to: localhost
    - name: Source vserver peer create
      na_ontap_vserver_peer:
        state: present
        peer_vserver: "{{ dst_vserver }}"
        peer_cluster: "{{ dst_name }}"
        vserver: "{{ src_vserver }}"
        applications: snapmirror
        hostname: "{{ src_ontap }}"
        dest_hostname: "{{ dst_ontap }}"
        username: "{{ username }}"
        password: "{{ password }}"
        https: true
        validate_certs: false
    - name: sleep for 20 seconds and continue with play
      wait_for:
        timeout: 20
      delegate_to: localhost
    - name: Validate destination FlexVol
      na_ontap_volume:
        state: present
        name: "{{ dst_volume }}"
        aggregate_name: "{{ dst_aggr }}"
        size: 1
        size_unit: gb
        type: DP
        vserver: "{{ dst_vserver }}"
        hostname: "{{ dst_ontap }}"
        username: "{{ username }}"
        password: "{{ password }}"
        https: true
        validate_certs: false
    - name: Create SnapMirror
      na_ontap_snapmirror:
        state: present
        source_volume: "{{ src_volume }}"
        destination_volume: "{{ dst_volume }}"
        source_vserver: "{{ src_vserver }}"
        destination_vserver: "{{ dst_vserver }}"
        hostname: "{{ dst_ontap }}"
        username: "{{ username }}"
        password: "{{ password }}"
        https: true
        validate_certs: false
The WUC-cleaned version — what we would run in production
#!/usr/bin/env ansible-playbook
- hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars   # ALL credentials live here, vault-encrypted
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      username: "{{ PRI_CLU_USER }}"   # same admin on both clusters here;
      password: "{{ PRI_CLU_PASS }}"   # use SEC_CLU_* if they differ
      https: true
      validate_certs: false   # lab certificate - true in production
      use_rest: always
  tasks:
    - name: Peer the two clusters
      netapp.ontap.na_ontap_cluster_peer:
        state: present
        hostname: "{{ PRI_CLU }}"
        dest_hostname: "{{ SEC_CLU }}"
        source_intercluster_lifs: "{{ PRI_CLU_IC1_IP }},{{ PRI_CLU_IC2_IP }}"
        dest_intercluster_lifs: "{{ SEC_CLU_IC1_IP }}"
        passphrase: "{{ peer_passphrase }}"   # vaulted, not literal
    - name: Peer the source and destination SVMs for snapmirror
      netapp.ontap.na_ontap_vserver_peer:
        state: present
        hostname: "{{ PRI_CLU }}"
        dest_hostname: "{{ SEC_CLU }}"
        vserver: "{{ PRI_SVM }}"
        peer_vserver: "{{ SEC_SVM }}"
        peer_cluster: "{{ SEC_CLU_NAME }}"
        applications: snapmirror
    - name: Create the destination DP volume
      netapp.ontap.na_ontap_volume:
        state: present
        hostname: "{{ SEC_CLU }}"
        vserver: "{{ SEC_SVM }}"
        name: "{{ dst_volume }}"
        aggregate_name: "{{ SEC_AGGR }}"
        size: 1
        size_unit: gb
        type: DP   # data-protection destination - no junction path
    - name: Establish and initialize the SnapMirror relationship
      netapp.ontap.na_ontap_snapmirror:
        state: present
        hostname: "{{ SEC_CLU }}"   # snapmirror is driven from the destination
        source_endpoint:
          path: "{{ PRI_SVM }}:{{ src_volume }}"
        destination_endpoint:
          path: "{{ SEC_SVM }}:{{ dst_volume }}"
        policy: MirrorAllSnapshots
        initialize: true
The run
[root@centos1 ansible-workshop]# ansible-playbook snapmirror-create.yml --ask-vault-pass
Vault password:
PLAY [Snapmirror Create] *******************************************************
TASK [Create cluster peer] *****************************************************
changed: [localhost]
TASK [sleep for 20 seconds and continue with play] ****************************
ok: [localhost]
TASK [Source vserver peer create] *********************************************
changed: [localhost]
TASK [sleep for 20 seconds and continue with play] ****************************
ok: [localhost]
TASK [Validate destination FlexVol] *******************************************
changed: [localhost]
TASK [Create SnapMirror] *******************************************************
changed: [localhost]
PLAY RECAP *********************************************************************
localhost : ok=6 changed=4 unreachable=0 failed=0 skipped=0
What to read in this one. SnapMirror has a four-step dependency chain and the order is mandatory: cluster peer (the two clusters authenticate to each other with a shared passphrase) → SVM peer (the two SVMs authorize SnapMirror traffic between them) → destination volume (created as type: DP, a data-protection volume that is read-only and takes no junction path because it exists only to receive replicated data) → the relationship itself, which is driven from the destination cluster (note that the destination-volume and SnapMirror tasks both target dst_ontap/SEC_CLU as their hostname, while the two peering tasks target the source). The relationship pulls from source to destination; the destination is in charge. initialize: true performs the first full baseline transfer.
The security fix is the headline. The lab original declares username: admin, password: Netapp1!, and a cluster-peer passphrase: Netapp123 as plaintext literals in the playbook’s own vars block — committed to the repository, visible in every clone, exactly the failure Ansible Vault exists to prevent. The cleaned version moves all three into the vault-encrypted global.vars (the passphrase as peer_passphrase) and supplies the shared credentials once via module_defaults. Nothing secret remains in the playbook. Two correctness notes. The wait_for: timeout “sleep” tasks are a fragile timing hack — peering usually propagates in seconds, but a fixed 20-second sleep both wastes time when it is ready sooner and fails when it is not; the production pattern is a retries/until loop that polls the peer state, though a short sleep is tolerable for a lab. And the modern na_ontap_snapmirror uses source_endpoint/destination_endpoint path syntax rather than the older flat source_volume/destination_volume fields — both work, the endpoint form is current. The dependency order and the destination-driven model are correct in the original and kept exactly.
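One way to replace the fixed sleep is to let the dependent task retry itself. This is a sketch of a variant, not the workshop's method: instead of querying the peer state directly, it re-attempts the SVM peering until it succeeds, using Ansible's built-in register/until/retries/delay keywords. The retry counts are illustrative, and svm_peer is a hypothetical register name.

```yaml
# Sketch: retry the SVM peering instead of sleeping a fixed 20 seconds.
# Module parameters are copied from the cleaned playbook; the retry
# values are illustrative and should be tuned for your environment.
- name: Peer the source and destination SVMs, retrying until the cluster peer is ready
  netapp.ontap.na_ontap_vserver_peer:
    state: present
    hostname: "{{ PRI_CLU }}"
    dest_hostname: "{{ SEC_CLU }}"
    vserver: "{{ PRI_SVM }}"
    peer_vserver: "{{ SEC_SVM }}"
    peer_cluster: "{{ SEC_CLU_NAME }}"
    applications: snapmirror
  register: svm_peer
  until: svm_peer is succeeded
  retries: 6   # give up after six further attempts
  delay: 5     # seconds between attempts
```

The task proceeds as soon as peering works and fails loudly if it never does, which addresses both weaknesses of the fixed sleep. The trade-off is that a genuine misconfiguration is also retried before the failure surfaces.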
Enterprise automation scenarios
The fourteen examples above are building blocks. What follows is how storage teams actually compose them in production — the patterns that turn a folder of playbooks into an operating model. Each scenario names the modules involved and the principle that makes it scale.
Automated storage provisioning at scale
The problem. A project needs forty volumes with consistent naming, size, and export policy. Created by hand in System Manager, that is forty opportunities for a typo and no record of intent. The pattern: declare the volumes as data and let one task loop over them — the list is the documentation, and the diff when you add the forty-first volume is your change record.
```yaml
tasks:
  - name: Provision the project volumes from a declared list
    netapp.ontap.na_ontap_volume:
      state: present
      vserver: "{{ PRI_SVM }}"
      name: "{{ item.name }}"
      aggregate_name: "{{ item.aggr | default(PRI_AGGR) }}"
      size: "{{ item.size }}"
      size_unit: gb
      junction_path: "/{{ item.name }}"
      export_policy: "{{ item.policy | default('default') }}"
      space_guarantee: none
      comment: "Project volume - managed by Ansible"
    loop:
      - { name: proj_data_01, size: 500 }
      - { name: proj_data_02, size: 500 }
      - { name: proj_logs_01, size: 100, policy: restricted }
      # ... extend the list; one line per volume, reviewed in Git
```
Scaling further, the list moves out of the playbook entirely — into a group_vars file, a CSV the team edits, or a CMDB query — and the same task provisions any number of volumes. The module’s idempotency means re-running after adding entries touches only the new volumes. Modules: na_ontap_volume, often preceded by na_ontap_aggregate when capacity pools are provisioned in the same run.
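One way to move the list out of the playbook is a group_vars file that the task loops over. The sketch below is illustrative: the file path and the project_volumes variable name are hypothetical, and connection details are assumed to come from module_defaults.

```yaml
# group_vars/all/volumes.yml (hypothetical path and variable name)
project_volumes:
  - { name: proj_data_01, size: 500 }
  - { name: proj_data_02, size: 500 }
  - { name: proj_logs_01, size: 100, policy: restricted }
---
# Task fragment: the same module call, now driven by the external list.
- name: Provision every volume declared in group_vars
  netapp.ontap.na_ontap_volume:
    state: present
    vserver: "{{ PRI_SVM }}"
    name: "{{ item.name }}"
    aggregate_name: "{{ item.aggr | default(PRI_AGGR) }}"
    size: "{{ item.size }}"
    size_unit: gb
    junction_path: "/{{ item.name }}"
    export_policy: "{{ item.policy | default('default') }}"
  loop: "{{ project_volumes }}"
  loop_control:
    label: "{{ item.name }}"   # keep the per-item log line short
```

Adding a volume is then a one-line change to the vars file, and the task itself never needs editing.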
VMware datastore deployment
The problem. Standing up vSphere datastores means provisioning storage that ESXi can consume — an NFS export the hosts mount, or an iSCSI LUN they claim — and doing it identically across a cluster of hosts. The pattern: the NAS and SAN examples in this guide are the storage half; the playbook provisions the volume and the access layer (export policy scoped to the ESXi management network for NFS, or a LUN mapped to an igroup of the hosts’ IQNs for iSCSI/VMFS), and the vSphere side is automated separately with the community.vmware collection. The two run in one pipeline: ONTAP provisions, vCenter mounts. Modules: na_ontap_volume, na_ontap_export_policy (NFS datastores) or na_ontap_lun + na_ontap_igroup + na_ontap_lun_map (VMFS), with the ESXi management subnet as the client_match or the hosts’ IQNs as the igroup initiators.
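For the NFS datastore case, the access layer might look like the sketch below. The policy name and the 192.0.2.0/24 subnet are placeholders, and note that client_match is set on the rule module (na_ontap_export_policy_rule) rather than on the policy itself. ESXi mounts NFS datastores as root, so the sketch assumes superuser access must be granted for sys.

```yaml
# Sketch: export policy for an NFS datastore, scoped to the ESXi management subnet.
# Assumes connection parameters are supplied via module_defaults.
- name: Create the export policy for ESXi datastores
  netapp.ontap.na_ontap_export_policy:
    state: present
    name: esxi_datastores          # hypothetical policy name
    vserver: "{{ PRI_SVM }}"

- name: Allow only the ESXi management subnet
  netapp.ontap.na_ontap_export_policy_rule:
    state: present
    name: esxi_datastores          # the policy this rule belongs to
    vserver: "{{ PRI_SVM }}"
    client_match: 192.0.2.0/24     # placeholder; use your ESXi management subnet
    protocol: nfs
    ro_rule: sys
    rw_rule: sys
    super_user_security: sys       # ESXi mounts as root, so root must not be squashed
```

The datastore volume then references this policy through the export_policy parameter of na_ontap_volume.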
Multi-tenant SVM automation
The problem. Onboarding a new customer or department means a repeatable bundle: an SVM, its network identity, DNS, a route, the protocols they bought, and a starter volume — identical every time, auditable, and deletable as a unit. The pattern: the section-10 examples are exactly this bundle. Wrap them in a role (or a single playbook driven by per-tenant variables) and onboarding becomes one command with one vars file; the tenant’s entire definition lives in version control.
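The per-tenant variables idea can be sketched with one vars file per tenant and a task fragment that loads it. Every name here (the tenants/ directory, the tenant dictionary, the tenant_file variable) is hypothetical, and only the SVM step of the bundle is shown. Offboarding in practice also has to remove the tenant's volumes and LIFs before the SVM can be deleted.

```yaml
# tenants/acme.yml (hypothetical per-tenant definition kept in Git)
tenant:
  svm: acme_svm
  state: present            # change to absent to offboard
  comment: "Tenant acme - managed by Ansible"
---
# Task fragment; run the playbook with: -e tenant_file=tenants/acme.yml
# Assumes connection parameters are supplied via module_defaults.
- name: Load the tenant definition
  ansible.builtin.include_vars:
    file: "{{ tenant_file }}"

- name: Ensure the tenant SVM matches its declared state
  netapp.ontap.na_ontap_svm:
    state: "{{ tenant.state }}"
    name: "{{ tenant.svm }}"
    comment: "{{ tenant.comment }}"
```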
Figure 02 · Multi-tenant SVM provisioning from one parameterized playbook
Setting state: absent offboards a tenant cleanly.
Disaster recovery automation
The problem. DR is only real if it is tested, and manual SnapMirror failover is too error-prone to test often. The pattern: the SnapMirror example (70-01) establishes replication as code; the same collection orchestrates the failover and the periodic test. A DR runbook becomes a playbook: quiesce and break the relationship to activate the destination, or in a test, clone the destination and verify the data without disturbing replication. Because the relationship is declared, drift in the protection topology is detectable on a schedule.
Figure 03 · SnapMirror disaster-recovery workflow
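The activation step of such a runbook might be sketched as below. SEC_SVM, src_vol, and dst_vol are assumed variable names that do not appear in the original. The sketch relies on the relationship_state option of na_ontap_snapmirror, where broken quiesces and breaks the relationship so the destination becomes writable.

```yaml
# Sketch: activate the DR copy by breaking the SnapMirror relationship.
# Runs against the destination cluster, as every SnapMirror task does.
- name: Break the relationship to make the destination writable
  netapp.ontap.na_ontap_snapmirror:
    state: present
    relationship_state: broken
    hostname: "{{ SEC_CLU }}"
    source_endpoint:
      path: "{{ PRI_SVM }}:{{ src_vol }}"   # src_vol is a hypothetical variable
    destination_endpoint:
      path: "{{ SEC_SVM }}:{{ dst_vol }}"   # SEC_SVM and dst_vol are assumed names
```

For a non-disruptive test, the alternative named above is to clone the destination volume and validate the clone, which leaves replication untouched.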
Storage-as-Code
The principle. Everything above shares one idea: the cluster’s configuration lives in Git, not in an administrator’s memory or a wiki page that drifts. Playbooks, inventory, and the (vault-encrypted) variables are version-controlled; changes go through pull requests and review; a bad change rolls back with a revert instead of an archaeology session. The repository becomes the most accurate description of the estate that exists — and because the playbooks are idempotent, that description is enforceable, not just documentary. This is the difference between automation and Storage-as-Code: automation runs commands faster, Storage-as-Code makes the desired state auditable and self-correcting.
Day-2 operations
The principle. Provisioning is day one; the value compounds on day two. Routine administration — growing a volume, rotating snapshots, standardizing a snapshot policy across a fleet, adding an export rule, reading performance — all become small, reviewable, repeatable playbooks. A snapshot policy applied through na_ontap_snapshot_policy is identical on every volume because the same code created it; a capacity increase is a one-line diff with an audit trail. The read-only na_ontap_rest_info turns into a nightly reporting and drift-detection job. Day-2 is where the discipline pays for itself, because the work that used to be a hundred manual clicks per week becomes a scheduled pipeline nobody has to remember.
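As an illustration of a small day-2 playbook, the sketch below declares one snapshot policy and attaches it to a volume. The policy name and retention counts are hypothetical examples, and connection details are assumed to come from module_defaults.

```yaml
# Sketch: one snapshot policy, declared once and applied by name.
- name: Standardize the snapshot policy
  netapp.ontap.na_ontap_snapshot_policy:
    state: present
    name: std_hourly_daily_weekly      # hypothetical policy name
    vserver: "{{ PRI_SVM }}"
    schedule: [hourly, daily, weekly]
    count: [6, 7, 4]                   # example retention, one count per schedule
    enabled: true

- name: Attach the policy to a volume
  netapp.ontap.na_ontap_volume:
    state: present
    name: proj_data_01
    vserver: "{{ PRI_SVM }}"
    snapshot_policy: std_hourly_daily_weekly
```

Changing retention across the fleet is then an edit to the count list, reviewed like any other diff.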
ONTAP automation in CI/CD pipelines
The natural home for these playbooks is a pipeline, not an engineer’s laptop. Committing a change to the storage repository triggers the same sequence every enterprise CI/CD system supports: lint and syntax-check the YAML, run the playbook in --check mode against the cluster to preview changes without making them, require an approval, then apply. The vault password comes from the pipeline’s secret store (never the repo), and every run is logged with who triggered it and what changed.
Figure 04 · CI/CD pipeline driving ONTAP automation
The --check stage and approval gate are what make applying storage changes from a pipeline safe.
Concretely, the apply stage runs ansible-playbook site.yml --vault-password-file "$VAULT_PASS_FILE", where the runner writes the vault password to a short-lived file from its secret store and deletes it afterward. The --check stage runs the identical command with --check --diff and posts the would-change output to the merge request, so reviewers approve a concrete plan, not a hope. This is the production form of the discipline every cleaned playbook in this guide is written for.
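The stages described here could be laid out as in the following sketch. GitLab CI syntax is an assumption (the text does not name a CI product), and the step that posts the --check output to the merge request is omitted. The runner image is assumed to have ansible, ansible-lint, and the netapp.ontap collection installed.

```yaml
# .gitlab-ci.yml sketch (GitLab syntax assumed; adapt to your CI system)
stages: [lint, check, apply]

lint:
  stage: lint
  script:
    - ansible-lint site.yml
    - ansible-playbook site.yml --syntax-check --vault-password-file "$VAULT_PASS_FILE"

check:
  stage: check
  script:
    # Preview only: reports what would change without changing it.
    - ansible-playbook site.yml --check --diff --vault-password-file "$VAULT_PASS_FILE"

apply:
  stage: apply
  rules:
    # Approval gate: a person must trigger the apply on the default branch.
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      when: manual
  script:
    - ansible-playbook site.yml --vault-password-file "$VAULT_PASS_FILE"
```

In GitLab, defining VAULT_PASS_FILE as a File-type CI/CD variable gives the job a path to a temporary file holding the secret, which matches the short-lived file described above.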
Troubleshooting ONTAP Ansible automation
Nearly every failure in the first month of running these playbooks falls into one of five buckets. Each announces itself with a recognizable message; match the symptom and apply the fix.
Authentication and RBAC failures
| Symptom | Root cause | Resolution |
|---|---|---|
| 401 Unauthorized / Invalid credentials | Wrong username or password, or the wrong cluster hostname | Verify the vaulted PRI_CLU_USER / PRI_CLU_PASS against a manual login; confirm hostname is the cluster management LIF, not a node or data LIF. |
| 403 Forbidden on a specific resource | The account authenticates but its role lacks rights to that API | The automation account needs a role with access to the relevant REST endpoints. Use a dedicated admin-scoped account, or a custom role granting only the API paths your playbooks touch: least privilege, but sufficient. |
| User is not authorized for http application | The account exists but the http application is not enabled for it | The REST API requires the user to have the http (and typically ontapi) application enabled. Confirm with security login show; the automation account must be permitted to authenticate over HTTP. |
The durable fix is a purpose-built automation account: a dedicated cluster user, http-enabled, with a role scoped to what the playbooks actually do, and its password in the vault. Never automate as the same interactive admin a human logs in with — you want the audit log to distinguish “the pipeline did this” from “a person did this.”
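Such an account can itself be created as code. The sketch below uses na_ontap_user. The account name, the role name, and the automation_pass variable are hypothetical, and the role is assumed to exist already. It also assumes that over REST a cluster-scoped account is created by omitting vserver, which is worth confirming against the module documentation for your collection version.

```yaml
# Sketch: a dedicated, http-enabled automation account.
# Run once with an administrator credential; the pipeline then uses the new account.
- name: Create the dedicated automation account
  netapp.ontap.na_ontap_user:
    state: present
    name: svc_ansible                      # hypothetical account name
    applications: [http, ontapi]
    authentication_method: password
    set_password: "{{ automation_pass }}"  # hypothetical vaulted variable
    role_name: ansible_automation          # hypothetical role, created beforehand
  no_log: true                             # keep the password out of task output
```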
TLS and certificate issues
The single most common line to get wrong is validate_certs. Every lab original in this guide sets validate_certs: false because lab clusters ship with self-signed certificates and verification would fail. That setting is a lab convenience and a production liability: with verification off, anything that can intercept the HTTPS session can impersonate the cluster and harvest the admin credential the playbook sends. The production fix is not to keep disabling it — it is to install a CA-signed (or trusted internal-CA) certificate on the cluster management LIF and set validate_certs: true. If you must run against a self-signed certificate temporarily, treat validate_certs: false the way you treat any other security exception: scoped to one environment, documented, and never copied into the production vars file. Symptoms to expect: SSL: CERTIFICATE_VERIFY_FAILED means verification is on and the certificate is not trusted — the right answer is to fix the trust, not to disable the check.
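Because the cleaned playbooks supply connection settings through module_defaults, certificate validation can be set in one place. The play below is a sketch: the group name group/netapp.ontap.netapp_ontap is the collection's action group, and the single read-only task is there only to show a call that succeeds once the certificate chain is trusted.

```yaml
# Sketch: validate_certs set once for every netapp.ontap module in the play.
- name: Example play with certificate validation enabled
  hosts: localhost
  gather_facts: false
  vars_files:
    - global.vars
  module_defaults:
    group/netapp.ontap.netapp_ontap:
      hostname: "{{ PRI_CLU }}"
      username: "{{ PRI_CLU_USER }}"
      password: "{{ PRI_CLU_PASS }}"
      https: true
      validate_certs: true     # production setting; requires a trusted certificate
      use_rest: always
  tasks:
    - name: Read node information to prove the TLS session verifies
      netapp.ontap.na_ontap_rest_info:
        gather_subset:
          - cluster/nodes
```

If the control node does not trust the issuing CA, this is the run that fails with CERTIFICATE_VERIFY_FAILED, which is the signal to fix trust rather than flip the flag.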
Connectivity problems
Before suspecting Ansible, prove the layer beneath it. A module that hangs or times out is almost always a network problem, not a code problem. Check, in order: DNS — can the control node resolve the cluster management name? (nslookup cluster1.demo.netapp.com); reachability — does curl -k https://cluster1.demo.netapp.com/api/cluster return JSON, or hang? A hang is a firewall or routing problem between the control node and the management LIF on port 443; the management LIF itself — is it up and is the cluster healthy? The control node needs HTTPS (443) to the cluster management LIF specifically; reaching a node management or data LIF is not the same thing. This is the same layer-isolation discipline that applies to the per-SVM DNS and routing examples earlier — fix the network below before debugging the automation above.
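The same checks can be run as a preflight play using only built-in modules. This is a sketch: it treats an HTTP 401 as success, on the assumption that an unauthenticated request reaching the REST endpoint is answered with 401, which still proves the path works. Certificate validation is off here only to mirror curl -k.

```yaml
# Sketch: prove name resolution, port 443, and the REST endpoint before any module runs.
- name: Preflight the path to the cluster management LIF
  hosts: localhost
  gather_facts: false
  vars:
    cluster_mgmt: cluster1.demo.netapp.com
  tasks:
    - name: Confirm the name resolves and TCP 443 is reachable
      ansible.builtin.wait_for:
        host: "{{ cluster_mgmt }}"
        port: 443
        timeout: 10

    - name: Confirm the REST endpoint answers
      ansible.builtin.uri:
        url: "https://{{ cluster_mgmt }}/api/cluster"
        validate_certs: false      # mirrors curl -k; this is a reachability test only
        status_code: [200, 401]    # 401 means the endpoint answered without credentials
        timeout: 10
```

A timeout on the first task points at DNS, routing, or a firewall. A failure only on the second points at the web service on the LIF.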
Common REST API errors
| Message pattern | Root cause | Resolution |
|---|---|---|
| job reported error ... duplicate entry / already exists | A non-idempotent parameter, or creating something that exists with different immutable attributes | Most modules are idempotent; this usually means an immutable field (an aggregate, a security style) differs from the existing object. Read current state with na_ontap_rest_info and reconcile, rather than forcing. |
| svm not found / aggregate not found | A dependency does not exist, or a name differs between vars and cluster | Check the dependency order (SVM before its volumes, CIFS server before its shares) and verify the name in global.vars matches the cluster exactly. Names are the most common drift between lab and production. |
| missing required arguments | A required parameter is unset, often an empty variable | A variable referenced in the playbook is empty or undefined in global.vars. Run with --check first; it surfaces undefined-variable errors before any change is attempted. |
| ModuleNotFoundError before any API call | The collection or a Python library is missing from Ansible’s environment | Install the collection (ansible-galaxy collection install netapp.ontap) and netapp-lib into the same environment Ansible runs from. See the install guide’s storage extras. |
Playbook debugging technique
Three tools resolve almost everything the tables above do not. Verbosity: re-run with -vvv to see the exact REST calls, request bodies, and responses — the response JSON usually names the real problem precisely. Add no_log: true to credential-bearing tasks first, so verbose output does not echo secrets into logs. Check and diff: --check --diff previews what would change without changing it — the safest first run against any cluster and the fastest way to catch undefined variables and bad parameter names. Read the truth: when a module’s behavior is surprising, query actual cluster state with na_ontap_rest_info and a targeted gather_subset — comparing what Ansible thinks exists against what the cluster reports resolves most “it should have worked” cases. The pattern is always the same: isolate the layer (network, then auth, then the module’s view of state) before changing the playbook.
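The "read the truth" step might look like the sketch below, which asks the cluster what it reports for a single volume. The volume name is taken from the provisioning example, and the chosen fields are illustrative. Connection details are assumed to come from module_defaults.

```yaml
# Sketch: query actual cluster state for one volume and print it.
- name: Read what the cluster reports for one volume
  netapp.ontap.na_ontap_rest_info:
    gather_subset:
      - storage/volumes
    parameters:
      name: proj_data_01       # REST query filter: only this volume
    fields:
      - size
      - state
      - nas.path
  register: vol_truth

- name: Show the records returned by the cluster
  ansible.builtin.debug:
    var: vol_truth.ontap_info['storage/volumes'].records
```

Comparing this output with the values in the playbook usually shows which field the module considers different.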
Production lessons from the field
Vendor documentation tells you what the modules do. This is what running them across enterprise estates teaches — the things that are obvious only in hindsight.
Lessons learned. The playbook is never the hard part; the operating model around it is. Teams that succeed treat inventory and playbooks as production code (reviewed, versioned, pipeline-applied) from day one, not as scripts that graduate to that later. The ones that struggle started with Ansible as “a faster way to type CLI commands” and never made the leap to desired-state thinking. The single highest-leverage habit is wiring a nightly --check run early: it converts your playbooks from provisioning tools into a continuous drift-detection system, and it surfaces the gap between documented and actual state before an auditor does.
Common mistakes. Five recur across nearly every estate. Disabling certificate validation in production and forgetting it was ever temporary. Hard-coding credentials in playbooks or vars files instead of the vault — the single most common security finding. Specifying sizes in raw bytes where a one-digit typo is a 10× error no reviewer catches (use size + size_unit, always). Granting wildcard permissions — FullAccess S3 policies, any NFS rules — because the lab example did. And mixing imperative habits into declarative code: writing a task per object instead of looping a declared list, or assuming order where idempotency makes it irrelevant.
Scaling considerations. What works for one cluster needs structure for fifty. Move per-cluster facts into group_vars and host_vars rather than a single global.vars; the playbooks stay identical and only the variable scope changes. Use dynamic inventory sourced from na_ontap_rest_info or a CMDB so the fleet is discovered, not hand-listed. Pin the collection version in requirements.yml so every control node and pipeline runner behaves identically — an unpinned collection is how “works on my machine” enters storage automation. And separate vault passwords per environment, so a compromised lab credential cannot decrypt production.
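Pinning the collection is a few lines in requirements.yml. The version string below is a placeholder, not a recommendation; pin whichever release you have validated.

```yaml
# requirements.yml
collections:
  - name: netapp.ontap
    version: "23.0.0"   # placeholder pin; use the release you have tested
```

Every control node and pipeline runner then installs the same release with ansible-galaxy collection install -r requirements.yml.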
Security best practices. A dedicated, http-enabled automation account with a least-privilege role, never the interactive admin. All secrets in Ansible Vault, the vault password in a secrets manager or pipeline store, never in Git. validate_certs: true against a real certificate in production. no_log: true on every task that handles a credential. Scoped permissions everywhere the lab used wildcards — named S3 policies, CIDR-scoped NFS rules, igroups treated like firewall rules. The recurring theme across this entire reference is that lab-convenient defaults are audit findings; production automation closes each one deliberately.
Operational runbooks. The endgame is that routine storage work stops being interactive. A volume-growth request is a one-line pull request, not a console session. A new-tenant onboarding is a vars file and a pipeline run. A DR test is a scheduled playbook that clones the destination and validates it. A quarterly access review reads live state with na_ontap_rest_info and diffs it against the declared policies. Each of these is a runbook that used to live in a human’s head or a stale wiki; as code, it is executable, reviewable, and the same every time it runs. That is the operating model WUC builds into the estates we run — and the reason these fourteen examples are written for production from the first line, not retrofitted to it later.
The recurring lesson: lab-convenient is audit-findable
Read these fourteen examples together and one pattern repeats across every protocol. The lab originals are written for a closed training environment, so they reach for the convenient default — and almost every convenient default is a finding in a production audit. Four show it plainly. The S3 policy in 50-02 grants actions: ["*"] on resources: ["*"] — full access by another name. The NFS rule in 30-03 sets ro_rule: any / rw_rule: any — unauthenticated read-write to anyone on the subnet. The SnapMirror playbook in 70-01 hard-codes the cluster password and peering passphrase as plaintext in the playbook itself. And nearly every original disables certificate validation and repeats credentials in the clear.
The cleaned versions apply the same three corrections every time: scope the permission (named actions and resources, specific client networks, least-privilege policies), encrypt the secret (everything sensitive in a vault-encrypted global.vars, supplied once through module_defaults), and state intent explicitly (descriptive policy IDs, real comments, units a reviewer can verify at a glance). None of this changes what the playbook builds; all of it changes whether the playbook survives review. That is the whole difference between a script that works in a lab and one that runs in production — and it is the discipline WUC builds into every estate we automate.
Using this reference
Find the job in the build-order list, copy the blue block, change the variables in your global.vars, and run it with ansible-playbook <file>.yml --ask-vault-pass. The examples are deliberately ordered by dependency — an SVM before its volumes, a volume before its shares, a CIFS server before an SMB share — so reading top to bottom is also a working build sequence for a complete storage service. Every cleaned playbook is idempotent: run it twice and the second run reports changed=0, which is how you turn any of these into a scheduled compliance check. For the concepts underneath — how the modules reach the cluster, what makes them safe to re-run, and how Vault protects the credentials — the three field guides linked throughout are the place to start.
Automating a NetApp estate beyond the lab?
These fourteen playbooks are the building blocks; an automated estate is the operating model around them — vault discipline, least-privilege service accounts, change-controlled pipelines, and drift enforcement that runs on a schedule. WUC engineers build and run both, across NetApp ONTAP, Cisco fabrics, and multi-OEM infrastructure, as an automation consultant, maintenance provider, and managed services partner.
References
- Ansible project. netapp.ontap collection documentation. The authoritative reference for every module used across these fourteen examples.
- Ansible Galaxy. netapp.ontap role: na_ontap_nas_create. NetApp’s prebuilt role packaging the volume-to-share NAS flow — the consume-rather-than-compose option once these building blocks are familiar.
- NetApp. ONTAP Automation Documentation. The REST API foundation every module here drives.
- WUC Technologies. How to Install Ansible, NetApp ONTAP Ansible Playbooks, and Encrypting Ansible Variables with Ansible Vault. The three field guides this reference accompanies.