For those of you who have followed earlier blogs, you will have seen the major enhancements we have been making in open IOS XE. Jeff McLaughlin mentioned Day Zero deployment in his recent blog.
Day Zero is a critical step in automation. In the past, to install a new network device, a highly skilled network engineer would go on site, connect to the device and configure it. This process was manual (cut and paste) and therefore error-prone, which makes it a great opportunity for automation.
There has been a lot of interest in Network Plug and Play (PnP) over the past few years, and the PnP protocol is widely supported on Cisco switches, routers and access points. I have been working with customers who have saved hundreds of hours with PnP deployments.
Today I want to introduce you to the most recent automation capability in open IOS XE 16.6 – Zero Touch Provisioning (ZTP).
You are probably asking an obvious question: “Why another protocol for day zero deployment?” Good question. There are some subtle yet important differences. Network PnP provides a powerful and simple user experience. It has an intuitive UI on the APIC-EM controller, a smartphone application, and a sophisticated agent on the device that manages security and serviceability. The workflow (certificate for security, image upgrade, configuration push) is fixed, and it is a turnkey solution.
ZTP, by contrast, is open (I just provide a URL for a Python script via DHCP) and extremely flexible. It is not a turnkey solution like PnP, with its UI and cloud component. With ZTP I can implement any workflow I like, all through a Python script: anything I can do through Python, I can do to the device.
Let’s take a look at some use cases for ZTP deployment.
Simple Configuration – Bootstrap
I will start with a very simple use case. I want the device to boot up and apply a default set of credentials (or configure an authentication server) so that I can connect to it with another automation tool. This is essentially a bootstrap configuration.
Figure 1 – Basic ZTP Use Case
The device boots up and uses DHCP to obtain an IP address (step 1). The DHCP response contains a URL in option 67, which is the location of a Python script.
In step 2, the device downloads this script, starts the on-box guestshell, and runs the Python script locally.
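As a rough illustration of step 1, here is how DHCP option 67 (the RFC 2132 “Bootfile name” option) carries the script URL on the wire: one code byte (67), one length byte, then the URL itself. Your DHCP server builds this for you, so the sketch only shows the format; the server address and script name in it are hypothetical.

```python
# DHCP option 67 ("Bootfile name", RFC 2132) is encoded as code, length, value.
BOOTFILE_NAME_OPTION = 67


def encode_option_67(url):
    """Encode a script URL as a DHCP option 67 code/length/value triple."""
    value = url.encode("ascii")
    # One option carries at most 255 data bytes; RFC 2132 sets a minimum of 1.
    if not 1 <= len(value) <= 255:
        raise ValueError("option 67 value must be 1-255 bytes long")
    return bytes([BOOTFILE_NAME_OPTION, len(value)]) + value


def decode_option_67(option):
    """Recover the URL from an encoded option 67."""
    code, length = option[0], option[1]
    if code != BOOTFILE_NAME_OPTION:
        raise ValueError("not option 67 (code {})".format(code))
    return option[2:2 + length].decode("ascii")


if __name__ == "__main__":
    # Hypothetical server address and script name.
    encoded = encode_option_67("http://10.1.1.1/ztp.py")
    print(encoded)
    print(decode_option_67(encoded))
```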
The base script can be very simple: it just configures base credentials for the device using the built-in cli Python library.
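A minimal sketch of such a bootstrap script follows. It assumes the guestshell `cli` module's `configurep()` and `executep()` functions; the username, password and vty range are hypothetical placeholders, and off-box it simply prints the commands instead of applying them. Older guestshell images ship Python 2.7, so this sketch sticks to syntax that runs under both Python 2 and 3.

```python
"""Minimal ZTP bootstrap sketch: configure base credentials, then save."""

try:
    import cli  # on-box module, only available inside the IOS XE guestshell
except ImportError:
    cli = None  # off-box: dry run that only prints the commands

# Hypothetical credentials: replace with your own (or AAA server config).
USERNAME = "admin"
PASSWORD = "change-me"


def bootstrap_commands(username, password):
    """Return the configuration lines for a basic credential bootstrap."""
    return [
        "username {} privilege 15 secret {}".format(username, password),
        "line vty 0 15",
        "login local",
    ]


def main():
    commands = bootstrap_commands(USERNAME, PASSWORD)
    if cli is None:
        for line in commands:
            print(line)
        return
    cli.configurep(commands)
    # Save so the bootstrap configuration survives a reload.
    cli.executep("write memory")


if __name__ == "__main__":
    main()
```

The last step saves the configuration, since nothing applied by the script persists across a reload otherwise.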
When the device boots, I see the following messages on the console. I have cut out other messages for brevity.
Figure 2 – Log messages during device boot process
You can see the switch boot up and then check option 67 for a Python script to download. It then starts the guestshell and runs the script.
Remember to save the configuration in your script if you want it to persist across a reload.
You can learn more about on-box Python and Guestshell on the Cisco DevNet Python Network Automation site. Read on for some more examples!
Dynamic Configuration – Serial Number
The next example takes this a little further: it uses the serial number of the device to make a REST API call that collects parameters, such as the IP address and mask, used to configure the device. The response could also contain other attributes, such as a configuration file URL. My web service (implemented in Node-RED) has a database of serial numbers and their attributes.
Figure 3 – Dynamic Use Case
The device uses ZTP as in the previous use case. In step 2, the Python script gets the serial number of the device and makes a REST API call to the server (step 3). The server returns a list of parameters for the script to use in configuration (step 4). In this example, the IP address of the management interface, the network mask, and the default gateway are returned.
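A sketch of steps 2 to 4 follows, under several assumptions: the serial number is read from the `System Serial Number` (switch) or `Processor board ID` (router) line of `show version`; the web service answers `GET http://10.1.1.1:1880/device/<serial>` with JSON such as `{"ip": ..., "mask": ..., "gateway": ...}`; and the management port is `GigabitEthernet0/0` in `Mgmt-vrf`. The URL, JSON keys and interface names are hypothetical (1880 is simply Node-RED's default port), and the code assumes Python 3's `urllib.request`.

```python
import json
import re
import urllib.request

try:
    import cli  # on-box module, only available inside the IOS XE guestshell
except ImportError:
    cli = None

# Hypothetical web-service URL; 1880 is Node-RED's default port.
API_URL = "http://10.1.1.1:1880/device/{serial}"
# Assumed management interface and VRF names; these vary by platform.
MGMT_INTERFACE = "GigabitEthernet0/0"
MGMT_VRF = "Mgmt-vrf"


def parse_serial(show_version):
    """Extract the chassis serial number from 'show version' output."""
    patterns = (r"System Serial Number\s*:\s*(\S+)", r"Processor board ID\s+(\S+)")
    for pattern in patterns:
        match = re.search(pattern, show_version)
        if match:
            return match.group(1)
    raise ValueError("no serial number found in show version output")


def fetch_params(serial, timeout=10):
    """Ask the web service for this device's parameters, returned as a dict."""
    url = API_URL.format(serial=serial)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def mgmt_config(params):
    """Build management interface and default route config from the API reply."""
    return [
        "interface {}".format(MGMT_INTERFACE),
        "ip address {} {}".format(params["ip"], params["mask"]),
        "no shutdown",
        "exit",
        "ip route vrf {} 0.0.0.0 0.0.0.0 {}".format(MGMT_VRF, params["gateway"]),
    ]


def main():
    if cli is None:
        print("cli module not available; run this inside the guestshell.")
        return
    serial = parse_serial(cli.cli("show version"))
    params = fetch_params(serial)
    cli.configurep(mgmt_config(params))
    cli.executep("write memory")


if __name__ == "__main__":
    main()
```

Keeping parsing and config generation in pure functions means they can be tested off-box against captured `show version` output before the script ever runs on a device.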
There are lots of options to extend this. For example, the API could return a URL for the configuration file to be used for the device. The script could download that file and use that to configure the device.
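Extending the script that way could look like the following sketch, which downloads a configuration file over HTTP and applies it through the `cli` module. The URL is hypothetical (in practice it would come from the API reply), skipping `!` lines, blank lines and the trailing `end` is a simplifying assumption about the file format, and Python 3 is assumed.

```python
import urllib.request

try:
    import cli  # on-box module, only available inside the IOS XE guestshell
except ImportError:
    cli = None

# Hypothetical location of this device's configuration file.
CONFIG_URL = "http://10.1.1.1/configs/FOC1234X0AB.cfg"


def config_lines(text):
    """Turn saved configuration text into a list of lines for configurep()."""
    lines = []
    for raw in text.splitlines():
        line = raw.rstrip()
        stripped = line.strip()
        # Skip blank lines, '!' separator/comment lines and the final 'end'.
        if not stripped or stripped.startswith("!") or stripped == "end":
            continue
        lines.append(line)
    return lines


def download(url, timeout=10):
    """Fetch the configuration file as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def main():
    if cli is None:
        print("cli module not available; run this inside the guestshell.")
        return
    cli.configurep(config_lines(download(CONFIG_URL)))
    cli.executep("write memory")


if __name__ == "__main__":
    main()
```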
Advanced Configuration – Switch Stacking Order
One of the benefits of ZTP is that you can interact with the device before you apply a configuration. There are a couple of situations where this is important, and switch stacking is a big one.
The key point with stacking is that the order of the stack members determines the names of the device interfaces. For example, all of the interfaces on switch #1 start with “1” (e.g. GigabitEthernet1/0/1). This can cause problems because uplink connections are often on only a subset of the stack members. You need to know in advance the order in which the switches are connected, and you need to reboot the stack to change the order of its members.
I have extended use case #2 to handle a list of serial numbers. The API returns the serial number of the device that should be “top of stack”. The Python script renumbers the switches (if required) and reboots. If no renumbering is required, the device goes straight to step 7.
Figure 4 – Stacking
The first three steps are identical to use case #2. The big difference is step 4, where the Python script detects that the serial number returned is not currently the top of the stack, so it needs to renumber the members and reboot.
When the stack comes back up, it redoes the ZTP process (step 5) and makes the API call again (step 6). This time the top of stack matches, so the script proceeds to configure the device (step 7). This process is deliberately stateless to make it more robust.
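A sketch of the check in step 4 follows. It assumes that “top of stack” means switch number 1, that member serial numbers can be read from the `NAME: "Switch N"` entries of `show inventory`, and that the order is changed by swapping numbers with `switch <n> renumber <m>`. The serial number in `main()` is hypothetical (it would come from the API), any confirmation prompt from the renumber command is not handled, and the reload itself is left out.

```python
import re

try:
    import cli  # on-box module, only available inside the IOS XE guestshell
except ImportError:
    cli = None

# Assumption: "top of stack" is switch number 1.
TOP_OF_STACK = 1
# Matches a 'NAME: "Switch N"' entry and the SN on the following PID line.
MEMBER_RE = re.compile(r'NAME: "Switch (\d+)".*?\n\s*PID:.*?SN:\s*(\S+)')


def stack_members(show_inventory):
    """Map switch number -> serial number from 'show inventory' output."""
    return {int(num): serial for num, serial in MEMBER_RE.findall(show_inventory)}


def renumber_plan(members, wanted_top):
    """Return exec commands that make wanted_top switch 1 (empty if already so)."""
    current = None
    for number, serial in members.items():
        if serial == wanted_top:
            current = number
    if current is None:
        raise ValueError("serial {} is not a member of this stack".format(wanted_top))
    if current == TOP_OF_STACK:
        return []  # already in order: go straight to configuration
    plan = ["switch {} renumber {}".format(current, TOP_OF_STACK)]
    if TOP_OF_STACK in members:
        # Swap: the switch that is currently 1 takes over the freed number.
        plan.append("switch {} renumber {}".format(TOP_OF_STACK, current))
    return plan


def main():
    if cli is None:
        print("cli module not available; run this inside the guestshell.")
        return
    wanted = "FOC2222B222"  # hypothetical: returned by the REST API in practice
    plan = renumber_plan(stack_members(cli.cli("show inventory")), wanted)
    if not plan:
        print("Stack order already correct; continue with configuration.")
        return
    for command in plan:
        cli.executep(command)
    # A reload is needed for the new numbers to take effect (not shown).


if __name__ == "__main__":
    main()
```

Because the plan is empty once the wanted switch is already number 1, running the same script again after the reboot falls straight through to configuration, with no state carried across the reload.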
As in the earlier example, you could also download a complete configuration file from an HTTP(S) server once the stack has been reordered.
There are a number of other situations where a device needs something done to it before it can be provisioned. Software upgrade is an obvious one, but there are others. For example, EtherSwitch modules in ISR routers require a reboot before the internal interface can be provisioned.
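For the software upgrade case, the script can first decide whether an upgrade is needed at all. This sketch assumes the running release appears on a `Cisco IOS XE Software, Version ...` line of `show version` and compares it numerically with a hypothetical target release; the upgrade command itself is platform- and release-specific, so it is deliberately not shown.

```python
import re

try:
    import cli  # on-box module, only available inside the IOS XE guestshell
except ImportError:
    cli = None

# Hypothetical target release for this deployment.
TARGET_VERSION = "16.06.02"


def running_version(show_version):
    """Extract the IOS XE version string from 'show version' output."""
    match = re.search(r"Cisco IOS XE Software, Version\s+(\S+)", show_version)
    if not match:
        raise ValueError("IOS XE version not found in show version output")
    return match.group(1)


def version_tuple(version):
    """Convert '16.06.01' to (16, 6, 1); trailing letters such as 'a' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        if not digits:
            break
        parts.append(int(digits.group()))
    return tuple(parts)


def needs_upgrade(show_version, target=TARGET_VERSION):
    """True if the running release is older than the target release."""
    return version_tuple(running_version(show_version)) < version_tuple(target)


if __name__ == "__main__":
    if cli is None:
        print("cli module not available; run this inside the guestshell.")
    elif needs_upgrade(cli.cli("show version")):
        print("Upgrade required before provisioning.")
    else:
        print("Release is current; continue with provisioning.")
```

Comparing integer tuples rather than raw strings avoids surprises such as "16.10" sorting before "16.9".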
Conclusion
ZTP is a powerful addition to your automation toolset. These examples give you a taste of the many possibilities it offers. I have published sample scripts in a GitHub repository, https://github.com/aradford123/ZTP-samples, to help you get started, and I have also included a Node-RED flow in the repository.
And, as I mentioned earlier, if you would like to get some tutorial style information about how to do this, the Cisco DevNet Python Network Automation site is just waiting for you to sign in and get started.
As always an excellent detailed explanation of the various scenarios and methodologies.
Interesting in that this solution is very scalable. Very similar to POAP.
Looking at CCO, only the C9Ks are at 16.6. 4510s, 3850s, 6800 are still at 15.x. 4300 routers are at 16.3. This makes this solution pretty much unusable in the near future except for a narrow band of products. Maybe we can use TCL scripting?
Hi Patrick,
Thanks for the comment.
16.6 runs on more platforms than the Cat 9k. The 3650 and 3850 switches also run 16.6.
In addition, all of the ASR1k, ISR4k and CSR routers run 16.6.
There will be broader support over time.
Adam
Very nice blog Adam, as always short and very informative.
With the flexibility of ZTP to implement any functionality through scripts, our customers would benefit hugely from custom Day 0 provisioning that suits their environment.
Thanks Yogesh.
Thanks Andrew. Very informative.
You have mentioned PnP, and it looks like ZTP can do this in a more flexible way. So I am wondering when customers should use PnP vs ZTP.
With ZTP you need to do all the work yourself. You need to run a web server, create a Python script, match serial numbers, etc. There is no UI; you would need to build one.
PnP has a complete workflow with a UI already done. You can upload config files, create rules all through the UI (or the REST API).
PnP also has built in certificate management for the device and a SUDI check option to make sure the device has a valid manufacturing certificate.
Thanks Andrew.