In this blog series we’ll take you beyond the hype and dive more deeply into how and why to automate your network operations.
I’ve spent the last couple of years at Red Hat helping customers automate their networks with Ansible. If there is one thing that I’ve learned during that time, it is that network automation is not as easy as many would have you believe. That is not to say that tools like Ansible are not good tools for automation or that anyone is trying to sell you snake oil, but I believe that there is a fundamental impedance mismatch in translating the success Ansible has had automating systems into automating networks.
Part of this disconnect stems from a fundamental misunderstanding of the capabilities that Ansible provides. According to Red Hat, Ansible is a “common language to describe your infrastructure.” In practice, however, Ansible is more of a framework that brings an inventory of things together with a set of modules, plugins, and Jinja2 capabilities that perform operations on those things. The language, rendered in YAML or JSON, just passes key/value pairs between the modules, plugins, and Jinja2 capabilities. (Yes, that’s a simple description of a complex tool, but one that is accurate enough to illustrate the point of this and subsequent blogs.)
That is not to say that Ansible is not a powerful framework, but it has no native linguistic ability to describe a network. When I say an “inventory of things,” it is because Ansible really does not care what that thing is. Because of its agentless approach, it can talk to many things: systems, network devices, clouds, lightbulbs, etc. This is a great capability and part of why Ansible is so popular, but Ansible truly does not know one thing from another. It has no innate prowess for automating networks. It is simply a tool for automating what an operator does task by task. You cannot “describe” what you want OSPF to look like on your network. You simply provide a bunch of key/value pairs that get passed to the devices on your network through modules in hopes of yielding the OSPF configuration that you want.
Configuring settings on an IOS device
To illustrate this, let’s look at configuring two simple settings on an IOS device: hostname and NTP servers. Using Ansible parlance, we’ll describe the desired end state of the hostname of a particular device. Hostname is a great use case because it is a scalar (i.e. a single value). To change the hostname, the Ansible ios_config module does a simple textual compare of the configuration. If ‘hostname newname’ is not present, it sends that line to the device. Since hostname is a scalar, the old hostname gets replaced by the desired hostname.
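As a rough sketch, the hostname change described above might be written as a single task like the one below. The value `newname` is just the placeholder from the text, and the task name is made up for illustration.

```yaml
# Sketch only: 'newname' is a placeholder hostname, not a real device name.
- name: Set the device hostname
  ios_config:
    lines:
      # Sent only if this exact line is missing from the running configuration.
      - hostname newname
```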
A list of NTP servers, however, is more difficult. Say you’ve set the NTP server to 1.1.1.1 with:
- ios_config:
    lines:
      - ntp server 1.1.1.1
Now you want to change your NTP server to 2.2.2.2, so you do:
- ios_config:
    lines:
      - ntp server 2.2.2.2
Simple, right? But the problem is that you would end up with two NTP servers in the configuration:
ntp server 1.1.1.1
ntp server 2.2.2.2
This is because the Ansible ios_config module does not see `ntp server 2.2.2.2` present in the configuration, so it sends the line. Since ntp server is a list, however, it adds a new NTP server instead of replacing the existing one, giving you 2 NTP servers (one that you do not want). To end up with just 2.2.2.2 as your NTP server, you would have to know that 1.1.1.1 was already defined as an NTP server and explicitly remove it… exactly what an operator would do. This is also the case with ACLs, IP prefix-lists, and any other list in IOS. Ansible does not have a native way to describe the desired end state of something simple like NTP servers on a network device, much less something more complex like OSPF, QoS, or Multicast.
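For illustration, here is a minimal sketch of what that explicit removal might look like, assuming you already know that 1.1.1.1 is the only stale entry. It uses the same ios_config module and lines parameter as the tasks above; the task name is made up for the example.

```yaml
# Sketch only: assumes 1.1.1.1 is the one stale server and that you know it is there.
- name: Replace NTP server 1.1.1.1 with 2.2.2.2
  ios_config:
    lines:
      # Explicitly remove the old list entry.
      - no ntp server 1.1.1.1
      # Add the desired entry.
      - ntp server 2.2.2.2
```

Note that this encodes one specific transition rather than a desired end state. A `no ntp server 1.1.1.1` line never appears in the running configuration, so a purely textual compare would probably send it on every run, and a device that happens to carry a different stale server would keep it.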
Does that mean that Ansible is not a great tool for network automation? No, but like any tool, it needs to be used for the right task and can only complete a complex task when used in concert with other tools. As a framework, it is not a complete solution.
The intent of this blog series is to go beyond the hype and simple demonstrations prevalent in network automation conversations today and to dive more deeply into how and why to automate your network operations. In the next installment, I’ll talk about data models and why they are a critical piece of any automation framework.
Until then, please visit the DevNet Networking Dev Center to see the wide range of resources and learning opportunities that are available. And please drop me a comment on this blog if you have questions, or topics you’d like this series to cover.
I don't think you are using the correct approach here or perhaps I am not yet fully grasping what you are getting at since it's only a first post. I suppose you mean well, probably trying to spare others the disappointment that a tool won't do everything for them and lower their expectations. But you seem to be missing a few important points:
1) What you are describing is true for every tool ever made for networking devices. That's why a compliance engine is so difficult to make in configuration management tools and why it's so resource heavy when it finally gets done. Just take a look at the relevant history with Cisco Prime Infrastructure and Compliance.
2) A tool will never replace the engineer's mind. AI is supposed to be able to do that but even that is still far from happening. And these tools are not about AI. They are about Automation, and automation is about saving time and effort, standardizing your processes and eliminating human errors in applying those processes (not in designing them). If the engineer has not thought about the consequences of his/her actions, it's not the tool's job to correct them. A robotic arm may greatly enhance your performance as a surgeon, but if you are going to make a cut in the wrong place, it will not stop you. That's not what it was made for. If you have carefully planned your processes, then automating them instead of typing them will help you in a big way. Any tool you can use for that is fine, and there are a lot already (I was using Tcl/Expect many years ago for Cisco gear). You could for example check for additional lines for ntp servers before adding the one you want. It's an IOS thing, not an Ansible thing.
3) The hype is important. It was created for a reason. It's like advertising. People need to get excited to get some "fuel" that will carry them through the door of becoming a "programmer" and constructing "code", even if that code is only a description of a desired state in an ansible playbook or something closer to it like using the netmiko library in Python or the Nornir framework. Killing the hype will not help the engineers overcome the limits of what they thought would be a well thought out and planned career and skill space. It will just convince them that it's not worth their time after all, their career is fine as it is and they should just wait it out while the hype passes. That would be a big mistake, or at least it's what I think about it.
4) The hype is only there to get you to the door. After that, killing the hype or amplifying it is not important. What people need to get through the door is a structured approach. Courses, Labs, Studying, Applying new knowledge, anywhere. The use cases will come. Everyone can think of what they can do with a new tool after they get the feel of it. And if they don't, it's fine. Learning something new will almost always help broaden your horizons.
I am looking forward to your next posts to get a better picture of where you are trying to get.
Even putting your thoughts out there, questioning everything, requires courage and clear thinking, so you got my thumbs up for that!
Great feedback! Indeed, my intent is not to dissuade engineers from automating… apologies if it sounded otherwise. This first post is merely the summary of many conversations that I’ve had with engineers after their first experience with Ansible. Most gave up and the rest never got the full use of the tool. The next several posts are meant to better position engineers to be successful automating their network operations with Ansible and, eventually, a full NetDevOps workflow. While hype might be valuable, I’ve always found it more effective to get the marketing out of the way so that we can get to the real work of automation. Unfortunately, being blunt is one of my many personality flaws. I thank you for allowing me to clarify my intentions and I will do my best to incorporate your observations as the series unfolds.
I agree with the author. Ansible is great for doing something on a number of devices in an easy way: in networking world this is usually day 0 provisioning from templates and doing "show" commands across your infrastructure. However, as a network configuration tool Ansible is a bad choice, because it does not guarantee the state. If you want to guarantee the state with Ansible, you will end up writing custom Ansible modules, which is not fun. In that case, use an actual programming language or another specialized tool that focuses on the network configuration state management.
Steven,
Thanks for the examples and the dose of reality to help temper expectations regarding out of the box automation.
The industry is long on providing the "next great tool", but often comes up short on actually delivering something useful. I'm glad to see you set some boundaries up front to help us understand that automation is not a one stop shop. We'll need to work at this and carefully choose the right tool for the right job.
Thanks again for making sure we don't fall into the trap where everything becomes a "nail", just because we happen to have a "hammer". 🙂