What if you had a trip recorder for all your traffic, at line-rate performance?

The case for “In-band OAM for IPv6”: Operating and validating your network just got easier

How many times have you wanted full insight into the precise paths packets take within your network while troubleshooting a problem or planning a change? Did you ever need to prove categorically that all packets meant to traverse a specific service chain or path really did so? “In-band OAM for IPv6 (iOAM6)” is now here to help: it adds forwarding path or service path information, as well as other information and statistics, to all your traffic. It is “always on” OAM, and a new source of data for your SDN analytics and control tools.

Nowadays we typically rely on indirect methods to troubleshoot or verify network behavior: we use tools like ping or traceroute, which send probe packets to deduce the behavior of live data traffic; we separate concerns into different organizations, where service functions are managed by a different organization than network connectivity, so that the two organizations can supervise and control each other. The use of indirect methods puts you in the role of an observer. You might not always find a suitable way to “observe”, and even if you do, you might suffer from the “observer effect”: the very act of observation changes the behavior being observed.

Many of us have had to debug a network which uses equal-cost multipath (ECMP) forwarding. Quite often customer traffic suffered connectivity issues while ping and traceroute worked just fine, because the ECMP hash for ping and traceroute packets differed from that of the customer traffic in question. The probe packets thus took different paths through the network, or were even forwarded differently by the network elements. You can debug such a situation if you know the ECMP hash algorithm used by all nodes in your network, if you can hand-craft the probe traffic so that it exactly represents the live traffic having the issue, and if network elements really forward probe traffic the same way as regular traffic. Unfortunately there are quite a few “ifs” in this sentence.

In another example, think of battery-operated sensor networks, where it becomes important to understand the battery charge level of the sensors. Here the act of sending probe traffic is expensive, because it in turn drains the sensor battery. The observer dilemma kicks in.

Sending probe traffic through a service chain only proves that probe traffic makes it through the service chain correctly. It does not say anything about whether live customer traffic, and more specifically all customer traffic at all times, is forwarded and handled according to the required business policy of the customer. One could recommend a system audit: “Look at my system, it is cabled and configured correctly. Packets can only be forwarded down the path determined by cabling and configuration”. While that could be a possible response for service chains which chain physical devices with cables and VLANs, how do you deliver a similar proof for networks which employ virtual network functions (VNFs), where everything is delivered via software on a single server? In some cases, the lack of any formal ability to verify the integrity of a service chain has forced customers to turn their backs on Network Function Virtualization (NFV) for the time being, until a solution for service chain verification becomes available.
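To make the ECMP pitfall described above concrete, here is a minimal Python sketch. It assumes a node picks the next hop by hashing the flow 5-tuple; the hash function, the path names and the addresses are purely illustrative, since real routers use vendor-specific hash algorithms and inputs.

```python
import hashlib

def ecmp_next_hop(five_tuple, next_hops):
    """Pick a next hop by hashing the flow 5-tuple.

    Hypothetical hash: SHA-256 over the joined fields, used here only
    because it is deterministic and well distributed.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# Four equal-cost paths towards the same destination (illustrative names).
next_hops = ["path-A", "path-B", "path-C", "path-D"]

# (source, destination, protocol, source port, destination port)
customer_flow = ("2001:db8::1", "2001:db8::2", 6, 49152, 443)  # TCP flow
probe = ("2001:db8::1", "2001:db8::2", 58, 0, 0)               # ICMPv6 echo

print("customer flow uses", ecmp_next_hop(customer_flow, next_hops))
print("probe uses        ", ecmp_next_hop(probe, next_hops))
```

Both packets go from the same source to the same destination, yet they are hashed independently, so nothing guarantees that the probe exercises the path the customer flow is pinned to. Only a probe with exactly the same hash inputs is certain to follow the customer traffic.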

iOAM6 offers a truly direct method: we insert the information required for troubleshooting, planning, and path or service chain verification directly into the live traffic, and into all traffic, not just probe traffic. This isn’t entirely new. Remember the route record option in IPv4, or the bit error monitoring and path trace bits in the overhead sections of an SDH frame? IPv4 route recording was costly in terms of forwarding performance, but the SDH example shows that carrying operations, administration, and maintenance (OAM) information with every datagram is technically feasible.

Unlike IPv4, IPv6 comes with a great tool: Extension Headers. These days we can implement IPv6 Extension Headers without sacrificing forwarding performance. We can use them to embed OAM information into all of our data traffic and create an “in-band OAM for IPv6 (iOAM6)” capability for our networks. Simply stated, we add path and node or service specific data to the live data traffic at selected nodes within a specific domain, and strip the information at a domain egress device. The information inserted could be a combination of:

ingress or egress interface identifier
time-stamp
node or service identifier
share of a secret describing a service or network element
sequence number
generic application metadata, etc.
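The insert-and-strip model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names and functions are hypothetical, and the sketch models the data as Python objects rather than as the actual IPv6 extension header encoding, which this post does not describe.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HopRecord:
    """Data one node adds to a packet (hypothetical field names)."""
    node_id: str
    ingress_if: int
    egress_if: int
    timestamp_ns: int

@dataclass
class Packet:
    payload: bytes
    oam_seq: Optional[int] = None                 # set at the domain ingress
    oam_hops: List[HopRecord] = field(default_factory=list)

def ingress_insert(packet, seq):
    """Domain ingress: start recording and stamp a sequence number."""
    packet.oam_seq = seq
    packet.oam_hops = []
    return packet

def transit_record(packet, node_id, ingress_if, egress_if, timestamp_ns):
    """Selected node inside the domain: append its own record."""
    packet.oam_hops.append(
        HopRecord(node_id, ingress_if, egress_if, timestamp_ns))
    return packet

def egress_strip(packet):
    """Domain egress: remove the OAM data and hand it to analytics."""
    seq, hops = packet.oam_seq, packet.oam_hops
    packet.oam_seq, packet.oam_hops = None, []
    return packet, seq, hops

pkt = ingress_insert(Packet(payload=b"hello"), seq=7)
transit_record(pkt, "node-1", ingress_if=1, egress_if=2, timestamp_ns=1000)
transit_record(pkt, "node-2", ingress_if=3, egress_if=4, timestamp_ns=1500)
clean, seq, hops = egress_strip(pkt)
print(seq, [h.node_id for h in hops])
```

The packet leaves the domain with its payload untouched, while the egress device keeps the sequence number and the per-hop records for export to analytics or verification tools.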

It is in many respects the flip side of what segment routing does: iOAM6 records how traffic is forwarded, segment routing steers traffic – both use IPv6 extension headers.

How can “in-band OAM for IPv6” be used? Here are a few examples:

Path or service chain verification: Prove that a certain set of traffic, e.g. a particular application identified by a 5-tuple, traverses a given service chain or path. A verification node should be able to detect and report traffic which, on purpose (e.g. sent by an attacker) or by accident (e.g. due to misconfiguration), did not traverse the specified service chain.
Flow tracing in ECMP networks: Trace flows in a network which leverages ECMP and detect paths with issues. Provide statistics (dropped, reordered, duplicated packets) on particular flows in the network, independent of whether a layer-4 protocol offers sequence numbers or not.
Traffic matrix: Efficiently derive the IPv6 “traffic matrix” for a specific network domain. By “traffic matrix” we mean the overall amount of traffic (optionally per QoS class) between any two domain edge routers. While the IPv6 traffic matrix can be derived by correlating NetFlow information from individual nodes, path recording within the packet (as done by in-band OAM) automatically correlates ingress and egress nodes, which makes deriving the IPv6 traffic matrix much easier.
Trend analysis on network parameters such as time: Time-stamp packet flows to analyze the average delay across the network and to measure trends for all traffic or for specific application flows.
Application-specific packet tracing information: Add application-specific information to the packet at every node. A potential use is tracking the physical path (using GPS location information) of a packet in networks where nodes are mobile.
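Several of the use cases above boil down to simple computations once the recorded data has been stripped at the domain egress. The following Python sketch assumes each packet yields a list of recorded node identifiers plus a sequence number; all function names and input formats are hypothetical. Note that the plain comparison in traversed_chain only catches accidental misrouting. Proving the path against an attacker needs more, such as the shares of a secret mentioned earlier, which this sketch does not attempt.

```python
from collections import Counter

def traffic_matrix(records):
    """Sum bytes per (ingress node, egress node) pair.

    `records` is an iterable of (path, byte_count) tuples, where `path`
    is the list of node identifiers recorded in one packet.
    """
    matrix = Counter()
    for path, byte_count in records:
        matrix[(path[0], path[-1])] += byte_count
    return matrix

def sequence_stats(seqs):
    """Count dropped, reordered and duplicated packets of one flow.

    `seqs` holds the recorded sequence numbers in arrival order. Drops
    are gaps between the lowest and highest number seen.
    """
    seen = set()
    duplicated = reordered = 0
    highest = None
    for seq in seqs:
        if seq in seen:
            duplicated += 1
            continue
        if highest is not None and seq < highest:
            reordered += 1
        seen.add(seq)
        highest = seq if highest is None else max(highest, seq)
    dropped = (max(seen) - min(seen) + 1 - len(seen)) if seen else 0
    return {"dropped": dropped, "reordered": reordered,
            "duplicated": duplicated}

def traversed_chain(path, chain):
    """True if all nodes of `chain` appear in `path`, in that order."""
    remaining = iter(path)
    return all(node in remaining for node in chain)

records = [(["A", "B", "C"], 100), (["A", "D", "C"], 50),
           (["C", "B", "A"], 10)]
print(traffic_matrix(records))
print(sequence_stats([1, 2, 4, 3, 3, 6]))
print(traversed_chain(["A", "fw", "B", "nat", "C"], ["fw", "nat"]))
```

Because the sequence number travels with the packet, the loss and reordering statistics work the same way for any traffic, regardless of whether the layer-4 protocol has sequence numbers of its own.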

Does this sound interesting? Please stay tuned: this is the first in a series of blog posts which will look into “in-band OAM for IPv6”. iOAM6 is a project which is currently in “proof-of-concept” state at Cisco; Steve Simlo is the product manager. You can already get a glimpse of iOAM6 and test-drive it yourself on dcloud.cisco.com (you will find iOAM6 in the Service Provider category). If you are coming to CiscoLive in San Diego this June, you could also check out the breakout session “BRKRST-2606: Always on visibility: In-band OAM for IPv6” or visit the DevNet Zone for a live demo.