Given the tremendous interest in VXLAN with the MP-BGP based EVPN control-plane (EVPN for short) at Cisco Live in Milan, I decided to write a “short” technology brief blog post on this topic.
VXLAN (IETF RFC7348) has been designed to solve specific problems that Classical Ethernet has faced for a few decades now. By introducing an abstraction through encapsulation, VXLAN has become the de-facto standard overlay of choice in the industry. Chief among the advantages provided by VXLAN are the extension of today’s limited VLAN space and increased scalability for Layer-2 Domains.
Extended Namespace – The available VLAN space from the IEEE 802.1Q encapsulation perspective is limited to a 12-bit field, which provides 4096 VLANs or segments. By encapsulating the original Ethernet frame with a VXLAN header, the newly introduced addressing field offers 24 bits, thereby providing a much larger namespace of up to 16 million Virtual Network Identifiers (VNIs) or segments.
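For illustration, here is a minimal NX-OS sketch of how a 12-bit VLAN is mapped into the 24-bit VNI namespace on a switch; the VLAN and VNI numbers are hypothetical:

    feature vn-segment-vlan-based

    vlan 100
      ! map the local 12-bit VLAN 100 to the 24-bit VXLAN segment (VNI) 30100
      vn-segment 30100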
While the VXLAN VNI allows unique identification of a large number of tenant segments, which is especially useful in high-scale multi-tenant deployments, VXLAN by itself does not sufficiently address the problems and requirements of large Layer-2 Domains. Nevertheless, significant improvements have been achieved in the following areas (a minimal underlay sketch follows below):
- No dependency on Spanning-Tree protocol by leveraging Layer-3 routing protocols
- Layer-3 routing with Equal Cost Multi-Path (ECMP) allows all available links to be used
- Scalability, convergence, and resiliency of a Layer-3 network
- Isolation of Broadcast and Failure Domains
IETF RFC7348 – VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks
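As a rough illustration of these points, the following is a minimal underlay sketch on NX-OS with one routed point-to-point uplink per Spine; with two or more equal-cost uplinks, ECMP places all of them into forwarding. The interface names, addresses, and OSPF process name are hypothetical:

    feature ospf

    router ospf UNDERLAY

    interface Ethernet1/1
      description Routed uplink to Spine-1
      no switchport
      ip address 192.168.1.1/31
      ip router ospf UNDERLAY area 0.0.0.0

    interface Ethernet1/2
      description Routed uplink to Spine-2
      no switchport
      ip address 192.168.2.1/31
      ip router ospf UNDERLAY area 0.0.0.0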
Scalable Layer-2 Domains
The abstraction provided by a VXLAN-like overlay does not inherently change the Flood & Learn behavior inherited from Ethernet. In typical deployments of VXLAN, BUM (Broadcast, Unknown Unicast, Multicast) traffic is forwarded via Layer-3 multicast in the underlay, which in turn aids the learning process so that subsequent traffic need not be subjected to this “flood” semantic. A control-plane is required to minimize the flooding behavior and to proactively distribute End-Host information to the participating entities in the same segment, typically called VXLAN Tunnel End Points (VTEPs) – this is the learning part.
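For context, a hedged sketch of this multicast-based Flood & Learn configuration on NX-OS, where the BUM traffic of a VNI is mapped to an underlay multicast group (the VNI and group address are hypothetical):

    feature nv overlay

    interface nve1
      no shutdown
      source-interface loopback0
      member vni 30100
        ! BUM traffic for VNI 30100 rides this underlay multicast group
        mcast-group 239.1.1.1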
Control-plane protocols have mostly been employed in the Layer-3 routing space, where predominantly IP prefix information is exchanged. Over the past years, some of the well-known routing protocols have been extended to also learn and exchange Layer-2 MAC addresses. An early adoption of carrying MAC addresses in a routing protocol was Cisco’s OTV (Overlay Transport Virtualization), which employs IS-IS to significantly reduce flooding across Data Center Interconnects (DCI).
Multi-Protocol BGP (MP-BGP) introduced a new Network Layer Reachability Information (NLRI) to carry both Layer-2 MAC and Layer-3 IP information at the same time. By having the combined set of MAC and IP information available for forwarding decisions, optimized routing and switching within a network becomes feasible, and the need to flood in order to learn is minimized or even eliminated. This extension that allows BGP to transport Layer-2 MAC and Layer-3 IP information is called EVPN – Ethernet Virtual Private Network.
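A minimal sketch of what enabling the EVPN address-family in MP-BGP looks like on NX-OS, with the NVE interface pointed at BGP for host reachability; the AS number and neighbor address are hypothetical:

    nv overlay evpn
    feature bgp

    router bgp 65000
      neighbor 10.10.10.201 remote-as 65000
        address-family l2vpn evpn
          send-community extended

    interface nve1
      ! learn remote MAC/IP reachability via EVPN instead of Flood & Learn
      host-reachability protocol bgp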
EVPN is documented in the following IETF drafts:
- draft-sd-l2vpn-evpn-overlay – A Network Virtualization Overlay Solution using EVPN
- draft-sajassi-l2vpn-evpn-inter-subnet-forwarding-03 – IP Inter-Subnet Forwarding in EVPN
Integrated Routing and Bridging (IRB) – VXLAN-EVPN offers significant advantages in Overlay networking by optimizing forwarding decisions within the network based on Layer-2 MAC as well as Layer-3 IP information. The decision to forward via routing or switching can be made as close as possible to the End-Host, on any given Leaf/ToR (Top-of-Rack) Switch. The Leaf Switch provides the Distributed Anycast Gateway for routing, which is completely stateless and does not require the exchange of protocol signaling for election or failover decisions. The reachability information available within the BGP control-plane is sufficient to provide the gateway service. The Distributed Anycast Gateway also provides the integrated routing and bridging (IRB) decision at the Leaf Switch, and it can be extended across a significant number of nodes. All Leaf Switches host active default gateways for their respective configured subnets; the well-known active/standby semantic of First Hop Redundancy Protocols (FHRP) does not apply anymore.
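A hedged configuration sketch of the Distributed Anycast Gateway on NX-OS follows; the identical virtual MAC and SVI address would be configured on every Leaf hosting the subnet (the MAC, VLAN, and IP addresses are hypothetical):

    feature interface-vlan
    ! identical gateway MAC shared by all Leaf Switches
    fabric forwarding anycast-gateway-mac 2020.0000.00aa

    interface Vlan100
      no shutdown
      ! identical gateway IP on every Leaf hosting this subnet
      ip address 10.1.100.1/24
      fabric forwarding mode anycast-gateway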
Summary – The advantages provided by a VXLAN-EVPN solution are briefly summarized as follows:
- Standards-based Overlay (VXLAN) with a standards-based Control-Plane (BGP)
- Layer-2 MAC and Layer-3 IP information distribution by Control-Plane (BGP)
- Forwarding decision based on Control-Plane (minimizes flooding)
- Integrated Routing/Bridging (IRB) for Optimized Forwarding in the Overlay
- Leverages Layer-3 ECMP – all links forwarding – in the Underlay
- Significantly larger Name-Space in the Overlay (16M segments)
- Integration of Physical and Virtual Networks with Hybrid Overlays
- Facilitates Software-Defined Networking (SDN)
Simply formulated, VXLAN-EVPN provides a standards-based Overlay that supports Segmentation, Host Mobility, and High Scale.
VXLAN-EVPN is available on the Nexus 9300 (NX-OS 7.0), with the Nexus 7000/7700 (F3 line cards) to follow in the upcoming major release. Additional Data Center Switching platforms, like the Nexus 5600, will follow shortly after.
A detailed whitepaper on this topic is available on Cisco.com. In addition, VXLAN-EVPN was featured in the following Cisco Live! sessions:
- BRKDCT-2404 – VXLAN Deployment Models – A Practical Perspective (by Victor Moreno)
- BRKDCT-3378 – Building Simplified, Automated and Scalable Data Center Networks with Overlays (VXLAN/FabricPath) (by Lukas Krattiger)
Do you have an appetite for more? Post a comment, tweet about it, and keep the conversation going… Thanks for reading and Happy Networking!
This is great. Will this eventually replace OTV?
Even though OTV and VXLAN/EVPN share similarities (a control-plane for forwarding, Layer-2 across any transport), there is still a way to go before VXLAN/EVPN becomes a true Data Center Interconnect (DCI) technology.
When looking at VXLAN/EVPN for DCI, it is as good as vPC or FabricPath, but not as good as OTV is today.
This said, VXLAN/EVPN has the necessary base to become a full-featured DCI technology in the future.
Great information. Will this be available on the Nexus 9500?
Will it be available with an MSDC architecture (multi-stage Clos DC, with three layers: leaf, spine, and super-spine)?
Thank you!
VXLAN/EVPN is flexible from a topology perspective and can support Leaf, Spine, Super-Spine, or even multi-stage Spines, as well as more collapsed topologies (e.g. Leaf, Border Spine).
The Nexus 9500 itself can be a Route-Reflector for EVPN since the availability of NX-OS 7.0(3) and will be able to do VXLAN/EVPN later this year.
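For illustration, an EVPN Route-Reflector on a Nexus 9500 Spine could look roughly like this minimal sketch; the AS number and neighbor address are hypothetical:

    router bgp 65000
      neighbor 10.10.10.11 remote-as 65000
        address-family l2vpn evpn
          send-community extended
          ! reflect EVPN routes between the Leaf VTEPs
          route-reflector-client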
Great post Lukas.
Am I safe to say that VXLAN-EVPN is part of the N9K standalone mode offering, as it uses NX-OS code? Or is it a third option for the N9K: ACI, standalone, and VXLAN-EVPN?
Thanks!
Thank you Weibin,
VXLAN/EVPN is part of the Standalone NX-OS mode and is today available on the Nexus 9300, followed by the Nexus 7000 with F3 line cards and further platforms.
MP-BGP EVPN as the control-plane will eliminate BUM flooding and become like Layer-3 IP learning. Is that a valid statement? (No flooding anymore?)
Thanks
Dear Marwan,
Let me add a bit of flavor to your statement.
VXLAN with EVPN improves the Flood & Learn semantics by mitigating unnecessary flooding of Layer-2 traffic.
– We avoid sending unnecessary ARP requests across VXLAN; if EVPN knows about the destination Host, we answer the ARP request locally on its behalf (a configuration sketch follows below). In case of a miss in EVPN (e.g. a silent Host), an ARP broadcast (flood) would still apply, and this is how we detect the silent Host.
– As we know about the Hosts, there is no Unknown Unicast flooding anymore. The silent-Host situation above also applies here.
– Multicast is obviously still valid traffic, so this would not be eliminated.
In short, we move closer to what Layer-3 forwarding (routing) does, but still honor the requirements of Layer-2 traffic in regards to BUM, if necessary.
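For reference, this ARP suppression behavior is enabled per VNI on the NVE interface; a minimal sketch with a hypothetical VNI number:

    interface nve1
      member vni 30100
        ! answer ARP requests locally from EVPN-learned MAC/IP bindings
        suppress-arp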
Cisco also offers a forwarding characteristic where Host-to-Host communication (intra- and inter-subnet) is based on routing. This forwarding method is called “Enhanced Forwarding” and, as it follows strict routing, does not flood.
Dear Lukas, thanks for this interesting blog post.
Do you have some news about the approximate date of availability of BGP EVPN on Nexus 9300 in ACI mode?
Also, I’ve been intrigued by one particular limitation of the Nexus 9500 series; the latest NX-OS 7.0 release notes (http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/6-x/release/notes/70311_nxos_rn.html#pgfId-678298) state that VXLAN is not supported on the N9500. That should mean that these spine switches can act neither as VTEPs nor as VXLAN-VLAN gateways. Are there any other implications?
Hi Jean-Christophe,
thank you for the compliment.
VXLAN/EVPN is available today on the Cisco Nexus 9300 in the NX-OS (Standalone) mode; ACI provides its own, integrated control-plane.
Predominantly, the Nexus 9500 is deployed in the Spine role and hence does not itself require VTEP capabilities to participate in VXLAN networks (the Spine is a transparent Layer-3 forwarder). For the role of a BGP Route-Reflector, the Nexus 9500 already supports all requirements for MP-BGP EVPN.
The Nexus 9500 is about to follow with VXLAN and VXLAN/EVPN capabilities going forward.
Thanks for your answer. If I understand it correctly, it means that a proprietary VXLAN control-plane is already implemented in ACI mode on both the Nexus 9300/9500, allowing among other features:
– the distribution of VM and bare-metal host IP/MAC addresses in the fabric (and stretched fabric),
– the transparent mobility of VMs.
Dear Jean-Christophe,
ACI provides an integrated Fabric solution with control-plane and application awareness. Physical servers as well as Virtual Machines are supported, including their mobility within and between Fabrics.
Please have a look at the respective blog post for some hot news: http://blogs.cisco.com/datacenter/new-cisco-apic-software-allows-stretched-aci-fabric-across-long-distances
Kind Regards
-Lukas
For those interested in seeing this in action, and getting their hands dirty, but who do not have access to Cisco’s lab – do any of Cisco’s virtualization options support VXLAN-EVPN?
VIRL, Nexus1000v, CSR1000, etc. … ?
Thanks, great post (and presentation / video @ CL!),
/TJ
Hi TJ,
Today we only have VXLAN/EVPN on the physical Nexus 9000 product, and will extend it to the Nexus 9500, Nexus 5600, and Nexus 7000/7700 with F3 line cards soon.
From a virtual perspective (N1kv, CSR1kv), we are progressing on bringing VXLAN/EVPN capabilities into them going forward.
Lukas, consider the case where compute clusters in different racks are in different subnets. For example, rack 1 is 10.1.1.0, rack 2 is 10.1.2.0, and rack 3 is 10.1.3.0.
All these racks will thus have their gateways in different subnets. When a VM moves from one rack to another, how will it find its gateway?
In the case of VMware NSX, the VM gateways are Distributed Logical Routers, which have the same interface in each compute cluster, and hence VMs keep the same default gateway when they move across racks.
What happens in BGP-EVPN scenario ?
VXLAN/EVPN combines an integrated routing and bridging (IRB) approach with the Distributed Anycast Gateway.
Every Top-of-Rack (ToR) Switch where a given VLAN is configured can act as the active default gateway. In this case, all ToRs share the same Gateway MAC and IP address, which allows the first-hop routing decision to be made as close as possible to the Host. As the MAC and IP are the same, VM mobility is seamless, since no changes are visible from the VM’s perspective.
The advantage of our integrated routing and bridging approach is that we do not need to create a VLAN on every ToR just to route to a given destination; we can scope the configuration to where a VLAN is really necessary and preserve resources this way.
Cisco uses Symmetric IRB as described above, which allows scoped configuration, while the VMware DLR follows more the forwarding semantics of Asymmetric IRB and thus requires either every VLAN/VNI everywhere or a transit routing segment for reaching them.
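To illustrate the Symmetric IRB model, here is a hedged NX-OS sketch of the per-tenant Layer-3 VNI that carries routed traffic between Leafs, so the destination VLAN does not need to exist on the ingress ToR; the VRF name and VNI number are hypothetical:

    vrf context TENANT-A
      ! Layer-3 VNI shared by all Leafs of this tenant VRF
      vni 50000
      rd auto
      address-family ipv4 unicast
        route-target both auto evpn

    interface nve1
      ! routed (inter-subnet) traffic travels on the Layer-3 VNI
      member vni 50000 associate-vrf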