There has been a lot of chatter in the blogosphere about the advent of Virtual eXtensible Local Area Network (VXLAN), the vendors that contributed to the standard, and those that are planning to support the proposed IETF draft standard. In the next couple of articles I will attempt to describe how VXLAN is supposed to work, when you should consider implementing it, and how to implement it in your VMware Infrastructure (VI).
The basic use case for VXLAN is to connect two or more layer three (L3) networks and make them look like they share the same layer two (L2) domain. This allows virtual machines to live in two disparate networks yet still operate as if they were attached to the same L2 domain. Section 3 of the VXLAN IETF draft describes the networking problems that VXLAN is attempting to solve a lot better than I ever could.
To operate, a VXLAN needs a few components in place:
- Multicast support (IGMP and PIM)
- VXLAN Network Identifier (VNI)
- VXLAN Gateway
- VXLAN Tunnel End Point (VTEP)
- VXLAN Segment/VXLAN Overlay Network
VXLAN is an L2 overlay over an L3 network. Each overlay network is known as a VXLAN Segment and is identified by a unique 24-bit segment ID called a VXLAN Network Identifier (VNI). Only virtual machines on the same VNI are allowed to communicate with each other. Virtual machines are uniquely identified by the combination of their MAC address and VNI. As such, duplicate MAC addresses can exist in different VXLAN Segments without issue, but not within the same VXLAN Segment.
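To make the identity rule concrete, here is a minimal Python sketch of a hypothetical registry keyed by the (VNI, MAC) pair. It assumes MAC addresses are plain strings; the class and function names are illustrative only and are not part of any VMware or Cisco API.

```python
# VM identity in VXLAN is the pair (VNI, MAC address), not the MAC alone.
VNI_BITS = 24
MAX_VNI = (1 << VNI_BITS) - 1  # 16,777,215: the largest value a 24-bit VNI can hold


def validate_vni(vni):
    """Reject values that do not fit in the 24-bit VNI field."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError(f"VNI {vni} does not fit in {VNI_BITS} bits")
    return vni


class SegmentRegistry:
    """Hypothetical registry of virtual machines keyed by (VNI, MAC)."""

    def __init__(self):
        self._vms = {}

    def attach(self, vni, mac, vm_name):
        key = (validate_vni(vni), mac.lower())
        if key in self._vms:
            # A duplicate MAC inside one segment is a conflict.
            raise ValueError(f"MAC {mac} already present in VNI {vni}")
        self._vms[key] = vm_name

    def lookup(self, vni, mac):
        """Return the VM name for (vni, mac), or None if it is not attached."""
        return self._vms.get((vni, mac.lower()))


registry = SegmentRegistry()
registry.attach(864, "00:50:56:00:00:01", "VM1")
registry.attach(5000, "00:50:56:00:00:01", "VM-B")  # same MAC, different segment: allowed
```

Attaching a second VM with MAC 00:50:56:00:00:01 to VNI 864 would raise an error, because that pair is already taken.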
Figure 1: VXLAN Packet Header
The original L2 frame that a virtual machine sends out is encapsulated in a VXLAN header that includes the VNI of the VXLAN Segment the virtual machine belongs to. The result is then wrapped in UDP, IP and Ethernet headers (UDP->IP->Ethernet) for final delivery on the transport network. Because of this encapsulation, you can think of VXLAN as a tunneling scheme, with the ESX hosts acting as the VXLAN Tunnel End Points (VTEPs). The VTEPs are responsible for encapsulating the virtual machine traffic in a VXLAN header, as well as for stripping it off and presenting the destination virtual machine with the original L2 frame.
The encapsulation consists of the following modifications to standard UDP, IP and Ethernet frames:
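Because every frame is wrapped in new outer headers, the encapsulation adds a fixed number of bytes to each packet. The arithmetic below is a sketch that assumes an IPv4 outer header without options, the 8-byte VXLAN header whose fields are listed below, and an inner frame carried without its FCS; the constant and function names are illustrative.

```python
# Outer header sizes in bytes (assumption: IPv4 outer header with no options).
OUTER_ETHERNET = 14  # destination MAC (6) + source MAC (6) + ethertype (2)
VLAN_TAG = 4         # optional 802.1Q tag: ethertype 0x8100 (2) + tag (2)
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8     # flags (1) + reserved (3) + VNI (3) + reserved (1)


def encapsulation_overhead(with_vlan=False):
    """Bytes added in front of the original L2 frame by VXLAN encapsulation."""
    vlan = VLAN_TAG if with_vlan else 0
    return OUTER_ETHERNET + vlan + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER


def outer_ip_packet_size(inner_frame_len):
    """Size of the outer IP packet that carries an inner frame of inner_frame_len bytes."""
    return OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + inner_frame_len


# A VM sending a 1500-byte IP packet produces a 1514-byte inner frame
# (1500 + 14-byte inner Ethernet header), which becomes a 1550-byte outer IP packet.
print(encapsulation_overhead(), outer_ip_packet_size(1500 + 14))  # 50 1550
```

In other words, the transport network has to carry IP packets 50 bytes larger than the ones the virtual machines themselves send.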
Destination Address – This is set to the MAC address of the destination VTEP if it is local, or to that of the next-hop device, usually a router, when the destination VTEP is on a different L3 network.
VLAN – This is optional in a VXLAN implementation; when present, it is designated by an ethertype of 0x8100 and carries an associated VLAN ID tag.
Ethertype – This is set to 0x0800 as the payload packet is an IPv4 packet. The initial VXLAN draft does not include an IPv6 implementation, but it is planned for the next draft.
Protocol – Set to 0x11 to indicate that the IP packet contains a UDP datagram
Source IP – IP address of originating VTEP
Destination IP – IP address of the target VTEP. If this is not known, as is the case when the VTEP has not yet sent traffic to the target virtual machine, the originating VTEP must perform a discovery process. This is done in a few steps:
- Destination IP is replaced with the IP multicast group corresponding to the VNI of the originating virtual machine
- All VTEPs that have subscribed to the IP multicast group receive the frame and decapsulate it, learning the mapping of the source virtual machine's MAC address to its host VTEP
- The host VTEP of the destination virtual machine then sends that virtual machine's response to the originating VTEP, using the originating VTEP's IP address, which it learned from the original multicast frame
- The source VTEP adds the new mapping of virtual machine MAC address to VTEP to its tables for future packets
Source Port – Set by the transmitting VTEP
Destination Port (VXLAN Port) – The IANA-assigned VXLAN port. At the time of writing, this has not been assigned yet
UDP Checksum – This should be set to 0x0000. If the source VTEP sets a nonzero checksum, the receiving VTEP should verify it; if it is incorrect, the frame must be dropped and not decapsulated.
VXLAN Flags – Reserved bits are set to zero, except bit 3, the I bit, which is set to 1 to indicate a valid VNI
VNI – 24-bit field that is the VXLAN Network Identifier
Reserved – A set of fields, 24 bits and 8 bits, that are reserved and set to zero
Putting it Together:
Figure 2: VM to VM communication
When VM1 wants to send a packet to VM2, it first needs VM2's MAC address. This is the process that is followed:
- VM1 sends an ARP request for the MAC address associated with 192.168.0.101
- This ARP is encapsulated by VTEP1 into a multicast packet to the multicast group associated with VNI 864
- All VTEPs see the multicast packet and add the association of VM1 with VTEP1 to their VXLAN tables
- VTEP2 receives the multicast packet, decapsulates it, and sends the original broadcast on the portgroups associated with VNI 864
- VM2 sees the ARP packet and responds with its MAC address
- VTEP2 encapsulates the response as a unicast IP packet and sends it back to VTEP1 using IP routing
- VTEP1 decapsulates the packet and passes it on to VM1
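The discovery steps above amount to a flood-and-learn table on each VTEP. The following Python sketch simulates them for the VM1/VM2 example. The VTEP IP addresses, the multicast group 239.1.1.1 and the VM MAC addresses are hypothetical, and the model ignores details such as table aging and the ARP payload itself; it only tracks which outer destination each frame gets and which VMs receive it.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class Vtep:
    ip: str
    local_vms: dict  # (vni, mac) -> name of a VM hosted behind this VTEP
    table: dict = field(default_factory=dict)  # learned: (vni, mac) -> remote VTEP IP

    def learn(self, vni, src_mac, src_vtep_ip):
        """Record which VTEP fronts src_mac, as done when decapsulating a frame."""
        if src_vtep_ip != self.ip:
            self.table[(vni, src_mac)] = src_vtep_ip

    def outer_destination(self, vni, dst_mac, groups):
        """Known unicast goes to the learned VTEP; anything else goes to the VNI's group."""
        if dst_mac != BROADCAST and (vni, dst_mac) in self.table:
            return self.table[(vni, dst_mac)]
        return groups[vni]


class Fabric:
    """Hypothetical transport network: multicast groups plus unicast IP delivery."""

    def __init__(self, groups, members):
        self.groups = groups    # vni -> multicast group IP
        self.members = members  # group IP -> list of subscribed VTEPs
        self.by_ip = {v.ip: v for vteps in members.values() for v in vteps}

    def send(self, src, vni, src_mac, dst_mac):
        """Encapsulate at src, deliver, and return (outer destination IP, receiving VMs)."""
        outer_dst = src.outer_destination(vni, dst_mac, self.groups)
        if outer_dst == self.groups[vni]:
            receivers = [v for v in self.members[outer_dst] if v is not src]
        else:
            receivers = [self.by_ip[outer_dst]]
        delivered = []
        for vtep in receivers:
            vtep.learn(vni, src_mac, src.ip)  # every receiving VTEP learns the source
            for (vm_vni, mac), name in vtep.local_vms.items():
                if vm_vni == vni and dst_mac in (mac, BROADCAST):
                    delivered.append(name)
        return outer_dst, delivered


VM1_MAC, VM2_MAC = "00:50:56:00:00:01", "00:50:56:00:00:02"
GROUP = "239.1.1.1"
vtep1 = Vtep("10.0.1.10", {(864, VM1_MAC): "VM1"})
vtep2 = Vtep("10.0.2.10", {(864, VM2_MAC): "VM2"})
vtep3 = Vtep("10.0.3.10", {})  # a third VTEP with no VMs on VNI 864
fabric = Fabric({864: GROUP}, {GROUP: [vtep1, vtep2, vtep3]})

# VM1's ARP request is a broadcast, so VTEP1 sends it to the multicast group.
print(fabric.send(vtep1, 864, VM1_MAC, BROADCAST))  # ('239.1.1.1', ['VM2'])
# VM2's ARP reply is unicast; VTEP2 learned VM1 -> VTEP1 from the flood.
print(fabric.send(vtep2, 864, VM2_MAC, VM1_MAC))    # ('10.0.1.10', ['VM1'])
```

Note that VTEP3 also learns VM1's location from the flood, even though it hosts no VM on VNI 864, while only VTEP1 learns VM2's location from the unicast reply.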
At this point VM1 knows the MAC address of VM2 and can send directed packets to it, as shown in Figure 2: VM to VM communication:
- VM1 sends an IP packet from 192.168.0.100 to 192.168.0.101
- VTEP1 takes the packet and encapsulates it by adding the following headers:
- VXLAN header with VNI=864
- A standard UDP header with the UDP checksum set to 0x0000 and the destination port set to the IANA-designated VXLAN port. Cisco N1KV currently uses port 8472.
- A standard IP header with the destination set to VTEP2's IP address and Protocol 0x11 to indicate the UDP packet used for delivery
- A standard MAC header with the MAC address of the next hop. In this case it is the router interface with MAC address 00:10:11:FE:D8:D2, which will use IP routing to send it to the destination
- VTEP2 receives the packet, as it carries VTEP2's MAC address as the destination. The packet is decapsulated and identified as a VXLAN packet by its UDP destination port. VTEP2 then looks up the portgroups associated with VNI 864, found in the VXLAN header, verifies that the target, VM2 in this case, is allowed to receive frames for VNI 864 based on its portgroup membership, and passes the packet on if the verification succeeds.
- VM2 receives the packet and deals with it like any other IP packet.
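VTEP2's checks in the receive step, together with the UDP checksum rule listed earlier, can be sketched as a single Python function. This sketch assumes the outer Ethernet and IP headers have already been stripped, so it starts from the UDP datagram. It uses port 8472 because that is what the text says the Cisco N1KV uses (IANA has since assigned UDP port 4789 to VXLAN). The portgroup names, the VM lookup table and the `checksum_ok` hook are hypothetical; verifying a UDP checksum requires the outer IP pseudo-header, which this sketch does not see. Broadcast delivery to every portgroup on a VNI is left out.

```python
import struct

VXLAN_PORT = 8472    # port used by Cisco N1KV per the text; IANA later assigned 4789
VXLAN_FLAG_I = 0x08  # I bit: the VNI field is valid


def receive_vxlan(udp_datagram, portgroups_by_vni, local_vms, checksum_ok=lambda d: True):
    """Return (vm_name, inner_frame) to deliver, or None if the packet is dropped.

    portgroups_by_vni: vni -> set of portgroup names associated with that VNI
    local_vms: inner destination MAC -> (vm_name, portgroup name)
    checksum_ok: hypothetical hook that verifies a nonzero UDP checksum
    """
    if len(udp_datagram) < 8 + 8 + 14:  # UDP + VXLAN + inner Ethernet header
        return None
    _src_port, dst_port, _length, checksum = struct.unpack("!HHHH", udp_datagram[:8])
    if dst_port != VXLAN_PORT:
        return None  # not VXLAN traffic
    if checksum != 0 and not checksum_ok(udp_datagram):
        return None  # nonzero checksum that fails verification: drop, do not decapsulate
    flags, word = struct.unpack("!B3xI", udp_datagram[8:16])
    if not flags & VXLAN_FLAG_I:
        return None
    vni = word >> 8
    inner_frame = udp_datagram[16:]
    dst_mac = inner_frame[:6].hex(":")
    if dst_mac not in local_vms:
        return None  # no such VM behind this VTEP
    vm_name, portgroup = local_vms[dst_mac]
    if portgroup not in portgroups_by_vni.get(vni, set()):
        return None  # the VM's portgroup is not a member of this VNI
    return vm_name, inner_frame


# Build a datagram as VTEP1 would: UDP header (checksum 0) + VXLAN header (VNI 864) + inner frame.
inner = bytes.fromhex("005056000002" "005056000001" "0800") + b"payload"
vxlan = struct.pack("!B3xI", VXLAN_FLAG_I, 864 << 8)
udp = struct.pack("!HHHH", 49152, VXLAN_PORT, 8 + len(vxlan) + len(inner), 0) + vxlan + inner
vms = {"00:50:56:00:00:02": ("VM2", "pg-vni-864")}
print(receive_vxlan(udp, {864: {"pg-vni-864"}}, vms)[0])  # VM2
```

The order of the checks mirrors the text: identify VXLAN by port, honor the checksum rule, read the VNI, then confirm portgroup membership before handing the original frame to the VM.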
The return path for packets from VM2 to VM1 follows the same IP route back through the router.
At this point you should be able to describe VXLAN end point discovery and how VXLAN packets are encapsulated and decapsulated, and you should understand the VXLAN header structure. In the next couple of articles I will walk you through a few use cases where VXLAN would be useful, as well as configuring a Cisco N1Kv to implement your first VXLAN.
This is great, thank you
Can you help me understand why the ARP reply from VTEP2 follows IP routing and not VXLAN encapsulation?
This is because VTEP2 already knows that VM1 is fronted by VTEP1 and has VTEP1's IP address in its VXLAN tables from the initial multicast query. Just to be clear, the response from VM2 to VM1 is encapsulated by VTEP2 into a VXLAN packet and sent over regular IP routing to VTEP1 (you can think of this as a VXLAN tunnel). VTEP1 then decapsulates the VXLAN packet and sends the original payload to VM1.