Mar 20, 2012
 

Now that the basics of VXLAN and multicast are in place, we can move on to what needs to be done to get your physical infrastructure ready for VXLAN. The promise of VXLAN is that you do not need to “upgrade” your physical network gear to VXLAN-capable gear in order to be up and running. In reality, there is very little you need to do. The basic things that need to be addressed are:

  • MTU increase
  • Multicast support

Let us cover this in a little more detail.

MTU Increase

Looking back at the VXLAN Primer-Part 1, we found that encapsulating an IPv4 packet adds an extra 50 bytes to the original frame. The recommendation is to increase the MTU to 1600 bytes. Why 1600 bytes when the VXLAN overhead is only 50 bytes? The reason is that the guest could be doing VLAN tagging on a maximum-size frame of 1514 bytes, which adds 4 bytes to the resulting packet. If the transport network requires that the VXLAN traffic be VLAN tagged, this adds another 4 bytes to the final packet. As such:

for IPv4:

1514(Guest) + 4(Guest VLAN tag) + 50(VXLAN) + 4(VXLAN Transport VLAN Tag) = 1572

for IPv6 (IPv6 headers add another 20 bytes):

1514(Guest) + 4(Guest VLAN) + 70(VXLAN IPv6) + 4(VXLAN Transport VLAN Tag) = 1592

For IPv6, a further 8 bytes for data and control packets bring this up to 1600 bytes.
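The two sums above can be checked with a short Python sketch. The constant and function names are made up for illustration and do not come from any VXLAN tooling; the byte counts are the ones used in this post.

```python
# Byte counts taken from the calculation in this post.
GUEST_FRAME = 1514        # maximum-size guest Ethernet frame
VLAN_TAG = 4              # one 802.1Q VLAN tag
VXLAN_OVERHEAD_IPV4 = 50  # VXLAN encapsulation when the transport is IPv4
IPV6_EXTRA = 20           # additional header bytes when the transport is IPv6
RECOMMENDED_MTU = 1600


def encapsulated_size(ipv6=False, guest_vlan=True, transport_vlan=True):
    """Return the worst-case size in bytes of a VXLAN-encapsulated frame."""
    size = GUEST_FRAME + VXLAN_OVERHEAD_IPV4
    if ipv6:
        size += IPV6_EXTRA
    if guest_vlan:
        size += VLAN_TAG
    if transport_vlan:
        size += VLAN_TAG
    return size


for label, ipv6 in (("IPv4", False), ("IPv6", True)):
    size = encapsulated_size(ipv6=ipv6)
    print(f"{label}: {size} bytes, headroom {RECOMMENDED_MTU - size} bytes")
```

With both VLAN tags present this gives 1572 bytes for IPv4 and 1592 bytes for IPv6, so 1600 leaves 28 and 8 bytes to spare respectively.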

The MTU change needs to be made at the vSwitch and on all physical gear that VXLAN traffic will traverse. On the physical gear, this will usually include the ToR switches, core switches and routers.

Caution should be taken when considering VXLAN to transport virtual machine traffic that is already configured for jumbo frames and/or jumbograms, because the added encapsulation overhead can result in fragmentation.

Multicast Support

Multicast is required by VXLAN in order to transport virtual machine originated traffic such as unknown destination MAC packets, broadcasts, multicast or non-IP traffic. It is also used for endpoint discovery by the VTEPs. For details on how multicast works, have a look at the previous blog entry on multicast.
There are a couple of ways to get started with multicast for VXLAN on the physical network: the simple way and the right way.

The Simple Way

For a simple one-datacenter configuration, you could take the simple route and put all your VTEPs on the same L2 network. This will allow you to run VXLAN without any changes to your network for multicast support. It is also a way to get started while you prepare to do the right thing as detailed below.
Be aware that in this configuration all multicast traffic will be treated like broadcast traffic by the physical switches: it will be flooded to all ports in the L2 network it is in. This is not a terrible thing in a small VXLAN installation, or if the L2 network is dedicated to the VTEPs, and it will get you up and running with no changes on the physical network.

The Right Way

The right way to prepare for VXLAN on the physical network is by enabling multicast support on the switches and routers.

  • On the layer 2 switches, you will need to enable IGMP snooping
  • On the routers, you will need to set up an IGMP querier

IGMP snooping is needed on the physical switches in order to build a map of the physical ports to the multicast addresses in use by the end clients. This allows multicast traffic to be pruned from ports that have not subscribed to the groups being distributed by the switch. For IGMP snooping to work, there has to be at least one IGMP querier on the network.
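To make the difference between flooding and pruning concrete, here is a toy Python model of a layer 2 switch. It is a sketch under simplifying assumptions, not a real switch API: the class, its methods, the port numbers and the group address 239.1.1.1 are all invented for illustration, and details such as forwarding to router ports and membership timeouts are left out.

```python
from collections import defaultdict


class ToySwitch:
    """Toy layer 2 switch that either floods or prunes multicast frames."""

    def __init__(self, ports, snooping=True):
        self.ports = set(ports)
        self.snooping = snooping
        # Snooping table: multicast group -> ports that reported membership.
        self.groups = defaultdict(set)

    def igmp_report(self, port, group):
        """Record a join seen on a port."""
        self.groups[group].add(port)

    def igmp_leave(self, port, group):
        """Record a leave seen on a port."""
        self.groups[group].discard(port)

    def egress_ports(self, ingress_port, group):
        """Return the ports a multicast frame for `group` is sent out of."""
        if self.snooping:
            targets = self.groups.get(group, set())
        else:
            targets = self.ports  # no snooping: treat it like a broadcast
        return sorted(set(targets) - {ingress_port})


flooding = ToySwitch(ports=range(1, 9), snooping=False)
pruning = ToySwitch(ports=range(1, 9), snooping=True)
for switch in (flooding, pruning):
    switch.igmp_report(2, "239.1.1.1")
    switch.igmp_report(5, "239.1.1.1")

print(flooding.egress_ports(2, "239.1.1.1"))  # [1, 3, 4, 5, 6, 7, 8]
print(pruning.egress_ports(2, "239.1.1.1"))   # [5]
```

Without snooping, a frame arriving on port 2 goes out of every other port. With snooping, it only reaches port 5, the other port that joined the group.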

An IGMP-enabled router sends out IGMP queries to the networks it has configured for multicast. These queries are used to find active multicast groups. The end clients, in the case of VXLAN the VTEPs, will respond with an IGMP report to join/rejoin or leave an active multicast group that maps to a VXLAN Network Identifier (VNI) associated with a VXLAN segment. The VTEP will respond with an IGMP report for each of the multicast groups associated with the VNIs of the VMs it hosts. These join and leave messages are noted by the switch, which modifies its multicast tables to match. See this detailed explanation on PIM for how this works for multicast clients and sources.
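A VTEP performs its group joins internally, but the same behaviour can be observed from any ordinary host. The Python sketch below joins one multicast group per VNI using the standard socket API; it is the operating system, not the script, that then sends the IGMP reports and answers queries. The VNI numbers, group addresses and function names are hypothetical and stand in for whatever your VXLAN configuration uses.

```python
import socket

# Hypothetical VNI-to-group mapping; real values come from your VXLAN setup.
VNI_TO_GROUP = {5000: "239.1.1.1", 5001: "239.1.1.2"}


def membership_request(group, local_ip="0.0.0.0"):
    """Build the 8-byte ip_mreq value: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def join_groups(vnis, local_ip="0.0.0.0"):
    """Join the multicast group for each VNI and return the open socket.

    Each join makes the OS send an IGMP report for that group. Closing the
    socket drops the memberships again.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    for vni in vnis:
        mreq = membership_request(VNI_TO_GROUP[vni], local_ip)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

Calling join_groups([5000, 5001]) on a host attached to a multicast-capable network and watching the wire with a packet capture should show one report per group. A local address of 0.0.0.0 lets the OS pick the interface.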